Name: common-pile/caselaw_access_project
Creator: common-pile
License: Unknown
Keywords: huggingface, task_categories:text-generation, language:en, size_categories:1M<n<10M, format:json, modality:text, library:datasets, library:dask, library:mlcroissant, text-generation

About common-pile/caselaw_access_project

Description: This dataset contains 6.7 million cases from the Caselaw Access Project and Court Listener. The Caselaw Access Project consists of nearly 40 million pages of U.S. federal and state court decisions and judges’ opinions from the last 365 years. In addition, Court Listener adds over 900 thousand cases scraped from 479 courts. The Caselaw Access Project and Court Listener source legal data from a wide variety of resources such as the Harvard Law Library, the Law Library of Congress, and the Supreme Court Database. From these sources, we included only documents that are in the public domain. OCR errors were corrected after digitization, and additional post-processing was done to fix formatting and parsing issues. Code for collecting, processing, and preparing this dataset is available in the common-pile GitHub repo.
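
As a rough illustration of how the dataset might be inspected, the sketch below streams records with the Hugging Face `datasets` library (listed in the keywords above) and previews the start of each document. The split name `train` and the `text` field are assumptions, not confirmed by this card, and `preview_texts` and `stream_cases` are hypothetical helper names introduced only for this example.

```python
from itertools import islice
from typing import Iterable


def preview_texts(records: Iterable[dict], n: int = 3, width: int = 80,
                  text_field: str = "text") -> list[str]:
    """Return the first `width` characters of the first `n` records.

    `text_field` is an assumed field name; records that lack it
    yield an empty string instead of raising.
    """
    return [str(rec.get(text_field, ""))[:width] for rec in islice(records, n)]


def stream_cases(repo_id: str = "common-pile/caselaw_access_project",
                 split: str = "train"):
    """Open the dataset in streaming mode so nothing is downloaded up front.

    The split name "train" is an assumption. The import is done lazily so
    that `preview_texts` can be used without `datasets` installed.
    """
    from datasets import load_dataset

    return load_dataset(repo_id, split=split, streaming=True)
```

With network access and the `datasets` package installed, `preview_texts(stream_cases())` would return short snippets from the first three cases. Streaming is used here because the dataset holds millions of documents, so a full download is unnecessary for a quick look.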
