common-pile/caselaw_access_project

Text Generation · common-pile · 2.7K downloads · License: Unknown · 25 GB
Tags: task_categories:text-generation, language:en, size_categories:1M<n<10M, format:json, modality:text

# download instantly
mlforge datasets pull common-pile/caselaw_access_project

Dataset details

Task: Text Generation
Language: en
License: Unknown
Size: 25 GB
Rows / images: 351.6K
Creator: common-pile
Downloads: 2.7K
Source: huggingface_datasets
Updated: 2025-06-06

About common-pile/caselaw_access_project

This dataset contains 6.7 million cases from the Caselaw Access Project and Court Listener. The Caselaw Access Project consists of nearly 40 million pages of U.S. federal and state court decisions and judges’ opinions from the last 365 years. Court Listener adds over 900 thousand cases scraped from 479 courts. Both projects source legal data from a wide variety of resources, such as the Harvard Law Library, the Law Library of Congress, and the Supreme Court Database. From these sources, we included only documents that were in the public domain. OCR errors were corrected after digitization, and additional post-processing was done to fix formatting and parsing. Code for collecting, processing, and preparing this dataset is available in the common-pile GitHub repo.
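
Since the listing tags the data as format:json, a downloaded shard can probably be read one record at a time. The following is a minimal sketch, not the project's own loader: it assumes JSON Lines (one JSON object per line) and assumes each record keeps the case text under a `text` field. The `iter_cases` helper, the field name, and the two sample records are illustrative only, so check them against the real files before relying on them.

```python
import io
import json
from typing import Iterator, TextIO


def iter_cases(stream: TextIO, text_field: str = "text") -> Iterator[str]:
    """Yield the text of each record in a JSON Lines stream, skipping blank lines."""
    for line in stream:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # "text" is an assumed field name; adjust it to match the real schema.
        yield record[text_field]


# Two made-up records standing in for a downloaded shard (not real dataset rows).
sample = io.StringIO(
    '{"id": "example-1", "text": "Opinion of the court."}\n'
    "\n"
    '{"id": "example-2", "text": "Judgment affirmed."}\n'
)

texts = list(iter_cases(sample))
total_chars = sum(len(t) for t in texts)
print(len(texts), total_chars)
```

Reading lazily like this avoids holding the full 25 GB in memory; for a real shard, replace the in-memory sample with an opened file handle.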