
monology/pile-uncopyrighted

General · monology · 69.6K downloads
other · 315 GB
Tags: license:other, size_categories:100M<n<1B, format:json, modality:text, library:datasets

Pile Uncopyrighted

In response to authors demanding that LLMs stop using their works, here's a copy of The Pile with all copyrighted content removed. Please consider using this dataset to train your future LLMs, to respect authors and abide by copyright law. An uncopyrighted version of a larger dataset (e.g. RedPajama) is planned, with no ETA.

# download instantly
mlforge datasets pull monology/pile-uncopyrighted
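
Since the listing gives huggingface_datasets as the source, the data can probably also be read with the Hugging Face `datasets` library instead of pulling all 315 GB first. The following is a minimal sketch under stated assumptions, not a documented workflow for this dataset: it assumes the repository exposes a `train` split, that each record carries a `text` field (as in The Pile), and that the `datasets` package (plus `zstandard` if the files are zstd-compressed) is installed. The helper names `stream_records` and `preview` are hypothetical.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator

REPO_ID = "monology/pile-uncopyrighted"


def stream_records(repo_id: str = REPO_ID, split: str = "train") -> Iterable[Dict[str, Any]]:
    """Return a lazy iterable over the dataset (assumes a `train` split exists)."""
    # Imported here so that preview() can be used without the library installed.
    from datasets import load_dataset

    # streaming=True reads records on demand instead of downloading everything.
    return load_dataset(repo_id, split=split, streaming=True)


def preview(records: Iterable[Dict[str, Any]], n: int = 3, max_chars: int = 80) -> Iterator[str]:
    """Yield the first `max_chars` characters of the first `n` records."""
    for record in islice(records, n):
        # Assumption: each record has a "text" field; fall back to "" if not.
        yield str(record.get("text", ""))[:max_chars]
```

Calling `list(preview(stream_records()))` would fetch only the first few records over the network, which is a cheap way to inspect the data before committing to a full download.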

Dataset details

Task: General
License: other
Size: 315 GB
Rows / images: 891.7K
Creator: monology
Downloads: 69.6K
Source: huggingface_datasets
Updated: 2023-08-31
