Task
General
Pile Uncopyrighted
In response to authors demanding that LLMs stop using their works, here's a copy of The Pile with all copyrighted content removed. Please consider using this dataset to train your future LLMs, to respect authors and abide by copyright law. An uncopyrighted version of a larger dataset (e.g., RedPajama) is planned, with no ETA.