jobs-git/Zyda-2

Name: jobs-git/Zyda-2
Creator: jobs-git
License: odc-by
Keywords: huggingface, task_categories:text-generation, language:en, license:odc-by, size_categories:n>1T, region:us, text-generation


odc-by, 1.5 TB, task_categories:text-generation, language:en, license:odc-by, size_categories:n>1T, region:us

Zyda-2 is a 5-trillion-token language modeling dataset created by collecting open, high-quality datasets, combining them, and applying cross-deduplication and model-based quality filtering. Zyda-2 comprises diverse sources of web data, highly educational content, math, code, and scientific papers.
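
To make the two processing steps named above more concrete, here is a minimal toy sketch in Python. It is not the actual Zyda-2 pipeline, which this page does not describe. It assumes exact-match deduplication on normalized text, and it uses a hypothetical `toy_score` function as a stand-in for a model-based quality classifier. All function names are illustrative.

```python
import hashlib
from collections.abc import Callable


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivially different copies match.
    return " ".join(text.lower().split())


def fingerprint(text: str) -> str:
    # Hash of the normalized text; equal fingerprints are treated as duplicates.
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def cross_deduplicate(sources: dict[str, list[str]]) -> dict[str, list[str]]:
    # One shared "seen" set across all sources, so a document already kept
    # from an earlier source is dropped from every later source.
    seen: set[str] = set()
    kept: dict[str, list[str]] = {}
    for name, docs in sources.items():
        kept[name] = []
        for doc in docs:
            fp = fingerprint(doc)
            if fp in seen:
                continue
            seen.add(fp)
            kept[name].append(doc)
    return kept


def quality_filter(
    docs: list[str], score_fn: Callable[[str], float], threshold: float
) -> list[str]:
    # Keep only documents whose score reaches the threshold.
    return [doc for doc in docs if score_fn(doc) >= threshold]


def toy_score(doc: str) -> float:
    # Hypothetical stand-in for a learned quality model:
    # the fraction of characters that are alphabetic.
    if not doc:
        return 0.0
    return sum(ch.isalpha() for ch in doc) / len(doc)


if __name__ == "__main__":
    sources = {
        "web": ["Hello  world", "Unique web page"],
        "edu": ["hello world", "A lesson on fractions"],
    }
    deduped = cross_deduplicate(sources)
    print(deduped)
    print(quality_filter(["good text here", "1234 5678 !!!!"], toy_score, 0.5))
```

Sources listed earlier take priority in this sketch, so their copy of a duplicated document is the one that survives. A production pipeline at this scale would more likely use approximate matching and a trained classifier; both are simplified away here.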


# download instantly
mlforge datasets pull jobs-git/Zyda-2

Dataset details

Task

Text Generation

Language

en

License

odc-by

Size

1.5 TB

Creator

jobs-git

Downloads

160.7K

Source

huggingface_datasets

Updated

2025-03-07

About jobs-git/Zyda-2