allenai/olmo-mix-1124

Text Generation · allenai · 61.0K downloads
odc-by · 6.8 TB · task_categories:text-generation · language:en · license:odc-by · size_categories:100M<n<1B · modality:text

Collection of data used to train OLMo-2-1124 models. The majority of this dataset comes from DCLM-Baseline with no additional filtering, but we provide the explicit breakdowns below.

# download instantly
mlforge datasets pull allenai/olmo-mix-1124
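
Because the full dataset is 6.8 TB, it may be more practical to inspect a few rows before pulling everything. The listing names huggingface_datasets as the source, so the following is a minimal sketch using the Hugging Face `datasets` library in streaming mode. The helper names `take` and `stream_olmo_mix` are hypothetical, and the `train` split and the optional config name are assumptions not confirmed by this page.

```python
from itertools import islice


def take(rows, n):
    """Return the first n items from any iterable of rows."""
    return list(islice(rows, n))


def stream_olmo_mix(name=None, split="train"):
    """Open the dataset lazily instead of downloading all of it.

    Assumptions: a split called "train" exists, and `name` (the config)
    can be left as None; pass a config name if the repository requires one.
    """
    # Imported here so that `take` stays usable without the library installed.
    from datasets import load_dataset

    # streaming=True returns an iterable dataset that fetches data on demand.
    return load_dataset(
        "allenai/olmo-mix-1124", name=name, split=split, streaming=True
    )


# Example usage (needs network access and `pip install datasets`):
#   for row in take(stream_olmo_mix(), 3):
#       print(sorted(row.keys()))
```

Streaming trades throughput for disk space: rows are fetched as they are iterated, so this suits a quick look at the data rather than a full training run.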

Dataset details

Task: Text Generation
Language: en
License: odc-by
Size: 6.8 TB
Rows / images: 3.3M
Creator: allenai
Downloads: 61.0K
Source: huggingface_datasets
Updated: 2025-08-19
