
allenai/tulu-3-sft-olmo-2-mixture-0225

General · allenai · 1.8K
Unknown · 5.3 GB · size_categories:100K<n<1M · format:parquet · modality:text · library:datasets · library:dask


# download instantly
mlforge datasets pull allenai/tulu-3-sft-olmo-2-mixture-0225

Dataset details

Task: General
License: Unknown
Size: 5.3 GB
Rows / images: 866.1K
Classes: 3
Creator: allenai
Downloads: 1.8K
Source: huggingface_datasets
Updated: 2025-03-14

About allenai/tulu-3-sft-olmo-2-mixture-0225

This mixture was used to train OLMo 2 32B. From the blog post: Filtered out instructions from the SFT dataset, and chosen responses from the preference data, that included mentions of a date cutoff from the synthetic data generation process. This resulted in a new version of the instruction dataset, Tulu 3 SFT Mixture 0225, and of the preference dataset, OLMo-2-32B-pref-mix-0325.

We use majority voting to improve the quality of answers to our synthetic math questions. For our Persona MATH and Grade School Math datasets from Tülu 3, we only include prompts and completions where the model reaches a majority vote over 5 completions. New versions of the math and grade school math datasets are available.
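The two filtering steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the OLMo 2 team's actual pipeline: the regular expression used to spot date-cutoff mentions, the helper names (`mentions_date_cutoff`, `majority_answer`, `keep_math_example`), the `(completion_text, final_answer)` input shape, and the reading of "majority vote" as a strict majority (at least 3 of 5 agreeing final answers) are all hypothetical.

```python
from collections import Counter
import re
from typing import List, Optional, Tuple

# Hypothetical heuristic: the card does not say how date-cutoff mentions were detected.
CUTOFF_PATTERN = re.compile(r"\b(knowledge|training|data)\s+cut-?off\b", re.IGNORECASE)


def mentions_date_cutoff(text: str) -> bool:
    """Return True if the text appears to mention a knowledge/date cutoff."""
    return CUTOFF_PATTERN.search(text) is not None


def majority_answer(final_answers: List[str]) -> Optional[str]:
    """Return the answer shared by a strict majority of samples, else None.

    Assumption: "majority vote" means more than half of the samples agree.
    """
    if not final_answers:
        return None
    answer, count = Counter(final_answers).most_common(1)[0]
    return answer if count > len(final_answers) // 2 else None


def keep_math_example(
    completions: List[Tuple[str, str]], num_samples: int = 5
) -> Optional[str]:
    """Given (completion_text, final_answer) pairs sampled for one prompt,
    return a completion whose answer matches the majority vote, else None
    (meaning the prompt would be dropped)."""
    if len(completions) != num_samples:
        raise ValueError(f"expected {num_samples} completions, got {len(completions)}")
    winner = majority_answer([answer for _, answer in completions])
    if winner is None:
        return None
    for text, answer in completions:
        if answer == winner:
            return text
    return None
```

Requiring a strict majority rather than a plurality makes the filter more conservative: a prompt whose five samples split 2/2/1 is dropped instead of being kept with a weakly supported answer.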