nvidia/Nemotron-CC-Math-v1

Text Generation · nvidia · 63.1K downloads
License: other · Size: 242 GB
Tags: task_categories:text-generation, license:other, size_categories:100M<n<1B, format:parquet, modality:text

Nemotron-Pre-Training-Dataset-v1 Release

👩‍💻 Authors: Rabeeh Karimi Mahabadi, Sanjeev Satheesh
📘 Paper: Nemotron-cc-math: A 133 Billion-Token-Scale High Quality Math Pretraining Dataset
📝 Blog: Nemotron-cc-math blog

Data Overview

We're excited to introduce Nemotron-CC-Math, a large-scale, high-quality math corpus extracted from Common Crawl that was used in Nemotron pre-training. This dataset is built to preserve and surface high-value mathematical and code content…

See the full description on the dataset page: https://huggingface.co/datasets/nvidia/Nemotron-CC-Math-v1.

# download instantly
mlforge datasets pull nvidia/Nemotron-CC-Math-v1
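
Because the corpus is 242 GB of Parquet, it can be useful to inspect a few rows before pulling everything. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and the repository is reachable from your machine. The helper names `dataset_page_url` and `peek` are hypothetical, and the config and split are discovered at run time rather than assumed.

```python
from itertools import islice

REPO_ID = "nvidia/Nemotron-CC-Math-v1"


def dataset_page_url(repo_id: str) -> str:
    """Build the Hugging Face dataset page URL for a repository id."""
    return f"https://huggingface.co/datasets/{repo_id}"


def peek(repo_id: str = REPO_ID, n: int = 3) -> list:
    """Stream the first n rows without downloading the full dataset.

    Assumption: the `datasets` package is installed; it is imported here so
    the rest of the module works without it.
    """
    from datasets import get_dataset_config_names, load_dataset

    # Ask the Hub which configs exist instead of hard-coding a name.
    configs = get_dataset_config_names(repo_id)
    config = configs[0]

    # streaming=True returns lazy iterables keyed by split name.
    streams = load_dataset(repo_id, config, streaming=True)
    split = next(iter(streams))

    # Only the first n rows are fetched.
    return list(islice(streams[split], n))


# Example (requires network access):
#   for row in peek(n=2):
#       print(sorted(row.keys()))
```

Streaming reads rows lazily, so this avoids the full download; the column names printed depend on the dataset's actual schema, which is not listed here.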

Dataset details

Task: Text Generation
License: other
Size: 242 GB
Creator: nvidia
Downloads: 63.1K
Source: huggingface_datasets
Updated: 2025-12-23