anisoleai/fineweb-tokenized

Text Generation · anisoleai · 131.5K
odc-by · 7.5 TB · task_categories:text-generation · language:en · license:odc-by · size_categories:n>1T · modality:tabular

4 trillion tokens of the pre-tokenized data the 🌐 web has to offer

# download instantly
mlforge datasets pull anisoleai/fineweb-tokenized

Dataset details

Task: Text Generation
Language: en
License: odc-by
Size: 7.5 TB
Rows / images: 4132.0B
Creator: anisoleai
Downloads: 131.5K
Source: huggingface_datasets
Updated: 2026-05-29
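
As a rough sanity check on the figures above, the listed size and row count can be combined into an average bytes-per-token estimate. This is only a sketch under two assumptions that the page does not state: that 1 TB means 10^12 bytes, and that the "Rows / images" value of 4132.0B is a token count in billions. The names `SIZE_TB`, `TOKENS_BILLIONS`, and `bytes_per_token` are hypothetical and used for illustration only.

```python
# Rough bytes-per-token estimate from the figures listed on this page.
# Assumptions (not stated on the page): 1 TB = 10**12 bytes, and the
# "Rows / images" value of 4132.0B is a token count in billions.

SIZE_TB = 7.5             # "Size" field
TOKENS_BILLIONS = 4132.0  # "Rows / images" field, read as tokens


def bytes_per_token(size_tb: float, tokens_billions: float) -> float:
    """Return the average number of stored bytes per token."""
    size_bytes = size_tb * 10**12
    tokens = tokens_billions * 10**9
    return size_bytes / tokens


ratio = bytes_per_token(SIZE_TB, TOKENS_BILLIONS)
print(f"{ratio:.2f} bytes per token")
```

Under these assumptions the estimate comes out a little under 2 bytes per token. The page does not describe the storage format or tokenizer, so the number should be read as a plausibility check on the listed size, not as a statement about how the data is encoded.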
