m-a-p/FineFineWeb

Text Classification · m-a-p · 805.1K downloads
Tags: task_categories:text-classification, task_categories:text-generation, language:en, license:apache-2.0, size_categories:1B<n<10B

FineFineWeb: A Comprehensive Study on Fine-Grained Domain Web Corpus

arXiv: Coming Soon
Project Page: Coming Soon
Blog: Coming Soon

Data Statistics

| Domain (#tokens/#samples) | Iteration 1 Tokens | Iteration 2 Tokens | Iteration 3 Tokens | Total Tokens | Iteration 1 Count | Iteration 2 Count | Iteration 3 Count | Total Count |
|---|---|---|---|---|---|---|---|---|
| aerospace | 5.77B | 261.63M | 309.33M | 6.34B | 9100000 | 688505 | 611034 | 10399539 |
| agronomy | 13.08B | 947.41M | 229.04M | 14.26B | 15752828 | 2711790 | 649404 | 19114022 |

artistic…

See the full description on the dataset page: https://huggingface.co/datasets/m-a-p/FineFineWeb.
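The two domain rows visible in the statistics can be checked for internal consistency: each total should equal the sum of the three iterations. The sketch below is only an illustration using the figures quoted on this page. The helper names (`parse_tokens`, `check_row`) and the rounding tolerance are assumptions made for this example and are not part of the dataset or its tooling.

```python
# Figures copied from the two rows shown in the statistics excerpt.
ROWS = {
    "aerospace": {
        "tokens": ["5.77B", "261.63M", "309.33M"],
        "total_tokens": "6.34B",
        "counts": [9100000, 688505, 611034],
        "total_count": 10399539,
    },
    "agronomy": {
        "tokens": ["13.08B", "947.41M", "229.04M"],
        "total_tokens": "14.26B",
        "counts": [15752828, 2711790, 649404],
        "total_count": 19114022,
    },
}

# Suffix multipliers used by the token columns.
UNITS = {"M": 1e6, "B": 1e9}


def parse_tokens(text):
    """Convert a string such as '261.63M' into a float token count."""
    return float(text[:-1]) * UNITS[text[-1]]


def check_row(row, tolerance=1e7):
    """Return (counts_match, tokens_match) for one domain row.

    Sample counts are exact integers, so they must match exactly.
    Token figures are rounded to 0.01B, so they are compared within
    an assumed tolerance of 0.01B (1e7 tokens).
    """
    counts_match = sum(row["counts"]) == row["total_count"]
    token_sum = sum(parse_tokens(t) for t in row["tokens"])
    tokens_match = abs(token_sum - parse_tokens(row["total_tokens"])) <= tolerance
    return counts_match, tokens_match


if __name__ == "__main__":
    for domain, row in ROWS.items():
        print(domain, check_row(row))
```

Sample counts are compared exactly because they are integers, while the token columns are rounded and so are compared within a tolerance.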

# download instantly
mlforge datasets pull m-a-p/FineFineWeb

Dataset details

Task: Text Classification
Language: en
License: apache-2.0
Creator: m-a-p
Downloads: 805.1K
Source: huggingface_datasets
Updated: 2024-12-19

About m-a-p/FineFineWeb

FineFineWeb: A Comprehensive Study on Fine-Grained Domain Web Corpus