wannaphong/wikipedia-monthly

Text Generation · wannaphong · 37.4K downloads
cc-by-sa-4.0 · 748 GB · task_categories:text-generation · language:ab · language:ace · language:ady · language:af

This repository provides monthly, multilingual dumps of Wikipedia, processed and prepared for easy use in NLP projects.

# download instantly
mlforge datasets pull wannaphong/wikipedia-monthly
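
Because the listed source is Hugging Face Datasets, a single language edition can probably also be loaded from Python. The sketch below is a minimal example under stated assumptions: the config names follow the `<snapshot>.<lang>` pattern visible in the dataset metadata (for example `latest.en` or `20260301.en`), the fixed-size splits `'1000'`, `'5000'`, and `'10000'` are loadable by name, and the third-party `datasets` package is installed. The helpers `config_name` and `load_sample` are hypothetical names introduced here for illustration; they are not part of the dataset or of any library.

```python
REPO_ID = "wannaphong/wikipedia-monthly"


def config_name(lang: str, snapshot: str = "latest") -> str:
    """Build a config name such as 'latest.en' or '20260301.en'.

    Assumption: configs are named '<snapshot>.<lang>', as the metadata suggests.
    """
    if not lang or "." in lang:
        raise ValueError(f"unexpected language code: {lang!r}")
    # A snapshot is either the literal 'latest' or an 8-digit date (YYYYMMDD).
    if snapshot != "latest" and not (len(snapshot) == 8 and snapshot.isdigit()):
        raise ValueError(f"unexpected snapshot: {snapshot!r}")
    return f"{snapshot}.{lang}"


def load_sample(lang: str, snapshot: str = "latest", split: str = "1000"):
    """Stream one split of one language edition without a full download.

    Assumption: the split names from the metadata ('train', '1000', ...)
    are accepted as-is by load_dataset for this repository.
    """
    # Imported lazily so config_name() works without the package installed.
    from datasets import load_dataset  # pip install datasets

    return load_dataset(
        REPO_ID,
        config_name(lang, snapshot),
        split=split,
        streaming=True,
    )
```

Calling `next(iter(load_sample("en")))` should yield one record of the English 1,000-example split, if the assumptions above hold. Streaming is used here because the full dataset is listed at 748 GB; inspect the first record to see which fields are available, since the column names are not stated on this page.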

Dataset details

Task: Text Generation
Language: ab
License: cc-by-sa-4.0
Size: 748 GB
Rows / images: 1.6M
Creator: wannaphong
Downloads: 37.4K
Source: huggingface_datasets
Updated: 2026-05-04

About wannaphong/wikipedia-monthly

---
dataset_info:
  20260301.en: &id001
    splits:
    - name: train
      num_examples: 7155624
    - name: '1000'
      num_examples: 1000
    - name: '5000'
      num_examples: 5000
    - name: '10000'
      num_examples: 10000
  latest.en: *id001
  20260301.de: &id002
    splits:
    - name: train
      num_examples: 3098600
    - name: '1000'
      num_examples: 1000
    - name: '5000'
      num_examples: 5000
    - name: '10000'
      num_examples: 10000
  latest.de: *id002
  20260301.es: &id003
    splits:
    - name: train
      num_examples: 2030756
    - name: '1000'
      num_examples: 1000
    - name: '5000'
      num_examples: 5000
    - name: '10000'
      num_examples: 10000
  latest.es: *id003
  20260301.fr: &id004
    splits:
    - name: train
      num_examples: 2743535
    - name: '1000'
      num_examples: 1000
    - name: '5000'
      num_examples: 5000
    - name: '10000'
      num_examples: 10000
  latest.fr: *id004
  20260301.ceb: &id005
    splits:
    - name: train
      num_examples: 6116120
    - name: '1000'
      num_examples: 1000
    - name: '5000'
      num_examples: 5000
    - name: '10000'
      num_examples: 10000
  latest.ceb: *id005
  20260301.he: &id006
    splits:
    - name: train
      num_examples: 381145
    - name: '1000'
      num_examples: 1000
    - name: '5000'
      num_examples: 5000
    - name: '10000'
      num_examples: 10000
  latest.he: *id006
  20260301.ko: &id007
    splits:
    - name: train
      num_examples: 738861
    - name: '1000'
      num_examples: 1000
    - name: '5000'
      num_examples: 5000
    - name: '10000'
      num_examples: 10000