wannaphong/wikipedia-monthly
This repository provides monthly, multilingual dumps of Wikipedia, processed and prepared for easy use in NLP projects.
mlforge datasets pull wannaphong/wikipedia-monthly
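The metadata on this page lists configs named by snapshot date and language code (for example 20260301.en), a latest alias per language, and fixed-size sample splits alongside train. The sketch below shows one way to compose those names in Python. The helper names config_name and pick_split are hypothetical, not part of any library or of this repository, and the split sizes are copied from the metadata listed on this page.

```python
# Split sizes copied from this page's metadata; the helpers are
# hypothetical and exist only to illustrate the naming scheme.
SAMPLE_SPLITS = (1000, 5000, 10000)  # fixed-size splits listed next to "train"


def config_name(lang: str, snapshot: str = "latest") -> str:
    """Build a config name such as '20260301.en' or 'latest.en'."""
    return f"{snapshot}.{lang}"


def pick_split(n_needed: int) -> str:
    """Return the smallest listed sample split with at least n_needed examples."""
    for size in SAMPLE_SPLITS:
        if n_needed <= size:
            return str(size)
    return "train"  # nothing smaller fits, so fall back to the full split


if __name__ == "__main__":
    print(config_name("de", "20260301"), pick_split(3000))
```

Picking a small sample split first is a cheap way to check a pipeline before working with the full train split, which for English holds over seven million examples.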
Dataset details
About wannaphong/wikipedia-monthly
---
dataset_info:
  20260301.en: &id001
    splits:
    - name: train
      num_examples: 7155624
    - name: '1000'
      num_examples: 1000
    - name: '5000'
      num_examples: 5000
    - name: '10000'
      num_examples: 10000
  latest.en: *id001
  20260301.de: &id002
    splits:
    - name: train
      num_examples: 3098600
    - name: '1000'
      num_examples: 1000
    - name: '5000'
      num_examples: 5000
    - name: '10000'
      num_examples: 10000
  latest.de: *id002
  20260301.es: &id003
    splits:
    - name: train
      num_examples: 2030756
    - name: '1000'
      num_examples: 1000
    - name: '5000'
      num_examples: 5000
    - name: '10000'
      num_examples: 10000
  latest.es: *id003
  20260301.fr: &id004
    splits:
    - name: train
      num_examples: 2743535
    - name: '1000'
      num_examples: 1000
    - name: '5000'
      num_examples: 5000
    - name: '10000'
      num_examples: 10000
  latest.fr: *id004
  20260301.ceb: &id005
    splits:
    - name: train
      num_examples: 6116120
    - name: '1000'
      num_examples: 1000
    - name: '5000'
      num_examples: 5000
    - name: '10000'
      num_examples: 10000
  latest.ceb: *id005
  20260301.he: &id006
    splits:
    - name: train
      num_examples: 381145
    - name: '1000'
      num_examples: 1000
    - name: '5000'
      num_examples: 5000
    - name: '10000'
      num_examples: 10000
  latest.he: *id006
  20260301.ko: &id007
    splits:
    - name: train
      num_examples: 738861
    - name: '1000'
      num_examples: 1000
    - name: '5000'
      num_examples: 5000
    - name: '10000'
      num_examples: 10000