google/WaxalNLP

Automatic Speech Recognition · google · 30.5K downloads
Licenses: cc-by-sa-4.0, cc-by-4.0 · Size: 3.2 TB
Tags: task_categories:automatic-speech-recognition, task_categories:text-to-speech, language_creators:creator_1, multilinguality:multilingual, source_datasets:UGSpeechData

The WAXAL dataset is a large-scale multilingual speech corpus for African languages, introduced in the paper "WAXAL: A Large-Scale Multilingual African Language Speech Corpus".

# download instantly
mlforge datasets pull google/WaxalNLP

Dataset details

Task: Automatic Speech Recognition
Language: ach
License: cc-by-sa-4.0, cc-by-4.0
Size: 3.2 TB
Rows / images: 1.7M
Creator: google
Downloads: 30.5K
Source: huggingface_datasets
Updated: 2026-06-11
