google/WaxalNLP

Name: google/WaxalNLP
Creator: google
License: cc-by-sa-4.0, cc-by-4.0
Keywords: huggingface, task_categories:automatic-speech-recognition, task_categories:text-to-speech, language_creators:creator_1, multilinguality:multilingual, source_datasets:UGSpeechData, source_datasets:DigitalUmuganda/AfriVoice, source_datasets:original, language:ach, automatic-speech-recognition, text-to-speech

The WAXAL dataset is a large-scale multilingual speech corpus for African languages, introduced in the paper "WAXAL: A Large-Scale Multilingual African Language Speech Corpus".

# download instantly
mlforge datasets pull google/WaxalNLP

Dataset details

Task: Automatic Speech Recognition
Language: ach
License: cc-by-sa-4.0, cc-by-4.0
Size: 3.2 TB
Rows / images: 1.7M
Creator: google
Downloads: 30.5K
Source: huggingface_datasets
Updated: 2026-06-11
