disco-eth/WorldSpeech

Name: disco-eth/WorldSpeech
Creator: disco-eth
License: cc-by-nc-4.0
Keywords: huggingface, task_categories:automatic-speech-recognition, task_categories:text-to-speech, task_categories:audio-classification, language:af, language:am, language:ar, language:az, language:be, automatic-speech-recognition, text-to-speech, audio-classification



A multilingual ASR dataset containing over 65k hours of human-transcribed speech across 127 language-region variants, drawn from national parliaments, public broadcasters, public-domain audiobooks, and international institutions. Each row consists of a 24 kHz speech utterance paired with a human-provided transcript, an aligned ASR transcript, the character error rate (CER) between the two, a WADA-SNR estima
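The per-row CER mentioned above can be illustrated with a minimal sketch. It assumes the human transcript is the reference, the ASR transcript is the hypothesis, and no text normalization (casing, punctuation) is applied. The listing does not state how the dataset authors actually compute CER, so these are assumptions, and the field names `human_transcript` and `asr_transcript` are hypothetical placeholders rather than the dataset's real column names.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from ref
                curr[j - 1] + 1,     # insert a character into ref
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical row; the field names are illustrative assumptions only.
row = {"human_transcript": "hello world", "asr_transcript": "hallo world"}
score = cer(row["human_transcript"], row["asr_transcript"])
print(f"CER = {score:.4f}")
```

A low CER between the two transcripts suggests the human transcript matches the audio closely, so a threshold on this value is one plausible way to filter rows before training.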


# download instantly
mlforge datasets pull disco-eth/WorldSpeech

Dataset details

Task

Automatic Speech Recognition

Language

License

cc-by-nc-4.0

Size

2.5 TB

Rows

16.5M

Creator

disco-eth

Downloads

28.7K

Source

huggingface_datasets

Updated

2026-05-18

About disco-eth/WorldSpeech