disco-eth/WorldSpeech
mlforge datasets pull disco-eth/WorldSpeech
Dataset details
About disco-eth/WorldSpeech
A multilingual ASR dataset containing over 65k hours of human-transcribed speech across 127 language-region variants, drawn from national parliaments, public broadcasters, public-domain audiobooks, and international institutions. Each row pairs a 24 kHz speech utterance with a human-provided transcript, an aligned ASR transcript, the character error rate (CER) between the two transcripts, a WADA-SNR estimate, and four DNSMOS-P.835 quality scores.
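To make the CER column concrete, here is a minimal sketch of how a character error rate between a human transcript and an ASR transcript is commonly computed: Levenshtein edit distance divided by the length of the reference. This is an illustration, not necessarily the exact procedure used to build the dataset (text normalization, for instance, is not described above). The field names `human_transcript` and `asr_transcript` and the 0.1 threshold are hypothetical and chosen only for the example.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference characters to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    if not ref:
        return 0.0 if not hyp else float("inf")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical rows; real column names in the dataset may differ.
rows = [
    {"human_transcript": "good morning", "asr_transcript": "good morning"},
    {"human_transcript": "good morning", "asr_transcript": "goat mourning"},
]

MAX_CER = 0.1  # arbitrary illustrative threshold, not a dataset recommendation

# Keep only rows where the two transcripts agree closely.
clean = [
    row for row in rows
    if cer(row["human_transcript"], row["asr_transcript"]) <= MAX_CER
]
```

A low CER indicates that the human and ASR transcripts agree closely, so a threshold like this is one plausible way to select rows with reliable transcripts. The SNR and DNSMOS columns could be used for similar filtering on audio quality.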