Task
Automatic Speech Recognition
Dataset Card for GigaSpeech 2

Dataset Description

GigaSpeech 2 is an evolving, large-scale, multi-domain, multilingual ASR corpus focusing on low-resource languages. GigaSpeech 2 raw comprises about 30,000 hours of automatically transcribed speech across Thai, Indonesian, and Vietnamese. GigaSpeech 2 refine consists of 10,000 hours of Thai and 6,000 hours each of Indonesian and Vietnamese.

Repository: https://github.com/SpeechColab/GigaSpeech2
Paper:…

See the full description on the dataset page: https://huggingface.co/datasets/speechcolab/gigaspeech2.