speechcolab/gigaspeech

Automatic Speech Recognition · speechcolab · 30.6K downloads
License: apache-2.0 · Size: 2.8 TB · Task categories: automatic-speech-recognition, text-to-speech, text-to-audio · Multilinguality: monolingual · Language: en

Dataset Card for GigaSpeech

Dataset Description

GigaSpeech is an evolving, multi-domain English speech recognition corpus with 10,000 hours of high-quality labeled audio suitable for supervised training. The transcribed audio is collected from audiobooks, podcasts, and YouTube, covering both read and spontaneous speaking styles and a variety of topics, such as arts, science, and sports.

Example Usage

The training split has several configurations of… See the full description on the dataset page: https://huggingface.co/datasets/speechcolab/gigaspeech.

# download instantly
mlforge datasets pull speechcolab/gigaspeech

Dataset details

Task
Automatic Speech Recognition
Language
en
License
apache-2.0
Size
2.8 TB
Creator
speechcolab
Downloads
30.6K
Source
huggingface_datasets
Updated
2026-02-07