Name: speechcolab/gigaspeech
Creator: speechcolab
License: ["apache-2.0"]
Keywords: huggingface, task_categories:automatic-speech-recognition, task_categories:text-to-speech, task_categories:text-to-audio, multilinguality:monolingual, language:en, license:apache-2.0, size_categories:10M<n<100M, format:parquet, automatic-speech-recognition, text-to-speech, text-to-audio

Dataset Card for GigaSpeech

Dataset Description

GigaSpeech is an evolving, multi-domain English speech recognition corpus with 10,000 hours of high-quality labeled audio suitable for supervised training. The transcribed audio is collected from audiobooks, podcasts, and YouTube, covering both read and spontaneous speaking styles and a variety of topics such as arts, science, and sports.

Example Usage

The training split has several configurations of… See the full description on the dataset page: https://huggingface.co/datasets/speechcolab/gigaspeech.


Dataset details