
amphion/Emilia-Dataset

Text To Speech · amphion · 74.5K downloads
License: cc-by-4.0 · Size: 6.5 TB · Tags: task_categories:text-to-speech, task_categories:automatic-speech-recognition, language:zh, language:en, language:ja

Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation

This is the official repository 👑 for the Emilia dataset and the source code for the Emilia-Pipe speech data preprocessing pipeline.

News 🔥
2025/02/26: The Emilia-Large dataset, featuring over 200,000 hours of data, is now available! Emilia-Large combines the original 101k-hour Emilia dataset (licensed under CC BY-NC 4.0) with the brand-new 114k-hour Emilia-YODAS…

See the full description on the dataset page: https://huggingface.co/datasets/amphion/Emilia-Dataset.

# download instantly
mlforge datasets pull amphion/Emilia-Dataset
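At 6.5 TB, a full pull is heavy. A possible alternative is to stream a single language subset with the Hugging Face `datasets` library. The sketch below is a minimal illustration under stated assumptions: it assumes the repository stores WebDataset-style shards under `Emilia/<LANG>/*.tar`, and that you have accepted any access terms on the dataset page and logged in with a Hugging Face token. The helper names `emilia_data_files` and `stream_emilia` are hypothetical, not part of any official API.

```python
from typing import Dict


def emilia_data_files(lang: str) -> Dict[str, str]:
    """Build a `data_files` mapping (split name -> glob) for one language.

    ASSUMPTION: shards live under "Emilia/<LANG>/*.tar" in the Hugging Face
    repo; verify the layout on the dataset page before relying on it.
    """
    code = lang.strip().upper()
    # Reject anything that is not a plain alphabetic code such as "EN" or "ZH".
    if not code.isalpha():
        raise ValueError(f"unexpected language code: {lang!r}")
    return {code.lower(): f"Emilia/{code}/*.tar"}


def stream_emilia(lang: str):
    """Stream one language subset instead of downloading the whole dataset.

    Hypothetical helper; requires the `datasets` package and network access.
    """
    # Imported lazily so emilia_data_files() works without `datasets` installed.
    from datasets import load_dataset

    files = emilia_data_files(lang)
    split = next(iter(files))  # the single key, e.g. "en"
    return load_dataset(
        "amphion/Emilia-Dataset",
        data_files=files,
        split=split,
        streaming=True,  # returns an IterableDataset; shards are fetched on demand
    )
```

For example, `stream_emilia("en").take(1)` would yield a single sample without downloading other shards. Streaming trades random access for a much smaller local footprint, which suits quick inspection better than training at scale.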

Dataset details

Task: Text To Speech
Language: zh
License: cc-by-4.0
Size: 6.5 TB
Creator: amphion
Downloads: 74.5K
Source: huggingface_datasets
Updated: 2025-02-28