
Sinoosoida/SpeechRu

Automatic Speech Recognition · Sinoosoida · 54.4K downloads
other · 6.8 TB · task_categories:automatic-speech-recognition · task_categories:text-to-speech · task_categories:audio-classification · language:ru · license:other

~186k unlabeled Russian-language podcast episodes scraped from the web, packaged as Parquet shards with the audio bytes embedded. The audio has no transcripts, so this is an unsupervised / self-supervised audio corpus, suitable for ASR pre-training, speech-representation learning, TTS data mining, audio classification, and similar tasks.
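
Because the audio is stored as raw bytes inside the Parquet rows, a consumer usually has to work out which container each episode uses before decoding it. The sketch below is one possible way to do that from the leading magic bytes, using only the standard library. The function name `sniff_audio_format` is hypothetical, and this page does not state which formats the episodes use or what the bytes column is called, so treat the list of formats as an assumption.

```python
def sniff_audio_format(data: bytes) -> str:
    """Guess the audio container from the first bytes of a payload.

    Heuristic only: it inspects well-known file signatures and does not
    validate the rest of the stream.
    """
    # WAV: "RIFF" <4-byte size> "WAVE"
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    # Ogg container (Vorbis, Opus, ...)
    if data[:4] == b"OggS":
        return "ogg"
    # Native FLAC stream marker
    if data[:4] == b"fLaC":
        return "flac"
    # MP3 files very often start with an ID3v2 tag
    if data[:3] == b"ID3":
        return "mp3"
    # MP4 / M4A: "ftyp" box type at byte offset 4
    if len(data) >= 8 and data[4:8] == b"ftyp":
        return "mp4"
    # Raw MPEG audio frame: 11 sync bits set, then version and layer bits
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        layer_bits = (data[1] >> 1) & 0x3
        # ADTS AAC always carries layer bits 00; MP3 (Layer III) carries 01
        return "aac" if layer_bits == 0 else "mp3"
    return "unknown"


if __name__ == "__main__":
    # Synthetic headers for illustration, not data taken from the dataset
    print(sniff_audio_format(b"RIFF\x24\x00\x00\x00WAVEfmt "))
    print(sniff_audio_format(b"ID3\x04\x00"))
```

In practice the bytes passed in would come from whichever column of a shard holds the embedded audio; check the shard schema first, since that column name is not given here.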

# download instantly
mlforge datasets pull Sinoosoida/SpeechRu

Dataset details

Task: Automatic Speech Recognition
Language: ru
License: other
Size: 6.8 TB
Rows / images: 186.1K
Classes: 36
Creator: Sinoosoida
Downloads: 54.4K
Source: huggingface_datasets
Updated: 2026-06-15
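
The size and row count above are enough for a rough storage estimate before pulling the data. The sketch below assumes "6.8 TB" means decimal terabytes (10^12 bytes) and that nearly all of that size is audio payload; neither is stated on this page. The helper names are hypothetical.

```python
TOTAL_BYTES = 6.8e12      # "6.8 TB", assumed to be decimal terabytes
NUM_EPISODES = 186_100    # "186.1K" rows


def mean_episode_megabytes(total_bytes: float, num_episodes: int) -> float:
    """Average size of one episode in megabytes (10^6 bytes)."""
    return total_bytes / num_episodes / 1e6


def subset_gigabytes(n_episodes: int) -> float:
    """Estimated disk space in gigabytes for n episodes of average size."""
    return n_episodes * mean_episode_megabytes(TOTAL_BYTES, NUM_EPISODES) / 1e3


if __name__ == "__main__":
    # About 36.5 MB per episode under the assumptions above
    print(round(mean_episode_megabytes(TOTAL_BYTES, NUM_EPISODES), 1))
    # About 36.5 GB for a 1,000-episode sample
    print(round(subset_gigabytes(1000), 1))
```

Actual episode sizes will vary with duration and bitrate, so this is only an average for planning disk space or a partial download.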
