Sinoosoida/SpeechRu

Name: Sinoosoida/SpeechRu
Creator: Sinoosoida
License: other
Keywords: huggingface, task_categories:automatic-speech-recognition, task_categories:text-to-speech, task_categories:audio-classification, language:ru, license:other, size_categories:100K<n<1M, format:parquet, format:optimized-parquet, automatic-speech-recognition, text-to-speech, audio-classification

Roughly 186k unlabeled Russian-language podcast episodes scraped from the web, packaged as Parquet shards with the audio bytes embedded. The audio has no transcripts, so this is an unsupervised / self-supervised audio corpus, suitable for ASR pre-training, speech-representation learning, TTS data mining, audio classification, and similar tasks.
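
Because the audio is stored as raw bytes inside the Parquet rows, a consumer usually has to pull the bytes out of each row and work out which container format they hold before decoding. The sketch below illustrates that step using only the standard library. The column name `audio` and the nested `{"bytes": ..., "path": ...}` layout are assumptions (that layout is common for audio columns in Hugging Face Parquet exports, but this card does not document the schema), and both helper names are hypothetical.

```python
from typing import Any, Mapping, Optional


def extract_audio_bytes(row: Mapping[str, Any], column: str = "audio") -> Optional[bytes]:
    """Return the embedded audio payload of one row, or None if absent.

    Assumption: the payload is either a bare bytes value or a struct-like
    mapping with a "bytes" key. The real column name may differ.
    """
    cell = row.get(column)
    if isinstance(cell, (bytes, bytearray)):
        return bytes(cell)
    if isinstance(cell, Mapping):
        raw = cell.get("bytes")
        return bytes(raw) if raw is not None else None
    return None


def sniff_audio_container(data: bytes) -> str:
    """Guess the container format from the leading magic bytes."""
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    if data[:4] == b"fLaC":
        return "flac"
    if data[:4] == b"OggS":
        return "ogg"
    # MP3: an ID3v2 tag, or an MPEG frame sync (11 set bits) at offset 0.
    if data[:3] == b"ID3":
        return "mp3"
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"
```

Rows in this shape could come from any Parquet reader that yields one dict per row, for example `pyarrow`'s `Table.to_pylist()`. Reading one shard at a time rather than the whole corpus keeps memory use manageable, given the 6.8 TB total size.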

# download instantly
mlforge datasets pull Sinoosoida/SpeechRu

Dataset details

Task: Automatic Speech Recognition
Language: Russian (ru)
License: other
Size: 6.8 TB
Rows / images: 186.1K
Classes: (not listed)
Creator: Sinoosoida
Downloads: 54.4K
Source: huggingface_datasets
Updated: 2026-06-15

About Sinoosoida/SpeechRu