Helsinki-NLP/nemotron-cc-translated

Name: Helsinki-NLP/nemotron-cc-translated
Creator: Helsinki-NLP
License: cc0-1.0
Keywords: huggingface, task_categories:translation, task_categories:text-generation, language:bos, language:bul, language:cat, language:ces, language:dan, language:deu, translation, text-generation

Translation · Helsinki-NLP · 43.6K

cc0-1.0 · 8.3 TB · task_categories:translation · task_categories:text-generation · language:bos · language:bul · language:cat

nemotron-cc-translated is a collection of automatically translated documents taken from the high-quality subset of nemotron-cc. The translations were produced with OPUS-MT and HPLT-MT models. Version 1.0 covers 156,431,999 documents, comprising over 70 billion space-separated tokens of English data, translated into 36 languages. The total v1.0 data set includes over 2.4 trillion tokens and the translated doc

# download instantly
mlforge datasets pull Helsinki-NLP/nemotron-cc-translated

Dataset details

Task: Translation
Language: bos
License: cc0-1.0
Size: 8.3 TB
Rows / images: 7.4B
Creator: Helsinki-NLP
Downloads: 43.6K
Source: huggingface_datasets
Updated: 2026-04-27
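
The figures listed on this page can be combined into a few rough per-document and per-row averages, which may help when planning storage or sampling. The following is a minimal sketch in Python, not an official statistic: the inputs are the rounded values quoted above, 8.3 TB is assumed to mean decimal terabytes, and the estimate for translated tokens assumes (hypothetically) that each translation has about as many tokens as its English source.

```python
# Figures quoted on this page. Token, size, and row counts are rounded,
# so every derived value below is only an approximation.
DOCUMENTS = 156_431_999      # English source documents in v1.0
ENGLISH_TOKENS = 70e9        # "over 70 billion" space-separated tokens
TARGET_LANGUAGES = 36        # languages the English data was translated into
TOTAL_SIZE_BYTES = 8.3e12    # 8.3 TB, assuming decimal terabytes
TOTAL_ROWS = 7.4e9           # "Rows / images" as listed


def average(total: float, count: float) -> float:
    """Return total / count, rejecting a non-positive count."""
    if count <= 0:
        raise ValueError("count must be positive")
    return total / count


# Lower bound, because the page says "over" 70 billion tokens.
tokens_per_document = average(ENGLISH_TOKENS, DOCUMENTS)

# Average storage footprint of one row.
bytes_per_row = average(TOTAL_SIZE_BYTES, TOTAL_ROWS)

# Assumption: each translation is roughly as long as the English source.
translated_tokens_estimate = ENGLISH_TOKENS * TARGET_LANGUAGES

print(f"English tokens per document: about {tokens_per_document:.0f}")
print(f"Bytes per row: about {bytes_per_row:.0f}")
print(f"Estimated translated tokens: about {translated_tokens_estimate:.2e}")
```

Under these assumptions the translated-token estimate lands in the same range as the "over 2.4 trillion tokens" total quoted in the description, which suggests the listed figures are at least mutually plausible.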

About Helsinki-NLP/nemotron-cc-translated