
natgillin/translations-raw

General · natgillin · 86.2K downloads
mit · language:multilingual · license:mit · size_categories:1M<n<10M · format:parquet · modality:text

Frozen, canonical raw bitext consolidated from upstream alvations/mtdata-raw* snapshots (since deleted). This is the read-only source of truth for downstream quality-filtering pipelines.

- 31,663 parquet files (1566.8 GB)
- 49 language pairs under data/<src-tgt>/
- Schema: 5 columns (see below)
- Read-only for downstream pipelines. Do not delete or modify.

Schema

Each parquet has 5 columns:

column | type | description
source | string…

See the full description on the dataset page: https://huggingface.co/datasets/natgillin/translations-raw.
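Because the files are organized by language pair under data/<src-tgt>/, a downstream pipeline usually needs to map each parquet path back to its (source, target) pair. The sketch below does that with the standard library only. It assumes, hypothetically, that the directory name splits into source and target codes at the first hyphen and that file names end in .parquet; language codes that themselves contain hyphens would need a different rule. The function name group_by_language_pair and the example file names are illustrative, not part of the dataset.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_language_pair(paths):
    """Group paths like data/<src-tgt>/<file>.parquet by (src, tgt).

    Hypothetical helper: assumes the pair directory splits into source and
    target codes at the first hyphen. Paths that do not match are skipped.
    """
    pairs = defaultdict(list)
    for p in paths:
        parts = PurePosixPath(p).parts
        # Expect at least "data", "<src-tgt>", "<file>.parquet".
        if len(parts) < 3 or parts[0] != "data" or not parts[-1].endswith(".parquet"):
            continue
        src, sep, tgt = parts[1].partition("-")
        if not sep or not src or not tgt:
            continue  # directory name is not a src-tgt pair
        pairs[(src, tgt)].append(p)
    return dict(pairs)


if __name__ == "__main__":
    # Illustrative paths only; real file names in the repo may differ.
    example = [
        "data/de-en/part-0.parquet",
        "data/de-en/part-1.parquet",
        "data/fr-en/part-0.parquet",
        "README.md",
    ]
    for pair, files in sorted(group_by_language_pair(example).items()):
        print(pair, len(files))
```

To run this against the real repository, the path list could come from huggingface_hub's list_repo_files("natgillin/translations-raw", repo_type="dataset"); grouping only reads path strings, so it never touches or modifies the read-only files.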

# download instantly
mlforge datasets pull natgillin/translations-raw

Dataset details

Task: General
Language: multilingual
License: mit
Creator: natgillin
Downloads: 86.2K
Source: huggingface_datasets
Updated: 2026-06-09

About natgillin/translations-raw

Frozen, canonical raw bitext consolidated from upstream alvations/mtdata-raw snapshots (since deleted). This is the read-only source of truth for downstream quality-filtering pipelines.