nvidia/OCR-Synthetic-Multilingual-v1

Object Detection · nvidia · 22.3K downloads
cc-by-4.0 · 5.0 TB · task_categories:object-detection · task_categories:image-to-text · language:en · language:ja · language:ko

Large-scale synthetically generated OCR training dataset for multilingual text detection and recognition. The data was produced using a heavily modified and extended version of SynthDoG (Synthetic Document Generator), originally introduced in the Donut project by Kim et al.

# download instantly
mlforge datasets pull nvidia/OCR-Synthetic-Multilingual-v1

Dataset details

Task: Object Detection
Language: en, ja, ko
License: cc-by-4.0
Size: 5.0 TB
Creator: nvidia
Downloads: 22.3K
Source: huggingface_datasets
Updated: 2026-04-20
