nvidia/OCR-Synthetic-Multilingual-v1

Name: nvidia/OCR-Synthetic-Multilingual-v1
Creator: nvidia
License: cc-by-4.0
Keywords: huggingface, task_categories:object-detection, task_categories:image-to-text, language:en, language:ja, language:ko, language:ru, language:zh, license:cc-by-4.0, object-detection, image-to-text
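
The keywords above mix plain tags with `key:value` tags (for example `language:ja`). As a minimal sketch, assuming the tags are always comma-separated and that the text before the first colon is the category, they can be grouped as follows. The helper name `parse_tags` and the `other` bucket are illustrative only and are not part of any dataset tooling.

```python
from collections import defaultdict


def parse_tags(keywords: str) -> dict[str, list[str]]:
    """Group comma-separated tags by the prefix before the first colon.

    Tags without a colon are collected under the hypothetical key "other".
    """
    grouped = defaultdict(list)
    for raw in keywords.split(","):
        tag = raw.strip()
        if not tag:
            continue
        key, sep, value = tag.partition(":")
        if sep:
            grouped[key].append(value)
        else:
            grouped["other"].append(tag)
    return dict(grouped)


# Keyword string copied from the listing above.
KEYWORDS = (
    "huggingface, task_categories:object-detection, "
    "task_categories:image-to-text, language:en, language:ja, language:ko, "
    "language:ru, language:zh, license:cc-by-4.0, object-detection, image-to-text"
)

tags = parse_tags(KEYWORDS)
print(tags["language"])         # ['en', 'ja', 'ko', 'ru', 'zh']
print(tags["task_categories"])  # ['object-detection', 'image-to-text']
```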

A large-scale, synthetically generated OCR training dataset for multilingual text detection and recognition. The data was produced with a heavily modified and extended version of SynthDoG (Synthetic Document Generator), originally introduced in the Donut project by Kim et al.

# download instantly
mlforge datasets pull nvidia/OCR-Synthetic-Multilingual-v1

Dataset details

Task: Object Detection
Language: en, ja, ko, ru, zh
License: cc-by-4.0
Size: 5.0 TB
Creator: nvidia
Downloads: 22.3K
Source: huggingface_datasets
Updated: 2026-04-20

About nvidia/OCR-Synthetic-Multilingual-v1