
mvp-lab/LLaVA-OneVision-1.5-Instruct-Data

Image Text To Text · mvp-lab · 56.8K
apache-2.0 · 6.1 TB · task_categories:image-text-to-text · language:en · license:apache-2.0 · size_categories:10M<n<100M · modality:image

📌 Introduction This dataset, LLaVA-OneVision-1.5-Instruct, was collected and integrated during the development of LLaVA-OneVision-1.5. LLaVA-OneVision-1.5 is a novel family of Large Multimodal Models (LMMs) that achieve state-of-the-art performance with significantly reduced computational and financial costs. This meticulously curated 22M instruction dataset (LLaVA-OneVision-1.5-Instruct) is part

# download instantly
mlforge datasets pull mvp-lab/LLaVA-OneVision-1.5-Instruct-Data
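
Pulling the whole dataset means fetching 6.1 TB, so it can help to stream one config at a time from the Hugging Face source instead. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and that the Hub repo ID matches the dataset name above. `docmatix_part_configs` and `stream_config` are hypothetical helper names introduced here for illustration; the config names come from the dataset card metadata.

```python
REPO_ID = "mvp-lab/LLaVA-OneVision-1.5-Instruct-Data"  # assumed Hub repo ID


def docmatix_part_configs(num_parts: int = 10) -> list[str]:
    """Build the sharded Docmatix config names, e.g. Docmatix-part-00-of-10."""
    return [f"Docmatix-part-{i:02d}-of-{num_parts}" for i in range(num_parts)]


def stream_config(config_name: str, split: str = "train"):
    """Open one config in streaming mode so nothing is downloaded up front."""
    # Imported lazily so the helpers above work without the library installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split=split, streaming=True)
```

Calling `stream_config("CLEVR")` would return an iterable over the `train` split of the `CLEVR` config, and `docmatix_part_configs()` yields the ten Docmatix shard names to iterate over in the same way.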

Dataset details

Task: Image Text To Text
Language: en
License: apache-2.0
Size: 6.1 TB
Rows / images: 21.2M
Creator: mvp-lab
Downloads: 56.8K
Source: huggingface_datasets
Updated: 2025-11-21

About mvp-lab/LLaVA-OneVision-1.5-Instruct-Data

---
license: apache-2.0
task_categories:
- image-text-to-text
language:
- en
tags:
- multimodal
- vision-language-model
- lmm
- instruction-tuning
- pretraining
- dataset-collection
- vqa
- image-captioning
- large-language-model
configs:
- config_name: CLEVR
  data_files:
  - split: train
    path: CLEVR/train-
- config_name: CLEVR-Math
  data_files:
  - split: train
    path: CLEVR-Math/train-
- config_name: Docmatix
  data_files:
  - split: train
    path: Docmatix/train-
- config_name: Docmatix-part-00-of-10
  data_files:
  - split: train
    path: Docmatix-part-00-of-10/train-
- config_name: Docmatix-part-01-of-10
  data_files:
  - split: train
    path: Docmatix-part-01-of-10/train-
- config_name: Docmatix-part-02-of-10
  data_files:
  - split: train
    path: Docmatix-part-02-of-10/train-
- config_name: Docmatix-part-03-of-10
  data_files:
  - split: train
    path: Docmatix-part-03-of-10/train-
- config_name: Docmatix-part-04-of-10
  data_files:
  - split: train
    path: Docmatix-part-04-of-10/train-
- config_name: Docmatix-part-05-of-10
  data_files:
  - split: train
    path: Docmatix-part-05-of-10/train-
- config_name: Docmatix-part-06-of-10
  data_files:
  - split: train
    path: Docmatix-part-06-of-10/train-
- config_name: Docmatix-part-07-of-10
  data_files:
  - split: train
    path: Docmatix-part-07-of-10/train-
- config_name: Docmatix-part-08-of-10
  data_files:
  - split: train
    path: Docmatix-part-08-of-10/train-
- config_name: Docmatix-part-09-of-10
  data_files:
  - split: train
    path: Docmatix-part-09-of-10/train-
- config_name: Evol-Instruct-GPT4-Turbo
  data_files:
  - s