Task
Image Text To Text
FineVision is a massive collection of datasets with 17.3M images, 24.3M samples, 88.9M turns, and 9.5B answer tokens, designed for training state-of-the-art open Vision-Language-Models.
FineVision is a massive collection of datasets with 17.3M images, 24.3M samples, 88.9M turns, and 9.5B answer tokens, designed for training state-of-the-art open Vision-Language-Models.