
allenai/Molmo2-8B

Image Text To Text · allenai · 637.6K downloads · 189 likes
transformers · apache-2.0 · 8.7B params · datasets: allenai/Molmo2-Cap, allenai/Molmo2-VideoCapQA

# pull & run locally
pip install mlforge-sdk && mlforge pull allenai/Molmo2-8B

Model details

Task: Image Text To Text
Provider: allenai
Framework: transformers
Parameters: 8.7B
Size: 32 GB
License: apache-2.0
Downloads: 637.6K
Likes: 189
Updated: 2026-01-23
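The Size field can be sanity-checked against the parameter count with simple arithmetic: weight memory is roughly the number of parameters times the bytes per parameter. The sketch below assumes that Size counts only the raw weights, stored in float32 and measured in binary gigabytes (GiB). The helper name `weight_memory_gib` and the dtype table are illustrative and hypothetical, not part of any library.

```python
# Rough weight-memory estimate for a dense model: params x bytes per parameter.
# Assumption (hypothetical): the card's "Size" counts weights only, in GiB (2**30 bytes).

BYTES_PER_PARAM = {
    "float32": 4,
    "bfloat16": 2,
    "float16": 2,
    "int8": 1,
}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the approximate size of the raw weights in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    params = 8.7e9  # parameter count listed on the card
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype:>9}: {weight_memory_gib(params, dtype):5.1f} GiB")
```

For 8.7B parameters, float32 gives about 32.4 GiB, which matches the listed 32 GB under the assumption above, while bfloat16 halves that to about 16.2 GiB. Activations, the KV cache, and image or video inputs add to the runtime footprint, so treat these figures as lower bounds.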

About allenai/Molmo2-8B

Molmo2 is a family of open vision-language models developed by the Allen Institute for AI (Ai2) that support image, video, and multi-image understanding and grounding. Molmo2 models are trained on publicly available third-party datasets, as referenced in our technical report, and on Molmo2 data, a collection of datasets with highly curated image-text and video-text pairs. Molmo2 models achieve state-of-the-art performance among multimodal models of similar size. All models in the Molmo2 family are listed together in Ai2's Molmo2 collection.

Related Image Text To Text

google/gemma-4-26B-A4B-it · Image Text To Text · 26.5B params · 13.1M downloads · 1.2K likes
google/gemma-4-31B-it · Image Text To Text · 32.7B params · 11.2M downloads · 3.1K likes
Qwen/Qwen3.5-9B · Image Text To Text · 9.7B params · 9.8M downloads · 1.6K likes
Qwen/Qwen3.5-4B · Image Text To Text · 4.7B params · 9.6M downloads · 683 likes
Qwen/Qwen2.5-VL-7B-Instruct · Image Text To Text · 8.3B params · 9.4M downloads · 1.6K likes
Qwen/Qwen3.6-35B-A3B-FP8 · Image Text To Text · 36.0B params · 5.8M downloads · 284 likes