
nvidia/Llama-3.1-Nemotron-Nano-VL-8B-V1

Image Text To Text · nvidia · 1.2M downloads · 181 likes
transformers · other · 8.7B params · license:other · region:us

# pull & run locally
pip install mlforge-sdk && mlforge pull nvidia/Llama-3.1-Nemotron-Nano-VL-8B-V1

Model details

Task: Image Text To Text
Provider: nvidia
Framework: transformers
Parameters: 8.7B
Size: 16 GB
License: other
Downloads: 1.2M
Likes: 181
Updated: 2025-12-04
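As a rough cross-check of the Parameters and Size fields, here is a minimal weight-only memory estimate. It assumes every parameter is stored in bf16 (2 bytes) and that the listed "16 GB" is really GiB. For the 4-bit AWQ build mentioned in the model description, it assumes a hypothetical layout with one 16-bit scale and one 16-bit zero point per group of 128 weights. It ignores activations, the KV cache, and any layers (for example the vision encoder or output head) that a real AWQ build may keep in 16-bit, so real footprints will be somewhat higher.

```python
GIB = 1024 ** 3  # bytes per GiB


def weight_bytes(n_params: float, bits_per_param: float) -> float:
    """Bytes needed to store n_params weights at the given bit width."""
    return n_params * bits_per_param / 8


def awq_bytes(n_params: float, group_size: int = 128, bits: int = 4,
              scale_bits: int = 16, zero_bits: int = 16) -> float:
    """4-bit weights plus one scale and one zero point per group.

    The per-group layout is an assumption for illustration; real AWQ kernels
    may pack zero points more tightly.
    """
    n_groups = n_params / group_size
    return weight_bytes(n_params, bits) + n_groups * (scale_bits + zero_bits) / 8


N_PARAMS = 8.7e9  # "Parameters: 8.7B" from the model details

bf16 = weight_bytes(N_PARAMS, 16) / GIB
int4 = awq_bytes(N_PARAMS) / GIB
print(f"bf16 weights: {bf16:.1f} GiB")
print(f"AWQ 4-bit weights: {int4:.1f} GiB")
```

Under these assumptions it prints about 16.2 GiB for bf16, in line with the listed 16 GB, and about 4.3 GiB for 4-bit AWQ. That roughly fourfold reduction helps explain why 4-bit quantization is what brings Jetson Orin and laptop deployment within reach.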

About nvidia/Llama-3.1-Nemotron-Nano-VL-8B-V1

Llama Nemotron Nano VL is a leading document-intelligence vision language model (VLM) that can query and summarize images from the physical or virtual world. It is deployable in the data center, in the cloud, and at the edge, including on Jetson Orin and laptops, using 4-bit AWQ quantization through the TinyChat framework. We find that: (1) image-text pairs alone are not enough, and interleaved image-text data is essential; (2) unfreezing the LLM during interleaved image-text pre-training enables in-context learning; and (3) re-blending text-only instruction data is crucial for boosting both VLM and text-only performance.
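The description does not say how text-only instruction data is re-blended, or in what proportion. Purely as a generic illustration, not NVIDIA's recipe, one simple way to mix a text-only instruction set into a multimodal training stream is weighted random sampling. The function name `blend`, the toy records, and the 0.25 text-only fraction are all hypothetical.

```python
import itertools
import random
from collections.abc import Iterator, Sequence


def blend(multimodal: Sequence[dict], text_only: Sequence[dict],
          text_fraction: float, seed: int = 0) -> Iterator[dict]:
    """Yield an endless stream of examples, drawing a text-only example with
    probability `text_fraction` and a multimodal example otherwise.

    Hypothetical helper for illustration; not the recipe used for this model.
    """
    if not 0.0 <= text_fraction <= 1.0:
        raise ValueError("text_fraction must be in [0, 1]")
    rng = random.Random(seed)  # seeded so the mix is reproducible
    while True:
        pool = text_only if rng.random() < text_fraction else multimodal
        yield rng.choice(pool)


# Toy stand-ins for real datasets (hypothetical record layout).
multimodal = [{"image": f"page_{i}.png", "text": f"Summarize page {i}."} for i in range(100)]
text_only = [{"text": f"Instruction {i}"} for i in range(100)]

# Draw 10,000 examples and measure the realized text-only share.
sample = list(itertools.islice(blend(multimodal, text_only, text_fraction=0.25), 10_000))
share = sum("image" not in ex for ex in sample) / len(sample)
print(f"text-only share: {share:.3f}")
```

Raising `text_fraction` trades multimodal exposure for text-only exposure. Finding (3) suggests that a nonzero share matters, but the right value is an empirical question that this sketch does not answer.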

Related Image Text To Text

google/gemma-4-26B-A4B-it · Image Text To Text · 26.5B params · 13.1M downloads · 1.2K likes
google/gemma-4-31B-it · Image Text To Text · 32.7B params · 11.2M downloads · 3.1K likes
Qwen/Qwen3.5-9B · Image Text To Text · 9.7B params · 9.8M downloads · 1.6K likes
Qwen/Qwen3.5-4B · Image Text To Text · 4.7B params · 9.6M downloads · 683 likes
Qwen/Qwen2.5-VL-7B-Instruct · Image Text To Text · 8.3B params · 9.4M downloads · 1.6K likes
Qwen/Qwen3.6-35B-A3B-FP8 · Image Text To Text · 36.0B params · 5.8M downloads · 284 likes