HomeModelsImage Text To TextHuggingFaceTB/SmolVLM-256M-Instruct
S

HuggingFaceTB/SmolVLM-256M-Instruct

Image Text To Text·HuggingFaceTB· 979.4K· 372
transformers apache-2.0 256.5M params dataset:HuggingFaceM4/the_cauldrondataset:HuggingFaceM4/Docmatixarxiv:2504.05299base_model:HuggingFaceTB/SmolLM2-135M-Instructbase_model:quantized:HuggingFaceTB/SmolLM2-135M-Instruct

SmolVLM-256M is the smallest multimodal model in the world. It accepts arbitrary sequences of image and text inputs to produce text outputs. It's designed for efficiency. SmolVLM can answer questions about images, describe visual content, or transcribe text. Its lightweight architecture makes it suitable for on-device applications while maintaining strong performance on multimodal tasks. It can ru

Open in MLForge Sign up free Desktop app Source ↗
# pull & run locally
pip install mlforge-sdk && mlforge pull HuggingFaceTB/SmolVLM-256M-Instruct

Model details

Task
Image Text To Text
Provider
HuggingFaceTB
Framework
transformers
Parameters
256.5M
Size
5.3 GB
License
apache-2.0
Downloads
979.4K
Likes
372
Paper
arXiv:2504.05299
Updated
2025-04-08

About HuggingFaceTB/SmolVLM-256M-Instruct

SmolVLM-256M is the smallest multimodal model in the world. It accepts arbitrary sequences of image and text inputs to produce text outputs. It's designed for efficiency. SmolVLM can answer questions about images, describe visual content, or transcribe text. Its lightweight architecture makes it suitable for on-device applications while maintaining strong performance on multimodal tasks. It can run inference on one image with under 1GB of GPU RAM.

Related Image Text To Text

G google/gemma-4-26B-A4B-it Image Text To Text ·26.5B params 13.1M 1.2K 🤗 HF G google/gemma-4-31B-it Image Text To Text ·32.7B params 11.2M 3.1K 🤗 HF Q Qwen/Qwen3.5-9B Image Text To Text ·9.7B params 9.8M 1.6K 🤗 HF Q Qwen/Qwen3.5-4B Image Text To Text ·4.7B params 9.6M 683 🤗 HF Q Qwen/Qwen2.5-VL-7B-Instruct Image Text To Text ·8.3B params 9.4M 1.6K 🤗 HF Q Qwen/Qwen3.6-35B-A3B-FP8 Image Text To Text ·36.0B params 5.8M 284 🤗 HF