nvidia/LocateAnything-3B
Image Text To Text · nvidia · 896.1K downloads · 2.5K likes · transformers · License: other · 3.8B params · arXiv:2605.27365
LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding
# pull & run locally
pip install mlforge-sdk && mlforge pull nvidia/LocateAnything-3B
Model details
Task: Image Text To Text
Provider: nvidia
Framework: transformers
Parameters: 3.8B
Size: 7.3 GB
License: other
Downloads: 896.1K
Likes: 2.5K
Paper: arXiv:2605.27365
Updated: 2026-06-12
About nvidia/LocateAnything-3B
LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding
Related Image Text To Text
google/gemma-4-26B-A4B-it · Image Text To Text · 26.5B params · 13.1M downloads · 1.2K likes · 🤗 HF
google/gemma-4-31B-it · Image Text To Text · 32.7B params · 11.2M downloads · 3.1K likes · 🤗 HF
Qwen/Qwen3.5-9B · Image Text To Text · 9.7B params · 9.8M downloads · 1.6K likes · 🤗 HF
Qwen/Qwen3.5-4B · Image Text To Text · 4.7B params · 9.6M downloads · 683 likes · 🤗 HF
Qwen/Qwen2.5-VL-7B-Instruct · Image Text To Text · 8.3B params · 9.4M downloads · 1.6K likes · 🤗 HF
Qwen/Qwen3.6-35B-A3B-FP8 · Image Text To Text · 36.0B params · 5.8M downloads · 284 likes · 🤗 HF