nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16
pip install mlforge-sdk && mlforge pull nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16
Model details
About nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16
Total parameters: 31B (Mamba2-Transformer hybrid MoE)
Active parameters: ~3B per token
Max context: 256k tokens
Modalities (in): Video, Audio, Image, Text
Modality (out): Text
Reasoning mode: On by default; toggle via enable_thinking
Best for: Video+speech analysis, document intelligence (OCR/charts/long docs), GUI/agentic workflows, ASR
Minimum GPU (BF16): 1× H100 80GB (single-GPU); 1× B200 / 1× H200 recommended
Minimum GPU (FP8): 1× L40S 48GB; 1× RTX Pro 6000 / 1× B200 recommended
Minimum GPU (NVFP4): 1× RTX 5090 32GB; 1× DGX Spark / 1× Jetson Thor also supported
Precisions: BF16 (62 GB) · FP8 (33 GB) · NVFP4 (21 GB)
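
As a rough sanity check on the figures listed above, the sketch below compares each listed checkpoint size with a naive "parameters × bits per parameter" estimate and with the memory of the listed minimum GPU. The nominal bit widths (16, 8, 4) and the function names are assumptions made for illustration; they are not taken from the model card. The naive estimate ignores quantization scale factors and any tensors kept at higher precision, which may be why the listed FP8 and NVFP4 sizes come out larger than the nominal values. The headroom figure is only indicative, since KV cache, activations, and runtime overhead also need memory.

```python
# Figures copied from the spec list above.
TOTAL_PARAMS = 31e9                                   # "Total parameters 31B"
CHECKPOINT_GB = {"BF16": 62, "FP8": 33, "NVFP4": 21}  # listed checkpoint sizes
MIN_GPU_GB = {"BF16": 80, "FP8": 48, "NVFP4": 32}     # H100, L40S, RTX 5090

# Assumption: nominal storage width per parameter, ignoring scale factors
# and any layers that stay in a wider format.
BITS_PER_PARAM = {"BF16": 16, "FP8": 8, "NVFP4": 4}


def nominal_weight_gb(precision: str) -> float:
    """Naive weight size: parameters x bits / 8, in decimal gigabytes."""
    return TOTAL_PARAMS * BITS_PER_PARAM[precision] / 8 / 1e9


def headroom_gb(precision: str) -> int:
    """Memory left on the listed minimum GPU after loading the checkpoint."""
    return MIN_GPU_GB[precision] - CHECKPOINT_GB[precision]


if __name__ == "__main__":
    for precision in CHECKPOINT_GB:
        print(
            f"{precision}: nominal {nominal_weight_gb(precision):.1f} GB, "
            f"listed {CHECKPOINT_GB[precision]} GB, "
            f"headroom on minimum GPU {headroom_gb(precision)} GB"
        )
```

For BF16 the naive estimate (31B parameters at 2 bytes each) matches the listed 62 GB exactly, which suggests the listed sizes are weight-only figures in decimal gigabytes.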