nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16
:---:--- Total Parameters 120B (12B active) Architecture LatentMoE - Mamba-2 + MoE + Attention hybrid with Multi-Token Prediction (MTP) Context Length Up to 1M tokens Minimum GPU Requirement 8× H100-80GB Supported Languages English, French, German, Italian, Japanese, Spanish, Chinese Best For Agentic workflows, long-context reasoning, high-volume workloads (e.g. IT ticket automati
pip install mlforge-sdk && mlforge pull nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16
Model details
About nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16
:---:--- Total Parameters 120B (12B active) Architecture LatentMoE - Mamba-2 + MoE + Attention hybrid with Multi-Token Prediction (MTP) Context Length Up to 1M tokens Minimum GPU Requirement 8× H100-80GB Supported Languages English, French, German, Italian, Japanese, Spanish, Chinese Best For Agentic workflows, long-context reasoning, high-volume workloads (e.g. IT ticket automation), tool use, RAG Reasoning Mode Configurable on/off via chat template (enablethinking=True/False) License NVIDIA Nemotron Open Model License Release Date March 11, 2026