nvidia/Nemotron-Labs-Diffusion-8B-Base

Text Generation · nvidia · 504.6K downloads · 6 likes
Tags: transformers, other, 8.5B params, license:other, region:us

# pull & run locally
pip install mlforge-sdk && mlforge pull nvidia/Nemotron-Labs-Diffusion-8B-Base

Model details

Task: Text Generation
Provider: nvidia
Framework: transformers
Parameters: 8.5B
Size: 47 GB
License: other
Downloads: 504.6K
Likes: 6
Updated: 2026-06-03

About nvidia/Nemotron-Labs-Diffusion-8B-Base

Nemotron-Labs-Diffusion is a tri-mode language model. It supports both autoregressive (AR) decoding and diffusion-based parallel decoding, and switches between them at inference time by changing only the attention pattern of the same model. Combining the two modes enables a third mode, self-speculation, in which the same model drafts tokens in parallel with diffusion and verifies them with AR decoding over a shared KV cache, achieving high acceptance lengths and decoding efficiency. Because switching modes requires only a change of attention pattern, a single model can run efficiently at different concurrency levels across deployment scenarios.
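The mode switching described above can be illustrated with a small, library-free sketch. It is not the model's actual implementation: the function names `build_mask` and `accepted_length`, the `block_start` parameter, the specific mask layout (causal in AR mode, bidirectional inside the drafted block in diffusion mode), and the greedy prefix-match acceptance rule are all assumptions chosen to show the general idea.

```python
def build_mask(seq_len, mode, block_start=0):
    """Return mask[i][j] == True when position i may attend to position j.

    Assumed layout (hypothetical, for illustration only):
      - "ar": plain causal mask, each position sees itself and earlier ones.
      - "diffusion": causal over the prefix, plus full bidirectional
        attention among positions at or after block_start (the draft block).
    """
    if mode not in ("ar", "diffusion"):
        raise ValueError("mode must be 'ar' or 'diffusion'")
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            causal = j <= i
            same_block = (
                mode == "diffusion" and i >= block_start and j >= block_start
            )
            row.append(causal or same_block)
        mask.append(row)
    return mask


def accepted_length(draft_tokens, verifier_tokens):
    """Count leading draft tokens that agree with the AR verifier.

    Assumes greedy verification: drafted tokens are kept up to the
    first position where the AR pass would have chosen differently.
    """
    n = 0
    for drafted, verified in zip(draft_tokens, verifier_tokens):
        if drafted != verified:
            break
        n += 1
    return n


if __name__ == "__main__":
    # Same weights, two attention patterns: only the mask differs.
    for row in build_mask(4, "ar"):
        print(["x" if allowed else "." for allowed in row])
    for row in build_mask(4, "diffusion", block_start=2):
        print(["x" if allowed else "." for allowed in row])
    # Self-speculation: diffusion drafts a block, AR verifies it.
    print(accepted_length([5, 7, 9, 2], [5, 7, 1, 2]))
```

In this sketch, the acceptance length is the number of drafted tokens kept per verification pass. A longer accepted prefix means fewer sequential AR steps per generated token, which is the intuition behind the efficiency claim above.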
