nvidia/Nemotron-Labs-Diffusion-8B-Base

Text Generation·nvidia· 504.6K· 6

transformers · other · 8.5B params · license:other · region:us


# pull & run locally
pip install mlforge-sdk && mlforge pull nvidia/Nemotron-Labs-Diffusion-8B-Base

Model details

Task: Text Generation
Provider: nvidia
Framework: transformers
Parameters: 8.5B
Size: 47 GB
License: other
Downloads: 504.6K
Likes: 6
Updated: 2026-06-03

About nvidia/Nemotron-Labs-Diffusion-8B-Base

Nemotron-Labs-Diffusion is a tri-mode language model that supports both autoregressive (AR) decoding and diffusion-based parallel decoding; the only change at inference time is the attention pattern applied to the same model. Combining the two modes enables a third mode, self-speculation, in which the same model drafts tokens in parallel with diffusion and verifies them with AR decoding over a shared KV cache, achieving high acceptance lengths and decoding efficiency. Because switching modes only requires changing the attention pattern, a single model can run efficiently at different concurrency levels across varying deployment scenarios.
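To make the idea of switching attention patterns concrete, here is a minimal sketch in plain Python. It is not the model's actual implementation: the exact masks are not given on this page, so the block-style diffusion mask below is an assumption, and the function names (`causal_mask`, `block_diffusion_mask`, `accepted_length`) are hypothetical. The last function illustrates one common way to verify a draft (greedy prefix matching), which may differ from what the model does.

```python
from typing import List

Mask = List[List[bool]]


def causal_mask(n: int) -> Mask:
    """AR pattern: position i may attend to position j only when j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def block_diffusion_mask(n: int, prefix_len: int) -> Mask:
    """Assumed diffusion pattern: the committed prefix stays causal, while
    draft positions (index >= prefix_len) see the whole prefix and attend
    to each other in both directions."""
    mask = causal_mask(n)
    for i in range(prefix_len, n):
        for j in range(prefix_len, n):
            mask[i][j] = True
    return mask


def accepted_length(draft: List[int], verified: List[int]) -> int:
    """Greedy verification (an assumption): count the leading draft tokens
    that match what the AR pass would have produced at the same positions."""
    count = 0
    for drafted, checked in zip(draft, verified):
        if drafted != checked:
            break
        count += 1
    return count


if __name__ == "__main__":
    # Two committed tokens followed by two draft positions.
    for row in block_diffusion_mask(4, prefix_len=2):
        print("".join("x" if allowed else "." for allowed in row))
    # The draft agrees with AR verification on its first two tokens.
    print(accepted_length([5, 7, 9, 2], [5, 7, 1, 2]))
```

In this sketch the same weights would be used in every mode; only the boolean mask passed to attention changes. Self-speculation would draft with the second mask, re-score with the first, and keep the accepted prefix.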

Related Text Generation

Qwen/Qwen3-0.6B · Text Generation · 751.6M params · 27.8M downloads · 1.4K likes
Qwen/Qwen3-4B · Text Generation · 4.0B params · 16.4M downloads · 641 likes
openai-community/gpt2 · Text Generation · 137.0M params · 13.3M downloads · 3.3K likes
Qwen/Qwen3-8B · Text Generation · 8.2B params · 13.0M downloads · 1.2K likes
Qwen/Qwen2.5-7B-Instruct · Text Generation · 7.6B params · 12.8M downloads · 1.4K likes
facebook/opt-125m · Text Generation · 12.3M downloads · 267 likes