deepseek-ai/DeepSeek-R1-Distill-Qwen-14B



# pull & run locally
pip install mlforge-sdk && mlforge pull deepseek-ai/DeepSeek-R1-Distill-Qwen-14B

Model details

Task: Other
Provider: deepseek-ai
Framework: transformers
Parameters: 14.8B
Size: 28 GB
License: MIT
Downloads: 463.6K
Likes: 658
Paper: arXiv:2501.12948
Updated: 2025-02-24
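
The Parameters and Size values listed above can be related with a quick back-of-the-envelope calculation. The sketch below assumes the weights are stored at 16 bits (2 bytes) per parameter and that the listed size is rounded from binary gigabytes (GiB). The page states neither, so both are assumptions. The dtype labels and the helper name weight_size_gib are illustrative only.

```python
# Rough estimate of raw weight size from the parameter count.
# Assumption (not stated on the page): weights are stored at 2 bytes per parameter.
PARAMS = 14.8e9  # "Parameters: 14.8B" from the model details

# Bytes per parameter for a few common storage formats (labels are illustrative).
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}


def weight_size_gib(num_params: float, bytes_per_param: int) -> float:
    """Return the size of the raw weights in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    for dtype, nbytes in BYTES_PER_PARAM.items():
        print(f"{dtype:>8}: {weight_size_gib(PARAMS, nbytes):.1f} GiB")
```

Under the 2-byte assumption the estimate comes to roughly 27.6 GiB, which is consistent with the listed 28 GB. The figure covers weights only. Activations and any generation cache need additional memory at run time.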

About deepseek-ai/DeepSeek-R1-Distill-Qwen-14B

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning performance. Through RL, numerous powerful and interesting reasoning behaviors emerged naturally in DeepSeek-R1-Zero. However, it also suffers from endless repetition, poor readability, and language mixing. To address these issues and further improve reasoning performance, we introduce DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.
