microsoft/Phi-tiny-MoE-instruct

Tags: transformers, 3.8B params, arxiv:2506.18349, arxiv:2404.14219, arxiv:2409.12136, license:mit, region:us

# pull & run locally
pip install mlforge-sdk && mlforge pull microsoft/Phi-tiny-MoE-instruct

Model details

Task: Text Generation
Provider: microsoft
Framework: transformers
Parameters: 3.8B
Size: 7.0 GB
License: MIT
Downloads: 831.7K
Likes: 38
Paper: arXiv:2506.18349
Updated: 2025-12-10

About microsoft/Phi-tiny-MoE-instruct

Phi-tiny-MoE is a lightweight Mixture of Experts (MoE) model with 3.8B total parameters and 1.1B activated parameters. It is compressed and distilled from the base model shared by Phi-3.5-MoE and GRIN-MoE using the SlimMoE approach, then post-trained via supervised fine-tuning and direct preference optimization for instruction following and safety. The model is trained on Phi-3 synthetic data and filtered public documents, with a focus on high-quality, reasoning-dense content. It is part of the SlimMoE series, which includes a larger variant, Phi-mini-MoE, with 7.6B total and 2.4B activated parameters.
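
To make the total versus activated parameter counts concrete, here is a minimal sketch of the arithmetic using only the figures quoted above. The helper names are hypothetical, and the 2 bytes per parameter (16-bit weights) used for the size estimate is an assumption, not something this page states; it happens to land close to the listed download size.

```python
# Parameter counts quoted on this page, in billions.
MODELS = {
    "Phi-tiny-MoE": {"total_b": 3.8, "active_b": 1.1},
    "Phi-mini-MoE": {"total_b": 7.6, "active_b": 2.4},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters that are activated."""
    return active_b / total_b


def weight_size_gib(total_b: float, bytes_per_param: int = 2) -> float:
    """Rough weight footprint in GiB.

    Assumption: every parameter is stored in `bytes_per_param` bytes
    (2 bytes corresponds to 16-bit weights). All experts must be held
    in memory, so the total count is used, not the activated count.
    """
    return total_b * 1e9 * bytes_per_param / 2**30


for name, params in MODELS.items():
    share = active_fraction(**params)
    size = weight_size_gib(params["total_b"])
    print(f"{name}: {share:.0%} of parameters activated, about {size:.1f} GiB of weights")
```

Under these assumptions, both SlimMoE variants activate roughly a third of their parameters, so per-token compute tracks the activated count while memory use tracks the total count.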
