HomeModelsText Generationibm-research/PowerMoE-3b
P

ibm-research/PowerMoE-3b

Text Generation·ibm-research· 1.6M· 21
transformers apache-2.0 3.4B params arxiv:2408.13359license:apache-2.0region:us

Model Summary PowerMoE-3B is a 3B sparse Mixture-of-Experts (sMoE) language model trained with the Power learning rate scheduler. It sparsely activates 800M parameters for each token. It is trained on a mix of open-source and proprietary datasets. PowerMoE-3B has shown promising results compared to other dense models with 2x activate parameters across various benchmarks, including natural language

Open in MLForge Sign up free Desktop app Source ↗
# pull & run locally
pip install mlforge-sdk && mlforge pull ibm-research/PowerMoE-3b

Model details

Task
Text Generation
Provider
ibm-research
Framework
transformers
Parameters
3.4B
Size
19 GB
License
apache-2.0
Downloads
1.6M
Likes
21
Paper
arXiv:2408.13359
Updated
2024-09-24

About ibm-research/PowerMoE-3b

Model Summary PowerMoE-3B is a 3B sparse Mixture-of-Experts (sMoE) language model trained with the Power learning rate scheduler. It sparsely activates 800M parameters for each token. It is trained on a mix of open-source and proprietary datasets. PowerMoE-3B has shown promising results compared to other dense models with 2x activate parameters across various benchmarks, including natural language multi-choices, code generation, and math reasoning. Paper: https://arxiv.org/abs/2408.13359

Related Text Generation

Q Qwen/Qwen3-0.6B Text Generation ·751.6M params 27.8M 1.4K 🤗 HF Q Qwen/Qwen3-4B Text Generation ·4.0B params 16.4M 641 🤗 HF G openai-community/gpt2 Text Generation ·137.0M params 13.3M 3.3K 🤗 HF Q Qwen/Qwen3-8B Text Generation ·8.2B params 13.0M 1.2K 🤗 HF Q Qwen/Qwen2.5-7B-Instruct Text Generation ·7.6B params 12.8M 1.4K 🤗 HF O facebook/opt-125m Text Generation 12.3M 267 🤗 HF