MiniMax · 2026-02
MiniMax-M2.5
MoE decoder architecture with GQA + QK-Norm attention.
MiniMax-M2.5 decoder block architecture. Attention: GQA with QK-Norm. Normalization: RMSNorm. FFN: Mixture of Experts (10B active parameters). Position encoding: RoPE. Scale: 230B total parameters, 196K context, 128 layers. Decoder type: MoE.
10B active / 230B total · 196K context · GQA + QK-Norm · MoE
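The attention side of the decoder block described above can be sketched in a few lines of PyTorch. This is a minimal illustration of grouped-query attention with per-head QK-Norm, assuming a standard RMSNorm and a causal mask; module names and dimensions are illustrative, the RoPE application is omitted for brevity, and none of this is MiniMax's actual implementation.

```python
# Illustrative GQA + QK-Norm attention layer; hypothetical names, not MiniMax code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        # Scale by the inverse root-mean-square, then apply a learned gain.
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

class GQAWithQKNorm(nn.Module):
    """Grouped-query attention: a few KV heads shared by many query heads,
    with RMSNorm applied per head to queries and keys (QK-Norm)."""
    def __init__(self, d_model: int, n_heads: int, n_kv_heads: int):
        super().__init__()
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = d_model // n_heads
        self.q_proj = nn.Linear(d_model, n_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, d_model, bias=False)
        # QK-Norm: normalizing queries and keys keeps attention logits bounded.
        self.q_norm = RMSNorm(self.head_dim)
        self.k_norm = RMSNorm(self.head_dim)

    def forward(self, x):  # x: (batch, seq, d_model); RoPE would be applied to q, k here
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.head_dim)
        k = self.k_proj(x).view(b, t, self.n_kv_heads, self.head_dim)
        v = self.v_proj(x).view(b, t, self.n_kv_heads, self.head_dim)
        q, k = self.q_norm(q), self.k_norm(k)  # QK-Norm
        # Expand KV heads so each KV head serves n_heads // n_kv_heads query heads.
        rep = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(rep, dim=2)
        v = v.repeat_interleave(rep, dim=2)
        q, k, v = (z.transpose(1, 2) for z in (q, k, v))  # (b, heads, t, head_dim)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(b, t, -1)
        return self.o_proj(out)
```

Sharing KV heads across query-head groups shrinks the KV cache, which matters at a 196K-token context, while QK-Norm stabilizes attention at large scale.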
Architecture Specifications
Parameters: 10B active / 230B total
Context Window: 196K
Decoder Type: MoE
Attention: GQA + QK-Norm
Active Parameters: 10B
Release Date: 2026-02
Category: Mixture of Experts
Organization: MiniMax
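For teams that track these specifications programmatically, the table can be captured in a small config object. This is a hypothetical sketch: the field names are our own, and anything the table does not state (head counts, expert counts, hidden sizes) is deliberately left out rather than guessed.

```python
# Hypothetical spec object mirroring the table above; not an official config format.
from dataclasses import dataclass

@dataclass(frozen=True)
class MiniMaxM25Spec:
    organization: str = "MiniMax"
    release: str = "2026-02"
    decoder_type: str = "MoE"
    attention: str = "GQA + QK-Norm"
    total_params_b: float = 230.0   # total parameters, billions
    active_params_b: float = 10.0   # parameters active per token, billions
    context_window: str = "196K"    # as listed; exact token count not given
    num_layers: int = 128           # from the block description above

    @property
    def active_fraction(self) -> float:
        # Roughly 4.3% of the weights participate in any single forward pass.
        return self.active_params_b / self.total_params_b

print(MiniMaxM25Spec().active_fraction)  # ~0.043
```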
Key Features
M2 upgrade · 10B active · Improved routing
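The "Improved routing" feature concerns how the MoE FFN selects experts per token. The details of M2.5's router are not specified here, so the sketch below shows only a generic top-k softmax router and expert dispatch, which is the standard mechanism by which a 230B-parameter model activates roughly 10B parameters per token; names and the choice of k are illustrative.

```python
# Generic top-k MoE routing sketch; a textbook illustration, not MiniMax's router.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKRouter(nn.Module):
    def __init__(self, d_model: int, n_experts: int, k: int = 2):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.k = k

    def forward(self, x):  # x: (tokens, d_model)
        logits = self.gate(x)                               # (tokens, n_experts)
        topk_logits, topk_idx = logits.topk(self.k, dim=-1)
        weights = F.softmax(topk_logits, dim=-1)            # renormalize over chosen experts
        return topk_idx, weights                            # selected experts and mixing weights

class MoEFFN(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int, k: int = 2):
        super().__init__()
        self.router = TopKRouter(d_model, n_experts, k)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):  # x: (tokens, d_model)
        idx, w = self.router(x)
        out = torch.zeros_like(x)
        # Each token is processed only by its k selected experts.
        for e, expert in enumerate(self.experts):
            token_ids, slots = (idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            out[token_ids] += w[token_ids, slots].unsqueeze(-1) * expert(x[token_ids])
        return out
```

With this kind of routing, per-token compute scales with k and the per-expert FFN size rather than the total expert count, which is how the gap between active and total parameters arises.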