MiniMax M2.5 230B
Popular 230B coder that opts for a classic architecture instead of the newer hybrid-attention ideas.
MiniMax M2.5 230B decoder block architecture: Attention: GQA with QK-Norm. Normalization: RMSNorm. FFN: Mixture of Experts (10B active parameters). Position encoding: RoPE. Scale: 230B, 197K context, 62 layers. Decoder type: MoE.
Overview
MiniMax M2.5 is MiniMax's February 2026 refresh of MiniMax M2 — same 230 B total / 10 B active MoE architecture, same 62-layer GQA stack, same QK-Norm attention stability trick. The primary delta versus M2 is in post-training and the data mix, not architecture. Read the MiniMax M2 230B deep dive first — this one covers only the deltas.
What Changed from M2
- Partial RoPE removed: M2.5's config.json applies full RoPE to every query/key dimension, dropping M2's partial-RoPE experiment. MiniMax apparently concluded that the non-rotated, content-only dimensions weren't pulling their weight at this scale.
- Post-training refresh: updated agentic and reasoning SFT mixes.
- Architecture otherwise unchanged: 62 layers, 10 B active, ≈ 248 KiB/token KV cache, 197 K native context.
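The ≈ 248 KiB/token KV-cache figure above follows directly from the GQA geometry. A minimal sketch of the arithmetic, assuming 8 KV heads with a head dimension of 128 (values from M2's published config, not stated in this card; only the 62 layers and bfloat16 precision are given above):

```python
def kv_cache_per_token(layers=62, kv_heads=8, head_dim=128, bytes_per_elem=2):
    """Bytes of KV cache one token occupies.

    Factor of 2 = one K vector and one V vector per layer;
    bytes_per_elem=2 for bfloat16. kv_heads and head_dim are
    assumed from M2's config, since M2.5 keeps the attention stack.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_elem

size = kv_cache_per_token()
print(f"{size} bytes = {size / 1024:.0f} KiB/token")  # 253952 bytes = 248 KiB
```

At the 197 K native context that works out to roughly 47 GiB of KV cache for a single full-length sequence, which is why GQA's 8-way KV-head reduction matters here.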
Architecture at a Glance
| Parameter | Value | Notes |
|---|---|---|
| Total parameters | ≈ 230 B | MoE — same as M2 |
| Active parameters | ≈ 10 B | per token |
| Layers | 62 | all GQA |
| Attention | GQA + QK-Norm | partial RoPE dropped |
| KV cache | ≈ 248 KiB/token | |
| Max position | ≈ 201,728 | 197 K native |
| Precision | bfloat16 | |
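To make the "partial RoPE dropped" row concrete: under partial RoPE, only a slice of each head's dimensions gets rotated and the rest carry position-free content; full RoPE rotates every dimension pair. A toy sketch of the full-RoPE case (illustrative only; the real implementation operates on batched tensors, and the `base` value is a common default, not taken from M2.5's config):

```python
import math

def rope_full(x, pos, base=10000.0):
    """Rotate ALL dimension pairs of one head vector by position-dependent angles.

    x: list of floats with even length (one query or key head).
    M2 rotated only a leading slice of dims; M2.5 rotates every pair.
    """
    d = len(x)
    out = x[:]
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # frequency decays with dimension index
        c, s = math.cos(theta), math.sin(theta)
        x1, x2 = x[i], x[i + 1]
        out[i] = x1 * c - x2 * s
        out[i + 1] = x1 * s + x2 * c
    return out
```

Because each pair is a pure rotation, vector norms are preserved, which is also why RoPE composes cleanly with the QK-Norm applied just before it in this architecture.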
Verdict: M2 + Better Post-Training
MiniMax M2.5 is a point release aimed at improving downstream quality rather than architectural efficiency. For teams already running M2, the upgrade is a drop-in. The quietly interesting signal is the removal of partial RoPE: MiniMax was one of the few labs experimenting with partial positional encoding at frontier scale, and rolling it back suggests the research community's "full RoPE is enough" consensus holds in this scale band.