
MiniMax M2.5 230B

Popular 230B coding model that opts for a classic attention architecture instead of newer hybrid-attention designs.

MiniMax M2.5 230B decoder block architecture: Attention: GQA with QK-Norm. Normalization: RMSNorm. FFN: Mixture of Experts (10B active parameters). Position encoding: RoPE. Scale: 230B parameters, 197K context, 62 layers. Decoder type: MoE.

10B active / 230B total · 197K context · GQA + QK-Norm · MoE

Architecture Specifications

Parameters: 10B active / 230B total
Context Window: 197K
Decoder Type: MoE
Attention: GQA + QK-Norm
Active Parameters: 10B
Layers: 62
Hidden Size: 3,072
Vocabulary Size: 200K
Release Date: 2026-02
Category: Mixture of Experts
Organization: Unknown
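
The same figures, expressed as the kind of HuggingFace-style config dictionary the deep dive below refers to. Key names follow common transformers conventions and are assumptions, not copied from MiniMax's actual config.json; only the values listed on this page are used.

    # Spec-table values rendered as an illustrative HuggingFace-style config dict.
    # Key names are assumed conventions, NOT read from MiniMax's config.json.
    minimax_m2_5_specs = {
        "num_hidden_layers": 62,
        "hidden_size": 3072,
        "vocab_size": 200_000,               # listed as "200K"; exact value not given
        "max_position_embeddings": 201_728,  # ~197K native context (see table below)
        "torch_dtype": "bfloat16",
        # MoE: ~230B total parameters, ~10B active per token (expert count not listed)
    }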

Key Features

Grouped Query Attention · QK normalization · Layer mix: 62 GQA · KV cache: 248 KiB/token

Deep Dive

Overview

MiniMax M2.5 is MiniMax's February 2026 refresh of MiniMax M2 — same 230 B total / 10 B active MoE architecture, same 62-layer GQA stack, same QK-Norm attention stability trick. The primary delta versus M2 is in post-training and the data mix, not architecture. Read the MiniMax M2 230B deep dive first — this one covers only the deltas.
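
For orientation, the block structure reads roughly like the sketch below: pre-norm RMSNorm, GQA attention with per-head QK-Norm and full RoPE, then a routed MoE FFN. Head counts, expert counts, and expert width are illustrative guesses; only the 3,072 hidden size and the overall layout come from this page.

    # Minimal PyTorch-style sketch of the decoder block described above:
    # RMSNorm -> GQA attention with QK-Norm and full RoPE -> RMSNorm -> MoE FFN.
    # Head/expert counts and expert width are ASSUMED, not published figures.
    # Requires PyTorch >= 2.4 for nn.RMSNorm.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    HIDDEN, N_HEADS, N_KV_HEADS, HEAD_DIM = 3072, 24, 8, 128   # head shape assumed
    N_EXPERTS, TOP_K = 8, 2                                     # MoE shape assumed

    def rope(x, pos):
        # Full RoPE: every head dimension is rotated (M2.5 drops M2's partial RoPE).
        half = x.shape[-1] // 2
        inv_freq = 1.0 / (10000 ** (torch.arange(half, dtype=x.dtype) / half))
        ang = pos[:, None] * inv_freq[None, :]                   # [seq, half]
        cos, sin = ang.cos()[:, None, :], ang.sin()[:, None, :]  # broadcast over heads
        x1, x2 = x[..., :half], x[..., half:]
        return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

    class DecoderBlock(nn.Module):
        def __init__(self):
            super().__init__()
            self.attn_norm, self.ffn_norm = nn.RMSNorm(HIDDEN), nn.RMSNorm(HIDDEN)
            self.q_norm, self.k_norm = nn.RMSNorm(HEAD_DIM), nn.RMSNorm(HEAD_DIM)  # QK-Norm
            self.q_proj = nn.Linear(HIDDEN, N_HEADS * HEAD_DIM, bias=False)
            self.k_proj = nn.Linear(HIDDEN, N_KV_HEADS * HEAD_DIM, bias=False)
            self.v_proj = nn.Linear(HIDDEN, N_KV_HEADS * HEAD_DIM, bias=False)
            self.o_proj = nn.Linear(N_HEADS * HEAD_DIM, HIDDEN, bias=False)
            self.router = nn.Linear(HIDDEN, N_EXPERTS, bias=False)
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(HIDDEN, 2 * HIDDEN), nn.SiLU(),
                              nn.Linear(2 * HIDDEN, HIDDEN))     # expert width assumed
                for _ in range(N_EXPERTS))

        def forward(self, x):                                    # x: [seq, HIDDEN]
            seq = x.shape[0]
            pos = torch.arange(seq, dtype=x.dtype)
            h = self.attn_norm(x)
            q = self.q_norm(self.q_proj(h).view(seq, N_HEADS, HEAD_DIM))
            k = self.k_norm(self.k_proj(h).view(seq, N_KV_HEADS, HEAD_DIM))
            v = self.v_proj(h).view(seq, N_KV_HEADS, HEAD_DIM)
            q, k = rope(q, pos), rope(k, pos)
            # GQA: each KV head serves N_HEADS // N_KV_HEADS query heads.
            k = k.repeat_interleave(N_HEADS // N_KV_HEADS, dim=1)
            v = v.repeat_interleave(N_HEADS // N_KV_HEADS, dim=1)
            attn = F.scaled_dot_product_attention(
                q.transpose(0, 1), k.transpose(0, 1), v.transpose(0, 1), is_causal=True)
            x = x + self.o_proj(attn.transpose(0, 1).reshape(seq, -1))
            # MoE FFN: route each token to its TOP_K experts, mix outputs by router weight.
            h = self.ffn_norm(x)
            weights, idx = self.router(h).softmax(dim=-1).topk(TOP_K, dim=-1)
            out = torch.zeros_like(h)
            for t in range(seq):
                for w, e in zip(weights[t], idx[t]):
                    out[t] = out[t] + w * self.experts[int(e)](h[t])
            return x + out

    y = DecoderBlock()(torch.randn(16, HIDDEN))   # 16 tokens -> [16, 3072]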

What Changed from M2

  • Partial RoPE removed: M2.5's config.json applies full RoPE to every query/key dimension, dropping M2's partial-RoPE experiment (see the sketch after this list). MiniMax apparently found the position-free, content-only dimensions weren't pulling their weight at scale.
  • Post-training refresh: updated agentic and reasoning SFT mixes.
  • Architecture otherwise unchanged: 62 layers, 10 B active, ≈ 248 KiB/token KV cache, 197 K native context.
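
The sketch below contrasts what that config-level change amounts to on a single attention head: partial RoPE rotates only the first rotary_dim dimensions and leaves the rest position-free, while full RoPE (M2.5) rotates all of them. The rotary_dim value and function shape are illustrative, not read from MiniMax's config.json.

    # Sketch of partial vs. full RoPE on one attention head.
    # `rotary_dim` and the 64/128 split are illustrative assumptions.
    import torch

    def apply_rope(x, pos, rotary_dim=None):
        # x: [seq, head_dim]; rotate the first `rotary_dim` dims, pass the rest through.
        head_dim = x.shape[-1]
        rotary_dim = head_dim if rotary_dim is None else rotary_dim   # None => full RoPE
        half = rotary_dim // 2
        inv_freq = 1.0 / (10000 ** (torch.arange(half) / half))
        ang = pos[:, None].float() * inv_freq[None, :]
        x1, x2 = x[:, :half], x[:, half:rotary_dim]
        rotated = torch.cat([x1 * ang.cos() - x2 * ang.sin(),
                             x1 * ang.sin() + x2 * ang.cos()], dim=-1)
        return torch.cat([rotated, x[:, rotary_dim:]], dim=-1)       # tail left unrotated

    q = torch.randn(4, 128)                            # 4 positions, head_dim 128 (illustrative)
    pos = torch.arange(4)
    q_partial = apply_rope(q, pos, rotary_dim=64)      # M2-style: dims 64..127 carry no position info
    q_full    = apply_rope(q, pos)                     # M2.5-style: every dim is rotated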

Architecture at a Glance

Parameter | Value | Notes
Total parameters | ≈ 230 B | MoE, same as M2
Active parameters | ≈ 10 B | per token
Layers | 62 | all GQA
Attention | GQA + QK-Norm | partial RoPE dropped
KV cache | ≈ 248 KiB/token
Max position | ≈ 201,728 | 197 K native
Precision | bfloat16
MiniMax M2.5 230B configuration (source: HuggingFace config.json)
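
As a sanity check, the ≈ 248 KiB/token figure is consistent with a plain bf16 GQA cache. The arithmetic below assumes 8 KV heads of dimension 128 per layer, which this page does not state; it is simply one shape that reproduces the number.

    # Back-of-envelope check of the ~248 KiB/token KV-cache figure.
    # 8 KV heads x 128 head_dim are ASSUMED; the page only gives layers, dtype, and the total.
    layers, kv_heads, head_dim, bytes_bf16 = 62, 8, 128, 2
    per_token = layers * 2 * kv_heads * head_dim * bytes_bf16   # 2 = one K + one V entry per layer
    print(per_token / 1024)              # 248.0 KiB/token
    print(per_token * 197_000 / 2**30)   # ~46.6 GiB for a full 197K-token context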

Verdict: M2 + Better Post-Training

MiniMax M2.5 is a point release targeted at improving downstream quality rather than architectural efficiency. For teams already running M2, the upgrade is a drop-in. The quietly interesting signal is the removal of partial RoPE: MiniMax was one of the few labs experimenting with partial positional encoding at frontier scale, and rolling it back suggests the research community's 'full RoPE is enough' consensus was correct for this scale band.
