GLM-5.1
Zhipu AI · 2026-04
GLM-5.1 is a Mixture-of-Experts (MoE) decoder architecture from Zhipu AI that combines Multi-head Latent Attention (MLA) with sparse attention. Each decoder block uses RMSNorm normalization, RoPE position encoding, and a Mixture-of-Experts feed-forward network. The model totals 744B parameters with 40B active per token, spans 78 layers, and supports a 202K-token context window.
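To make the block layout concrete, the sketch below shows a generic pre-norm MoE decoder block in PyTorch. It is a minimal illustration under stated assumptions, not GLM-5.1 code: standard multi-head attention stands in for MLA + sparse attention, and the expert count, top-k routing, and dimensions are placeholders.

```python
# Minimal sketch of a pre-norm MoE decoder block (illustrative only, not the
# actual GLM-5.1 implementation). Standard multi-head attention stands in for
# MLA + sparse attention; expert count, top-k, and dimensions are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        # Scale by the reciprocal root-mean-square, then apply a learned gain.
        return x * x.pow(2).mean(-1, keepdim=True).add(self.eps).rsqrt() * self.weight

class MoEFFN(nn.Module):
    """Top-k routed feed-forward: only k of the experts run for each token."""
    def __init__(self, dim: int, ffn_dim: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, ffn_dim), nn.SiLU(), nn.Linear(ffn_dim, dim))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):
        scores = F.softmax(self.router(x), dim=-1)       # router probabilities
        weights, idx = scores.topk(self.top_k, dim=-1)   # top-k experts per token
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

class DecoderBlock(nn.Module):
    def __init__(self, dim: int = 512, n_heads: int = 8):  # GLM-5.1's hidden size is 6,144
        super().__init__()
        self.attn_norm = RMSNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.ffn_norm = RMSNorm(dim)
        self.ffn = MoEFFN(dim, ffn_dim=4 * dim)

    def forward(self, x):
        # Pre-norm residual attention, then pre-norm residual MoE feed-forward.
        h = self.attn_norm(x)
        a, _ = self.attn(h, h, h, need_weights=False)
        x = x + a
        return x + self.ffn(self.ffn_norm(x))

x = torch.randn(1, 16, 512)
print(DecoderBlock()(x).shape)  # torch.Size([1, 16, 512])
```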
Architecture Specifications
Parameters: 40B active / 744B total
Context Window: 202K tokens
Decoder Type: MoE
Attention: MLA + Sparse Attention
Active Parameters: 40B
Layers: 78
Hidden Size: 6,144
Vocabulary Size: 155K
Release Date: 2026-04
Category: Mixture of Experts
Organization: Zhipu AI
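The table values can be restated as a small config object for programmatic use. The snippet below simply mirrors the listed specifications (fields not listed, such as head or expert counts, are omitted; the rounded "K"/"B" figures are taken at face value) and adds two back-of-the-envelope numbers derived from them.

```python
# Restatement of the published GLM-5.1 specifications; values are copied from
# the table above (rounded figures taken as-is), nothing else is assumed.
from dataclasses import dataclass

@dataclass(frozen=True)
class GLM51Spec:
    total_params: int = 744_000_000_000   # 744B total
    active_params: int = 40_000_000_000   # 40B active per token
    context_window: int = 202_000         # 202K tokens (as listed)
    num_layers: int = 78
    hidden_size: int = 6_144
    vocab_size: int = 155_000             # 155K (as listed)
    attention: str = "MLA + Sparse Attention"
    ffn: str = "Mixture of Experts"
    norm: str = "RMSNorm"
    position_encoding: str = "RoPE"

spec = GLM51Spec()

# Derived figures: the input embedding matrix alone is vocab_size * hidden_size
# ≈ 0.95B parameters, and only ~5.4% of total parameters are active per token.
print(f"Embedding params: ~{spec.vocab_size * spec.hidden_size / 1e9:.2f}B")
print(f"Active fraction:  {spec.active_params / spec.total_params:.1%}")
```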
Key Features
- GLM-5 refresh
- MLA + DeepSeek Sparse Attention (sketched below)
- 40B active parameters
- Layer mix: 78 MLA layers
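MLA (Multi-head Latent Attention) reduces the KV cache by down-projecting keys and values into a shared low-rank latent that is cached in place of full per-head K/V, with up-projection back to heads at attention time. The sketch below illustrates only that compression idea; the dimensions are arbitrary, and MLA details such as the decoupled RoPE key path, as well as the sparse-attention token selection, are omitted.

```python
# Simplified illustration of low-rank KV compression as used in MLA
# (Multi-head Latent Attention). Dimensions are illustrative; the decoupled
# RoPE branch and sparse token selection are omitted for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentKVAttention(nn.Module):
    def __init__(self, dim=512, n_heads=8, kv_latent_dim=64):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = dim // n_heads
        self.q_proj = nn.Linear(dim, dim, bias=False)
        # Down-project hidden states into a small shared latent: this latent,
        # not full per-head K/V, is what would be cached during decoding.
        self.kv_down = nn.Linear(dim, kv_latent_dim, bias=False)
        # Up-project the latent back to per-head keys and values at attention time.
        self.k_up = nn.Linear(kv_latent_dim, dim, bias=False)
        self.v_up = nn.Linear(kv_latent_dim, dim, bias=False)
        self.out_proj = nn.Linear(dim, dim, bias=False)

    def forward(self, x):
        b, t, d = x.shape
        latent = self.kv_down(x)  # (b, t, kv_latent_dim): the compressed KV cache entry
        q = self.q_proj(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_up(latent).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        v = self.v_up(latent).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out_proj(attn.transpose(1, 2).reshape(b, t, d))

x = torch.randn(1, 16, 512)
print(LatentKVAttention()(x).shape)  # torch.Size([1, 16, 512])
```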