GLM-5
Zhipu AI · 2026-02
MoE decoder architecture with an MLA + sparse attention mechanism.
GLM-5 decoder block architecture: attention uses MLA combined with sparse attention; normalization is RMSNorm; the feed-forward network is a Mixture of Experts with 40B active parameters; position encoding is RoPE. Scale: 744B total parameters, 202K context window, 128 layers. Decoder type: MoE.
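The sketch below is a minimal, illustrative PyTorch rendering of one such decoder block, assuming a pre-norm residual layout and top-k expert routing. The expert count, top-k value, and dimensions are placeholder assumptions, the attention module is a plain multi-head attention stand-in (the actual MLA + sparse attention mechanism and RoPE are not reproduced here), so this is a sketch of the block pattern, not GLM-5's implementation.

```python
# Illustrative sketch only: a pre-norm MoE decoder block in PyTorch.
# Expert count, top-k, dimensions, and the plain attention stand-in are assumptions,
# not GLM-5's actual configuration (described as MLA + sparse attention with RoPE).
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        # Root-mean-square normalization: no mean-centering, no bias term.
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight


class MoEFeedForward(nn.Module):
    """Token-wise top-k routed mixture of experts (simplified, no load balancing)."""

    def __init__(self, dim: int, hidden: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))
            for _ in range(n_experts)
        )

    def forward(self, x):
        scores = self.router(x)                          # (batch, seq, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # route each token to its top-k experts
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e                   # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out


class DecoderBlock(nn.Module):
    def __init__(self, dim: int = 512, n_heads: int = 8):
        super().__init__()
        self.attn_norm = RMSNorm(dim)
        # Stand-in attention; GLM-5 is described as using MLA + sparse attention instead.
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.ffn_norm = RMSNorm(dim)
        self.ffn = MoEFeedForward(dim, hidden=4 * dim)

    def forward(self, x):
        h = self.attn_norm(x)
        # Causal mask: True marks positions a token may not attend to.
        causal = torch.triu(torch.ones(x.size(1), x.size(1), device=x.device), diagonal=1).bool()
        attn_out, _ = self.attn(h, h, h, attn_mask=causal)
        x = x + attn_out                       # residual around attention
        x = x + self.ffn(self.ffn_norm(x))     # residual around the MoE feed-forward
        return x


if __name__ == "__main__":
    block = DecoderBlock()
    y = block(torch.randn(2, 16, 512))
    print(y.shape)  # torch.Size([2, 16, 512])
```

Only the routed top-k experts run for each token, which is how an MoE layer keeps per-token (active) compute far below the total parameter count, the same distinction the 40B active / 744B total figures express.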
Architecture Specifications
Parameters: 40B active / 744B total
Context Window: 202K
Decoder Type: MoE
Attention: MLA + Sparse Attention
Active Parameters: 40B
Release Date: 2026-02
Category: Mixture of Experts
Organization: Zhipu AI
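For reference, the specifications above map naturally onto a simple configuration record. The sketch below is an assumed, illustrative schema (field names are ours, not an official Zhipu AI or GLM-5 API); only the values come from the listing above.

```python
# Illustrative configuration record for the specifications listed above.
# Field names and structure are assumptions for illustration, not an official schema.
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    name: str
    organization: str
    release: str
    decoder_type: str
    attention: str
    total_params_b: int     # total parameters, in billions
    active_params_b: int    # parameters active per token, in billions
    context_window: str     # kept as the listed "202K" string


GLM5_SPEC = ModelSpec(
    name="GLM-5",
    organization="Zhipu AI",
    release="2026-02",
    decoder_type="MoE",
    attention="MLA + Sparse Attention",
    total_params_b=744,
    active_params_b=40,
    context_window="202K",
)
```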
Key Features
MLA adoption · Sparse attention · 40B active