GLM-5.1
Zhipu AI · 2026-04
GLM-5.1 is a Mixture-of-Experts (MoE) decoder architecture from Zhipu AI that combines Multi-head Latent Attention (MLA) with sparse attention. Each decoder block uses RMSNorm normalization, RoPE position encoding, and a Mixture-of-Experts feed-forward network. The model totals 744B parameters with 40B active per token, spans 78 layers, and supports a 202K-token context window.
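To make the block layout concrete, the sketch below shows a generic pre-norm MoE decoder block in PyTorch. It is a minimal illustration under stated assumptions, not GLM-5.1 code: standard multi-head attention stands in for MLA + sparse attention, and the expert count, top-k routing, and dimensions are placeholders.

```python
# Minimal sketch of a pre-norm MoE decoder block (illustrative only, not the
# actual GLM-5.1 implementation). Standard multi-head attention stands in for
# MLA + sparse attention; expert count, top-k, and dimensions are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        # Scale by the reciprocal root-mean-square, then apply a learned gain.
        return x * x.pow(2).mean(-1, keepdim=True).add(self.eps).rsqrt() * self.weight

class MoEFFN(nn.Module):
    """Top-k routed feed-forward: only k of the experts run for each token."""
    def __init__(self, dim: int, ffn_dim: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, ffn_dim), nn.SiLU(), nn.Linear(ffn_dim, dim))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):
        scores = F.softmax(self.router(x), dim=-1)       # router probabilities
        weights, idx = scores.topk(self.top_k, dim=-1)   # top-k experts per token
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

class DecoderBlock(nn.Module):
    def __init__(self, dim: int = 512, n_heads: int = 8):  # GLM-5.1's hidden size is 6,144
        super().__init__()
        self.attn_norm = RMSNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.ffn_norm = RMSNorm(dim)
        self.ffn = MoEFFN(dim, ffn_dim=4 * dim)

    def forward(self, x):
        # Pre-norm residual attention, then pre-norm residual MoE feed-forward.
        h = self.attn_norm(x)
        a, _ = self.attn(h, h, h, need_weights=False)
        x = x + a
        return x + self.ffn(self.ffn_norm(x))

x = torch.randn(1, 16, 512)
print(DecoderBlock()(x).shape)  # torch.Size([1, 16, 512])
```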
Architecture Specifications
Parameters: 40B active / 744B total
Context Window: 202K tokens
Decoder Type: MoE
Attention: MLA + Sparse Attention
Active Parameters: 40B
Layers: 78
Hidden Size: 6,144
Vocabulary Size: 155K
Release Date: 2026-04
Category: Mixture of Experts
Organization: Zhipu AI
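The table values can be restated as a small config object for programmatic use. The snippet below simply mirrors the listed specifications (fields not listed, such as head or expert counts, are omitted; the rounded "K"/"B" figures are taken at face value) and adds two back-of-the-envelope numbers derived from them.

```python
# Restatement of the published GLM-5.1 specifications; values are copied from
# the table above (rounded figures taken as-is), nothing else is assumed.
from dataclasses import dataclass

@dataclass(frozen=True)
class GLM51Spec:
    total_params: int = 744_000_000_000   # 744B total
    active_params: int = 40_000_000_000   # 40B active per token
    context_window: int = 202_000         # 202K tokens (as listed)
    num_layers: int = 78
    hidden_size: int = 6_144
    vocab_size: int = 155_000             # 155K (as listed)
    attention: str = "MLA + Sparse Attention"
    ffn: str = "Mixture of Experts"
    norm: str = "RMSNorm"
    position_encoding: str = "RoPE"

spec = GLM51Spec()

# Derived figures: the input embedding matrix alone is vocab_size * hidden_size
# ≈ 0.95B parameters, and only ~5.4% of total parameters are active per token.
print(f"Embedding params: ~{spec.vocab_size * spec.hidden_size / 1e9:.2f}B")
print(f"Active fraction:  {spec.active_params / spec.total_params:.1%}")
```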
Key Features
- GLM-5 refresh
- MLA + DeepSeek Sparse Attention (sketched below)
- 40B active parameters
- Layer mix: 78 MLA layers
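MLA (Multi-head Latent Attention) reduces the KV cache by down-projecting keys and values into a shared low-rank latent that is cached in place of full per-head K/V, with up-projection back to heads at attention time. The sketch below illustrates only that compression idea; the dimensions are arbitrary, and MLA details such as the decoupled RoPE key path, as well as the sparse-attention token selection, are omitted.

```python
# Simplified illustration of low-rank KV compression as used in MLA
# (Multi-head Latent Attention). Dimensions are illustrative; the decoupled
# RoPE branch and sparse token selection are omitted for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentKVAttention(nn.Module):
    def __init__(self, dim=512, n_heads=8, kv_latent_dim=64):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = dim // n_heads
        self.q_proj = nn.Linear(dim, dim, bias=False)
        # Down-project hidden states into a small shared latent: this latent,
        # not full per-head K/V, is what would be cached during decoding.
        self.kv_down = nn.Linear(dim, kv_latent_dim, bias=False)
        # Up-project the latent back to per-head keys and values at attention time.
        self.k_up = nn.Linear(kv_latent_dim, dim, bias=False)
        self.v_up = nn.Linear(kv_latent_dim, dim, bias=False)
        self.out_proj = nn.Linear(dim, dim, bias=False)

    def forward(self, x):
        b, t, d = x.shape
        latent = self.kv_down(x)  # (b, t, kv_latent_dim): the compressed KV cache entry
        q = self.q_proj(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_up(latent).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        v = self.v_up(latent).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out_proj(attn.transpose(1, 2).reshape(b, t, d))

x = torch.randn(1, 16, 512)
print(LatentKVAttention()(x).shape)  # torch.Size([1, 16, 512])
```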