Google · 2026-04

Gemma 4 26B-A4B

Sparse Gemma 4 variant that keeps the local:global attention backbone while swapping dense FFNs for MoE layers.

Gemma 4 26B-A4B decoder block architecture: Attention: GQA with QK-Norm and Sliding Window Attention. Normalization: RMSNorm. FFN: Mixture of Experts (3.8B active parameters). Position encoding: RoPE. Scale: 25.2B total, 256K context, 30 layers. Decoder type: MoE.

GQA + QK-Norm + SWA · MoE · 3.8B active / 25.2B total · 256K context

Architecture Specifications

Parameters: 3.8B active / 25.2B total
Context Window: 256K
Decoder Type: MoE
Attention: GQA + QK-Norm + SWA
Active Parameters: 3.8B
Vocabulary Size: 262K
Release Date: 2026-04
Category: Mixture of Experts
Organization: Google

Key Features

- Grouped Query Attention
- Sliding Window Attention
- QK normalization
- Expert routing
- Layer mix: 25 sliding-window + 5 global
- KV cache: 210 KiB/token
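The 210 KiB/token figure can be reproduced from first principles: every layer stores one K and one V vector per KV head, in bfloat16 (2 bytes per element). The card does not publish per-layer head counts, so the head configuration below is a hypothetical one chosen only to show the arithmetic.

```python
def kv_bytes_per_token(n_layers, n_kv_heads, head_dim, dtype_bytes=2):
    """Per-token KV cache size: each layer stores one K and one V vector
    per KV head, head_dim elements each (bfloat16 = 2 bytes)."""
    return n_layers * 2 * n_kv_heads * head_dim * dtype_bytes

# Hypothetical head configuration -- the card lists only the 210 KiB/token
# total, not n_kv_heads or head_dim.
kib = kv_bytes_per_token(n_layers=30, n_kv_heads=8, head_dim=224) / 1024
print(f"{kib:.0f} KiB/token")  # 210 KiB/token with these assumed dims
```

Any head configuration whose product n_kv_heads × head_dim equals 1,792 yields the same total; the real split may differ.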

Deep Dive

Overview

Gemma 4 26B-A4B is Google's first MoE release in the Gemma line, published April 2026. At 25.2 B total / 3.8 B active it is a sparse MoE in the same band as Qwen3-30B-A3B. The Gemma 3 → Gemma 4 transition is therefore a dense → MoE generational shift that mirrors what Mistral did with Small 4. Structural inheritances from Gemma 3 — local-global attention interleave, QK-Norm, large vocabulary — are preserved, making this a natural A/B partner for the Gemma 3 27B deep dive.

Architecture at a Glance

| Parameter | Value | Notes |
|---|---|---|
| Total parameters | ≈ 25.2 B | MoE (Gemma's first) |
| Active parameters | ≈ 3.8 B | per token |
| Layers | 30 | 25 sliding-window + 5 global (5:1) |
| Attention | GQA + QK-Norm + SWA | inherited from Gemma 3 |
| KV cache | ≈ 210 KiB/token | |
| Max position | 262,144 | 256 K native, double Gemma 3's 128 K |
| Precision | bfloat16 | |
Gemma 4 26B-A4B configuration (source: HuggingFace config.json)
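The 25:5 split implies a repeating pattern of five sliding-window layers followed by one global layer. The exact placement of the global layers is an assumption modeled on Gemma 3's interleave; the config's per-layer ordering may differ.

```python
def layer_types(n_layers=30, ratio=5):
    """Build the local:global interleave: every (ratio+1)-th layer is
    global attention, the rest use sliding-window attention. Placement
    of global layers is assumed to follow Gemma 3's pattern."""
    return ["global" if (i + 1) % (ratio + 1) == 0 else "sliding"
            for i in range(n_layers)]

types = layer_types()
print(types.count("sliding"), types.count("global"))  # 25 5
```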

Google's First Gemma MoE

Gemma 1, Gemma 2, and Gemma 3 were all dense. Gemma 4 26B-A4B is the first MoE release in the Gemma family, alongside a dense Gemma 4 31B variant. Per-token compute at 3.8 B active is lower than Gemma 3 27B's 27 B dense compute by nearly 7×, so Gemma 4 26B-A4B serves dramatically faster than Gemma 3 27B while carrying slightly less total capacity. This is a deliberate tradeoff: Google is betting that for Gemma's typical deployment context (single-GPU inference on consumer hardware), serving speed matters more than peak quality.
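The dense-to-MoE swap described above replaces each FFN with a routed bank of experts, of which only a few run per token. Below is a minimal top-k routing sketch; the expert count and k are illustrative, since the card does not publish Gemma 4's routing configuration.

```python
import numpy as np

def moe_ffn(x, gate_w, experts, k=2):
    """Minimal top-k MoE layer for a single token vector x.
    The router scores all experts, keeps the top-k, and mixes their
    outputs with softmax-renormalized weights. n_experts and k here
    are assumptions, not Gemma 4's published values."""
    logits = x @ gate_w                      # (n_experts,) router scores
    top = np.argsort(logits)[-k:]            # indices of the k best experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                             # renormalize over selected experts
    return sum(wi * experts[i](x) for wi, i in zip(w, top))

rng = np.random.default_rng(0)
d, n_experts = 16, 8
gate_w = rng.normal(size=(d, n_experts))
# Each "expert" is just a random linear map for illustration.
experts = [lambda x, W=rng.normal(size=(d, d)) / d: x @ W for _ in range(n_experts)]
out = moe_ffn(rng.normal(size=d), gate_w, experts)
print(out.shape)  # (16,)
```

Only k of the expert matrices are multiplied per token, which is exactly why 3.8 B active parameters can sit inside a 25.2 B total budget.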

Local-Global 5:1 Preserved

The 5:1 sliding-window to global attention ratio — Gemma 3's signature choice — carries over unchanged to Gemma 4. Only 5 of 30 layers attend to the full 256 K context, which keeps per-token KV cost manageable even as the context window doubled from 128 K to 256 K. QK-Norm on queries and keys is also preserved, as is Gemma's distinctive logit soft-capping.
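The KV savings from the 5:1 interleave can be sketched by counting cached token positions per layer type at full context. The sliding-window width is an assumption here (Gemma 3 used a 1,024-token window); the card does not state Gemma 4's value.

```python
def kv_tokens_cached(context, n_sliding, n_global, window=1024):
    """Total cached token positions across layers at a given context
    length: sliding-window layers cap their cache at `window` tokens,
    global layers cache every position. window=1024 is an assumption
    carried over from Gemma 3, not a published Gemma 4 value."""
    return n_sliding * min(context, window) + n_global * context

ctx = 256 * 1024
mixed = kv_tokens_cached(ctx, n_sliding=25, n_global=5)
all_global = kv_tokens_cached(ctx, n_sliding=0, n_global=30)
print(f"{all_global / mixed:.1f}x fewer cached positions")  # 5.9x
```

At 256 K context the five global layers dominate the cache; the 25 sliding-window layers contribute almost nothing, which is why doubling the window from 128 K barely moves per-token KV cost.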

Verdict: Gemma Goes MoE

Gemma 4 26B-A4B is architecturally conservative within the Gemma family — it inherits every Gemma 3 design choice and adds MoE on top — but strategically significant as Google's signal that MoE is now the default for Gemma deployments. For teams running Gemma 3 27B in production, Gemma 4 26B-A4B is a drop-in upgrade that roughly doubles serving throughput and doubles context window while preserving the same local-global attention semantics.
