Dense · Verified
Google · 2026-04

Gemma 4 31B

Dense Gemma 4 scales the family to a 256K-context multimodal checkpoint without changing the core local-global recipe much.

Gemma 4 31B decoder block architecture: Attention: GQA with QK-Norm and Sliding Window Attention. Normalization: RMSNorm. FFN: SwiGLU. Position encoding: RoPE. Scale: 30.7B parameters, 256K context, 60 layers. Decoder type: Dense.


Architecture Specifications

Parameters: 30.7B
Context Window: 256K
Decoder Type: Dense
Attention: GQA + QK-Norm + SWA
Vocabulary Size: 262K
Release Date: 2026-04
Category: Long Context
Organization: Google

Key Features

Grouped Query Attention
Sliding Window Attention
QK normalization
Layer mix: 50 sliding-window + 10 global
KV cache: 840 KiB/token

Deep Dive

Overview

Gemma 4 31B is the dense variant of the Gemma 4 family, released April 2026 alongside the Gemma 4 26B-A4B MoE. At 30.7 B dense parameters it is a direct successor to Gemma 3 27B, with the same local-global attention interleave, QK-Norm, and soft-cap heritage but a doubled context window (128 K → 256 K). For teams that cannot or will not adopt MoE serving, Gemma 4 31B is the dense continuation of the Gemma 3 lineage.

Architecture at a Glance

Parameter | Value | Notes
Total parameters | ≈ 30.7 B | dense
Layers | 60 | 50 sliding-window + 10 global (5:1)
Attention | GQA + QK-Norm + SWA | inherited from Gemma 3
KV cache | ≈ 840 KiB/token | large; dense attention on 60 layers
Max position | 262,144 | 256K native
Precision | bfloat16 |

Gemma 4 31B configuration (source: HuggingFace config.json)
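The table above can be sketched as a Gemma 3-style Hugging Face config fragment. Field names follow Gemma 3's published config.json (`num_hidden_layers`, `sliding_window_pattern`, `rope_local_base_freq`); whether Gemma 4 reuses these exact fields is an assumption.

```python
# Illustrative config fragment mirroring the spec table above.
# Values come from the table; field names are borrowed from Gemma 3's
# config.json and are assumed, not confirmed, for Gemma 4.
config = {
    "num_hidden_layers": 60,              # 50 sliding-window + 10 global
    "max_position_embeddings": 262_144,   # 256K native context
    "vocab_size": 262_144,                # 262K vocabulary
    "torch_dtype": "bfloat16",
    "sliding_window_pattern": 6,          # every 6th layer is global (5:1)
    "rope_theta": 1_000_000.0,            # global layers
    "rope_local_base_freq": 10_000.0,     # sliding-window layers
}

print(config["num_hidden_layers"], config["max_position_embeddings"])
```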

840 KiB/Token KV Cache

Gemma 4 31B's KV cache at ≈ 840 KiB per token is the largest per-token footprint in this gallery, a direct consequence of keeping dense attention KV caches across all 60 layers. At the full 256 K native context this works out to ≈ 210 GiB of KV cache per sequence (ignoring window capping on the local layers), which makes long-context serving expensive. The Gemma 4 31B dense variant is therefore best matched to shorter-context workloads, where the per-token serving cost is amortized across reasonable sequence lengths; for 128 K+ context workloads, the Gemma 4 26B-A4B MoE variant is a much better economic fit.
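The sizing above is simple arithmetic, sketched below. The 60-layer count and the ≈ 840 KiB/token figure come from the spec table; the implied per-layer KV width is inferred (bfloat16, K and V tensors per layer), not taken from a published config, and the full-context total ignores window capping on the sliding-window layers.

```python
# Back-of-envelope KV-cache sizing for Gemma 4 31B (naive, no SWA capping).
N_LAYERS = 60
BYTES_PER_TOKEN = 840 * 1024   # ~840 KiB/token, from the spec table
CONTEXT = 256 * 1024           # 256K native context

# Total KV cache for one full-context sequence.
total_gib = BYTES_PER_TOKEN * CONTEXT / 1024**3
print(f"KV cache at full context: {total_gib:.0f} GiB")   # ~210 GiB

# Working backwards: bfloat16 = 2 bytes, two tensors (K and V) per layer,
# so per-layer KV width = n_kv_heads * head_dim.
kv_width = BYTES_PER_TOKEN // (N_LAYERS * 2 * 2)
print(f"implied n_kv_heads * head_dim per layer = {kv_width}")  # 3584
```

The 3584-wide KV projection is consistent with a GQA layout (e.g. a modest number of KV heads times the head dimension), which is why the per-token cost, while large, is still well below what full multi-head attention would require.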

5:1 Local-Global Preserved

Like Gemma 3 27B and Gemma 4 26B-A4B, Gemma 4 31B uses a 5:1 sliding-window-to-global ratio: 50 local layers and 10 global layers. This is the signature Gemma attention structure, introduced in Gemma 2 and scaled up with each release. Gemma 3 27B's dual RoPE frequencies (local θ = 10K, global θ = 1M) and logit soft-capping (attention ≈ 50.0, output ≈ 30.0) both carry over to Gemma 4.
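The interleave and the soft-cap can be sketched as follows, assuming the Gemma 3-style pattern of five sliding-window layers followed by one global layer; the exact layer ordering in the real checkpoint is an assumption, and `layer_kind`/`rope_theta`/`soft_cap` are illustrative helper names, not library APIs.

```python
# Hedged sketch of the 5:1 local/global layer mix, per-layer RoPE base
# frequencies, and logit soft-capping described above.
import math

N_LAYERS = 60
LOCAL_THETA, GLOBAL_THETA = 10_000.0, 1_000_000.0

def layer_kind(i: int) -> str:
    # Assumed ordering: every 6th layer (indices 5, 11, ..., 59) is global.
    return "global" if (i + 1) % 6 == 0 else "sliding"

def rope_theta(i: int) -> float:
    # Dual RoPE frequencies: 1M base on global layers, 10K on local ones.
    return GLOBAL_THETA if layer_kind(i) == "global" else LOCAL_THETA

def soft_cap(logit: float, cap: float = 50.0) -> float:
    # Attention logit soft-capping carried over from Gemma 2:
    # smoothly bounds logits to (-cap, cap) instead of hard clipping.
    return cap * math.tanh(logit / cap)

pattern = [layer_kind(i) for i in range(N_LAYERS)]
print(pattern[:6])   # five 'sliding' layers, then one 'global'
print(pattern.count("sliding"), pattern.count("global"))  # 50 10
print(soft_cap(120.0))   # compressed toward, but below, the 50.0 cap
```

The tanh form is what distinguishes soft-capping from plain clipping: gradients stay nonzero even for very large logits, which is the usual motivation cited for it.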

Verdict: The Dense Gemma Continuation

Gemma 4 31B is for teams that cannot adopt MoE serving: fine-tuning researchers working with standard dense-optimized frameworks, hardware environments without MoE kernel support, or anyone whose production pipeline is already tuned for Gemma 3 dense serving and does not want to migrate the expert-routing infrastructure. Architecturally it is a point upgrade to Gemma 3 27B with a doubled context window. For new deployments where MoE is on the table, Gemma 4 26B-A4B is the better economic choice.
