MoE · Verified
Unknown · 2026-03

Sarvam 105B

Larger Sarvam variant keeps the sparse MoE layout but switches from GQA to MLA.

Sarvam 105B decoder block architecture. Attention: MLA with KV LayerNorm and a hybrid NoPE + RoPE position scheme. Normalization: RMSNorm. FFN: Mixture of Experts (10.3B active parameters). Scale: 105B total parameters, 131K context, 32 layers. Decoder type: MoE.


Architecture Specifications

Parameters: 10.3B active / 105B total
Context Window: 131K
Decoder Type: MoE
Attention: MLA + KV LayerNorm + NoPE + RoPE
Active Parameters: 10.3B
Layers: 32
Hidden Size: 4,096
Vocabulary Size: 262K
Release Date: 2026-03
Category: Mixture of Experts
Organization: Unknown

Key Features

Multi-head Latent Attention · Expert routing · Layer mix: 32 MLA · KV cache: 36 KiB/token

Deep Dive

Overview

Sarvam 105B is the larger of Sarvam AI's March 2026 pair of open-weight MoEs, targeted at Indic-language workloads. At 105B total / 10.3B active, it is a mid-tier MoE with one of the more unusual attention stacks in this gallery: MLA with KV LayerNorm and a mixed NoPE + RoPE position scheme. The config.json shows 32 MLA layers, a 262K vocabulary (one of the largest in this gallery, explicitly sized for Indic scripts), and a 131K context window.

Architecture at a Glance

| Parameter | Value | Notes |
|---|---|---|
| Total parameters | ≈ 105 B | MoE |
| Active parameters | ≈ 10.3 B | per token |
| Layers | 32 | all MLA |
| Attention | MLA + KV LayerNorm + NoPE + RoPE | hybrid position scheme |
| KV cache | ≈ 36 KiB/token | MLA compression |
| Max position | 131,072 | 128 K native |
| Vocabulary | ≈ 262,000 | large, sized for Indic scripts |
| Precision | bfloat16 | |

Sarvam 105B configuration (source: HuggingFace config.json)
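The ≈ 36 KiB/token figure can be sanity-checked with simple arithmetic. MLA caches one compressed KV latent plus one shared RoPE key per layer rather than full per-head keys and values. The exact latent dimensions are not listed on this page; the sketch below assumes DeepSeek-style values (`kv_latent_dim = 512`, `rope_head_dim = 64`), which happen to reproduce the quoted number:

```python
# Back-of-the-envelope check of the ~36 KiB/token KV-cache figure.
# kv_latent_dim=512 and rope_head_dim=64 are ASSUMED DeepSeek-style
# dimensions, not values confirmed by Sarvam's config.

def mla_kv_cache_bytes_per_token(layers: int, kv_latent_dim: int,
                                 rope_head_dim: int,
                                 bytes_per_value: int = 2) -> int:
    """Per-token KV cache for MLA: one compressed latent plus one shared
    RoPE key per layer, stored in bf16 (2 bytes per value)."""
    per_layer = (kv_latent_dim + rope_head_dim) * bytes_per_value
    return layers * per_layer

size = mla_kv_cache_bytes_per_token(layers=32, kv_latent_dim=512,
                                    rope_head_dim=64)
print(size / 1024)  # 36.0 KiB/token, matching the table above
```

For comparison, a full-KV GQA cache at the same scale would store keys and values for every KV head per layer, typically an order of magnitude more.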

NoPE + RoPE Hybrid

Sarvam 105B is one of the few models in this gallery to ship a hybrid NoPE + RoPE positional scheme: some layers use no positional information at all, others use standard RoPE. 'NoPE' (no positional encoding) layers let the model rely purely on content-based attention patterns, which research in 2024 showed can be surprisingly effective for in-context learning tasks. Mixing NoPE with RoPE layers is an attempt to get the best of both: content-addressable memory on some layers, relative-position awareness on others.
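The mechanics are simple: a NoPE layer passes queries and keys to attention unchanged, while a RoPE layer rotates them by position-dependent angles first. The sketch below illustrates this with a minimal rotary implementation; which layers skip RoPE in Sarvam 105B is not documented here, so the alternating schedule is a hypothetical example:

```python
import numpy as np

# Sketch of a hybrid NoPE + RoPE stack. The alternating layer schedule
# below is HYPOTHETICAL; Sarvam's actual NoPE/RoPE layer assignment is
# not documented on this page.

def rope(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply rotary position embedding to x of shape (seq, dim)."""
    seq, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)          # (half,)
    angles = np.arange(seq)[:, None] * freqs[None, :]  # (seq, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

def queries_for_layer(q: np.ndarray, layer: int) -> np.ndarray:
    # NoPE layers attend purely on content: q/k pass through unchanged.
    # RoPE layers inject relative position by rotating q/k pairs.
    use_rope = layer % 2 == 1  # hypothetical alternating schedule
    return rope(q) if use_rope else q
```

In a real stack the same choice is applied to keys, so attention scores on NoPE layers depend only on content similarity while RoPE layers see relative offsets.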

The config also adds KV LayerNorm — a LayerNorm applied to keys and values before the attention operation. This is an attention-stability trick in the same family as QK-Norm, pioneered in a handful of open-weight models but not yet standard.
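A minimal sketch of the idea, normalizing keys and values along the head dimension before computing attention. The placement and whether Sarvam uses learnable scales are assumptions here, not details confirmed by the config:

```python
import numpy as np

# Minimal KV-LayerNorm sketch: normalize K and V along the feature axis
# before attention, analogous to QK-Norm. Placement and learnable-scale
# details are ASSUMED, not taken from Sarvam's config.

def layernorm(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def attention_with_kv_norm(q: np.ndarray, k: np.ndarray,
                           v: np.ndarray) -> np.ndarray:
    k, v = layernorm(k), layernorm(v)  # the KV-LayerNorm step
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    return weights @ v
```

The motivation is the same as for QK-Norm: bounding the magnitude of attention inputs keeps logits from blowing up during long training runs.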

262K Vocabulary for Indic Coverage

The 262 K vocabulary is twice the size of most peer models (Llama 3 ships 128 K, Qwen3 ships 152 K). Sarvam's pitch: Indic scripts (Devanagari, Tamil, Bengali, etc.) need dense tokenizer coverage to avoid the byte-per-token penalty that hits English-optimized BPE tokenizers on non-Latin scripts. At 262 K tokens, a Sarvam tokenization of Hindi or Tamil text is roughly 2–3× more compressive than the same text under Llama 3's tokenizer, which directly translates to better effective context and lower inference cost on Indic workloads.
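The effective-context claim follows from the compression ratio directly: a fixed 131K-token window holds more raw text when each token covers more characters. Illustrative arithmetic only; the characters-per-token figures below are hypothetical stand-ins, not measured tokenizer output:

```python
# Illustrative arithmetic for the effective-context claim. The 2-3x
# compression figure is the source's; the chars-per-token values here
# are HYPOTHETICAL stand-ins, not measured tokenizer output.

def effective_context_chars(context_tokens: int,
                            chars_per_token: float) -> float:
    """How much raw text fits in a fixed token window."""
    return context_tokens * chars_per_token

# Hypothetical: Hindi at ~1.0 chars/token under a byte-heavy
# English-optimized BPE vs ~2.5 chars/token under a dense 262K vocab.
baseline = effective_context_chars(131_072, 1.0)
sarvam = effective_context_chars(131_072, 2.5)
print(sarvam / baseline)  # 2.5x more text per context window
```

The same ratio applies to inference cost: the denser tokenization emits proportionally fewer tokens for the same Indic text.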

Verdict: The Indic-First Frontier

Sarvam 105B is the default pick for Indic-language production workloads at the 100 B-class tier. Its architectural novelty is in the position-encoding hybrid and the large vocabulary, both directly serving the Indic-coverage mission. For English-only use cases, Qwen3-235B-A22B or DeepSeek-V3 benchmark higher. For Indic-first teams that cannot use closed Indian models (e.g., Krutrim, Bhashini), Sarvam 105B is the strongest open-weight option.
