Tiny Aya 3.35B
Compact multilingual model from Cohere with a rare parallel transformer block.
Tiny Aya 3.35B decoder block architecture: Attention: GQA with sliding-window attention (SWA) at a 3:1 sliding-to-global ratio. Normalization: RMSNorm. FFN: SwiGLU. Position encoding: RoPE. Scale: 3.35B, 8K context, 36 layers. Decoder type: Dense.
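The "parallel transformer block" mentioned above computes the attention and FFN branches from the same normalized input and adds both to the residual in one step, instead of chaining them sequentially. A minimal NumPy sketch of one such block, where the toy single-head attention, the weight shapes, and all dimensions are illustrative assumptions rather than Cohere's shipped configuration:

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # RMSNorm: scale by root-mean-square, no mean subtraction
    return x / np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)

def swiglu_ffn(x, w_gate, w_up, w_down):
    # SwiGLU: SiLU(x @ W_gate) * (x @ W_up), projected back down
    gate = x @ w_gate
    silu = gate / (1.0 + np.exp(-gate))
    return (silu * (x @ w_up)) @ w_down

def toy_attention(x, w_o):
    # Stand-in for the real GQA + RoPE attention: single head, causal mask
    d = x.shape[-1]
    scores = (x @ x.T) / np.sqrt(d)
    mask = np.triu(np.full_like(scores, -np.inf), k=1)  # causal mask
    probs = np.exp(scores + mask)
    probs /= probs.sum(-1, keepdims=True)
    return (probs @ x) @ w_o

def parallel_block(x, params):
    # Parallel form: both branches read the SAME normalized input,
    # and both outputs join the residual together:
    #   y = x + Attn(norm(x)) + FFN(norm(x))
    h = rms_norm(x)
    return x + toy_attention(h, params["w_o"]) + swiglu_ffn(
        h, params["w_gate"], params["w_up"], params["w_down"])

rng = np.random.default_rng(0)
d, d_ff, seq = 8, 16, 4  # toy sizes, far smaller than the real model
params = {
    "w_o": rng.normal(size=(d, d)) * 0.1,
    "w_gate": rng.normal(size=(d, d_ff)) * 0.1,
    "w_up": rng.normal(size=(d, d_ff)) * 0.1,
    "w_down": rng.normal(size=(d_ff, d)) * 0.1,
}
x = rng.normal(size=(seq, d))
y = parallel_block(x, params)
print(y.shape)  # (4, 8)
```

The appeal of the parallel form is that the attention and FFN matmuls can be fused or overlapped since neither depends on the other's output; the cost is a small quality gap versus sequential blocks at some scales.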
Overview
Tiny Aya 3.35B is Cohere's February 2026 multilingual-first small model, part of the Aya research line, which has historically focused on coverage of long-tail languages. At 3.35B dense parameters it competes with Llama 3.2 3B, SmolLM3 3B, and Qwen3-4B in the edge-tier band. The distinctive design choices, per the shipped config.json, are a 3:1 sliding-window-to-global attention ratio and a very tight 8K context window, one of the smallest in any 2026 release.
Architecture at a Glance
| Parameter | Value | Notes |
|---|---|---|
| Total parameters | ≈ 3.35B | dense |
| Layers | 36 | 27 sliding-window + 9 global (3:1) |
| Attention | GQA + 3:1 SWA + RoPE | |
| KV cache | ≈ 72 KiB/token | |
| Max position | 8,192 | 8K native, intentionally small |
| Precision | bfloat16 | |
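The KV-cache figure is consistent with one plausible GQA geometry. A hedged back-of-envelope check, assuming 4 KV heads with head dimension 128 (illustrative values, not published for this model) and the table's 36 layers in bf16; the exact sliding/global interleave order is likewise an assumption, since the table only gives the counts:

```python
# Hedged back-of-envelope: 4 KV heads x head_dim 128 is an assumed
# GQA geometry, not a confirmed config value.
LAYERS = 36
KV_HEADS = 4     # assumed key/value head count
HEAD_DIM = 128   # assumed per-head dimension
BYTES = 2        # bfloat16

# K and V each store KV_HEADS * HEAD_DIM values per layer per token
kv_bytes_per_token = 2 * BYTES * LAYERS * KV_HEADS * HEAD_DIM
print(kv_bytes_per_token // 1024, "KiB/token")  # 72 KiB/token

# 3:1 interleave assumed as three SWA layers then one global, repeated
pattern = (["sliding"] * 3 + ["global"]) * (LAYERS // 4)
print(pattern.count("sliding"), pattern.count("global"))  # 27 9
```

Note that 72 KiB/token is the full-cache upper bound: at long prompts the 27 sliding-window layers retain only keys inside their window (whose size is not stated above), so steady-state cache is smaller.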
Why 8K Context?
8K is an unusually short context window by 2026 standards; nearly every comparable small model ships at 128K or longer. The Aya team's focus is multilingual coverage, not long-context retrieval: budget that would have gone into long-context pretraining instead goes into broader language-mix coverage (the Aya line covers 100+ languages) and higher-quality per-language data. For chat, translation, and summarization tasks on languages that competing 3B models handle poorly, this is a better budget allocation than an unused 128K window.
Verdict: The Multilingual Small Model
Tiny Aya 3.35B is the default 3B-class pick for multilingual workloads, especially any language outside the top 10 by training-data volume. Architecturally it is conservative, with little beyond the parallel block that is novel, but Cohere's Aya line has consistently out-benchmarked general 3B models on long-tail languages, and the tight 8K context is a feature for the specific workload Aya targets, not a limitation.