Tiny Aya 3.35B
Compact multilingual model from Cohere with a rare parallel transformer block.
Tiny Aya 3.35B decoder block architecture: Attention: GQA with sliding-window attention (SWA) at a 3:1 sliding-to-global ratio. Normalization: RMSNorm. FFN: SwiGLU. Position encoding: RoPE. Scale: 3.35B, 8K context, 36 layers. Decoder type: Dense.
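The "parallel transformer block" mentioned above computes the attention and FFN branches from the same normalized input and adds both to the residual in one step, instead of chaining them sequentially. A minimal NumPy sketch of one such block, where the toy single-head attention, the weight shapes, and all dimensions are illustrative assumptions rather than Cohere's shipped configuration:

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # RMSNorm: scale by root-mean-square, no mean subtraction
    return x / np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)

def swiglu_ffn(x, w_gate, w_up, w_down):
    # SwiGLU: SiLU(x @ W_gate) * (x @ W_up), projected back down
    gate = x @ w_gate
    silu = gate / (1.0 + np.exp(-gate))
    return (silu * (x @ w_up)) @ w_down

def toy_attention(x, w_o):
    # Stand-in for the real GQA + RoPE attention: single head, causal mask
    d = x.shape[-1]
    scores = (x @ x.T) / np.sqrt(d)
    mask = np.triu(np.full_like(scores, -np.inf), k=1)  # causal mask
    probs = np.exp(scores + mask)
    probs /= probs.sum(-1, keepdims=True)
    return (probs @ x) @ w_o

def parallel_block(x, params):
    # Parallel form: both branches read the SAME normalized input,
    # and both outputs join the residual together:
    #   y = x + Attn(norm(x)) + FFN(norm(x))
    h = rms_norm(x)
    return x + toy_attention(h, params["w_o"]) + swiglu_ffn(
        h, params["w_gate"], params["w_up"], params["w_down"])

rng = np.random.default_rng(0)
d, d_ff, seq = 8, 16, 4  # toy sizes, far smaller than the real model
params = {
    "w_o": rng.normal(size=(d, d)) * 0.1,
    "w_gate": rng.normal(size=(d, d_ff)) * 0.1,
    "w_up": rng.normal(size=(d, d_ff)) * 0.1,
    "w_down": rng.normal(size=(d_ff, d)) * 0.1,
}
x = rng.normal(size=(seq, d))
y = parallel_block(x, params)
print(y.shape)  # (4, 8)
```

The appeal of the parallel form is that the attention and FFN matmuls can be fused or overlapped since neither depends on the other's output; the cost is a small quality gap versus sequential blocks at some scales.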
Overview
Tiny Aya 3.35B is Cohere's February 2026 multilingual-first small model, part of the Aya research line, which has historically focused on coverage of long-tail languages. At 3.35B dense parameters it competes with Llama 3.2 3B, SmolLM3 3B, and Qwen3-4B in the edge-tier band. The distinctive design choices, per the shipped config.json, are a 3:1 sliding-window-to-global attention ratio and a very tight 8K context window, one of the smallest in any 2026 release.
Architecture at a Glance
| Parameter | Value | Notes |
|---|---|---|
| Total parameters | ≈ 3.35B | dense |
| Layers | 36 | 27 sliding-window + 9 global (3:1) |
| Attention | GQA + 3:1 SWA + RoPE | |
| KV cache | ≈ 72 KiB/token | |
| Max position | 8,192 | 8K native, intentionally small |
| Precision | bfloat16 | |
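The KV-cache figure is consistent with one plausible GQA geometry. A hedged back-of-envelope check, assuming 4 KV heads with head dimension 128 (illustrative values, not published for this model) and the table's 36 layers in bf16; the exact sliding/global interleave order is likewise an assumption, since the table only gives the counts:

```python
# Hedged back-of-envelope: 4 KV heads x head_dim 128 is an assumed
# GQA geometry, not a confirmed config value.
LAYERS = 36
KV_HEADS = 4     # assumed key/value head count
HEAD_DIM = 128   # assumed per-head dimension
BYTES = 2        # bfloat16

# K and V each store KV_HEADS * HEAD_DIM values per layer per token
kv_bytes_per_token = 2 * BYTES * LAYERS * KV_HEADS * HEAD_DIM
print(kv_bytes_per_token // 1024, "KiB/token")  # 72 KiB/token

# 3:1 interleave assumed as three SWA layers then one global, repeated
pattern = (["sliding"] * 3 + ["global"]) * (LAYERS // 4)
print(pattern.count("sliding"), pattern.count("global"))  # 27 9
```

Note that 72 KiB/token is the full-cache upper bound: at long prompts the 27 sliding-window layers retain only keys inside their window (whose size is not stated above), so steady-state cache is smaller.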
Why 8K Context?
8K is an unusually short context window by 2026 standards; nearly every comparable small model ships at 128K or longer. The Aya team's focus is multilingual coverage, not long-context retrieval: budget that would have gone into long-context pretraining instead goes into broader language-mix coverage (the Aya line covers 100+ languages) and higher-quality per-language data. For chat, translation, and summarization tasks on languages that competing 3B models handle poorly, this is a better budget allocation than an unused 128K window.
Verdict: The Multilingual Small Model
Tiny Aya 3.35B is the default 3B-class pick for multilingual workloads, especially any language outside the top 10 by training-data volume. Architecturally it is conservative, with little beyond the parallel block that is novel, but Cohere's Aya line has consistently out-benchmarked general 3B models on long-tail languages, and the tight 8K context is a feature for the specific workload Aya targets, not a limitation.