A comprehensive catalog of large language model architectures — decoder types, attention mechanisms, parameter counts, and context windows — curated for enterprise AI research and evaluation.
Colaberry AI catalogs 79+ large language model architectures from 23 organizations including Meta, Google, OpenAI, DeepSeek, Alibaba, and Mistral. The gallery covers Dense Transformers, Mixture-of-Experts (MoE), Hybrid SSM-Transformer, and Recurrent models with architecture specifications, attention mechanisms, and context window sizes.
Find architectures by decoder type, organization, parameters, and features.
Showing 24 of 79 architectures
Llama 3.2 3B decoder block architecture: Attention: GQA. Normalization: RMSNorm. FFN: SwiGLU. Position encoding: RoPE. Scale: 3B, 128K context, 24 layers. Decoder type: Dense.
Meta · Unknown
Scale
3B
Context
128K
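The Llama 3.2 3B card above names the four standard dense-decoder components: GQA attention, RMSNorm, a SwiGLU feed-forward network, and RoPE position encoding. The sketch below shows how they compose into one pre-norm decoder block. It is an illustrative sketch, not Meta's implementation; the dimensions, head counts, and FFN width are assumed values, not the model's actual configuration.

```python
# Minimal sketch of a dense pre-norm decoder block (GQA + RMSNorm + SwiGLU + RoPE).
# Hyperparameters are illustrative placeholders, not Llama 3.2 3B's real config.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps
    def forward(self, x):
        # Scale by the root-mean-square of the features (no mean centering).
        return x * x.pow(2).mean(-1, keepdim=True).add(self.eps).rsqrt() * self.weight

def rope(x, base=10000.0):
    # Rotary position embedding over the head dimension (rotate-half convention).
    b, h, t, d = x.shape
    half = d // 2
    freqs = base ** (-torch.arange(0, half, device=x.device, dtype=torch.float32) / half)
    angles = torch.arange(t, device=x.device)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

class GQABlock(nn.Module):
    def __init__(self, dim=3072, n_heads=24, n_kv_heads=8, ffn_dim=8192):
        super().__init__()
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = dim // n_heads
        self.attn_norm, self.ffn_norm = RMSNorm(dim), RMSNorm(dim)
        self.wq = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.wk = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wv = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wo = nn.Linear(n_heads * self.head_dim, dim, bias=False)
        # SwiGLU FFN: gate and up projections, SiLU-gated, then down projection.
        self.w_gate = nn.Linear(dim, ffn_dim, bias=False)
        self.w_up = nn.Linear(dim, ffn_dim, bias=False)
        self.w_down = nn.Linear(ffn_dim, dim, bias=False)

    def forward(self, x):
        b, t, _ = x.shape
        h = self.attn_norm(x)
        q = self.wq(h).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.wk(h).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.wv(h).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        q, k = rope(q), rope(k)
        # GQA: each group of query heads shares one KV head.
        k = k.repeat_interleave(self.n_heads // self.n_kv_heads, dim=1)
        v = v.repeat_interleave(self.n_heads // self.n_kv_heads, dim=1)
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        x = x + self.wo(attn.transpose(1, 2).reshape(b, t, -1))
        # SwiGLU feed-forward with the second residual connection.
        h = self.ffn_norm(x)
        return x + self.w_down(F.silu(self.w_gate(h)) * self.w_up(h))
```

A full model of the kind listed above would stack this block (24 layers in the card), with token embeddings before the first block and a final RMSNorm plus output projection after the last.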
Gemma 4 26B-A4B decoder block architecture: Attention: GQA with QK-Norm and Sliding Window Attention (SWA). Normalization: RMSNorm. FFN: Mixture of Experts (3.8B active parameters). Position encoding: RoPE. Scale: 25.2B, 256K context, 40 layers. Decoder type: MoE.
Google · 2026-04
Scale
3.8B / 25.2B
Context
256K
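Cards like the Gemma 4 26B-A4B above list two parameter counts (active / total) because the FFN is a Mixture of Experts: each token is routed to only a few expert FFNs, so only a fraction of the total weights participate per token. The routing sketch below is a generic top-k illustration under assumed expert counts and dimensions, not any vendor's actual router.

```python
# Minimal sketch of a top-k routed Mixture-of-Experts FFN.
# Expert count, dimensions, and top_k are assumed for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFFN(nn.Module):
    def __init__(self, dim=2048, ffn_dim=8192, n_experts=32, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, ffn_dim, bias=False),
                          nn.SiLU(),
                          nn.Linear(ffn_dim, dim, bias=False))
            for _ in range(n_experts)
        ])

    def forward(self, x):                      # x: (tokens, dim)
        logits = self.router(x)                # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                sel = idx[:, slot] == e        # tokens sent to expert e in this slot
                if sel.any():
                    out[sel] += weights[sel, slot, None] * expert(x[sel])
        return out
```

With 32 experts and top_k = 2, only a small fraction of the expert weights run per token, which is the intuition behind scale entries such as 3.8B / 25.2B (attention and embedding weights are always active, so the exact ratio differs per model).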
Gemma 4 31B decoder block architecture: Attention: GQA with QK-Norm and Sliding Window Attention (SWA). Normalization: RMSNorm. FFN: SwiGLU. Position encoding: RoPE. Scale: 30.7B, 256K context, 64 layers. Decoder type: Dense.
Google · 2026-04
Scale
30.7B
Context
256K
Gemma 4 (E2B) decoder block architecture: Attention: MQA with QK-Norm and Sliding Window Attention (SWA). Normalization: RMSNorm. FFN: SwiGLU. Position encoding: RoPE. Scale: 5.1B, 128K context, 24 layers. Decoder type: Dense.
Google · 2026-04
Scale
5.1B
Context
128K
Gemma 4 (E4B) decoder block architecture: Attention: GQA with QK-Norm and Sliding Window Attention (SWA). Normalization: RMSNorm. FFN: SwiGLU. Position encoding: RoPE. Scale: 8B, 128K context, 32 layers. Decoder type: Dense.
Google · 2026-04
Scale
8B
Context
128K
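The four Gemma 4 cards above add QK-Norm and Sliding Window Attention (SWA) on top of GQA or MQA. A minimal sketch of both modifications follows: queries and keys are RMS-normalized before the dot product, and each token attends only within a fixed backward window. The window size and the dense boolean mask are illustrative simplifications, not the models' actual settings.

```python
# Sketch of QK-Norm plus a sliding-window causal mask; window size is assumed.
import torch
import torch.nn.functional as F

def qk_norm(q, k, eps=1e-6):
    # RMS-normalize queries and keys per head before the dot product,
    # which bounds attention logits and stabilizes training.
    q = q * q.pow(2).mean(-1, keepdim=True).add(eps).rsqrt()
    k = k * k.pow(2).mean(-1, keepdim=True).add(eps).rsqrt()
    return q, k

def sliding_window_mask(seq_len, window, device=None):
    # Each position may attend only to itself and the `window - 1` previous tokens.
    i = torch.arange(seq_len, device=device)
    causal = i[None, :] <= i[:, None]
    near = (i[:, None] - i[None, :]) < window
    return causal & near

def swa_attention(q, k, v, window=1024):
    q, k = qk_norm(q, k)
    mask = sliding_window_mask(q.shape[-2], window, device=q.device)
    return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```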
GLM-5.1 decoder block architecture: Attention: MLA + Sparse Attention. Normalization: RMSNorm. FFN: Mixture of Experts (40B active parameters). Position encoding: RoPE. Scale: 744B, 202K context, 78 layers. Decoder type: MoE.
Zhipu AI · 2026-04
Scale
40B / 744B
Context
202K
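Several of the larger MoE entries (GLM-5.1 above; Mistral Small 4, Sarvam 105B, Ling 2.5, and GLM-5 below) list MLA, Multi-head Latent Attention. The sketch below shows the core idea under assumed dimensions: keys and values are down-projected into a small shared latent, which is what the KV cache stores, then up-projected per head when attention is computed. DeepSeek-style decoupled RoPE and other refinements are omitted.

```python
# Minimal sketch of MLA-style KV compression; dimensions are assumed for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLASketch(nn.Module):
    def __init__(self, dim=2048, n_heads=16, head_dim=128, kv_latent_dim=256):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, head_dim
        self.wq = nn.Linear(dim, n_heads * head_dim, bias=False)
        self.w_down_kv = nn.Linear(dim, kv_latent_dim, bias=False)  # cached per token
        self.w_up_k = nn.Linear(kv_latent_dim, n_heads * head_dim, bias=False)
        self.w_up_v = nn.Linear(kv_latent_dim, n_heads * head_dim, bias=False)
        self.wo = nn.Linear(n_heads * head_dim, dim, bias=False)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.wq(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        # The KV cache stores only this latent (kv_latent_dim values per token),
        # far smaller than full per-head keys plus values.
        c_kv = self.w_down_kv(x)
        k = self.w_up_k(c_kv).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        v = self.w_up_v(c_kv).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.wo(out.transpose(1, 2).reshape(b, t, -1))
```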
Mistral Small 4 decoder block architecture: Attention: MLA. Normalization: RMSNorm. FFN: Mixture of Experts (6.63B active parameters). Position encoding: RoPE. Scale: 119B, 256K context, 96 layers. Decoder type: MoE.
Mistral · 2026-03
Scale
6.63B / 119B
Context
256K
Nemotron 3 Nano 4B decoder block architecture: Attention: GQA (only 4 attention layers). Normalization: RMSNorm. FFN: SwiGLU. Position encoding: RoPE. Scale: 4B, 262K context, 42 layers. Decoder type: Hybrid.
NVIDIA · 2026-03
Scale
4B
Context
262K
Nemotron 3 Super 120B-A12B decoder block architecture: Attention: Mostly Mamba-2 + a few GQA layers. Normalization: RMSNorm. FFN: Mixture of Experts (12B active parameters). Position encoding: RoPE. Scale: 120B, 1M context, 88 layers. Decoder type: MoE.
NVIDIA · 2026-03
Scale
12B / 120B
Context
1M
Sarvam 105B decoder block architecture: Attention: MLA + KV LayerNorm + NoPE + RoPE. Normalization: RMSNorm. FFN: Mixture of Experts (10.3B active parameters). Position encoding: RoPE. Scale: 105B, 131K context, 32 layers. Decoder type: MoE.
Unknown · 2026-03
Scale
10.3B / 105B
Context
131K
Sarvam 30B decoder block architecture: Attention: GQA with QK-Norm. Normalization: RMSNorm. FFN: Mixture of Experts (2.4B active parameters). Position encoding: RoPE. Scale: 30B, 131K context, 19 layers. Decoder type: MoE.
Unknown · 2026-03
Scale
2.4B / 30B
Context
131K
Nemotron 3 Super decoder block architecture: Attention: Mostly Mamba-2 + GQA. Normalization: RMSNorm. FFN: SwiGLU. Position encoding: RoPE. Scale: 120B, 1M context, 96 layers. Decoder type: Hybrid.
NVIDIA · 2026-03
Scale
12B / 120B
Context
1M
GLM-5 744B decoder block architecture: Attention: MLA + DeepSeek Sparse Attention. Normalization: RMSNorm. FFN: Mixture of Experts (40B active parameters). Position encoding: RoPE. Scale: 744B, 203K context, 78 layers. Decoder type: MoE.
Zhipu AI · 2026-02
Scale
40B / 744B
Context
203K
Ling 2.5 1T decoder block architecture: Attention: Lightning Attention + MLA. Normalization: RMSNorm. FFN: Mixture of Experts (63B active parameters). Position encoding: RoPE. Scale: 1T, 256K context, 80 layers. Decoder type: MoE.
Unknown · 2026-02
Scale
63B / 1T
Context
256K
MiniMax M2.5 230B decoder block architecture: Attention: GQA with QK-Norm. Normalization: RMSNorm. FFN: Mixture of Experts (10B active parameters). Position encoding: RoPE. Scale: 230B, 197K context, 62 layers. Decoder type: MoE.
MiniMax · 2026-02
Scale
10B / 230B
Context
197K
Nanbeige 4.1 3B decoder block architecture: Attention: GQA. Normalization: RMSNorm. FFN: SwiGLU. Position encoding: RoPE. Scale: 3B, 262K context, 32 layers. Decoder type: Dense.
Nanbeige · 2026-02
Scale
3B
Context
262K
Qwen3.5 397B decoder block architecture: Attention: 3:1 Gated DeltaNet + Gated Attention. Normalization: RMSNorm. FFN: Mixture of Experts (17B active parameters). Position encoding: RoPE. Scale: 397B, 262K context, 128 layers. Decoder type: MoE.
Alibaba · 2026-02
Scale
17B / 397B
Context
262K
Step 3.5 Flash 196B decoder block architecture: Attention: GQA with 3:1 Sliding Window Attention (SWA). Normalization: RMSNorm. FFN: Mixture of Experts (11B active parameters). Position encoding: RoPE. Scale: 196B, 262K context, 45 layers. Decoder type: MoE.
StepFun · 2026-02
Scale
11B / 196B
Context
262K
Tiny Aya 3.35B decoder block architecture: Attention: GQA with 3:1 Sliding Window Attention (SWA). Normalization: RMSNorm. FFN: SwiGLU. Position encoding: RoPE. Scale: 3.35B, 8K context, 24 layers. Decoder type: Dense.
Cohere · 2026-02
Scale
3.35B
Context
8K
GLM-5 decoder block architecture: Attention: MLA + Sparse Attention. Normalization: RMSNorm. FFN: Mixture of Experts (40B active parameters). Position encoding: RoPE. Scale: 744B, 202K context, 128 layers. Decoder type: MoE.
Zhipu AI · 2026-02
Scale
40B / 744B
Context
202K
Step 3.5 Flash decoder block architecture: Attention: GQA with Sliding Window Attention (SWA). Normalization: RMSNorm. FFN: Mixture of Experts (11B active parameters). Position encoding: RoPE. Scale: 196B, 262K context, 96 layers. Decoder type: MoE.
StepFun · 2026-02
Scale
11B / 196B
Context
262K
Nanbeige 4.1 decoder block architecture: Attention: GQA. Normalization: RMSNorm. FFN: SwiGLU. Position encoding: RoPE. Scale: 3B, 262K context, 24 layers. Decoder type: Dense.
Nanbeige · 2026-02
Scale
3B
Context
262K
MiniMax-M2.5 decoder block architecture: Attention: GQA with QK-Norm. Normalization: RMSNorm. FFN: Mixture of Experts (10B active parameters). Position encoding: RoPE. Scale: 230B, 196K context, 128 layers. Decoder type: MoE.
MiniMax · 2026-02
Scale
10B / 230B
Context
196K
Tiny Aya decoder block architecture: Attention: GQA with Sliding Window Attention (SWA) and NoPE. Normalization: RMSNorm. FFN: SwiGLU. Position encoding: NoPE. Scale: 3.35B, 8K context, 24 layers. Decoder type: Dense.
Cohere · 2026-02
Scale
3.35B
Context
8K
Colaberry AI provides architecture specifications, benchmark comparisons, and deployment guidance across dense transformers, MoE, hybrid, and recurrent models.