StepFun · 2026-02
Step 3.5 Flash
MoE decoder architecture combining grouped-query attention (GQA) with sliding-window attention (SWA).

Each decoder block pairs GQA + SWA attention with RMSNorm normalization, a Mixture-of-Experts FFN (11B parameters active per token), and RoPE position encoding. At full scale the model spans 196B total parameters across 96 layers, with a 262K-token context window.
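The attention design above can be sketched in a few lines. This is a minimal illustrative implementation of grouped-query attention under a sliding-window causal mask, not Step 3.5 Flash's actual code; the head counts, head dimension, and window size below are arbitrary placeholders, and the real model's values are not published here.

```python
import numpy as np

def gqa_swa_attention(q, k, v, n_kv_heads, window):
    """Grouped-query attention with a sliding-window causal mask.

    q: (n_heads, T, d)   k, v: (n_kv_heads, T, d)
    Each group of n_heads // n_kv_heads query heads shares one KV head (GQA);
    position t may only attend to positions in [t - window + 1, t] (SWA).
    """
    n_heads, T, d = q.shape
    group = n_heads // n_kv_heads
    # Sliding-window causal mask: True where attention is allowed.
    i = np.arange(T)[:, None]
    j = np.arange(T)[None, :]
    mask = (j <= i) & (j > i - window)

    out = np.empty_like(q)
    for h in range(n_heads):
        kv = h // group                      # KV head shared by this query group
        scores = q[h] @ k[kv].T / np.sqrt(d)
        scores = np.where(mask, scores, -np.inf)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)   # row-wise softmax over allowed keys
        out[h] = w @ v[kv]
    return out

rng = np.random.default_rng(0)
q = rng.standard_normal((8, 16, 4))   # 8 query heads
k = rng.standard_normal((2, 16, 4))   # 2 KV heads -> groups of 4 query heads
v = rng.standard_normal((2, 16, 4))
out = gqa_swa_attention(q, k, v, n_kv_heads=2, window=5)
print(out.shape)  # (8, 16, 4)
```

Sharing KV heads across query groups shrinks the KV cache (important at 262K context), while the sliding window caps per-token attention cost at the window size instead of the full sequence length.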
Architecture Specifications
Parameters: 11B active / 196B total
Context Window: 262K tokens
Decoder Type: MoE
Attention: GQA + SWA
Active Parameters: 11B
Release Date: 2026-02
Category: Mixture of Experts
Organization: StepFun
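The 11B-active / 196B-total split comes from MoE routing: a gating network selects a small subset of experts per token, so only that subset's parameters run in the forward pass. The sketch below shows generic top-k routing; the expert count, top_k, and dimensions are illustrative assumptions, not Step 3.5 Flash's published configuration.

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Top-k MoE routing: only top_k experts run per token, which is why
    active parameters (11B) sit far below total parameters (196B)."""
    logits = x @ gate_w                       # (n_experts,) router scores
    top = np.argsort(logits)[-top_k:]         # indices of selected experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                              # renormalized gate weights
    # Weighted combination of only the selected experts' outputs.
    return sum(wi * experts[e](x) for wi, e in zip(w, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 16
gate_w = rng.standard_normal((d, n_experts))
# Each "expert" here is just a linear map; real experts are full FFN blocks.
experts = [lambda x, W=rng.standard_normal((d, d)): x @ W
           for _ in range(n_experts)]
x = rng.standard_normal(d)
y = moe_forward(x, gate_w, experts, top_k=2)
print(y.shape)  # (8,)
```

With 2 of 16 experts selected, this toy layer touches roughly an eighth of its FFN weights per token; the same principle scales to the 11B-of-196B ratio above.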
Key Features
Fast inference · MoE · SWA · 11B active parameters
Colaberry AI provides architecture specifications, benchmark comparisons, and deployment guidance for enterprise AI teams.