
Step 3.5 Flash 196B

Throughput-oriented MoE model that stays competitive with much larger DeepSeek-style systems.

Step 3.5 Flash 196B decoder block architecture. Attention: GQA with a 3:1 sliding-window-to-global layer ratio. Normalization: RMSNorm. FFN: Mixture of Experts (11B active parameters). Position encoding: RoPE. Scale: 196B total parameters, 262K context, 48 layers. Decoder type: MoE.


Architecture Specifications

Parameters: 11B active / 196B total
Context Window: 262K
Decoder Type: MoE
Attention: GQA + 3:1 SWA
Active Parameters: 11B
Layers: 48
Hidden Size: 4,096
Vocabulary Size: 129K
Release Date: 2026-02
Category: Mixture of Experts
Organization: StepFun AI

Key Features

Grouped Query Attention
Sliding Window Attention
Layer mix: 36 sliding-window + 12 global
KV cache: 192 KiB/token (see the sketch below)
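The 192 KiB/token figure is consistent with a standard GQA cache layout. A minimal sketch of the arithmetic, assuming 8 KV heads of dimension 128 in bf16 (the head counts are not published on this page and are assumptions):

```python
# Reproducing the 192 KiB/token KV-cache figure.
# N_KV_HEADS and HEAD_DIM are assumptions: 8 x 128 is one GQA shape
# consistent with the published number, not a confirmed config value.
N_LAYERS   = 48
N_KV_HEADS = 8     # assumed
HEAD_DIM   = 128   # assumed
BYTES      = 2     # bf16

# Both K and V are cached per layer, per KV head, per head dimension.
kv_bytes_per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES
print(kv_bytes_per_token / 1024, "KiB/token")  # -> 192.0
```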

Deep Dive

Overview

Step 3.5 Flash is a 196B-total / 11B-active-parameter sparse MoE from StepFun AI, released in February 2026. The 'Flash' suffix matches the positioning of Xiaomi MiMo-V2-Flash: low-latency serving at MoE scale. Step 3.5 Flash uses a 3:1 sliding-window-to-global attention ratio, with 36 local and 12 global layers across 48 total, a moderate global budget similar to OLMo 3's but attached to a much larger total parameter count.
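The exact interleaving is not given on this page; a common way to realize a 3:1 ratio is three sliding-window layers followed by one global layer, repeated down the stack. A hypothetical sketch under that assumption:

```python
# Hypothetical layer-type schedule for a 3:1 local-to-global stack.
# The repeating [SWA, SWA, SWA, global] pattern is an assumption;
# only the 36/12 split is stated on this page.
N_LAYERS = 48

layer_types = [
    "global" if (i + 1) % 4 == 0 else "sliding_window"
    for i in range(N_LAYERS)
]

assert layer_types.count("sliding_window") == 36
assert layer_types.count("global") == 12
```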

Architecture at a Glance

Total parameters: ≈196B (MoE)
Active parameters: ≈11B (per token)
Layers: 48 (36 sliding-window + 12 global)
Attention: GQA + 3:1 SWA
KV cache: ≈192 KiB/token
Max position: 262,144 (256K native)
Vocabulary: ≈129,000
Precision: bfloat16

Step 3.5 Flash 196B configuration (source: HuggingFace config.json)
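Rough serving-memory math falls out of these numbers. A sketch, assuming bf16 weights at 2 bytes/parameter and the 192 KiB/token KV figure applied to every token of a full 262K context; the KV number is a naive upper bound, since the 36 sliding-window layers only retain keys/values inside their window:

```python
# Back-of-envelope memory from the configuration table above.
GiB = 2**30

weights_bytes = 196e9 * 2               # bf16: 2 bytes per parameter
kv_bytes_262k = 262_144 * 192 * 1024    # 192 KiB/token, full context

print(f"weights:         {weights_bytes / GiB:,.0f} GiB")  # ~365 GiB
print(f"KV cache @ 262K: {kv_bytes_262k / GiB:,.0f} GiB")  # 48 GiB (upper bound)
```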

3:1 Local-Global

Step 3.5 Flash's 3:1 ratio sits between GPT-OSS 120B's 1:1 (very global-heavy) and Gemma 3 / MiMo-V2's 5:1 (very global-sparse). At this ratio exactly a quarter of the layers (12 of 48) attend to the full context, which preserves long-range mixing at every fourth layer while keeping per-token attention compute manageable. The 11B active compute budget lets Step 3.5 Flash serve at roughly the speed of a dense 11B decoder.
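To make the local/global distinction concrete, here is an illustrative contrast between the two layer types' attention masks. The window size here is purely for demonstration; the model's actual window size is not published on this page:

```python
import numpy as np

T, WINDOW = 8, 4  # WINDOW is illustrative, not the model's real value

i = np.arange(T)[:, None]  # query positions
j = np.arange(T)[None, :]  # key positions

causal = j <= i                        # global layers: full causal attention
swa    = (j <= i) & (i - j < WINDOW)   # SWA layers: only the last WINDOW keys

# Keys attended per query: grows with position for global layers,
# but is capped at WINDOW for sliding-window layers.
print(causal.sum(axis=1))  # [1 2 3 4 5 6 7 8]
print(swa.sum(axis=1))     # [1 2 3 4 4 4 4 4]
```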

Verdict: StepFun's Flash Tier

Step 3.5 Flash is StepFun's answer to MiniMax M2 and Xiaomi MiMo-V2-Flash: a latency-optimized MoE in the 200B total-parameter band. The 3:1 local-global ratio is a middle-ground choice, and the overall design is conservative. For Chinese-ecosystem deployments where StepFun is a known vendor, this is the default 'fast MoE' pick.

