
Step 3.5 Flash 196B

Throughput-oriented MoE model that stays competitive with much larger DeepSeek-style systems.

Step 3.5 Flash 196B decoder block architecture. Attention: GQA with a 3:1 sliding-window-to-global layer ratio. Normalization: RMSNorm. FFN: Mixture of Experts (11B active parameters). Position encoding: RoPE. Scale: 196B total parameters, 262K context, 48 layers. Decoder type: MoE.


Architecture Specifications

Parameters: 11B active / 196B total
Context Window: 262K
Decoder Type: MoE
Attention: GQA + 3:1 SWA
Active Parameters: 11B
Layers: 48
Hidden Size: 4,096
Vocabulary Size: 129K
Release Date: 2026-02
Category: Mixture of Experts
Organization: StepFun AI

Key Features

Grouped Query Attention
Sliding Window Attention
Layer mix: 36 sliding-window + 12 global
KV cache: 192 KiB/token (see the sketch below)
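The 192 KiB/token figure is consistent with a standard GQA cache layout. A minimal sketch of the arithmetic, assuming 8 KV heads of dimension 128 in bf16 (the head counts are not published on this page and are assumptions):

```python
# Reproducing the 192 KiB/token KV-cache figure.
# N_KV_HEADS and HEAD_DIM are assumptions: 8 x 128 is one GQA shape
# consistent with the published number, not a confirmed config value.
N_LAYERS   = 48
N_KV_HEADS = 8     # assumed
HEAD_DIM   = 128   # assumed
BYTES      = 2     # bf16

# Both K and V are cached per layer, per KV head, per head dimension.
kv_bytes_per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES
print(kv_bytes_per_token / 1024, "KiB/token")  # -> 192.0
```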

Deep Dive

Overview

Step 3.5 Flash is a 196B-total / 11B-active-parameter sparse MoE from StepFun AI, released in February 2026. The 'Flash' suffix matches the positioning of Xiaomi MiMo-V2-Flash: low-latency serving at MoE scale. Step 3.5 Flash uses a 3:1 sliding-window-to-global attention ratio, with 36 local and 12 global layers across 48 total, a moderate global budget similar to OLMo 3's but attached to a much larger total parameter count.
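The exact interleaving is not given on this page; a common way to realize a 3:1 ratio is three sliding-window layers followed by one global layer, repeated down the stack. A hypothetical sketch under that assumption:

```python
# Hypothetical layer-type schedule for a 3:1 local-to-global stack.
# The repeating [SWA, SWA, SWA, global] pattern is an assumption;
# only the 36/12 split is stated on this page.
N_LAYERS = 48

layer_types = [
    "global" if (i + 1) % 4 == 0 else "sliding_window"
    for i in range(N_LAYERS)
]

assert layer_types.count("sliding_window") == 36
assert layer_types.count("global") == 12
```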

Architecture at a Glance

Total parameters: ≈196B (MoE)
Active parameters: ≈11B (per token)
Layers: 48 (36 sliding-window + 12 global)
Attention: GQA + 3:1 SWA
KV cache: ≈192 KiB/token
Max position: 262,144 (256K native)
Vocabulary: ≈129,000
Precision: bfloat16

Step 3.5 Flash 196B configuration (source: HuggingFace config.json)
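Rough serving-memory math falls out of these numbers. A sketch, assuming bf16 weights at 2 bytes/parameter and the 192 KiB/token KV figure applied to every token of a full 262K context; the KV number is a naive upper bound, since the 36 sliding-window layers only retain keys/values inside their window:

```python
# Back-of-envelope memory from the configuration table above.
GiB = 2**30

weights_bytes = 196e9 * 2               # bf16: 2 bytes per parameter
kv_bytes_262k = 262_144 * 192 * 1024    # 192 KiB/token, full context

print(f"weights:         {weights_bytes / GiB:,.0f} GiB")  # ~365 GiB
print(f"KV cache @ 262K: {kv_bytes_262k / GiB:,.0f} GiB")  # 48 GiB (upper bound)
```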

3:1 Local-Global

Step 3.5 Flash's 3:1 ratio sits between GPT-OSS 120B's 1:1 (very global-heavy) and Gemma 3 / MiMo-V2's 5:1 (very global-sparse). At this ratio exactly a quarter of the layers (12 of 48) attend to the full context, which preserves long-range mixing at every fourth layer while keeping per-token attention compute manageable. The 11B active compute budget lets Step 3.5 Flash serve at roughly the speed of a dense 11B decoder.
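To make the local/global distinction concrete, here is an illustrative contrast between the two layer types' attention masks. The window size here is purely for demonstration; the model's actual window size is not published on this page:

```python
import numpy as np

T, WINDOW = 8, 4  # WINDOW is illustrative, not the model's real value

i = np.arange(T)[:, None]  # query positions
j = np.arange(T)[None, :]  # key positions

causal = j <= i                        # global layers: full causal attention
swa    = (j <= i) & (i - j < WINDOW)   # SWA layers: only the last WINDOW keys

# Keys attended per query: grows with position for global layers,
# but is capped at WINDOW for sliding-window layers.
print(causal.sum(axis=1))  # [1 2 3 4 5 6 7 8]
print(swa.sum(axis=1))     # [1 2 3 4 4 4 4 4]
```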

Verdict: StepFun's Flash Tier

Step 3.5 Flash is StepFun's answer to MiniMax M2 and Xiaomi MiMo-V2-Flash: a latency-optimized MoE in the 200B total-parameter band. The 3:1 local-global ratio is a middle-ground choice, and the overall design is conservative. For Chinese-ecosystem deployments where StepFun is a known vendor, this is the default 'fast MoE' pick.

