What is the best local model for: Overall use?

Gemma 4 31B. AIME 89.2, GPQA 84.3, Codeforces 2150

What is the best local model for: Fastest quality inference?

Gemma 4 26B-A4B. 97% of 31B quality, ~300 t/s on M-series

What is the best local model for: Tool/function calling?

Qwen3.5-122B-A10B. BFCL-V4: 72.2 — beats GPT-5 mini by 30%

What is the best local model for: Sub-10B (side agent)?

Qwen3.5-9B. Beats the 13x larger GPT-oss 120B

What is the best local model for: Multilingual / CJK?

Qwen3.5-27B. 250K vocab, 201 langs, best Cantonese

What is the best local model for: Fastest prototyping?

Qwen3.5-35B-A3B. 3B active = blazing; beats prev-gen 235B

What is the best local model for: Audio / voice input?

Gemma 4 E4B. Only open model with native audio

What is the best local model for: Fine-tuning base?

Gemma 4 31B. Dense = best for QLoRA customization

What is the best API model for: Vision/docs?

Qwen3.5-397B-A17B. MMMU 85, OmniDoc 90.8, MathVision 88.6

What is the best API model for: Pure coding?

Kimi K2.5. HumanEval 99%, free API tier

What is the best API model for: SWE-bench?

GLM-5. SWE-bench 77.8% — #1 open model

What is the best API model for: Cheapest frontier?

DeepSeek V3.2. $0.28/M, frontier reasoning

What is the best API model for: Cheapest coding?

MiMo-V2-Flash. $0.10/M, LCB 87%, built for agents

What is the best API model for: Max context (10M)?

Llama 4 Scout. 10M tokens — nothing else close

Open Model Showdown — Qwen 3.5 vs Gemma 4

Benchmark Scores

Each model is listed with its approximate Q4 (4-bit quantized) size, which determines whether it fits in a given amount of RAM.

Reasoning & Knowledge

Model	Q4 size	MMLU-Pro	GPQA Diamond	BigBench-EH	IFBench
GLM-5	~370 GB	87.1	86	—	—
Qwen3.5-27B	~15 GB	86.1	85.5	—	—
Gemma 4 31B	~20 GB	85.2	84.3	74.4	—
Gemma 4 26B-A4B	~18 GB	82.6	82.3	64.8	—
Qwen3.5-9B	~5.1 GB	82.5	81.7	—	—
GPT-oss 120B	~60 GB	80.8	80.1	—	—
Qwen3.5-397B-A17B	~199 GB	—	88.4	—	76.5
Kimi K2.5	~500 GB	—	87.6	—	94
DeepSeek V3.2	~340 GB	—	79.9	—	—

Mathematics

Model	Q4 size	AIME 2025/2026	MATH-500	HMMT Feb 2025
Kimi K2.5	~500 GB	96.1 (AIME 2025)	98	—
GLM-5	~370 GB	95.7 (AIME 2025)	—	—
Qwen3.5-397B-A17B	~199 GB	91.3 (AIME 2026)	—	—
DeepSeek V3.2	~340 GB	89.3 (AIME 2025)	—	—
Gemma 4 31B	~20 GB	89.2 (AIME 2026)	—	—
Gemma 4 26B-A4B	~18 GB	88.3 (AIME 2026)	—	—
Gemma 4 E4B	~5 GB	42.5 (AIME 2026)	—	—
Qwen3.5-9B	~5.1 GB	—	—	83.2

Coding

Model	Q4 size	LiveCodeBench v6	SWE-bench	HumanEval	Codeforces ELO	Terminal-Bench 2.0
MiMo-V2-Flash	~155 GB	87	73.4	—	—	—
Kimi K2.5	~500 GB	85	76.8	99	—	—
Qwen3.5-397B-A17B	~199 GB	83.6	76.4	—	—	52.5
Qwen3.5-9B	~5.1 GB	82.7	—	—	—	—
Gemma 4 31B	~20 GB	80	—	—	2150	—
Gemma 4 26B-A4B	~18 GB	77.1	—	—	1718	—
GLM-5	~370 GB	52	77.8	—	—	—

Vision / Multimodal

Model	Q4 size	MMMU	MMMU-Pro	MathVision	OmniDocBench
Qwen3.5-397B-A17B	~199 GB	85	—	88.6	90.8
Gemma 4 31B	~20 GB	—	76.9	85.6	—
Gemma 4 26B-A4B	~18 GB	—	73.8	82.4	—
Qwen3.5-9B	~5.1 GB	—	70.1	—	—

Agentic

Model	Q4 size	Tau2-Bench	BrowseComp	BFCL-V4 (Tool Use)
Qwen3.5-397B-A17B	~199 GB	86.7	78.6	—
Qwen3.5-122B-A10B	~65 GB	—	—	72.2

Benchmark Version Warning: Qwen 3.5 and Gemma 4 report AIME 2026 and LiveCodeBench v6 scores, while Kimi K2.5, GLM-5, and DeepSeek V3.2 often report AIME 2025 or earlier benchmark versions. Treat cross-family comparisons as directional, not exact.
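
One way to act on this warning is to rank models only against others scored on the same benchmark version. The sketch below does that for the AIME scores in the Mathematics table. The helper name `rank_by_version` is hypothetical and introduced here only for illustration.

```python
from collections import defaultdict

# (model, score, benchmark version) copied from the Mathematics table above.
AIME_SCORES = [
    ("Kimi K2.5", 96.1, "AIME 2025"),
    ("GLM-5", 95.7, "AIME 2025"),
    ("Qwen3.5-397B-A17B", 91.3, "AIME 2026"),
    ("DeepSeek V3.2", 89.3, "AIME 2025"),
    ("Gemma 4 31B", 89.2, "AIME 2026"),
    ("Gemma 4 26B-A4B", 88.3, "AIME 2026"),
    ("Gemma 4 E4B", 42.5, "AIME 2026"),
]


def rank_by_version(scores):
    """Group scores by benchmark version and sort each group best-first."""
    groups = defaultdict(list)
    for model, score, version in scores:
        groups[version].append((model, score))
    return {
        version: sorted(entries, key=lambda entry: entry[1], reverse=True)
        for version, entries in groups.items()
    }


if __name__ == "__main__":
    for version, entries in rank_by_version(AIME_SCORES).items():
        print(version)
        for model, score in entries:
            print(f"  {model}: {score}")
```

Within each group the ordering is meaningful. Across groups (for example GLM-5 on AIME 2025 versus Gemma 4 31B on AIME 2026) the numbers are only a rough guide.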

On your 128 GB hardware, Qwen3.5-122B-A10B is the best-performing model that fits (~65 GB Q4). Kimi K2.5, Qwen3.5-397B-A17B, and MiMo-V2-Flash lead in some categories, but their Q4 weights exceed 128 GB, so they are API-only on this machine.
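
The fit check behind this recommendation can be sketched as a simple size comparison. The Q4 sizes come from the tables above. The function name `split_by_fit` and the 16 GB reserve for the OS and KV cache are assumptions made for illustration, not figures from this page.

```python
# Approximate Q4 sizes in GB, copied from the benchmark tables above.
Q4_SIZE_GB = {
    "Kimi K2.5": 500,
    "GLM-5": 370,
    "DeepSeek V3.2": 340,
    "Qwen3.5-397B-A17B": 199,
    "MiMo-V2-Flash": 155,
    "Qwen3.5-122B-A10B": 65,
    "GPT-oss 120B": 60,
    "Gemma 4 31B": 20,
    "Gemma 4 26B-A4B": 18,
    "Qwen3.5-27B": 15,
    "Qwen3.5-9B": 5.1,
    "Gemma 4 E4B": 5,
}


def split_by_fit(ram_gb, reserve_gb=16.0):
    """Split models into (fits locally, API-only) for a given RAM size.

    reserve_gb is an assumed allowance for the OS and KV cache.
    """
    usable_gb = ram_gb - reserve_gb
    local = [m for m, size in Q4_SIZE_GB.items() if size <= usable_gb]
    api_only = [m for m, size in Q4_SIZE_GB.items() if size > usable_gb]
    return local, api_only


if __name__ == "__main__":
    local, api_only = split_by_fit(128)
    print("Fits locally:", ", ".join(local))
    print("API-only:", ", ".join(api_only))
```

With 128 GB, the largest model that fits is Qwen3.5-122B-A10B, and everything from MiMo-V2-Flash (~155 GB) upward falls into the API-only group.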

Multi-agent combo: Qwen3.5-27B + Gemma 4 31B + Qwen3.5-9B = ~40 GB total. Leaves 88 GB for KV caches and OS. For ceiling performance: Kimi K2.5, Qwen3.5-397B-A17B, MiMo-V2-Flash via API.
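
The arithmetic behind the combo can be checked directly. This is a minimal sketch using the Q4 sizes from the tables above. `memory_budget` is a hypothetical helper name, and the calculation counts model weights only, not KV caches or runtime overhead.

```python
# Approximate Q4 sizes in GB for the three-model combo, from the tables above.
COMBO_Q4_GB = {
    "Qwen3.5-27B": 15.0,
    "Gemma 4 31B": 20.0,
    "Qwen3.5-9B": 5.1,
}


def memory_budget(ram_gb, model_sizes_gb):
    """Return (total weight size, RAM left over) in GB."""
    weights_gb = sum(model_sizes_gb.values())
    return weights_gb, ram_gb - weights_gb


if __name__ == "__main__":
    weights, remaining = memory_budget(128, COMBO_Q4_GB)
    print(f"weights: ~{weights:.0f} GB, remaining: ~{remaining:.0f} GB")
```

On 128 GB this gives roughly 40 GB of weights and roughly 88 GB left over, matching the figures quoted above.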

Benchmark data compiled April 2026 from official model papers, Artificial Analysis, and LMSYS Arena.