On your hardware, Gemma 4 31B (~20 GB Q4) and Gemma 4 26B-A4B (~18 GB Q4) are the best-performing models that fit comfortably. Qwen3.5-122B-A10B (~65 GB Q4) is the biggest that fits and has the best tool-calling score reported here (72.2 on BFCL-V4). The largest models (Kimi K2.5, GLM-5, DeepSeek V3.2) lead in some categories, but at roughly 340 GB to 500 GB in Q4 they are API-only for most setups.
14 models, 7 families, 18 benchmarks, Q4 sizes from 5 GB to 500 GB.
Benchmark Scores
Each model is listed with its approximate Q4 size, which is the figure to check against your available RAM.
Reasoning & Knowledge
| Model | Q4 size | MMLU-Pro | GPQA Diamond | BigBench-EH | IFBench |
|---|---|---|---|---|---|
| GLM-5 | ~370 GB | 87.1 | 86 | — | — |
| Qwen3.5-27B | ~15 GB | 86.1 | 85.5 | — | — |
| Gemma 4 31B | ~20 GB | 85.2 | 84.3 | 74.4 | — |
| Gemma 4 26B-A4B | ~18 GB | 82.6 | 82.3 | 64.8 | — |
| Qwen3.5-9B | ~5.1 GB | 82.5 | 81.7 | — | — |
| GPT-oss 120B | ~60 GB | 80.8 | 80.1 | — | — |
| Qwen3.5-397B-A17B | ~199 GB | — | 88.4 | — | 76.5 |
| Kimi K2.5 | ~500 GB | — | 87.6 | — | 94 |
| DeepSeek V3.2 | ~340 GB | — | 79.9 | — | — |
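The top score in each column of the Reasoning & Knowledge table can be recovered with a few lines of Python. This is a minimal sketch: the scores are transcribed from the table, missing values are simply left out, and the names `REASONING` and `best_per_benchmark` are hypothetical helpers introduced for illustration.

```python
# Scores transcribed from the Reasoning & Knowledge table.
# Benchmarks with no reported score are omitted for that model.
REASONING = {
    "GLM-5": {"MMLU-Pro": 87.1, "GPQA Diamond": 86.0},
    "Qwen3.5-27B": {"MMLU-Pro": 86.1, "GPQA Diamond": 85.5},
    "Gemma 4 31B": {"MMLU-Pro": 85.2, "GPQA Diamond": 84.3, "BigBench-EH": 74.4},
    "Gemma 4 26B-A4B": {"MMLU-Pro": 82.6, "GPQA Diamond": 82.3, "BigBench-EH": 64.8},
    "Qwen3.5-9B": {"MMLU-Pro": 82.5, "GPQA Diamond": 81.7},
    "GPT-oss 120B": {"MMLU-Pro": 80.8, "GPQA Diamond": 80.1},
    "Qwen3.5-397B-A17B": {"GPQA Diamond": 88.4, "IFBench": 76.5},
    "Kimi K2.5": {"GPQA Diamond": 87.6, "IFBench": 94.0},
    "DeepSeek V3.2": {"GPQA Diamond": 79.9},
}


def best_per_benchmark(table):
    """Return {benchmark: (model, score)} for the highest reported score."""
    best = {}
    for model, scores in table.items():
        for bench, score in scores.items():
            if bench not in best or score > best[bench][1]:
                best[bench] = (model, score)
    return best


for bench, (model, score) in best_per_benchmark(REASONING).items():
    print(f"{bench}: {model} ({score})")
```

A leader found this way is only the best among models that reported the benchmark. For example, only two models have an IFBench score.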
Mathematics
| Model | Q4 size | AIME 2025/2026 | MATH-500 | HMMT Feb 2025 |
|---|---|---|---|---|
| Kimi K2.5 | ~500 GB | 96.1 (AIME 2025) | 98 | — |
| GLM-5 | ~370 GB | 95.7 (AIME 2025) | — | — |
| Qwen3.5-397B-A17B | ~199 GB | 91.3 (AIME 2026) | — | — |
| DeepSeek V3.2 | ~340 GB | 89.3 (AIME 2025) | — | — |
| Gemma 4 31B | ~20 GB | 89.2 (AIME 2026) | — | — |
| Gemma 4 26B-A4B | ~18 GB | 88.3 (AIME 2026) | — | — |
| Gemma 4 E4B | ~5 GB | 42.5 (AIME 2026) | — | — |
| Qwen3.5-9B | ~5.1 GB | — | — | 83.2 |
Coding
| Model | Q4 size | LiveCodeBench v6 | SWE-bench | HumanEval | Codeforces ELO | Terminal-Bench 2.0 |
|---|---|---|---|---|---|---|
| MiMo-V2-Flash | ~155 GB | 87 | 73.4 | — | — | — |
| Kimi K2.5 | ~500 GB | 85 | 76.8 | 99 | — | — |
| Qwen3.5-397B-A17B | ~199 GB | 83.6 | 76.4 | — | — | 52.5 |
| Qwen3.5-9B | ~5.1 GB | 82.7 | — | — | — | — |
| Gemma 4 31B | ~20 GB | 80 | — | — | 2150 | — |
| Gemma 4 26B-A4B | ~18 GB | 77.1 | — | — | 1718 | — |
| GLM-5 | ~370 GB | 52 | 77.8 | — | — | — |
Vision / Multimodal
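| Model | Q4 size | MMMU | MMMU-Pro | MathVision | OmniDocBench |
|---|---|---|---|---|---|
| Qwen3.5-397B-A17B | ~199 GB | 85 | — | 88.6 | 90.8 |
| Gemma 4 31B | ~20 GB | — | 76.9 | 85.6 | — |
| Gemma 4 26B-A4B | ~18 GB | — | 73.8 | 82.4 | — |
| Qwen3.5-9B | ~5.1 GB | — | 70.1 | — | — |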
Agentic
| Model | Q4 size | Tau2-Bench | BrowseComp | BFCL-V4 (Tool Use) |
|---|---|---|---|---|
| Qwen3.5-397B-A17B | ~199 GB | 86.7 | 78.6 | — |
| Qwen3.5-122B-A10B | ~65 GB | — | — | 72.2 |
Benchmark Version Warning: Qwen3.5 and Gemma 4 report on AIME 2026 / LiveCodeBench v6. Kimi K2.5, GLM, and DeepSeek often report on AIME 2025 or earlier benchmark versions. Treat cross-family comparisons as directional rather than exact.
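One way to respect this warning is to rank models only against others that report the same benchmark version. The sketch below does that for the AIME scores in the Mathematics table. The `AIME` list and `rank_within_version` function are hypothetical names used for illustration.

```python
from collections import defaultdict

# (model, score, benchmark version) transcribed from the Mathematics table.
AIME = [
    ("Kimi K2.5", 96.1, "AIME 2025"),
    ("GLM-5", 95.7, "AIME 2025"),
    ("Qwen3.5-397B-A17B", 91.3, "AIME 2026"),
    ("DeepSeek V3.2", 89.3, "AIME 2025"),
    ("Gemma 4 31B", 89.2, "AIME 2026"),
    ("Gemma 4 26B-A4B", 88.3, "AIME 2026"),
    ("Gemma 4 E4B", 42.5, "AIME 2026"),
]


def rank_within_version(rows):
    """Group scores by benchmark version and sort each group, best first."""
    groups = defaultdict(list)
    for model, score, version in rows:
        groups[version].append((model, score))
    return {
        version: sorted(entries, key=lambda entry: entry[1], reverse=True)
        for version, entries in groups.items()
    }


for version, entries in rank_within_version(AIME).items():
    print(version)
    for model, score in entries:
        print(f"  {model}: {score}")
```

Ranked this way, Kimi K2.5 leads the AIME 2025 group and Qwen3.5-397B-A17B leads the AIME 2026 group. The two groups are not directly comparable with each other.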
On your 128 GB hardware, Qwen3.5-122B-A10B is the best-performing model that fits (~65 GB Q4). Kimi K2.5, Qwen3.5-397B-A17B, and MiMo-V2-Flash lead in some categories, but at ~155 GB to ~500 GB in Q4 they do not fit in 128 GB and are API-only.
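The fit check behind this recommendation can be sketched as a simple size filter. The Q4 sizes are the approximate figures listed in the tables. The `headroom_gb` parameter and its 16 GB default are assumptions for illustration (room for the OS and KV caches), not values taken from the page.

```python
# Approximate Q4 sizes in GB, as listed in the benchmark tables.
Q4_SIZE_GB = {
    "Qwen3.5-9B": 5.1,
    "Gemma 4 E4B": 5,
    "Qwen3.5-27B": 15,
    "Gemma 4 26B-A4B": 18,
    "Gemma 4 31B": 20,
    "GPT-oss 120B": 60,
    "Qwen3.5-122B-A10B": 65,
    "MiMo-V2-Flash": 155,
    "Qwen3.5-397B-A17B": 199,
    "DeepSeek V3.2": 340,
    "GLM-5": 370,
    "Kimi K2.5": 500,
}


def models_that_fit(ram_gb, headroom_gb=16.0):
    """Return (model, size) pairs whose Q4 weights fit, largest first.

    headroom_gb is an assumed reserve for the OS and KV caches.
    """
    budget = ram_gb - headroom_gb
    fits = [(name, size) for name, size in Q4_SIZE_GB.items() if size <= budget]
    return sorted(fits, key=lambda item: item[1], reverse=True)


print(models_that_fit(128)[0])  # ('Qwen3.5-122B-A10B', 65)
```

Weights are only part of the memory footprint. Long contexts grow the KV cache, so a larger headroom value is a reasonable choice for agentic workloads.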
Multi-agent combo: Qwen3.5-27B (~15 GB) + Gemma 4 31B (~20 GB) + Qwen3.5-9B (~5.1 GB) = ~40 GB total at Q4. On a 128 GB machine that leaves about 88 GB for KV caches and the OS. For ceiling performance, use Kimi K2.5, Qwen3.5-397B-A17B, or MiMo-V2-Flash via API.
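The memory arithmetic for the combo is easy to verify. This is a minimal sketch using the approximate Q4 sizes from the tables. `combo_budget` is a hypothetical helper, and it counts weights only.

```python
def combo_budget(ram_gb, sizes_gb):
    """Return (total weight size, memory left over) in GB."""
    total = sum(sizes_gb)
    return total, ram_gb - total


# Qwen3.5-27B (~15 GB), Gemma 4 31B (~20 GB), Qwen3.5-9B (~5.1 GB)
total, remaining = combo_budget(128, [15, 20, 5.1])
print(f"weights: ~{total:.1f} GB, remaining: ~{remaining:.1f} GB")
```

The result is about 40.1 GB of weights and about 87.9 GB left over, which matches the rounded figures of ~40 GB and 88 GB.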
Benchmark data compiled April 2026 from official model papers, Artificial Analysis, and LMSYS Arena.