arXiv:2605.03195: Terminus-4B, a 4-billion-parameter terminal-execution model, matches Claude Opus and GPT-5.3-Codex on SWE-Bench Pro with ~30% fewer main-agent tokens
Terminus-4B is a 4-billion-parameter Qwen3 fine-tune specialized for terminal execution in agentic systems. On the SWE-Bench Pro benchmark it matches, and sometimes outperforms, Claude Sonnet/Opus and GPT-5.3-Codex baselines, while reducing main-agent token consumption by approximately 30%. It achieves the reduction by isolating verbose build/test logs in a subagent context.
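
The log-isolation idea can be illustrated with a minimal Python sketch. This is not the paper's implementation: the names `TerminalResult`, `summarize_log`, and `run_in_subagent` are hypothetical, and the "subagent" here is a plain function that keeps the last few lines of output, standing in for a model such as Terminus-4B. The point is only that the full build/test log stays on the subagent side and the main agent receives a compact result.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class TerminalResult:
    """Compact result handed back to the main agent (hypothetical type)."""
    exit_code: int
    summary: str    # the only log text the main agent sees
    raw_chars: int  # size of the full log that stayed in the subagent


def summarize_log(log: str, max_lines: int = 5) -> str:
    """Placeholder for the subagent model: keep only the last few lines.

    Assumption: a real subagent would produce a learned summary; tail
    truncation is used here purely for illustration.
    """
    if max_lines <= 0:
        return ""
    return "\n".join(log.splitlines()[-max_lines:])


def run_in_subagent(argv: list[str], max_lines: int = 5) -> TerminalResult:
    """Run a command, keep the verbose log local, return a short summary."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    full_log = proc.stdout + proc.stderr
    return TerminalResult(
        exit_code=proc.returncode,
        summary=summarize_log(full_log, max_lines),
        raw_chars=len(full_log),
    )


if __name__ == "__main__":
    # Simulate a noisy build step that prints 1000 lines.
    noisy = [sys.executable, "-c",
             "for i in range(1000): print('building step', i)"]
    result = run_in_subagent(noisy)
    print("exit code:", result.exit_code)
    print("log chars kept in subagent:", result.raw_chars)
    print("chars passed to main agent:", len(result.summary))
```

In this sketch the main agent's context grows by `len(result.summary)` characters rather than `result.raw_chars`, which is the kind of saving the summary above attributes to the subagent design. The actual reduction reported for Terminus-4B depends on its own summarization behavior, not on this truncation rule.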