arXiv:2605.13301 SU-01: 30B model reaches gold-medal level at IMO 2025, USAMO 2026, and IPhO through three-phase training
SU-01 is a reasoning training methodology posted to arXiv on May 14, 2026 (Yafu Li and 27 co-authors; corresponding author Runzhe Zhan). Built on a 30B-parameter A3B backbone, the model reaches gold-medal-level performance on the International Mathematical Olympiad (IMO) 2025, USAMO 2026, and the International Physics Olympiad (IPhO) 2024-2025. Training proceeds in three sequential phases: reverse-perplexity curriculum supervised fine-tuning (SFT) on 340K trajectories, two-stage reinforcement learning (RL), and test-time scaling. Reasoning chains reach more than 100K tokens.
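To make the first phase more concrete, here is a minimal sketch of what a perplexity-ordered curriculum could look like. It is not taken from the paper: the summary above does not define "reverse-perplexity", so the sketch assumes it means presenting trajectories from highest to lowest perplexity under some reference model. The names `Trajectory`, `perplexity`, `reverse_perplexity_order`, and `curriculum_stages` are hypothetical, and the per-token log-probabilities are assumed to be precomputed.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Trajectory:
    """Hypothetical container for one SFT trajectory."""
    traj_id: str
    # Natural-log token probabilities under a reference model (assumed precomputed).
    token_logprobs: Sequence[float]


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Standard perplexity: exp of the mean negative log-likelihood per token."""
    if not token_logprobs:
        raise ValueError("cannot compute perplexity of an empty sequence")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def reverse_perplexity_order(trajs: Sequence[Trajectory]) -> List[Trajectory]:
    """Sort trajectories from highest to lowest perplexity.

    ASSUMPTION: "reverse" is read here as descending perplexity; the paper
    may define the ordering differently.
    """
    return sorted(trajs, key=lambda t: perplexity(t.token_logprobs), reverse=True)


def curriculum_stages(ordered: Sequence[Trajectory], n_stages: int) -> List[List[Trajectory]]:
    """Split an ordered list into n_stages contiguous, near-equal chunks."""
    if n_stages < 1:
        raise ValueError("n_stages must be at least 1")
    size, extra = divmod(len(ordered), n_stages)
    stages, start = [], 0
    for i in range(n_stages):
        # The first `extra` stages each take one additional trajectory.
        end = start + size + (1 if i < extra else 0)
        stages.append(list(ordered[start:end]))
        start = end
    return stages


if __name__ == "__main__":
    demo = [
        Trajectory("a", [-0.1, -0.1]),
        Trajectory("b", [-2.0, -1.0]),
        Trajectory("c", [-0.5, -0.5, -0.5]),
    ]
    for stage in curriculum_stages(reverse_perplexity_order(demo), n_stages=2):
        print([t.traj_id for t in stage])
```

Sorting once and slicing into contiguous stages keeps the ordering logic separate from the training loop, so the direction of the curriculum can be flipped by changing a single `reverse` flag if the paper's definition turns out to be the opposite.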