arXiv: Process Reward Agents (PRA) lift a 4B model to 80.8% on MedQA, a new SOTA at small scale
Process Reward Agents (PRA) enable small frozen models (0.5B to 8B parameters) to significantly improve their medical reasoning without any training. With PRA, Qwen3-4B reaches 80.8% on MedQA, a new state of the art for models at its scale.