ArXiv PRA: 4B model achieves 80.8% on medical benchmark, new SOTA for small scale
Process Reward Agents enable small frozen models (0.5B-8B) to significantly improve medical reasoning without any training; Qwen3-4B achieves a new state-of-the-art of 80.8% on MedQA.