arXiv: SUPERNOVA: reinforcement learning on natural instructions improves reasoning by up to 52.8% (relative) on BBEH
A new paper, SUPERNOVA, reports that systematically curating existing instruction-tuning datasets can substantially improve reasoning in LLMs. Models trained on SUPERNOVA achieve up to a 52.8% relative improvement on the BBEH (BIG-Bench Extra Hard) benchmark.
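
Because the headline figure is a relative improvement, it is worth separating it from an absolute gain in benchmark points. The sketch below shows the arithmetic only; the function name `relative_improvement` and both scores are hypothetical values chosen for illustration and are not taken from the paper.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Return the improvement of `new` over `baseline` as a percentage of `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline * 100.0


# Hypothetical scores for illustration only; NOT results from the SUPERNOVA paper.
baseline_score = 20.0
new_score = 30.0

relative = relative_improvement(baseline_score, new_score)
absolute = new_score - baseline_score

print(f"relative improvement: {relative:.1f}%")      # 50.0%
print(f"absolute improvement: {absolute:.1f} points")  # 10.0 points
```

With these made-up numbers, a 10-point absolute gain corresponds to a 50% relative gain. The same relative figure implies a smaller absolute gain when the baseline score is low, so a relative improvement is best read alongside the baseline it is measured against.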