🤖 24 AI
🟡 🤖 Models Saturday, April 11, 2026 · 2 min read

ArXiv: SUPERNOVA shows reinforcement learning on natural instructions improves reasoning by up to 52.8% on BBEH

Why it matters

A new paper, SUPERNOVA, shows that systematic curation of existing instruction-tuning datasets can significantly improve reasoning in LLMs. Models trained on SUPERNOVA achieve up to a 52.8% relative improvement on the BBEH benchmark.

Leveraging existing data for better reasoning

Researchers have published SUPERNOVA, a framework built on the observation that existing instruction-tuning datasets contain “rich reasoning patterns” that can be systematically adapted for reinforcement learning. The reported result is a relative improvement of up to 52.8% on the BBEH benchmark over strong baselines such as Qwen3.5.
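Since the headline figure is a relative improvement rather than a gain in percentage points, a short calculation may help put it in context. The sketch below uses made-up scores chosen only to illustrate the arithmetic; they are not numbers from the paper, and the function name is hypothetical.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the gain of `improved` over `baseline` as a percentage of `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (improved - baseline) / baseline * 100.0


# Hypothetical benchmark accuracies (in percent), NOT taken from the paper.
baseline_score = 25.0
improved_score = 38.2

relative_gain = relative_improvement(baseline_score, improved_score)
absolute_gain = improved_score - baseline_score

print(f"relative: {relative_gain:.1f}%, absolute: {absolute_gain:.1f} points")
```

With these illustrative inputs, a 52.8% relative improvement corresponds to only 13.2 percentage points, which is why it is worth checking the absolute scores whenever a relative figure is quoted.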

Why is this important?

Two approaches currently dominate efforts to improve reasoning in LLMs:

  1. Synthetic data generation — generate new examples and train on them (expensive)
  2. Human-curated data — experts write new examples (expensive and slow)

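SUPERNOVA demonstrates a third way: use the data you already have (instruction-tuning sets), but systematically prepare it for reinforcement learning with verifiable rewards, meaning rewards that can be checked automatically against a known answer. Because no new examples have to be generated or written, this should be significantly cheaper and faster.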
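To make "verifiable rewards" concrete, here is a minimal sketch of how an instruction-tuning example with a known reference answer might be scored automatically. It is an illustration under stated assumptions, not the paper's implementation: the `Answer:` marker convention, the normalization rules, and all function names are hypothetical.

```python
import re
from typing import Optional


def normalize(answer: str) -> str:
    """Lowercase, trim, collapse whitespace, and drop trailing periods."""
    text = answer.strip().lower()
    text = re.sub(r"\s+", " ", text)
    return text.rstrip(".")


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the text after the last 'Answer:' marker, or None if absent.

    The 'Answer:' convention is an assumption made for this sketch.
    """
    matches = re.findall(r"answer:\s*(.+)", completion, flags=re.IGNORECASE)
    return matches[-1] if matches else None


def verifiable_reward(completion: str, reference: str) -> float:
    """Binary reward: 1.0 if the extracted answer matches the reference, else 0.0."""
    predicted = extract_final_answer(completion)
    if predicted is None:
        return 0.0
    return 1.0 if normalize(predicted) == normalize(reference) else 0.0


print(verifiable_reward("Let me think step by step.\nAnswer: Paris.", "paris"))
print(verifiable_reward("I believe it is Paris", "paris"))
```

The point of the design is that the reward needs no human judge or learned reward model: any dataset whose examples carry a checkable reference answer can, in principle, be reused this way.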

Methodology

The authors conducted more than 100 controlled experiments analyzing three key factors:

  1. Source task selection — which tasks best transfer knowledge to the target domain
  2. Task mixing strategies — optimal combinations of training data
  3. Synthetic interventions — targeted modifications to improve data quality

The key finding: selecting source tasks by how much they help each individual target task outperforms selecting them by their average performance across targets. In other words, do not go for a “balanced” approach; choose the tasks that concretely help your goal.
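The difference between the two selection strategies can be sketched with a small table of transfer scores. Everything below is hypothetical: the source task names, the scores, and the helper functions are invented for illustration, and the paper's actual selection procedure may differ.

```python
from statistics import mean

# Hypothetical transfer scores: transfer[source][target] is a made-up gain
# on the target benchmark after training on that source task alone.
transfer = {
    "logic_puzzles": {"bbeh": 6.0, "zebralogic": 9.0, "mmlu_pro": 0.5},
    "math_word":     {"bbeh": 4.0, "zebralogic": 1.0, "mmlu_pro": 2.0},
    "trivia_qa":     {"bbeh": 0.5, "zebralogic": 0.0, "mmlu_pro": 5.0},
    "summarization": {"bbeh": 3.0, "zebralogic": 3.0, "mmlu_pro": 3.0},
}


def select_by_average(scores: dict, k: int) -> list:
    """Pick the k source tasks with the best mean gain across all targets."""
    ranked = sorted(scores, key=lambda s: mean(scores[s].values()), reverse=True)
    return ranked[:k]


def select_by_target(scores: dict, target: str, k: int) -> list:
    """Pick the k source tasks with the best gain on one specific target."""
    ranked = sorted(scores, key=lambda s: scores[s][target], reverse=True)
    return ranked[:k]


print("average-based:", select_by_average(transfer, 2))
for target in ("bbeh", "zebralogic", "mmlu_pro"):
    print(f"best for {target}:", select_by_target(transfer, target, 2))
```

In this toy table the average-based ranking never picks `trivia_qa`, even though it is the strongest source for `mmlu_pro`. That is the kind of loss a "balanced" selection can incur when the goal is a specific target.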

Performance

Testing was conducted on several challenging benchmarks:

  • BBEH — complex multi-step reasoning
  • ZebraLogic — logical inference
  • MMLU-Pro — extended knowledge across domains

Code and data are publicly available on GitHub, which means other research groups can reproduce and build on the results.

Broader implications

The “use what exists, don’t create new” trend matters for the democratization of AI research. If the results hold up, you don’t need the billion-dollar budget of OpenAI or Anthropic to improve reasoning meaningfully; datasets that already exist on Hugging Face and other platforms can be enough.

For small AI labs and open-source projects, the SUPERNOVA approach could be what brings them closer to the performance of frontier models.

🤖 This article was generated using artificial intelligence from primary sources.