
πŸ€– Models

5 articles

🟑 πŸ€– Models April 14, 2026 Β· 2 min read

arXiv: Process Reward Agents β€” real-time feedback improves AI reasoning in medicine without retraining

Researchers have introduced Process Reward Agents (PRA), a new approach that provides step-by-step feedback while an AI model reasons through medical problems. The system works with existing models without retraining and delivers substantial gains on medical benchmarks.
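The source only states that feedback is given per reasoning step; the sketch below is a generic, hypothetical illustration of that idea, not the paper's actual method. Candidate continuations for each step are scored by a stand-in reward function, and the best one is kept β€” the frozen base model is never updated.

```python
# Hypothetical sketch of process-reward-guided reasoning. All names here
# (score_step, guided_reasoning) are illustrative, not the paper's API.

def score_step(partial_chain: list[str], candidate: str) -> float:
    # Stand-in for a process reward model: favours short, conclusion-bearing
    # steps. A real scorer would be a learned model judging each step.
    return ("therefore" in candidate) - 0.01 * len(candidate)

def guided_reasoning(candidates_per_step: list[list[str]]) -> list[str]:
    # Greedily pick the highest-scored candidate at every reasoning step;
    # the generator that proposed the candidates is left untouched.
    chain: list[str] = []
    for candidates in candidates_per_step:
        best = max(candidates, key=lambda c: score_step(chain, c))
        chain.append(best)
    return chain

steps = [
    ["patient has fever and cough", "irrelevant digression about weather"],
    ["therefore suspect respiratory infection", "order unrelated imaging"],
]
print(guided_reasoning(steps))
```

Because the reward signal acts only on which candidate step is kept, this kind of loop can wrap any existing model without retraining β€” which is the property the article highlights.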

🟑 πŸ€– Models April 13, 2026 Β· 1 min read

arXiv PRA: 4B model achieves 80.8% on medical benchmark β€” new SOTA for small scale

Process Reward Agents enable small frozen models (0.5B-8B) to significantly improve medical reasoning without any training β€” Qwen3-4B achieves a new state-of-the-art of 80.8% on MedQA.

🟑 πŸ€– Models April 13, 2026 Β· 1 min read

arXiv SPPO: Sequence-level PPO solves the credit assignment problem in long reasoning chains

Sequence-Level PPO reformulates LLM reasoning as a contextual bandit problem, matching the performance of expensive group-based methods like GRPO with dramatically fewer resources β€” and without multi-sampling.
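To see why dropping multi-sampling saves resources, it helps to contrast the advantage computations. GRPO estimates a baseline by sampling a group of completions per prompt and subtracting the group's mean reward; a sequence-level bandit formulation instead treats the whole completion as a single action and can baseline one sampled reward against a value estimate. The toy sketch below illustrates that contrast with made-up numbers; it is not code from the paper.

```python
import numpy as np

# Toy contrast: GRPO-style group baseline vs. a single-sample
# sequence-level baseline. Rewards are illustrative placeholders.

def grpo_advantages(group_rewards: np.ndarray) -> np.ndarray:
    # GRPO: k sampled completions per prompt; each advantage is the
    # reward minus the group mean (often also normalized by the std).
    return group_rewards - group_rewards.mean()

def bandit_advantage(reward: float, value_estimate: float) -> float:
    # Sequence-level bandit view: one completion, one scalar reward,
    # with the baseline coming from a value estimate instead of
    # k - 1 extra generations.
    return reward - value_estimate

rewards = np.array([1.0, 0.0, 0.0, 1.0])  # requires 4 generations per prompt
print(grpo_advantages(rewards))
print(bandit_advantage(1.0, 0.5))          # requires just 1 generation
```

The resource gap comes from the sampling budget: the group method pays for k rollouts per prompt just to form a baseline, while the bandit formulation pays for one.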

🟑 πŸ€– Models April 11, 2026 Β· 2 min read

arXiv SUPERNOVA: reinforcement learning on natural instructions improves reasoning by up to 52.8%

A new paper, SUPERNOVA, shows that systematic curation of existing instruction-tuning datasets can significantly improve reasoning in LLMs. Models trained on SUPERNOVA achieve up to a 52.8% relative improvement on the BBEH benchmark.

🟒 πŸ€– Models April 10, 2026 Β· 2 min read

Sentence Transformers v5.4 adds support for multimodal embedding and reranker models

Version 5.4 of HuggingFace's Sentence Transformers library introduces support for multimodal embedding and reranker models. Users can now map text, images, audio, and video into a shared embedding space and run cross-modal similarity search β€” unifying retrieval across different content types.
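Once every modality lives in one embedding space, cross-modal search reduces to cosine similarity between vectors. The sketch below shows just that retrieval step with toy, pre-computed vectors standing in for real model outputs (so it runs without downloading any model).

```python
import numpy as np

# Cross-modal retrieval in a shared embedding space: a text query vector
# is compared against image/audio vectors by cosine similarity. The
# vectors below are toy stand-ins for actual encoder outputs.

def cosine_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # L2-normalize both sides, then a matrix product gives all pairwise
    # cosine similarities at once.
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

text_query = np.array([[0.9, 0.1, 0.0]])   # pretend embedding of "a cat photo"
media = np.array([
    [0.8, 0.2, 0.1],   # pretend embedding of a cat image
    [0.0, 0.1, 0.9],   # pretend embedding of traffic-noise audio
])
scores = cosine_sim(text_query, media)[0]
best = int(scores.argmax())
print(best)  # index of the most similar item, regardless of its modality
```

With the real library, the encoding step would use the model's `encode` method per modality and `sentence_transformers.util.cos_sim` for scoring; the fixed vectors above only replace the (download-heavy) encoding step, not the retrieval logic.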