🟡 📦 Open Source Published: · 1 min read ·

vLLM: Semantic Router Fusion Combines a Model Panel with a Judge That Synthesizes a Single Response

Editorial illustration: panel of AI models and a judge model synthesizing a single response

vLLM introduced Semantic Router Fusion, a primitive in which multiple models work in parallel as a panel, and a judge model analyzes consensus and differences to synthesize a single response. It supports local vLLM and private endpoints as well as public providers such as Gemini, Kimi, DeepSeek, and Claude. External validation on OpenRouter DRACO showed 69% for the fused panel versus 65.3% for the best single model, with full OpenAI API compatibility.

🤖

This article was generated using artificial intelligence from primary sources.

vLLM, the popular large language model serving library, introduced Semantic Router Fusion, a mechanism that combines multiple models into a single response.

How does Fusion work?

Fusion is a primitive in which a panel of models runs in parallel, and a dedicated judge model then analyzes the consensus and differences among the responses and synthesizes a single final output. The pipeline has clear steps: panel execution, judge analysis, synthesis, and trace recording (tokens, errors, route). The approach resembles a “council” of models making a better decision than any individual member.

Which models and interfaces does it support?

Fusion supports local vLLM instances and private endpoints, as well as public providers such as Gemini, Kimi, DeepSeek, and Claude. It offers three input routes (vllm-sr/auto, vllm-sr/fusion, and a request-level plugin) with full OpenAI API compatibility, fitting into existing integrations without major code changes.

How much does it improve results?

External validation on the OpenRouter DRACO benchmark showed 69% for the fused panel versus 65.3% for the best single model. The gain confirms the idea that aggregating multiple models with a judge can outperform each model individually, which is useful for tasks where accuracy matters more than latency and cost.

Frequently Asked Questions

How does Semantic Router Fusion work?
A panel of models runs in parallel; a judge model analyzes consensus and differences and synthesizes a single final response.
How much better is the fused panel?
69% on OpenRouter DRACO validation versus 65.3% for the best single model.