vLLM: open-source inference engine takes first place on the Artificial Analysis leaderboard
vLLM is an open-source inference engine that claimed first place on the Artificial Analysis leaderboard for three frontier models: DeepSeek V3.2, MiniMax-M2.5, and Qwen 3.5 397B. The gains came from aggressive kernel fusion (33→10 kernel launches per layer, for a 1.28× speedup), a custom EAGLE3 draft model for speculative decoding, and optimizations to the linear attention path.
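
For readers unfamiliar with speculative decoding, the general draft-and-verify idea can be sketched as below. This is a toy illustration of the generic greedy variant only. It is not vLLM's code and not the EAGLE3 method, which drafts from the target model's internal features. Every name here (`speculative_step`, `generate`, `toy_target`, `toy_draft`) is hypothetical, and the "models" are simple functions standing in for real networks.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy "model": context -> next token


def speculative_step(target_next: NextToken, draft_next: NextToken,
                     context: List[Token], k: int) -> List[Token]:
    """Run one draft-and-verify round and return the tokens it commits."""
    # Draft phase: the cheap model proposes k tokens autoregressively.
    proposed: List[Token] = []
    for _ in range(k):
        proposed.append(draft_next(context + proposed))

    # Verify phase: a real engine would score all k positions in one batched
    # pass of the target model; this sketch loops for clarity instead.
    accepted: List[Token] = []
    for tok in proposed:
        expected = target_next(context + accepted)
        if tok != expected:
            # First mismatch: keep the target's token and drop the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # Every draft token matched, so the target contributes one extra token.
    accepted.append(target_next(context + accepted))
    return accepted


def generate(target_next: NextToken, draft_next: NextToken,
             prompt: List[Token], max_new: int, k: int) -> List[Token]:
    """Generate max_new tokens by repeating speculative rounds."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        out.extend(speculative_step(target_next, draft_next, out, k))
    return out[:len(prompt) + max_new]


def toy_target(ctx: List[Token]) -> Token:
    # Stand-in for the large model: counts upward modulo 10.
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    # Stand-in for the draft model: agrees with the target except after a 4.
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    print(speculative_step(toy_target, toy_draft, [1], k=4))
    print(generate(toy_target, toy_draft, [1], max_new=8, k=4))
```

Under greedy decoding, the committed tokens match what the target model would have produced on its own. The benefit is that several tokens can be committed per target-model pass whenever the draft model guesses correctly.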