ArXiv: AAAI-26 Generated AI Reviews for All 22,977 Papers, and Reviewers Rated Them Higher Than Human Reviews on Key Criteria
Why it matters
AAAI-26 carried out the first AI-assisted peer review experiment at conference scale — all 22,977 submitted papers received one clearly labeled AI-generated review alongside human reviews. Program committee members rated AI reviews higher than human reviews for technical accuracy and research suggestions.
What Exactly Happened at AAAI-26?
AAAI-26, the conference of the Association for the Advancement of Artificial Intelligence and one of the world's most important artificial intelligence venues, conducted an unprecedented experiment. All 22,977 papers submitted to the main track received one AI-generated review alongside the standard human reviews. The AI reviews were clearly labeled, so reviewers and authors knew they came from a machine.
The system used advanced large language models (LLMs) with tool integration and safety measures, and all reviews were generated within a single day — drastically faster than the human process, which typically takes weeks.
The Surprising Result: AI Outperformed Humans
According to a survey of program committee members and paper authors, AI reviews were rated higher than human reviews in two key categories: technical accuracy and the quality of research suggestions.
This does not mean that AI reviews are perfect or that they can replace human reviewers. The experiment was designed as a supplement, not a replacement: each paper still went through the standard human review process. Still, the fact that participants rated AI feedback higher than the average human review on these measures raises important questions about the future of academic publishing.
The researchers also developed a new evaluation benchmark that shows the system significantly outperforms a baseline LLM approach in identifying scientific weaknesses — suggesting that a specialized tool-assisted approach yields better results than simply sending a paper to a language model.
Why Does This Matter for the Academic Community?
Academic publishing faces a mounting problem: conference submissions are growing exponentially, while the number of qualified reviewers is not keeping pace. The result is superficial reviews, long waits, and inconsistent standards.
AI reviews do not solve the problem entirely, but they can serve as a first filter that gives authors quick, technical feedback while they await human reviews. For program committees, AI can identify obvious problems in papers — from mathematical errors to missing references — freeing human reviewers for deeper analytical tasks.
The paper’s authors — Joydeep Biswas, Sheila Schoepp, and Gautham Vasan — conclude that “state-of-the-art AI methods can already significantly contribute to scientific review at conference scale,” pointing future research toward improved human–AI collaboration in research evaluation.
This article was generated using artificial intelligence from primary sources.