arXiv:2605.06651: AI Co-Mathematician sets FrontierMath record

The Google DeepMind team has published a paper on the AI Co-Mathematician, an interactive workspace where agents collaborate with mathematicians on open problems. The system achieved 48% on the FrontierMath Tier 4 benchmark — a new record among all AI systems.

Google DeepMind researchers posted the paper, titled “AI Co-Mathematician: Accelerating Mathematicians with Agentic AI”, to arXiv on 7 May 2026. The system is not an autonomous theorem prover: it is an interactive workspace in which AI agents collaborate with mathematicians on open research problems.

What is the AI Co-Mathematician?

The system is an interactive research workspace designed to support open-ended mathematical inquiry. It covers five kinds of work: ideation (conceptual development), literature search, computational exploration, theorem proving and theory building. The authors describe the design as “holistic support for the exploratory and iterative reality of mathematical workflows”, with a collaboration model that “mirrors human collaborative processes”. The emphasis is on partnership, not automation.

How does the workspace technically operate?

The workspace is asynchronous and stateful: the agent can work on hypotheses in the background while the researcher does something else, and context persists across sessions. The system performs four operational functions: uncertainty management, refinement of user intent, tracking of failed hypotheses so the same attempts are not repeated, and generation of mathematical artefacts in standard formats (LaTeX, Lean proofs, computational notebooks).
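The failed-hypothesis tracking and cross-session persistence described above can be illustrated with a small sketch. This is not the paper's implementation, which the article does not detail; the class name `HypothesisLedger`, the JSON file layout, and the whitespace-and-case normalization rule are all assumptions made for illustration only.

```python
import json
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


def normalize(statement: str) -> str:
    """Collapse whitespace and case so trivially different phrasings match."""
    return " ".join(statement.lower().split())


@dataclass
class HypothesisLedger:
    """Hypothetical record of failed hypotheses, persisted between sessions."""

    path: Path
    # Maps a normalized hypothesis statement to the reason it failed.
    failed: dict[str, str] = field(default_factory=dict)

    @classmethod
    def load(cls, path: Path) -> "HypothesisLedger":
        """Restore state from disk if a previous session saved any."""
        ledger = cls(path=path)
        if path.exists():
            ledger.failed = json.loads(path.read_text(encoding="utf-8"))
        return ledger

    def record_failure(self, statement: str, reason: str) -> None:
        """Store a failed attempt and write it to disk immediately."""
        self.failed[normalize(statement)] = reason
        self.path.write_text(json.dumps(self.failed, indent=2), encoding="utf-8")

    def filter_new(self, candidates: list[str]) -> list[str]:
        """Keep only candidates that have not already been ruled out."""
        return [c for c in candidates if normalize(c) not in self.failed]


def demo() -> list[str]:
    """Simulate two sessions sharing one ledger file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "ledger.json"

        # Session 1: an attempt fails and is recorded.
        first = HypothesisLedger.load(path)
        first.record_failure(
            "Every  bounded sequence   converges", "counterexample: (-1)^n"
        )

        # Session 2: a fresh object reloads the state and skips the repeat.
        second = HypothesisLedger.load(path)
        return second.filter_new(
            [
                "Every bounded sequence converges",
                "Every Cauchy sequence of reals converges",
            ]
        )


if __name__ == "__main__":
    print(demo())
```

In the sketch, the second session drops the hypothesis that already failed and keeps only the untested one. A real system would need a far more robust notion of when two hypotheses are "the same" than string normalization.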

What does 48% on FrontierMath Tier 4 mean?

FrontierMath is a benchmark of closed, unpublished problems constructed by PhD-level mathematicians; Tier 4 is the hardest level and requires research-level mathematics, not just olympiad-level problem solving. A score of 48% is a new record among all AI systems evaluated to date and a significant jump over previously published results. The authors note that early testing with selected mathematicians has already helped solve open problems, suggesting that the benchmark result reflects real utility in research.

What does this change for the mathematical community?

The paper positions AI not as a replacement for the researcher but as a partner that accelerates the research cycle. Because the workspace is asynchronous and tracks failed hypotheses, a mathematician can delegate exploration and return later to the results, a pattern similar to how agentic development tools are used in software. One question the paper leaves open is whether the system will be made publicly available or remain an internal Google research tool. Among the 18 authors are Daniel Zheng, Ingrid von Glehn, Yori Zwols, Pushmeet Kohli and Fernanda Viegas.

Frequently Asked Questions

What is FrontierMath Tier 4?

FrontierMath is a benchmark of hundreds of exceptionally hard mathematical problems; Tier 4 is the highest level, requiring research-level mathematics at PhD standard. Previous systems scored well below 48%.

Who are the authors of the paper?

A Google DeepMind team of 18 authors, including Daniel Zheng, Ingrid von Glehn, Yori Zwols, Pushmeet Kohli and Fernanda Viegas.

Is the system publicly available?

The paper describes early testing with selected mathematicians; neither public availability nor an API is announced in the abstract.
