arXiv:2605.05191: LongSeeker achieves 61.5% on BrowseComp

Researchers introduced LongSeeker, a long-horizon search agent built on the Context-ReAct framework, which adds five dynamic context management operations. The model achieves 61.5% on the BrowseComp benchmark, outperforming Tongyi DeepResearch (43.2%) by 18.3 percentage points.

On May 6, 2026, a team from Chinese universities (authors: Yijun Lu, Rui Ye, Yuwen Du, Jiajun Wang, Songhua Liu, Siheng Chen) published the paper arXiv:2605.05191, which introduces LongSeeker, a long-horizon search agent based on the Context-ReAct framework.

Five dynamic operations over the working context

The core idea of the Context-ReAct framework is that the agent should not give every part of its trajectory equal weight. The paper states that “parts of the trajectory are maintained at different levels of detail” depending on their relevance to the current step. The framework introduces five operations over the working context:

Skip — bypassing irrelevant steps.
Compress — summarizing longer segments into shorter representations.
Rollback — returning to an earlier trajectory node if the current branch does not lead to the goal.
Snippet — retaining a concrete excerpt from a retrieved page.
Delete — removing erroneous or outdated content from the context.

Each of these operations helps protect the agent from context overflow, a chronic problem in agentic systems that run over long sequences of steps.
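The five operations listed above can be illustrated with a small Python sketch. This is not the paper's implementation: the `Entry` and `WorkingContext` classes, the method signatures, and the rule that a snippet must be a substring of the stored page are all assumptions made for illustration, based only on the one-line descriptions in this article. In a real agent the model itself would decide when to call each operation and would write the summary text.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    step: int  # index of the agent step that produced this entry
    text: str  # what the context currently keeps for that step


@dataclass
class WorkingContext:
    """Hypothetical working context; names and semantics are assumptions."""
    entries: list[Entry] = field(default_factory=list)

    def add(self, step: int, text: str) -> None:
        # Plain ReAct behaviour: store the raw content of the step.
        self.entries.append(Entry(step, text))

    def skip(self, step: int, text: str) -> None:
        # Skip: the step is judged irrelevant, so nothing is stored.
        return None

    def compress(self, first: int, last: int, summary: str) -> None:
        # Compress: replace steps first..last with one shorter summary entry.
        kept = [e for e in self.entries if not first <= e.step <= last]
        kept.append(Entry(first, summary))
        self.entries = sorted(kept, key=lambda e: e.step)

    def rollback(self, step: int) -> None:
        # Rollback: return to an earlier node by dropping all later entries.
        self.entries = [e for e in self.entries if e.step <= step]

    def snippet(self, step: int, excerpt: str) -> None:
        # Snippet: keep only a concrete excerpt of the page stored at `step`.
        for e in self.entries:
            if e.step == step:
                if excerpt not in e.text:
                    raise ValueError("excerpt must come from the stored page")
                e.text = excerpt

    def delete(self, step: int) -> None:
        # Delete: remove erroneous or outdated content.
        self.entries = [e for e in self.entries if e.step != step]

    def size(self) -> int:
        # Character count as a crude stand-in for token count.
        return sum(len(e.text) for e in self.entries)
```

In this sketch every operation either leaves `size()` unchanged or shrinks it (for `compress`, as long as the summary is shorter than what it replaces), which is the property that keeps the context from growing with every step.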

Performance and comparison

LongSeeker is fine-tuned on a Qwen3-30B-A3B base using 10,000 synthesized trajectories. On the BrowseComp benchmark it achieves 61.5%, and on the Chinese variant BrowseComp-ZH 62.5%. Competitors fall significantly short: Tongyi DeepResearch achieves 43.2% and 46.7% respectively, while AgentFold reaches 36.2% and 47.3%. The margin over Tongyi DeepResearch is 18.3 percentage points on BrowseComp and 15.8 on BrowseComp-ZH; over AgentFold it is 25.3 and 15.2 percentage points.
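The margins follow directly from the scores quoted above. The short Python check below recomputes them; the `scores` dictionary and the `margins` helper are illustrative names, and the numbers are only those reported in this article.

```python
# (BrowseComp, BrowseComp-ZH) accuracy in percent, as quoted in the article.
scores = {
    "LongSeeker": (61.5, 62.5),
    "Tongyi DeepResearch": (43.2, 46.7),
    "AgentFold": (36.2, 47.3),
}


def margins(reference: str, table: dict) -> dict:
    """Percentage-point lead of `reference` over every other system."""
    ref = table[reference]
    return {
        name: tuple(round(r - s, 1) for r, s in zip(ref, vals))
        for name, vals in table.items()
        if name != reference
    }


for name, (en, zh) in margins("LongSeeker", scores).items():
    print(f"{name}: +{en} pts on BrowseComp, +{zh} pts on BrowseComp-ZH")
```

Note that these are differences in percentage points, not relative improvements.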

Why does this matter?

Most ReAct-based agents suffer from linear context growth: each step appends its raw content to everything accumulated from previous steps. Context-ReAct introduces explicit context management as a first-class operation, similar to how a programmer manages memory. This opens the path to long-horizon agent sessions that are not bounded by the size of the underlying model’s context window.
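The difference between append-only growth and a managed context can be seen in a toy simulation. The sketch below is an assumption-laden illustration, not the policy described in the paper: the step sizes, the `budget`, and the rule "collapse everything into one summary of `summary_size` tokens once the budget is exceeded" are all hypothetical.

```python
def append_only(step_sizes):
    """Context size after each step when raw content is always appended."""
    total, history = 0, []
    for size in step_sizes:
        total += size
        history.append(total)
    return history


def with_compression(step_sizes, budget, summary_size):
    """Same loop, but the context is collapsed into one summary
    of `summary_size` tokens whenever it would exceed `budget`."""
    total, history = 0, []
    for size in step_sizes:
        total += size
        if total > budget:
            total = summary_size
        history.append(total)
    return history


steps = [1000] * 10  # ten steps of 1,000 tokens each (made-up numbers)
print(append_only(steps))
print(with_compression(steps, budget=4000, summary_size=500))
```

Under these assumptions the append-only context grows with the number of steps, while the managed one stays at or below the budget regardless of how long the session runs.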

Frequently Asked Questions

What is BrowseComp?

BrowseComp is a benchmark for agents solving complex long-horizon web search tasks — multiple steps, multiple pages, integrating findings into a final answer.

What are the five operations in the Context-ReAct framework?

Skip, Compress, Rollback, Snippet, and Delete — operations that allow the agent to adaptively reshape its working context during a long-horizon task.

How does LongSeeker compare to others?

It achieves 61.5% on BrowseComp and 62.5% on BrowseComp-ZH, versus 43.2% / 46.7% for Tongyi DeepResearch and 36.2% / 47.3% for AgentFold.
