SEA: Agents That Self-Modify With Formal Safety Guarantees in Real Time
The SEA (Self-Evolving Agents with Anytime-Valid Certificates) architecture lets agents update their own parameters while retaining formal learning-theoretic guarantees. Five verification mechanisms and auditable statistical certificates approve or block each self-modification in real time. With strong base models, SEA solved 4 to 5 more instances than a no-op control on a 52-instance subset of SWE-bench Verified.
This article was generated using artificial intelligence from primary sources.
Researcher Biswa Sengupta published the SEA (Self-Evolving Agents with Anytime-Valid Certificates) architecture on July 1, 2026, addressing one of the fundamental tensions in AI agent development: how to enable self-modification without sacrificing formal safety guarantees.
The problem of uncontrolled self-modification
Agents that can update their own weights or governance mechanisms violate the foundational assumptions of classical learning theory. When an agent generates its own training data and also evaluates the quality of that data, standard statistical frameworks cease to hold, because no independent evaluator exists to confirm that a change is beneficial.
Previous approaches sidestepped the problem in one of two ways: prohibiting self-modification, which gives up the potential for adaptation, or permitting unconstrained self-improvement, which means accepting unpredictable behavior.
SEA: architecture with formal gates
SEA resolves this tension in three layers: containment, verification, and statistical certification.
First, blast-radius containment: all self-modifications are restricted to a steering adapter that surrounds the frozen base model. The underlying model weights never change, so even a completely erroneous self-modification leaves the model's base capability intact.
Second, five verification mechanisms generate approval or rejection signals for each modification without requiring an external evaluator:
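The containment idea can be illustrated with a minimal sketch. This is not SEA's implementation: the class `SteeredAgent`, its methods, and the additive scoring rule are hypothetical names and simplifications chosen for illustration. The sketch only shows the structural point made above, that self-edits land in an adapter while the base weights stay read-only.

```python
from types import MappingProxyType


class SteeredAgent:
    """Hypothetical wrapper: a frozen base model plus a mutable steering adapter."""

    def __init__(self, base_weights: dict):
        # Read-only view over a private copy; writing through it raises TypeError.
        self.base = MappingProxyType(dict(base_weights))
        # The adapter is the only state a self-modification may touch.
        self.adapter = {}

    def apply_self_edit(self, delta: dict) -> None:
        """Record a self-modification in the adapter only."""
        self.adapter.update(delta)

    def rollback(self) -> None:
        """Discard every self-edit; the base behavior is recovered exactly."""
        self.adapter.clear()

    def score(self, feature: str) -> float:
        """Toy forward pass (an assumption): base weight plus adapter offset."""
        return self.base.get(feature, 0.0) + self.adapter.get(feature, 0.0)


agent = SteeredAgent({"w": 1.0})
agent.apply_self_edit({"w": -5.0})   # a badly wrong self-edit
steered = agent.score("w")           # behavior is affected while the edit is live
agent.rollback()
restored = agent.score("w")          # base capability is intact after rollback
```

Because the worst case is bounded by what the adapter can express, clearing the adapter is always a complete recovery path in this sketch.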
- Best-of-N selection — compares multiple candidate modifications
- Micro-step search — fine-grained search over the adaptation space
- Self-written oracles — the agent constructs its own tests for its own modifications
- Search layer control — oversight of the depth and direction of the search
- Self-repair — real-time detection and correction of regressions
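As a rough illustration of how two of these mechanisms could combine, the sketch below runs Best-of-N selection over candidate modifications and scores each one with agent-written oracles. The function `best_of_n`, the representation of oracles as boolean callables, and the "must strictly beat the baseline" rule are assumptions for illustration; the source does not specify how SEA implements either mechanism.

```python
from typing import Callable, Optional, Sequence, TypeVar

C = TypeVar("C")


def best_of_n(
    candidates: Sequence[C],
    oracles: Sequence[Callable[[C], bool]],
    baseline_passes: int,
) -> Optional[C]:
    """Return the candidate passing the most oracles, or None to reject all.

    A candidate is accepted only if it passes strictly more oracles than the
    current (unmodified) agent, given here as `baseline_passes`.
    """
    best: Optional[C] = None
    best_passes = baseline_passes
    for candidate in candidates:
        passes = sum(1 for oracle in oracles if oracle(candidate))
        if passes > best_passes:
            best, best_passes = candidate, passes
    return best


# Toy usage: candidates are integers, oracles are simple predicates.
oracles = [lambda c: c > 1, lambda c: c % 2 == 1]
chosen = best_of_n([1, 2, 3], oracles, baseline_passes=1)
rejected = best_of_n([1, 2, 3], oracles, baseline_passes=2)
```

Returning `None` when nothing beats the baseline gives the selector a built-in rejection signal, which matches the approve-or-reject role described above.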
What are “anytime-valid certificates”?
The third layer is statistical: SEA uses anytime-valid statistical gates that emit an auditable certificate for each proposed self-modification. The certificate confirms that the modification does not exceed a predetermined error budget — at any point in the process, not just at the end of an evaluation period.
“Anytime-valid” means the conclusion holds regardless of when the evaluation is stopped — no predetermined number of steps is required. This matters for deployment scenarios where the agent operates in real time and must continuously make decisions about self-modification.
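One standard way to build such a gate is a likelihood-ratio test supermartingale combined with Ville's inequality. The sketch below uses that generic construction; the source does not say which construction SEA uses, and the class `AnytimeValidGate` and the parameters `p0`, `p1`, and `alpha` are illustrative assumptions. The null hypothesis is that the modified agent fails an evaluation with probability at least `p0`. The gate approves once the accumulated evidence against that hypothesis reaches `1 / alpha`, which bounds the chance of approving a bad modification by `alpha` no matter when evaluation stops.

```python
class AnytimeValidGate:
    """Sequential gate for H0: failure probability >= p0 (modification is bad)."""

    def __init__(self, p0: float, p1: float, alpha: float):
        if not 0.0 < p1 < p0 < 1.0:
            raise ValueError("need 0 < p1 < p0 < 1")
        if not 0.0 < alpha < 1.0:
            raise ValueError("need 0 < alpha < 1")
        self.p0 = p0          # largest failure rate still considered unacceptable
        self.p1 = p1          # failure rate hoped for under a good modification
        self.alpha = alpha    # error budget for wrongly approving a bad change
        self.e_value = 1.0    # running evidence against H0
        self.n = 0
        self.approved = False

    def observe(self, failed: bool) -> bool:
        """Fold in one evaluation outcome and return the current decision."""
        if failed:
            self.e_value *= self.p1 / self.p0
        else:
            self.e_value *= (1.0 - self.p1) / (1.0 - self.p0)
        self.n += 1
        # Under H0 the running product is a nonnegative supermartingale, so by
        # Ville's inequality it ever reaches 1/alpha with probability <= alpha.
        if self.e_value >= 1.0 / self.alpha:
            self.approved = True  # latched: the bound covers the whole path
        return self.approved

    def certificate(self) -> dict:
        """Auditable record of the evidence behind the current decision."""
        return {
            "observations": self.n,
            "e_value": self.e_value,
            "threshold": 1.0 / self.alpha,
            "alpha": self.alpha,
            "approved": self.approved,
        }


gate = AnytimeValidGate(p0=0.5, p1=0.25, alpha=0.05)
decisions = [gate.observe(failed=False) for _ in range(8)]
record = gate.certificate()
```

With these illustrative parameters each clean evaluation multiplies the evidence by 1.5, so approval needs eight consecutive clean evaluations: 1.5^7 is about 17.1, below the threshold of 20, while 1.5^8 is about 25.6. A failure halves the evidence, which delays approval rather than invalidating the guarantee.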
Results on SWE-bench Verified
SEA was tested on a 52-instance subset of SWE-bench Verified across four base models. The key finding: the base model is the dominant factor — SEA amplifies the capability of strong models but does not mask the weaknesses of weak ones.
Against a no-op control, SEA solved 4 to 5 additional instances on strong base models: GLM improved from 24 to 28 solved instances (+4) and GPT from 29 to 34 (+5). Event logs confirmed that the verification mechanisms actively prevented performance regressions during testing.
The paper notes that evaluations were conducted in a single iteration because of task cost; measuring variance across repeated runs is left to future research.
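To put the reported counts in proportion, the short calculation below converts them to solve rates on the 52-instance subset. Only the GLM and GPT figures are used, since those are the only counts given; the function name `summarize` and the dictionary layout are illustrative.

```python
SUBSET_SIZE = 52  # instances in the SWE-bench Verified subset

# (solved by the no-op control, solved with SEA), as reported above
reported = {"GLM": (24, 28), "GPT": (29, 34)}


def summarize(before: int, after: int, total: int = SUBSET_SIZE) -> dict:
    """Absolute gain plus before/after solve rates in percent (one decimal)."""
    return {
        "gain": after - before,
        "before_pct": round(100 * before / total, 1),
        "after_pct": round(100 * after / total, 1),
    }


summary = {model: summarize(*counts) for model, counts in reported.items()}
for model, row in summary.items():
    print(model, row)
```

On this subset GLM moves from about 46.2% to 53.8% solved and GPT from about 55.8% to 65.4%. With 52 instances and a single run per configuration, each additional solved instance shifts the rate by almost two percentage points, so these differences should be read alongside the variance caveat above.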
SEA suggests that self-improvement and safety governance need not be in conflict: formal certification is both possible and practically useful within the boundaries of an operational agent.
Frequently Asked Questions
- What makes SEA different from previous self-improving agents?
- SEA does not permit unconstrained self-modification — every change passes through anytime-valid statistical gates that emit auditable certificates and block modifications that would exceed a predetermined error budget.
- How is the “blast radius” of a bad self-modification bounded?
- SEA restricts all modifications exclusively to a steering adapter around the frozen base model, so that potentially harmful self-edits cannot change the underlying model weights.
- How much did SEA improve performance on SWE-bench testing?
- On a 52-instance SWE-bench Verified subset tested on four base models, SEA achieved +4 to +5 additional solved instances on strong base models — GLM improved from 24 to 28, GPT from 29 to 34.