AI agents gain formal policy-as-code

Researchers have developed a pipeline that automatically translates natural-language instructions for AI agents, MCP tool descriptions, and policy documents into formally verified code, using Cedar Policy Language and an LLM generator-critic loop. The resulting policies cover significantly more of the original specification than manually written symbolic enforcement.

The gap between promises and guarantees

Most of today’s safety measures for AI agents rest on probabilistic guardrails: constraints imposed through system prompts and fine-tuning, which come with no formal guarantees. Mondl, Maisel, and Brock are not satisfied with that. Their work, accepted at the AIWILD workshop at ICML 2026, introduces an automated pipeline that translates natural-language policy documents, prompts, and MCP tool descriptions into Cedar Policy Language, a policy language built for automated analysis that Amazon uses for cloud access control.

How does the generator-critic loop work?

An LLM generator proposes a Cedar policy from the source text, while a second LLM acting as critic checks the draft for coverage and consistency. The critic's findings go back to the generator, and the loop repeats until the critic has no remaining objections. The result is machine-verifiable code, not text that an agent can ignore.
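The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `autoformalize`, `Critique`, `toy_generate`, and `toy_critique` are hypothetical, the two toy functions stand in for real LLM calls, and the round limit is an assumption added so the loop always terminates.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Critique:
    """Hypothetical critic report: what the draft policy still gets wrong."""
    missing: List[str] = field(default_factory=list)    # uncovered requirements
    conflicts: List[str] = field(default_factory=list)  # inconsistencies

    @property
    def ok(self) -> bool:
        return not self.missing and not self.conflicts


def autoformalize(
    source_text: str,
    generate: Callable[[str, Optional[Critique]], str],
    critique: Callable[[str, str], Critique],
    max_rounds: int = 5,  # assumption: a cap so the loop cannot run forever
) -> Tuple[str, int]:
    """Alternate generator and critic until the critic has no objections."""
    feedback: Optional[Critique] = None
    for round_no in range(1, max_rounds + 1):
        draft = generate(source_text, feedback)
        feedback = critique(source_text, draft)
        if feedback.ok:
            return draft, round_no
    raise RuntimeError(f"no accepted policy after {max_rounds} rounds")


def _requirements(source_text: str) -> List[str]:
    # Toy assumption: each non-empty line of the source is one requirement.
    return [line.strip() for line in source_text.splitlines() if line.strip()]


def toy_generate(source_text: str, feedback: Optional[Critique]) -> str:
    """Stand-in for the LLM generator: covers one requirement, then repairs."""
    covered = _requirements(source_text)[:1]
    if feedback is not None:
        covered = covered + feedback.missing
    return "\n".join(f"// covers: {req}" for req in covered)


def toy_critique(source_text: str, draft: str) -> Critique:
    """Stand-in for the LLM critic: reports requirements the draft omits."""
    draft_lines = set(draft.splitlines())
    missing = [req for req in _requirements(source_text)
               if f"// covers: {req}" not in draft_lines]
    return Critique(missing=missing)


policy, rounds = autoformalize(
    "log every access\nrequire authorization\nno deletions",
    toy_generate,
    toy_critique,
)
```

In a real pipeline the two callables would wrap model calls, and the draft would be Cedar source rather than placeholder comments. The control flow stays the same either way: the critic's report is the only signal that ends the loop.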

What do the results on MedAgentBench show?

On MedAgentBench, a benchmark for medical agents, the autoformalized policies covered a substantially greater share of the original natural-language specification than the manually coded symbolic enforcement from prior work. The abstract gives no exact percentages and describes the difference only qualitatively, as “substantially greater coverage”. Coverage is precisely where the manual approach breaks down as it scales to more complex domains.

Why is Cedar the right choice?

Cedar Policy Language is designed for expressiveness and fast formal reasoning. Unlike Python scripts or regex filters, Cedar policies allow static analysis: you can prove that an agent will never call a specific tool unless it holds the appropriate authorization, and you can do so before a single real invocation runs.
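To make the idea of checking a property before any invocation concrete, here is a toy Python model. It is an illustration under stated assumptions, not Cedar itself and not the authors' tooling: it borrows Cedar's decision rule (deny by default, with a matching forbid overriding any permit), but the tool names, the `authorized` attribute, and the helper functions are hypothetical. It also enumerates a small finite request space, whereas Cedar's own analysis reasons over policies symbolically.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Request = Dict[str, object]
Rule = Tuple[str, Callable[[Request], bool]]

# Hypothetical rules for a medical agent, written as (effect, condition).
RULES: List[Rule] = [
    ("permit", lambda r: r["tool"] == "read_record"),
    ("permit", lambda r: r["tool"] == "order_medication" and r["authorized"]),
    ("forbid", lambda r: r["tool"] == "delete_record"),
]

# Hypothetical tool inventory; "send_email" has no rule at all.
TOOLS = ["read_record", "order_medication", "delete_record", "send_email"]


def is_allowed(request: Request) -> bool:
    """Deny by default; a matching forbid overrides any matching permit."""
    effects = [effect for effect, condition in RULES if condition(request)]
    return "permit" in effects and "forbid" not in effects


def counterexamples(prop: Callable[[Request], bool]) -> List[Request]:
    """Return every allowed request in the finite space that violates prop."""
    space = ({"tool": tool, "authorized": auth}
             for tool, auth in product(TOOLS, [False, True]))
    return [r for r in space if is_allowed(r) and not prop(r)]


# Property: medication is never ordered without authorization.
violations = counterexamples(
    lambda r: r["tool"] != "order_medication" or bool(r["authorized"])
)
```

An empty `violations` list means the property holds for every request in the modeled space, and no tool was actually called to establish that. Checks like this are awkward to run against free-form Python or regex filters because their behavior is not restricted to a small, analyzable rule language.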

Frequently Asked Questions

What is policy-as-code and why does it matter for AI agents?

Policy-as-code is an approach in which security rules are written as formally verified program code rather than natural-language instructions. For AI agents this means constraints can be mathematically proven rather than left to the model's willingness to comply.

How does autoformalization differ from existing guardrails?

Existing probabilistic guardrails (e.g., system prompts) offer no formal guarantees, and manually written symbolic enforcement does not scale. This pipeline combines LLM flexibility with formal verification via Cedar Policy Language.

arXiv:2606.26649: Agent instructions converted into formally verified policy-as-code
