🤖 24 AI
🟡 🤝 Agents · Saturday, April 11, 2026 · 2 min read

Anthropic publishes 'Trustworthy agents in practice' policy framework

Why it matters

Anthropic has published a comprehensive policy framework, “Trustworthy agents in practice,” that defines what it means to develop, deploy, and use AI agents reliably. The document serves as a guide for companies building or using agents.

A policy framework for the agent era

In its research/policy section, Anthropic has published “Trustworthy agents in practice” — a comprehensive document that defines what makes an AI agent trustworthy and how companies can build and use agents in a way that minimizes risks.

The release comes as AI agents are rapidly being commercialized. Claude Cowork, OpenAI Codex, Microsoft Agent Framework, AWS AgentCore, and Anthropic Managed Agents all offer powerful agentic capabilities, yet questions of reliability remain open.

What is in the document?

Anthropic defines trustworthy agents along several dimensions:

  • Predictability — the agent behaves consistently and does not improvise in edge cases
  • Auditability — all decisions and actions can be reviewed after the fact
  • Boundaries — clearly defined what the agent may and may not do
  • Escalation — rules for when the agent must ask a human for approval
  • Reversibility — the agent performs reversible actions wherever possible
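The five dimensions above can be illustrated with a minimal guardrail wrapper around an agent's actions. This is a hypothetical sketch, not code from Anthropic's document; all names (`GuardedAgent`, `allowed_actions`, `needs_approval`) are illustrative:

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class GuardedAgent:
    """Hypothetical sketch of the five trust dimensions.

    Deterministic allowlist checks (predictability), an audit log
    (auditability), an explicit allowlist (boundaries), approval
    gates (escalation), and undo handlers (reversibility).
    """
    allowed_actions: set[str]                 # boundaries: what the agent may do
    needs_approval: set[str]                  # escalation: what requires a human
    audit_log: list[dict] = field(default_factory=list)  # auditability

    def execute(self, action: str, run: Callable[[], str],
                undo: Optional[Callable[[], None]] = None,
                approved: bool = False) -> str:
        # Boundaries: refuse anything outside the explicit allowlist.
        if action not in self.allowed_actions:
            self.audit_log.append({"action": action, "outcome": "blocked"})
            return "blocked"
        # Escalation: gated actions run only with explicit human approval.
        if action in self.needs_approval and not approved:
            self.audit_log.append({"action": action, "outcome": "escalated"})
            return "escalated"
        # Reversibility: record whether the action ships with an undo handler.
        result = run()
        self.audit_log.append({"action": action, "outcome": "done",
                               "reversible": undo is not None})
        return result

agent = GuardedAgent(allowed_actions={"read_file", "send_email"},
                     needs_approval={"send_email"})
print(agent.execute("read_file", run=lambda: "contents"))  # runs: returns "contents"
print(agent.execute("send_email", run=lambda: "sent"))     # held: returns "escalated"
print(agent.execute("delete_db", run=lambda: "gone"))      # refused: returns "blocked"
```

Every path appends to `audit_log`, so refusals and escalations can be reviewed after the fact just like completed actions.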

Why now?

Anthropic has a direct commercial interest: Claude Mythos demonstrates an AI capable of autonomously finding and exploiting vulnerabilities in operating systems, and Project Glasswing distributes that capability to only 40 selected organizations.

The Trustworthy Agents framework is a companion to that strategy: if Anthropic is building the most powerful agents in the world, it must also set the standards for how they are used safely. Otherwise, regulators (the EU AI Act, NIST) will set the standards instead, perhaps more strictly than the industry wants.

Practical recommendations

The document concludes with a series of concrete recommendations for:

  • Agent developers — how to design permission systems and guardrails
  • Enterprise users — how to evaluate agents before deployment
  • Regulators — what to look for in standards for enterprise AI

Anthropic has so far been a consistent voice for “AI safety as a feature.” “Trustworthy agents in practice” continues that strategy and is a potentially influential document for future regulation.

🤖 This article was generated using artificial intelligence from primary sources.