DeepMind: AI Control Roadmap for Agents

Google DeepMind has published the AI Control Roadmap, a defense-in-depth security framework that treats internal AI agents as potentially misaligned systems. Built on the MITRE ATT&CK methodology and an analysis of one million coding-agent tasks, the framework defines detection and response levels for guarding against compromised agents.

The roadmap is a comprehensive security framework for AI agent-based systems, and it departs from the classic perimeter-security model, which defends a network boundary. Instead it applies defense in depth: several layers of protection, each designed on the assumption that the layer before it may fail.
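To make the layering idea concrete, here is a minimal Python sketch, assuming an agent's proposed actions can be represented as a tool name plus an argument string. The `AgentAction` type and the three layer functions are hypothetical illustrations, not part of DeepMind's roadmap; the point is only that every layer runs on every action and none of them trusts another layer's verdict.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AgentAction:
    """Hypothetical representation of one action an agent wants to take."""
    tool: str
    argument: str


# A layer inspects an action and returns True to allow it.
Layer = Callable[[AgentAction], bool]


def allowlisted_tool(action: AgentAction) -> bool:
    # Layer 1: only tools on a fixed (hypothetical) allowlist may be called.
    return action.tool in {"read_file", "run_tests", "write_file"}


def no_secret_paths(action: AgentAction) -> bool:
    # Layer 2: independently block credential-like paths,
    # even when layer 1 has already approved the tool.
    return ".ssh" not in action.argument and "secrets" not in action.argument


def no_network_exfil(action: AgentAction) -> bool:
    # Layer 3: block arguments that look like outbound URLs.
    return "http://" not in action.argument and "https://" not in action.argument


LAYERS: list[Layer] = [allowlisted_tool, no_secret_paths, no_network_exfil]


def check(action: AgentAction, layers: list[Layer] = LAYERS) -> list[str]:
    """Run every layer and return the names of the layers that rejected the action.

    All layers are evaluated, so a gap in one layer can still be caught by another.
    """
    return [layer.__name__ for layer in layers if not layer(action)]


def is_allowed(action: AgentAction) -> bool:
    # The action proceeds only if no layer objects.
    return not check(action)
```

Here `check(AgentAction("read_file", "~/.ssh/id_rsa"))` returns `["no_secret_paths"]`: the tool passes the allowlist, but the second layer still catches the sensitive path.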

Why treat your own agents as a threat?

The framework’s core premise is that internal AI agents must be treated as potentially misaligned, that is, out of step with the operator’s intentions. A misaligned agent pursues its own goals, or goals injected by an attacker, rather than the user’s. Traditional security models did not account for this because agents used to be passive tools rather than autonomous executors of actions.

Framework structure: MITRE ATT&CK and one million tasks

DeepMind’s framework is built on the MITRE ATT&CK methodology, the industry standard for cataloguing attacker tactics and techniques, long used by cybersecurity teams. To develop new protocols, DeepMind analyzed 1 million coding-agent tasks, which gives the framework an empirical rather than purely theoretical foundation.
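ATT&CK organizes adversary behavior into tactics (the goal) and techniques (how the goal is pursued). The sketch below, assuming agent activity is logged as shell-style commands, shows how a catalogue in that spirit could tag logged commands and count how often each technique appears across many tasks. The technique IDs, names, and matching rules are hypothetical placeholders, not official ATT&CK entries and not DeepMind's catalogue.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Technique:
    technique_id: str               # hypothetical ID scheme, not official ATT&CK IDs
    tactic: str                     # the adversary goal this technique serves
    name: str
    matches: Callable[[str], bool]  # predicate over one logged agent command


# Hypothetical, illustrative entries only.
CATALOG = [
    Technique("AGT-001", "Credential Access", "Read credential files",
              lambda cmd: ".ssh" in cmd or ".aws" in cmd),
    Technique("AGT-002", "Exfiltration", "Send data to an external host",
              lambda cmd: cmd.startswith(("curl", "wget")) and "http" in cmd),
    Technique("AGT-003", "Defense Evasion", "Delete log files",
              lambda cmd: cmd.startswith("rm ") and ".log" in cmd),
]


def tag(command: str) -> list[str]:
    """Return the IDs of all catalogue techniques matching a command."""
    return [t.technique_id for t in CATALOG if t.matches(command)]


def technique_frequencies(task_logs: list[list[str]]) -> Counter:
    """Count, per technique, how many tasks contain at least one match."""
    counts: Counter = Counter()
    for commands in task_logs:
        # A set ensures each technique is counted at most once per task.
        seen = {tid for cmd in commands for tid in tag(cmd)}
        counts.update(seen)
    return counts
```

Counting per task rather than per command keeps one noisy task from dominating the statistics, which matters when the corpus is large.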

Detection and response levels

The framework defines two axes of protection:

Detection evasion (D1–D4): levels of an agent’s ability to evade detection, ranging from trivial to sophisticated concealment of its activity
Response (R1–R3): organizational response levels, from alert to full isolation of a compromised agent
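One way to encode the two axes is as ordered enumerations plus a policy table. In this Python sketch only the endpoints come from the roadmap as described here (D1 trivial, D4 sophisticated; R1 alert, R3 full isolation); the meaning of R2 and the mapping from evasion level to response are hypothetical assumptions chosen for illustration.

```python
from enum import IntEnum


class Evasion(IntEnum):
    """Agent's ability to evade detection (D1 lowest, D4 highest)."""
    D1 = 1  # trivial: misbehavior is easy to spot
    D2 = 2
    D3 = 3
    D4 = 4  # sophisticated concealment of activity


class Response(IntEnum):
    """Organizational response (R1 mildest, R3 strongest)."""
    R1 = 1  # alert
    R2 = 2  # intermediate step; hypothetical, e.g. restricting permissions
    R3 = 3  # full isolation of the compromised agent


# Hypothetical policy: stronger containment for agents assumed more capable of evasion.
POLICY: dict[Evasion, Response] = {
    Evasion.D1: Response.R1,
    Evasion.D2: Response.R2,
    Evasion.D3: Response.R2,
    Evasion.D4: Response.R3,
}


def required_response(evasion: Evasion, confirmed_compromise: bool = False) -> Response:
    """Pick a response level; a confirmed compromise always means isolation."""
    if confirmed_compromise:
        return Response.R3
    return POLICY[evasion]
```

Using `IntEnum` keeps the levels ordered, so a policy can be checked for monotonicity (a higher evasion level never maps to a weaker response) with a simple comparison.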

Trusted AI supervisory systems monitor the reasoning of working agents in real time. The classical perimeter approach, by contrast, reacts only after an agent has already left the secure zone.
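A minimal sketch of such a trusted-monitoring loop, assuming each agent step exposes its reasoning and proposed action before execution: a trusted monitor assigns a suspicion score, and steps at or above a threshold are blocked and escalated instead of run. `keyword_monitor` is a hypothetical stand-in for a trusted model, and the 0.5 threshold is an arbitrary illustrative value.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    reasoning: str  # the agent's stated reasoning for this step
    action: str     # the action it proposes to execute


def keyword_monitor(step: Step) -> float:
    """Hypothetical stand-in for a trusted model: suspicion score in [0, 1].

    A real deployment would query a trusted model, not match keywords.
    """
    red_flags = ("disable logging", "hide", "exfiltrate", "avoid detection")
    hits = sum(flag in step.reasoning.lower() for flag in red_flags)
    return min(1.0, hits / 2)


@dataclass
class MonitoredRun:
    monitor: Callable[[Step], float]
    threshold: float = 0.5
    executed: list[str] = field(default_factory=list)
    escalations: list[Step] = field(default_factory=list)

    def submit(self, step: Step) -> bool:
        """Score the step before it runs; execute only if below the threshold."""
        score = self.monitor(step)
        if score >= self.threshold:
            # Block the action and escalate it for review instead of executing it.
            self.escalations.append(step)
            return False
        self.executed.append(step.action)  # placeholder for real execution
        return True
```

Because scoring happens before the action executes, a suspicious step is stopped in place rather than discovered afterwards, which is the contrast with perimeter defenses drawn above.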

Economic context

DeepMind notes that AI agents could generate up to $2.9 trillion in economic value in the US by 2030. At that scale, securing agentic systems becomes a strategic question, not merely a technical one.

Frequently Asked Questions

What is the defense-in-depth approach in the context of AI agents?

Defense-in-depth is a multi-layered security strategy that does not rely on a single line of defense but combines detection, isolation, and response. DeepMind applies it by having trusted AI systems monitor the reasoning of working agents in real time.

Why does DeepMind treat its own agents as potentially misaligned?

Misalignment, a state in which an AI agent pursues goals that do not match the operator's intentions, can arise unintentionally or through external attacks. By treating every agent as a potential risk, an organization stays resilient even when one layer of protection fails.
