arXiv:2605.04785: AgentTrust Intercepts AI Agent Tool Calls with 95–97% Accuracy
AgentTrust is an open-source runtime system that intercepts AI agent tool calls — file operations, SQL queries and shell commands — and returns one of four verdicts before execution. Across 930 test scenarios it achieves 95–97% accuracy, and approximately 93% on shell-obfuscated attacks.
This article was generated using artificial intelligence from primary sources.
A new paper on arXiv introduces AgentTrust, a runtime safety layer that sits between an AI agent and its tools, evaluating every call in real time before it executes. The system targets vulnerabilities that emerge when agents are given broad access to the operating system and external services.
How does AgentTrust decide what to allow?
For each incoming tool call, AgentTrust returns one of four verdicts: allow, warn, block or send for review. The architecture combines a shell deobfuscation normalizer, a SafeFix component that suggests safer alternatives, a RiskChain detector for multi-step attack chains, and a cache-aware LLM-as-Judge layer for ambiguous inputs. Covered tools include file operations, SQL queries and shell commands — the three most common attack surfaces in production agentic systems.
How accurate is the system?
Evaluation was conducted on a total of 930 scenarios — 300 internal across six risk categories and 630 independently constructed adversarial real-world situations. The production ruleset achieves 95.0% verdict accuracy on the internal benchmark and 73.7% risk-level accuracy with latency in the millisecond range. On the 630-scenario set, AgentTrust reaches 96.7% accuracy, including approximately 93% accuracy on shell-obfuscated payloads that typically bypass simpler filters.
What does open-source delivery mean?
Author Chenglin Yang has released AgentTrust under the AGPL-3.0 license, which requires derived works to share the same open-source terms. The system is delivered as an MCP server (Model Context Protocol — the open standard for calling external tools from LLMs), so it can be attached to any agent that supports MCP without modifying agent code. This lowers the barrier to introducing runtime control into existing agentic workflows.
Frequently Asked Questions
- What does AgentTrust intercept?
- The system intercepts AI agent tool calls before they execute — specifically file operations, SQL queries and shell commands — and decides in real time whether to allow or block them.
- What verdicts does AgentTrust return?
- Four possible verdicts: allow, warn, block and send for human review. There is also a SafeFix component that suggests safer alternatives.
- Under what license is it available?
- The system is released under the AGPL-3.0 open-source license and delivered as an MCP server, making it compatible with any agent that supports the Model Context Protocol.
Sources
Related news
arXiv:2605.04019: automated red teaming agent achieves 85% success rate against Meta Llama Scout with 45+ attacks and 450+ transformations
GitHub: Secret scanning via MCP server reaches GA — AI agents detect credentials before commit
ArXiv: Visual inputs bypass safety filters in vision-language models 40.9% of the time, ICML 2026 authors find