Bounded Autonomy: typed action contracts on the consumer side stop LLM errors in enterprise software
Why it matters
A new arXiv paper proposes an architectural solution for enterprise AI: instead of trying to prevent LLM errors on the model side, it defines typed action contracts on the consumer side that statically detect unauthorized actions, malformed requests, and cross-workspace execution. The approach shifts the security burden from a probabilistic model to a deterministic type system.
What is the problem?
In enterprise software — CRM, ERP, internal tools, customer support platforms — AI agents increasingly execute actions with consequences: updating records, sending emails, triggering workflows, accessing different client workspaces. The problem arises when an LLM makes a mistake that breaches security boundaries:
- Unauthorized actions — the agent executes a function for which the user lacks permissions
- Malformed requests — a tool call violates the expected schema and breaks the downstream API
- Cross-workspace execution — in a multi-tenant environment, the agent touches another client’s data
- Unauthorized escalation — the agent uses tools that require a higher privilege level than the user has approved
The classic solution is “train the model better” or “add guardrails in the prompt.” Both are probabilistic — the model can still make mistakes, just less often. In enterprise environments where an error can mean a GDPR violation or loss of client trust, that is not enough.
The solution the paper proposes
The paper, posted to arXiv on April 17, 2026, proposes a deterministic layer outside the model:
- Typed Action Contracts explicitly define which actions the agent may execute, with what arguments, in what context, and under what preconditions
- Consumer-side execution means the LLM does not execute actions directly — it generates a structured action request that the consumer application then validates against the type contract before any execution
- If the request fails the type check (wrong type, missing permission, wrong workspace), the action does not happen — regardless of what the LLM “thought”
Architecturally, this shifts the security burden from a probabilistic model to a deterministic type system — static checking instead of runtime prayer.
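The validate-before-execute layer can be sketched in a few lines. This is an illustrative reconstruction, not code from the paper: the names Caller, ActionContract, and validate_and_execute are hypothetical, and the paper's actual formalism may differ.

```python
# Minimal sketch of consumer-side contract validation.
# Assumption: the LLM never calls execute() directly; it only produces an
# (action, args) request that must pass this gate first.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Caller:
    user_id: str
    workspace: str
    permissions: frozenset

@dataclass(frozen=True)
class ActionContract:
    name: str
    arg_types: dict                     # expected argument name -> type
    precondition: Callable[[Caller, dict], bool]
    execute: Callable[..., Any]         # the real side effect

class ContractViolation(Exception):
    """Raised when a request fails the contract; nothing has executed."""

def validate_and_execute(contract: ActionContract, caller: Caller,
                         args: dict) -> Any:
    # 1. Shape check: reject malformed requests before anything runs.
    if set(args) != set(contract.arg_types):
        raise ContractViolation(f"{contract.name}: wrong argument set")
    for key, expected in contract.arg_types.items():
        if not isinstance(args[key], expected):
            raise ContractViolation(f"{contract.name}: {key} has wrong type")
    # 2. Precondition check: permissions, workspace, ownership.
    if not contract.precondition(caller, args):
        raise ContractViolation(f"{contract.name}: precondition failed")
    # 3. Only now does the side effect happen.
    return contract.execute(**args)
```

The key property is that the side effect sits behind the check: a failed contract raises before `execute` is ever reached, regardless of what the model generated.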
What does it look like in practice?
The authors provide concrete examples from enterprise environments:
Example 1 — Workspace isolation:
UpdateCustomerRecord(customerId: CustomerId, fields: CustomerFields)
requires: caller.workspace == customer.workspace
If the LLM tries to update a customer from another workspace, the type system rejects it before execution.
Example 2 — Privileges:
SendExternalEmail(to: EmailAddress, body: String)
requires: caller.permissions.includes(SEND_EXTERNAL)
The model can compose a perfect email — if the user lacks the SEND_EXTERNAL permission, the action fails statically.
Example 3 — Semantic constraints:
DeleteRecord(id: RecordId)
requires: record.createdBy == caller || caller.isAdmin
The model cannot delete someone else’s record even if it seems logical to it.
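The three preconditions above translate directly into plain predicates. The sketch below is a hypothetical rendering in Python; field names like user_id and is_admin are assumptions, not the paper's notation.

```python
# The three example contracts from the paper, written as predicates that a
# consumer-side checker would evaluate before executing anything.
from dataclasses import dataclass

@dataclass(frozen=True)
class Caller:
    user_id: str
    workspace: str
    permissions: frozenset
    is_admin: bool = False

@dataclass(frozen=True)
class Record:
    record_id: str
    workspace: str
    created_by: str

# Example 1 — workspace isolation:
# requires: caller.workspace == customer.workspace
def can_update_customer(caller: Caller, customer: Record) -> bool:
    return caller.workspace == customer.workspace

# Example 2 — privileges:
# requires: caller.permissions.includes(SEND_EXTERNAL)
def can_send_external_email(caller: Caller) -> bool:
    return "SEND_EXTERNAL" in caller.permissions

# Example 3 — semantic constraints:
# requires: record.createdBy == caller || caller.isAdmin
def can_delete_record(caller: Caller, record: Record) -> bool:
    return record.created_by == caller.user_id or caller.is_admin
```

Because each predicate is ordinary deterministic code, its outcome depends only on the caller and the data, never on how the model phrased the request.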
Why is this better than prompt engineering?
- Prompt engineering relies on the model reading and respecting the instruction. There is always a chance the model interprets edge cases incorrectly or violates a constraint by mistake.
- Type contracts work like compiler checks. They do not depend on model behavior. If properly defined, the classes of errors they cover become impossible.
The trade-off is implementation complexity. The type system must be carefully designed to cover real scenarios without excessive rigidity. The paper includes examples from several enterprise domains (sales, support, HR) and shows that it is practically feasible.
Implications for AI tool building
For developers building enterprise AI integrations, the paper provides a concrete design pattern:
- Explicitly define all actions the agent is allowed to perform
- For each action, write a typed contract with preconditions
- Have the model produce a structured action request rather than executing directly
- Validation passes through a deterministic type checker before any side effects
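The front half of this pattern, turning raw model output into a validated structured request, can be sketched as a small parser. The allow-list shape and function name below are illustrative assumptions, not an API from the paper.

```python
# Sketch: parse a model's raw output into a validated (action, args) pair,
# or reject it with no side effects. ALLOWED_ACTIONS is a hypothetical
# registry mapping each permitted action to its argument schema.
import json

ALLOWED_ACTIONS = {
    "UpdateCustomerRecord": {"customerId": str, "fields": dict},
    "SendExternalEmail": {"to": str, "body": str},
}

def parse_action_request(raw: str):
    """Return ((action, args), None) on success or (None, reason) on failure."""
    try:
        req = json.loads(raw)
    except json.JSONDecodeError:
        return None, "malformed JSON"
    if not isinstance(req, dict):
        return None, "request is not an object"
    name = req.get("action")
    if name not in ALLOWED_ACTIONS:
        return None, f"unknown or forbidden action: {name!r}"
    schema = ALLOWED_ACTIONS[name]
    args = req.get("args")
    if not isinstance(args, dict) or set(args) != set(schema):
        return None, "argument schema mismatch"
    for key, expected in schema.items():
        if not isinstance(args[key], expected):
            return None, f"argument {key!r} has wrong type"
    return (name, args), None
```

A request that survives this parser would still need the precondition checks (permissions, workspace) before execution; the parser only rules out unknown actions and malformed argument structures.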
The approach aligns with MCP (Model Context Protocol) trends, which also promote structured tool calls over free execution. Combined with MCP, the result is layered defense where both MCP and type contracts block different classes of errors.
The paper is a preprint, but the idea is concrete enough that teams building enterprise AI today can immediately apply the principles — without waiting for formal peer-reviewed publication.
This article was generated using artificial intelligence from primary sources.