AWS AgentCore Browser: 8 OS-level actions for agents

AWS announced OS Level Actions for Amazon Bedrock AgentCore Browser on May 5, a capability that lets agents interact with the native OS interface outside the DOM. It introduces eight actions, used in an action-screenshot-reaction loop, and is available with no additional configuration.

What did AWS announce?

Amazon Web Services introduced OS Level Actions for Amazon Bedrock AgentCore Browser on May 5. The new capability lets agents step outside the DOM and interact with the native operating system interface. It is available to all AgentCore Browser users with no additional configuration.

The DOM (Document Object Model) is the structured representation of a page's HTML that a browser exposes to automation tools such as Playwright.

Why does this matter for agents?

Until now, agents could control only HTML elements through Playwright. When a system dialog appeared, such as a print window, certificate prompt, or security alert, the agent effectively stalled: it could see the dialog in a screenshot but had no mechanism to click anything outside the DOM.

The new action set bridges exactly that gap, opening workflows that cross the browser boundary.

Which primitives does the new set introduce?

Eight actions cover keyboard, mouse, and screenshot:

mouseClick, mouseMove, mouseDrag, mouseScroll for pointer gestures
keyType, keyPress, keyShortcut for text input and key combinations
screenshot capturing the full OS desktop (not just the browser viewport)

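The full desktop capture matters because native dialogs sit outside the DOM, so a viewport-only image may not show them. Capturing the whole desktop gives the agent a complete picture of the machine’s state.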
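The eight action names can be modeled as a small typed vocabulary. The following is a minimal sketch, not the AgentCore API: only the action names come from the announcement, while `OSAction`, `ActionRequest`, `category`, and the `x`/`y` parameter names are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class OSAction(str, Enum):
    """The eight action names listed in the announcement."""
    MOUSE_CLICK = "mouseClick"
    MOUSE_MOVE = "mouseMove"
    MOUSE_DRAG = "mouseDrag"
    MOUSE_SCROLL = "mouseScroll"
    KEY_TYPE = "keyType"
    KEY_PRESS = "keyPress"
    KEY_SHORTCUT = "keyShortcut"
    SCREENSHOT = "screenshot"


# Grouping mirrors the article: four pointer gestures, three keyboard actions.
POINTER = frozenset({OSAction.MOUSE_CLICK, OSAction.MOUSE_MOVE,
                     OSAction.MOUSE_DRAG, OSAction.MOUSE_SCROLL})
KEYBOARD = frozenset({OSAction.KEY_TYPE, OSAction.KEY_PRESS,
                      OSAction.KEY_SHORTCUT})


def category(action: OSAction) -> str:
    """Return 'pointer', 'keyboard', or 'capture' for an action."""
    if action in POINTER:
        return "pointer"
    if action in KEYBOARD:
        return "keyboard"
    return "capture"  # only screenshot remains


@dataclass(frozen=True)
class ActionRequest:
    """Hypothetical request shape; the real payload fields may differ."""
    action: OSAction
    params: dict[str, Any] = field(default_factory=dict)


# Assumed parameter names (x, y) for a click at screen coordinates.
request = ActionRequest(OSAction.MOUSE_CLICK, {"x": 640, "y": 360})
print(request.action.value, category(request.action))
```

Using an enum rather than bare strings means a misspelled action name fails when the request is built, not when it reaches the remote session.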

What does the working pattern look like?

The pattern is an action-screenshot-reaction loop: the agent sends an action, AgentCore executes it against the operating system, the agent requests a screenshot, and a vision model analyzes the new state and decides the next action. The cycle repeats until the task completes.

This approach treats the computer as a system whose state is observed and then changed, the same pattern a human follows when operating a machine.
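The loop described above can be sketched in a few lines. This is an illustration under stated assumptions, not AgentCore client code: `Desktop`, `FakeDesktop`, `decide`, and `run_loop` are hypothetical names, the fake desktop stands in for the remote browser session, and a byte-string comparison stands in for the vision model.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Protocol


@dataclass(frozen=True)
class Action:
    name: str      # one of the eight action names, e.g. "mouseClick"
    params: dict   # assumed payload shape, for illustration only


class Desktop(Protocol):
    """Anything that can execute an action and capture the desktop."""
    def execute(self, action: Action) -> None: ...
    def screenshot(self) -> bytes: ...


def run_loop(desktop: Desktop,
             decide: Callable[[bytes], Optional[Action]],
             first_action: Action,
             max_steps: int = 10) -> list:
    """Act, capture, decide; stop when decide returns None or steps run out."""
    history = []
    action: Optional[Action] = first_action
    while action is not None and len(history) < max_steps:
        desktop.execute(action)          # 1. action
        history.append(action)
        frame = desktop.screenshot()     # 2. screenshot of the new state
        action = decide(frame)           # 3. reaction: choose the next step
    return history


class FakeDesktop:
    """In-memory stand-in: a dialog opens on a shortcut and closes on a click."""
    def __init__(self) -> None:
        self.dialog_open = False

    def execute(self, action: Action) -> None:
        if action.name == "keyShortcut":
            self.dialog_open = True
        elif action.name == "mouseClick":
            self.dialog_open = False

    def screenshot(self) -> bytes:
        return b"dialog" if self.dialog_open else b"page"


def decide(frame: bytes) -> Optional[Action]:
    """Stub for the vision model: click the dialog away if one is visible."""
    if frame == b"dialog":
        return Action("mouseClick", {"x": 640, "y": 480})
    return None  # nothing left to do


desktop = FakeDesktop()
steps = run_loop(desktop, decide, Action("keyShortcut", {"keys": ["ctrl", "p"]}))
print([a.name for a in steps])
```

The `max_steps` cap is a design choice of this sketch: a loop driven by model output needs a bound so a misread screenshot cannot cause it to run indefinitely.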

Frequently Asked Questions

What actions does the new set include?

Eight primitives: mouseClick, mouseMove, mouseDrag, mouseScroll, keyType, keyPress, keyShortcut, and screenshot, which captures the full OS desktop.

Is additional configuration required?

No. The feature is available to all AgentCore Browser users immediately, with no extra setup.

How does the agent handle system dialogs?

Through the action-screenshot-reaction loop: the agent sends an action, AgentCore executes it, a screenshot is taken, and the vision model analyzes the new state and decides the next move.
