Agents · Wednesday, May 6, 2026 · 2 min read

AWS: AgentCore Browser gains OS-level actions — 8 new primitives


AWS announced OS Level Actions for Amazon Bedrock AgentCore Browser on May 5 — a capability enabling agents to interact with the native OS interface outside the DOM. It introduces 8 actions and an action-screenshot-reaction loop, available with no additional configuration.

This article was generated using artificial intelligence from primary sources.

What did AWS announce?

Amazon Web Services introduced OS Level Actions for Amazon Bedrock AgentCore Browser on May 5 — a new capability enabling agents to break out of the DOM and interact with the native operating system interface. The feature is available to all AgentCore Browser users with no additional configuration.

DOM (Document Object Model) is the structured HTML representation that a browser exposes to automation tools like Playwright.

Why does this matter for agents?

Until now, agents could control only HTML elements, through Playwright. When a system dialog appeared, such as a print window, certificate prompt, or security alert, the agent effectively stalled: it could see the dialog in a screenshot but had no mechanism to click anything outside the DOM.

The new action set bridges exactly that gap, opening workflows that cross the browser boundary.

Which primitives does the new set introduce?

Eight actions cover keyboard, mouse, and screenshot:

  • mouseClick, mouseMove, mouseDrag, mouseScroll for pointer gestures
  • keyType, keyPress, keyShortcut for text input and key combinations
  • screenshot capturing the full OS desktop (not just the browser viewport)

The full desktop capture is critical for agents — it gives them a complete picture of the machine’s state.
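As a rough illustration, the eight action names above can be modelled as a small Python enumeration grouped the way the list groups them. This is a sketch only: the action names come from the announcement, while `OsAction`, `category`, `ActionRequest`, and the parameter keys (`x`, `y`) are hypothetical and do not reflect the actual AgentCore request format.

```python
from dataclasses import dataclass, field
from enum import Enum


class OsAction(str, Enum):
    """The eight action names listed in the article."""
    MOUSE_CLICK = "mouseClick"
    MOUSE_MOVE = "mouseMove"
    MOUSE_DRAG = "mouseDrag"
    MOUSE_SCROLL = "mouseScroll"
    KEY_TYPE = "keyType"
    KEY_PRESS = "keyPress"
    KEY_SHORTCUT = "keyShortcut"
    SCREENSHOT = "screenshot"


def category(action: OsAction) -> str:
    """Group an action as the article does: pointer, keyboard, or capture."""
    if action.value.startswith("mouse"):
        return "pointer"
    if action.value.startswith("key"):
        return "keyboard"
    return "capture"


@dataclass
class ActionRequest:
    """Hypothetical request envelope; field names are illustrative only."""
    action: OsAction
    params: dict = field(default_factory=dict)

    def to_payload(self) -> dict:
        # Flatten into one dict; the real wire format may differ.
        return {"action": self.action.value, **self.params}
```

For example, `ActionRequest(OsAction.MOUSE_CLICK, {"x": 10, "y": 20}).to_payload()` yields a plain dictionary that a client could serialize, and `category` makes it easy to route pointer and keyboard actions separately.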

What does the working pattern look like?

The pattern is an action-screenshot-reaction loop: the agent sends an action, AgentCore executes it against the operating system, the agent requests a screenshot, and a vision model analyzes the new state and decides the next action. The cycle repeats until the task completes.

This approach treats the computer as state to be observed and changed, the same pattern a human follows when operating a machine.
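The loop described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `Desktop` protocol, the `decide` callback standing in for a vision model, the `max_steps` guard, and the in-memory `FakeDesktop` are hypothetical and are not the AgentCore API. Only the action names and the order of steps come from the announcement.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Protocol


@dataclass
class Action:
    name: str                      # one of the eight action names from the article
    params: dict = field(default_factory=dict)


class Desktop(Protocol):
    """Hypothetical interface to the remote OS session."""
    def perform(self, action: Action) -> None: ...
    def screenshot(self) -> bytes: ...


# Stand-in for a vision model: maps a screenshot to the next action,
# or returns None once the task looks complete.
Decide = Callable[[bytes], Optional[Action]]


def run_loop(desktop: Desktop, decide: Decide, first_action: Action,
             max_steps: int = 20) -> list:
    """Act, capture the desktop, let the model react; repeat until done."""
    history = []
    action: Optional[Action] = first_action
    for _ in range(max_steps):
        desktop.perform(action)        # action
        history.append(action)
        shot = desktop.screenshot()    # screenshot
        action = decide(shot)          # reaction
        if action is None:
            return history
    raise RuntimeError(f"task not finished after {max_steps} steps")


class FakeDesktop:
    """In-memory stand-in: a 'print dialog' stays open until Enter is pressed."""
    def __init__(self) -> None:
        self.dialog_open = False

    def perform(self, action: Action) -> None:
        if action.name == "keyShortcut" and action.params.get("keys") == ["ctrl", "p"]:
            self.dialog_open = True
        elif action.name == "keyPress" and action.params.get("key") == "Enter":
            self.dialog_open = False

    def screenshot(self) -> bytes:
        return b"dialog" if self.dialog_open else b"clean"


def fake_decide(shot: bytes) -> Optional[Action]:
    """Toy 'vision model': confirm the dialog if one is visible."""
    if shot == b"dialog":
        return Action("keyPress", {"key": "Enter"})
    return None
```

Calling `run_loop(FakeDesktop(), fake_decide, Action("keyShortcut", {"keys": ["ctrl", "p"]}))` opens the simulated dialog, observes it in the next screenshot, and dismisses it. The step cap is a design choice in this sketch so that a model that never reports completion cannot loop forever.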

Frequently Asked Questions

What actions does the new set include?

Eight primitives: mouseClick, mouseMove, mouseDrag, mouseScroll, keyType, keyPress, keyShortcut, and screenshot, which captures the full OS desktop.

Is additional configuration required?

No. The feature is available to all AgentCore Browser users immediately, with no extra setup.

How does the agent handle system dialogs?

Through an action-screenshot-reaction loop: the agent sends an action, AgentCore executes it, a screenshot is taken, and the vision model analyzes the new state and decides the next move.