AWS: AgentCore Browser gains OS-level actions — 8 new primitives
AWS announced OS Level Actions for Amazon Bedrock AgentCore Browser on May 5 — a capability enabling agents to interact with the native OS interface outside the DOM. It introduces 8 actions and an action-screenshot-reaction loop, available with no additional configuration.
This article was generated using artificial intelligence from primary sources.
What did AWS announce?
Amazon Web Services introduced OS Level Actions for Amazon Bedrock AgentCore Browser on May 5 — a new capability enabling agents to break out of the DOM and interact with the native operating system interface. The feature is available to all AgentCore Browser users with no additional configuration.
DOM (Document Object Model) is the structured HTML representation that a browser exposes to automation tools like Playwright.
Why does this matter for agents?
Until now, agents could only control HTML elements through Playwright. When a system dialog appeared (a print window, certificate prompt, or security alert), the agent effectively stopped. It could see the dialog in a screenshot but had no mechanism to click anything outside the DOM.
The new action set bridges exactly that gap, opening workflows that cross the browser boundary.
Which primitives does the new set introduce?
Eight actions cover keyboard, mouse, and screenshot:
- mouseClick, mouseMove, mouseDrag, mouseScroll for pointer gestures
- keyType, keyPress, keyShortcut for text input and key combinations
- screenshot, capturing the full OS desktop (not just the browser viewport)
The full desktop capture is critical for agents — it gives them a complete picture of the machine’s state.
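As a rough illustration of how the eight primitives group together, the sketch below models them as a Python enum. Only the action names come from the announcement; the `OsAction` type, the category labels, and the `category` helper are hypothetical and are not part of any AWS SDK.

```python
from enum import Enum


class OsAction(str, Enum):
    """The eight action names listed in the announcement (type is hypothetical)."""
    MOUSE_CLICK = "mouseClick"
    MOUSE_MOVE = "mouseMove"
    MOUSE_DRAG = "mouseDrag"
    MOUSE_SCROLL = "mouseScroll"
    KEY_TYPE = "keyType"
    KEY_PRESS = "keyPress"
    KEY_SHORTCUT = "keyShortcut"
    SCREENSHOT = "screenshot"


# Grouping follows the article: pointer gestures, keyboard input, capture.
POINTER = {OsAction.MOUSE_CLICK, OsAction.MOUSE_MOVE,
           OsAction.MOUSE_DRAG, OsAction.MOUSE_SCROLL}
KEYBOARD = {OsAction.KEY_TYPE, OsAction.KEY_PRESS, OsAction.KEY_SHORTCUT}


def category(action: OsAction) -> str:
    """Return the illustrative category label for an action."""
    if action in POINTER:
        return "pointer"
    if action in KEYBOARD:
        return "keyboard"
    return "capture"
```

An agent framework could use a closed set like this to reject any model output that names an action outside the eight primitives before it is sent anywhere.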
What does the working pattern look like?
The pattern is an action-screenshot-reaction loop: the agent sends an action, AgentCore executes it against the operating system, the agent requests a screenshot, and a vision model analyzes the new state and decides the next action. The cycle repeats until the task completes.
This approach treats the computer as a state that is observed and changed — the same pattern a human uses when operating a machine.
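The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the AgentCore API: `Action`, `Desktop`, `run_loop`, `FakeDesktop`, and the `x`/`y` parameters are all hypothetical, and the vision model is replaced by a trivial stand-in function.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Protocol


@dataclass(frozen=True)
class Action:
    name: str                      # one of the eight primitive names
    params: dict = field(default_factory=dict)  # assumed payload shape


class Desktop(Protocol):
    """Hypothetical interface standing in for the remote browser session."""
    def execute(self, action: Action) -> None: ...
    def screenshot(self) -> bytes: ...


# Stand-in for the vision model: screenshot in, next action (or None) out.
Decide = Callable[[bytes], Optional[Action]]


def run_loop(desktop: Desktop, decide: Decide,
             first_action: Action, max_steps: int = 20) -> int:
    """Act, capture, decide; stop when decide returns None or at max_steps."""
    action: Optional[Action] = first_action
    steps = 0
    while action is not None and steps < max_steps:
        desktop.execute(action)          # action
        image = desktop.screenshot()     # screenshot
        action = decide(image)           # reaction
        steps += 1
    return steps


class FakeDesktop:
    """In-memory stand-in: a system dialog stays open until a click arrives."""
    def __init__(self) -> None:
        self.dialog_open = True
        self.log: list[str] = []

    def execute(self, action: Action) -> None:
        self.log.append(action.name)
        if action.name == "mouseClick":
            self.dialog_open = False

    def screenshot(self) -> bytes:
        return b"dialog" if self.dialog_open else b"clear"


def dismiss_dialog(image: bytes) -> Optional[Action]:
    """Toy policy: click while the dialog is visible, otherwise finish."""
    if image == b"dialog":
        return Action("mouseClick", {"x": 100, "y": 200})
    return None
```

Running `run_loop(FakeDesktop(), dismiss_dialog, Action("mouseMove", {"x": 100, "y": 200}))` moves the pointer, sees the dialog in the capture, clicks it, and stops once the next capture shows it is gone. The `max_steps` cap is a design choice for the sketch: a loop driven by model output needs a hard bound so a misread screenshot cannot repeat forever.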
Frequently Asked Questions
- What actions does the new set include?
- Eight primitives: mouseClick, mouseMove, mouseDrag, mouseScroll, keyType, keyPress, keyShortcut, and screenshot, which captures the full OS desktop.
- Is additional configuration required?
- No. The feature is available to all AgentCore Browser users immediately, with no extra setup.
- How does the agent handle system dialogs?
- Through an action-screenshot-reaction loop: the agent sends an action, AgentCore executes it, a screenshot is taken, and the vision model analyzes the new state and decides the next move.