Browser OS action
The InvokeBrowser API provides direct operating system-level control over Amazon Bedrock AgentCore Browser sessions. The WebSocket-based automation endpoint uses the Chrome DevTools Protocol (CDP) to drive the browser, whereas InvokeBrowser operates at the OS level. This lets it perform actions that CDP cannot handle, such as interacting with print dialogs, sending keyboard shortcuts, opening right-click context menus, dismissing JavaScript alerts, and capturing full-screen screenshots.
Overview
The Amazon Bedrock AgentCore Browser provides two ways to interact with a browser session:
- WebSocket-based automation (CDP): Uses the Chrome DevTools Protocol over a WebSocket connection. This is ideal for standard browser automation tasks such as navigating pages, clicking DOM elements, filling forms, and extracting page content. Libraries such as Playwright and browser-use connect through this endpoint.
- OS-level actions (InvokeBrowser): Uses a REST API to perform operating system-level interactions through mouse, keyboard, and screenshot actions. This complements CDP by handling scenarios where browser-level automation is insufficient.
Use InvokeBrowser when your agent needs to:
- Interact with native OS dialogs, such as print dialogs, file upload and download dialogs, or authentication prompts, that are outside the browser DOM
- Dismiss JavaScript alerts, confirms, or prompts that block CDP execution
- Use keyboard shortcuts (for example, ctrl+a or ctrl+p) that trigger OS-level behavior
- Interact with right-click context menus rendered by the operating system
- Capture full desktop screenshots that include content outside the browser viewport, such as OS notifications or multi-window layouts
- Perform drag-and-drop operations that span browser windows or move between the browser and the desktop
InvokeBrowser follows the same pattern as InvokeCodeInterpreter: a single unified operation with action-type dispatch. You send a request with exactly one action, and receive a corresponding result.
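The single-member rule can be checked on the client before a request is sent. The following Python sketch is illustrative only and is not part of any AWS SDK: build_action_body is a hypothetical helper, and the mouseClick arguments are copied from the request example later on this page.

```python
from typing import Any, Dict, Optional


def build_action_body(**members: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Build an InvokeBrowser request body (hypothetical helper).

    Each keyword is a BrowserAction member name; its value is that member's
    arguments. Members passed as None are ignored so callers can forward
    optional values, but exactly one member must remain.
    """
    chosen = {name: args for name, args in members.items() if args is not None}
    if len(chosen) != 1:
        raise ValueError(
            f"exactly one BrowserAction member must be set, got {sorted(chosen)}"
        )
    return {"action": chosen}


if __name__ == "__main__":
    print(build_action_body(mouseClick={"x": 100, "y": 200}))
    # {'action': {'mouseClick': {'x': 100, 'y': 200}}}
    try:
        build_action_body()  # no member set
    except ValueError as err:
        print("rejected:", err)
```

Enforcing the rule locally turns a malformed request into an immediate error in the agent's own code rather than a round trip to the service.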
Supported actions
InvokeBrowser supports the following action types through the BrowserAction union. Exactly one action member must be set per request.
Mouse actions
For all mouse actions, the coordinate values (x, y) must fall strictly within the browser session viewport: 1 < x < viewportWidth - 2 and 1 < y < viewportHeight - 2. The default viewport size is 1456×819 pixels; you can change it with the viewPort parameter when you start a session.
| Action | Required fields | Optional fields | Description |
|---|---|---|---|
| mouseClick | x, y | button, clickCount | Click at the specified coordinates. |
|  |  | None | Move the cursor to the specified coordinates. |
|  |  |  | Drag from a start position to an end position. |
|  |  |  | Scroll at the specified position. |
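Because coordinates must fall strictly inside the viewport, an agent that derives click targets from a screenshot may want to check or clamp them first. A minimal Python sketch, assuming integer pixel coordinates and the default 1456×819 viewport unless another size is passed; is_valid_point and clamp_point are hypothetical helper names.

```python
from typing import Tuple

# Default session viewport (width, height) in pixels, as stated on this page.
DEFAULT_VIEWPORT: Tuple[int, int] = (1456, 819)


def is_valid_point(x: int, y: int, viewport: Tuple[int, int] = DEFAULT_VIEWPORT) -> bool:
    """True if 1 < x < width - 2 and 1 < y < height - 2 (strict bounds)."""
    width, height = viewport
    return 1 < x < width - 2 and 1 < y < height - 2


def clamp_point(x: int, y: int, viewport: Tuple[int, int] = DEFAULT_VIEWPORT) -> Tuple[int, int]:
    """Pull an integer point into the valid range.

    For integers the strict bounds mean 2 <= x <= width - 3 and
    2 <= y <= height - 3.
    """
    width, height = viewport
    return min(max(x, 2), width - 3), min(max(y, 2), height - 3)


if __name__ == "__main__":
    print(is_valid_point(0, 0))  # False
    print(clamp_point(0, 5000))  # (2, 816)
```

Clamping silently moves the target, which can change what actually gets clicked; rejecting the point with is_valid_point and recomputing it may be safer when an element sits near the edge of the viewport.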
Keyboard actions
| Action | Required fields | Optional fields | Description |
|---|---|---|---|
| keyType |  | None | Type a string of text. Maximum length: 10,000 characters. |
| keyPress |  |  | Press a key N times. |
| keyShortcut |  | None | Press a key combination, such as ctrl+a or ctrl+p. |
Screenshot action
| Action | Required fields | Optional fields | Description |
|---|---|---|---|
|  | None |  | Capture the full OS desktop, not just the browser viewport. Format: PNG only. |
Considerations
- ASCII-only text input: The keyType action supports ASCII characters only. Non-ASCII characters, such as accented letters, emoji, or other multi-byte characters, are skipped during input.
- No key name validation: The keyPress and keyShortcut actions do not validate whether the specified key names are supported. If you provide an unrecognized key name, the API returns a SUCCESS status without performing the intended action. Use only the supported key names listed below.
- Supported key names: Key names for the keyPress and keyShortcut actions must be lowercase. Supported keys include single characters (a–z, 0–9) and named keys such as enter, tab, space, backspace, delete, escape, ctrl, alt, and shift.
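Since the service does not reject unknown key names, a client-side check can catch typos before they turn into silent no-ops. This Python sketch is an assumption-laden illustration: check_key and skipped_characters are hypothetical names, and the allowlist contains only the key names listed on this page, which the page describes as examples ("include"), so the service may accept more.

```python
import string
from typing import List

# Key names listed on this page only; a conservative, possibly incomplete set.
NAMED_KEYS = {"enter", "tab", "space", "backspace", "delete", "escape", "ctrl", "alt", "shift"}
KNOWN_KEYS = NAMED_KEYS | set(string.ascii_lowercase) | set(string.digits)


def check_key(name: str) -> str:
    """Return name if it is a listed key name; otherwise raise.

    The service reports SUCCESS for unrecognized names, so a typo such as
    "ctlr" or a capitalized "Enter" would otherwise fail silently.
    """
    if name not in KNOWN_KEYS:
        raise ValueError(f"unlisted or non-lowercase key name: {name!r}")
    return name


def skipped_characters(text: str) -> List[str]:
    """Characters that keyType input would skip because they are not ASCII."""
    return [ch for ch in text if not ch.isascii()]


if __name__ == "__main__":
    print([check_key(k) for k in ("ctrl", "a")])  # ['ctrl', 'a']
    print(skipped_characters("caf\u00e9"))  # ['é']
```

How key names are packed into a keyShortcut request is not shown on this page, so the sketch validates individual names only. Rejecting unlisted names may be stricter than the service; relax the allowlist once the full supported set is confirmed.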
Request and response format
Request
```http
POST /browsers/{browserIdentifier}/sessions/invoke HTTP/1.1
x-amzn-browser-session-id: sessionId
Content-type: application/json
```
The request body contains an action field with exactly one member of the BrowserAction union set:
```json
{
  "action": {
    "mouseClick": {
      "x": 100,
      "y": 200,
      "button": "LEFT",
      "clickCount": 1
    }
  }
}
```
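The request line, header, and body shown above can be assembled as follows when calling the REST API directly. This is a minimal Python sketch, not a complete client: build_invoke_request is a hypothetical helper, the identifier and session ID in the demo are placeholders, and a real call additionally needs the regional endpoint host and AWS Signature Version 4 signing, which are deliberately left out.

```python
import json
from typing import Any, Dict, Tuple
from urllib.parse import quote


def build_invoke_request(
    browser_identifier: str, session_id: str, action: Dict[str, Any]
) -> Tuple[str, Dict[str, str], bytes]:
    """Return (path, headers, body) for a raw InvokeBrowser call.

    Not handled here: the endpoint host and SigV4 request signing.
    """
    # Percent-encode the identifier so it stays a single path segment.
    path = f"/browsers/{quote(browser_identifier, safe='')}/sessions/invoke"
    headers = {
        "x-amzn-browser-session-id": session_id,
        "Content-Type": "application/json",
    }
    body = json.dumps({"action": action}).encode("utf-8")
    return path, headers, body


if __name__ == "__main__":
    path, headers, body = build_invoke_request(
        "my-browser",  # hypothetical browser identifier
        "my-session-id",  # hypothetical session ID
        {"mouseClick": {"x": 100, "y": 200, "button": "LEFT", "clickCount": 1}},
    )
    print(path)  # /browsers/my-browser/sessions/invoke
    print(headers)
    print(body.decode("utf-8"))
```

Keeping request construction separate from signing and transport makes it easy to plug the result into whichever SigV4-capable HTTP client the agent already uses.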
Response
The session ID is returned in the x-amzn-browser-session-id response header. The response body contains a result field with the result of the requested action.
On success:
```json
{
  "result": {
    "mouseClick": {
      "status": "SUCCESS",
      "error": null
    }
  }
}
```
On failure, the status field is set to FAILED and the error field contains a description of the failure.
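A caller typically unwraps the single result member and turns a FAILED status into an exception. This Python sketch assumes the response body shape shown above; check_result and BrowserActionFailed are hypothetical names, and the error text is treated as opaque.

```python
import json
from typing import Any, Dict, Tuple, Union


class BrowserActionFailed(RuntimeError):
    """Raised when an action result does not report SUCCESS (hypothetical)."""


def check_result(response_body: Union[str, bytes]) -> Tuple[str, Dict[str, Any]]:
    """Return (action name, result), or raise if the status is not SUCCESS."""
    result = json.loads(response_body)["result"]
    if len(result) != 1:
        raise ValueError(f"expected one result member, got {sorted(result)}")
    name, outcome = next(iter(result.items()))
    if outcome.get("status") != "SUCCESS":
        raise BrowserActionFailed(f"{name}: {outcome.get('error')}")
    return name, outcome


if __name__ == "__main__":
    ok = '{"result": {"mouseClick": {"status": "SUCCESS", "error": null}}}'
    print(check_result(ok))  # ('mouseClick', {'status': 'SUCCESS', 'error': None})
```

Treating anything other than SUCCESS as a failure is the conservative choice. Note that SUCCESS does not prove the action had an effect: an unrecognized key name in keyPress or keyShortcut still returns SUCCESS.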
Examples
The following examples show how to invoke browser actions using the AWS CLI, AWS SDK for Python (Boto3), and the API.