feat: document the experimental browser agent, its configuration, session modes, and security.

2026-06-10 11:12:35 -07:00 · 2026-02-17 07:07:11 -08:00
parent 067d0ecab3
commit c7ec983d31
4 changed files with 129 additions and 0 deletions
@@ -126,6 +126,17 @@ they appear in the UI.
 | --------------------------------- | ------------------------------ | --------------------------------------------- | ------- |
 | Auto Configure Max Old Space Size | `advanced.autoConfigureMemory` | Automatically configure Node.js memory limits | `false` |

+### Agents
+
+| UI Label             | Setting                                                         | Description                                                                                                       | Default        |
+| -------------------- | --------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------- | -------------- |
+| Enable Browser Agent | `agents.overrides.browser_agent.enabled`                        | Enable the browser automation sub-agent.                                                                          | `false`        |
+| Session Mode         | `agents.overrides.browser_agent.customConfig.sessionMode`       | How Chrome is managed: `"persistent"`, `"isolated"`, or `"existing"`.                                             | `"persistent"` |
+| Headless             | `agents.overrides.browser_agent.customConfig.headless`          | Run Chrome in headless mode (no visible window).                                                                  | `false`        |
+| Chrome Profile Path  | `agents.overrides.browser_agent.customConfig.chromeProfilePath` | Custom path to a Chrome profile directory.                                                                        | `undefined`    |
+| Visual Model         | `agents.overrides.browser_agent.customConfig.visualModel`       | Model override for visual analysis (for example, `"gemini-2.5-computer-use-preview-10-2025"`).                    | `undefined`    |
+| Allowed Domains      | `agents.overrides.browser_agent.customConfig.allowedDomains`    | Restrict navigation to these domain patterns. Supports `*` wildcards. If empty, all non-blocked URLs are allowed. | `[]`           |
+
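+As a sketch of how `allowedDomains` could look in `settings.json`, the fragment
+below assumes the dotted setting path in the table maps directly onto nested
+keys. The domain patterns are placeholders, and exactly how a pattern is
+matched against a URL is not specified here.
+
+```json
+{
+  "agents": {
+    "overrides": {
+      "browser_agent": {
+        "enabled": true,
+        "customConfig": {
+          "allowedDomains": ["example.com", "*.example.com"]
+        }
+      }
+    }
+  }
+}
+```
+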
 ### Experimental

 | UI Label                   | Setting                                  | Description                                                                                                                                               | Default |
@@ -80,6 +80,119 @@ Gemini CLI comes with the following built-in subagents:
  invoked by the user.
 - **Configuration:** Enabled by default. No specific configuration options.

+### Browser Agent (experimental)
+
+- **Name:** `browser_agent`
+- **Purpose:** Automate web browser tasks, such as navigating websites, filling
+  forms, clicking buttons, and extracting information from web pages, using the
+  accessibility tree.
+- **When to use:** "Go to example.com and fill out the contact form," "Extract
+  the pricing table from this page," "Click the login button and enter my
+  credentials."
+
+> **Note:** This is an experimental feature under active development.
+
+#### Prerequisites
+
+The browser agent requires:
+
+- **Chrome** version 144 or later installed on your system.
+- **Node.js** with `npx` available (used to launch the
+  [`chrome-devtools-mcp`](https://www.npmjs.com/package/chrome-devtools-mcp)
+  server).
+
+#### Enabling the browser agent
+
+The browser agent is disabled by default. Enable it in your `settings.json`:
+
+```json
+{
+  "agents": {
+    "overrides": {
+      "browser_agent": {
+        "enabled": true
+      }
+    }
+  }
+}
+```
+
+#### Session modes
+
+The `sessionMode` setting controls how Chrome is launched and managed. Set it
+under `agents.browser`:
+
+```json
+{
+  "agents": {
+    "overrides": {
+      "browser_agent": {
+        "enabled": true
+      }
+    },
+    "browser": {
+      "sessionMode": "persistent"
+    }
+  }
+}
+```
+
+The available modes are:
+
+| Mode         | Description                                                                                                                                                                                 |
+| :----------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| `persistent` | **(Default)** Launches Chrome with a persistent profile stored at `~/.cache/chrome-devtools-mcp/`. Cookies, history, and settings are preserved between sessions.                           |
+| `isolated`   | Launches Chrome with a temporary profile that is deleted after each session. Use this for clean-state automation.                                                                           |
+| `existing`   | Attaches to an already-running Chrome instance. You must enable remote debugging first by navigating to `chrome://inspect/#remote-debugging` in Chrome. No new browser process is launched. |
+
+#### Configuration reference
+
+All browser-specific settings go under `agents.browser` in your `settings.json`.
+
+| Setting       | Type      | Default        | Description                                                                                     |
+| :------------ | :-------- | :------------- | :---------------------------------------------------------------------------------------------- |
+| `sessionMode` | `string`  | `"persistent"` | How Chrome is managed: `"persistent"`, `"isolated"`, or `"existing"`.                           |
+| `headless`    | `boolean` | `false`        | Run Chrome in headless mode (no visible window).                                                |
+| `profilePath` | `string`  | —              | Custom path to a browser profile directory.                                                     |
+| `visualModel` | `string`  | —              | Model override for the visual agent (for example, `"gemini-2.5-computer-use-preview-10-2025"`). |
+
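+The settings in this table can be combined in one `agents.browser` block. The
+fragment below is a minimal sketch of a clean-state, windowless setup using
+only values from the table; treat it as an illustration rather than a
+recommended configuration.
+
+```json
+{
+  "agents": {
+    "overrides": {
+      "browser_agent": {
+        "enabled": true
+      }
+    },
+    "browser": {
+      "sessionMode": "isolated",
+      "headless": true
+    }
+  }
+}
+```
+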
+#### Security
+
+The browser agent enforces the following security restrictions:
+
+- **Blocked URL patterns:** `file://`, `javascript:`, `data:text/html`,
+  `chrome://extensions`, and `chrome://settings/passwords` are always blocked.
+- **Sensitive action confirmation:** Actions like form filling, file uploads,
+  and form submissions require user confirmation through the standard policy
+  engine.
+
+#### Visual agent
+
+By default, the browser agent interacts with pages through the accessibility
+tree using element `uid` values. For tasks that require visual identification
+(for example, "click the yellow button" or "find the red error message"), you
+can enable the visual agent by setting a `visualModel`:
+
+```json
+{
+  "agents": {
+    "overrides": {
+      "browser_agent": {
+        "enabled": true
+      }
+    },
+    "browser": {
+      "visualModel": "gemini-2.5-computer-use-preview-10-2025"
+    }
+  }
+}
+```
+
+When `visualModel` is set, the agent gains access to the `analyze_screenshot`
+tool, which captures a screenshot and sends it to the visual model for
+analysis. The model returns coordinates and element descriptions that the
+browser agent uses with the `click_at` tool for precise, coordinate-based
+interactions.
+
 ## Creating custom subagents

 You can create your own subagents to automate specific workflows or enforce
@@ -196,6 +196,7 @@
  {
    "label": "resources_tab",
    "items": [
+      {
      {
        "label": "Resources",
        "items": [
@@ -215,6 +216,7 @@
          { "label": "Uninstall", "slug": "docs/resources/uninstall" }
        ]
      }
+
    ]
  },
  {
@@ -52,6 +52,9 @@ These tools help the model manage its plan and interact with you.
  complex plans.
 - **[Agent Skills](../cli/skills.md) (`activate_skill`):** Loads specialized
  procedural expertise when needed.
+- **[Browser agent](../core/subagents.md#browser-agent-experimental)
+  (`browser_agent`):** Automates web browser tasks through the accessibility
+  tree.
 - **Internal docs (`get_internal_docs`):** Accesses Gemini CLI's own
  documentation to help answer your questions.