From c7ec983d3197684e3913ddd9f03cb1146ac2d869 Mon Sep 17 00:00:00 2001
From: Gaurav Ghosh
Date: Tue, 17 Feb 2026 07:07:11 -0800
Subject: [PATCH] feat: document the experimental browser agent, its configuration, session modes, and security.

---
 docs/cli/settings.md | 11 ++++
 docs/core/subagents.md | 113 +++++++++++++++++++++++++++++++++++++++++
 docs/sidebar.json | 2 +
 docs/tools/index.md | 3 ++
 4 files changed, 129 insertions(+)

diff --git a/docs/cli/settings.md b/docs/cli/settings.md
index 5011f55b2c..abcd9017ec 100644
--- a/docs/cli/settings.md
+++ b/docs/cli/settings.md
@@ -126,6 +126,17 @@ they appear in the UI.
 | --------------------------------- | ------------------------------ | --------------------------------------------- | ------- |
 | Auto Configure Max Old Space Size | `advanced.autoConfigureMemory` | Automatically configure Node.js memory limits | `false` |
 
+### Agents
+
+| UI Label | Setting | Description | Default |
+| -------------------- | --------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------- | -------------- |
+| Enable Browser Agent | `agents.overrides.browser_agent.enabled` | Enable the browser automation sub-agent. | `false` |
+| Session Mode | `agents.overrides.browser_agent.customConfig.sessionMode` | How Chrome is managed: `"persistent"`, `"isolated"`, or `"existing"`. | `"persistent"` |
+| Headless | `agents.overrides.browser_agent.customConfig.headless` | Run Chrome in headless mode (no visible window). | `false` |
+| Chrome Profile Path | `agents.overrides.browser_agent.customConfig.chromeProfilePath` | Custom path to a Chrome profile directory. | `undefined` |
+| Visual Model | `agents.overrides.browser_agent.customConfig.visualModel` | Model override for visual analysis (for example, `"gemini-2.5-computer-use-preview-10-2025"`). | `undefined` |
+| Allowed Domains | `agents.overrides.browser_agent.customConfig.allowedDomains` | Restrict navigation to these domain patterns. Supports `*` wildcards. If empty, all non-blocked URLs are allowed. | `[]` |
+
 ### Experimental
 
 | UI Label | Setting | Description | Default |
diff --git a/docs/core/subagents.md b/docs/core/subagents.md
index 3619609e95..24785145d9 100644
--- a/docs/core/subagents.md
+++ b/docs/core/subagents.md
@@ -80,6 +80,119 @@ Gemini CLI comes with the following built-in subagents:
   invoked by the user.
 - **Configuration:** Enabled by default. No specific configuration options.
 
+### Browser Agent (experimental)
+
+- **Name:** `browser_agent`
+- **Purpose:** Automate web browser tasks — navigating websites, filling forms,
+  clicking buttons, and extracting information from web pages — using the
+  accessibility tree.
+- **When to use:** "Go to example.com and fill out the contact form," "Extract
+  the pricing table from this page," "Click the login button and enter my
+  credentials."
+
+> **Note:** This is a preview feature currently under active development.
+
+#### Prerequisites
+
+The browser agent requires:
+
+- **Chrome** version 144 or later installed on your system.
+- **Node.js** with `npx` available (used to launch the
+  [`chrome-devtools-mcp`](https://www.npmjs.com/package/chrome-devtools-mcp)
+  server).
+
+#### Enabling the browser agent
+
+The browser agent is disabled by default. Enable it in your `settings.json`:
+
+```json
+{
+  "agents": {
+    "overrides": {
+      "browser_agent": {
+        "enabled": true
+      }
+    }
+  }
+}
+```
+
+#### Session modes
+
+The `sessionMode` setting controls how Chrome is launched and managed. Set it
+under `agents.browser`:
+
+```json
+{
+  "agents": {
+    "overrides": {
+      "browser_agent": {
+        "enabled": true
+      }
+    },
+    "browser": {
+      "sessionMode": "persistent"
+    }
+  }
+}
+```
+
+The available modes are:
+
+| Mode | Description |
+| :----------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| `persistent` | **(Default)** Launches Chrome with a persistent profile stored at `~/.cache/chrome-devtools-mcp/`. Cookies, history, and settings are preserved between sessions. |
+| `isolated` | Launches Chrome with a temporary profile that is deleted after each session. Use this for clean-state automation. |
+| `existing` | Attaches to an already-running Chrome instance. You must enable remote debugging first by navigating to `chrome://inspect/#remote-debugging` in Chrome. No new browser process is launched. |
+
+#### Configuration reference
+
+All browser-specific settings go under `agents.browser` in your `settings.json`.
+
+| Setting | Type | Default | Description |
+| :------------ | :-------- | :------------- | :---------------------------------------------------------------------------------------------- |
+| `sessionMode` | `string` | `"persistent"` | How Chrome is managed: `"persistent"`, `"isolated"`, or `"existing"`. |
+| `headless` | `boolean` | `false` | Run Chrome in headless mode (no visible window). |
+| `profilePath` | `string` | — | Custom path to a browser profile directory. |
+| `visualModel` | `string` | — | Model override for the visual agent (for example, `"gemini-2.5-computer-use-preview-10-2025"`). |
+
+#### Security
+
+The browser agent enforces the following security restrictions:
+
+- **Blocked URL patterns:** `file://`, `javascript:`, `data:text/html`,
+  `chrome://extensions`, and `chrome://settings/passwords` are always blocked.
+- **Sensitive action confirmation:** Actions like form filling, file uploads,
+  and form submissions require user confirmation through the standard policy
+  engine.
+
+#### Visual agent
+
+By default, the browser agent interacts with pages through the accessibility
+tree using element `uid` values. For tasks that require visual identification
+(for example, "click the yellow button" or "find the red error message"), you
+can enable the visual agent by setting a `visualModel`:
+
+```json
+{
+  "agents": {
+    "overrides": {
+      "browser_agent": {
+        "enabled": true
+      }
+    },
+    "browser": {
+      "visualModel": "gemini-2.5-computer-use-preview-10-2025"
+    }
+  }
+}
+```
+
+When enabled, the agent gains access to the `analyze_screenshot` tool, which
+captures a screenshot and sends it to the vision model for analysis. The model
+returns coordinates and element descriptions that the browser agent uses with
+the `click_at` tool for precise, coordinate-based interactions.
+
 ## Creating custom subagents
 
 You can create your own subagents to automate specific workflows or enforce
diff --git a/docs/sidebar.json b/docs/sidebar.json
index 8a4bd7391c..6f40d047f2 100644
--- a/docs/sidebar.json
+++ b/docs/sidebar.json
@@ -196,6 +196,7 @@
 {
   "label": "resources_tab",
   "items": [
+    {
     {
       "label": "Resources",
       "items": [
@@ -215,6 +216,7 @@
         { "label": "Uninstall", "slug": "docs/resources/uninstall" }
       ]
     }
+  ]
 },
 {
diff --git a/docs/tools/index.md b/docs/tools/index.md
index f496ad591a..6bdf298fea 100644
--- a/docs/tools/index.md
+++ b/docs/tools/index.md
@@ -52,6 +52,9 @@ These tools help the model manage its plan and interact with you.
   complex plans.
 - **[Agent Skills](../cli/skills.md) (`activate_skill`):** Loads specialized
   procedural expertise when needed.
+- **[Browser agent](../core/subagents.md#browser-agent-experimental)
+  (`browser_agent`):** Automates web browser tasks through the accessibility
+  tree.
 - **Internal docs (`get_internal_docs`):** Accesses Gemini CLI's own
   documentation to help answer your questions.
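
For reference, the four settings listed in the `docs/core/subagents.md` configuration reference could be combined into a single `settings.json` roughly as sketched below. This is an illustrative sketch, not part of the patch. It assumes the `agents.browser` location described in `subagents.md`; the `docs/cli/settings.md` table in the same patch lists these keys under `agents.overrides.browser_agent.customConfig` instead, with `chromeProfilePath` rather than `profilePath`, so confirm the path against the settings schema before relying on it. The `profilePath` value is a hypothetical placeholder.

```json
{
  "agents": {
    "overrides": {
      "browser_agent": {
        "enabled": true
      }
    },
    "browser": {
      "sessionMode": "persistent",
      "headless": true,
      "profilePath": "/path/to/chrome-profile",
      "visualModel": "gemini-2.5-computer-use-preview-10-2025"
    }
  }
}
```

Leaving out `visualModel` keeps the agent on the accessibility-tree path described in the "Visual agent" section.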