+### Tokyo Night
+
+
+
## Light themes
### ANSI Light
diff --git a/docs/core/index.md b/docs/core/index.md
index afa13787b8..ae5a6794fe 100644
--- a/docs/core/index.md
+++ b/docs/core/index.md
@@ -7,8 +7,8 @@ requests sent from `packages/cli`. For a general overview of Gemini CLI, see the
## Navigating this section
-- **[Sub-agents (experimental)](./subagents.md):** Learn how to create and use
- specialized sub-agents for complex tasks.
+- **[Sub-agents](./subagents.md):** Learn how to create and use specialized
+ sub-agents for complex tasks.
- **[Core tools reference](../reference/tools.md):** Information on how tools
are defined, registered, and used by the core.
- **[Memory Import Processor](../reference/memport.md):** Documentation for the
diff --git a/docs/core/remote-agents.md b/docs/core/remote-agents.md
index e11c37fece..584ad87847 100644
--- a/docs/core/remote-agents.md
+++ b/docs/core/remote-agents.md
@@ -1,4 +1,4 @@
-# Remote Subagents (experimental)
+# Remote Subagents
Gemini CLI supports connecting to remote subagents using the Agent-to-Agent
(A2A) protocol. This allows Gemini CLI to interact with other agents, expanding
@@ -10,23 +10,6 @@ agents in the following repositories:
- [ADK Samples (Python)](https://github.com/google/adk-samples/tree/main/python)
- [ADK Python Contributing Samples](https://github.com/google/adk-python/tree/main/contributing/samples)
-
-> [!NOTE]
-> Remote subagents are currently an experimental feature.
-
-## Configuration
-
-To use remote subagents, you must explicitly enable them in your
-`settings.json`:
-
-```json
-{
- "experimental": {
- "enableAgents": true
- }
-}
-```
-
## Proxy support
Gemini CLI routes traffic to remote agents through an HTTP/HTTPS proxy if one is
@@ -459,3 +442,16 @@ Users can manage subagents using the following commands within the Gemini CLI:
> [!TIP]
> You can use the `@cli_help` agent within Gemini CLI for assistance
> with configuring subagents.
+
+## Disabling remote agents
+
+Remote subagents are enabled by default. To disable them, set `enableAgents` to
+`false` in your `settings.json`:
+
+```json
+{
+ "experimental": {
+ "enableAgents": false
+ }
+}
+```
diff --git a/docs/core/subagents.md b/docs/core/subagents.md
index b0cffca3b5..f1e4dda614 100644
--- a/docs/core/subagents.md
+++ b/docs/core/subagents.md
@@ -1,23 +1,10 @@
-# Subagents (experimental)
+# Subagents
Subagents are specialized agents that operate within your main Gemini CLI
session. They are designed to handle specific, complex tasks (like deep codebase
analysis, documentation lookup, or domain-specific reasoning) without cluttering
the main agent's context or toolset.
-
-> [!NOTE]
-> Subagents are currently an experimental feature.
->
-To use custom subagents, you must ensure they are enabled in your
-`settings.json` (enabled by default):
-
-```json
-{
- "experimental": { "enableAgents": true }
-}
-```
-
## What are subagents?
Subagents are "specialists" that the main Gemini agent can hire for a specific
@@ -124,10 +111,12 @@ Gemini CLI comes with the following built-in subagents:
The browser agent requires:
-- **Chrome** version 144 or later (any recent stable release will work).
-- **Node.js** with `npx` available (used to launch the
- [`chrome-devtools-mcp`](https://www.npmjs.com/package/chrome-devtools-mcp)
- server).
+- **Chrome** version 144 or later (any recent stable release works).
+
+The underlying
+[`chrome-devtools-mcp`](https://www.npmjs.com/package/chrome-devtools-mcp)
+server is bundled with Gemini CLI and launched automatically, so no separate
+installation is needed.
#### Enabling the browser agent
@@ -173,26 +162,58 @@ The available modes are:
| `isolated` | Launches Chrome with a temporary profile that is deleted after each session. Use this for clean-state automation. |
| `existing` | Attaches to an already-running Chrome instance. You must enable remote debugging first by navigating to `chrome://inspect/#remote-debugging` in Chrome. No new browser process is launched. |
+#### First-run consent
+
+The first time the browser agent is invoked, Gemini CLI displays a consent
+dialog. You must accept before the browser session starts. This dialog only
+appears once.
+
#### Configuration reference
All browser-specific settings go under `agents.browser` in your `settings.json`.
+For full details, see the
+[`agents.browser` configuration reference](../reference/configuration.md#agents).
-| Setting | Type | Default | Description |
-| :------------ | :-------- | :------------- | :---------------------------------------------------------------------------------------------- |
-| `sessionMode` | `string` | `"persistent"` | How Chrome is managed: `"persistent"`, `"isolated"`, or `"existing"`. |
-| `headless` | `boolean` | `false` | Run Chrome in headless mode (no visible window). |
-| `profilePath` | `string` | None | Custom path to a browser profile directory. |
-| `visualModel` | `string` | None | Model override for the visual agent (for example, `"gemini-2.5-computer-use-preview-10-2025"`). |
+| Setting | Type | Default | Description |
+| :------------------------ | :--------- | :------------- | :------------------------------------------------------------------------------ |
+| `sessionMode` | `string` | `"persistent"` | How Chrome is managed: `"persistent"`, `"isolated"`, or `"existing"`. |
+| `headless` | `boolean` | `false` | Run Chrome in headless mode (no visible window). |
+| `profilePath` | `string` | None | Custom path to a browser profile directory. |
+| `visualModel` | `string` | None | Model override for the visual agent. |
+| `allowedDomains` | `string[]` | None | Restrict navigation to specific domains (for example, `["github.com"]`). |
+| `disableUserInput` | `boolean` | `true` | Disable user input on the browser window during automation (non-headless only). |
+| `maxActionsPerTask` | `number` | `100` | Maximum tool calls per task. The agent is terminated when the limit is reached. |
+| `confirmSensitiveActions` | `boolean` | `false` | Require manual confirmation for `upload_file` and `evaluate_script`. |
+| `blockFileUploads` | `boolean` | `false` | Hard-block all file upload requests from the agent. |
+
+#### Automation overlay and input blocking
+
+In non-headless mode, the browser agent injects a visual overlay into the
+browser window to indicate that automation is in progress. By default, user
+input (keyboard and mouse) is also blocked to prevent accidental interference.
+You can disable this by setting `disableUserInput` to `false`.
#### Security
-The browser agent enforces the following security restrictions:
+The browser agent enforces several layers of security:
-- **Blocked URL patterns:** `file://`, `javascript:`, `data:text/html`,
- `chrome://extensions`, and `chrome://settings/passwords` are always blocked.
-- **Sensitive action confirmation:** Actions like form filling, file uploads,
- and form submissions require user confirmation through the standard policy
- engine.
+- **Domain restrictions:** When `allowedDomains` is set, the agent can only
+  navigate to the listed domains (and their subdomains when an entry uses the
+  `*.` prefix). Attempting to visit a disallowed domain throws a fatal error
+  that immediately terminates the agent. The agent also attempts to detect and
+  block the use of allowed domains as proxies (e.g., via query parameters or
+  fragments) to access restricted content.
+- **Blocked URL patterns:** The underlying MCP server blocks dangerous URL
+ schemes including `file://`, `javascript:`, `data:text/html`,
+ `chrome://extensions`, and `chrome://settings/passwords`.
+- **Sensitive action confirmation:** Form filling (`fill`, `fill_form`) always
+ requires user confirmation through the policy engine, regardless of approval
+ mode. When `confirmSensitiveActions` is `true`, `upload_file` and
+ `evaluate_script` also require confirmation.
+- **File upload blocking:** Set `blockFileUploads` to `true` to hard-block all
+ file upload requests, preventing the agent from uploading any files.
+- **Action rate limiting:** The `maxActionsPerTask` setting (default: 100)
+ limits the total number of tool calls per task to prevent runaway execution.
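
The domain rule above can be illustrated with a small Python sketch. It is not Gemini CLI's implementation: the function name `is_allowed` is hypothetical, and the sketch assumes that a plain entry matches only that exact host while a `*.` entry matches any subdomain but not the bare domain itself. The real agent raises a fatal error instead of returning `False`, and its proxy detection is not modeled here.

```python
from urllib.parse import urlsplit


def is_allowed(url: str, allowed_domains: list[str]) -> bool:
    """Hypothetical check: may the agent navigate to `url`?"""
    # urlsplit lowercases the hostname; file:// URLs have no host at all.
    host = urlsplit(url).hostname or ""
    for entry in allowed_domains:
        entry = entry.lower()
        if entry.startswith("*."):
            # "*.github.com" -> ".github.com": only true subdomains match,
            # so a look-alike such as "evilgithub.com" is rejected.
            if host.endswith(entry[1:]):
                return True
        elif host == entry:
            return True
    return False


print(is_allowed("https://github.com/google-gemini", ["github.com"]))  # True
print(is_allowed("https://api.github.com/repos", ["github.com"]))      # False
print(is_allowed("https://api.github.com/repos", ["*.github.com"]))    # True
```

Because the check looks only at the parsed host, an allowed domain that appears in a query string (for example `https://evil.com/?next=github.com`) does not grant access.
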
#### Visual agent
@@ -226,19 +247,65 @@ the `click_at` tool for precise, coordinate-based interactions.
> The visual agent requires API key or Vertex AI authentication. It is
> not available when using "Sign in with Google".
+#### Sandbox support
+
+The browser agent adjusts its behavior automatically when running inside a
+sandbox.
+
+##### macOS seatbelt (`sandbox-exec`)
+
+When the CLI runs under the macOS seatbelt sandbox, `persistent` and `isolated`
+session modes are forced to `isolated` with `headless` enabled. This avoids
+permission errors caused by seatbelt file-system restrictions on persistent
+browser profiles. If `sessionMode` is set to `existing`, no override is applied.
+
+##### Container sandboxes (Docker / Podman)
+
+Chrome is not available inside the container, so the browser agent is
+**disabled** unless `sessionMode` is set to `"existing"`. When enabled with
+`existing` mode, the agent automatically connects to Chrome on the host via the
+resolved IP address of `host.docker.internal` (port `9222`) instead of using
+local pipe discovery. Port `9222` is currently hardcoded and cannot be
+customized.
+
+To use the browser agent in a Docker sandbox:
+
+1. Start Chrome on the host with remote debugging enabled:
+
+ ```bash
+ # Option A: Launch Chrome from the command line
+ google-chrome --remote-debugging-port=9222
+
+ # Option B: Enable in Chrome settings
+   # Navigate to chrome://inspect/#remote-debugging and enable remote debugging
+ ```
+
+2. Configure `sessionMode` and allowed domains in your project's
+ `.gemini/settings.json`:
+
+ ```json
+ {
+ "agents": {
+ "overrides": {
+ "browser_agent": { "enabled": true }
+ },
+ "browser": {
+ "sessionMode": "existing",
+ "allowedDomains": ["example.com"]
+ }
+ }
+ }
+ ```
+
+3. Launch the CLI with port forwarding:
+
+ ```bash
+ GEMINI_SANDBOX=docker SANDBOX_PORTS=9222 gemini
+ ```
+
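The sandbox rules above can be summarized as a single decision function. This is a hypothetical Python sketch, not code from Gemini CLI: the `BrowserPlan` type, the function name, and the sandbox identifiers (`"sandbox-exec"`, `"docker"`, `"podman"`) are illustrative assumptions, and the sketch keeps the `host.docker.internal` hostname where the CLI resolves it to an IP address.

```python
from dataclasses import dataclass
from typing import Optional

# Port 9222 is fixed, as described above.
HOST_DEBUG_ENDPOINT = "host.docker.internal:9222"


@dataclass(frozen=True)
class BrowserPlan:
    enabled: bool
    session_mode: str
    headless: bool
    connect_to: Optional[str] = None  # remote debugging endpoint, if any


def resolve_browser_plan(
    sandbox: Optional[str], session_mode: str, headless: bool
) -> BrowserPlan:
    """Hypothetical: apply the sandbox overrides to the configured settings."""
    if sandbox == "sandbox-exec":
        # Seatbelt: persistent/isolated are forced to isolated + headless.
        if session_mode in ("persistent", "isolated"):
            return BrowserPlan(True, "isolated", True)
        return BrowserPlan(True, session_mode, headless)  # "existing": unchanged
    if sandbox in ("docker", "podman"):
        # No Chrome inside the container, so only "existing" mode can work.
        if session_mode != "existing":
            return BrowserPlan(False, session_mode, headless)
        return BrowserPlan(True, "existing", headless, HOST_DEBUG_ENDPOINT)
    return BrowserPlan(True, session_mode, headless)  # no sandbox


print(resolve_browser_plan("sandbox-exec", "persistent", False))
print(resolve_browser_plan("docker", "existing", False))
```
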
## Creating custom subagents
You can create your own subagents to automate specific workflows or enforce
-specific personas. To use custom subagents, you must enable them in your
-`settings.json`:
-
-```json
-{
- "experimental": {
- "enableAgents": true
- }
-}
-```
+specific personas.
### Agent definition files
@@ -290,6 +357,7 @@ it yourself; just report it.
| `description` | string | Yes | Short description of what the agent does. This is visible to the main agent to help it decide when to call this subagent. |
| `kind` | string | No | `local` (default) or `remote`. |
| `tools` | array | No | List of tool names this agent can use. Supports wildcards: `*` (all tools), `mcp_*` (all MCP tools), `mcp_server_*` (all tools from a server). **If omitted, it inherits all tools from the parent session.** |
+| `mcpServers` | object | No | Configuration for inline Model Context Protocol (MCP) servers isolated to this specific agent. |
| `model` | string | No | Specific model to use (e.g., `gemini-3-preview`). Defaults to `inherit` (uses the main session model). |
| `temperature` | number | No | Model temperature (0.0 - 2.0). Defaults to `1`. |
| `max_turns` | number | No | Maximum number of conversation turns allowed for this agent before it must return. Defaults to `30`. |
@@ -317,6 +385,78 @@ Each subagent runs in its own isolated context loop. This means:
subagents **cannot** call other subagents. If a subagent is granted the `*`
tool wildcard, it will still be unable to see or invoke other agents.
+## Subagent tool isolation
+
+Subagent tool isolation moves Gemini CLI away from a single global tool
+registry. By providing isolated execution environments, you can ensure that
+subagents only interact with the parts of the system they are designed for. This
+prevents unintended side effects, improves reliability by avoiding state
+contamination, and enables fine-grained permission control.
+
+With this feature, you can:
+
+- **Specify tool access:** Define exactly which tools an agent can access using
+ a `tools` list in the agent definition.
+- **Define inline MCP servers:** Configure Model Context Protocol (MCP) servers
+ (which provide a standardized way to connect AI models to external tools and
+ data sources) directly in the subagent's markdown frontmatter, isolating them
+ to that specific agent.
+- **Maintain state isolation:** Ensure that subagents only interact with their
+ own set of tools and servers, preventing side effects and state contamination.
+- **Apply subagent-specific policies:** Enforce granular rules in your
+ [Policy Engine](../reference/policy-engine.md) TOML configuration based on the
+ executing subagent's name.
+
+### Configuring isolated tools and servers
+
+You can configure tool isolation for a subagent by updating its markdown
+frontmatter. This allows you to explicitly state which tools the subagent can
+use, rather than relying on the global registry.
+
+Add an `mcpServers` object to define inline MCP servers that are unique to the
+agent.
+
+**Example:**
+
+```yaml
+---
+name: my-isolated-agent
+tools:
+ - grep_search
+ - read_file
+mcpServers:
+ my-custom-server:
+ command: 'node'
+ args: ['path/to/server.js']
+---
+```
+
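The effect of a `tools` list can be sketched as a filter over the parent session's tool names. This minimal Python sketch is an assumption, not Gemini CLI's code: it treats the wildcards from the frontmatter reference (`*`, `mcp_*`, `mcp_server_*`) as shell-style patterns via `fnmatch`, inherits every tool when `tools` is omitted, and always hides other agents, since subagents cannot call other subagents. Tools from inline `mcpServers` are not modeled, and the tool names in the usage lines are examples only.

```python
from fnmatch import fnmatchcase
from typing import Optional


def visible_tools(
    registry: list[str],
    allowed: Optional[list[str]],
    agent_names: set[str],
) -> list[str]:
    """Hypothetical: the tool names a subagent can see."""
    # Subagents never see or invoke other agents, even with "*".
    candidates = [t for t in registry if t not in agent_names]
    if allowed is None:
        return candidates  # `tools` omitted: inherit the parent's tools
    return [t for t in candidates if any(fnmatchcase(t, p) for p in allowed)]


registry = ["grep_search", "read_file", "run_shell_command",
            "mcp_github_search", "mcp_jira_create", "codebase_investigator"]
agents = {"codebase_investigator"}
print(visible_tools(registry, ["read_file", "mcp_github_*"], agents))
# ['read_file', 'mcp_github_search']
```
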
+### Subagent-specific policies
+
+You can enforce fine-grained control over subagents using the
+[Policy Engine's](../reference/policy-engine.md) TOML configuration. This allows
+you to grant or restrict permissions specifically for an agent, without
+affecting the rest of your CLI session.
+
+To restrict a policy rule to a specific subagent, add the `subagent` property to
+the `[[rule]]` block in your `policy.toml` file.
+
+**Example:**
+
+```toml
+[[rule]]
+name = "Allow pr-creator to push code"
+subagent = "pr-creator"
+description = "Permit pr-creator to push branches automatically."
+decision = "allow"
+toolName = "run_shell_command"
+commandPrefix = "git push"
+```
+
+In this configuration, the policy rule only triggers if the executing subagent's
+name matches `pr-creator`. Rules without the `subagent` property apply
+universally to all agents.
+
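The scoping rule can be expressed as a predicate. The following Python sketch is hypothetical: the `Rule` fields mirror the TOML keys in the example (`toolName`, `commandPrefix`, `subagent`), command matching is simplified to a plain string prefix, and priorities and approval modes are ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    tool_name: str
    decision: str
    command_prefix: Optional[str] = None
    subagent: Optional[str] = None  # None: applies to every agent


def rule_applies(rule: Rule, tool_name: str, command: str,
                 subagent: Optional[str]) -> bool:
    """Hypothetical: does `rule` match this tool call?

    `subagent` is the executing subagent's name, or None for the main agent.
    """
    if rule.tool_name != tool_name:
        return False
    if rule.subagent is not None and rule.subagent != subagent:
        return False
    if rule.command_prefix is not None and not command.startswith(rule.command_prefix):
        return False
    return True


push = Rule("run_shell_command", "allow", "git push", subagent="pr-creator")
print(rule_applies(push, "run_shell_command", "git push origin main", "pr-creator"))  # True
print(rule_applies(push, "run_shell_command", "git push origin main", None))          # False
```
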
## Managing subagents
You can manage subagents interactively using the `/agents` command or
@@ -406,15 +546,11 @@ If you need to further tune your subagent, you can do so by selecting the model
to optimize for with `/model` and then asking the model why it does not think
that your subagent was called with a specific prompt and the given description.
-## Remote subagents (Agent2Agent) (experimental)
+## Remote subagents (Agent2Agent)
Gemini CLI can also delegate tasks to remote subagents using the Agent-to-Agent
(A2A) protocol.
-
-> [!NOTE]
-> Remote subagents are currently an experimental feature.
-
See the [Remote Subagents documentation](remote-agents) for detailed
configuration, authentication, and usage instructions.
@@ -423,3 +559,14 @@ configuration, authentication, and usage instructions.
Extensions can bundle and distribute subagents. See the
[Extensions documentation](../extensions/index.md#subagents) for details on how
to package agents within an extension.
+
+## Disabling subagents
+
+Subagents are enabled by default. To disable them, set `enableAgents` to `false`
+in your `settings.json`:
+
+```json
+{
+ "experimental": { "enableAgents": false }
+}
+```
diff --git a/docs/get-started/authentication.md b/docs/get-started/authentication.md
index 6d8758b958..31f2fff540 100644
--- a/docs/get-started/authentication.md
+++ b/docs/get-started/authentication.md
@@ -398,8 +398,8 @@ on this page.
## Running in headless mode
-[Headless mode](../cli/headless) will use your existing authentication method,
-if an existing authentication credential is cached.
+[Headless mode](../cli/headless.md) will use your existing authentication
+method, if an existing authentication credential is cached.
If you have not already signed in with an authentication credential, you must
configure authentication using environment variables:
diff --git a/docs/get-started/installation.md b/docs/get-started/installation.md
index e56d98d889..15922a6b8e 100644
--- a/docs/get-started/installation.md
+++ b/docs/get-started/installation.md
@@ -122,6 +122,13 @@ code.
# From the root of the repository
npm run start
```
+- **Production mode (React optimizations):** This method runs the CLI with React
+ production mode enabled, which is useful for testing performance without
+ development overhead.
+ ```bash
+ # From the root of the repository
+ npm run start:prod
+ ```
- **Production-like mode (linked package):** This method simulates a global
installation by linking your local package. It's useful for testing a local
build in a production workflow.
diff --git a/docs/hooks/index.md b/docs/hooks/index.md
index 71fdec268f..f2c786361c 100644
--- a/docs/hooks/index.md
+++ b/docs/hooks/index.md
@@ -22,11 +22,11 @@ With hooks, you can:
### Getting started
-- **[Writing hooks guide](../hooks/writing-hooks)**: A tutorial on creating your
- first hook with comprehensive examples.
-- **[Best practices](../hooks/best-practices)**: Guidelines on security,
+- **[Writing hooks guide](../hooks/writing-hooks.md)**: A tutorial on creating
+ your first hook with comprehensive examples.
+- **[Best practices](../hooks/best-practices.md)**: Guidelines on security,
performance, and debugging.
-- **[Hooks reference](../hooks/reference)**: The definitive technical
+- **[Hooks reference](../hooks/reference.md)**: The definitive technical
specification of I/O schemas and exit codes.
## Core concepts
@@ -154,8 +154,8 @@ Gemini CLI **fingerprints** project hooks. If a hook's name or command changes
(e.g., via `git pull`), it is treated as a **new, untrusted hook** and you will
be warned before it executes.
-See [Security Considerations](../hooks/best-practices#using-hooks-securely) for
-a detailed threat model.
+See [Security Considerations](../hooks/best-practices.md#using-hooks-securely)
+for a detailed threat model.
## Managing hooks
diff --git a/docs/reference/commands.md b/docs/reference/commands.md
index 4dd7e367e5..67690f6ba2 100644
--- a/docs/reference/commands.md
+++ b/docs/reference/commands.md
@@ -17,8 +17,6 @@ Slash commands provide meta-level control over the CLI itself.
### `/agents`
- **Description:** Manage local and remote subagents.
-- **Note:** This command is experimental and requires
- `experimental.enableAgents: true` in your `settings.json`.
- **Sub-commands:**
- **`list`**:
- **Description:** Lists all discovered agents, including built-in, local,
@@ -305,7 +303,7 @@ Slash commands provide meta-level control over the CLI itself.
- **Description:** Switch to Plan Mode (read-only) and view the current plan if
one has been generated.
- **Note:** This feature is enabled by default. It can be disabled via the
- `experimental.plan` setting in your configuration.
+ `general.plan.enabled` setting in your configuration.
- **Sub-commands:**
- **`copy`**:
- **Description:** Copy the currently approved plan to your clipboard.
diff --git a/docs/reference/configuration.md b/docs/reference/configuration.md
index 175cbd0b7f..314f851c84 100644
--- a/docs/reference/configuration.md
+++ b/docs/reference/configuration.md
@@ -62,11 +62,13 @@ locations for these files:
**Note on environment variables in settings:** String values within your
`settings.json` and `gemini-extension.json` files can reference environment
-variables using either `$VAR_NAME` or `${VAR_NAME}` syntax. These variables will
-be automatically resolved when the settings are loaded. For example, if you have
-an environment variable `MY_API_TOKEN`, you could use it in `settings.json` like
-this: `"apiKey": "$MY_API_TOKEN"`. Additionally, each extension can have its own
-`.env` file in its directory, which will be loaded automatically.
+variables using `$VAR_NAME`, `${VAR_NAME}`, or `${VAR_NAME:-DEFAULT_VALUE}`
+syntax. These variables will be automatically resolved when the settings are
+loaded. For example, if you have an environment variable `MY_API_TOKEN`, you
+could use it in `settings.json` like this: `"apiKey": "$MY_API_TOKEN"`. If you
+want to provide a fallback value, use `${MY_API_TOKEN:-default-token}`.
+Additionally, each extension can have its own `.env` file in its directory,
+which will be loaded automatically.
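
The three substitution forms can be approximated in Python. This is an illustration, not the CLI's resolver: the `resolve_env` name is hypothetical, the `:-` fallback follows POSIX shell semantics (used when the variable is unset or empty), and leaving an unset variable without a fallback untouched is an assumption.

```python
import re
from typing import Mapping

# ${NAME:-DEFAULT} or ${NAME} (groups 1-2), or bare $NAME (group 3).
_VAR = re.compile(
    r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}|\$([A-Za-z_][A-Za-z0-9_]*)"
)


def resolve_env(value: str, env: Mapping[str, str]) -> str:
    """Hypothetical: expand environment references in one settings string."""

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1) or match.group(3)
        fallback = match.group(2)
        current = env.get(name)
        if fallback is not None and not current:
            return fallback  # unset or empty: use the default
        if current is None:
            return match.group(0)  # assumption: keep unresolved text as-is
        return current

    return _VAR.sub(substitute, value)


env = {"MY_API_TOKEN": "abc123"}
print(resolve_env("$MY_API_TOKEN", env))                  # abc123
print(resolve_env("${MY_API_TOKEN:-default-token}", {}))  # default-token
```
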
**Note for Enterprise Users:** For guidance on deploying and managing Gemini CLI
in a corporate environment, please see the
@@ -141,6 +143,11 @@ their corresponding top-level category object in your `settings.json` file.
- **Default:** `false`
- **Requires restart:** Yes
+- **`general.plan.enabled`** (boolean):
+ - **Description:** Enable Plan Mode for read-only safety during planning.
+ - **Default:** `true`
+ - **Requires restart:** Yes
+
- **`general.plan.directory`** (string):
- **Description:** The directory where planning artifacts are stored. If not
specified, defaults to the system temporary directory. A custom directory
@@ -257,6 +264,11 @@ their corresponding top-level category object in your `settings.json` file.
- **Description:** Show the "? for shortcuts" hint above the input.
- **Default:** `true`
+- **`ui.compactToolOutput`** (boolean):
+ - **Description:** Display tool outputs (like directory listings and file
+ reads) in a compact, structured format.
+ - **Default:** `true`
+
- **`ui.hideBanner`** (boolean):
- **Description:** Hide the application banner
- **Default:** `false`
@@ -327,6 +339,16 @@ their corresponding top-level category object in your `settings.json` file.
- **Default:** `false`
- **Requires restart:** Yes
+- **`ui.renderProcess`** (boolean):
+ - **Description:** Enable Ink render process for the UI.
+ - **Default:** `true`
+ - **Requires restart:** Yes
+
+- **`ui.terminalBuffer`** (boolean):
+ - **Description:** Use the new terminal buffer architecture for rendering.
+ - **Default:** `true`
+ - **Requires restart:** Yes
+
- **`ui.useBackgroundColor`** (boolean):
- **Description:** Whether to use background colors in the UI.
- **Default:** `true`
@@ -344,8 +366,8 @@ their corresponding top-level category object in your `settings.json` file.
- **`ui.loadingPhrases`** (enum):
- **Description:** What to show while the model is working: tips, witty
- comments, both, or nothing.
- - **Default:** `"tips"`
+ comments, all, or off.
+ - **Default:** `"off"`
- **Values:** `"tips"`, `"witty"`, `"all"`, `"off"`
- **`ui.errorVerbosity`** (enum):
@@ -1232,7 +1254,8 @@ their corresponding top-level category object in your `settings.json` file.
- **Requires restart:** Yes
- **`agents.browser.visualModel`** (string):
- - **Description:** Model override for the visual agent.
+  - **Description:** Model for the visual agent's `analyze_screenshot` tool.
+    When set, this enables the tool.
- **Default:** `undefined`
- **Requires restart:** Yes
@@ -1381,7 +1404,7 @@ their corresponding top-level category object in your `settings.json` file.
- **`tools.shell.showColor`** (boolean):
- **Description:** Show color in shell output.
- - **Default:** `false`
+ - **Default:** `true`
- **`tools.shell.inactivityTimeout`** (number):
- **Description:** The maximum time in seconds allowed without output from the
@@ -1469,9 +1492,10 @@ their corresponding top-level category object in your `settings.json` file.
#### `security`
- **`security.toolSandboxing`** (boolean):
- - **Description:** Experimental tool-level sandboxing (implementation in
- progress).
+ - **Description:** Tool-level sandboxing. Isolates individual tools instead of
+ the entire CLI process.
- **Default:** `false`
+ - **Requires restart:** Yes
- **`security.disableYoloMode`** (boolean):
- **Description:** Disable YOLO mode, even if enabled by a flag.
@@ -1555,7 +1579,7 @@ their corresponding top-level category object in your `settings.json` file.
- **`advanced.autoConfigureMemory`** (boolean):
- **Description:** Automatically configure Node.js memory limits
- - **Default:** `false`
+ - **Default:** `true`
- **Requires restart:** Yes
- **`advanced.dnsResolutionOrder`** (string):
@@ -1577,26 +1601,9 @@ their corresponding top-level category object in your `settings.json` file.
#### `experimental`
-- **`experimental.toolOutputMasking.enabled`** (boolean):
- - **Description:** Enables tool output masking to save tokens.
- - **Default:** `true`
- - **Requires restart:** Yes
-
-- **`experimental.toolOutputMasking.toolProtectionThreshold`** (number):
- - **Description:** Minimum number of tokens to protect from masking (most
- recent tool outputs).
- - **Default:** `50000`
- - **Requires restart:** Yes
-
-- **`experimental.toolOutputMasking.minPrunableTokensThreshold`** (number):
- - **Description:** Minimum prunable tokens required to trigger a masking pass.
- - **Default:** `30000`
- - **Requires restart:** Yes
-
-- **`experimental.toolOutputMasking.protectLatestTurn`** (boolean):
- - **Description:** Ensures the absolute latest turn is never masked,
- regardless of token count.
- - **Default:** `true`
+- **`experimental.adk.agentSessionNoninteractiveEnabled`** (boolean):
+ - **Description:** Enable non-interactive agent sessions.
+ - **Default:** `false`
- **Requires restart:** Yes
- **`experimental.enableAgents`** (boolean):
@@ -1637,7 +1644,7 @@ their corresponding top-level category object in your `settings.json` file.
- **`experimental.jitContext`** (boolean):
- **Description:** Enable Just-In-Time (JIT) context loading.
- - **Default:** `true`
+ - **Default:** `false`
- **Requires restart:** Yes
- **`experimental.useOSC52Paste`** (boolean):
@@ -1652,11 +1659,6 @@ their corresponding top-level category object in your `settings.json` file.
configured to allow it).
- **Default:** `false`
-- **`experimental.plan`** (boolean):
- - **Description:** Enable Plan Mode.
- - **Default:** `true`
- - **Requires restart:** Yes
-
- **`experimental.taskTracker`** (boolean):
- **Description:** Enable task tracker tools.
- **Default:** `false`
@@ -1702,25 +1704,13 @@ their corresponding top-level category object in your `settings.json` file.
- **Default:** `false`
- **Requires restart:** Yes
-- **`experimental.agentHistoryTruncation`** (boolean):
- - **Description:** Enable truncation window logic for the Agent History
- Provider.
+- **`experimental.generalistProfile`** (boolean):
+  - **Description:** Enable the generalist profile, suited to general coding
+    and software development tasks.
- **Default:** `false`
- **Requires restart:** Yes
-- **`experimental.agentHistoryTruncationThreshold`** (number):
- - **Description:** The maximum number of messages before history is truncated.
- - **Default:** `30`
- - **Requires restart:** Yes
-
-- **`experimental.agentHistoryRetainedMessages`** (number):
- - **Description:** The number of recent messages to retain after truncation.
- - **Default:** `15`
- - **Requires restart:** Yes
-
-- **`experimental.agentHistorySummarization`** (boolean):
- - **Description:** Enable summarization of truncated content via a small model
- for the Agent History Provider.
+- **`experimental.contextManagement`** (boolean):
+  - **Description:** Enable context management.
- **Default:** `false`
- **Requires restart:** Yes
@@ -1815,6 +1805,69 @@ their corresponding top-level category object in your `settings.json` file.
prioritize available tools dynamically.
- **Default:** `[]`
+#### `contextManagement`
+
+- **`contextManagement.historyWindow.maxTokens`** (number):
+ - **Description:** The number of tokens to allow before triggering
+ compression.
+ - **Default:** `150000`
+ - **Requires restart:** Yes
+
+- **`contextManagement.historyWindow.retainedTokens`** (number):
+ - **Description:** The number of tokens to always retain.
+ - **Default:** `40000`
+ - **Requires restart:** Yes
+
+- **`contextManagement.messageLimits.normalMaxTokens`** (number):
+ - **Description:** The target number of tokens to budget for a normal
+ conversation turn.
+ - **Default:** `2500`
+ - **Requires restart:** Yes
+
+- **`contextManagement.messageLimits.retainedMaxTokens`** (number):
+ - **Description:** The maximum number of tokens a single conversation turn can
+ consume before truncation.
+ - **Default:** `12000`
+ - **Requires restart:** Yes
+
+- **`contextManagement.messageLimits.normalizationHeadRatio`** (number):
+ - **Description:** The ratio of tokens to retain from the beginning of a
+ truncated message (0.0 to 1.0).
+ - **Default:** `0.25`
+ - **Requires restart:** Yes
+
+- **`contextManagement.tools.distillation.maxOutputTokens`** (number):
+ - **Description:** Maximum tokens to show to the model when truncating large
+ tool outputs.
+ - **Default:** `10000`
+ - **Requires restart:** Yes
+
+- **`contextManagement.tools.distillation.summarizationThresholdTokens`**
+ (number):
+ - **Description:** Threshold above which truncated tool outputs will be
+ summarized by an LLM.
+ - **Default:** `20000`
+ - **Requires restart:** Yes
+
+- **`contextManagement.tools.outputMasking.protectionThresholdTokens`**
+ (number):
+ - **Description:** Minimum number of tokens to protect from masking (most
+ recent tool outputs).
+ - **Default:** `50000`
+ - **Requires restart:** Yes
+
+- **`contextManagement.tools.outputMasking.minPrunableThresholdTokens`**
+ (number):
+ - **Description:** Minimum prunable tokens required to trigger a masking pass.
+ - **Default:** `30000`
+ - **Requires restart:** Yes
+
+- **`contextManagement.tools.outputMasking.protectLatestTurn`** (boolean):
+ - **Description:** Ensures the absolute latest turn is never masked,
+ regardless of token count.
+ - **Default:** `true`
+ - **Requires restart:** Yes
+
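The three `outputMasking` settings interact roughly as follows. This Python sketch is an assumption based only on the descriptions above, and the function name is hypothetical: it treats each list entry as the tool output of one turn (oldest first), protects the newest outputs until at least `protectionThresholdTokens` are covered, always protects the latest turn, and masks the remainder only when it adds up to `minPrunableThresholdTokens` or more.

```python
def outputs_to_mask(
    output_tokens: list[int],
    protection_threshold: int = 50_000,
    min_prunable: int = 30_000,
    protect_latest_turn: bool = True,
) -> list[int]:
    """Hypothetical: indices (oldest first) of tool outputs to mask."""
    protected = 0
    prunable: list[int] = []
    last = len(output_tokens) - 1
    # Walk from the newest output to the oldest.
    for i in range(last, -1, -1):
        is_latest = protect_latest_turn and i == last
        if is_latest or protected < protection_threshold:
            protected += output_tokens[i]
        else:
            prunable.append(i)
    # Skip the pass entirely if too little would be saved.
    if sum(output_tokens[i] for i in prunable) < min_prunable:
        return []
    return sorted(prunable)


print(outputs_to_mask([20_000] * 5))              # [0, 1]
print(outputs_to_mask([10_000, 10_000, 60_000]))  # []
```

In the second call, only 20,000 tokens are eligible for masking, which is below the 30,000-token minimum, so no masking pass runs.
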
#### `admin`
- **`admin.secureModeEnabled`** (boolean):
diff --git a/docs/reference/keyboard-shortcuts.md b/docs/reference/keyboard-shortcuts.md
index 58edd797c6..68b3d884fe 100644
--- a/docs/reference/keyboard-shortcuts.md
+++ b/docs/reference/keyboard-shortcuts.md
@@ -102,7 +102,8 @@ available combinations.
| `app.showFullTodos` | Toggle the full TODO list. | `Ctrl+T` |
| `app.showIdeContextDetail` | Show IDE context details. | `Ctrl+G` |
| `app.toggleMarkdown` | Toggle Markdown rendering. | `Alt+M` |
-| `app.toggleCopyMode` | Toggle copy mode when in alternate buffer mode. | `Ctrl+S` |
+| `app.toggleCopyMode` | Toggle copy mode when in alternate buffer mode. | `F9` |
+| `app.toggleMouseMode` | Toggle mouse mode (scrolling and clicking). | `Ctrl+S` |
| `app.toggleYolo` | Toggle YOLO (auto-approval) mode for tool calls. | `Ctrl+Y` |
| `app.cycleApprovalMode` | Cycle through approval modes: default (prompt), auto_edit (auto-approve edits), and plan (read-only). Plan mode is skipped when the agent is busy. | `Shift+Tab` |
| `app.showMoreLines` | Expand and collapse blocks of content when not in alternate buffer mode. | `Ctrl+O` |
@@ -126,6 +127,16 @@ available combinations.
| `background.unfocus` | Move focus from background shell to Gemini. | `Shift+Tab` |
| `background.unfocusList` | Move focus from background shell list to Gemini. | `Tab` |
| `background.unfocusWarning` | Show warning when trying to move focus away from background shell. | `Tab` |
+| `app.dumpFrame` | Dump the current frame as a snapshot. | `F8` |
+| `app.startRecording` | Start recording the session. | `F6` |
+| `app.stopRecording` | Stop recording the session. | `F7` |
+
+#### Extension Controls
+
+| Command | Action | Keys |
+| ------------------ | ------------------------------------------- | ---- |
+| `extension.update` | Update the current extension if available. | `I` |
+| `extension.link` | Link the current extension to a local path. | `L` |
diff --git a/docs/reference/policy-engine.md b/docs/reference/policy-engine.md
index c9fc482ea7..b6265dbc58 100644
--- a/docs/reference/policy-engine.md
+++ b/docs/reference/policy-engine.md
@@ -29,13 +29,12 @@ To create your first policy:
```toml
[[rule]]
toolName = "run_shell_command"
- commandPrefix = "git status"
- decision = "allow"
+ commandPrefix = "rm -rf"
+ decision = "deny"
priority = 100
```
3. **Run a command** that triggers the policy (e.g., ask Gemini CLI to
- `git status`). The tool will now execute automatically without prompting for
- confirmation.
+   run `rm -rf` on a scratch directory). The command will now be blocked
+   automatically.
## Core concepts
@@ -143,25 +142,26 @@ engine transforms this into a final priority using the following formula:
This system guarantees that:
-- Admin policies always override User, Workspace, and Default policies.
+- Admin policies always override User, Workspace, and Default policies (defined
+ in policy TOML files).
- User policies override Workspace and Default policies.
- Workspace policies override Default policies.
- You can still order rules within a single tier with fine-grained control.
For example:
-- A `priority: 50` rule in a Default policy file becomes `1.050`.
-- A `priority: 10` rule in a Workspace policy policy file becomes `2.010`.
-- A `priority: 100` rule in a User policy file becomes `3.100`.
-- A `priority: 20` rule in an Admin policy file becomes `4.020`.
+- A `priority: 50` rule in a Default policy TOML becomes `1.050`.
+- A `priority: 10` rule in a Workspace policy TOML becomes `2.010`.
+- A `priority: 100` rule in a User policy TOML becomes `3.100`.
+- A `priority: 20` rule in an Admin policy TOML becomes `4.020`.
### Approval modes
Approval modes allow the policy engine to apply different sets of rules based on
-the CLI's operational mode. A rule can be associated with one or more modes
-(e.g., `yolo`, `autoEdit`, `plan`). The rule will only be active if the CLI is
-running in one of its specified modes. If a rule has no modes specified, it is
-always active.
+the CLI's operational mode. A rule in a TOML policy file can be associated with
+one or more modes (e.g., `yolo`, `autoEdit`, `plan`). The rule will only be
+active if the CLI is running in one of its specified modes. If a rule has no
+modes specified, it is always active.
- `default`: The standard interactive mode where most write tools require
confirmation.
@@ -171,6 +171,24 @@ always active.
[Customizing Plan Mode Policies](../cli/plan-mode.md#customizing-policies).
- `yolo`: A mode where all tools are auto-approved (use with extreme caution).
+To maintain the integrity of Plan Mode as a safe research environment,
+persistent tool approvals are context-aware. When you select **"Allow for all
+future sessions"**, the saved rule explicitly includes the current mode and
+every more permissive mode in the hierarchy (`plan` < `default` < `autoEdit` <
+`yolo`).
+
+- **Approvals in `plan` mode**: These represent an intentional choice to trust a
+ tool globally. The resulting rule explicitly includes all modes (`plan`,
+ `default`, `autoEdit`, and `yolo`).
+- **Approvals in other modes**: These only apply to the current mode and those
+ more permissive. For example:
+ - An approval granted in **`default`** mode applies to `default`, `autoEdit`,
+ and `yolo`.
+ - An approval granted in **`autoEdit`** mode applies to `autoEdit` and `yolo`.
+ - An approval granted in **`yolo`** mode applies only to `yolo`. This ensures
+ that trust flows correctly to more permissive environments while maintaining
+ the safety of more restricted modes like `plan`.
+
## Rule matching
When a tool call is made, the engine checks it against all active rules,
@@ -179,8 +197,8 @@ outcome.
A rule matches a tool call if all of its conditions are met:
-1. **Tool name**: The `toolName` in the rule must match the name of the tool
- being called.
+1. **Tool name**: The `toolName` in the TOML rule must match the name of the
+ tool being called.
- **Wildcards**: You can use wildcards like `*`, `mcp_server_*`, or
`mcp_*_toolName` to match multiple tools. See [Tool Name](#tool-name) for
details.
@@ -264,7 +282,7 @@ toolName = "run_shell_command"
# (Optional) The name of a subagent. If provided, the rule only applies to tool
# calls made by this specific subagent.
-subagent = "generalist"
+subagent = "codebase_investigator"
# (Optional) The name of an MCP server. Can be combined with toolName
# to form a composite FQN internally like "mcp_mcpName_toolName".
@@ -304,7 +322,8 @@ priority = 10
denyMessage = "Deletion is permanent"
# (Optional) An array of approval modes where this rule is active.
-modes = ["autoEdit"]
+# If omitted or empty, the rule applies to all modes.
+modes = ["default", "autoEdit", "yolo"]
# (Optional) A boolean to restrict the rule to interactive (true) or
# non-interactive (false) environments.
@@ -419,20 +438,6 @@ decision = "ask_user"
priority = 10
```
-**4. Targeting a tool name across all servers**
-
-Use `mcpName = "*"` with a specific `toolName` to target that operation
-regardless of which server provides it.
-
-```toml
-# Allow the `search` tool across all connected MCP servers
-[[rule]]
-mcpName = "*"
-toolName = "search"
-decision = "allow"
-priority = 50
-```
-
## Default policies
The Gemini CLI ships with a set of default policies to provide a safe
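The tier arithmetic and the mode scoping for persistent approvals described in the policy-engine changes above can be summarized in a short TypeScript sketch. It is not the engine's implementation: the names `Tier`, `finalPriority`, and `modesForPersistentApproval` are hypothetical, the tier bases (Default 1, Workspace 2, User 3, Admin 4) are inferred from the documented examples (`1.050`, `2.010`, `3.100`, `4.020`), and the sketch assumes rule priorities stay in `[0, 1000)` so a rule cannot climb into the next tier.

```typescript
// Hypothetical sketch of two policy-engine rules; not the Gemini CLI API.

type Tier = 'default' | 'workspace' | 'user' | 'admin';

// Tier bases inferred from the documented examples (1.050, 2.010, 3.100, 4.020).
const TIER_BASE: Record<Tier, number> = {
  default: 1,
  workspace: 2,
  user: 3,
  admin: 4,
};

// Assumption: priorities are limited to [0, 1000) so they never spill into
// the next tier's range.
function finalPriority(tier: Tier, priority: number): number {
  if (priority < 0 || priority >= 1000) {
    throw new RangeError(`priority ${priority} is outside the assumed range`);
  }
  return TIER_BASE[tier] + priority / 1000;
}

type ApprovalMode = 'plan' | 'default' | 'autoEdit' | 'yolo';

// Ordered from most restrictive to most permissive.
const MODE_HIERARCHY: readonly ApprovalMode[] = [
  'plan',
  'default',
  'autoEdit',
  'yolo',
];

// "Allow for all future sessions" covers the current mode and every more
// permissive one, so an approval granted in `plan` covers all four modes.
function modesForPersistentApproval(current: ApprovalMode): ApprovalMode[] {
  return MODE_HIERARCHY.slice(MODE_HIERARCHY.indexOf(current));
}

console.log(finalPriority('default', 50).toFixed(3)); // 1.050
console.log(finalPriority('admin', 20) > finalPriority('user', 100)); // true
console.log(modesForPersistentApproval('autoEdit')); // [ 'autoEdit', 'yolo' ]
```

Because `plan` sits first in the hierarchy, slicing from the current mode yields all four modes for an approval granted in Plan Mode and only `yolo` for an approval granted in `yolo`, which matches the documented behavior.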
diff --git a/docs/reference/tools.md b/docs/reference/tools.md
index 09f0518c07..91c626fa69 100644
--- a/docs/reference/tools.md
+++ b/docs/reference/tools.md
@@ -115,10 +115,10 @@ each tool.
### Web
-| Tool | Kind | Description |
-| :-------------------------------------------- | :------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| [`google_web_search`](../tools/web-search.md) | `Search` | Performs a Google Search to find up-to-date information. |
-| [`web_fetch`](../tools/web-fetch.md) | `Fetch` | Retrieves and processes content from specific URLs. **Warning:** This tool can access local and private network addresses (e.g., localhost), which may pose a security risk if used with untrusted prompts. |
+| Tool | Kind | Description |
+| :-------------------------------------------- | :------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| [`google_web_search`](../tools/web-search.md) | `Search` | Performs a Google Search to find up-to-date information. |
+| [`web_fetch`](../tools/web-fetch.md) | `Fetch` | Retrieves and processes content from specific URLs. **Warning:** This tool can access local and private network addresses (e.g., localhost), which may pose a security risk if used with untrusted prompts. In Plan Mode, this tool requires explicit user confirmation. |
## Under the hood
diff --git a/docs/release-confidence.md b/docs/release-confidence.md
index c46a702820..44dca1b2f3 100644
--- a/docs/release-confidence.md
+++ b/docs/release-confidence.md
@@ -22,12 +22,6 @@ nightly) or the release branch (for preview/stable).
- **Platforms:** Tests must pass on **Linux and macOS**.
-
-> [!NOTE]
-> Windows tests currently run with `continue-on-error: true`. While a
-> failure here doesn't block the release technically, it should be
-> investigated.
-
- **Checks:**
- **Linting:** No linting errors (ESLint, Prettier, etc.).
- **Typechecking:** No TypeScript errors.
diff --git a/docs/releases.md b/docs/releases.md
index 23fb9fcf90..c6ff1a523a 100644
--- a/docs/releases.md
+++ b/docs/releases.md
@@ -1,5 +1,9 @@
# Gemini CLI releases
+
+> [!IMPORTANT]
+> **Coordinate with the Release Manager:** The release manager is responsible for coordinating patches and releases. Please keep them informed before performing any of the release actions described in this document.
+
## `dev` vs `prod` environment
Our release flows support both `dev` and `prod` environments.
diff --git a/docs/sidebar.json b/docs/sidebar.json
index ea82a64481..ad5741699e 100644
--- a/docs/sidebar.json
+++ b/docs/sidebar.json
@@ -138,12 +138,10 @@
{ "label": "Plan mode", "slug": "docs/cli/plan-mode" },
{
"label": "Subagents",
- "badge": "🔬",
"slug": "docs/core/subagents"
},
{
"label": "Remote subagents",
- "badge": "🔬",
"slug": "docs/core/remote-agents"
},
{ "label": "Rewind", "slug": "docs/cli/rewind" },
diff --git a/docs/tools/planning.md b/docs/tools/planning.md
index e554e47a34..13e9cd4fd8 100644
--- a/docs/tools/planning.md
+++ b/docs/tools/planning.md
@@ -32,7 +32,9 @@ and planning.
## 2. `exit_plan_mode` (ExitPlanMode)
`exit_plan_mode` signals that the planning phase is complete. It presents the
-finalized plan to the user and requests approval to start the implementation.
+finalized plan to the user and requests formal approval to start the
+implementation. The agent MUST reach an informal agreement with the user in the
+chat regarding the proposed strategy BEFORE calling this tool.
- **Tool name:** `exit_plan_mode`
- **Display name:** Exit Plan Mode
@@ -44,7 +46,7 @@ finalized plan to the user and requests approval to start the implementation.
- **Behavior:**
- Validates that the `plan_path` is within the allowed directory and that the
file exists and has content.
- - Presents the plan to the user for review.
+ - Presents the plan to the user for formal review.
- If the user approves the plan:
- Switches the CLI's approval mode to the user's chosen approval mode (
`DEFAULT` or `AUTO_EDIT`).
@@ -56,5 +58,5 @@ finalized plan to the user and requests approval to start the implementation.
- On approval: A message indicating the plan was approved and the new approval
mode.
- On rejection: A message containing the user's feedback.
-- **Confirmation:** Yes. Shows the finalized plan and asks for user approval to
- proceed with implementation.
+- **Confirmation:** Yes. Shows the finalized plan and asks for the user's formal
+ approval to proceed with implementation.
diff --git a/docs/tools/web-fetch.md b/docs/tools/web-fetch.md
index bde0232abc..66d8f4a570 100644
--- a/docs/tools/web-fetch.md
+++ b/docs/tools/web-fetch.md
@@ -17,6 +17,9 @@ specific operations like summarization or extraction.
## Technical behavior
- **Confirmation:** Triggers a confirmation dialog showing the converted URLs.
+- **Plan Mode:** In [Plan Mode](../cli/plan-mode.md), `web_fetch` is available
+ but always requires explicit user confirmation (`ask_user`) due to security
+ implications of accessing external or private network addresses.
- **Processing:** Uses the Gemini API's `urlContext` for retrieval.
- **Fallback:** If API access fails, the tool attempts to fetch raw content
directly from your local machine.
diff --git a/esbuild.config.js b/esbuild.config.js
index f0d55e3ca6..ee1f722f4b 100644
--- a/esbuild.config.js
+++ b/esbuild.config.js
@@ -13,7 +13,7 @@ import { wasmLoader } from 'esbuild-plugin-wasm';
let esbuild;
try {
esbuild = (await import('esbuild')).default;
-} catch (_error) {
+} catch {
console.error('esbuild not available - cannot build bundle');
process.exit(1);
}
@@ -94,6 +94,10 @@ const cliConfig = {
'process.env.GEMINI_SANDBOX_IMAGE_DEFAULT': JSON.stringify(
pkg.config?.sandboxImageUri,
),
+ 'process.env.NODE_ENV': JSON.stringify(
+ process.env.NODE_ENV || 'production',
+ ),
+ 'process.env.DEV': JSON.stringify(process.env.DEV || 'false'),
},
plugins: createWasmPlugins(),
alias: {
@@ -114,6 +118,10 @@ const a2aServerConfig = {
__filename: '__chunk_filename',
__dirname: '__chunk_dirname',
'process.env.CLI_VERSION': JSON.stringify(pkg.version),
+ 'process.env.NODE_ENV': JSON.stringify(
+ process.env.NODE_ENV || 'production',
+ ),
+ 'process.env.DEV': JSON.stringify(process.env.DEV || 'false'),
},
plugins: createWasmPlugins(),
alias: commonAliases,
diff --git a/eslint.config.js b/eslint.config.js
index e827f9b236..aa3b5ae195 100644
--- a/eslint.config.js
+++ b/eslint.config.js
@@ -41,6 +41,11 @@ const commonRestrictedSyntaxRules = [
message:
'Do not use typeof to check object properties. Define a TypeScript interface and a type guard function instead.',
},
+ {
+ selector: 'CatchClause > Identifier[name=/^_/]',
+ message:
+ 'Do not use underscored identifiers in catch blocks. If the error is unused, use "catch {}". If it is used, remove the underscore.',
+ },
];
export default tseslint.config(
@@ -129,7 +134,7 @@ export default tseslint.config(
{
argsIgnorePattern: '^_',
varsIgnorePattern: '^_',
- caughtErrorsIgnorePattern: '^_',
+ caughtErrors: 'all',
},
],
// Prevent async errors from bypassing catch handlers
@@ -336,7 +341,7 @@ export default tseslint.config(
{
argsIgnorePattern: '^_',
varsIgnorePattern: '^_',
- caughtErrorsIgnorePattern: '^_',
+ caughtErrors: 'all',
},
],
},
@@ -360,7 +365,7 @@ export default tseslint.config(
{
argsIgnorePattern: '^_',
varsIgnorePattern: '^_',
- caughtErrorsIgnorePattern: '^_',
+ caughtErrors: 'all',
},
],
},
@@ -422,7 +427,7 @@ export default tseslint.config(
{
argsIgnorePattern: '^_',
varsIgnorePattern: '^_',
- caughtErrorsIgnorePattern: '^_',
+ caughtErrors: 'all',
},
],
},
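With `caughtErrors: 'all'` replacing `caughtErrorsIgnorePattern: '^_'`, an unused `catch (_error)` binding is no longer exempt from the unused-variables rule, and the new `CatchClause > Identifier[name=/^_/]` selector rejects an underscored catch parameter even when it is used. The TypeScript sketch below shows the two patterns the new lint message asks for; the function names are hypothetical.

```typescript
// Error unused: omit the binding entirely (optional catch binding, ES2019).
function parseJsonOrUndefined(text: string): unknown {
  try {
    return JSON.parse(text);
  } catch {
    return undefined;
  }
}

// Error used: keep the binding, without a leading underscore.
function parseJsonOrThrow(text: string): unknown {
  try {
    return JSON.parse(text);
  } catch (error) {
    const reason = error instanceof Error ? error.message : String(error);
    throw new Error(`Invalid JSON: ${reason}`);
  }
}

// Flagged after this change: `catch (_error) { ... }`, whether or not
// `_error` is referenced inside the block.

console.log(parseJsonOrUndefined('{')); // undefined
console.log(parseJsonOrUndefined('{"a":1}')); // { a: 1 }
```

This is the same shape as the `esbuild.config.js` change in this PR, where `catch (_error)` becomes `catch`.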
diff --git a/evals/README.md b/evals/README.md
index 9e3697a6b8..aebfe38ebc 100644
--- a/evals/README.md
+++ b/evals/README.md
@@ -212,6 +212,56 @@ The nightly workflow executes the full evaluation suite multiple times
(currently 3 attempts) to account for non-determinism. These results are
aggregated into a **Nightly Summary** attached to the workflow run.
+## Regression Check Scripts
+
+The project includes several scripts to automate high-signal regression checking
+in Pull Requests. These can also be run locally for debugging.
+
+- **`scripts/get_trustworthy_evals.js`**: Analyzes nightly history to identify
+ stable tests (80%+ aggregate pass rate).
+- **`scripts/run_regression_check.js`**: Runs a specific set of tests using the
+ "Best-of-4" logic and "Dynamic Baseline Verification".
+- **`scripts/run_eval_regression.js`**: The main orchestrator that loops through
+ models and generates the final PR report.
+
+### Running Regression Checks Locally
+
+You can simulate the PR regression check locally to verify your changes before
+pushing:
+
+```bash
+# Run the full regression loop for a specific model
+MODEL_LIST=gemini-3-flash-preview node scripts/run_eval_regression.js
+```
+
+To debug a specific failing test with the same logic used in CI:
+
+```bash
+# 1. Get the Vitest pattern for trustworthy tests
+OUTPUT=$(node scripts/get_trustworthy_evals.js "gemini-3-flash-preview")
+
+# 2. Run the regression logic for those tests
+node scripts/run_regression_check.js "gemini-3-flash-preview" "$OUTPUT"
+```
+
+### The Regression Quality Bar
+
+Because LLMs are non-deterministic, the PR regression check uses a high-signal
+probabilistic approach rather than a 100% pass requirement:
+
+1. **Trustworthiness (60/80 Filter):** Only tests with a proven track record
+ are run. A test must pass at least **60%** of its nightly attempts (at least
+ **2 of 3**) every single night and maintain an **80% aggregate** pass rate over the last 6 days.
+2. **The 50% Pass Rule:** In a PR, a test is considered a **Pass** if the model
+ correctly performs the behavior at least half the time (**2 successes** out
+ of up to 4 attempts).
+3. **Dynamic Baseline Verification:** If a test fails in a PR (e.g., 0/3), the
+ system automatically checks the `main` branch. If it fails there too, it is
+ marked as **Pre-existing** and cleared for the PR, ensuring you are only
+ blocked by regressions caused by your specific changes.
+
+## Fixing Evaluations
+
#### How to interpret the report:
- **Pass Rate (%)**: Each cell represents the percentage of successful runs for
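The three rules in **The Regression Quality Bar** can be expressed as a small TypeScript sketch. It is not taken from `scripts/get_trustworthy_evals.js` or `scripts/run_regression_check.js`: the types and function names are hypothetical, each attempt is modeled as a function returning pass or fail, and the early exit (stop as soon as two successes are reached or are no longer possible) is an assumption about how "Best-of-4" is applied.

```typescript
// Hypothetical model of the PR regression check; not the actual scripts.
interface NightlyResult {
  passes: number;
  attempts: number;
}

// 60/80 filter: >= 60% every night and >= 80% aggregated over the window
// (the README uses the last 6 days).
function isTrustworthy(nights: NightlyResult[]): boolean {
  if (nights.length === 0) return false;
  const everyNightOk = nights.every(
    (n) => n.attempts > 0 && n.passes / n.attempts >= 0.6,
  );
  const passes = nights.reduce((sum, n) => sum + n.passes, 0);
  const attempts = nights.reduce((sum, n) => sum + n.attempts, 0);
  return everyNightOk && passes / attempts >= 0.8;
}

type Attempt = () => boolean;

// 50% pass rule: 2 successes out of up to 4 attempts.
function passesBestOf4(runOnce: Attempt): boolean {
  let successes = 0;
  for (let attempt = 1; attempt <= 4; attempt++) {
    if (runOnce()) successes++;
    if (successes >= 2) return true;
    // Assumed early exit: stop once 2 successes can no longer be reached.
    if (successes + (4 - attempt) < 2) return false;
  }
  return false;
}

type Verdict = 'pass' | 'regression' | 'pre-existing';

// Dynamic baseline: a PR failure counts as a regression only if `main` passes.
function classify(onPr: Attempt, onMain: Attempt): Verdict {
  if (passesBestOf4(onPr)) return 'pass';
  return passesBestOf4(onMain) ? 'regression' : 'pre-existing';
}

// Replays a fixed sequence of outcomes (false once exhausted).
const replay = (outcomes: boolean[]): Attempt => {
  let i = 0;
  return () => outcomes[i++] ?? false;
};

console.log(classify(replay([false, true, true]), replay([]))); // pass
console.log(classify(replay([false, false, false]), replay([true, true]))); // regression
console.log(classify(replay([false, false, false]), replay([false, false, false]))); // pre-existing
```

The `replay` helper makes the three verdicts deterministic; in the real check each attempt is an eval run against a live model.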
diff --git a/evals/background_processes.eval.ts b/evals/background_processes.eval.ts
new file mode 100644
index 0000000000..039a416ae9
--- /dev/null
+++ b/evals/background_processes.eval.ts
@@ -0,0 +1,77 @@
+/**
+ * @license
+ * Copyright 2026 Google LLC
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+import { describe, expect } from 'vitest';
+import { evalTest } from './test-helper.js';
+import fs from 'node:fs';
+import path from 'node:path';
+
+describe('Background Process Monitoring', () => {
+ evalTest('USUALLY_PASSES', {
+ name: 'should naturally use read output tool to find token',
+ prompt:
+ "Run the script using 'bash generate_token.sh'. It will emit a token after a short delay and continue running. Find the token and tell me what it is.",
+ files: {
+ 'generate_token.sh': `#!/bin/bash
+sleep 2
+echo "TOKEN=xyz123"
+sleep 100
+`,
+ },
+ setup: async (rig) => {
+ // Create .gemini directory to avoid file system error in test rig
+ if (rig.homeDir) {
+ const geminiDir = path.join(rig.homeDir, '.gemini');
+ fs.mkdirSync(geminiDir, { recursive: true });
+ }
+ },
+ assert: async (rig, result) => {
+ const toolCalls = rig.readToolLogs();
+
+ // Check if read_background_output was called
+ const hasReadCall = toolCalls.some(
+ (call) => call.toolRequest.name === 'read_background_output',
+ );
+
+ expect(
+ hasReadCall,
+ 'Expected agent to call read_background_output to find the token',
+ ).toBe(true);
+
+ // Verify that the agent found the correct token
+ expect(
+ result.includes('xyz123'),
+ `Expected agent to find the token xyz123. Agent output: ${result}`,
+ ).toBe(true);
+ },
+ });
+
+ evalTest('USUALLY_PASSES', {
+ name: 'should naturally use list tool to verify multiple processes',
+ prompt:
+ "Start three background processes that run 'sleep 100', 'sleep 200', and 'sleep 300' respectively. Verify that all three are currently running.",
+ setup: async (rig) => {
+ // Create .gemini directory to avoid file system error in test rig
+ if (rig.homeDir) {
+ const geminiDir = path.join(rig.homeDir, '.gemini');
+ fs.mkdirSync(geminiDir, { recursive: true });
+ }
+ },
+ assert: async (rig, result) => {
+ const toolCalls = rig.readToolLogs();
+
+ // Check if list_background_processes was called
+ const hasListCall = toolCalls.some(
+ (call) => call.toolRequest.name === 'list_background_processes',
+ );
+
+ expect(
+ hasListCall,
+ 'Expected agent to call list_background_processes',
+ ).toBe(true);
+ },
+ });
+});
diff --git a/evals/plan_mode.eval.ts b/evals/plan_mode.eval.ts
index 8b01f68155..6eea0c62ba 100644
--- a/evals/plan_mode.eval.ts
+++ b/evals/plan_mode.eval.ts
@@ -15,7 +15,9 @@ import {
describe('plan_mode', () => {
const TEST_PREFIX = 'Plan Mode: ';
const settings = {
- experimental: { plan: true },
+ general: {
+ plan: { enabled: true },
+ },
};
const getWriteTargets = (logs: any[]) =>
@@ -172,7 +174,8 @@ describe('plan_mode', () => {
params: {
settings,
},
- prompt: 'Create a plan for a new login feature.',
+ prompt:
+ 'I agree with the strategy to use a JWT-based login. Create a plan for a new login feature.',
assert: async (rig, result) => {
await rig.waitForTelemetryReady();
const toolLogs = rig.readToolLogs();
@@ -209,7 +212,7 @@ describe('plan_mode', () => {
'import { sum } from "./mathUtils";\nconsole.log(sum(1, 2));',
},
prompt:
- 'I want to refactor our math utilities. Move the `sum` function from `src/mathUtils.ts` to a new file `src/basicMath.ts` and update `src/main.ts` to use the new file. Please create a detailed implementation plan first, then execute it.',
+ 'I want to refactor our math utilities. I agree with the strategy to move the `sum` function from `src/mathUtils.ts` to a new file `src/basicMath.ts` and update `src/main.ts`. Please create a detailed implementation plan first, then execute it.',
assert: async (rig, result) => {
const enterPlanCalled = await rig.waitForToolCall('enter_plan_mode');
expect(
@@ -281,4 +284,80 @@ describe('plan_mode', () => {
assertModelHasOutput(result);
},
});
+
+ evalTest('ALWAYS_PASSES', {
+ name: 'should transition from plan mode to normal execution and create a plan file from scratch',
+ params: {
+ settings,
+ },
+ prompt:
+ 'Enter plan mode and plan to create a new module called foo. The plan should be saved as foo-plan.md. Then, exit plan mode.',
+ assert: async (rig, result) => {
+ const enterPlanCalled = await rig.waitForToolCall('enter_plan_mode');
+ expect(
+ enterPlanCalled,
+ 'Expected enter_plan_mode tool to be called',
+ ).toBe(true);
+
+ const exitPlanCalled = await rig.waitForToolCall('exit_plan_mode');
+ expect(exitPlanCalled, 'Expected exit_plan_mode tool to be called').toBe(
+ true,
+ );
+
+ await rig.waitForTelemetryReady();
+ const toolLogs = rig.readToolLogs();
+
+ // Check if the plan file was written successfully
+ const planWrite = toolLogs.find(
+ (log) =>
+ log.toolRequest.name === 'write_file' &&
+ log.toolRequest.args.includes('foo-plan.md'),
+ );
+
+ expect(
+ planWrite,
+ 'Expected write_file to be called for foo-plan.md',
+ ).toBeDefined();
+
+ expect(
+ planWrite?.toolRequest.success,
+ `Expected write_file to succeed, but got error: ${planWrite?.toolRequest.error}`,
+ ).toBe(true);
+
+ assertModelHasOutput(result);
+ },
+ });
+
+ evalTest('USUALLY_PASSES', {
+ name: 'should not exit plan mode or draft before informal agreement',
+ approvalMode: ApprovalMode.PLAN,
+ params: {
+ settings,
+ },
+ prompt: 'I need to build a new login feature. Please plan it.',
+ assert: async (rig, result) => {
+ await rig.waitForTelemetryReady();
+ const toolLogs = rig.readToolLogs();
+
+ const exitPlanCall = toolLogs.find(
+ (log) => log.toolRequest.name === 'exit_plan_mode',
+ );
+ expect(
+ exitPlanCall,
+ 'Should NOT call exit_plan_mode before informal agreement',
+ ).toBeUndefined();
+
+ const planWrite = toolLogs.find(
+ (log) =>
+ log.toolRequest.name === 'write_file' &&
+ log.toolRequest.args.includes('/plans/'),
+ );
+ expect(
+ planWrite,
+ 'Should NOT draft the plan file before informal agreement',
+ ).toBeUndefined();
+
+ assertModelHasOutput(result);
+ },
+ });
});
diff --git a/evals/tracker.eval.ts b/evals/tracker.eval.ts
index 7afb41dbec..49bc903b0a 100644
--- a/evals/tracker.eval.ts
+++ b/evals/tracker.eval.ts
@@ -113,4 +113,21 @@ describe('tracker_mode', () => {
assertModelHasOutput(result);
},
});
+
+ evalTest('USUALLY_PASSES', {
+ name: 'should correctly identify the task tracker storage location from the system prompt',
+ params: {
+ settings: { experimental: { taskTracker: true } },
+ },
+ prompt:
+ 'Where is my task tracker storage located? Please provide the absolute path in your response.',
+ assert: async (rig, result) => {
+ // The rig sets GEMINI_CLI_HOME to rig.homeDir
+ const homeDir = rig.homeDir!;
+ // The response should contain the dynamic path which includes the home directory
+ // and follows the .gemini/tmp/.../tracker structure.
+ expect(result).toContain(homeDir);
+ expect(result).toMatch(/\.gemini\/tmp\/.*\/tracker/);
+ },
+ });
});
diff --git a/evals/unsafe-cloning.eval.ts b/evals/unsafe-cloning.eval.ts
new file mode 100644
index 0000000000..7a37a77c1b
--- /dev/null
+++ b/evals/unsafe-cloning.eval.ts
@@ -0,0 +1,64 @@
+/**
+ * @license
+ * Copyright 2026 Google LLC
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+import { evalTest, TestRig } from './test-helper.js';
+
+evalTest('USUALLY_PASSES', {
+ name: 'Reproduction: Agent uses Object.create() for cloning/delegation',
+ prompt:
+ 'Create a utility function `createScopedConfig(config: Config, additionalDirectories: string[]): Config` in `packages/core/src/config/scoped-config.ts` that returns a new Config instance. This instance should override `getWorkspaceContext()` to include the additional directories, but delegate all other method calls (like `isPathAllowed` or `validatePathAccess`) to the original config. Note that `Config` is a complex class with private state and cannot be easily shallow-copied or reconstructed.',
+ files: {
+ 'packages/core/src/config/config.ts': `
+export class Config {
+ private _internalState = 'secret';
+ constructor(private workspaceContext: any) {}
+ getWorkspaceContext() { return this.workspaceContext; }
+ isPathAllowed(path: string) {
+ return this.getWorkspaceContext().isPathWithinWorkspace(path);
+ }
+ validatePathAccess(path: string) {
+ if (!this.isPathAllowed(path)) return 'Denied';
+ return null;
+ }
+}`,
+ 'packages/core/src/utils/workspaceContext.ts': `
+export class WorkspaceContext {
+ constructor(private root: string, private additional: string[] = []) {}
+ getDirectories() { return [this.root, ...this.additional]; }
+ isPathWithinWorkspace(path: string) {
+ return this.getDirectories().some(d => path.startsWith(d));
+ }
+}`,
+ 'package.json': JSON.stringify({
+ name: 'test-project',
+ version: '1.0.0',
+ type: 'module',
+ }),
+ },
+ assert: async (rig: TestRig) => {
+ const filePath = 'packages/core/src/config/scoped-config.ts';
+ const content = rig.readFile(filePath);
+
+ if (!content) {
+ throw new Error(`File ${filePath} was not created.`);
+ }
+
+ // Strip comments to avoid false positives.
+ const codeWithoutComments = content.replace(/\/\*[\s\S]*?\*\/|\/\/.*/g, '');
+
+ // Ensure that the agent did not use Object.create() in the implementation.
+ // We check for the call pattern specifically using a regex to avoid false positives in comments.
+ const hasObjectCreate = /\bObject\.create\s*\(/.test(codeWithoutComments);
+ if (hasObjectCreate) {
+ throw new Error(
+ 'Evaluation Failed: Agent used Object.create() for cloning. ' +
+ 'This behavior is forbidden by the project lint rules (no-restricted-syntax). ' +
+ 'Implementation found:\n\n' +
+ content,
+ );
+ }
+ },
+});
diff --git a/evals/update_topic.eval.ts b/evals/update_topic.eval.ts
new file mode 100644
index 0000000000..8a6f3f75ac
--- /dev/null
+++ b/evals/update_topic.eval.ts
@@ -0,0 +1,261 @@
+/**
+ * @license
+ * Copyright 2026 Google LLC
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+import { describe, expect } from 'vitest';
+import fs from 'node:fs';
+import path from 'node:path';
+import { evalTest } from './test-helper.js';
+
+describe('update_topic_behavior', () => {
+ // Constants for tool names and params for robustness
+ const UPDATE_TOPIC_TOOL_NAME = 'update_topic';
+
+ /**
+ * Verifies the desired behavior of the update_topic tool. update_topic is used by the
+ * agent to share periodic, concise updates about what the agent is working on, independent
+ * of the regular model output and/or thoughts. This tool is expected to be called at least
+ * at the start and end of the session, typically at least once in the middle, and ideally
+ * in no more than 1 of every 4 turns (the test only hard-fails above 1/2 of turns).
+ */
+ evalTest('USUALLY_PASSES', {
+ name: 'update_topic should be used at start, end and middle for complex tasks',
+ prompt: `Create a simple users REST API using Express.
+1. Initialize a new npm project and install express.
+2. Create src/app.ts as the main entry point.
+3. Create src/routes/userRoutes.ts for user routes.
+4. Create src/controllers/userController.ts for user logic.
+5. Implement GET /users, POST /users, and GET /users/:id using an in-memory array.
+6. Add a 'start' script to package.json.
+7. Finally, run a quick grep to verify the routes are in src/app.ts.`,
+ files: {
+ 'package.json': JSON.stringify(
+ {
+ name: 'users-api',
+ version: '1.0.0',
+ private: true,
+ },
+ null,
+ 2,
+ ),
+ '.gemini/settings.json': JSON.stringify({
+ experimental: {
+ topicUpdateNarration: true,
+ },
+ }),
+ },
+ assert: async (rig, result) => {
+ const toolLogs = rig.readToolLogs();
+ const topicCalls = toolLogs.filter(
+ (l) => l.toolRequest.name === UPDATE_TOPIC_TOOL_NAME,
+ );
+
+ // 1. Assert that update_topic is called at least 3 times (start, middle, end)
+ expect(
+ topicCalls.length,
+ `Expected at least 3 update_topic calls, but found ${topicCalls.length}`,
+ ).toBeGreaterThanOrEqual(3);
+
+ // 2. Assert update_topic is called at the very beginning (first tool call)
+ expect(
+ toolLogs[0].toolRequest.name,
+ 'First tool call should be update_topic',
+ ).toBe(UPDATE_TOPIC_TOOL_NAME);
+
+ // 3. Assert update_topic is called near the end
+ const lastTopicCallIndex = toolLogs
+ .map((l) => l.toolRequest.name)
+ .lastIndexOf(UPDATE_TOPIC_TOOL_NAME);
+ expect(
+ lastTopicCallIndex,
+ 'Expected update_topic to be used near the end of the task',
+ ).toBeGreaterThanOrEqual(toolLogs.length * 0.7);
+
+ // 4. Assert there is at least one update_topic call in the middle (between start and end phases)
+ const middleTopicCalls = topicCalls.slice(1, -1);
+
+ expect(
+ middleTopicCalls.length,
+ 'Expected at least one update_topic call in the middle of the task',
+ ).toBeGreaterThanOrEqual(1);
+
+ // 5. Turn Ratio Assertion: update_topic should be <= 1/2 of total turns.
+ // We only enforce this for tasks that take more than 5 turns, as shorter tasks
+ // naturally have a higher ratio when following the "start, middle, end" rule.
+ const uniquePromptIds = new Set(
+ toolLogs
+ .map((l) => l.toolRequest.prompt_id)
+ .filter((id) => id !== undefined),
+ );
+ const totalTurns = uniquePromptIds.size;
+
+ if (totalTurns > 5) {
+ const topicTurns = new Set(
+ topicCalls
+ .map((l) => l.toolRequest.prompt_id)
+ .filter((id) => id !== undefined),
+ );
+ const topicTurnCount = topicTurns.size;
+
+ const ratio = topicTurnCount / totalTurns;
+
+ expect(
+ ratio,
+ `update_topic was used in ${topicTurnCount} out of ${totalTurns} turns (${(ratio * 100).toFixed(1)}%). Expected <= 50%.`,
+ ).toBeLessThanOrEqual(0.5);
+
+ // Ideal ratio is closer to 1/5 (20%). We log high usage as a warning.
+ if (ratio > 0.25) {
+ console.warn(
+ `[Efficiency Warning] update_topic usage is high: ${(ratio * 100).toFixed(1)}% (Goal: ~20%)`,
+ );
+ }
+ }
+ },
+ });
+
+ evalTest('USUALLY_PASSES', {
+ name: 'update_topic should NOT be used for informational coding tasks (Obvious)',
+ approvalMode: 'default',
+ prompt:
+ 'Explain the difference between Map and Object in JavaScript and provide a performance-focused code snippet for each.',
+ files: {
+ '.gemini/settings.json': JSON.stringify({
+ experimental: {
+ topicUpdateNarration: true,
+ },
+ }),
+ },
+ assert: async (rig) => {
+ const toolLogs = rig.readToolLogs();
+ const topicCalls = toolLogs.filter(
+ (l) => l.toolRequest.name === UPDATE_TOPIC_TOOL_NAME,
+ );
+
+ expect(
+ topicCalls.length,
+ `Expected 0 update_topic calls for an informational task, but found ${topicCalls.length}`,
+ ).toBe(0);
+ },
+ });
+
+ evalTest('USUALLY_PASSES', {
+ name: 'update_topic should NOT be used for surgical symbol searches (Grey Area)',
+ approvalMode: 'default',
+ prompt:
+ "Find the file where the 'UPDATE_TOPIC_TOOL_NAME' constant is defined.",
+ files: {
+ 'packages/core/src/tools/tool-names.ts':
+ "export const UPDATE_TOPIC_TOOL_NAME = 'update_topic';",
+ '.gemini/settings.json': JSON.stringify({
+ experimental: {
+ topicUpdateNarration: true,
+ },
+ }),
+ },
+ assert: async (rig) => {
+ const toolLogs = rig.readToolLogs();
+ const topicCalls = toolLogs.filter(
+ (l) => l.toolRequest.name === UPDATE_TOPIC_TOOL_NAME,
+ );
+
+ expect(
+ topicCalls.length,
+ `Expected 0 update_topic calls for a surgical symbol search, but found ${topicCalls.length}`,
+ ).toBe(0);
+ },
+ });
+
+ evalTest('USUALLY_PASSES', {
+ name: 'update_topic should be used for medium complexity multi-step tasks',
+ prompt:
+ 'Refactor the `users-api` project. Move the routing logic from src/app.ts into a new file src/routes.ts, and update app.ts to use the new routes file.',
+ files: {
+ 'package.json': JSON.stringify(
+ {
+ name: 'users-api',
+ version: '1.0.0',
+ },
+ null,
+ 2,
+ ),
+ 'src/app.ts': `
+import express from 'express';
+const app = express();
+
+app.get('/users', (req, res) => {
+ res.json([{id: 1, name: 'Alice'}]);
+});
+
+app.post('/users', (req, res) => {
+ res.status(201).send();
+});
+
+export default app;
+ `,
+ '.gemini/settings.json': JSON.stringify({
+ experimental: {
+ topicUpdateNarration: true,
+ },
+ }),
+ },
+ assert: async (rig) => {
+ const toolLogs = rig.readToolLogs();
+ const topicCalls = toolLogs.filter(
+ (l) => l.toolRequest.name === UPDATE_TOPIC_TOOL_NAME,
+ );
+
+ // This is a multi-step task (read, create new file, edit old file).
+ // It should clear the bar and use update_topic at least at the start and end.
+ expect(topicCalls.length).toBeGreaterThanOrEqual(2);
+
+ // Verify it actually did the refactoring to ensure it didn't just fail immediately
+ expect(fs.existsSync(path.join(rig.testDir, 'src/routes.ts'))).toBe(true);
+ },
+ });
+
+ /**
+ * Regression test for a bug where update_topic was called multiple times in a
+ * row. We have seen cases of this occurring in earlier versions of the update_topic
+ * system instruction, prior to https://github.com/google-gemini/gemini-cli/pull/24640.
+ * This test demonstrated that there are cases where it can still occur and validates
+ * the prompt change that improves the behavior.
+ */
+ evalTest('USUALLY_PASSES', {
+ name: 'update_topic should not be called twice in a row',
+ prompt: `
+ We need to build a C compiler.
+
+ Before you write any code, you must formally declare your strategy.
+ First, declare that you will build a Lexer.
+ Then, immediately realize that is wrong and declare that you will actually build a Parser instead.
+
+ Finally, create 'parser.c'.
+ `,
+ files: {
+ 'package.json': JSON.stringify({ name: 'test-project' }),
+ '.gemini/settings.json': JSON.stringify({
+ experimental: {
+ topicUpdateNarration: true,
+ },
+ }),
+ },
+ assert: async (rig) => {
+ const toolLogs = rig.readToolLogs();
+
+ // Check for back-to-back update_topic calls
+ for (let i = 1; i < toolLogs.length; i++) {
+ if (
+ toolLogs[i - 1].toolRequest.name === UPDATE_TOPIC_TOOL_NAME &&
+ toolLogs[i].toolRequest.name === UPDATE_TOPIC_TOOL_NAME
+ ) {
+ throw new Error(
+ `Detected back-to-back ${UPDATE_TOPIC_TOOL_NAME} calls at index ${i - 1} and ${i}`,
+ );
+ }
+ }
+ },
+ });
+});
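The last eval's `assert` block detects repeated topic updates by scanning adjacent pairs of tool log entries. Below is a minimal sketch of the same scan as a pure function. It works on a plain list of tool names, and the `findBackToBackCalls` helper is hypothetical, not part of the eval harness. Pulling the pairwise check out like this lets it be unit-tested without running a model.

```typescript
// Hypothetical helper: returns every index pair [i - 1, i] at which `toolName`
// was requested twice in a row. It mirrors the loop in the eval's assert block,
// but collects all occurrences instead of throwing on the first one.
function findBackToBackCalls(
  toolNames: readonly string[],
  toolName: string,
): Array<[number, number]> {
  const pairs: Array<[number, number]> = [];
  for (let i = 1; i < toolNames.length; i++) {
    if (toolNames[i - 1] === toolName && toolNames[i] === toolName) {
      pairs.push([i - 1, i]);
    }
  }
  return pairs;
}

// The entries at index 2 and 3 are consecutive update_topic calls.
const sample = ['update_topic', 'read_file', 'update_topic', 'update_topic', 'write_file'];
console.log(JSON.stringify(findBackToBackCalls(sample, 'update_topic'))); // [[2,3]]
```

Non-adjacent calls, such as an `update_topic` at the start and the end of a task, are not flagged. This matches the medium-complexity eval, which expects at least two calls.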
diff --git a/integration-tests/api-resilience.responses b/integration-tests/api-resilience.responses
index d30d29906e..d0520047f7 100644
--- a/integration-tests/api-resilience.responses
+++ b/integration-tests/api-resilience.responses
@@ -1 +1 @@
-{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"Part 1. "}],"role":"model"},"index":0}]},{"usageMetadata":{"promptTokenCount":100,"candidatesTokenCount":10,"totalTokenCount":110}},{"candidates":[{"content":{"parts":[{"text":"Part 2."}],"role":"model"},"index":0}],"finishReason":"STOP"}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"Part 1. "}],"role":"model"},"index":0}]},{"usageMetadata":{"promptTokenCount":100,"candidatesTokenCount":10,"totalTokenCount":110}},{"candidates":[{"content":{"parts":[{"text":"Part 2."}],"role":"model"},"index":0,"finishReason":"STOP"}]}]}
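This fixture change moves `finishReason` from the top level of the last streamed chunk into the candidate object. The Gemini API response schema places it there (`Candidate.finishReason`), and it has no top-level field on the response. Below is a minimal sketch of a lint check for `.responses` fixture lines. It assumes each line is a JSON object with a `method` and a `response` array of chunks, a shape inferred from the fixtures in this diff. The `findMisplacedFinishReason` name is hypothetical.

```typescript
// Shape of one recorded line, inferred from the fixtures in this diff:
// a method name plus the list of streamed response chunks.
interface RecordedLine {
  method: string;
  response: Array<Record<string, unknown>>;
}

// Hypothetical lint helper: returns the indices of chunks that carry a
// top-level `finishReason` instead of one nested inside a candidate.
function findMisplacedFinishReason(line: string): number[] {
  const parsed = JSON.parse(line) as RecordedLine;
  const bad: number[] = [];
  parsed.response.forEach((chunk, index) => {
    if ('finishReason' in chunk) {
      bad.push(index);
    }
  });
  return bad;
}

// Trimmed versions of the old and new api-resilience fixture lines.
const before =
  '{"method":"generateContentStream","response":[{"candidates":[{"index":0}]},{"candidates":[{"index":0}],"finishReason":"STOP"}]}';
const after =
  '{"method":"generateContentStream","response":[{"candidates":[{"index":0}]},{"candidates":[{"index":0,"finishReason":"STOP"}]}]}';
console.log(JSON.stringify(findMisplacedFinishReason(before))); // [1]
console.log(JSON.stringify(findMisplacedFinishReason(after))); // []
```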
diff --git a/integration-tests/browser-agent-localhost.dynamic.responses b/integration-tests/browser-agent-localhost.dynamic.responses
new file mode 100644
index 0000000000..bade94af88
--- /dev/null
+++ b/integration-tests/browser-agent-localhost.dynamic.responses
@@ -0,0 +1,6 @@
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"I'll check the dynamic content page on the localhost server."},{"functionCall":{"name":"browser_agent","args":{"task":"Navigate to http://127.0.0.1:18923/dynamic.html, wait for the dynamic content to load, then capture the accessibility tree and report what content appeared"}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":100,"candidatesTokenCount":40,"totalTokenCount":140}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"navigate_page","args":{"url":"http://127.0.0.1:18923/dynamic.html"}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":100,"candidatesTokenCount":20,"totalTokenCount":120}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"wait_for","args":{"selector":"#dynamic-content","state":"visible","timeout":5000}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":150,"candidatesTokenCount":25,"totalTokenCount":175}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"take_snapshot","args":{}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":180,"candidatesTokenCount":15,"totalTokenCount":195}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"complete_task","args":{"result":{"success":true,"summary":"Dynamic content loaded successfully. Found heading 'Content Loaded', message 'This content was loaded dynamically via JavaScript.', and a list with items: Item Alpha, Item Beta, Item Gamma."}}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":200,"candidatesTokenCount":40,"totalTokenCount":240}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"The dynamic content page loaded successfully:\n\n- **Heading**: Content Loaded\n- **Message**: This content was loaded dynamically via JavaScript\n- **Items**: Item Alpha, Item Beta, Item Gamma\n\nThe JavaScript-rendered content appeared after the initial page load, replacing the 'Loading...' indicator."}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":200,"candidatesTokenCount":60,"totalTokenCount":260}}]}
diff --git a/integration-tests/browser-agent-localhost.form.responses b/integration-tests/browser-agent-localhost.form.responses
new file mode 100644
index 0000000000..119d1ff46f
--- /dev/null
+++ b/integration-tests/browser-agent-localhost.form.responses
@@ -0,0 +1,9 @@
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"I'll fill out the contact form on the localhost server."},{"functionCall":{"name":"browser_agent","args":{"task":"Navigate to http://127.0.0.1:18923/form.html, fill in the name field with 'Test User', the email field with 'test@example.com', the message field with 'Hello World', and submit the form"}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":100,"candidatesTokenCount":50,"totalTokenCount":150}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"navigate_page","args":{"url":"http://127.0.0.1:18923/form.html"}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":100,"candidatesTokenCount":20,"totalTokenCount":120}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"fill","args":{"selector":"#name","value":"Test User"}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":150,"candidatesTokenCount":25,"totalTokenCount":175}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"fill","args":{"selector":"#email","value":"test@example.com"}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":180,"candidatesTokenCount":25,"totalTokenCount":205}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"fill","args":{"selector":"#message","value":"Hello World"}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":210,"candidatesTokenCount":25,"totalTokenCount":235}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"click","args":{"selector":"#submit-btn"}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":240,"candidatesTokenCount":20,"totalTokenCount":260}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"take_snapshot","args":{}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":270,"candidatesTokenCount":15,"totalTokenCount":285}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"complete_task","args":{"result":{"success":true,"summary":"Successfully filled and submitted the contact form. Navigated to form-result.html which shows 'Form Submitted Successfully'."}}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":300,"candidatesTokenCount":40,"totalTokenCount":340}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"I've successfully filled out and submitted the contact form:\n\n- **Name**: Test User\n- **Email**: test@example.com\n- **Message**: Hello World\n\nAfter submission, the page navigated to the result page showing 'Form Submitted Successfully'."}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":300,"candidatesTokenCount":60,"totalTokenCount":360}}]}
diff --git a/integration-tests/browser-agent-localhost.multistep.responses b/integration-tests/browser-agent-localhost.multistep.responses
new file mode 100644
index 0000000000..37fc8d438c
--- /dev/null
+++ b/integration-tests/browser-agent-localhost.multistep.responses
@@ -0,0 +1,9 @@
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"I'll go through the multi-step flow on the localhost server."},{"functionCall":{"name":"browser_agent","args":{"task":"Navigate to http://127.0.0.1:18923/multi-step/step1.html, fill in 'testuser' as the username, click Next, then on step 2 select 'Option B' and click Finish. Report the final result page content."}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":100,"candidatesTokenCount":50,"totalTokenCount":150}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"navigate_page","args":{"url":"http://127.0.0.1:18923/multi-step/step1.html"}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":100,"candidatesTokenCount":20,"totalTokenCount":120}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"fill","args":{"selector":"#username","value":"testuser"}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":150,"candidatesTokenCount":25,"totalTokenCount":175}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"click","args":{"selector":"#next-btn"}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":180,"candidatesTokenCount":20,"totalTokenCount":200}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"take_snapshot","args":{}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":210,"candidatesTokenCount":15,"totalTokenCount":225}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"click","args":{"selector":"#finish-btn"}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":240,"candidatesTokenCount":20,"totalTokenCount":260}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"take_snapshot","args":{}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":270,"candidatesTokenCount":15,"totalTokenCount":285}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"complete_task","args":{"result":{"success":true,"summary":"Completed all steps. Step 1: entered username 'testuser'. Step 2: selected default option. Final result page shows 'Multi-Step Complete' with '✓ Complete' status badge."}}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":300,"candidatesTokenCount":40,"totalTokenCount":340}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"I've completed the multi-step flow:\n\n1. **Step 1**: Entered 'testuser' as username and clicked Next\n2. **Step 2**: Confirmed selection and clicked Finish\n3. **Result**: Final page shows 'Multi-Step Complete' with a '✓ Complete' status badge\n\nAll steps were successfully navigated."}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":300,"candidatesTokenCount":60,"totalTokenCount":360}}]}
diff --git a/integration-tests/browser-agent-localhost.navigate.responses b/integration-tests/browser-agent-localhost.navigate.responses
new file mode 100644
index 0000000000..676696bf6b
--- /dev/null
+++ b/integration-tests/browser-agent-localhost.navigate.responses
@@ -0,0 +1,5 @@
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"I'll navigate to the localhost page and read its content using the browser agent."},{"functionCall":{"name":"browser_agent","args":{"task":"Navigate to http://127.0.0.1:18923/index.html and tell me the page title and list all links on the page"}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":100,"candidatesTokenCount":40,"totalTokenCount":140}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"navigate_page","args":{"url":"http://127.0.0.1:18923/index.html"}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":100,"candidatesTokenCount":20,"totalTokenCount":120}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"take_snapshot","args":{}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":150,"candidatesTokenCount":20,"totalTokenCount":170}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"complete_task","args":{"result":{"success":true,"summary":"Page title is 'Test Fixture - Home'. Found 3 links: Contact Form (/form.html), Multi-Step Flow (/multi-step/step1.html), Dynamic Content (/dynamic.html)."}}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":200,"candidatesTokenCount":40,"totalTokenCount":240}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"The localhost test fixture page has:\n\n**Title**: Test Fixture - Home\n\n**Links**:\n1. Contact Form (form.html)\n2. Multi-Step Flow (multi-step/step1.html)\n3. Dynamic Content (dynamic.html)\n\nThe page also has a heading 'Test Fixture Home Page' and footer content."}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":200,"candidatesTokenCount":60,"totalTokenCount":260}}]}
diff --git a/integration-tests/browser-agent-localhost.screenshot.responses b/integration-tests/browser-agent-localhost.screenshot.responses
new file mode 100644
index 0000000000..762b560697
--- /dev/null
+++ b/integration-tests/browser-agent-localhost.screenshot.responses
@@ -0,0 +1,5 @@
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"I'll take a screenshot of the localhost test page."},{"functionCall":{"name":"browser_agent","args":{"task":"Navigate to http://127.0.0.1:18923/index.html and take a screenshot of the page"}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":100,"candidatesTokenCount":35,"totalTokenCount":135}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"navigate_page","args":{"url":"http://127.0.0.1:18923/index.html"}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":100,"candidatesTokenCount":20,"totalTokenCount":120}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"take_screenshot","args":{}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":150,"candidatesTokenCount":15,"totalTokenCount":165}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"complete_task","args":{"result":{"success":true,"summary":"Screenshot captured of the localhost test fixture home page showing the heading, navigation links, and footer.","data":{"screenshotTaken":true}}}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":200,"candidatesTokenCount":40,"totalTokenCount":240}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"I've captured a screenshot of the localhost test fixture page. The screenshot shows the 'Test Fixture Home Page' heading with navigation links to the Contact Form, Multi-Step Flow, and Dynamic Content pages, along with the footer section."}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":200,"candidatesTokenCount":50,"totalTokenCount":250}}]}
diff --git a/integration-tests/browser-agent-localhost.test.ts b/integration-tests/browser-agent-localhost.test.ts
new file mode 100644
index 0000000000..2de37ba7a9
--- /dev/null
+++ b/integration-tests/browser-agent-localhost.test.ts
@@ -0,0 +1,161 @@
+/**
+ * @license
+ * Copyright 2026 Google LLC
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+import { describe, it, expect, beforeEach, afterEach } from 'vitest';
+import { TestRig, assertModelHasOutput } from './test-helper.js';
+import { dirname, join } from 'node:path';
+import { fileURLToPath } from 'node:url';
+
+const __filename = fileURLToPath(import.meta.url);
+const __dirname = dirname(__filename);
+
+describe('browser-agent-localhost', () => {
+ let rig: TestRig;
+
+ const browserSettings = {
+ agents: {
+ overrides: {
+ browser_agent: {
+ enabled: true,
+ },
+ },
+ browser: {
+ headless: true,
+ sessionMode: 'isolated' as const,
+ },
+ },
+ };
+
+ beforeEach(() => {
+ rig = new TestRig();
+ });
+
+ afterEach(async () => {
+ await rig.cleanup();
+ });
+
+ it('should navigate to localhost fixture and read page content', async () => {
+ rig.setup('localhost-navigate', {
+ fakeResponsesPath: join(
+ __dirname,
+ 'browser-agent-localhost.navigate.responses',
+ ),
+ settings: browserSettings,
+ });
+
+ const result = await rig.run({
+ args: 'Navigate to http://127.0.0.1:18923/index.html and tell me the page title and list all links.',
+ });
+
+ assertModelHasOutput(result);
+
+ const toolLogs = rig.readToolLogs();
+ const browserAgentCall = toolLogs.find(
+ (t) => t.toolRequest.name === 'browser_agent',
+ );
+ expect(
+ browserAgentCall,
+ 'Expected browser_agent to be called',
+ ).toBeDefined();
+ });
+
+ it('should fill out and submit a form on localhost', async () => {
+ rig.setup('localhost-form', {
+ fakeResponsesPath: join(
+ __dirname,
+ 'browser-agent-localhost.form.responses',
+ ),
+ settings: browserSettings,
+ });
+
+ const result = await rig.run({
+ args: "Navigate to http://127.0.0.1:18923/form.html, fill in name='Test User', email='test@example.com', message='Hello World', and submit the form.",
+ });
+
+ assertModelHasOutput(result);
+
+ const toolLogs = rig.readToolLogs();
+ const browserAgentCall = toolLogs.find(
+ (t) => t.toolRequest.name === 'browser_agent',
+ );
+ expect(
+ browserAgentCall,
+ 'Expected browser_agent to be called',
+ ).toBeDefined();
+ });
+
+ it('should navigate through a multi-step flow', async () => {
+ rig.setup('localhost-multistep', {
+ fakeResponsesPath: join(
+ __dirname,
+ 'browser-agent-localhost.multistep.responses',
+ ),
+ settings: browserSettings,
+ });
+
+ const result = await rig.run({
+ args: "Go to http://127.0.0.1:18923/multi-step/step1.html, fill in 'testuser' as username, click Next, then click Finish on step 2. Report the result.",
+ });
+
+ assertModelHasOutput(result);
+
+ const toolLogs = rig.readToolLogs();
+ const browserAgentCall = toolLogs.find(
+ (t) => t.toolRequest.name === 'browser_agent',
+ );
+ expect(
+ browserAgentCall,
+ 'Expected browser_agent to be called',
+ ).toBeDefined();
+ });
+
+ it('should handle dynamically loaded content', async () => {
+ rig.setup('localhost-dynamic', {
+ fakeResponsesPath: join(
+ __dirname,
+ 'browser-agent-localhost.dynamic.responses',
+ ),
+ settings: browserSettings,
+ });
+
+ const result = await rig.run({
+ args: 'Navigate to http://127.0.0.1:18923/dynamic.html, wait for content to load, and tell me what items appear.',
+ });
+
+ assertModelHasOutput(result);
+
+ const toolLogs = rig.readToolLogs();
+ const browserAgentCall = toolLogs.find(
+ (t) => t.toolRequest.name === 'browser_agent',
+ );
+ expect(
+ browserAgentCall,
+ 'Expected browser_agent to be called',
+ ).toBeDefined();
+ });
+
+ it('should take a screenshot of localhost page', async () => {
+ rig.setup('localhost-screenshot', {
+ fakeResponsesPath: join(
+ __dirname,
+ 'browser-agent-localhost.screenshot.responses',
+ ),
+ settings: browserSettings,
+ });
+
+ const result = await rig.run({
+ args: 'Navigate to http://127.0.0.1:18923/index.html and take a screenshot.',
+ });
+
+ assertModelHasOutput(result);
+
+ const toolLogs = rig.readToolLogs();
+ const browserCalls = toolLogs.filter(
+ (t) => t.toolRequest.name === 'browser_agent',
+ );
+ expect(browserCalls.length).toBeGreaterThan(0);
+ });
+});
diff --git a/integration-tests/browser-agent.cleanup.responses b/integration-tests/browser-agent.cleanup.responses
index 9cf7a7b356..e99c757793 100644
--- a/integration-tests/browser-agent.cleanup.responses
+++ b/integration-tests/browser-agent.cleanup.responses
@@ -1,4 +1,5 @@
{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"I'll open https://example.com and check the page title for you."},{"functionCall":{"name":"browser_agent","args":{"task":"Open https://example.com and get the page title"}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":100,"candidatesTokenCount":35,"totalTokenCount":135}}]}
-{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"I have opened the page and the title is 'Example Domain'."}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":200,"candidatesTokenCount":30,"totalTokenCount":230}}]}
-{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"The task is complete. The page title is 'Example Domain'."}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":300,"candidatesTokenCount":20,"totalTokenCount":320}}]}
-{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"Done."}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":400,"candidatesTokenCount":5,"totalTokenCount":405}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"navigate_page","args":{"url":"https://example.com"}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":100,"candidatesTokenCount":20,"totalTokenCount":120}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"take_snapshot","args":{}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":150,"candidatesTokenCount":20,"totalTokenCount":170}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"complete_task","args":{"result":{"success":true,"summary":"The page title is 'Example Domain'."}}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":200,"candidatesTokenCount":30,"totalTokenCount":230}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"I have opened the page and the title is 'Example Domain'. The browser session has been cleaned up successfully."}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":200,"candidatesTokenCount":30,"totalTokenCount":230}}]}
diff --git a/integration-tests/browser-agent.interaction.responses b/integration-tests/browser-agent.interaction.responses
index 98474d6b59..0b4a1d84f7 100644
--- a/integration-tests/browser-agent.interaction.responses
+++ b/integration-tests/browser-agent.interaction.responses
@@ -1,2 +1,5 @@
{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"I'll navigate to https://example.com and analyze the links on the page."},{"functionCall":{"name":"browser_agent","args":{"task":"Go to https://example.com and find all links on the page, then describe them"}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":100,"candidatesTokenCount":40,"totalTokenCount":140}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"navigate_page","args":{"url":"https://example.com"}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":100,"candidatesTokenCount":20,"totalTokenCount":120}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"take_snapshot","args":{}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":150,"candidatesTokenCount":20,"totalTokenCount":170}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"complete_task","args":{"result":{"success":true,"summary":"Found one link on https://example.com: 'More information...' linking to the IANA website for details about reserved domains."}}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":200,"candidatesTokenCount":40,"totalTokenCount":240}}]}
{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"After analyzing https://example.com, I found the following links:\n\n1. **\"More information...\"** - This is the main link on the page that points to the IANA (Internet Assigned Numbers Authority) website for more details about reserved domains.\n\nThe page is quite minimal with just this single informational link, which is typical for example domains used in documentation."}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":200,"candidatesTokenCount":70,"totalTokenCount":270}}]}
diff --git a/integration-tests/browser-agent.navigate-snapshot.responses b/integration-tests/browser-agent.navigate-snapshot.responses
index 481520234d..e9c9490a21 100644
--- a/integration-tests/browser-agent.navigate-snapshot.responses
+++ b/integration-tests/browser-agent.navigate-snapshot.responses
@@ -1,2 +1,5 @@
{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"I'll help you open https://example.com and analyze the page. Let me use the browser agent to navigate and capture the page information."},{"functionCall":{"name":"browser_agent","args":{"task":"Navigate to https://example.com and capture the accessibility tree to get the page title and main content"}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":100,"candidatesTokenCount":50,"totalTokenCount":150}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"navigate_page","args":{"url":"https://example.com"}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":100,"candidatesTokenCount":30,"totalTokenCount":130}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"take_snapshot","args":{}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":150,"candidatesTokenCount":20,"totalTokenCount":170}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"complete_task","args":{"result":{"success":true,"summary":"Successfully navigated to https://example.com. The page title is 'Example Domain' and the main content states: 'This domain is for use in illustrative examples in documents.'"}}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":200,"candidatesTokenCount":40,"totalTokenCount":240}}]}
{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"Based on the browser analysis of https://example.com:\n\n**Page Title**: Example Domain\n\n**Main Content**: The page contains a simple heading \"Example Domain\" and explanatory text stating \"This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.\"\n\nThe page has a clean, minimal layout typical of placeholder domains used for documentation and examples."}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":200,"candidatesTokenCount":80,"totalTokenCount":280}}]}
diff --git a/integration-tests/browser-agent.persistent-session.responses b/integration-tests/browser-agent.persistent-session.responses
new file mode 100644
index 0000000000..ee224858f1
--- /dev/null
+++ b/integration-tests/browser-agent.persistent-session.responses
@@ -0,0 +1,8 @@
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"I'll browse to example.com twice to verify the content. Let me first check the page title, then check the links on the page."},{"functionCall":{"name":"browser_agent","args":{"task":"Navigate to https://example.com and tell me the page title using the accessibility tree"}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":100,"candidatesTokenCount":50,"totalTokenCount":150}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"navigate_page","args":{"url":"https://example.com"}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":100,"candidatesTokenCount":30,"totalTokenCount":130}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"take_snapshot","args":{}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":150,"candidatesTokenCount":20,"totalTokenCount":170}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"complete_task","args":{"result":{"success":true,"summary":"Page title is 'Example Domain'."}}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":200,"candidatesTokenCount":40,"totalTokenCount":240}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"The page title is 'Example Domain'. Now let me check the links on the page."},{"functionCall":{"name":"browser_agent","args":{"task":"Take a snapshot of the accessibility tree on the currently open page and tell me about any links"}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":200,"candidatesTokenCount":50,"totalTokenCount":250}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"take_snapshot","args":{}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":150,"candidatesTokenCount":20,"totalTokenCount":170}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"complete_task","args":{"result":{"success":true,"summary":"Found a link 'More information...' pointing to iana.org."}}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":200,"candidatesTokenCount":40,"totalTokenCount":240}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"I browsed example.com twice using persistent browser sessions:\n\n1. **First visit**: Page title is 'Example Domain'\n2. **Second visit**: Found a link 'More information...' pointing to iana.org\n\nThe browser stayed open between both visits, confirming persistent session management works correctly."}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":300,"candidatesTokenCount":60,"totalTokenCount":360}}]}
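The browser fixtures in this diff appear to follow the same sequence. The main model calls `browser_agent`, and the browser subagent then issues tools such as `navigate_page`, `take_snapshot`, or `take_screenshot`. `complete_task` ends the subagent run, and the main model replies in text. In the persistent-session fixture, the second run starts with `take_snapshot` rather than `navigate_page`, so it relies on the page left open by the first run. Below is a minimal sketch that groups a fixture's function calls into subagent runs. It assumes the record shape seen in these `.responses` files and that lines are consumed in order. The `callNames` and `splitIntoRuns` helpers are hypothetical, not part of the test rig.

```typescript
interface Part {
  text?: string;
  functionCall?: { name: string; args?: unknown };
}

// Record shape inferred from the .responses fixtures in this diff.
interface RecordedLine {
  method: string;
  response: Array<{ candidates?: Array<{ content?: { parts?: Part[] } }> }>;
}

// Collect the function-call names from one recorded line, in order.
function callNames(line: string): string[] {
  const parsed = JSON.parse(line) as RecordedLine;
  const names: string[] = [];
  for (const chunk of parsed.response) {
    for (const candidate of chunk.candidates ?? []) {
      for (const part of candidate.content?.parts ?? []) {
        if (part.functionCall) names.push(part.functionCall.name);
      }
    }
  }
  return names;
}

// Hypothetical helper: a run opens at `browser_agent` and closes at
// `complete_task`; the subagent tool calls in between form one group.
function splitIntoRuns(fixture: string): string[][] {
  const runs: string[][] = [];
  let current: string[] | null = null;
  for (const line of fixture.split('\n').filter((l) => l.trim() !== '')) {
    for (const name of callNames(line)) {
      if (name === 'browser_agent') {
        current = [];
      } else if (current) {
        current.push(name);
        if (name === 'complete_task') {
          runs.push(current);
          current = null;
        }
      }
    }
  }
  return runs;
}

// Trimmed-down stand-in for browser-agent.persistent-session.responses.
const call = (name: string) =>
  JSON.stringify({
    method: 'generateContentStream',
    response: [{ candidates: [{ content: { parts: [{ functionCall: { name, args: {} } }] } }] }],
  });
const fixture = [
  call('browser_agent'),
  call('navigate_page'),
  call('take_snapshot'),
  call('complete_task'),
  call('browser_agent'),
  call('take_snapshot'),
  call('complete_task'),
].join('\n');

console.log(JSON.stringify(splitIntoRuns(fixture)));
// [["navigate_page","take_snapshot","complete_task"],["take_snapshot","complete_task"]]
```

A check like this could also catch fixtures whose subagent run never reaches `complete_task`. Those runs are simply absent from the output.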
diff --git a/integration-tests/browser-agent.screenshot.responses b/integration-tests/browser-agent.screenshot.responses
index c9abaa4268..84467f517a 100644
--- a/integration-tests/browser-agent.screenshot.responses
+++ b/integration-tests/browser-agent.screenshot.responses
@@ -1,2 +1,5 @@
{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"I'll navigate to https://example.com and take a screenshot for you."},{"functionCall":{"name":"browser_agent","args":{"task":"Navigate to https://example.com and take a screenshot of the page"}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":100,"candidatesTokenCount":40,"totalTokenCount":140}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"navigate_page","args":{"url":"https://example.com"}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":100,"candidatesTokenCount":20,"totalTokenCount":120}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"take_screenshot","args":{}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":150,"candidatesTokenCount":20,"totalTokenCount":170}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"complete_task","args":{"result":{"success":true,"summary":"Successfully navigated to https://example.com and captured a screenshot. The page shows the Example Domain header with explanatory text."}}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":200,"candidatesTokenCount":40,"totalTokenCount":240}}]}
{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"I've successfully navigated to https://example.com and captured a screenshot. The page shows the familiar \"Example Domain\" header with explanatory text below it. The screenshot captures the clean, minimal layout of this demonstration website."}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":200,"candidatesTokenCount":50,"totalTokenCount":250}}]}
diff --git a/integration-tests/browser-agent.sequential.responses b/integration-tests/browser-agent.sequential.responses
index 74732998aa..6dc1e96268 100644
--- a/integration-tests/browser-agent.sequential.responses
+++ b/integration-tests/browser-agent.sequential.responses
@@ -1,2 +1,6 @@
{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"I'll help you navigate to https://example.com and perform both operations - capturing the accessibility tree and taking a screenshot."},{"functionCall":{"name":"browser_agent","args":{"task":"Navigate to https://example.com, take a snapshot of the accessibility tree, then take a screenshot"}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":100,"candidatesTokenCount":45,"totalTokenCount":145}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"navigate_page","args":{"url":"https://example.com"}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":100,"candidatesTokenCount":20,"totalTokenCount":120}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"take_snapshot","args":{}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":150,"candidatesTokenCount":20,"totalTokenCount":170}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"take_screenshot","args":{}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":200,"candidatesTokenCount":20,"totalTokenCount":220}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"complete_task","args":{"result":{"success":true,"summary":"Successfully navigated to https://example.com, captured accessibility tree snapshot and took a screenshot. Page title is 'Example Domain'."}}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":250,"candidatesTokenCount":40,"totalTokenCount":290}}]}
{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"I've successfully completed both operations on https://example.com:\n\n**Accessibility Tree Snapshot**: The page has a clear structure with the main heading \"Example Domain\" and descriptive text about the domain's purpose for documentation examples.\n\n**Screenshot**: Captured a visual representation of the page showing the clean, minimal layout with the heading and explanatory text.\n\nBoth the accessibility data and visual screenshot confirm this is the standard example domain page used for documentation purposes."}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":200,"candidatesTokenCount":80,"totalTokenCount":280}}]}
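Each line in a `.responses` fixture holds one recorded `generateContentStream` response. The lines added above fill in the browser sub-agent's own turns (`navigate_page`, `take_snapshot`, `take_screenshot`, `complete_task`) between the parent model's `browser_agent` call and its final text answer. The fake generator presumably serves these lines in file order. The following is a minimal TypeScript sketch for listing the function calls a fixture contains. It relies only on the fields visible in these files. `functionCallNames` and the trimmed types are hypothetical helpers, not part of the integration-test rig.

```typescript
// Trimmed, hypothetical types covering only the fields read below;
// the recorded responses carry more (role, finishReason, usageMetadata, ...).
interface Part {
  text?: string;
  functionCall?: { name: string; args: Record<string, unknown> };
}

interface RecordedResponse {
  method: string;
  response: Array<{
    candidates: Array<{ content: { parts: Part[] } }>;
  }>;
}

// Return the function-call names in a .responses fixture, in file order.
function functionCallNames(jsonl: string): string[] {
  const names: string[] = [];
  for (const line of jsonl.split('\n')) {
    if (line.trim() === '') continue; // tolerate blank lines
    const record = JSON.parse(line) as RecordedResponse;
    for (const chunk of record.response) {
      for (const candidate of chunk.candidates) {
        for (const part of candidate.content.parts) {
          if (part.functionCall) names.push(part.functionCall.name);
        }
      }
    }
  }
  return names;
}

// A shortened fixture: two tool calls followed by a plain-text answer.
const sample = [
  '{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"navigate_page","args":{"url":"https://example.com"}}}],"role":"model"}}]}]}',
  '{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"take_screenshot","args":{}}}],"role":"model"}}]}]}',
  '{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"Done."}],"role":"model"}}]}]}',
].join('\n');

console.log(functionCallNames(sample)); // [ 'navigate_page', 'take_screenshot' ]
```

Text-only parts, such as the final summary line, contribute nothing to the list. A fixture's tool sequence can therefore be compared directly against what a test expects the agent to do.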
diff --git a/integration-tests/browser-agent.test.ts b/integration-tests/browser-agent.test.ts
index f9f07d4c9e..09e20bcb26 100644
--- a/integration-tests/browser-agent.test.ts
+++ b/integration-tests/browser-agent.test.ts
@@ -77,7 +77,12 @@ describe.skipIf(!chromeAvailable)('browser-agent', () => {
),
settings: {
agents: {
- browser_agent: {
+ overrides: {
+ browser_agent: {
+ enabled: true,
+ },
+ },
+ browser: {
headless: true,
sessionMode: 'isolated',
},
@@ -106,7 +111,12 @@ describe.skipIf(!chromeAvailable)('browser-agent', () => {
fakeResponsesPath: join(__dirname, 'browser-agent.screenshot.responses'),
settings: {
agents: {
- browser_agent: {
+ overrides: {
+ browser_agent: {
+ enabled: true,
+ },
+ },
+ browser: {
headless: true,
sessionMode: 'isolated',
},
@@ -132,7 +142,12 @@ describe.skipIf(!chromeAvailable)('browser-agent', () => {
fakeResponsesPath: join(__dirname, 'browser-agent.interaction.responses'),
settings: {
agents: {
- browser_agent: {
+ overrides: {
+ browser_agent: {
+ enabled: true,
+ },
+ },
+ browser: {
headless: true,
sessionMode: 'isolated',
},
@@ -161,7 +176,12 @@ describe.skipIf(!chromeAvailable)('browser-agent', () => {
fakeResponsesPath: join(__dirname, 'browser-agent.cleanup.responses'),
settings: {
agents: {
- browser_agent: {
+ overrides: {
+ browser_agent: {
+ enabled: true,
+ },
+ },
+ browser: {
headless: true,
sessionMode: 'isolated',
},
@@ -182,7 +202,12 @@ describe.skipIf(!chromeAvailable)('browser-agent', () => {
fakeResponsesPath: join(__dirname, 'browser-agent.sequential.responses'),
settings: {
agents: {
- browser_agent: {
+ overrides: {
+ browser_agent: {
+ enabled: true,
+ },
+ },
+ browser: {
headless: true,
sessionMode: 'isolated',
},
@@ -204,6 +229,51 @@ describe.skipIf(!chromeAvailable)('browser-agent', () => {
assertModelHasOutput(result);
});
+ it('should keep browser open across multiple browser_agent invocations', async () => {
+ rig.setup('browser-persistent-session', {
+ fakeResponsesPath: join(
+ __dirname,
+ 'browser-agent.persistent-session.responses',
+ ),
+ settings: {
+ agents: {
+ overrides: {
+ browser_agent: {
+ enabled: true,
+ },
+ },
+ browser: {
+ headless: true,
+ sessionMode: 'isolated',
+ },
+ },
+ },
+ });
+
+ const result = await rig.run({
+ args: 'Browse to example.com twice: first get the page title, then check for links.',
+ });
+
+ const toolLogs = rig.readToolLogs();
+ const browserCalls = toolLogs.filter(
+ (t) => t.toolRequest.name === 'browser_agent',
+ );
+
+    // Both browser_agent invocations must succeed: if the browser was
+ // incorrectly closed after the first call (regression #24210),
+ // the second call would fail.
+ expect(
+ browserCalls.length,
+ 'Expected browser_agent to be called twice',
+ ).toBe(2);
+ expect(
+ browserCalls.every((c) => c.toolRequest.success),
+ 'Both browser_agent calls should succeed',
+ ).toBe(true);
+
+ assertModelHasOutput(result);
+ });
+
it('should handle tool confirmation for write_file without crashing', async () => {
rig.setup('tool-confirmation', {
fakeResponsesPath: join(
@@ -212,7 +282,12 @@ describe.skipIf(!chromeAvailable)('browser-agent', () => {
),
settings: {
agents: {
- browser_agent: {
+ overrides: {
+ browser_agent: {
+ enabled: true,
+ },
+ },
+ browser: {
headless: true,
sessionMode: 'isolated',
},
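The new persistent-session test passes only if `browser_agent` appears exactly twice in the tool logs and every one of those calls succeeded. The following is a minimal sketch of that check as a standalone function. It assumes log entries shaped like the `toolRequest.name` and `toolRequest.success` fields the test reads. `ToolLogEntry` and `checkToolCalls` are hypothetical names, not the rig's API.

```typescript
// Hypothetical shape mirroring the fields the test reads from rig.readToolLogs().
interface ToolLogEntry {
  toolRequest: {
    name: string;
    success: boolean;
  };
}

interface ToolCallCheck {
  count: number;
  allSucceeded: boolean;
}

// Collect every call to `toolName` and report how many there were and whether
// all of them succeeded. If the browser were closed after the first call, the
// second call would fail and allSucceeded would be false.
function checkToolCalls(logs: ToolLogEntry[], toolName: string): ToolCallCheck {
  const calls = logs.filter((entry) => entry.toolRequest.name === toolName);
  return {
    count: calls.length,
    allSucceeded: calls.every((entry) => entry.toolRequest.success),
  };
}

// Two browser_agent calls interleaved with an unrelated (failed) tool call.
const sampleLogs: ToolLogEntry[] = [
  { toolRequest: { name: 'browser_agent', success: true } },
  { toolRequest: { name: 'read_file', success: false } },
  { toolRequest: { name: 'browser_agent', success: true } },
];

const summary = checkToolCalls(sampleLogs, 'browser_agent');
console.log(summary); // { count: 2, allSucceeded: true }
```

`Array.prototype.every` returns `true` for an empty array, so the success check alone would also pass if no call were logged. The test's separate count assertion closes that gap.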
diff --git a/integration-tests/browser-policy.test.ts b/integration-tests/browser-policy.test.ts
index f533cb3f5e..4fbfc5db01 100644
--- a/integration-tests/browser-policy.test.ts
+++ b/integration-tests/browser-policy.test.ts
@@ -10,8 +10,13 @@ import { dirname, join } from 'node:path';
import { fileURLToPath } from 'node:url';
import { execSync } from 'node:child_process';
import { existsSync, writeFileSync, readFileSync, mkdirSync } from 'node:fs';
+import { env } from 'node:process';
import stripAnsi from 'strip-ansi';
+// Browser agent Chrome DevTools MCP connection is flaky in Docker sandbox.
+// See: https://github.com/google-gemini/gemini-cli/issues/24382
+const isDockerSandbox = env['GEMINI_SANDBOX'] === 'docker';
+
const __filename = fileURLToPath(import.meta.url);
const __dirname = dirname(__filename);
@@ -59,122 +64,146 @@ describe.skipIf(!chromeAvailable)('browser-policy', () => {
await rig.cleanup();
});
- it('should skip confirmation when "Allow all server tools for this session" is chosen', async () => {
- rig.setup('browser-policy-skip-confirmation', {
- fakeResponsesPath: join(__dirname, 'browser-policy.responses'),
- settings: {
- agents: {
- overrides: {
- browser_agent: {
- enabled: true,
+ it.skipIf(isDockerSandbox)(
+ 'should skip confirmation when "Allow all server tools for this session" is chosen',
+ async () => {
+ rig.setup('browser-policy-skip-confirmation', {
+ fakeResponsesPath: join(__dirname, 'browser-policy.responses'),
+ settings: {
+ agents: {
+ overrides: {
+ browser_agent: {
+ enabled: true,
+ },
+ },
+ browser: {
+ headless: true,
+ sessionMode: 'isolated',
+ allowedDomains: ['example.com'],
},
},
- browser: {
- headless: true,
- sessionMode: 'isolated',
- allowedDomains: ['example.com'],
- },
},
- },
- });
+ });
- // Manually trust the folder to avoid the dialog and enable option 3
- const geminiDir = join(rig.homeDir!, '.gemini');
- mkdirSync(geminiDir, { recursive: true });
+ // Manually trust the folder to avoid the dialog and enable option 3
+ const geminiDir = join(rig.homeDir!, '.gemini');
+ mkdirSync(geminiDir, { recursive: true });
- // Write to trustedFolders.json
- const trustedFoldersPath = join(geminiDir, 'trustedFolders.json');
- const trustedFolders = {
- [rig.testDir!]: 'TRUST_FOLDER',
- };
- writeFileSync(trustedFoldersPath, JSON.stringify(trustedFolders, null, 2));
+ // Write to trustedFolders.json
+ const trustedFoldersPath = join(geminiDir, 'trustedFolders.json');
+ const trustedFolders = {
+ [rig.testDir!]: 'TRUST_FOLDER',
+ };
+ writeFileSync(
+ trustedFoldersPath,
+ JSON.stringify(trustedFolders, null, 2),
+ );
- // Force confirmation for browser agent.
- // NOTE: We don't force confirm browser tools here because "Allow all server tools"
- // adds a rule with ALWAYS_ALLOW_PRIORITY (3.9x) which would be overshadowed by
- // a rule in the user tier (4.x) like the one from this TOML.
- // By removing the explicit mcp rule, the first MCP tool will still prompt
- // due to default approvalMode = 'default', and then "Allow all" will correctly
- // bypass subsequent tools.
- const policyFile = join(rig.testDir!, 'force-confirm.toml');
- writeFileSync(
- policyFile,
- `
+ // Force confirmation for browser agent.
+ // NOTE: We don't force confirm browser tools here because "Allow all server tools"
+ // adds a rule with ALWAYS_ALLOW_PRIORITY (3.9x) which would be overshadowed by
+ // a rule in the user tier (4.x) like the one from this TOML.
+ // By removing the explicit mcp rule, the first MCP tool will still prompt
+ // due to default approvalMode = 'default', and then "Allow all" will correctly
+ // bypass subsequent tools.
+ const policyFile = join(rig.testDir!, 'force-confirm.toml');
+ writeFileSync(
+ policyFile,
+ `
[[rule]]
name = "Force confirm browser_agent"
toolName = "browser_agent"
decision = "ask_user"
priority = 200
`,
- );
+ );
- // Update settings.json in both project and home directories to point to the policy file
- for (const baseDir of [rig.testDir!, rig.homeDir!]) {
- const settingsPath = join(baseDir, '.gemini', 'settings.json');
- if (existsSync(settingsPath)) {
- const settings = JSON.parse(readFileSync(settingsPath, 'utf-8'));
- settings.policyPaths = [policyFile];
- // Ensure folder trust is enabled
- settings.security = settings.security || {};
- settings.security.folderTrust = settings.security.folderTrust || {};
- settings.security.folderTrust.enabled = true;
- writeFileSync(settingsPath, JSON.stringify(settings, null, 2));
+ // Update settings.json in both project and home directories to point to the policy file
+ for (const baseDir of [rig.testDir!, rig.homeDir!]) {
+ const settingsPath = join(baseDir, '.gemini', 'settings.json');
+ if (existsSync(settingsPath)) {
+ const settings = JSON.parse(readFileSync(settingsPath, 'utf-8'));
+ settings.policyPaths = [policyFile];
+ // Ensure folder trust is enabled
+ settings.security = settings.security || {};
+ settings.security.folderTrust = settings.security.folderTrust || {};
+ settings.security.folderTrust.enabled = true;
+ writeFileSync(settingsPath, JSON.stringify(settings, null, 2));
+ }
}
- }
- const run = await rig.runInteractive({
- approvalMode: 'default',
- env: {
- GEMINI_CLI_INTEGRATION_TEST: 'true',
- },
- });
+ const run = await rig.runInteractive({
+ approvalMode: 'default',
+ env: {
+ GEMINI_CLI_INTEGRATION_TEST: 'true',
+ },
+ });
- await run.sendKeys(
- 'Open https://example.com and check if there is a heading\r',
- );
- await run.sendKeys('\r');
+ await run.sendKeys(
+ 'Open https://example.com and check if there is a heading\r',
+ );
+ await run.sendKeys('\r');
- // Handle confirmations.
- // 1. Initial browser_agent delegation (likely only 3 options, so use option 1: Allow once)
- await poll(
- () => stripAnsi(run.output).toLowerCase().includes('action required'),
- 60000,
- 1000,
- );
- await run.sendKeys('1\r');
- await new Promise((r) => setTimeout(r, 2000));
+ // Handle confirmations.
+ // 1. Initial browser_agent delegation (likely only 3 options, so use option 1: Allow once)
+ await poll(
+ () => stripAnsi(run.output).toLowerCase().includes('action required'),
+ 60000,
+ 1000,
+ );
+ await run.sendKeys('1\r');
+ await new Promise((r) => setTimeout(r, 2000));
- // Handle privacy notice
- await poll(
- () => stripAnsi(run.output).toLowerCase().includes('privacy notice'),
- 5000,
- 100,
- );
- await run.sendKeys('1\r');
- await new Promise((r) => setTimeout(r, 5000));
+ // Handle privacy notice
+ await poll(
+ () => stripAnsi(run.output).toLowerCase().includes('privacy notice'),
+ 5000,
+ 100,
+ );
+ await run.sendKeys('1\r');
+ await new Promise((r) => setTimeout(r, 5000));
- // new_page (MCP tool, should have 4 options, use option 3: Allow all server tools)
- await poll(
- () => {
- const stripped = stripAnsi(run.output).toLowerCase();
- return (
- stripped.includes('new_page') &&
- stripped.includes('allow all server tools for this session')
- );
- },
- 60000,
- 1000,
- );
+ // new_page (MCP tool, should have 4 options, use option 3: Allow all server tools)
+ await poll(
+ () => {
+ const stripped = stripAnsi(run.output).toLowerCase();
+ return (
+ stripped.includes('new_page') &&
+ stripped.includes('allow all server tools for this session')
+ );
+ },
+ 60000,
+ 1000,
+ );
- // Select "Allow all server tools for this session" (option 3)
- await run.sendKeys('3\r');
- await new Promise((r) => setTimeout(r, 30000));
+ // Select "Allow all server tools for this session" (option 3)
+ await run.sendKeys('3\r');
- const output = stripAnsi(run.output).toLowerCase();
+ // Wait for the browser agent to finish (success or failure)
+ await poll(
+ () => {
+ const stripped = stripAnsi(run.output).toLowerCase();
+ return (
+ stripped.includes('completed successfully') ||
+ stripped.includes('agent error')
+ );
+ },
+ 120000,
+ 1000,
+ );
- expect(output).toContain('browser_agent');
- expect(output).toContain('completed successfully');
- });
+ const output = stripAnsi(run.output).toLowerCase();
+
+ expect(output).toContain('browser_agent');
+ // The test validates that "Allow all server tools" skips subsequent
+      // tool confirmations; the browser agent may still fail due to
+ // Chrome/MCP issues in CI, which is acceptable for this policy test.
+ expect(
+ output.includes('completed successfully') ||
+ output.includes('agent error'),
+ ).toBe(true);
+ },
+ );
it('should show the visible warning when browser agent starts in existing session mode', async () => {
rig.setup('browser-session-warning', {
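The rewritten policy test replaces a fixed 30-second sleep with a poll. The poll stops on either terminal message, `completed successfully` or `agent error`. The following is a minimal sketch of such a wait loop. It assumes a `(predicate, timeoutMs, intervalMs)` argument order like the one the test passes to `poll`. `pollUntil` and `isAgentFinished` are hypothetical, and the diff does not show how the rig's real `poll` reports a timeout.

```typescript
// Hypothetical helper: resolves true as soon as `predicate` holds, or false
// once `timeoutMs` has elapsed without it holding.
async function pollUntil(
  predicate: () => boolean,
  timeoutMs: number,
  intervalMs: number,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (predicate()) {
      return true;
    }
    await new Promise<void>((resolve) => setTimeout(() => resolve(), intervalMs));
  }
  // Final check so a condition that became true during the last sleep is not missed.
  return predicate();
}

// Terminal-state check mirroring the test: either outcome ends the wait.
function isAgentFinished(output: string): boolean {
  const text = output.toLowerCase();
  return text.includes('completed successfully') || text.includes('agent error');
}

// Simulate CLI output arriving asynchronously.
let cliOutput = '';
setTimeout(() => {
  cliOutput += 'Agent Error: Chrome DevTools MCP disconnected';
}, 50);

pollUntil(() => isAgentFinished(cliOutput), 1000, 10).then((finished) => {
  console.log(finished); // true
});
```

Accepting either outcome keeps the test focused on the policy behavior, which is that no further confirmation prompts appear. It does not depend on whether Chrome started cleanly in CI.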
diff --git a/integration-tests/file-system.test.ts b/integration-tests/file-system.test.ts
index 64481068c2..80552cfd68 100644
--- a/integration-tests/file-system.test.ts
+++ b/integration-tests/file-system.test.ts
@@ -121,6 +121,7 @@ describe('file-system', () => {
const result = await rig.run({
args: `write "hello" to "${fileName}" and then stop. Do not perform any other actions.`,
+      timeout: 600000, // 10 min (real LLM can be slow in Docker sandbox)
});
const foundToolCall = await rig.waitForToolCall('write_file');
diff --git a/integration-tests/globalSetup.ts b/integration-tests/globalSetup.ts
index 5f963f7459..9dad51f9b3 100644
--- a/integration-tests/globalSetup.ts
+++ b/integration-tests/globalSetup.ts
@@ -9,16 +9,80 @@ if (process.env['NO_COLOR'] !== undefined) {
delete process.env['NO_COLOR'];
}
-import { mkdir, readdir, rm } from 'node:fs/promises';
-import { join, dirname } from 'node:path';
+import { mkdir, readdir, rm, readFile } from 'node:fs/promises';
+import { join, dirname, extname } from 'node:path';
import { fileURLToPath } from 'node:url';
import { canUseRipgrep } from '../packages/core/src/tools/ripGrep.js';
import { disableMouseTracking } from '@google/gemini-cli-core';
+import { createServer, type Server } from 'node:http';
const __dirname = dirname(fileURLToPath(import.meta.url));
const rootDir = join(__dirname, '..');
const integrationTestsDir = join(rootDir, '.integration-tests');
let runDir = ''; // Make runDir accessible in teardown
+let fixtureServer: Server | undefined;
+
+const FIXTURE_PORT = 18923;
+const FIXTURE_DIR = join(__dirname, 'test-fixtures');
+
+const MIME_TYPES: RecordThank you for your submission.
+Your form data has been received.
+
+ This is a test fixture page for browser agent integration tests.
+
+
+
+
+
diff --git a/integration-tests/test-fixtures/multi-step/result.html b/integration-tests/test-fixtures/multi-step/result.html
new file mode 100644
index 0000000000..f2386215d5
--- /dev/null
+++ b/integration-tests/test-fixtures/multi-step/result.html
@@ -0,0 +1,15 @@
+
+
+
+
+You have completed all steps successfully.
+Please provide your name to continue.
+
+
+
diff --git a/integration-tests/test-fixtures/multi-step/step2.html b/integration-tests/test-fixtures/multi-step/step2.html
new file mode 100644
index 0000000000..f0571a7a8e
--- /dev/null
+++ b/integration-tests/test-fixtures/multi-step/step2.html
@@ -0,0 +1,22 @@
+
+
+
+
+Choose your preference below.
+
+
+
diff --git a/package-lock.json b/package-lock.json
index f3bf8fa616..2c8a4b64b8 100644
--- a/package-lock.json
+++ b/package-lock.json
@@ -11,7 +11,7 @@
  "packages/*"
  ],
  "dependencies": {
- "ink": "npm:@jrichman/ink@6.5.0",
+ "ink": "npm:@jrichman/ink@6.6.7",
  "latest-version": "^9.0.0",
  "node-fetch-native": "^1.6.7",
  "proper-lockfile": "^4.1.2",
@@ -92,46 +92,6 @@
  "zod": "^3.25.0 || ^4.0.0"
  }
  },
- "node_modules/@alcalzone/ansi-tokenize": {
- "version": "0.2.2",
- "resolved": "https://registry.npmjs.org/@alcalzone/ansi-tokenize/-/ansi-tokenize-0.2.2.tgz",
- "integrity": "sha512-mkOh+Wwawzuf5wa30bvc4nA+Qb6DIrGWgBhRR/Pw4T9nsgYait8izvXkNyU78D6Wcu3Z+KUdwCmLCxlWjEotYA==",
- "license": "MIT",
- "dependencies": {
- "ansi-styles": "^6.2.1",
- "is-fullwidth-code-point": "^5.0.0"
- },
- "engines": {
- "node": ">=18"
- }
- },
- "node_modules/@alcalzone/ansi-tokenize/node_modules/ansi-styles": {
- "version": "6.2.3",
- "resolved": "https://registry.npmjs.org/ansi-styles/-/ansi-styles-6.2.3.tgz",
- "integrity": "sha512-4Dj6M28JB+oAH8kFkTLUo+a2jwOFkuqb3yucU0CANcRRUbxS0cP0nZYCGjcc3BNXwRIsUVmDGgzawme7zvJHvg==",
- "license": "MIT",
- "engines": {
- "node": ">=12"
- },
- "funding": {
- "url": "https://github.com/chalk/ansi-styles?sponsor=1"
- }
- },
- "node_modules/@alcalzone/ansi-tokenize/node_modules/is-fullwidth-code-point": {
- "version": "5.1.0",
- "resolved": "https://registry.npmjs.org/is-fullwidth-code-point/-/is-fullwidth-code-point-5.1.0.tgz",
- "integrity": "sha512-5XHYaSyiqADb4RnZ1Bdad6cPp8Toise4TzEjcOYDHZkTCbKgiUl7WTUCpNWHuxmDt91wnsZBc9xinNzopv3JMQ==",
- "license": "MIT",
- "dependencies": {
- "get-east-asian-width": "^1.3.1"
- },
- "engines": {
- "node": ">=18"
- },
- "funding": {
- "url": "https://github.com/sponsors/sindresorhus"
- }
- },
  "node_modules/@ampproject/remapping": {
  "version": "2.3.0",
  "resolved": "https://registry.npmjs.org/@ampproject/remapping/-/remapping-2.3.0.tgz",
@@ -10089,14 +10049,13 @@
  },
  "node_modules/ink": {
  "name": "@jrichman/ink",
- "version": "6.5.0",
- "resolved": "https://registry.npmjs.org/@jrichman/ink/-/ink-6.5.0.tgz",
- "integrity": "sha512-S4g/ng7fPZmFwclO82iWkOce8vDLy/FIDgHIfkCWGOehqHe6dexHsmq3kNQD21okh198pA5SAQTCqNQJb/svRQ==",
+ "version": "6.6.7",
+ "resolved": "https://registry.npmjs.org/@jrichman/ink/-/ink-6.6.7.tgz",
+ "integrity": "sha512-bDzQLpLzK/dn9Ur/Ku88ZZR9totVcMGrGYAgPHidsAAbe9NKztU1fggj/iu0wRp5g1kBeALb3cfagFGdDxAU1w==",
  "license": "MIT",
  "dependencies": {
- "@alcalzone/ansi-tokenize": "^0.2.1",
  "ansi-escapes": "^7.0.0",
- "ansi-styles": "^6.2.1",
+ "ansi-styles": "^6.2.3",
  "auto-bind": "^5.0.1",
  "chalk": "^5.6.0",
  "cli-boxes": "^3.0.0",
@@ -10105,6 +10064,7 @@
  "code-excerpt": "^4.0.0",
  "es-toolkit": "^1.39.10",
  "indent-string": "^5.0.0",
+ "is-fullwidth-code-point": "^5.0.0",
  "is-in-ci": "^2.0.0",
  "mnemonist": "^0.40.3",
  "patch-console": "^2.0.0",
@@ -10174,9 +10134,9 @@
  }
  },
  "node_modules/ink/node_modules/ansi-styles": {
- "version": "6.2.1",
- "resolved": "https://registry.npmjs.org/ansi-styles/-/ansi-styles-6.2.1.tgz",
- "integrity": "sha512-bN798gFfQX+viw3R7yrGWRqnrN2oRkEkUjjl4JNn4E8GxxbjtG3FbrEIIY3l8/hrwUwIeCZvi4QuOTP4MErVug==",
+ "version": "6.2.3",
+ "resolved": "https://registry.npmjs.org/ansi-styles/-/ansi-styles-6.2.3.tgz",
+ "integrity": "sha512-4Dj6M28JB+oAH8kFkTLUo+a2jwOFkuqb3yucU0CANcRRUbxS0cP0nZYCGjcc3BNXwRIsUVmDGgzawme7zvJHvg==",
  "license": "MIT",
  "engines": {
  "node": ">=12"
@@ -10197,6 +10157,21 @@
  "url": "https://github.com/chalk/chalk?sponsor=1"
  }
  },
+ "node_modules/ink/node_modules/is-fullwidth-code-point": {
+ "version": "5.1.0",
+ "resolved": "https://registry.npmjs.org/is-fullwidth-code-point/-/is-fullwidth-code-point-5.1.0.tgz",
+ "integrity": "sha512-5XHYaSyiqADb4RnZ1Bdad6cPp8Toise4TzEjcOYDHZkTCbKgiUl7WTUCpNWHuxmDt91wnsZBc9xinNzopv3JMQ==",
+ "license": "MIT",
+ "dependencies": {
+ "get-east-asian-width": "^1.3.1"
+ },
+ "engines": {
+ "node": ">=18"
+ },
+ "funding": {
+ "url": "https://github.com/sponsors/sindresorhus"
+ }
+ },
  "node_modules/ink/node_modules/is-in-ci": {
  "version": "2.0.0",
  "resolved": "https://registry.npmjs.org/is-in-ci/-/is-in-ci-2.0.0.tgz",
@@ -17551,7 +17526,7 @@
  "fzf": "^0.5.2",
  "glob": "^12.0.0",
  "highlight.js": "^11.11.1",
- "ink": "npm:@jrichman/ink@6.5.0",
+ "ink": "npm:@jrichman/ink@6.6.7",
  "ink-gradient": "^3.0.0",
  "ink-spinner": "^5.0.0",
  "latest-version": "^9.0.0",
diff --git a/package.json b/package.json
index 8bb5f25e20..e24f6a20b5 100644
--- a/package.json
+++ b/package.json
@@ -18,6 +18,7 @@
  },
  "scripts": {
  "start": "cross-env NODE_ENV=development node scripts/start.js",
+ "start:prod": "cross-env NODE_ENV=production node scripts/start.js",
  "start:a2a-server": "CODER_AGENT_PORT=41242 npm run start --workspace @google/gemini-cli-a2a-server",
  "debug": "cross-env DEBUG=1 node --inspect-brk scripts/start.js",
  "deflake": "node scripts/deflake.js",
@@ -38,7 +39,7 @@
  "build:packages": "npm run build --workspaces",
  "build:sandbox": "node scripts/build_sandbox.js",
  "build:binary": "node scripts/build_binary.js",
- "bundle": "npm run generate && npm run build --workspace=@google/gemini-cli-devtools && node esbuild.config.js && node scripts/copy_bundle_assets.js",
+ "bundle": "npm run generate && npm run build --workspace=@google/gemini-cli-devtools && npm run bundle:browser-mcp -w @google/gemini-cli-core && node esbuild.config.js && node scripts/copy_bundle_assets.js",
  "test": "npm run test --workspaces --if-present && npm run test:sea-launch",
  "test:ci": "npm run test:ci --workspaces --if-present && npm run test:scripts && npm run test:sea-launch",
  "test:scripts": "vitest run --config ./scripts/tests/vitest.config.ts",
@@ -68,7 +69,7 @@
  "pre-commit": "node scripts/pre-commit.js"
  },
  "overrides": {
- "ink": "npm:@jrichman/ink@6.5.0",
+ "ink": "npm:@jrichman/ink@6.6.7",
  "wrap-ansi": "9.0.2",
  "cliui": {
  "wrap-ansi": "7.0.0"
@@ -136,7 +137,7 @@
  "yargs": "^17.7.2"
  },
  "dependencies": {
- "ink": "npm:@jrichman/ink@6.5.0",
+ "ink": "npm:@jrichman/ink@6.6.7",
  "latest-version": "^9.0.0",
  "node-fetch-native": "^1.6.7",
  "proper-lockfile": "^4.1.2",
diff --git a/packages/a2a-server/src/commands/restore.ts b/packages/a2a-server/src/commands/restore.ts
index c7567a3b24..7a5205c66b 100644
--- a/packages/a2a-server/src/commands/restore.ts
+++ b/packages/a2a-server/src/commands/restore.ts
@@ -98,7 +98,7 @@ export class RestoreCommand implements Command {
  name: this.name,
  data: restoreResult,
  };
- } catch (_error) {
+ } catch {
  return {
  name: this.name,
  data: {
@@ -142,7 +142,7 @@ export class ListCheckpointsCommand implements Command {
  content: JSON.stringify(checkpointInfoList),
  },
  };
- } catch (_error) {
+ } catch {
  return {
  name: this.name,
  data: {
diff --git a/packages/a2a-server/src/http/server.ts b/packages/a2a-server/src/http/server.ts
index 1bfb29c081..c22be49331 100644
--- a/packages/a2a-server/src/http/server.ts
+++ b/packages/a2a-server/src/http/server.ts
@@ -1,4 +1,4 @@
-#!/usr/bin/env -S node --no-warnings=DEP0040
+#!/usr/bin/env node
 
 /**
  * @license
diff --git a/packages/a2a-server/src/utils/testing_utils.ts b/packages/a2a-server/src/utils/testing_utils.ts
index f7f1645f8c..4265805e09 100644
--- a/packages/a2a-server/src/utils/testing_utils.ts
+++ b/packages/a2a-server/src/utils/testing_utils.ts
@@ -109,12 +109,8 @@ export function createMockConfig(
  enableEnvironmentVariableRedaction: false,
  },
  }),
- isExperimentalAgentHistoryTruncationEnabled: vi.fn().mockReturnValue(false),
- getExperimentalAgentHistoryTruncationThreshold: vi.fn().mockReturnValue(50),
- getExperimentalAgentHistoryRetainedMessages: vi.fn().mockReturnValue(30),
- isExperimentalAgentHistorySummarizationEnabled: vi
- .fn()
- .mockReturnValue(false),
+ isContextManagementEnabled: vi.fn().mockReturnValue(false),
+ getContextManagementConfig: vi.fn().mockReturnValue({ enabled: false }),
  ...overrides,
  } as unknown as Config;
 
diff --git a/packages/cli/index.ts b/packages/cli/index.ts
index 5444fe1b74..d94a2dd191 100644
--- a/packages/cli/index.ts
+++ b/packages/cli/index.ts
@@ -1,4 +1,4 @@
-#!/usr/bin/env -S node --no-warnings=DEP0040
+#!/usr/bin/env node
 
 /**
  * @license
diff --git a/packages/cli/package.json b/packages/cli/package.json
index 072f2b8a72..52ae182dca 100644
--- a/packages/cli/package.json
+++ b/packages/cli/package.json
@@ -49,7 +49,7 @@
  "fzf": "^0.5.2",
  "glob": "^12.0.0",
  "highlight.js": "^11.11.1",
- "ink": "npm:@jrichman/ink@6.5.0",
+ "ink": "npm:@jrichman/ink@6.6.7",
  "ink-gradient": "^3.0.0",
  "ink-spinner": "^5.0.0",
  "latest-version": "^9.0.0",
diff --git a/packages/cli/src/__snapshots__/nonInteractiveCliAgentSession.test.ts.snap b/packages/cli/src/__snapshots__/nonInteractiveCliAgentSession.test.ts.snap
new file mode 100644
index 0000000000..92f396a59c
--- /dev/null
+++ b/packages/cli/src/__snapshots__/nonInteractiveCliAgentSession.test.ts.snap
@@ -0,0 +1,35 @@
+// Vitest Snapshot v1, https://vitest.dev/guide/snapshot.html
+
+exports[`runNonInteractive > should emit appropriate error event in streaming JSON mode: 'loop detected' 1`] = `
+"{"type":"init","timestamp":"