Merge branch 'main' into doc-skill-callouts

This commit is contained in:
Sam Roberts
2026-03-16 17:57:07 -07:00
committed by GitHub
394 changed files with 18226 additions and 5210 deletions

View File

@@ -15,6 +15,8 @@ requests sent from `packages/cli`. For a general overview of Gemini CLI, see the
modular GEMINI.md import feature using @file.md syntax.
- **[Policy Engine](../reference/policy-engine.md):** Use the Policy Engine for
fine-grained control over tool execution.
- **[Local Model Routing (experimental)](./local-model-routing.md):** Learn how
to enable use of a local Gemma model for model routing decisions.
## Role of the core

View File

@@ -0,0 +1,193 @@
# Local Model Routing (experimental)
Gemini CLI supports using a local model for
[routing decisions](../cli/model-routing.md). When configured, Gemini CLI will
use a locally-running **Gemma** model to make routing decisions (instead of
sending routing decisions to a hosted model).
This feature can help reduce costs associated with hosted model usage while
offering similar routing decision latency and quality.
> **Note: Local model routing is currently an experimental feature.**
## Setup
Using a Gemma model for routing decisions requires a Gemma model running
locally on your machine, served behind an HTTP endpoint that implements the
Gemini API.
To serve the Gemma model, follow these steps:
### Download the LiteRT-LM runtime
The [LiteRT-LM](https://github.com/google-ai-edge/LiteRT-LM) runtime offers
pre-built binaries for locally-serving models. Download the binary appropriate
for your system.
#### Windows
1. Download
[lit.windows_x86_64.exe](https://github.com/google-ai-edge/LiteRT-LM/releases/download/v0.9.0-alpha03/lit.windows_x86_64.exe).
2. Using GPU on Windows requires the DirectXShaderCompiler. Download the
[dxc zip from the latest release](https://github.com/microsoft/DirectXShaderCompiler/releases/download/v1.8.2505.1/dxc_2025_07_14.zip).
Unzip the archive, then from the architecture-appropriate `bin\` directory
copy `dxil.dll` and `dxcompiler.dll` into the same location where you saved
`lit.windows_x86_64.exe`.
3. (Optional) Test starting the runtime:
`.\lit.windows_x86_64.exe serve --verbose`
#### Linux
1. Download
[lit.linux_x86_64](https://github.com/google-ai-edge/LiteRT-LM/releases/download/v0.9.0-alpha03/lit.linux_x86_64).
2. Ensure the binary is executable: `chmod a+x lit.linux_x86_64`
3. (Optional) Test starting the runtime: `./lit.linux_x86_64 serve --verbose`
#### MacOS
1. Download
[lit.macos_arm64](https://github.com/google-ai-edge/LiteRT-LM/releases/download/v0.9.0-alpha03/lit.macos_arm64).
2. Ensure the binary is executable: `chmod a+x lit.macos_arm64`
3. (Optional) Test starting the runtime: `./lit.macos_arm64 serve --verbose`
> **Note**: MacOS can be configured to only allow binaries from "App Store &
> Known Developers". If you encounter an error message when attempting to run
> the binary, you will need to allow the application. One option is to visit
> `System Settings -> Privacy & Security`, scroll to `Security`, and click
> `"Allow Anyway"` for `"lit.macos_arm64"`. Another option is to run
> `xattr -d com.apple.quarantine lit.macos_arm64` from the command line.
### Download the Gemma Model
Before using Gemma, you will need to download the model (and agree to the Terms
of Service).
This can be done via the LiteRT-LM runtime.
#### Windows
```bash
$ .\lit.windows_x86_64.exe pull gemma3-1b-gpu-custom
[Legal] The model you are about to download is governed by
the Gemma Terms of Use and Prohibited Use Policy. Please review these terms and ensure you agree before continuing.
Full Terms: https://ai.google.dev/gemma/terms
Prohibited Use Policy: https://ai.google.dev/gemma/prohibited_use_policy
Do you accept these terms? (Y/N): Y
Terms accepted.
Downloading model 'gemma3-1b-gpu-custom' ...
Downloading... 968.6 MB
Download complete.
```
#### Linux
```bash
$ ./lit.linux_x86_64 pull gemma3-1b-gpu-custom
[Legal] The model you are about to download is governed by
the Gemma Terms of Use and Prohibited Use Policy. Please review these terms and ensure you agree before continuing.
Full Terms: https://ai.google.dev/gemma/terms
Prohibited Use Policy: https://ai.google.dev/gemma/prohibited_use_policy
Do you accept these terms? (Y/N): Y
Terms accepted.
Downloading model 'gemma3-1b-gpu-custom' ...
Downloading... 968.6 MB
Download complete.
```
#### MacOS
```bash
$ ./lit.macos_arm64 pull gemma3-1b-gpu-custom
[Legal] The model you are about to download is governed by
the Gemma Terms of Use and Prohibited Use Policy. Please review these terms and ensure you agree before continuing.
Full Terms: https://ai.google.dev/gemma/terms
Prohibited Use Policy: https://ai.google.dev/gemma/prohibited_use_policy
Do you accept these terms? (Y/N): Y
Terms accepted.
Downloading model 'gemma3-1b-gpu-custom' ...
Downloading... 968.6 MB
Download complete.
```
### Start LiteRT-LM Runtime
Using the command appropriate to your system, start the LiteRT-LM runtime.
Configure the port that you want to use for your Gemma model. For the purposes
of this document, we will use port `9379`.
Example command for MacOS: `./lit.macos_arm64 serve --port=9379 --verbose`
### (Optional) Verify Model Serving
Send a quick prompt to the model via HTTP to validate successful model serving.
This will cause the runtime to load the model and run inference once.
You should see a short joke in the server output as an indicator of success.
#### Windows
```
# Run this in PowerShell to send a request to the server
$uri = "http://localhost:9379/v1beta/models/gemma3-1b-gpu-custom:generateContent"
$body = @{contents = @( @{
role = "user"
parts = @( @{ text = "Tell me a joke." } )
})} | ConvertTo-Json -Depth 10
Invoke-RestMethod -Uri $uri -Method Post -Body $body -ContentType "application/json"
```
#### Linux/MacOS
```bash
$ curl "http://localhost:9379/v1beta/models/gemma3-1b-gpu-custom:generateContent" \
-H 'Content-Type: application/json' \
-X POST \
-d '{"contents":[{"role":"user","parts":[{"text":"Tell me a joke."}]}]}'
```
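If you prefer to script the check, the response follows the Gemini API
`generateContent` shape. Below is a minimal sketch of extracting the generated
text from a response body; the sample JSON is fabricated for illustration, and
a real response contains additional fields:

```shell
# Sample (abridged, hypothetical) generateContent response body:
RESPONSE='{"candidates":[{"content":{"role":"model","parts":[{"text":"Why did the model cross the road?"}]}}]}'

# Pull out the first candidate's text with python3:
echo "$RESPONSE" | python3 -c 'import json,sys; print(json.load(sys.stdin)["candidates"][0]["content"]["parts"][0]["text"])'
```

In practice you would pipe the output of the `curl` command above into the same
`python3` one-liner (or `jq`) instead of using a saved sample.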
## Configuration
To use a local Gemma model for routing, you must explicitly enable it in your
`settings.json`:
```json
{
"experimental": {
"gemmaModelRouter": {
"enabled": true,
"classifier": {
"host": "http://localhost:9379",
"model": "gemma3-1b-gpu-custom"
}
}
}
}
```
> Use the port you started your LiteRT-LM runtime on in the setup steps.
### Configuration schema
| Field | Type | Required | Description |
| :----------------- | :------ | :------- | :----------------------------------------------------------------------------------------- |
| `enabled` | boolean | Yes | Must be `true` to enable the feature. |
| `classifier` | object | Yes | The configuration for the local model endpoint. It includes the host and model specifiers. |
| `classifier.host` | string | Yes | The URL to the local model server. Should be `http://localhost:<port>`. |
| `classifier.model` | string | Yes | The model name to use for decisions. Must be `"gemma3-1b-gpu-custom"`. |
> **Note: You will need to restart after configuration changes for local model
> routing to take effect.**

View File

@@ -27,6 +27,20 @@ To use remote subagents, you must explicitly enable them in your
}
```
## Proxy support
Gemini CLI routes traffic to remote agents through an HTTP/HTTPS proxy if one is
configured. It uses the `general.proxy` setting in your `settings.json` file or
standard environment variables (`HTTP_PROXY`, `HTTPS_PROXY`).
```json
{
"general": {
"proxy": "http://my-proxy:8080"
}
}
```
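Alternatively, you can set the standard proxy environment variables in the
shell before launching Gemini CLI; the proxy URL below is a placeholder:

```shell
# Equivalent to the `general.proxy` setting above; traffic to remote
# agents is routed through this proxy.
export HTTP_PROXY="http://my-proxy:8080"
export HTTPS_PROXY="http://my-proxy:8080"
```

Launch `gemini` from the same shell session for the variables to take effect.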
## Defining remote subagents
Remote subagents are defined as Markdown files (`.md`) with YAML frontmatter.
@@ -42,6 +56,7 @@ You can place them in:
| `kind` | string | Yes | Must be `remote`. |
| `name` | string | Yes | A unique name for the agent. Must be a valid slug (lowercase letters, numbers, hyphens, and underscores only). |
| `agent_card_url` | string | Yes | The URL to the agent's A2A card endpoint. |
| `auth` | object | No | Authentication configuration. See [Authentication](#authentication). |
### Single-subagent example
@@ -73,6 +88,273 @@ Markdown file.
> [!NOTE] Mixed local and remote agents, or multiple local agents, are not
> supported in a single file; the list format is currently remote-only.
## Authentication
Many remote agents require authentication. Gemini CLI supports several
authentication methods aligned with the
[A2A security specification](https://a2a-protocol.org/latest/specification/#451-securityscheme).
Add an `auth` block to your agent's frontmatter to configure credentials.
### Supported auth types
Gemini CLI supports the following authentication types:
| Type | Description |
| :------------------- | :--------------------------------------------------------------------------------------------- |
| `apiKey` | Send a static API key as an HTTP header. |
| `http` | HTTP authentication (Bearer token, Basic credentials, or any IANA-registered scheme). |
| `google-credentials` | Google Application Default Credentials (ADC). Automatically selects access or identity tokens. |
| `oauth2` | OAuth 2.0 Authorization Code flow with PKCE. Opens a browser for interactive sign-in. |
### Dynamic values
For `apiKey` and `http` auth types, secret values (`key`, `token`, `username`,
`password`, `value`) support dynamic resolution:
| Format | Description | Example |
| :---------- | :-------------------------------------------------- | :------------------------- |
| `$ENV_VAR` | Read from an environment variable. | `$MY_API_KEY` |
| `!command` | Execute a shell command and use the trimmed output. | `!gcloud auth print-token` |
| literal | Use the string as-is. | `sk-abc123` |
| `$$` / `!!` | Escape prefix. `$$FOO` becomes the literal `$FOO`. | `$$NOT_AN_ENV_VAR` |
> **Security tip:** Prefer `$ENV_VAR` or `!command` over embedding secrets
> directly in agent files, especially for project-level agents checked into
> version control.
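For example, each resolution format can appear anywhere a dynamic value is
accepted; in this sketch, `MY_API_KEY` and `my-fetch-key-cmd` are placeholder
names:

```yaml
auth:
  type: apiKey
  key: $MY_API_KEY            # read from the MY_API_KEY environment variable
  # key: "!my-fetch-key-cmd"  # run the command; use its trimmed stdout
  # key: sk-abc123            # use the literal string as-is
  # key: $$MY_API_KEY         # escaped: the literal string "$MY_API_KEY"
```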
### API key (`apiKey`)
Sends an API key as an HTTP header on every request.
| Field | Type | Required | Description |
| :----- | :----- | :------- | :---------------------------------------------------- |
| `type` | string | Yes | Must be `apiKey`. |
| `key` | string | Yes | The API key value. Supports dynamic values. |
| `name` | string | No | Header name to send the key in. Default: `X-API-Key`. |
```yaml
---
kind: remote
name: my-agent
agent_card_url: https://example.com/agent-card
auth:
type: apiKey
key: $MY_API_KEY
---
```
### HTTP authentication (`http`)
Supports Bearer tokens, Basic auth, and arbitrary IANA-registered HTTP
authentication schemes.
#### Bearer token
Use the following fields to configure a Bearer token:
| Field | Type | Required | Description |
| :------- | :----- | :------- | :----------------------------------------- |
| `type` | string | Yes | Must be `http`. |
| `scheme` | string | Yes | Must be `Bearer`. |
| `token` | string | Yes | The bearer token. Supports dynamic values. |
```yaml
auth:
type: http
scheme: Bearer
token: $MY_BEARER_TOKEN
```
#### Basic authentication
Use the following fields to configure Basic authentication:
| Field | Type | Required | Description |
| :--------- | :----- | :------- | :------------------------------------- |
| `type` | string | Yes | Must be `http`. |
| `scheme` | string | Yes | Must be `Basic`. |
| `username` | string | Yes | The username. Supports dynamic values. |
| `password` | string | Yes | The password. Supports dynamic values. |
```yaml
auth:
type: http
scheme: Basic
username: $MY_USERNAME
password: $MY_PASSWORD
```
#### Raw scheme
For any other IANA-registered scheme (for example, Digest, HOBA), provide the
raw authorization value.
| Field | Type | Required | Description |
| :------- | :----- | :------- | :---------------------------------------------------------------------------- |
| `type` | string | Yes | Must be `http`. |
| `scheme` | string | Yes | The scheme name (for example, `Digest`). |
| `value` | string | Yes | Raw value sent as `Authorization: <scheme> <value>`. Supports dynamic values. |
```yaml
auth:
type: http
scheme: Digest
value: $MY_DIGEST_VALUE
```
### Google Application Default Credentials (`google-credentials`)
Uses
[Google Application Default Credentials (ADC)](https://cloud.google.com/docs/authentication/application-default-credentials)
to authenticate with Google Cloud services and Cloud Run endpoints. This is the
recommended auth method for agents hosted on Google Cloud infrastructure.
| Field | Type | Required | Description |
| :------- | :------- | :------- | :-------------------------------------------------------------------------- |
| `type` | string | Yes | Must be `google-credentials`. |
| `scopes` | string[] | No | OAuth scopes. Defaults to `https://www.googleapis.com/auth/cloud-platform`. |
```yaml
---
kind: remote
name: my-gcp-agent
agent_card_url: https://my-agent-xyz.run.app/.well-known/agent.json
auth:
type: google-credentials
---
```
#### How token selection works
The provider automatically selects the correct token type based on the agent's
host:
| Host pattern | Token type | Use case |
| :----------------- | :----------------- | :------------------------------------------ |
| `*.googleapis.com` | **Access token** | Google APIs (Agent Engine, Vertex AI, etc.) |
| `*.run.app` | **Identity token** | Cloud Run services |
- **Access tokens** authorize API calls to Google services. They are scoped
(default: `cloud-platform`) and fetched via `GoogleAuth.getClient()`.
- **Identity tokens** prove the caller's identity to a service that validates
the token's audience. The audience is set to the target host. These are
fetched via `GoogleAuth.getIdTokenClient()`.
Both token types are cached and automatically refreshed before expiry.
#### Setup
`google-credentials` relies on ADC, which means your environment must have
credentials configured. Common setups:
- **Local development:** Run `gcloud auth application-default login` to
authenticate with your Google account.
- **CI / Cloud environments:** Use a service account. Set the
`GOOGLE_APPLICATION_CREDENTIALS` environment variable to the path of your
service account key file, or use workload identity on GKE / Cloud Run.
#### Allowed hosts
For security, `google-credentials` only sends tokens to known Google-owned
hosts:
- `*.googleapis.com`
- `*.run.app`
Requests to any other host will be rejected with an error. If your agent is
hosted on a different domain, use one of the other auth types (`apiKey`, `http`,
or `oauth2`).
#### Examples
The following examples demonstrate how to configure Google Application Default
Credentials.
**Cloud Run agent:**
```yaml
---
kind: remote
name: cloud-run-agent
agent_card_url: https://my-agent-xyz.run.app/.well-known/agent.json
auth:
type: google-credentials
---
```
**Google API with custom scopes:**
```yaml
---
kind: remote
name: vertex-agent
agent_card_url: https://us-central1-aiplatform.googleapis.com/.well-known/agent.json
auth:
type: google-credentials
scopes:
- https://www.googleapis.com/auth/cloud-platform
- https://www.googleapis.com/auth/compute
---
```
### OAuth 2.0 (`oauth2`)
Performs an interactive OAuth 2.0 Authorization Code flow with PKCE. On first
use, Gemini CLI opens your browser for sign-in and persists the resulting tokens
for subsequent requests.
| Field | Type | Required | Description |
| :------------------ | :------- | :------- | :------------------------------------------------------------------------------------------------------------------------------------------------- |
| `type` | string | Yes | Must be `oauth2`. |
| `client_id` | string | Yes\* | OAuth client ID. Required for interactive auth. |
| `client_secret` | string | No\* | OAuth client secret. Required by most authorization servers (confidential clients). Can be omitted for public clients that don't require a secret. |
| `scopes` | string[] | No | Requested scopes. Can also be discovered from the agent card. |
| `authorization_url` | string | No | Authorization endpoint. Discovered from the agent card if omitted. |
| `token_url` | string | No | Token endpoint. Discovered from the agent card if omitted. |
```yaml
---
kind: remote
name: oauth-agent
agent_card_url: https://example.com/.well-known/agent.json
auth:
type: oauth2
client_id: my-client-id.apps.example.com
---
```
If the agent card advertises an `oauth2` security scheme with
`authorizationCode` flow, the `authorization_url`, `token_url`, and `scopes` are
automatically discovered. You only need to provide `client_id` (and
`client_secret` if required).
Tokens are persisted to disk and refreshed automatically when they expire.
### Auth validation
When Gemini CLI loads a remote agent, it validates your auth configuration
against the agent card's declared `securitySchemes`. If the agent requires
authentication that you haven't configured, you'll see an error describing
what's needed.
`google-credentials` is treated as compatible with `http` Bearer security
schemes, since it produces Bearer tokens.
### Auth retry behavior
All auth providers automatically retry on `401` and `403` responses by
re-fetching credentials (up to 2 retries). This handles cases like expired
tokens or rotated credentials. For `apiKey` with `!command` values, the command
is re-executed on retry to fetch a fresh key.
### Agent card fetching and auth
When connecting to a remote agent, Gemini CLI first fetches the agent card
**without** authentication. If the card endpoint returns a `401` or `403`, it
retries the fetch **with** the configured auth headers. This lets agents have
publicly accessible cards while protecting their task endpoints, or to protect
both behind auth.
## Managing Subagents
Users can manage subagents using the following commands within the Gemini CLI:

View File

@@ -8,9 +8,9 @@ the main agent's context or toolset.
<!-- prettier-ignore -->
> [!NOTE]
> Subagents are currently an experimental feature.
>
To use custom subagents, you must ensure they are enabled in your
`settings.json` (enabled by default):
```json
{
@@ -18,13 +18,6 @@ To use custom subagents, you must explicitly enable them in your
}
```
<!-- prettier-ignore -->
> [!WARNING]
> Subagents currently operate in
> ["YOLO mode"](../reference/configuration.md#command-line-arguments), meaning
> they may execute tools without individual user confirmation for each step.
> Proceed with caution when defining agents with powerful tools like
> `run_shell_command` or `write_file`.
## What are subagents?
@@ -42,6 +35,34 @@ main agent calls the tool, it delegates the task to the subagent. Once the
subagent completes its task, it reports back to the main agent with its
findings.
## How to use subagents
You can use subagents through automatic delegation or by explicitly forcing them
in your prompt.
### Automatic delegation
Gemini CLI's main agent is instructed to use specialized subagents when a task
matches their expertise. For example, if you ask "How does the auth system
work?", the main agent may decide to call the `codebase_investigator` subagent
to perform the research.
### Forcing a subagent (@ syntax)
You can explicitly direct a task to a specific subagent by using the `@` symbol
followed by the subagent's name at the beginning of your prompt. This is useful
when you want to bypass the main agent's decision-making and go straight to a
specialist.
**Example:**
```bash
@codebase_investigator Map out the relationship between the AgentRegistry and the LocalAgentExecutor.
```
When you use the `@` syntax, the CLI injects a system note that nudges the
primary model to use that specific subagent tool immediately.
## Built-in subagents
Gemini CLI comes with the following built-in subagents:
@@ -53,15 +74,17 @@ Gemini CLI comes with the following built-in subagents:
dependencies.
- **When to use:** "How does the authentication system work?", "Map out the
dependencies of the `AgentRegistry` class."
- **Configuration:** Enabled by default. You can override its settings in
`settings.json` under `agents.overrides`. Example (forcing a specific model
and increasing turns):
```json
{
"agents": {
"overrides": {
"codebase_investigator": {
"modelConfig": { "model": "gemini-3-flash-preview" },
"runConfig": { "maxTurns": 50 }
}
}
}
}
@@ -241,7 +264,7 @@ kind: local
tools:
- read_file
- grep_search
model: gemini-3-flash-preview
temperature: 0.2
max_turns: 10
---
@@ -262,16 +285,102 @@ it yourself; just report it.
### Configuration schema
| Field | Type | Required | Description |
| :------------- | :----- | :------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `name` | string | Yes | Unique identifier (slug) used as the tool name for the agent. Only lowercase letters, numbers, hyphens, and underscores. |
| `description` | string | Yes | Short description of what the agent does. This is visible to the main agent to help it decide when to call this subagent. |
| `kind` | string | No | `local` (default) or `remote`. |
| `tools` | array | No | List of tool names this agent can use. Supports wildcards: `*` (all tools), `mcp_*` (all MCP tools), `mcp_server_*` (all tools from a server). **If omitted, it inherits all tools from the parent session.** |
| `model` | string | No | Specific model to use (e.g., `gemini-3-preview`). Defaults to `inherit` (uses the main session model). |
| `temperature` | number | No | Model temperature (0.0 - 2.0). Defaults to `1`. |
| `max_turns` | number | No | Maximum number of conversation turns allowed for this agent before it must return. Defaults to `30`. |
| `timeout_mins` | number | No | Maximum execution time in minutes. Defaults to `10`. |
### Tool wildcards
When defining `tools` for a subagent, you can use wildcards to quickly grant
access to groups of tools:
- `*`: Grant access to all available built-in and discovered tools.
- `mcp_*`: Grant access to all tools from all connected MCP servers.
- `mcp_my-server_*`: Grant access to all tools from a specific MCP server named
`my-server`.
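For instance, a subagent that combines a single built-in tool with every tool
from one MCP server might be declared as follows; `docs-helper` and
`my-server` are placeholder names:

```yaml
---
name: docs-helper
description: Answers documentation questions using read-only tools.
kind: local
tools:
  - read_file        # a single built-in tool
  - mcp_my-server_*  # every tool exposed by the MCP server named `my-server`
---
```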
### Isolation and recursion protection
Each subagent runs in its own isolated context loop. This means:
- **Independent history:** The subagent's conversation history does not bloat
the main agent's context.
- **Isolated tools:** The subagent only has access to the tools you explicitly
grant it.
- **Recursion protection:** To prevent infinite loops and excessive token usage,
subagents **cannot** call other subagents. If a subagent is granted the `*`
tool wildcard, it will still be unable to see or invoke other agents.
## Managing subagents
You can manage subagents interactively using the `/agents` command or
persistently via `settings.json`.
### Interactive management (/agents)
If you are in an interactive CLI session, you can use the `/agents` command to
manage subagents without editing configuration files manually. This is the
recommended way to quickly enable, disable, or re-configure agents on the fly.
For a full list of sub-commands and usage, see the
[`/agents` command reference](../reference/commands.md#agents).
### Persistent configuration (settings.json)
While the `/agents` command and agent definition files provide a starting point,
you can use `settings.json` for global, persistent overrides. This is useful for
enforcing specific models or execution limits across all sessions.
#### `agents.overrides`
Use this to enable or disable specific agents or override their run
configurations.
```json
{
"agents": {
"overrides": {
"security-auditor": {
"enabled": false,
"runConfig": {
"maxTurns": 20,
"maxTimeMinutes": 10
}
}
}
}
}
```
#### `modelConfigs.overrides`
You can target specific subagents with custom model settings (like system
instruction prefixes or specific safety settings) using the `overrideScope`
field.
```json
{
"modelConfigs": {
"overrides": [
{
"match": { "overrideScope": "security-auditor" },
"modelConfig": {
"generateContentConfig": {
"temperature": 0.1
}
}
}
]
}
}
```
### Optimizing your subagent
@@ -308,7 +417,7 @@ Gemini CLI can also delegate tasks to remote subagents using the Agent-to-Agent
> Remote subagents are currently an experimental feature.
See the [Remote Subagents documentation](remote-agents) for detailed
configuration, authentication, and usage instructions.
## Extension subagents