mirror of
https://github.com/google-gemini/gemini-cli.git
synced 2026-05-12 12:54:07 -07:00
cleanup(markdown): Prettier format all markdown @ 80 char width (#10714)
@@ -4,31 +4,56 @@

 ### 1.1 Overview

-To standardize client integrations with the Gemini CLI agent, this document proposes the `development-tool` extension for the A2A protocol.
+To standardize client integrations with the Gemini CLI agent, this document
+proposes the `development-tool` extension for the A2A protocol.

-Rather than creating a new protocol, this specification builds upon the existing A2A protocol. As an open-source standard recently adopted by the Linux Foundation, A2A provides a robust foundation for core concepts like tasks, messages, and streaming events. This extension-based approach allows us to leverage A2A's proven architecture while defining the specific capabilities required for rich, interactive workflows with the Gemini CLI agent.
+Rather than creating a new protocol, this specification builds upon the existing
+A2A protocol. As an open-source standard recently adopted by the Linux
+Foundation, A2A provides a robust foundation for core concepts like tasks,
+messages, and streaming events. This extension-based approach allows us to
+leverage A2A's proven architecture while defining the specific capabilities
+required for rich, interactive workflows with the Gemini CLI agent.

 ### 1.2 Motivation

-Recent work integrating Gemini CLI with clients like Zed and Gemini Code Assist’s agent mode has highlighted the need for a robust, standard communication protocol. Standardizing on A2A provides several key advantages:
+Recent work integrating Gemini CLI with clients like Zed and Gemini Code
+Assist’s agent mode has highlighted the need for a robust, standard
+communication protocol. Standardizing on A2A provides several key advantages:

-- **Solid Foundation**: Provides a robust, open standard that ensures a stable, predictable, and consistent integration experience across different IDEs and client surfaces.
-- **Extensibility**: Creates a flexible foundation to support new tools and workflows as they emerge.
-- **Ecosystem Alignment**: Aligns Gemini CLI with a growing industry standard, fostering broader interoperability.
+- **Solid Foundation**: Provides a robust, open standard that ensures a stable,
+predictable, and consistent integration experience across different IDEs and
+client surfaces.
+- **Extensibility**: Creates a flexible foundation to support new tools and
+workflows as they emerge.
+- **Ecosystem Alignment**: Aligns Gemini CLI with a growing industry standard,
+fostering broader interoperability.

 ## 2. Communication Flow

-The interaction follows A2A’s task-based, streaming pattern. The client sends a `message/stream` request and the agent responds with a `contextId` / `taskId` and a stream of events. `TaskStatusUpdateEvent` events are used to convey the overall state of the task. The task is complete when the agent sends a final `TaskStatusUpdateEvent` with `final: true` and a terminal status like `completed` or `failed`.
+The interaction follows A2A’s task-based, streaming pattern. The client sends a
+`message/stream` request and the agent responds with a `contextId` / `taskId`
+and a stream of events. `TaskStatusUpdateEvent` events are used to convey the
+overall state of the task. The task is complete when the agent sends a final
+`TaskStatusUpdateEvent` with `final: true` and a terminal status like
+`completed` or `failed`.
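
The completion rule in the paragraph above could be checked on the client roughly as follows. This is a minimal sketch, not part of the commit: the event shape is trimmed to the fields the text names, and `isTaskFinished` and `TERMINAL_STATES` are hypothetical names introduced here for illustration.

```typescript
// Trimmed-down, hypothetical event shape: only the fields named in the
// paragraph above are modelled; real events carry more.
type TaskState =
  | "submitted"
  | "working"
  | "input-required"
  | "completed"
  | "failed";

interface TaskStatusUpdateEvent {
  taskId: string;
  contextId: string;
  final: boolean;
  status: { state: TaskState };
}

// Terminal states named in the text; the protocol may define others.
const TERMINAL_STATES: readonly TaskState[] = ["completed", "failed"];

/** True when the event closes the task itself, not only the current stream. */
function isTaskFinished(event: TaskStatusUpdateEvent): boolean {
  return event.final && TERMINAL_STATES.indexOf(event.status.state) !== -1;
}

// Usage: a final event with a terminal state ends the task...
const finishedEvent: TaskStatusUpdateEvent = {
  taskId: "task-1",
  contextId: "ctx-1",
  final: true,
  status: { state: "completed" },
};
// ...while a non-final "working" update does not.
const workingEvent: TaskStatusUpdateEvent = {
  ...finishedEvent,
  final: false,
  status: { state: "working" },
};
const finishedChecks = [
  isTaskFinished(finishedEvent),
  isTaskFinished(workingEvent),
];
```

Checking both `final` and the state matters because, as the appendix shows, a stream can also end with `final: true` while the task is only waiting for input.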

 ### 2.1 Asynchronous Responses and Notifications

-Clients that may disconnect from the agent should supply a `PushNotificationConfig` to the agent with the initial `message/stream` method or subsequently with the `tasks/pushNotificationConfig/set` method so that the agent can call back when updates are ready.
+Clients that may disconnect from the agent should supply a
+`PushNotificationConfig` to the agent with the initial `message/stream` method
+or subsequently with the `tasks/pushNotificationConfig/set` method so that the
+agent can call back when updates are ready.

 ## 3. The `development-tool` extension

 ### 3.1 Overview

-The `development-tool` extension establishes a communication contract for workflows between a client and the Gemini CLI agent. It consists of a specialized set of schemas, embedded within core A2A data structures, that enable the agent to stream real-time updates on its state and thought process. These schemas also provide the mechanism for the agent to request user permission before executing tools.
+The `development-tool` extension establishes a communication contract for
+workflows between a client and the Gemini CLI agent. It consists of a
+specialized set of schemas, embedded within core A2A data structures, that
+enable the agent to stream real-time updates on its state and thought process.
+These schemas also provide the mechanism for the agent to request user
+permission before executing tools.

 **Sample Agent Card**

@@ -51,15 +76,24 @@ The `development-tool` extension establishes a communication contract for workfl

 **Versioning**

-The agent card `uri` field contains an embedded semantic version. The client must extract this version to determine compatibility with the agent extension using the compatibility logic defined in the Semantic Versioning 2.0.0 spec.
+The agent card `uri` field contains an embedded semantic version. The client
+must extract this version to determine compatibility with the agent extension
+using the compatibility logic defined in the Semantic Versioning 2.0.0 spec.
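
A client-side version check along these lines might look like the sketch below. It is not part of the commit, and it rests on assumptions: the hunk does not show how the version is embedded in the `uri`, so the `/vMAJOR.MINOR.PATCH` path segment, the example URI, and the helper names `extractVersion` and `isCompatible` are all hypothetical. The compatibility rule is one common reading of Semantic Versioning 2.0.0 (same major version, agent minor at least the client's, and no stability assumed for `0.y.z`).

```typescript
interface SemVer {
  major: number;
  minor: number;
  patch: number;
}

/**
 * Extracts MAJOR.MINOR.PATCH from a URI, assuming (hypothetically) that the
 * version appears as a path segment such as ".../v1.2.3". Returns null if no
 * such segment is present.
 */
function extractVersion(uri: string): SemVer | null {
  const match = /\/v(\d+)\.(\d+)\.(\d+)(?:\/|$)/.exec(uri);
  if (match === null) return null;
  return {
    major: Number(match[1]),
    minor: Number(match[2]),
    patch: Number(match[3]),
  };
}

/**
 * One reading of SemVer 2.0.0 compatibility: the major versions must match and
 * the agent must offer at least the minor version the client was built
 * against. For major version 0 nothing is considered stable, so the minor
 * versions must match exactly.
 */
function isCompatible(client: SemVer, agent: SemVer): boolean {
  if (client.major !== agent.major) return false;
  if (client.major === 0) return client.minor === agent.minor;
  return agent.minor >= client.minor;
}

// Usage with a made-up extension URI.
const agentVersion = extractVersion(
  "https://example.com/extensions/development-tool/v1.3.1",
);
const clientVersion: SemVer = { major: 1, minor: 2, patch: 0 };
const versionOk =
  agentVersion !== null && isCompatible(clientVersion, agentVersion);
```

Patch versions are deliberately ignored in the comparison, since under SemVer they only carry backward-compatible fixes.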

 ### 3.2 Schema Definitions

-This section defines the schemas for the `development-tool` A2A extension, organized by their function within the communication flow. Note that all custom objects included in the `metadata` field (e.g. `Message.metadata`) must be keyed by the unique URI that points to that extension’s spec to prevent naming collisions with other extensions.
+This section defines the schemas for the `development-tool` A2A extension,
+organized by their function within the communication flow. Note that all custom
+objects included in the `metadata` field (e.g. `Message.metadata`) must be keyed
+by the unique URI that points to that extension’s spec to prevent naming
+collisions with other extensions.
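
The keying rule above can be illustrated with a small sketch. It is not part of the commit: the extension URI, the second extension, the `workspace_path` field, and the helpers `withExtensionData` and `readExtensionData` are placeholders invented for this example.

```typescript
// Hypothetical extension URI; the real value comes from the agent card.
const DEVELOPMENT_TOOL_URI =
  "https://example.com/a2a/extensions/development-tool/v1.0.0";

type Metadata = Record<string, unknown>;

/** Returns a copy of `metadata` with `payload` stored under the extension URI. */
function withExtensionData(
  metadata: Metadata,
  uri: string,
  payload: unknown,
): Metadata {
  return { ...metadata, [uri]: payload };
}

/** Reads one extension's payload without touching other extensions' keys. */
function readExtensionData(
  metadata: Metadata | undefined,
  uri: string,
): unknown {
  return metadata?.[uri];
}

// Usage: two extensions share one metadata object without colliding.
const messageMetadata = withExtensionData(
  { "https://example.com/a2a/extensions/other/v2.0.0": { theme: "dark" } },
  DEVELOPMENT_TOOL_URI,
  { workspace_path: "/home/user/project" }, // hypothetical field
);
const ownPayload = readExtensionData(messageMetadata, DEVELOPMENT_TOOL_URI);
```

Because each payload lives under its own URI, two extensions can both define a field with the same name without overwriting each other.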

 **Initialization & Configuration**

-The first message in a session must contain an `AgentSettings` object in its metadata. This object provides the agent with the necessary configuration information for proper initialization. Additional configuration settings (e.g., MCP servers and allowed tools) can be added to this message.
+The first message in a session must contain an `AgentSettings` object in its
+metadata. This object provides the agent with the necessary configuration
+information for proper initialization. Additional configuration settings (e.g.,
+MCP servers and allowed tools) can be added to this message.

 **Schema**

@@ -75,25 +109,42 @@ message AgentSettings {

 **Agent-to-Client Messages**

-All real-time updates from the agent (including its thoughts, tool calls, and simple text replies) are streamed to the client as `TaskStatusUpdateEvents`.
+All real-time updates from the agent (including its thoughts, tool calls, and
+simple text replies) are streamed to the client as `TaskStatusUpdateEvents`.

-Each Event contains a `Message` object, which holds the content in one of two formats:
+Each Event contains a `Message` object, which holds the content in one of two
+formats:

-- **TextPart**: Used for standard text messages. This part requires no custom schema.
-- **DataPart**: Used for complex, structured objects. Tool Calls and Thoughts are sent this way, each using their respective schemas defined below.
+- **TextPart**: Used for standard text messages. This part requires no custom
+schema.
+- **DataPart**: Used for complex, structured objects. Tool Calls and Thoughts
+are sent this way, each using their respective schemas defined below.
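
A client could separate the two part formats roughly as sketched below. This is not part of the commit. It assumes that parts are discriminated by a `kind` field with the values `"text"` and `"data"`, which is an assumption about the A2A JSON representation and not something this hunk states; `splitParts` is a hypothetical helper.

```typescript
// Assumed wire shapes: a `kind` discriminator on every part.
interface TextPart {
  kind: "text";
  text: string;
}

interface DataPart {
  kind: "data";
  data: Record<string, unknown>;
}

type Part = TextPart | DataPart;

interface SplitResult {
  text: string;
  payloads: Record<string, unknown>[];
}

/** Collects plain text and structured payloads from a message's parts. */
function splitParts(parts: readonly Part[]): SplitResult {
  const texts: string[] = [];
  const payloads: Record<string, unknown>[] = [];
  for (const part of parts) {
    if (part.kind === "text") {
      texts.push(part.text);
    } else {
      // Structured objects (tool calls, thoughts) travel as data parts.
      payloads.push(part.data);
    }
  }
  return { text: texts.join(""), payloads };
}

// Usage with one text reply and one structured payload.
const splitExample = splitParts([
  { kind: "text", text: "Writing the file now." },
  { kind: "data", data: { status: "PENDING" } },
]);
```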

 **Tool Calls**

-The `ToolCall` schema is designed to provide a structured representation of a tool’s execution lifecycle. This protocol defines a clear state machine and provides detailed schemas for common development tasks (file edits, shell commands, MCP Tool), ensuring clients can build reliable UIs without being tied to a specific agent implementation.
+The `ToolCall` schema is designed to provide a structured representation of a
+tool’s execution lifecycle. This protocol defines a clear state machine and
+provides detailed schemas for common development tasks (file edits, shell
+commands, MCP Tool), ensuring clients can build reliable UIs without being tied
+to a specific agent implementation.

-The core principle is that the agent sends a `ToolCall` object on every update. This makes client-side logic stateless and simple.
+The core principle is that the agent sends a `ToolCall` object on every update.
+This makes client-side logic stateless and simple.

 **Tool Call Lifecycle**

-1. **Creation**: The agent sends a `ToolCall` object with `status: PENDING`. If user permission is required, the `confirmation_request` field will be populated.
-2. **Confirmation**: If the client needs to confirm the message, the client will send a `ToolCallConfirmation`. If the client responds with a cancellation, execution will be skipped.
-3. **Execution**: Once approved (or if no approval is required), the agent sends an update with `status: EXECUTING`. It can stream real-time progress by updating the `live_content` field.
-4. **Completion**: The agent sends a final update with the status set to `SUCCEEDED`, `FAILED`, or `CANCELLED` and populates the appropriate result field.
+1. **Creation**: The agent sends a `ToolCall` object with `status: PENDING`. If
+user permission is required, the `confirmation_request` field will be
+populated.
+2. **Confirmation**: If the client needs to confirm the message, the client
+will send a `ToolCallConfirmation`. If the client responds with a
+cancellation, execution will be skipped.
+3. **Execution**: Once approved (or if no approval is required), the agent
+sends an update with `status: EXECUTING`. It can stream real-time progress
+by updating the `live_content` field.
+4. **Completion**: The agent sends a final update with the status set to
+`SUCCEEDED`, `FAILED`, or `CANCELLED` and populates the appropriate result
+field.
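
The four lifecycle steps can be read as a small state machine, sketched below. This is not part of the commit. The status names come from the list above, but the transition table is one interpretation of the prose (for example, it assumes a cancelled confirmation moves a call directly from `PENDING` to `CANCELLED`), and `canTransition` and `isFinalStatus` are hypothetical helpers.

```typescript
type ToolCallStatus =
  | "PENDING"
  | "EXECUTING"
  | "SUCCEEDED"
  | "FAILED"
  | "CANCELLED";

// Allowed next statuses, read off the lifecycle steps (an interpretation).
const NEXT_STATUSES: Record<ToolCallStatus, readonly ToolCallStatus[]> = {
  PENDING: ["EXECUTING", "CANCELLED"],
  EXECUTING: ["SUCCEEDED", "FAILED", "CANCELLED"],
  SUCCEEDED: [],
  FAILED: [],
  CANCELLED: [],
};

/** True if an update may move a tool call from `from` to `to`. */
function canTransition(from: ToolCallStatus, to: ToolCallStatus): boolean {
  return NEXT_STATUSES[from].indexOf(to) !== -1;
}

/** True once a tool call has reached a status with no further updates. */
function isFinalStatus(status: ToolCallStatus): boolean {
  return NEXT_STATUSES[status].length === 0;
}

// Usage: an approved call starts executing; a finished call cannot restart.
const lifecycleChecks = {
  approve: canTransition("PENDING", "EXECUTING"),
  restart: canTransition("SUCCEEDED", "EXECUTING"),
  doneAfterCancel: isFinalStatus("CANCELLED"),
};
```

Since the agent resends the full `ToolCall` on every update, a client does not need this table to render its UI; it is mainly useful as a sanity check in tests or logs.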

 **Schema**

@@ -244,7 +295,8 @@ message AgentThought {

 **Event Metadata**

-The `metadata` object in `TaskStatusUpdateEvent` is used by the A2A client to deserialize the `TaskStatusUpdateEvents` into their appropriate objects.
+The `metadata` object in `TaskStatusUpdateEvent` is used by the A2A client to
+deserialize the `TaskStatusUpdateEvents` into their appropriate objects.

 **Schema**

@@ -280,9 +332,13 @@ message DevelopmentToolEvent {

 **Client-to-Agent Messages**

-When the agent sends a `TaskStatusUpdateEvent` with `status.state` set to `input-required` and its message contains a `ConfirmationRequest`, the client must respond by sending a new `message/stream` request.
+When the agent sends a `TaskStatusUpdateEvent` with `status.state` set to
+`input-required` and its message contains a `ConfirmationRequest`, the client
+must respond by sending a new `message/stream` request.

-This new request must include the `contextId` and the `taskId` from the ongoing task and contain a `ToolCallConfirmation` object. This object conveys the user's decision regarding the tool call that was awaiting approval.
+This new request must include the `contextId` and the `taskId` from the ongoing
+task and contain a `ToolCallConfirmation` object. This object conveys the user's
+decision regarding the tool call that was awaiting approval.
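
A confirmation reply could be assembled as in the sketch below. This is not part of the commit and several details are assumptions: the JSON-RPC 2.0 envelope and the `params.message` layout follow one reading of A2A's JSON-RPC binding, only the two `ToolCallConfirmation` fields that appear in the appendix example are modelled, and `buildConfirmationRequest` and all identifiers passed to it are hypothetical.

```typescript
// Only the fields shown in the appendix example are modelled here.
interface ToolCallConfirmation {
  tool_call_id: string;
  selected_option_id: string;
}

// Assumed request envelope for `message/stream`.
interface StreamRequest {
  jsonrpc: "2.0";
  id: string;
  method: "message/stream";
  params: {
    message: {
      kind: "message";
      role: "user";
      messageId: string;
      contextId: string;
      taskId: string;
      parts: { kind: "data"; data: ToolCallConfirmation }[];
    };
  };
}

interface RequestIds {
  requestId: string;
  messageId: string;
  contextId: string; // copied from the ongoing task
  taskId: string; // copied from the ongoing task
}

/** Builds the follow-up request that carries the user's decision. */
function buildConfirmationRequest(
  ids: RequestIds,
  confirmation: ToolCallConfirmation,
): StreamRequest {
  return {
    jsonrpc: "2.0",
    id: ids.requestId,
    method: "message/stream",
    params: {
      message: {
        kind: "message",
        role: "user",
        messageId: ids.messageId,
        contextId: ids.contextId,
        taskId: ids.taskId,
        parts: [{ kind: "data", data: confirmation }],
      },
    },
  };
}

// Usage with made-up identifiers.
const confirmationRequest = buildConfirmationRequest(
  { requestId: "req-2", messageId: "msg-2", contextId: "ctx-1", taskId: "task-1" },
  { tool_call_id: "call-42", selected_option_id: "proceed_once" },
);
```

Reusing the `contextId` and `taskId` is what ties the reply to the paused task; a request without them would presumably start a new task instead.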

 **Schema**

@@ -311,11 +367,14 @@ message ModifiedFileDetails {

 ### 3.3 Method Definitions

-This section defines the new methods introduced by the `development-tool` extension.
+This section defines the new methods introduced by the `development-tool`
+extension.

 **Method: `commands/get`**

-This method allows the client to discover slash commands supported by Gemini CLI. The client should call this method during startup to dynamically populate its command list.
+This method allows the client to discover slash commands supported by Gemini
+CLI. The client should call this method during startup to dynamically populate
+its command list.

 ```proto
 // Response message containing the list of all top-level slash commands.
@@ -349,7 +408,12 @@ message SlashCommandArgument {

 **Method: `command/execute`**

-This method allows the client to execute a slash command. Following the initial `ExecuteSlashCommandResponse`, the agent will use the standard streaming mechanism to communicate the command's progress and output. All subsequent updates, including textual output, agent thoughts, and any required user confirmations for tool calls (like executing a shell command), will be sent as `TaskStatusUpdateEvent` messages, re-using the schemas defined above.
+This method allows the client to execute a slash command. Following the initial
+`ExecuteSlashCommandResponse`, the agent will use the standard streaming
+mechanism to communicate the command's progress and output. All subsequent
+updates, including textual output, agent thoughts, and any required user
+confirmations for tool calls (like executing a shell command), will be sent as
+`TaskStatusUpdateEvent` messages, re-using the schemas defined above.

 ```proto
 // Request to execute a specific slash command.
@@ -390,29 +454,56 @@ message ExecuteSlashCommandResponse {

 ## 4. Separation of Concerns

-We believe that all client-side context (e.g., workspace state) and client-side tool execution (e.g., reading active buffers) should be routed through MCP.
+We believe that all client-side context (e.g., workspace state) and client-side
+tool execution (e.g., reading active buffers) should be routed through MCP.

-This approach enforces a strict separation of concerns: the A2A `development-tool` extension standardizes communication to the agent, while MCP serves as the single, authoritative interface for client-side capabilities.
+This approach enforces a strict separation of concerns: the A2A
+`development-tool` extension standardizes communication to the agent, while MCP
+serves as the single, authoritative interface for client-side capabilities.

 ## Appendix

 ### A. Example Interaction Flow

-1. **Client -> Server**: The client sends a `message/stream` request containing the initial prompt and configuration in an `AgentSettings` object.
+1. **Client -> Server**: The client sends a `message/stream` request containing
+the initial prompt and configuration in an `AgentSettings` object.
 2. **Server -> Client**: SSE stream begins.
-- **Event 1**: The server sends a `Task` object with `status.state: 'submitted'` and the new `taskId`.
-- **Event 2**: The server sends a `TaskStatusUpdateEvent` with the metadata `kind` set to `'STATE_CHANGE'` and `status.state` set to `'working'`.
-3. **Agent Logic**: The agent processes the prompt and decides to call the `write_file` tool, which requires user confirmation.
+- **Event 1**: The server sends a `Task` object with
+`status.state: 'submitted'` and the new `taskId`.
+- **Event 2**: The server sends a `TaskStatusUpdateEvent` with the metadata
+`kind` set to `'STATE_CHANGE'` and `status.state` set to `'working'`.
+3. **Agent Logic**: The agent processes the prompt and decides to call the
+`write_file` tool, which requires user confirmation.
 4. **Server -> Client**:
-- **Event 3**: The server sends a `TaskStatusUpdateEvent`. The metadata `kind` is `'TOOL_CALL_UPDATE'`, and the `DataPart` contains a `ToolCall` object with its `status` as `'PENDING'` and a populated `confirmation_request`.
-- **Event 4**: The server sends a final `TaskStatusUpdateEvent` for this exchange. The metadata `kind` is `'STATE_CHANGE'`, the `status.state` is `'input-required'`, and `final` is `true`. The stream for this request ends.
-5. **Client**: The client UI renders the confirmation prompt based on the `ToolCall` object from Event 3. The user clicks "Approve."
-6. **Client -> Server**: The client sends a new `message/stream` request. It includes the `taskId` from the ongoing task and a `DataPart` containing a `ToolCallConfirmation` object (e.g., `{"tool_call_id": "...", "selected_option_id": "proceed_once"}`).
+- **Event 3**: The server sends a `TaskStatusUpdateEvent`. The metadata
+`kind` is `'TOOL_CALL_UPDATE'`, and the `DataPart` contains a `ToolCall`
+object with its `status` as `'PENDING'` and a populated
+`confirmation_request`.
+- **Event 4**: The server sends a final `TaskStatusUpdateEvent` for this
+exchange. The metadata `kind` is `'STATE_CHANGE'`, the `status.state` is
+`'input-required'`, and `final` is `true`. The stream for this request
+ends.
+5. **Client**: The client UI renders the confirmation prompt based on the
+`ToolCall` object from Event 3. The user clicks "Approve."
+6. **Client -> Server**: The client sends a new `message/stream` request. It
+includes the `taskId` from the ongoing task and a `DataPart` containing a
+`ToolCallConfirmation` object (e.g.,
+`{"tool_call_id": "...", "selected_option_id": "proceed_once"}`).
 7. **Server -> Client**: A new SSE stream begins for the second request.
-- **Event 1**: The server sends a `TaskStatusUpdateEvent` with `kind: 'TOOL_CALL_UPDATE'`, containing the `ToolCall` object with its `status` now set to `'EXECUTING'`.
-- **Event 2**: After the tool runs, the server sends another `TaskStatusUpdateEvent` with `kind: 'TOOL_CALL_UPDATE'`, containing the `ToolCall` with its `status` as `'SUCCEEDED'`.
-8. **Agent Logic**: The agent receives the successful tool result and generates a final textual response.
+- **Event 1**: The server sends a `TaskStatusUpdateEvent` with
+`kind: 'TOOL_CALL_UPDATE'`, containing the `ToolCall` object with its
+`status` now set to `'EXECUTING'`.
+- **Event 2**: After the tool runs, the server sends another
+`TaskStatusUpdateEvent` with `kind: 'TOOL_CALL_UPDATE'`, containing the
+`ToolCall` with its `status` as `'SUCCEEDED'`.
+8. **Agent Logic**: The agent receives the successful tool result and generates
+a final textual response.
 9. **Server -> Client**:
-- **Event 3**: The server sends a `TaskStatusUpdateEvent` with `kind: 'TEXT_CONTENT'` and a `TextPart` containing the agent's final answer.
-- **Event 4**: The server sends the final `TaskStatusUpdateEvent`. The `kind` is `'STATE_CHANGE'`, the `status.state` is `'completed'`, and `final` is `true`. The stream ends.
-10. **Client**: The client displays the final answer. The task is now complete but can be continued by sending another message with the same `taskId`.
+- **Event 3**: The server sends a `TaskStatusUpdateEvent` with
+`kind: 'TEXT_CONTENT'` and a `TextPart` containing the agent's final
+answer.
+- **Event 4**: The server sends the final `TaskStatusUpdateEvent`. The
+`kind` is `'STATE_CHANGE'`, the `status.state` is `'completed'`, and
+`final` is `true`. The stream ends.
+10. **Client**: The client displays the final answer. The task is now complete
+but can be continued by sending another message with the same `taskId`.
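
The flow above can be replayed against a small client-side dispatcher, sketched below. This is not part of the commit. The three `kind` values and the status strings are taken from the steps, but the flattened `StreamEvent` shape, the `UiAction` names, and `toUiAction` are hypothetical, and the initial `Task` object from step 2 is left out because it is not a `TaskStatusUpdateEvent`.

```typescript
// Hypothetical, flattened view of an event after metadata has been decoded.
interface StreamEvent {
  kind: string; // metadata `kind`
  state?: string; // status.state, relevant for STATE_CHANGE
  final?: boolean;
  toolStatus?: string; // ToolCall.status, relevant for TOOL_CALL_UPDATE
  needsConfirmation?: boolean; // true when confirmation_request is populated
}

type UiAction =
  | "show-status"
  | "prompt-user"
  | "update-tool"
  | "append-text"
  | "await-user"
  | "done"
  | "ignore";

/** Maps one decoded event to the UI reaction described in the flow. */
function toUiAction(event: StreamEvent): UiAction {
  switch (event.kind) {
    case "TOOL_CALL_UPDATE":
      return event.toolStatus === "PENDING" && event.needsConfirmation === true
        ? "prompt-user"
        : "update-tool";
    case "TEXT_CONTENT":
      return "append-text";
    case "STATE_CHANGE":
      if (event.final === true && event.state === "input-required") {
        return "await-user";
      }
      if (event.final === true && event.state === "completed") return "done";
      return "show-status";
    default:
      return "ignore"; // unknown kinds are skipped, not treated as errors
  }
}

// First stream (steps 2 and 4), without the initial Task object.
const firstStream: StreamEvent[] = [
  { kind: "STATE_CHANGE", state: "working" },
  { kind: "TOOL_CALL_UPDATE", toolStatus: "PENDING", needsConfirmation: true },
  { kind: "STATE_CHANGE", state: "input-required", final: true },
];

// Second stream (steps 7 and 9), after the user approved the tool call.
const secondStream: StreamEvent[] = [
  { kind: "TOOL_CALL_UPDATE", toolStatus: "EXECUTING" },
  { kind: "TOOL_CALL_UPDATE", toolStatus: "SUCCEEDED" },
  { kind: "TEXT_CONTENT" },
  { kind: "STATE_CHANGE", state: "completed", final: true },
];

const firstActions = firstStream.map((event) => toUiAction(event));
const secondActions = secondStream.map((event) => toUiAction(event));
```

Each event is mapped on its own, with no memory of earlier events, which is the stateless client logic the `ToolCall` design aims for.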