mirror of
https://github.com/google-gemini/gemini-cli.git
synced 2026-06-15 22:07:29 -07:00
Add validated architectural notes
@@ -0,0 +1,529 @@
# ADK-TS Alignment Pass

Every interface in our outline must map cleanly to ADK-TS. This document
verifies that mapping field-by-field, identifies gaps, and confirms
HITL/plugin/transfer patterns work.

Source: ADK-TS v0.4.0 at `/Users/adamfweidman/Desktop/adk-int/adk-js/core/src/`

---

## 1. AgentDescriptor ↔ ADK Agent Hierarchy

### Field-by-field mapping

| AgentDescriptor field | ADK-TS source | Notes |
| --- | --- | --- |
| `name` | `BaseAgent.name` | Direct. ADK validates it's a valid JS identifier. |
| `displayName` | (none) | ADK doesn't have this. No conflict. |
| `description` | `BaseAgent.description` (optional in ADK) | Direct. Used for model routing in AgentTool. |
| `executor` | (none) | New concept. ADK agents are always 'adk'. Adapter sets this. |
| `inputSchema` | `LlmAgent.inputSchema` (Zod or JSON Schema) | Direct. ADK's AgentTool uses this for tool parameter generation. |
| `outputSchema` | `LlmAgent.outputSchema` (Zod or JSON Schema) | Direct. ADK uses for structured output + AgentTool response. |
| `capabilities` | (none) | New concept. Adapter infers from agent type: LlmAgent gets `['elicitation', 'streaming', 'host_tool_execution']`, LoopAgent gets `['composition']`, etc. |
| `ownTools` | `LlmAgent.tools: ToolUnion[]` | Maps via ToolDescriptor adapter. ADK tools have `name`, `description`, `_getDeclaration()` which returns JSON Schema. |
| `requiredTools` | (none) | New concept. ADK agents don't declare required host tools. Adapter can infer from tool references. |
| `subAgents` | `BaseAgent.subAgents: BaseAgent[]` | Recursive. Each sub-agent becomes a nested AgentDescriptor. |
| `constraints.maxTurns` | `RunConfig.maxLlmCalls` (default 500) | Maps, though semantics differ slightly (LLM calls vs turns). |
| `constraints.maxTimeMinutes` | (none) | ADK doesn't have time limits. No conflict: host enforces. |
| `constraints.maxBudgetUsd` | (none) | ADK doesn't have budget. No conflict: host enforces. |
| `metadata` | (none) | New concept. Adapter can populate from agent registration context. |

### ADK-specific fields NOT in AgentDescriptor

| ADK field | Where it lives | Our approach |
| --- | --- | --- |
| `instruction` / `globalInstruction` | LlmAgent | Executor-internal. Not in descriptor (it's runtime config, not identity). |
| `model` | LlmAgent | Goes in ExecutionOptions.model or executor-internal config. |
| `generateContentConfig` | LlmAgent | Executor-internal. |
| `disallowTransferToParent/Peers` | LlmAgent | Could be `constraints` or `_meta`. Transfer policy is host-enforced. |
| `includeContents` | LlmAgent | Executor-internal (context management). |
| `outputKey` | LlmAgent | Executor-internal (state management). |
| `beforeModelCallback`, etc. | LlmAgent | Executor-internal. These are ADK's callback system; our LifecycleInterceptor is the interface equivalent. |

### Verdict: CLEAN MAPPING

AgentDescriptor captures everything needed to describe an ADK agent externally.
ADK-specific runtime config (instruction, model, callbacks) stays inside the
executor, which is exactly right for the descriptor/executor separation.

**Key ADK pattern preserved:** AgentTool wraps an agent as a tool using
`inputSchema` for parameters and `description` for the tool description. Our
AgentDescriptor has both, so SubagentTool can do the same thing.

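As a rough illustration of the field mapping above, the sketch below builds a descriptor from a simplified agent shape. `AdkAgentLike`, `AgentDescriptorSketch`, `inferCapabilities`, `toDescriptor`, and the `kind` field are hypothetical stand-ins rather than ADK-TS or gemini-cli types. The capability lists are the ones named in the table, and treating every workflow agent like LoopAgent is an assumption.

```typescript
// Hypothetical, simplified stand-in for the real ADK-TS agent classes.
interface AdkAgentLike {
  kind: 'llm' | 'loop' | 'parallel' | 'sequential';
  name: string;
  description?: string;
  subAgents?: AdkAgentLike[];
}

// Only the descriptor fields exercised by this sketch.
interface AgentDescriptorSketch {
  name: string;
  description: string;
  executor: 'adk';
  capabilities: string[];
  subAgents: AgentDescriptorSketch[];
}

function inferCapabilities(kind: AdkAgentLike['kind']): string[] {
  // LlmAgent capabilities come from the mapping table; the other workflow
  // agents are assumed to share the 'composition' capability of LoopAgent.
  return kind === 'llm'
    ? ['elicitation', 'streaming', 'host_tool_execution']
    : ['composition'];
}

function toDescriptor(agent: AdkAgentLike): AgentDescriptorSketch {
  const children = agent.subAgents || [];
  return {
    name: agent.name,
    description: agent.description || '', // optional on the ADK side
    executor: 'adk', // the adapter always sets this for ADK agents
    capabilities: inferCapabilities(agent.kind),
    subAgents: children.map((child) => toDescriptor(child)), // recursive
  };
}

const rootDescriptor = toDescriptor({
  kind: 'loop',
  name: 'reviewer_loop',
  subAgents: [{ kind: 'llm', name: 'reviewer', description: 'Reviews a diff' }],
});
```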
---

## 2. AgentSession ↔ ADK Runner

### Method mapping

| AgentSession method | ADK-TS equivalent | How adapter works |
| --- | --- | --- |
| `stream(data, options)` | `Runner.runAsync({ userId, sessionId, newMessage, runConfig })` | Adapter creates/loads session, maps data+options → runAsync params, wraps Event generator → AgentEvent generator. Each `stream()` call triggers a new `runAsync()`. |
| `update(config)` | No direct equivalent | ADK doesn't support mid-stream config changes. Adapter queues updates for next `runAsync()` call. |
| `steer(data)` | No direct equivalent | ADK doesn't support mid-stream intervention. Adapter can queue for next invocation or ignore. |
| `abort()` | No direct equivalent | ADK uses `invocationContext.endInvocation = true`. Adapter sets this flag. Could also use AbortController. |

### ExecutionRequest → Runner.runAsync mapping

| ExecutionRequest field | ADK mapping |
| --- | --- |
| `descriptor` | Used to find/create the BaseAgent instance |
| `input` | → `newMessage: Content` (converted from ContentPart[] → Content) |
| `sessionRef` | → `sessionId` (string) or creates session from SessionSnapshot |
| `forkSession` | Adapter clones session before running |
| `options.tools` | → merged into agent's `tools` config |
| `options.model` | → `LlmAgent.model` override |
| `options.hostToolExecution` | → `RunConfig.pauseOnToolCalls: true` |
| `options.streaming` | → `RunConfig.streamingMode` |
| `options.permissionMode` | → SecurityPlugin config |
| `signal` | → wired to `invocationContext.endInvocation` |

### HITL: How pauseOnToolCalls works end-to-end

This is the critical path. Here's the full flow:

```
1. LLM returns tool call (FunctionCall in Event)
2. ADK checks RunConfig.pauseOnToolCalls === true
3. ADK sets invocationContext.endInvocation = true
4. ADK yields the Event (with FunctionCall) and stops
5. Runner.runAsync() generator completes

--- OUR INTERFACE BOUNDARY ---

6. Adapter translates ADK Event → ToolRequestEvent
7. Host receives ToolRequestEvent from session.stream() generator
8. Host runs policy check (PolicyEvaluator.evaluate())
9. Host fires hooks (LifecycleInterceptor.fire('before_tool', ...))
10. If policy allows → Host executes tool → gets ToolResultData
11. Host calls session.stream({ kind: 'tool_result', ... }) to get next stream

--- BACK INTO ADK ---

12. Adapter receives tool result
13. Adapter creates FunctionResponse Content
14. Adapter calls Runner.runAsync() again with FunctionResponse as newMessage
15. ADK loads session (has prior tool call event)
16. ADK resumes agent with tool response
17. Loop continues from step 1
```

**Why this works:** ADK's `pauseOnToolCalls` was designed exactly for this
pattern: external tool execution by a host. The adapter translates between
ADK's "end invocation + resume with FunctionResponse" pattern and our
"ToolRequestEvent + stream(tool_result)" pattern.

**Key insight:** Each `session.stream()` call triggers a new `Runner.runAsync()`
call, so each ADK "invocation" maps to one `stream()` call. The session
persists state across invocations. Mid-stream `update()` and `steer()` calls are
queued for the next invocation since ADK doesn't support mid-turn changes.

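The pause-and-resume cycle above can be sketched as a host loop. This is a synchronous simplification under stated assumptions: `runOnce` is a hypothetical stand-in for one `Runner.runAsync()` invocation behind `session.stream()`, and `TurnInput`, `TurnEvent`, and `driveConversation` are illustrative names, not part of either interface.

```typescript
// Hypothetical shapes; the real stream() is an async generator of AgentEvents.
type TurnInput =
  | { kind: 'message'; text: string }
  | { kind: 'tool_result'; requestId: string; output: string };

type TurnEvent =
  | { type: 'tool_request'; requestId: string; toolName: string }
  | { type: 'message'; text: string };

// Fake executor: asks for one tool call, then answers once it has the result.
// Each call models one invocation that ends after yielding its events.
function runOnce(input: TurnInput): TurnEvent[] {
  if (input.kind === 'message') {
    return [{ type: 'tool_request', requestId: 'call_1', toolName: 'read_file' }];
  }
  return [{ type: 'message', text: 'Tool said: ' + input.output }];
}

// Host loop: every iteration corresponds to one stream() call.
function driveConversation(
  prompt: string,
  executeTool: (toolName: string) => string,
): string[] {
  const transcript: string[] = [];
  let next: TurnInput | undefined = { kind: 'message', text: prompt };
  while (next !== undefined) {
    const emitted = runOnce(next);
    next = undefined;
    for (const ev of emitted) {
      if (ev.type === 'tool_request') {
        // Policy checks and before_tool hooks would run here (steps 8-9).
        next = {
          kind: 'tool_result',
          requestId: ev.requestId,
          output: executeTool(ev.toolName),
        };
      } else {
        transcript.push(ev.text);
      }
    }
  }
  return transcript;
}

const hitlTranscript = driveConversation('open the file', (toolName) => 'ran ' + toolName);
```

The loop ends when an invocation produces no tool request, which corresponds to the agent's final response.
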
### HITL: ToolConfirmation flow

ADK also has a separate ToolConfirmation pattern (via
`context.requestConfirmation()`):

```
1. beforeToolCallback calls context.requestConfirmation({ hint: '...' })
2. This sets eventActions.requestedToolConfirmations[functionCallId]
3. ADK yields event with requestedToolConfirmations populated
4. Runner completes (invocation ends)

--- OUR INTERFACE BOUNDARY ---

5. Adapter sees requestedToolConfirmations in event
6. Adapter translates → ElicitationRequest { kind: 'tool_confirmation', ... }
7. Host renders confirmation UI
8. User responds → ElicitationResponse { action: 'accept' | 'decline' }

--- BACK INTO ADK ---

9. Adapter receives elicitation response
10. If accepted: Adapter creates FunctionResponse with confirmed=true
11. Calls Runner.runAsync() with FunctionResponse
12. ADK's SecurityPlugin or callback reads confirmation from session
13. Tool executes
```

**Maps to our ElicitationRequest:** ADK's `ToolConfirmation.hint` →
`ElicitationRequest.message`. ADK's `ToolConfirmation.payload` →
`ElicitationRequest.context`. The `kind: 'tool_confirmation'` is the
discriminator.

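A minimal sketch of that translation, assuming simplified shapes: only `hint`, `payload`, `message`, `context`, and `kind` come from the notes above. `ToolConfirmationLike`, `ElicitationRequestSketch`, `toElicitationRequests`, the `functionCallId` field, and the fallback message text are hypothetical.

```typescript
// Hypothetical, reduced version of ADK's ToolConfirmation.
interface ToolConfirmationLike {
  hint?: string;
  payload?: unknown;
}

// Hypothetical, reduced version of our ElicitationRequest.
interface ElicitationRequestSketch {
  kind: 'tool_confirmation';
  functionCallId: string;
  message: string;
  context?: unknown;
}

// One request per function call ID found in requestedToolConfirmations.
function toElicitationRequests(
  requested: Record<string, ToolConfirmationLike>,
): ElicitationRequestSketch[] {
  return Object.keys(requested).map((functionCallId) => {
    const confirmation = requested[functionCallId];
    return {
      kind: 'tool_confirmation' as const,
      functionCallId,
      // hint → message; the fallback wording is an assumption of this sketch.
      message: confirmation.hint || 'Confirm tool call?',
      // payload → context
      context: confirmation.payload,
    };
  });
}

const confirmationRequests = toElicitationRequests({
  call_42: { hint: 'Delete 3 files?', payload: { paths: ['a', 'b', 'c'] } },
});
```

Keeping the function call ID on the request lets the adapter pair the later `ElicitationResponse` with the right `FunctionResponse`.
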
### HITL: Auth request flow

```
1. Tool or callback calls context.requestCredential(authConfig)
2. Sets eventActions.requestedAuthConfigs[functionCallId]
3. Event yields, invocation ends

--- OUR INTERFACE BOUNDARY ---

4. Adapter sees requestedAuthConfigs
5. Translates → ElicitationRequest { kind: 'auth_required', context: authConfig }
6. User provides credentials
7. ElicitationResponse { action: 'accept', content: { credential: ... } }

--- BACK INTO ADK ---

8. Adapter stores credential via CredentialService
9. Calls Runner.runAsync() again
10. Tool calls context.getAuthResponse() → gets credential
```

**Maps to our ElicitationRequest:** ADK's auth pattern is just another
elicitation kind. This validates our generic elicitation design: it handles
tool confirmation, auth, and any future interaction type.

---

## 3. AgentEvent ↔ ADK Event

### Event type mapping

| Our AgentEvent | ADK Event pattern | Adapter translation |
| --- | --- | --- |
| `InitializeEvent` | First event from Runner.runAsync() | Adapter emits on first stream() call |
| `SessionUpdateEvent` | `eventActions.stateDelta` | Adapter emits when stateDelta is non-empty |
| `MessageEvent` | `event.content` with text Parts | Filter text/thought parts from Content |
| `ToolRequestEvent` | `getFunctionCalls(event)` returns FunctionCall[] | Each FunctionCall → one ToolRequestEvent |
| `ToolUpdateEvent` | `event.longRunningToolIds` | Adapter emits progress for long-running tools |
| `ToolResponseEvent` | `getFunctionResponses(event)` returns FunctionResponse[] | Each FunctionResponse → one ToolResponseEvent |
| `ElicitationRequest` | `eventActions.requestedToolConfirmations` or `requestedAuthConfigs` | Map to generic elicitation |
| `ElicitationResponse` | User input → FunctionResponse in next runAsync call | Reverse of above |
| `UsageEvent` | `event.usageMetadata` (GenerateContentResponseUsageMetadata) | Map token counts |
| `ErrorEvent` | `event.errorCode` + `event.errorMessage` | Map error fields |
| `stream_end` | `isFinalResponse(event)`, `eventActions.transferToAgent`, `eventActions.escalate` | Derive `stream_end` reason from ADK signals |
| `CustomEvent` | `event.customMetadata` | Pass through |

### ADK EventActions → Our events

| EventActions field | Our event | Notes |
| --- | --- | --- |
| `stateDelta` | SessionUpdate or embedded in other events | Delta state is a core ADK pattern |
| `artifactDelta` | `CustomEvent { kind: 'artifact_delta' }` | Artifacts not in our core events |
| `transferToAgent` | Tool call (`transfer_to_agent`) + `stream_end` `reason: 'completed'` | Handoff is a tool call. Host intercepts the tool request, mediates the handoff, originating agent completes. |
| `escalate` | `stream_end` `reason: 'completed'` with `data: { escalateReason: '...' }` | LoopAgent exit signal. ADK's escalate = "I'm done, pass control back up" |
| `requestedToolConfirmations` | `ElicitationRequest { kind: 'tool_confirmation' }` | Per function call ID |
| `requestedAuthConfigs` | `ElicitationRequest { kind: 'auth_required' }` | Per function call ID |
| `skipSummarization` | `_meta: { skipSummarization: true }` | ADK-specific, goes in metadata |

### AgentEventBase mapping

| AgentEventBase field | ADK Event field | Notes |
| --- | --- | --- |
| `id` | `event.id` | Direct |
| `timestamp` | `event.timestamp` (number) | Convert to ISO 8601 string |
| `type` | Derived from content analysis | ADK doesn't have event types; adapter classifies |
| `agentId` | `event.author` (agent name) or context | **New field**: which agent emitted this event |
| `threadId` | `event.branch` (e.g., "agent_1.agent_2") | Direct mapping |
| `source` | `event.author` ("user" or agent name) | Direct |
| `_meta` | `event.customMetadata` | Direct |

### Verdict: CLEAN MAPPING

Every ADK event pattern maps to our event types. The adapter classifies ADK's
untyped events into our typed event taxonomy. Key insight: ADK events are richer
(they carry EventActions, function calls, auth requests all in one event), so
the adapter may fan out one ADK Event into multiple AgentEvents (e.g., one
Message + one ToolRequest + one ElicitationRequest). The new `agentId` field
maps directly from ADK's `event.author`.

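The fan-out can be sketched as follows, assuming a heavily reduced event shape. `AdkEventLike`, `AgentEventSketch`, `fanOut`, the pre-extracted `texts` and `functionCalls` fields, and the derived `id` suffixes are all hypothetical; the timestamp, `agentId`, and `threadId` conversions follow the AgentEventBase table.

```typescript
// Hypothetical, flattened view of an ADK Event.
interface AdkEventLike {
  id: string;
  author: string;
  timestamp: number; // epoch milliseconds
  branch?: string;
  texts: string[]; // text parts already pulled out of event.content
  functionCalls: { id: string; name: string }[];
}

// Hypothetical subset of our typed events.
interface AgentEventSketch {
  id: string;
  type: 'message' | 'tool_request';
  timestamp: string; // ISO 8601
  agentId: string;
  threadId?: string;
  detail: string;
}

function fanOut(source: AdkEventLike): AgentEventSketch[] {
  // Fields shared by every event derived from the same ADK Event.
  const base = {
    timestamp: new Date(source.timestamp).toISOString(),
    agentId: source.author,
    threadId: source.branch,
  };
  const out: AgentEventSketch[] = [];
  source.texts.forEach((text, i) => {
    // Suffixing the id keeps fanned-out events distinct (an assumption).
    out.push({ ...base, id: source.id + ':msg:' + i, type: 'message', detail: text });
  });
  source.functionCalls.forEach((call) => {
    out.push({ ...base, id: source.id + ':' + call.id, type: 'tool_request', detail: call.name });
  });
  return out;
}

const fannedOut = fanOut({
  id: 'evt_1',
  author: 'planner',
  timestamp: 0,
  texts: ['Reading the file first.'],
  functionCalls: [{ id: 'call_1', name: 'read_file' }],
});
```

Here one input event yields a message event followed by a tool request event, both stamped with the same agent and time.
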
---

## 4. ToolContract ↔ ADK Tool System

### ToolDescriptor ↔ BaseTool

| ToolDescriptor field | ADK source | Notes |
| --- | --- | --- |
| `name` | `BaseTool.name` | Direct |
| `displayName` | (none) | ADK doesn't have this |
| `description` | `BaseTool.description` | Direct |
| `parametersSchema` | `BaseTool._getDeclaration()` → FunctionDeclaration.parameters | JSON Schema from declaration |
| `annotations.readOnly` | Inferred from tool type | FunctionTool with no side effects |
| `annotations.longRunning` | `BaseTool.isLongRunning` | Direct |

### ToolCallRequest ↔ FunctionCall

| ToolCallRequest | ADK FunctionCall | Notes |
| --- | --- | --- |
| `requestId` | `functionCall.id` | Direct |
| `name` | `functionCall.name` | Direct |
| `args` | `functionCall.args` | Direct |

### ToolResultData ↔ FunctionResponse + tool return

| ToolResultData | ADK | Notes |
| --- | --- | --- |
| `llmContent` | `FunctionResponse.response` | Adapter wraps into ContentPart[] |
| `displayContent` | (none) | ADK doesn't separate display from model content |
| `isError` | Error thrown from `runAsync()` | Adapter catches and sets flag |
| `tailCalls` | (none) | ADK doesn't have tail calls (gemini-cli concept) |

### AgentTool pattern

ADK's `AgentTool` wraps a `BaseAgent` as a `BaseTool`:

- Uses `agent.inputSchema` for tool parameters
- Uses `agent.description` for tool description
- Creates internal Runner with isolated session
- Returns agent output as tool result
- Merges state deltas back to parent

**Our equivalent:** `SubagentTool` wraps `AgentDescriptor` as a tool:

- Uses `descriptor.inputSchema` for tool parameters
- Uses `descriptor.description` for tool description
- Creates executor via `SessionFactory.create(descriptor, context)`
- Returns execution result as tool result

**Mapping is 1:1.** The only difference is ADK does it with concrete agent
instances; we do it with descriptors + factory.

---

## 5. LifecycleInterceptor ↔ ADK Plugin System

### Hook point mapping

| Our hook point string | ADK Plugin callback | Mapping |
| --- | --- | --- |
| `'before_agent'` | `beforeAgentCallback` | `payload: { agent, context }` |
| `'after_agent'` | `afterAgentCallback` | `payload: { agent, context }` |
| `'before_model'` | `beforeModelCallback` | `payload: { context, llmRequest }` |
| `'after_model'` | `afterModelCallback` | `payload: { context, llmResponse }` |
| `'before_tool'` | `beforeToolCallback` | `payload: { tool, args, context }` |
| `'after_tool'` | `afterToolCallback` | `payload: { tool, args, context, result }` |
| `'on_event'` | `onEventCallback` | `payload: { event }` |
| `'on_user_message'` | `onUserMessageCallback` | `payload: { userMessage }` |
| `'before_run'` | `beforeRunCallback` | `payload: { context }` |
| `'after_run'` | `afterRunCallback` | `payload: { context }` |
| `'on_model_error'` | `onModelErrorCallback` | `payload: { request, error }` |
| `'on_tool_error'` | `onToolErrorCallback` | `payload: { tool, args, error }` |

### HookResult ↔ ADK callback return

| HookResult field | ADK pattern | Notes |
| --- | --- | --- |
| `action: 'proceed'` | Return `undefined` | Plugin returns nothing → continue |
| `action: 'block'` | Return `Content` (for agent/model) or throw | Non-undefined return short-circuits |
| `modifications` | Return modified `LlmRequest`/`LlmResponse`/args | Plugin returns modified version |

### ADK's early-exit pattern

ADK plugins use "first non-undefined return wins":

- `beforeModelCallback` returns `LlmResponse` → skips LLM call entirely (cache
  hit)
- `beforeToolCallback` returns modified `args` → tool runs with new args
- `beforeAgentCallback` returns `Content` → skips agent run entirely

Our `HookResult.modifications` carries the same data. The `action: 'block'` +
return value pattern maps cleanly.

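A minimal sketch of the "first non-undefined return wins" rule, translated into a `HookResult`-like value. `BeforeCallback`, `HookResultSketch`, and `firstNonUndefined` are hypothetical names, the payload is reduced to a string, and treating every non-undefined return as `'block'` is a simplification that fits the cache-hit case rather than the modified-args case.

```typescript
// Hypothetical callback shape: return undefined to proceed, a value to short-circuit.
type BeforeCallback<T> = (payload: string) => T | undefined;

// Hypothetical, reduced HookResult.
interface HookResultSketch<T> {
  action: 'proceed' | 'block';
  modifications?: T;
}

function firstNonUndefined<T>(
  callbacks: BeforeCallback<T>[],
  payload: string,
): HookResultSketch<T> {
  for (const cb of callbacks) {
    const returned = cb(payload);
    if (returned !== undefined) {
      // Later callbacks never run once one has produced a value.
      return { action: 'block', modifications: returned };
    }
  }
  return { action: 'proceed' };
}

let callsMade = 0;

// Plays the role of a beforeModelCallback that serves a cached response.
const cachePlugin: BeforeCallback<string> = (prompt) => {
  callsMade += 1;
  return prompt === 'hello' ? 'cached reply' : undefined;
};

// Registered second, so it is skipped on a cache hit.
const laterPlugin: BeforeCallback<string> = () => {
  callsMade += 1;
  return 'should not run on a cache hit';
};

const cacheHit = firstNonUndefined([cachePlugin, laterPlugin], 'hello');
const noHook = firstNonUndefined<string>([], 'hello');
```
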
### gemini-cli hooks NOT in ADK

| gemini-cli hook | ADK equivalent | Notes |
| --- | --- | --- |
| `BeforeToolSelection` | (none) | ADK doesn't let you modify which tools are available mid-turn |
| `Notification` | (none) | ADK doesn't have notification hooks |
| `SessionStart` | `onUserMessageCallback` (first call) | Close enough |
| `SessionEnd` | `afterRunCallback` | Close enough |
| `PreCompress` | (none) | ADK doesn't have context compression hooks |

These gaps are fine: they're gemini-cli-specific hook points. Our generic
`fire(hookPoint, payload)` handles them because the hook point is an open
string. ADK executors simply don't fire these hook points, and
`supportedHookPoints()` reflects that.

---

## 6. PolicyEvaluator ↔ ADK SecurityPlugin

### ADK SecurityPlugin

```typescript
// Simplified sketch of the plugin's shape, not the actual ADK source.
class SecurityPlugin extends BasePlugin {
  policyEngine: BasePolicyEngine;

  // In beforeToolCallback:
  async beforeToolCallback({ tool, args, context }) {
    const outcome = await this.policyEngine.evaluate(tool.name, args);
    switch (outcome) {
      case PolicyOutcome.DENY:
        throw new Error(`Tool call to ${tool.name} denied by policy`);
      case PolicyOutcome.CONFIRM:
        // Illustrative hint text; the invocation then ends pending
        // confirmation (see the ToolConfirmation flow in §2).
        context.requestConfirmation({ hint: `Confirm call to ${tool.name}?` });
        return undefined;
      case PolicyOutcome.ALLOW:
        return undefined; // proceed
    }
  }
}
```

### Mapping

| Our PolicyEvaluator | ADK SecurityPlugin | Notes |
| --- | --- | --- |
| `evaluate(request)` | `policyEngine.evaluate(toolName, args)` | ADK is simpler: tool name + args only |
| `PolicyDecision.allow` | `PolicyOutcome.ALLOW` | Direct |
| `PolicyDecision.deny` | `PolicyOutcome.DENY` | Direct |
| `PolicyDecision.ask_user` | `PolicyOutcome.CONFIRM` → `context.requestConfirmation()` | ADK chains to ToolConfirmation |
| `getExcluded()` | (none) | ADK doesn't pre-filter tools |
| `request.principal` | (none) | ADK doesn't track who's calling |
| `request.principalPath` | Could use `context.agentName` + branch | For hierarchical policy |
| `request.context` | (none) | Our extension point for host-specific data |

### How ADK policy maps when host controls execution

With `pauseOnToolCalls: true`, the flow is:

1. ADK yields tool call → adapter converts to ToolRequestEvent
2. **Host** runs PolicyEvaluator.evaluate(), NOT ADK's SecurityPlugin
3. Host decides allow/deny/ask_user
4. If allowed, host executes tool and sends result via `session.stream()`

This means **ADK's SecurityPlugin is bypassed when the host controls tool
execution**, which is correct: the host's PolicyEvaluator is the authority.
ADK's SecurityPlugin only matters when ADK executes tools internally
(`pauseOnToolCalls: false`).

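The host-side flow above can be sketched as a single handler. Everything here is hypothetical (`PolicyDecisionSketch`, `ToolCallSketch`, `ToolOutcomeSketch`, `handleToolRequest`, and the denial message); only the three decision names and the `isError` / `llmContent` fields come from these notes. Reporting a denial back to the model as an error result is an assumption of this sketch.

```typescript
type PolicyDecisionSketch = 'allow' | 'deny' | 'ask_user';

interface ToolCallSketch {
  requestId: string;
  toolName: string;
}

interface ToolOutcomeSketch {
  requestId: string;
  isError: boolean;
  llmContent: string;
}

function handleToolRequest(
  call: ToolCallSketch,
  evaluate: (call: ToolCallSketch) => PolicyDecisionSketch,
  confirmWithUser: (call: ToolCallSketch) => boolean,
  execute: (call: ToolCallSketch) => string,
): ToolOutcomeSketch {
  let decision = evaluate(call);
  if (decision === 'ask_user') {
    // The host resolves ask_user itself; ADK's SecurityPlugin is not consulted.
    decision = confirmWithUser(call) ? 'allow' : 'deny';
  }
  if (decision === 'deny') {
    return { requestId: call.requestId, isError: true, llmContent: 'Denied by host policy' };
  }
  // Only allowed calls reach the tool; the result goes back via session.stream().
  return { requestId: call.requestId, isError: false, llmContent: execute(call) };
}

// Example policy: reads are allowed, everything else needs confirmation.
const samplePolicy = (call: ToolCallSketch): PolicyDecisionSketch =>
  call.toolName === 'read_file' ? 'allow' : 'ask_user';

const allowedOutcome = handleToolRequest(
  { requestId: 'c1', toolName: 'read_file' },
  samplePolicy,
  () => false,
  (c) => 'contents from ' + c.toolName,
);

const declinedOutcome = handleToolRequest(
  { requestId: 'c2', toolName: 'delete_file' },
  samplePolicy,
  () => false, // user declines
  () => 'deleted',
);
```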
---

## 7. SessionContract ↔ ADK Session

### Session mapping

| Our SessionHandle | ADK Session | Notes |
| --- | --- | --- |
| `id` | `Session.id` | Direct |
| `agentName` | `Session.appName` | ADK uses appName, not agent name |
| `events` | `Session.events: Event[]` | Direct (but ADK Events → our AgentEvents) |
| `state` | `Session.state: Record<string, unknown>` | Direct |
| `lastUpdateTime` | `Session.lastUpdateTime` | Direct |

### SessionProvider ↔ BaseSessionService

| Our SessionProvider | ADK BaseSessionService | Notes |
| --- | --- | --- |
| `create(agentName, metadata)` | `createSession({ appName, userId })` | ADK requires userId |
| `load(sessionId)` | `getSession({ appName, userId, sessionId })` | ADK requires all three IDs |
| `list(agentName)` | `listSessions({ appName, userId })` | ADK scopes by userId |
| `delete(sessionId)` | `deleteSession({ appName, userId, sessionId })` | Same pattern |

### Gap: ADK requires userId

ADK sessions are scoped by `(appName, userId, sessionId)`. Our interface uses
just `sessionId`. The adapter can embed userId in the session metadata or derive
it from HostContext.

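One way the adapter might bridge this gap is to remember the full scope for each session it creates. The sketch below is illustrative only: `ScopedKey`, `ScopedSessionStore`, `SessionProviderAdapter`, the generated `session_N` IDs, and the `hostContext.userId` field are hypothetical, and the store is a synchronous in-memory stand-in for an ADK-style session service.

```typescript
interface ScopedKey {
  appName: string;
  userId: string;
  sessionId: string;
}

// Stand-in for a service that requires all three IDs on every call.
class ScopedSessionStore {
  private rows: Record<string, Record<string, unknown>> = {};

  createSession(key: ScopedKey): void {
    this.rows[ScopedSessionStore.rowId(key)] = {};
  }

  getSession(key: ScopedKey): Record<string, unknown> | undefined {
    return this.rows[ScopedSessionStore.rowId(key)];
  }

  private static rowId(key: ScopedKey): string {
    return [key.appName, key.userId, key.sessionId].join('/');
  }
}

// Exposes the sessionId-only interface; userId is derived from host context.
class SessionProviderAdapter {
  private store: ScopedSessionStore;
  private userId: string;
  private keys: Record<string, ScopedKey> = {};
  private counter = 0;

  constructor(store: ScopedSessionStore, hostContext: { userId: string }) {
    this.store = store;
    this.userId = hostContext.userId;
  }

  create(agentName: string): string {
    this.counter += 1;
    const sessionId = 'session_' + this.counter;
    const key: ScopedKey = { appName: agentName, userId: this.userId, sessionId };
    this.keys[sessionId] = key; // remember the full scope for later loads
    this.store.createSession(key);
    return sessionId;
  }

  load(sessionId: string): Record<string, unknown> | undefined {
    const key = this.keys[sessionId];
    return key ? this.store.getSession(key) : undefined;
  }
}

const sessionAdapter = new SessionProviderAdapter(new ScopedSessionStore(), {
  userId: 'local-user',
});
const createdId = sessionAdapter.create('code_reviewer');
const loadedState = sessionAdapter.load(createdId);
const missingState = sessionAdapter.load('never_created');
```

A persistent adapter would need to store the key mapping alongside the session, for example in session metadata as suggested above.
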
### State prefixes (ADK-specific)

ADK uses prefixed state keys:

- `app:` (app-scoped, persisted)
- `user:` (user-scoped, persisted)
- `temp:` (temporary, stripped before persistence)

Our `SessionHandle.state` is a flat `Record<string, unknown>`. The adapter
preserves prefixes as-is; they're just string keys. No conflict.

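Because the prefixes are plain string keys, the "stripped before persistence" rule for `temp:` reduces to a key filter. The sketch below is a minimal illustration; `persistableState` is a hypothetical helper, and these notes do not say which component performs the stripping in ADK.

```typescript
// Returns a copy of the state without temp:-prefixed keys.
function persistableState(
  sessionState: Record<string, unknown>,
): Record<string, unknown> {
  const kept: Record<string, unknown> = {};
  for (const key of Object.keys(sessionState)) {
    // app:, user:, and unprefixed keys pass through untouched.
    if (key.indexOf('temp:') !== 0) {
      kept[key] = sessionState[key];
    }
  }
  return kept;
}

const persistedState = persistableState({
  'app:theme': 'dark',
  'user:locale': 'en',
  'temp:scratch': [1, 2, 3],
  plain: true,
});
```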
---

## 8. ContentPart ↔ ADK Content/Part

### ADK uses Google GenAI types

ADK's `Content` and `Part` come from `@google/genai`:

```typescript
interface Content {
  role?: string; // 'user' | 'model'
  parts: Part[];
}

type Part =
  | TextPart
  | InlineDataPart
  | FunctionCallPart
  | FunctionResponsePart; // ... further part types elided in these notes
```

### Mapping

| Our ContentPart | ADK/GenAI Part | Notes |
| --- | --- | --- |
| `{ type: 'text', text }` | `{ text: string }` | Direct |
| `{ type: 'thought', thought }` | `{ thought: true, text: string }` | ADK uses `thought` boolean flag on TextPart |
| `{ type: 'media', mimeType, data }` | `{ inlineData: { mimeType, data } }` | Restructure |
| `{ type: 'reference', text, uri }` | `{ fileData: { fileUri, mimeType } }` | Map fileData → reference |
| `{ type: 'refusal', text }` | (none) | Not in ADK/GenAI. Adapter would map from finishReason. |
| `{ type: 'function_call', name, args, id }` | `{ functionCall: { name, args, id } }` | Unwrap |
| `{ type: 'function_response', name, response, id }` | `{ functionResponse: { name, response, id } }` | Unwrap |

### Verdict: CLEAN MAPPING

The adapter converts between our flat discriminated union and ADK's nested Part
structure. Part types that exist on both sides convert without information loss
in either direction; `refusal` has no ADK/GenAI counterpart and is derived from
`finishReason` instead.

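A minimal sketch of the bidirectional conversion for four of the part types in the table. `ContentPartSketch`, `GenAiPartLike`, `toGenAiPart`, and `fromGenAiPart` are hypothetical and deliberately reduced: `GenAiPartLike` models the wire shapes shown in the table as one object with optional fields, not the actual `@google/genai` type definitions, and the reference, refusal, and function response variants are left out.

```typescript
type ContentPartSketch =
  | { type: 'text'; text: string }
  | { type: 'thought'; thought: string }
  | { type: 'media'; mimeType: string; data: string }
  | { type: 'function_call'; name: string; args: Record<string, unknown>; id: string };

interface GenAiPartLike {
  text?: string;
  thought?: boolean;
  inlineData?: { mimeType: string; data: string };
  functionCall?: { name: string; args: Record<string, unknown>; id: string };
}

// Flat discriminated union → nested part.
function toGenAiPart(part: ContentPartSketch): GenAiPartLike {
  switch (part.type) {
    case 'text':
      return { text: part.text };
    case 'thought':
      return { thought: true, text: part.thought }; // flag + text
    case 'media':
      return { inlineData: { mimeType: part.mimeType, data: part.data } };
    case 'function_call':
      return { functionCall: { name: part.name, args: part.args, id: part.id } };
  }
}

// Nested part → flat discriminated union; undefined for kinds outside this sketch.
function fromGenAiPart(part: GenAiPartLike): ContentPartSketch | undefined {
  if (part.functionCall) {
    const call = part.functionCall;
    return { type: 'function_call', name: call.name, args: call.args, id: call.id };
  }
  if (part.inlineData) {
    return { type: 'media', mimeType: part.inlineData.mimeType, data: part.inlineData.data };
  }
  if (part.text !== undefined) {
    return part.thought
      ? { type: 'thought', thought: part.text }
      : { type: 'text', text: part.text };
  }
  return undefined;
}

const thoughtRoundTrip = fromGenAiPart(
  toGenAiPart({ type: 'thought', thought: 'check the cache' }),
);
const mediaWire = toGenAiPart({ type: 'media', mimeType: 'image/png', data: 'AAAA' });
```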
---

## 9. Composition ↔ ADK Agent Patterns

| Our CompositionConfig.pattern | ADK Agent type | Notes |
| --- | --- | --- |
| `'hierarchical'` | Any agent with `subAgents` | Default: parent calls sub-agents as tools |
| `'sequential'` | `SequentialAgent` | Runs children in order |
| `'parallel'` | `ParallelAgent` | Runs children concurrently, branch isolation |
| `'loop'` | `LoopAgent` | Repeats children until escalate or maxIterations |
| `'transfer'` | LlmAgent with `transfer_to_agent` tool | Peer-to-peer handoff |

### Branch isolation

ADK's `ParallelAgent` gives each child an isolated `branch` context:

- Children don't see peer events
- Each gets unique branch path: `"parent.child_0"`, `"parent.child_1"`
- Results merged after all complete

This maps to our `threadId`: each parallel branch gets a unique threadId. Events
from different branches are interleaved by the host.

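Since events from parallel branches arrive interleaved, a host that wants per-branch views can regroup them by `threadId`. The sketch below assumes a reduced event shape; `ThreadedEvent` and `groupByThread` are hypothetical names, and the branch paths are the example values from the list above.

```typescript
interface ThreadedEvent {
  threadId: string; // copied from ADK's event.branch
  text: string;
}

// Regroups an interleaved stream into one ordered list per branch.
function groupByThread(interleaved: ThreadedEvent[]): Record<string, string[]> {
  const threads: Record<string, string[]> = {};
  for (const ev of interleaved) {
    if (!threads[ev.threadId]) {
      threads[ev.threadId] = [];
    }
    threads[ev.threadId].push(ev.text); // arrival order is kept within a branch
  }
  return threads;
}

const byThread = groupByThread([
  { threadId: 'parent.child_0', text: 'A1' },
  { threadId: 'parent.child_1', text: 'B1' },
  { threadId: 'parent.child_0', text: 'A2' },
]);
```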
---

## 10. Summary: Gaps and Resolutions

### No gaps blocking ADK integration:

| Concern | Status | Resolution |
| --- | --- | --- |
| pauseOnToolCalls HITL | **Works** | Adapter maps to stream() cycle (§2) |
| ToolConfirmation | **Works** | Maps to ElicitationRequest (§2) |
| Auth requests | **Works** | Maps to ElicitationRequest (§2) |
| Plugin hooks (12 types) | **Works** | Maps to LifecycleInterceptor.fire() (§5) |
| Agent transfers | **Works** | Tool call (`transfer_to_agent`) + `stream_end` `reason: 'completed'` (§3) |
| State delta pattern | **Works** | SessionUpdateEvent or `_meta` (§3) |
| Branch isolation | **Works** | threadId mapping (§9) |
| AgentTool pattern | **Works** | SubagentTool with descriptor + factory (§4) |
| Session management | **Works** | Adapter maps userId into session (§7) |

### Minor adapter complexity:

1. **Event fan-out:** One ADK Event may become multiple AgentEvents (message +
   tool call + elicitation). Adapter logic needed but straightforward.
2. **userId scoping:** ADK sessions require userId; our interface doesn't.
   Adapter derives from HostContext.
3. **Timestamp format:** ADK uses `number` (epoch ms); we use ISO 8601 string.
   Simple conversion.
4. **Content structure:** ADK uses nested Part types; we use flat discriminated
   union. Adapter converts bidirectionally.

### ADK features our interface supports that gemini-cli doesn't have yet:

- `LoopAgent` / `ParallelAgent` / `SequentialAgent` composition → our
  CompositionConfig
- `eventActions.stateDelta` → our SessionUpdateEvent
- `eventActions.transferToAgent` → tool call (`transfer_to_agent`) +
  `stream_end` `reason: 'completed'`
- `eventActions.escalate` → `stream_end` `reason: 'completed'` with
  `data: { escalateReason }`
- Long-running tools → our ToolUpdateEvent
- Auth credential flow → our ElicitationRequest with kind: 'auth_required'

@@ -0,0 +1,274 @@
# ADK-TS (Agent Development Kit - TypeScript) Architecture Notes

## Package: `@google/adk` v0.4.0

**Location:** `/Users/adamfweidman/Desktop/adk-int/adk-js/core/`

## Agent Hierarchy

```
BaseAgent (abstract)
├── LlmAgent         - Model-driven agent with tools (the main one)
├── LoopAgent        - Runs sub-agents in a loop (maxIterations, escalate to exit)
├── ParallelAgent    - Runs sub-agents concurrently (isolated branches)
└── SequentialAgent  - Runs sub-agents sequentially
```

### BaseAgent Config

- `name: string` - Unique identifier (must be valid JS identifier)
- `description?: string` - One-line capability for model routing
- `parentAgent?: BaseAgent` - Parent in agent tree
- `subAgents?: BaseAgent[]` - Child agents
- `beforeAgentCallback / afterAgentCallback` - Pre/post execution hooks

### LlmAgent Config (extends BaseAgent)

- `model?: string | BaseLlm` - LLM to use
- `instruction?: string | InstructionProvider` - Agent-specific instructions
- `globalInstruction?: string | InstructionProvider` - Tree-wide (root only)
- `tools?: ToolUnion[]` - Available tools
- `generateContentConfig?: GenerateContentConfig` - LLM params
- `disallowTransferToParent / disallowTransferToPeers` - Transfer controls
- `includeContents?: 'default' | 'none'` - Context history inclusion
- `inputSchema / outputSchema` - Validation schemas
- `outputKey?: string` - Session state key for output storage
- `beforeModelCallback / afterModelCallback` - LLM hooks
- `beforeToolCallback / afterToolCallback` - Tool hooks
- `requestProcessors / responseProcessors` - LLM request/response processors
- `codeExecutor?: BaseCodeExecutor`

|
||||
## Event System
|
||||
|
||||
### Event Interface
|
||||
|
||||
```typescript
interface Event extends LlmResponse {
  id: string;
  invocationId: string;
  author?: string; // "user" or agent name
  actions: EventActions; // State/artifact/auth/transfer operations
  longRunningToolIds?: string[];
  branch?: string; // Hierarchical agent path
  timestamp: number;
  content?: Content;
  partial?: boolean; // Streaming indicator
}
```
|
||||
|
||||
### EventActions
|
||||
|
||||
```typescript
interface EventActions {
  skipSummarization?: boolean;
  stateDelta: Record<string, unknown>;
  artifactDelta: Record<string, number>;
  transferToAgent?: string;
  escalate?: boolean;
  requestedAuthConfigs: Record<string, AuthConfig>;
  requestedToolConfirmations: Record<string, ToolConfirmation>;
}
```
|
||||
|
||||
### Structured Events (utility layer)
|
||||
|
||||
Converts raw Event to discriminated union:
|
||||
|
||||
```
EventType: THOUGHT | CONTENT | TOOL_CALL | TOOL_RESULT | CALL_CODE |
           CODE_RESULT | ERROR | ACTIVITY | TOOL_CONFIRMATION | FINISHED
```
|
||||
|
||||
## Tool System
|
||||
|
||||
### BaseTool (abstract)
|
||||
|
||||
- `name, description, isLongRunning`
|
||||
- `_getDeclaration(): FunctionDeclaration` - OpenAPI schema for LLM
|
||||
- `runAsync(request): Promise<unknown>` - Execute tool
|
||||
- `processLlmRequest(request): Promise<void>` - Preprocessing
|
||||
|
||||
### Concrete Tool Types
|
||||
|
||||
1. **FunctionTool** - Generic typed tools (Zod schema support)
|
||||
2. **AgentTool** - Wrap agents as tools (for hierarchical composition)
|
||||
3. **MCPTool** - Model Context Protocol server tools
|
||||
4. **GoogleSearchTool** - Built-in web search
|
||||
5. **ExitLoopTool** - Signal loop exit
|
||||
6. **LongRunningFunctionTool** - Async long-running operations
|
||||
|
||||
### BaseToolset
|
||||
|
||||
- Filter tools by predicate or string list
|
||||
- `getTools(context)`, `close()`, `isToolSelected()`
|
||||
- **MCPToolset** - Toolset for MCP server connections
|
||||
|
||||
## Session Management
|
||||
|
||||
### Session Interface
|
||||
|
||||
```typescript
interface Session {
  id: string;
  appName: string;
  userId: string;
  state: Record<string, unknown>; // Mutable key-value store
  events: Event[]; // Complete conversation history
  lastUpdateTime: number;
}
```
|
||||
|
||||
### Session Services
|
||||
|
||||
- `BaseSessionService` (abstract) - createSession, getSession, listSessions,
|
||||
deleteSession, appendEvent
|
||||
- `InMemorySessionService` - In-process storage
|
||||
- `DatabaseSessionService` - Mikro-ORM backed (SQL)
|
||||
|
||||
### State Management
|
||||
|
||||
- `State` class wraps base state + delta
|
||||
- `get()` returns from delta if present, else base
|
||||
- `set()` updates delta only
|
||||
- `hasDelta()` checks if changes made
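
The delta-over-base behaviour in the bullets above can be sketched as follows. This is a minimal illustration, not the ADK-TS source: the class name `DeltaState` and the helper `pendingDelta()` are hypothetical, and only `get()`, `set()`, and `hasDelta()` come from the notes.

```typescript
// Hypothetical sketch of the delta-over-base idea; not the ADK-TS source.
class DeltaState {
  private readonly base: Record<string, unknown>;
  private readonly delta: Record<string, unknown> = {};

  constructor(base: Record<string, unknown>) {
    this.base = base;
  }

  // Reads prefer a pending delta value, then fall back to the base state.
  get(key: string): unknown {
    return key in this.delta ? this.delta[key] : this.base[key];
  }

  // Writes touch only the delta; the base object is left as it was.
  set(key: string, value: unknown): void {
    this.delta[key] = value;
  }

  // True once at least one key has been written.
  hasDelta(): boolean {
    return Object.keys(this.delta).length > 0;
  }

  // Hypothetical helper: copy of pending changes, e.g. for an event's `stateDelta`.
  pendingDelta(): Record<string, unknown> {
    return { ...this.delta };
  }
}

const state = new DeltaState({ theme: 'dark', turns: 1 });
state.set('turns', 2);
```

Keeping writes in a separate delta is what lets a change travel as `EventActions.stateDelta` rather than as a mutation of the stored session.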
|
||||
|
||||
## Human-in-the-Loop (HITL)
|
||||
|
||||
### Tool Confirmation
|
||||
|
||||
```typescript
class ToolConfirmation {
  hint?: string; // Guidance for user
  confirmed: boolean; // User approval
  payload?: unknown; // Additional context
}
```
|
||||
|
||||
### Security Plugin
|
||||
|
||||
- `beforeToolCallback` - Evaluates policy before tool execution
|
||||
- `BasePolicyEngine` interface with `evaluate()` method
|
||||
- `PolicyOutcome`: DENY | CONFIRM | ALLOW
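
A rough sketch of how a before-tool hook could act on the three outcomes. All shapes here (`ToolCall`, `PolicyEngine`, `ListPolicy`, `gateToolCall`) are hypothetical stand-ins; the real `BasePolicyEngine.evaluate()` may well be async and return a richer result.

```typescript
// Hypothetical shapes for illustration; the real ADK-TS interfaces may differ.
type PolicyOutcome = 'DENY' | 'CONFIRM' | 'ALLOW';

interface ToolCall {
  toolName: string;
  args: Record<string, unknown>;
}

interface PolicyEngine {
  evaluate(call: ToolCall): PolicyOutcome;
}

// Example policy: listed tools are always allowed or always denied,
// everything else needs a human decision.
class ListPolicy implements PolicyEngine {
  private readonly allowed: Set<string>;
  private readonly denied: Set<string>;

  constructor(allowed: string[], denied: string[]) {
    this.allowed = new Set(allowed);
    this.denied = new Set(denied);
  }

  evaluate(call: ToolCall): PolicyOutcome {
    if (this.denied.has(call.toolName)) return 'DENY';
    if (this.allowed.has(call.toolName)) return 'ALLOW';
    return 'CONFIRM';
  }
}

type GateDecision = 'run' | 'block' | 'ask_user';

// `userConfirmed` stands in for a `ToolConfirmation.confirmed` value that
// arrived from an earlier round trip; undefined means nobody was asked yet.
function gateToolCall(
  engine: PolicyEngine,
  call: ToolCall,
  userConfirmed?: boolean,
): GateDecision {
  const outcome = engine.evaluate(call);
  if (outcome === 'ALLOW') return 'run';
  if (outcome === 'DENY') return 'block';
  if (userConfirmed === undefined) return 'ask_user';
  return userConfirmed ? 'run' : 'block';
}

const policy = new ListPolicy(['read_file'], ['format_disk']);
```

Only the `CONFIRM` branch involves the user, which is where a `ToolConfirmation` request would be raised.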
|
||||
|
||||
### Auth Requests
|
||||
|
||||
- `context.requestCredential(authConfig)` - Request auth from user
|
||||
- `context.getAuthResponse(authConfig)` - Check for auth response
|
||||
- Sets `eventActions.requestedAuthConfigs[functionCallId]`
|
||||
|
||||
## Multi-Agent Patterns
|
||||
|
||||
### Agent Transfer
|
||||
|
||||
- LlmAgent injects `transfer_to_agent(agentName)` tool
|
||||
- Sets `eventActions.transferToAgent = targetAgentName`
|
||||
- Runner resolves target and continues
|
||||
- Can transfer to: sub-agents, parent (if not disabled), peers (if not disabled)
|
||||
|
||||
### Parallel Agent
|
||||
|
||||
- Runs all subAgents concurrently
|
||||
- Isolates each via `branch` context
|
||||
- Sub-agents don't see peer history
|
||||
- Merges event streams with fair ordering
|
||||
|
||||
### Loop Agent
|
||||
|
||||
- Repeatedly runs subAgents
|
||||
- `maxIterations` caps loop count
|
||||
- Exits on `event.actions.escalate === true`
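
The loop semantics above, sketched under simplifying assumptions: the event and agent shapes are hypothetical, and the generators are synchronous for brevity, whereas ADK agents yield events from async generators.

```typescript
// Hypothetical minimal shapes; real ADK-TS events carry many more fields.
interface LoopEvent {
  author: string;
  actions: { escalate?: boolean };
}

interface SubAgent {
  agentName: string;
  run: () => Iterable<LoopEvent>;
}

// Synchronous for brevity; ADK agents yield events from async generators.
function* runLoop(
  subAgents: SubAgent[],
  maxIterations?: number,
): Generator<LoopEvent> {
  let iteration = 0;
  while (maxIterations === undefined || iteration < maxIterations) {
    for (const agent of subAgents) {
      for (const event of agent.run()) {
        yield event;
        // An escalating event ends the whole loop, not just the current pass.
        if (event.actions.escalate === true) return;
      }
    }
    iteration++;
  }
}

let draftCount = 0;

const writer: SubAgent = {
  agentName: 'writer',
  run: function* () {
    draftCount++;
    yield { author: 'writer', actions: {} };
  },
};

const critic: SubAgent = {
  agentName: 'critic',
  run: function* () {
    // Escalate once the second draft has been reviewed.
    yield { author: 'critic', actions: { escalate: draftCount >= 2 } };
  },
};

const loopEvents = Array.from(runLoop([writer, critic], 5));
```

In this run the critic escalates on the second pass, so the loop stops well before the `maxIterations` cap of 5.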
|
||||
|
||||
## Plugin System
|
||||
|
||||
### BasePlugin Lifecycle Hooks (14 hooks!)
|
||||
|
||||
- `onUserMessageCallback` - Preprocess user messages
|
||||
- `beforeRunCallback` - Before agent run (can short-circuit)
|
||||
- `onEventCallback` - Per-event (can modify events)
|
||||
- `afterRunCallback` - Final cleanup
|
||||
- `beforeAgentCallback / afterAgentCallback` - Agent lifecycle
|
||||
- `beforeModelCallback / afterModelCallback` - LLM lifecycle
|
||||
- `onModelErrorCallback` - Model error handling
|
||||
- `beforeToolCallback / afterToolCallback` - Tool lifecycle
|
||||
- `onToolErrorCallback` - Tool error handling
|
||||
|
||||
### Built-in Plugins
|
||||
|
||||
- **LoggingPlugin** - Debug logging
|
||||
- **SecurityPlugin** - Policy enforcement + tool confirmation
|
||||
- **PluginManager** - Plugin orchestration
|
||||
|
||||
## Runner
|
||||
|
||||
### Runner Config
|
||||
|
||||
```typescript
interface RunnerConfig {
  appName: string;
  agent: BaseAgent; // Root agent
  plugins?: BasePlugin[];
  artifactService?: BaseArtifactService;
  sessionService: BaseSessionService; // Required
  memoryService?: BaseMemoryService;
  credentialService?: BaseCredentialService;
}
```
|
||||
|
||||
### RunConfig (per-run options)
|
||||
|
||||
```typescript
interface RunConfig {
  speechConfig?: SpeechConfig;
  responseModalities?: Modality[];
  maxLlmCalls?: number; // Default 500
  pauseOnToolCalls?: boolean; // Client-side tool execution
  streamingMode?: StreamingMode; // NONE | SSE | BIDI
  // ... audio/live configs
}
```
|
||||
|
||||
### Execution Pipeline
|
||||
|
||||
1. Load or create session
|
||||
2. Create InvocationContext
|
||||
3. Run pluginManager.runOnUserMessageCallback()
|
||||
4. Append user message to session
|
||||
5. Run agent.runAsync(invocationContext) → yields events
|
||||
6. For each non-partial event: append to session
|
||||
7. Run pluginManager.runOnEventCallback()
|
||||
8. Run pluginManager.runAfterRunCallback()
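
The eight steps can be sketched as a single generator. This is a simplified model, not the ADK-TS `Runner`: every type and function name below is hypothetical, the generators are synchronous for brevity, and running the per-event plugin hook on partial events as well is an assumption.

```typescript
// Hypothetical, simplified shapes; not the ADK-TS Runner API.
interface PipelineEvent {
  author: string;
  text: string;
  partial?: boolean;
}

interface PipelineSession {
  id: string;
  events: PipelineEvent[];
}

interface PipelinePlugin {
  onUserMessage?(message: PipelineEvent): PipelineEvent | undefined;
  onEvent?(event: PipelineEvent): PipelineEvent | undefined;
  afterRun?(session: PipelineSession): void;
}

type PipelineAgent = (session: PipelineSession) => Iterable<PipelineEvent>;

const sessions = new Map<string, PipelineSession>();

function* runPipeline(
  sessionId: string,
  userText: string,
  agent: PipelineAgent,
  plugins: PipelinePlugin[],
): Generator<PipelineEvent> {
  // 1. Load or create the session.
  let session = sessions.get(sessionId);
  if (!session) {
    session = { id: sessionId, events: [] };
    sessions.set(sessionId, session);
  }
  // 2. InvocationContext creation is omitted; the session stands in for it.
  // 3. Let plugins preprocess the user message.
  let message: PipelineEvent = { author: 'user', text: userText };
  for (const plugin of plugins) {
    message = plugin.onUserMessage?.(message) ?? message;
  }
  // 4. Append the user message to the session.
  session.events.push(message);
  // 5. Run the agent and stream its events.
  for (const event of agent(session)) {
    // 6. Only non-partial events are persisted.
    if (!event.partial) session.events.push(event);
    // 7. Plugins may replace the event before it reaches the caller.
    let outgoing = event;
    for (const plugin of plugins) {
      outgoing = plugin.onEvent?.(outgoing) ?? outgoing;
    }
    yield outgoing;
  }
  // 8. Final plugin hook after the agent finishes.
  for (const plugin of plugins) plugin.afterRun?.(session);
}

const echoAgent: PipelineAgent = function* (session) {
  const last = session.events[session.events.length - 1];
  yield { author: 'echo', text: 'thinking', partial: true };
  yield { author: 'echo', text: `You said: ${last?.text ?? ''}` };
};

let afterRunCalls = 0;
const countingPlugin: PipelinePlugin = {
  afterRun: () => {
    afterRunCalls++;
  },
};

const emitted = Array.from(runPipeline('s1', 'hi', echoAgent, [countingPlugin]));
```

The caller sees both the partial and the final event, while the stored session keeps only the user message and the final event.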
|
||||
|
||||
## Model Layer
|
||||
|
||||
### BaseLlm (abstract)
|
||||
|
||||
- `generateContentAsync(llmRequest, stream?): AsyncGenerator<LlmResponse>`
|
||||
- `connect(llmRequest): Promise<BaseLlmConnection>` - For live/streaming
|
||||
|
||||
### Implementations
|
||||
|
||||
- `Gemini` - Google Gemini API
|
||||
- `ApigeeLlm` - Apigee-wrapped models
|
||||
- `LLMRegistry` - Static registry for model lookup
|
||||
|
||||
## Service Adapters (all abstract base + implementations)
|
||||
|
||||
| Service               | Implementations                |
| --------------------- | ------------------------------ |
| BaseSessionService    | InMemory, Database (Mikro-ORM) |
| BaseArtifactService   | InMemory, File, GCS            |
| BaseMemoryService     | InMemory                       |
| BaseCredentialService | InMemory                       |
| BaseCodeExecutor      | BuiltIn                        |
|
||||
|
||||
## Design Patterns
|
||||
|
||||
1. **Symbol-based type guards** - Every class uses `Symbol.for()` + `isXxx()`
|
||||
2. **Abstract base classes** - Service interfaces via abstract classes
|
||||
3. **Async generators** - All agent execution yields events
|
||||
4. **Context objects** - Rich context passed to callbacks/tools
|
||||
5. **Delta state** - Session state + event action deltas
|
||||
6. **Plugin middleware** - 14 hooks at multiple execution points
|
||||
7. **Tree-based hierarchy** - Parent-child agents with root traversal
|
||||
8. **Branch isolation** - Parallel agents use branch paths
|
||||
9. **Callback chains** - Multiple callbacks per stage with early termination
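
Pattern 1 (symbol-based type guards) could look roughly like this. The brand key `'sketch.tool'` and the names `SketchTool` / `isSketchTool` are hypothetical; these notes do not show ADK-TS's actual symbol names.

```typescript
// Hypothetical brand key; Symbol.for() returns the same symbol for the same
// key everywhere in the runtime, even across duplicated copies of a package.
const SKETCH_TOOL_BRAND = Symbol.for('sketch.tool');

class SketchTool {
  readonly toolName: string;

  constructor(toolName: string) {
    this.toolName = toolName;
    // Brand the instance so the guard does not depend on `instanceof`.
    Object.defineProperty(this, SKETCH_TOOL_BRAND, { value: true });
  }
}

function isSketchTool(value: unknown): value is SketchTool {
  return (
    typeof value === 'object' &&
    value !== null &&
    Reflect.get(value, SKETCH_TOOL_BRAND) === true
  );
}

// Simulates an object created by a second copy of the library: it is not an
// instance of this file's class, but it carries the same registry symbol.
const foreignCopy = { toolName: 'search', [Symbol.for('sketch.tool')]: true };
```

The guard accepts `foreignCopy` even though `foreignCopy instanceof SketchTool` is false, which is the usual motivation for branding with `Symbol.for()` instead of relying on class identity.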
|
||||
@@ -0,0 +1,587 @@
|
||||
# Cross-SDK Comparison: Events, Agents, and Interface Superset
|
||||
|
||||
## 1. AgentEvents: Our Outline vs Michael's
|
||||
|
||||
Our outline and Michael's `Gemini CLI Agents.txt` are **nearly identical** in
event taxonomy. The only difference in the event list is the `stream_end` event
type we added:
|
||||
|
||||
| #   | Michael's Events       | Our Outline           | Delta                                                                           |
| --- | ---------------------- | --------------------- | ------------------------------------------------------------------------------- |
| 1   | `initialize`           | `InitializeEvent`     | Same                                                                            |
| 2   | `session_update`       | `SessionUpdateEvent`  | Same                                                                            |
| 3   | `message`              | `MessageEvent`        | Same — streaming handled by AsyncGenerator                                      |
| 4   | `tool_request`         | `ToolRequestEvent`    | Same                                                                            |
| 5   | `tool_update`          | `ToolUpdateEvent`     | Same                                                                            |
| 6   | `tool_response`        | `ToolResponseEvent`   | Same                                                                            |
| 7   | `elicitation_request`  | `ElicitationRequest`  | Same                                                                            |
| 8   | `elicitation_response` | `ElicitationResponse` | Same                                                                            |
| 9   | `usage`                | `UsageEvent`          | Same                                                                            |
| 10  | `error`                | `ErrorEvent`          | Same                                                                            |
| 11  | `custom`               | `CustomEvent`         | Same                                                                            |
| 12  | —                      | **StreamEnd**         | **Added**: completed, failed, aborted, max_turns, max_budget, max_time, refusal |
|
||||
|
||||
### Minor structural differences:
|
||||
|
||||
| Aspect                 | Michael                                             | Our Outline                                                               |
| ---------------------- | --------------------------------------------------- | ------------------------------------------------------------------------- |
| **Base type**          | `AgentEventCommon` with `type: string` (fully open) | `AgentEventBase` with `type: AgentEventType` (`'known' \| (string & {})`) |
| **Agent ID**           | —                                                   | `agentId` on event base (which agent emitted this event)                  |
| **Event map**          | Generic `interface AgentEvents` + mapped type       | Same — adopted Michael's pattern for declaration merging extensibility    |
| **ContentPart.\_meta** | Required (`_meta: Record<string, unknown>`)         | Optional (`_meta?: Record<string, unknown>`)                              |
| **ErrorData.status**   | Google RPC codes (`'RESOURCE_EXHAUSTED' \| '...'`)  | Open string (per our generic philosophy)                                  |
| **Message.role**       | `'user' \| 'agent' \| 'developer'`                  | Same                                                                      |
| **Stream end**         | Only `initialize`                                   | `stream_end` with `reason` field + open `data` bag                        |
| **Handoff**            | Not covered                                         | Tool call (`transfer_to_agent`) — no dedicated event                      |
| **Pausing**            | Implicit (elicitation/tool events)                  | Same — no explicit pause/resume events                                    |
|
||||
|
||||
### Design decisions adopted from Michael
|
||||
|
||||
1. **`interface AgentEvents` + mapped type** — Michael's pattern enables
|
||||
declaration merging, letting any module add new event types without modifying
|
||||
the base definition. Strictly better than an explicit union type.
|
||||
2. **`_meta` on ContentPart** — More extensible. We adopted it (as optional).
|
||||
3. **Implicit pausing** — No separate pause/resume events. When the agent emits
|
||||
an `elicitation_request` or `tool_request`, the stream naturally pauses. The
|
||||
host calls `stream()` to resume.
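
The `interface AgentEvents` + mapped type pattern from item 1 can be sketched as below. The payload fields and the `compact_boundary` entry are hypothetical placeholders; only the pattern itself comes from the notes. Across real module boundaries the second declaration would sit inside a `declare module` block, and it is shown in the same file here to keep the sketch self-contained.

```typescript
// Event map: one key per event type, value is that event's payload.
// Payload fields here are hypothetical placeholders.
interface AgentEvents {
  message: { role: 'user' | 'agent' | 'developer'; text: string };
  stream_end: { reason: string; data?: Record<string, unknown> };
}

// Declaration merging: a second declaration adds a new event type without
// editing the original definition.
interface AgentEvents {
  compact_boundary: { trigger: string };
}

// Mapped type: build one variant per key, discriminated by `type`, then
// index with keyof to turn the map into a union.
type AgentEvent = {
  [K in keyof AgentEvents]: { type: K; agentId?: string } & AgentEvents[K];
}[keyof AgentEvents];

// The union narrows on `type`, including the merged-in variant.
function summarize(event: AgentEvent): string {
  switch (event.type) {
    case 'message':
      return `${event.role}: ${event.text}`;
    case 'stream_end':
      return `ended (${event.reason})`;
    case 'compact_boundary':
      return `compacted (${event.trigger})`;
  }
}

const summary = summarize({ type: 'message', role: 'agent', text: 'hi' });
```

With an explicit union type, the `compact_boundary` variant could only be added by editing the union itself; with the interface, the mapped type picks it up automatically.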
|
||||
|
||||
---
|
||||
|
||||
## 2. Claude Agent SDK — Key Interfaces
|
||||
|
||||
Source: `@anthropic-ai/claude-agent-sdk`
|
||||
|
||||
### Agent Execution Model
|
||||
|
||||
```typescript
// Entry point: a function, not an interface
declare function query(params: {
  prompt: string | AsyncIterable<SDKUserMessage>;
  options?: Options;
}): Query; // extends AsyncGenerator<SDKMessage, void>
```
|
||||
|
||||
### Message Types (Event Stream)
|
||||
|
||||
```typescript
type SDKMessage =
  | SystemMessage // subtype: "init" | "compact_boundary"
  | AssistantMessage // Claude's response with tool calls
  | UserMessage // Tool results fed back
  | StreamEvent // Raw API stream events (opt-in)
  | ResultMessage // Final: success | error_max_turns | error_max_budget_usd | error_during_execution
  | CompactBoundaryMessage; // Context compaction marker
```
|
||||
|
||||
### Tool Approval (HITL)
|
||||
|
||||
```typescript
// Shape of the `canUseTool` option: an async callback that allows or denies a call
type CanUseTool = (
  toolName: string,
  input: Record<string, any>,
) => Promise<
  | { behavior: 'allow'; updatedInput: Record<string, any> }
  | { behavior: 'deny'; message: string }
>;
```
|
||||
|
||||
### Subagent Definition
|
||||
|
||||
```typescript
interface AgentDefinition {
  description: string; // When to invoke
  prompt: string; // System prompt
  tools?: string[]; // Available tools (defaults to all)
  model?: 'sonnet' | 'opus' | 'haiku' | 'inherit';
}
```
|
||||
|
||||
### Session Management
|
||||
|
||||
```typescript
interface Options {
  continue?: boolean; // Resume most recent session
  resume?: string; // Resume by session ID
  forkSession?: boolean; // Branch from resume point
  persistSession?: boolean; // Default: true
  maxTurns?: number;
  maxBudgetUsd?: number; // Spend limit
  permissionMode?: 'default' | 'acceptEdits' | 'plan' | 'dontAsk' | 'bypassPermissions';
  structuredOutput?: { type: "json_schema", ... };
}
```
|
||||
|
||||
### Result (Termination)
|
||||
|
||||
```typescript
interface SDKResultMessage {
  type: 'result';
  subtype:
    | 'success'
    | 'error_max_turns'
    | 'error_max_budget_usd'
    | 'error_during_execution'
    | 'error_max_structured_output_retries';
  result?: string;
  total_cost_usd: number;
  usage: { input_tokens: number; output_tokens: number };
  num_turns: number;
  session_id: string;
  stop_reason: string | null; // "end_turn", "max_tokens", "refusal"
}
```
|
||||
|
||||
### V2 Preview (Simpler API)
|
||||
|
||||
```typescript
await using session = unstable_v2_createSession({ model: "..." });
await session.send("Hello!");
for await (const msg of session.stream()) { ... }
await session.send("Follow-up");
for await (const msg of session.stream()) { ... }
```
|
||||
|
||||
---
|
||||
|
||||
## 3. OpenAI Codex SDK / Responses API — Key Interfaces
|
||||
|
||||
### Codex SDK (TypeScript)
|
||||
|
||||
```typescript
// Client
const codex = new Codex({ env?, config? });
const thread = codex.startThread({ workingDirectory?, skipGitRepoCheck? });
const thread = codex.resumeThread(threadId);

// Execution
const turn = await thread.run(prompt: string | InputEntry[], options?);
const { events } = await thread.runStreamed(prompt);

// Streaming
for await (const event of events) {
  switch (event.type) {
    case "item.completed": // event.item
    case "turn.completed": // event.usage
  }
}
```
|
||||
|
||||
### Responses API Streaming Events (53 types)
|
||||
|
||||
Organized hierarchically:
|
||||
|
||||
**Response Lifecycle (7):**
|
||||
|
||||
- `response.queued`, `response.created`, `response.in_progress`
|
||||
- `response.completed`, `response.incomplete`, `response.failed`
|
||||
- `error`
|
||||
|
||||
**Content Streaming (8):**
|
||||
|
||||
- `response.output_item.added`, `response.output_item.done`
|
||||
- `response.content_part.added`, `response.content_part.done`
|
||||
- `response.output_text.delta`, `response.output_text.done`
|
||||
- `response.refusal.delta`, `response.refusal.done`
|
||||
|
||||
**Reasoning (6):**
|
||||
|
||||
- `response.reasoning_text.delta`, `response.reasoning_text.done`
|
||||
- `response.reasoning_summary_part.added`,
|
||||
`response.reasoning_summary_part.done`
|
||||
- `response.reasoning_summary_text.delta`,
|
||||
`response.reasoning_summary_text.done`
|
||||
|
||||
**Function Calls (2):**
|
||||
|
||||
- `response.function_call_arguments.delta`,
|
||||
`response.function_call_arguments.done`
|
||||
|
||||
**MCP (8):**
|
||||
|
||||
- `response.mcp_call_arguments.delta`, `response.mcp_call_arguments.done`
|
||||
- `response.mcp_call.in_progress`, `response.mcp_call.completed`,
|
||||
`response.mcp_call.failed`
|
||||
- `response.mcp_list_tools.in_progress`, `response.mcp_list_tools.completed`,
|
||||
`response.mcp_list_tools.failed`
|
||||
|
||||
**Built-in Tools (15):**
|
||||
|
||||
- File search: `in_progress`, `searching`, `completed`
|
||||
- Web search: `in_progress`, `searching`, `completed`
|
||||
- Code interpreter: `in_progress`, `interpreting`, `code.delta`, `code.done`,
|
||||
`completed`
|
||||
- Image gen: `in_progress`, `generating`, `partial_image`, `completed`
|
||||
|
||||
**Audio (4):**
|
||||
|
||||
- `response.audio.delta`, `response.audio.done`
|
||||
- `response.audio.transcript.delta`, `response.audio.transcript.done`
|
||||
|
||||
**Annotations (1):**
|
||||
|
||||
- `response.output_text.annotation.added`
|
||||
|
||||
### OpenAI Agents SDK (higher-level)
|
||||
|
||||
```python
# Python-first, but patterns apply
class RunItemStreamEvent:
    name: Literal[
        "message_output_created",
        "handoff_requested",
        "handoff_occurred",
        "tool_called",
        "tool_output",
        "tool_search_called",
        "tool_search_output_created",
        "reasoning_item_created",
        "mcp_approval_requested",
        "mcp_approval_response",
        "mcp_list_tools",
    ]

class AgentUpdatedStreamEvent:
    # Fires when current agent changes (handoff)
    new_agent: Agent
```
|
||||
|
||||
---
|
||||
|
||||
## 4. Superset Analysis — What Changes Our Interfaces?
|
||||
|
||||
### Concepts Present in Most or All Systems
|
||||
|
||||
| Concept               | gemini-cli | ADK-TS | Claude SDK    | Codex/OpenAI   | Our Interfaces          |
| --------------------- | ---------- | ------ | ------------- | -------------- | ----------------------- |
| Text streaming        | ✅         | ✅     | ✅            | ✅             | ✅ MessageEvent         |
| Tool request/response | ✅         | ✅     | ✅            | ✅             | ✅ ToolRequest/Response |
| Thinking/reasoning    | ✅         | ✅     | ✅ (thinking) | ✅ (reasoning) | ✅ ContentPart.thought  |
| Error events          | ✅         | ✅     | ✅            | ✅             | ✅ ErrorEvent           |
| Token usage           | ✅         | ✅     | ✅            | ✅             | ✅ UsageEvent           |
| Tool progress         | ✅         | ✅     | —             | ✅             | ✅ ToolUpdateEvent      |
| Session resume        | ✅         | ✅     | ✅            | ✅             | ✅ sessionRef           |
| Subagents             | ✅         | ✅     | ✅            | —              | ✅ threadId             |
| Abort/cancel          | ✅         | ✅     | ✅            | ✅             | ✅ abort()              |
| Metadata escape hatch | —          | ✅     | —             | —              | ✅ \_meta               |
|
||||
|
||||
### NEW Concepts From Claude/Codex That We Should Incorporate
|
||||
|
||||
#### 4.1 Structured Stream End Reasons (HIGH PRIORITY)
|
||||
|
||||
**What:** Claude SDK has typed termination:
|
||||
`success | error_max_turns | error_max_budget_usd | error_during_execution`.
|
||||
OpenAI has `completed | incomplete | failed`.
|
||||
|
||||
**Why it matters:** We need a `stream_end` event that captures why the stream
|
||||
ended — the one signal not covered by other event types.
|
||||
|
||||
**Final design — `stream_end` with `reason` + open `data` bag:**
|
||||
|
||||
```typescript
type StreamEndReason =
  | 'completed'
  | 'failed'
  | 'aborted'
  | 'max_turns'
  | 'max_budget'
  | 'max_time'
  | 'refusal'
  | (string & {});

interface StreamEnd {
  reason: StreamEndReason;
  data?: Record<string, unknown>; // { result?, cost?, usage?, numTurns?, error?, ... }
}
```
|
||||
|
||||
**Design rationale:**
|
||||
|
||||
- Start is covered by `initialize`. Pausing is implicit (elicitation/tool
|
||||
request events). Handoff is a tool call (`transfer_to_agent`).
|
||||
- End-of-stream details go in `data` as an open bag, not fixed fields.
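
As an illustration of how an adapter might fill this event, the sketch below maps a Claude-style result onto `StreamEnd`. The mapping table, the `ResultLike` subset, and the function name `toStreamEnd` are assumptions made for the example, not an agreed design.

```typescript
type StreamEndReason =
  | 'completed'
  | 'failed'
  | 'aborted'
  | 'max_turns'
  | 'max_budget'
  | 'max_time'
  | 'refusal'
  | (string & {});

interface StreamEnd {
  reason: StreamEndReason;
  data?: Record<string, unknown>;
}

// Hypothetical subset of the result fields listed for the Claude SDK.
interface ResultLike {
  subtype: string;
  stop_reason: string | null;
  total_cost_usd: number;
  num_turns: number;
  result?: string;
}

// Assumed mapping: typed termination subtypes become reasons, a refusal
// stop reason wins over plain success, anything unknown counts as failed.
function toStreamEnd(result: ResultLike): StreamEnd {
  const data: Record<string, unknown> = {
    cost: result.total_cost_usd,
    numTurns: result.num_turns,
    result: result.result,
  };
  switch (result.subtype) {
    case 'success':
      return {
        reason: result.stop_reason === 'refusal' ? 'refusal' : 'completed',
        data,
      };
    case 'error_max_turns':
      return { reason: 'max_turns', data };
    case 'error_max_budget_usd':
      return { reason: 'max_budget', data };
    default:
      return { reason: 'failed', data };
  }
}

const ended = toStreamEnd({
  subtype: 'error_max_turns',
  stop_reason: null,
  total_cost_usd: 0.12,
  num_turns: 3,
});
```

Cost and turn counts land in the open `data` bag, so the event shape stays the same for SDKs that report different details.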
|
||||
|
||||
#### 4.2 Budget Constraints (MEDIUM PRIORITY)
|
||||
|
||||
**What:** Claude SDK has `maxBudgetUsd`. Neither gemini-cli nor ADK has this
|
||||
today.
|
||||
|
||||
**Why it matters:** Cost control is critical for production deployments.
|
||||
|
||||
**Proposed change to AgentConstraints:**
|
||||
|
||||
```typescript
interface AgentConstraints {
  maxTurns?: number;
  maxTimeMinutes?: number;
  maxLlmCalls?: number;
  maxBudgetUsd?: number; // NEW: from Claude SDK
}
```
|
||||
|
||||
#### 4.3 Session Forking (MEDIUM PRIORITY)
|
||||
|
||||
**What:** Claude SDK supports `forkSession: boolean` — branch from a resume
|
||||
point to explore alternatives.
|
||||
|
||||
**Why it matters:** Enables "what if" exploration without destroying history.
|
||||
Useful for plan mode.
|
||||
|
||||
**Proposed change to ExecutionRequest:**
|
||||
|
||||
```typescript
interface ExecutionRequest {
  // ... existing fields ...
  sessionRef?: string | SessionSnapshot;
  forkSession?: boolean; // NEW: branch from sessionRef instead of continuing
}
```
|
||||
|
||||
#### 4.4 Permission Modes on Execution (MEDIUM PRIORITY)
|
||||
|
||||
**What:** Claude has 5 permission modes:
|
||||
`default | acceptEdits | plan | dontAsk | bypassPermissions`. gemini-cli has 4
|
||||
approval modes: `default | autoEdit | yolo | plan`.
|
||||
|
||||
**Why it matters:** Both systems have this concept. It should be in
|
||||
ExecutionOptions, not hard-coded.
|
||||
|
||||
**Proposed change to ExecutionOptions:**
|
||||
|
||||
```typescript
interface ExecutionOptions {
  // ... existing fields ...
  permissionMode?: string; // Open string. Conventions: 'default' | 'auto_edit' | 'autonomous' | 'plan' | string
}
```
|
||||
|
||||
#### 4.5 Agent Handoff (MEDIUM PRIORITY)
|
||||
|
||||
**What:** OpenAI Agents SDK has explicit `handoff_requested` /
|
||||
`handoff_occurred` events plus `AgentUpdatedStreamEvent`. ADK has
|
||||
`transfer_to_agent` tool + `eventActions.transferToAgent`. Claude SDK has
|
||||
subagent invocation via Agent tool.
|
||||
|
||||
**Why it matters:** When agent A delegates to agent B, the host/UI needs to
|
||||
know.
|
||||
|
||||
**Design decision: Handoff is a tool call, not a separate event type.**
|
||||
|
||||
The agent calls `transfer_to_agent` as a tool (ToolRequest event). The host
|
||||
intercepts this tool call (since host controls tool execution), looks up the
|
||||
target agent, creates a new executor via the factory, and mediates the handoff.
|
||||
The originating agent's stream ends with `stream_end` reason `'completed'`.
|
||||
|
||||
```typescript
// 1. Agent emits tool request:
{ type: 'tool_request', name: 'transfer_to_agent', args: { target: 'coder', reason: '...' } }

// 2. Host mediates handoff, originating agent completes:
{ type: 'stream_end', reason: 'completed', agentId: 'planner', data: { handoffTarget: 'coder' } }
```
|
||||
|
||||
This avoids duplicating routing logic between stream_end events and tool calls.
|
||||
Matches ADK's `transfer_to_agent` tool pattern.
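
A host-side sketch of this mediation, under stated assumptions: `mediateHandoff`, `Executor`, and `ExecutorFactory` are hypothetical names, and the event shapes are trimmed to the fields used in the example above.

```typescript
// Trimmed, hypothetical shapes matching the two events in the example above.
interface ToolRequest {
  type: 'tool_request';
  name: string;
  args: Record<string, unknown>;
}

interface StreamEndEvent {
  type: 'stream_end';
  reason: string;
  agentId: string;
  data?: Record<string, unknown>;
}

interface Executor {
  agentId: string;
}

type ExecutorFactory = (agentName: string) => Executor | undefined;

interface Handoff {
  next: Executor; // executor for the target agent
  end: StreamEndEvent; // closes the originating agent's stream
}

// Returns undefined for ordinary tool calls and for unknown targets, so the
// host can fall through to its normal tool execution path.
function mediateHandoff(
  request: ToolRequest,
  fromAgentId: string,
  createExecutor: ExecutorFactory,
): Handoff | undefined {
  if (request.name !== 'transfer_to_agent') return undefined;
  const target = request.args['target'];
  if (typeof target !== 'string') return undefined;
  const next = createExecutor(target);
  if (next === undefined) return undefined;
  return {
    next,
    end: {
      type: 'stream_end',
      reason: 'completed',
      agentId: fromAgentId,
      data: { handoffTarget: target },
    },
  };
}

const knownAgents = new Map<string, Executor>([['coder', { agentId: 'coder' }]]);
const factory: ExecutorFactory = (agentName) => knownAgents.get(agentName);

const handoff = mediateHandoff(
  {
    type: 'tool_request',
    name: 'transfer_to_agent',
    args: { target: 'coder', reason: 'implementation work' },
  },
  'planner',
  factory,
);
```

Because the routing decision lives entirely in the tool-call path, the `stream_end` event only records what happened and carries no routing logic of its own.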
|
||||
|
||||
#### 4.6 Refusal as Distinct Signal (LOW PRIORITY)
|
||||
|
||||
**What:** OpenAI has explicit `response.refusal.delta/done` events. Claude has
|
||||
`stop_reason: "refusal"`.
|
||||
|
||||
**Why it matters:** Model refusals are operationally important (safety, policy).
|
||||
|
||||
**Proposed:** No new event type. Handle via `MessageEvent` with a `refusal`
|
||||
content part type, or via `ErrorEvent` with specific error code. ContentPart can
|
||||
be extended:
|
||||
|
||||
```typescript
| { type: 'refusal'; text: string }
```
|
||||
|
||||
#### 4.7 Content Annotations (LOW PRIORITY)
|
||||
|
||||
**What:** OpenAI has `response.output_text.annotation.added` for citations, file
|
||||
paths.
|
||||
|
||||
**Why it matters:** Citations and source attribution are increasingly important.
|
||||
|
||||
**Proposed:** Michael's `reference` ContentPart already covers this. No change
|
||||
needed — `reference` with `uri` and `text` handles citations.
|
||||
|
||||
#### 4.8 Context Compaction Events (LOW PRIORITY)
|
||||
|
||||
**What:** Claude SDK has `CompactBoundaryMessage` marking when context was
|
||||
compressed.
|
||||
|
||||
**Why it matters:** For long sessions, knowing when context was compressed helps
|
||||
with debugging and UI.
|
||||
|
||||
**Proposed:** `CustomEvent` with `kind: 'compact_boundary'`. No new event type
|
||||
needed.
|
||||
|
||||
#### 4.9 Structured Output Schema (ALREADY COVERED)
|
||||
|
||||
**What:** Both Claude (`structuredOutput`) and OpenAI support JSON Schema output
|
||||
constraints.
|
||||
|
||||
**Status:** Already covered by `AgentDescriptor.outputSchema: JsonSchema`. No
|
||||
change needed.
|
||||
|
||||
### Concepts We DON'T Need to Adopt
|
||||
|
||||
| Concept                                                                   | Why Skip                                                                                                               |
| ------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------- |
| OpenAI's 53 granular streaming events                                     | Too coupled to Responses API internals. Our `ToolUpdateEvent` + `MessageEvent` via AsyncGenerator abstracts over this. |
| OpenAI's per-tool-type events (file_search, web_search, code_interpreter) | Tool-specific progress belongs in `ToolUpdateEvent.data`, not in the event taxonomy.                                   |
| Audio/Image streaming events                                              | Handle via `ToolUpdateEvent` with media ContentParts. When needed, add as ContentPart types, not event types.          |
| Claude's raw `StreamEvent` wrapper                                        | Implementation detail of the Claude API client. Our adapters consume these internally.                                 |
| MCP-specific events (mcp_call, mcp_list_tools)                            | MCP tools are just tools. Use generic `ToolRequestEvent/ToolResponseEvent`. MCP approval is an `ElicitationRequest`.   |
|
||||
|
||||
---
|
||||
|
||||
## 5. Updated Event Type Comparison (Full Superset)
|
||||
|
||||
| #   | Event Type           | Michael | Our Outline | Claude SDK                    | OpenAI                              | Verdict                                        |
| --- | -------------------- | ------- | ----------- | ----------------------------- | ----------------------------------- | ---------------------------------------------- |
| 1   | Initialize           | ✅      | ✅          | SystemMessage(init)           | —                                   | **Keep**                                       |
| 2   | Session Update       | ✅      | ✅          | —                             | —                                   | **Keep**                                       |
| 3   | Message              | ✅      | ✅          | AssistantMessage              | output_text.delta/done              | **Keep**                                       |
| 4   | Tool Request         | ✅      | ✅          | AssistantMessage.tool_use     | function_call_arguments             | **Keep**                                       |
| 5   | Tool Update          | ✅      | ✅          | —                             | per-tool progress events            | **Keep**                                       |
| 6   | Tool Response        | ✅      | ✅          | UserMessage                   | —                                   | **Keep**                                       |
| 7   | Elicitation Request  | ✅      | ✅          | canUseTool callback           | mcp_approval_requested              | **Keep**                                       |
| 8   | Elicitation Response | ✅      | ✅          | canUseTool return             | mcp_approval_response               | **Keep**                                       |
| 9   | Usage                | ✅      | ✅          | ResultMessage.usage           | response.completed                  | **Keep**                                       |
| 10  | Error                | ✅      | ✅          | ResultMessage(error\_\*)      | response.failed                     | **Keep**                                       |
| 11  | Custom               | ✅      | ✅          | —                             | —                                   | **Keep**                                       |
| 12  | StreamEnd            | —       | ✅          | ResultMessage + SystemMessage | response.completed/incomplete/failed | **Keep — `stream_end` with `reason` + `data`** |
|
||||
|
||||
**Result: Our 12 event types are the right abstraction level.** Claude and
|
||||
OpenAI validate every category. The granularity differences (OpenAI's 53 vs
|
||||
our 12) are implementation details that adapters handle internally. `stream_end`
|
||||
uses a single `reason` field with an open `data` bag. Handoff is a tool call.
|
||||
Pausing is implicit.
|
||||
|
||||
---
|
||||
|
||||
## 6. Updated ContentPart Types (Superset)
|
||||
|
||||
```typescript
type ContentPart = (
  | { type: 'text'; text: string }
  | { type: 'thought'; thought: string; thoughtSignature?: string }
  | { type: 'media'; data?: string; uri?: string; mimeType?: string }
  | {
      type: 'reference';
      text: string;
      data?: string;
      uri?: string;
      mimeType?: string;
    }
  | { type: 'refusal'; text: string } // NEW: from OpenAI
) &
  // Future: type: string for unknown types from new SDKs
  { _meta?: Record<string, unknown> };
```
|
||||
|
||||
Adding `refusal` as a ContentPart type (rather than a new event) keeps the event
|
||||
taxonomy stable while supporting model refusals from both Claude and OpenAI.
|
||||
|
||||
---
|
||||
|
||||
## 7. Key Architectural Patterns Across SDKs
|
||||
|
||||
### Pattern: Execution Entry Points
|
||||
|
||||
| SDK         | Entry Point                                                               | Multi-turn Pattern                              |
| ----------- | ------------------------------------------------------------------------- | ----------------------------------------------- |
| Michael     | `agent.send(trajectory, data)` / `session.send()` + `session.update()`    | Same method / three-method session              |
| Our Outline | `session.stream(data)` + `session.update(config)` + `session.steer(data)` | Four-method session (stream/update/steer/abort) |
| Claude SDK  | `query({ prompt, options })`                                              | New `query()` call with `resume: sessionId`     |
| Claude V2   | `session.send()` + `session.stream()`                                     | Separate send/stream                            |
| Codex SDK   | `thread.run(prompt)` / `thread.runStreamed(prompt)`                       | Same thread object                              |
|
||||
|
||||
**Observation:** Claude V2 and Codex both use a stateful session/thread object:
Claude V2 splits `send()` from `stream()`, while Codex runs each turn on the
same thread via `run()` / `runStreamed()`. Michael uses a single `send()`
method. Our `stream()` method is the unified version: the first call starts the
session and subsequent calls continue it (like ADK's `runAsync()`).
|
||||
|
||||
### Pattern: Tool Approval
|
||||
|
||||
| SDK         | Pattern                                                  | Sync/Async               |
| ----------- | -------------------------------------------------------- | ------------------------ |
| gemini-cli  | PolicyEngine + ConfirmationBus                           | Async (message bus)      |
| ADK-TS      | SecurityPlugin.policyCheck()                             | Async (plugin callback)  |
| Claude SDK  | `canUseTool()` callback                                  | Async (callback)         |
| OpenAI      | `mcp_approval_requested` event                           | Event-based              |
| Our Outline | `ElicitationRequest` event + `PolicyEvaluator` interface | Both (event + interface) |
|
||||
|
||||
**Observation:** Our approach covers both patterns: the `ElicitationRequest`
event for event-based approval (like OpenAI), and the `PolicyEvaluator`
interface for in-process policy checks invoked as callbacks (like
gemini-cli/ADK/Claude, all of which are async per the table above). This is the
right superset.
|
||||
|
||||
### Pattern: Subagent Definition
|
||||
|
||||
| SDK           | Pattern                          | Key Fields                                                                                 |
| ------------- | -------------------------------- | ------------------------------------------------------------------------------------------ |
| gemini-cli    | `AgentDefinition` (local/remote) | name, description, kind, tools, model                                                      |
| ADK-TS        | `BaseAgentConfig`                | name, description, subAgents, tools                                                        |
| Claude SDK    | `AgentDefinition`                | description, prompt, tools, model                                                          |
| OpenAI Agents | `Agent` class                    | name, instructions, tools, handoffs, model                                                 |
| Our Outline   | `AgentDescriptor`                | name, description, executor, inputSchema, capabilities, ownTools, requiredTools, subAgents |
|
||||
|
||||
**Observation:** Our `AgentDescriptor` is the most complete. Claude's `prompt`
|
||||
field and OpenAI's `instructions` are executor-level concerns (system prompt),
|
||||
not descriptor-level. The descriptor declares identity; the executor uses the
|
||||
prompt. This separation is correct.
|
||||
|
||||
One gap: **handoffs**. OpenAI Agents has an explicit `handoffs` field listing
|
||||
which agents can be delegated to. Our `subAgents` field serves the same purpose
|
||||
but the naming implies hierarchy rather than peer delegation. Consider whether
|
||||
`subAgents` should be renamed to `delegateAgents` or kept as-is with
|
||||
documentation clarifying it covers both hierarchical and peer delegation.
|
||||
|
||||
---
|
||||
|
||||
## 8. Concrete Changes to outline.md
|
||||
|
||||
Based on this analysis, the following changes should be made:
|
||||
|
||||
### Applied (validated by multiple SDKs):
|
||||
|
||||
1. ✅ **`type: AgentEventType`** with known values + `(string & {})`
|
||||
(autocomplete + extensibility)
|
||||
2. ✅ **`interface AgentEvents` + mapped type** (adopted from Michael for
|
||||
declaration merging)
|
||||
3. ✅ **`agentId` on event base** (which agent emitted this event)
|
||||
4. ✅ **`_meta` on ContentPart** (aligned with Michael)
|
||||
5. ✅ **`stream_end` event** — signals why the stream ended, with `reason`
|
||||
field + open `data` bag
|
||||
6. ✅ **Handoff as tool call** — `transfer_to_agent` tool, not a separate event
|
||||
7. ✅ **`maxBudgetUsd` in AgentConstraints** (Claude SDK, increasingly standard)
|
||||
8. ✅ **`refusal` ContentPart type** (both Claude and OpenAI surface refusals)
|
||||
9. ✅ **`forkSession` in ExecutionRequest** (Claude SDK, valuable for
|
||||
exploration)
|
||||
10. ✅ **`permissionMode` in ExecutionOptions** (both gemini-cli and Claude SDK)
|
||||
11. ✅ **`cost` field on Usage** (Claude SDK tracks total_cost_usd)
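
Items 1 to 3 in the list above can be sketched together in TypeScript. This is a minimal illustration, not the outline's actual definitions: the payload fields (`text`, `callId`, `progress`) and the helper names `isKnownEventType` and `describeEvent` are assumptions made for the example.

```typescript
// Hypothetical payloads; the real definitions live in outline.md and may differ.
interface AgentEvents {
  message: { text: string };
  tool_update: { callId: string; progress: string };
  stream_end: { reason: string; data?: Record<string, unknown> };
}

// Known names autocomplete in editors; `(string & {})` keeps the type open
// so custom event names are still accepted.
type AgentEventType = keyof AgentEvents | (string & {});

// Mapped type: derive one event shape per key of AgentEvents. Because
// AgentEvents is an interface, other modules can add keys through
// declaration merging and this union grows with it.
type AgentEvent = {
  [K in keyof AgentEvents]: { type: K; agentId: string } & AgentEvents[K];
}[keyof AgentEvents];

const KNOWN_TYPES: readonly string[] = ['message', 'tool_update', 'stream_end'];

// Runtime check that narrows an open event name to a known one.
function isKnownEventType(type: AgentEventType): type is keyof AgentEvents {
  return KNOWN_TYPES.indexOf(type) !== -1;
}

// The `type` field acts as a discriminant, so each branch sees its own payload.
function describeEvent(event: AgentEvent): string {
  if (event.type === 'message') {
    return `${event.agentId}: ${event.text}`;
  }
  if (event.type === 'tool_update') {
    return `${event.agentId}: tool ${event.callId} ${event.progress}`;
  }
  return `${event.agentId}: stream ended (${event.reason})`;
}
```

The open union is what lets a backend emit an event name the host has never seen without a type error, while known names still get payload checking.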
|
||||
|
||||
### Correctly abstracted (no change needed):
|
||||
|
||||
- Event taxonomy (12 types) — validated as right abstraction level
|
||||
- `AgentDescriptor` shape — most complete across all SDKs
|
||||
- `AgentSession.stream/update/steer/abort` — covers all SDK patterns
|
||||
- ToolUpdate — correctly abstracts over OpenAI's 15+ tool-specific progress
|
||||
events
|
||||
- `ElicitationRequest/Response` — covers both callback and event patterns
|
||||
- `ContentPart` types — text/thought/media/reference/refusal
|
||||
|
||||
---
|
||||
|
||||
## Sources
|
||||
|
||||
- [Claude Agent SDK TypeScript Reference](https://platform.claude.com/docs/en/agent-sdk/typescript)
|
||||
- [Claude Agent SDK Streaming](https://platform.claude.com/docs/en/agent-sdk/streaming-output)
|
||||
- [Claude Agent SDK Sessions](https://platform.claude.com/docs/en/agent-sdk/sessions)
|
||||
- [Claude Agent SDK Subagents](https://platform.claude.com/docs/en/agent-sdk/subagents)
|
||||
- [OpenAI Codex SDK TypeScript](https://github.com/openai/codex/tree/main/sdk/typescript)
|
||||
- [OpenAI Codex SDK Docs](https://developers.openai.com/codex/sdk/)
|
||||
- [OpenAI Responses API Streaming Events](https://developers.openai.com/api/reference/resources/responses/streaming-events/)
|
||||
- [OpenAI Agents SDK Streaming](https://openai.github.io/openai-agents-python/streaming/)
|
||||
- [Responses API Streaming Guide (Community)](https://community.openai.com/t/responses-api-streaming-the-simple-guide-to-events/1363122)
|
||||
@@ -0,0 +1,259 @@
|
||||
# Gemini CLI Architecture Notes
|
||||
|
||||
## Project Structure
|
||||
|
||||
**Monorepo packages:**
|
||||
|
||||
- `packages/core/` - Main execution engine (the big one)
|
||||
- `packages/cli/` - CLI frontend
|
||||
- `packages/sdk/` - SDK for extensions
|
||||
- `packages/a2a-server/` - Agent-to-agent server
|
||||
- `packages/devtools/` - Dev utilities
|
||||
- `packages/vscode-ide-companion/` - VS Code extension
|
||||
|
||||
## Core Execution Loop
|
||||
|
||||
### GeminiClient (`core/src/core/client.ts` ~38KB)
|
||||
|
||||
- **Primary orchestrator** for user interactions
|
||||
- Manages session lifecycle, message routing, model selection
|
||||
- Coordinates hooks, context management, error recovery
|
||||
- Enforces `MAX_TURNS = 100` per session
|
||||
- Tracks `currentSequenceModel` for multi-turn stickiness
|
||||
- Handles history compression when context grows
|
||||
|
||||
### GeminiChat (`core/src/core/geminiChat.ts` ~34KB)
|
||||
|
||||
- Bidirectional LLM communication
|
||||
- Maintains `history[]` alternating user/model turns
|
||||
- Retry logic: max 2 attempts, 500ms delay for invalid responses
|
||||
- Fires `BeforeModel` and `AfterModel` hooks
|
||||
- Integrates ChatRecordingService for persistence
|
||||
|
||||
### Scheduler (`core/src/scheduler/scheduler.ts` ~23KB)
|
||||
|
||||
- **Three-phase event-driven**: Ingestion → Processing → Completion
|
||||
- Tool call state machine:
|
||||
`Validating → AwaitingApproval → Scheduled → Executing → Terminal`
|
||||
- Terminal states: `Success`, `Error`, `Cancelled`
|
||||
- Parallel execution for read-only and agent-type tools
|
||||
- Yields to event loop for user approval
|
||||
- Publishes state changes via MessageBus
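
The tool call state machine above can be written as a transition table. This is a sketch under assumptions: the notes only give the happy path and the three terminal states, so the extra edges here (for example `Validating → Scheduled` when no approval is needed, and cancellation from any non-terminal state) are guesses, and `canTransition`, `isTerminal`, and `advance` are hypothetical helper names.

```typescript
type ToolCallStatus =
  | 'Validating'
  | 'AwaitingApproval'
  | 'Scheduled'
  | 'Executing'
  | 'Success'
  | 'Error'
  | 'Cancelled';

// Allowed next states per state. Edges beyond the documented happy path are
// assumptions made for this sketch.
const NEXT: Record<ToolCallStatus, readonly ToolCallStatus[]> = {
  Validating: ['AwaitingApproval', 'Scheduled', 'Error', 'Cancelled'],
  AwaitingApproval: ['Scheduled', 'Cancelled'],
  Scheduled: ['Executing', 'Cancelled'],
  Executing: ['Success', 'Error', 'Cancelled'],
  Success: [],
  Error: [],
  Cancelled: [],
};

function canTransition(from: ToolCallStatus, to: ToolCallStatus): boolean {
  return NEXT[from].indexOf(to) !== -1;
}

// Terminal states are exactly those with no outgoing edges.
function isTerminal(status: ToolCallStatus): boolean {
  return NEXT[status].length === 0;
}

// Returns the new state, or throws if the move is not in the table.
function advance(from: ToolCallStatus, to: ToolCallStatus): ToolCallStatus {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal tool call transition: ${from} -> ${to}`);
  }
  return to;
}
```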
|
||||
|
||||
### CoreToolScheduler (`core/src/core/coreToolScheduler.ts` ~38KB)
|
||||
|
||||
- Sequential, queue-based tool processing
|
||||
- Validates policy via PolicyEngine
|
||||
- Confirmation handling via ToolModificationHandler (editor integration)
|
||||
- Uses MessageBus for async confirmation responses
|
||||
|
||||
## Tool System
|
||||
|
||||
### DeclarativeTool Pattern
|
||||
|
||||
- **Separation of concerns**: build() → validate → createInvocation() →
|
||||
execute()
|
||||
- `ToolBuilder` defines metadata (name, displayName, description, kind) + schema
|
||||
via `getSchema()`
|
||||
- `ToolInvocation` has: `getDescription()`, `toolLocations()`,
|
||||
`shouldConfirmExecute()`, `execute()`
|
||||
- `ToolResult` contains: `llmContent` (for LLM), `returnDisplay` (for UI), error
|
||||
details, tail calls
|
||||
|
||||
### BaseToolInvocation
|
||||
|
||||
- Abstract base with MessageBus integration for policy/confirmation
|
||||
- Three decision paths: ALLOW, DENY, ASK_USER via `getMessageBusDecision()`
|
||||
|
||||
### ToolRegistry (`core/src/tools/tool-registry.ts`)
|
||||
|
||||
- Registers tools via `registerTool()`
|
||||
- MCP tools with fully qualified names: `mcp_serverName_toolName`
|
||||
- Priority sorting: built-in → discovered → MCP (by server name)
|
||||
- Filters by active status based on configuration
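
The naming and ordering rules above can be illustrated with a small comparator. This is a sketch, not the registry's code: `RegisteredTool`, `mcpToolName`, and `sortTools` are hypothetical names, and the tie-breaking within a source group (plain string comparison of server names) is an assumption.

```typescript
type ToolSource = 'builtin' | 'discovered' | 'mcp';

// Hypothetical, simplified registry entry.
interface RegisteredTool {
  name: string;
  source: ToolSource;
  serverName?: string; // only meaningful for MCP tools
}

const SOURCE_RANK: Record<ToolSource, number> = { builtin: 0, discovered: 1, mcp: 2 };

// Fully qualified MCP tool name in the documented `mcp_serverName_toolName` shape.
function mcpToolName(server: string, tool: string): string {
  return `mcp_${server}_${tool}`;
}

// Built-in first, then discovered, then MCP tools ordered by server name.
function sortTools(tools: readonly RegisteredTool[]): RegisteredTool[] {
  return tools.slice().sort((a, b) => {
    const bySource = SOURCE_RANK[a.source] - SOURCE_RANK[b.source];
    if (bySource !== 0) return bySource;
    const sa = a.serverName ?? '';
    const sb = b.serverName ?? '';
    return sa < sb ? -1 : sa > sb ? 1 : 0;
  });
}
```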
|
||||
|
||||
### Confirmation System
|
||||
|
||||
- `ToolCallConfirmationDetails` union: edit, execute, MCP, info, ask_user,
|
||||
exit_plan_mode
|
||||
- `ToolConfirmationOutcome` enum: ProceedOnce, ProceedAlways, etc.
|
||||
- Async confirmation via MessageBus pub/sub
|
||||
|
||||
## Hooks System
|
||||
|
||||
### Hook Types (11 hook points)
|
||||
|
||||
| Hook                  | Trigger                 | Key Capability                    |
| --------------------- | ----------------------- | --------------------------------- |
| `BeforeTool`          | Before tool execution   | Modify tool_input                 |
| `AfterTool`           | After tool completion   | Context injection, tail calls     |
| `BeforeAgent`         | Before agent prompt     | Additional context                |
| `AfterAgent`          | After agent response    | Clear context flag                |
| `BeforeModel`         | Before LLM request      | Modify request or inject response |
| `AfterModel`          | After LLM response      | Modify response                   |
| `BeforeToolSelection` | Before tool selection   | Modify toolConfig                 |
| `Notification`        | When notifications fire | Suppress/modify message           |
| `SessionStart`        | Session begins          | Additional context                |
| `SessionEnd`          | Session terminates      | Cleanup                           |
| `PreCompress`         | Before compression      | Suppress/modify                   |

### Hook Output Fields (common to all hooks)
|
||||
|
||||
- `continue` - Whether execution proceeds
|
||||
- `stopReason` - Reason to halt
|
||||
- `suppressOutput` - Hide from user
|
||||
- `systemMessage` - Add to system context
|
||||
- `decision` - ask/block/deny/approve/allow
|
||||
|
||||
### Hook System Components
|
||||
|
||||
- `HookSystem` - Main coordinator
|
||||
- `HookRegistry` - Stores/manages configurations
|
||||
- `HookRunner` - Executes registered hooks
|
||||
- `HookAggregator` - Combines multiple hook results
|
||||
- `HookPlanner` - Determines execution order
|
||||
- `HookEventHandler` - Orchestrates event firing
|
||||
- `HookTranslator` - Converts between formats
|
||||
|
||||
## Policy Engine
|
||||
|
||||
### Rule Structure
|
||||
|
||||
```
PolicyRule {
  toolName: string;         // wildcards supported
  decision: PolicyDecision; // ALLOW | DENY | ASK_USER
  priority: number;
  argsPattern?: RegExp;     // conditional on args
  mcpName?: string;
  source: string;
}
```
|
||||
|
||||
### Tier Hierarchy (lowest → highest priority)
|
||||
|
||||
1. Default (1) - Core built-in policies
|
||||
2. Extension (2) - Extension contributions
|
||||
3. Workspace (3) - Project-scoped (.gemini/)
|
||||
4. User (4) - User-provided (~/.gemini/)
|
||||
5. Admin (5) - System-level policies
|
||||
|
||||
### Dynamic Rule Priorities (User Tier, except ALWAYS_ALLOW at 3.95)
|
||||
|
||||
- 4.9 - MCP_EXCLUDED (persistent server blocks)
|
||||
- 4.4 - EXCLUDE_TOOLS_FLAG (CLI exclusions)
|
||||
- 4.3 - ALLOWED_TOOLS_FLAG (CLI allows)
|
||||
- 4.2 - TRUSTED_MCP_SERVER
|
||||
- 4.1 - ALLOWED_MCP_SERVER
|
||||
- 3.95 - ALWAYS_ALLOW (interactive selections)
|
||||
|
||||
### Security Constraint
|
||||
|
||||
- Extensions CANNOT contribute ALLOW rules or YOLO mode
|
||||
|
||||
## Agent System
|
||||
|
||||
### Agent Registry (`core/src/agents/registry.ts`)
|
||||
|
||||
Discovery sources:
|
||||
|
||||
1. Built-in: CodebaseInvestigator, CliHelp, Generalist, Browser
|
||||
2. User-level: `~/.gemini/agents/`
|
||||
3. Project-level: `.gemini/agents/` (requires folder trust)
|
||||
4. Extension-based: From active extensions
|
||||
|
||||
### LocalAgentExecutor (`core/src/agents/local-executor.ts`)
|
||||
|
||||
- Prompt processing: input augmentation → template expansion → system prompt
|
||||
construction
|
||||
- Uses GeminiChat for accumulating conversation
|
||||
- ChatCompressionService for history management
|
||||
- Turn loop: invoke model → extract function calls → check auth → append results
|
||||
- Termination: complete_task tool, max turns, timeout
|
||||
|
||||
### SubagentTool (`core/src/agents/subagent-tool.ts`)
|
||||
|
||||
- Extends BaseDeclarativeTool - agents invoked like standard tools
|
||||
- Read-only status checking, user hint propagation
|
||||
- Execution: validate → optional confirmation → parameter enrichment →
|
||||
SubagentToolWrapper
|
||||
|
||||
### Remote Agents
|
||||
|
||||
- A2A client manager for agent-to-agent protocol
|
||||
- Remote invocation for external agents
|
||||
- Agent acknowledgement system (security for project agents)
|
||||
|
||||
## Model System
|
||||
|
||||
### ModelConfigService
|
||||
|
||||
- **Hierarchical alias system**: children override parents
|
||||
- Resolution: alias chain → level assignment → apply overrides
|
||||
- Deep merging with array override capability
|
||||
- Fallback to `chat-base` alias for unknown models
|
||||
|
||||
### ModelRouterService
|
||||
|
||||
Sequential strategy pattern:
|
||||
|
||||
1. Fallback & Override
|
||||
2. Approval Mode Strategy
|
||||
3. Gemma Classifier (if enabled)
|
||||
4. Generic Classifier
|
||||
5. Numerical Classifier
|
||||
6. Default Strategy
|
||||
|
||||
### ModelAvailabilityService
|
||||
|
||||
Health states:
|
||||
|
||||
- **Terminal** - permanently unavailable
|
||||
- **Sticky Retry** - failed once, can retry once per turn
|
||||
- **Healthy** - no issues
|
||||
|
||||
## Services
|
||||
|
||||
| Service                     | Purpose                                 |
| --------------------------- | --------------------------------------- |
| ChatRecordingService        | Session persistence (JSON files)        |
| ChatCompressionService      | History summarization for token budgets |
| ModelConfigService          | Hierarchical model config with aliases  |
| ModelAvailabilityService    | Model health tracking                   |
| ModelRouterService          | Model selection via strategies          |
| FolderTrustDiscoveryService | Workspace security scanning             |
| KeychainService             | Credential storage                      |
| LoopDetectionService        | Detect repetitive agent loops           |

## UI + Core Separation
|
||||
|
||||
### IDE Client (`core/src/ide/ide-client.ts`)
|
||||
|
||||
- Singleton managing CLI ↔ IDE communication via MCP
|
||||
- **Outbound** (CLI → IDE): `openDiff`, `closeDiff`
|
||||
- **Inbound** (IDE → CLI): `ide/contextUpdate`, `ide/diffAccepted`,
|
||||
`ide/diffRejected`
|
||||
|
||||
### Event Contract
|
||||
|
||||
```typescript
interface IdeContextNotification {
  method: 'ide/contextUpdate';
  params: { workspaceState: { openFiles: string[]; isTrusted: boolean } };
}
```
|
||||
|
||||
### Confirmation Bus
|
||||
|
||||
- `TOOL_CONFIRMATION_REQUEST` / `TOOL_CONFIRMATION_RESPONSE`
|
||||
- Detail types: edit, execute, MCP, info, ask_user, exit_plan_mode
|
||||
- Async pub/sub via MessageBus
|
||||
|
||||
## Configuration (`core/src/config/config.ts` ~95KB!)
|
||||
|
||||
- Tool config: core tools, allowed/excluded, MCP servers
|
||||
- File filtering: git ignore, fuzzy search, max counts, timeouts
|
||||
- Approval modes: policy engine config
|
||||
- Experiments: feature flags (GEMINI_3_1_PRO_LAUNCHED, ENABLE_ADMIN_CONTROLS,
|
||||
etc.)
|
||||
- FolderTrust: discovery scans for commands, skills, settings, MCP, hooks
|
||||
@@ -0,0 +1,296 @@
|
||||
# Deep Dive: Key Gemini-CLI Systems
|
||||
|
||||
## Hooks System (Complete)
|
||||
|
||||
### 11 Hook Points
|
||||
|
||||
| Hook                | Input                                  | Key Output Capabilities                       |
| ------------------- | -------------------------------------- | --------------------------------------------- |
| BeforeTool          | toolName, toolInput, mcpContext        | Modify tool_input, block/allow, systemMessage |
| AfterTool           | toolName, toolInput, toolResponse      | additionalContext, tailToolCallRequest        |
| BeforeAgent         | prompt                                 | Additional context                            |
| AfterAgent          | prompt, response, stopHookActive       | Clear context                                 |
| BeforeModel         | llmRequest (GenerateContentParameters) | Modify llm_request OR inject llm_response     |
| AfterModel          | llmRequest, llmResponse                | Modify llm_response                           |
| BeforeToolSelection | llmRequest                             | Modify toolConfig (function list, mode)       |
| Notification        | type, message, details                 | Suppress/modify                               |
| SessionStart        | source (Startup/Resume/Clear)          | Additional context                            |
| SessionEnd          | reason (Exit/Clear/Logout/etc)         | Cleanup                                       |
| PreCompress         | trigger (Manual/Auto)                  | Suppress/modify                               |

### Hook Configuration Types
|
||||
|
||||
- **Runtime hooks** (HookType.Runtime): JS/TS functions, registered
|
||||
programmatically
|
||||
- **Command hooks** (HookType.Command): External shell commands with JSON I/O
|
||||
|
||||
### Exit Code Semantics (Command Hooks)
|
||||
|
||||
- 0 = Success (allowed with system message)
|
||||
- 1 = Non-blocking error (warning, continues)
|
||||
- 2+ = Blocking failure (denied, stderr as reason)
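
The exit code mapping above is small enough to show directly. A minimal sketch, assuming a hypothetical `interpretCommandHook` helper and result shape; only the three code ranges and the use of stderr as the denial reason come from these notes.

```typescript
type CommandHookOutcome = 'success' | 'warning' | 'blocked';

// Hypothetical result shape for this sketch.
interface CommandHookResult {
  outcome: CommandHookOutcome;
  reason?: string;
}

// 0 = success, 1 = non-blocking warning, 2 and above = blocking failure
// with stderr carried along as the reason.
function interpretCommandHook(exitCode: number, stderr: string): CommandHookResult {
  if (exitCode === 0) return { outcome: 'success' };
  if (exitCode === 1) return { outcome: 'warning', reason: stderr.trim() };
  return { outcome: 'blocked', reason: stderr.trim() };
}
```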
|
||||
|
||||
### Hook Decision Values
|
||||
|
||||
`'ask' | 'block' | 'deny' | 'approve' | 'allow' | undefined`
|
||||
|
||||
### Execution Strategies
|
||||
|
||||
- **Parallel** (default): Promise.all(), independent
|
||||
- **Sequential** (opt-in per hook): Chained, output→input cascading
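
The difference between the two strategies can be sketched as follows. The `Hook` signature (a function returning a partial update of its input) is an assumption made for illustration; the real hook runner works with richer input and output types.

```typescript
// Hypothetical hook signature: receives the current input, returns a partial update.
type Hook<T> = (input: T) => Promise<Partial<T>>;

// Parallel: every hook sees the same original input; results stay independent
// and are combined later by an aggregation step.
async function runParallel<T>(hooks: Hook<T>[], input: T): Promise<Partial<T>[]> {
  return Promise.all(hooks.map((hook) => hook(input)));
}

// Sequential: each hook sees the input as modified by the hooks before it,
// so output cascades into the next hook's input.
async function runSequential<T extends object>(hooks: Hook<T>[], input: T): Promise<T> {
  let current = input;
  for (const hook of hooks) {
    current = { ...current, ...(await hook(current)) };
  }
  return current;
}
```

With an increment hook followed by a doubling hook and an input of 1, the sequential run yields 4, while the parallel run yields two independent results of 2 each.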
|
||||
|
||||
### Aggregation
|
||||
|
||||
- Blocking decisions: OR logic (any block → all block)
|
||||
- Field replacement: later overrides earlier
|
||||
- Tool selection: union of allowed functions, mode precedence NONE > ANY > AUTO
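
The three aggregation rules above can be combined in one pass over the hook outputs. This is a simplified sketch: `HookOutput` here is a hypothetical flattened shape, and `allowedFunctionNames` and `mode` are assumed field names standing in for the real toolConfig structure.

```typescript
type HookDecision = 'ask' | 'block' | 'deny' | 'approve' | 'allow';
type ToolMode = 'NONE' | 'ANY' | 'AUTO';

// Hypothetical, simplified hook output for this sketch.
interface HookOutput {
  decision?: HookDecision;
  systemMessage?: string;
  allowedFunctionNames?: string[];
  mode?: ToolMode;
}

interface AggregatedOutput {
  blocked: boolean;
  systemMessage?: string;
  allowedFunctionNames: string[];
  mode?: ToolMode;
}

// Mode precedence: NONE > ANY > AUTO.
const MODE_RANK: Record<ToolMode, number> = { AUTO: 0, ANY: 1, NONE: 2 };

function aggregateHookOutputs(outputs: HookOutput[]): AggregatedOutput {
  let blocked = false;
  let systemMessage: string | undefined;
  let mode: ToolMode | undefined;
  const allowed: string[] = [];
  for (const output of outputs) {
    // OR logic: a single block or deny blocks the whole aggregate.
    if (output.decision === 'block' || output.decision === 'deny') blocked = true;
    // Field replacement: a later hook overrides an earlier one.
    if (output.systemMessage !== undefined) systemMessage = output.systemMessage;
    // Tool selection: union of allowed function names, in first-seen order.
    for (const name of output.allowedFunctionNames ?? []) {
      if (allowed.indexOf(name) === -1) allowed.push(name);
    }
    // Keep the most restrictive mode seen so far.
    if (output.mode !== undefined && (mode === undefined || MODE_RANK[output.mode] > MODE_RANK[mode])) {
      mode = output.mode;
    }
  }
  return { blocked, systemMessage, allowedFunctionNames: allowed, mode };
}
```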
|
||||
|
||||
### Trust Model
|
||||
|
||||
- Project hooks require folder trust verification
|
||||
- TrustedHooksManager at `~/.gemini/trusted-hooks.json`
|
||||
- Environment sanitized for command hooks (sensitive vars removed)
|
||||
- `GEMINI_PROJECT_DIR` injected
|
||||
|
||||
### Key Insight for Abstraction
|
||||
|
||||
Hooks fire inside gemini-cli's execution loop. When ADK controls the model:
|
||||
|
||||
- BeforeModel/AfterModel still fire because AdkGeminiModel wraps GeminiChat
|
||||
- BeforeTool/AfterTool still fire because AdkToolAdapter wraps DeclarativeTool
|
||||
- This is Dewitt's solution: adapters preserve hook injection points
|
||||
|
||||
**For OpenRouter or opaque agents, hooks CANNOT fire unless the agent delegates model/tool calls back to gemini-cli.**
|
||||
|
||||
---
|
||||
|
||||
## Policy Engine (Complete)
|
||||
|
||||
### TOML Rule Format
|
||||
|
||||
```toml
[[rules]]
decision = "allow"        # one of: "allow" | "deny" | "ask_user"
priority = 100            # example value; valid range is 0-999
toolName = "tool_name"    # wildcards: *, mcp_*, mcp_server_*
mcpName = "server_name"   # MCP server filter
argsPattern = "regex"     # matches JSON-stringified args
commandPrefix = "cmd"     # shell command prefix match
# commandRegex = "regex"  # shell command regex (mutually exclusive with commandPrefix)
modes = ["default", "autoEdit", "yolo", "plan"]
annotations = ["read-only", "experimental"]
allowRedirection = true   # for shell commands
allowMessage = "..."      # user-facing message on allow
denyMessage = "..."       # user-facing message on deny
```
|
||||
|
||||
### 5-Tier Priority System
|
||||
|
||||
- Tier 5 (Admin): 5.000-5.999
|
||||
- Tier 4 (User): 4.000-4.999
|
||||
- Tier 3 (Workspace): 3.000-3.999
|
||||
- Tier 2 (Extension): 2.000-2.999
|
||||
- Tier 1 (Default): 1.000-1.999
|
||||
|
||||
Formula: `tier + (priority / 1000)`
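
As a worked example of the formula, the sketch below computes the effective priority and shows that a higher tier always outranks a lower one, whatever the in-tier priority. The function name `effectivePriority` and the range check are assumptions for illustration.

```typescript
type PolicyTier = 1 | 2 | 3 | 4 | 5; // Default, Extension, Workspace, User, Admin

// Effective priority = tier + (priority / 1000), with priority in 0-999,
// so the result always stays inside the tier's own band (e.g. 4.000-4.999).
function effectivePriority(tier: PolicyTier, priority: number): number {
  if (priority < 0 || priority > 999) {
    throw new RangeError(`priority must be within 0-999, got ${priority}`);
  }
  return tier + priority / 1000;
}
```

For instance, a User-tier rule with priority 900 lands at about 4.9, which is below every Admin-tier rule (5.000 and up) and above every Workspace-tier rule (at most 3.999).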
|
||||
|
||||
### 4 Approval Modes
|
||||
|
||||
1. **default** — ASK_USER decisions prompt user
|
||||
2. **autoEdit** — File writes auto-approved with safety checking (conseca)
|
||||
3. **yolo** — All auto-approved except explicit ask_user rules
|
||||
4. **plan** — Read-only, blocks modifications, allows planning docs
|
||||
|
||||
### Shell Command Safety
|
||||
|
||||
- Parses multi-command sequences (&&, ;, ||)
|
||||
- Detects injection: $(...), `...`, <(...), >(...), --flag=$(...)
|
||||
- Each subcommand evaluated independently
|
||||
- DENY overrides everything; ASK_USER escalates; ALLOW only if all pass
|
||||
- Redirections (>) downgrade ALLOW → ASK_USER unless allowRedirection=true
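
The combination rules above can be sketched independently of the shell parser. The example assumes each subcommand has already been parsed and evaluated; `SubcommandVerdict` and `combineVerdicts` are hypothetical names, and treating an empty command list as ASK_USER is a conservative assumption not stated in these notes.

```typescript
type PolicyDecision = 'ALLOW' | 'DENY' | 'ASK_USER';

// Hypothetical per-subcommand result produced by an earlier parsing step.
interface SubcommandVerdict {
  decision: PolicyDecision;
  hasRedirection: boolean;
  allowRedirection: boolean; // taken from the matching rule
}

function combineVerdicts(verdicts: SubcommandVerdict[]): PolicyDecision {
  // Assumption: with nothing to evaluate, fall back to asking the user.
  if (verdicts.length === 0) return 'ASK_USER';
  let result: PolicyDecision = 'ALLOW';
  for (const verdict of verdicts) {
    let decision = verdict.decision;
    // A redirection downgrades ALLOW to ASK_USER unless the rule permits it.
    if (decision === 'ALLOW' && verdict.hasRedirection && !verdict.allowRedirection) {
      decision = 'ASK_USER';
    }
    // DENY overrides everything.
    if (decision === 'DENY') return 'DENY';
    // ASK_USER escalates; ALLOW survives only if every subcommand allows.
    if (decision === 'ASK_USER') result = 'ASK_USER';
  }
  return result;
}
```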
|
||||
|
||||
### Security Constraints
|
||||
|
||||
- Extensions cannot contribute ALLOW rules or YOLO mode
|
||||
- Regex patterns validated for ReDoS
|
||||
- Tool name typos detected via Levenshtein distance ≤3
|
||||
- Policy file integrity: SHA-256 hash checking
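
The typo detection mentioned above can be illustrated with a standard Levenshtein distance. This is a sketch of the general technique, not the policy engine's code: `suggestToolName` is a hypothetical helper, and only the threshold of 3 comes from these notes.

```typescript
// Classic dynamic-programming edit distance using two rolling rows.
function levenshtein(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Returns the closest known tool name within maxDistance, or undefined when
// the name is already valid or nothing is close enough to be a likely typo.
function suggestToolName(
  name: string,
  knownTools: readonly string[],
  maxDistance = 3,
): string | undefined {
  let best: string | undefined;
  let bestDistance = maxDistance + 1;
  for (const known of knownTools) {
    const distance = levenshtein(name, known);
    if (distance === 0) return undefined; // exact match, not a typo
    if (distance < bestDistance) {
      best = known;
      bestDistance = distance;
    }
  }
  return best;
}
```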
|
||||
|
||||
### Key Insight for Abstraction
|
||||
|
||||
Policy is evaluated at the tool execution boundary. For the interface layer:
|
||||
|
||||
- If CLI controls tool execution → policy naturally applies
|
||||
- If agent controls tool execution internally → policy bypassed (danger!)
|
||||
- This reinforces the `pauseOnToolCalls: true` approach for ADK
|
||||
- Need a `PolicyEvaluator` interface that any executor can call
|
||||
|
||||
---
|
||||
|
||||
## Tool System (Complete)
|
||||
|
||||
### Core Abstraction Chain
|
||||
|
||||
```
ToolBuilder (metadata + schema)
  → build(params) validates → ToolInvocation (ready to execute)
    → shouldConfirmExecute() → execute(signal) → ToolResult
```
|
||||
|
||||
### DeclarativeTool Pattern
|
||||
|
||||
- `build(params)` — Validate and create invocation
|
||||
- `buildAndExecute(params)` — One-step convenience
|
||||
- `validateBuildAndExecute(params)` — Non-throwing variant
|
||||
|
||||
### BaseToolInvocation
|
||||
|
||||
- Message bus integration for policy decisions
|
||||
- Three decision paths: ALLOW → execute, DENY → reject, ASK_USER → confirm
|
||||
|
||||
### ToolResult Structure
|
||||
|
||||
- `llmContent` — For LLM conversation history
|
||||
- `returnDisplay` — For UI presentation
|
||||
- `displayContent` — Additional display formatting
|
||||
- `errorDetails` — Optional error info
|
||||
- `result` — Structured data payload
|
||||
- `tailCall` — Optional chaining requests
|
||||
|
||||
### Confirmation System (6 types)
|
||||
|
||||
1. **edit** — File modification with diff
|
||||
2. **execute** — Command execution
|
||||
3. **mcp** — MCP tool with allowlist mgmt
|
||||
4. **info** — Information-only
|
||||
5. **ask_user** — General user approval
|
||||
6. **exit_plan_mode** — Plan exit notification
|
||||
|
||||
### Confirmation Outcomes (7 values)
|
||||
|
||||
ProceedOnce, ProceedAlways, ProceedAlwaysAndSave, ProceedAlwaysServer,
|
||||
ProceedAlwaysTool, ModifyWithEditor, Cancel
|
||||
|
||||
### Tool Kinds
|
||||
|
||||
- **Mutator**: Edit, Delete, Move, Execute
|
||||
- **Read-Only**: Read, Search, Fetch
|
||||
- **Other**: Think, Agent, Communicate, Plan, SwitchMode, Other
|
||||
|
||||
### MCP Tools
|
||||
|
||||
- Naming: `mcp_<server>_<toolname>` (64-char limit)
|
||||
- Schema validation via LenientJsonSchemaValidator
|
||||
- Response types: McpTextBlock, McpMediaBlock, McpResourceBlock,
|
||||
McpResourceLinkBlock
|
||||
- Transform to GenAI Parts format
|
||||
|
||||
### Error Types (20+)
|
||||
|
||||
- **Recoverable**: INVALID_TOOL_PARAMS, FILE_NOT_FOUND,
|
||||
EDIT_NO_OCCURRENCE_FOUND, SHELL_TIMEOUT, MCP_TOOL_ERROR...
|
||||
- **Fatal**: NO_SPACE_LEFT (only one!)
|
||||
|
||||
### ModifiableTool
|
||||
|
||||
- Extends DeclarativeTool with external editor support
|
||||
- `getModifyContext()` → temp files → editor opens → `getUpdatedParams()` → diff
|
||||
|
||||
---
|
||||
|
||||
## Execution Loop (Complete)
|
||||
|
||||
### LocalAgentExecutor Flow
|
||||
|
||||
1. Collect user hints, setup deadline timer
|
||||
2. **Turn loop**: executeTurn() repeatedly until completion
|
||||
3. Per-turn: compress chat → callModel() → processFunctionCalls()
|
||||
4. On limit hit: executeFinalWarningTurn() with 60s grace period
|
||||
5. Return OutputObject { result, terminate_reason }
|
||||
|
||||
### AgentTerminateMode
|
||||
|
||||
GOAL | TIMEOUT | MAX_TURNS | ABORTED | ERROR | ERROR_NO_COMPLETE_TASK_CALL
|
||||
|
||||
### SubagentTool Architecture
|
||||
|
||||
```
Parent Agent
  └─ SubagentTool (wraps AgentDefinition as DeclarativeTool)
       └─ SubagentToolWrapper (routes by agent kind)
            ├─ LocalSubagentInvocation → LocalAgentExecutor
            ├─ RemoteAgentInvocation → A2AClientManager
            └─ BrowserAgentInvocation
```
|
||||
|
||||
### Agent Types
|
||||
|
||||
- `LocalAgentDefinition` — kind: 'local', has promptConfig, modelConfig,
|
||||
runConfig, toolConfig
|
||||
- `RemoteAgentDefinition` — kind: 'remote', has agentCardUrl, auth config
|
||||
|
||||
### Key Defaults
|
||||
|
||||
- DEFAULT_MAX_TURNS = 15
|
||||
- DEFAULT_MAX_TIME_MINUTES = 5
|
||||
- A2A_TIMEOUT = 1800000 (30 min for remote agents)
|
||||
|
||||
---
|
||||
|
||||
## Services/Config (Complete)
|
||||
|
||||
### ModelConfigService
|
||||
|
||||
- **Alias chains**: Inheritance with `extends`, merged root-to-leaf
|
||||
- **Overrides**: Contextual (model, scope, retry, isChatModel), sorted by
|
||||
specificity
|
||||
- **Runtime registration**: Dynamic aliases and overrides
|
||||
- **Deep merge**: Objects merged, arrays replaced entirely
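
The merge rule in the last bullet (objects merged recursively, arrays replaced wholesale) can be sketched as below. The function names and the config keys used in the usage note are hypothetical; this is not ModelConfigService's actual implementation.

```typescript
function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Objects merge key by key; anything else (including arrays) from the
// override replaces the base value entirely.
function deepMerge(
  base: Record<string, unknown>,
  override: Record<string, unknown>,
): Record<string, unknown> {
  const merged: Record<string, unknown> = { ...base };
  for (const key of Object.keys(override)) {
    const existing = merged[key];
    const incoming = override[key];
    merged[key] =
      isPlainObject(existing) && isPlainObject(incoming)
        ? deepMerge(existing, incoming)
        : incoming;
  }
  return merged;
}
```

Merging `{ config: { temperature: 0, stop: ['a', 'b'] } }` with `{ config: { stop: ['c'] } }` keeps `temperature` and replaces `stop` with `['c']` rather than concatenating the two arrays.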
|
||||
|
||||
### ModelRouterService (Strategy Chain)
|
||||
|
||||
1. Fallback & Override → 2. Approval Mode → 3. Gemma Classifier → 4. Generic
|
||||
Classifier → 5. Numerical Classifier → 6. Default
|
||||
|
||||
### ModelAvailabilityService
|
||||
|
||||
- Terminal (permanent), Sticky_retry (one retry per turn), Healthy
|
||||
- `selectFirstAvailable()` iterates fallback chain
|
||||
- `resetTurn()` at turn boundaries enables fresh retries
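
The three health states and the per-turn retry can be sketched as a small class. Only `selectFirstAvailable()` and `resetTurn()` are named in these notes; the class name, `markTerminal`, `markStickyRetry`, and the choice to consume the retry at selection time are assumptions made for illustration.

```typescript
type ModelHealth = 'healthy' | 'sticky_retry' | 'terminal';

class ModelAvailabilitySketch {
  private health = new Map<string, ModelHealth>();
  private retriedThisTurn = new Set<string>();

  // Hypothetical helpers for recording failures.
  markTerminal(model: string): void {
    this.health.set(model, 'terminal');
  }

  markStickyRetry(model: string): void {
    this.health.set(model, 'sticky_retry');
  }

  // Walks the fallback chain and returns the first usable model.
  // Assumption: selecting a sticky-retry model uses up its one retry for the turn.
  selectFirstAvailable(chain: readonly string[]): string | undefined {
    for (const model of chain) {
      const state = this.health.get(model) ?? 'healthy';
      if (state === 'terminal') continue;
      if (state === 'sticky_retry') {
        if (this.retriedThisTurn.has(model)) continue;
        this.retriedThisTurn.add(model);
      }
      return model;
    }
    return undefined;
  }

  // Called at turn boundaries so sticky-retry models get a fresh attempt.
  resetTurn(): void {
    this.retriedThisTurn.clear();
  }
}
```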
|
||||
|
||||
### Config (~95KB!)
|
||||
|
||||
Central dependency injection. Initializes: ModelAvailabilityService →
|
||||
ModelConfigService → FolderTrustDiscoveryService → PolicyEngine →
|
||||
FileDiscoveryService → GitService → ToolRegistry → MCP → GeminiClient →
|
||||
HookSystem
|
||||
|
||||
### CoreEventEmitter (UI Events)
|
||||
|
||||
Event types: UserFeedback, ModelChanged, ConsoleLog, Output, RetryAttempt,
|
||||
ConsentRequest, McpProgress, Hook, QuotaChanged
|
||||
|
||||
Backlog buffering (max 10,000) with head-pointer eviction and auto-compaction.
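
One way to read "head-pointer eviction and auto-compaction" is sketched below: the oldest entry is dropped by advancing an index instead of shifting the array, and the array is rebuilt once the dead prefix grows large. The class name, the compaction threshold (half the array), and the `drain` method are assumptions; the real CoreEventEmitter may differ.

```typescript
class EventBacklog<T> {
  private items: (T | undefined)[] = [];
  private head = 0; // index of the oldest live entry
  private maxSize: number;

  constructor(maxSize = 10000) {
    this.maxSize = maxSize;
  }

  get size(): number {
    return this.items.length - this.head;
  }

  push(item: T): void {
    this.items.push(item);
    if (this.size > this.maxSize) {
      // Evict the oldest entry in O(1) by moving the head pointer.
      this.items[this.head] = undefined;
      this.head++;
    }
    // Auto-compaction: once at least half the array is dead prefix, rebuild it.
    if (this.head > 0 && this.head * 2 >= this.items.length) {
      this.items = this.items.slice(this.head);
      this.head = 0;
    }
  }

  // Returns all live entries in arrival order and empties the backlog.
  drain(): T[] {
    const live = this.items.slice(this.head) as T[];
    this.items = [];
    this.head = 0;
    return live;
  }
}
```

The head pointer avoids an O(n) `shift()` on every overflow, which matters when a subscriber attaches late and the backlog sits at its cap for a long time.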
|
||||
|
||||
### Scheduler Types
|
||||
|
||||
```typescript
ToolCallRequestInfo {
  callId, name, args, originalRequestName,
  isClientInitiated, prompt_id, checkpoint, traceId,
  parentCallId, schedulerId
}
ToolCallResponseInfo {
  callId, responseParts, resultDisplay, error, errorType,
  outputFile, contentLength, data
}
CoreToolCallStatus: Validating → AwaitingApproval → Scheduled → Executing → Success|Error|Cancelled
```
|
||||
|
||||
### FolderTrust
|
||||
|
||||
- **Scans:** commands (.toml), skills (SKILL.md), settings.json, MCP servers, hooks
- **Security warnings:** auto-approved tools, autonomous agents, disabled trust, disabled sandbox
- **Pattern:** discovery → review → execution (no code runs during scan)
|
||||
@@ -0,0 +1,349 @@
|
||||
# Interface Priority Analysis & Open Questions
|
||||
|
||||
## The Big Picture
|
||||
|
||||
We're defining **framework-agnostic interfaces** that allow gemini-cli to:
|
||||
|
||||
1. Keep its existing execution loop working unchanged (Legacy path)
|
||||
2. Swap in ADK as an alternative runtime via config flag
|
||||
3. Eventually support OpenRouter or other agent backends
|
||||
4. Maintain all existing CLI behavior: hooks, policies, confirmations, UI events
|
||||
|
||||
## Proposed Interface Layers (Priority Order)
|
||||
|
||||
---
|
||||
|
||||
### P0 (Critical Path - Must Define First)
|
||||
|
||||
#### 1. AgentEvent / Event Stream Contract
|
||||
|
||||
**Why first:** Everything else consumes or produces these events. The UI renders
|
||||
them. The hooks intercept them. The adapters translate to/from them.
|
||||
|
||||
**Key decision:** Merge Dewitt's simpler model with Coworker's richer model?
|
||||
|
||||
**Recommendation:** Coworker's approach is more complete. Key additions:
|
||||
|
||||
- `threadId` for sub-agent tracking (AG-UI has `parentRunId`)
|
||||
- `tool_update` for progress on long-running tools
|
||||
- `elicitation_request/response` as first-class (not just tool_confirmation)
|
||||
- `usage` event for token tracking
|
||||
- `_meta` escape hatch (matches AG-UI's extensibility philosophy)
|
||||
- `initialize` event (matches AG-UI's RunStarted)
|
||||
|
||||
**Open questions:**
|
||||
|
||||
- Do we need AG-UI's start/content/end triple pattern for streaming? Or is
|
||||
yielding partial events sufficient?
|
||||
- How do ContentPart types map to existing gemini-cli Part types?
|
||||
- Should events carry a `source` field? (useful for hook attribution)
|
||||
|
||||
#### 2. Agent Interface
|
||||
|
||||
**Why second:** This is the primary abstraction that LocalAgentExecutor, ADK
|
||||
adapters, and future OpenRouter adapters all implement.
|
||||
|
||||
**Key decision:** Dewitt's `runAsync/runEphemeral` vs Coworker's
|
||||
`send(Trajectory|string)`
|
||||
|
||||
**Recommendation:** Hybrid approach:
|
||||
|
||||
- Dewitt's `runAsync/runEphemeral` split is ADK-aligned and cleaner for the
|
||||
factory pattern
|
||||
- BUT add Coworker's elicitation support via AgentSend union type
|
||||
- The Trajectory concept is powerful but may be too opinionated for Phase 2
|
||||
|
||||
```
Agent<TInput, TOutput>
  name: string
  description: string
  runAsync(input, options) → AsyncGenerator<AgentEvent, TOutput>
  runEphemeral(input, options) → AsyncGenerator<AgentEvent, TOutput>
```
|
||||
|
||||
**Open questions:**
|
||||
|
||||
- Should Agent also support `send()` for mid-stream interactions (elicitations)?
|
||||
- How does AbortSignal propagate through the adapter boundary?
|
||||
- Do we need a `capabilities` field (supports elicitation? supports HITL? etc.)?
|
||||
|
||||
#### 3. Tool Execution Contract
|
||||
|
||||
**Why third:** Tools are the primary action mechanism. Both the policy engine
|
||||
and hooks system wrap tool execution.
|
||||
|
||||
**What needs abstracting:**
|
||||
|
||||
- Tool declaration (name, schema) — already somewhat generic via JSON Schema
|
||||
- Tool execution (args → result)
|
||||
- Tool confirmation flow (ASK_USER → user decision → proceed/deny)
|
||||
- Tool result shape (llmContent + displayContent + error + tailCalls)
|
||||
|
||||
**Key decision:** Keep DeclarativeTool pattern or flatten to a simpler
|
||||
interface?
|
||||
|
||||
**Recommendation:** Define a minimal `ToolExecutor` interface:
|
||||
|
||||
```
ToolExecutor {
  name: string
  description: string
  schema: JSONSchema
  execute(args, context): Promise<ToolResult>
  requiresConfirmation?(args, context): Promise<boolean>
}
```
|
||||
|
||||
DeclarativeTool remains the concrete implementation. ADK's BaseTool adapts to
|
||||
this.
|
||||
|
||||
**Open questions:**
|
||||
|
||||
- How do MCP tools fit? They already have their own protocol.
|
||||
- Tool annotations (destructive hints) — should these be in the interface?
|
||||
- Long-running tools need progress reporting — how does this interact with
|
||||
tool_update events?
|
||||
|
||||
---
|
||||
|
||||
### P1 (Important - Define After P0)
|
||||
|
||||
#### 4. Policy / Permission Interface
|
||||
|
||||
**Why important:** Every tool call goes through policy. External agents need
|
||||
policy enforcement too.
|
||||
|
||||
**Current state:** gemini-cli has a sophisticated TOML-based policy engine with
|
||||
tiered priorities. ADK-TS has a simpler SecurityPlugin with PolicyOutcome
|
||||
(DENY/CONFIRM/ALLOW).
|
||||
|
||||
**What needs abstracting:**
|
||||
|
||||
```
PolicyEngine {
  evaluate(toolName, args, context): PolicyDecision  // ALLOW | DENY | ASK_USER
  getExcludedTools(): string[]                       // Tools statically denied
}
```
|
||||
|
||||
**Key decision:** Do external agents (OpenRouter, etc.) get the same policy
|
||||
enforcement?
|
||||
|
||||
**Open questions:**
|
||||
|
||||
- If an ADK agent calls a tool internally, does gemini-cli's policy apply?
|
||||
- With `pauseOnToolCalls: true` in ADK, the CLI controls execution — but what
|
||||
about headless mode?
|
||||
- How do agent-level policies work? (allow/deny entire agents, not just tools)
|
||||
- Should policy be a middleware (AG-UI pattern) or a callback (ADK plugin
|
||||
pattern)?
|
||||
|
||||
#### 5. Hooks Interface
|
||||
|
||||
**Why important:** Hooks are a major gemini-cli feature. They need to work
|
||||
regardless of which agent backend runs.
|
||||
|
||||
**Current state:** 11 hook types firing at specific lifecycle points.
|
||||
|
||||
**What needs abstracting:**
|
||||
|
||||
- Hook lifecycle must be backend-agnostic
|
||||
- BeforeModel/AfterModel hooks need to work even when ADK controls the model
|
||||
- BeforeTool/AfterTool hooks need to intercept regardless of who executes the
|
||||
tool
|
||||
|
||||
**Key challenge:** When ADK runs the model internally, gemini-cli hooks can't easily intercept.

**Dewitt's solution:** ADK uses gemini-cli's model via the AdkGeminiModel adapter, so hooks fire inside GeminiChat.
|
||||
|
||||
**Open questions:**
|
||||
|
||||
- If OpenRouter runs the model, how do BeforeModel/AfterModel hooks work?
|
||||
- Do we need a "model steering" abstraction (injecting context mid-stream)?
|
||||
- Can hooks be expressed as AG-UI middleware? (intercept event stream)
|
||||
|
||||
#### 6. Model / LLM Interface
|
||||
|
||||
**Why important:** Model abstraction enables swapping LLM providers.
|
||||
|
||||
**Dewitt's approach:** Exposes a Model interface, which ADK uses via the AdkGeminiModel adapter.

**Coworker's approach:** Model is internal to Agent (no separate Model interface).
|
||||
|
||||
**Recommendation:** Keep Dewitt's separate Model interface BUT make it
|
||||
provider-agnostic:
|
||||
|
||||
- Remove `@google/genai` types from the interface signature
|
||||
- Define generic Message/Content types
|
||||
- Model interface is an implementation detail, not part of the Agent contract
|
||||
|
||||
**Open questions:**
|
||||
|
||||
- Can we define a truly provider-agnostic Model interface?
|
||||
- Or is the Model always tied to the agent backend? (ADK uses Gemini, OpenRouter
|
||||
uses whatever)
|
||||
- Model routing (choosing which model) — is this a concern of the Model
|
||||
interface or a separate service?
|
||||
|
||||
---
|
||||
|
||||
### P2 (Important but Can Follow)
|
||||
|
||||
#### 7. Session / State Interface
|
||||
|
||||
**Current state:** gemini-cli uses ChatRecordingService (JSON files). ADK uses
|
||||
Session with BaseSessionService.
|
||||
|
||||
**What needs abstracting:**
|
||||
|
||||
- Session creation/retrieval
|
||||
- State persistence across turns
|
||||
- History/trajectory management
|
||||
|
||||
**Open questions:**
|
||||
|
||||
- Does the trajectory (Coworker's concept) replace gemini-cli's chat recording?
|
||||
- Should session state be shared between gemini-cli and the agent backend?
|
||||
|
||||
#### 8. Elicitation / User Interaction Interface

**What it covers:** Model fallback dialogs, tool confirmations, Ctrl+B
interrupts, user questions

**Current state:** gemini-cli uses ConfirmationBus + MessageBus. AG-UI uses
frontend tools.

**Open questions:**

- Is elicitation just a special case of tool calls (AG-UI approach)?
- Or is it a first-class event type (coworker's approach)?
- How does Ctrl+B (cancel/interrupt) propagate through the agent boundary?

#### 9. Configuration / Capability Discovery

**What it covers:** Feature flags, experiment settings, agent capabilities

**Open questions:**

- How does an external agent declare its capabilities?
- Does OpenRouter support HITL? Elicitation? Tool confirmation? Each agent may
  differ.
- Do we need a `capabilities` negotiation at connection time?

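
One possible shape for connection-time negotiation is sketched below. The
capability strings are the ones used for `AgentDescriptor.capabilities` earlier
in this document; the function `negotiateCapabilities` and the
`NegotiationResult` type are hypothetical and only illustrate the idea of
comparing what the host needs against what the agent declares.

```typescript
// Capability names taken from the AgentDescriptor mapping; the rest is a sketch.
type Capability =
  | 'elicitation'
  | 'streaming'
  | 'host_tool_execution'
  | 'composition';

interface NegotiationResult {
  enabled: Capability[]; // needed by the host and declared by the agent
  missing: Capability[]; // needed by the host but not declared by the agent
}

function negotiateCapabilities(
  agentDeclared: readonly Capability[],
  hostNeeds: readonly Capability[],
): NegotiationResult {
  const declared = new Set<Capability>(agentDeclared);
  return {
    enabled: hostNeeds.filter((capability) => declared.has(capability)),
    missing: hostNeeds.filter((capability) => !declared.has(capability)),
  };
}

// Example: an agent that only streams, connected to a host that also wants
// elicitation. The host can then decide to degrade or refuse the connection.
const result = negotiateCapabilities(['streaming'], ['streaming', 'elicitation']);
console.log(result.enabled, result.missing);
```

Whether a non-empty `missing` list should block the connection or only disable
the affected features is left open here, as it is in the questions above.
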

---

### P3 (Future / Can Defer)

#### 10. A2UI / Rich UI Interface

- Declarative UI generation from agents
- Not critical for Phase 2 but important for differentiation

#### 11. Memory / Artifact Interface

- ADK has memory/artifact services
- gemini-cli has ChatRecordingService + memory tools
- Can standardize later

#### 12. Telemetry / Observability Interface

- Both systems have telemetry
- Can standardize later

---

## Critical Open Questions (Need Team Discussion)

### 1. OpenRouter Integration Model

**Question:** When OpenRouter (or any external agent) is used, what does the
integration look like?

**Option A: Full Agent Interface.** OpenRouter implements the Agent interface
directly.

- Pro: Clean, uniform
- Con: OpenRouter doesn't support HITL, hooks, policies natively

**Option B: ACP Shim.** Agent Communication Protocol between CLI and external
agents.

- Pro: Standards-based
- Con: Additional protocol layer, may be premature

**Option C: Model-only Integration.** OpenRouter is just an alternative Model,
not an Agent.

- Pro: Simpler, leverages existing agent loop
- Con: Doesn't support OpenRouter-specific features

**Recommendation:** Start with Option C (model-only). OpenRouter provides an LLM
endpoint; gemini-cli's own agent loop handles tools, policies, and hooks. This
makes a provider-agnostic Model interface the key enabler.

### 2. Tool Execution: Client-side vs Agent-side

**Question:** Who executes tools: the CLI or the agent backend?

**Option A: Always client-side** (CLI executes, agent suspends)

- ADK: `pauseOnToolCalls: true`
- Pro: CLI maintains control, policies enforced, hooks fire
- Con: Higher latency, more round-trips

**Option B: Agent-side execution** (agent runs tools internally)

- Pro: Faster, simpler
- Con: Bypasses CLI policies, hooks, confirmations

**Option C: Configurable.** CLI decides per-tool or per-agent.

- Pro: Flexible
- Con: Complex

**Recommendation:** Option A for the safety-critical CLI use case. Option B only
for trusted/sandboxed sub-agents.

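
The recommendation can be read as a default with narrow exceptions. The sketch
below encodes that reading under stated assumptions: `ExecutionPolicy`,
`chooseExecutionSite`, and the agent and tool names in the example are
hypothetical, and the rule that certain tools always stay client-side is an
illustrative addition rather than something the options above specify.

```typescript
// Hypothetical decision helper (illustration only).
type ExecutionSite = 'client' | 'agent';

interface ExecutionPolicy {
  // Sub-agents considered trusted or sandboxed enough to run tools themselves.
  trustedAgents: ReadonlySet<string>;
  // Tools that must always run in the CLI so policies, hooks, and
  // confirmations apply, regardless of which agent asks.
  clientOnlyTools: ReadonlySet<string>;
}

function chooseExecutionSite(
  policy: ExecutionPolicy,
  agentName: string,
  toolName: string,
): ExecutionSite {
  if (policy.clientOnlyTools.has(toolName)) {
    return 'client';
  }
  // Default is client-side (Option A); agent-side (Option B) is opt-in.
  return policy.trustedAgents.has(agentName) ? 'agent' : 'client';
}

const policy: ExecutionPolicy = {
  trustedAgents: new Set(['sandboxed_researcher']),
  clientOnlyTools: new Set(['shell']),
};

console.log(chooseExecutionSite(policy, 'sandboxed_researcher', 'web_search'));
console.log(chooseExecutionSite(policy, 'sandboxed_researcher', 'shell'));
console.log(chooseExecutionSite(policy, 'external_agent', 'web_search'));
```

Keeping the default at `'client'` means a missing or incomplete policy entry
fails toward the safer option.
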

### 3. Model Steering (Hooks that inject context mid-stream)

**Question:** How do user-local hooks (like injecting project context) work with
external agents?

**Answer:** They can only work if:

- The CLI controls the model (via a Model interface adapter), in which case a
  BeforeModel hook injects context
- OR the agent supports a "system instruction update" mechanism

For OpenRouter: model steering works because the CLI controls the model call.
For ADK: model steering works because AdkGeminiModel wraps GeminiChat. For fully
opaque agents: model steering **cannot work**; this is a known limitation.

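
The first condition can be sketched as a hook chain that runs just before the
model call the CLI controls. This is an assumption about shape only:
`ModelCall`, `BeforeModelHook`, `runBeforeModelHooks`, and
`injectProjectContext` are hypothetical names and do not reflect gemini-cli's
actual hook signatures.

```typescript
// Hypothetical BeforeModel hook chain (illustration only).
interface ModelCall {
  systemInstruction: string;
  messages: string[];
}

// A hook receives the pending call and returns a possibly modified copy.
type BeforeModelHook = (call: ModelCall) => ModelCall;

// Applies hooks in order; each hook sees the output of the previous one.
function runBeforeModelHooks(
  call: ModelCall,
  hooks: readonly BeforeModelHook[],
): ModelCall {
  return hooks.reduce((current, hook) => hook(current), call);
}

// Example user-local hook: append project context to the system instruction.
function injectProjectContext(context: string): BeforeModelHook {
  return (call) => ({
    ...call,
    systemInstruction: `${call.systemInstruction}\n\n${context}`,
  });
}

const steered = runBeforeModelHooks(
  { systemInstruction: 'You are a coding assistant.', messages: ['fix the bug'] },
  [injectProjectContext('Project uses pnpm and Vitest.')],
);
console.log(steered.systemInstruction);
```

For a fully opaque agent there is no point at which the CLI holds a `ModelCall`
to pass through this chain, which is the limitation described above.
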

### 4. Elicitation Flow

**Question:** When the agent needs user input (model fallback, clarification),
how does it work?

**For CLI-controlled agents:** Agent yields an `elicitation_request` event → CLI
renders prompt → user responds → CLI sends the response back via
`session.stream({ kind: 'elicitation_response', ... })` to resume.

**For external agents:** Agent uses A2A protocol or similar to send elicitation
→ CLI bridges the request to user → response sent back via protocol.

**Key insight:** Elicitation is fundamentally about the agent SUSPENDING and
waiting for user input. ADK already supports this via `pauseOnToolCalls`. Can we
generalize to `pauseOnElicitation`?

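
The suspend-and-resume idea can be modeled with a generator: the agent pauses at
the elicitation and continues only when a response is passed back in. The sketch
below assumes a synchronous generator for brevity (a real agent stream would be
asynchronous and would resume through something like `session.stream`); the
event kinds come from the flow above, while `fallbackAgent`, `drive`, and the
event fields are hypothetical.

```typescript
// Hypothetical event shapes; only the `kind` strings come from the notes above.
type AgentEvent =
  | { kind: 'message'; text: string }
  | { kind: 'elicitation_request'; id: string; prompt: string };

interface ElicitationResponse {
  kind: 'elicitation_response';
  id: string;
  answer: string;
}

type AgentRun = Generator<AgentEvent, void, ElicitationResponse | undefined>;

// Toy agent: asks the user whether to switch models, then continues.
function* fallbackAgent(): AgentRun {
  yield { kind: 'message', text: 'Primary model unavailable.' };
  // Execution suspends here until the driver calls next() with a response.
  const response = yield {
    kind: 'elicitation_request',
    id: 'e1',
    prompt: 'Switch to fallback model?',
  };
  const answer = response?.answer ?? 'no';
  yield {
    kind: 'message',
    text: answer === 'yes' ? 'Using fallback model.' : 'Stopping.',
  };
}

// Toy CLI loop: renders messages, and bridges elicitations to the user.
function drive(agent: AgentRun, askUser: (prompt: string) => string): string[] {
  const transcript: string[] = [];
  let step = agent.next();
  while (!step.done) {
    const event = step.value;
    if (event.kind === 'elicitation_request') {
      const answer = askUser(event.prompt);
      step = agent.next({ kind: 'elicitation_response', id: event.id, answer });
    } else {
      transcript.push(event.text);
      step = agent.next();
    }
  }
  return transcript;
}

console.log(drive(fallbackAgent(), () => 'yes'));
```

In this model a tool-call pause and an elicitation pause are the same mechanism
with different event kinds, which is one argument for generalizing
`pauseOnToolCalls` rather than adding a separate code path.
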

### 5. Sub-agent Identity and Policies

**Question:** When a sub-agent spawns, does it inherit parent policies or get
its own?

**Current gemini-cli behavior:** Sub-agents are registered as tools and go
through the same policy engine. **ADK behavior:** Sub-agents are child nodes in
the agent tree and get the parent's plugins.

**Recommendation:** Sub-agents inherit parent policy context. Additional
restrictions can be layered (e.g., sub-agent X cannot use the shell tool). This
is already how gemini-cli works.

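
Inheritance with layered restrictions can be sketched as a derivation that only
ever narrows the parent's permissions. The names `PolicyContext`,
`deriveSubAgentPolicy`, and `isToolAllowed` are hypothetical, and reducing a
policy to a set of allowed tool names is a simplifying assumption; gemini-cli's
policy engine is richer than this.

```typescript
// Hypothetical, simplified policy model (illustration only).
interface PolicyContext {
  allowedTools: ReadonlySet<string>;
}

// A sub-agent starts from the parent's allowed tools and can only lose
// entries. There is deliberately no way to add a tool the parent lacks.
function deriveSubAgentPolicy(
  parent: PolicyContext,
  deniedTools: readonly string[],
): PolicyContext {
  const denied = new Set(deniedTools);
  const inherited = Array.from(parent.allowedTools).filter(
    (tool) => !denied.has(tool),
  );
  return { allowedTools: new Set(inherited) };
}

function isToolAllowed(policy: PolicyContext, toolName: string): boolean {
  return policy.allowedTools.has(toolName);
}

const parentPolicy: PolicyContext = {
  allowedTools: new Set(['read_file', 'shell']),
};
// "Sub-agent X cannot use the shell tool."
const subAgentX = deriveSubAgentPolicy(parentPolicy, ['shell']);

console.log(isToolAllowed(subAgentX, 'read_file'));
console.log(isToolAllowed(subAgentX, 'shell'));
```

Because the derivation builds a new set, restricting a sub-agent leaves the
parent's policy unchanged.
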