Add validated architectural notes

Adam Weidman
2026-04-05 09:13:57 -07:00
parent 137c4cd59c
commit 9f3a154014
6 changed files with 2294 additions and 0 deletions
# ADK-TS Alignment Pass
Every interface in our outline must map cleanly to ADK-TS. This document
verifies that mapping field-by-field, identifies gaps, and confirms
HITL/plugin/transfer patterns work.
Source: ADK-TS v0.4.0 at `/Users/adamfweidman/Desktop/adk-int/adk-js/core/src/`
---
## 1. AgentDescriptor ↔ ADK Agent Hierarchy
### Field-by-field mapping
| AgentDescriptor field | ADK-TS source | Notes |
| ---------------------------- | -------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `name` | `BaseAgent.name` | Direct. ADK validates it's a valid JS identifier. |
| `displayName` | — | ADK doesn't have this. No conflict. |
| `description` | `BaseAgent.description` (optional in ADK) | Direct. Used for model routing in AgentTool. |
| `executor` | — | New concept. ADK agents are always 'adk'. Adapter sets this. |
| `inputSchema` | `LlmAgent.inputSchema` (Zod or JSON Schema) | Direct. ADK's AgentTool uses this for tool parameter generation. |
| `outputSchema` | `LlmAgent.outputSchema` (Zod or JSON Schema) | Direct. ADK uses for structured output + AgentTool response. |
| `capabilities` | — | New concept. Adapter infers from agent type: LlmAgent gets `['elicitation', 'streaming', 'host_tool_execution']`, LoopAgent gets `['composition']`, etc. |
| `ownTools` | `LlmAgent.tools: ToolUnion[]` | Maps via ToolDescriptor adapter. ADK tools have `name`, `description`, `_getDeclaration()` which returns JSON Schema. |
| `requiredTools` | — | New concept. ADK agents don't declare required host tools. Adapter can infer from tool references. |
| `subAgents` | `BaseAgent.subAgents: BaseAgent[]` | Recursive. Each sub-agent becomes a nested AgentDescriptor. |
| `constraints.maxTurns` | `RunConfig.maxLlmCalls` (default 500) | Maps, though semantics differ slightly (LLM calls vs turns). |
| `constraints.maxTimeMinutes` | — | ADK doesn't have time limits. No conflict — host enforces. |
| `constraints.maxBudgetUsd` | — | ADK doesn't have budget. No conflict — host enforces. |
| `metadata` | — | New concept. Adapter can populate from agent registration context. |
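The `capabilities` and `subAgents` rows describe an inference the adapter performs when it walks an ADK agent tree. Below is a minimal sketch. It uses a hypothetical `AdkAgentLike` stand-in (not an ADK type) and a trimmed-down descriptor. The table's "etc." is read here as giving `SequentialAgent` and `ParallelAgent` the same `['composition']` capability as `LoopAgent`; that reading is an assumption.

```typescript
// Hypothetical, simplified stand-in for an ADK agent instance.
interface AdkAgentLike {
  kind: "LlmAgent" | "LoopAgent" | "SequentialAgent" | "ParallelAgent";
  name: string;
  description?: string;
  subAgents?: AdkAgentLike[];
}

// Trimmed-down AgentDescriptor: only the fields this mapping touches.
interface AgentDescriptorSketch {
  name: string;
  description?: string;
  executor: "adk";
  capabilities: string[];
  subAgents: AgentDescriptorSketch[];
}

// Capability inference by agent type (workflow agents assumed to be "composition").
function inferCapabilities(agent: AdkAgentLike): string[] {
  switch (agent.kind) {
    case "LlmAgent":
      return ["elicitation", "streaming", "host_tool_execution"];
    case "LoopAgent":
    case "SequentialAgent":
    case "ParallelAgent":
      return ["composition"];
  }
}

// Recursive: each sub-agent becomes a nested descriptor.
function toDescriptor(agent: AdkAgentLike): AgentDescriptorSketch {
  return {
    name: agent.name,
    description: agent.description,
    executor: "adk", // the adapter always sets this for ADK agents
    capabilities: inferCapabilities(agent),
    subAgents: (agent.subAgents ?? []).map(toDescriptor),
  };
}
```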
### ADK-specific fields NOT in AgentDescriptor
| ADK field | Where it lives | Our approach |
| ----------------------------------- | -------------- | ---------------------------------------------------------------------------------------------------------- |
| `instruction` / `globalInstruction` | LlmAgent | Executor-internal. Not in descriptor (it's runtime config, not identity). |
| `model` | LlmAgent | Goes in ExecutionOptions.model or executor-internal config. |
| `generateContentConfig` | LlmAgent | Executor-internal. |
| `disallowTransferToParent/Peers` | LlmAgent | Could be `constraints` or `_meta`. Transfer policy is host-enforced. |
| `includeContents` | LlmAgent | Executor-internal (context management). |
| `outputKey` | LlmAgent | Executor-internal (state management). |
| `beforeModelCallback`, etc. | LlmAgent | Executor-internal. These are ADK's callback system — our LifecycleInterceptor is the interface equivalent. |
### Verdict: CLEAN MAPPING
AgentDescriptor captures everything needed to describe an ADK agent externally.
ADK-specific runtime config (instruction, model, callbacks) stays inside the
executor — exactly right for the descriptor/executor separation.
**Key ADK pattern preserved:** AgentTool wraps an agent as a tool using
`inputSchema` for parameters and `description` for the tool description. Our
AgentDescriptor has both, so SubagentTool can do the same thing.
---
## 2. AgentSession ↔ ADK Runner
### Method mapping
| AgentSession method | ADK-TS equivalent | How adapter works |
| ----------------------- | --------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `stream(data, options)` | `Runner.runAsync({ userId, sessionId, newMessage, runConfig })` | Adapter creates/loads session, maps data+options → runAsync params, wraps Event generator → AgentEvent generator. Each `stream()` call triggers a new `runAsync()`. |
| `update(config)` | No direct equivalent | ADK doesn't support mid-stream config changes. Adapter queues updates for next `runAsync()` call. |
| `steer(data)` | No direct equivalent | ADK doesn't support mid-stream intervention. Adapter can queue for next invocation or ignore. |
| `abort()` | No direct equivalent | ADK uses `invocationContext.endInvocation = true`. Adapter sets this flag. Could also use AbortController. |
### ExecutionRequest → Runner.runAsync mapping
| ExecutionRequest field | ADK mapping |
| --------------------------- | ---------------------------------------------------------------- |
| `descriptor` | Used to find/create the BaseAgent instance |
| `input` | → `newMessage: Content` (converted from ContentPart[] → Content) |
| `sessionRef` | → `sessionId` (string) or creates session from SessionSnapshot |
| `forkSession` | Adapter clones session before running |
| `options.tools` | → merged into agent's `tools` config |
| `options.model` | → `LlmAgent.model` override |
| `options.hostToolExecution` | → `RunConfig.pauseOnToolCalls: true` |
| `options.streaming` | → `RunConfig.streamingMode` |
| `options.permissionMode` | → SecurityPlugin config |
| `signal` | → wired to `invocationContext.endInvocation` |
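Read as code, this table is a pure translation function. The sketch below makes several assumptions:

- `ExecutionRequestSketch` and `RunAsyncParamsSketch` are hypothetical local types.
- Only text input is converted.
- `StreamingMode` is modeled as the string literals `'NONE' | 'SSE'` rather than ADK's real enum.
- The `maxTurns` to `maxLlmCalls` line carries the semantic caveat from §1.

```typescript
// Hypothetical subset of ExecutionRequest.
interface ExecutionRequestSketch {
  descriptor: { constraints?: { maxTurns?: number } };
  input: { type: "text"; text: string }[];
  sessionId: string;
  userId: string; // not in ExecutionRequest; derived from HostContext (§7)
  options: { hostToolExecution?: boolean; streaming?: boolean };
}

// Hypothetical mirror of the object passed to Runner.runAsync().
interface RunAsyncParamsSketch {
  userId: string;
  sessionId: string;
  newMessage: { role: "user"; parts: { text: string }[] };
  runConfig: {
    pauseOnToolCalls: boolean;
    streamingMode: "NONE" | "SSE";
    maxLlmCalls?: number;
  };
}

function toRunAsyncParams(req: ExecutionRequestSketch): RunAsyncParamsSketch {
  return {
    userId: req.userId,
    sessionId: req.sessionId,
    // ContentPart[] -> GenAI Content (text parts only in this sketch).
    newMessage: { role: "user", parts: req.input.map((p) => ({ text: p.text })) },
    runConfig: {
      // Host executes tools, so ADK must stop at every tool call.
      pauseOnToolCalls: req.options.hostToolExecution ?? false,
      streamingMode: req.options.streaming ? "SSE" : "NONE",
      // Approximation: LLM calls are not the same unit as turns.
      maxLlmCalls: req.descriptor.constraints?.maxTurns,
    },
  };
}
```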
### HITL: How pauseOnToolCalls works end-to-end
This is the critical path. Here's the full flow:
```
1. LLM returns tool call (FunctionCall in Event)
2. ADK checks RunConfig.pauseOnToolCalls === true
3. ADK sets invocationContext.endInvocation = true
4. ADK yields the Event (with FunctionCall) and stops
5. Runner.runAsync() generator completes
--- OUR INTERFACE BOUNDARY ---
6. Adapter translates ADK Event → ToolRequestEvent
7. Host receives ToolRequestEvent from session.stream() generator
8. Host runs policy check (PolicyEvaluator.evaluate())
9. Host fires hooks (LifecycleInterceptor.fire('before_tool', ...))
10. If policy allows → Host executes tool → gets ToolResultData
11. Host calls session.stream({ kind: 'tool_result', ... }) to get next stream
--- BACK INTO ADK ---
12. Adapter receives tool result
13. Adapter creates FunctionResponse Content
14. Adapter calls Runner.runAsync() again with FunctionResponse as newMessage
15. ADK loads session (has prior tool call event)
16. ADK resumes agent with tool response
17. Loop continues from step 1
```
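From the host's side, steps 6-11 form a loop that keeps calling `stream()` until the agent stops asking for tools. Below is a minimal sketch. It assumes:

- simplified event and input shapes;
- hypothetical `evaluate`/`execute` callbacks standing in for PolicyEvaluator and host tool execution;
- one tool call per invocation;
- synchronous generators for brevity (a real session would expose an async generator).

`FakeAdkSession` imitates `pauseOnToolCalls` by ending its stream right after the tool request.

```typescript
type AgentEvent =
  | { type: "tool_request"; requestId: string; name: string; args: Record<string, unknown> }
  | { type: "message"; text: string }
  | { type: "stream_end"; reason: "completed" };

type StreamInput =
  | { kind: "user"; text: string }
  | { kind: "tool_result"; requestId: string; result: string };

interface SessionSketch {
  stream(data: StreamInput): Generator<AgentEvent>;
}

// Each stream() call corresponds to one Runner.runAsync() invocation.
function runToCompletion(
  session: SessionSketch,
  first: StreamInput,
  evaluate: (toolName: string) => "allow" | "deny",
  execute: (toolName: string, args: Record<string, unknown>) => string,
): string[] {
  const messages: string[] = [];
  let next: StreamInput | undefined = first;
  while (next !== undefined) {
    const input: StreamInput = next;
    next = undefined;
    for (const ev of session.stream(input)) {
      if (ev.type === "message") messages.push(ev.text);
      if (ev.type === "tool_request") {
        // Steps 8-10: host-side policy check, then host-side execution.
        const result =
          evaluate(ev.name) === "allow" ? execute(ev.name, ev.args) : "ERROR: denied by policy";
        // Step 11: the next stream() call resumes the agent with the result.
        next = { kind: "tool_result", requestId: ev.requestId, result };
      }
    }
  }
  return messages;
}

// Fake session: requests one tool call, then answers using its result.
class FakeAdkSession implements SessionSketch {
  *stream(data: StreamInput): Generator<AgentEvent> {
    if (data.kind === "user") {
      yield { type: "tool_request", requestId: "call-1", name: "read_file", args: { path: "a.txt" } };
      return; // pauseOnToolCalls: the invocation ends at the tool call
    }
    yield { type: "message", text: `file says: ${data.result}` };
    yield { type: "stream_end", reason: "completed" };
  }
}
```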
**Why this works:** ADK's `pauseOnToolCalls` was designed exactly for this
pattern — external tool execution by a host. The adapter translates between
ADK's "end invocation + resume with FunctionResponse" pattern and our
"ToolRequestEvent + `stream({ kind: 'tool_result' })`" pattern.
**Key insight:** Each `session.stream()` call triggers a new `Runner.runAsync()`
call. This means each ADK "invocation" maps to one `stream()` call. The session
persists state across invocations. Mid-stream `update()` and `steer()` calls are
queued for the next invocation since ADK doesn't support mid-turn changes.
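Queuing `update()` and `steer()` until the next invocation needs only a small buffer. The sketch below uses a hypothetical `PendingChanges` class and `ConfigPatch` shape. Two behaviors are design choices made for this example, not behavior the interface specifies: config patches merge last-write-wins, and steering text is prepended to the next input.

```typescript
// Hypothetical subset of what update() may change.
interface ConfigPatch {
  model?: string;
  permissionMode?: string;
}

// Buffers mid-stream changes until the next stream() -> runAsync() call.
class PendingChanges {
  private config: ConfigPatch = {};
  private steering: string[] = [];

  update(patch: ConfigPatch): void {
    // Later updates win, field by field.
    this.config = { ...this.config, ...patch };
  }

  steer(text: string): void {
    this.steering.push(text);
  }

  // Called when the next invocation starts; empties the buffer.
  drain(nextInput: string): { config: ConfigPatch; input: string } {
    const result = { config: this.config, input: [...this.steering, nextInput].join("\n") };
    this.config = {};
    this.steering = [];
    return result;
  }
}
```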
### HITL: ToolConfirmation flow
ADK also has a separate ToolConfirmation pattern (via
`context.requestConfirmation()`):
```
1. beforeToolCallback calls context.requestConfirmation({ hint: '...' })
2. This sets eventActions.requestedToolConfirmations[functionCallId]
3. ADK yields event with requestedToolConfirmations populated
4. Runner completes (invocation ends)
--- OUR INTERFACE BOUNDARY ---
5. Adapter sees requestedToolConfirmations in event
6. Adapter translates → ElicitationRequest { kind: 'tool_confirmation', ... }
7. Host renders confirmation UI
8. User responds → ElicitationResponse { action: 'accept' | 'decline' }
--- BACK INTO ADK ---
9. Adapter receives elicitation response
10. If accepted: Adapter creates FunctionResponse with confirmed=true
11. Calls Runner.runAsync() with FunctionResponse
12. ADK's SecurityPlugin or callback reads confirmation from session
13. Tool executes
```
**Maps to our ElicitationRequest:** ADK's `ToolConfirmation.hint` →
`ElicitationRequest.message`. ADK's `ToolConfirmation.payload` →
`ElicitationRequest.context`. The `kind: 'tool_confirmation'` is the
discriminator.
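Translating `requestedToolConfirmations` into elicitations is a keyed map over function call IDs. The sketch below assumes:

- `ToolConfirmationLike` mirrors the `ToolConfirmation` fields listed in the architecture notes.
- `ElicitationRequestSketch` is a hypothetical subset of our type.
- The fallback message is invented for the example.

```typescript
// Mirrors the ToolConfirmation fields listed in the architecture notes.
interface ToolConfirmationLike {
  hint?: string;
  confirmed: boolean;
  payload?: unknown;
}

// Hypothetical subset of our ElicitationRequest.
interface ElicitationRequestSketch {
  kind: "tool_confirmation";
  requestId: string; // ADK functionCallId, used to route the response back
  message: string;
  context?: unknown;
}

// One elicitation per entry in eventActions.requestedToolConfirmations.
function toConfirmationElicitations(
  requested: Record<string, ToolConfirmationLike>,
): ElicitationRequestSketch[] {
  return Object.entries(requested).map(
    ([functionCallId, confirmation]): ElicitationRequestSketch => ({
      kind: "tool_confirmation",
      requestId: functionCallId,
      message: confirmation.hint ?? "Allow this tool call?",
      context: confirmation.payload,
    }),
  );
}
```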
### HITL: Auth request flow
```
1. Tool or callback calls context.requestCredential(authConfig)
2. Sets eventActions.requestedAuthConfigs[functionCallId]
3. Event yields, invocation ends
--- OUR INTERFACE BOUNDARY ---
4. Adapter sees requestedAuthConfigs
5. Translates → ElicitationRequest { kind: 'auth_required', context: authConfig }
6. User provides credentials
7. ElicitationResponse { action: 'accept', content: { credential: ... } }
--- BACK INTO ADK ---
8. Adapter stores credential via CredentialService
9. Calls Runner.runAsync() again
10. Tool calls context.getAuthResponse(authConfig) → gets credential
```
**Maps to our ElicitationRequest:** ADK's auth pattern is just another
elicitation kind. This validates our generic elicitation design — it handles
tool confirmation, auth, and any future interaction type.
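Auth reuses the same request shape with a different `kind`. The sketch below treats `AuthConfig` as opaque, because these notes don't list its fields. `ElicitationResponseSketch`, `toAuthElicitations`, and `credentialFrom` are hypothetical names, and the user-facing message is placeholder text.

```typescript
// Generic elicitation: new interaction types only add a new `kind`.
type ElicitationRequestSketch =
  | { kind: "tool_confirmation"; requestId: string; message: string; context?: unknown }
  | { kind: "auth_required"; requestId: string; message: string; context: unknown };

interface ElicitationResponseSketch {
  action: "accept" | "decline";
  content?: { credential?: unknown };
}

// eventActions.requestedAuthConfigs: functionCallId -> AuthConfig (opaque here).
function toAuthElicitations(requested: Record<string, unknown>): ElicitationRequestSketch[] {
  return Object.entries(requested).map(
    ([functionCallId, authConfig]): ElicitationRequestSketch => ({
      kind: "auth_required",
      requestId: functionCallId,
      message: "This tool needs credentials to continue.",
      context: authConfig,
    }),
  );
}

// Credential to store via CredentialService, or undefined if the user declined.
function credentialFrom(res: ElicitationResponseSketch): unknown {
  return res.action === "accept" ? res.content?.credential : undefined;
}
```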
---
## 3. AgentEvent ↔ ADK Event
### Event type mapping
| Our AgentEvent | ADK Event pattern | Adapter translation |
| --------------------- | --------------------------------------------------------------------------------- | --------------------------------------------- |
| `InitializeEvent` | First event from Runner.runAsync() | Adapter emits on first stream() call |
| `SessionUpdateEvent` | `eventActions.stateDelta` | Adapter emits when stateDelta is non-empty |
| `MessageEvent` | `event.content` with text Parts | Filter text/thought parts from Content |
| `ToolRequestEvent` | `getFunctionCalls(event)` returns FunctionCall[] | Each FunctionCall → one ToolRequestEvent |
| `ToolUpdateEvent` | `event.longRunningToolIds` | Adapter emits progress for long-running tools |
| `ToolResponseEvent` | `getFunctionResponses(event)` returns FunctionResponse[] | Each FunctionResponse → one ToolResponseEvent |
| `ElicitationRequest` | `eventActions.requestedToolConfirmations` or `requestedAuthConfigs` | Map to generic elicitation |
| `ElicitationResponse` | User input → FunctionResponse in next runAsync call | Reverse of above |
| `UsageEvent` | `event.usageMetadata` (GenerateContentResponseUsageMetadata) | Map token counts |
| `ErrorEvent` | `event.errorCode` + `event.errorMessage` | Map error fields |
| `stream_end` | `isFinalResponse(event)`, `eventActions.transferToAgent`, `eventActions.escalate` | Derive `stream_end` reason from ADK signals |
| `CustomEvent` | `event.customMetadata` | Pass through |
### ADK EventActions → Our events
| EventActions field | Our event | Notes |
| ---------------------------- | ------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------ |
| `stateDelta` | SessionUpdate or embedded in other events | Delta state is a core ADK pattern |
| `artifactDelta` | `CustomEvent { kind: 'artifact_delta' }` | Artifacts not in our core events |
| `transferToAgent` | Tool call (`transfer_to_agent`) + `stream_end` `reason: 'completed'` | Handoff is a tool call. Host intercepts the tool request, mediates the handoff, originating agent completes. |
| `escalate` | `stream_end` `reason: 'completed'` with `data: { escalateReason: '...' }` | LoopAgent exit signal. ADK's escalate = "I'm done, pass control back up" |
| `requestedToolConfirmations` | `ElicitationRequest { kind: 'tool_confirmation' }` | Per function call ID |
| `requestedAuthConfigs` | `ElicitationRequest { kind: 'auth_required' }` | Per function call ID |
| `skipSummarization` | `_meta: { skipSummarization: true }` | ADK-specific, goes in metadata |
### AgentEventBase mapping
| AgentEventBase field | ADK Event field | Notes |
| -------------------- | ---------------------------------------- | ------------------------------------------------- |
| `id` | `event.id` | Direct |
| `timestamp` | `event.timestamp` (number) | Convert to ISO 8601 string |
| `type` | Derived from content analysis | ADK doesn't have event types — adapter classifies |
| `agentId` | `event.author` (agent name) or context | **New field** — which agent emitted this event |
| `threadId` | `event.branch` (e.g., "agent_1.agent_2") | Direct mapping |
| `source` | `event.author` ("user" or agent name) | Direct |
| `_meta` | `event.customMetadata` | Direct |
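The base-field mapping is mostly renaming plus one format conversion. The sketch below assumes `timestamp` is epoch milliseconds, as this document's summary states. It also assumes user-authored events carry no `agentId`, a case the table leaves open. The type names are hypothetical.

```typescript
// Subset of the ADK Event fields listed in the architecture notes.
interface AdkEventLike {
  id: string;
  author?: string; // "user" or agent name
  branch?: string; // e.g. "agent_1.agent_2"
  timestamp: number; // epoch milliseconds (assumed)
  customMetadata?: Record<string, unknown>;
}

// Hypothetical subset of AgentEventBase; `type` is set later by the classifier.
interface AgentEventBaseSketch {
  id: string;
  timestamp: string; // ISO 8601
  agentId?: string;
  threadId?: string;
  source?: string;
  _meta?: Record<string, unknown>;
}

function toEventBase(ev: AdkEventLike): AgentEventBaseSketch {
  return {
    id: ev.id,
    timestamp: new Date(ev.timestamp).toISOString(),
    // Assumption: only agent-authored events get an agentId.
    agentId: ev.author !== "user" ? ev.author : undefined,
    threadId: ev.branch,
    source: ev.author,
    _meta: ev.customMetadata,
  };
}
```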
### Verdict: CLEAN MAPPING
Every ADK event pattern maps to our event types. The adapter classifies ADK's
untyped events into our typed event taxonomy. Key insight: ADK events are richer
(they carry EventActions, function calls, auth requests all in one event), so
the adapter may fan out one ADK Event into multiple AgentEvents (e.g., one
Message + one ToolRequest + one ElicitationRequest). The new `agentId` field
maps directly from ADK's `event.author`.
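The fan-out can be sketched as a single function that walks an event's parts and then its `EventActions`. The sketch makes these assumptions:

- Local types are simplified; `AdkEventLike` and `AgentEventSketch` are hypothetical.
- Only text, thought, function-call, and tool-confirmation signals are handled.
- Missing IDs default to empty strings.

```typescript
// Simplified ADK event: content parts plus one EventActions field.
interface AdkEventLike {
  id: string;
  content?: {
    parts?: {
      text?: string;
      thought?: boolean;
      functionCall?: { id?: string; name?: string; args?: Record<string, unknown> };
    }[];
  };
  actions: { requestedToolConfirmations: Record<string, { hint?: string }> };
}

type AgentEventSketch =
  | { type: "message"; text: string; thought: boolean }
  | { type: "tool_request"; requestId: string; name: string; args: Record<string, unknown> }
  | { type: "elicitation_request"; kind: "tool_confirmation"; requestId: string; message: string };

// One ADK Event can carry text, tool calls, and confirmations at once.
function fanOut(ev: AdkEventLike): AgentEventSketch[] {
  const out: AgentEventSketch[] = [];
  for (const part of ev.content?.parts ?? []) {
    if (part.text !== undefined) {
      out.push({ type: "message", text: part.text, thought: part.thought === true });
    } else if (part.functionCall) {
      out.push({
        type: "tool_request",
        requestId: part.functionCall.id ?? "",
        name: part.functionCall.name ?? "",
        args: part.functionCall.args ?? {},
      });
    }
  }
  for (const [callId, c] of Object.entries(ev.actions.requestedToolConfirmations)) {
    out.push({ type: "elicitation_request", kind: "tool_confirmation", requestId: callId, message: c.hint ?? "" });
  }
  return out;
}
```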
---
## 4. ToolContract ↔ ADK Tool System
### ToolDescriptor ↔ BaseTool
| ToolDescriptor field | ADK source | Notes |
| ------------------------- | ------------------------------------------------------------- | --------------------------------- |
| `name` | `BaseTool.name` | Direct |
| `displayName` | — | ADK doesn't have this |
| `description` | `BaseTool.description` | Direct |
| `parametersSchema` | `BaseTool._getDeclaration()` → FunctionDeclaration.parameters | JSON Schema from declaration |
| `annotations.readOnly`    | No `BaseTool` field; inferred from tool type                  | Adapter infers it, e.g. for a FunctionTool known to have no side effects |
| `annotations.longRunning` | `BaseTool.isLongRunning` | Direct |
### ToolCallRequest ↔ FunctionCall
| ToolCallRequest | ADK FunctionCall | Notes |
| --------------- | ------------------- | ------ |
| `requestId` | `functionCall.id` | Direct |
| `name` | `functionCall.name` | Direct |
| `args` | `functionCall.args` | Direct |
### ToolResultData ↔ FunctionResponse + tool return
| ToolResultData | ADK | Notes |
| ---------------- | ------------------------------ | ------------------------------------------------ |
| `llmContent` | `FunctionResponse.response` | Adapter wraps into ContentPart[] |
| `displayContent` | — | ADK doesn't separate display from model content |
| `isError`        | Error thrown from `BaseTool.runAsync()` | Adapter catches and sets flag                    |
| `tailCalls` | — | ADK doesn't have tail calls (gemini-cli concept) |
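The `llmContent` and `isError` rows together describe a wrapper around tool execution. The sketch below makes two simplifications: it uses a synchronous stand-in for `BaseTool.runAsync()`, and it serializes `FunctionResponse.response` into a single text part. `ToolResultDataSketch` is a hypothetical subset of our type.

```typescript
// Hypothetical subset of ToolResultData.
interface ToolResultDataSketch {
  llmContent: { type: "text"; text: string }[];
  isError: boolean;
}

// `run` is a synchronous stand-in for BaseTool.runAsync().
function toToolResult(
  run: (args: Record<string, unknown>) => Record<string, unknown>,
  args: Record<string, unknown>,
): ToolResultDataSketch {
  try {
    // FunctionResponse.response is an object; serialize it into one text part.
    const response = run(args);
    return { llmContent: [{ type: "text", text: JSON.stringify(response) }], isError: false };
  } catch (err) {
    // A thrown error becomes an error result instead of crashing the adapter.
    const message = err instanceof Error ? err.message : String(err);
    return { llmContent: [{ type: "text", text: message }], isError: true };
  }
}
```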
### AgentTool pattern
ADK's `AgentTool` wraps a `BaseAgent` as a `BaseTool`:
- Uses `agent.inputSchema` for tool parameters
- Uses `agent.description` for tool description
- Creates internal Runner with isolated session
- Returns agent output as tool result
- Merges state deltas back to parent
**Our equivalent:** `SubagentTool` wraps `AgentDescriptor` as a tool:
- Uses `descriptor.inputSchema` for tool parameters
- Uses `descriptor.description` for tool description
- Creates executor via `SessionFactory.create(descriptor, context)`
- Returns execution result as tool result
**Mapping is 1:1.** The only difference is ADK does it with concrete agent
instances; we do it with descriptors + factory.
---
## 5. LifecycleInterceptor ↔ ADK Plugin System
### Hook point mapping
| Our hook point string | ADK Plugin callback | Mapping |
| --------------------- | ----------------------- | ------------------------------------------ |
| `'before_agent'` | `beforeAgentCallback` | `payload: { agent, context }` |
| `'after_agent'` | `afterAgentCallback` | `payload: { agent, context }` |
| `'before_model'` | `beforeModelCallback` | `payload: { context, llmRequest }` |
| `'after_model'` | `afterModelCallback` | `payload: { context, llmResponse }` |
| `'before_tool'` | `beforeToolCallback` | `payload: { tool, args, context }` |
| `'after_tool'` | `afterToolCallback` | `payload: { tool, args, context, result }` |
| `'on_event'` | `onEventCallback` | `payload: { event }` |
| `'on_user_message'` | `onUserMessageCallback` | `payload: { userMessage }` |
| `'before_run'` | `beforeRunCallback` | `payload: { context }` |
| `'after_run'` | `afterRunCallback` | `payload: { context }` |
| `'on_model_error'` | `onModelErrorCallback` | `payload: { request, error }` |
| `'on_tool_error'` | `onToolErrorCallback` | `payload: { tool, args, error }` |
### HookResult ↔ ADK callback return
| HookResult field | ADK pattern | Notes |
| ------------------- | ----------------------------------------------- | ----------------------------------- |
| `action: 'proceed'` | Return `undefined` | Plugin returns nothing → continue |
| `action: 'block'`   | Return `Content` (agent), `LlmResponse` (model), or throw | Non-undefined return short-circuits |
| `modifications` | Return modified `LlmRequest`/`LlmResponse`/args | Plugin returns modified version |
### ADK's early-exit pattern
ADK plugins use "first non-undefined return wins":
- `beforeModelCallback` returns `LlmResponse` → skips LLM call entirely (cache
hit)
- `beforeToolCallback` returns modified `args` → tool runs with new args
- `beforeAgentCallback` returns `Content` → skips agent run entirely
Our `HookResult.modifications` carries the same data. The `action: 'block'` +
return value pattern maps cleanly.
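The "first non-undefined return wins" rule translates into HookResult as follows. The sketch assumes every non-undefined callback return becomes `action: 'block'`, with the returned value carried in `modifications`. The function names are hypothetical, and callbacks are synchronous here for brevity.

```typescript
type HookResultSketch =
  | { action: "proceed" }
  | { action: "block"; modifications: unknown };

// ADK semantics: undefined means "continue"; any other value short-circuits.
function fromAdkCallbackReturn(ret: unknown): HookResultSketch {
  return ret === undefined ? { action: "proceed" } : { action: "block", modifications: ret };
}

// Plugins run in order; the first non-undefined return wins.
function runPluginChain(callbacks: Array<() => unknown>): HookResultSketch {
  for (const callback of callbacks) {
    const result = fromAdkCallbackReturn(callback());
    if (result.action === "block") return result; // later callbacks never run
  }
  return { action: "proceed" };
}
```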
### gemini-cli hooks NOT in ADK
| gemini-cli hook | ADK equivalent | Notes |
| --------------------- | ------------------------------------ | ------------------------------------------------------------- |
| `BeforeToolSelection` | — | ADK doesn't let you modify which tools are available mid-turn |
| `Notification` | — | ADK doesn't have notification hooks |
| `SessionStart` | `onUserMessageCallback` (first call) | Close enough |
| `SessionEnd` | `afterRunCallback` | Close enough |
| `PreCompress` | — | ADK doesn't have context compression hooks |
These gaps are fine — they're gemini-cli-specific hook points. Our generic
`fire(hookPoint, payload)` handles them because the hook point is an open
string. ADK executors simply don't fire these hook points, and
`supportedHookPoints()` reflects that.
---
## 6. PolicyEvaluator ↔ ADK SecurityPlugin
### ADK SecurityPlugin
```typescript
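// Simplified outline of ADK-TS's SecurityPlugin (not the verbatim source).
class SecurityPlugin extends BasePlugin {
  policyEngine: BasePolicyEngine;

  // Runs before each tool call that ADK executes itself.
  async beforeToolCallback({ tool, args, context }) {
    const outcome = await this.policyEngine.evaluate(tool.name, args);
    switch (outcome) {
      case PolicyOutcome.DENY:
        // Block the call outright.
        throw new Error(`Tool '${tool.name}' denied by policy`);
      case PolicyOutcome.CONFIRM:
        // Chain into the ToolConfirmation flow (§2).
        context.requestConfirmation({ hint: `Allow '${tool.name}'?` });
        return undefined;
      case PolicyOutcome.ALLOW:
        return undefined; // proceed
    }
  }
}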
```
### Mapping
| Our PolicyEvaluator | ADK SecurityPlugin | Notes |
| ------------------------- | --------------------------------------------------------- | ------------------------------------------ |
| `evaluate(request)` | `policyEngine.evaluate(toolName, args)` | ADK is simpler — tool name + args only |
| `PolicyDecision.allow` | `PolicyOutcome.ALLOW` | Direct |
| `PolicyDecision.deny` | `PolicyOutcome.DENY` | Direct |
| `PolicyDecision.ask_user` | `PolicyOutcome.CONFIRM` → `context.requestConfirmation()` | ADK chains to ToolConfirmation             |
| `getExcluded()` | — | ADK doesn't pre-filter tools |
| `request.principal` | — | ADK doesn't track who's calling |
| `request.principalPath` | Could use `context.agentName` + branch | For hierarchical policy |
| `request.context` | — | Our extension point for host-specific data |
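The three outcome rows form a bijection, and a pair of exhaustive lookup tables captures it. The sketch assumes ADK's `PolicyOutcome` values are the strings `'DENY' | 'CONFIRM' | 'ALLOW'`. They are mirrored locally here, and the real enum's values may differ.

```typescript
// Local stand-in for ADK's PolicyOutcome (string values assumed).
const PolicyOutcome = { DENY: "DENY", CONFIRM: "CONFIRM", ALLOW: "ALLOW" } as const;
type PolicyOutcome = (typeof PolicyOutcome)[keyof typeof PolicyOutcome];

type PolicyDecision = "allow" | "deny" | "ask_user";

// ADK -> ours.
const toDecision: Record<PolicyOutcome, PolicyDecision> = {
  DENY: "deny",
  CONFIRM: "ask_user", // ADK chains CONFIRM into ToolConfirmation
  ALLOW: "allow",
};

// Ours -> ADK, e.g. when a host policy must drive ADK-internal execution.
const toOutcome: Record<PolicyDecision, PolicyOutcome> = {
  allow: PolicyOutcome.ALLOW,
  deny: PolicyOutcome.DENY,
  ask_user: PolicyOutcome.CONFIRM,
};
```

Typing both tables as `Record<...>` makes the compiler reject a missing case if either side gains a new value.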
### How ADK policy maps when host controls execution
With `pauseOnToolCalls: true`, the flow is:
1. ADK yields tool call → adapter converts to ToolRequestEvent
2. **Host** runs PolicyEvaluator.evaluate() — NOT ADK's SecurityPlugin
3. Host decides allow/deny/ask_user
4. If allowed, host executes tool and sends result via `session.stream()`
This means **ADK's SecurityPlugin is bypassed when the host controls tool
execution** — which is correct! The host's PolicyEvaluator is the authority.
ADK's SecurityPlugin only matters when ADK executes tools internally
(`pauseOnToolCalls: false`).
---
## 7. SessionContract ↔ ADK Session
### Session mapping
| Our SessionHandle | ADK Session | Notes |
| ----------------- | ---------------------------------------- | ----------------------------------------- |
| `id` | `Session.id` | Direct |
| `agentName` | `Session.appName` | ADK uses appName, not agent name |
| `events` | `Session.events: Event[]` | Direct (but ADK Events → our AgentEvents) |
| `state` | `Session.state: Record<string, unknown>` | Direct |
| `lastUpdateTime` | `Session.lastUpdateTime` | Direct |
### SessionProvider ↔ BaseSessionService
| Our SessionProvider | ADK BaseSessionService | Notes |
| ----------------------------- | ----------------------------------------------- | -------------------------- |
| `create(agentName, metadata)` | `createSession({ appName, userId })` | ADK requires userId |
| `load(sessionId)` | `getSession({ appName, userId, sessionId })` | ADK requires all three IDs |
| `list(agentName)` | `listSessions({ appName, userId })` | ADK scopes by userId |
| `delete(sessionId)` | `deleteSession({ appName, userId, sessionId })` | Same pattern |
### Gap: ADK requires userId
ADK sessions are scoped by `(appName, userId, sessionId)`. Our interface uses
just `sessionId`. The adapter can embed userId in the session metadata or derive
it from HostContext.
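Either option above comes down to recovering `(appName, userId)` from a bare `sessionId`. The sketch below shows the "record it at creation time" route, using a hypothetical in-memory `SessionKeyIndex`. A persistent adapter would store the same pair in session metadata instead.

```typescript
// Hypothetical host context carrying the signed-in user.
interface HostContextSketch {
  appName: string;
  userId: string;
}

// The three-part key that ADK's BaseSessionService methods require.
interface AdkSessionKey {
  appName: string;
  userId: string;
  sessionId: string;
}

// Remembers who owns each session so sessionId-only calls can be routed to ADK.
class SessionKeyIndex {
  private readonly owners = new Map<string, HostContextSketch>();

  // Called after createSession({ appName, userId }) returns a session id.
  remember(host: HostContextSketch, sessionId: string): AdkSessionKey {
    this.owners.set(sessionId, { appName: host.appName, userId: host.userId });
    return { appName: host.appName, userId: host.userId, sessionId };
  }

  // load(sessionId) / delete(sessionId) only know the id; recover the full key.
  resolve(sessionId: string): AdkSessionKey | undefined {
    const owner = this.owners.get(sessionId);
    return owner ? { ...owner, sessionId } : undefined;
  }
}
```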
### State prefixes (ADK-specific)
ADK uses prefixed state keys:
- `app:` — app-scoped, persisted
- `user:` — user-scoped, persisted
- `temp:` — temporary, stripped before persistence
Our `SessionHandle.state` is a flat `Record<string, unknown>`. The adapter
preserves prefixes as-is — they're just string keys. No conflict.
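To make the prefix rules concrete, the rule for `temp:` amounts to a key filter. ADK's session services strip `temp:` keys themselves, so the adapter doesn't need this code. The `scopeOf` helper and the `'session'` label for unprefixed keys are assumptions for illustration.

```typescript
type StateScope = "app" | "user" | "temp" | "session";

// Classify a state key by its ADK prefix; unprefixed keys are labeled "session" here.
function scopeOf(key: string): StateScope {
  if (key.startsWith("app:")) return "app";
  if (key.startsWith("user:")) return "user";
  if (key.startsWith("temp:")) return "temp";
  return "session";
}

// Keys that survive persistence: everything except temp:.
function persistableState(state: Record<string, unknown>): Record<string, unknown> {
  return Object.fromEntries(Object.entries(state).filter(([key]) => scopeOf(key) !== "temp"));
}
```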
---
## 8. ContentPart ↔ ADK Content/Part
### ADK uses Google GenAI types
ADK's `Content` and `Part` come from `@google/genai`:
```typescript
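interface Content {
  role?: string; // 'user' | 'model'
  parts?: Part[];
}
// One object with optional fields (not a union of Part subtypes)
interface Part {
  text?: string; thought?: boolean;
  inlineData?: Blob; fileData?: FileData;
  functionCall?: FunctionCall; functionResponse?: FunctionResponse;
  // ...executableCode, codeExecutionResult, etc.
}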
```
### Mapping
| Our ContentPart | ADK/GenAI Part | Notes |
| --------------------------------------------------- | ---------------------------------------------- | ------------------------------------------------------ |
| `{ type: 'text', text }` | `{ text: string }` | Direct |
| `{ type: 'thought', thought }`                      | `{ thought: true, text: string }`              | ADK sets the `thought` boolean flag on a text Part     |
| `{ type: 'media', mimeType, data }` | `{ inlineData: { mimeType, data } }` | Restructure |
| `{ type: 'reference', text, uri }` | `{ fileData: { fileUri, mimeType } }` | Map fileData → reference |
| `{ type: 'refusal', text }` | — | Not in ADK/GenAI. Adapter would map from finishReason. |
| `{ type: 'function_call', name, args, id }` | `{ functionCall: { name, args, id } }` | Unwrap |
| `{ type: 'function_response', name, response, id }` | `{ functionResponse: { name, response, id } }` | Unwrap |
### Verdict: CLEAN MAPPING
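The adapter converts between our flat discriminated union and ADK's nested Part
structure. Text, thought, media, and function call/response parts round-trip
without information loss. `refusal` has no GenAI counterpart, and `reference`
and `fileData` carry different fields (`text` vs `mimeType`), so those two need
adapter conventions.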
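Below is a sketch of the bidirectional conversion for four of the part kinds. It makes these assumptions:

- Local types are simplified. `ContentPart` is trimmed, and `GenAiPart` mirrors only the `@google/genai` fields used.
- `reference`, `refusal`, and `function_response` are omitted.
- The `application/octet-stream` default is an assumption.

```typescript
type ContentPart =
  | { type: "text"; text: string }
  | { type: "thought"; thought: string }
  | { type: "media"; mimeType: string; data: string }
  | { type: "function_call"; id?: string; name: string; args: Record<string, unknown> };

// Subset of @google/genai's Part: one object with optional fields.
interface GenAiPart {
  text?: string;
  thought?: boolean;
  inlineData?: { mimeType?: string; data?: string };
  functionCall?: { id?: string; name?: string; args?: Record<string, unknown> };
}

function toGenAiPart(p: ContentPart): GenAiPart {
  switch (p.type) {
    case "text":
      return { text: p.text };
    case "thought":
      return { text: p.thought, thought: true }; // thought = flagged text part
    case "media":
      return { inlineData: { mimeType: p.mimeType, data: p.data } };
    case "function_call":
      return { functionCall: { id: p.id, name: p.name, args: p.args } };
  }
}

function fromGenAiPart(part: GenAiPart): ContentPart | undefined {
  if (part.text !== undefined) {
    return part.thought ? { type: "thought", thought: part.text } : { type: "text", text: part.text };
  }
  if (part.inlineData) {
    const { mimeType = "application/octet-stream", data = "" } = part.inlineData;
    return { type: "media", mimeType, data };
  }
  if (part.functionCall) {
    const { id, name = "", args = {} } = part.functionCall;
    return { type: "function_call", id, name, args };
  }
  return undefined; // part kinds not covered by this sketch
}
```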
---
## 9. Composition ↔ ADK Agent Patterns
| Our CompositionConfig.pattern | ADK Agent type | Notes |
| ----------------------------- | -------------------------------------- | ------------------------------------------------ |
| `'hierarchical'` | Any agent with `subAgents` | Default — parent calls sub-agents as tools |
| `'sequential'` | `SequentialAgent` | Runs children in order |
| `'parallel'` | `ParallelAgent` | Runs children concurrently, branch isolation |
| `'loop'` | `LoopAgent` | Repeats children until escalate or maxIterations |
| `'transfer'` | LlmAgent with `transfer_to_agent` tool | Peer-to-peer handoff |
### Branch isolation
ADK's `ParallelAgent` gives each child an isolated `branch` context:
- Children don't see peer events
- Each gets unique branch path: `"parent.child_0"`, `"parent.child_1"`
- Child event streams are merged into one stream as the children run
Maps to our `threadId` — each parallel branch gets a unique threadId. Events
from different branches are interleaved by the host.
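Because a branch path is dot-separated, the host can derive thread structure directly from it. The sketch assumes events carry the ADK `branch` string unchanged as `threadId`. `groupByThread`, `parentThreadId`, and the `'main'` default for events without a branch are hypothetical.

```typescript
interface BranchedEvent {
  branch?: string; // e.g. "parent.child_0"
  text: string;
}

// Events arrive interleaved; regroup them per thread for display.
function groupByThread(events: BranchedEvent[]): Map<string, string[]> {
  const threads = new Map<string, string[]>();
  for (const ev of events) {
    const threadId = ev.branch ?? "main";
    const list = threads.get(threadId) ?? [];
    list.push(ev.text);
    threads.set(threadId, list);
  }
  return threads;
}

// The parent thread is the branch path minus its last segment.
function parentThreadId(threadId: string): string | undefined {
  const i = threadId.lastIndexOf(".");
  return i === -1 ? undefined : threadId.slice(0, i);
}
```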
---
## 10. Summary: Gaps and Resolutions
### No gaps blocking ADK integration:
| Concern | Status | Resolution |
| ----------------------- | --------- | ------------------------------------------------------------------------- |
| pauseOnToolCalls HITL | **Works** | Adapter maps to stream() cycle (§2) |
| ToolConfirmation | **Works** | Maps to ElicitationRequest (§2) |
| Auth requests | **Works** | Maps to ElicitationRequest (§2) |
| Plugin hooks (12 types) | **Works** | Maps to LifecycleInterceptor.fire() (§5) |
| Agent transfers | **Works** | Tool call (`transfer_to_agent`) + `stream_end` `reason: 'completed'` (§3) |
| State delta pattern | **Works** | SessionUpdateEvent or \_meta (§3) |
| Branch isolation | **Works** | threadId mapping (§9) |
| AgentTool pattern | **Works** | SubagentTool with descriptor + factory (§4) |
| Session management | **Works** | Adapter maps userId into session (§7) |
### Minor adapter complexity:
1. **Event fan-out:** One ADK Event may become multiple AgentEvents (message +
tool call + elicitation). Adapter logic needed but straightforward.
2. **userId scoping:** ADK sessions require userId; our interface doesn't.
Adapter derives from HostContext.
3. **Timestamp format:** ADK uses `number` (epoch ms); we use ISO 8601 string.
Simple conversion.
4. **Content structure:** ADK uses nested Part types; we use flat discriminated
union. Adapter converts bidirectionally.
### ADK features our interface supports that gemini-cli doesn't have yet:
- `LoopAgent` / `ParallelAgent` / `SequentialAgent` composition → our
CompositionConfig
- `eventActions.stateDelta` → our SessionUpdateEvent
- `eventActions.transferToAgent` → tool call (`transfer_to_agent`) +
`stream_end` `reason: 'completed'`
- `eventActions.escalate` → `stream_end` `reason: 'completed'` with
  `data: { escalateReason }`
- Long-running tools → our ToolUpdateEvent
- Auth credential flow → our ElicitationRequest with kind: 'auth_required'
# ADK-TS (Agent Development Kit - TypeScript) Architecture Notes
## Package: `@google/adk` v0.4.0
**Location:** `/Users/adamfweidman/Desktop/adk-int/adk-js/core/`
## Agent Hierarchy
```
BaseAgent (abstract)
├── LlmAgent - Model-driven agent with tools (the main one)
├── LoopAgent - Runs sub-agents in a loop (maxIterations, escalate to exit)
├── ParallelAgent - Runs sub-agents concurrently (isolated branches)
└── SequentialAgent - Runs sub-agents sequentially
```
### BaseAgent Config
- `name: string` - Unique identifier (must be valid JS identifier)
- `description?: string` - One-line capability for model routing
- `parentAgent?: BaseAgent` - Parent in agent tree
- `subAgents?: BaseAgent[]` - Child agents
- `beforeAgentCallback / afterAgentCallback` - Pre/post execution hooks
### LlmAgent Config (extends BaseAgent)
- `model?: string | BaseLlm` - LLM to use
- `instruction?: string | InstructionProvider` - Agent-specific instructions
- `globalInstruction?: string | InstructionProvider` - Tree-wide (root only)
- `tools?: ToolUnion[]` - Available tools
- `generateContentConfig?: GenerateContentConfig` - LLM params
- `disallowTransferToParent / disallowTransferToPeers` - Transfer controls
- `includeContents?: 'default' | 'none'` - Context history inclusion
- `inputSchema / outputSchema` - Validation schemas
- `outputKey?: string` - Session state key for output storage
- `beforeModelCallback / afterModelCallback` - LLM hooks
- `beforeToolCallback / afterToolCallback` - Tool hooks
- `requestProcessors / responseProcessors` - LLM request/response processors
- `codeExecutor?: BaseCodeExecutor`
## Event System
### Event Interface
```typescript
interface Event extends LlmResponse {
id: string;
invocationId: string;
author?: string; // "user" or agent name
actions: EventActions; // State/artifact/auth/transfer operations
longRunningToolIds?: string[];
branch?: string; // Hierarchical agent path
timestamp: number;
content?: Content;
partial?: boolean; // Streaming indicator
}
```
### EventActions
```typescript
interface EventActions {
skipSummarization?: boolean;
stateDelta: Record<string, unknown>;
artifactDelta: Record<string, number>;
transferToAgent?: string;
escalate?: boolean;
requestedAuthConfigs: Record<string, AuthConfig>;
requestedToolConfirmations: Record<string, ToolConfirmation>;
}
```
### Structured Events (utility layer)
Converts raw Event to discriminated union:
```
EventType: THOUGHT | CONTENT | TOOL_CALL | TOOL_RESULT | CALL_CODE |
CODE_RESULT | ERROR | ACTIVITY | TOOL_CONFIRMATION | FINISHED
```
## Tool System
### BaseTool (abstract)
- `name, description, isLongRunning`
- `_getDeclaration(): FunctionDeclaration` - OpenAPI schema for LLM
- `runAsync(request): Promise<unknown>` - Execute tool
- `processLlmRequest(request): Promise<void>` - Preprocessing
### Concrete Tool Types
1. **FunctionTool** - Generic typed tools (Zod schema support)
2. **AgentTool** - Wrap agents as tools (for hierarchical composition)
3. **MCPTool** - Model Context Protocol server tools
4. **GoogleSearchTool** - Built-in web search
5. **ExitLoopTool** - Signal loop exit
6. **LongRunningFunctionTool** - Async long-running operations
### BaseToolset
- Filter tools by predicate or string list
- `getTools(context)`, `close()`, `isToolSelected()`
- **MCPToolset** - Toolset for MCP server connections
## Session Management
### Session Interface
```typescript
interface Session {
id: string;
appName: string;
userId: string;
state: Record<string, unknown>; // Mutable key-value store
events: Event[]; // Complete conversation history
lastUpdateTime: number;
}
```
### Session Services
- `BaseSessionService` (abstract) - createSession, getSession, listSessions,
deleteSession, appendEvent
- `InMemorySessionService` - In-process storage
- `DatabaseSessionService` - Mikro-ORM backed (SQL)
### State Management
- `State` class wraps base state + delta
- `get()` returns from delta if present, else base
- `set()` updates delta only
- `hasDelta()` checks if changes made
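The base-plus-delta behavior takes only a few lines to sketch. This is not ADK's `State` source. `DeltaState` and `toStateDelta()` are hypothetical names, and only the `get`/`set`/`hasDelta` semantics listed above are reproduced.

```typescript
class DeltaState {
  private readonly base: Record<string, unknown>;
  private readonly delta: Record<string, unknown> = {};

  constructor(base: Record<string, unknown>) {
    this.base = base;
  }

  // Delta shadows base; own-property check avoids prototype keys like "toString".
  get(key: string): unknown {
    return Object.prototype.hasOwnProperty.call(this.delta, key) ? this.delta[key] : this.base[key];
  }

  // Writes only touch the delta; base stays as loaded from the session.
  set(key: string, value: unknown): void {
    this.delta[key] = value;
  }

  hasDelta(): boolean {
    return Object.keys(this.delta).length > 0;
  }

  // What an event would carry as eventActions.stateDelta.
  toStateDelta(): Record<string, unknown> {
    return { ...this.delta };
  }
}
```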
## Human-in-the-Loop (HITL)
### Tool Confirmation
```typescript
class ToolConfirmation {
hint?: string; // Guidance for user
confirmed: boolean; // User approval
payload?: unknown; // Additional context
}
```
### Security Plugin
- `beforeToolCallback` - Evaluates policy before tool execution
- `BasePolicyEngine` interface with `evaluate()` method
- `PolicyOutcome`: DENY | CONFIRM | ALLOW
### Auth Requests
- `context.requestCredential(authConfig)` - Request auth from user
- `context.getAuthResponse(authConfig)` - Check for auth response
- Sets `eventActions.requestedAuthConfigs[functionCallId]`
## Multi-Agent Patterns
### Agent Transfer
- LlmAgent injects `transfer_to_agent(agentName)` tool
- Sets `eventActions.transferToAgent = targetAgentName`
- Runner resolves target and continues
- Can transfer to: sub-agents, parent (if not disabled), peers (if not disabled)
### Parallel Agent
- Runs all subAgents concurrently
- Isolates each via `branch` context
- Sub-agents don't see peer history
- Merges event streams with fair ordering
### Loop Agent
- Repeatedly runs subAgents
- `maxIterations` caps loop count
- Exits on `event.actions.escalate === true`
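Below is a synchronous sketch of the loop semantics. Each sub-agent run is modeled as a function returning its events; `SubAgentRun` and `LoopEvent` are hypothetical. The sketch exits as soon as it sees an escalating event. The real `LoopAgent` may instead let the current sub-agent finish first, so treat that detail as an assumption.

```typescript
interface LoopEvent {
  author: string;
  escalate?: boolean; // mirrors event.actions.escalate
}

type SubAgentRun = (iteration: number) => LoopEvent[];

// Run sub-agents in order, repeatedly, until one escalates or maxIterations is hit.
function runLoop(subAgents: SubAgentRun[], maxIterations: number): LoopEvent[] {
  const emitted: LoopEvent[] = [];
  for (let i = 0; i < maxIterations; i++) {
    for (const run of subAgents) {
      for (const ev of run(i)) {
        emitted.push(ev);
        if (ev.escalate === true) return emitted; // exit the whole loop
      }
    }
  }
  return emitted;
}
```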
## Plugin System
### BasePlugin Lifecycle Hooks (12 callbacks)
- `onUserMessageCallback` - Preprocess user messages
- `beforeRunCallback` - Before agent run (can short-circuit)
- `onEventCallback` - Per-event (can modify events)
- `afterRunCallback` - Final cleanup
- `beforeAgentCallback / afterAgentCallback` - Agent lifecycle
- `beforeModelCallback / afterModelCallback` - LLM lifecycle
- `onModelErrorCallback` - Model error handling
- `beforeToolCallback / afterToolCallback` - Tool lifecycle
- `onToolErrorCallback` - Tool error handling
### Built-in Plugins
- **LoggingPlugin** - Debug logging
- **SecurityPlugin** - Policy enforcement + tool confirmation
- **PluginManager** - Plugin orchestration
## Runner
### Runner Config
```typescript
interface RunnerConfig {
appName: string;
agent: BaseAgent; // Root agent
plugins?: BasePlugin[];
artifactService?: BaseArtifactService;
sessionService: BaseSessionService; // Required
memoryService?: BaseMemoryService;
credentialService?: BaseCredentialService;
}
```
### RunConfig (per-run options)
```typescript
interface RunConfig {
speechConfig?: SpeechConfig;
responseModalities?: Modality[];
maxLlmCalls?: number; // Default 500
pauseOnToolCalls?: boolean; // Client-side tool execution
streamingMode?: StreamingMode; // NONE | SSE | BIDI
// ... audio/live configs
}
```
### Execution Pipeline
1. Load or create session
2. Create InvocationContext
3. Run pluginManager.runOnUserMessageCallback()
4. Append user message to session
5. Run agent.runAsync(invocationContext) → yields events
6. For each non-partial event: append to session
7. Run pluginManager.runOnEventCallback()
8. Run pluginManager.runAfterRunCallback()
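The pipeline above can be sketched as a single async generator. This is a minimal sketch, not ADK-TS's `Runner`. `PipelineEvent`, `SessionState`, `RunnerPlugin`, and `RunnableAgent` are hypothetical, simplified shapes, and the real callbacks receive richer context objects. The step order follows the list above.

```typescript
// Hypothetical, simplified shapes (not the ADK-TS API).
interface PipelineEvent { author: string; text: string; partial?: boolean }
interface SessionState { id: string; events: PipelineEvent[] }
interface RunnerPlugin {
  onUserMessage?(msg: PipelineEvent): PipelineEvent | undefined;
  onEvent?(ev: PipelineEvent): PipelineEvent | undefined;
  afterRun?(session: SessionState): void;
}
interface RunnableAgent {
  runAsync(ctx: { session: SessionState }): AsyncGenerator<PipelineEvent>;
}

async function* runPipeline(
  sessions: Map<string, SessionState>,
  sessionId: string,
  agent: RunnableAgent,
  plugins: RunnerPlugin[],
  userMessage: PipelineEvent,
): AsyncGenerator<PipelineEvent> {
  // 1. Load or create the session.
  let session = sessions.get(sessionId);
  if (!session) {
    session = { id: sessionId, events: [] };
    sessions.set(sessionId, session);
  }
  // 2. Create the invocation context.
  const ctx = { session };
  // 3. Each plugin may replace the message; later plugins see the replacement.
  let msg = userMessage;
  for (const p of plugins) msg = p.onUserMessage?.(msg) ?? msg;
  // 4. Append the user message.
  session.events.push(msg);
  // 5. Run the agent and stream its events.
  for await (let ev of agent.runAsync(ctx)) {
    // 6. Only complete (non-partial) events are persisted.
    if (!ev.partial) session.events.push(ev);
    // 7. The per-event hook may replace the event before it is yielded.
    for (const p of plugins) ev = p.onEvent?.(ev) ?? ev;
    yield ev;
  }
  // 8. Final cleanup hook.
  for (const p of plugins) p.afterRun?.(session);
}
```

Persisting only non-partial events keeps streamed text chunks out of the session history. The caller still sees every chunk.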
## Model Layer
### BaseLlm (abstract)
- `generateContentAsync(llmRequest, stream?): AsyncGenerator<LlmResponse>`
- `connect(llmRequest): Promise<BaseLlmConnection>` - For live/streaming
### Implementations
- `Gemini` - Google Gemini API
- `ApigeeLlm` - Apigee-wrapped models
- `LLMRegistry` - Static registry for model lookup
## Service Adapters (all abstract base + implementations)
| Service | Implementations |
| --------------------- | ------------------------------ |
| BaseSessionService | InMemory, Database (Mikro-ORM) |
| BaseArtifactService | InMemory, File, GCS |
| BaseMemoryService | InMemory |
| BaseCredentialService | InMemory |
| BaseCodeExecutor | BuiltIn |
## Design Patterns
1. **Symbol-based type guards** - Every class uses `Symbol.for()` + `isXxx()`
2. **Abstract base classes** - Service interfaces via abstract classes
3. **Async generators** - All agent execution yields events
4. **Context objects** - Rich context passed to callbacks/tools
5. **Delta state** - Session state + event action deltas
6. **Plugin middleware** - Lifecycle hooks at multiple execution points
7. **Tree-based hierarchy** - Parent-child agents with root traversal
8. **Branch isolation** - Parallel agents use branch paths
9. **Callback chains** - Multiple callbacks per stage with early termination
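Pattern 1 matters because `instanceof` fails when two copies of a package are installed: each copy has its own class object. A registry symbol from `Symbol.for()` is shared across those copies. This is a minimal sketch of the technique. The symbol key and the `LlmAgent`/`isLlmAgent` bodies are illustrative, not ADK-TS's actual code.

```typescript
// Symbol.for() looks the key up in the global symbol registry, so every copy
// of a package that uses the same key gets the same symbol.
const LLM_AGENT_MARKER = Symbol.for('example.adk.LlmAgent'); // hypothetical key

class LlmAgent {
  constructor(readonly name: string) {
    // Non-enumerable brand; unlike instanceof, it does not depend on class identity.
    Object.defineProperty(this, LLM_AGENT_MARKER, { value: true });
  }
}

function isLlmAgent(value: unknown): value is LlmAgent {
  return typeof value === 'object' && value !== null && Reflect.get(value, LLM_AGENT_MARKER) === true;
}
```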
# Cross-SDK Comparison: Events, Agents, and Interface Superset
## 1. AgentEvents: Our Outline vs Michael's
Our outline and Michael's `Gemini CLI Agents.txt` are **nearly identical** in
event taxonomy. The only event-type difference is that we added `stream_end`:
| # | Michael's Events | Our Outline | Delta |
| --- | ---------------------- | --------------------- | ------------------------------------------------------------------------------- |
| 1 | `initialize` | `InitializeEvent` | Same |
| 2 | `session_update` | `SessionUpdateEvent` | Same |
| 3 | `message` | `MessageEvent` | Same — streaming handled by AsyncGenerator |
| 4 | `tool_request` | `ToolRequestEvent` | Same |
| 5 | `tool_update` | `ToolUpdateEvent` | Same |
| 6 | `tool_response` | `ToolResponseEvent` | Same |
| 7 | `elicitation_request` | `ElicitationRequest` | Same |
| 8 | `elicitation_response` | `ElicitationResponse` | Same |
| 9 | `usage` | `UsageEvent` | Same |
| 10 | `error` | `ErrorEvent` | Same |
| 11 | `custom` | `CustomEvent` | Same |
| 12 | — | **StreamEnd** | **Added**: completed, failed, aborted, max_turns, max_budget, max_time, refusal |
### Minor structural differences:
| Aspect | Michael | Our Outline |
| ---------------------- | --------------------------------------------------- | ------------------------------------------------------------------------- |
| **Base type** | `AgentEventCommon` with `type: string` (fully open) | `AgentEventBase` with `type: AgentEventType` (`'known' \| (string & {})`) |
| **Agent ID** | — | `agentId` on event base (which agent emitted this event) |
| **Event map** | Generic `interface AgentEvents` + mapped type | Same — adopted Michael's pattern for declaration merging extensibility |
| **ContentPart.\_meta** | Required (`_meta: Record<string, unknown>`) | Optional (`_meta?: Record<string, unknown>`) |
| **ErrorData.status** | Google RPC codes (`'RESOURCE_EXHAUSTED' \| '...'`) | Open string (per our generic philosophy) |
| **Message.role** | `'user' \| 'agent' \| 'developer'` | Same |
| **Stream end**         | None (only a start event, `initialize`)             | `stream_end` with `reason` field + open `data` bag                        |
| **Handoff** | Not covered | Tool call (`transfer_to_agent`) — no dedicated event |
| **Pausing** | Implicit (elicitation/tool events) | Same — no explicit pause/resume events |
### Design decisions adopted from Michael
1. **`interface AgentEvents` + mapped type** — Michael's pattern enables
declaration merging, letting any module add new event types without modifying
the base definition. Strictly better than an explicit union type.
2. **`_meta` on ContentPart** — More extensible. We adopted it (as optional).
3. **Implicit pausing** — No separate pause/resume events. When the agent emits
an `elicitation_request` or `tool_request`, the stream naturally pauses. The
host calls `stream()` to resume.
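Decision 1 can be illustrated in a few lines. This is a minimal sketch. The payload fields are simplified, and `plan_updated` is a hypothetical extension event, not part of either outline. A second `interface AgentEvents` declaration merges into the first. The mapped type then turns every key into a member of a discriminated union, so the new event is narrowed like the built-in ones.

```typescript
// Event payloads keyed by event type (simplified fields).
interface AgentEvents {
  initialize: { sessionId: string };
  message: { role: 'user' | 'agent' | 'developer'; text: string };
  tool_request: { requestId: string; name: string; args: Record<string, unknown> };
  stream_end: { reason: string; data?: Record<string, unknown> };
}

// Another module can add an event via declaration merging (hypothetical event).
interface AgentEvents {
  plan_updated: { steps: string[] };
}

// Mapped type: one member per key, indexed to produce a discriminated union.
type AgentEvent = {
  [K in keyof AgentEvents]: { type: K; agentId?: string } & AgentEvents[K];
}[keyof AgentEvents];

function summarizeEvent(ev: AgentEvent): string {
  switch (ev.type) {
    case 'message':
      return `${ev.role}: ${ev.text}`;
    case 'tool_request':
      return `call ${ev.name}`;
    case 'plan_updated':
      return `${ev.steps.length} steps`; // the merged event narrows like the rest
    default:
      return ev.type;
  }
}
```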
---
## 2. Claude Agent SDK — Key Interfaces
Source: `@anthropic-ai/claude-agent-sdk`
### Agent Execution Model
```typescript
// Entry point: a function, not an interface
declare function query(params: {
  prompt: string | AsyncIterable<SDKUserMessage>;
  options?: Options;
}): Query; // Query extends AsyncGenerator<SDKMessage, void>
```
### Message Types (Event Stream)
```typescript
type SDKMessage =
| SystemMessage // subtype: "init" | "compact_boundary"
| AssistantMessage // Claude's response with tool calls
| UserMessage // Tool results fed back
| StreamEvent // Raw API stream events (opt-in)
| ResultMessage // Final: success | error_max_turns | error_max_budget_usd | error_during_execution
| CompactBoundaryMessage; // Context compaction marker
```
### Tool Approval (HITL)
```typescript
// Passed as Options.canUseTool
type CanUseTool = (
  toolName: string,
  input: Record<string, any>,
) => Promise<
  | { behavior: 'allow'; updatedInput: Record<string, any> }
  | { behavior: 'deny'; message: string }
>;
```
### Subagent Definition
```typescript
interface AgentDefinition {
description: string; // When to invoke
prompt: string; // System prompt
tools?: string[]; // Available tools (defaults to all)
model?: 'sonnet' | 'opus' | 'haiku' | 'inherit';
}
```
### Session Management
```typescript
interface Options {
continue?: boolean; // Resume most recent session
resume?: string; // Resume by session ID
forkSession?: boolean; // Branch from resume point
persistSession?: boolean; // Default: true
maxTurns?: number;
maxBudgetUsd?: number; // Spend limit
permissionMode?: 'default' | 'acceptEdits' | 'plan' | 'dontAsk' | 'bypassPermissions';
structuredOutput?: { type: 'json_schema'; /* ... */ };
}
```
### Result (Termination)
```typescript
interface SDKResultMessage {
type: 'result';
subtype:
| 'success'
| 'error_max_turns'
| 'error_max_budget_usd'
| 'error_during_execution'
| 'error_max_structured_output_retries';
result?: string;
total_cost_usd: number;
usage: { input_tokens: number; output_tokens: number };
num_turns: number;
session_id: string;
stop_reason: string | null; // "end_turn", "max_tokens", "refusal"
}
```
### V2 Preview (Simpler API)
```typescript
await using session = unstable_v2_createSession({ model: "..." });
await session.send("Hello!");
for await (const msg of session.stream()) { /* handle msg */ }
await session.send("Follow-up");
for await (const msg of session.stream()) { /* handle msg */ }
```
---
## 3. OpenAI Codex SDK / Responses API — Key Interfaces
### Codex SDK (TypeScript)
```typescript
import { Codex } from '@openai/codex-sdk';

// Client (constructor options such as env and config are optional)
const codex = new Codex();
const thread = codex.startThread(); // optional: { workingDirectory, skipGitRepoCheck }
// or resume an existing thread: codex.resumeThread(threadId)

// Execution: prompt is a string or InputEntry[]; a second options argument is optional
const turn = await thread.run('Summarize this repository');
const { events } = await thread.runStreamed('Now fix the failing test');

// Streaming
for await (const event of events) {
  switch (event.type) {
    case 'item.completed': // event.item
      break;
    case 'turn.completed': // event.usage
      break;
  }
}
```
### Responses API Streaming Events (53 types)
Organized hierarchically:
**Response Lifecycle (7):**
- `response.queued`, `response.created`, `response.in_progress`
- `response.completed`, `response.incomplete`, `response.failed`
- `error`
**Content Streaming (8):**
- `response.output_item.added`, `response.output_item.done`
- `response.content_part.added`, `response.content_part.done`
- `response.output_text.delta`, `response.output_text.done`
- `response.refusal.delta`, `response.refusal.done`
**Reasoning (6):**
- `response.reasoning_text.delta`, `response.reasoning_text.done`
- `response.reasoning_summary_part.added`,
`response.reasoning_summary_part.done`
- `response.reasoning_summary_text.delta`,
`response.reasoning_summary_text.done`
**Function Calls (2):**
- `response.function_call_arguments.delta`,
`response.function_call_arguments.done`
**MCP (8):**
- `response.mcp_call_arguments.delta`, `response.mcp_call_arguments.done`
- `response.mcp_call.in_progress`, `response.mcp_call.completed`,
`response.mcp_call.failed`
- `response.mcp_list_tools.in_progress`, `response.mcp_list_tools.completed`,
`response.mcp_list_tools.failed`
**Built-in Tools (15):**
- File search: `in_progress`, `searching`, `completed`
- Web search: `in_progress`, `searching`, `completed`
- Code interpreter: `in_progress`, `interpreting`, `code.delta`, `code.done`,
`completed`
- Image gen: `in_progress`, `generating`, `partial_image`, `completed`
**Audio (4):**
- `response.audio.delta`, `response.audio.done`
- `response.audio.transcript.delta`, `response.audio.transcript.done`
**Annotations (1):**
- `response.output_text.annotation.added`
### OpenAI Agents SDK (higher-level)
```python
# Python-first, but patterns apply
class RunItemStreamEvent:
name: Literal[
"message_output_created",
"handoff_requested",
"handoff_occurred",
"tool_called",
"tool_output",
"tool_search_called",
"tool_search_output_created",
"reasoning_item_created",
"mcp_approval_requested",
"mcp_approval_response",
"mcp_list_tools",
]
class AgentUpdatedStreamEvent:
# Fires when current agent changes (handoff)
new_agent: Agent
```
---
## 4. Superset Analysis — What Changes Our Interfaces?
### Concepts Present in ALL Systems
| Concept | gemini-cli | ADK-TS | Claude SDK | Codex/OpenAI | Our Interfaces |
| --------------------- | ---------- | ------ | ------------- | -------------- | ----------------------- |
| Text streaming | ✅ | ✅ | ✅ | ✅ | ✅ MessageEvent |
| Tool request/response | ✅ | ✅ | ✅ | ✅ | ✅ ToolRequest/Response |
| Thinking/reasoning | ✅ | ✅ | ✅ (thinking) | ✅ (reasoning) | ✅ ContentPart.thought |
| Error events | ✅ | ✅ | ✅ | ✅ | ✅ ErrorEvent |
| Token usage | ✅ | ✅ | ✅ | ✅ | ✅ UsageEvent |
| Tool progress | ✅ | ✅ | — | ✅ | ✅ ToolUpdateEvent |
| Session resume | ✅ | ✅ | ✅ | ✅ | ✅ sessionRef |
| Subagents | ✅ | ✅ | ✅ | — | ✅ threadId |
| Abort/cancel | ✅ | ✅ | ✅ | ✅ | ✅ abort() |
| Metadata escape hatch | — | ✅ | — | — | ✅ \_meta |
### NEW Concepts From Claude/Codex That We Should Incorporate
#### 4.1 Structured Stream End Reasons (HIGH PRIORITY)
**What:** Claude SDK has typed termination:
`success | error_max_turns | error_max_budget_usd | error_during_execution`.
OpenAI has `completed | incomplete | failed`.
**Why it matters:** We need a `stream_end` event that captures why the stream
ended — the one signal not covered by other event types.
**Final design — `stream_end` with `reason` + open `data` bag:**
```typescript
type StreamEndReason =
| 'completed'
| 'failed'
| 'aborted'
| 'max_turns'
| 'max_budget'
| 'max_time'
| 'refusal'
| (string & {});
interface StreamEnd {
reason: StreamEndReason;
data?: Record<string, unknown>; // { result?, cost?, usage?, numTurns?, error?, ... }
}
```
**Design rationale:**
- Start is covered by `initialize`. Pausing is implicit (elicitation/tool
request events). Handoff is a tool call (`transfer_to_agent`).
- End-of-stream details go in `data` as an open bag, not fixed fields.
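An adapter would translate each SDK's terminal signal into this shape. The sketch below handles the Claude SDK's result message. It uses only the `subtype`, `result`, `total_cost_usd`, `num_turns`, and `stop_reason` fields summarized in section 2. The mapping choices are assumptions: refusals are detected through `stop_reason`, and every other error subtype becomes `failed`.

```typescript
type StreamEndReason =
  | 'completed'
  | 'failed'
  | 'aborted'
  | 'max_turns'
  | 'max_budget'
  | 'max_time'
  | 'refusal'
  | (string & {});

interface StreamEnd {
  reason: StreamEndReason;
  data?: Record<string, unknown>;
}

// Subset of the Claude SDK result fields listed in section 2.
interface ClaudeResult {
  subtype: string;
  result?: string;
  total_cost_usd: number;
  num_turns: number;
  stop_reason: string | null;
}

function toStreamEnd(msg: ClaudeResult): StreamEnd {
  const data = { result: msg.result, cost: msg.total_cost_usd, numTurns: msg.num_turns };
  switch (msg.subtype) {
    case 'success':
      // A turn can finish "successfully" because the model refused.
      return { reason: msg.stop_reason === 'refusal' ? 'refusal' : 'completed', data };
    case 'error_max_turns':
      return { reason: 'max_turns', data };
    case 'error_max_budget_usd':
      return { reason: 'max_budget', data };
    default:
      // error_during_execution, error_max_structured_output_retries, unknown subtypes
      return { reason: 'failed', data: { ...data, subtype: msg.subtype } };
  }
}
```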
#### 4.2 Budget Constraints (MEDIUM PRIORITY)
**What:** Claude SDK has `maxBudgetUsd`. Neither gemini-cli nor ADK has this
today.
**Why it matters:** Cost control is critical for production deployments.
**Proposed change to AgentConstraints:**
```typescript
interface AgentConstraints {
maxTurns?: number;
maxTimeMinutes?: number;
maxLlmCalls?: number;
maxBudgetUsd?: number; // NEW: from Claude SDK
}
```
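A host could enforce these limits between turns and report each breach as a `stream_end` reason. This is a minimal sketch. The `RunStats` shape and the check order are assumptions. There is no dedicated reason for `maxLlmCalls`, so the sketch relies on the open `StreamEndReason` type and returns a hypothetical `'max_llm_calls'` value.

```typescript
interface AgentConstraints {
  maxTurns?: number;
  maxTimeMinutes?: number;
  maxLlmCalls?: number;
  maxBudgetUsd?: number;
}

// Hypothetical running totals the host tracks during a stream.
interface RunStats {
  turns: number;
  elapsedMs: number;
  llmCalls: number;
  costUsd: number;
}

// Returns the stream_end reason for the first exceeded limit, or undefined.
function checkConstraints(c: AgentConstraints, s: RunStats): string | undefined {
  if (c.maxTurns !== undefined && s.turns >= c.maxTurns) return 'max_turns';
  if (c.maxTimeMinutes !== undefined && s.elapsedMs >= c.maxTimeMinutes * 60_000) return 'max_time';
  if (c.maxBudgetUsd !== undefined && s.costUsd >= c.maxBudgetUsd) return 'max_budget';
  if (c.maxLlmCalls !== undefined && s.llmCalls >= c.maxLlmCalls) return 'max_llm_calls';
  return undefined;
}
```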
#### 4.3 Session Forking (MEDIUM PRIORITY)
**What:** Claude SDK supports `forkSession: boolean` — branch from a resume
point to explore alternatives.
**Why it matters:** Enables "what if" exploration without destroying history.
Useful for plan mode.
**Proposed change to ExecutionRequest:**
```typescript
interface ExecutionRequest {
// ... existing fields ...
sessionRef?: string | SessionSnapshot;
forkSession?: boolean; // NEW: branch from sessionRef instead of continuing
}
```
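One way a host could implement `forkSession` is to copy the history into a new session and leave the original untouched. This is a minimal in-memory sketch. `SessionStore`, `SessionRecord`, and the ID scheme are hypothetical. The copy is shallow, which assumes events are treated as immutable.

```typescript
interface SessionRecord {
  id: string;
  parentId?: string; // set on forks, to keep lineage
  events: unknown[];
}

class SessionStore {
  private sessions = new Map<string, SessionRecord>();
  private nextId = 1;

  create(events: unknown[] = [], parentId?: string): SessionRecord {
    const rec: SessionRecord = { id: `s${this.nextId++}`, parentId, events };
    this.sessions.set(rec.id, rec);
    return rec;
  }

  // Continue: return the existing session, so new events append to it.
  // Fork: return a new session seeded with a shallow copy of the history.
  resume(sessionRef: string, forkSession = false): SessionRecord {
    const base = this.sessions.get(sessionRef);
    if (!base) throw new Error(`Unknown session: ${sessionRef}`);
    return forkSession ? this.create([...base.events], base.id) : base;
  }
}
```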
#### 4.4 Permission Modes on Execution (MEDIUM PRIORITY)
**What:** Claude has 5 permission modes:
`default | acceptEdits | plan | dontAsk | bypassPermissions`. gemini-cli has 4
approval modes: `default | autoEdit | yolo | plan`.
**Why it matters:** Both systems have this concept. It should be in
ExecutionOptions, not hard-coded.
**Proposed change to ExecutionOptions:**
```typescript
interface ExecutionOptions {
// ... existing fields ...
permissionMode?: string; // Open string. Conventions: 'default' | 'auto_edit' | 'autonomous' | 'plan' | string
}
```
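Because the field is an open string, adapters would normalize each system's native mode names onto the conventions. This is a minimal sketch. The correspondences are assumptions: for example, gemini-cli `yolo` and Claude `bypassPermissions` both map to `autonomous`. Names without a clear counterpart, such as Claude's `dontAsk`, pass through unchanged.

```typescript
// Native mode names -> outline conventions (assumed correspondences).
const GEMINI_CLI_MODES = new Map<string, string>([
  ['default', 'default'],
  ['autoEdit', 'auto_edit'],
  ['yolo', 'autonomous'],
  ['plan', 'plan'],
]);

const CLAUDE_MODES = new Map<string, string>([
  ['default', 'default'],
  ['acceptEdits', 'auto_edit'],
  ['bypassPermissions', 'autonomous'],
  ['plan', 'plan'],
]);

// Unmapped names pass through, which the open-string design permits.
function normalizePermissionMode(source: 'gemini-cli' | 'claude', mode: string): string {
  const table = source === 'gemini-cli' ? GEMINI_CLI_MODES : CLAUDE_MODES;
  return table.get(mode) ?? mode;
}
```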
#### 4.5 Agent Handoff (MEDIUM PRIORITY)
**What:** OpenAI Agents SDK has explicit `handoff_requested` /
`handoff_occurred` events plus `AgentUpdatedStreamEvent`. ADK has
`transfer_to_agent` tool + `eventActions.transferToAgent`. Claude SDK has
subagent invocation via Agent tool.
**Why it matters:** When agent A delegates to agent B, the host/UI needs to
know.
**Design decision: Handoff is a tool call, not a separate event type.**
The agent calls `transfer_to_agent` as a tool (ToolRequest event). The host
intercepts this tool call (since host controls tool execution), looks up the
target agent, creates a new executor via the factory, and mediates the handoff.
The originating agent's stream ends with `stream_end` reason `'completed'`.
```typescript
// 1. Agent emits tool request:
const toolRequest = { type: 'tool_request', name: 'transfer_to_agent', args: { target: 'coder', reason: '...' } };
// 2. Host mediates handoff; originating agent completes:
const streamEnd = { type: 'stream_end', reason: 'completed', agentId: 'planner', data: { handoffTarget: 'coder' } };
```
This avoids duplicating routing logic between stream_end events and tool calls.
Matches ADK's `transfer_to_agent` tool pattern.
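A host-side sketch of the mediation step follows. It forwards events from the current agent. When a `transfer_to_agent` request appears, it stops consuming that agent, emits the originating agent's `stream_end`, and continues with the target. Everything here is hypothetical: the `Executor` factory signature, the event shapes, the single-string `input`, and the hop limit. A real host would also answer the tool call and carry context across.

```typescript
type HostEvent =
  | { type: 'tool_request'; agentId: string; name: string; args: Record<string, unknown> }
  | { type: 'message'; agentId: string; text: string }
  | { type: 'stream_end'; agentId: string; reason: string; data?: Record<string, unknown> };

type Executor = (input: string) => AsyncGenerator<HostEvent>;

async function* runWithHandoffs(
  factory: (agentName: string) => Executor, // hypothetical executor factory
  startAgent: string,
  input: string,
  maxHandoffs = 5,
): AsyncGenerator<HostEvent> {
  let agent = startAgent;
  for (let hop = 0; hop <= maxHandoffs; hop++) {
    let target: string | undefined;
    for await (const ev of factory(agent)(input)) {
      yield ev;
      if (ev.type === 'tool_request' && ev.name === 'transfer_to_agent') {
        target = String(ev.args.target);
        break; // stop consuming the originating agent
      }
    }
    if (!target) return; // finished without a handoff
    yield { type: 'stream_end', agentId: agent, reason: 'completed', data: { handoffTarget: target } };
    agent = target;
  }
  // Guard against endless ping-pong transfers.
  yield { type: 'stream_end', agentId: agent, reason: 'failed', data: { error: 'max handoffs exceeded' } };
}
```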
#### 4.6 Refusal as Distinct Signal (LOW PRIORITY)
**What:** OpenAI has explicit `response.refusal.delta/done` events. Claude has
`stop_reason: "refusal"`.
**Why it matters:** Model refusals are operationally important (safety, policy).
**Proposed:** No new event type. Handle via `MessageEvent` with a `refusal`
content part type, or via `ErrorEvent` with specific error code. ContentPart can
be extended:
```typescript
| { type: 'refusal'; text: string }
```
#### 4.7 Content Annotations (LOW PRIORITY)
**What:** OpenAI has `response.output_text.annotation.added` for citations, file
paths.
**Why it matters:** Citations and source attribution are increasingly important.
**Proposed:** Michael's `reference` ContentPart already covers this. No change
needed — `reference` with `uri` and `text` handles citations.
#### 4.8 Context Compaction Events (LOW PRIORITY)
**What:** Claude SDK has `CompactBoundaryMessage` marking when context was
compressed.
**Why it matters:** For long sessions, knowing when context was compressed helps
with debugging and UI.
**Proposed:** `CustomEvent` with `kind: 'compact_boundary'`. No new event type
needed.
#### 4.9 Structured Output Schema (ALREADY COVERED)
**What:** Both Claude (`structuredOutput`) and OpenAI support JSON Schema output
constraints.
**Status:** Already covered by `AgentDescriptor.outputSchema: JsonSchema`. No
change needed.
### Concepts We DON'T Need to Adopt
| Concept | Why Skip |
| ------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------- |
| OpenAI's 53 granular streaming events | Too coupled to Responses API internals. Our `ToolUpdateEvent` + `MessageEvent` via AsyncGenerator abstracts over this. |
| OpenAI's per-tool-type events (file_search, web_search, code_interpreter) | Tool-specific progress belongs in `ToolUpdateEvent.data`, not in the event taxonomy. |
| Audio/Image streaming events | Handle via `ToolUpdateEvent` with media ContentParts. When needed, add as ContentPart types, not event types. |
| Claude's raw `StreamEvent` wrapper | Implementation detail of the Claude API client. Our adapters consume these internally. |
| MCP-specific events (mcp_call, mcp_list_tools) | MCP tools are just tools. Use generic `ToolRequestEvent/ToolResponseEvent`. MCP approval is an `ElicitationRequest`. |
---
## 5. Updated Event Type Comparison (Full Superset)
| # | Event Type | Michael | Our Outline | Claude SDK | OpenAI | Verdict |
| --- | -------------------- | ------- | ----------- | ----------------------------- | --------------------------------- | ---------------------------------------------- |
| 1 | Initialize | ✅ | ✅ | SystemMessage(init) | — | **Keep** |
| 2 | Session Update | ✅ | ✅ | — | — | **Keep** |
| 3 | Message | ✅ | ✅ | AssistantMessage | output_text.delta/done | **Keep** |
| 4 | Tool Request | ✅ | ✅ | AssistantMessage.tool_use | function_call_arguments | **Keep** |
| 5 | Tool Update | ✅ | ✅ | — | per-tool progress events | **Keep** |
| 6 | Tool Response | ✅ | ✅ | UserMessage | — | **Keep** |
| 7 | Elicitation Request | ✅ | ✅ | canUseTool callback | mcp_approval_requested | **Keep** |
| 8 | Elicitation Response | ✅ | ✅ | canUseTool return | mcp_approval_response | **Keep** |
| 9 | Usage | ✅ | ✅ | ResultMessage.usage | response.completed | **Keep** |
| 10 | Error | ✅ | ✅ | ResultMessage(error\_\*) | response.failed | **Keep** |
| 11 | Custom | ✅ | ✅ | — | — | **Keep** |
| 12  | StreamEnd            | —       | ✅          | ResultMessage                 | response.completed/incomplete/failed | **Keep — `stream_end` with `reason` + `data`** |
**Result: Our 12 event types are the right abstraction level.** Claude and
OpenAI validate every category. The granularity differences (OpenAI's 53 vs
our 12) are implementation details that adapters handle internally. `stream_end`
uses a single `reason` field with an open `data` bag. Handoff is a tool call.
Pausing is implicit.
---
## 6. Updated ContentPart Types (Superset)
```typescript
type ContentPart = (
| { type: 'text'; text: string }
| { type: 'thought'; thought: string; thoughtSignature?: string }
| { type: 'media'; data?: string; uri?: string; mimeType?: string }
| {
type: 'reference';
text: string;
data?: string;
uri?: string;
mimeType?: string;
}
| { type: 'refusal'; text: string } // NEW: from OpenAI
) &
// Future: type: string for unknown types from new SDKs
{ _meta?: Record<string, unknown> };
```
Adding `refusal` as a ContentPart type (rather than a new event) keeps the event
taxonomy stable while supporting model refusals from both Claude and OpenAI.
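Consumers would typically switch on `type`. The sketch below is a plain-text renderer with a compile-time exhaustiveness check, so adding a new variant such as `refusal` forces every renderer to handle it. The rendering choices are illustrative, not part of the outline: thoughts are hidden by default and media gets a bracketed placeholder.

```typescript
type ContentPart = (
  | { type: 'text'; text: string }
  | { type: 'thought'; thought: string; thoughtSignature?: string }
  | { type: 'media'; data?: string; uri?: string; mimeType?: string }
  | { type: 'reference'; text: string; data?: string; uri?: string; mimeType?: string }
  | { type: 'refusal'; text: string }
) & { _meta?: Record<string, unknown> };

function renderPart(part: ContentPart, showThoughts = false): string {
  switch (part.type) {
    case 'text':
      return part.text;
    case 'thought':
      return showThoughts ? `(thinking) ${part.thought}` : '';
    case 'media':
      return `[media ${part.mimeType ?? 'unknown'}${part.uri ? ` ${part.uri}` : ''}]`;
    case 'reference':
      return part.uri ? `${part.text} <${part.uri}>` : part.text;
    case 'refusal':
      return `[refused] ${part.text}`;
    default: {
      // Fails to compile if a variant above is left unhandled.
      const unreachable: never = part;
      return String(unreachable);
    }
  }
}
```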
---
## 7. Key Architectural Patterns Across SDKs
### Pattern: Execution Entry Points
| SDK | Entry Point | Multi-turn Pattern |
| ----------- | ------------------------------------------------------------------------- | ----------------------------------------------- |
| Michael | `agent.send(trajectory, data)` / `session.send()` + `session.update()` | Same method / three-method session |
| Our Outline | `session.stream(data)` + `session.update(config)` + `session.steer(data)` | Four-method session (stream/update/steer/abort) |
| Claude SDK | `query({ prompt, options })` | New `query()` call with `resume: sessionId` |
| Claude V2 | `session.send()` + `session.stream()` | Separate send/stream |
| Codex SDK | `thread.run(prompt)` / `thread.runStreamed(prompt)` | Same thread object |
**Observation:** Claude V2 and Codex both use a stateful session/thread object
with send+stream. Michael uses a single `send()` method. Our `stream()` method
is the unified version — the first call starts, subsequent calls continue (like
ADK's `runAsync()`).
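The sketch below shows the four-method session from the host side, with `stream()` both starting and continuing. The method signatures and the `SessionEvent` shape are assumptions for illustration; the outline's definitions may differ. `EchoSession` is a toy that echoes input word by word. It emits `initialize` only on the first call, applies steering at the next yield, and checks for abort between events.

```typescript
// Simplified event shape for this sketch.
interface SessionEvent { type: string; text?: string }

// Hypothetical shape of the outline's four-method session.
interface AgentSession {
  stream(input: string): AsyncGenerator<SessionEvent>; // first call starts, later calls continue
  update(config: Record<string, unknown>): void; // adjust settings between turns
  steer(guidance: string): void; // inject guidance into the active stream
  abort(): void; // stop the active stream
}

class EchoSession implements AgentSession {
  private started = false;
  private aborted = false;
  private pendingSteer: string[] = [];
  readonly config: Record<string, unknown> = {};

  async *stream(input: string): AsyncGenerator<SessionEvent> {
    if (!this.started) {
      this.started = true;
      yield { type: 'initialize' }; // only on the first call
    }
    this.aborted = false;
    for (const word of input.split(' ')) {
      if (this.aborted) {
        yield { type: 'stream_end', text: 'aborted' };
        return;
      }
      const hint = this.pendingSteer.shift();
      yield { type: 'message', text: hint ? `${word} [${hint}]` : word };
    }
    yield { type: 'stream_end', text: 'completed' };
  }

  update(config: Record<string, unknown>): void {
    Object.assign(this.config, config);
  }
  steer(guidance: string): void {
    this.pendingSteer.push(guidance);
  }
  abort(): void {
    this.aborted = true;
  }
}
```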
### Pattern: Tool Approval
| SDK | Pattern | Sync/Async |
| ----------- | -------------------------------------------------------- | ------------------------ |
| gemini-cli | PolicyEngine + ConfirmationBus | Async (message bus) |
| ADK-TS | SecurityPlugin.policyCheck() | Async (plugin callback) |
| Claude SDK | `canUseTool()` callback | Async (callback) |
| OpenAI | `mcp_approval_requested` event | Event-based |
| Our Outline | `ElicitationRequest` event + `PolicyEvaluator` interface | Both (event + interface) |
**Observation:** Our approach covers both patterns: the `ElicitationRequest`
event for event-based approval (like OpenAI), and the `PolicyEvaluator`
interface for direct, callback-style policy checks (like gemini-cli/ADK/Claude,
all of which are async). This is the right superset.
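The two paths can share one decision source. The evaluator answers allow, deny, or ask, and only "ask" becomes an `ElicitationRequest` on the stream. This is a minimal sketch. The `PolicyEvaluator` signature, the `Gate` result, and the event fields are hypothetical.

```typescript
type PolicyDecision = 'allow' | 'deny' | 'ask';

interface ToolCall { requestId: string; name: string; args: Record<string, unknown> }

// Hypothetical evaluator: in-process, async so it can consult other services.
interface PolicyEvaluator {
  evaluate(call: ToolCall): Promise<PolicyDecision>;
}

type Gate =
  | { kind: 'execute'; call: ToolCall } // host runs the tool
  | { kind: 'reject'; call: ToolCall; reason: string } // host returns an error result
  | { kind: 'elicit'; event: { type: 'elicitation_request'; requestId: string; message: string } };

async function gateToolCall(policy: PolicyEvaluator, call: ToolCall): Promise<Gate> {
  const decision = await policy.evaluate(call);
  if (decision === 'allow') return { kind: 'execute', call };
  if (decision === 'deny') return { kind: 'reject', call, reason: `Denied by policy: ${call.name}` };
  // 'ask': fall back to event-based approval; the stream pauses until a response arrives.
  return {
    kind: 'elicit',
    event: { type: 'elicitation_request', requestId: call.requestId, message: `Allow ${call.name}?` },
  };
}
```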
### Pattern: Subagent Definition
| SDK | Pattern | Key Fields |
| ------------- | -------------------------------- | ------------------------------------------------------------------------------------------ |
| gemini-cli | `AgentDefinition` (local/remote) | name, description, kind, tools, model |
| ADK-TS | `BaseAgentConfig` | name, description, subAgents, tools |
| Claude SDK | `AgentDefinition` | description, prompt, tools, model |
| OpenAI Agents | `Agent` class | name, instructions, tools, handoffs, model |
| Our Outline | `AgentDescriptor` | name, description, executor, inputSchema, capabilities, ownTools, requiredTools, subAgents |
**Observation:** Our `AgentDescriptor` is the most complete. Claude's `prompt`
field and OpenAI's `instructions` are executor-level concerns (system prompt),
not descriptor-level. The descriptor declares identity; the executor uses the
prompt. This separation is correct.
One gap: **handoffs**. OpenAI Agents has an explicit `handoffs` field listing
which agents can be delegated to. Our `subAgents` field serves the same purpose
but the naming implies hierarchy rather than peer delegation. Consider whether
`subAgents` should be renamed to `delegateAgents` or kept as-is with
documentation clarifying it covers both hierarchical and peer delegation.
---
## 8. Concrete Changes to outline.md
Based on this analysis, the following changes should be made:
### Applied (validated by multiple SDKs):
1. ✅ **`type: AgentEventType`** with known values + `(string & {})`
(autocomplete + extensibility)
2. ✅ **`interface AgentEvents` + mapped type** (adopted from Michael for
declaration merging)
3. ✅ **`agentId` on event base** (which agent emitted this event)
4. ✅ **`_meta` on ContentPart** (aligned with Michael)
5. ✅ **`stream_end` event** — signals why the stream ended, with `reason`
field + open `data` bag
6. ✅ **Handoff as tool call**: `transfer_to_agent` tool, not a separate event
7. ✅ **`maxBudgetUsd` in AgentConstraints** (Claude SDK, increasingly standard)
8. ✅ **`refusal` ContentPart type** (both Claude and OpenAI surface refusals)
9. ✅ **`forkSession` in ExecutionRequest** (Claude SDK, valuable for
exploration)
10. ✅ **`permissionMode` in ExecutionOptions** (both gemini-cli and Claude SDK)
11. ✅ **`cost` field on Usage** (Claude SDK tracks `total_cost_usd`)
### Correctly abstracted (no change needed):
- Event taxonomy (12 types) — validated as right abstraction level
- `AgentDescriptor` shape — most complete across all SDKs
- `AgentSession.stream/update/steer/abort` — covers all SDK patterns
- ToolUpdate — correctly abstracts over OpenAI's 15+ tool-specific progress
events
- `ElicitationRequest/Response` — covers both callback and event patterns
- `ContentPart` types — text/thought/media/reference/refusal
---
## Sources
- [Claude Agent SDK TypeScript Reference](https://platform.claude.com/docs/en/agent-sdk/typescript)
- [Claude Agent SDK Streaming](https://platform.claude.com/docs/en/agent-sdk/streaming-output)
- [Claude Agent SDK Sessions](https://platform.claude.com/docs/en/agent-sdk/sessions)
- [Claude Agent SDK Subagents](https://platform.claude.com/docs/en/agent-sdk/subagents)
- [OpenAI Codex SDK TypeScript](https://github.com/openai/codex/tree/main/sdk/typescript)
- [OpenAI Codex SDK Docs](https://developers.openai.com/codex/sdk/)
- [OpenAI Responses API Streaming Events](https://developers.openai.com/api/reference/resources/responses/streaming-events/)
- [OpenAI Agents SDK Streaming](https://openai.github.io/openai-agents-python/streaming/)
- [Responses API Streaming Guide (Community)](https://community.openai.com/t/responses-api-streaming-the-simple-guide-to-events/1363122)
# Gemini CLI Architecture Notes
## Project Structure
**Monorepo packages:**
- `packages/core/` - Main execution engine (the big one)
- `packages/cli/` - CLI frontend
- `packages/sdk/` - SDK for extensions
- `packages/a2a-server/` - Agent-to-agent server
- `packages/devtools/` - Dev utilities
- `packages/vscode-ide-companion/` - VS Code extension
## Core Execution Loop
### GeminiClient (`core/src/core/client.ts` ~38KB)
- **Primary orchestrator** for user interactions
- Manages session lifecycle, message routing, model selection
- Coordinates hooks, context management, error recovery
- Enforces `MAX_TURNS = 100` per session
- Tracks `currentSequenceModel` for multi-turn stickiness
- Handles history compression when context grows
### GeminiChat (`core/src/core/geminiChat.ts` ~34KB)
- Bidirectional LLM communication
- Maintains `history[]` alternating user/model turns
- Retry logic: max 2 attempts, 500ms delay for invalid responses
- Fires `BeforeModel` and `AfterModel` hooks
- Integrates ChatRecordingService for persistence
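The retry rule ("max 2 attempts, 500ms delay for invalid responses") amounts to a small loop. This is a minimal sketch, not GeminiChat's code. It makes two assumptions: `sendOnce` stands in for the real model call, and what counts as "invalid" is passed in as a predicate.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function sendWithRetry<T>(
  sendOnce: () => Promise<T>,
  isValid: (response: T) => boolean,
  maxAttempts = 2,
  delayMs = 500,
): Promise<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const response = await sendOnce();
    if (isValid(response)) return response;
    if (attempt < maxAttempts) await sleep(delayMs); // wait only between attempts
  }
  throw new Error(`Invalid response after ${maxAttempts} attempts`);
}
```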
### Scheduler (`core/src/scheduler/scheduler.ts` ~23KB)
- **Three-phase event-driven**: Ingestion → Processing → Completion
- Tool call state machine:
`Validating → AwaitingApproval → Scheduled → Executing → Terminal`
- Terminal states: `Success`, `Error`, `Cancelled`
- Parallel execution for read-only and agent-type tools
- Yields to event loop for user approval
- Publishes state changes via MessageBus
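The tool-call lifecycle can be encoded as a transition table that rejects illegal jumps. This sketch rests on two assumptions. First, policy-approved calls may skip `AwaitingApproval`. Second, any non-terminal state may move to `Error` or `Cancelled`. The real scheduler's rules may differ.

```typescript
type ToolCallState =
  | 'Validating'
  | 'AwaitingApproval'
  | 'Scheduled'
  | 'Executing'
  | 'Success'
  | 'Error'
  | 'Cancelled';

const TERMINAL: ReadonlySet<ToolCallState> = new Set<ToolCallState>(['Success', 'Error', 'Cancelled']);

// Forward transitions only; Error and Cancelled are allowed from any non-terminal state.
const NEXT: Record<ToolCallState, ToolCallState[]> = {
  Validating: ['AwaitingApproval', 'Scheduled'], // assumption: auto-approved calls skip approval
  AwaitingApproval: ['Scheduled'],
  Scheduled: ['Executing'],
  Executing: ['Success'],
  Success: [],
  Error: [],
  Cancelled: [],
};

function transition(from: ToolCallState, to: ToolCallState): ToolCallState {
  if (TERMINAL.has(from)) throw new Error(`${from} is terminal`);
  const allowed = NEXT[from].includes(to) || to === 'Error' || to === 'Cancelled';
  if (!allowed) throw new Error(`Illegal transition ${from} -> ${to}`);
  return to;
}
```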
### CoreToolScheduler (`core/src/core/coreToolScheduler.ts` ~38KB)
- Sequential, queue-based tool processing
- Validates policy via PolicyEngine
- Confirmation handling via ToolModificationHandler (editor integration)
- Uses MessageBus for async confirmation responses
## Tool System
### DeclarativeTool Pattern
- **Separation of concerns**: build() → validate → createInvocation() →
execute()
- `ToolBuilder` defines metadata (name, displayName, description, kind) + schema
via `getSchema()`
- `ToolInvocation` has: `getDescription()`, `toolLocations()`,
`shouldConfirmExecute()`, `execute()`
- `ToolResult` contains: `llmContent` (for LLM), `returnDisplay` (for UI), error
details, tail calls
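The builder/invocation split can be illustrated with a toy tool. This is a minimal sketch. The class and method names mirror the terms above, but the signatures are simplified and hypothetical, not gemini-cli's real types. For example, `execute()` here takes no abort signal. The long-lived tool owns metadata and validation, and each call gets its own invocation object with already-validated parameters.

```typescript
interface ToolResult {
  llmContent: string; // what the model sees
  returnDisplay: string; // what the UI shows
  error?: string;
}

// Per-call object: parameters are validated before it is created.
interface ToolInvocation {
  getDescription(): string;
  shouldConfirmExecute(): boolean;
  execute(): Promise<ToolResult>;
}

// Long-lived definition: metadata, schema, validation, invocation factory.
abstract class DeclarativeTool<P> {
  constructor(readonly name: string, readonly description: string) {}
  abstract getSchema(): Record<string, unknown>;
  protected abstract validate(params: P): string | null; // error message or null
  protected abstract createInvocation(params: P): ToolInvocation;

  build(params: P): ToolInvocation {
    const error = this.validate(params);
    if (error) throw new Error(`${this.name}: ${error}`);
    return this.createInvocation(params);
  }
}

class EchoTool extends DeclarativeTool<{ text: string }> {
  constructor() {
    super('echo', 'Echo text back');
  }
  getSchema() {
    return { type: 'object', properties: { text: { type: 'string' } }, required: ['text'] };
  }
  protected validate(p: { text: string }) {
    return p.text.length === 0 ? 'text must be non-empty' : null;
  }
  protected createInvocation(p: { text: string }): ToolInvocation {
    return {
      getDescription: () => `echo "${p.text}"`,
      shouldConfirmExecute: () => false, // read-only, no confirmation needed
      execute: async () => ({ llmContent: p.text, returnDisplay: `> ${p.text}` }),
    };
  }
}
```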
### BaseToolInvocation
- Abstract base with MessageBus integration for policy/confirmation
- Three decision paths: ALLOW, DENY, ASK_USER via `getMessageBusDecision()`
### ToolRegistry (`core/src/tools/tool-registry.ts`)
- Registers tools via `registerTool()`
- MCP tools with fully qualified names: `mcp_serverName_toolName`
- Priority sorting: built-in → discovered → MCP (by server name)
- Filters by active status based on configuration
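The naming and ordering rules can be expressed as a helper and a comparator. This is a minimal sketch. The `source` and `serverName` fields are hypothetical, and the alphabetical tie-break within each group is an assumption.

```typescript
interface RegisteredTool {
  name: string;
  source: 'builtin' | 'discovered' | 'mcp';
  serverName?: string; // set for MCP tools
}

const SOURCE_RANK = { builtin: 0, discovered: 1, mcp: 2 } as const;

// Fully qualified MCP names avoid collisions between servers.
function mcpToolName(serverName: string, toolName: string): string {
  return `mcp_${serverName}_${toolName}`;
}

// built-in -> discovered -> MCP (grouped by server name), then by tool name.
function compareTools(a: RegisteredTool, b: RegisteredTool): number {
  const bySource = SOURCE_RANK[a.source] - SOURCE_RANK[b.source];
  if (bySource !== 0) return bySource;
  if (a.source === 'mcp') {
    const byServer = (a.serverName ?? '').localeCompare(b.serverName ?? '');
    if (byServer !== 0) return byServer;
  }
  return a.name.localeCompare(b.name);
}
```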
### Confirmation System
- `ToolCallConfirmationDetails` union: edit, execute, MCP, info, ask_user,
exit_plan_mode
- `ToolConfirmationOutcome` enum: ProceedOnce, ProceedAlways, etc.
- Async confirmation via MessageBus pub/sub
## Hooks System
### Hook Types (11 hook points)
| Hook | Trigger | Key Capability |
| --------------------- | ----------------------- | --------------------------------- |
| `BeforeTool` | Before tool execution | Modify tool_input |
| `AfterTool` | After tool completion | Context injection, tail calls |
| `BeforeAgent` | Before agent prompt | Additional context |
| `AfterAgent` | After agent response | Clear context flag |
| `BeforeModel` | Before LLM request | Modify request or inject response |
| `AfterModel` | After LLM response | Modify response |
| `BeforeToolSelection` | Before tool selection | Modify toolConfig |
| `Notification` | When notifications fire | Suppress/modify message |
| `SessionStart` | Session begins | Additional context |
| `SessionEnd` | Session terminates | Cleanup |
| `PreCompress` | Before compression | Suppress/modify |
### Hook Output Fields (common to all hooks)
- `continue` - Whether execution proceeds
- `stopReason` - Reason to halt
- `suppressOutput` - Hide from user
- `systemMessage` - Add to system context
- `decision` - ask/block/deny/approve/allow
### Hook System Components
- `HookSystem` - Main coordinator
- `HookRegistry` - Stores/manages configurations
- `HookRunner` - Executes registered hooks
- `HookAggregator` - Combines multiple hook results
- `HookPlanner` - Determines execution order
- `HookEventHandler` - Orchestrates event firing
- `HookTranslator` - Converts between formats
## Policy Engine
### Rule Structure
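```typescript
// PolicyDecision: ALLOW | DENY | ASK_USER (member values omitted here)
declare enum PolicyDecision { ALLOW, DENY, ASK_USER }

interface PolicyRule {
  toolName: string; // wildcards supported
  decision: PolicyDecision;
  priority: number;
  argsPattern?: RegExp; // conditional on args
  mcpName?: string;
  source: string;
}
```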
### Tier Hierarchy (lowest → highest priority)
1. Default (1) - Core built-in policies
2. Extension (2) - Extension contributions
3. Workspace (3) - Project-scoped (.gemini/)
4. User (4) - User-provided (~/.gemini/)
5. Admin (5) - System-level policies
### Dynamic Rule Priorities (within User Tier)
- 4.9 - MCP_EXCLUDED (persistent server blocks)
- 4.4 - EXCLUDE_TOOLS_FLAG (CLI exclusions)
- 4.3 - ALLOWED_TOOLS_FLAG (CLI allows)
- 4.2 - TRUSTED_MCP_SERVER
- 4.1 - ALLOWED_MCP_SERVER
- 3.95 - ALWAYS_ALLOW (interactive selections)
### Security Constraint
- Extensions CANNOT contribute ALLOW rules or YOLO mode
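Taken together, evaluation comes down to three steps: collect the rules that match the call, take the one with the highest numeric priority, and otherwise ask the user. This is a minimal sketch. Several choices are assumptions, not confirmed gemini-cli behavior: `*` is the only wildcard, `argsPattern` is matched against JSON-serialized args, the fallback is `ASK_USER`, and equal priorities keep the earlier rule. `mcpName` is omitted for brevity.

```typescript
type PolicyDecision = 'ALLOW' | 'DENY' | 'ASK_USER';

interface PolicyRule {
  toolName: string; // '*' matches any run of characters (assumed glob syntax)
  decision: PolicyDecision;
  priority: number; // tier number, refined within a tier (e.g. 4.9 for MCP_EXCLUDED)
  argsPattern?: RegExp;
  source: string;
}

function globToRegExp(glob: string): RegExp {
  // Escape regex metacharacters except '*', then turn '*' into '.*'.
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, '\\$&').replace(/\*/g, '.*');
  return new RegExp(`^${escaped}$`);
}

function evaluatePolicy(rules: PolicyRule[], toolName: string, args: Record<string, unknown>): PolicyDecision {
  const matching = rules.filter(
    (r) => globToRegExp(r.toolName).test(toolName) && (!r.argsPattern || r.argsPattern.test(JSON.stringify(args))),
  );
  if (matching.length === 0) return 'ASK_USER';
  // Highest priority wins, so Admin (5) outranks User (4.x), and so on down the tiers.
  return matching.reduce((best, r) => (r.priority > best.priority ? r : best)).decision;
}
```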
## Agent System
### Agent Registry (`core/src/agents/registry.ts`)
Discovery sources:
1. Built-in: CodebaseInvestigator, CliHelp, Generalist, Browser
2. User-level: `~/.gemini/agents/`
3. Project-level: `.gemini/agents/` (requires folder trust)
4. Extension-based: From active extensions
### LocalAgentExecutor (`core/src/agents/local-executor.ts`)
- Prompt processing: input augmentation → template expansion → system prompt
construction
- Uses GeminiChat for accumulating conversation
- ChatCompressionService for history management
- Turn loop: invoke model → extract function calls → check auth → append results
- Termination: complete_task tool, max turns, timeout
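The turn loop and its three exits can be sketched as follows. Everything here is hypothetical: `callModel`, `runTool`, and the `complete_task` argument shape are stand-ins. Treating a turn with no function calls as an error is this sketch's assumption. The real executor also handles compression, auth checks, and recovery.

```typescript
interface FunctionCall { name: string; args: Record<string, unknown> }
interface ModelTurn { text: string; functionCalls: FunctionCall[] }

type Termination =
  | { reason: 'goal'; output: unknown }
  | { reason: 'max_turns' }
  | { reason: 'timeout' }
  | { reason: 'error'; message: string };

async function runAgentLoop(
  callModel: (history: unknown[]) => Promise<ModelTurn>,
  runTool: (call: FunctionCall) => Promise<unknown>,
  maxTurns: number,
  timeoutMs: number,
): Promise<Termination> {
  const history: unknown[] = [];
  const deadline = Date.now() + timeoutMs;
  for (let turn = 0; turn < maxTurns; turn++) {
    if (Date.now() > deadline) return { reason: 'timeout' };
    const response = await callModel(history);
    history.push(response);
    // The agent finishes explicitly by calling complete_task.
    const done = response.functionCalls.find((c) => c.name === 'complete_task');
    if (done) return { reason: 'goal', output: done.args };
    if (response.functionCalls.length === 0) {
      return { reason: 'error', message: 'Model stopped without calling complete_task' };
    }
    // Execute the requested tools and append results for the next turn.
    for (const call of response.functionCalls) {
      history.push({ call, result: await runTool(call) });
    }
  }
  return { reason: 'max_turns' };
}
```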
### SubagentTool (`core/src/agents/subagent-tool.ts`)
- Extends BaseDeclarativeTool - agents invoked like standard tools
- Read-only status checking, user hint propagation
- Execution: validate → optional confirmation → parameter enrichment →
SubagentToolWrapper
### Remote Agents
- A2A client manager for agent-to-agent protocol
- Remote invocation for external agents
- Agent acknowledgement system (security for project agents)
## Model System
### ModelConfigService
- **Hierarchical alias system**: children override parents
- Resolution: alias chain → level assignment → apply overrides
- Deep merging with array override capability
- Fallback to `chat-base` alias for unknown models
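Alias resolution can be sketched as three steps: walk the parent chain to the root, then deep-merge from root to leaf, with child values winning and arrays replaced rather than concatenated. This is a minimal sketch. The `extends`/`settings` field names and all config keys are hypothetical. Only the `chat-base` fallback comes from the notes above.

```typescript
type Config = { [key: string]: unknown };

interface AliasDef {
  extends?: string; // parent alias
  settings: Config;
}

const isPlainObject = (v: unknown): v is Config => typeof v === 'object' && v !== null && !Array.isArray(v);

// Child values override parent values; nested objects merge; arrays replace.
function deepMerge(parent: Config, child: Config): Config {
  const out: Config = { ...parent };
  for (const [key, value] of Object.entries(child)) {
    const base = out[key];
    out[key] = isPlainObject(base) && isPlainObject(value) ? deepMerge(base, value) : value;
  }
  return out;
}

function resolveAlias(aliases: Record<string, AliasDef>, name: string): Config {
  const has = (k: string) => Object.prototype.hasOwnProperty.call(aliases, k);
  // Unknown models fall back to the chat-base alias.
  let current: string | undefined = has(name) ? name : 'chat-base';
  const chain: AliasDef[] = [];
  const seen = new Set<string>();
  while (current) {
    if (seen.has(current)) throw new Error(`Alias cycle at ${current}`);
    if (!has(current)) throw new Error(`Unknown alias ${current}`);
    seen.add(current);
    const def: AliasDef = aliases[current];
    chain.unshift(def); // parents end up first
    current = def.extends;
  }
  return chain.reduce((acc, def) => deepMerge(acc, def.settings), {} as Config);
}
```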
### ModelRouterService
Sequential strategy pattern:
1. Fallback & Override
2. Approval Mode Strategy
3. Gemma Classifier (if enabled)
4. Generic Classifier
5. Numerical Classifier
6. Default Strategy
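A sequential strategy chain like this is a chain of responsibility: each strategy either returns a decision or defers to the next one. This is a minimal sketch. The `RoutingContext` fields, the strategy bodies, and the model names are hypothetical. Only the ordering idea (overrides first, a default last) comes from the list above.

```typescript
interface RoutingContext {
  requestedModel?: string; // explicit user override
  approvalMode: string;
  promptLength: number;
}

interface RoutingDecision { model: string; reason: string }

// A strategy returns a decision, or null to defer to the next one.
type RoutingStrategy = (ctx: RoutingContext) => RoutingDecision | null;

function route(strategies: RoutingStrategy[], ctx: RoutingContext): RoutingDecision {
  for (const strategy of strategies) {
    const decision = strategy(ctx);
    if (decision) return decision;
  }
  throw new Error('No strategy produced a decision'); // a default strategy prevents this
}

const overrideStrategy: RoutingStrategy = (ctx) =>
  ctx.requestedModel ? { model: ctx.requestedModel, reason: 'override' } : null;
const planModeStrategy: RoutingStrategy = (ctx) =>
  ctx.approvalMode === 'plan' ? { model: 'pro-model', reason: 'plan mode' } : null;
const lengthClassifier: RoutingStrategy = (ctx) =>
  ctx.promptLength < 200 ? { model: 'flash-model', reason: 'simple prompt' } : null;
const defaultStrategy: RoutingStrategy = () => ({ model: 'pro-model', reason: 'default' });

const chain = [overrideStrategy, planModeStrategy, lengthClassifier, defaultStrategy];
```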
### ModelAvailabilityService
Health states:
- **Terminal** - permanently unavailable
- **Sticky Retry** - failed once, can retry once per turn
- **Healthy** - no issues
## Services
| Service | Purpose |
| --------------------------- | --------------------------------------- |
| ChatRecordingService | Session persistence (JSON files) |
| ChatCompressionService | History summarization for token budgets |
| ModelConfigService | Hierarchical model config with aliases |
| ModelAvailabilityService | Model health tracking |
| ModelRouterService | Model selection via strategies |
| FolderTrustDiscoveryService | Workspace security scanning |
| KeychainService | Credential storage |
| LoopDetectionService | Detect repetitive agent loops |
## UI + Core Separation
### IDE Client (`core/src/ide/ide-client.ts`)
- Singleton managing CLI ↔ IDE communication via MCP
- **Outbound** (CLI → IDE): `openDiff`, `closeDiff`
- **Inbound** (IDE → CLI): `ide/contextUpdate`, `ide/diffAccepted`,
`ide/diffRejected`
### Event Contract
```typescript
interface IdeContextNotification {
method: 'ide/contextUpdate';
params: { workspaceState: { openFiles: string[]; isTrusted: boolean } };
}
```
### Confirmation Bus
- `TOOL_CONFIRMATION_REQUEST` / `TOOL_CONFIRMATION_RESPONSE`
- Detail types: edit, execute, MCP, info, ask_user, exit_plan_mode
- Async pub/sub via MessageBus
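
Request/response over an async pub/sub bus needs a correlation id, so that the requester awaits only its own answer. The sketch below assumes a hypothetical `SimpleMessageBus` and hypothetical message fields (`correlationId`, `confirmed`). It shows the pattern and is not gemini-cli's MessageBus API. Only the topic strings come from the list above.

```typescript
type Handler = (payload: unknown) => void;

// Hypothetical minimal pub/sub bus; not gemini-cli's MessageBus API.
class SimpleMessageBus {
  private handlers = new Map<string, Set<Handler>>();

  subscribe(topic: string, handler: Handler): () => void {
    const set = this.handlers.get(topic) ?? new Set<Handler>();
    this.handlers.set(topic, set);
    set.add(handler);
    return () => {
      set.delete(handler); // unsubscribe
    };
  }

  publish(topic: string, payload: unknown): void {
    // Copy first so a handler may unsubscribe while it is being notified.
    const handlers = Array.from(this.handlers.get(topic) ?? new Set<Handler>());
    for (const handler of handlers) handler(payload);
  }
}

interface ConfirmationRequest { correlationId: string; toolName: string }
interface ConfirmationResponse { correlationId: string; confirmed: boolean }

// Publish a request and resolve when the response with the same correlationId arrives.
function requestConfirmation(bus: SimpleMessageBus, req: ConfirmationRequest): Promise<boolean> {
  return new Promise((resolve) => {
    const unsubscribe = bus.subscribe('TOOL_CONFIRMATION_RESPONSE', (payload) => {
      const res = payload as ConfirmationResponse;
      if (res.correlationId !== req.correlationId) return; // someone else's answer
      unsubscribe();
      resolve(res.confirmed);
    });
    bus.publish('TOOL_CONFIRMATION_REQUEST', req);
  });
}
```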
## Configuration (`core/src/config/config.ts` ~95KB!)
- Tool config: core tools, allowed/excluded, MCP servers
- File filtering: git ignore, fuzzy search, max counts, timeouts
- Approval modes: policy engine config
- Experiments: feature flags (GEMINI_3_1_PRO_LAUNCHED, ENABLE_ADMIN_CONTROLS,
etc.)
- FolderTrust: discovery scans for commands, skills, settings, MCP, hooks
# Deep Dive: Key Gemini-CLI Systems
## Hooks System (Complete)
### 11 Hook Points
| Hook | Input | Key Output Capabilities |
| ------------------- | -------------------------------------- | --------------------------------------------- |
| BeforeTool | toolName, toolInput, mcpContext | Modify tool_input, block/allow, systemMessage |
| AfterTool | toolName, toolInput, toolResponse | additionalContext, tailToolCallRequest |
| BeforeAgent | prompt | Additional context |
| AfterAgent | prompt, response, stopHookActive | Clear context |
| BeforeModel | llmRequest (GenerateContentParameters) | Modify llm_request OR inject llm_response |
| AfterModel | llmRequest, llmResponse | Modify llm_response |
| BeforeToolSelection | llmRequest | Modify toolConfig (function list, mode) |
| Notification | type, message, details | Suppress/modify |
| SessionStart | source (Startup/Resume/Clear) | Additional context |
| SessionEnd | reason (Exit/Clear/Logout/etc) | Cleanup |
| PreCompress | trigger (Manual/Auto) | Suppress/modify |
### Hook Configuration Types
- **Runtime hooks** (HookType.Runtime): JS/TS functions, registered
programmatically
- **Command hooks** (HookType.Command): External shell commands with JSON I/O
### Exit Code Semantics (Command Hooks)
- 0 = Success (allowed with system message)
- 1 = Non-blocking error (warning, continues)
- 2+ = Blocking failure (denied, stderr as reason)
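
The exit-code rules map onto a small classifier. This is a sketch. The `HookOutcome` type is hypothetical, and parsing the hook's JSON stdout is left out.

```typescript
// Hypothetical outcome shape for a finished command hook.
type HookOutcome =
  | { kind: 'success' }
  | { kind: 'warning'; message: string }
  | { kind: 'blocked'; reason: string };

function interpretExitCode(code: number, stderr: string): HookOutcome {
  if (code === 0) {
    // Success: the action is allowed.
    return { kind: 'success' };
  }
  if (code === 1) {
    // Non-blocking error: surface a warning and continue.
    return { kind: 'warning', message: stderr.trim() };
  }
  // 2 and above (any other nonzero code in this sketch): blocking failure, stderr is the reason.
  return { kind: 'blocked', reason: stderr.trim() };
}
```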
### Hook Decision Values
`'ask' | 'block' | 'deny' | 'approve' | 'allow' | undefined`
### Execution Strategies
- **Parallel** (default): Promise.all(), independent
- **Sequential** (opt-in per hook): Chained, output→input cascading
### Aggregation
- Blocking decisions: OR logic (any block → all block)
- Field replacement: later overrides earlier
- Tool selection: union of allowed functions, mode precedence NONE > ANY > AUTO
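
Under the parallel strategy, the aggregation rules can be written as a single fold over the hook outputs. The `HookResult` fields are hypothetical simplifications of real hook output. Sorting the merged allow-list only makes the output deterministic.

```typescript
type FunctionCallingMode = 'NONE' | 'ANY' | 'AUTO';

// Hypothetical per-hook output; real hook outputs carry more fields.
interface HookResult {
  blocked?: boolean;
  systemMessage?: string;
  allowedFunctionNames?: string[];
  mode?: FunctionCallingMode;
}

// Higher rank wins: NONE > ANY > AUTO.
const MODE_RANK: Record<FunctionCallingMode, number> = { NONE: 3, ANY: 2, AUTO: 1 };

function aggregate(results: HookResult[]): HookResult {
  const out: HookResult = {};
  const allowed = new Set<string>();
  let sawAllowList = false;
  for (const r of results) {
    // Blocking uses OR logic: one blocking hook blocks the call.
    if (r.blocked) out.blocked = true;
    // Scalar fields: a later hook's value replaces an earlier one.
    if (r.systemMessage !== undefined) out.systemMessage = r.systemMessage;
    // Tool selection: union of allowed function names.
    if (r.allowedFunctionNames) {
      sawAllowList = true;
      r.allowedFunctionNames.forEach((name) => allowed.add(name));
    }
    // Mode: keep the highest-precedence mode seen so far.
    if (r.mode && (!out.mode || MODE_RANK[r.mode] > MODE_RANK[out.mode])) out.mode = r.mode;
  }
  if (sawAllowList) out.allowedFunctionNames = Array.from(allowed).sort();
  return out;
}
```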
### Trust Model
- Project hooks require folder trust verification
- TrustedHooksManager at `~/.gemini/trusted-hooks.json`
- Environment sanitized for command hooks (sensitive vars removed)
- `GEMINI_PROJECT_DIR` injected
### Key Insight for Abstraction
Hooks fire inside gemini-cli's execution loop. When ADK controls the model:
- BeforeModel/AfterModel still fire because AdkGeminiModel wraps GeminiChat
- BeforeTool/AfterTool still fire because AdkToolAdapter wraps DeclarativeTool
- This is Dewitt's solution: adapters preserve hook injection points
**For OpenRouter or opaque agents, hooks CANNOT fire unless the agent delegates
model/tool calls back to gemini-cli.**
---
## Policy Engine (Complete)
### TOML Rule Format
```toml
[[rules]]
decision = "allow"              # one of "allow", "deny", "ask_user"
priority = 100                  # integer in 0-999
toolName = "tool_name"          # wildcards: *, mcp_*, mcp_server_*
mcpName = "server_name"         # MCP server filter
argsPattern = "regex"           # matches JSON-stringified args
commandPrefix = "cmd"           # shell command prefix match
# commandRegex = "regex"        # shell command regex (mutually exclusive with commandPrefix)
modes = ["default", "autoEdit", "yolo", "plan"]
annotations = ["read-only", "experimental"]
allowRedirection = true         # for shell commands
allowMessage = "..."            # user-facing message on allow
denyMessage = "..."             # user-facing message on deny
```
### 5-Tier Priority System
- Tier 5 (Admin): 5.000-5.999
- Tier 4 (User): 4.000-4.999
- Tier 3 (Workspace): 3.000-3.999
- Tier 2 (Extension): 2.000-2.999
- Tier 1 (Default): 1.000-1.999
Formula: `tier + (priority / 1000)`
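
A worked check of the formula: an admin rule with priority 0 scores 5 + 0/1000 = 5.000, and a user rule with priority 999 scores 4 + 999/1000 = 4.999. Every admin rule therefore outranks every user rule. The sketch below assumes that higher scores take precedence. `RankedRule` and the helper names are hypothetical.

```typescript
// Tier numbers follow the list above; the helpers are a hypothetical sketch.
const TIERS = { default: 1, extension: 2, workspace: 3, user: 4, admin: 5 } as const;
type Tier = keyof typeof TIERS;

interface RankedRule {
  id: string;
  tier: Tier;
  priority: number; // 0-999 within the tier
}

function effectivePriority(tier: Tier, priority: number): number {
  if (!Number.isInteger(priority) || priority < 0 || priority > 999) {
    throw new RangeError(`priority must be an integer in 0-999, got ${priority}`);
  }
  return TIERS[tier] + priority / 1000;
}

// Order rules so the highest effective priority comes first.
function highestPriorityFirst(rules: RankedRule[]): RankedRule[] {
  return [...rules].sort(
    (a, b) => effectivePriority(b.tier, b.priority) - effectivePriority(a.tier, a.priority),
  );
}
```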
### 4 Approval Modes
1. **default** — ASK_USER decisions prompt user
2. **autoEdit** — File writes auto-approved with safety checking (conseca)
3. **yolo** — All auto-approved except explicit ask_user rules
4. **plan** — Read-only, blocks modifications, allows planning docs
### Shell Command Safety
- Parses multi-command sequences (&&, ;, ||)
- Detects injection: $(...), `...`, <(...), >(...), --flag=$(...)
- Each subcommand evaluated independently
- DENY overrides everything; ASK_USER escalates; ALLOW only if all pass
- Redirections (>) downgrade ALLOW → ASK_USER unless allowRedirection=true
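
After a compound command has been split into subcommands, the combination rules reduce to a short fold. This sketch makes three assumptions. Parsing and per-subcommand rule matching have already happened, and the hypothetical `SubcommandVerdict` carries the result. `allowRedirection` is treated as a single flag instead of a per-rule setting. Returning `ask_user` for an empty list is a conservative choice of this sketch.

```typescript
type Decision = 'allow' | 'deny' | 'ask_user';

// Result of evaluating one subcommand of a compound shell command (hypothetical shape).
interface SubcommandVerdict {
  decision: Decision;
  hasRedirection: boolean;
}

function combineShellVerdicts(verdicts: SubcommandVerdict[], allowRedirection: boolean): Decision {
  if (verdicts.length === 0) return 'ask_user'; // assumption: nothing parsed means ask
  let result: Decision = 'allow';
  for (const v of verdicts) {
    let d: Decision = v.decision;
    // A redirection downgrades ALLOW to ASK_USER unless redirection is permitted.
    if (d === 'allow' && v.hasRedirection && !allowRedirection) d = 'ask_user';
    // DENY overrides everything.
    if (d === 'deny') return 'deny';
    // ASK_USER escalates; ALLOW survives only if every subcommand allows.
    if (d === 'ask_user') result = 'ask_user';
  }
  return result;
}
```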
### Security Constraints
- Extensions cannot contribute ALLOW rules or YOLO mode
- Regex patterns validated for ReDoS
- Tool name typos detected via Levenshtein distance ≤3
- Policy file integrity: SHA-256 hash checking
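
Typo detection can be sketched with the standard dynamic-programming Levenshtein distance and the ≤3 threshold. The function names are hypothetical, and gemini-cli's actual suggestion logic may rank or filter candidates differently.

```typescript
// Classic dynamic-programming edit distance (insertions, deletions, substitutions).
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Suggest known tool names within maxDistance of a typed name, closest first.
function suggestToolNames(typed: string, known: string[], maxDistance = 3): string[] {
  return known
    .map((name) => ({ name, d: levenshtein(typed, name) }))
    .filter((x) => x.d > 0 && x.d <= maxDistance) // exclude exact matches
    .sort((x, y) => x.d - y.d)
    .map((x) => x.name);
}
```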
### Key Insight for Abstraction
Policy is evaluated at the tool execution boundary. For the interface layer:
- If CLI controls tool execution → policy naturally applies
- If agent controls tool execution internally → policy bypassed (danger!)
- This reinforces the `pauseOnToolCalls: true` approach for ADK
- Need a `PolicyEvaluator` interface that any executor can call
---
## Tool System (Complete)
### Core Abstraction Chain
```
ToolBuilder (metadata + schema)
→ build(params) validates → ToolInvocation (ready to execute)
→ shouldConfirmExecute() → execute(signal) → ToolResult
```
### DeclarativeTool Pattern
- `build(params)` — Validate and create invocation
- `buildAndExecute(params)` — One-step convenience
- `validateBuildAndExecute(params)` — Non-throwing variant
### BaseToolInvocation
- Message bus integration for policy decisions
- Three decision paths: ALLOW → execute, DENY → reject, ASK_USER → confirm
### ToolResult Structure
- `llmContent` — For LLM conversation history
- `returnDisplay` — For UI presentation
- `displayContent` — Additional display formatting
- `errorDetails` — Optional error info
- `result` — Structured data payload
- `tailCall` — Optional chaining requests
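
The build → decide → execute chain can be condensed into one function. This is a simplified sketch with several departures from the real code. `ToolBuilder`, `ToolInvocation`, and `runTool` are hypothetical shapes, not gemini-cli's class signatures. Policy and confirmation are passed in as callbacks. `ToolResult` keeps only three of the fields listed above.

```typescript
type PolicyDecision = 'ALLOW' | 'DENY' | 'ASK_USER';

// A subset of the ToolResult fields listed above.
interface ToolResult {
  llmContent: string;
  returnDisplay: string;
  errorDetails?: { message: string };
}

// Hypothetical, simplified shapes.
interface ToolInvocation {
  execute(signal: AbortSignal): Promise<ToolResult>;
}

interface ToolBuilder<P> {
  name: string;
  // Validates params (throwing if invalid) and returns a ready-to-run invocation.
  build(params: P): ToolInvocation;
}

async function runTool<P>(
  tool: ToolBuilder<P>,
  params: P,
  decide: (toolName: string) => PolicyDecision,
  confirm: (toolName: string) => Promise<boolean>,
  signal: AbortSignal,
): Promise<ToolResult> {
  const invocation = tool.build(params); // validation happens before any policy check
  const decision = decide(tool.name);
  // DENY rejects outright; ASK_USER rejects unless the user confirms.
  if (decision === 'DENY' || (decision === 'ASK_USER' && !(await confirm(tool.name)))) {
    const message = `${tool.name} was not permitted`;
    return { llmContent: message, returnDisplay: message, errorDetails: { message } };
  }
  return invocation.execute(signal);
}
```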
### Confirmation System (6 types)
1. **edit** — File modification with diff
2. **execute** — Command execution
3. **mcp** — MCP tool with allowlist mgmt
4. **info** — Information-only
5. **ask_user** — General user approval
6. **exit_plan_mode** — Plan exit notification
### Confirmation Outcomes (7 values)
ProceedOnce, ProceedAlways, ProceedAlwaysAndSave, ProceedAlwaysServer,
ProceedAlwaysTool, ModifyWithEditor, Cancel
### Tool Kinds
- **Mutator**: Edit, Delete, Move, Execute
- **Read-Only**: Read, Search, Fetch
- **Other**: Think, Agent, Communicate, Plan, SwitchMode, Other
### MCP Tools
- Naming: `mcp_<server>_<toolname>` (64-char limit)
- Schema validation via LenientJsonSchemaValidator
- Response types: McpTextBlock, McpMediaBlock, McpResourceBlock,
McpResourceLinkBlock
- Transform to GenAI Parts format
### Error Types (20+)
- **Recoverable**: INVALID_TOOL_PARAMS, FILE_NOT_FOUND,
EDIT_NO_OCCURRENCE_FOUND, SHELL_TIMEOUT, MCP_TOOL_ERROR...
- **Fatal**: NO_SPACE_LEFT (only one!)
### ModifiableTool
- Extends DeclarativeTool with external editor support
- `getModifyContext()` → temp files → editor opens → `getUpdatedParams()` → diff
---
## Execution Loop (Complete)
### LocalAgentExecutor Flow
1. Collect user hints, setup deadline timer
2. **Turn loop**: executeTurn() repeatedly until completion
3. Per-turn: compress chat → callModel() → processFunctionCalls()
4. On limit hit: executeFinalWarningTurn() with 60s grace period
5. Return OutputObject { result, terminate_reason }
### AgentTerminateMode
GOAL | TIMEOUT | MAX_TURNS | ABORTED | ERROR | ERROR_NO_COMPLETE_TASK_CALL
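
The turn loop and its termination modes can be sketched as shown below. Chat compression, user hints, and the 60s final-warning turn are omitted. `TurnOutcome` is hypothetical. Mapping a turn that ends without a `complete_task` call to `ERROR_NO_COMPLETE_TASK_CALL` is inferred from the mode's name.

```typescript
const AgentTerminateMode = {
  GOAL: 'GOAL',
  TIMEOUT: 'TIMEOUT',
  MAX_TURNS: 'MAX_TURNS',
  ABORTED: 'ABORTED',
  ERROR: 'ERROR',
  ERROR_NO_COMPLETE_TASK_CALL: 'ERROR_NO_COMPLETE_TASK_CALL',
} as const;
type AgentTerminateMode = (typeof AgentTerminateMode)[keyof typeof AgentTerminateMode];

// Hypothetical per-turn outcome.
type TurnOutcome =
  | { kind: 'complete'; result: string } // the model called complete_task
  | { kind: 'continue' } // more function calls were processed
  | { kind: 'stopped' }; // the model stopped without calling complete_task

interface OutputObject {
  result: string;
  terminate_reason: AgentTerminateMode;
}

async function runAgentLoop(
  executeTurn: (turn: number) => Promise<TurnOutcome>,
  maxTurns: number,
  timeoutMs: number,
  signal?: AbortSignal,
): Promise<OutputObject> {
  const deadline = Date.now() + timeoutMs;
  for (let turn = 0; turn < maxTurns; turn++) {
    if (signal?.aborted) return { result: '', terminate_reason: AgentTerminateMode.ABORTED };
    if (Date.now() >= deadline) return { result: '', terminate_reason: AgentTerminateMode.TIMEOUT };
    try {
      const outcome = await executeTurn(turn);
      if (outcome.kind === 'complete') {
        return { result: outcome.result, terminate_reason: AgentTerminateMode.GOAL };
      }
      if (outcome.kind === 'stopped') {
        return { result: '', terminate_reason: AgentTerminateMode.ERROR_NO_COMPLETE_TASK_CALL };
      }
    } catch (err) {
      return { result: String(err), terminate_reason: AgentTerminateMode.ERROR };
    }
  }
  return { result: '', terminate_reason: AgentTerminateMode.MAX_TURNS };
}
```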
### SubagentTool Architecture
```
Parent Agent
└─ SubagentTool (wraps AgentDefinition as DeclarativeTool)
└─ SubagentToolWrapper (routes by agent kind)
├─ LocalSubagentInvocation → LocalAgentExecutor
├─ RemoteAgentInvocation → A2AClientManager
└─ BrowserAgentInvocation
```
### Agent Types
- `LocalAgentDefinition` — kind: 'local', has promptConfig, modelConfig,
runConfig, toolConfig
- `RemoteAgentDefinition` — kind: 'remote', has agentCardUrl, auth config
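
Both definitions carry a `kind` discriminator, so routing in the spirit of SubagentToolWrapper becomes a type-checked switch. In this sketch the fields are cut down to the minimum. The `describeInvocation` function is hypothetical and returns a label instead of constructing real invocations. The browser agent kind is left out because its definition is not described here.

```typescript
// Field sets are abbreviated; only the `kind` discriminator matters here.
interface LocalAgentDefinition {
  kind: 'local';
  name: string;
}

interface RemoteAgentDefinition {
  kind: 'remote';
  name: string;
  agentCardUrl: string;
}

type AgentDefinition = LocalAgentDefinition | RemoteAgentDefinition;

// Hypothetical router: choose an invocation path by agent kind.
function describeInvocation(def: AgentDefinition): string {
  switch (def.kind) {
    case 'local':
      return `LocalSubagentInvocation(${def.name})`;
    case 'remote':
      return `RemoteAgentInvocation(${def.agentCardUrl})`;
    default: {
      // Compile-time error here if a new kind is added but not handled.
      const unreachable: never = def;
      throw new Error(`unknown agent kind: ${JSON.stringify(unreachable)}`);
    }
  }
}
```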
### Key Defaults
- DEFAULT_MAX_TURNS = 15
- DEFAULT_MAX_TIME_MINUTES = 5
- A2A_TIMEOUT = 1800000 (30 min for remote agents)
---
## Services/Config (Complete)
### ModelConfigService
- **Alias chains**: Inheritance with `extends`, merged root-to-leaf
- **Overrides**: Contextual (model, scope, retry, isChatModel), sorted by
specificity
- **Runtime registration**: Dynamic aliases and overrides
- **Deep merge**: Objects merged, arrays replaced entirely
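
Alias resolution with `extends` works in two passes. The chain is walked from leaf to root, then deep-merged from root to leaf so that children win. As described above, objects merge key by key and arrays are replaced wholesale. The names `AliasDefinition` and `resolveAlias` are hypothetical. Contextual overrides and the `chat-base` fallback are omitted, and cycle detection is a safeguard added only in this sketch.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

interface AliasDefinition {
  extends?: string;
  config: JsonObject;
}

function isPlainObject(v: Json | undefined): v is JsonObject {
  return typeof v === 'object' && v !== null && !Array.isArray(v);
}

// Objects merge recursively; arrays and scalars from `override` replace `base` entirely.
function deepMerge(base: JsonObject, override: JsonObject): JsonObject {
  const out: JsonObject = { ...base };
  for (const [key, value] of Object.entries(override)) {
    const existing = out[key];
    out[key] = isPlainObject(existing) && isPlainObject(value) ? deepMerge(existing, value) : value;
  }
  return out;
}

// Collect the chain leaf-to-root, then merge root-to-leaf so children override parents.
function resolveAlias(aliases: Record<string, AliasDefinition>, name: string): JsonObject {
  const chain: AliasDefinition[] = [];
  const seen = new Set<string>();
  let current: string | undefined = name;
  while (current !== undefined) {
    if (seen.has(current)) throw new Error(`alias cycle at ${current}`);
    seen.add(current);
    const def: AliasDefinition | undefined = aliases[current];
    if (!def) throw new Error(`unknown alias: ${current}`);
    chain.unshift(def);
    current = def.extends;
  }
  return chain.reduce<JsonObject>((acc, def) => deepMerge(acc, def.config), {});
}
```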
### ModelRouterService (Strategy Chain)
1. Fallback & Override → 2. Approval Mode → 3. Gemma Classifier → 4. Generic
Classifier → 5. Numerical Classifier → 6. Default
### ModelAvailabilityService
- Terminal (permanent), Sticky_retry (one retry per turn), Healthy
- `selectFirstAvailable()` iterates fallback chain
- `resetTurn()` at turn boundaries enables fresh retries
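
The three health states and the per-turn retry can be sketched as a small tracker. Everything here is hypothetical. `AvailabilityTracker` is not ModelAvailabilityService's API. The sketch also assumes that selecting a sticky-retry model uses up that turn's single retry.

```typescript
type Health = 'healthy' | 'sticky_retry' | 'terminal';

// Hypothetical sketch of the three health states.
class AvailabilityTracker {
  private health = new Map<string, Health>();
  private retriedThisTurn = new Set<string>();

  markTerminal(model: string): void {
    this.health.set(model, 'terminal');
  }

  markStickyRetry(model: string): void {
    if (this.health.get(model) !== 'terminal') this.health.set(model, 'sticky_retry');
  }

  // Terminal models are never available; sticky-retry models once per turn.
  private isAvailable(model: string): boolean {
    const h = this.health.get(model) ?? 'healthy';
    if (h === 'terminal') return false;
    if (h === 'sticky_retry') return !this.retriedThisTurn.has(model);
    return true;
  }

  // Walk the fallback chain and return the first usable model, if any.
  selectFirstAvailable(fallbackChain: string[]): string | undefined {
    for (const model of fallbackChain) {
      if (!this.isAvailable(model)) continue;
      if (this.health.get(model) === 'sticky_retry') this.retriedThisTurn.add(model);
      return model;
    }
    return undefined;
  }

  // Called at turn boundaries so sticky-retry models get a fresh attempt.
  resetTurn(): void {
    this.retriedThisTurn.clear();
  }
}
```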
### Config (~95KB!)
Central dependency injection. Initializes: ModelAvailabilityService →
ModelConfigService → FolderTrustDiscoveryService → PolicyEngine →
FileDiscoveryService → GitService → ToolRegistry → MCP → GeminiClient →
HookSystem
### CoreEventEmitter (UI Events)
Event types: UserFeedback, ModelChanged, ConsoleLog, Output, RetryAttempt,
ConsentRequest, McpProgress, Hook, QuotaChanged
Backlog buffering (max 10,000) with head-pointer eviction and auto-compaction.
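
Head-pointer eviction avoids the O(n) cost of `Array.prototype.shift()`. The oldest item is dropped by advancing an index, and the dead prefix is sliced away all at once when it grows large. This is a sketch. The `Backlog` class and its compaction threshold (compacting when the dead prefix reaches the cap) are assumptions and do not reproduce CoreEventEmitter's code.

```typescript
// Hypothetical bounded backlog with head-pointer eviction.
class Backlog<T> {
  private items: T[] = [];
  private head = 0; // index of the oldest retained item
  private readonly max: number;

  constructor(max = 10000) {
    this.max = max;
  }

  get size(): number {
    return this.items.length - this.head;
  }

  push(item: T): void {
    this.items.push(item);
    // Evict the oldest entry by moving the head pointer instead of shifting (O(1)).
    if (this.size > this.max) this.head++;
    // Compact once the dead prefix reaches `max`, so the array never exceeds 2 * max.
    if (this.head >= this.max) {
      this.items = this.items.slice(this.head);
      this.head = 0;
    }
  }

  // Return and clear all retained items, oldest first.
  drain(): T[] {
    const out = this.items.slice(this.head);
    this.items = [];
    this.head = 0;
    return out;
  }
}
```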
### Scheduler Types
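```typescript
// Field names as listed; member types are approximations, not copied from source.
interface ToolCallRequestInfo {
  callId: string; name: string; args: Record<string, unknown>;
  originalRequestName?: string; isClientInitiated: boolean; prompt_id: string;
  checkpoint?: string; traceId?: string; parentCallId?: string; schedulerId?: string;
}
interface ToolCallResponseInfo {
  callId: string; responseParts: unknown[]; resultDisplay?: unknown;
  error?: Error; errorType?: string; outputFile?: string;
  contentLength?: number; data?: Record<string, unknown>;
}
// CoreToolCallStatus lifecycle:
// Validating → AwaitingApproval → Scheduled → Executing → Success | Error | Cancelled
```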
### FolderTrust
- Scans: commands (.toml), skills (SKILL.md), settings.json, MCP servers, hooks
- Security warnings: auto-approved tools, autonomous agents, disabled trust,
  disabled sandbox
- Pattern: discovery → review → execution (no code runs during scan)
# Interface Priority Analysis & Open Questions
## The Big Picture
We're defining **framework-agnostic interfaces** that allow gemini-cli to:
1. Keep its existing execution loop working unchanged (Legacy path)
2. Swap in ADK as an alternative runtime via config flag
3. Eventually support OpenRouter or other agent backends
4. Maintain all existing CLI behavior: hooks, policies, confirmations, UI events
## Proposed Interface Layers (Priority Order)
---
### P0 (Critical Path - Must Define First)
#### 1. AgentEvent / Event Stream Contract
**Why first:** Everything else consumes or produces these events. The UI renders
them. The hooks intercept them. The adapters translate to/from them.
**Key decision:** Merge Dewitt's simpler model with Coworker's richer model?
**Recommendation:** Coworker's approach is more complete. Key additions:
- `threadId` for sub-agent tracking (AG-UI has `parentRunId`)
- `tool_update` for progress on long-running tools
- `elicitation_request/response` as first-class (not just tool_confirmation)
- `usage` event for token tracking
- `_meta` escape hatch (matches AG-UI's extensibility philosophy)
- `initialize` event (matches AG-UI's RunStarted)
**Open questions:**
- Do we need AG-UI's start/content/end triple pattern for streaming? Or is
yielding partial events sufficient?
- How do ContentPart types map to existing gemini-cli Part types?
- Should events carry a `source` field? (useful for hook attribution)
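
A discriminated union keeps the event contract type-safe for renderers, hooks, and adapters alike. The sketch below is illustrative only. Event names follow the list above, but every payload field (`agentName`, `progress`, token counts, etc.) and the `render` helper are hypothetical.

```typescript
// Fields shared by every event (hypothetical).
interface EventBase {
  threadId: string; // identifies the (sub-)agent that emitted the event
  _meta?: Record<string, unknown>; // escape hatch for backend-specific data
}

type AgentEvent = EventBase &
  (
    | { type: 'initialize'; agentName: string }
    | { type: 'content'; text: string }
    | { type: 'tool_call'; callId: string; name: string; args: Record<string, unknown> }
    | { type: 'tool_update'; callId: string; progress: string }
    | { type: 'elicitation_request'; requestId: string; prompt: string }
    | { type: 'elicitation_response'; requestId: string; value: string }
    | { type: 'usage'; inputTokens: number; outputTokens: number }
  );

// A renderer switches on `type`; sub-agent output is attributed via threadId.
function render(event: AgentEvent): string {
  switch (event.type) {
    case 'content':
      return `[${event.threadId}] ${event.text}`;
    case 'tool_update':
      return `[${event.threadId}] ${event.callId}: ${event.progress}`;
    case 'usage':
      return `[${event.threadId}] tokens in=${event.inputTokens} out=${event.outputTokens}`;
    default:
      return `[${event.threadId}] ${event.type}`;
  }
}
```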
#### 2. Agent Interface
**Why second:** This is the primary abstraction that LocalAgentExecutor, ADK
adapters, and future OpenRouter adapters all implement.
**Key decision:** Dewitt's `runAsync/runEphemeral` vs Coworker's
`send(Trajectory|string)`
**Recommendation:** Hybrid approach:
- Dewitt's `runAsync/runEphemeral` split is ADK-aligned and cleaner for the
factory pattern
- BUT add Coworker's elicitation support via AgentSend union type
- The Trajectory concept is powerful but may be too opinionated for Phase 2
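```typescript
// Placeholders for the event contract and per-run options.
type AgentEvent = { type: string };
type RunOptions = Record<string, unknown>;
interface Agent<TInput, TOutput> {
  name: string;
  description: string;
  runAsync(input: TInput, options: RunOptions): AsyncGenerator<AgentEvent, TOutput>;
  runEphemeral(input: TInput, options: RunOptions): AsyncGenerator<AgentEvent, TOutput>;
}
```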
**Open questions:**
- Should Agent also support `send()` for mid-stream interactions (elicitations)?
- How does AbortSignal propagate through the adapter boundary?
- Do we need a `capabilities` field (supports elicitation? supports HITL? etc.)?
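
The `AsyncGenerator<AgentEvent, TOutput>` signature carries two channels: yielded events and a final return value. Consumers should note that `for await` discards the return value. A caller that needs `TOutput` must therefore drive the generator with `next()`. The sketch below uses a simplified event type and a hypothetical `exampleRun` agent.

```typescript
// Simplified event type for this sketch.
type AgentEvent = { type: 'content'; text: string } | { type: 'tool_update'; progress: string };

// `for await` drops the generator's return value, so step through it manually.
async function collect<TOutput>(
  run: AsyncGenerator<AgentEvent, TOutput>,
  onEvent: (event: AgentEvent) => void,
): Promise<TOutput> {
  for (;;) {
    const step = await run.next();
    if (step.done) return step.value; // the TOutput returned by the agent
    onEvent(step.value); // an intermediate AgentEvent
  }
}

// Hypothetical agent run: two events, then a structured output.
async function* exampleRun(input: string): AsyncGenerator<AgentEvent, { summary: string }> {
  yield { type: 'content', text: `working on ${input}` };
  yield { type: 'tool_update', progress: '50%' };
  return { summary: `done: ${input}` };
}
```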
#### 3. Tool Execution Contract
**Why third:** Tools are the primary action mechanism. Both the policy engine
and hooks system wrap tool execution.
**What needs abstracting:**
- Tool declaration (name, schema) — already somewhat generic via JSON Schema
- Tool execution (args → result)
- Tool confirmation flow (ASK_USER → user decision → proceed/deny)
- Tool result shape (llmContent + displayContent + error + tailCalls)
**Key decision:** Keep DeclarativeTool pattern or flatten to a simpler
interface?
**Recommendation:** Define a minimal `ToolExecutor` interface:
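```typescript
// Placeholder types for the schema, execution context, and result.
type JSONSchema = Record<string, unknown>;
type ToolContext = Record<string, unknown>;
type ToolResult = Record<string, unknown>;
interface ToolExecutor {
  name: string;
  description: string;
  schema: JSONSchema;
  execute(args: Record<string, unknown>, context: ToolContext): Promise<ToolResult>;
  requiresConfirmation?(args: Record<string, unknown>, context: ToolContext): Promise<boolean>;
}
```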
```
DeclarativeTool remains the concrete implementation. ADK's BaseTool adapts to
this.
**Open questions:**
- How do MCP tools fit? They already have their own protocol.
- Tool annotations (destructive hints) — should these be in the interface?
- Long-running tools need progress reporting — how does this interact with
tool_update events?
---
### P1 (Important - Define After P0)
#### 4. Policy / Permission Interface
**Why important:** Every tool call goes through policy. External agents need
policy enforcement too.
**Current state:** gemini-cli has a sophisticated TOML-based policy engine with
tiered priorities. ADK-TS has a simpler SecurityPlugin with PolicyOutcome
(DENY/CONFIRM/ALLOW).
**What needs abstracting:**
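```typescript
type PolicyDecision = 'ALLOW' | 'DENY' | 'ASK_USER';
type PolicyContext = Record<string, unknown>; // placeholder
interface PolicyEngine {
  evaluate(toolName: string, args: Record<string, unknown>, context: PolicyContext): PolicyDecision;
  getExcludedTools(): string[]; // tools statically denied
}
```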
**Key decision:** Do external agents (OpenRouter, etc.) get the same policy
enforcement?
**Open questions:**
- If an ADK agent calls a tool internally, does gemini-cli's policy apply?
- With `pauseOnToolCalls: true` in ADK, the CLI controls execution — but what
about headless mode?
- How do agent-level policies work? (allow/deny entire agents, not just tools)
- Should policy be a middleware (AG-UI pattern) or a callback (ADK plugin
pattern)?
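
If the ADK side keeps its own plugin callback, it could defer to gemini-cli's engine through a thin mapping in which ASK_USER corresponds to CONFIRM. This is only a sketch. The string unions stand in for the real ADK-TS `PolicyOutcome` and gemini-cli decision types, which are not imported, and `adkOutcomeFor` is hypothetical.

```typescript
// Local stand-ins; the real ADK-TS and gemini-cli types are not imported.
type PolicyDecision = 'ALLOW' | 'DENY' | 'ASK_USER';
type AdkPolicyOutcome = 'ALLOW' | 'DENY' | 'CONFIRM';

interface PolicyEngine {
  evaluate(toolName: string, args: Record<string, unknown>): PolicyDecision;
}

const TO_ADK: Record<PolicyDecision, AdkPolicyOutcome> = {
  ALLOW: 'ALLOW',
  DENY: 'DENY',
  ASK_USER: 'CONFIRM',
};

// Lets an ADK-side security check defer to gemini-cli's engine, so both runtimes share one rule set.
function adkOutcomeFor(
  engine: PolicyEngine,
  toolName: string,
  args: Record<string, unknown>,
): AdkPolicyOutcome {
  return TO_ADK[engine.evaluate(toolName, args)];
}
```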
#### 5. Hooks Interface
**Why important:** Hooks are a major gemini-cli feature. They need to work
regardless of which agent backend runs.
**Current state:** 11 hook types firing at specific lifecycle points.
**What needs abstracting:**
- Hook lifecycle must be backend-agnostic
- BeforeModel/AfterModel hooks need to work even when ADK controls the model
- BeforeTool/AfterTool hooks need to intercept regardless of who executes the
tool
**Key challenge:** When ADK runs the model internally, gemini-cli hooks can't
easily intercept.

**Dewitt's solution:** ADK uses gemini-cli's model via the AdkGeminiModel
adapter, so hooks fire inside GeminiChat.
**Open questions:**
- If OpenRouter runs the model, how do BeforeModel/AfterModel hooks work?
- Do we need a "model steering" abstraction (injecting context mid-stream)?
- Can hooks be expressed as AG-UI middleware? (intercept event stream)
#### 6. Model / LLM Interface
**Why important:** Model abstraction enables swapping LLM providers.
**Dewitt's approach:** Exposes a Model interface; ADK uses it via the
AdkGeminiModel adapter.

**Coworker's approach:** Model is internal to Agent (no separate Model
interface).
**Recommendation:** Keep Dewitt's separate Model interface BUT make it
provider-agnostic:
- Remove `@google/genai` types from the interface signature
- Define generic Message/Content types
- Model interface is an implementation detail, not part of the Agent contract
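
A provider-agnostic Model interface could look like the sketch below, with plain message and content-part types and no `@google/genai` imports. Every name here is hypothetical and streaming is left out. The echo model only demonstrates that the interface can be implemented without provider types.

```typescript
// Hypothetical provider-neutral types; no @google/genai imports.
type Role = 'system' | 'user' | 'model' | 'tool';

type ContentPart =
  | { kind: 'text'; text: string }
  | { kind: 'tool_call'; callId: string; name: string; args: Record<string, unknown> }
  | { kind: 'tool_result'; callId: string; output: unknown };

interface Message {
  role: Role;
  content: ContentPart[];
}

interface ModelRequest {
  messages: Message[];
  tools?: { name: string; description: string; parameters: Record<string, unknown> }[];
}

interface Model {
  id: string;
  generate(request: ModelRequest, signal?: AbortSignal): Promise<Message>;
}

// A trivial echo implementation: returns the text of the last message.
const echoModel: Model = {
  id: 'echo',
  async generate(request: ModelRequest): Promise<Message> {
    const last = request.messages[request.messages.length - 1];
    const text = last?.content.map((p) => (p.kind === 'text' ? p.text : '')).join('') ?? '';
    return { role: 'model', content: [{ kind: 'text', text }] };
  },
};
```

An OpenRouter- or Gemini-backed implementation would translate these types at its own boundary.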
**Open questions:**
- Can we define a truly provider-agnostic Model interface?
- Or is the Model always tied to the agent backend? (ADK uses Gemini, OpenRouter
uses whatever)
- Model routing (choosing which model) — is this a concern of the Model
interface or a separate service?
---
### P2 (Important but Can Follow)
#### 7. Session / State Interface
**Current state:** gemini-cli uses ChatRecordingService (JSON files). ADK uses
Session with BaseSessionService.
**What needs abstracting:**
- Session creation/retrieval
- State persistence across turns
- History/trajectory management
**Open questions:**
- Does the trajectory (coworker's concept) replace gemini-cli's chat recording?
- Should session state be shared between gemini-cli and the agent backend?
#### 8. Elicitation / User Interaction Interface
**What it covers:** Model fallback dialogs, tool confirmations, Ctrl+B
interrupts, user questions
**Current state:** gemini-cli uses ConfirmationBus + MessageBus. AG-UI uses
frontend tools.
**Open questions:**
- Is elicitation just a special case of tool calls (AG-UI approach)?
- Or is it a first-class event type (coworker's approach)?
- How does Ctrl+B (cancel/interrupt) propagate through the agent boundary?
#### 9. Configuration / Capability Discovery
**What it covers:** Feature flags, experiment settings, agent capabilities
**Open questions:**
- How does an external agent declare its capabilities?
- Does OpenRouter support HITL? Elicitation? Tool confirmation? Each agent may
differ.
- Need a `capabilities` negotiation at connection time?
---
### P3 (Future / Can Defer)
#### 10. A2UI / Rich UI Interface
- Declarative UI generation from agents
- Not critical for Phase 2 but important for differentiation
#### 11. Memory / Artifact Interface
- ADK has memory/artifact services
- gemini-cli has ChatRecordingService + memory tools
- Can standardize later
#### 12. Telemetry / Observability Interface
- Both systems have telemetry
- Can standardize later
---
## Critical Open Questions (Need Team Discussion)
### 1. OpenRouter Integration Model
**Question:** When OpenRouter (or any external agent) is used, what does the
integration look like?
**Option A: Full Agent Interface** — OpenRouter implements the Agent interface
directly
- Pro: Clean, uniform
- Con: OpenRouter doesn't support HITL, hooks, policies natively
**Option B: ACP Shim** — Agent Communication Protocol between CLI and external
agents
- Pro: Standards-based
- Con: Additional protocol layer, may be premature
**Option C: Model-only Integration** — OpenRouter is just an alternative Model,
not Agent
- Pro: Simpler, leverages existing agent loop
- Con: Doesn't support OpenRouter-specific features
**Recommendation:** Start with Option C (model-only). OpenRouter provides an LLM
endpoint. Gemini-cli's own agent loop handles tools, policies, hooks. This means
defining a provider-agnostic Model interface is the key enabler.
### 2. Tool Execution: Client-side vs Agent-side
**Question:** Who executes tools — the CLI or the agent backend?
**Option A: Always client-side** (CLI executes, agent suspends)
- ADK: `pauseOnToolCalls: true`
- Pro: CLI maintains control, policies enforced, hooks fire
- Con: Higher latency, more round-trips
**Option B: Agent-side execution** (agent runs tools internally)
- Pro: Faster, simpler
- Con: Bypasses CLI policies, hooks, confirmations
**Option C: Configurable** — CLI decides per-tool or per-agent
- Pro: Flexible
- Con: Complex
**Recommendation:** Option A for safety-critical CLI use case. Option B only for
trusted/sandboxed sub-agents.
### 3. Model Steering (Hooks that inject context mid-stream)
**Question:** How do user-local hooks (like injecting project context) work with
external agents?
**Answer:** They can only work if:
- The CLI controls the model (via Model interface adapter) — then BeforeModel
hook injects context
- OR the agent supports a "system instruction update" mechanism
For OpenRouter: model steering works because CLI controls the model call. For
ADK: model steering works because AdkGeminiModel wraps GeminiChat. For fully
opaque agents: model steering **cannot work** — this is a known limitation.
### 4. Elicitation Flow
**Question:** When the agent needs user input (model fallback, clarification),
how does it work?
**For CLI-controlled agents:** Agent yields an `elicitation_request` event → CLI
renders prompt → user responds → CLI sends response back via
`session.stream({ kind: 'elicitation_response', ... })` to resume
**For external agents:** Agent uses A2A protocol or similar to send elicitation
→ CLI bridges the request to user → response sent back via protocol
**Key insight:** Elicitation is fundamentally about the agent SUSPENDING and
waiting for user input. ADK already supports this via `pauseOnToolCalls`. Can we
generalize to `pauseOnElicitation`?
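
The suspend-and-resume idea maps naturally onto an async generator. The agent `yield`s a request and stays paused until the host calls `next(response)`. This sketch shows only the suspension. It uses generator `next()` rather than the `session.stream` mechanism described above, and all names are hypothetical.

```typescript
type ElicitationRequest = { kind: 'elicitation_request'; requestId: string; question: string };
type ElicitationResponse = { kind: 'elicitation_response'; requestId: string; answer: string };

// The agent suspends at `yield` until the host resumes it with next(response).
async function* agentNeedingInput(): AsyncGenerator<ElicitationRequest, string, ElicitationResponse> {
  const response = yield { kind: 'elicitation_request', requestId: 'r1', question: 'Switch to fallback model?' };
  return response.answer === 'yes' ? 'continued on fallback' : 'stopped';
}

// Host side: render each request, collect an answer, and resume the agent with it.
async function drive(
  agent: AsyncGenerator<ElicitationRequest, string, ElicitationResponse>,
  ask: (question: string) => Promise<string>,
): Promise<string> {
  let step = await agent.next(); // the first next() only starts the generator
  for (;;) {
    if (step.done) return step.value;
    const request = step.value;
    const answer = await ask(request.question);
    step = await agent.next({ kind: 'elicitation_response', requestId: request.requestId, answer });
  }
}
```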
### 5. Sub-agent Identity and Policies
**Question:** When a sub-agent spawns, does it inherit parent policies? Get its
own?
**Current gemini-cli behavior:** Sub-agents are registered as tools and go
through the same policy engine.

**ADK behavior:** Sub-agents are child nodes in the agent tree and get the
parent's plugins.
**Recommendation:** Sub-agents inherit parent policy context. Additional
restrictions can be layered (e.g., sub-agent X cannot use shell tool). This is
already how gemini-cli works.
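
Layering can be sketched as a wrapper evaluator. The sub-agent's own deny list is checked first and everything else defers to the parent, so restrictions can narrow the inherited policy but never widen it. `PolicyEvaluator` and `withRestrictions` are hypothetical names.

```typescript
type PolicyDecision = 'ALLOW' | 'DENY' | 'ASK_USER';

interface PolicyEvaluator {
  evaluate(toolName: string, args: Record<string, unknown>): PolicyDecision;
}

// Sub-agent restrictions are checked first; anything not restricted defers to the parent.
function withRestrictions(parent: PolicyEvaluator, deniedTools: string[]): PolicyEvaluator {
  const denied = new Set(deniedTools);
  return {
    evaluate(toolName, args) {
      if (denied.has(toolName)) return 'DENY';
      return parent.evaluate(toolName, args);
    },
  };
}
```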