diff --git a/docs/adk-replat/NOTES-adk-alignment.md b/docs/adk-replat/NOTES-adk-alignment.md
new file mode 100644
index 0000000000..d7a52ca6ee
--- /dev/null
+++ b/docs/adk-replat/NOTES-adk-alignment.md
@@ -0,0 +1,529 @@
+# ADK-TS Alignment Pass
+
+Every interface in our outline must map cleanly to ADK-TS. This document
+verifies that mapping field-by-field, identifies gaps, and confirms
+HITL/plugin/transfer patterns work.
+
+Source: ADK-TS v0.4.0 at `/Users/adamfweidman/Desktop/adk-int/adk-js/core/src/`
+
+---
+
+## 1. AgentDescriptor ↔ ADK Agent Hierarchy
+
+### Field-by-field mapping
+
+| AgentDescriptor field | ADK-TS source | Notes |
+| --- | --- | --- |
+| `name` | `BaseAgent.name` | Direct. ADK validates it's a valid JS identifier. |
+| `displayName` | — | ADK doesn't have this. No conflict. |
+| `description` | `BaseAgent.description` (optional in ADK) | Direct. Used for model routing in AgentTool. |
+| `executor` | — | New concept. ADK agents are always 'adk'. Adapter sets this. |
+| `inputSchema` | `LlmAgent.inputSchema` (Zod or JSON Schema) | Direct. ADK's AgentTool uses this for tool parameter generation. |
+| `outputSchema` | `LlmAgent.outputSchema` (Zod or JSON Schema) | Direct. ADK uses for structured output + AgentTool response. |
+| `capabilities` | — | New concept. Adapter infers from agent type: LlmAgent gets `['elicitation', 'streaming', 'host_tool_execution']`, LoopAgent gets `['composition']`, etc. |
+| `ownTools` | `LlmAgent.tools: ToolUnion[]` | Maps via ToolDescriptor adapter. ADK tools have `name`, `description`, `_getDeclaration()` which returns JSON Schema. |
+| `requiredTools` | — | New concept. ADK agents don't declare required host tools. Adapter can infer from tool references. |
+| `subAgents` | `BaseAgent.subAgents: BaseAgent[]` | Recursive. Each sub-agent becomes a nested AgentDescriptor. |
+| `constraints.maxTurns` | `RunConfig.maxLlmCalls` (default 500) | Maps, though semantics differ slightly (LLM calls vs turns). |
+| `constraints.maxTimeMinutes` | — | ADK doesn't have time limits. No conflict — host enforces. |
+| `constraints.maxBudgetUsd` | — | ADK doesn't have budget. No conflict — host enforces. |
+| `metadata` | — | New concept. Adapter can populate from agent registration context. |
+
+### ADK-specific fields NOT in AgentDescriptor
+
+| ADK field | Where it lives | Our approach |
+| --- | --- | --- |
+| `instruction` / `globalInstruction` | LlmAgent | Executor-internal. Not in descriptor (it's runtime config, not identity). |
+| `model` | LlmAgent | Goes in ExecutionOptions.model or executor-internal config. |
+| `generateContentConfig` | LlmAgent | Executor-internal. |
+| `disallowTransferToParent/Peers` | LlmAgent | Could be `constraints` or `_meta`. Transfer policy is host-enforced. |
+| `includeContents` | LlmAgent | Executor-internal (context management). |
+| `outputKey` | LlmAgent | Executor-internal (state management). |
+| `beforeModelCallback`, etc. | LlmAgent | Executor-internal. These are ADK's callback system — our LifecycleInterceptor is the interface equivalent. |
+
+### Verdict: CLEAN MAPPING
+
+AgentDescriptor captures everything needed to describe an ADK agent externally.
+ADK-specific runtime config (instruction, model, callbacks) stays inside the
+executor — exactly right for the descriptor/executor separation.
+
+**Key ADK pattern preserved:** AgentTool wraps an agent as a tool using
+`inputSchema` for parameters and `description` for the tool description. Our
+AgentDescriptor has both, so SubagentTool can do the same thing.
+
+---
+
+## 2. AgentSession ↔ ADK Runner
+
+### Method mapping
+
+| AgentSession method | ADK-TS equivalent | How adapter works |
+| --- | --- | --- |
+| `stream(data, options)` | `Runner.runAsync({ userId, sessionId, newMessage, runConfig })` | Adapter creates/loads session, maps data+options → runAsync params, wraps Event generator → AgentEvent generator. Each `stream()` call triggers a new `runAsync()`. |
+| `update(config)` | No direct equivalent | ADK doesn't support mid-stream config changes. Adapter queues updates for next `runAsync()` call. |
+| `steer(data)` | No direct equivalent | ADK doesn't support mid-stream intervention. Adapter can queue for next invocation or ignore. |
+| `abort()` | No direct equivalent | ADK uses `invocationContext.endInvocation = true`. Adapter sets this flag. Could also use AbortController. |
+
+### ExecutionRequest → Runner.runAsync mapping
+
+| ExecutionRequest field | ADK mapping |
+| --- | --- |
+| `descriptor` | Used to find/create the BaseAgent instance |
+| `input` | → `newMessage: Content` (converted from ContentPart[] → Content) |
+| `sessionRef` | → `sessionId` (string) or creates session from SessionSnapshot |
+| `forkSession` | Adapter clones session before running |
+| `options.tools` | → merged into agent's `tools` config |
+| `options.model` | → `LlmAgent.model` override |
+| `options.hostToolExecution` | → `RunConfig.pauseOnToolCalls: true` |
+| `options.streaming` | → `RunConfig.streamingMode` |
+| `options.permissionMode` | → SecurityPlugin config |
+| `signal` | → wired to `invocationContext.endInvocation` |
+
+### HITL: How pauseOnToolCalls works end-to-end
+
+This is the critical path. Here's the full flow:
+
+```
+1. LLM returns tool call (FunctionCall in Event)
+2. ADK checks RunConfig.pauseOnToolCalls === true
+3. ADK sets invocationContext.endInvocation = true
+4. ADK yields the Event (with FunctionCall) and stops
+5. Runner.runAsync() generator completes
+
+    --- OUR INTERFACE BOUNDARY ---
+
+6. Adapter translates ADK Event → ToolRequestEvent
+7. Host receives ToolRequestEvent from session.stream() generator
+8. Host runs policy check (PolicyEvaluator.evaluate())
+9. Host fires hooks (LifecycleInterceptor.fire('before_tool', ...))
+10. If policy allows → Host executes tool → gets ToolResultData
+11. Host calls session.stream({ kind: 'tool_result', ... }) to get next stream
+
+    --- BACK INTO ADK ---
+
+12. Adapter receives tool result
+13. Adapter creates FunctionResponse Content
+14. Adapter calls Runner.runAsync() again with FunctionResponse as newMessage
+15. ADK loads session (has prior tool call event)
+16. ADK resumes agent with tool response
+17. Loop continues from step 1
+```
+
+**Why this works:** ADK's `pauseOnToolCalls` was designed exactly for this
+pattern — external tool execution by a host. The adapter translates between
+ADK's "end invocation + resume with FunctionResponse" pattern and our
+"ToolRequestEvent + stream(tool_result)" pattern.
+
+**Key insight:** Each `session.stream()` call triggers a new `Runner.runAsync()`
+call. This means each ADK "invocation" maps to one `stream()` call. The session
+persists state across invocations. Mid-stream `update()` and `steer()` calls are
+queued for the next invocation since ADK doesn't support mid-turn changes.
+
+### HITL: ToolConfirmation flow
+
+ADK also has a separate ToolConfirmation pattern (via
+`context.requestConfirmation()`):
+
+```
+1. beforeToolCallback calls context.requestConfirmation({ hint: '...' })
+2. This sets eventActions.requestedToolConfirmations[functionCallId]
+3. ADK yields event with requestedToolConfirmations populated
+4. Runner completes (invocation ends)
+
+    --- OUR INTERFACE BOUNDARY ---
+
+5. Adapter sees requestedToolConfirmations in event
+6. Adapter translates → ElicitationRequest { kind: 'tool_confirmation', ... }
+7. Host renders confirmation UI
+8. User responds → ElicitationResponse { action: 'accept' | 'decline' }
+
+    --- BACK INTO ADK ---
+
+9. Adapter receives elicitation response
+10. If accepted: Adapter creates FunctionResponse with confirmed=true
+11. Calls Runner.runAsync() with FunctionResponse
+12. ADK's SecurityPlugin or callback reads confirmation from session
+13. Tool executes
+```
+
+**Maps to our ElicitationRequest:** ADK's `ToolConfirmation.hint` →
+`ElicitationRequest.message`. ADK's `ToolConfirmation.payload` →
+`ElicitationRequest.context`. The `kind: 'tool_confirmation'` is the
+discriminator.
+
+### HITL: Auth request flow
+
+```
+1. Tool or callback calls context.requestCredential(authConfig)
+2. Sets eventActions.requestedAuthConfigs[functionCallId]
+3. Event yields, invocation ends
+
+    --- OUR INTERFACE BOUNDARY ---
+
+4. Adapter sees requestedAuthConfigs
+5. Translates → ElicitationRequest { kind: 'auth_required', context: authConfig }
+6. User provides credentials
+7. ElicitationResponse { action: 'accept', content: { credential: ... } }
+
+    --- BACK INTO ADK ---
+
+8. Adapter stores credential via CredentialService
+9. Calls Runner.runAsync() again
+10. Tool calls context.getAuthResponse() → gets credential
+```
+
+**Maps to our ElicitationRequest:** ADK's auth pattern is just another
+elicitation kind. This validates our generic elicitation design — it handles
+tool confirmation, auth, and any future interaction type.
+
+---
+
+## 3. AgentEvent ↔ ADK Event
+
+### Event type mapping
+
+| Our AgentEvent | ADK Event pattern | Adapter translation |
+| --- | --- | --- |
+| `InitializeEvent` | First event from Runner.runAsync() | Adapter emits on first stream() call |
+| `SessionUpdateEvent` | `eventActions.stateDelta` | Adapter emits when stateDelta is non-empty |
+| `MessageEvent` | `event.content` with text Parts | Filter text/thought parts from Content |
+| `ToolRequestEvent` | `getFunctionCalls(event)` returns FunctionCall[] | Each FunctionCall → one ToolRequestEvent |
+| `ToolUpdateEvent` | `event.longRunningToolIds` | Adapter emits progress for long-running tools |
+| `ToolResponseEvent` | `getFunctionResponses(event)` returns FunctionResponse[] | Each FunctionResponse → one ToolResponseEvent |
+| `ElicitationRequest` | `eventActions.requestedToolConfirmations` or `requestedAuthConfigs` | Map to generic elicitation |
+| `ElicitationResponse` | User input → FunctionResponse in next runAsync call | Reverse of above |
+| `UsageEvent` | `event.usageMetadata` (GenerateContentResponseUsageMetadata) | Map token counts |
+| `ErrorEvent` | `event.errorCode` + `event.errorMessage` | Map error fields |
+| `stream_end` | `isFinalResponse(event)`, `eventActions.transferToAgent`, `eventActions.escalate` | Derive `stream_end` reason from ADK signals |
+| `CustomEvent` | `event.customMetadata` | Pass through |
+
+### ADK EventActions → Our events
+
+| EventActions field | Our event | Notes |
+| --- | --- | --- |
+| `stateDelta` | SessionUpdate or embedded in other events | Delta state is a core ADK pattern |
+| `artifactDelta` | `CustomEvent { kind: 'artifact_delta' }` | Artifacts not in our core events |
+| `transferToAgent` | Tool call (`transfer_to_agent`) + `stream_end` `reason: 'completed'` | Handoff is a tool call. Host intercepts the tool request, mediates the handoff, originating agent completes. |
+| `escalate` | `stream_end` `reason: 'completed'` with `data: { escalateReason: '...' }` | LoopAgent exit signal. ADK's escalate = "I'm done, pass control back up" |
+| `requestedToolConfirmations` | `ElicitationRequest { kind: 'tool_confirmation' }` | Per function call ID |
+| `requestedAuthConfigs` | `ElicitationRequest { kind: 'auth_required' }` | Per function call ID |
+| `skipSummarization` | `_meta: { skipSummarization: true }` | ADK-specific, goes in metadata |
+
+### AgentEventBase mapping
+
+| AgentEventBase field | ADK Event field | Notes |
+| --- | --- | --- |
+| `id` | `event.id` | Direct |
+| `timestamp` | `event.timestamp` (number) | Convert to ISO 8601 string |
+| `type` | Derived from content analysis | ADK doesn't have event types — adapter classifies |
+| `agentId` | `event.author` (agent name) or context | **New field** — which agent emitted this event |
+| `threadId` | `event.branch` (e.g., "agent_1.agent_2") | Direct mapping |
+| `source` | `event.author` ("user" or agent name) | Direct |
+| `_meta` | `event.customMetadata` | Direct |
+
+### Verdict: CLEAN MAPPING
+
+Every ADK event pattern maps to our event types. The adapter classifies ADK's
+untyped events into our typed event taxonomy. Key insight: ADK events are richer
+(they carry EventActions, function calls, auth requests all in one event), so
+the adapter may fan out one ADK Event into multiple AgentEvents (e.g., one
+Message + one ToolRequest + one ElicitationRequest). The new `agentId` field
+maps directly from ADK's `event.author`.
+
+---
+
+## 4. ToolContract ↔ ADK Tool System
+
+### ToolDescriptor ↔ BaseTool
+
+| ToolDescriptor field | ADK source | Notes |
+| --- | --- | --- |
+| `name` | `BaseTool.name` | Direct |
+| `displayName` | — | ADK doesn't have this |
+| `description` | `BaseTool.description` | Direct |
+| `parametersSchema` | `BaseTool._getDeclaration()` → FunctionDeclaration.parameters | JSON Schema from declaration |
+| `annotations.readOnly` | Inferred from tool type | FunctionTool with no side effects |
+| `annotations.longRunning` | `BaseTool.isLongRunning` | Direct |
+
+### ToolCallRequest ↔ FunctionCall
+
+| ToolCallRequest | ADK FunctionCall | Notes |
+| --- | --- | --- |
+| `requestId` | `functionCall.id` | Direct |
+| `name` | `functionCall.name` | Direct |
+| `args` | `functionCall.args` | Direct |
+
+### ToolResultData ↔ FunctionResponse + tool return
+
+| ToolResultData | ADK | Notes |
+| --- | --- | --- |
+| `llmContent` | `FunctionResponse.response` | Adapter wraps into ContentPart[] |
+| `displayContent` | — | ADK doesn't separate display from model content |
+| `isError` | Error thrown from `runAsync()` | Adapter catches and sets flag |
+| `tailCalls` | — | ADK doesn't have tail calls (gemini-cli concept) |
+
+### AgentTool pattern
+
+ADK's `AgentTool` wraps a `BaseAgent` as a `BaseTool`:
+
+- Uses `agent.inputSchema` for tool parameters
+- Uses `agent.description` for tool description
+- Creates internal Runner with isolated session
+- Returns agent output as tool result
+- Merges state deltas back to parent
+
+**Our equivalent:** `SubagentTool` wraps `AgentDescriptor` as a tool:
+
+- Uses `descriptor.inputSchema` for tool parameters
+- Uses `descriptor.description` for tool description
+- Creates executor via `SessionFactory.create(descriptor, context)`
+- Returns execution result as tool result
+
+**Mapping is 1:1.** The only difference is ADK does it with concrete agent
+instances; we do it with descriptors + factory.
+
+---
+
+## 5. LifecycleInterceptor ↔ ADK Plugin System
+
+### Hook point mapping
+
+| Our hook point string | ADK Plugin callback | Mapping |
+| --- | --- | --- |
+| `'before_agent'` | `beforeAgentCallback` | `payload: { agent, context }` |
+| `'after_agent'` | `afterAgentCallback` | `payload: { agent, context }` |
+| `'before_model'` | `beforeModelCallback` | `payload: { context, llmRequest }` |
+| `'after_model'` | `afterModelCallback` | `payload: { context, llmResponse }` |
+| `'before_tool'` | `beforeToolCallback` | `payload: { tool, args, context }` |
+| `'after_tool'` | `afterToolCallback` | `payload: { tool, args, context, result }` |
+| `'on_event'` | `onEventCallback` | `payload: { event }` |
+| `'on_user_message'` | `onUserMessageCallback` | `payload: { userMessage }` |
+| `'before_run'` | `beforeRunCallback` | `payload: { context }` |
+| `'after_run'` | `afterRunCallback` | `payload: { context }` |
+| `'on_model_error'` | `onModelErrorCallback` | `payload: { request, error }` |
+| `'on_tool_error'` | `onToolErrorCallback` | `payload: { tool, args, error }` |
+
+### HookResult ↔ ADK callback return
+
+| HookResult field | ADK pattern | Notes |
+| --- | --- | --- |
+| `action: 'proceed'` | Return `undefined` | Plugin returns nothing → continue |
+| `action: 'block'` | Return `Content` (for agent/model) or throw | Non-undefined return short-circuits |
+| `modifications` | Return modified `LlmRequest`/`LlmResponse`/args | Plugin returns modified version |
+
+### ADK's early-exit pattern
+
+ADK plugins use "first non-undefined return wins":
+
+- `beforeModelCallback` returns `LlmResponse` → skips LLM call entirely (cache
+  hit)
+- `beforeToolCallback` returns modified `args` → tool runs with new args
+- `beforeAgentCallback` returns `Content` → skips agent run entirely
+
+Our `HookResult.modifications` carries the same data. The `action: 'block'` +
+return value pattern maps cleanly.
+
+### gemini-cli hooks NOT in ADK
+
+| gemini-cli hook | ADK equivalent | Notes |
+| --- | --- | --- |
+| `BeforeToolSelection` | — | ADK doesn't let you modify which tools are available mid-turn |
+| `Notification` | — | ADK doesn't have notification hooks |
+| `SessionStart` | `onUserMessageCallback` (first call) | Close enough |
+| `SessionEnd` | `afterRunCallback` | Close enough |
+| `PreCompress` | — | ADK doesn't have context compression hooks |
+
+These gaps are fine — they're gemini-cli-specific hook points. Our generic
+`fire(hookPoint, payload)` handles them because the hook point is an open
+string. ADK executors simply don't fire these hook points, and
+`supportedHookPoints()` reflects that.
+
+---
+
+## 6. PolicyEvaluator ↔ ADK SecurityPlugin
+
+### ADK SecurityPlugin
+
+```typescript
+class SecurityPlugin extends BasePlugin {
+  policyEngine: BasePolicyEngine;
+
+  // In beforeToolCallback:
+  async beforeToolCallback({ tool, args, context }) {
+    const outcome = await this.policyEngine.evaluate(tool.name, args);
+    switch (outcome) {
+      case PolicyOutcome.DENY:
+        throw error;
+      case PolicyOutcome.CONFIRM:
+        context.requestConfirmation({ hint });
+      case PolicyOutcome.ALLOW:
+        return undefined; // proceed
+    }
+  }
+}
+```
+
+### Mapping
+
+| Our PolicyEvaluator | ADK SecurityPlugin | Notes |
+| --- | --- | --- |
+| `evaluate(request)` | `policyEngine.evaluate(toolName, args)` | ADK is simpler — tool name + args only |
+| `PolicyDecision.allow` | `PolicyOutcome.ALLOW` | Direct |
+| `PolicyDecision.deny` | `PolicyOutcome.DENY` | Direct |
+| `PolicyDecision.ask_user` | `PolicyOutcome.CONFIRM` → `context.requestConfirmation()` | ADK chains to ToolConfirmation |
+| `getExcluded()` | — | ADK doesn't pre-filter tools |
+| `request.principal` | — | ADK doesn't track who's calling |
+| `request.principalPath` | Could use `context.agentName` + branch | For hierarchical policy |
+| `request.context` | — | Our extension point for host-specific data |
+
+### How ADK policy maps when host controls execution
+
+With `pauseOnToolCalls: true`, the flow is:
+
+1. ADK yields tool call → adapter converts to ToolRequestEvent
+2. **Host** runs PolicyEvaluator.evaluate() — NOT ADK's SecurityPlugin
+3. Host decides allow/deny/ask_user
+4. If allowed, host executes tool and sends result via `session.stream()`
+
+This means **ADK's SecurityPlugin is bypassed when the host controls tool
+execution** — which is correct! The host's PolicyEvaluator is the authority.
+ADK's SecurityPlugin only matters when ADK executes tools internally
+(`pauseOnToolCalls: false`).
+
+---
+
+## 7. SessionContract ↔ ADK Session
+
+### Session mapping
+
+| Our SessionHandle | ADK Session | Notes |
+| --- | --- | --- |
+| `id` | `Session.id` | Direct |
+| `agentName` | `Session.appName` | ADK uses appName, not agent name |
+| `events` | `Session.events: Event[]` | Direct (but ADK Events → our AgentEvents) |
+| `state` | `Session.state: Record` | Direct |
+| `lastUpdateTime` | `Session.lastUpdateTime` | Direct |
+
+### SessionProvider ↔ BaseSessionService
+
+| Our SessionProvider | ADK BaseSessionService | Notes |
+| --- | --- | --- |
+| `create(agentName, metadata)` | `createSession({ appName, userId })` | ADK requires userId |
+| `load(sessionId)` | `getSession({ appName, userId, sessionId })` | ADK requires all three IDs |
+| `list(agentName)` | `listSessions({ appName, userId })` | ADK scopes by userId |
+| `delete(sessionId)` | `deleteSession({ appName, userId, sessionId })` | Same pattern |
+
+### Gap: ADK requires userId
+
+ADK sessions are scoped by `(appName, userId, sessionId)`. Our interface uses
+just `sessionId`. The adapter can embed userId in the session metadata or derive
+it from HostContext.
+
+### State prefixes (ADK-specific)
+
+ADK uses prefixed state keys:
+
+- `app:` — app-scoped, persisted
+- `user:` — user-scoped, persisted
+- `temp:` — temporary, stripped before persistence
+
+Our `SessionHandle.state` is a flat `Record`. The adapter
+preserves prefixes as-is — they're just string keys. No conflict.
+
+---
+
+## 8. ContentPart ↔ ADK Content/Part
+
+### ADK uses Google GenAI types
+
+ADK's `Content` and `Part` come from `@google/genai`:
+
+```typescript
+interface Content {
+  role?: string; // 'user' | 'model'
+  parts: Part[];
+}
+
+type Part = TextPart | InlineDataPart | FunctionCallPart | FunctionResponsePart | ...
+```
+
+### Mapping
+
+| Our ContentPart | ADK/GenAI Part | Notes |
+| --- | --- | --- |
+| `{ type: 'text', text }` | `{ text: string }` | Direct |
+| `{ type: 'thought', thought }` | `{ thought: true, text: string }` | ADK uses `thought` boolean flag on TextPart |
+| `{ type: 'media', mimeType, data }` | `{ inlineData: { mimeType, data } }` | Restructure |
+| `{ type: 'reference', text, uri }` | `{ fileData: { fileUri, mimeType } }` | Map fileData → reference |
+| `{ type: 'refusal', text }` | — | Not in ADK/GenAI. Adapter would map from finishReason. |
+| `{ type: 'function_call', name, args, id }` | `{ functionCall: { name, args, id } }` | Unwrap |
+| `{ type: 'function_response', name, response, id }` | `{ functionResponse: { name, response, id } }` | Unwrap |
+
+### Verdict: CLEAN MAPPING
+
+The adapter converts between our flat discriminated union and ADK's nested Part
+structure. Only `refusal` has no direct ADK/GenAI counterpart.
+
+---
+
+## 9. Composition ↔ ADK Agent Patterns
+
+| Our CompositionConfig.pattern | ADK Agent type | Notes |
+| --- | --- | --- |
+| `'hierarchical'` | Any agent with `subAgents` | Default — parent calls sub-agents as tools |
+| `'sequential'` | `SequentialAgent` | Runs children in order |
+| `'parallel'` | `ParallelAgent` | Runs children concurrently, branch isolation |
+| `'loop'` | `LoopAgent` | Repeats children until escalate or maxIterations |
+| `'transfer'` | LlmAgent with `transfer_to_agent` tool | Peer-to-peer handoff |
+
+### Branch isolation
+
+ADK's `ParallelAgent` gives each child an isolated `branch` context:
+
+- Children don't see peer events
+- Each gets unique branch path: `"parent.child_0"`, `"parent.child_1"`
+- Results merged after all complete
+
+Maps to our `threadId` — each parallel branch gets a unique threadId. Events
+from different branches are interleaved by the host.
+
+---
+
+## 10. Summary: Gaps and Resolutions
+
+### No gaps blocking ADK integration:
+
+| Concern | Status | Resolution |
+| --- | --- | --- |
+| pauseOnToolCalls HITL | **Works** | Adapter maps to stream() cycle (§2) |
+| ToolConfirmation | **Works** | Maps to ElicitationRequest (§2) |
+| Auth requests | **Works** | Maps to ElicitationRequest (§2) |
+| Plugin hooks (12 types) | **Works** | Maps to LifecycleInterceptor.fire() (§5) |
+| Agent transfers | **Works** | Tool call (`transfer_to_agent`) + `stream_end` `reason: 'completed'` (§3) |
+| State delta pattern | **Works** | SessionUpdateEvent or \_meta (§3) |
+| Branch isolation | **Works** | threadId mapping (§9) |
+| AgentTool pattern | **Works** | SubagentTool with descriptor + factory (§4) |
+| Session management | **Works** | Adapter maps userId into session (§7) |
+
+### Minor adapter complexity:
+
+1. **Event fan-out:** One ADK Event may become multiple AgentEvents (message +
+   tool call + elicitation). Adapter logic needed but straightforward.
+2. **userId scoping:** ADK sessions require userId; our interface doesn't.
+   Adapter derives from HostContext.
+3. **Timestamp format:** ADK uses `number` (epoch ms); we use ISO 8601 string.
+   Simple conversion.
+4. **Content structure:** ADK uses nested Part types; we use flat discriminated
+   union. Adapter converts bidirectionally.
+
+### ADK features our interface supports that gemini-cli doesn't have yet:
+
+- `LoopAgent` / `ParallelAgent` / `SequentialAgent` composition → our
+  CompositionConfig
+- `eventActions.stateDelta` → our SessionUpdateEvent
+- `eventActions.transferToAgent` → tool call (`transfer_to_agent`) +
+  `stream_end` `reason: 'completed'`
+- `eventActions.escalate` → `stream_end` `reason: 'completed'` with
+  `data: { escalateReason }`
+- Long-running tools → our ToolUpdateEvent
+- Auth credential flow → our ElicitationRequest with kind: 'auth_required'
diff --git a/docs/adk-replat/NOTES-adk-ts-architecture.md b/docs/adk-replat/NOTES-adk-ts-architecture.md
new file mode 100644
index 0000000000..f4c5445470
--- /dev/null
+++ b/docs/adk-replat/NOTES-adk-ts-architecture.md
@@ -0,0 +1,274 @@
+# ADK-TS (Agent Development Kit - TypeScript) Architecture Notes
+
+## Package: `@google/adk` v0.4.0
+
+**Location:** `/Users/adamfweidman/Desktop/adk-int/adk-js/core/`
+
+## Agent Hierarchy
+
+```
+BaseAgent (abstract)
+├── LlmAgent - Model-driven agent with tools (the main one)
+├── LoopAgent - Runs sub-agents in a loop (maxIterations, escalate to exit)
+├── ParallelAgent - Runs sub-agents concurrently (isolated branches)
+└── SequentialAgent - Runs sub-agents sequentially
+```
+
+### BaseAgent Config
+
+- `name: string` - Unique identifier (must be valid JS identifier)
+- `description?: string` - One-line capability for model routing
+- `parentAgent?: BaseAgent` - Parent in agent tree
+- `subAgents?: BaseAgent[]` - Child agents
+- `beforeAgentCallback / afterAgentCallback` - Pre/post execution hooks
+
+### LlmAgent Config (extends BaseAgent)
+
+- `model?: string | BaseLlm` - LLM to use
+- `instruction?: string | InstructionProvider` - Agent-specific instructions
+- `globalInstruction?: string | InstructionProvider` - Tree-wide (root only)
+- `tools?: ToolUnion[]` - Available tools
+- `generateContentConfig?: GenerateContentConfig` - LLM params
+- `disallowTransferToParent / disallowTransferToPeers` - Transfer controls
+- `includeContents?: 'default' | 'none'` - Context history inclusion
+- `inputSchema / outputSchema` - Validation schemas
+- `outputKey?: string` - Session state key for output storage
+- `beforeModelCallback / afterModelCallback` - LLM hooks
+- `beforeToolCallback / afterToolCallback` - Tool hooks
+- `requestProcessors / responseProcessors` - LLM request/response processors
+- `codeExecutor?: BaseCodeExecutor`
+
+## Event System
+
+### Event Interface
+
+```typescript
+interface Event extends LlmResponse {
+  id: string;
+  invocationId: string;
+  author?: string; // "user" or agent name
+  actions: EventActions; // State/artifact/auth/transfer operations
+  longRunningToolIds?: string[];
+  branch?: string; // Hierarchical agent path
+  timestamp: number;
+  content?: Content;
+  partial?: boolean; // Streaming indicator
+}
+```
+
+### EventActions
+
+```typescript
+interface EventActions {
+  skipSummarization?: boolean;
+  stateDelta: Record;
+  artifactDelta: Record;
+  transferToAgent?: string;
+  escalate?: boolean;
+  requestedAuthConfigs: Record;
+  requestedToolConfirmations: Record;
+}
+```
+
+### Structured Events (utility layer)
+
+Converts raw Event to discriminated union:
+
+```
+EventType: THOUGHT | CONTENT | TOOL_CALL | TOOL_RESULT | CALL_CODE |
+           CODE_RESULT | ERROR | ACTIVITY | TOOL_CONFIRMATION | FINISHED
+```
+
+## Tool System
+
+### BaseTool (abstract)
+
+- `name, description, isLongRunning`
+- `_getDeclaration(): FunctionDeclaration` - OpenAPI schema for LLM
+- `runAsync(request): Promise` - Execute tool
+- `processLlmRequest(request): Promise` - Preprocessing
+
+### Concrete Tool Types
+
+1. **FunctionTool** - Generic typed tools (Zod schema support)
+2. **AgentTool** - Wrap agents as tools (for hierarchical composition)
+3. **MCPTool** - Model Context Protocol server tools
+4. **GoogleSearchTool** - Built-in web search
+5. **ExitLoopTool** - Signal loop exit
+6. **LongRunningFunctionTool** - Async long-running operations
+
+### BaseToolset
+
+- Filter tools by predicate or string list
+- `getTools(context)`, `close()`, `isToolSelected()`
+- **MCPToolset** - Toolset for MCP server connections
+
+## Session Management
+
+### Session Interface
+
+```typescript
+interface Session {
+  id: string;
+  appName: string;
+  userId: string;
+  state: Record; // Mutable key-value store
+  events: Event[]; // Complete conversation history
+  lastUpdateTime: number;
+}
+```
+
+### Session Services
+
+- `BaseSessionService` (abstract) - createSession, getSession, listSessions,
+  deleteSession, appendEvent
+- `InMemorySessionService` - In-process storage
+- `DatabaseSessionService` - Mikro-ORM backed (SQL)
+
+### State Management
+
+- `State` class wraps base state + delta
+- `get()` returns from delta if present, else base
+- `set()` updates delta only
+- `hasDelta()` checks if changes made
+
+## Human-in-the-Loop (HITL)
+
+### Tool Confirmation
+
+```typescript
+class ToolConfirmation {
+  hint?: string; // Guidance for user
+  confirmed: boolean; // User approval
+  payload?: unknown; // Additional context
+}
+```
+
+### Security Plugin
+
+- `beforeToolCallback` - Evaluates policy before tool execution
+- `BasePolicyEngine` interface with `evaluate()` method
+- `PolicyOutcome`: DENY | CONFIRM | ALLOW
+
+### Auth Requests
+
+- `context.requestCredential(authConfig)` - Request auth from user
+- `context.getAuthResponse(authConfig)` - Check for auth response
+- Sets `eventActions.requestedAuthConfigs[functionCallId]`
+
+## Multi-Agent Patterns
+
+### Agent Transfer
+
+- LlmAgent injects `transfer_to_agent(agentName)` tool
+- Sets `eventActions.transferToAgent = targetAgentName`
+- Runner resolves target and continues
+- Can transfer to: sub-agents, parent (if not disabled), peers (if not disabled)
+
+### Parallel Agent
+
+- Runs all subAgents concurrently
+- Isolates each via `branch` context
+- Sub-agents don't see peer history
+- Merges event streams with fair ordering
+
+### Loop Agent
+
+- Repeatedly runs subAgents
+- `maxIterations` caps loop count
+- Exits on `event.actions.escalate === true`
+
+## Plugin System
+
+### BasePlugin Lifecycle Hooks (12 hooks!)
+
+- `onUserMessageCallback` - Preprocess user messages
+- `beforeRunCallback` - Before agent run (can short-circuit)
+- `onEventCallback` - Per-event (can modify events)
+- `afterRunCallback` - Final cleanup
+- `beforeAgentCallback / afterAgentCallback` - Agent lifecycle
+- `beforeModelCallback / afterModelCallback` - LLM lifecycle
+- `onModelErrorCallback` - Model error handling
+- `beforeToolCallback / afterToolCallback` - Tool lifecycle
+- `onToolErrorCallback` - Tool error handling
+
+### Built-in Plugins
+
+- **LoggingPlugin** - Debug logging
+- **SecurityPlugin** - Policy enforcement + tool confirmation
+- **PluginManager** - Plugin orchestration
+
+## Runner
+
+### Runner Config
+
+```typescript
+interface RunnerConfig {
+  appName: string;
+  agent: BaseAgent; // Root agent
+  plugins?: BasePlugin[];
+  artifactService?: BaseArtifactService;
+  sessionService: BaseSessionService; // Required
+  memoryService?: BaseMemoryService;
+  credentialService?: BaseCredentialService;
+}
+```
+
+### RunConfig (per-run options)
+
+```typescript
+interface RunConfig {
+  speechConfig?: SpeechConfig;
+  responseModalities?: Modality[];
+  maxLlmCalls?: number; // Default 500
+  pauseOnToolCalls?: boolean; // Client-side tool execution
+  streamingMode?: StreamingMode; // NONE | SSE | BIDI
+  // ... audio/live configs
+}
+```
+
+### Execution Pipeline
+
+1. Load or create session
+2. Create InvocationContext
+3. Run pluginManager.runOnUserMessageCallback()
+4. Append user message to session
+5. Run agent.runAsync(invocationContext) → yields events
+6. For each non-partial event: append to session
+7. Run pluginManager.runOnEventCallback()
+8. Run pluginManager.runAfterRunCallback()
+
+## Model Layer
+
+### BaseLlm (abstract)
+
+- `generateContentAsync(llmRequest, stream?): AsyncGenerator`
+- `connect(llmRequest): Promise` - For live/streaming
+
+### Implementations
+
+- `Gemini` - Google Gemini API
+- `ApigeeLlm` - Apigee-wrapped models
+- `LLMRegistry` - Static registry for model lookup
+
+## Service Adapters (all abstract base + implementations)
+
+| Service | Implementations |
+| --- | --- |
+| BaseSessionService | InMemory, Database (Mikro-ORM) |
+| BaseArtifactService | InMemory, File, GCS |
+| BaseMemoryService | InMemory |
+| BaseCredentialService | InMemory |
+| BaseCodeExecutor | BuiltIn |
+
+## Design Patterns
+
+1. **Symbol-based type guards** - Every class uses `Symbol.for()` + `isXxx()`
+2. **Abstract base classes** - Service interfaces via abstract classes
+3. **Async generators** - All agent execution yields events
+4. **Context objects** - Rich context passed to callbacks/tools
+5. **Delta state** - Session state + event action deltas
+6. **Plugin middleware** - 12 hooks at multiple execution points
+7. **Tree-based hierarchy** - Parent-child agents with root traversal
+8. **Branch isolation** - Parallel agents use branch paths
+9. **Callback chains** - Multiple callbacks per stage with early termination
diff --git a/docs/adk-replat/NOTES-cross-sdk-comparison.md b/docs/adk-replat/NOTES-cross-sdk-comparison.md
new file mode 100644
index 0000000000..ede5cde7c4
--- /dev/null
+++ b/docs/adk-replat/NOTES-cross-sdk-comparison.md
@@ -0,0 +1,587 @@
+# Cross-SDK Comparison: Events, Agents, and Interface Superset
+
+## 1. AgentEvents: Our Outline vs Michael's
+
+Our outline and Michael's `Gemini CLI Agents.txt` are **nearly identical** in
+event taxonomy. The only difference is we added a `stream_end` event type:
+
+| # | Michael's Events | Our Outline | Delta |
+| --- | --- | --- | --- |
+| 1 | `initialize` | `InitializeEvent` | Same |
+| 2 | `session_update` | `SessionUpdateEvent` | Same |
+| 3 | `message` | `MessageEvent` | Same — streaming handled by AsyncGenerator |
+| 4 | `tool_request` | `ToolRequestEvent` | Same |
+| 5 | `tool_update` | `ToolUpdateEvent` | Same |
+| 6 | `tool_response` | `ToolResponseEvent` | Same |
+| 7 | `elicitation_request` | `ElicitationRequest` | Same |
+| 8 | `elicitation_response` | `ElicitationResponse` | Same |
+| 9 | `usage` | `UsageEvent` | Same |
+| 10 | `error` | `ErrorEvent` | Same |
+| 11 | `custom` | `CustomEvent` | Same |
+| 12 | — | **StreamEnd** | **Added**: completed, failed, aborted, max_turns, max_budget, max_time, refusal |
+
+### Minor structural differences:
+
+| Aspect | Michael | Our Outline |
+| --- | --- | --- |
+| **Base type** | `AgentEventCommon` with `type: string` (fully open) | `AgentEventBase` with `type: AgentEventType` (`'known' \| (string & {})`) |
+| **Agent ID** | — | `agentId` on event base (which agent emitted this event) |
+| **Event map** | Generic `interface AgentEvents` + mapped type |
Same — adopted Michael's pattern for declaration merging extensibility | +| **ContentPart.\_meta** | Required (`_meta: Record`) | Optional (`_meta?: Record`) | +| **ErrorData.status** | Google RPC codes (`'RESOURCE_EXHAUSTED' \| '...'`) | Open string (per our generic philosophy) | +| **Message.role** | `'user' \| 'agent' \| 'developer'` | Same | +| **Stream end** | Only `initialize` | `stream_end` with `reason` field + open `data` bag | +| **Handoff** | Not covered | Tool call (`transfer_to_agent`) — no dedicated event | +| **Pausing** | Implicit (elicitation/tool events) | Same — no explicit pause/resume events | + +### Design decisions adopted from Michael + +1. **`interface AgentEvents` + mapped type** — Michael's pattern enables + declaration merging, letting any module add new event types without modifying + the base definition. Strictly better than an explicit union type. +2. **`_meta` on ContentPart** — More extensible. We adopted it (as optional). +3. **Implicit pausing** — No separate pause/resume events. When the agent emits + an `elicitation_request` or `tool_request`, the stream naturally pauses. The + host calls `stream()` to resume. + +--- + +## 2. 
Claude Agent SDK — Key Interfaces + +Source: `@anthropic-ai/claude-agent-sdk` + +### Agent Execution Model + +```typescript +// Entry point — not an interface, a function +function query({ + prompt: string | AsyncIterable, + options?: Options +}): Query // extends AsyncGenerator +``` + +### Message Types (Event Stream) + +```typescript +type SDKMessage = + | SystemMessage // subtype: "init" | "compact_boundary" + | AssistantMessage // Claude's response with tool calls + | UserMessage // Tool results fed back + | StreamEvent // Raw API stream events (opt-in) + | ResultMessage // Final: success | error_max_turns | error_max_budget_usd | error_during_execution + | CompactBoundaryMessage; // Context compaction marker +``` + +### Tool Approval (HITL) + +```typescript +canUseTool: async (toolName: string, input: Record) => + Promise< + | { behavior: 'allow'; updatedInput: Record } + | { behavior: 'deny'; message: string } + >; +``` + +### Subagent Definition + +```typescript +interface AgentDefinition { + description: string; // When to invoke + prompt: string; // System prompt + tools?: string[]; // Available tools (defaults to all) + model?: 'sonnet' | 'opus' | 'haiku' | 'inherit'; +} +``` + +### Session Management + +```typescript +interface Options { + continue?: boolean; // Resume most recent session + resume?: string; // Resume by session ID + forkSession?: boolean; // Branch from resume point + persistSession?: boolean; // Default: true + maxTurns?: number; + maxBudgetUsd?: number; // Spend limit + permissionMode?: 'default' | 'acceptEdits' | 'plan' | 'dontAsk' | 'bypassPermissions'; + structuredOutput?: { type: "json_schema", ... 
}; +} +``` + +### Result (Termination) + +```typescript +interface SDKResultMessage { + type: 'result'; + subtype: + | 'success' + | 'error_max_turns' + | 'error_max_budget_usd' + | 'error_during_execution' + | 'error_max_structured_output_retries'; + result?: string; + total_cost_usd: number; + usage: { input_tokens: number; output_tokens: number }; + num_turns: number; + session_id: string; + stop_reason: string | null; // "end_turn", "max_tokens", "refusal" +} +``` + +### V2 Preview (Simpler API) + +```typescript +await using session = unstable_v2_createSession({ model: "..." }); +await session.send("Hello!"); +for await (const msg of session.stream()) { ... } +await session.send("Follow-up"); +for await (const msg of session.stream()) { ... } +``` + +--- + +## 3. OpenAI Codex SDK / Responses API — Key Interfaces + +### Codex SDK (TypeScript) + +```typescript +// Client +const codex = new Codex({ env?, config? }); +const thread = codex.startThread({ workingDirectory?, skipGitRepoCheck? 
}); +const thread = codex.resumeThread(threadId); + +// Execution +const turn = await thread.run(prompt: string | InputEntry[], options?); +const { events } = await thread.runStreamed(prompt); + +// Streaming +for await (const event of events) { + switch (event.type) { + case "item.completed": // event.item + case "turn.completed": // event.usage + } +} +``` + +### Responses API Streaming Events (53 types) + +Organized hierarchically: + +**Response Lifecycle (7):** + +- `response.queued`, `response.created`, `response.in_progress` +- `response.completed`, `response.incomplete`, `response.failed` +- `error` + +**Content Streaming (8):** + +- `response.output_item.added`, `response.output_item.done` +- `response.content_part.added`, `response.content_part.done` +- `response.output_text.delta`, `response.output_text.done` +- `response.refusal.delta`, `response.refusal.done` + +**Reasoning (6):** + +- `response.reasoning_text.delta`, `response.reasoning_text.done` +- `response.reasoning_summary_part.added`, + `response.reasoning_summary_part.done` +- `response.reasoning_summary_text.delta`, + `response.reasoning_summary_text.done` + +**Function Calls (2):** + +- `response.function_call_arguments.delta`, + `response.function_call_arguments.done` + +**MCP (8):** + +- `response.mcp_call_arguments.delta`, `response.mcp_call_arguments.done` +- `response.mcp_call.in_progress`, `response.mcp_call.completed`, + `response.mcp_call.failed` +- `response.mcp_list_tools.in_progress`, `response.mcp_list_tools.completed`, + `response.mcp_list_tools.failed` + +**Built-in Tools (15):** + +- File search: `in_progress`, `searching`, `completed` +- Web search: `in_progress`, `searching`, `completed` +- Code interpreter: `in_progress`, `interpreting`, `code.delta`, `code.done`, + `completed` +- Image gen: `in_progress`, `generating`, `partial_image`, `completed` + +**Audio (4):** + +- `response.audio.delta`, `response.audio.done` +- `response.audio.transcript.delta`, 
`response.audio.transcript.done` + +**Annotations (1):** + +- `response.output_text.annotation.added` + +### OpenAI Agents SDK (higher-level) + +```python +# Python-first, but patterns apply +class RunItemStreamEvent: + name: Literal[ + "message_output_created", + "handoff_requested", + "handoff_occurred", + "tool_called", + "tool_output", + "tool_search_called", + "tool_search_output_created", + "reasoning_item_created", + "mcp_approval_requested", + "mcp_approval_response", + "mcp_list_tools", + ] + +class AgentUpdatedStreamEvent: + # Fires when current agent changes (handoff) + new_agent: Agent +``` + +--- + +## 4. Superset Analysis — What Changes Our Interfaces? + +### Concepts Present in ALL Systems + +| Concept | gemini-cli | ADK-TS | Claude SDK | Codex/OpenAI | Our Interfaces | +| --------------------- | ---------- | ------ | ------------- | -------------- | ----------------------- | +| Text streaming | ✅ | ✅ | ✅ | ✅ | ✅ MessageEvent | +| Tool request/response | ✅ | ✅ | ✅ | ✅ | ✅ ToolRequest/Response | +| Thinking/reasoning | ✅ | ✅ | ✅ (thinking) | ✅ (reasoning) | ✅ ContentPart.thought | +| Error events | ✅ | ✅ | ✅ | ✅ | ✅ ErrorEvent | +| Token usage | ✅ | ✅ | ✅ | ✅ | ✅ UsageEvent | +| Tool progress | ✅ | ✅ | — | ✅ | ✅ ToolUpdateEvent | +| Session resume | ✅ | ✅ | ✅ | ✅ | ✅ sessionRef | +| Subagents | ✅ | ✅ | ✅ | — | ✅ threadId | +| Abort/cancel | ✅ | ✅ | ✅ | ✅ | ✅ abort() | +| Metadata escape hatch | — | ✅ | — | — | ✅ \_meta | + +### NEW Concepts From Claude/Codex That We Should Incorporate + +#### 4.1 Structured Stream End Reasons (HIGH PRIORITY) + +**What:** Claude SDK has typed termination: +`success | error_max_turns | error_max_budget_usd | error_during_execution`. +OpenAI has `completed | incomplete | failed`. + +**Why it matters:** We need a `stream_end` event that captures why the stream +ended — the one signal not covered by other event types. 
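A minimal sketch of how an adapter might fold the termination values quoted above into a single open `reason` string. The helper names `normalizeClaudeSubtype` and `normalizeOpenAiStatus`, and the specific mapping choices (for example, passing OpenAI's `incomplete` through unchanged), are assumptions for illustration; neither SDK prescribes them.

```typescript
// Hypothetical normalizers: function names and mapping choices are
// illustrative assumptions, not part of any SDK.
type StreamEndReason =
  | 'completed'
  | 'failed'
  | 'max_turns'
  | 'max_budget'
  | (string & {}); // open: unknown values pass through unchanged

// Claude SDK result subtypes as quoted in this section.
const CLAUDE_SUBTYPE_TO_REASON = new Map<string, StreamEndReason>([
  ['success', 'completed'],
  ['error_max_turns', 'max_turns'],
  ['error_max_budget_usd', 'max_budget'],
  ['error_during_execution', 'failed'],
]);

function normalizeClaudeSubtype(subtype: string): StreamEndReason {
  // Fall back to the raw subtype so newly added SDK values are not lost.
  return CLAUDE_SUBTYPE_TO_REASON.get(subtype) ?? subtype;
}

// OpenAI response statuses as quoted in this section.
function normalizeOpenAiStatus(status: string): StreamEndReason {
  switch (status) {
    case 'completed':
      return 'completed';
    case 'failed':
      return 'failed';
    default:
      return status; // e.g. 'incomplete' is passed through as-is
  }
}
```

Keeping the fallback as a passthrough means the host never has to guess: a value it does not recognize still reaches the UI verbatim.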
+ +**Final design — `stream_end` with `reason` + open `data` bag:** + +```typescript +type StreamEndReason = + | 'completed' + | 'failed' + | 'aborted' + | 'max_turns' + | 'max_budget' + | 'max_time' + | 'refusal' + | (string & {}); + +interface StreamEnd { + reason: StreamEndReason; + data?: Record; // { result?, cost?, usage?, numTurns?, error?, ... } +} +``` + +**Design rationale:** + +- Start is covered by `initialize`. Pausing is implicit (elicitation/tool + request events). Handoff is a tool call (`transfer_to_agent`). +- End-of-stream details go in `data` as an open bag, not fixed fields. + +#### 4.2 Budget Constraints (MEDIUM PRIORITY) + +**What:** Claude SDK has `maxBudgetUsd`. Neither gemini-cli nor ADK has this +today. + +**Why it matters:** Cost control is critical for production deployments. + +**Proposed change to AgentConstraints:** + +```typescript +interface AgentConstraints { + maxTurns?: number; + maxTimeMinutes?: number; + maxLlmCalls?: number; + maxBudgetUsd?: number; // NEW: from Claude SDK +} +``` + +#### 4.3 Session Forking (MEDIUM PRIORITY) + +**What:** Claude SDK supports `forkSession: boolean` — branch from a resume +point to explore alternatives. + +**Why it matters:** Enables "what if" exploration without destroying history. +Useful for plan mode. + +**Proposed change to ExecutionRequest:** + +```typescript +interface ExecutionRequest { + // ... existing fields ... + sessionRef?: string | SessionSnapshot; + forkSession?: boolean; // NEW: branch from sessionRef instead of continuing +} +``` + +#### 4.4 Permission Modes on Execution (MEDIUM PRIORITY) + +**What:** Claude has 5 permission modes: +`default | acceptEdits | plan | dontAsk | bypassPermissions`. gemini-cli has 4 +approval modes: `default | autoEdit | yolo | plan`. + +**Why it matters:** Both systems have this concept. It should be in +ExecutionOptions, not hard-coded. + +**Proposed change to ExecutionOptions:** + +```typescript +interface ExecutionOptions { + // ... 
existing fields ... + permissionMode?: string; // Open string. Conventions: 'default' | 'auto_edit' | 'autonomous' | 'plan' | string +} +``` + +#### 4.5 Agent Handoff (MEDIUM PRIORITY) + +**What:** OpenAI Agents SDK has explicit `handoff_requested` / +`handoff_occurred` events plus `AgentUpdatedStreamEvent`. ADK has +`transfer_to_agent` tool + `eventActions.transferToAgent`. Claude SDK has +subagent invocation via Agent tool. + +**Why it matters:** When agent A delegates to agent B, the host/UI needs to +know. + +**Design decision: Handoff is a tool call, not a separate event type.** + +The agent calls `transfer_to_agent` as a tool (ToolRequest event). The host +intercepts this tool call (since host controls tool execution), looks up the +target agent, creates a new executor via the factory, and mediates the handoff. +The originating agent's stream ends with `stream_end` reason `'completed'`. + +```typescript +// 1. Agent emits tool request: +{ type: 'tool_request', name: 'transfer_to_agent', args: { target: 'coder', reason: '...' } } + +// 2. Host mediates handoff, originating agent completes: +{ type: 'stream_end', reason: 'completed', agentId: 'planner', data: { handoffTarget: 'coder' } } +``` + +This avoids duplicating routing logic between stream_end events and tool calls. +Matches ADK's `transfer_to_agent` tool pattern. + +#### 4.6 Refusal as Distinct Signal (LOW PRIORITY) + +**What:** OpenAI has explicit `response.refusal.delta/done` events. Claude has +`stop_reason: "refusal"`. + +**Why it matters:** Model refusals are operationally important (safety, policy). + +**Proposed:** No new event type. Handle via `MessageEvent` with a `refusal` +content part type, or via `ErrorEvent` with specific error code. ContentPart can +be extended: + +```typescript +| { type: 'refusal'; text: string } +``` + +#### 4.7 Content Annotations (LOW PRIORITY) + +**What:** OpenAI has `response.output_text.annotation.added` for citations, file +paths. 
+ +**Why it matters:** Citations and source attribution are increasingly important. + +**Proposed:** Michael's `reference` ContentPart already covers this. No change +needed — `reference` with `uri` and `text` handles citations. + +#### 4.8 Context Compaction Events (LOW PRIORITY) + +**What:** Claude SDK has `CompactBoundaryMessage` marking when context was +compressed. + +**Why it matters:** For long sessions, knowing when context was compressed helps +with debugging and UI. + +**Proposed:** `CustomEvent` with `kind: 'compact_boundary'`. No new event type +needed. + +#### 4.9 Structured Output Schema (ALREADY COVERED) + +**What:** Both Claude (`structuredOutput`) and OpenAI support JSON Schema output +constraints. + +**Status:** Already covered by `AgentDescriptor.outputSchema: JsonSchema`. No +change needed. + +### Concepts We DON'T Need to Adopt + +| Concept | Why Skip | +| ------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------- | +| OpenAI's 53 granular streaming events | Too coupled to Responses API internals. Our `ToolUpdateEvent` + `MessageEvent` via AsyncGenerator abstracts over this. | +| OpenAI's per-tool-type events (file_search, web_search, code_interpreter) | Tool-specific progress belongs in `ToolUpdateEvent.data`, not in the event taxonomy. | +| Audio/Image streaming events | Handle via `ToolUpdateEvent` with media ContentParts. When needed, add as ContentPart types, not event types. | +| Claude's raw `StreamEvent` wrapper | Implementation detail of the Claude API client. Our adapters consume these internally. | +| MCP-specific events (mcp_call, mcp_list_tools) | MCP tools are just tools. Use generic `ToolRequestEvent/ToolResponseEvent`. MCP approval is an `ElicitationRequest`. | + +--- + +## 5. 
Updated Event Type Comparison (Full Superset) + +| # | Event Type | Michael | Our Outline | Claude SDK | OpenAI | Verdict | +| --- | -------------------- | ------- | ----------- | ----------------------------- | --------------------------------- | ---------------------------------------------- | +| 1 | Initialize | ✅ | ✅ | SystemMessage(init) | — | **Keep** | +| 2 | Session Update | ✅ | ✅ | — | — | **Keep** | +| 3 | Message | ✅ | ✅ | AssistantMessage | output_text.delta/done | **Keep** | +| 4 | Tool Request | ✅ | ✅ | AssistantMessage.tool_use | function_call_arguments | **Keep** | +| 5 | Tool Update | ✅ | ✅ | — | per-tool progress events | **Keep** | +| 6 | Tool Response | ✅ | ✅ | UserMessage | — | **Keep** | +| 7 | Elicitation Request | ✅ | ✅ | canUseTool callback | mcp_approval_requested | **Keep** | +| 8 | Elicitation Response | ✅ | ✅ | canUseTool return | mcp_approval_response | **Keep** | +| 9 | Usage | ✅ | ✅ | ResultMessage.usage | response.completed | **Keep** | +| 10 | Error | ✅ | ✅ | ResultMessage(error\_\*) | response.failed | **Keep** | +| 11 | Custom | ✅ | ✅ | — | — | **Keep** | +| 12 | StreamEnd | — | ✅ | ResultMessage + SystemMessage | response.created/completed/failed | **Keep — `stream_end` with `reason` + `data`** | + +**Result: Our 12 event types are the right abstraction level.** Claude and +OpenAI validate every category. The granularity differences (OpenAI's 53 vs +our 12) are implementation details that adapters handle internally. `stream_end` +uses a single `reason` field with an open `data` bag. Handoff is a tool call. +Pausing is implicit. + +--- + +## 6. 
Updated ContentPart Types (Superset) + +```typescript +type ContentPart = ( + | { type: 'text'; text: string } + | { type: 'thought'; thought: string; thoughtSignature?: string } + | { type: 'media'; data?: string; uri?: string; mimeType?: string } + | { + type: 'reference'; + text: string; + data?: string; + uri?: string; + mimeType?: string; + } + | { type: 'refusal'; text: string } // NEW: from OpenAI +) & + // Future: type: string for unknown types from new SDKs + { _meta?: Record }; +``` + +Adding `refusal` as a ContentPart type (rather than a new event) keeps the event +taxonomy stable while supporting model refusals from both Claude and OpenAI. + +--- + +## 7. Key Architectural Patterns Across SDKs + +### Pattern: Execution Entry Points + +| SDK | Entry Point | Multi-turn Pattern | +| ----------- | ------------------------------------------------------------------------- | ----------------------------------------------- | +| Michael | `agent.send(trajectory, data)` / `session.send()` + `session.update()` | Same method / three-method session | +| Our Outline | `session.stream(data)` + `session.update(config)` + `session.steer(data)` | Four-method session (stream/update/steer/abort) | +| Claude SDK | `query({ prompt, options })` | New `query()` call with `resume: sessionId` | +| Claude V2 | `session.send()` + `session.stream()` | Separate send/stream | +| Codex SDK | `thread.run(prompt)` / `thread.runStreamed(prompt)` | Same thread object | + +**Observation:** Claude V2 and Codex both use a stateful session/thread object +with send+stream. Michael uses a single `send()` method. Our `stream()` method +is the unified version — the first call starts, subsequent calls continue (like +ADK's `runAsync()`). 
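The "first call starts, subsequent calls continue" behavior of `stream()` can be sketched with a toy in-memory session. `ToySession`, its event shapes, and the echo behavior are hypothetical stand-ins; `update()` and `steer()` are omitted, and nothing here reflects an actual executor.

```typescript
// Toy sketch only: ToySession and its event shapes are hypothetical.
type ToyEvent =
  | { type: 'initialize'; sessionId: string }
  | { type: 'message'; text: string }
  | { type: 'stream_end'; reason: string };

class ToySession {
  private readonly sessionId: string;
  private started = false;
  private aborted = false;
  private turns = 0;

  constructor(sessionId: string) {
    this.sessionId = sessionId;
  }

  // The first call emits `initialize`; later calls continue the same session.
  async *stream(input: string): AsyncGenerator<ToyEvent> {
    if (!this.started) {
      this.started = true;
      yield { type: 'initialize', sessionId: this.sessionId };
    }
    if (this.aborted) {
      yield { type: 'stream_end', reason: 'aborted' };
      return;
    }
    this.turns += 1;
    yield { type: 'message', text: `turn ${this.turns}: ${input}` };
    yield { type: 'stream_end', reason: 'completed' };
  }

  // Marks the session so that any later stream() call ends immediately.
  abort(): void {
    this.aborted = true;
  }
}

// Helper that drains one stream() call into a list of event type names.
async function collectTypes(
  events: AsyncGenerator<ToyEvent>,
): Promise<string[]> {
  const seen: string[] = [];
  for await (const event of events) {
    seen.push(event.type);
  }
  return seen;
}
```

In this sketch, only the first `stream()` call yields `initialize`; every call ends with its own `stream_end`, so the host can treat each call as one bounded turn on a long-lived session object.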
+ +### Pattern: Tool Approval + +| SDK | Pattern | Sync/Async | +| ----------- | -------------------------------------------------------- | ------------------------ | +| gemini-cli | PolicyEngine + ConfirmationBus | Async (message bus) | +| ADK-TS | SecurityPlugin.policyCheck() | Async (plugin callback) | +| Claude SDK | `canUseTool()` callback | Async (callback) | +| OpenAI | `mcp_approval_requested` event | Event-based | +| Our Outline | `ElicitationRequest` event + `PolicyEvaluator` interface | Both (event + interface) | + +**Observation:** Our approach covers both patterns: the `ElicitationRequest` +event for event-based approval (like OpenAI), and the `PolicyEvaluator` +interface for in-process, callback-style policy checks (like +gemini-cli/ADK/Claude, all of which are async per the table above). This is +the right superset. + +### Pattern: Subagent Definition + +| SDK | Pattern | Key Fields | +| ------------- | -------------------------------- | ------------------------------------------------------------------------------------------ | +| gemini-cli | `AgentDefinition` (local/remote) | name, description, kind, tools, model | +| ADK-TS | `BaseAgentConfig` | name, description, subAgents, tools | +| Claude SDK | `AgentDefinition` | description, prompt, tools, model | +| OpenAI Agents | `Agent` class | name, instructions, tools, handoffs, model | +| Our Outline | `AgentDescriptor` | name, description, executor, inputSchema, capabilities, ownTools, requiredTools, subAgents | + +**Observation:** Our `AgentDescriptor` is the most complete. Claude's `prompt` +field and OpenAI's `instructions` are executor-level concerns (system prompt), +not descriptor-level. The descriptor declares identity; the executor uses the +prompt. This separation is correct. + +One gap: **handoffs**. OpenAI Agents has an explicit `handoffs` field listing +which agents can be delegated to. Our `subAgents` field serves the same purpose +but the naming implies hierarchy rather than peer delegation. 
Consider whether +`subAgents` should be renamed to `delegateAgents` or kept as-is with +documentation clarifying it covers both hierarchical and peer delegation. + +--- + +## 8. Concrete Changes to outline.md + +Based on this analysis, the following changes should be made: + +### Applied (validated by multiple SDKs): + +1. ✅ **`type: AgentEventType`** with known values + `(string & {})` + (autocomplete + extensibility) +2. ✅ **`interface AgentEvents` + mapped type** (adopted from Michael for + declaration merging) +3. ✅ **`agentId` on event base** (which agent emitted this event) +4. ✅ **`_meta` on ContentPart** (aligned with Michael) +5. ✅ **`stream_end` event** — signals why the stream ended, with `reason` + field + open `data` bag +6. ✅ **Handoff as tool call** — `transfer_to_agent` tool, not a separate event +7. ✅ **`maxBudgetUsd` in AgentConstraints** (Claude SDK, increasingly standard) +8. ✅ **`refusal` ContentPart type** (both Claude and OpenAI surface refusals) +9. ✅ **`forkSession` in ExecutionRequest** (Claude SDK, valuable for + exploration) +10. ✅ **`permissionMode` in ExecutionOptions** (both gemini-cli and Claude SDK) +11. 
✅ **`cost` field on Usage** (Claude SDK tracks total_cost_usd) + +### Correctly abstracted (no change needed): + +- Event taxonomy (12 types) — validated as right abstraction level +- `AgentDescriptor` shape — most complete across all SDKs +- `AgentSession.stream/update/steer/abort` — covers all SDK patterns +- ToolUpdate — correctly abstracts over OpenAI's 15+ tool-specific progress + events +- `ElicitationRequest/Response` — covers both callback and event patterns +- `ContentPart` types — text/thought/media/reference/refusal + +--- + +## Sources + +- [Claude Agent SDK TypeScript Reference](https://platform.claude.com/docs/en/agent-sdk/typescript) +- [Claude Agent SDK Streaming](https://platform.claude.com/docs/en/agent-sdk/streaming-output) +- [Claude Agent SDK Sessions](https://platform.claude.com/docs/en/agent-sdk/sessions) +- [Claude Agent SDK Subagents](https://platform.claude.com/docs/en/agent-sdk/subagents) +- [OpenAI Codex SDK TypeScript](https://github.com/openai/codex/tree/main/sdk/typescript) +- [OpenAI Codex SDK Docs](https://developers.openai.com/codex/sdk/) +- [OpenAI Responses API Streaming Events](https://developers.openai.com/api/reference/resources/responses/streaming-events/) +- [OpenAI Agents SDK Streaming](https://openai.github.io/openai-agents-python/streaming/) +- [Responses API Streaming Guide (Community)](https://community.openai.com/t/responses-api-streaming-the-simple-guide-to-events/1363122) diff --git a/docs/adk-replat/NOTES-gemini-cli-architecture.md b/docs/adk-replat/NOTES-gemini-cli-architecture.md new file mode 100644 index 0000000000..7a43591ab7 --- /dev/null +++ b/docs/adk-replat/NOTES-gemini-cli-architecture.md @@ -0,0 +1,259 @@ +# Gemini CLI Architecture Notes + +## Project Structure + +**Monorepo packages:** + +- `packages/core/` - Main execution engine (the big one) +- `packages/cli/` - CLI frontend +- `packages/sdk/` - SDK for extensions +- `packages/a2a-server/` - Agent-to-agent server +- `packages/devtools/` - Dev 
utilities +- `packages/vscode-ide-companion/` - VS Code extension + +## Core Execution Loop + +### GeminiClient (`core/src/core/client.ts` ~38KB) + +- **Primary orchestrator** for user interactions +- Manages session lifecycle, message routing, model selection +- Coordinates hooks, context management, error recovery +- Enforces `MAX_TURNS = 100` per session +- Tracks `currentSequenceModel` for multi-turn stickiness +- Handles history compression when context grows + +### GeminiChat (`core/src/core/geminiChat.ts` ~34KB) + +- Bidirectional LLM communication +- Maintains `history[]` alternating user/model turns +- Retry logic: max 2 attempts, 500ms delay for invalid responses +- Fires `BeforeModel` and `AfterModel` hooks +- Integrates ChatRecordingService for persistence + +### Scheduler (`core/src/scheduler/scheduler.ts` ~23KB) + +- **Three-phase event-driven**: Ingestion → Processing → Completion +- Tool call state machine: + `Validating → AwaitingApproval → Scheduled → Executing → Terminal` +- Terminal states: `Success`, `Error`, `Cancelled` +- Parallel execution for read-only and agent-type tools +- Yields to event loop for user approval +- Publishes state changes via MessageBus + +### CoreToolScheduler (`core/src/core/coreToolScheduler.ts` ~38KB) + +- Sequential, queue-based tool processing +- Validates policy via PolicyEngine +- Confirmation handling via ToolModificationHandler (editor integration) +- Uses MessageBus for async confirmation responses + +## Tool System + +### DeclarativeTool Pattern + +- **Separation of concerns**: build() → validate → createInvocation() → + execute() +- `ToolBuilder` defines metadata (name, displayName, description, kind) + schema + via `getSchema()` +- `ToolInvocation` has: `getDescription()`, `toolLocations()`, + `shouldConfirmExecute()`, `execute()` +- `ToolResult` contains: `llmContent` (for LLM), `returnDisplay` (for UI), error + details, tail calls + +### BaseToolInvocation + +- Abstract base with MessageBus integration for 
policy/confirmation +- Three decision paths: ALLOW, DENY, ASK_USER via `getMessageBusDecision()` + +### ToolRegistry (`core/src/tools/tool-registry.ts`) + +- Registers tools via `registerTool()` +- MCP tools with fully qualified names: `mcp_serverName_toolName` +- Priority sorting: built-in → discovered → MCP (by server name) +- Filters by active status based on configuration + +### Confirmation System + +- `ToolCallConfirmationDetails` union: edit, execute, MCP, info, ask_user, + exit_plan_mode +- `ToolConfirmationOutcome` enum: ProceedOnce, ProceedAlways, etc. +- Async confirmation via MessageBus pub/sub + +## Hooks System + +### Hook Types (11 hook points) + +| Hook | Trigger | Key Capability | +| --------------------- | ----------------------- | --------------------------------- | +| `BeforeTool` | Before tool execution | Modify tool_input | +| `AfterTool` | After tool completion | Context injection, tail calls | +| `BeforeAgent` | Before agent prompt | Additional context | +| `AfterAgent` | After agent response | Clear context flag | +| `BeforeModel` | Before LLM request | Modify request or inject response | +| `AfterModel` | After LLM response | Modify response | +| `BeforeToolSelection` | Before tool selection | Modify toolConfig | +| `Notification` | When notifications fire | Suppress/modify message | +| `SessionStart` | Session begins | Additional context | +| `SessionEnd` | Session terminates | Cleanup | +| `PreCompress` | Before compression | Suppress/modify | + +### Hook Output Fields (common to all hooks) + +- `continue` - Whether execution proceeds +- `stopReason` - Reason to halt +- `suppressOutput` - Hide from user +- `systemMessage` - Add to system context +- `decision` - ask/block/deny/approve/allow + +### Hook System Components + +- `HookSystem` - Main coordinator +- `HookRegistry` - Stores/manages configurations +- `HookRunner` - Executes registered hooks +- `HookAggregator` - Combines multiple hook results +- `HookPlanner` - Determines 
execution order +- `HookEventHandler` - Orchestrates event firing +- `HookTranslator` - Converts between formats + +## Policy Engine + +### Rule Structure + +``` +PolicyRule { + toolName: string; // wildcards supported + decision: PolicyDecision; // ALLOW | DENY | ASK_USER + priority: number; + argsPattern?: RegExp; // conditional on args + mcpName?: string; + source: string; +} +``` + +### Tier Hierarchy (lowest → highest priority) + +1. Default (1) - Core built-in policies +2. Extension (2) - Extension contributions +3. Workspace (3) - Project-scoped (.gemini/) +4. User (4) - User-provided (~/.gemini/) +5. Admin (5) - System-level policies + +### Dynamic Rule Priorities (within User Tier) + +- 4.9 - MCP_EXCLUDED (persistent server blocks) +- 4.4 - EXCLUDE_TOOLS_FLAG (CLI exclusions) +- 4.3 - ALLOWED_TOOLS_FLAG (CLI allows) +- 4.2 - TRUSTED_MCP_SERVER +- 4.1 - ALLOWED_MCP_SERVER +- 3.95 - ALWAYS_ALLOW (interactive selections) + +### Security Constraint + +- Extensions CANNOT contribute ALLOW rules or YOLO mode + +## Agent System + +### Agent Registry (`core/src/agents/registry.ts`) + +Discovery sources: + +1. Built-in: CodebaseInvestigator, CliHelp, Generalist, Browser +2. User-level: `~/.gemini/agents/` +3. Project-level: `.gemini/agents/` (requires folder trust) +4. 
Extension-based: From active extensions + +### LocalAgentExecutor (`core/src/agents/local-executor.ts`) + +- Prompt processing: input augmentation → template expansion → system prompt + construction +- Uses GeminiChat for accumulating conversation +- ChatCompressionService for history management +- Turn loop: invoke model → extract function calls → check auth → append results +- Termination: complete_task tool, max turns, timeout + +### SubagentTool (`core/src/agents/subagent-tool.ts`) + +- Extends BaseDeclarativeTool - agents invoked like standard tools +- Read-only status checking, user hint propagation +- Execution: validate → optional confirmation → parameter enrichment → + SubagentToolWrapper + +### Remote Agents + +- A2A client manager for agent-to-agent protocol +- Remote invocation for external agents +- Agent acknowledgement system (security for project agents) + +## Model System + +### ModelConfigService + +- **Hierarchical alias system**: children override parents +- Resolution: alias chain → level assignment → apply overrides +- Deep merging with array override capability +- Fallback to `chat-base` alias for unknown models + +### ModelRouterService + +Sequential strategy pattern: + +1. Fallback & Override +2. Approval Mode Strategy +3. Gemma Classifier (if enabled) +4. Generic Classifier +5. Numerical Classifier +6. 
Default Strategy + +### ModelAvailabilityService + +Health states: + +- **Terminal** - permanently unavailable +- **Sticky Retry** - failed once, can retry once per turn +- **Healthy** - no issues + +## Services + +| Service | Purpose | +| --------------------------- | --------------------------------------- | +| ChatRecordingService | Session persistence (JSON files) | +| ChatCompressionService | History summarization for token budgets | +| ModelConfigService | Hierarchical model config with aliases | +| ModelAvailabilityService | Model health tracking | +| ModelRouterService | Model selection via strategies | +| FolderTrustDiscoveryService | Workspace security scanning | +| KeychainService | Credential storage | +| LoopDetectionService | Detect repetitive agent loops | + +## UI + Core Separation + +### IDE Client (`core/src/ide/ide-client.ts`) + +- Singleton managing CLI ↔ IDE communication via MCP +- **Outbound** (CLI → IDE): `openDiff`, `closeDiff` +- **Inbound** (IDE → CLI): `ide/contextUpdate`, `ide/diffAccepted`, + `ide/diffRejected` + +### Event Contract + +```typescript +interface IdeContextNotification { + method: 'ide/contextUpdate'; + params: { workspaceState: { openFiles: string[]; isTrusted: boolean } }; +} +``` + +### Confirmation Bus + +- `TOOL_CONFIRMATION_REQUEST` / `TOOL_CONFIRMATION_RESPONSE` +- Detail types: edit, execute, MCP, info, ask_user, exit_plan_mode +- Async pub/sub via MessageBus + +## Configuration (`core/src/config/config.ts` ~95KB!) + +- Tool config: core tools, allowed/excluded, MCP servers +- File filtering: git ignore, fuzzy search, max counts, timeouts +- Approval modes: policy engine config +- Experiments: feature flags (GEMINI_3_1_PRO_LAUNCHED, ENABLE_ADMIN_CONTROLS, + etc.) 
+- FolderTrust: discovery scans for commands, skills, settings, MCP, hooks diff --git a/docs/adk-replat/NOTES-key-systems-deep-dive.md b/docs/adk-replat/NOTES-key-systems-deep-dive.md new file mode 100644 index 0000000000..54217a41cd --- /dev/null +++ b/docs/adk-replat/NOTES-key-systems-deep-dive.md @@ -0,0 +1,296 @@ +# Deep Dive: Key Gemini-CLI Systems + +## Hooks System (Complete) + +### 11 Hook Points + +| Hook | Input | Key Output Capabilities | +| ------------------- | -------------------------------------- | --------------------------------------------- | +| BeforeTool | toolName, toolInput, mcpContext | Modify tool_input, block/allow, systemMessage | +| AfterTool | toolName, toolInput, toolResponse | additionalContext, tailToolCallRequest | +| BeforeAgent | prompt | Additional context | +| AfterAgent | prompt, response, stopHookActive | Clear context | +| BeforeModel | llmRequest (GenerateContentParameters) | Modify llm_request OR inject llm_response | +| AfterModel | llmRequest, llmResponse | Modify llm_response | +| BeforeToolSelection | llmRequest | Modify toolConfig (function list, mode) | +| Notification | type, message, details | Suppress/modify | +| SessionStart | source (Startup/Resume/Clear) | Additional context | +| SessionEnd | reason (Exit/Clear/Logout/etc) | Cleanup | +| PreCompress | trigger (Manual/Auto) | Suppress/modify | + +### Hook Configuration Types + +- **Runtime hooks** (HookType.Runtime): JS/TS functions, registered + programmatically +- **Command hooks** (HookType.Command): External shell commands with JSON I/O + +### Exit Code Semantics (Command Hooks) + +- 0 = Success (allowed with system message) +- 1 = Non-blocking error (warning, continues) +- 2+ = Blocking failure (denied, stderr as reason) + +### Hook Decision Values + +`'ask' | 'block' | 'deny' | 'approve' | 'allow' | undefined` + +### Execution Strategies + +- **Parallel** (default): Promise.all(), independent +- **Sequential** (opt-in per hook): Chained, output→input 
cascading + +### Aggregation + +- Blocking decisions: OR logic (any block → all block) +- Field replacement: later overrides earlier +- Tool selection: union of allowed functions, mode precedence NONE > ANY > AUTO + +### Trust Model + +- Project hooks require folder trust verification +- TrustedHooksManager at `~/.gemini/trusted-hooks.json` +- Environment sanitized for command hooks (sensitive vars removed) +- `GEMINI_PROJECT_DIR` injected + +### Key Insight for Abstraction + +Hooks fire inside gemini-cli's execution loop. When ADK controls the model: + +- BeforeModel/AfterModel still fire because AdkGeminiModel wraps GeminiChat +- BeforeTool/AfterTool still fire because AdkToolAdapter wraps DeclarativeTool +- This is dewitt's solution: adapters preserve hook injection points + +**For OpenRouter or opaque agents, hooks CANNOT fire unless the agent delegates +model/tool calls back to gemini-cli.** + +--- + +## Policy Engine (Complete) + +### TOML Rule Format + +```toml +[[rules]] +decision = "allow" | "deny" | "ask_user" +priority = 0-999 +toolName = "tool_name" # wildcards: *, mcp_*, mcp_server_* +mcpName = "server_name" # MCP server filter +argsPattern = "regex" # matches JSON-stringified args +commandPrefix = "cmd" # shell command prefix match +commandRegex = "regex" # shell command regex (mutually exclusive with prefix) +modes = ["default", "autoEdit", "yolo", "plan"] +annotations = ["read-only", "experimental"] +allowRedirection = true # for shell commands +allowMessage = "..." # user-facing message on allow +denyMessage = "..." # user-facing message on deny +``` + +### 5-Tier Priority System + +- Tier 5 (Admin): 5.000-5.999 +- Tier 4 (User): 4.000-4.999 +- Tier 3 (Workspace): 3.000-3.999 +- Tier 2 (Extension): 2.000-2.999 +- Tier 1 (Default): 1.000-1.999 + +Formula: `tier + (priority / 1000)` + +### 4 Approval Modes + +1. **default** — ASK_USER decisions prompt user +2. **autoEdit** — File writes auto-approved with safety checking (conseca) +3. 
**yolo** — All auto-approved except explicit ask_user rules +4. **plan** — Read-only, blocks modifications, allows planning docs + +### Shell Command Safety + +- Parses multi-command sequences (&&, ;, ||) +- Detects injection: $(...), `...`, <(...), >(...), --flag=$(...) +- Each subcommand evaluated independently +- DENY overrides everything; ASK_USER escalates; ALLOW only if all pass +- Redirections (>) downgrade ALLOW → ASK_USER unless allowRedirection=true + +### Security Constraints + +- Extensions cannot contribute ALLOW rules or YOLO mode +- Regex patterns validated for ReDoS +- Tool name typos detected via Levenshtein distance ≤3 +- Policy file integrity: SHA-256 hash checking + +### Key Insight for Abstraction + +Policy is evaluated at the tool execution boundary. For the interface layer: + +- If CLI controls tool execution → policy naturally applies +- If agent controls tool execution internally → policy bypassed (danger!) +- This reinforces the `pauseOnToolCalls: true` approach for ADK +- Need a `PolicyEvaluator` interface that any executor can call + +--- + +## Tool System (Complete) + +### Core Abstraction Chain + +``` +ToolBuilder (metadata + schema) + → build(params) validates → ToolInvocation (ready to execute) + → shouldConfirmExecute() → execute(signal) → ToolResult +``` + +### DeclarativeTool Pattern + +- `build(params)` — Validate and create invocation +- `buildAndExecute(params)` — One-step convenience +- `validateBuildAndExecute(params)` — Non-throwing variant + +### BaseToolInvocation + +- Message bus integration for policy decisions +- Three decision paths: ALLOW → execute, DENY → reject, ASK_USER → confirm + +### ToolResult Structure + +- `llmContent` — For LLM conversation history +- `returnDisplay` — For UI presentation +- `displayContent` — Additional display formatting +- `errorDetails` — Optional error info +- `result` — Structured data payload +- `tailCall` — Optional chaining requests + +### Confirmation System (6 types) + +1. 
**edit** — File modification with diff +2. **execute** — Command execution +3. **mcp** — MCP tool with allowlist mgmt +4. **info** — Information-only +5. **ask_user** — General user approval +6. **exit_plan_mode** — Plan exit notification + +### Confirmation Outcomes (7 values) + +ProceedOnce, ProceedAlways, ProceedAlwaysAndSave, ProceedAlwaysServer, +ProceedAlwaysTool, ModifyWithEditor, Cancel + +### Tool Kinds + +- **Mutator**: Edit, Delete, Move, Execute +- **Read-Only**: Read, Search, Fetch +- **Other**: Think, Agent, Communicate, Plan, SwitchMode, Other + +### MCP Tools + +- Naming: `mcp__` (64-char limit) +- Schema validation via LenientJsonSchemaValidator +- Response types: McpTextBlock, McpMediaBlock, McpResourceBlock, + McpResourceLinkBlock +- Transform to GenAI Parts format + +### Error Types (20+) + +- **Recoverable**: INVALID_TOOL_PARAMS, FILE_NOT_FOUND, + EDIT_NO_OCCURRENCE_FOUND, SHELL_TIMEOUT, MCP_TOOL_ERROR... +- **Fatal**: NO_SPACE_LEFT (only one!) + +### ModifiableTool + +- Extends DeclarativeTool with external editor support +- `getModifyContext()` → temp files → editor opens → `getUpdatedParams()` → diff + +--- + +## Execution Loop (Complete) + +### LocalAgentExecutor Flow + +1. Collect user hints, setup deadline timer +2. **Turn loop**: executeTurn() repeatedly until completion +3. Per-turn: compress chat → callModel() → processFunctionCalls() +4. On limit hit: executeFinalWarningTurn() with 60s grace period +5. 
Return OutputObject { result, terminate_reason } + +### AgentTerminateMode + +GOAL | TIMEOUT | MAX_TURNS | ABORTED | ERROR | ERROR_NO_COMPLETE_TASK_CALL + +### SubagentTool Architecture + +``` +Parent Agent + └─ SubagentTool (wraps AgentDefinition as DeclarativeTool) + └─ SubagentToolWrapper (routes by agent kind) + ├─ LocalSubagentInvocation → LocalAgentExecutor + ├─ RemoteAgentInvocation → A2AClientManager + └─ BrowserAgentInvocation +``` + +### Agent Types + +- `LocalAgentDefinition` — kind: 'local', has promptConfig, modelConfig, + runConfig, toolConfig +- `RemoteAgentDefinition` — kind: 'remote', has agentCardUrl, auth config + +### Key Defaults + +- DEFAULT_MAX_TURNS = 15 +- DEFAULT_MAX_TIME_MINUTES = 5 +- A2A_TIMEOUT = 1800000 (30 min for remote agents) + +--- + +## Services/Config (Complete) + +### ModelConfigService + +- **Alias chains**: Inheritance with `extends`, merged root-to-leaf +- **Overrides**: Contextual (model, scope, retry, isChatModel), sorted by + specificity +- **Runtime registration**: Dynamic aliases and overrides +- **Deep merge**: Objects merged, arrays replaced entirely + +### ModelRouterService (Strategy Chain) + +1. Fallback & Override → 2. Approval Mode → 3. Gemma Classifier → 4. Generic + Classifier → 5. Numerical Classifier → 6. Default + +### ModelAvailabilityService + +- Terminal (permanent), Sticky_retry (one retry per turn), Healthy +- `selectFirstAvailable()` iterates fallback chain +- `resetTurn()` at turn boundaries enables fresh retries + +### Config (~95KB!) + +Central dependency injection. Initializes: ModelAvailabilityService → +ModelConfigService → FolderTrustDiscoveryService → PolicyEngine → +FileDiscoveryService → GitService → ToolRegistry → MCP → GeminiClient → +HookSystem + +### CoreEventEmitter (UI Events) + +Event types: UserFeedback, ModelChanged, ConsoleLog, Output, RetryAttempt, +ConsentRequest, McpProgress, Hook, QuotaChanged + +Backlog buffering (max 10,000) with head-pointer eviction and auto-compaction. 
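The backlog behaviour described for CoreEventEmitter (bounded size, head-pointer eviction, auto-compaction) can be sketched roughly as follows. This is a minimal illustration and not the gemini-cli implementation: the class name `EventBacklog`, the `compactThreshold` parameter, and the exact compaction rule are assumptions made for the example.

```typescript
// Hypothetical sketch: a bounded event backlog with head-pointer eviction.
// EventBacklog and compactThreshold are illustrative names, not gemini-cli APIs.
class EventBacklog<T> {
  private items: (T | undefined)[] = [];
  private head = 0; // index of the oldest live entry

  constructor(
    private readonly maxSize = 10000,
    private readonly compactThreshold = 1024,
  ) {}

  // Number of live (not yet evicted) entries.
  get size(): number {
    return this.items.length - this.head;
  }

  push(item: T): void {
    this.items.push(item);
    // Evict the oldest entry by advancing the head pointer (O(1), no shift()).
    if (this.size > this.maxSize) {
      this.items[this.head] = undefined; // release the reference
      this.head++;
    }
    // Auto-compaction: drop the dead prefix once it dominates the array.
    if (this.head >= this.compactThreshold && this.head > this.items.length / 2) {
      this.items = this.items.slice(this.head);
      this.head = 0;
    }
  }

  // Hand all buffered events to a late-attaching listener, oldest first.
  drain(): T[] {
    const out = this.items.slice(this.head) as T[];
    this.items = [];
    this.head = 0;
    return out;
  }
}
```

Advancing a head index keeps eviction constant-time; the occasional `slice` bounds the memory held by already-evicted slots.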
+ +### Scheduler Types + +```typescript +ToolCallRequestInfo { + callId, name, args, originalRequestName, + isClientInitiated, prompt_id, checkpoint, traceId, + parentCallId, schedulerId +} +ToolCallResponseInfo { + callId, responseParts, resultDisplay, error, errorType, + outputFile, contentLength, data +} +CoreToolCallStatus: Validating → AwaitingApproval → Scheduled → Executing → Success|Error|Cancelled +``` + +### FolderTrust + +- Scans: commands (.toml), skills (SKILL.md), settings.json, MCP servers, hooks +- Security warnings: auto-approved tools, autonomous agents, disabled trust, + disabled sandbox +- Pattern: discovery → review → execution (no code runs during scan) diff --git a/docs/adk-replat/PRIORITY-ANALYSIS.md b/docs/adk-replat/PRIORITY-ANALYSIS.md new file mode 100644 index 0000000000..1c87ff5233 --- /dev/null +++ b/docs/adk-replat/PRIORITY-ANALYSIS.md @@ -0,0 +1,349 @@ +# Interface Priority Analysis & Open Questions + +## The Big Picture + +We're defining **framework-agnostic interfaces** that allow gemini-cli to: + +1. Keep its existing execution loop working unchanged (Legacy path) +2. Swap in ADK as an alternative runtime via config flag +3. Eventually support OpenRouter or other agent backends +4. Maintain all existing CLI behavior: hooks, policies, confirmations, UI events + +## Proposed Interface Layers (Priority Order) + +--- + +### P0 (Critical Path - Must Define First) + +#### 1. AgentEvent / Event Stream Contract + +**Why first:** Everything else consumes or produces these events. The UI renders +them. The hooks intercept them. The adapters translate to/from them. + +**Key decision:** Merge Dewitt's simpler model with Coworker's richer model? + +**Recommendation:** Coworker's approach is more complete. Key additions: 
Key additions: + +- `threadId` for sub-agent tracking (AG-UI has `parentRunId`) +- `tool_update` for progress on long-running tools +- `elicitation_request/response` as first-class (not just tool_confirmation) +- `usage` event for token tracking +- `_meta` escape hatch (matches AG-UI's extensibility philosophy) +- `initialize` event (matches AG-UI's RunStarted) + +**Open questions:** + +- Do we need AG-UI's start/content/end triple pattern for streaming? Or is + yielding partial events sufficient? +- How do ContentPart types map to existing gemini-cli Part types? +- Should events carry a `source` field? (useful for hook attribution) + +#### 2. Agent Interface + +**Why second:** This is the primary abstraction that LocalAgentExecutor, ADK +adapters, and future OpenRouter adapters all implement. + +**Key decision:** Dewitt's `runAsync/runEphemeral` vs Coworker's +`send(Trajectory|string)` + +**Recommendation:** Hybrid approach: + +- Dewitt's `runAsync/runEphemeral` split is ADK-aligned and cleaner for the + factory pattern +- BUT add Coworker's elicitation support via AgentSend union type +- The Trajectory concept is powerful but may be too opinionated for Phase 2 + +``` +Agent + name: string + description: string + runAsync(input, options) → AsyncGenerator + runEphemeral(input, options) → AsyncGenerator +``` + +**Open questions:** + +- Should Agent also support `send()` for mid-stream interactions (elicitations)? +- How does AbortSignal propagate through the adapter boundary? +- Do we need a `capabilities` field (supports elicitation? supports HITL? etc.)? + +#### 3. Tool Execution Contract + +**Why third:** Tools are the primary action mechanism. Both the policy engine +and hooks system wrap tool execution. 
+ +**What needs abstracting:** + +- Tool declaration (name, schema) — already somewhat generic via JSON Schema +- Tool execution (args → result) +- Tool confirmation flow (ASK_USER → user decision → proceed/deny) +- Tool result shape (llmContent + displayContent + error + tailCalls) + +**Key decision:** Keep DeclarativeTool pattern or flatten to a simpler +interface? + +**Recommendation:** Define a minimal `ToolExecutor` interface: + +``` +ToolExecutor { + name: string + description: string + schema: JSONSchema + execute(args, context): Promise + requiresConfirmation?(args, context): Promise +} +``` + +DeclarativeTool remains the concrete implementation. ADK's BaseTool adapts to +this. + +**Open questions:** + +- How do MCP tools fit? They already have their own protocol. +- Tool annotations (destructive hints) — should these be in the interface? +- Long-running tools need progress reporting — how does this interact with + tool_update events? + +--- + +### P1 (Important - Define After P0) + +#### 4. Policy / Permission Interface + +**Why important:** Every tool call goes through policy. External agents need +policy enforcement too. + +**Current state:** gemini-cli has a sophisticated TOML-based policy engine with +tiered priorities. ADK-TS has a simpler SecurityPlugin with PolicyOutcome +(DENY/CONFIRM/ALLOW). + +**What needs abstracting:** + +``` +PolicyEngine { + evaluate(toolName, args, context): PolicyDecision // ALLOW | DENY | ASK_USER + getExcludedTools(): string[] // Tools statically denied +} +``` + +**Key decision:** Do external agents (OpenRouter, etc.) get the same policy +enforcement? + +**Open questions:** + +- If an ADK agent calls a tool internally, does gemini-cli's policy apply? +- With `pauseOnToolCalls: true` in ADK, the CLI controls execution — but what + about headless mode? +- How do agent-level policies work? (allow/deny entire agents, not just tools) +- Should policy be a middleware (AG-UI pattern) or a callback (ADK plugin + pattern)? 
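As a starting point for discussing the open questions above, the proposed `PolicyEngine` shape could be exercised with a small in-memory implementation. The sketch below is an assumption-laden illustration, not the existing TOML-backed engine: `SimplePolicyEngine`, `matchesName`, the trailing-`*` wildcard rule, the `ASK_USER` fallback, and the tool names used in the usage note are all hypothetical, and the `context` argument is omitted for brevity.

```typescript
// Hypothetical sketch of the PolicyEngine shape; not gemini-cli's implementation.
type PolicyDecision = 'ALLOW' | 'DENY' | 'ASK_USER';

interface PolicyRule {
  toolName: string; // exact name, or a trailing-* wildcard such as "mcp_*" (assumed syntax)
  decision: PolicyDecision;
  priority: number; // higher wins
  argsPattern?: RegExp; // tested against JSON-stringified args; avoid the g flag
}

interface PolicyEngine {
  evaluate(toolName: string, args: unknown): PolicyDecision;
  getExcludedTools(): string[];
}

class SimplePolicyEngine implements PolicyEngine {
  private readonly rules: PolicyRule[];

  constructor(
    rules: PolicyRule[],
    private readonly fallback: PolicyDecision = 'ASK_USER',
  ) {
    // Sort once so the first match is always the highest-priority match.
    this.rules = [...rules].sort((a, b) => b.priority - a.priority);
  }

  evaluate(toolName: string, args: unknown): PolicyDecision {
    const serialized = JSON.stringify(args ?? {});
    for (const rule of this.rules) {
      if (!matchesName(rule.toolName, toolName)) continue;
      if (rule.argsPattern && !rule.argsPattern.test(serialized)) continue;
      return rule.decision;
    }
    return this.fallback;
  }

  // A tool counts as statically denied when the highest-priority rule that
  // matches its name is an unconditional DENY (no args condition).
  getExcludedTools(): string[] {
    const exactNames = new Set(
      this.rules.map((r) => r.toolName).filter((n) => !n.includes('*')),
    );
    return Array.from(exactNames).filter((name) => {
      const top = this.rules.find((r) => matchesName(r.toolName, name));
      return top !== undefined && top.decision === 'DENY' && top.argsPattern === undefined;
    });
  }
}

function matchesName(pattern: string, name: string): boolean {
  return pattern.endsWith('*')
    ? name.startsWith(pattern.slice(0, -1))
    : pattern === name;
}
```

With rules such as `{ toolName: 'web_fetch', decision: 'DENY', priority: 90 }`, a host could call `getExcludedTools()` before building the tool list for any backend and call `evaluate()` at the execution boundary, which keeps enforcement in the CLI regardless of which executor runs the agent.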
+ +#### 5. Hooks Interface + +**Why important:** Hooks are a major gemini-cli feature. They need to work +regardless of which agent backend runs. + +**Current state:** 11 hook types firing at specific lifecycle points. + +**What needs abstracting:** + +- Hook lifecycle must be backend-agnostic +- BeforeModel/AfterModel hooks need to work even when ADK controls the model +- BeforeTool/AfterTool hooks need to intercept regardless of who executes the + tool + +**Key challenge:** When ADK runs the model internally, gemini-cli hooks can't +easily intercept. **Dewitt's solution:** ADK uses gemini-cli's model via +AdkGeminiModel adapter — hooks fire inside GeminiChat. + +**Open questions:** + +- If OpenRouter runs the model, how do BeforeModel/AfterModel hooks work? +- Do we need a "model steering" abstraction (injecting context mid-stream)? +- Can hooks be expressed as AG-UI middleware? (intercept event stream) + +#### 6. Model / LLM Interface + +**Why important:** Model abstraction enables swapping LLM providers. + +**Dewitt's approach:** Exposes Model interface, ADK uses it via AdkGeminiModel +adapter. **Coworker's approach:** Model is internal to Agent (no separate Model +interface). + +**Recommendation:** Keep Dewitt's separate Model interface BUT make it +provider-agnostic: + +- Remove `@google/genai` types from the interface signature +- Define generic Message/Content types +- Model interface is an implementation detail, not part of the Agent contract + +**Open questions:** + +- Can we define a truly provider-agnostic Model interface? +- Or is the Model always tied to the agent backend? (ADK uses Gemini, OpenRouter + uses whatever) +- Model routing (choosing which model) — is this a concern of the Model + interface or a separate service? + +--- + +### P2 (Important but Can Follow) + +#### 7. Session / State Interface + +**Current state:** gemini-cli uses ChatRecordingService (JSON files). ADK uses +Session with BaseSessionService. 
+ +**What needs abstracting:** + +- Session creation/retrieval +- State persistence across turns +- History/trajectory management + +**Open questions:** + +- Does the trajectory (coworker's concept) replace gemini-cli's chat recording? +- Should session state be shared between gemini-cli and the agent backend? + +#### 8. Elicitation / User Interaction Interface + +**What it covers:** Model fallback dialogs, tool confirmations, Ctrl+B +interrupts, user questions + +**Current state:** gemini-cli uses ConfirmationBus + MessageBus. AG-UI uses +frontend tools. + +**Open questions:** + +- Is elicitation just a special case of tool calls (AG-UI approach)? +- Or is it a first-class event type (coworker's approach)? +- How does Ctrl+B (cancel/interrupt) propagate through the agent boundary? + +#### 9. Configuration / Capability Discovery + +**What it covers:** Feature flags, experiment settings, agent capabilities + +**Open questions:** + +- How does an external agent declare its capabilities? +- Does OpenRouter support HITL? Elicitation? Tool confirmation? Each agent may + differ. +- Need a `capabilities` negotiation at connection time? + +--- + +### P3 (Future / Can Defer) + +#### 10. A2UI / Rich UI Interface + +- Declarative UI generation from agents +- Not critical for Phase 2 but important for differentiation + +#### 11. Memory / Artifact Interface + +- ADK has memory/artifact services +- gemini-cli has ChatRecordingService + memory tools +- Can standardize later + +#### 12. Telemetry / Observability Interface + +- Both systems have telemetry +- Can standardize later + +--- + +## Critical Open Questions (Need Team Discussion) + +### 1. OpenRouter Integration Model + +**Question:** When OpenRouter (or any external agent) is used, what does the +integration look like? 
+ +**Option A: Full Agent Interface** — OpenRouter implements the Agent interface +directly + +- Pro: Clean, uniform +- Con: OpenRouter doesn't support HITL, hooks, policies natively + +**Option B: ACP Shim** — Agent Communication Protocol between CLI and external +agents + +- Pro: Standards-based +- Con: Additional protocol layer, may be premature + +**Option C: Model-only Integration** — OpenRouter is just an alternative Model, +not Agent + +- Pro: Simpler, leverages existing agent loop +- Con: Doesn't support OpenRouter-specific features + +**Recommendation:** Start with Option C (model-only). OpenRouter provides an LLM +endpoint. Gemini-cli's own agent loop handles tools, policies, hooks. This means +defining a provider-agnostic Model interface is the key enabler. + +### 2. Tool Execution: Client-side vs Agent-side + +**Question:** Who executes tools — the CLI or the agent backend? + +**Option A: Always client-side** (CLI executes, agent suspends) + +- ADK: `pauseOnToolCalls: true` +- Pro: CLI maintains control, policies enforced, hooks fire +- Con: Higher latency, more round-trips + +**Option B: Agent-side execution** (agent runs tools internally) + +- Pro: Faster, simpler +- Con: Bypasses CLI policies, hooks, confirmations + +**Option C: Configurable** — CLI decides per-tool or per-agent + +- Pro: Flexible +- Con: Complex + +**Recommendation:** Option A for safety-critical CLI use case. Option B only for +trusted/sandboxed sub-agents. + +### 3. Model Steering (Hooks that inject context mid-stream) + +**Question:** How do user-local hooks (like injecting project context) work with +external agents? + +**Answer:** They can only work if: + +- The CLI controls the model (via Model interface adapter) — then BeforeModel + hook injects context +- OR the agent supports a "system instruction update" mechanism + +For OpenRouter: model steering works because CLI controls the model call. For +ADK: model steering works because AdkGeminiModel wraps GeminiChat. 
For fully +opaque agents: model steering **cannot work** — this is a known limitation. + +### 4. Elicitation Flow + +**Question:** When the agent needs user input (model fallback, clarification), +how does it work? + +**For CLI-controlled agents:** Agent yields an elicitation_request event → CLI +renders prompt → user responds → CLI sends response back via session.stream({ +kind: 'elicitation_response', ... }) to resume + +**For external agents:** Agent uses A2A protocol or similar to send elicitation +→ CLI bridges the request to user → response sent back via protocol + +**Key insight:** Elicitation is fundamentally about the agent SUSPENDING and +waiting for user input. ADK already supports this via `pauseOnToolCalls`. Can we +generalize to `pauseOnElicitation`? + +### 5. Sub-agent Identity and Policies + +**Question:** When a sub-agent spawns, does it inherit parent policies? Get its +own? + +**Current gemini-cli behavior:** Sub-agents registered as tools, go through same +policy engine. **ADK behavior:** Sub-agents are child nodes in agent tree, get +parent's plugins. + +**Recommendation:** Sub-agents inherit parent policy context. Additional +restrictions can be layered (e.g., sub-agent X cannot use shell tool). This is +already how gemini-cli works.
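
One way to picture "inherit the parent policy context, then layer additional restrictions" is a wrapper that delegates to the parent and can only tighten its decision. This is a sketch under stated assumptions rather than a description of current gemini-cli behavior: `PolicyContext`, `LayeredPolicy`, the strictness ordering, and the rule that a child may never loosen a parent decision are all hypothetical choices made for the example.

```typescript
// Hypothetical sketch: sub-agent policy layered over an inherited parent policy.
type Decision = 'ALLOW' | 'DENY' | 'ASK_USER';

interface PolicyContext {
  evaluate(toolName: string): Decision;
}

// Assumed strictness order, used so a child can tighten but never loosen.
const STRICTNESS: Record<Decision, number> = { ALLOW: 0, ASK_USER: 1, DENY: 2 };

class LayeredPolicy implements PolicyContext {
  constructor(
    private readonly parent: PolicyContext,
    private readonly restrictions: ReadonlyMap<string, Decision> = new Map<string, Decision>(),
  ) {}

  evaluate(toolName: string): Decision {
    const inherited = this.parent.evaluate(toolName);
    const own = this.restrictions.get(toolName);
    if (own === undefined) return inherited;
    // Keep whichever decision is stricter.
    return STRICTNESS[own] > STRICTNESS[inherited] ? own : inherited;
  }
}
```

For example, wrapping the parent context with `new Map<string, Decision>([['run_shell_command', 'DENY']])` would express "sub-agent X cannot use shell tool" while every other tool keeps the parent's decision.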