From 99267012bcd2613db36c5ad52b8546ebd683dcba Mon Sep 17 00:00:00 2001 From: Adam Weidman Date: Mon, 6 Apr 2026 06:35:15 -0700 Subject: [PATCH] Revise unified review with corrections and phased plan MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Key changes: - Add compilation blockers (C1-C4) as front-and-center section - Soften "god object" to "extract 2 concerns from orchestrator" - Fix persistence assessment (drop DB concerns, keep rewind depth) - Add BaseContextCompactor finding (native ADK compaction ignored) - Add isStructuredError type-guard bug - Add token metrics gap in telemetry - Replace AgentTool with custom BaseTool subagent architecture - Demote retry safety to a note (ADK-wide, not migration-specific) - Add 4-phase migration plan (Foundation → Non-Interactive → Interactive → SDK) - Add feature-gate note at top - Add concern decomposition table - Simplify concurrent sessions (single-writer is sufficient for CLI) Co-Authored-By: Claude Opus 4.6 --- docs/adk-replat/unified-review.md | 347 +++++++++++++++++------------- 1 file changed, 194 insertions(+), 153 deletions(-) diff --git a/docs/adk-replat/unified-review.md b/docs/adk-replat/unified-review.md index bb2631a1f9..569ec4a77b 100644 --- a/docs/adk-replat/unified-review.md +++ b/docs/adk-replat/unified-review.md @@ -31,7 +31,29 @@ The central architectural insight is right: the migration boundary is: That boundary carries stream lifecycle, replay/resume, event projection, approval correlation, and persistence correctness. Until the translation architecture is explicit, parity claims remain unsupported. -## Merged High-Risk Findings +## Compilation Blockers (Must Fix First) + +These are binary errors — the code samples in the design doc will not compile against current ADK types. + +### C1. `LlmRequest` has no `headers` field + +`adk-js/core/src/models/llm_request.ts` defines: `model?`, `contents`, `config?`, `liveConnectConfig`, `toolsDict`. The design doc's `request.headers = {...}` will silently fail or not compile. **Fix:** use `llmRequest.config.httpOptions.headers` or pass headers at `Gemini` constructor time. + +### C2. `BaseLlm.connect()` is abstract and unimplemented + +`adk-js/core/src/models/base_llm.ts:67-78` has TWO abstract methods: `generateContentAsync` and `connect()`. `GcliAgentModel` must implement both. **Fix:** add `connect()` — stub with `throw new Error('not supported')` if live connections are out of scope. + +### C3. `BaseToolset` constructor signature is wrong + +`adk-js/core/src/tools/base_toolset.ts:46-49`: `constructor(readonly toolFilter: ToolPredicate | string[], readonly prefix?: string)`. The design doc's `super([])` passes an empty array as `toolFilter`, which makes `isToolSelected` return false for all tools — a silent bug. Also `close()` is abstract and must be implemented. **Fix:** match the actual signature. + +### C4. `FunctionalTool` does not exist + +The design doc references `FunctionalTool` for subagent wrapping. This class does not exist in adk-js. The correct classes are `FunctionTool` (wraps a plain function) and `AgentTool` (wraps an agent with isolated runner). See subagent architecture section below for the recommended approach. + +--- + +## High-Risk Findings ### 1. The actual migration boundary is under-specified @@ -49,26 +71,14 @@ The doc needs a first-class architecture section for: Without that, the rest of the design sits on an unstated core contract. -### 2. The “single consolidated decorator” is overloaded +### 2. The consolidated decorator needs two concerns extracted -The `GcliAgentModel.generateContentAsync` sketch is a useful mental model, but today it is carrying too many concerns: -- request mutation -- routing -- availability / fallback prompting -- compaction -- masking -- auth -- approval-adjacent control flow +The `GcliAgentModel.generateContentAsync` pipeline is a reasonable orchestrator pattern — each concern is a separate injected service call, not a monolithic class. However, two concerns don't belong in the model layer: -Several of these are not model-layer problems. They are session, orchestration, policy, or UI-integration concerns. +- **Quota/availability prompting** — involves interactive UI suspension, not request mutation. Should use `beforeModelCallback` on `LlmAgent`. +- **Policy/approval control flow** — involves blocking for user input. Should use `beforeToolCallback` or the `ApprovalBridge` (see architecture section). -This should be decomposed into separate layers: -- request shaping -- runtime/session ownership -- event translation -- approval / elicitation bridge -- persistence / rewind service -- telemetry projection +The remaining concerns (auth header injection, routing/model rewrite, compaction, masking) are legitimate request-preprocessing and can stay in the model pipeline. ### 3. The design frequently conflates “possible with custom logic” with “supported by current ADK architecture” @@ -110,26 +120,18 @@ Blocking in callbacks may be acceptable for the first rollout, but the doc must - not long-term SDK behavior - intentionally non-native pending future elicitation/bidi work -### 6. Rewind and custom persistence are underdesigned +### 6. Rewind is deeper than file truncation -The doc treats rewind too much like storage truncation. +The doc treats rewind as storage truncation. The storage part is straightforward — `GcliFileSessionService` wraps the existing `ChatRecordingService` and implements 4 abstract methods (`createSession`, `getSession`, `listSessions`, `deleteSession`). DB-level concerns (locking, stale-writer detection, `PESSIMISTIC_WRITE`) do not apply to a single-user CLI. -But `adk-js` session services in: -- `adk-js/core/src/sessions/base_session_service.ts` -- `adk-js/core/src/sessions/database_session_service.ts` +However, rewind itself is not just storage: +- `adk-js/core/src/agents/processors/request_confirmation_llm_request_processor.ts` scans event history to reconstruct pending confirmations and resume tool execution +- `adk-js/core/src/runner/runner.ts:447` has a TODO acknowledging the event log is used as a transaction log +- Truncating events without rolling back derived state (pending confirmations, app/user state prefixes) will break resumption -show that correctness depends on: -- append ordering -- state merging -- scope splitting -- resumption logic -- stale-writer handling -- locking / concurrency behavior +The doc should define rewind as: event-log truncation + session state recomputation + confirmation/auth state rollback. -The unified conclusion is: -- file-backed persistence is plausible -- but it is not “just write JSON” -- rewind needs explicit semantics for state recomputation and resumption rollback +Note: `BaseSessionService.appendEvent` handles state merging with `app:`, `user:`, `temp:` prefixed keys automatically — this is inherited for free. ### 7. Routing, telemetry, and plan mode are all overclaimed @@ -139,27 +141,30 @@ Routing: - model banning via simulated error is not the same as model routing Telemetry: -- `event.id` is not a safe per-tool correlation key -- ADK can batch multiple function calls or responses in one event +- `event.id` is not a safe per-tool correlation key — use `functionCall.id` +- ADK can batch multiple function calls or responses in one event (`getFunctionCalls` extracts multiple from one event; `mergeParallelFunctionResponseEvents` merges multiple responses) +- token usage is not available on ADK `Event` — it lives on `LlmResponse` and is lost by the time events reach the runner output stream. The doc needs to define how token metrics reach telemetry - the sample APIs in the design do not match current `ClearcutLogger` Plan mode: - current behavior is broader than prompt + tool filtering - policy, approval mode, routing, and config refresh are all part of the behavior -### 8. Retry safety around side effects is missing +### 8. Retry safety around side effects (note) -The “full turn reset” proposal is incomplete without a retry safety model. +The “full turn reset” proposal could duplicate side effects (file writes, shell commands, MCP mutations) if the turn already executed side-effecting tools before the stream failure. This is an ADK-wide concern, not specific to this migration. A brief note acknowledging this limitation is sufficient — a full retry safety taxonomy is out of scope for this design. -The document must define behavior after: -- file writes -- shell commands -- MCP mutations -- subagent actions with external effects +### 9. Design ignores ADK's native `BaseContextCompactor` -Otherwise “retry from the start” can duplicate side effects. +ADK already has a compaction extension point: `BaseContextCompactor` (`adk-js/core/src/context/base_context_compactor.ts`) with `shouldCompact(invocationContext)` + `compact(invocationContext)`. It's wired into the request processor pipeline via `ContextCompactorRequestProcessor` at `llm_agent.ts:407-420`. -### 9. The review should distinguish architecture defects from rollout-status evidence +The design doc proposes running compaction inside `GcliAgentModel.generateContentAsync` instead. This works but bypasses the native slot. Recommendation: implement `BaseContextCompactor`, delegate to the existing `ChatCompressionService` internally. ADK provides the trigger point; your service provides the logic. Note that `BaseContextCompactor` returns void — failure states (COMPRESSED, NOOP, CONTENT_TRUNCATED, etc.) must be communicated via session state or custom events. + +### 10. Existing `isStructuredError` type-guard bug + +`gemini-cli/packages/core/src/agent/event-translator.ts:431-438`: `isStructuredError()` checks only `typeof error === 'object' && 'message' in error && typeof error.message === 'string'`. Since every `Error` instance has `message: string`, plain `Error` objects pass this guard. In `mapError` (lines 390-429), the structured branch runs before `instanceof Error`, so plain errors get incorrect HTTP→gRPC status mapping. This should be fixed before the translator becomes the long-term session boundary. **Fix:** add `'status' in error` to the guard, or check `instanceof Error` first. + +### 11. The review should distinguish architecture defects from rollout-status evidence The interactive PR findings matter: - stacked-branch dependency @@ -191,33 +196,31 @@ Do not make `AgentSession` the innermost engine. - exposes the runtime as `AgentProtocol` / `AgentSession` - owns replay, reattach, `streamId`, `threadId`, `agent_start` / `agent_end`, and top-level event projection -3. `SubagentToolAdapter` -- exposes the same runtime as a callable tool -- projects child activity into parent-facing `tool_update` / `tool_response` or explicit child-thread events -- does not pretend every subagent is a peer top-level chat session unless that is an explicit design decision +3. `SubagentTools` (custom `BaseTool` implementations) +- subagents invoke the same shared runtime core as the main agent +- the only difference is the type mapping at the boundary (projecting child activity into parent-facing `tool_update` / `tool_response` or `threadId`-scoped events) +- `AgentTool` is explicitly rejected — it creates a fully isolated runner/session, which conflicts with the shared-core goal +- custom `BaseTool` wrappers allow subagents to share policy, routing, tool definitions, and config with the parent while keeping the door open for future resumability and richer state sharing 4. `ApprovalBridge` -- phase-1 callback bridge for approvals / elicitation +- phase-1 callback bridge for approvals / elicitation using existing callbacks - clearly documented as temporary and non-native +- eventual migration to full elicitation bidi format -5. `GcliSessionService` -- file-backed persistence, locking, rewind, state recomputation, and crash recovery +5. `GcliFileSessionService` +- file-backed persistence wrapping existing `ChatRecordingService` +- implements 4 abstract methods from `BaseSessionService` +- inherits state-merging (app/user/temp prefixes) for free +- single-writer model — concurrent runs per session are forbidden -### Subagent conclusion +### Subagent architecture decision -The document should not imply that `AgentTool` is the only correct long-term path. +`AgentTool` is not the right abstraction for this migration. It creates an isolated runner with its own `InMemorySessionService`, meaning: +- state is NOT shared with the parent +- events are NOT projected into the parent stream +- policy, routing, and config are NOT inherited -Current ADK TS exposes: -- `FunctionTool` -- `AgentTool` - -These serve different purposes. - -Merged conclusion: -- if the goal is SDK-first subagent invocation over a Gemini CLI-owned modular core, `FunctionTool` or a custom `BaseTool` wrapper is a legitimate phase-1 choice -- if the goal is ADK-native nested-agent semantics, `AgentTool` is the native abstraction - -The doc should describe this as a deliberate architectural choice, not as an accident. +The core requirement is that subagents use the same core loop functionality as the main agent — the only difference is how results are mapped back to the parent. Custom `BaseTool` implementations that invoke the shared `AdkRuntimeCore` achieve this. This also positions subagents for future complexity (resumability, richer state sharing, MCP-scoped tool sets) without fighting AgentTool's isolation model. ### Session conclusion @@ -342,14 +345,14 @@ Required change: ### 4.2 Custom File-Based Persistence -Assessment: high-risk and underdesigned. +Assessment: straightforward for storage, but rewind semantics need work. + +The storage layer itself is low-risk — `GcliFileSessionService` wraps existing `ChatRecordingService` and implements 4 abstract methods. DB concerns (locking, stale-writer, crash recovery) don't apply to a single-user CLI. Required change: -- define on-disk schema -- define lock strategy -- define crash recovery -- define state-scope storage -- define rewind algorithm +- define rewind as event-log truncation + state recomputation (not just file truncation) +- state that concurrent runs per session are forbidden (single-writer) +- define how app/user/temp state prefixes map to existing workspace JSON ### 4.3 Decomposing `AgentLoopContext` @@ -377,104 +380,142 @@ Required change: - define confirmation/auth rollback semantics - define artifact/version rollback expectations -### 5.3 Concurrent Sessions and Locking +### 5.3 Concurrent Sessions -Assessment: real issue, mitigation is missing. +Assessment: a single-user CLI does not need DB-level locking. Required change: -- define file-backed equivalent of locking / single-writer policy -- state whether concurrent runs per session are forbidden -- state where serialization is enforced +- state that concurrent runs per session are forbidden (single-writer model) +- this is sufficient for a CLI — no further locking design needed -## Corrections and Softenings Relative to the Earlier Reviews +## Corrections to Earlier Reviews -These points should be preserved in the unified version so the review stays technically precise. +### Auth is bridgeable, not impossible +Header injection works for unary requests via `llmRequest.config.httpOptions.headers` and constructor-level `headers` param. The criticism is that the design doc overclaims parity — not that auth is blocked. -### 1. `FunctionTool` vs `AgentTool` +### Persistence is simpler than initially claimed +DB-level concerns (PESSIMISTIC_WRITE, stale-writer detection, concurrent append policy) come from `DatabaseSessionService` and don't apply. The file-backed service wraps existing `ChatRecordingService`. Only rewind semantics need deeper design. -The earlier reviews were correct that `FunctionalTool` is the wrong name. - -But the stronger unified position is: -- `FunctionTool` is real and can be the right fit for SDK-first subagent invocation -- `AgentTool` is the native nested-agent abstraction -- the design should choose deliberately between them - -### 2. `AgentTool` criticism should be more precise - -Avoid overstating that `AgentTool` always creates a totally isolated in-memory boundary. - -The more accurate criticism is: -- nested-runner semantics differ from top-level semantics -- session reuse/state propagation are nuanced -- the document must define intended boundaries explicitly - -### 3. Auth criticism should be precise - -Do not say “header injection is impossible.” - -Say: -- request/header mutation exists for unary requests -- that does not justify the broader parity claim currently made in the doc - -### 4. Interactive PR findings should be demoted below the core architecture issues - -Keep: -- branch-stack caveat -- hook-order risk -- `_meta.legacyState` leakage - -But frame them as rollout and protocol-hygiene evidence, not as the single biggest issue. +### Interactive PR findings are rollout-risk evidence, not architecture defects +The branch-stack dependency, hook-order risk, and `_meta.legacyState` leakage in PR #24297 are real but secondary to the core architecture gaps above. ## What To Change Before Principal Review -1. Add an explicit `ADK runtime -> AgentProtocol / AgentSession` translation architecture section. -2. Add a “shared runtime, different adapters” section for top-level agent vs subagent invocation. -3. Replace strong parity language with explicit labels: - - native ADK support - - bridgeable with custom logic - - blocked today -4. Add a persistence section with: - - on-disk schema - - locking - - rewind semantics - - state recomputation - - crash recovery -5. Add a parity matrix with: - - feature - - current owner - - ADK integration point - - implementable now - - bridge workaround - - degraded semantics - - long-term target -6. Add explicit phase-1 non-goals: - - no live/bidi parity - - no resumable approval parity - - no hidden claim of native subagent parity - - no concurrent multi-run semantics beyond defined limits -7. Add success criteria to the migration sequence: - - non-interactive parity - - replay/reattach parity - - approval parity - - interactive UI parity - - subagent activity parity +1. **Fix the 4 compilation blockers** (C1-C4) — these will be the first thing reviewers check +2. **Add an explicit ADK→AgentProtocol translation architecture section** — this is the real design center +3. **Add the shared-core subagent architecture** — custom `BaseTool` over shared runtime, not `AgentTool` +4. **Use `BaseContextCompactor`** for compaction instead of embedding it in the model layer +5. **Replace “100% possible today” and similar language** with explicit labels: native / bridgeable / blocked +6. **Add a parity matrix** (feature | ADK mechanism | status | bridge workaround | long-term target) + +## Phased Migration Plan + +### Phase 0: Foundation (current state → near-term) +**Goal:** Establish the shared runtime core and translation layer. + +- [ ] Fix compilation blockers (C1-C4) in design doc code samples +- [ ] Implement `AdkRuntimeCore` wrapping ADK `Runner` + `LlmAgent` +- [ ] Implement `AdkEventTranslator` (ADK `Event` → gemini-cli `AgentEvent`) +- [ ] Implement `GcliAgentModel extends BaseLlm` with auth header injection and model rewrite +- [ ] Implement `GcliFileSessionService extends BaseSessionService` wrapping `ChatRecordingService` +- [ ] Implement `GcliContextCompactor implements BaseContextCompactor` wrapping `ChatCompressionService` +- [ ] Wire behind `experimental.adk` feature gate + +**Non-goals for Phase 0:** +- No interactive flow changes +- No subagent support +- No live/bidi connections +- No resumable approvals + +**Exit criteria:** Non-interactive flow produces identical output behind feature gate. + +### Phase 1: Non-Interactive Parity +**Goal:** Feature-gated non-interactive flow matches legacy behavior. + +- [ ] Tool output masking via `BaseLlmRequestProcessor` +- [ ] Model routing via `beforeModelCallback` + `RoutingContextBuilder` adapter +- [ ] Policy enforcement via `beforeToolCallback` using existing callbacks (temporary bridge) +- [ ] Quota/availability handling via `beforeModelCallback` +- [ ] Telemetry projection from runner event stream (correlate on `functionCall.id`) +- [ ] Token usage capture (solve the `LlmResponse` → event gap) + +**Non-goals for Phase 1:** +- No interactive UI changes +- No plan mode switching +- No subagents +- Approvals are callback-based, not resumable + +**Exit criteria:** Non-interactive tests pass with `experimental.adk.enabled = true`. Legacy path remains default. + +### Phase 2: Interactive Flow Migration +**Goal:** Interactive flow uses the same ADK runtime behind `TopLevelSessionAdapter`. + +- [ ] `TopLevelSessionAdapter` exposing runtime as `AgentProtocol` / `AgentSession` +- [ ] Stream lifecycle (`agent_start` / `agent_end` / `streamId` ownership) +- [ ] Replay/reattach semantics matching current `AgentSession.stream()` behavior +- [ ] Plan mode via dynamic `BaseToolset` + `InstructionProvider` + mode state +- [ ] `ApprovalBridge` for tool confirmations using existing UI callbacks +- [ ] Elicitation via existing callbacks (not yet bidi) +- [ ] `_meta.legacyState` bridge for UI rendering (documented as temporary) + +**Non-goals for Phase 2:** +- No live/bidi parity (`runLiveImpl` is still a stub in ADK TS) +- No true mid-turn interruption (steering limited to once-per-LLM-call) +- No resumable approvals across process death +- No concurrent multi-stream sessions + +**Exit criteria:** Interactive flow works behind feature gate. Legacy path remains default. + +### Phase 3: Subagents and SDK Readiness +**Goal:** Subagents share the core runtime. The architecture is SDK-reusable. + +- [ ] Custom `BaseTool` subagent wrappers over shared `AdkRuntimeCore` +- [ ] `threadId`-scoped event projection for child activity +- [ ] Shared policy, routing, and config inheritance +- [ ] Remove `_meta.legacyState` — replace with proper event/adapter separation +- [ ] Migrate approval bridge to native ADK elicitation/bidi (when available) +- [ ] Define SDK public API surface + +**Non-goals for Phase 3:** +- Full ADK Dev UI compatibility +- Agent-to-agent transfer via ADK's native mechanism + +**Exit criteria:** Subagent invocation works. Architecture is documented for SDK consumers. ## Positive Signals -The design is not wrong across the board. These parts are promising: +- Feature-gating behind `experimental.adk` is the right rollout pattern +- Tool output masking maps cleanly to request-preprocessing +- Dynamic toolset + `InstructionProvider` for plan mode is a good ADK-native fit +- The phased migration checklist with tracking issues shows good engineering discipline +- The doc correctly identifies 3 known ADK gaps (sections 5.1-5.3) +- `AgentProtocol` / `AgentSession` is the right consumer-facing boundary +- The policy adapter pattern is directionally sound even with reduced phase-1 semantics -- isolating the migration behind flags and separate paths is the right rollout shape -- tool output masking maps relatively well to request-preprocessing -- hybrid tool isolation is aligned with current subagent registry/message-bus behavior -- the introduction of `AgentProtocol` / `AgentSession` is the right boundary; it just needs a much more explicit runtime/translation design behind it +## Concern Decomposition + +| Concern | Level | ADK Mechanism | +|---|---|---| +| Auth headers | Transport | `BaseLlm` constructor / `httpOptions.headers` | +| Model rewrite | Transport | `BaseLlm.generateContentAsync` model override | +| Model routing | Agent | `beforeModelCallback` + `RoutingContextBuilder` | +| Token compaction | Agent | `BaseContextCompactor` (native ADK) | +| Quota/availability | Agent | `beforeModelCallback` | +| Tool masking | Agent | `BaseLlmRequestProcessor` | +| Tool confirmations | Agent+Consumer | `beforeToolCallback` + existing callbacks (phase 1) | +| Telemetry | Consumer | Runner event wrapper (correlate on `functionCall.id`) | +| UI rendering | Consumer | Event subscriber / adapter | +| Subagent invocation | Agent | Custom `BaseTool` over shared runtime core | ## Final Verdict -The design should be reframed as: +The design is architecturally correct in its goal. To be principal-review ready: -- one shared ADK-based runtime core -- one explicit translation layer from ADK runtime semantics to Gemini CLI session semantics -- one temporary callback-based bridge for approvals/elicitation -- different adapters for top-level agent sessions vs subagent-as-tool invocation +1. Fix the 4 compilation blockers +2. Add the ADK→AgentProtocol translation architecture as the design center +3. Adopt native `BaseContextCompactor` instead of model-layer compaction +4. Use custom `BaseTool` subagents over shared core (not `AgentTool`) +5. Downgrade parity claims to match reality +6. Add the phased plan with explicit non-goals per phase -If the doc is revised around that architecture and stops overclaiming what is already supported natively by ADK TS, it can become principal-review ready. +The migration path is: shared runtime core → translation layer → adapters (top-level session, subagent tools, approval bridge). Each phase ships independently behind the feature gate.