Revise unified review with corrections and phased plan

Key changes: - Add compilation blockers (C1-C4) as front-and-center section - Soften "god object" to "extract 2 concerns from orchestrator" - Fix persistence assessment (drop DB concerns, keep rewind depth) - Add BaseContextCompactor finding (native ADK compaction ignored) - Add isStructuredError type-guard bug - Add token metrics gap in telemetry - Replace AgentTool with custom BaseTool subagent architecture - Demote retry safety to a note (ADK-wide, not migration-specific) - Add 4-phase migration plan (Foundation → Non-Interactive → Interactive → SDK) - Add feature-gate note at top - Add concern decomposition table - Simplify concurrent sessions (single-writer is sufficient for CLI) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-15 06:12:50 -07:00 · 2026-04-06 06:35:15 -07:00
parent 53bbc2b339
commit 99267012bc
1 changed files with 194 additions and 153 deletions
@@ -31,7 +31,29 @@ The central architectural insight is right: the migration boundary is:

 That boundary carries stream lifecycle, replay/resume, event projection, approval correlation, and persistence correctness. Until the translation architecture is explicit, parity claims remain unsupported.

-## Merged High-Risk Findings
+## Compilation Blockers (Must Fix First)
+
+These are binary errors — the code samples in the design doc will not compile against current ADK types.
+
+### C1. `LlmRequest` has no `headers` field
+
+`adk-js/core/src/models/llm_request.ts` defines: `model?`, `contents`, `config?`, `liveConnectConfig`, `toolsDict`. The design doc's `request.headers = {...}` will silently fail or not compile. **Fix:** use `llmRequest.config.httpOptions.headers` or pass headers at `Gemini` constructor time.
+
+### C2. `BaseLlm.connect()` is abstract and unimplemented
+
+`adk-js/core/src/models/base_llm.ts:67-78` has TWO abstract methods: `generateContentAsync` and `connect()`. `GcliAgentModel` must implement both. **Fix:** add `connect()` — stub with `throw new Error('not supported')` if live connections are out of scope.
+
+### C3. `BaseToolset` constructor signature is wrong
+
+`adk-js/core/src/tools/base_toolset.ts:46-49`: `constructor(readonly toolFilter: ToolPredicate | string[], readonly prefix?: string)`. The design doc's `super([])` passes an empty array as `toolFilter`, which makes `isToolSelected` return false for all tools — a silent bug. Also `close()` is abstract and must be implemented. **Fix:** match the actual signature.
+
+### C4. `FunctionalTool` does not exist
+
+The design doc references `FunctionalTool` for subagent wrapping. This class does not exist in adk-js. The correct classes are `FunctionTool` (wraps a plain function) and `AgentTool` (wraps an agent with isolated runner). See subagent architecture section below for the recommended approach.
+
+---
+
+## High-Risk Findings

 ### 1. The actual migration boundary is under-specified

@@ -49,26 +71,14 @@ The doc needs a first-class architecture section for:

 Without that, the rest of the design sits on an unstated core contract.

-### 2. The “single consolidated decorator” is overloaded
+### 2. The consolidated decorator needs two concerns extracted

-The `GcliAgentModel.generateContentAsync` sketch is a useful mental model, but today it is carrying too many concerns:
- request mutation
- routing
- availability / fallback prompting
- compaction
- masking
- auth
- approval-adjacent control flow
+The `GcliAgentModel.generateContentAsync` pipeline is a reasonable orchestrator pattern — each concern is a separate injected service call, not a monolithic class. However, two concerns don't belong in the model layer:

-Several of these are not model-layer problems. They are session, orchestration, policy, or UI-integration concerns.
+- **Quota/availability prompting** — involves interactive UI suspension, not request mutation. Should use `beforeModelCallback` on `LlmAgent`.
+- **Policy/approval control flow** — involves blocking for user input. Should use `beforeToolCallback` or the `ApprovalBridge` (see architecture section).

-This should be decomposed into separate layers:
- request shaping
- runtime/session ownership
- event translation
- approval / elicitation bridge
- persistence / rewind service
- telemetry projection
+The remaining concerns (auth header injection, routing/model rewrite, compaction, masking) are legitimate request-preprocessing and can stay in the model pipeline.

 ### 3. The design frequently conflates “possible with custom logic” with “supported by current ADK architecture”

@@ -110,26 +120,18 @@ Blocking in callbacks may be acceptable for the first rollout, but the doc must
 - not long-term SDK behavior
 - intentionally non-native pending future elicitation/bidi work

-### 6. Rewind and custom persistence are underdesigned
+### 6. Rewind is deeper than file truncation

-The doc treats rewind too much like storage truncation.
+The doc treats rewind as storage truncation. The storage part is straightforward — `GcliFileSessionService` wraps the existing `ChatRecordingService` and implements 4 abstract methods (`createSession`, `getSession`, `listSessions`, `deleteSession`). DB-level concerns (locking, stale-writer detection, `PESSIMISTIC_WRITE`) do not apply to a single-user CLI.

-But `adk-js` session services in:
- `adk-js/core/src/sessions/base_session_service.ts`
- `adk-js/core/src/sessions/database_session_service.ts`
+However, rewind itself is not just storage:
+- `adk-js/core/src/agents/processors/request_confirmation_llm_request_processor.ts` scans event history to reconstruct pending confirmations and resume tool execution
+- `adk-js/core/src/runner/runner.ts:447` has a TODO acknowledging the event log is used as a transaction log
+- Truncating events without rolling back derived state (pending confirmations, app/user state prefixes) will break resumption

-show that correctness depends on:
- append ordering
- state merging
- scope splitting
- resumption logic
- stale-writer handling
- locking / concurrency behavior
+The doc should define rewind as: event-log truncation + session state recomputation + confirmation/auth state rollback.

-The unified conclusion is:
- file-backed persistence is plausible
- but it is not “just write JSON”
- rewind needs explicit semantics for state recomputation and resumption rollback
+Note: `BaseSessionService.appendEvent` handles state merging with `app:`, `user:`, `temp:` prefixed keys automatically — this is inherited for free.

 ### 7. Routing, telemetry, and plan mode are all overclaimed

@@ -139,27 +141,30 @@ Routing:
 - model banning via simulated error is not the same as model routing

 Telemetry:
- `event.id` is not a safe per-tool correlation key
- ADK can batch multiple function calls or responses in one event
+- `event.id` is not a safe per-tool correlation key — use `functionCall.id`
+- ADK can batch multiple function calls or responses in one event (`getFunctionCalls` extracts multiple from one event; `mergeParallelFunctionResponseEvents` merges multiple responses)
+- token usage is not available on ADK `Event` — it lives on `LlmResponse` and is lost by the time events reach the runner output stream. The doc needs to define how token metrics reach telemetry
 - the sample APIs in the design do not match current `ClearcutLogger`

 Plan mode:
 - current behavior is broader than prompt + tool filtering
 - policy, approval mode, routing, and config refresh are all part of the behavior

-### 8. Retry safety around side effects is missing
+### 8. Retry safety around side effects (note)

-The “full turn reset” proposal is incomplete without a retry safety model.
+The “full turn reset” proposal could duplicate side effects (file writes, shell commands, MCP mutations) if the turn already executed side-effecting tools before the stream failure. This is an ADK-wide concern, not specific to this migration. A brief note acknowledging this limitation is sufficient — a full retry safety taxonomy is out of scope for this design.

-The document must define behavior after:
- file writes
- shell commands
- MCP mutations
- subagent actions with external effects
+### 9. Design ignores ADK's native `BaseContextCompactor`

-Otherwise “retry from the start” can duplicate side effects.
+ADK already has a compaction extension point: `BaseContextCompactor` (`adk-js/core/src/context/base_context_compactor.ts`) with `shouldCompact(invocationContext)` + `compact(invocationContext)`. It's wired into the request processor pipeline via `ContextCompactorRequestProcessor` at `llm_agent.ts:407-420`.

-### 9. The review should distinguish architecture defects from rollout-status evidence
+The design doc proposes running compaction inside `GcliAgentModel.generateContentAsync` instead. This works but bypasses the native slot. Recommendation: implement `BaseContextCompactor`, delegate to the existing `ChatCompressionService` internally. ADK provides the trigger point; your service provides the logic. Note that `BaseContextCompactor` returns void — failure states (COMPRESSED, NOOP, CONTENT_TRUNCATED, etc.) must be communicated via session state or custom events.
+
+### 10. Existing `isStructuredError` type-guard bug
+
+`gemini-cli/packages/core/src/agent/event-translator.ts:431-438`: `isStructuredError()` checks only `typeof error === 'object' && 'message' in error && typeof error.message === 'string'`. Since every `Error` instance has `message: string`, plain `Error` objects pass this guard. In `mapError` (lines 390-429), the structured branch runs before `instanceof Error`, so plain errors get incorrect HTTP→gRPC status mapping. This should be fixed before the translator becomes the long-term session boundary. **Fix:** add `'status' in error` to the guard, or check `instanceof Error` first.
+
+### 11. The review should distinguish architecture defects from rollout-status evidence

 The interactive PR findings matter:
 - stacked-branch dependency
@@ -191,33 +196,31 @@ Do not make `AgentSession` the innermost engine.
 - exposes the runtime as `AgentProtocol` / `AgentSession`
 - owns replay, reattach, `streamId`, `threadId`, `agent_start` / `agent_end`, and top-level event projection

-3. `SubagentToolAdapter`
- exposes the same runtime as a callable tool
- projects child activity into parent-facing `tool_update` / `tool_response` or explicit child-thread events
- does not pretend every subagent is a peer top-level chat session unless that is an explicit design decision
+3. `SubagentTools` (custom `BaseTool` implementations)
+- subagents invoke the same shared runtime core as the main agent
+- the only difference is the type mapping at the boundary (projecting child activity into parent-facing `tool_update` / `tool_response` or `threadId`-scoped events)
+- `AgentTool` is explicitly rejected — it creates a fully isolated runner/session, which conflicts with the shared-core goal
+- custom `BaseTool` wrappers allow subagents to share policy, routing, tool definitions, and config with the parent while keeping the door open for future resumability and richer state sharing

 4. `ApprovalBridge`
- phase-1 callback bridge for approvals / elicitation
+- phase-1 callback bridge for approvals / elicitation using existing callbacks
 - clearly documented as temporary and non-native
+- eventual migration to full elicitation bidi format

-5. `GcliSessionService`
- file-backed persistence, locking, rewind, state recomputation, and crash recovery
+5. `GcliFileSessionService`
+- file-backed persistence wrapping existing `ChatRecordingService`
+- implements 4 abstract methods from `BaseSessionService`
+- inherits state-merging (app/user/temp prefixes) for free
+- single-writer model — concurrent runs per session are forbidden

-### Subagent conclusion
+### Subagent architecture decision

-The document should not imply that `AgentTool` is the only correct long-term path.
+`AgentTool` is not the right abstraction for this migration. It creates an isolated runner with its own `InMemorySessionService`, meaning:
+- state is NOT shared with the parent
+- events are NOT projected into the parent stream
+- policy, routing, and config are NOT inherited

-Current ADK TS exposes:
- `FunctionTool`
- `AgentTool`
-
-These serve different purposes.
-
-Merged conclusion:
- if the goal is SDK-first subagent invocation over a Gemini CLI-owned modular core, `FunctionTool` or a custom `BaseTool` wrapper is a legitimate phase-1 choice
- if the goal is ADK-native nested-agent semantics, `AgentTool` is the native abstraction
-
-The doc should describe this as a deliberate architectural choice, not as an accident.
+The core requirement is that subagents use the same core loop functionality as the main agent — the only difference is how results are mapped back to the parent. Custom `BaseTool` implementations that invoke the shared `AdkRuntimeCore` achieve this. This also positions subagents for future complexity (resumability, richer state sharing, MCP-scoped tool sets) without fighting AgentTool's isolation model.

 ### Session conclusion

@@ -342,14 +345,14 @@ Required change:

 ### 4.2 Custom File-Based Persistence

-Assessment: high-risk and underdesigned.
+Assessment: straightforward for storage, but rewind semantics need work.
+
+The storage layer itself is low-risk — `GcliFileSessionService` wraps existing `ChatRecordingService` and implements 4 abstract methods. DB concerns (locking, stale-writer, crash recovery) don't apply to a single-user CLI.

 Required change:
- define on-disk schema
- define lock strategy
- define crash recovery
- define state-scope storage
- define rewind algorithm
+- define rewind as event-log truncation + state recomputation (not just file truncation)
+- state that concurrent runs per session are forbidden (single-writer)
+- define how app/user/temp state prefixes map to existing workspace JSON

 ### 4.3 Decomposing `AgentLoopContext`

@@ -377,104 +380,142 @@ Required change:
 - define confirmation/auth rollback semantics
 - define artifact/version rollback expectations

-### 5.3 Concurrent Sessions and Locking
+### 5.3 Concurrent Sessions

-Assessment: real issue, mitigation is missing.
+Assessment: a single-user CLI does not need DB-level locking.

 Required change:
- define file-backed equivalent of locking / single-writer policy
- state whether concurrent runs per session are forbidden
- state where serialization is enforced
+- state that concurrent runs per session are forbidden (single-writer model)
+- this is sufficient for a CLI — no further locking design needed

-## Corrections and Softenings Relative to the Earlier Reviews
+## Corrections to Earlier Reviews

-These points should be preserved in the unified version so the review stays technically precise.
+### Auth is bridgeable, not impossible
+Header injection works for unary requests via `llmRequest.config.httpOptions.headers` and constructor-level `headers` param. The criticism is that the design doc overclaims parity — not that auth is blocked.

-### 1. `FunctionTool` vs `AgentTool`
+### Persistence is simpler than initially claimed
+DB-level concerns (PESSIMISTIC_WRITE, stale-writer detection, concurrent append policy) come from `DatabaseSessionService` and don't apply. The file-backed service wraps existing `ChatRecordingService`. Only rewind semantics need deeper design.

-The earlier reviews were correct that `FunctionalTool` is the wrong name.
-
-But the stronger unified position is:
- `FunctionTool` is real and can be the right fit for SDK-first subagent invocation
- `AgentTool` is the native nested-agent abstraction
- the design should choose deliberately between them
-
-### 2. `AgentTool` criticism should be more precise
-
-Avoid overstating that `AgentTool` always creates a totally isolated in-memory boundary.
-
-The more accurate criticism is:
- nested-runner semantics differ from top-level semantics
- session reuse/state propagation are nuanced
- the document must define intended boundaries explicitly
-
-### 3. Auth criticism should be precise
-
-Do not say “header injection is impossible.”
-
-Say:
- request/header mutation exists for unary requests
- that does not justify the broader parity claim currently made in the doc
-
-### 4. Interactive PR findings should be demoted below the core architecture issues
-
-Keep:
- branch-stack caveat
- hook-order risk
- `_meta.legacyState` leakage
-
-But frame them as rollout and protocol-hygiene evidence, not as the single biggest issue.
+### Interactive PR findings are rollout-risk evidence, not architecture defects
+The branch-stack dependency, hook-order risk, and `_meta.legacyState` leakage in PR #24297 are real but secondary to the core architecture gaps above.

 ## What To Change Before Principal Review

-1. Add an explicit `ADK runtime -> AgentProtocol / AgentSession` translation architecture section.
-2. Add a “shared runtime, different adapters” section for top-level agent vs subagent invocation.
-3. Replace strong parity language with explicit labels:
-   - native ADK support
-   - bridgeable with custom logic
-   - blocked today
-4. Add a persistence section with:
-   - on-disk schema
-   - locking
-   - rewind semantics
-   - state recomputation
-   - crash recovery
-5. Add a parity matrix with:
-   - feature
-   - current owner
-   - ADK integration point
-   - implementable now
-   - bridge workaround
-   - degraded semantics
-   - long-term target
-6. Add explicit phase-1 non-goals:
-   - no live/bidi parity
-   - no resumable approval parity
-   - no hidden claim of native subagent parity
-   - no concurrent multi-run semantics beyond defined limits
-7. Add success criteria to the migration sequence:
-   - non-interactive parity
-   - replay/reattach parity
-   - approval parity
-   - interactive UI parity
-   - subagent activity parity
+1. **Fix the 4 compilation blockers** (C1-C4) — these will be the first thing reviewers check
+2. **Add an explicit ADK→AgentProtocol translation architecture section** — this is the real design center
+3. **Add the shared-core subagent architecture** — custom `BaseTool` over shared runtime, not `AgentTool`
+4. **Use `BaseContextCompactor`** for compaction instead of embedding it in the model layer
+5. **Replace “100% possible today” and similar language** with explicit labels: native / bridgeable / blocked
+6. **Add a parity matrix** (feature | ADK mechanism | status | bridge workaround | long-term target)
+
+## Phased Migration Plan
+
+### Phase 0: Foundation (current state → near-term)
+**Goal:** Establish the shared runtime core and translation layer.
+
+- [ ] Fix compilation blockers (C1-C4) in design doc code samples
+- [ ] Implement `AdkRuntimeCore` wrapping ADK `Runner` + `LlmAgent`
+- [ ] Implement `AdkEventTranslator` (ADK `Event` → gemini-cli `AgentEvent`)
+- [ ] Implement `GcliAgentModel extends BaseLlm` with auth header injection and model rewrite
+- [ ] Implement `GcliFileSessionService extends BaseSessionService` wrapping `ChatRecordingService`
+- [ ] Implement `GcliContextCompactor implements BaseContextCompactor` wrapping `ChatCompressionService`
+- [ ] Wire behind `experimental.adk` feature gate
+
+**Non-goals for Phase 0:**
+- No interactive flow changes
+- No subagent support
+- No live/bidi connections
+- No resumable approvals
+
+**Exit criteria:** Non-interactive flow produces identical output behind feature gate.
+
+### Phase 1: Non-Interactive Parity
+**Goal:** Feature-gated non-interactive flow matches legacy behavior.
+
+- [ ] Tool output masking via `BaseLlmRequestProcessor`
+- [ ] Model routing via `beforeModelCallback` + `RoutingContextBuilder` adapter
+- [ ] Policy enforcement via `beforeToolCallback` using existing callbacks (temporary bridge)
+- [ ] Quota/availability handling via `beforeModelCallback`
+- [ ] Telemetry projection from runner event stream (correlate on `functionCall.id`)
+- [ ] Token usage capture (solve the `LlmResponse` → event gap)
+
+**Non-goals for Phase 1:**
+- No interactive UI changes
+- No plan mode switching
+- No subagents
+- Approvals are callback-based, not resumable
+
+**Exit criteria:** Non-interactive tests pass with `experimental.adk.enabled = true`. Legacy path remains default.
+
+### Phase 2: Interactive Flow Migration
+**Goal:** Interactive flow uses the same ADK runtime behind `TopLevelSessionAdapter`.
+
+- [ ] `TopLevelSessionAdapter` exposing runtime as `AgentProtocol` / `AgentSession`
+- [ ] Stream lifecycle (`agent_start` / `agent_end` / `streamId` ownership)
+- [ ] Replay/reattach semantics matching current `AgentSession.stream()` behavior
+- [ ] Plan mode via dynamic `BaseToolset` + `InstructionProvider` + mode state
+- [ ] `ApprovalBridge` for tool confirmations using existing UI callbacks
+- [ ] Elicitation via existing callbacks (not yet bidi)
+- [ ] `_meta.legacyState` bridge for UI rendering (documented as temporary)
+
+**Non-goals for Phase 2:**
+- No live/bidi parity (`runLiveImpl` is still a stub in ADK TS)
+- No true mid-turn interruption (steering limited to once-per-LLM-call)
+- No resumable approvals across process death
+- No concurrent multi-stream sessions
+
+**Exit criteria:** Interactive flow works behind feature gate. Legacy path remains default.
+
+### Phase 3: Subagents and SDK Readiness
+**Goal:** Subagents share the core runtime. The architecture is SDK-reusable.
+
+- [ ] Custom `BaseTool` subagent wrappers over shared `AdkRuntimeCore`
+- [ ] `threadId`-scoped event projection for child activity
+- [ ] Shared policy, routing, and config inheritance
+- [ ] Remove `_meta.legacyState` — replace with proper event/adapter separation
+- [ ] Migrate approval bridge to native ADK elicitation/bidi (when available)
+- [ ] Define SDK public API surface
+
+**Non-goals for Phase 3:**
+- Full ADK Dev UI compatibility
+- Agent-to-agent transfer via ADK's native mechanism
+
+**Exit criteria:** Subagent invocation works. Architecture is documented for SDK consumers.

 ## Positive Signals

-The design is not wrong across the board. These parts are promising:
+- Feature-gating behind `experimental.adk` is the right rollout pattern
+- Tool output masking maps cleanly to request-preprocessing
+- Dynamic toolset + `InstructionProvider` for plan mode is a good ADK-native fit
+- The phased migration checklist with tracking issues shows good engineering discipline
+- The doc correctly identifies 3 known ADK gaps (sections 5.1-5.3)
+- `AgentProtocol` / `AgentSession` is the right consumer-facing boundary
+- The policy adapter pattern is directionally sound even with reduced phase-1 semantics

- isolating the migration behind flags and separate paths is the right rollout shape
- tool output masking maps relatively well to request-preprocessing
- hybrid tool isolation is aligned with current subagent registry/message-bus behavior
- the introduction of `AgentProtocol` / `AgentSession` is the right boundary; it just needs a much more explicit runtime/translation design behind it
+## Concern Decomposition
+
+| Concern | Level | ADK Mechanism |
+|---|---|---|
+| Auth headers | Transport | `BaseLlm` constructor / `httpOptions.headers` |
+| Model rewrite | Transport | `BaseLlm.generateContentAsync` model override |
+| Model routing | Agent | `beforeModelCallback` + `RoutingContextBuilder` |
+| Token compaction | Agent | `BaseContextCompactor` (native ADK) |
+| Quota/availability | Agent | `beforeModelCallback` |
+| Tool masking | Agent | `BaseLlmRequestProcessor` |
+| Tool confirmations | Agent+Consumer | `beforeToolCallback` + existing callbacks (phase 1) |
+| Telemetry | Consumer | Runner event wrapper (correlate on `functionCall.id`) |
+| UI rendering | Consumer | Event subscriber / adapter |
+| Subagent invocation | Agent | Custom `BaseTool` over shared runtime core |

 ## Final Verdict

-The design should be reframed as:
+The design is architecturally correct in its goal. To be principal-review ready:

- one shared ADK-based runtime core
- one explicit translation layer from ADK runtime semantics to Gemini CLI session semantics
- one temporary callback-based bridge for approvals/elicitation
- different adapters for top-level agent sessions vs subagent-as-tool invocation
+1. Fix the 4 compilation blockers
+2. Add the ADK→AgentProtocol translation architecture as the design center
+3. Adopt native `BaseContextCompactor` instead of model-layer compaction
+4. Use custom `BaseTool` subagents over shared core (not `AgentTool`)
+5. Downgrade parity claims to match reality
+6. Add the phased plan with explicit non-goals per phase

-If the doc is revised around that architecture and stops overclaiming what is already supported natively by ADK TS, it can become principal-review ready.
+The migration path is: shared runtime core → translation layer → adapters (top-level session, subagent tools, approval bridge). Each phase ships independently behind the feature gate.