Revise unified review with corrections and phased plan

Key changes:
- Add compilation blockers (C1-C4) as front-and-center section
- Soften "god object" to "extract 2 concerns from orchestrator"
- Fix persistence assessment (drop DB concerns, keep rewind depth)
- Add BaseContextCompactor finding (native ADK compaction ignored)
- Add isStructuredError type-guard bug
- Add token metrics gap in telemetry
- Replace AgentTool with custom BaseTool subagent architecture
- Demote retry safety to a note (ADK-wide, not migration-specific)
- Add 4-phase migration plan (Foundation → Non-Interactive → Interactive → SDK)
- Add feature-gate note at top
- Add concern decomposition table
- Simplify concurrent sessions (single-writer is sufficient for CLI)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
Adam Weidman
2026-04-06 06:35:15 -07:00
parent 53bbc2b339
commit 99267012bc
+194 -153
View File
@@ -31,7 +31,29 @@ The central architectural insight is right: the migration boundary is:
That boundary carries stream lifecycle, replay/resume, event projection, approval correlation, and persistence correctness. Until the translation architecture is explicit, parity claims remain unsupported.
## Merged High-Risk Findings
## Compilation Blockers (Must Fix First)
These are binary errors — the code samples in the design doc will not compile against current ADK types.
### C1. `LlmRequest` has no `headers` field
`adk-js/core/src/models/llm_request.ts` defines: `model?`, `contents`, `config?`, `liveConnectConfig`, `toolsDict`. The design doc's `request.headers = {...}` will silently fail or not compile. **Fix:** use `llmRequest.config.httpOptions.headers` or pass headers at `Gemini` constructor time.
### C2. `BaseLlm.connect()` is abstract and unimplemented
`adk-js/core/src/models/base_llm.ts:67-78` has TWO abstract methods: `generateContentAsync` and `connect()`. `GcliAgentModel` must implement both. **Fix:** add `connect()` — stub with `throw new Error('not supported')` if live connections are out of scope.
### C3. `BaseToolset` constructor signature is wrong
`adk-js/core/src/tools/base_toolset.ts:46-49`: `constructor(readonly toolFilter: ToolPredicate | string[], readonly prefix?: string)`. The design doc's `super([])` passes an empty array as `toolFilter`, which makes `isToolSelected` return false for all tools — a silent bug. Also `close()` is abstract and must be implemented. **Fix:** match the actual signature.
### C4. `FunctionalTool` does not exist
The design doc references `FunctionalTool` for subagent wrapping. This class does not exist in adk-js. The correct classes are `FunctionTool` (wraps a plain function) and `AgentTool` (wraps an agent with isolated runner). See subagent architecture section below for the recommended approach.
---
## High-Risk Findings
### 1. The actual migration boundary is under-specified
@@ -49,26 +71,14 @@ The doc needs a first-class architecture section for:
Without that, the rest of the design sits on an unstated core contract.
### 2. The “single consolidated decorator” is overloaded
### 2. The consolidated decorator needs two concerns extracted
The `GcliAgentModel.generateContentAsync` sketch is a useful mental model, but today it is carrying too many concerns:
- request mutation
- routing
- availability / fallback prompting
- compaction
- masking
- auth
- approval-adjacent control flow
The `GcliAgentModel.generateContentAsync` pipeline is a reasonable orchestrator pattern — each concern is a separate injected service call, not a monolithic class. However, two concerns don't belong in the model layer:
Several of these are not model-layer problems. They are session, orchestration, policy, or UI-integration concerns.
- **Quota/availability prompting** — involves interactive UI suspension, not request mutation. Should use `beforeModelCallback` on `LlmAgent`.
- **Policy/approval control flow** — involves blocking for user input. Should use `beforeToolCallback` or the `ApprovalBridge` (see architecture section).
This should be decomposed into separate layers:
- request shaping
- runtime/session ownership
- event translation
- approval / elicitation bridge
- persistence / rewind service
- telemetry projection
The remaining concerns (auth header injection, routing/model rewrite, compaction, masking) are legitimate request-preprocessing and can stay in the model pipeline.
### 3. The design frequently conflates “possible with custom logic” with “supported by current ADK architecture”
@@ -110,26 +120,18 @@ Blocking in callbacks may be acceptable for the first rollout, but the doc must
- not long-term SDK behavior
- intentionally non-native pending future elicitation/bidi work
### 6. Rewind and custom persistence are underdesigned
### 6. Rewind is deeper than file truncation
The doc treats rewind too much like storage truncation.
The doc treats rewind as storage truncation. The storage part is straightforward — `GcliFileSessionService` wraps the existing `ChatRecordingService` and implements 4 abstract methods (`createSession`, `getSession`, `listSessions`, `deleteSession`). DB-level concerns (locking, stale-writer detection, `PESSIMISTIC_WRITE`) do not apply to a single-user CLI.
But `adk-js` session services in:
- `adk-js/core/src/sessions/base_session_service.ts`
- `adk-js/core/src/sessions/database_session_service.ts`
However, rewind itself is not just storage:
- `adk-js/core/src/agents/processors/request_confirmation_llm_request_processor.ts` scans event history to reconstruct pending confirmations and resume tool execution
- `adk-js/core/src/runner/runner.ts:447` has a TODO acknowledging the event log is used as a transaction log
- Truncating events without rolling back derived state (pending confirmations, app/user state prefixes) will break resumption
show that correctness depends on:
- append ordering
- state merging
- scope splitting
- resumption logic
- stale-writer handling
- locking / concurrency behavior
The doc should define rewind as: event-log truncation + session state recomputation + confirmation/auth state rollback.
The unified conclusion is:
- file-backed persistence is plausible
- but it is not “just write JSON”
- rewind needs explicit semantics for state recomputation and resumption rollback
Note: `BaseSessionService.appendEvent` handles state merging with `app:`, `user:`, `temp:` prefixed keys automatically — this is inherited for free.
### 7. Routing, telemetry, and plan mode are all overclaimed
@@ -139,27 +141,30 @@ Routing:
- model banning via simulated error is not the same as model routing
Telemetry:
- `event.id` is not a safe per-tool correlation key
- ADK can batch multiple function calls or responses in one event
- `event.id` is not a safe per-tool correlation key — use `functionCall.id`
- ADK can batch multiple function calls or responses in one event (`getFunctionCalls` extracts multiple from one event; `mergeParallelFunctionResponseEvents` merges multiple responses)
- token usage is not available on ADK `Event` — it lives on `LlmResponse` and is lost by the time events reach the runner output stream. The doc needs to define how token metrics reach telemetry
- the sample APIs in the design do not match current `ClearcutLogger`
Plan mode:
- current behavior is broader than prompt + tool filtering
- policy, approval mode, routing, and config refresh are all part of the behavior
### 8. Retry safety around side effects is missing
### 8. Retry safety around side effects (note)
The “full turn reset” proposal is incomplete without a retry safety model.
The “full turn reset” proposal could duplicate side effects (file writes, shell commands, MCP mutations) if the turn already executed side-effecting tools before the stream failure. This is an ADK-wide concern, not specific to this migration. A brief note acknowledging this limitation is sufficient — a full retry safety taxonomy is out of scope for this design.
The document must define behavior after:
- file writes
- shell commands
- MCP mutations
- subagent actions with external effects
### 9. Design ignores ADK's native `BaseContextCompactor`
Otherwise “retry from the start” can duplicate side effects.
ADK already has a compaction extension point: `BaseContextCompactor` (`adk-js/core/src/context/base_context_compactor.ts`) with `shouldCompact(invocationContext)` + `compact(invocationContext)`. It's wired into the request processor pipeline via `ContextCompactorRequestProcessor` at `llm_agent.ts:407-420`.
### 9. The review should distinguish architecture defects from rollout-status evidence
The design doc proposes running compaction inside `GcliAgentModel.generateContentAsync` instead. This works but bypasses the native slot. Recommendation: implement `BaseContextCompactor`, delegate to the existing `ChatCompressionService` internally. ADK provides the trigger point; your service provides the logic. Note that `BaseContextCompactor` returns void — failure states (COMPRESSED, NOOP, CONTENT_TRUNCATED, etc.) must be communicated via session state or custom events.
### 10. Existing `isStructuredError` type-guard bug
`gemini-cli/packages/core/src/agent/event-translator.ts:431-438`: `isStructuredError()` checks only `typeof error === 'object' && 'message' in error && typeof error.message === 'string'`. Since every `Error` instance has `message: string`, plain `Error` objects pass this guard. In `mapError` (lines 390-429), the structured branch runs before `instanceof Error`, so plain errors get incorrect HTTP→gRPC status mapping. This should be fixed before the translator becomes the long-term session boundary. **Fix:** add `'status' in error` to the guard, or check `instanceof Error` first.
### 11. The review should distinguish architecture defects from rollout-status evidence
The interactive PR findings matter:
- stacked-branch dependency
@@ -191,33 +196,31 @@ Do not make `AgentSession` the innermost engine.
- exposes the runtime as `AgentProtocol` / `AgentSession`
- owns replay, reattach, `streamId`, `threadId`, `agent_start` / `agent_end`, and top-level event projection
3. `SubagentToolAdapter`
- exposes the same runtime as a callable tool
- projects child activity into parent-facing `tool_update` / `tool_response` or explicit child-thread events
- does not pretend every subagent is a peer top-level chat session unless that is an explicit design decision
3. `SubagentTools` (custom `BaseTool` implementations)
- subagents invoke the same shared runtime core as the main agent
- the only difference is the type mapping at the boundary (projecting child activity into parent-facing `tool_update` / `tool_response` or `threadId`-scoped events)
- `AgentTool` is explicitly rejected — it creates a fully isolated runner/session, which conflicts with the shared-core goal
- custom `BaseTool` wrappers allow subagents to share policy, routing, tool definitions, and config with the parent while keeping the door open for future resumability and richer state sharing
4. `ApprovalBridge`
- phase-1 callback bridge for approvals / elicitation
- phase-1 callback bridge for approvals / elicitation using existing callbacks
- clearly documented as temporary and non-native
- eventual migration to full elicitation bidi format
5. `GcliSessionService`
- file-backed persistence, locking, rewind, state recomputation, and crash recovery
5. `GcliFileSessionService`
- file-backed persistence wrapping existing `ChatRecordingService`
- implements 4 abstract methods from `BaseSessionService`
- inherits state-merging (app/user/temp prefixes) for free
- single-writer model — concurrent runs per session are forbidden
### Subagent conclusion
### Subagent architecture decision
The document should not imply that `AgentTool` is the only correct long-term path.
`AgentTool` is not the right abstraction for this migration. It creates an isolated runner with its own `InMemorySessionService`, meaning:
- state is NOT shared with the parent
- events are NOT projected into the parent stream
- policy, routing, and config are NOT inherited
Current ADK TS exposes:
- `FunctionTool`
- `AgentTool`
These serve different purposes.
Merged conclusion:
- if the goal is SDK-first subagent invocation over a Gemini CLI-owned modular core, `FunctionTool` or a custom `BaseTool` wrapper is a legitimate phase-1 choice
- if the goal is ADK-native nested-agent semantics, `AgentTool` is the native abstraction
The doc should describe this as a deliberate architectural choice, not as an accident.
The core requirement is that subagents use the same core loop functionality as the main agent — the only difference is how results are mapped back to the parent. Custom `BaseTool` implementations that invoke the shared `AdkRuntimeCore` achieve this. This also positions subagents for future complexity (resumability, richer state sharing, MCP-scoped tool sets) without fighting AgentTool's isolation model.
### Session conclusion
@@ -342,14 +345,14 @@ Required change:
### 4.2 Custom File-Based Persistence
Assessment: high-risk and underdesigned.
Assessment: straightforward for storage, but rewind semantics need work.
The storage layer itself is low-risk — `GcliFileSessionService` wraps existing `ChatRecordingService` and implements 4 abstract methods. DB concerns (locking, stale-writer, crash recovery) don't apply to a single-user CLI.
Required change:
- define on-disk schema
- define lock strategy
- define crash recovery
- define state-scope storage
- define rewind algorithm
- define rewind as event-log truncation + state recomputation (not just file truncation)
- state that concurrent runs per session are forbidden (single-writer)
- define how app/user/temp state prefixes map to existing workspace JSON
### 4.3 Decomposing `AgentLoopContext`
@@ -377,104 +380,142 @@ Required change:
- define confirmation/auth rollback semantics
- define artifact/version rollback expectations
### 5.3 Concurrent Sessions and Locking
### 5.3 Concurrent Sessions
Assessment: real issue, mitigation is missing.
Assessment: a single-user CLI does not need DB-level locking.
Required change:
- define file-backed equivalent of locking / single-writer policy
- state whether concurrent runs per session are forbidden
- state where serialization is enforced
- state that concurrent runs per session are forbidden (single-writer model)
- this is sufficient for a CLI — no further locking design needed
## Corrections and Softenings Relative to the Earlier Reviews
## Corrections to Earlier Reviews
These points should be preserved in the unified version so the review stays technically precise.
### Auth is bridgeable, not impossible
Header injection works for unary requests via `llmRequest.config.httpOptions.headers` and constructor-level `headers` param. The criticism is that the design doc overclaims parity — not that auth is blocked.
### 1. `FunctionTool` vs `AgentTool`
### Persistence is simpler than initially claimed
DB-level concerns (PESSIMISTIC_WRITE, stale-writer detection, concurrent append policy) come from `DatabaseSessionService` and don't apply. The file-backed service wraps existing `ChatRecordingService`. Only rewind semantics need deeper design.
The earlier reviews were correct that `FunctionalTool` is the wrong name.
But the stronger unified position is:
- `FunctionTool` is real and can be the right fit for SDK-first subagent invocation
- `AgentTool` is the native nested-agent abstraction
- the design should choose deliberately between them
### 2. `AgentTool` criticism should be more precise
Avoid overstating that `AgentTool` always creates a totally isolated in-memory boundary.
The more accurate criticism is:
- nested-runner semantics differ from top-level semantics
- session reuse/state propagation are nuanced
- the document must define intended boundaries explicitly
### 3. Auth criticism should be precise
Do not say “header injection is impossible.”
Say:
- request/header mutation exists for unary requests
- that does not justify the broader parity claim currently made in the doc
### 4. Interactive PR findings should be demoted below the core architecture issues
Keep:
- branch-stack caveat
- hook-order risk
- `_meta.legacyState` leakage
But frame them as rollout and protocol-hygiene evidence, not as the single biggest issue.
### Interactive PR findings are rollout-risk evidence, not architecture defects
The branch-stack dependency, hook-order risk, and `_meta.legacyState` leakage in PR #24297 are real but secondary to the core architecture gaps above.
## What To Change Before Principal Review
1. Add an explicit `ADK runtime -> AgentProtocol / AgentSession` translation architecture section.
2. Add a “shared runtime, different adapters” section for top-level agent vs subagent invocation.
3. Replace strong parity language with explicit labels:
- native ADK support
- bridgeable with custom logic
- blocked today
4. Add a persistence section with:
- on-disk schema
- locking
- rewind semantics
- state recomputation
- crash recovery
5. Add a parity matrix with:
- feature
- current owner
- ADK integration point
- implementable now
- bridge workaround
- degraded semantics
- long-term target
6. Add explicit phase-1 non-goals:
- no live/bidi parity
- no resumable approval parity
- no hidden claim of native subagent parity
- no concurrent multi-run semantics beyond defined limits
7. Add success criteria to the migration sequence:
- non-interactive parity
- replay/reattach parity
- approval parity
- interactive UI parity
- subagent activity parity
1. **Fix the 4 compilation blockers** (C1-C4) — these will be the first thing reviewers check
2. **Add an explicit ADK→AgentProtocol translation architecture section** — this is the real design center
3. **Add the shared-core subagent architecture** — custom `BaseTool` over shared runtime, not `AgentTool`
4. **Use `BaseContextCompactor`** for compaction instead of embedding it in the model layer
5. **Replace “100% possible today” and similar language** with explicit labels: native / bridgeable / blocked
6. **Add a parity matrix** (feature | ADK mechanism | status | bridge workaround | long-term target)
## Phased Migration Plan
### Phase 0: Foundation (current state → near-term)
**Goal:** Establish the shared runtime core and translation layer.
- [ ] Fix compilation blockers (C1-C4) in design doc code samples
- [ ] Implement `AdkRuntimeCore` wrapping ADK `Runner` + `LlmAgent`
- [ ] Implement `AdkEventTranslator` (ADK `Event` → gemini-cli `AgentEvent`)
- [ ] Implement `GcliAgentModel extends BaseLlm` with auth header injection and model rewrite
- [ ] Implement `GcliFileSessionService extends BaseSessionService` wrapping `ChatRecordingService`
- [ ] Implement `GcliContextCompactor implements BaseContextCompactor` wrapping `ChatCompressionService`
- [ ] Wire behind `experimental.adk` feature gate
**Non-goals for Phase 0:**
- No interactive flow changes
- No subagent support
- No live/bidi connections
- No resumable approvals
**Exit criteria:** Non-interactive flow produces identical output behind feature gate.
### Phase 1: Non-Interactive Parity
**Goal:** Feature-gated non-interactive flow matches legacy behavior.
- [ ] Tool output masking via `BaseLlmRequestProcessor`
- [ ] Model routing via `beforeModelCallback` + `RoutingContextBuilder` adapter
- [ ] Policy enforcement via `beforeToolCallback` using existing callbacks (temporary bridge)
- [ ] Quota/availability handling via `beforeModelCallback`
- [ ] Telemetry projection from runner event stream (correlate on `functionCall.id`)
- [ ] Token usage capture (solve the `LlmResponse` → event gap)
**Non-goals for Phase 1:**
- No interactive UI changes
- No plan mode switching
- No subagents
- Approvals are callback-based, not resumable
**Exit criteria:** Non-interactive tests pass with `experimental.adk.enabled = true`. Legacy path remains default.
### Phase 2: Interactive Flow Migration
**Goal:** Interactive flow uses the same ADK runtime behind `TopLevelSessionAdapter`.
- [ ] `TopLevelSessionAdapter` exposing runtime as `AgentProtocol` / `AgentSession`
- [ ] Stream lifecycle (`agent_start` / `agent_end` / `streamId` ownership)
- [ ] Replay/reattach semantics matching current `AgentSession.stream()` behavior
- [ ] Plan mode via dynamic `BaseToolset` + `InstructionProvider` + mode state
- [ ] `ApprovalBridge` for tool confirmations using existing UI callbacks
- [ ] Elicitation via existing callbacks (not yet bidi)
- [ ] `_meta.legacyState` bridge for UI rendering (documented as temporary)
**Non-goals for Phase 2:**
- No live/bidi parity (`runLiveImpl` is still a stub in ADK TS)
- No true mid-turn interruption (steering limited to once-per-LLM-call)
- No resumable approvals across process death
- No concurrent multi-stream sessions
**Exit criteria:** Interactive flow works behind feature gate. Legacy path remains default.
### Phase 3: Subagents and SDK Readiness
**Goal:** Subagents share the core runtime. The architecture is SDK-reusable.
- [ ] Custom `BaseTool` subagent wrappers over shared `AdkRuntimeCore`
- [ ] `threadId`-scoped event projection for child activity
- [ ] Shared policy, routing, and config inheritance
- [ ] Remove `_meta.legacyState` — replace with proper event/adapter separation
- [ ] Migrate approval bridge to native ADK elicitation/bidi (when available)
- [ ] Define SDK public API surface
**Non-goals for Phase 3:**
- Full ADK Dev UI compatibility
- Agent-to-agent transfer via ADK's native mechanism
**Exit criteria:** Subagent invocation works. Architecture is documented for SDK consumers.
## Positive Signals
The design is not wrong across the board. These parts are promising:
- Feature-gating behind `experimental.adk` is the right rollout pattern
- Tool output masking maps cleanly to request-preprocessing
- Dynamic toolset + `InstructionProvider` for plan mode is a good ADK-native fit
- The phased migration checklist with tracking issues shows good engineering discipline
- The doc correctly identifies 3 known ADK gaps (sections 5.1-5.3)
- `AgentProtocol` / `AgentSession` is the right consumer-facing boundary
- The policy adapter pattern is directionally sound even with reduced phase-1 semantics
- isolating the migration behind flags and separate paths is the right rollout shape
- tool output masking maps relatively well to request-preprocessing
- hybrid tool isolation is aligned with current subagent registry/message-bus behavior
- the introduction of `AgentProtocol` / `AgentSession` is the right boundary; it just needs a much more explicit runtime/translation design behind it
## Concern Decomposition
| Concern | Level | ADK Mechanism |
|---|---|---|
| Auth headers | Transport | `BaseLlm` constructor / `httpOptions.headers` |
| Model rewrite | Transport | `BaseLlm.generateContentAsync` model override |
| Model routing | Agent | `beforeModelCallback` + `RoutingContextBuilder` |
| Token compaction | Agent | `BaseContextCompactor` (native ADK) |
| Quota/availability | Agent | `beforeModelCallback` |
| Tool masking | Agent | `BaseLlmRequestProcessor` |
| Tool confirmations | Agent+Consumer | `beforeToolCallback` + existing callbacks (phase 1) |
| Telemetry | Consumer | Runner event wrapper (correlate on `functionCall.id`) |
| UI rendering | Consumer | Event subscriber / adapter |
| Subagent invocation | Agent | Custom `BaseTool` over shared runtime core |
## Final Verdict
The design should be reframed as:
The design is architecturally correct in its goal. To be principal-review ready:
- one shared ADK-based runtime core
- one explicit translation layer from ADK runtime semantics to Gemini CLI session semantics
- one temporary callback-based bridge for approvals/elicitation
- different adapters for top-level agent sessions vs subagent-as-tool invocation
1. Fix the 4 compilation blockers
2. Add the ADK→AgentProtocol translation architecture as the design center
3. Adopt native `BaseContextCompactor` instead of model-layer compaction
4. Use custom `BaseTool` subagents over shared core (not `AgentTool`)
5. Downgrade parity claims to match reality
6. Add the phased plan with explicit non-goals per phase
If the doc is revised around that architecture and stops overclaiming what is already supported natively by ADK TS, it can become principal-review ready.
The migration path is: shared runtime core → translation layer → adapters (top-level session, subagent tools, approval bridge). Each phase ships independently behind the feature gate.