Add unified design review document (pre-edit baseline)

Codex-generated unified review merging claude and codex design review
findings for the ADK migration design doc. Committing as-is before
applying corrections.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
Adam Weidman
2026-04-06 00:31:05 -07:00
parent a31412ba83
commit 53bbc2b339
+480
View File
@@ -0,0 +1,480 @@
# Unified ADK CLI Design Review
Date: 2026-04-06
> **Rollout:** All ADK migration work is feature-gated behind `experimental.adk` flags. No behavioral changes ship without an explicit opt-in. This applies to both non-interactive and interactive flows.
Inputs merged:
- `claude-adk-cli-design-review.md` (adversarial code-level review)
- `codex-adk-cli-design-review.md`
- Follow-up review discussion on subagent architecture, session semantics, and `BaseContextCompactor`
Scope reviewed:
- `gemini-cli/docs/adk-replat/adk_migration_design_doc.md`
- `gemini-cli` `AgentProtocol` / `AgentSession` / interactive migration work
- `adk-js` runner / session / tool / security / model architecture
- PR context for interactive migration, including `google-gemini/gemini-cli#24297`
## Executive Summary
The design is directionally sound and the migration goal is correct. However, the document is not yet principal-review ready due to:
1. **Three non-compilable code samples** that will immediately erode reviewer confidence
2. **An under-specified translation boundary** — the ADK→AgentProtocol mapping is the real architecture, but the doc treats it as an afterthought
3. **Overclaimed parity** in routing, telemetry, and approvals
4. **Missing phased plan** with clear milestones and non-goals
The central architectural insight is right: the migration boundary is:
- `adk-js Event -> Gemini CLI AgentEvent`
- `adk-js session/runtime semantics -> AgentProtocol / AgentSession semantics`
That boundary carries stream lifecycle, replay/resume, event projection, approval correlation, and persistence correctness. Until the translation architecture is explicit, parity claims remain unsupported.
## Merged High-Risk Findings
### 1. The actual migration boundary is under-specified
Current `AgentProtocol` requires `send()` timing guarantees, replay/reattach semantics, `streamId`, and optional `threadId` for subagent threads in `gemini-cli/packages/core/src/agent/types.ts` and `gemini-cli/packages/core/src/agent/agent-session.ts`.
ADK emits a different event model in `adk-js/core/src/events/event.ts`.
The doc needs a first-class architecture section for:
- `AdkSessionRuntime`
- `AdkEventTranslator`
- event buffering / replay
- `threadId` mapping
- `tool_update` mapping
- `agent_start` / `agent_end` ownership
Without that, the rest of the design sits on an unstated core contract.
### 2. The “single consolidated decorator” is overloaded
The `GcliAgentModel.generateContentAsync` sketch is a useful mental model, but today it is carrying too many concerns:
- request mutation
- routing
- availability / fallback prompting
- compaction
- masking
- auth
- approval-adjacent control flow
Several of these are not model-layer problems. They are session, orchestration, policy, or UI-integration concerns.
This should be decomposed into separate layers:
- request shaping
- runtime/session ownership
- event translation
- approval / elicitation bridge
- persistence / rewind service
- telemetry projection
### 3. The design frequently conflates “possible with custom logic” with “supported by current ADK architecture”
This is one of the main wording risks.
Several statements currently read like native ADK parity when the real meaning is:
- possible with substantial Gemini CLI-owned bridge logic
- possible by bypassing native ADK facilities
- possible only for phase 1 with intentionally degraded semantics
That distinction needs to be explicit everywhere the doc uses strong language like “validated” or “100% possible today.”
### 4. Auth is materially overstated
The auth section has real issues:
- `LlmRequest` has no top-level `headers` field in `adk-js/core/src/models/llm_request.ts`
- backend selection still depends on constructor-level `apiKey` vs Vertex config in `adk-js/core/src/models/google_llm.ts`
- live/bidi handling is still a separate path with different behavior
Important nuance:
- unary header injection via `llmRequest.config.httpOptions.headers` and constructor `headers` does exist in `google_llm.ts`
- but that is not enough to claim general auth support across the full runtime
Conclusion:
- the doc should present auth as a bridge with caveats or a real blocker, not as a solved integration
### 5. Approval / elicitation bridging is acceptable only as a temporary non-native bridge
The designs callback-based approval approach is pragmatic for phase 1, but it bypasses native ADK confirmation semantics.
`adk-js/core/src/plugins/security_plugin.ts` and `adk-js/core/src/agents/processors/request_confirmation_llm_request_processor.ts` show that ADKs model is:
- `ALLOW | DENY | CONFIRM`
- persisted confirmation state
- later resumption from history
Blocking in callbacks may be acceptable for the first rollout, but the doc must say clearly:
- local-only bridge
- not resumable across process death
- not long-term SDK behavior
- intentionally non-native pending future elicitation/bidi work
### 6. Rewind and custom persistence are underdesigned
The doc treats rewind too much like storage truncation.
But `adk-js` session services in:
- `adk-js/core/src/sessions/base_session_service.ts`
- `adk-js/core/src/sessions/database_session_service.ts`
show that correctness depends on:
- append ordering
- state merging
- scope splitting
- resumption logic
- stale-writer handling
- locking / concurrency behavior
The unified conclusion is:
- file-backed persistence is plausible
- but it is not “just write JSON”
- rewind needs explicit semantics for state recomputation and resumption rollback
### 7. Routing, telemetry, and plan mode are all overclaimed
Routing:
- current routing requires a richer `RoutingContext`
- `contents.slice(0, -1)` / `contents.pop()` is not a sufficient or type-accurate mapping
- model banning via simulated error is not the same as model routing
Telemetry:
- `event.id` is not a safe per-tool correlation key
- ADK can batch multiple function calls or responses in one event
- the sample APIs in the design do not match current `ClearcutLogger`
Plan mode:
- current behavior is broader than prompt + tool filtering
- policy, approval mode, routing, and config refresh are all part of the behavior
### 8. Retry safety around side effects is missing
The “full turn reset” proposal is incomplete without a retry safety model.
The document must define behavior after:
- file writes
- shell commands
- MCP mutations
- subagent actions with external effects
Otherwise “retry from the start” can duplicate side effects.
### 9. The review should distinguish architecture defects from rollout-status evidence
The interactive PR findings matter:
- stacked-branch dependency
- hook-order risk in `#24297`
- `_meta.legacyState` leakage into protocol events
But they are secondary to the core design defect, which is the missing runtime/translation architecture.
These findings should stay in the review, but as credibility and rollout-risk evidence rather than the centerpiece.
## Recommended Architecture
This is the cleanest merged recommendation from both reviews plus follow-up discussion.
### Core principle
Use one shared ADK-based execution/runtime core, then put different adapters on top of it.
Do not make `AgentSession` the innermost engine.
### Recommended split
1. `AdkRuntimeCore`
- owns the ADK runner loop
- owns tool execution, policy integration, routing integration, masking, compaction hooks, and persistence hooks
- emits runtime-level activity/events
2. `TopLevelSessionAdapter`
- exposes the runtime as `AgentProtocol` / `AgentSession`
- owns replay, reattach, `streamId`, `threadId`, `agent_start` / `agent_end`, and top-level event projection
3. `SubagentToolAdapter`
- exposes the same runtime as a callable tool
- projects child activity into parent-facing `tool_update` / `tool_response` or explicit child-thread events
- does not pretend every subagent is a peer top-level chat session unless that is an explicit design decision
4. `ApprovalBridge`
- phase-1 callback bridge for approvals / elicitation
- clearly documented as temporary and non-native
5. `GcliSessionService`
- file-backed persistence, locking, rewind, state recomputation, and crash recovery
### Subagent conclusion
The document should not imply that `AgentTool` is the only correct long-term path.
Current ADK TS exposes:
- `FunctionTool`
- `AgentTool`
These serve different purposes.
Merged conclusion:
- if the goal is SDK-first subagent invocation over a Gemini CLI-owned modular core, `FunctionTool` or a custom `BaseTool` wrapper is a legitimate phase-1 choice
- if the goal is ADK-native nested-agent semantics, `AgentTool` is the native abstraction
The doc should describe this as a deliberate architectural choice, not as an accident.
### Session conclusion
Main agent and subagents can share the same underlying runtime core.
But they should not necessarily share the exact same public adapter.
The right formulation is:
- same core runtime
- different event/session projections depending on embedding context
## Section-by-Section Unified Review
### 3.1 Authentication Flexibility
Assessment: bridgeable, not validated parity.
Keep:
- ADK `Gemini` does not natively accept the CLIs auth shape
Change:
- fix request-shape examples
- downgrade the workaround claim
- state ownership of refresh, header precedence, and live/bidi caveats
### 3.2 Model Steering and Mid-Stream Injection
Assessment: next-turn steering is plausible now; true live interruption is still blocked.
Required change:
- split “next-step steering” from “true mid-turn interruption”
- define temporary user-visible behavior until live input-stream support exists
### 3.3 State Management and Token Compaction
Assessment: plausible, but too wrapper-centric.
Required change:
- specify whether compaction operates on persisted history, outgoing request history, or both
- define recursion guards and utility-call isolation
- define artifact ownership across sessions and subagents
### 3.4 Model Configuration and Hierarchical Overrides
Assessment: under-specified.
Required change:
- model config resolution should remain request-scoped
- preserve scoped overrides and retry-aware behavior
- explain subagent and utility-call override behavior
### 3.5 Universal Policy Enforcement
Assessment: acceptable as a bridge, not as full parity.
Required change:
- explicitly document reduced phase-1 semantics
- list policy inputs lost unless extra context plumbing is added
- separate short-term callback bridge from long-term native confirmation flow
### 3.6 Telemetry and Observability
Assessment: the design direction is fine, but the example implementation is incorrect.
Required change:
- correlate tools by function-call id, not event id
- account for merged / parallel tool events
- replace fake API examples with a telemetry parity matrix
### 3.7 Dynamic Model Routing and Configurability
Assessment: partly feasible, materially overstated.
Required change:
- remove “100% possible today”
- separate alias rewrite, fallback selection, classifier routing, and banning/rejection behavior
- define a real `RoutingContextBuilder`
### 3.8 Fallbacks and Availability Management
Assessment: incomplete.
Required change:
- split preflight fallback from post-failure transition behavior
- define retry safety
- define main-call vs utility-call fallback ownership
### 3.9 State-Driven Mode Switching
Assessment: too narrow.
Required change:
- define a single mode state source of truth
- enumerate all consumers: prompt, toolset, policy, router, UI
### 3.10 Tool Output Masking
Assessment: one of the stronger sections.
Required change:
- define whether masking mutates persisted history or only outgoing request history
- define ordering and idempotence relative to compaction
- define file/artifact lifecycle
### 4. SDK Facade / Stateful Orchestration
Assessment: this should become the architectural center of the document.
Required change:
- expand ownership boundaries
- describe the shared runtime plus adapter model
- document temporary approval/elicitation limitations explicitly
### 4.1 Hybrid Tool Instantiation
Assessment: directionally good.
Required change:
- include message bus derivation
- include MCP discovery scoping
- include recursion prevention and tool isolation details
### 4.2 Custom File-Based Persistence
Assessment: high-risk and underdesigned.
Required change:
- define on-disk schema
- define lock strategy
- define crash recovery
- define state-scope storage
- define rewind algorithm
### 4.3 Decomposing `AgentLoopContext`
Assessment: right instinct, incomplete decomposition.
Required change:
- account for message bus / confirmation routing
- account for injection queues
- account for prompt/resource registries and MCP scoping
### 5.1 Real-Time User Message Injections
Assessment: blocker analysis is incomplete.
Required change:
- mention TS live runtime gaps directly
- describe temporary UX until true interruption exists
### 5.2 Conversation Rewind and State Reversal
Assessment: too shallow today.
Required change:
- define rewind as event-log truncation plus state rollback semantics
- define confirmation/auth rollback semantics
- define artifact/version rollback expectations
### 5.3 Concurrent Sessions and Locking
Assessment: real issue, mitigation is missing.
Required change:
- define file-backed equivalent of locking / single-writer policy
- state whether concurrent runs per session are forbidden
- state where serialization is enforced
## Corrections and Softenings Relative to the Earlier Reviews
These points should be preserved in the unified version so the review stays technically precise.
### 1. `FunctionTool` vs `AgentTool`
The earlier reviews were correct that `FunctionalTool` is the wrong name.
But the stronger unified position is:
- `FunctionTool` is real and can be the right fit for SDK-first subagent invocation
- `AgentTool` is the native nested-agent abstraction
- the design should choose deliberately between them
### 2. `AgentTool` criticism should be more precise
Avoid overstating that `AgentTool` always creates a totally isolated in-memory boundary.
The more accurate criticism is:
- nested-runner semantics differ from top-level semantics
- session reuse/state propagation are nuanced
- the document must define intended boundaries explicitly
### 3. Auth criticism should be precise
Do not say “header injection is impossible.”
Say:
- request/header mutation exists for unary requests
- that does not justify the broader parity claim currently made in the doc
### 4. Interactive PR findings should be demoted below the core architecture issues
Keep:
- branch-stack caveat
- hook-order risk
- `_meta.legacyState` leakage
But frame them as rollout and protocol-hygiene evidence, not as the single biggest issue.
## What To Change Before Principal Review
1. Add an explicit `ADK runtime -> AgentProtocol / AgentSession` translation architecture section.
2. Add a “shared runtime, different adapters” section for top-level agent vs subagent invocation.
3. Replace strong parity language with explicit labels:
- native ADK support
- bridgeable with custom logic
- blocked today
4. Add a persistence section with:
- on-disk schema
- locking
- rewind semantics
- state recomputation
- crash recovery
5. Add a parity matrix with:
- feature
- current owner
- ADK integration point
- implementable now
- bridge workaround
- degraded semantics
- long-term target
6. Add explicit phase-1 non-goals:
- no live/bidi parity
- no resumable approval parity
- no hidden claim of native subagent parity
- no concurrent multi-run semantics beyond defined limits
7. Add success criteria to the migration sequence:
- non-interactive parity
- replay/reattach parity
- approval parity
- interactive UI parity
- subagent activity parity
## Positive Signals
The design is not wrong across the board. These parts are promising:
- isolating the migration behind flags and separate paths is the right rollout shape
- tool output masking maps relatively well to request-preprocessing
- hybrid tool isolation is aligned with current subagent registry/message-bus behavior
- the introduction of `AgentProtocol` / `AgentSession` is the right boundary; it just needs a much more explicit runtime/translation design behind it
## Final Verdict
The design should be reframed as:
- one shared ADK-based runtime core
- one explicit translation layer from ADK runtime semantics to Gemini CLI session semantics
- one temporary callback-based bridge for approvals/elicitation
- different adapters for top-level agent sessions vs subagent-as-tool invocation
If the doc is revised around that architecture and stops overclaiming what is already supported natively by ADK TS, it can become principal-review ready.