mirror of
https://github.com/google-gemini/gemini-cli.git
synced 2026-05-19 08:14:35 -07:00
Add unified design review document (pre-edit baseline)
Codex-generated unified review merging claude and codex design review findings for the ADK migration design doc. Committing as-is before applying corrections. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -0,0 +1,480 @@
|
||||
# Unified ADK CLI Design Review
|
||||
|
||||
Date: 2026-04-06
|
||||
|
||||
> **Rollout:** All ADK migration work is feature-gated behind `experimental.adk` flags. No behavioral changes ship without an explicit opt-in. This applies to both non-interactive and interactive flows.
|
||||
|
||||
Inputs merged:
|
||||
- `claude-adk-cli-design-review.md` (adversarial code-level review)
|
||||
- `codex-adk-cli-design-review.md`
|
||||
- Follow-up review discussion on subagent architecture, session semantics, and `BaseContextCompactor`
|
||||
|
||||
Scope reviewed:
|
||||
- `gemini-cli/docs/adk-replat/adk_migration_design_doc.md`
|
||||
- `gemini-cli` `AgentProtocol` / `AgentSession` / interactive migration work
|
||||
- `adk-js` runner / session / tool / security / model architecture
|
||||
- PR context for interactive migration, including `google-gemini/gemini-cli#24297`
|
||||
|
||||
## Executive Summary
|
||||
|
||||
The design is directionally sound and the migration goal is correct. However, the document is not yet principal-review ready due to:
|
||||
|
||||
1. **Three non-compilable code samples** that will immediately erode reviewer confidence
|
||||
2. **An under-specified translation boundary** — the ADK→AgentProtocol mapping is the real architecture, but the doc treats it as an afterthought
|
||||
3. **Overclaimed parity** in routing, telemetry, and approvals
|
||||
4. **Missing phased plan** with clear milestones and non-goals
|
||||
|
||||
The central architectural insight is right: the migration boundary is:
|
||||
|
||||
- `adk-js Event -> Gemini CLI AgentEvent`
|
||||
- `adk-js session/runtime semantics -> AgentProtocol / AgentSession semantics`
|
||||
|
||||
That boundary carries stream lifecycle, replay/resume, event projection, approval correlation, and persistence correctness. Until the translation architecture is explicit, parity claims remain unsupported.
|
||||
|
||||
## Merged High-Risk Findings
|
||||
|
||||
### 1. The actual migration boundary is under-specified
|
||||
|
||||
Current `AgentProtocol` requires `send()` timing guarantees, replay/reattach semantics, `streamId`, and optional `threadId` for subagent threads in `gemini-cli/packages/core/src/agent/types.ts` and `gemini-cli/packages/core/src/agent/agent-session.ts`.
|
||||
|
||||
ADK emits a different event model in `adk-js/core/src/events/event.ts`.
|
||||
|
||||
The doc needs a first-class architecture section for:
|
||||
- `AdkSessionRuntime`
|
||||
- `AdkEventTranslator`
|
||||
- event buffering / replay
|
||||
- `threadId` mapping
|
||||
- `tool_update` mapping
|
||||
- `agent_start` / `agent_end` ownership
|
||||
|
||||
Without that, the rest of the design sits on an unstated core contract.
|
||||
|
||||
### 2. The “single consolidated decorator” is overloaded
|
||||
|
||||
The `GcliAgentModel.generateContentAsync` sketch is a useful mental model, but today it is carrying too many concerns:
|
||||
- request mutation
|
||||
- routing
|
||||
- availability / fallback prompting
|
||||
- compaction
|
||||
- masking
|
||||
- auth
|
||||
- approval-adjacent control flow
|
||||
|
||||
Several of these are not model-layer problems. They are session, orchestration, policy, or UI-integration concerns.
|
||||
|
||||
This should be decomposed into separate layers:
|
||||
- request shaping
|
||||
- runtime/session ownership
|
||||
- event translation
|
||||
- approval / elicitation bridge
|
||||
- persistence / rewind service
|
||||
- telemetry projection
|
||||
|
||||
### 3. The design frequently conflates “possible with custom logic” with “supported by current ADK architecture”
|
||||
|
||||
This is one of the main wording risks.
|
||||
|
||||
Several statements currently read like native ADK parity when the real meaning is:
|
||||
- possible with substantial Gemini CLI-owned bridge logic
|
||||
- possible by bypassing native ADK facilities
|
||||
- possible only for phase 1 with intentionally degraded semantics
|
||||
|
||||
That distinction needs to be explicit everywhere the doc uses strong language like “validated” or “100% possible today.”
|
||||
|
||||
### 4. Auth is materially overstated
|
||||
|
||||
The auth section has real issues:
|
||||
- `LlmRequest` has no top-level `headers` field in `adk-js/core/src/models/llm_request.ts`
|
||||
- backend selection still depends on constructor-level `apiKey` vs Vertex config in `adk-js/core/src/models/google_llm.ts`
|
||||
- live/bidi handling is still a separate path with different behavior
|
||||
|
||||
Important nuance:
|
||||
- unary header injection via `llmRequest.config.httpOptions.headers` and constructor `headers` does exist in `google_llm.ts`
|
||||
- but that is not enough to claim general auth support across the full runtime
|
||||
|
||||
Conclusion:
|
||||
- the doc should present auth as a bridge with caveats or a real blocker, not as a solved integration
|
||||
|
||||
### 5. Approval / elicitation bridging is acceptable only as a temporary non-native bridge
|
||||
|
||||
The design’s callback-based approval approach is pragmatic for phase 1, but it bypasses native ADK confirmation semantics.
|
||||
|
||||
`adk-js/core/src/plugins/security_plugin.ts` and `adk-js/core/src/agents/processors/request_confirmation_llm_request_processor.ts` show that ADK’s model is:
|
||||
- `ALLOW | DENY | CONFIRM`
|
||||
- persisted confirmation state
|
||||
- later resumption from history
|
||||
|
||||
Blocking in callbacks may be acceptable for the first rollout, but the doc must say clearly:
|
||||
- local-only bridge
|
||||
- not resumable across process death
|
||||
- not long-term SDK behavior
|
||||
- intentionally non-native pending future elicitation/bidi work
|
||||
|
||||
### 6. Rewind and custom persistence are underdesigned
|
||||
|
||||
The doc treats rewind too much like storage truncation.
|
||||
|
||||
But `adk-js` session services in:
|
||||
- `adk-js/core/src/sessions/base_session_service.ts`
|
||||
- `adk-js/core/src/sessions/database_session_service.ts`
|
||||
|
||||
show that correctness depends on:
|
||||
- append ordering
|
||||
- state merging
|
||||
- scope splitting
|
||||
- resumption logic
|
||||
- stale-writer handling
|
||||
- locking / concurrency behavior
|
||||
|
||||
The unified conclusion is:
|
||||
- file-backed persistence is plausible
|
||||
- but it is not “just write JSON”
|
||||
- rewind needs explicit semantics for state recomputation and resumption rollback
|
||||
|
||||
### 7. Routing, telemetry, and plan mode are all overclaimed
|
||||
|
||||
Routing:
|
||||
- current routing requires a richer `RoutingContext`
|
||||
- `contents.slice(0, -1)` / `contents.pop()` is not a sufficient or type-accurate mapping
|
||||
- model banning via simulated error is not the same as model routing
|
||||
|
||||
Telemetry:
|
||||
- `event.id` is not a safe per-tool correlation key
|
||||
- ADK can batch multiple function calls or responses in one event
|
||||
- the sample APIs in the design do not match current `ClearcutLogger`
|
||||
|
||||
Plan mode:
|
||||
- current behavior is broader than prompt + tool filtering
|
||||
- policy, approval mode, routing, and config refresh are all part of the behavior
|
||||
|
||||
### 8. Retry safety around side effects is missing
|
||||
|
||||
The “full turn reset” proposal is incomplete without a retry safety model.
|
||||
|
||||
The document must define behavior after:
|
||||
- file writes
|
||||
- shell commands
|
||||
- MCP mutations
|
||||
- subagent actions with external effects
|
||||
|
||||
Otherwise “retry from the start” can duplicate side effects.
|
||||
|
||||
### 9. The review should distinguish architecture defects from rollout-status evidence
|
||||
|
||||
The interactive PR findings matter:
|
||||
- stacked-branch dependency
|
||||
- hook-order risk in `#24297`
|
||||
- `_meta.legacyState` leakage into protocol events
|
||||
|
||||
But they are secondary to the core design defect, which is the missing runtime/translation architecture.
|
||||
|
||||
These findings should stay in the review, but as credibility and rollout-risk evidence rather than the centerpiece.
|
||||
|
||||
## Recommended Architecture
|
||||
|
||||
This is the cleanest merged recommendation from both reviews plus follow-up discussion.
|
||||
|
||||
### Core principle
|
||||
|
||||
Use one shared ADK-based execution/runtime core, then put different adapters on top of it.
|
||||
|
||||
Do not make `AgentSession` the innermost engine.
|
||||
|
||||
### Recommended split
|
||||
|
||||
1. `AdkRuntimeCore`
|
||||
- owns the ADK runner loop
|
||||
- owns tool execution, policy integration, routing integration, masking, compaction hooks, and persistence hooks
|
||||
- emits runtime-level activity/events
|
||||
|
||||
2. `TopLevelSessionAdapter`
|
||||
- exposes the runtime as `AgentProtocol` / `AgentSession`
|
||||
- owns replay, reattach, `streamId`, `threadId`, `agent_start` / `agent_end`, and top-level event projection
|
||||
|
||||
3. `SubagentToolAdapter`
|
||||
- exposes the same runtime as a callable tool
|
||||
- projects child activity into parent-facing `tool_update` / `tool_response` or explicit child-thread events
|
||||
- does not pretend every subagent is a peer top-level chat session unless that is an explicit design decision
|
||||
|
||||
4. `ApprovalBridge`
|
||||
- phase-1 callback bridge for approvals / elicitation
|
||||
- clearly documented as temporary and non-native
|
||||
|
||||
5. `GcliSessionService`
|
||||
- file-backed persistence, locking, rewind, state recomputation, and crash recovery
|
||||
|
||||
### Subagent conclusion
|
||||
|
||||
The document should not imply that `AgentTool` is the only correct long-term path.
|
||||
|
||||
Current ADK TS exposes:
|
||||
- `FunctionTool`
|
||||
- `AgentTool`
|
||||
|
||||
These serve different purposes.
|
||||
|
||||
Merged conclusion:
|
||||
- if the goal is SDK-first subagent invocation over a Gemini CLI-owned modular core, `FunctionTool` or a custom `BaseTool` wrapper is a legitimate phase-1 choice
|
||||
- if the goal is ADK-native nested-agent semantics, `AgentTool` is the native abstraction
|
||||
|
||||
The doc should describe this as a deliberate architectural choice, not as an accident.
|
||||
|
||||
### Session conclusion
|
||||
|
||||
Main agent and subagents can share the same underlying runtime core.
|
||||
|
||||
But they should not necessarily share the exact same public adapter.
|
||||
|
||||
The right formulation is:
|
||||
- same core runtime
|
||||
- different event/session projections depending on embedding context
|
||||
|
||||
## Section-by-Section Unified Review
|
||||
|
||||
### 3.1 Authentication Flexibility
|
||||
|
||||
Assessment: bridgeable, not validated parity.
|
||||
|
||||
Keep:
|
||||
- ADK `Gemini` does not natively accept the CLI’s auth shape
|
||||
|
||||
Change:
|
||||
- fix request-shape examples
|
||||
- downgrade the workaround claim
|
||||
- state ownership of refresh, header precedence, and live/bidi caveats
|
||||
|
||||
### 3.2 Model Steering and Mid-Stream Injection
|
||||
|
||||
Assessment: next-turn steering is plausible now; true live interruption is still blocked.
|
||||
|
||||
Required change:
|
||||
- split “next-step steering” from “true mid-turn interruption”
|
||||
- define temporary user-visible behavior until live input-stream support exists
|
||||
|
||||
### 3.3 State Management and Token Compaction
|
||||
|
||||
Assessment: plausible, but too wrapper-centric.
|
||||
|
||||
Required change:
|
||||
- specify whether compaction operates on persisted history, outgoing request history, or both
|
||||
- define recursion guards and utility-call isolation
|
||||
- define artifact ownership across sessions and subagents
|
||||
|
||||
### 3.4 Model Configuration and Hierarchical Overrides
|
||||
|
||||
Assessment: under-specified.
|
||||
|
||||
Required change:
|
||||
- model config resolution should remain request-scoped
|
||||
- preserve scoped overrides and retry-aware behavior
|
||||
- explain subagent and utility-call override behavior
|
||||
|
||||
### 3.5 Universal Policy Enforcement
|
||||
|
||||
Assessment: acceptable as a bridge, not as full parity.
|
||||
|
||||
Required change:
|
||||
- explicitly document reduced phase-1 semantics
|
||||
- list policy inputs lost unless extra context plumbing is added
|
||||
- separate short-term callback bridge from long-term native confirmation flow
|
||||
|
||||
### 3.6 Telemetry and Observability
|
||||
|
||||
Assessment: the design direction is fine, but the example implementation is incorrect.
|
||||
|
||||
Required change:
|
||||
- correlate tools by function-call id, not event id
|
||||
- account for merged / parallel tool events
|
||||
- replace fake API examples with a telemetry parity matrix
|
||||
|
||||
### 3.7 Dynamic Model Routing and Configurability
|
||||
|
||||
Assessment: partly feasible, materially overstated.
|
||||
|
||||
Required change:
|
||||
- remove “100% possible today”
|
||||
- separate alias rewrite, fallback selection, classifier routing, and banning/rejection behavior
|
||||
- define a real `RoutingContextBuilder`
|
||||
|
||||
### 3.8 Fallbacks and Availability Management
|
||||
|
||||
Assessment: incomplete.
|
||||
|
||||
Required change:
|
||||
- split preflight fallback from post-failure transition behavior
|
||||
- define retry safety
|
||||
- define main-call vs utility-call fallback ownership
|
||||
|
||||
### 3.9 State-Driven Mode Switching
|
||||
|
||||
Assessment: too narrow.
|
||||
|
||||
Required change:
|
||||
- define a single mode state source of truth
|
||||
- enumerate all consumers: prompt, toolset, policy, router, UI
|
||||
|
||||
### 3.10 Tool Output Masking
|
||||
|
||||
Assessment: one of the stronger sections.
|
||||
|
||||
Required change:
|
||||
- define whether masking mutates persisted history or only outgoing request history
|
||||
- define ordering and idempotence relative to compaction
|
||||
- define file/artifact lifecycle
|
||||
|
||||
### 4. SDK Facade / Stateful Orchestration
|
||||
|
||||
Assessment: this should become the architectural center of the document.
|
||||
|
||||
Required change:
|
||||
- expand ownership boundaries
|
||||
- describe the shared runtime plus adapter model
|
||||
- document temporary approval/elicitation limitations explicitly
|
||||
|
||||
### 4.1 Hybrid Tool Instantiation
|
||||
|
||||
Assessment: directionally good.
|
||||
|
||||
Required change:
|
||||
- include message bus derivation
|
||||
- include MCP discovery scoping
|
||||
- include recursion prevention and tool isolation details
|
||||
|
||||
### 4.2 Custom File-Based Persistence
|
||||
|
||||
Assessment: high-risk and underdesigned.
|
||||
|
||||
Required change:
|
||||
- define on-disk schema
|
||||
- define lock strategy
|
||||
- define crash recovery
|
||||
- define state-scope storage
|
||||
- define rewind algorithm
|
||||
|
||||
### 4.3 Decomposing `AgentLoopContext`
|
||||
|
||||
Assessment: right instinct, incomplete decomposition.
|
||||
|
||||
Required change:
|
||||
- account for message bus / confirmation routing
|
||||
- account for injection queues
|
||||
- account for prompt/resource registries and MCP scoping
|
||||
|
||||
### 5.1 Real-Time User Message Injections
|
||||
|
||||
Assessment: blocker analysis is incomplete.
|
||||
|
||||
Required change:
|
||||
- mention TS live runtime gaps directly
|
||||
- describe temporary UX until true interruption exists
|
||||
|
||||
### 5.2 Conversation Rewind and State Reversal
|
||||
|
||||
Assessment: too shallow today.
|
||||
|
||||
Required change:
|
||||
- define rewind as event-log truncation plus state rollback semantics
|
||||
- define confirmation/auth rollback semantics
|
||||
- define artifact/version rollback expectations
|
||||
|
||||
### 5.3 Concurrent Sessions and Locking
|
||||
|
||||
Assessment: real issue, mitigation is missing.
|
||||
|
||||
Required change:
|
||||
- define file-backed equivalent of locking / single-writer policy
|
||||
- state whether concurrent runs per session are forbidden
|
||||
- state where serialization is enforced
|
||||
|
||||
## Corrections and Softenings Relative to the Earlier Reviews
|
||||
|
||||
These points should be preserved in the unified version so the review stays technically precise.
|
||||
|
||||
### 1. `FunctionTool` vs `AgentTool`
|
||||
|
||||
The earlier reviews were correct that `FunctionalTool` is the wrong name.
|
||||
|
||||
But the stronger unified position is:
|
||||
- `FunctionTool` is real and can be the right fit for SDK-first subagent invocation
|
||||
- `AgentTool` is the native nested-agent abstraction
|
||||
- the design should choose deliberately between them
|
||||
|
||||
### 2. `AgentTool` criticism should be more precise
|
||||
|
||||
Avoid overstating that `AgentTool` always creates a totally isolated in-memory boundary.
|
||||
|
||||
The more accurate criticism is:
|
||||
- nested-runner semantics differ from top-level semantics
|
||||
- session reuse/state propagation are nuanced
|
||||
- the document must define intended boundaries explicitly
|
||||
|
||||
### 3. Auth criticism should be precise
|
||||
|
||||
Do not say “header injection is impossible.”
|
||||
|
||||
Say:
|
||||
- request/header mutation exists for unary requests
|
||||
- that does not justify the broader parity claim currently made in the doc
|
||||
|
||||
### 4. Interactive PR findings should be demoted below the core architecture issues
|
||||
|
||||
Keep:
|
||||
- branch-stack caveat
|
||||
- hook-order risk
|
||||
- `_meta.legacyState` leakage
|
||||
|
||||
But frame them as rollout and protocol-hygiene evidence, not as the single biggest issue.
|
||||
|
||||
## What To Change Before Principal Review
|
||||
|
||||
1. Add an explicit `ADK runtime -> AgentProtocol / AgentSession` translation architecture section.
|
||||
2. Add a “shared runtime, different adapters” section for top-level agent vs subagent invocation.
|
||||
3. Replace strong parity language with explicit labels:
|
||||
- native ADK support
|
||||
- bridgeable with custom logic
|
||||
- blocked today
|
||||
4. Add a persistence section with:
|
||||
- on-disk schema
|
||||
- locking
|
||||
- rewind semantics
|
||||
- state recomputation
|
||||
- crash recovery
|
||||
5. Add a parity matrix with:
|
||||
- feature
|
||||
- current owner
|
||||
- ADK integration point
|
||||
- implementable now
|
||||
- bridge workaround
|
||||
- degraded semantics
|
||||
- long-term target
|
||||
6. Add explicit phase-1 non-goals:
|
||||
- no live/bidi parity
|
||||
- no resumable approval parity
|
||||
- no hidden claim of native subagent parity
|
||||
- no concurrent multi-run semantics beyond defined limits
|
||||
7. Add success criteria to the migration sequence:
|
||||
- non-interactive parity
|
||||
- replay/reattach parity
|
||||
- approval parity
|
||||
- interactive UI parity
|
||||
- subagent activity parity
|
||||
|
||||
## Positive Signals
|
||||
|
||||
The design is not wrong across the board. These parts are promising:
|
||||
|
||||
- isolating the migration behind flags and separate paths is the right rollout shape
|
||||
- tool output masking maps relatively well to request-preprocessing
|
||||
- hybrid tool isolation is aligned with current subagent registry/message-bus behavior
|
||||
- the introduction of `AgentProtocol` / `AgentSession` is the right boundary; it just needs a much more explicit runtime/translation design behind it
|
||||
|
||||
## Final Verdict
|
||||
|
||||
The design should be reframed as:
|
||||
|
||||
- one shared ADK-based runtime core
|
||||
- one explicit translation layer from ADK runtime semantics to Gemini CLI session semantics
|
||||
- one temporary callback-based bridge for approvals/elicitation
|
||||
- different adapters for top-level agent sessions vs subagent-as-tool invocation
|
||||
|
||||
If the doc is revised around that architecture and stops overclaiming what is already supported natively by ADK TS, it can become principal-review ready.
|
||||
Reference in New Issue
Block a user