From f5dce650a4e454a93a3dfc9ffea6f00cfd1d6df3 Mon Sep 17 00:00:00 2001
From: Adam Weidman
Date: Tue, 7 Apr 2026 11:24:22 -0400
Subject: [PATCH] Add Code Assist auth and Clearcut telemetry validation to unified review

Auth:
- Code Assist / LOGIN_WITH_GOOGLE uses AuthClient.request() against a Code Assist backend, NOT the standard Gemini API
- GcliAgentModel must extend BaseLlm directly with dual transport (API-key via httpOptions.headers, Code Assist via AuthClient.request())
- OAuth flow, token refresh, credential caching all work as-is

Telemetry:
- Stream-interceptor approach replaced with BasePlugin recommendation
- ClearcutTelemetryPlugin for token counts (afterModelCallback), per-tool timing (beforeToolCallback/afterToolCallback with functionCallId), agent lifecycle, and model errors
- Corrects earlier claim: token usage IS available on ADK events (llm_agent.ts spreads LlmResponse into events)
- HTTP-level metrics captured in GcliAgentModel wrapper
- ~70% of Clearcut metrics stay in existing call sites unchanged

Co-Authored-By: Claude Opus 4.6
---
 docs/adk-replat/unified-review.md | 95 +++++++++++++++++++++----------
 1 file changed, 64 insertions(+), 31 deletions(-)

diff --git a/docs/adk-replat/unified-review.md b/docs/adk-replat/unified-review.md
index 569ec4a77b..cd25ab2b72 100644
--- a/docs/adk-replat/unified-review.md
+++ b/docs/adk-replat/unified-review.md
@@ -19,7 +19,7 @@ Scope reviewed:
 The design is directionally sound and the migration goal is correct. However, the document is not yet principal-review ready due to:
 
-1. **Three non-compilable code samples** that will immediately erode reviewer confidence
+1. **Four non-compilable code samples** (C1-C4) that will immediately erode reviewer confidence
 2. **An under-specified translation boundary** — the ADK→AgentProtocol mapping is the real architecture, but the doc treats it as an afterthought
 3. **Overclaimed parity** in routing, telemetry, and approvals
 4. **Missing phased plan** with clear milestones and non-goals
@@ -91,19 +91,25 @@ Several statements currently read like native ADK parity when the real meaning i
 
 That distinction needs to be explicit everywhere the doc uses strong language like “validated” or “100% possible today.”
 
-### 4. Auth is materially overstated
+### 4. Auth architecture is deeper than header injection — Code Assist requires a custom transport
 
-The auth section has real issues:
-- `LlmRequest` has no top-level `headers` field in `adk-js/core/src/models/llm_request.ts`
-- backend selection still depends on constructor-level `apiKey` vs Vertex config in `adk-js/core/src/models/google_llm.ts`
-- live/bidi handling is still a separate path with different behavior
+The auth section has two distinct problems:
 
-Important nuance:
-- unary header injection via `llmRequest.config.httpOptions.headers` and constructor `headers` does exist in `google_llm.ts`
-- but that is not enough to claim general auth support across the full runtime
+**Problem A: API surface mismatch (C1).** `LlmRequest` has no `headers` field. Per-request headers are possible via `llmRequest.config.httpOptions.headers`, and the `Gemini` constructor accepts a `headers` param (`google_llm.ts:57,79`). This is fixable.
 
-Conclusion:
-- the doc should present auth as a bridge with caveats or a real blocker, not as a solved integration
+**Problem B: Code Assist / `LOGIN_WITH_GOOGLE` uses a different backend entirely.** The current `CodeAssistServer` (`packages/core/src/code_assist/server.ts:407`) does NOT call the standard Gemini API. It uses `google-auth-library`'s `AuthClient.request()` to talk to a Code Assist backend endpoint. The OAuth flow (`packages/core/src/code_assist/oauth2.ts`) handles browser launch (L305), local callback server (L492-615), token exchange (L546-550), credential caching (L739-750), and automatic token refresh via `OAuth2Client`.
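As a reviewer's sketch of the two transport paths described above (all names and types here are simplified stand-ins: the real `BaseLlm`, `LlmRequest`, and `AuthClient` shapes in adk-js and google-auth-library are richer, and the URLs are placeholders):

```typescript
// Reviewer sketch only. The stand-ins below are NOT the real adk-js /
// google-auth-library types; they model just enough shape to show the branch.
interface LlmRequest { model: string; contents: string[] }
interface LlmResponse { text: string }

// The two auth modes discussed in this review.
type AuthMode = 'api-key' | 'oauth';

// Stand-in for google-auth-library's AuthClient: on the OAuth path the client
// itself attaches and refreshes the access token.
interface AuthClientLike {
  request(opts: { url: string; body: string }): { data: LlmResponse };
}

// Stand-in for the standard Gemini API transport (API key goes in a header).
type HttpFn = (url: string, headers: Record<string, string>, body: string) => LlmResponse;

class GcliAgentModel {
  constructor(
    private mode: AuthMode,
    private authClient: AuthClientLike, // used only on the oauth path
    private http: HttpFn,               // used only on the api-key path
    private apiKey: string,
  ) {}

  generateContent(req: LlmRequest): LlmResponse {
    if (this.mode === 'oauth') {
      // Code Assist backend: the AuthClient owns the credential; no API key here.
      return this.authClient.request({
        url: 'https://codeassist.example/generate', // placeholder endpoint
        body: JSON.stringify(req),
      }).data;
    }
    // Standard Gemini API: per-request header injection.
    return this.http(
      'https://gemini.example/generate', // placeholder endpoint
      { 'x-goog-api-key': this.apiKey },
      JSON.stringify(req),
    );
  }
}
```

The point of the sketch is the branch: the OAuth path never sees an API key, and the API-key path never touches the `AuthClient`.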
+
+This means `GcliAgentModel` cannot wrap a standard `Gemini` BaseLlm and just inject headers. For Code Assist auth, `GcliAgentModel` must **extend `BaseLlm` directly** and implement its own HTTP transport using `AuthClient.request()`, translating between `LlmRequest`/`LlmResponse` and the Code Assist protocol.
+
+What works:
+- The OAuth dance itself (browser launch, token caching, refresh) runs before the agent session starts — no blocking inside `generateContentAsync`
+- `OAuth2Client.getAccessToken()` handles mid-session token refresh transparently
+- Per-request header injection works for API-key-based auth via `httpOptions.headers`
+
+What the design doc must change:
+- Stop assuming `Gemini` as the inner model — `GcliAgentModel extends BaseLlm` directly
+- Define two transport paths: standard Gemini API (API key) and Code Assist backend (OAuth)
+- The dummy-key workaround is irrelevant for Code Assist — you never call `GoogleGenAI.models.generateContent`
 
 ### 5. Approval / elicitation bridging is acceptable only as a temporary non-native bridge
 
@@ -141,10 +147,28 @@ Routing:
 - model banning via simulated error is not the same as model routing
 
 Telemetry:
-- `event.id` is not a safe per-tool correlation key — use `functionCall.id`
-- ADK can batch multiple function calls or responses in one event (`getFunctionCalls` extracts multiple from one event; `mergeParallelFunctionResponseEvents` merges multiple responses)
-- token usage is not available on ADK `Event` — it lives on `LlmResponse` and is lost by the time events reach the runner output stream. The doc needs to define how token metrics reach telemetry
-- the sample APIs in the design do not match the current `ClearcutLogger`
+- **The stream-interceptor approach in section 3.6 is wrong.** The correct mechanism is ADK's `BasePlugin` system.
+- `event.id` is not a safe per-tool correlation key — use `functionCall.id` via `beforeToolCallback`/`afterToolCallback`
+- ADK can batch multiple function calls or responses in one event — stream interception cannot distinguish them, but plugin hooks fire per-tool with a distinct `functionCallId`
+- **Correction:** Token usage IS available on ADK events. `llm_agent.ts:831-834` spreads `LlmResponse` (including `usageMetadata`) into the event via `createEvent({...modelResponseEvent, ...llmResponse})`. Additionally, `afterModelCallback` receives the raw `LlmResponse` with `usageMetadata` directly — providing a second capture point.
+- HTTP-level metrics (status code, request duration) are NOT on ADK events — they must be captured in the `GcliAgentModel` wrapper
+- The sample APIs in the design do not match the current `ClearcutLogger` taxonomy (~195 distinct metadata keys)
+
+**Recommended telemetry architecture:**
+
+| Metric Category | Capture Mechanism | ADK Hook |
+|---|---|---|
+| Token counts (input, output, cached, thinking) | `afterModelCallback` → `LlmResponse.usageMetadata` | `BasePlugin.afterModelCallback` |
+| Per-tool timing | Timer keyed on `functionCallId` | `BasePlugin.beforeToolCallback` / `afterToolCallback` |
+| Agent lifecycle | Start/end tracking | `BasePlugin.beforeAgentCallback` / `afterAgentCallback` |
+| Model errors | Error classification | `BasePlugin.onModelErrorCallback` |
+| HTTP status, API duration | Captured in model wrapper | `GcliAgentModel.generateContentAsync` |
+| Routing decisions, latency | Captured in model or `beforeModelCallback` | `beforeModelCallback` / model wrapper |
+| Context token breakdowns (system, tools, history) | Computed in request pipeline | Model wrapper |
+| Tool approval decisions | Injected from approval layer | `ApprovalBridge` |
+| Session config, compression, rewind, IDE, extensions, billing, slash commands, hooks, plan execution, onboarding | **Unchanged** — stays in current call sites | Existing gemini-cli code |
+
+~30% of Clearcut metrics map to ADK plugin hooks. ~70% remain in their current call sites unchanged.
 
 Plan mode:
 - current behavior is broader than prompt + tool filtering
 
@@ -236,15 +260,17 @@ The right formulation is:
 
 ### 3.1 Authentication Flexibility
 
-Assessment: bridgeable, not validated parity.
+Assessment: implementable, but the design doc’s approach is wrong for Code Assist.
 
 Keep:
 - ADK `Gemini` does not natively accept the CLI’s auth shape
 
 Change:
-- fix request-shape examples
-- downgrade the workaround claim
-- state ownership of refresh, header precedence, and live/bidi caveats
+- `GcliAgentModel` must extend `BaseLlm` directly, not wrap `Gemini` — Code Assist uses `AuthClient.request()` against a non-standard backend, not `GoogleGenAI`
+- fix `request.headers` to `llmRequest.config.httpOptions.headers` (C1)
+- define two transport paths: API-key (standard Gemini) and OAuth (Code Assist)
+- token refresh is transparent via `OAuth2Client` — this works as-is
+- browser-based OAuth runs before the agent session starts — no blocking concern
 
 ### 3.2 Model Steering and Mid-Stream Injection
 
@@ -281,14 +307,17 @@ Required change:
 - list policy inputs lost unless extra context plumbing is added
 - separate the short-term callback bridge from the long-term native confirmation flow
 
-### 3.6 Telemetry and Observability
+### 3.6 Telemetry and Observability (Clearcut)
 
-Assessment: the design direction is fine, but the example implementation is incorrect.
+Assessment: fully implementable, but the stream-interceptor approach must be replaced with a `BasePlugin`.
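A toy reproduction of the token-usage correction above, using stand-in types rather than the real adk-js definitions, shows why `usageMetadata` survives onto the event: object spread copies it from the `LlmResponse`.

```typescript
// Stand-in types; the real Event/LlmResponse definitions live in adk-js.
interface UsageMetadata { promptTokenCount: number; candidatesTokenCount: number }
interface LlmResponse { content: string; usageMetadata?: UsageMetadata }
interface AdkEvent { id: string; author: string; content?: string; usageMetadata?: UsageMetadata }

// Simplified createEvent; id generation is not the point here.
function createEvent(fields: Omit<AdkEvent, 'id'>): AdkEvent {
  return { id: 'evt-1', ...fields };
}

const modelResponseEvent = { author: 'model' };
const llmResponse: LlmResponse = {
  content: 'hello',
  usageMetadata: { promptTokenCount: 12, candidatesTokenCount: 3 },
};

// The spread pattern this review attributes to llm_agent.ts: everything on the
// LlmResponse, including usageMetadata, lands on the event.
const event = createEvent({ ...modelResponseEvent, ...llmResponse });
```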
 Required change:
-- correlate tools by function-call id, not event id
-- account for merged / parallel tool events
-- replace fake API examples with a telemetry parity matrix
+- replace stream interception with `ClearcutTelemetryPlugin extends BasePlugin`
+- use `afterModelCallback` for token counts (`LlmResponse.usageMetadata` is available)
+- use `beforeToolCallback`/`afterToolCallback` with `functionCallId` for per-tool timing
+- capture HTTP-level metrics (status code, duration) in the `GcliAgentModel` wrapper
+- ~70% of Clearcut metrics stay in their current call sites unchanged — document which
+- replace fake API examples with a telemetry parity matrix showing the capture mechanism per metric
 
 ### 3.7 Dynamic Model Routing and Configurability
 
@@ -390,8 +419,8 @@ Required change:
 
 ## Corrections to Earlier Reviews
 
-### Auth is bridgeable, not impossible
-Header injection works for unary requests via `llmRequest.config.httpOptions.headers` and the constructor-level `headers` param. The criticism is that the design doc overclaims parity — not that auth is blocked.
+### Auth is implementable, not blocked
+API-key auth works via `httpOptions.headers`. Code Assist / `LOGIN_WITH_GOOGLE` requires `GcliAgentModel` to extend `BaseLlm` directly with its own transport (using `AuthClient.request()`), not wrap `Gemini`. The OAuth flow, token refresh, and credential caching all work as-is — the issue was the design doc's assumption about the inner model, not auth capability.
 
 ### Persistence is simpler than initially claimed
 DB-level concerns (PESSIMISTIC_WRITE, stale-writer detection, concurrent append policy) come from `DatabaseSessionService` and don't apply. The file-backed service wraps the existing `ChatRecordingService`. Only rewind semantics need deeper design.
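A minimal sketch of the recommended `ClearcutTelemetryPlugin`, with hook names taken from the table in this review; the actual `BasePlugin` signatures in adk-js may differ, and the in-memory `records` array stands in for the real `ClearcutLogger` sink:

```typescript
// Reviewer sketch; hook names follow this review's table, not necessarily the
// exact adk-js BasePlugin signatures. `records` stands in for ClearcutLogger.
interface UsageMetadata { promptTokenCount: number; candidatesTokenCount: number }
interface LlmResponse { usageMetadata?: UsageMetadata }

type MetricRecord = { name: string; fields: Record<string, number | string> };

class ClearcutTelemetryPlugin {
  // Pending per-tool timers, keyed on functionCallId (never event.id).
  private pending = new Map<string, { tool: string; start: number }>();
  readonly records: MetricRecord[] = [];

  // Token counts: usageMetadata is available directly on the LlmResponse here.
  afterModelCallback(llmResponse: LlmResponse): void {
    const usage = llmResponse.usageMetadata;
    if (!usage) return;
    this.records.push({
      name: 'api_response',
      fields: {
        input_tokens: usage.promptTokenCount,
        output_tokens: usage.candidatesTokenCount,
      },
    });
  }

  beforeToolCallback(toolName: string, functionCallId: string): void {
    this.pending.set(functionCallId, { tool: toolName, start: Date.now() });
  }

  afterToolCallback(functionCallId: string): void {
    const entry = this.pending.get(functionCallId);
    if (!entry) return;
    this.pending.delete(functionCallId);
    this.records.push({
      name: 'tool_call',
      fields: { tool: entry.tool, duration_ms: Date.now() - entry.start },
    });
  }
}
```

Keying the pending-timer map on `functionCallId` is what makes parallel calls to the same tool distinguishable, which `event.id` cannot do.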
@@ -416,7 +445,7 @@ The branch-stack dependency, hook-order risk, and `_meta.legacyState` leakage in
 
 - [ ] Fix compilation blockers (C1-C4) in design doc code samples
 - [ ] Implement `AdkRuntimeCore` wrapping ADK `Runner` + `LlmAgent`
 - [ ] Implement `AdkEventTranslator` (ADK `Event` → gemini-cli `AgentEvent`)
-- [ ] Implement `GcliAgentModel extends BaseLlm` with auth header injection and model rewrite
+- [ ] Implement `GcliAgentModel extends BaseLlm` with dual transport (API-key via `httpOptions.headers`, Code Assist via `AuthClient.request()`)
 - [ ] Implement `GcliFileSessionService extends BaseSessionService` wrapping `ChatRecordingService`
 - [ ] Implement `GcliContextCompactor implements BaseContextCompactor` wrapping `ChatCompressionService`
 - [ ] Wire behind `experimental.adk` feature gate
@@ -436,8 +465,9 @@
 - [ ] Model routing via `beforeModelCallback` + `RoutingContextBuilder` adapter
 - [ ] Policy enforcement via `beforeToolCallback` using existing callbacks (temporary bridge)
 - [ ] Quota/availability handling via `beforeModelCallback`
-- [ ] Telemetry projection from runner event stream (correlate on `functionCall.id`)
-- [ ] Token usage capture (solve the `LlmResponse` → event gap)
+- [ ] `ClearcutTelemetryPlugin extends BasePlugin` for token counts (`afterModelCallback`), per-tool timing (`beforeToolCallback`/`afterToolCallback` with `functionCallId`), agent lifecycle, and model errors
+- [ ] HTTP-level telemetry (status code, duration) captured in the `GcliAgentModel` wrapper
+- [ ] Verify that ~70% of existing Clearcut call sites work unchanged
 
 **Non-goals for Phase 1:**
 - No interactive UI changes
@@ -496,14 +526,17 @@
 
 | Concern | Level | ADK Mechanism |
 |---|---|---|
-| Auth headers | Transport | `BaseLlm` constructor / `httpOptions.headers` |
+| Auth (API key) | Transport | `BaseLlm` constructor / `httpOptions.headers` |
+| Auth (Code Assist) | Transport | `BaseLlm` direct — custom transport via `AuthClient.request()` |
 | Model rewrite | Transport | `BaseLlm.generateContentAsync` model override |
 | Model routing | Agent | `beforeModelCallback` + `RoutingContextBuilder` |
 | Token compaction | Agent | `BaseContextCompactor` (native ADK) |
 | Quota/availability | Agent | `beforeModelCallback` |
 | Tool masking | Agent | `BaseLlmRequestProcessor` |
 | Tool confirmations | Agent+Consumer | `beforeToolCallback` + existing callbacks (phase 1) |
-| Telemetry | Consumer | Runner event wrapper (correlate on `functionCall.id`) |
+| Telemetry (lifecycle) | Agent | `ClearcutTelemetryPlugin` — `afterModelCallback`, `beforeToolCallback`/`afterToolCallback` |
+| Telemetry (HTTP) | Transport | `GcliAgentModel` wrapper (status code, duration) |
+| Telemetry (CLI) | Consumer | Existing call sites (~70% of Clearcut metrics, unchanged) |
 | UI rendering | Consumer | Event subscriber / adapter |
 | Subagent invocation | Agent | Custom `BaseTool` over shared runtime core |
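As a closing illustration of the "Telemetry (HTTP)" row: a sketch of status and duration capture at the transport level, where `doRequest` and the record callback are placeholders for whatever transport and logger the wrapper actually uses:

```typescript
// Reviewer sketch: HTTP status and request duration are not on ADK events, so
// they are captured where the request is made. doRequest/record are placeholders.
type HttpResult = { status: number; body: string };

function withHttpMetrics(
  doRequest: () => HttpResult,
  record: (m: { status: number; durationMs: number }) => void,
): HttpResult {
  const start = Date.now();
  const res = doRequest();
  // Emit the transport-level metric, then pass the result through unchanged.
  record({ status: res.status, durationMs: Date.now() - start });
  return res;
}
```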