- Add SSE keepalive middleware (15s comment heartbeat) to prevent Cloud Run
LB from closing idle SSE connections during long tool executions
- Increase A2A server request timeout from 300s to 3600s (1 hour)
- Cache loaded tasks in NoOpTaskStore to prevent redundant GCS workspace
restores on every SDK event cycle (was restoring 5+ times per request)
- Add undici dispatcher with 10-min timeouts for bridge SSE connections
- Install gh CLI and gcloud CLI in Docker image for agent GitHub/GCP access
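
The keepalive bullet above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the middleware added in this change: `startSseKeepalive` and the `SseWritable` interface are hypothetical names, and only the 15s interval and the use of an SSE comment line as the heartbeat come from the commit message. SSE lines that begin with a colon are comments and are ignored by clients.

```typescript
// Hypothetical minimal view of a response stream (assumption: the real
// middleware works against the framework's response object instead).
interface SseWritable {
  write(chunk: string): unknown;
  on(event: "close", listener: () => void): unknown;
}

const KEEPALIVE_INTERVAL_MS = 15_000; // 15s, per the commit message

// Periodically writes an SSE comment so the connection never looks idle.
// Returns a function that stops the heartbeat.
function startSseKeepalive(
  res: SseWritable,
  intervalMs: number = KEEPALIVE_INTERVAL_MS,
): () => void {
  const timer = setInterval(() => {
    res.write(": keepalive\n\n");
  }, intervalMs);
  const stop = (): void => clearInterval(timer);
  // Stop automatically when the client disconnects.
  res.on("close", stop);
  return stop;
}
```

The heartbeat interval has to stay below the idle timeout of whatever sits in front of the service; 15s is the value this change uses.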
Streaming fixes:
- Fix A2UI text extraction preferring incremental chunks over accumulated
text by swapping push order in extractFromStreamEvent
- Defer tool approval handling until after the stream ends: server YOLO
mode briefly publishes awaiting_approval before auto-approving, which
caused the stream loop to exit prematurely
- Always update latestPendingApprovals to clear stale approvals after
server auto-approves
Message chunking:
- Split long messages at paragraph/line boundaries to stay within Google
Chat's 4096 character limit
- Cards attach to first chunk only
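
A minimal sketch of the chunking rule described above, assuming a simple preference order (paragraph break, then line break, then a hard cut). `splitMessage` is a hypothetical name and the actual implementation may differ; only the 4096-character limit and the paragraph/line boundary preference come from the commit message.

```typescript
const GOOGLE_CHAT_MAX_CHARS = 4096; // limit cited in the commit message

// Splits text into chunks no longer than maxLen, preferring to break at a
// paragraph boundary, then at a line boundary, then at a hard cut.
function splitMessage(text: string, maxLen: number = GOOGLE_CHAT_MAX_CHARS): string[] {
  const chunks: string[] = [];
  let rest = text;
  while (rest.length > maxLen) {
    const head = rest.slice(0, maxLen);
    let cut = head.lastIndexOf("\n\n");
    if (cut <= 0) cut = head.lastIndexOf("\n");
    if (cut <= 0) cut = maxLen; // no boundary found: hard cut
    const piece = rest.slice(0, cut).trimEnd();
    if (piece.length > 0) chunks.push(piece);
    rest = rest.slice(cut).trimStart();
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}
```

With this shape, a caller would attach any cards to `chunks[0]` and send the remaining chunks as plain text.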
Streaming tool confirmations:
- Add sendToolConfirmationStream and sendBatchToolConfirmationsStream
for SSE-based tool approval flow
README:
- Update bridge deployment to max-instances=1 (single instance for
in-memory async guard consistency)
- Add Known Limitations section
Working:
- Server YOLO mode with streaming text extraction
- Chat API push with proper threading
- Session persistence across Cloud Run restarts
- Retry logic for agent concurrency exhaustion
- Text chunking for long responses
- Per-session /yolo and /safe commands
Not working:
- Tool confirmation streaming with GEMINI_YOLO_MODE=false (executor
aborts on SSE disconnect; SDK-level issue)
- CARD_CLICKED button routing through Add-ons (text-based approval
works as fallback)
- Push all bot responses via the Chat API instead of the synchronous
webhook response: Add-ons createMessageAction ignores thread info in
Spaces, so messages appeared top-level instead of in-thread
- Return bare {} from the webhook for empty responses to prevent Google
Chat from retrying (a wrapped empty createMessageAction caused a retry
storm)
- Add retryStream() with exponential backoff (3 retries, 5s/10s/20s)
for transient A2A server 500/503 errors
- Add GEMINI_MODEL env var override in config (was hardcoded to
gemini-3-pro-preview, which hit capacity limits)
- Extract pushAndReturn() helper for fire-and-forget Chat API sends
- Add session cancellation on /reset to stop in-flight async streams
- Implement per-session YOLO auto-approval with batch tool confirmations
(sendBatchToolConfirmations sends all approvals in one A2A message to
avoid hangs when agent needs ALL tools approved before proceeding)
- Fix threading: include thread info in Add-ons response wrapper so
replies appear in the user's thread instead of top-level
- Make tool approval async: return immediate ack, process confirmation
in background, push result via Chat API (fixes "Agent is processing..."
empty response after approve)
- Replace text-based approval with clickable Approve/Always Allow/Reject
buttons on compact Cards V2
- Wire CARD_CLICKED handler to async approval flow (fire-and-forget
with UPDATE_MESSAGE ack)
Tested via Cloud Run proxy curl suite:
/reset, simple messages, async guard, /yolo, /safe, CARD_CLICKED
(approve + reject), ADDED_TO_SPACE, empty message, cancellation.
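
The backoff schedule of retryStream() mentioned above (3 retries at 5s/10s/20s for 500/503 errors) can be illustrated as below. This is a sketch under assumptions: `backoffDelays`, `isTransientStatus`, and `retryWithBackoff` are hypothetical helpers, and the sketch retries a promise-returning operation rather than an SSE stream, so it sidesteps the question of what to do with events already delivered before a failure.

```typescript
// Delay before each retry: base * 2^attempt, i.e. 5s, 10s, 20s by default.
function backoffDelays(maxRetries: number = 3, baseDelayMs: number = 5_000): number[] {
  return Array.from({ length: maxRetries }, (_, attempt) => baseDelayMs * 2 ** attempt);
}

// Status codes the commit message treats as transient.
function isTransientStatus(status: number): boolean {
  return status === 500 || status === 503;
}

// Runs operation once, then retries after each delay while shouldRetry(err)
// holds. The last failure is rethrown to the caller.
async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  shouldRetry: (err: unknown) => boolean,
  delays: readonly number[] = backoffDelays(),
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (const delayMs of delays) {
    try {
      return await operation();
    } catch (err) {
      if (!shouldRetry(err)) throw err;
      await sleep(delayMs);
    }
  }
  return operation(); // final attempt; errors propagate
}
```

The injectable `sleep` parameter is there only so the schedule can be exercised without real waiting.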
- Add identity token auth in A2ABridgeClient for Cloud Run (K_SERVICE)
- Support CODER_AGENT_PUBLIC_URL env var for agent card URL on Cloud Run
- Strip @Bot mention prefix before slash command detection (Add-ons)
- Grant bridge SA roles/run.invoker on A2A server via IAM
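
For the mention-stripping bullet above, a minimal sketch follows. It assumes the mention is a single whitespace-free token at the start of the text (for example `@Bot`); display names containing spaces would need different handling. Both function names are hypothetical.

```typescript
// Removes one leading "@mention " token, if present.
// Assumption: the mention contains no whitespace.
function stripLeadingMention(text: string): string {
  return text.replace(/^\s*@\S+\s+/, "");
}

// Returns the lowercase command name ("reset", "yolo", ...) or null when the
// text, after mention stripping, does not start with a slash command.
function parseSlashCommand(text: string): string | null {
  const stripped = stripLeadingMention(text).trim();
  const match = /^\/([a-z]+)\b/i.exec(stripped);
  return match?.[1]?.toLowerCase() ?? null;
}
```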
Separate the Google Chat bridge from the A2A agent server so each can
scale independently on Cloud Run. The bridge is a lightweight proxy
(concurrency=80) while the agent needs concurrency=1.
- Add standalone entry point (src/chat-bridge/server.ts)
- Add Dockerfile.chat-bridge and cloudbuild-chat-bridge.yaml
- Remove chat bridge setup from app.ts
- Inline A2UI constants in a2a-bridge-client.ts (no agent deps)
- Update README for two-service architecture
Replace blocking A2A calls with streaming: the webhook returns an
immediate "Processing..." response, then streams results from the
A2A agent and pushes them to Google Chat via the REST API.
- Add ChatApiClient for proactive messaging via Chat REST API
- Add sendMessageStream() to A2ABridgeClient for SSE streaming
- Add extractFromStreamEvent() for parsing individual stream events
- Refactor handler to fire-and-forget async processing
- Fix isTerminal logic to use stream state instead of taskId presence
- Add asyncProcessing guard to prevent overlapping requests
- Add comprehensive README with deployment and setup guide
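
The asyncProcessing guard listed above could look something like the sketch below. It assumes the guard is keyed per thread and held in process memory; the class and method names are hypothetical. Because the state lives in one process, a guard of this kind only works when all requests for a thread reach the same instance.

```typescript
// In-memory guard that rejects a second request for a thread while the first
// is still being processed. Keying by thread is an assumption of this sketch.
class AsyncProcessingGuard {
  private readonly active = new Set<string>();

  // Returns true and marks the thread busy, or false if it is already busy.
  tryAcquire(threadKey: string): boolean {
    if (this.active.has(threadKey)) return false;
    this.active.add(threadKey);
    return true;
  }

  // Marks the thread idle again; call from a finally block.
  release(threadKey: string): void {
    this.active.delete(threadKey);
  }
}
```

A handler would call `tryAcquire` before starting background work, reply with a "still processing" message when it returns false, and call `release` once the stream finishes or fails.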
Pass thread name through Add-ons response wrapper so bot replies stay
in the user's thread instead of posting top-level messages. Add
GIT_TERMINAL_PROMPT=0 to Dockerfile to prevent git from hanging on
credential prompts, which was blocking all requests under concurrency=1.
Enable session resumability across Cloud Run restarts:
- executor.ts: Save conversation history in task metadata during
toSDKTask(), restore via setHistory() in reconstruct()
- gcs.ts: Persist conversation history as separate GCS object
(conversation.tar.gz) alongside metadata and workspace
- session-store.ts: Add optional GCS-backed persistence with periodic
flush and restore-on-startup for thread→session mappings
- handler.ts: Restore persisted sessions on initialize()
- types.ts: Add gcsBucket to ChatBridgeConfig
- app.ts: Pass GCS_BUCKET_NAME to chat bridge config
Validated end-to-end: a message is persisted, Cloud Run restarts, and a
follow-up message in the same thread correctly recalls prior context.
Different threads remain isolated.
Cloud Run strips the Authorization header after using it for IAM
validation, so our JWT middleware can never see the token. When
running on Cloud Run (detected via K_SERVICE env var), skip
app-level JWT verification since Cloud Run IAM already ensures
only authorized service accounts (chat@system.gserviceaccount.com)
can reach the container.
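
The decision described above can be sketched as a small predicate. `shouldVerifyChatJwt` is a hypothetical name, and the environment is passed in as a plain record so the logic is easy to test; the K_SERVICE check and the CHAT_PROJECT_NUMBER condition are the only parts taken from these commit messages.

```typescript
type Env = Record<string, string | undefined>;

// Returns true when the app itself should verify the Google Chat JWT.
function shouldVerifyChatJwt(env: Env): boolean {
  // Cloud Run sets K_SERVICE; there the Authorization header is consumed by
  // IAM, so app-level verification cannot work and is skipped.
  if (env["K_SERVICE"]) return false;
  // Without a configured project number there is no audience to check against.
  return Boolean(env["CHAT_PROJECT_NUMBER"]);
}
```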
Task history contains multiple status-update messages that may
reference the same A2UI surface. Use only the last non-empty
agent response to avoid duplicate text in Chat output.
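
A minimal sketch of the "last non-empty agent response" rule, using a deliberately simplified message shape. `HistoryMessage` and `lastNonEmptyAgentText` are hypothetical; real task history entries carry structured parts rather than a single text field.

```typescript
// Simplified stand-in for a task history entry (assumption of this sketch).
interface HistoryMessage {
  role: "agent" | "user";
  text: string;
}

// Walks the history backwards and returns the most recent agent message that
// has visible text, so repeated status updates do not produce duplicates.
function lastNonEmptyAgentText(history: readonly HistoryMessage[]): string | undefined {
  for (let i = history.length - 1; i >= 0; i--) {
    const msg = history[i];
    if (msg && msg.role === "agent" && msg.text.trim().length > 0) {
      return msg.text;
    }
  }
  return undefined;
}
```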
The blocking DefaultRequestHandler accumulates intermediate status-update
events into task.history. The A2UI response content from "working" events
lives there, while the final "input-required" status has no message.
Updated extractAllParts to check history. Reverted to blocking mode
since streaming had transport issues.
Verifies Bearer tokens from Google Chat using google-auth-library.
Checks issuer (chat@system.gserviceaccount.com) and audience
(CHAT_PROJECT_NUMBER). Verification is skipped when project number
is not configured, allowing local testing without tokens.
Blocking mode only returns the final task state, missing intermediate
A2UI response content from working status-update events. Streaming
captures all events and aggregates parts into the response.
ClientFactory.createFromUrl expects the full agent card URL
(/.well-known/agent-card.json), not just the base server URL.
Also adds CHAT_BRIDGE_A2A_URL to k8s deployment and test script.
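
One way to make the bridge tolerant of either form of the URL is a small normalizer, sketched below. `toAgentCardUrl` is a hypothetical helper, not part of the A2A SDK; only the /.well-known/agent-card.json path comes from the text above.

```typescript
const AGENT_CARD_PATH = "/.well-known/agent-card.json";

// Accepts either a base server URL or a full agent card URL and returns the
// full agent card URL.
function toAgentCardUrl(serverUrl: string): string {
  if (serverUrl.endsWith(AGENT_CARD_PATH)) return serverUrl;
  // Drop trailing slashes so the path is not doubled up.
  return serverUrl.replace(/\/+$/, "") + AGENT_CARD_PATH;
}
```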
Implements a Google Chat HTTP webhook bridge that connects Google Chat
to the A2A server. Each Chat thread maps to an A2A contextId/taskId
pair. The bridge converts A2UI tool approval surfaces to Google Chat
Cards V2 with Approve/Always Allow/Reject buttons, and handles
CARD_CLICKED events to forward tool confirmations back to the A2A server.
Components:
- chat-bridge/types.ts: Google Chat event/response types
- chat-bridge/session-store.ts: Thread -> A2A session mapping
- chat-bridge/a2a-bridge-client.ts: A2A SDK client wrapper
- chat-bridge/response-renderer.ts: A2UI -> Google Chat Cards V2
- chat-bridge/handler.ts: Event handler (MESSAGE, CARD_CLICKED)
- chat-bridge/routes.ts: Express routes mounted at /chat/webhook
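
The thread-to-session mapping described above can be pictured with the following in-memory sketch. The class, its methods, and the `A2ASession` shape are assumptions for illustration; the only detail taken from the commit message is that each Chat thread maps to an A2A contextId/taskId pair.

```typescript
// Hypothetical session record: one per Google Chat thread.
interface A2ASession {
  contextId: string;
  taskId?: string;
}

class ThreadSessionStore {
  private readonly sessions = new Map<string, A2ASession>();

  // Returns the session for a thread, creating one with a fresh contextId
  // (supplied by the caller) the first time the thread is seen.
  getOrCreate(threadName: string, createContextId: () => string): A2ASession {
    let session = this.sessions.get(threadName);
    if (!session) {
      session = { contextId: createContextId() };
      this.sessions.set(threadName, session);
    }
    return session;
  }

  // Forgets a thread's session, e.g. on a reset command.
  reset(threadName: string): boolean {
    return this.sessions.delete(threadName);
  }
}
```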