docs(offload): update maintainer onboarding for provider architecture

This commit is contained in:
mkorwel
2026-03-16 15:40:59 -07:00
parent cc7b7e2fbc
commit a77f1f5e24
2 changed files with 66 additions and 79 deletions
+6 -26
@@ -59,37 +59,17 @@ The offload system uses a **Hybrid VM + Docker** architecture designed for maxim
- **`ready`**: Final full validation (clean install + preflight) before merge.
- **`open`**: Provision a worktree and drop directly into a remote tmux session.
## Scenarios and workflows ## Scenario and workflows
### First-time setup ### Getting Started (Onboarding)
Run the setup command once to configure your remote environment: For a complete guide on setting up your remote environment, see the [Maintainer Onboarding Guide](../../../MAINTAINER_ONBOARDING.md).
```bash
npm run offload:setup
```
### Offloading a task ### Technical details
To start an offload task for a pull request:
```bash
npm run offload <PR_NUMBER> [action]
```
### Monitoring progress This skill uses a **Worker Provider** abstraction (`GceCosProvider`) to manage the remote lifecycle. It uses an isolated Gemini profile on the remote host (`~/.offload/gemini-cli-config`) to ensure that verification tasks do not interfere with your primary configuration.
Check status from your local shell without switching windows:
```bash
npm run offload:check <PR_NUMBER>
```
### Cleanup
Wipe old PR worktrees and kill inactive sessions:
```bash
npm run offload:clean
```
## Technical details
This skill uses an isolated Gemini profile on the remote host (`~/.offload/gemini-cli-config`) to ensure that verification tasks do not interfere with your primary configuration.
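
As a rough illustration of what a Worker Provider abstraction could look like, the sketch below separates the lifecycle contract from any one backend. Only the name `GceCosProvider` appears in this change; `WorkerProvider`, `WorkerState`, `WorkerAction`, `nextState`, and `FakeProvider` are hypothetical names, and the transition rules are assumed rather than taken from `scripts/providers/`.

```typescript
// Sketch only: every identifier below is hypothetical. A real provider such as
// GceCosProvider would call cloud APIs where this version updates a field.
type WorkerState = "missing" | "stopped" | "running";
type WorkerAction = "provision" | "stop" | "wake";

// The lifecycle surface an orchestrator might depend on.
interface WorkerProvider {
  readonly name: string;
  status(): Promise<WorkerState>;
  apply(action: WorkerAction): Promise<WorkerState>;
}

// Pure transition rules (assumed semantics):
// - provision always ends with a running worker,
// - stop and wake are no-ops when no worker exists.
function nextState(state: WorkerState, action: WorkerAction): WorkerState {
  if (action === "provision") return "running";
  if (state === "missing") return "missing";
  return action === "stop" ? "stopped" : "running";
}

// In-memory stand-in that satisfies the interface without touching any cloud.
class FakeProvider implements WorkerProvider {
  readonly name = "fake";
  private state: WorkerState = "missing";

  async status(): Promise<WorkerState> {
    return this.state;
  }

  async apply(action: WorkerAction): Promise<WorkerState> {
    this.state = nextState(this.state, action);
    return this.state;
  }
}
```

Keeping the transition rules in a pure function means they can be checked without provisioning anything, and a second backend only has to implement the two interface methods.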
### Directory structure
- `scripts/providers/`: Modular worker implementations (GCE, etc.).
- `scripts/orchestrator.ts`: Local orchestrator (syncs scripts and pops terminal).
- `scripts/worker.ts`: Remote engine (provisions worktree and runs playbooks).
- `scripts/check.ts`: Local status poller.
+60 -53
@@ -6,78 +6,81 @@ preflight) to a dedicated GCP worker.
## Prerequisites
1. **Google Cloud Access**: Ensure you have access to the
`gemini-cli-team-quota` project.
2. **GCloud CLI**: Authenticated locally (`gcloud auth login`).
3. **GitHub CLI**: Authenticated locally (`gh auth login`).
4. **IAP Permissions**: Ensure you have the `IAP-secured Tunnel User` role on the project.
5. **Corporate Identity**: Run `gcert` (or your internal equivalent) recently to ensure SSH certificates are valid.
- **iTerm2**: (Optional) For automated window popping on macOS. ## Architecture: Hybrid VM + Container 🏗️
## Architecture: Why this setup? The offload system uses a **Worker Provider** architecture to abstract the underlying infrastructure:
The offload system uses a **Hybrid VM + Docker** architecture to balance raw 1. **GCE VM (Host)**: A high-performance machine running **Container-Optimized OS (COS)**.
power with environmental stability: 2. **maintainer-worker (Container)**: A persistent Docker container acting as your remote workstation.
3. **Resilient Connectivity**: A dual-path strategy that uses **Fast-Path SSH** by default and automatically falls back to **IAP Tunneling** if direct access is blocked.
1. **GCE VM (Raw Power)**: High-performance machines handle the "heavy lifting" ---
(full project builds, exhaustive test suites), keeping your local primary
workstation responsive and cool.
2. **Docker (Consistency)**: All development tools (`node`, `gh`, `tsx`,
`vitest`) are managed via `.gcp/Dockerfile.maintainer`. This ensures every
maintainer works in an identical environment, eliminating "it works on my
machine" issues.
3. **Persistence + Isolation**: Tmux sessions on the host VM provide
persistence (surviving disconnects), while Git Worktrees and isolated Docker
runs ensure that multiple jobs don't interfere with each other.
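
The dual-path connectivity described in this section (Fast-Path SSH first, IAP tunneling as the fallback) can be sketched as an ordered list of paths tried one after another. This is an illustrative sketch, not the orchestrator's code: `ConnectPath`, `Probe`, `ConnectResult`, and `connectWithFallback` are hypothetical names, and the probe is injected so the ordering logic can run without a network.

```typescript
// Sketch only: all names here are hypothetical.
type ConnectPath = "direct-ssh" | "iap-tunnel";

// A probe reports whether one path currently works. A real probe might shell
// out to `ssh` for the direct path and to `gcloud compute ssh` with
// `--tunnel-through-iap` for the fallback; that wiring is assumed.
type Probe = (path: ConnectPath) => boolean;

interface ConnectResult {
  ok: boolean;
  path?: ConnectPath;   // the path that succeeded, if any
  tried: ConnectPath[]; // every path attempted, in order
}

// Try each path in order and stop at the first one that works.
function connectWithFallback(
  probe: Probe,
  order: ConnectPath[] = ["direct-ssh", "iap-tunnel"],
): ConnectResult {
  const tried: ConnectPath[] = [];
  for (const path of order) {
    tried.push(path);
    if (probe(path)) {
      return { ok: true, path, tried };
    }
  }
  return { ok: false, tried };
}
```

Returning the list of attempted paths alongside the outcome makes a failure report specific: the caller can say that both direct SSH and IAP were tried before asking the user to check permissions.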
## Setup Workflow
### 1. Fork & Clone ### 1. Initial Configuration (Discovery)
Start by forking the repository and cloning it to your local machine. This interactive script configures your local environment to recognize the remote worker.
```bash
gh repo fork google-gemini/gemini-cli --clone
cd gemini-cli
```
### 2. Run the Offload Setup
This interactive script will handle all the complex orchestration setup:
- Configures your GCP project and compute zone.
- Sets up a **Fast-Path SSH Alias** (`gcli-worker`) in `~/.ssh/config`.
- Creates/Identifies your **Security Fork** for autonomous work.
- Performs a **One-Shot Authentication** for Gemini and GitHub.
- Pre-clones the repository to your remote worker.
```bash
npm run offload:setup
```
### 3. Provision Your Worker * **What it does**: Generates `.gemini/offload_ssh_config`, verifies project access, and establishes the initial identity handshake.
* **Connectivity**: If direct internal SSH fails, it will attempt to verify access via an IAP tunnel.
Once setup is configured, spin up your dedicated, high-performance VM: ### 2. Provisioning Your Worker (Infrastructure)
Spin up your dedicated, high-performance VM. If it already exists, this command will verify its health.
```bash
npm run offload:fleet provision
```
## Daily Workflow * **Specs**: n2-standard-8, 200GB PD-Balanced disk.
* **Self-Healing**: It uses a COS startup script to ensure the `maintainer-worker` container is always running.
### Offloading a PR Review ### 3. Remote Initialization
To perform a deep behavioral review of a PR on your remote worker: Once provisioned, return to the setup script to finalize the remote environment.
```bash
npm run offload <PR_NUMBER> review npm run offload:setup
```
_A new iTerm2 window will pop up, instantly connected to your worker, running * **Auth Sync**: Pushes your `google_accounts.json` to the worker.
the `review-pr` skill._ * **Scoped Token**: Generates a magic link for a GitHub PAT and stores it securely on the worker.
* **Repo Clone**: Performs a filtered (shallow) clone of the repository onto the remote disk.
### Monitoring Your Jobs ---
View the real-time status of all your in-flight remote jobs: ## Daily Workflow
### Running an Offloaded Job
To perform a deep behavioral review or an agentic fix on your remote worker:
```bash
# For a review
npm run offload <PR_NUMBER> review
# For an automated fix
npm run offload <PR_NUMBER> implement
```
* **Isolation**: Each job runs in a dedicated **Git Worktree** (`~/dev/worktrees/offload-<id>`).
* **Persistence**: Jobs run inside a `tmux` session on the remote host. You can disconnect and reconnect without losing progress.
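
To make the isolation and persistence points concrete, here is a small sketch of how a job's worktree path and tmux session name might be derived. Only the `~/dev/worktrees/offload-<id>` layout comes from the text above; `planJob`, `JobPlan`, `OffloadAction`, the session naming scheme, and the assumption that `<id>` is the PR number are all hypothetical.

```typescript
// Sketch only: names and the session naming scheme are assumptions.
type OffloadAction = "review" | "implement";

interface JobPlan {
  worktree: string; // dedicated Git worktree for this job
  session: string;  // tmux session that keeps the job alive across disconnects
}

// Derive per-job locations from a PR number (assumed to be the <id>).
function planJob(id: number, action: OffloadAction, home = "~"): JobPlan {
  // Reject anything that is not a positive integer so the id can never
  // smuggle path separators or shell metacharacters into the result.
  if (!Number.isInteger(id) || id <= 0) {
    throw new Error(`invalid job id: ${id}`);
  }
  return {
    worktree: `${home}/dev/worktrees/offload-${id}`,
    session: `offload-${id}-${action}`,
  };
}
```

Because each job gets its own directory and its own session name, two jobs for different PRs cannot overwrite each other's checkout or attach to the same terminal.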
### Monitoring "Mission Control"
View the real-time state of your worker and all in-flight jobs:
```bash
npm run offload:status
@@ -85,17 +88,21 @@ npm run offload:status
### Stopping Your Worker
To save costs, shut down your worker when you're done for the day. The To save costs, shut down your worker when finished. The orchestrator will automatically wake it up when you start a new task.
orchestrator will automatically wake it up when you run a new task.
```bash
npm run offload:fleet stop
```
## Security Model ---
- **Isolation**: Each maintainer has their own dedicated VM ## Resilience & Troubleshooting
(`gcli-offload-<user>`).
- **Permissions**: The agent uses a scoped token that is read-only to the main ### "SSH Connection Failed"
repo and read/write only to your personal fork. If the setup or orchestrator reports a connection failure:
- **OS Login**: Access is managed via your Google corporate identity. 1. **Check Identity**: Run `gcert` to refresh your SSH credentials.
2. **IAP Fallback**: The system should automatically attempt IAP tunneling. If it still fails, verify your GCP project permissions.
3. **Waking Up**: If the worker was stopped, the first command may take ~30 seconds to wake the VM.
### "Worker Not Found"
If `offload:setup` can't find your worker, ensure you have run `npm run offload:fleet provision` at least once in the current project.