docs: add comprehensive CI failure investigation reports and UX contribution guide

This commit is contained in:
Keith Guerin
2026-03-21 23:26:25 -07:00
parent d4397dbfc5
commit 953a6bae70
5 changed files with 446 additions and 0 deletions
+16
View File
@@ -0,0 +1,16 @@
description = "High-integrity UX PR submission. Automates rebase, cross-platform snapshots, and full preflight validation."
prompt = """
# Mission: Final PR Submission (_ux_finish-pr)
You are initiating the High-Integrity Submission protocol. Your goal is to submit this PR so that it passes CI on the FIRST attempt.
## Step 1: Context Injection
First, you MUST ensure you have the full project context and standards.
!{cat .gemini/commands/frontend.toml}
## Step 2: Submission Protocol
Now, activate the submission engine and follow its instructions to the letter.
**CRITICAL**: Do NOT skip any validation steps (preflight, snapshots, rebase).
activate_skill(name="_ux_finish-pr")
"""
+88
View File
@@ -0,0 +1,88 @@
# UX Contribution Guide: Gemini CLI
This guide provides a specialized workflow for the AI DevTools UX team and AI
agents contributing to the Gemini CLI. It supplements the project's standard
[CONTRIBUTING.md](./CONTRIBUTING.md) by introducing high-integrity automation
and TUI-specific validation protocols.
---
## 1. Relationship to the Main Contributors Guide
All contributors must first adhere to the foundational rules in the root
[CONTRIBUTING.md](./CONTRIBUTING.md) (CLA, Issue linking, Conventional Commits).
The **UX Contribution Guide** extends these rules with specialized tools
designed to solve the "local vs. remote" discrepancy pattern and ensure that
complex TUI layouts and cross-platform terminal icons are verified correctly
before submission.
---
## 2. The UX "Gold Standard" Submission Protocol
To prevent the common "recursive failure loop" identified in the pattern
investigation, all UX-related PRs must be submitted using the
**`/_ux_finish-pr`** command.
### **Why this protocol is mandatory:**
- **Standard `preflight` is not enough**: The main `CONTRIBUTING.md` mandates
`npm run preflight`. This guide mandates it be run via the agentic tool to
ensure `TERM_PROGRAM=none` is used for snapshots, preventing macOS-specific
icon leaks.
- **Safe State Management**: It enforces the **Base Folder Strategy** (via
`_ux_git-worktree`) to prevent sandbox interference.
- **Safe Rebase**: It forbids destructive `ours` merge strategies that delete
core features.
---
## 3. Workflow for Agents (Step-by-Step)
AI Agents working on UX tasks must follow this lifecycle strictly:
| Phase | Tool/Command | Requirement |
| :---------- | :------------------- | :---------------------------------------------------------------------------------- |
| **Start** | `_ux_git-worktree` | Use `worktree-manager.sh pr <NUMBER>`. Never work in `main`. |
| **Develop** | `/frontend` | Injects the `Strict Development Rules` and full CLI context. |
| **Iterate** | `tsc --build` | Must pass locally for the affected package before any commit. |
| **Submit** | **`/_ux_finish-pr`** | **Mandatory One-Command Submission.** Covers rebase, snapshots, and full preflight. |
---
## 4. User Steering Guide
To maintain high integrity, users should steer agents using these UX-specific
triggers:
- **To start a task**: "Use `_ux_git-worktree` to check out PR #123."
- **To address feedback**: "Use `pr-address-comments` to summarize Jacob's
review."
- **To audit UI**: "Use `_ux_designer` to audit the new component against v1.0
principles."
- **To submit**: **`/_ux_finish-pr`**. (Do not accept "I've pushed the changes"
until this command completes).
---
## 5. Summary of Specialized UX Tools
These tools are not covered in the main `CONTRIBUTING.md` but are essential for
UX integrity:
| Command | Skill | Purpose |
| :---------------------- | :----------------- | :---------------------------------------------------------------- |
| **`/_ux_finish-pr`** | `_ux_finish-pr` | **Submission Tool.** Safe rebase + Full Preflight + Snapshot fix. |
| **`/_ux_git-worktree`** | `_ux_git-worktree` | Implements the **Base Folder Strategy** for clean environments. |
| **`/_ux_designer`** | `_ux_designer` | Enforces Signal over Noise and Coherent State principles. |
| **`/review-and-fix`** | `Pickle Rick` | Automated remediation of complex monorepo build failures. |
---
## 6. Conclusion
By following this UX-specific extension of the contribution process, we ensure
that the Gemini CLI remains visually stable, responsive, and cross-platform
compatible. **If the `/_ux_finish-pr` command fails, the PR is not ready for
submission.**
+97
View File
@@ -0,0 +1,97 @@
# Comprehensive Investigation Report: Gemini CLI CI Check Failures
## Executive Summary
A deep investigation into PRs
[#21212](https://github.com/google-gemini/gemini-cli/pull/21212) and
[#22412](https://github.com/google-gemini/gemini-cli/pull/22412) reveals a
persistent disconnect between local agent validation and remote CI enforcement.
The agent's tendency to report "all tests passing" when checks subsequently fail
on GitHub is driven by **selective testing**, **cross-platform UI drift**, and
**Git state mismanagement**. This report synthesizes these findings and provides
a structural protocol to break the recursive failure loop.
---
## 1. Core Failure Dimensions
### A. The "Test vs. Build" Blind Spot
The most frequent cause of "false positives" is the agent running unit tests
while ignoring the build pipeline.
- **Vitest Lenience:** Vitest uses `esbuild` to transform TypeScript, which
often ignores type errors that don't directly prevent code execution.
- **CI Strictness:** GitHub Actions run `tsc --build` as a blocking step. PR
21212 failed 20+ times because the agent saw green tests but missed cascading
TypeScript inference errors (Promise flattening) that only surfaced during a
full build.
- **OS Drift:** Snapshots were generated on macOS using `MAC_TERMINAL_ICON`,
which automatically failed on Linux/Windows CI runners using `DEFAULT_ICON`.
### B. UI Layout Side-Effects
Changes to persistent UI components (Header/Footer) frequently broke integration
tests in non-obvious ways.
- **The "Off-Screen" Prompt:** In PR 22412, increasing the ASCII logo height
pushed the `Composer` prompt off the bottom of the mock terminal. The tests
timed out waiting for a prompt that was "rendered" but technically invisible
to the test runner's viewport.
- **Environmental Interference:** Local environment variables (e.g.,
`ANTIGRAVITY_CLI_ALIAS`) masked failures that only appeared in the "clean"
environment of the GitHub runner.
### C. Git & Rebase "Slop"
The pressure to deliver a single, clean commit often led to destructive Git
operations.
- **Accidental Reversions:** During complex rebases, the agent used "keep ours"
strategies to resolve conflicts, inadvertently deleting unrelated core
features (Windows Sandboxing, ModelChain support).
- **Merge Context Bloat:** Unrelated documentation and configuration changes
often entered the branch during botched merges, creating a "noisy" diff that
obscured real issues and caused linting failures in files the agent never
intended to touch.
---
## 2. Further Analysis: The "Why" of the Disconnect
- **Heuristic Failures:** The agent uses a heuristic that "if I only touched
file X, I only need to test file X." In a tightly coupled monorepo with shared
types and global UI layouts, this heuristic is fundamentally flawed.
- **Cost of Bypassing Safety:** The agent frequently used `--no-verify` to
bypass slow pre-commit hooks. While this saved 30 seconds locally, it added
10-20 minutes of CI wait time per failure, leading to the "log of me asking
you to fix it again."
- **Instructional Inertia:** The agent repeated the same "fix and push" cycle
without widening its verification scope until specifically steered by the
user. It optimized for speed (local feedback loop) over correctness (CI
feedback loop).
---
## 3. The "PR Verification Protocol" (Actionable Solutions)
To prevent these patterns from recurring, all future PR work must follow this
mandatory lifecycle:
| Phase | Required Action | Objective |
| :------------- | :-------------------------------------------------------- | :------------------------------------- |
| **Research** | `grep -r` for all type dependencies of modified files. | Identify cascading build risks. |
| **Execution** | Never use `git merge -X ours` or `git checkout --ours`. | Preserve core features during rebases. |
| **Pre-Check** | Run `tsc --build` for the affected package. | Catch inference bugs before pushing. |
| **UI Check** | Run snapshots with `TERM_PROGRAM=none`. | Ensure OS-independent icons. |
| **Validation** | Run the **Full Validation** suite: `npm run preflight`. | Verify build, lint, and all tests. |
| **Finality** | Verify `AppRig` terminal height if UI dimensions changed. | Prevent integration test timeouts. |
---
## 4. Conclusion
The "20 failures" were not inevitable; they were the result of the agent
prioritizing the local path over the established project standards. By enforcing
the **Full Validation** path and adopting a **"Neutral Environment"** testing
strategy, the agent can achieve "First-Time Green" PRs.
+103
View File
@@ -0,0 +1,103 @@
# Report: Investigation of CI Check Failures in Gemini CLI
## Executive Summary
The investigation into the repeated CI failures for PR
[#21212](https://github.com/google-gemini/gemini-cli/pull/21212) and the
discrepancy between local agent claims and remote check results has identified a
pattern of **incomplete verification**, **environment drift**, and **git
mismanagement**. While the agent (Gemini) repeatedly reported local success, it
was frequently running only a subset of the required validation suite and
failing to account for CI-specific constraints.
---
## 1. Root Causes of the Failure Pattern
### A. Incomplete Local Validation
- **Command Gaps:** The agent consistently relied on narrow commands like
`npm test <file>` or `npm run lint` (ESLint only). It frequently skipped the
comprehensive `npm run preflight` mandated by the project's `GEMINI.md`.
- **Build vs. Test Discrepancy:** The agent often observed passing tests while
ignoring `tsc --build` failures. Because Vitest uses `esbuild` to bypass
type-checking during test execution, the tests would pass locally even if the
project had compilation errors that blocked the CI's build step.
- **Linting Omissions:** The project uses a complex `scripts/lint.js` that
includes `actionlint` (for workflows), `shellcheck` (for scripts), and
`yamllint`. The agent typically only ran `eslint`, missing violations in these
other categories.
### B. Environment Parity (The "Works on My Machine" Problem)
- **TypeScript Inference Bugs:** A major cause of failure in PR 21212 was a
subtle TypeScript type inference discrepancy related to Promise flattening in
`LoadingIndicator.test.tsx`. The agent's local TS version was more lenient,
while the CI matrix (running Node 20, 22, and 24) triggered strict compilation
errors that the agent could not reproduce without running a full
`tsc --build`.
- **OS Differences:** Some failures were specific to the Linux runner in CI,
while the agent was operating in a macOS environment.
### C. Git & Rebase Mismanagement
- **Accidental Feature Reversions:** During merge conflict resolutions and
rebases, the agent inadvertently deleted massive amounts of recently merged
core functionality (e.g., Windows Sandboxing, ModelChain support). This
happened because the agent used a "keep ours" strategy for conflicts it didn't
understand, treating the newer code in `main` as noise.
- **Single-Commit Mandate:** The pressure to squash changes into a single clean
commit often led the agent to overwrite the "good" state of `main` with its
older local state during a botched rebase.
### D. Violation of Strict Project Rules
- **Typing Violations:** The agent used `as Record<string, unknown>` to bypass
TS checks for undocumented settings, which is strictly forbidden by the
project's development rules.
- **Missing Documentation:** The agent added new UI settings but failed to
document them in `docs/get-started/configuration.md`, a requirement for any
setting with `showInDialog: true`.
---
## 2. The "20 Failures" Cycle
The long log of "fix it again" requests was caused by a recursive failure loop:
1. **Partial Fix:** Agent fixes a specific test failure -> Runs only that test
-> **Claims "All tests passing"**.
2. **CI Catch:** CI fails on a build/lint error the agent skipped -> **User
reports failure**.
3. **Collateral Damage:** Agent fixes the build error -> Botches a rebase in
the process -> **Claims "Ready for merge"**.
4. **Rule Enforcement:** Maintainer/CI identifies reverted features or missing
docs -> **User reports failure**.
---
## 3. Actionable Solutions
1. **Mandate `npm run preflight`:** The agent must be strictly instructed to
run the full `npm run preflight` command before claiming completion. This
ensures `clean`, `install`, `build`, `lint:all`, `typecheck`, and `test:ci`
are all verified in sequence.
2. **Explicit `tsc --build` Verification:** Agents must perform a package-wide
`tsc --build` after any architectural or test-helper changes to catch the
specific inference bugs that surfaced in CI.
3. **Conflict Resolution Protocol:** Agents should never default to
`merge -X ours`. If conflicts occur, they must perform a surgical comparison
of the diffs. If the conflict is in a file unrelated to their task (e.g.,
`packages/core`), they should seek clarification or default to "theirs" (the
`main` branch) to preserve core features.
4. **Documentation & Schema Linting:** Include automated checks in the agent's
workflow to verify that any new setting in `settingsSchema.ts` has a
corresponding entry in the documentation.
5. **Strict Rule Adherence:** The agent's internal "Pickle Rick" worker or
sub-agents must be initialized with the `Strict Development Rules` (found in
`.gemini/commands/strict-development-rules.md`) to prevent common violations
like `any` usage or incorrect `waitFor` utilities.
+142
View File
@@ -0,0 +1,142 @@
# Gemini CLI: PR Tooling & Automation Guide
This document provides a comprehensive overview of the specialized skills,
custom commands, and extension tools available for managing Pull Requests within
the Gemini CLI project. These tools are designed to automate verification,
enforce standards, and streamline the review-to-merge lifecycle.
---
## 1. Built-in Agent Skills (`.gemini/skills/`)
These skills are available globally within the workspace and provide structural
workflows for different phases of the PR lifecycle.
### **A. `pr-creator`**
- **Purpose**: Guides the creation of high-quality PRs.
- **Workflow**:
1. Verifies current branch is not `main`.
2. Enforces a descriptive commit message following Conventional Commits.
3. Locates and applies the repository's `.github/PULL_REQUEST_TEMPLATE.md`.
4. **Mandatory**: Runs `npm run preflight` locally before pushing.
5. Uses `gh` CLI to create the PR from a temporary description file.
- **Key Principle**: Never push to `main`; never ignore the PR template.
### **B. `code-reviewer`**
- **Purpose**: Conducts professional, multi-dimensional code reviews.
- **Target**: Can review remote PRs (`gh pr checkout`) or local uncommitted
changes.
- **Analysis Pillars**: Correctness, Maintainability, Readability, Efficiency,
Security, Edge Cases, and Testability.
- **Key Principle**: Execute `npm run preflight` early to catch automated
failures before performing human-level analysis.
### **C. `async-pr-review`**
- **Purpose**: Executes a background review job to avoid blocking the main
session.
- **Mechanism**: Uses an "Agentic Asynchronous Pattern" where a background shell
script invokes a headless `gemini -p` instance.
- **Workflow**: Creates an ephemeral worktree, runs preflight, and generates a
`final-assessment.md`.
- **Key Principle**: Prevents git lock conflicts by using isolated worktrees in
`.gemini/tmp/async-reviews/`.
### **D. `pr-address-comments`**
- **Purpose**: Helps the author systematically resolve feedback.
- **Workflow**: Uses `scripts/fetch-pr-info.js` to gather all comments and
provides a checklist (✅ for resolved, [number] for open threads).
- **Key Principle**: The agent summarizes but waits for user guidance before
fixing issues.
---
## 2. UX Extension Skills (`ux-extension/`)
These skills are part of the specialized UX team extension and prioritize
"finish line" polish and TUI design principles.
### **A. `_ux_finish-pr`**
- **Purpose**: Expert PR maintenance and final polish.
- **Core Principle**: **"Always maintain a clean, focused diff by resolving
merge conflicts early and squashing into a single feature commit."**
- **Unique Features**:
- **TDD Fallback**: If a fix fails 2-3 times, it mandates creating a local
reproduction test case.
- **Snapshot Review**: Mandates manual review of updated `.snap` or `.svg`
files after running tests with `-u`.
- **Force-Push safety**: Uses `--force-with-lease`.
### **B. `_ux_git-worktree`**
- **Purpose**: Implements the **Base Folder Strategy**.
- **Rules**: All work happens in sibling directories (e.g., `main/`,
`feature-name/`).
- **Workflow**: Automates `git worktree add` and semantic PR checkouts (e.g.,
`pr-123-fix-logo`).
- **Key Principle**: Prevents macOS sandbox interference by including
`main/.git` in the trusted environment.
### **C. `_ux_designer`**
- **Purpose**: Ruthlessly enforces the **v1.0 Design Principles**.
- **Principles**:
1. **Signal over Noise**: Mandates collapsible components
(`<ExpandableText>`).
2. **Coherent State**: Global state belongs in the "Bottom Drawer" (Footer).
3. **Intent Signaling**: Long-running tasks must telegaph progress via
spinners.
4. **Strategic Color**: Functional use of color only; no "rainbow" text.
---
## 3. Custom Commands (`.gemini/commands/`)
These shortcuts inject specific context or trigger focused agent loops.
- **`/pr-review <PR_NUMBER>`**: Triggers the `oncall/pr-review.toml` prompt. It
automates checkout, runs `npm run preflight`, and drafts a professional
approval/feedback message.
- **`/review-and-fix <target>`**: Initiates the "Pickle Rick" worker loop. It
conducts a review, identifies findings, and then spawns a manager-worker loop
to fix every issue until validation (`npm run build`, `test`, `lint`,
`typecheck`) passes.
- **`/frontend`**: Injects the complete source code of `packages/cli` and key
core files, along with the `Strict Development Rules`.
- **`/prompt-suggest`**: Analyzes agent failures and suggests high-level system
prompt improvements to prevent recurrence.
---
## 4. How to Use These Tools Correctly
To prevent the "recursive failure loop" identified in the pattern investigation,
follow these usage rules:
1. **Start with Worktrees**: Always use `_ux_git-worktree` to check out a PR.
This keeps your `main` branch clean and avoids sandbox issues.
2. **Use `/frontend` for UI Tasks**: This ensures the agent is aware of the
`Strict Development Rules` (e.g., no `any`, use `waitFor` correctly).
3. **Run `npm run preflight` via `pr-creator`**: Never manually push a PR.
Always call the `pr-creator` skill, as it forces the full validation suite.
4. **Finish with `_ux_finish-pr`**: When a PR is "done," use this skill to
squash commits, resolve final conflicts with `main`, and verify snapshots in
a neutral environment.
5. **Don't Bypass Hooks**: Never use `git commit --no-verify`. If pre-commit
hooks fail, use `_ux_finish-pr` to diagnose the root cause.
---
## 5. Summary of PR verification lifecycle
| Phase | Tool/Skill | Command |
| :--------------- | :----------------- | :----------------------------- |
| **Checkout** | `_ux_git-worktree` | `worktree-manager.sh pr <#> ` |
| **Fixing** | `review-and-fix` | `/review-and-fix staged` |
| **UI Audit** | `_ux_designer` | `Audit <Component.tsx>` |
| **Final Polish** | `_ux_finish-pr` | `activate_skill _ux_finish-pr` |
| **Submission** | `pr-creator` | `activate_skill pr-creator` |