mirror of
https://github.com/google-gemini/gemini-cli.git
synced 2026-06-07 01:22:48 -07:00
docs: add comprehensive CI failure investigation reports and UX contribution guide
This commit is contained in:
@@ -0,0 +1,16 @@
|
||||
description = "High-integrity UX PR submission. Automates rebase, cross-platform snapshots, and full preflight validation."
|
||||
prompt = """
|
||||
# Mission: Final PR Submission (_ux_finish-pr)
|
||||
|
||||
You are initiating the High-Integrity Submission protocol. Your goal is to submit this PR so that it passes CI on the FIRST attempt.
|
||||
|
||||
## Step 1: Context Injection
|
||||
First, you MUST ensure you have the full project context and standards.
|
||||
!{cat .gemini/commands/frontend.toml}
|
||||
|
||||
## Step 2: Submission Protocol
|
||||
Now, activate the submission engine and follow its instructions to the letter.
|
||||
**CRITICAL**: Do NOT skip any validation steps (preflight, snapshots, rebase).
|
||||
|
||||
activate_skill(name="_ux_finish-pr")
|
||||
"""
|
||||
@@ -0,0 +1,88 @@
|
||||
# UX Contribution Guide: Gemini CLI
|
||||
|
||||
This guide provides a specialized workflow for the AI DevTools UX team and AI
|
||||
agents contributing to the Gemini CLI. It supplements the project's standard
|
||||
[CONTRIBUTING.md](./CONTRIBUTING.md) by introducing high-integrity automation
|
||||
and TUI-specific validation protocols.
|
||||
|
||||
---
|
||||
|
||||
## 1. Relationship to the Main Contributors Guide
|
||||
|
||||
All contributors must first adhere to the foundational rules in the root
|
||||
[CONTRIBUTING.md](./CONTRIBUTING.md) (CLA, Issue linking, Conventional Commits).
|
||||
|
||||
The **UX Contribution Guide** extends these rules with specialized tools
|
||||
designed to solve the "local vs. remote" discrepancy pattern and ensure that
|
||||
complex TUI layouts and cross-platform terminal icons are verified correctly
|
||||
before submission.
|
||||
|
||||
---
|
||||
|
||||
## 2. The UX "Gold Standard" Submission Protocol
|
||||
|
||||
To prevent the common "recursive failure loop" identified in the pattern
|
||||
investigation, all UX-related PRs must be submitted using the
|
||||
**`/_ux_finish-pr`** command.
|
||||
|
||||
### **Why this protocol is mandatory:**
|
||||
|
||||
- **Standard `preflight` is not enough**: The main `CONTRIBUTING.md` mandates
|
||||
`npm run preflight`. This guide mandates it be run via the agentic tool to
|
||||
ensure `TERM_PROGRAM=none` is used for snapshots, preventing macOS-specific
|
||||
icon leaks.
|
||||
- **Safe State Management**: It enforces the **Base Folder Strategy** (via
|
||||
`_ux_git-worktree`) to prevent sandbox interference.
|
||||
- **Safe Rebase**: It forbids destructive `ours` merge strategies that delete
|
||||
core features.
|
||||
|
||||
---
|
||||
|
||||
## 3. Workflow for Agents (Step-by-Step)
|
||||
|
||||
AI Agents working on UX tasks must follow this lifecycle strictly:
|
||||
|
||||
| Phase | Tool/Command | Requirement |
|
||||
| :---------- | :------------------- | :---------------------------------------------------------------------------------- |
|
||||
| **Start** | `_ux_git-worktree` | Use `worktree-manager.sh pr <NUMBER>`. Never work in `main`. |
|
||||
| **Develop** | `/frontend` | Injects the `Strict Development Rules` and full CLI context. |
|
||||
| **Iterate** | `tsc --build` | Must pass locally for the affected package before any commit. |
|
||||
| **Submit** | **`/_ux_finish-pr`** | **Mandatory One-Command Submission.** Covers rebase, snapshots, and full preflight. |
|
||||
|
||||
---
|
||||
|
||||
## 4. User Steering Guide
|
||||
|
||||
To maintain high integrity, users should steer agents using these UX-specific
|
||||
triggers:
|
||||
|
||||
- **To start a task**: "Use `_ux_git-worktree` to check out PR #123."
|
||||
- **To address feedback**: "Use `pr-address-comments` to summarize Jacob's
|
||||
review."
|
||||
- **To audit UI**: "Use `_ux_designer` to audit the new component against v1.0
|
||||
principles."
|
||||
- **To submit**: **`/_ux_finish-pr`**. (Do not accept "I've pushed the changes"
|
||||
until this command completes).
|
||||
|
||||
---
|
||||
|
||||
## 5. Summary of Specialized UX Tools
|
||||
|
||||
These tools are not covered in the main `CONTRIBUTING.md` but are essential for
|
||||
UX integrity:
|
||||
|
||||
| Command | Skill | Purpose |
|
||||
| :---------------------- | :----------------- | :---------------------------------------------------------------- |
|
||||
| **`/_ux_finish-pr`** | `_ux_finish-pr` | **Submission Tool.** Safe rebase + Full Preflight + Snapshot fix. |
|
||||
| **`/_ux_git-worktree`** | `_ux_git-worktree` | Implements the **Base Folder Strategy** for clean environments. |
|
||||
| **`/_ux_designer`** | `_ux_designer` | Enforces Signal over Noise and Coherent State principles. |
|
||||
| **`/review-and-fix`** | `Pickle Rick` | Automated remediation of complex monorepo build failures. |
|
||||
|
||||
---
|
||||
|
||||
## 6. Conclusion
|
||||
|
||||
By following this UX-specific extension of the contribution process, we ensure
|
||||
that the Gemini CLI remains visually stable, responsive, and cross-platform
|
||||
compatible. **If the `/_ux_finish-pr` command fails, the PR is not ready for
|
||||
submission.**
|
||||
@@ -0,0 +1,97 @@
|
||||
# Comprehensive Investigation Report: Gemini CLI CI Check Failures
|
||||
|
||||
## Executive Summary
|
||||
|
||||
A deep investigation into PRs
|
||||
[#21212](https://github.com/google-gemini/gemini-cli/pull/21212) and
|
||||
[#22412](https://github.com/google-gemini/gemini-cli/pull/22412) reveals a
|
||||
persistent disconnect between local agent validation and remote CI enforcement.
|
||||
The agent's tendency to report "all tests passing" when checks subsequently fail
|
||||
on GitHub is driven by **selective testing**, **cross-platform UI drift**, and
|
||||
**Git state mismanagement**. This report synthesizes these findings and provides
|
||||
a structural protocol to break the recursive failure loop.
|
||||
|
||||
---
|
||||
|
||||
## 1. Core Failure Dimensions
|
||||
|
||||
### A. The "Test vs. Build" Blind Spot
|
||||
|
||||
The most frequent cause of "false positives" is the agent running unit tests
|
||||
while ignoring the build pipeline.
|
||||
|
||||
- **Vitest Lenience:** Vitest uses `esbuild` to transform TypeScript, which
|
||||
often ignores type errors that don't directly prevent code execution.
|
||||
- **CI Strictness:** GitHub Actions run `tsc --build` as a blocking step. PR
|
||||
21212 failed 20+ times because the agent saw green tests but missed cascading
|
||||
TypeScript inference errors (Promise flattening) that only surfaced during a
|
||||
full build.
|
||||
- **OS Drift:** Snapshots were generated on macOS using `MAC_TERMINAL_ICON`,
|
||||
which automatically failed on Linux/Windows CI runners using `DEFAULT_ICON`.
|
||||
|
||||
### B. UI Layout Side-Effects
|
||||
|
||||
Changes to persistent UI components (Header/Footer) frequently broke integration
|
||||
tests in non-obvious ways.
|
||||
|
||||
- **The "Off-Screen" Prompt:** In PR 22412, increasing the ASCII logo height
|
||||
pushed the `Composer` prompt off the bottom of the mock terminal. The tests
|
||||
timed out waiting for a prompt that was "rendered" but technically invisible
|
||||
to the test runner's viewport.
|
||||
- **Environmental Interference:** Local environment variables (e.g.,
|
||||
`ANTIGRAVITY_CLI_ALIAS`) masked failures that only appeared in the "clean"
|
||||
environment of the GitHub runner.
|
||||
|
||||
### C. Git & Rebase "Slop"
|
||||
|
||||
The pressure to deliver a single, clean commit often led to destructive Git
|
||||
operations.
|
||||
|
||||
- **Accidental Reversions:** During complex rebases, the agent used "keep ours"
|
||||
strategies to resolve conflicts, inadvertently deleting unrelated core
|
||||
features (Windows Sandboxing, ModelChain support).
|
||||
- **Merge Context Bloat:** Unrelated documentation and configuration changes
|
||||
often entered the branch during botched merges, creating a "noisy" diff that
|
||||
obscured real issues and caused linting failures in files the agent never
|
||||
intended to touch.
|
||||
|
||||
---
|
||||
|
||||
## 2. Further Analysis: The "Why" of the Disconnect
|
||||
|
||||
- **Heuristic Failures:** The agent uses a heuristic that "if I only touched
|
||||
file X, I only need to test file X." In a tightly coupled monorepo with shared
|
||||
types and global UI layouts, this heuristic is fundamentally flawed.
|
||||
- **Cost of Bypassing Safety:** The agent frequently used `--no-verify` to
|
||||
bypass slow pre-commit hooks. While this saved 30 seconds locally, it added
|
||||
10-20 minutes of CI wait time per failure, leading to the "log of me asking
|
||||
you to fix it again."
|
||||
- **Instructional Inertia:** The agent repeated the same "fix and push" cycle
|
||||
without widening its verification scope until specifically steered by the
|
||||
user. It optimized for speed (local feedback loop) over correctness (CI
|
||||
feedback loop).
|
||||
|
||||
---
|
||||
|
||||
## 3. The "PR Verification Protocol" (Actionable Solutions)
|
||||
|
||||
To prevent these patterns from recurring, all future PR work must follow this
|
||||
mandatory lifecycle:
|
||||
|
||||
| Phase | Required Action | Objective |
|
||||
| :------------- | :-------------------------------------------------------- | :------------------------------------- |
|
||||
| **Research** | `grep -r` for all type dependencies of modified files. | Identify cascading build risks. |
|
||||
| **Execution** | Never use `git merge -X ours` or `git checkout --ours`. | Preserve core features during rebases. |
|
||||
| **Pre-Check** | Run `tsc --build` for the affected package. | Catch inference bugs before pushing. |
|
||||
| **UI Check** | Run snapshots with `TERM_PROGRAM=none`. | Ensure OS-independent icons. |
|
||||
| **Validation** | Run the **Full Validation** suite: `npm run preflight`. | Verify build, lint, and all tests. |
|
||||
| **Finality** | Verify `AppRig` terminal height if UI dimensions changed. | Prevent integration test timeouts. |
|
||||
|
||||
---
|
||||
|
||||
## 4. Conclusion
|
||||
|
||||
The "20 failures" were not inevitable; they were the result of the agent
|
||||
prioritizing the local path over the established project standards. By enforcing
|
||||
the **Full Validation** path and adopting a **"Neutral Environment"** testing
|
||||
strategy, the agent can achieve "First-Time Green" PRs.
|
||||
@@ -0,0 +1,103 @@
|
||||
# Report: Investigation of CI Check Failures in Gemini CLI
|
||||
|
||||
## Executive Summary
|
||||
|
||||
The investigation into the repeated CI failures for PR
|
||||
[#21212](https://github.com/google-gemini/gemini-cli/pull/21212) and the
|
||||
discrepancy between local agent claims and remote check results has identified a
|
||||
pattern of **incomplete verification**, **environment drift**, and **git
|
||||
mismanagement**. While the agent (Gemini) repeatedly reported local success, it
|
||||
was frequently running only a subset of the required validation suite and
|
||||
failing to account for CI-specific constraints.
|
||||
|
||||
---
|
||||
|
||||
## 1. Root Causes of the Failure Pattern
|
||||
|
||||
### A. Incomplete Local Validation
|
||||
|
||||
- **Command Gaps:** The agent consistently relied on narrow commands like
|
||||
`npm test <file>` or `npm run lint` (ESLint only). It frequently skipped the
|
||||
comprehensive `npm run preflight` mandated by the project's `GEMINI.md`.
|
||||
- **Build vs. Test Discrepancy:** The agent often observed passing tests while
|
||||
ignoring `tsc --build` failures. Because Vitest uses `esbuild` to bypass
|
||||
type-checking during test execution, the tests would pass locally even if the
|
||||
project had compilation errors that blocked the CI's build step.
|
||||
- **Linting Omissions:** The project uses a complex `scripts/lint.js` that
|
||||
includes `actionlint` (for workflows), `shellcheck` (for scripts), and
|
||||
`yamllint`. The agent typically only ran `eslint`, missing violations in these
|
||||
other categories.
|
||||
|
||||
### B. Environment Parity (The "Works on My Machine" Problem)
|
||||
|
||||
- **TypeScript Inference Bugs:** A major cause of failure in PR 21212 was a
|
||||
subtle TypeScript type inference discrepancy related to Promise flattening in
|
||||
`LoadingIndicator.test.tsx`. The agent's local TS version was more lenient,
|
||||
while the CI matrix (running Node 20, 22, and 24) triggered strict compilation
|
||||
errors that the agent could not reproduce without running a full
|
||||
`tsc --build`.
|
||||
- **OS Differences:** Some failures were specific to the Linux runner in CI,
|
||||
while the agent was operating in a macOS environment.
|
||||
|
||||
### C. Git & Rebase Mismanagement
|
||||
|
||||
- **Accidental Feature Reversions:** During merge conflict resolutions and
|
||||
rebases, the agent inadvertently deleted massive amounts of recently merged
|
||||
core functionality (e.g., Windows Sandboxing, ModelChain support). This
|
||||
happened because the agent used a "keep ours" strategy for conflicts it didn't
|
||||
understand, treating the newer code in `main` as noise.
|
||||
- **Single-Commit Mandate:** The pressure to squash changes into a single clean
|
||||
commit often led the agent to overwrite the "good" state of `main` with its
|
||||
older local state during a botched rebase.
|
||||
|
||||
### D. Violation of Strict Project Rules
|
||||
|
||||
- **Typing Violations:** The agent used `as Record<string, unknown>` to bypass
|
||||
TS checks for undocumented settings, which is strictly forbidden by the
|
||||
project's development rules.
|
||||
- **Missing Documentation:** The agent added new UI settings but failed to
|
||||
document them in `docs/get-started/configuration.md`, a requirement for any
|
||||
setting with `showInDialog: true`.
|
||||
|
||||
---
|
||||
|
||||
## 2. The "20 Failures" Cycle
|
||||
|
||||
The long log of "fix it again" requests was caused by a recursive failure loop:
|
||||
|
||||
1. **Partial Fix:** Agent fixes a specific test failure -> Runs only that test
|
||||
-> **Claims "All tests passing"**.
|
||||
2. **CI Catch:** CI fails on a build/lint error the agent skipped -> **User
|
||||
reports failure**.
|
||||
3. **Collateral Damage:** Agent fixes the build error -> Botches a rebase in
|
||||
the process -> **Claims "Ready for merge"**.
|
||||
4. **Rule Enforcement:** Maintainer/CI identifies reverted features or missing
|
||||
docs -> **User reports failure**.
|
||||
|
||||
---
|
||||
|
||||
## 3. Actionable Solutions
|
||||
|
||||
1. **Mandate `npm run preflight`:** The agent must be strictly instructed to
|
||||
run the full `npm run preflight` command before claiming completion. This
|
||||
ensures `clean`, `install`, `build`, `lint:all`, `typecheck`, and `test:ci`
|
||||
are all verified in sequence.
|
||||
|
||||
2. **Explicit `tsc --build` Verification:** Agents must perform a package-wide
|
||||
`tsc --build` after any architectural or test-helper changes to catch the
|
||||
specific inference bugs that surfaced in CI.
|
||||
|
||||
3. **Conflict Resolution Protocol:** Agents should never default to
|
||||
`merge -X ours`. If conflicts occur, they must perform a surgical comparison
|
||||
of the diffs. If the conflict is in a file unrelated to their task (e.g.,
|
||||
`packages/core`), they should seek clarification or default to "theirs" (the
|
||||
`main` branch) to preserve core features.
|
||||
|
||||
4. **Documentation & Schema Linting:** Include automated checks in the agent's
|
||||
workflow to verify that any new setting in `settingsSchema.ts` has a
|
||||
corresponding entry in the documentation.
|
||||
|
||||
5. **Strict Rule Adherence:** The agent's internal "Pickle Rick" worker or
|
||||
sub-agents must be initialized with the `Strict Development Rules` (found in
|
||||
`.gemini/commands/strict-development-rules.md`) to prevent common violations
|
||||
like `any` usage or incorrect `waitFor` utilities.
|
||||
@@ -0,0 +1,142 @@
|
||||
# Gemini CLI: PR Tooling & Automation Guide
|
||||
|
||||
This document provides a comprehensive overview of the specialized skills,
|
||||
custom commands, and extension tools available for managing Pull Requests within
|
||||
the Gemini CLI project. These tools are designed to automate verification,
|
||||
enforce standards, and streamline the review-to-merge lifecycle.
|
||||
|
||||
---
|
||||
|
||||
## 1. Built-in Agent Skills (`.gemini/skills/`)
|
||||
|
||||
These skills are available globally within the workspace and provide structural
|
||||
workflows for different phases of the PR lifecycle.
|
||||
|
||||
### **A. `pr-creator`**
|
||||
|
||||
- **Purpose**: Guides the creation of high-quality PRs.
|
||||
- **Workflow**:
|
||||
1. Verifies current branch is not `main`.
|
||||
2. Enforces a descriptive commit message following Conventional Commits.
|
||||
3. Locates and applies the repository's `.github/PULL_REQUEST_TEMPLATE.md`.
|
||||
4. **Mandatory**: Runs `npm run preflight` locally before pushing.
|
||||
5. Uses `gh` CLI to create the PR from a temporary description file.
|
||||
- **Key Principle**: Never push to `main`; never ignore the PR template.
|
||||
|
||||
### **B. `code-reviewer`**
|
||||
|
||||
- **Purpose**: Conducts professional, multi-dimensional code reviews.
|
||||
- **Target**: Can review remote PRs (`gh pr checkout`) or local uncommitted
|
||||
changes.
|
||||
- **Analysis Pillars**: Correctness, Maintainability, Readability, Efficiency,
|
||||
Security, Edge Cases, and Testability.
|
||||
- **Key Principle**: Execute `npm run preflight` early to catch automated
|
||||
failures before performing human-level analysis.
|
||||
|
||||
### **C. `async-pr-review`**
|
||||
|
||||
- **Purpose**: Executes a background review job to avoid blocking the main
|
||||
session.
|
||||
- **Mechanism**: Uses an "Agentic Asynchronous Pattern" where a background shell
|
||||
script invokes a headless `gemini -p` instance.
|
||||
- **Workflow**: Creates an ephemeral worktree, runs preflight, and generates a
|
||||
`final-assessment.md`.
|
||||
- **Key Principle**: Prevents git lock conflicts by using isolated worktrees in
|
||||
`.gemini/tmp/async-reviews/`.
|
||||
|
||||
### **D. `pr-address-comments`**
|
||||
|
||||
- **Purpose**: Helps the author systematically resolve feedback.
|
||||
- **Workflow**: Uses `scripts/fetch-pr-info.js` to gather all comments and
|
||||
provides a checklist (✅ for resolved, [number] for open threads).
|
||||
- **Key Principle**: The agent summarizes but waits for user guidance before
|
||||
fixing issues.
|
||||
|
||||
---
|
||||
|
||||
## 2. UX Extension Skills (`ux-extension/`)
|
||||
|
||||
These skills are part of the specialized UX team extension and prioritize
|
||||
"finish line" polish and TUI design principles.
|
||||
|
||||
### **A. `_ux_finish-pr`**
|
||||
|
||||
- **Purpose**: Expert PR maintenance and final polish.
|
||||
- **Core Principle**: **"Always maintain a clean, focused diff by resolving
|
||||
merge conflicts early and squashing into a single feature commit."**
|
||||
- **Unique Features**:
|
||||
- **TDD Fallback**: If a fix fails 2-3 times, it mandates creating a local
|
||||
reproduction test case.
|
||||
- **Snapshot Review**: Mandates manual review of updated `.snap` or `.svg`
|
||||
files after running tests with `-u`.
|
||||
- **Force-Push safety**: Uses `--force-with-lease`.
|
||||
|
||||
### **B. `_ux_git-worktree`**
|
||||
|
||||
- **Purpose**: Implements the **Base Folder Strategy**.
|
||||
- **Rules**: All work happens in sibling directories (e.g., `main/`,
|
||||
`feature-name/`).
|
||||
- **Workflow**: Automates `git worktree add` and semantic PR checkouts (e.g.,
|
||||
`pr-123-fix-logo`).
|
||||
- **Key Principle**: Prevents macOS sandbox interference by including
|
||||
`main/.git` in the trusted environment.
|
||||
|
||||
### **C. `_ux_designer`**
|
||||
|
||||
- **Purpose**: Ruthlessly enforces the **v1.0 Design Principles**.
|
||||
- **Principles**:
|
||||
1. **Signal over Noise**: Mandates collapsible components
|
||||
(`<ExpandableText>`).
|
||||
2. **Coherent State**: Global state belongs in the "Bottom Drawer" (Footer).
|
||||
3. **Intent Signaling**: Long-running tasks must telegaph progress via
|
||||
spinners.
|
||||
4. **Strategic Color**: Functional use of color only; no "rainbow" text.
|
||||
|
||||
---
|
||||
|
||||
## 3. Custom Commands (`.gemini/commands/`)
|
||||
|
||||
These shortcuts inject specific context or trigger focused agent loops.
|
||||
|
||||
- **`/pr-review <PR_NUMBER>`**: Triggers the `oncall/pr-review.toml` prompt. It
|
||||
automates checkout, runs `npm run preflight`, and drafts a professional
|
||||
approval/feedback message.
|
||||
- **`/review-and-fix <target>`**: Initiates the "Pickle Rick" worker loop. It
|
||||
conducts a review, identifies findings, and then spawns a manager-worker loop
|
||||
to fix every issue until validation (`npm run build`, `test`, `lint`,
|
||||
`typecheck`) passes.
|
||||
- **`/frontend`**: Injects the complete source code of `packages/cli` and key
|
||||
core files, along with the `Strict Development Rules`.
|
||||
- **`/prompt-suggest`**: Analyzes agent failures and suggests high-level system
|
||||
prompt improvements to prevent recurrence.
|
||||
|
||||
---
|
||||
|
||||
## 4. How to Use These Tools Correctly
|
||||
|
||||
To prevent the "recursive failure loop" identified in the pattern investigation,
|
||||
follow these usage rules:
|
||||
|
||||
1. **Start with Worktrees**: Always use `_ux_git-worktree` to check out a PR.
|
||||
This keeps your `main` branch clean and avoids sandbox issues.
|
||||
2. **Use `/frontend` for UI Tasks**: This ensures the agent is aware of the
|
||||
`Strict Development Rules` (e.g., no `any`, use `waitFor` correctly).
|
||||
3. **Run `npm run preflight` via `pr-creator`**: Never manually push a PR.
|
||||
Always call the `pr-creator` skill, as it forces the full validation suite.
|
||||
4. **Finish with `_ux_finish-pr`**: When a PR is "done," use this skill to
|
||||
squash commits, resolve final conflicts with `main`, and verify snapshots in
|
||||
a neutral environment.
|
||||
5. **Don't Bypass Hooks**: Never use `git commit --no-verify`. If pre-commit
|
||||
hooks fail, use `_ux_finish-pr` to diagnose the root cause.
|
||||
|
||||
---
|
||||
|
||||
## 5. Summary of PR verification lifecycle
|
||||
|
||||
| Phase | Tool/Skill | Command |
|
||||
| :--------------- | :----------------- | :----------------------------- |
|
||||
| **Checkout** | `_ux_git-worktree` | `worktree-manager.sh pr <#> ` |
|
||||
| **Fixing** | `review-and-fix` | `/review-and-fix staged` |
|
||||
| **UI Audit** | `_ux_designer` | `Audit <Component.tsx>` |
|
||||
| **Final Polish** | `_ux_finish-pr` | `activate_skill _ux_finish-pr` |
|
||||
| **Submission** | `pr-creator` | `activate_skill pr-creator` |
|
||||
Reference in New Issue
Block a user