mirror of
https://github.com/google-gemini/gemini-cli.git
synced 2026-05-16 14:53:19 -07:00
docs: combine reports into CIPerformanceAnalysis.md
@@ -0,0 +1,85 @@
# CI Optimization Report: Speedup & Lessons Learned

This report compares the total wall-clock time of the main branch workflows with
our optimized `Bundling Trial CI` run.

## Summary of Improvements

- **Total Wall-Clock Time:** Reduced from **~15 minutes** to **~2 minutes**!
  (Assuming we skip Mac tests for now, leaving only Linux E2E taking ~2m 13s!).
- **Mac E2E Duration:** Reduced from **~11 minutes** to **3m 43s** (when running
  tests).
- **Linux E2E Duration:** Reduced from **~7.7 minutes** to **2m 13s**.

## Averages for Successful Runs in Last Week (Main Branch)

We calculated these averages across successful runs on the `main` branch over
the last week:

- **Mac CLI Jobs Average:** **11.35 minutes**
- **Linux CLI Jobs Average:** **7.06 minutes**
- **Mac Others Jobs Average:** **6.09 minutes**
- **Linux Others Jobs Average:** **3.39 minutes**
- **Average Total Wall-Clock Time for Commit to Main:** **14.68 minutes**
  (combining `Testing: CI` and `Testing: E2E (Chained)`).

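To put the headline numbers side by side, the sketch below recomputes the speedup factors from the durations quoted in this report. The helper names (`toSeconds`, `speedup`) and the `Comparison` shape are illustrative assumptions, not code from the repository.

```typescript
// Durations come from the report above; everything else is illustrative.
const toSeconds = (minutes: number, seconds = 0): number => minutes * 60 + seconds;

interface Comparison {
  label: string;
  beforeSec: number;
  afterSec: number;
}

const comparisons: Comparison[] = [
  // 14.68 min average on main vs. the 2m 13s Linux E2E run (Mac tests skipped)
  { label: "Total wall-clock", beforeSec: toSeconds(14.68), afterSec: toSeconds(2, 13) },
  { label: "Mac E2E", beforeSec: toSeconds(11), afterSec: toSeconds(3, 43) },
  { label: "Linux E2E", beforeSec: toSeconds(7.7), afterSec: toSeconds(2, 13) },
];

// Speedup factor: how many times faster the optimized run is.
function speedup(c: Comparison): number {
  return c.beforeSec / c.afterSec;
}

for (const c of comparisons) {
  const savedPercent = (1 - c.afterSec / c.beforeSec) * 100;
  console.log(`${c.label}: ${speedup(c).toFixed(1)}x faster, ${savedPercent.toFixed(0)}% less time`);
}
```

Note that the total figure only holds under the report's own caveat that Mac tests are skipped; with Mac E2E included, the 3m 43s job would set the wall-clock time instead.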
---

# CLI UI Test Performance Analysis

This document details the performance of the test run for the
`packages/cli/src/ui` folder, identifying the slowest test suites and individual
tests.

## Overview

- **Total Time:** ~297.43 seconds (~5 minutes)
- **Total Tests:** 4388
- **Total Files:** 435 (this count may include non-UI tests if Vitest scanned
  them, although the log shows mostly UI tests).

## Slowest Test Suites (>= 2 seconds)

The following test suites took 2 seconds or longer to run:

| Test Suite                                                     | Duration   | Tests | Notes                                                          |
| :------------------------------------------------------------- | :--------- | :---- | :------------------------------------------------------------- |
| `src/ui/components/InputPrompt.test.tsx`                       | **32.74s** | 196   | Very large file, handles complex input and scrolling.          |
| `src/ui/AppContainer.test.tsx`                                 | **14.82s** | 107   | Renders full app container, many tests taking ~100-300ms each. |
| `src/ui/components/SkillInboxDialog.test.tsx`                  | **6.76s**  | 11    | High per-test overhead (~600ms each).                          |
| `src/ui/components/shared/text-buffer.test.ts`                 | **6.63s**  | 225   | Many tests, some handling large text/ANSI.                     |
| `src/ui/hooks/vim.test.tsx`                                    | **5.87s**  | 144   | Simulates complex Vim keybindings.                             |
| `src/ui/components/messages/ThinkingMessage.test.tsx`          | **4.52s**  | 8     | Very high per-test overhead (~500ms each).                     |
| `src/ui/components/ExitPlanModeDialog.test.tsx`                | **4.36s**  | 14    | High per-test overhead (~300ms each).                          |
| `src/ui/components/messages/ToolResultDisplay.test.tsx`        | **4.20s**  | 14    | Tests rendering and scrolling of large output.                 |
| `src/ui/components/SessionSummaryDisplay.test.tsx`             | **3.93s**  | 6     | High per-test overhead (~600ms each).                          |
| `src/ui/components/TextInput.test.tsx`                         | **3.97s**  | 15    | Tests input handling.                                          |
| `src/ui/components/Footer.test.tsx`                            | **3.86s**  | 39    | Renders footer with stats/memory.                              |
| `src/ui/components/AskUserDialog.test.tsx`                     | **3.91s**  | 42    | Was faster before, might be affected by load.                  |
| `src/ui/privacy/CloudFreePrivacyNotice.test.tsx`               | **3.39s**  | 9     | High per-test overhead (~300ms each).                          |
| `src/ui/components/shared/BaseSettingsDialog.test.tsx`         | **3.34s**  | 33    | Was faster before, might be affected by load.                  |
| `src/ui/components/shared/performance.test.ts`                 | **2.95s**  | 3     | One test alone takes 2.5s (character-by-character insertion).  |
| `src/ui/utils/MarkdownDisplay.test.tsx`                        | **2.72s**  | 30    | Tests markdown rendering.                                      |
| `src/ui/components/messages/ToolGroupMessage.test.tsx`         | **2.77s**  | 38    | Renders tool groups.                                           |
| `src/ui/components/messages/ToolGroupMessage.compact.test.tsx` | **2.44s**  | 4     | High per-test overhead (~600ms each).                          |
| `src/ui/components/messages/DiffRenderer.test.tsx`             | **2.03s**  | 26    | Tests diff rendering.                                          |
| `src/ui/components/messages/ShellToolMessage.test.tsx`         | **2.03s**  | 16    | Tests shell output rendering.                                  |
| `src/ui/components/ValidationDialog.test.tsx`                  | **2.08s**  | 8     | High per-test overhead (~250ms each).                          |

## Key Insights

1. **`InputPrompt.test.tsx`** is by far the biggest offender (32.7s). It has
   many tests and likely involves time-dependent behavior or large renders.
2. **`AppContainer.test.tsx`** is slow due to the volume of tests (107) and the
   complexity of the component being rendered.
3. Several suites have very few tests but take many seconds (e.g.,
   `TopicMessage` in other runs, `ThinkingMessage` here taking 4.5s for 8
   tests). This suggests high setup/teardown costs or rendering delays.
4. **`performance.test.ts`** has a test that deliberately tests slow insertion,
   taking 2.5s.

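The "high per-test overhead" notes can be reproduced by dividing each suite's duration by its test count. The sketch below does this for a few rows of the table; the 500ms threshold and the names `SuiteTiming`, `msPerTest`, and `highOverhead` are assumptions chosen for illustration. The mean includes setup and teardown, so it only flags suites worth a closer look.

```typescript
interface SuiteTiming {
  file: string;
  durationSec: number;
  tests: number;
}

// A subset of rows copied from the table above.
const suites: SuiteTiming[] = [
  { file: "InputPrompt.test.tsx", durationSec: 32.74, tests: 196 },
  { file: "AppContainer.test.tsx", durationSec: 14.82, tests: 107 },
  { file: "SkillInboxDialog.test.tsx", durationSec: 6.76, tests: 11 },
  { file: "text-buffer.test.ts", durationSec: 6.63, tests: 225 },
  { file: "ThinkingMessage.test.tsx", durationSec: 4.52, tests: 8 },
  { file: "SessionSummaryDisplay.test.tsx", durationSec: 3.93, tests: 6 },
];

// Mean wall-clock milliseconds per test, including setup and teardown.
const msPerTest = (s: SuiteTiming): number => (s.durationSec * 1000) / s.tests;

// Suites at or above the threshold, worst first.
function highOverhead(all: SuiteTiming[], thresholdMs: number): SuiteTiming[] {
  return all
    .filter((s) => msPerTest(s) >= thresholdMs)
    .sort((a, b) => msPerTest(b) - msPerTest(a));
}

for (const s of highOverhead(suites, 500)) {
  console.log(`${s.file}: ~${Math.round(msPerTest(s))} ms per test`);
}
```

By this measure `InputPrompt.test.tsx` is slow mainly because of volume (roughly 167 ms per test across 196 tests), while the small dialog and message suites are slow per test.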
## Recommendations

- Prioritize `InputPrompt.test.tsx` for global fake timers or splitting.
- Investigate test suites with high per-test overhead (e.g., `ThinkingMessage`,
  `SessionSummaryDisplay`).
@@ -1,162 +0,0 @@
# CI Optimization Report: Speedup & Lessons Learned

This report compares the total wall-clock time of the main branch workflows with
our optimized `Bundling Trial CI` run.

## Summary of Improvements

- **Total Wall-Clock Time:** Reduced from **~15 minutes** to **~2 minutes**!
  (Assuming we skip Mac tests for now, leaving only Linux E2E taking ~2m 13s!).
- **Mac E2E Duration:** Reduced from **~11 minutes** to **3m 43s** (when running
  tests).
- **Linux E2E Duration:** Reduced from **~7.7 minutes** to **2m 13s**.

## Averages for Successful Runs in Last Week (Main Branch)

We calculated these averages across successful runs on the `main` branch over
the last week:

- **Mac CLI Jobs Average:** **11.35 minutes**
- **Linux CLI Jobs Average:** **7.06 minutes**
- **Mac Others Jobs Average:** **6.09 minutes**
- **Linux Others Jobs Average:** **3.39 minutes**
- **Average Total Wall-Clock Time for Commit to Main:** **14.68 minutes**
  (combining `Testing: CI` and `Testing: E2E (Chained)`).

We targeted several of these slow files and achieved dramatic improvements:

- **`vim.test.tsx`**: Reduced from **117.8s** in real CI to **~4.3s** by
  enabling fake timers globally!
- **`useSelectionList.test.tsx`**: Reduced from **95.1s** in real CI to **< 1s**
  locally/CI by enabling fake timers globally and fixing tests.
- **`DenseToolMessage.test.tsx`**: Reduced from **49.0s** to **< 1s** by
  enabling fake timers.
- **`AppContainer.test.tsx`**: Reduced from **77.2s** to **~7.5s** by resolving
  hardcoded path failures and leveraging existing fake timers effectively.

## Broad Strokes of Why and What We Improved

### 1. Fake Timers for the Win

Many React component tests were relying on real `setTimeout` or async waiting,
causing them to idle for seconds. By enabling Vitest's fake timers globally in
these files, we forced time to pass instantly, cutting execution time by over
90% in some files.

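To illustrate why fake timers remove the idle time, here is a minimal hand-rolled virtual clock. It is a sketch of the general technique only, not Vitest's implementation; in Vitest the corresponding calls are `vi.useFakeTimers()` and `vi.advanceTimersByTime(ms)`. The `FakeClock` class and the `showThenHide` helper are hypothetical names.

```typescript
// Illustration only: a tiny virtual clock, not Vitest's implementation.
type Task = { at: number; run: () => void };

class FakeClock {
  private now = 0;
  private queue: Task[] = [];

  // Schedule a callback in virtual time instead of real time.
  setTimeout(run: () => void, delayMs: number): void {
    this.queue.push({ at: this.now + delayMs, run });
  }

  // Advance virtual time, firing due callbacks in chronological order.
  advanceBy(ms: number): void {
    const target = this.now + ms;
    for (;;) {
      this.queue.sort((a, b) => a.at - b.at);
      const next = this.queue[0];
      if (!next || next.at > target) break;
      this.queue.shift();
      this.now = next.at;
      next.run(); // may schedule further timers; the loop picks them up
    }
    this.now = target;
  }
}

// Hypothetical component logic: hide a message 3 seconds after showing it.
function showThenHide(clock: FakeClock, onHide: () => void): void {
  clock.setTimeout(onHide, 3000);
}

const clock = new FakeClock();
const events: string[] = [];
showThenHide(clock, () => events.push("hidden"));

clock.advanceBy(2999);
console.log(events.length); // 0: the timer has not fired yet
clock.advanceBy(1);
console.log(events.length); // 1: fired without any real waiting
```

A test written against real timers would have to wait the full 3 seconds here; with a virtual clock the same assertion completes immediately, which is consistent with the reductions reported for the timer-heavy suites.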
### 2. Parallelization via Sharding

We broke up the monolithic UI test folder into 4 smaller parallel batches in CI.
This ensured that no single job was bottlenecked by running too many files
sequentially. Wall-clock time for UI tests dropped from over 5 minutes to around
**1m 41s**.

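The report does not say how files were assigned to the 4 batches. As one possible approach, the sketch below balances shards by measured duration using a greedy "longest first" rule. The function and type names are hypothetical, and the sample durations are taken from the slowest-suites table in the UI test analysis.

```typescript
interface TestFile {
  path: string;
  seconds: number;
}

interface Shard {
  files: TestFile[];
  totalSeconds: number;
}

// Greedy balancing: take files longest first and give each one to the
// shard that currently has the least total work.
function shardByDuration(files: TestFile[], shardCount: number): Shard[] {
  if (shardCount < 1) throw new Error("shardCount must be at least 1");
  const shards: Shard[] = Array.from(
    { length: shardCount },
    (): Shard => ({ files: [], totalSeconds: 0 }),
  );
  const longestFirst = [...files].sort((a, b) => b.seconds - a.seconds);
  for (const file of longestFirst) {
    const lightest = shards.reduce((min, s) => (s.totalSeconds < min.totalSeconds ? s : min));
    lightest.files.push(file);
    lightest.totalSeconds += file.seconds;
  }
  return shards;
}

const sampleFiles: TestFile[] = [
  { path: "InputPrompt.test.tsx", seconds: 32.74 },
  { path: "AppContainer.test.tsx", seconds: 14.82 },
  { path: "SkillInboxDialog.test.tsx", seconds: 6.76 },
  { path: "text-buffer.test.ts", seconds: 6.63 },
  { path: "vim.test.tsx", seconds: 5.87 },
  { path: "ThinkingMessage.test.tsx", seconds: 4.52 },
];

const shards = shardByDuration(sampleFiles, 4);
// Wall-clock time of a parallel run is set by the slowest shard.
const wallClock = Math.max(...shards.map((s) => s.totalSeconds));
console.log(`slowest shard: ${wallClock.toFixed(2)}s`);
```

With these sample inputs the slowest shard contains only `InputPrompt.test.tsx`, so no amount of rebalancing helps until that file is made faster or split. Vitest also offers a built-in `--shard` option; whether the CI setup used it is not stated in the report.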
### 3. Artifact Sharing (With Tar)

We avoided redundant `npm ci` and `npm run build` steps in test jobs by building
once and sharing the workspace. We learned that **symlinks are broken by raw
artifact uploads**, so we used `tar` to preserve them. This saved ~30 seconds of
setup in every job.

### 4. Isolating React Tests from Terminal Size

We found that some React component tests failed in CI due to snapshot mismatches
caused by terminal size differences. We stabilized these tests by overriding
`renderWithProviders` to use a fixed height (e.g., 40 rows), isolating tests
from terminal size pollution.

## Lessons Learned for the Team

- **Avoid hardcoded paths** in tests (e.g., pointing to local directories), as
  they will break in CI.
- **Use fake timers** for any test involving waiting or timeouts.
- **Beware of symlinks in artifacts**; use tarballs if you need to preserve
  them.
- **Isolate terminal size** in React component tests by setting explicit
  dimensions in the test renderer.
- **Avoid excessive timeouts** in integration tests (e.g., 10 minutes), as they
  cause severe hangs when issues occur.

## Deep Dive: `useGeminiStream` and Ink Harness Overhead

An earlier analysis revealed why `useGeminiStream.test.tsx` takes ~80 seconds:

- **1-Second Fallback Stalls:** The `waitUntilReady()` helper in
  `test-utils/render.tsx` races first render against a
  `setTimeout(resolve, 1000)`. Every render path awaits it. With ~65 calls in
  `useGeminiStream.test.tsx`, this adds ~65 seconds of pure idle time.
- **Log Spam & `act` Warnings:** Stderr is flooded with "The current testing
  environment is not configured to support act(...)".
- **Listener Leaks:** `LoadedSettings.subscribe()` attaches listeners that are
  not auto-cleaned, causing `MaxListenersExceededWarning`.
- **Config Overhead:** Coverage was always enabled by default, adding ~29
  seconds of overhead in large runs.

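The fallback stall described in the first bullet can be sketched as follows. This is a reconstruction from the description only; the real helper in `test-utils/render.tsx` is not shown here and may differ. The `firstRender` parameter and the `worstCaseIdleMs` helper are assumptions made for the example.

```typescript
// Reconstruction from the description above, not the actual helper.
// Resolves when the first render is signalled OR when the fallback fires.
function waitUntilReady(firstRender: Promise<void>, fallbackMs = 1000): Promise<void> {
  const fallback = new Promise<void>((resolve) => setTimeout(resolve, fallbackMs));
  return Promise.race([firstRender, fallback]);
}

// If the first-render signal never arrives (assumed here to be the case for
// hook tests), every call waits for the full fallback.
function worstCaseIdleMs(calls: number, fallbackMs = 1000): number {
  return calls * fallbackMs;
}

console.log(worstCaseIdleMs(65)); // 65000 ms, i.e. the ~65 seconds in the report
```

Under this reading, the cost comes from the race losing to the timer on every call, which is why removing or shortening the fallback for hook tests is expected to recover most of the ~80 seconds.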
## Deep Dive: Integration Test Hangs

During local verification of integration tests, we found that the suite was
hanging indefinitely:

- **10-Minute Timeout:** Upon inspecting `file-system.test.ts`, we found that
  one test (`should correctly handle file paths with spaces`) had a hardcoded
  timeout of **600,000 ms (10 minutes)**. The comment mentioned that the "real
  LLM can be slow in Docker sandbox", but since we were running without Docker
  (`GEMINI_SANDBOX=false`), this was far too long and caused the job to hang
  indefinitely when an issue occurred. We reduced it to 1 minute.

## Flaky Integration Tests: `file-system.test.ts`

We marked the entire `file-system.test.ts` suite as flaky and skipped it.

- **Hanging Tests:** 3 tests timed out or hung (taking 5-10 minutes each due to
  defaults).
- **Failures:** Tests failed with 503 Service Unavailable when the API was
  overloaded.
- **Prompt Sensitivity:** One test failed because the LLM just described what it
  would do instead of calling the tool.
- **Timings:** The passing tests took 2-5 minutes each, making the file
  extremely slow.

## E2E Test Optimization: `sendKeys` vs `type`

We found that interactive E2E tests were slow because they used `run.type()`,
which types characters one by one and waits for echo with a 5-second timeout per
character.

- **Optimization:** We replaced `run.type()` with `run.sendKeys()` in
  `shell-background.test.ts`, which sends characters with a fixed 5ms delay
  without waiting for echo.
- **Result:** The test duration dropped from hanging/minutes to just **8.3
  seconds**!
- **Impact:** This brought down the total E2E job time significantly in the
  successful run (to ~3m 16s).

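The difference between the two input methods can be expressed as a simple timing model. The sketch below uses the 5-second echo timeout and 5ms delay quoted above; the functions are models for estimating input time, not the real `run.type()` or `run.sendKeys()` implementations, and the sample command is made up.

```typescript
// Timing model only; these are not the real harness methods.
const ECHO_TIMEOUT_MS = 5000; // per-character echo wait for run.type(), per the report
const SEND_DELAY_MS = 5; // fixed per-character delay for run.sendKeys(), per the report

// Worst case for type(): every character waits out the full echo timeout.
function typeWorstCaseMs(text: string): number {
  return text.length * ECHO_TIMEOUT_MS;
}

// sendKeys() never waits for echo, so its input time is fixed.
function sendKeysMs(text: string): number {
  return text.length * SEND_DELAY_MS;
}

const command = "sleep 10 &"; // hypothetical 10-character input
console.log(typeWorstCaseMs(command)); // 50000
console.log(sendKeysMs(command)); // 50
```

The worst case only applies when the echo never arrives; when the echo is prompt, `type()` is much faster than this bound. The trade-off is that `sendKeys()` gives up the echo check, so a test using it needs some other assertion that the input was received.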
## Native TypeScript Compiler (`tsgo`)

We experimented with replacing `tsc` with `tsgo` (provided by
`@typescript/native-preview`) in `build_package.js`.

- **Result:** The build project step dropped from 2.5 minutes to just **6
  seconds** in CI!
- **Impact:** This made it feasible to run builds in every job instead of
  sharing artifacts.

## Removing Artifact Sharing

With builds taking only 6 seconds, we realized that the overhead of uploading
and downloading artifacts (30+ seconds) was the new bottleneck.

- **Action:** We removed artifact sharing for the workspace and made test jobs
  independent, running `npm ci` and `npm run build` in each.
- **Result:** The wall-clock time for granular test jobs dropped to ~1m 40s, and
  E2E tests to ~1m 57s!

## Recommended Future Work

To further improve test speed and quality:

- **Fix the Ink harness:** Remove or reduce the 1-second fallback in
  `waitUntilReady()` for hook tests.
- **Auto-cleanup:** Add automatic `cleanup()` after each CLI test file.
- **Silence logs:** Stop forwarding `debugLogger` to console by default in
  tests.
- **Coverage:** Make coverage opt-in for local runs to save time.