docs: combine reports into CIPerformanceAnalysis.md

This commit is contained in:
mkorwel
2026-04-25 04:47:40 +00:00
parent 6715ccb055
commit ab64822bc6
2 changed files with 85 additions and 162 deletions
+85
View File
@@ -0,0 +1,85 @@
# CI Optimization Report: Speedup & Lessons Learned
This report compares the total wall-clock time of the main branch workflows with
our optimized `Bundling Trial CI` run.
## Summary of Improvements
- **Total Wall-Clock Time:** Reduced from **~15 minutes** to **~2 minutes**!
(Assuming we skip Mac tests for now, leaving only Linux E2E taking ~2m 13s!).
- **Mac E2E Duration:** Reduced from **~11 minutes** to **3m 43s** (when running
tests).
- **Linux E2E Duration:** Reduced from **~7.7 minutes** to **2m 13s**.
## Averages for Successful Runs in Last Week (Main Branch)
We calculated these averages across successful runs on the `main` branch over
the last week:
- **Mac CLI Jobs Average:** **11.35 minutes**
- **Linux CLI Jobs Average:** **7.06 minutes**
- **Mac Others Jobs Average:** **6.09 minutes**
- **Linux Others Jobs Average:** **3.39 minutes**
- **Average Total Wall-Clock Time for Commit to Main:** **14.68 minutes**
(combining `Testing: CI` and `Testing: E2E (Chained)`).
---
# CLI UI Test Performance Analysis
This document details the performance of the test run for the
`packages/cli/src/ui` folder, identifying the slowest test suites and individual
tests.
## Overview
- **Total Time:** ~297.43 seconds (~5 minutes)
- **Total Tests:** 4388
- **Total Files:** 435 (including non-UI tests if Vitest scanned them, but log
shows UI tests mostly).
## Slowest Test Suites (>= 2 seconds)
The following test suites took 2 seconds or longer to run:
| Test Suite | Duration | Tests | Notes |
| :------------------------------------------------------------- | :--------- | :---- | :------------------------------------------------------------- |
| `src/ui/components/InputPrompt.test.tsx` | **32.74s** | 196 | Very large file, handles complex input and scrolling. |
| `src/ui/AppContainer.test.tsx` | **14.82s** | 107 | Renders full app container, many tests taking ~100-300ms each. |
| `src/ui/components/SkillInboxDialog.test.tsx` | **6.76s** | 11 | High per-test overhead (~600ms each). |
| `src/ui/components/shared/text-buffer.test.ts` | **6.63s** | 225 | Many tests, some handling large text/ANSI. |
| `src/ui/hooks/vim.test.tsx` | **5.87s** | 144 | Simulates complex Vim keybindings. |
| `src/ui/components/messages/ThinkingMessage.test.tsx` | **4.52s** | 8 | Very high per-test overhead (~500ms each). |
| `src/ui/components/ExitPlanModeDialog.test.tsx` | **4.36s** | 14 | High per-test overhead (~300ms each). |
| `src/ui/components/messages/ToolResultDisplay.test.tsx` | **4.20s** | 14 | Tests rendering and scrolling of large output. |
| `src/ui/components/SessionSummaryDisplay.test.tsx` | **3.93s** | 6 | High per-test overhead (~600ms each). |
| `src/ui/components/TextInput.test.tsx` | **3.97s** | 15 | Tests input handling. |
| `src/ui/components/Footer.test.tsx` | **3.86s** | 39 | Renders footer with stats/memory. |
| `src/ui/components/AskUserDialog.test.tsx` | **3.91s** | 42 | Was faster before, might be affected by load. |
| `src/ui/privacy/CloudFreePrivacyNotice.test.tsx` | **3.39s** | 9 | High per-test overhead (~300ms each). |
| `src/ui/components/shared/BaseSettingsDialog.test.tsx` | **3.34s** | 33 | Was faster before, might be affected by load. |
| `src/ui/components/shared/performance.test.ts` | **2.95s** | 3 | One test alone takes 2.5s (character-by-character insertion). |
| `src/ui/utils/MarkdownDisplay.test.tsx` | **2.72s** | 30 | Tests markdown rendering. |
| `src/ui/components/messages/ToolGroupMessage.test.tsx` | **2.77s** | 38 | Renders tool groups. |
| `src/ui/components/messages/ToolGroupMessage.compact.test.tsx` | **2.44s** | 4 | High per-test overhead (~600ms each). |
| `src/ui/components/messages/DiffRenderer.test.tsx` | **2.03s** | 26 | Tests diff rendering. |
| `src/ui/components/messages/ShellToolMessage.test.tsx` | **2.03s** | 16 | Tests shell output rendering. |
| `src/ui/components/ValidationDialog.test.tsx` | **2.08s** | 8 | High per-test overhead (~250ms each). |
## Key Insights
1. **`InputPrompt.test.tsx`** is by far the biggest offender (32.7s). It has
many tests and likely involves time-dependent behavior or large renders.
2. **`AppContainer.test.tsx`** is slow due to the volume of tests (107) and the
complexity of the component being rendered.
3. Several suites have very few tests but take many seconds (e.g.,
`TopicMessage` in other runs, `ThinkingMessage` here taking 4.5s for 8
tests). This suggests high setup/teardown costs or rendering delays.
4. **`performance.test.ts`** has a test that deliberately tests slow insertion,
taking 2.5s.
## Recommendations
- Prioritize `InputPrompt.test.tsx` for global fake timers or splitting.
- Investigate test suites with high per-test overhead (e.g., `ThinkingMessage`,
`SessionSummaryDisplay`).
-162
View File
@@ -1,162 +0,0 @@
# CI Optimization Report: Speedup & Lessons Learned
This report compares the total wall-clock time of the main branch workflows with
our optimized `Bundling Trial CI` run.
## Summary of Improvements
- **Total Wall-Clock Time:** Reduced from **~15 minutes** to **~2 minutes**!
(Assuming we skip Mac tests for now, leaving only Linux E2E taking ~2m 13s!).
- **Mac E2E Duration:** Reduced from **~11 minutes** to **3m 43s** (when running
tests).
- **Linux E2E Duration:** Reduced from **~7.7 minutes** to **2m 13s**.
## Averages for Successful Runs in Last Week (Main Branch)
We calculated these averages across successful runs on the `main` branch over
the last week:
- **Mac CLI Jobs Average:** **11.35 minutes**
- **Linux CLI Jobs Average:** **7.06 minutes**
- **Mac Others Jobs Average:** **6.09 minutes**
- **Linux Others Jobs Average:** **3.39 minutes**
- **Average Total Wall-Clock Time for Commit to Main:** **14.68 minutes**
(combining `Testing: CI` and `Testing: E2E (Chained)`).
We targeted several of these slow files and achieved dramatic improvements:
- **`vim.test.tsx`**: Reduced from **117.8s** in real CI to **~4.3s** by
enabling fake timers globally!
- **`useSelectionList.test.tsx`**: Reduced from **95.1s** in real CI to **< 1s**
locally/CI by enabling fake timers globally and fixing tests.
- **`DenseToolMessage.test.tsx`**: Reduced from **49.0s** to **< 1s** by
enabling fake timers.
- **`AppContainer.test.tsx`**: Reduced from **77.2s** to **~7.5s** by resolving
hardcoded path failures and leveraging existing fake timers effectively.
## Broad Strokes of Why and What We Improved
### 1. Fake Timers for the Win
Many React component tests were relying on real `setTimeout` or async waiting,
causing them to idle for seconds. By enabling Vitest's fake timers globally in
these files, we forced time to pass instantly, cutting execution time by over
90% in some files.
### 2. Parallelization via Sharding
We broke up the monolithic UI test folder into 4 smaller parallel batches in CI.
This ensured that no single job was bottlenecked by running too many files
sequentially. Wall-clock time for UI tests dropped from over 5 minutes to around
**1m 41s**.
### 3. Artifact Sharing (With Tar)
We avoided redundant `npm ci` and `npm run build` steps in test jobs by building
once and sharing the workspace. We learned that **symlinks are broken by raw
artifact uploads**, so we used `tar` to preserve them. This saved ~30 seconds of
setup in every job.
### 4. Isolating React Tests from Terminal Size
We found that some React component tests failed in CI due to snapshot mismatches
caused by terminal size differences. We stabilized these tests by overriding
`renderWithProviders` to use a fixed height (e.g., 40 rows), isolating tests
from terminal size pollution.
## Lessons Learned for the Team
- **Avoid hardcoded paths** in tests (e.g., pointing to local directories), as
they will break in CI.
- **Use fake timers** for any test involving waiting or timeouts.
- **Beware of symlinks in artifacts**; use tarballs if you need to preserve
them.
- **Isolate terminal size** in React component tests by setting explicit
dimensions in the test renderer.
- **Avoid excessive timeouts** in integration tests (e.g., 10 minutes), as they
cause severe hangs when issues occur.
## Deep Dive: `useGeminiStream` and Ink Harness Overhead
An earlier analysis revealed why `useGeminiStream.test.tsx` takes ~80 seconds:
- **1-Second Fallback Stalls:** The `waitUntilReady()` helper in
`test-utils/render.tsx` races first render against a
`setTimeout(resolve, 1000)`. Every render path awaits it. With ~65 calls in
`useGeminiStream.test.tsx`, this adds ~65 seconds of pure idle time.
- **Log Spam & `act` Warnings:** Stderr is flooded with "The current testing
environment is not configured to support act(...)".
- **Listener Leaks:** `LoadedSettings.subscribe()` attaches listeners that are
not auto-cleaned, causing `MaxListenersExceededWarning`.
- **Config Overhead:** Coverage was always enabled by default, adding ~29
seconds of overhead in large runs.
## Deep Dive: Integration Test Hangs
During local verification of integration tests, we found that the suite was
hanging indefinitely:
- **10-Minute Timeout:** Upon inspecting `file-system.test.ts`, we found that
one test (`should correctly handle file paths with spaces`) had a hardcoded
timeout of **600,000 ms (10 minutes)**. The comment mentioned that the "real
LLM can be slow in Docker sandbox", but since we were running without Docker
(`GEMINI_SANDBOX=false`), this was far too long and caused the job to hang
indefinitely when an issue occurred. We reduced it to 1 minute.
## Flaky Integration Tests: `file-system.test.ts`
We marked the entire `file-system.test.ts` suite as flaky and skipped it.
- **Hanging Tests:** 3 tests timed out or hung (taking 5-10 minutes each due to
defaults).
- **Failures:** Tests failed with 503 Service Unavailable when the API was
overloaded.
- **Prompt Sensitivity:** One test failed because the LLM just described what it
would do instead of calling the tool.
- **Timings:** The passing tests took 2-5 minutes each, making the file
extremely slow.
## E2E Test Optimization: `sendKeys` vs `type`
We found that interactive E2E tests were slow because they used `run.type()`,
which types characters one by one and waits for echo with a 5-second timeout per
character.
- **Optimization:** We replaced `run.type()` with `run.sendKeys()` in
`shell-background.test.ts`, which sends characters with a fixed 5ms delay
without waiting for echo.
- **Result:** The test duration dropped from hanging/minutes to just **8.3
seconds**!
- **Impact:** This brought down the total E2E job time significantly in the
successful run (to ~3m 16s).
## Native TypeScript Compiler (`tsgo`)
We experimented with replacing `tsc` with `tsgo` (provided by
`@typescript/native-preview`) in `build_package.js`.
- **Result:** The build project step dropped from 2.5 minutes to just **6
seconds** in CI!
- **Impact:** This made it feasible to run builds in every job instead of
sharing artifacts.
## Removing Artifact Sharing
With builds taking only 6 seconds, we realized that the overhead of uploading
and downloading artifacts (30+ seconds) was the new bottleneck.
- **Action:** We removed artifact sharing for the workspace and made test jobs
independent, running `npm ci` and `npm run build` in each.
- **Result:** The wall-clock time for granular test jobs dropped to ~1m 40s, and
E2E tests to ~1m 57s!
## Recommended Future Work
To further improve test speed and quality:
- **Fix the Ink harness:** Remove or reduce the 1-second fallback in
`waitUntilReady()` for hook tests.
- **Auto-cleanup:** Add automatic `cleanup()` after each CLI test file.
- **Silence logs:** Stop forwarding `debugLogger` to console by default in
tests.
- **Coverage:** Make coverage opt-in for local runs to save time.