docs: combine reports into CIPerformanceAnalysis.md

2026-05-16 14:53:19 -07:00 · 2026-04-25 04:47:40 +00:00
parent 6715ccb055
commit ab64822bc6
2 changed files with 85 additions and 162 deletions
@@ -0,0 +1,85 @@
+# CI Optimization Report: Speedup & Lessons Learned
+
+This report compares the total wall-clock time of the main branch workflows with
+our optimized `Bundling Trial CI` run.
+
+## Summary of Improvements
+
+- **Total Wall-Clock Time:** Reduced from **~15 minutes** to **~2 minutes**
+  (assuming Mac tests are skipped for now, leaving only Linux E2E at ~2m 13s).
+- **Mac E2E Duration:** Reduced from **~11 minutes** to **3m 43s** (when running
+  tests).
+- **Linux E2E Duration:** Reduced from **~7.7 minutes** to **2m 13s**.
+
+## Averages for Successful Runs in Last Week (Main Branch)
+
+We calculated these averages across successful runs on the `main` branch over
+the last week:
+
+- **Mac CLI Jobs Average:** **11.35 minutes**
+- **Linux CLI Jobs Average:** **7.06 minutes**
+- **Mac Others Jobs Average:** **6.09 minutes**
+- **Linux Others Jobs Average:** **3.39 minutes**
+- **Average Total Wall-Clock Time for Commit to Main:** **14.68 minutes**
+  (combining `Testing: CI` and `Testing: E2E (Chained)`).
+
+---
+
+# CLI UI Test Performance Analysis
+
+This section details the performance of the test run for the
+`packages/cli/src/ui` folder, identifying the slowest test suites and individual
+tests.
+
+## Overview
+
+- **Total Time:** 297.43 seconds (~5 minutes)
+- **Total Tests:** 4388
+- **Total Files:** 435 (may include non-UI test files if Vitest scanned them,
+  though the log shows mostly UI tests).
+
+## Slowest Test Suites (>= 2 seconds)
+
+The following test suites took 2 seconds or longer to run:
+
+| Test Suite                                                     | Duration   | Tests | Notes                                                          |
+| :------------------------------------------------------------- | :--------- | :---- | :------------------------------------------------------------- |
+| `src/ui/components/InputPrompt.test.tsx`                       | **32.74s** | 196   | Very large file, handles complex input and scrolling.          |
+| `src/ui/AppContainer.test.tsx`                                 | **14.82s** | 107   | Renders full app container, many tests taking ~100-300ms each. |
+| `src/ui/components/SkillInboxDialog.test.tsx`                  | **6.76s**  | 11    | High per-test overhead (~600ms each).                          |
+| `src/ui/components/shared/text-buffer.test.ts`                 | **6.63s**  | 225   | Many tests, some handling large text/ANSI.                     |
+| `src/ui/hooks/vim.test.tsx`                                    | **5.87s**  | 144   | Simulates complex Vim keybindings.                             |
+| `src/ui/components/messages/ThinkingMessage.test.tsx`          | **4.52s**  | 8     | Very high per-test overhead (~500ms each).                     |
+| `src/ui/components/ExitPlanModeDialog.test.tsx`                | **4.36s**  | 14    | High per-test overhead (~300ms each).                          |
+| `src/ui/components/messages/ToolResultDisplay.test.tsx`        | **4.20s**  | 14    | Tests rendering and scrolling of large output.                 |
+| `src/ui/components/TextInput.test.tsx`                         | **3.97s**  | 15    | Tests input handling.                                          |
+| `src/ui/components/SessionSummaryDisplay.test.tsx`             | **3.93s**  | 6     | High per-test overhead (~600ms each).                          |
+| `src/ui/components/AskUserDialog.test.tsx`                     | **3.91s**  | 42    | Was faster before, might be affected by load.                  |
+| `src/ui/components/Footer.test.tsx`                            | **3.86s**  | 39    | Renders footer with stats/memory.                              |
+| `src/ui/privacy/CloudFreePrivacyNotice.test.tsx`               | **3.39s**  | 9     | High per-test overhead (~300ms each).                          |
+| `src/ui/components/shared/BaseSettingsDialog.test.tsx`         | **3.34s**  | 33    | Was faster before, might be affected by load.                  |
+| `src/ui/components/shared/performance.test.ts`                 | **2.95s**  | 3     | One test alone takes 2.5s (character-by-character insertion).  |
+| `src/ui/components/messages/ToolGroupMessage.test.tsx`         | **2.77s**  | 38    | Renders tool groups.                                           |
+| `src/ui/utils/MarkdownDisplay.test.tsx`                        | **2.72s**  | 30    | Tests markdown rendering.                                      |
+| `src/ui/components/messages/ToolGroupMessage.compact.test.tsx` | **2.44s**  | 4     | High per-test overhead (~600ms each).                          |
+| `src/ui/components/ValidationDialog.test.tsx`                  | **2.08s**  | 8     | High per-test overhead (~250ms each).                          |
+| `src/ui/components/messages/DiffRenderer.test.tsx`             | **2.03s**  | 26    | Tests diff rendering.                                          |
+| `src/ui/components/messages/ShellToolMessage.test.tsx`         | **2.03s**  | 16    | Tests shell output rendering.                                  |
+
+## Key Insights
+
+1. **`InputPrompt.test.tsx`** is by far the biggest offender (32.7s). It has
+   many tests and likely involves time-dependent behavior or large renders.
+2. **`AppContainer.test.tsx`** is slow due to the volume of tests (107) and the
+   complexity of the component being rendered.
+3. Several suites have few tests but still take seconds to run (e.g.,
+   `ThinkingMessage` here takes 4.5s for 8 tests; `TopicMessage` did so in other
+   runs). This suggests high setup/teardown costs or rendering delays.
+4. **`performance.test.ts`** includes one test that deliberately exercises slow
+   character-by-character insertion and takes 2.5s on its own.
+
+## Recommendations
+
+- Prioritize `InputPrompt.test.tsx` for global fake timers or splitting.
+- Investigate test suites with high per-test overhead (e.g., `ThinkingMessage`,
+  `SessionSummaryDisplay`).
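
To make the "high per-test overhead" recommendation concrete, the following is a minimal sketch that derives an average cost per test from a few rows of the table above and ranks the suites that exceed a threshold. The `SuiteTiming` shape, the function names, and the 400 ms threshold are assumptions chosen for illustration; only the durations and test counts come from the report.

```typescript
// Hypothetical record shape; the numbers are copied from the table in the report.
interface SuiteTiming {
  file: string;
  seconds: number;
  tests: number;
}

const suites: SuiteTiming[] = [
  { file: 'InputPrompt.test.tsx', seconds: 32.74, tests: 196 },
  { file: 'AppContainer.test.tsx', seconds: 14.82, tests: 107 },
  { file: 'SkillInboxDialog.test.tsx', seconds: 6.76, tests: 11 },
  { file: 'ThinkingMessage.test.tsx', seconds: 4.52, tests: 8 },
  { file: 'SessionSummaryDisplay.test.tsx', seconds: 3.93, tests: 6 },
  { file: 'performance.test.ts', seconds: 2.95, tests: 3 },
];

// Average milliseconds per test, rounded to the nearest millisecond.
function msPerTest(s: SuiteTiming): number {
  return Math.round((s.seconds * 1000) / s.tests);
}

// Suites whose average cost per test meets the threshold, slowest first.
function highOverhead(all: SuiteTiming[], thresholdMs: number): SuiteTiming[] {
  return all
    .filter((s) => msPerTest(s) >= thresholdMs)
    .sort((a, b) => msPerTest(b) - msPerTest(a));
}

for (const s of highOverhead(suites, 400)) {
  console.log(`${s.file}: ~${msPerTest(s)} ms/test`);
}
```

An average like this is only a first filter: `performance.test.ts` ranks highest because one deliberately slow test takes 2.5s, while `InputPrompt.test.tsx` falls below the threshold despite being the slowest file overall.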
@@ -1,162 +0,0 @@
-# CI Optimization Report: Speedup & Lessons Learned
-
-This report compares the total wall-clock time of the main branch workflows with
-our optimized `Bundling Trial CI` run.
-
-## Summary of Improvements
-
-- **Total Wall-Clock Time:** Reduced from **~15 minutes** to **~2 minutes**!
-  (Assuming we skip Mac tests for now, leaving only Linux E2E taking ~2m 13s!).
-- **Mac E2E Duration:** Reduced from **~11 minutes** to **3m 43s** (when running
-  tests).
-- **Linux E2E Duration:** Reduced from **~7.7 minutes** to **2m 13s**.
-
-## Averages for Successful Runs in Last Week (Main Branch)
-
-We calculated these averages across successful runs on the `main` branch over
-the last week:
-
-- **Mac CLI Jobs Average:** **11.35 minutes**
-- **Linux CLI Jobs Average:** **7.06 minutes**
-- **Mac Others Jobs Average:** **6.09 minutes**
-- **Linux Others Jobs Average:** **3.39 minutes**
-- **Average Total Wall-Clock Time for Commit to Main:** **14.68 minutes**
-  (combining `Testing: CI` and `Testing: E2E (Chained)`).
-
-We targeted several of these slow files and achieved dramatic improvements:
-
-- **`vim.test.tsx`**: Reduced from **117.8s** in real CI to **~4.3s** by
-  enabling fake timers globally!
-- **`useSelectionList.test.tsx`**: Reduced from **95.1s** in real CI to **< 1s**
-  locally/CI by enabling fake timers globally and fixing tests.
-- **`DenseToolMessage.test.tsx`**: Reduced from **49.0s** to **< 1s** by
-  enabling fake timers.
-- **`AppContainer.test.tsx`**: Reduced from **77.2s** to **~7.5s** by resolving
-  hardcoded path failures and leveraging existing fake timers effectively.
-
-## Broad Strokes of Why and What We Improved
-
-### 1. Fake Timers for the Win
-
-Many React component tests were relying on real `setTimeout` or async waiting,
-causing them to idle for seconds. By enabling Vitest's fake timers globally in
-these files, we forced time to pass instantly, cutting execution time by over
-90% in some files.
-
-### 2. Parallelization via Sharding
-
-We broke up the monolithic UI test folder into 4 smaller parallel batches in CI.
-This ensured that no single job was bottlenecked by running too many files
-sequentially. Wall-clock time for UI tests dropped from over 5 minutes to around
-**1m 41s**.
-
-### 3. Artifact Sharing (With Tar)
-
-We avoided redundant `npm ci` and `npm run build` steps in test jobs by building
-once and sharing the workspace. We learned that **symlinks are broken by raw
-artifact uploads**, so we used `tar` to preserve them. This saved ~30 seconds of
-setup in every job.
-
-### 4. Isolating React Tests from Terminal Size
-
-We found that some React component tests failed in CI due to snapshot mismatches
-caused by terminal size differences. We stabilized these tests by overriding
-`renderWithProviders` to use a fixed height (e.g., 40 rows), isolating tests
-from terminal size pollution.
-
-## Lessons Learned for the Team
-
-- **Avoid hardcoded paths** in tests (e.g., pointing to local directories), as
-  they will break in CI.
-- **Use fake timers** for any test involving waiting or timeouts.
-- **Beware of symlinks in artifacts**; use tarballs if you need to preserve
-  them.
-- **Isolate terminal size** in React component tests by setting explicit
-  dimensions in the test renderer.
-- **Avoid excessive timeouts** in integration tests (e.g., 10 minutes), as they
-  cause severe hangs when issues occur.
-
-## Deep Dive: `useGeminiStream` and Ink Harness Overhead
-
-An earlier analysis revealed why `useGeminiStream.test.tsx` takes ~80 seconds:
-
-- **1-Second Fallback Stalls:** The `waitUntilReady()` helper in
-  `test-utils/render.tsx` races first render against a
-  `setTimeout(resolve, 1000)`. Every render path awaits it. With ~65 calls in
-  `useGeminiStream.test.tsx`, this adds ~65 seconds of pure idle time.
-- **Log Spam & `act` Warnings:** Stderr is flooded with "The current testing
-  environment is not configured to support act(...)".
-- **Listener Leaks:** `LoadedSettings.subscribe()` attaches listeners that are
-  not auto-cleaned, causing `MaxListenersExceededWarning`.
-- **Config Overhead:** Coverage was always enabled by default, adding ~29
-  seconds of overhead in large runs.
-
-## Deep Dive: Integration Test Hangs
-
-During local verification of integration tests, we found that the suite was
-hanging indefinitely:
-
-- **10-Minute Timeout:** Upon inspecting `file-system.test.ts`, we found that
-  one test (`should correctly handle file paths with spaces`) had a hardcoded
-  timeout of **600,000 ms (10 minutes)**. The comment mentioned that the "real
-  LLM can be slow in Docker sandbox", but since we were running without Docker
-  (`GEMINI_SANDBOX=false`), this was far too long and caused the job to hang
-  indefinitely when an issue occurred. We reduced it to 1 minute.
-
-## Flaky Integration Tests: `file-system.test.ts`
-
-We marked the entire `file-system.test.ts` suite as flaky and skipped it.
-
-- **Hanging Tests:** 3 tests timed out or hung (taking 5-10 minutes each due to
-  defaults).
-- **Failures:** Tests failed with 503 Service Unavailable when the API was
-  overloaded.
-- **Prompt Sensitivity:** One test failed because the LLM just described what it
-  would do instead of calling the tool.
-- **Timings:** The passing tests took 2-5 minutes each, making the file
-  extremely slow.
-
-## E2E Test Optimization: `sendKeys` vs `type`
-
-We found that interactive E2E tests were slow because they used `run.type()`,
-which types characters one by one and waits for echo with a 5-second timeout per
-character.
-
-- **Optimization:** We replaced `run.type()` with `run.sendKeys()` in
-  `shell-background.test.ts`, which sends characters with a fixed 5ms delay
-  without waiting for echo.
-- **Result:** The test duration dropped from hanging/minutes to just **8.3
-  seconds**!
-- **Impact:** This brought down the total E2E job time significantly in the
-  successful run (to ~3m 16s).
-
-## Native TypeScript Compiler (`tsgo`)
-
-We experimented with replacing `tsc` with `tsgo` (provided by
-`@typescript/native-preview`) in `build_package.js`.
-
-- **Result:** The build project step dropped from 2.5 minutes to just **6
-  seconds** in CI!
-- **Impact:** This made it feasible to run builds in every job instead of
-  sharing artifacts.
-
-## Removing Artifact Sharing
-
-With builds taking only 6 seconds, we realized that the overhead of uploading
-and downloading artifacts (30+ seconds) was the new bottleneck.
-
-- **Action:** We removed artifact sharing for the workspace and made test jobs
-  independent, running `npm ci` and `npm run build` in each.
-- **Result:** The wall-clock time for granular test jobs dropped to ~1m 40s, and
-  E2E tests to ~1m 57s!
-
-## Recommended Future Work
-
-To further improve test speed and quality:
-
-- **Fix the Ink harness:** Remove or reduce the 1-second fallback in
-  `waitUntilReady()` for hook tests.
-- **Auto-cleanup:** Add automatic `cleanup()` after each CLI test file.
-- **Silence logs:** Stop forwarding `debugLogger` to console by default in
-  tests.
-- **Coverage:** Make coverage opt-in for local runs to save time.
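
The cost of the 1-second fallback described in the removed report can be sketched as follows. This is a hypothetical stand-in, not the project's `waitUntilReady()`: the names `waitWithFallback` and `worstCaseIdleSeconds` are invented for illustration, and the arithmetic assumes every call falls through to the timer, as the report describes for the ~65 calls in `useGeminiStream.test.tsx`.

```typescript
// Hypothetical helper: resolves when `ready` settles or after `fallbackMs`,
// whichever comes first. Not the project's actual implementation.
function waitWithFallback(
  ready: Promise<void>,
  fallbackMs: number,
): Promise<'ready' | 'fallback'> {
  return new Promise<'ready' | 'fallback'>((resolve) => {
    const timer = setTimeout(() => resolve('fallback'), fallbackMs);
    ready.then(() => {
      // Clear the timer so a fast render does not leave a pending timeout.
      clearTimeout(timer);
      resolve('ready');
    });
  });
}

// Worst-case idle time in seconds if every call waits out the fallback.
function worstCaseIdleSeconds(calls: number, fallbackMs: number): number {
  return (calls * fallbackMs) / 1000;
}

console.log(worstCaseIdleSeconds(65, 1000)); // 65
console.log(worstCaseIdleSeconds(65, 50)); // 3.25

// A readiness promise that never settles always pays the full fallback.
waitWithFallback(new Promise<void>(() => {}), 20).then((outcome) => {
  console.log(outcome); // "fallback"
});
```

Under these assumptions, shrinking the fallback from 1000 ms to a hypothetical 50 ms would cut the worst-case idle time from 65 seconds to 3.25 seconds, which is why the report lists the fallback as the first item of future work.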