From ab64822bc61af0cb77f24bc4ec006ec91969e9b3 Mon Sep 17 00:00:00 2001
From: mkorwel
Date: Sat, 25 Apr 2026 04:47:40 +0000
Subject: [PATCH] docs: combine reports into CIPerformanceAnalysis.md

---
 docs/CIPerformanceAnalysis.md  |  85 +++++++++++++++++
 docs/ci_optimization_report.md | 162 ---------------------------------
 2 files changed, 85 insertions(+), 162 deletions(-)
 create mode 100644 docs/CIPerformanceAnalysis.md
 delete mode 100644 docs/ci_optimization_report.md

diff --git a/docs/CIPerformanceAnalysis.md b/docs/CIPerformanceAnalysis.md
new file mode 100644
index 0000000000..7a9fb4984c
--- /dev/null
+++ b/docs/CIPerformanceAnalysis.md
@@ -0,0 +1,85 @@
+# CI Optimization Report: Speedup & Lessons Learned
+
+This report compares the total wall-clock time of the main branch workflows with
+our optimized `Bundling Trial CI` run.
+
+## Summary of Improvements
+
+- **Total Wall-Clock Time:** Reduced from **~15 minutes** to **~2 minutes**!
+  (Assuming we skip Mac tests for now, leaving only Linux E2E taking ~2m 13s!).
+- **Mac E2E Duration:** Reduced from **~11 minutes** to **3m 43s** (when running
+  tests).
+- **Linux E2E Duration:** Reduced from **~7.7 minutes** to **2m 13s**.
+
+## Averages for Successful Runs in Last Week (Main Branch)
+
+We calculated these averages across successful runs on the `main` branch over
+the last week:
+
+- **Mac CLI Jobs Average:** **11.35 minutes**
+- **Linux CLI Jobs Average:** **7.06 minutes**
+- **Mac Others Jobs Average:** **6.09 minutes**
+- **Linux Others Jobs Average:** **3.39 minutes**
+- **Average Total Wall-Clock Time for Commit to Main:** **14.68 minutes**
+  (combining `Testing: CI` and `Testing: E2E (Chained)`).
+
+---
+
+# CLI UI Test Performance Analysis
+
+This document details the performance of the test run for the
+`packages/cli/src/ui` folder, identifying the slowest test suites and individual
+tests.
+
+## Overview
+
+- **Total Time:** ~297.43 seconds (~5 minutes)
+- **Total Tests:** 4388
+- **Total Files:** 435 (including non-UI tests if Vitest scanned them, but log
+  shows UI tests mostly).
+
+## Slowest Test Suites (>= 2 seconds)
+
+The following test suites took 2 seconds or longer to run:
+
+| Test Suite | Duration | Tests | Notes |
+| :------------------------------------------------------------- | :--------- | :---- | :------------------------------------------------------------- |
+| `src/ui/components/InputPrompt.test.tsx` | **32.74s** | 196 | Very large file, handles complex input and scrolling. |
+| `src/ui/AppContainer.test.tsx` | **14.82s** | 107 | Renders full app container, many tests taking ~100-300ms each. |
+| `src/ui/components/SkillInboxDialog.test.tsx` | **6.76s** | 11 | High per-test overhead (~600ms each). |
+| `src/ui/components/shared/text-buffer.test.ts` | **6.63s** | 225 | Many tests, some handling large text/ANSI. |
+| `src/ui/hooks/vim.test.tsx` | **5.87s** | 144 | Simulates complex Vim keybindings. |
+| `src/ui/components/messages/ThinkingMessage.test.tsx` | **4.52s** | 8 | Very high per-test overhead (~500ms each). |
+| `src/ui/components/ExitPlanModeDialog.test.tsx` | **4.36s** | 14 | High per-test overhead (~300ms each). |
+| `src/ui/components/messages/ToolResultDisplay.test.tsx` | **4.20s** | 14 | Tests rendering and scrolling of large output. |
+| `src/ui/components/SessionSummaryDisplay.test.tsx` | **3.93s** | 6 | High per-test overhead (~600ms each). |
+| `src/ui/components/TextInput.test.tsx` | **3.97s** | 15 | Tests input handling. |
+| `src/ui/components/Footer.test.tsx` | **3.86s** | 39 | Renders footer with stats/memory. |
+| `src/ui/components/AskUserDialog.test.tsx` | **3.91s** | 42 | Was faster before, might be affected by load. |
+| `src/ui/privacy/CloudFreePrivacyNotice.test.tsx` | **3.39s** | 9 | High per-test overhead (~300ms each). |
+| `src/ui/components/shared/BaseSettingsDialog.test.tsx` | **3.34s** | 33 | Was faster before, might be affected by load. |
+| `src/ui/components/shared/performance.test.ts` | **2.95s** | 3 | One test alone takes 2.5s (character-by-character insertion). |
+| `src/ui/utils/MarkdownDisplay.test.tsx` | **2.72s** | 30 | Tests markdown rendering. |
+| `src/ui/components/messages/ToolGroupMessage.test.tsx` | **2.77s** | 38 | Renders tool groups. |
+| `src/ui/components/messages/ToolGroupMessage.compact.test.tsx` | **2.44s** | 4 | High per-test overhead (~600ms each). |
+| `src/ui/components/messages/DiffRenderer.test.tsx` | **2.03s** | 26 | Tests diff rendering. |
+| `src/ui/components/messages/ShellToolMessage.test.tsx` | **2.03s** | 16 | Tests shell output rendering. |
+| `src/ui/components/ValidationDialog.test.tsx` | **2.08s** | 8 | High per-test overhead (~250ms each). |
+
+## Key Insights
+
+1. **`InputPrompt.test.tsx`** is by far the biggest offender (32.7s). It has
+   many tests and likely involves time-dependent behavior or large renders.
+2. **`AppContainer.test.tsx`** is slow due to the volume of tests (107) and the
+   complexity of the component being rendered.
+3. Several suites have very few tests but take many seconds (e.g.,
+   `TopicMessage` in other runs, `ThinkingMessage` here taking 4.5s for 8
+   tests). This suggests high setup/teardown costs or rendering delays.
+4. **`performance.test.ts`** has a test that deliberately tests slow insertion,
+   taking 2.5s.
+
+## Recommendations
+
+- Prioritize `InputPrompt.test.tsx` for global fake timers or splitting.
+- Investigate test suites with high per-test overhead (e.g., `ThinkingMessage`,
+  `SessionSummaryDisplay`).
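Reviewer note (not part of the patch): the "high per-test overhead" remarks in the added table can be reproduced by dividing each suite's duration by its test count. The following is a minimal TypeScript sketch, assuming the durations and test counts are copied by hand from the table. The `SuiteTiming` type and both function names are hypothetical and do not come from the repository. The average folds suite setup and teardown into every test, so treat it as a rough triage signal only.

```typescript
// Hypothetical triage helper; all names here are illustrative only.
interface SuiteTiming {
  file: string;
  seconds: number; // total suite duration as reported in the table
  tests: number; // number of tests in the suite
}

interface RankedSuite extends SuiteTiming {
  msPerTest: number;
}

// Average milliseconds per test, rounded to the nearest millisecond.
function msPerTest(suite: SuiteTiming): number {
  return Math.round((suite.seconds * 1000) / suite.tests);
}

// Return a new array sorted so the highest average per-test cost comes first.
function rankByPerTestCost(suites: readonly SuiteTiming[]): RankedSuite[] {
  return suites
    .map((suite) => ({ ...suite, msPerTest: msPerTest(suite) }))
    .sort((a, b) => b.msPerTest - a.msPerTest);
}

// Figures copied from four rows of the table in the patch.
const sample: SuiteTiming[] = [
  { file: "src/ui/components/InputPrompt.test.tsx", seconds: 32.74, tests: 196 },
  { file: "src/ui/AppContainer.test.tsx", seconds: 14.82, tests: 107 },
  { file: "src/ui/components/messages/ThinkingMessage.test.tsx", seconds: 4.52, tests: 8 },
  { file: "src/ui/components/SessionSummaryDisplay.test.tsx", seconds: 3.93, tests: 6 },
];

const ranked = rankByPerTestCost(sample);
for (const suite of ranked) {
  console.log(`${suite.file}: ~${suite.msPerTest}ms per test`);
}
```

On these four rows, `SessionSummaryDisplay` (about 655 ms per test) and `ThinkingMessage` (about 565 ms per test) rank ahead of `InputPrompt`, even though `InputPrompt` has by far the largest total. This is consistent with the patch recommending different treatment for the two groups.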
diff --git a/docs/ci_optimization_report.md b/docs/ci_optimization_report.md
deleted file mode 100644
index 69d9df24bb..0000000000
--- a/docs/ci_optimization_report.md
+++ /dev/null
@@ -1,162 +0,0 @@
-# CI Optimization Report: Speedup & Lessons Learned
-
-This report compares the total wall-clock time of the main branch workflows with
-our optimized `Bundling Trial CI` run.
-
-## Summary of Improvements
-
-- **Total Wall-Clock Time:** Reduced from **~15 minutes** to **~2 minutes**!
-  (Assuming we skip Mac tests for now, leaving only Linux E2E taking ~2m 13s!).
-- **Mac E2E Duration:** Reduced from **~11 minutes** to **3m 43s** (when running
-  tests).
-- **Linux E2E Duration:** Reduced from **~7.7 minutes** to **2m 13s**.
-
-## Averages for Successful Runs in Last Week (Main Branch)
-
-We calculated these averages across successful runs on the `main` branch over
-the last week:
-
-- **Mac CLI Jobs Average:** **11.35 minutes**
-- **Linux CLI Jobs Average:** **7.06 minutes**
-- **Mac Others Jobs Average:** **6.09 minutes**
-- **Linux Others Jobs Average:** **3.39 minutes**
-- **Average Total Wall-Clock Time for Commit to Main:** **14.68 minutes**
-  (combining `Testing: CI` and `Testing: E2E (Chained)`).
-
-We targeted several of these slow files and achieved dramatic improvements:
-
-- **`vim.test.tsx`**: Reduced from **117.8s** in real CI to **~4.3s** by
-  enabling fake timers globally!
-- **`useSelectionList.test.tsx`**: Reduced from **95.1s** in real CI to **< 1s**
-  locally/CI by enabling fake timers globally and fixing tests.
-- **`DenseToolMessage.test.tsx`**: Reduced from **49.0s** to **< 1s** by
-  enabling fake timers.
-- **`AppContainer.test.tsx`**: Reduced from **77.2s** to **~7.5s** by resolving
-  hardcoded path failures and leveraging existing fake timers effectively.
-
-## Broad Strokes of Why and What We Improved
-
-### 1. Fake Timers for the Win
-
-Many React component tests were relying on real `setTimeout` or async waiting,
-causing them to idle for seconds. By enabling Vitest's fake timers globally in
-these files, we forced time to pass instantly, cutting execution time by over
-90% in some files.
-
-### 2. Parallelization via Sharding
-
-We broke up the monolithic UI test folder into 4 smaller parallel batches in CI.
-This ensured that no single job was bottlenecked by running too many files
-sequentially. Wall-clock time for UI tests dropped from over 5 minutes to around
-**1m 41s**.
-
-### 3. Artifact Sharing (With Tar)
-
-We avoided redundant `npm ci` and `npm run build` steps in test jobs by building
-once and sharing the workspace. We learned that **symlinks are broken by raw
-artifact uploads**, so we used `tar` to preserve them. This saved ~30 seconds of
-setup in every job.
-
-### 4. Isolating React Tests from Terminal Size
-
-We found that some React component tests failed in CI due to snapshot mismatches
-caused by terminal size differences. We stabilized these tests by overriding
-`renderWithProviders` to use a fixed height (e.g., 40 rows), isolating tests
-from terminal size pollution.
-
-## Lessons Learned for the Team
-
-- **Avoid hardcoded paths** in tests (e.g., pointing to local directories), as
-  they will break in CI.
-- **Use fake timers** for any test involving waiting or timeouts.
-- **Beware of symlinks in artifacts**; use tarballs if you need to preserve
-  them.
-- **Isolate terminal size** in React component tests by setting explicit
-  dimensions in the test renderer.
-- **Avoid excessive timeouts** in integration tests (e.g., 10 minutes), as they
-  cause severe hangs when issues occur.
-
-## Deep Dive: `useGeminiStream` and Ink Harness Overhead
-
-An earlier analysis revealed why `useGeminiStream.test.tsx` takes ~80 seconds:
-
-- **1-Second Fallback Stalls:** The `waitUntilReady()` helper in
-  `test-utils/render.tsx` races first render against a
-  `setTimeout(resolve, 1000)`. Every render path awaits it. With ~65 calls in
-  `useGeminiStream.test.tsx`, this adds ~65 seconds of pure idle time.
-- **Log Spam & `act` Warnings:** Stderr is flooded with "The current testing
-  environment is not configured to support act(...)".
-- **Listener Leaks:** `LoadedSettings.subscribe()` attaches listeners that are
-  not auto-cleaned, causing `MaxListenersExceededWarning`.
-- **Config Overhead:** Coverage was always enabled by default, adding ~29
-  seconds of overhead in large runs.
-
-## Deep Dive: Integration Test Hangs
-
-During local verification of integration tests, we found that the suite was
-hanging indefinitely:
-
-- **10-Minute Timeout:** Upon inspecting `file-system.test.ts`, we found that
-  one test (`should correctly handle file paths with spaces`) had a hardcoded
-  timeout of **600,000 ms (10 minutes)**. The comment mentioned that the "real
-  LLM can be slow in Docker sandbox", but since we were running without Docker
-  (`GEMINI_SANDBOX=false`), this was far too long and caused the job to hang
-  indefinitely when an issue occurred. We reduced it to 1 minute.
-
-## Flaky Integration Tests: `file-system.test.ts`
-
-We marked the entire `file-system.test.ts` suite as flaky and skipped it.
-
-- **Hanging Tests:** 3 tests timed out or hung (taking 5-10 minutes each due to
-  defaults).
-- **Failures:** Tests failed with 503 Service Unavailable when the API was
-  overloaded.
-- **Prompt Sensitivity:** One test failed because the LLM just described what it
-  would do instead of calling the tool.
-- **Timings:** The passing tests took 2-5 minutes each, making the file
-  extremely slow.
-
-## E2E Test Optimization: `sendKeys` vs `type`
-
-We found that interactive E2E tests were slow because they used `run.type()`,
-which types characters one by one and waits for echo with a 5-second timeout per
-character.
-
-- **Optimization:** We replaced `run.type()` with `run.sendKeys()` in
-  `shell-background.test.ts`, which sends characters with a fixed 5ms delay
-  without waiting for echo.
-- **Result:** The test duration dropped from hanging/minutes to just **8.3
-  seconds**!
-- **Impact:** This brought down the total E2E job time significantly in the
-  successful run (to ~3m 16s).
-
-## Native TypeScript Compiler (`tsgo`)
-
-We experimented with replacing `tsc` with `tsgo` (provided by
-`@typescript/native-preview`) in `build_package.js`.
-
-- **Result:** The build project step dropped from 2.5 minutes to just **6
-  seconds** in CI!
-- **Impact:** This made it feasible to run builds in every job instead of
-  sharing artifacts.
-
-## Removing Artifact Sharing
-
-With builds taking only 6 seconds, we realized that the overhead of uploading
-and downloading artifacts (30+ seconds) was the new bottleneck.
-
-- **Action:** We removed artifact sharing for the workspace and made test jobs
-  independent, running `npm ci` and `npm run build` in each.
-- **Result:** The wall-clock time for granular test jobs dropped to ~1m 40s, and
-  E2E tests to ~1m 57s!
-
-## Recommended Future Work
-
-To further improve test speed and quality:
-
-- **Fix the Ink harness:** Remove or reduce the 1-second fallback in
-  `waitUntilReady()` for hook tests.
-- **Auto-cleanup:** Add automatic `cleanup()` after each CLI test file.
-- **Silence logs:** Stop forwarding `debugLogger` to console by default in
-  tests.
-- **Coverage:** Make coverage opt-in for local runs to save time.
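Reviewer note (not part of the patch): the removed "Parallelization via Sharding" section says the UI test folder was split into 4 parallel batches, but it does not say how the batches were formed. The following is a minimal TypeScript sketch of one possible approach, assuming a deterministic round-robin split over a sorted file list. `shardFiles` is a hypothetical helper, not code from the repository, and the file names are taken from the report only as sample input.

```typescript
// Hypothetical helper: split test files into `shardCount` batches.
function shardFiles(files: readonly string[], shardCount: number): string[][] {
  if (!Number.isInteger(shardCount) || shardCount < 1) {
    throw new RangeError("shardCount must be a positive integer");
  }
  // Sort a copy first so every CI job computes the same assignment,
  // regardless of the order in which the files were discovered.
  const sorted = [...files].sort();
  const shards: string[][] = Array.from({ length: shardCount }, () => []);
  // Deal files out round-robin: file i goes to shard i mod shardCount.
  sorted.forEach((file, index) => {
    shards[index % shardCount].push(file);
  });
  return shards;
}

// Sample input only; a real job would discover files from the test folder.
const uiTests = [
  "vim.test.tsx",
  "AppContainer.test.tsx",
  "InputPrompt.test.tsx",
  "Footer.test.tsx",
  "TextInput.test.tsx",
];

const batches = shardFiles(uiTests, 4);
batches.forEach((batch, i) => {
  console.log(`shard ${i + 1}/4: ${batch.join(", ")}`);
});
```

Round-robin balances the number of files per batch, not their duration. Given that `InputPrompt.test.tsx` alone takes 32.74s in the added analysis, a duration-aware split might balance wall-clock time better. Vitest also provides a built-in `--shard` option; the report does not state whether the CI used it or a custom split.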