Mirror of https://github.com/google-gemini/gemini-cli.git (synced 2026-04-13 06:40:33 -07:00)
# CPU Performance Integration Test Harness

## Overview
This directory contains performance/CPU integration tests for the Gemini CLI. These tests measure wall-clock time, CPU usage, and event loop responsiveness to detect regressions across key scenarios.
CPU performance is inherently noisy, especially in CI. The harness addresses this with:
- IQR outlier filtering — discards anomalous samples
- Median sampling — takes N runs, reports the median after filtering
- Warmup runs — discards the first run to mitigate JIT compilation noise
- 15% default tolerance — avoids failing the build on minor run-to-run variance
## Running

```shell
# Run tests (compare against committed baselines)
npm run test:perf

# Update baselines (after intentional changes)
npm run test:perf:update-baselines

# Verbose output
VERBOSE=true npm run test:perf

# Keep test artifacts for debugging
KEEP_OUTPUT=true npm run test:perf
```
## How It Works

### Measurement Primitives
The `PerfTestHarness` class (in `packages/test-utils`) provides:

- `performance.now()` — high-resolution wall-clock timing
- `process.cpuUsage()` — user + system CPU microseconds (delta between start/stop)
- `perf_hooks.monitorEventLoopDelay()` — event loop delay histogram (p50/p95/p99/max)
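Conceptually, the three primitives combine into a start/run/stop cycle around the code under test. The sketch below is illustrative only — the `Sample` shape and `measure()` function are hypothetical names, not the harness's actual API; only the Node.js calls themselves come from the list above:

```typescript
import { performance, monitorEventLoopDelay } from 'node:perf_hooks';

// Hypothetical result shape for one sample (not the harness's real type).
interface Sample {
  wallClockMs: number;
  cpuTotalUs: number;
  eventLoopDelayP99Ms: number;
}

async function measure(fn: () => void | Promise<void>): Promise<Sample> {
  const histogram = monitorEventLoopDelay({ resolution: 10 });
  histogram.enable();
  const cpuStart = process.cpuUsage(); // { user, system } in microseconds
  const wallStart = performance.now(); // high-resolution milliseconds

  await fn();

  const wallClockMs = performance.now() - wallStart;
  const cpu = process.cpuUsage(cpuStart); // delta since cpuStart
  histogram.disable();

  return {
    wallClockMs,
    cpuTotalUs: cpu.user + cpu.system,
    eventLoopDelayP99Ms: histogram.percentile(99) / 1e6, // nanoseconds → ms
  };
}
```

Note that `monitorEventLoopDelay()` reports in nanoseconds, while the baselines store milliseconds, hence the conversion.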
### Noise Reduction
- Warmup: First run is discarded to mitigate JIT compilation artifacts
- Multiple samples: Each scenario runs N times (default 5)
- IQR filtering: Samples below Q1 − 1.5×IQR or above Q3 + 1.5×IQR are discarded
- Median: The median of remaining samples is used for comparison
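The filtering and aggregation steps above can be sketched as follows. This is an illustrative reimplementation with simple index-based quartiles, not the harness's actual code:

```typescript
// Median of a list of numbers (works on a sorted copy).
function median(values: number[]): number {
  const s = [...values].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Discard samples outside [Q1 − 1.5×IQR, Q3 + 1.5×IQR], then report
// the median of what remains. Quartiles here are crude index-based
// picks, which is adequate for small sample counts like N = 5.
function robustMedian(samples: number[]): number {
  const s = [...samples].sort((a, b) => a - b);
  const q1 = s[Math.floor(s.length / 4)];
  const q3 = s[Math.floor((3 * s.length) / 4)];
  const iqr = q3 - q1;
  const kept = s.filter((v) => v >= q1 - 1.5 * iqr && v <= q3 + 1.5 * iqr);
  return median(kept);
}
```

For example, `robustMedian([10, 11, 12, 13, 1000])` drops the 1000 ms outlier and reports 11.5, whereas a plain mean would be skewed past 200 ms.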
### Baseline Management

Baselines are stored in `baselines.json` in this directory. Each scenario has:
```json
{
  "cold-startup-time": {
    "wallClockMs": 1234.5,
    "cpuTotalUs": 567890,
    "eventLoopDelayP99Ms": 12.3,
    "timestamp": "2026-04-08T..."
  }
}
```
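For reference, a plausible TypeScript shape for these entries — field names are taken from the JSON example above, but the type names are hypothetical and the harness may model this differently internally:

```typescript
// Hypothetical types mirroring the baselines.json example above.
interface BaselineEntry {
  wallClockMs: number;         // median wall-clock time
  cpuTotalUs: number;          // median user + system CPU microseconds
  eventLoopDelayP99Ms: number; // p99 event loop delay
  timestamp: string;           // ISO-8601 date of the last baseline update
}

// Top-level map: scenario name → baseline entry.
type Baselines = Record<string, BaselineEntry>;
```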
Tests fail if the measured value exceeds baseline × 1.15 (15% tolerance).
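In code, that check amounts to a one-line comparison (function and constant names here are hypothetical):

```typescript
// Pass/fail rule: fail when measured > baseline × (1 + tolerance).
const DEFAULT_TOLERANCE = 0.15; // the 15% default described above

function exceedsBaseline(
  measured: number,
  baseline: number,
  tolerance: number = DEFAULT_TOLERANCE,
): boolean {
  return measured > baseline * (1 + tolerance);
}
```

With a 1200 ms baseline, for instance, any measurement above 1380 ms fails.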
To recalibrate after intentional changes:

```shell
npm run test:perf:update-baselines
# then commit baselines.json
```
### Report Output
After all tests, the harness prints an ASCII summary:

```
═══════════════════════════════════════════════════
PERFORMANCE TEST REPORT
═══════════════════════════════════════════════════
cold-startup-time: 1234.5 ms (Baseline: 1200.0 ms, Delta: +2.9%) ✅
idle-cpu-usage: 2.1 % (Baseline: 2.0 %, Delta: +5.0%) ✅
skill-loading-time: 1567.8 ms (Baseline: 1500.0 ms, Delta: +4.5%) ✅
```
## Architecture

```
perf-tests/
├── README.md           ← you are here
├── baselines.json      ← committed baseline values
├── globalSetup.ts      ← test environment setup
├── perf-usage.test.ts  ← test scenarios
├── perf.*.responses    ← fake API responses per scenario
├── tsconfig.json       ← TypeScript config
└── vitest.config.ts    ← vitest config (serial, isolated)

packages/test-utils/src/
├── perf-test-harness.ts ← PerfTestHarness class
└── index.ts             ← re-exports
```
## CI Integration

These tests are excluded from preflight and designed for nightly CI:

```yaml
- name: Performance regression tests
  run: npm run test:perf
```
## Adding a New Scenario

- Add a fake response file: `perf.<scenario-name>.responses`
- Add a test case in `perf-usage.test.ts` using `harness.runScenario()`
- Run `npm run test:perf:update-baselines` to establish the initial baseline
- Commit the updated `baselines.json`