# CPU Performance Integration Test Harness
## Overview
This directory contains performance/CPU integration tests for the Gemini CLI.
These tests measure wall-clock time, CPU usage, and event loop responsiveness to
detect regressions across key scenarios.
CPU performance is inherently noisy, especially in CI. The harness addresses
this with:
- **IQR outlier filtering** — discards anomalous samples
- **Median sampling** — takes N runs, reports the median after filtering
- **Warmup runs** — discards the first run to mitigate JIT compilation noise
- **15% default tolerance** — won't panic at slight regressions
## Running
```bash
# Run tests (compare against committed baselines)
npm run test:perf
# Update baselines (after intentional changes)
npm run test:perf:update-baselines
# Verbose output
VERBOSE=true npm run test:perf
# Keep test artifacts for debugging
KEEP_OUTPUT=true npm run test:perf
```
## How It Works
### Measurement Primitives
The `PerfTestHarness` class (in `packages/test-utils`) provides:
- **`performance.now()`** — high-resolution wall-clock timing
- **`process.cpuUsage()`** — user + system CPU microseconds (delta between
start/stop)
- **`perf_hooks.monitorEventLoopDelay()`** — event loop delay histogram
(p50/p95/p99/max)
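
The three primitives above could be combined roughly as follows. This is a minimal sketch, not the actual `PerfTestHarness` implementation: the `measure` helper and the `Measurement` type are hypothetical names introduced for illustration, and it measures the current process only. Only the Node.js calls themselves come from the list above.

```typescript
import { performance, monitorEventLoopDelay } from 'node:perf_hooks';

// Hypothetical result shape; field names mirror the baseline metrics.
interface Measurement {
  wallClockMs: number;
  cpuTotalUs: number;
  eventLoopDelayP99Ms: number;
}

// Hypothetical helper: measures one async unit of work in the current process.
async function measure(work: () => Promise<void>): Promise<Measurement> {
  // Sample event loop delay every 10 ms while the work runs.
  const histogram = monitorEventLoopDelay({ resolution: 10 });
  histogram.enable();

  const startCpu = process.cpuUsage();
  const startWall = performance.now();

  await work();

  const wallClockMs = performance.now() - startWall;
  // Passing the earlier reading returns the delta in microseconds.
  const cpu = process.cpuUsage(startCpu);
  histogram.disable();

  return {
    wallClockMs,
    cpuTotalUs: cpu.user + cpu.system,
    // Histogram values are reported in nanoseconds.
    eventLoopDelayP99Ms: histogram.percentile(99) / 1e6,
  };
}
```

Calling `process.cpuUsage(startCpu)` avoids subtracting the two readings by hand, and dividing the histogram percentile by `1e6` converts nanoseconds to milliseconds.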
### Noise Reduction
1. **Warmup**: First run is discarded to mitigate JIT compilation artifacts
2. **Multiple samples**: Each scenario runs N times (default 5)
3. **IQR filtering**: Samples outside the range [Q1 - 1.5×IQR, Q3 + 1.5×IQR] are discarded
4. **Median**: The median of remaining samples is used for comparison
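
The four steps above can be sketched as a small pure function. This is an illustration under stated assumptions, not the harness's own code: the function names are hypothetical, and the quantile uses linear interpolation, which is one common convention (the harness may compute quartiles differently).

```typescript
// Quantile of an already-sorted array using linear interpolation (assumed convention).
function quantile(sorted: number[], q: number): number {
  if (sorted.length === 0) throw new Error('no samples');
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// Keep only samples inside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]; returns them sorted.
function filterOutliersIqr(samples: number[]): number[] {
  const sorted = [...samples].sort((a, b) => a - b);
  const q1 = quantile(sorted, 0.25);
  const q3 = quantile(sorted, 0.75);
  const iqr = q3 - q1;
  const lower = q1 - 1.5 * iqr;
  const upper = q3 + 1.5 * iqr;
  return sorted.filter((v) => v >= lower && v <= upper);
}

function median(samples: number[]): number {
  const sorted = [...samples].sort((a, b) => a - b);
  return quantile(sorted, 0.5);
}

// Hypothetical end-to-end helper: the first entry is the warmup run.
function robustMedian(runs: number[]): number {
  const [, ...measured] = runs; // step 1: discard the warmup
  return median(filterOutliersIqr(measured)); // steps 3 and 4
}
```

For example, `robustMedian([2500, 100, 102, 98, 101, 400])` drops the 2500 ms warmup, filters out the 400 ms outlier, and returns `100.5`.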
### Baseline Management
Baselines are stored in `baselines.json` in this directory. Each scenario has:
```json
{
  "cold-startup-time": {
    "wallClockMs": 1234.5,
    "cpuTotalUs": 567890,
    "eventLoopDelayP99Ms": 12.3,
    "timestamp": "2026-04-08T..."
  }
}
```
Tests fail if the measured value exceeds `baseline × 1.15` (15% tolerance).
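
As a rough sketch, the tolerance rule amounts to a single comparison per metric. The helper names below are hypothetical, and applying the same check to all three numeric fields of a baseline entry is an assumption made for illustration; the harness may compare only a subset.

```typescript
// Field names mirror the baselines.json example; the type name is hypothetical.
interface ScenarioMetrics {
  wallClockMs: number;
  cpuTotalUs: number;
  eventLoopDelayP99Ms: number;
}

const DEFAULT_TOLERANCE = 0.15; // 15%

// True when the measured value is above baseline × (1 + tolerance).
function exceedsBaseline(
  measured: number,
  baseline: number,
  tolerance: number = DEFAULT_TOLERANCE,
): boolean {
  return measured > baseline * (1 + tolerance);
}

// Assumption: every numeric metric is checked against the same tolerance.
function findRegressions(
  measured: ScenarioMetrics,
  baseline: ScenarioMetrics,
  tolerance: number = DEFAULT_TOLERANCE,
): string[] {
  const keys: (keyof ScenarioMetrics)[] = [
    'wallClockMs',
    'cpuTotalUs',
    'eventLoopDelayP99Ms',
  ];
  return keys.filter((k) => exceedsBaseline(measured[k], baseline[k], tolerance));
}
```

With a baseline of 1200 ms, a measurement of 1234.5 ms passes, while anything above 1380 ms would be reported as a regression.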
To recalibrate after intentional changes:
```bash
npm run test:perf:update-baselines
# then commit baselines.json
```
### Report Output
After all tests, the harness prints an ASCII summary:
```
═══════════════════════════════════════════════════
PERFORMANCE TEST REPORT
═══════════════════════════════════════════════════
cold-startup-time: 1234.5 ms (Baseline: 1200.0 ms, Delta: +2.9%) ✅
idle-cpu-usage: 2.1 % (Baseline: 2.0 %, Delta: +5.0%) ✅
skill-loading-time: 1567.8 ms (Baseline: 1500.0 ms, Delta: +4.5%) ✅
```
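
The `Delta` column in the sample report is consistent with a relative change against the baseline. A minimal sketch of that arithmetic follows; `deltaPercent` and `formatDelta` are hypothetical names, not functions exported by the harness.

```typescript
// Relative change of the measured value against the baseline, in percent.
function deltaPercent(measured: number, baseline: number): number {
  return ((measured - baseline) / baseline) * 100;
}

// Format with an explicit sign and one decimal place, e.g. "+2.9%".
function formatDelta(measured: number, baseline: number): string {
  const delta = deltaPercent(measured, baseline);
  const sign = delta >= 0 ? '+' : '';
  return `${sign}${delta.toFixed(1)}%`;
}
```

Applied to the sample rows, `formatDelta(1234.5, 1200)` gives `+2.9%` and `formatDelta(1567.8, 1500)` gives `+4.5%`, matching the report.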
## Architecture
```
perf-tests/
├── README.md ← you are here
├── baselines.json ← committed baseline values
├── globalSetup.ts ← test environment setup
├── perf-usage.test.ts ← test scenarios
├── perf.*.responses ← fake API responses per scenario
├── tsconfig.json ← TypeScript config
└── vitest.config.ts ← vitest config (serial, isolated)
packages/test-utils/src/
├── perf-test-harness.ts ← PerfTestHarness class
└── index.ts ← re-exports
```
## CI Integration
These tests are **excluded from `preflight`** and are designed to run in nightly CI:
```yaml
- name: Performance regression tests
run: npm run test:perf
```
## Adding a New Scenario
1. Add a fake response file: `perf.<scenario-name>.responses`
2. Add a test case in `perf-usage.test.ts` using `harness.runScenario()`
3. Run `npm run test:perf:update-baselines` to establish the initial baseline
4. Commit the updated `baselines.json`