mirror of
https://github.com/google-gemini/gemini-cli.git
synced 2026-04-13 06:40:33 -07:00
122 lines
3.7 KiB
Markdown
# CPU Performance Integration Test Harness

## Overview

This directory contains performance/CPU integration tests for the Gemini CLI.
These tests measure wall-clock time, CPU usage, and event loop responsiveness to
detect regressions across key scenarios.

CPU performance is inherently noisy, especially in CI. The harness addresses
this with:

- **IQR outlier filtering** — discards anomalous samples
- **Median sampling** — takes N runs, reports the median after filtering
- **Warmup runs** — discards the first run to mitigate JIT compilation noise
- **15% default tolerance** — minor run-to-run variation does not fail the tests

## Running

```bash
# Run tests (compare against committed baselines)
npm run test:perf

# Update baselines (after intentional changes)
npm run test:perf:update-baselines

# Verbose output
VERBOSE=true npm run test:perf

# Keep test artifacts for debugging
KEEP_OUTPUT=true npm run test:perf
```

## How It Works

### Measurement Primitives

The `PerfTestHarness` class (in `packages/test-utils`) provides:

- **`performance.now()`** — high-resolution wall-clock timing
- **`process.cpuUsage()`** — user + system CPU microseconds (delta between start/stop)
- **`perf_hooks.monitorEventLoopDelay()`** — event loop delay histogram (p50/p95/p99/max)

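Taken together, the three primitives can be combined into a single measurement wrapper. The sketch below is illustrative only — `measure` and its return shape are assumptions, not the actual `PerfTestHarness` API:

```typescript
import { performance, monitorEventLoopDelay } from "node:perf_hooks";
import process from "node:process";

// Hypothetical wrapper around the three primitives described above;
// the real PerfTestHarness may structure this differently.
async function measure(fn: () => Promise<void>) {
  const histogram = monitorEventLoopDelay({ resolution: 10 }); // sample every 10 ms
  histogram.enable();
  const cpuStart = process.cpuUsage();
  const t0 = performance.now();

  await fn();

  const wallClockMs = performance.now() - t0;
  const cpu = process.cpuUsage(cpuStart); // delta since cpuStart, in microseconds
  histogram.disable();

  return {
    wallClockMs,
    cpuTotalUs: cpu.user + cpu.system,
    eventLoopDelayP99Ms: histogram.percentile(99) / 1e6, // histogram is in nanoseconds
  };
}
```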
### Noise Reduction

1. **Warmup**: First run is discarded to mitigate JIT compilation artifacts
2. **Multiple samples**: Each scenario runs N times (default 5)
3. **IQR filtering**: Samples below Q1 − 1.5×IQR or above Q3 + 1.5×IQR are discarded
4. **Median**: The median of the remaining samples is used for comparison

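The filtering and median steps above can be sketched as a standalone function (an illustrative version using simple sorted-index quartiles, not the harness's actual code):

```typescript
// Median of an already-sorted array.
function median(sorted: number[]): number {
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Discard samples outside [Q1 − 1.5×IQR, Q3 + 1.5×IQR], then take the median
// of whatever survives the filter.
function filteredMedian(samples: number[]): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const q1 = sorted[Math.floor(sorted.length / 4)];
  const q3 = sorted[Math.floor((3 * sorted.length) / 4)];
  const iqr = q3 - q1;
  const kept = sorted.filter((s) => s >= q1 - 1.5 * iqr && s <= q3 + 1.5 * iqr);
  return median(kept);
}
```

With 5 samples and one wild outlier (say a GC pause), the outlier falls outside the IQR fence and never skews the reported value.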
### Baseline Management

Baselines are stored in `baselines.json` in this directory. Each scenario has an entry like:

```json
{
  "cold-startup-time": {
    "wallClockMs": 1234.5,
    "cpuTotalUs": 567890,
    "eventLoopDelayP99Ms": 12.3,
    "timestamp": "2026-04-08T..."
  }
}
```

Tests fail if the measured value exceeds `baseline × 1.15` (15% tolerance).

To recalibrate after intentional changes:

```bash
npm run test:perf:update-baselines
# then commit baselines.json
```

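The pass/fail rule is a simple multiplicative bound. A sketch, with illustrative names (these are not the harness's actual identifiers):

```typescript
// 15% default tolerance, as described above.
const DEFAULT_TOLERANCE = 0.15;

// A scenario fails when the measured value exceeds baseline × (1 + tolerance).
function exceedsBaseline(
  measured: number,
  baseline: number,
  tolerance: number = DEFAULT_TOLERANCE,
): boolean {
  return measured > baseline * (1 + tolerance);
}
```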
### Report Output

After all tests, the harness prints an ASCII summary:

```
═══════════════════════════════════════════════════
            PERFORMANCE TEST REPORT
═══════════════════════════════════════════════════

cold-startup-time:  1234.5 ms (Baseline: 1200.0 ms, Delta: +2.9%) ✅
idle-cpu-usage:     2.1 %     (Baseline: 2.0 %, Delta: +5.0%) ✅
skill-loading-time: 1567.8 ms (Baseline: 1500.0 ms, Delta: +4.5%) ✅
```

## Architecture

```
perf-tests/
├── README.md            ← you are here
├── baselines.json       ← committed baseline values
├── globalSetup.ts       ← test environment setup
├── perf-usage.test.ts   ← test scenarios
├── perf.*.responses     ← fake API responses per scenario
├── tsconfig.json        ← TypeScript config
└── vitest.config.ts     ← vitest config (serial, isolated)

packages/test-utils/src/
├── perf-test-harness.ts ← PerfTestHarness class
└── index.ts             ← re-exports
```

## CI Integration

These tests are **excluded from `preflight`** and designed for nightly CI:

```yaml
- name: Performance regression tests
  run: npm run test:perf
```

## Adding a New Scenario

1. Add a fake response file: `perf.<scenario-name>.responses`
2. Add a test case in `perf-usage.test.ts` using `harness.runScenario()`
3. Run `npm run test:perf:update-baselines` to establish the initial baseline
4. Commit the updated `baselines.json`