Mirror of https://github.com/google-gemini/gemini-cli.git (synced 2026-04-15 07:41:03 -07:00)

# CPU Performance Integration Test Harness

## Overview

This directory contains performance/CPU integration tests for the Gemini CLI.
These tests measure wall-clock time, CPU usage, and event loop responsiveness to detect regressions across key scenarios.

CPU performance is inherently noisy, especially in CI. The harness addresses this with:

- **IQR outlier filtering** — discards anomalous samples
- **Median sampling** — takes N runs, reports the median after filtering
- **Warmup runs** — discards the first run to mitigate JIT compilation noise
- **15% default tolerance** — small run-to-run variation doesn't fail the build

## Running

```bash
# Run tests (compare against committed baselines)
npm run test:perf

# Update baselines (after intentional changes)
npm run test:perf:update-baselines

# Verbose output
VERBOSE=true npm run test:perf

# Keep test artifacts for debugging
KEEP_OUTPUT=true npm run test:perf
```

## How It Works

### Measurement Primitives

The `PerfTestHarness` class (in `packages/test-utils`) provides:

- **`performance.now()`** — high-resolution wall-clock timing
- **`process.cpuUsage()`** — user + system CPU microseconds (delta between start/stop)
- **`perf_hooks.monitorEventLoopDelay()`** — event loop delay histogram (p50/p95/p99/max)

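These three primitives compose naturally around a scenario. The sketch below is an illustration, not the actual `PerfTestHarness` code (`measure` and `Sample` are invented names); it shows one plausible way to capture all three metrics for a single run:

```typescript
import { monitorEventLoopDelay } from "node:perf_hooks";

interface Sample {
  wallClockMs: number;
  cpuTotalUs: number;
  eventLoopDelayP99Ms: number;
}

// Run one scenario and capture wall-clock, CPU, and event-loop-delay metrics.
async function measure(fn: () => Promise<void>): Promise<Sample> {
  const histogram = monitorEventLoopDelay({ resolution: 10 });
  histogram.enable();
  const cpuStart = process.cpuUsage(); // baseline for the CPU delta
  const wallStart = performance.now();

  await fn();

  const wallClockMs = performance.now() - wallStart;
  const cpu = process.cpuUsage(cpuStart); // delta since cpuStart
  histogram.disable();

  return {
    wallClockMs,
    cpuTotalUs: cpu.user + cpu.system,
    // the histogram reports values in nanoseconds
    eventLoopDelayP99Ms: histogram.percentile(99) / 1e6,
  };
}
```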
### Noise Reduction

1. **Warmup**: The first run is discarded to mitigate JIT compilation artifacts
2. **Multiple samples**: Each scenario runs N times (default 5)
3. **IQR filtering**: Samples below Q1−1.5×IQR or above Q3+1.5×IQR are discarded
4. **Median**: The median of the remaining samples is used for comparison

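Steps 3–4 can be sketched as follows (`quantile` and `filteredMedian` are illustrative names, not the harness's real internals):

```typescript
// Quantile of a pre-sorted array, with linear interpolation between ranks.
function quantile(sorted: number[], q: number): number {
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// IQR-filter the samples, then take the median of the survivors.
function filteredMedian(samples: number[]): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const q1 = quantile(sorted, 0.25);
  const q3 = quantile(sorted, 0.75);
  const iqr = q3 - q1;
  const kept = sorted.filter(
    (s) => s >= q1 - 1.5 * iqr && s <= q3 + 1.5 * iqr,
  );
  return quantile(kept, 0.5);
}
```

With one wild outlier among five samples (say `[98, 100, 101, 102, 500]`), the 500 falls above Q3+1.5×IQR, so the reported value is the median of the remaining four.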
### Baseline Management

Baselines are stored in `baselines.json` in this directory. Each scenario has:

```json
{
  "cold-startup-time": {
    "wallClockMs": 1234.5,
    "cpuTotalUs": 567890,
    "eventLoopDelayP99Ms": 12.3,
    "timestamp": "2026-04-08T..."
  }
}
```

Tests fail if the measured value exceeds `baseline × 1.15` (15% tolerance).
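As a sketch (the function and parameter names here are hypothetical, not the harness API), the pass/fail rule amounts to a multiplicative threshold:

```typescript
// Hypothetical helper, not the harness API: a measurement passes when it
// does not exceed the baseline by more than the tolerance fraction.
function withinTolerance(
  measured: number,
  baseline: number,
  tolerance = 0.15, // 15% default
): boolean {
  return measured <= baseline * (1 + tolerance);
}
```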

To recalibrate after intentional changes:

```bash
npm run test:perf:update-baselines
# then commit baselines.json
```

### Report Output

After all tests, the harness prints an ASCII summary:

```
═══════════════════════════════════════════════════
            PERFORMANCE TEST REPORT
═══════════════════════════════════════════════════

cold-startup-time:  1234.5 ms (Baseline: 1200.0 ms, Delta: +2.9%) ✅
idle-cpu-usage:        2.1 %  (Baseline: 2.0 %,    Delta: +5.0%) ✅
skill-loading-time: 1567.8 ms (Baseline: 1500.0 ms, Delta: +4.5%) ✅
```

## Architecture

```
perf-tests/
├── README.md            ← you are here
├── baselines.json       ← committed baseline values
├── globalSetup.ts       ← test environment setup
├── perf-usage.test.ts   ← test scenarios
├── perf.*.responses     ← fake API responses per scenario
├── tsconfig.json        ← TypeScript config
└── vitest.config.ts     ← vitest config (serial, isolated)

packages/test-utils/src/
├── perf-test-harness.ts ← PerfTestHarness class
└── index.ts             ← re-exports
```

## CI Integration

These tests are **excluded from `preflight`** and designed for nightly CI:

```yaml
- name: Performance regression tests
  run: npm run test:perf
```

## Adding a New Scenario

1. Add a fake response file: `perf.<scenario-name>.responses`
2. Add a test case in `perf-usage.test.ts` using `harness.runScenario()`
3. Run `npm run test:perf:update-baselines` to establish the initial baseline
4. Commit the updated `baselines.json`