Mirror of https://github.com/google-gemini/gemini-cli.git (synced 2026-04-13 06:40:33 -07:00)
# CPU Performance Integration Test Harness

## Overview
This directory contains performance/CPU integration tests for the Gemini CLI. These tests measure wall-clock time, CPU usage, and event loop responsiveness to detect regressions across key scenarios.
CPU performance is inherently noisy, especially in CI. The harness addresses this with:
- IQR outlier filtering — discards anomalous samples
- Median sampling — takes N runs, reports the median after filtering
- Warmup runs — discards the first run to mitigate JIT compilation noise
- 15% default tolerance — avoids failing the build on minor run-to-run variance
## Running

```shell
# Run tests (compare against committed baselines)
npm run test:perf

# Update baselines (after intentional changes)
npm run test:perf:update-baselines

# Verbose output
VERBOSE=true npm run test:perf

# Keep test artifacts for debugging
KEEP_OUTPUT=true npm run test:perf
```
## How It Works

### Measurement Primitives
The `PerfTestHarness` class (in `packages/test-utils`) provides:

- `performance.now()` — high-resolution wall-clock timing
- `process.cpuUsage()` — user + system CPU microseconds (delta between start/stop)
- `perf_hooks.monitorEventLoopDelay()` — event loop delay histogram (p50/p95/p99/max)
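Conceptually, the three primitives combine into a start/run/stop cycle around the code under test. The sketch below is illustrative only — the `Sample` shape and `measure()` function are hypothetical names, not the harness's actual API; only the Node.js calls themselves come from the list above:

```typescript
import { performance, monitorEventLoopDelay } from 'node:perf_hooks';

// Hypothetical result shape for one sample (not the harness's real type).
interface Sample {
  wallClockMs: number;
  cpuTotalUs: number;
  eventLoopDelayP99Ms: number;
}

async function measure(fn: () => void | Promise<void>): Promise<Sample> {
  const histogram = monitorEventLoopDelay({ resolution: 10 });
  histogram.enable();
  const cpuStart = process.cpuUsage(); // { user, system } in microseconds
  const wallStart = performance.now(); // high-resolution milliseconds

  await fn();

  const wallClockMs = performance.now() - wallStart;
  const cpu = process.cpuUsage(cpuStart); // delta since cpuStart
  histogram.disable();

  return {
    wallClockMs,
    cpuTotalUs: cpu.user + cpu.system,
    eventLoopDelayP99Ms: histogram.percentile(99) / 1e6, // nanoseconds → ms
  };
}
```

Note that `monitorEventLoopDelay()` reports in nanoseconds, while the baselines store milliseconds, hence the conversion.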
### Noise Reduction
- Warmup: First run is discarded to mitigate JIT compilation artifacts
- Multiple samples: Each scenario runs N times (default 5)
- IQR filtering: Samples below Q1 − 1.5×IQR or above Q3 + 1.5×IQR are discarded
- Median: The median of remaining samples is used for comparison
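The filtering and aggregation steps above can be sketched as follows. This is an illustrative reimplementation with simple index-based quartiles, not the harness's actual code:

```typescript
// Median of a list of numbers (works on a sorted copy).
function median(values: number[]): number {
  const s = [...values].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Discard samples outside [Q1 − 1.5×IQR, Q3 + 1.5×IQR], then report
// the median of what remains. Quartiles here are crude index-based
// picks, which is adequate for small sample counts like N = 5.
function robustMedian(samples: number[]): number {
  const s = [...samples].sort((a, b) => a - b);
  const q1 = s[Math.floor(s.length / 4)];
  const q3 = s[Math.floor((3 * s.length) / 4)];
  const iqr = q3 - q1;
  const kept = s.filter((v) => v >= q1 - 1.5 * iqr && v <= q3 + 1.5 * iqr);
  return median(kept);
}
```

For example, `robustMedian([10, 11, 12, 13, 1000])` drops the 1000 ms outlier and reports 11.5, whereas a plain mean would be skewed past 200 ms.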
### Baseline Management

Baselines are stored in `baselines.json` in this directory. Each scenario has:
```json
{
  "cold-startup-time": {
    "wallClockMs": 1234.5,
    "cpuTotalUs": 567890,
    "eventLoopDelayP99Ms": 12.3,
    "timestamp": "2026-04-08T..."
  }
}
```
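For reference, a plausible TypeScript shape for these entries — field names are taken from the JSON example above, but the type names are hypothetical and the harness may model this differently internally:

```typescript
// Hypothetical types mirroring the baselines.json example above.
interface BaselineEntry {
  wallClockMs: number;         // median wall-clock time
  cpuTotalUs: number;          // median user + system CPU microseconds
  eventLoopDelayP99Ms: number; // p99 event loop delay
  timestamp: string;           // ISO-8601 date of the last baseline update
}

// Top-level map: scenario name → baseline entry.
type Baselines = Record<string, BaselineEntry>;
```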
Tests fail if the measured value exceeds baseline × 1.15 (15% tolerance).
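In code, that check amounts to a one-line comparison (function and constant names here are hypothetical):

```typescript
// Pass/fail rule: fail when measured > baseline × (1 + tolerance).
const DEFAULT_TOLERANCE = 0.15; // the 15% default described above

function exceedsBaseline(
  measured: number,
  baseline: number,
  tolerance: number = DEFAULT_TOLERANCE,
): boolean {
  return measured > baseline * (1 + tolerance);
}
```

With a 1200 ms baseline, for instance, any measurement above 1380 ms fails.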
To recalibrate after intentional changes:

```shell
npm run test:perf:update-baselines
# then commit baselines.json
```
### Report Output
After all tests, the harness prints an ASCII summary:

```
═══════════════════════════════════════════════════
PERFORMANCE TEST REPORT
═══════════════════════════════════════════════════
cold-startup-time: 1234.5 ms (Baseline: 1200.0 ms, Delta: +2.9%) ✅
idle-cpu-usage: 2.1 % (Baseline: 2.0 %, Delta: +5.0%) ✅
skill-loading-time: 1567.8 ms (Baseline: 1500.0 ms, Delta: +4.5%) ✅
```
## Architecture

```
perf-tests/
├── README.md           ← you are here
├── baselines.json      ← committed baseline values
├── globalSetup.ts      ← test environment setup
├── perf-usage.test.ts  ← test scenarios
├── perf.*.responses    ← fake API responses per scenario
├── tsconfig.json       ← TypeScript config
└── vitest.config.ts    ← vitest config (serial, isolated)

packages/test-utils/src/
├── perf-test-harness.ts ← PerfTestHarness class
└── index.ts             ← re-exports
```
## CI Integration

These tests are excluded from preflight and designed for nightly CI:

```yaml
- name: Performance regression tests
  run: npm run test:perf
```
## Adding a New Scenario

- Add a fake response file: `perf.<scenario-name>.responses`
- Add a test case in `perf-usage.test.ts` using `harness.runScenario()`
- Run `npm run test:perf:update-baselines` to establish the initial baseline
- Commit the updated `baselines.json`