Product Requirements Document (PRD): Gemini CLI Memory Optimization

1. Objective

Reduce the memory footprint of gemini-cli during long-running (multi-hour) sessions from the current peak of ~2GB to a sustainable level (target: < 500MB peak RSS), without degrading existing functionality, user experience, or context awareness.

2. Problem Statement

Users experience high memory consumption (up to 2GB) when running gemini-cli for extended periods. High memory usage leads to sluggish terminal responsiveness, system swapping, increased GC (Garbage Collection) pauses, and eventually OOM (Out of Memory) crashes. Node.js applications that retain large amounts of execution history, tool results (like large shell outputs or file reads), and conversational context in memory often suffer from "soft memory leaks" (unbounded data growth).

3. Scope

In Scope:

Analyzing and profiling the memory usage of the @google/gemini-cli-core and @google/gemini-cli packages.
Identifying and resolving memory leaks (e.g., un-deregistered event listeners).
Implementing bounded memory for unbounded data structures (e.g., chat history, activity logs, tool execution results).
Optimizing data serialization/deserialization and large string handling.
Creating automated memory profiling scripts and validation workflows.

Out of Scope:

Rewriting the CLI in another language (e.g., Rust/Go).
Removing core features or aggressively truncating the LLM context window (unless specifically configured by the user).

4. Key Results & Metrics

Peak Memory Usage: Reduce peak memory usage (RSS) during a 4-hour simulated session from ~2.0GB to < 500MB.
Baseline Memory: Ensure baseline memory after forced garbage collection remains flat (does not grow linearly with the number of turns).
Quality Gates: 100% of existing unit, integration (E2E), and preflight tests (npm run preflight) must pass.

5. Technical Approach & Hypotheses

Unbounded History Retention: The agent's session history stores full payloads of every tool execution (e.g., read_file of a 5MB file, or verbose run_shell_command outputs).
- Mitigation: Implement aggressive in-memory truncation for older turns that are no longer sent to the model, or offload historical payloads to temporary disk files.
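The in-memory truncation idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual gemini-cli data model: the HistoryEntry shape, the truncateOldToolOutputs helper, and the keepRecent / maxChars parameters are all hypothetical names.

```typescript
// Hypothetical shape of a history entry; the real gemini-cli types may differ.
interface HistoryEntry {
  role: "user" | "model" | "tool";
  content: string;
}

const TRUNCATION_MARKER = "\n[... truncated in memory ...]";

/**
 * Returns a copy of `history` in which tool outputs older than the last
 * `keepRecent` entries are cut to at most `maxChars` characters.
 * User and model entries, and all recent entries, are left untouched.
 */
function truncateOldToolOutputs(
  history: readonly HistoryEntry[],
  keepRecent: number,
  maxChars: number,
): HistoryEntry[] {
  const cutoff = Math.max(0, history.length - keepRecent);
  return history.map((entry, index) => {
    const isOld = index < cutoff;
    if (!isOld || entry.role !== "tool" || entry.content.length <= maxChars) {
      return entry;
    }
    return {
      ...entry,
      content: entry.content.slice(0, maxChars) + TRUNCATION_MARKER,
    };
  });
}
```

One caveat worth checking in a heap snapshot: V8 may represent a substring as a view onto its parent string, in which case the large original can stay alive after truncation. If that shows up, forcing a flat copy of the truncated text (for example by round-tripping it through a Buffer) is one possible workaround.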
React/Ink Memory Leaks in CLI UI: Unmounted Ink components might not be garbage collected if references are held in global state, context providers, or event listeners.
- Mitigation: Audit useEffect cleanup functions and global event listener deregistration in UI components.
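A small helper can make deregistration hard to forget. The sketch below uses Node's EventEmitter; the subscribe helper and the bus emitter are hypothetical names for illustration, not existing gemini-cli code.

```typescript
import { EventEmitter } from "node:events";

type Listener = (...args: unknown[]) => void;

/**
 * Registers `listener` on `emitter` and returns a disposer that removes it.
 * Every registration is paired with exactly one way to undo it.
 */
function subscribe(
  emitter: EventEmitter,
  event: string,
  listener: Listener,
): () => void {
  emitter.on(event, listener);
  return () => {
    emitter.off(event, listener);
  };
}

// Usage outside React: the listener count returns to zero after disposal.
const bus = new EventEmitter(); // hypothetical shared event bus
const dispose = subscribe(bus, "resize", () => {});
console.log(bus.listenerCount("resize")); // 1
dispose();
console.log(bus.listenerCount("resize")); // 0
```

Inside a component, returning the disposer from the effect callback, as in useEffect(() => subscribe(bus, "resize", handler), [handler]), ties the listener's lifetime to the component's, so an unmounted component no longer keeps its closure reachable through the emitter.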
DevTools / Logger Retention: The activityLogger.ts or telemetry systems might buffer unbounded amounts of events in memory before flushing.
- Mitigation: Ensure logs are streamed directly to disk or the WebSocket without retaining an unbounded buffer in memory; where some buffering is required, cap it at a small fixed size.
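If some buffering before a flush turns out to be unavoidable, a fixed capacity keeps its cost bounded. The following is a minimal sketch of such a buffer; BoundedBuffer is a hypothetical class, and it is an assumption that dropping the oldest events is acceptable for the logger.

```typescript
/** Fixed-capacity buffer that overwrites its oldest entry once full. */
class BoundedBuffer<T> {
  private readonly capacity: number;
  private readonly items: (T | undefined)[];
  private start = 0; // index of the oldest entry
  private count = 0; // number of entries currently stored

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
    this.items = new Array<T | undefined>(capacity);
  }

  get size(): number {
    return this.count;
  }

  push(item: T): void {
    const end = (this.start + this.count) % this.capacity;
    this.items[end] = item;
    if (this.count < this.capacity) {
      this.count++;
    } else {
      // The buffer was full, so the oldest entry was just overwritten.
      this.start = (this.start + 1) % this.capacity;
    }
  }

  /** Returns the buffered entries oldest-first and empties the buffer. */
  drain(): T[] {
    const out: T[] = [];
    for (let i = 0; i < this.count; i++) {
      out.push(this.items[(this.start + i) % this.capacity] as T);
    }
    this.items.fill(undefined); // release references so entries can be collected
    this.start = 0;
    this.count = 0;
    return out;
  }
}
```

A flush routine would call drain() and write the result to disk or the socket. Memory held by the buffer then never exceeds capacity entries, regardless of session length.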

6. Testing & Validation Strategy

To validate memory usage, we must simulate a heavy session, measure memory, and ensure correctness.

6.1 Creating the Memory Profiling Script

Create a script scripts/simulate-long-session.ts to programmatically drive the CLI and measure memory growth.

// scripts/simulate-long-session.ts
import * as v8 from 'v8';

// Helper to force GC if run with --expose-gc
const runGC = () => {
  if (global.gc) {
    global.gc();
  }
};

const printMemory = (turn: number) => {
  runGC();
  const usage = process.memoryUsage();
  console.log(`Turn ${turn} - RSS: ${(usage.rss / 1024 / 1024).toFixed(2)} MB, HeapUsed: ${(usage.heapUsed / 1024 / 1024).toFixed(2)} MB`);
};

async function runSimulation() {
  console.log("Starting memory simulation...");
  // Simulate 100 heavy turns
  for (let i = 1; i <= 100; i++) {
    // Inject mock messages or trigger SDK agent actions here
    // e.g. agent.processInput("Read a large file and summarize it")

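    // Placeholder allocation (10 MB). It becomes unreachable at the end of each
    // iteration, so it exercises the allocator but does not model retained
    // history; replace it with real agent calls to observe growth.
    const dummyData = "A".repeat(1024 * 1024 * 10);
    void dummyData; // mark as intentionally unused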

    printMemory(i);

    // Periodically take heap snapshots
    if (i % 25 === 0) {
      const snapshotName = `heap-snapshot-turn-${i}.heapsnapshot`;
      v8.writeHeapSnapshot(snapshotName);
      console.log(`Saved ${snapshotName}`);
    }
  }
}

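// Surface failures instead of leaving an unhandled promise rejection.
runSimulation().catch((error) => {
  console.error(error);
  process.exitCode = 1;
});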

6.2 Steps to Validate Memory Usage

Establish the Baseline:
- Run the simulation script on the main branch to capture the baseline metrics.
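- NODE_OPTIONS="--expose-gc" npx tsx scripts/simulate-long-session.ts
- If the installed Node.js version rejects --expose-gc in NODE_OPTIONS, pass the flag to node directly instead, for example: node --expose-gc --import tsx scripts/simulate-long-session.ts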
Heap Snapshot Analysis:
- Run the CLI manually with the inspector enabled: npm run debug (or NODE_OPTIONS="--inspect" npm start).
- Open Chrome DevTools (chrome://inspect).
- Take a baseline heap snapshot at startup.
- Run heavy tasks (e.g., read_file on large files, run_shell_command with huge outputs).
- Take a second heap snapshot.
- Compare the two snapshots in DevTools. Look for objects retained between snapshots, Ink element trees that are still reachable after their components unmount, or massive string allocations.
Verify the Fixes:
- Apply the memory optimizations.
- Re-run the simulation script. The printed HeapUsed and RSS should flatline after a certain number of turns rather than growing linearly.
- Compare the final heap snapshot size to the baseline.
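The "flatlines rather than grows linearly" criterion can be made measurable by fitting a line to the per-turn HeapUsed readings and checking its slope. The sketch below is one possible way to do that; slopePerTurn, hasPlateaued, the warm-up length, and the threshold are hypothetical and would need tuning against real runs.

```typescript
/** Least-squares slope of `values` against their index (units per turn). */
function slopePerTurn(values: readonly number[]): number {
  const n = values.length;
  if (n < 2) return 0;
  const meanX = (n - 1) / 2;
  const meanY = values.reduce((sum, v) => sum + v, 0) / n;
  let num = 0;
  let den = 0;
  values.forEach((v, i) => {
    num += (i - meanX) * (v - meanY);
    den += (i - meanX) ** 2;
  });
  return num / den;
}

/**
 * True when heap usage grows by no more than `maxMbPerTurn` per turn
 * once the first `warmupTurns` readings are discarded.
 */
function hasPlateaued(
  heapUsedMb: readonly number[],
  warmupTurns: number,
  maxMbPerTurn: number,
): boolean {
  return slopePerTurn(heapUsedMb.slice(warmupTurns)) <= maxMbPerTurn;
}

// Example with made-up readings: growth during warm-up, then flat.
console.log(hasPlateaued([50, 80, 100, 100, 100, 100], 2, 0.5)); // true
```

Discarding warm-up turns matters because caches and JIT-compiled code legitimately grow at the start of a session; only sustained growth afterwards points to unbounded retention.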

6.3 Ensuring Build and Tests Pass

Memory optimization can inadvertently break functionality if data is truncated too aggressively.

Run Targeted Tests: During development, verify core logic using targeted tests:
- npm test -w @google/gemini-cli-core
- npm run test:e2e
Run the Preflight Checks: Before creating a PR, run the exhaustive validation suite to ensure no regressions:
- npm run preflight
E2E Validation: The existing E2E tests (packages/cli/integration-tests/) will verify that the CLI still behaves correctly from a user's perspective, ensuring that history truncation or memory offloading doesn't break multi-turn context.

7. Execution Plan

Phase 1: Instrumentation & Baselines
- Implement scripts/simulate-long-session.ts or add an eval script.
- Capture baseline memory metrics and initial heap snapshots.

Phase 2: Analysis & Implementation
- Identify the top 3 memory retainers using Chrome DevTools.
- Implement bounded retention (e.g., capping array sizes in memory, offloading heavy execution logs to the .gemini/history temp files).
- Audit React/Ink components for event listener leaks.

Phase 3: Validation & CI
- Run E2E tests to ensure behavioral parity.
- Run npm run preflight.
- Consider adding a lightweight memory-growth check to the CI pipeline to prevent future regressions.
