Product Requirements Document (PRD): Gemini CLI Memory Optimization
1. Objective
Reduce the memory footprint of gemini-cli during long-running sessions
(multi-hour) from the current peak of ~2GB down to a sustainable baseline (e.g.,
< 500MB), without degrading existing functionality, user experience, or context
awareness.
2. Problem Statement
Users experience high memory consumption (up to 2GB) when running gemini-cli
for extended periods. High memory usage leads to sluggish terminal
responsiveness, system swapping, increased GC (Garbage Collection) pauses, and
eventually OOM (Out of Memory) crashes. Node.js applications that retain large
amounts of execution history, tool results (like large shell outputs or file
reads), and conversational context in memory often suffer from "soft memory
leaks" (unbounded data growth).
3. Scope
In Scope:
- Analyzing and profiling the memory usage of the @google/gemini-cli-core and @google/gemini-cli packages.
- Identifying and resolving memory leaks (e.g., event listeners that are never deregistered).
- Implementing bounded memory for unbounded data structures (e.g., chat history, activity logs, tool execution results).
- Optimizing data serialization/deserialization and large string handling.
- Creating automated memory profiling scripts and validation workflows.
Out of Scope:
- Rewriting the CLI in another language (e.g., Rust/Go).
- Removing core features or aggressively truncating the LLM context window (unless specifically configured by the user).
4. Key Results & Metrics
- Peak Memory Usage: Reduce peak memory usage (RSS) during a 4-hour simulated session from ~2.0GB to < 500MB.
- Baseline Memory: Ensure baseline memory after forced garbage collection remains flat (does not grow linearly with the number of turns).
- Quality Gates: 100% of existing unit, integration (E2E), and preflight tests (npm run preflight) must pass.
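As a rough illustration of how the peak-memory target could be checked from inside a Node.js process, the sketch below reads the peak RSS that Node.js reports and compares it to the 500MB budget. The helper names (peakRssMb, isWithinBudget) are hypothetical, and the reading covers only the process that calls it, not any child process it spawns.

```typescript
// Hypothetical helpers for checking the peak-RSS target; not part of gemini-cli.
const PEAK_RSS_BUDGET_MB = 500; // target taken from the Key Results above

// process.resourceUsage().maxRSS is documented by Node.js as kilobytes.
function peakRssMb(): number {
  return process.resourceUsage().maxRSS / 1024;
}

// True when the observed peak is strictly below the budget.
function isWithinBudget(
  peakMb: number,
  budgetMb: number = PEAK_RSS_BUDGET_MB,
): boolean {
  return peakMb < budgetMb;
}

const peak = peakRssMb();
console.log(
  `Peak RSS: ${peak.toFixed(1)} MB, within budget: ${isWithinBudget(peak)}`,
);
```

Reading the peak from the runtime avoids missing a short spike between two periodic samples of process.memoryUsage().rss.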
5. Technical Approach & Hypotheses
- Unbounded History Retention: The agent's session history stores full payloads of every tool execution (e.g., read_file of a 5MB file, or verbose run_shell_command outputs).
  - Mitigation: Implement aggressive in-memory truncation for older turns that are no longer sent to the model, or offload historical payloads to temporary disk files.
- React/Ink Memory Leaks in CLI UI: Unmounted Ink components might not be garbage collected if references are held in global state, context providers, or event listeners.
  - Mitigation: Audit useEffect cleanup functions and global event listener deregistration in UI components.
- DevTools / Logger Retention: The activityLogger.ts or telemetry systems might buffer unbounded amounts of events in memory before flushing.
  - Mitigation: Ensure logs are streamed directly to disk or the WebSocket without retaining a massive ring buffer in memory.
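To make the history-truncation mitigation concrete, here is a minimal sketch. The HistoryEntry shape, the function name, and both thresholds are assumptions for illustration; the real history types in gemini-cli-core may look quite different.

```typescript
// Hypothetical shape of a stored tool result; the real types may differ.
interface HistoryEntry {
  turn: number;
  toolName: string;
  output: string;
}

const KEEP_RECENT_TURNS = 10; // assumed: turns kept at full fidelity
const MAX_OLD_OUTPUT_CHARS = 2_000; // assumed cap for older outputs

// Returns a new history in which outputs from older turns are cut down to
// maxChars, with a marker recording how much was dropped.
function truncateOldOutputs(
  history: HistoryEntry[],
  currentTurn: number,
  keepRecent: number = KEEP_RECENT_TURNS,
  maxChars: number = MAX_OLD_OUTPUT_CHARS,
): HistoryEntry[] {
  return history.map((entry) => {
    const isRecent = currentTurn - entry.turn < keepRecent;
    if (isRecent || entry.output.length <= maxChars) {
      return entry;
    }
    const omitted = entry.output.length - maxChars;
    return {
      ...entry,
      output: `${entry.output.slice(0, maxChars)}\n[... ${omitted} characters truncated]`,
    };
  });
}

// Example: one old 5,000-character result and one recent one.
const example: HistoryEntry[] = [
  { turn: 1, toolName: 'read_file', output: 'x'.repeat(5_000) },
  { turn: 20, toolName: 'read_file', output: 'y'.repeat(5_000) },
];
const trimmed = truncateOldOutputs(example, 20);
console.log(trimmed.map((e) => e.output.length));
```

One caveat worth verifying in a heap snapshot: V8 can represent a substring as a view onto its parent string, which would keep the original payload alive even after truncation. If that shows up, the truncated text may need to be copied into a fresh string.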
6. Testing & Validation Strategy
To validate memory usage, we must simulate a heavy session, measure memory, and ensure correctness.
6.1 Creating the Memory Profiling Script
Create a script scripts/simulate-long-session.ts to programmatically drive the
CLI and measure memory growth.
// scripts/simulate-long-session.ts
import * as v8 from 'v8';

// Force a garbage collection when the process was started with --expose-gc.
// Without that flag this is a no-op and the readings include uncollected garbage.
const runGC = () => {
  if (global.gc) {
    global.gc();
  }
};

// Log RSS and used heap (in MB) for the given turn after a forced GC.
const printMemory = (turn: number) => {
  runGC();
  const usage = process.memoryUsage();
  console.log(`Turn ${turn} - RSS: ${(usage.rss / 1024 / 1024).toFixed(2)} MB, HeapUsed: ${(usage.heapUsed / 1024 / 1024).toFixed(2)} MB`);
};

async function runSimulation() {
  console.log("Starting memory simulation...");
  // Simulate 100 heavy turns
  for (let i = 1; i <= 100; i++) {
    // Inject mock messages or trigger SDK agent actions here
    // e.g. agent.processInput("Read a large file and summarize it")

    // Placeholder allocation (10MB). Nothing retains it past the current
    // iteration, so real growth only appears once agent actions run above.
    const dummyData = "A".repeat(1024 * 1024 * 10);

    printMemory(i);

    // Periodically take heap snapshots
    if (i % 25 === 0) {
      const snapshotName = `heap-snapshot-turn-${i}.heapsnapshot`;
      v8.writeHeapSnapshot(snapshotName);
      console.log(`Saved ${snapshotName}`);
    }
  }
}

runSimulation();
6.2 Steps to Validate Memory Usage
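- Establish the Baseline:
  - Run the simulation script on the main branch to capture the baseline metrics:
    NODE_OPTIONS="--expose-gc" npx tsx scripts/simulate-long-session.ts
  - Some Node.js versions reject --expose-gc in NODE_OPTIONS. In that case, pass the flag to node directly (for example, node --expose-gc --import tsx scripts/simulate-long-session.ts, assuming a tsx version that supports --import).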
- Heap Snapshot Analysis:
  - Run the CLI manually with the inspector enabled: npm run debug (or NODE_OPTIONS="--inspect" npm start).
  - Open Chrome DevTools (chrome://inspect).
  - Take a baseline heap snapshot at startup.
  - Run heavy tasks (e.g., read_file on large files, run_shell_command with huge outputs).
  - Take a second heap snapshot.
  - Compare the two snapshots in DevTools. Look for retained objects, Ink elements still retained after unmount (the terminal analogue of detached DOM nodes), or massive string allocations.
- Verify the Fixes:
  - Apply the memory optimizations.
  - Re-run the simulation script. The printed HeapUsed and RSS should flatline after a certain number of turns rather than growing linearly.
  - Compare the final heap snapshot size to the baseline.
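Whether the readings "flatline" can be judged numerically rather than by eye. The sketch below fits a least-squares line through per-turn HeapUsed samples and treats the slope as growth per turn. The function names, the warm-up length, and the 0.5 MB/turn threshold are illustrative assumptions, not values from this PRD.

```typescript
// Least-squares slope of the samples (MB) against their index (turn number).
function growthPerTurn(samplesMb: number[]): number {
  const n = samplesMb.length;
  if (n < 2) {
    return 0;
  }
  const meanX = (n - 1) / 2;
  const meanY = samplesMb.reduce((sum, y) => sum + y, 0) / n;
  let numerator = 0;
  let denominator = 0;
  samplesMb.forEach((y, x) => {
    numerator += (x - meanX) * (y - meanY);
    denominator += (x - meanX) ** 2;
  });
  return numerator / denominator;
}

const MAX_GROWTH_MB_PER_TURN = 0.5; // assumed tolerance for "flat"

// Ignores the first warmupTurns samples, where caches are still filling up,
// then checks that the remaining samples grow no faster than the tolerance.
function hasFlatlined(
  samplesMb: number[],
  warmupTurns: number = 10,
  maxGrowth: number = MAX_GROWTH_MB_PER_TURN,
): boolean {
  return growthPerTurn(samplesMb.slice(warmupTurns)) <= maxGrowth;
}

// Example: grows for 10 turns, then stays at 100 MB.
const plateau = Array.from({ length: 30 }, (_, i) => Math.min(i, 10) * 10);
console.log(hasFlatlined(plateau));
```

A slope over many turns is less sensitive to a single noisy reading than comparing only the first and last samples.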
6.3 Ensuring Build and Tests Pass
Memory optimization can inadvertently break functionality if data is truncated too aggressively.
- Run Targeted Tests: During development, verify core logic using targeted tests:
  npm test -w @google/gemini-cli-core
  npm run test:e2e
- Run the Preflight Checks: Before creating a PR, run the exhaustive validation suite to ensure no regressions:
  npm run preflight
- E2E Validation: The existing E2E tests (packages/cli/integration-tests/) will verify that the CLI still behaves correctly from a user's perspective, ensuring that history truncation or memory offloading doesn't break multi-turn context.
7. Execution Plan
- Phase 1: Instrumentation & Baselines
  - Implement scripts/simulate-long-session.ts or add an eval script.
  - Capture baseline memory metrics and initial heap snapshots.
- Phase 2: Analysis & Implementation
  - Identify the top 3 memory retainers using Chrome DevTools.
  - Implement bounded retention (e.g., capping array sizes in memory, offloading heavy execution logs to .gemini/history temp files).
  - Audit React/Ink components for event listener leaks.
- Phase 3: Validation & CI
  - Run E2E tests to ensure behavioral parity.
  - Run npm run preflight.
  - Consider adding a lightweight memory-growth check to the CI pipeline to prevent future regressions.
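For the Phase 2 item on offloading heavy execution logs to disk, one possible shape is sketched below. Everything here is hypothetical: the OffloadedPayload type, the 64KB threshold, and the file naming are assumptions, and the example writes to a scratch directory under the OS temp dir rather than assuming any particular .gemini/history layout.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Hypothetical reference that stays in memory in place of a large payload.
interface OffloadedPayload {
  filePath: string;
  byteLength: number;
}

const OFFLOAD_THRESHOLD_BYTES = 64 * 1024; // assumed cutoff

// Keeps small payloads in memory; writes large ones to dir and returns a reference.
function offloadIfLarge(
  payload: string,
  dir: string,
  id: string,
): string | OffloadedPayload {
  const byteLength = Buffer.byteLength(payload, 'utf8');
  if (byteLength <= OFFLOAD_THRESHOLD_BYTES) {
    return payload;
  }
  const filePath = path.join(dir, `${id}.txt`);
  fs.writeFileSync(filePath, payload, 'utf8');
  return { filePath, byteLength };
}

// Reads the payload back, whether it was kept inline or offloaded.
function loadPayload(ref: string | OffloadedPayload): string {
  return typeof ref === 'string' ? ref : fs.readFileSync(ref.filePath, 'utf8');
}

// Example: a 1MB shell output is replaced by a small reference object.
const scratchDir = fs.mkdtempSync(path.join(os.tmpdir(), 'history-offload-'));
const ref = offloadIfLarge('A'.repeat(1024 * 1024), scratchDir, 'turn-1');
console.log(typeof ref === 'string' ? 'kept in memory' : `offloaded ${ref.byteLength} bytes`);
fs.rmSync(scratchDir, { recursive: true, force: true });
```

The synchronous file calls keep the sketch short. A real implementation would likely use the asynchronous fs APIs and clean up offloaded files when the session ends.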