# Product Requirements Document (PRD): Gemini CLI Memory Optimization

## 1. Objective

Reduce the memory footprint of `gemini-cli` during long-running sessions
(multi-hour) from the current peak of ~2GB down to a sustainable baseline (e.g.,
< 500MB), without degrading existing functionality, user experience, or context
awareness.

## 2. Problem Statement

Users experience high memory consumption (up to 2GB) when running `gemini-cli`
for extended periods. High memory usage leads to sluggish terminal
responsiveness, system swapping, increased GC (Garbage Collection) pauses, and
eventually OOM (Out of Memory) crashes. Node.js applications that retain large
amounts of execution history, tool results (like large shell outputs or file
reads), and conversational context in memory often suffer from "soft memory
leaks" (unbounded data growth).

## 3. Scope

**In Scope:**

- Analyzing and profiling the memory usage of the `@google/gemini-cli-core` and
  `@google/gemini-cli` packages.
- Identifying and resolving memory leaks (e.g., un-deregistered event
  listeners).
- Implementing bounded memory for unbounded data structures (e.g., chat history,
  activity logs, tool execution results).
- Optimizing data serialization/deserialization and large string handling.
- Creating automated memory profiling scripts and validation workflows.

**Out of Scope:**

- Rewriting the CLI in another language (e.g., Rust/Go).
- Removing core features or aggressively truncating the LLM context window
  (unless specifically configured by the user).

## 4. Key Results & Metrics

- **Peak Memory Usage:** Reduce peak memory usage (`RSS`) during a 4-hour
  simulated session from ~2.0GB to < 500MB.
- **Baseline Memory:** Ensure baseline memory after forced garbage collection
  remains flat (does not grow linearly with the number of turns).
- **Quality Gates:** 100% of existing unit, integration (E2E), and preflight
  tests (`npm run preflight`) must pass.

## 5. Technical Approach & Hypotheses

1. **Unbounded History Retention:** The agent's session history stores full
   payloads of every tool execution (e.g., `read_file` of a 5MB file, or verbose
   `run_shell_command` outputs).
   - _Mitigation:_ Implement aggressive in-memory truncation for older turns
     that are no longer sent to the model, or offload historical payloads to
     temporary disk files.
2. **React/Ink Memory Leaks in CLI UI:** Unmounted Ink components might not be
   garbage collected if references are held in global state, context providers,
   or event listeners.
   - _Mitigation:_ Audit `useEffect` cleanup functions and global event listener
     deregistration in UI components.
3. **DevTools / Logger Retention:** The `activityLogger.ts` or telemetry systems
   might buffer unbounded amounts of events in memory before flushing.
   - _Mitigation:_ Ensure logs are streamed directly to disk or the WebSocket
     without retaining a massive ring buffer in memory.

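
The truncation mitigation for hypothesis 1 could look roughly like the sketch
below. The `Turn` and `ToolResult` shapes, the `compactHistory` and
`truncateOutput` helpers, and both constants are hypothetical names chosen for
illustration; they are not taken from the `gemini-cli` codebase, and the real
history types will differ.

```typescript
// Hypothetical history shapes, for illustration only.
interface ToolResult {
  toolName: string;
  output: string;
}

interface Turn {
  userInput: string;
  toolResults: ToolResult[];
}

const KEEP_RECENT_TURNS = 2; // assumption: turns that are still kept verbatim
const MAX_OLD_OUTPUT_CHARS = 200; // assumption: cap for older tool payloads

// Returns `output` cut to `maxChars` plus a marker saying how much was dropped.
function truncateOutput(output: string, maxChars: number): string {
  if (output.length <= maxChars) return output;
  const dropped = output.length - maxChars;
  // Round-trip the prefix through a Buffer so the result is a fresh copy.
  // V8 may otherwise represent a slice as a view that keeps the large
  // original string alive.
  const prefix = Buffer.from(output.slice(0, maxChars), 'utf8').toString('utf8');
  return `${prefix}\n[truncated ${dropped} chars]`;
}

// Truncates tool outputs in every turn except the most recent `keepRecent`.
// The input array and its turns are not mutated.
function compactHistory(
  history: Turn[],
  keepRecent: number = KEEP_RECENT_TURNS,
  maxChars: number = MAX_OLD_OUTPUT_CHARS,
): Turn[] {
  const cutoff = Math.max(0, history.length - keepRecent);
  return history.map((turn, index) =>
    index >= cutoff
      ? turn
      : {
          ...turn,
          toolResults: turn.toolResults.map((result) => ({
            ...result,
            output: truncateOutput(result.output, maxChars),
          })),
        },
  );
}
```

Memory is only reclaimed once the caller drops its reference to the original
history, so a before/after heap snapshot is still the way to confirm that the
large strings are actually gone.
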
## 6. Testing & Validation Strategy

To validate memory usage, we must simulate a heavy session, measure memory, and
ensure correctness.

### 6.1 Creating the Memory Profiling Script

Create a script `scripts/simulate-long-session.ts` to programmatically drive the
CLI and measure memory growth.

```typescript
// scripts/simulate-long-session.ts
import * as v8 from 'node:v8';

// Force a GC cycle when Node is started with --expose-gc; otherwise a no-op.
const runGC = () => {
  if (global.gc) {
    global.gc();
  }
};

const toMB = (bytes: number) => (bytes / 1024 / 1024).toFixed(2);

// Log RSS and heap usage after a forced GC so readings are comparable per turn.
const printMemory = (turn: number) => {
  runGC();
  const usage = process.memoryUsage();
  console.log(
    `Turn ${turn} - RSS: ${toMB(usage.rss)} MB, HeapUsed: ${toMB(usage.heapUsed)} MB`,
  );
};

async function runSimulation() {
  console.log('Starting memory simulation...');
  // Simulate 100 heavy turns
  for (let i = 1; i <= 100; i++) {
    // Inject mock messages or trigger SDK agent actions here
    // e.g. agent.processInput("Read a large file and summarize it")

    // Placeholder allocation standing in for a heavy tool result. Nothing
    // retains it, so swap in real agent activity to observe actual growth.
    const dummyData = 'A'.repeat(1024 * 1024 * 10); // 10MB dummy data
    void dummyData;

    printMemory(i);

    // Periodically take heap snapshots
    if (i % 25 === 0) {
      const snapshotName = `heap-snapshot-turn-${i}.heapsnapshot`;
      v8.writeHeapSnapshot(snapshotName);
      console.log(`Saved ${snapshotName}`);
    }
  }
}

runSimulation().catch((error) => {
  console.error(error);
  process.exitCode = 1;
});
```

### 6.2 Steps to Validate Memory Usage

1. **Establish the Baseline:**
   - Run the simulation script on the `main` branch to capture the baseline
     metrics.
   - `NODE_OPTIONS="--expose-gc" npx tsx scripts/simulate-long-session.ts`
2. **Heap Snapshot Analysis:**
   - Run the CLI manually with the inspector enabled: `npm run debug` (or
     `NODE_OPTIONS="--inspect" npm start`).
   - Open Chrome DevTools (`chrome://inspect`).
   - Take a baseline heap snapshot at startup.
   - Run heavy tasks (e.g., `read_file` on large files, `run_shell_command` with
     huge outputs).
   - Take a second heap snapshot.
   - Compare the two snapshots in DevTools. Look for retained objects, detached
     DOM nodes (Ink elements), or massive string allocations.
3. **Verify the Fixes:**
   - Apply the memory optimizations.
   - Re-run the simulation script. The printed `HeapUsed` and `RSS` should
     plateau after a certain number of turns rather than growing linearly.
   - Compare the final heap snapshot size to the baseline.

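
Whether the readings in step 3 level off can be judged by eye, but a small
helper makes the check repeatable. The sketch below fits a least-squares line
to the second half of the per-turn `HeapUsed` readings and compares its slope
to a threshold. The function names and the 0.5 MB-per-turn default are
assumptions for illustration, not values taken from this document.

```typescript
// Least-squares slope of the samples against their index (0, 1, 2, ...),
// expressed in sample units per step.
function slopePerSample(samples: number[]): number {
  const n = samples.length;
  if (n < 2) return 0;
  const meanX = (n - 1) / 2;
  const meanY = samples.reduce((sum, y) => sum + y, 0) / n;
  let numerator = 0;
  let denominator = 0;
  for (let x = 0; x < n; x++) {
    numerator += (x - meanX) * (samples[x] - meanY);
    denominator += (x - meanX) ** 2;
  }
  return numerator / denominator;
}

// Treats a run as plateaued when the second half of the readings grows no
// faster than `maxSlopeMBPerTurn`. The first half is skipped as warm-up.
function hasPlateaued(heapUsedMB: number[], maxSlopeMBPerTurn = 0.5): boolean {
  const tail = heapUsedMB.slice(Math.floor(heapUsedMB.length / 2));
  return slopePerSample(tail) <= maxSlopeMBPerTurn;
}
```

Feeding it the `HeapUsed` values printed by the simulation script, one per
turn, gives a pass/fail answer that can be compared between `main` and the
optimized branch.
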
### 6.3 Ensuring Build and Tests Pass

Memory optimization can inadvertently break functionality if data is truncated
too aggressively.

1. **Run Targeted Tests:** During development, verify core logic using targeted
   tests:
   - `npm test -w @google/gemini-cli-core`
   - `npm run test:e2e`
2. **Run the Preflight Checks:** Before creating a PR, run the exhaustive
   validation suite to ensure no regressions:
   - `npm run preflight`
3. **E2E Validation:** The existing E2E tests
   (`packages/cli/integration-tests/`) will verify that the CLI still behaves
   correctly from a user's perspective, ensuring that history truncation or
   memory offloading doesn't break multi-turn context.

## 7. Execution Plan

- [x] **Phase 1: Instrumentation & Baselines**
  - [x] Implement `scripts/simulate-long-session.ts` or add an eval script.
  - [x] Capture baseline memory metrics and initial heap snapshots.
- [ ] **Phase 2: Analysis & Implementation**
  - [ ] Identify the top 3 memory retainers using Chrome DevTools.
  - [ ] Implement bounded retention (e.g., capping array sizes in memory,
        offloading heavy execution logs to the `.gemini/history` temp files).
  - [ ] Audit React/Ink components for event listener leaks.
- [ ] **Phase 3: Validation & CI**
  - [ ] Run E2E tests to ensure behavioral parity.
  - [ ] Run `npm run preflight`.
  - [ ] Consider adding a lightweight memory-growth check to the CI pipeline to
        prevent future regressions.

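
The "capping array sizes in memory" item in Phase 2 can be as simple as a
fixed-capacity buffer that evicts its oldest entry on overflow. The
`BoundedBuffer` class below is a hypothetical sketch under that assumption, not
an existing `gemini-cli` type. A real implementation would also need to decide
whether evicted entries are dropped or written to disk first.

```typescript
// Fixed-capacity FIFO buffer: once full, each push overwrites the oldest item.
class BoundedBuffer<T> {
  private readonly capacity: number;
  private readonly items: (T | undefined)[];
  private start = 0; // index of the oldest item
  private count = 0; // number of items currently stored

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError('capacity must be a positive integer');
    }
    this.capacity = capacity;
    this.items = new Array<T | undefined>(capacity);
  }

  push(item: T): void {
    // When full, this index equals `start`, so the oldest item is overwritten.
    const end = (this.start + this.count) % this.capacity;
    this.items[end] = item;
    if (this.count < this.capacity) {
      this.count++;
    } else {
      this.start = (this.start + 1) % this.capacity;
    }
  }

  get size(): number {
    return this.count;
  }

  // Returns the stored items from oldest to newest.
  toArray(): T[] {
    const result: T[] = [];
    for (let i = 0; i < this.count; i++) {
      result.push(this.items[(this.start + i) % this.capacity] as T);
    }
    return result;
  }
}

// Usage: keep only the three most recent log entries.
const recentEvents = new BoundedBuffer<string>(3);
for (const event of ['a', 'b', 'c', 'd', 'e']) {
  recentEvents.push(event);
}
console.log(recentEvents.toArray()); // [ 'c', 'd', 'e' ]
```

Because the backing array never grows past `capacity`, memory held by the
buffer is bounded by the size of the retained entries rather than by session
length.
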