Initial Version

2026-05-13 13:22:35 -07:00 · 2026-02-06 14:34:32 -08:00
parent a3af4a8cae
commit f18e45d34d
8 changed files with 418 additions and 13 deletions
@@ -124,6 +124,8 @@ npm install -g @google/gemini-cli@nightly
 ### Advanced Capabilities
 - **Automated Iterative Loops**: Use [Ralph Wiggum mode](./docs/ralph-wiggum.md)
  to repeatedly execute prompts until a goal is met (e.g., fixing tests).
 - Ground your queries with built-in
  [Google Search](https://ai.google.dev/gemini-api/docs/grounding) for real-time
  information
@@ -256,6 +258,33 @@ use `--output-format stream-json` to get newline-delimited JSON events:
 gemini -p "Run tests and deploy" --output-format stream-json
 ```
 ### Ralph Wiggum Mode (Iterative Automation)
 Ralph Wiggum mode is an advanced automation feature that allows Gemini CLI to
 repeatedly execute a prompt in a loop until a specific goal is achieved. This is
 ideal for tasks like fixing failing tests or complex refactoring.
 To use Ralph Wiggum mode, provide a prompt and a **completion promise** (a
 string to look for in the output). The CLI will:
 1.  Enter **YOLO mode** to auto-approve all tool calls.
 2.  Run the prompt and check the response for your completion string.
 3.  If the string is not found, it repeats the process until it reaches the
    **max iterations** limit (10 by default).
 4.  **Persistent Context**: It uses a **memory file** (`memories.md` by default)
    to pass notes between iterations. **Note:** Use a unique `--memory-file` for
    each task in the same directory so that tasks do not share context.
 ```bash
 gemini -p "Fix the failing tests in this repo" \
  --ralph-wiggum \
  --completion-promise "ALL TESTS PASSED" \
  --max-iterations 5 \
  --memory-file "task-fix-tests.md"
 ```
 At the end of the run, a summary table displays the result of each iteration and
 extracted test statistics.
 ### Quick Examples
 #### Start a new project
@@ -35,6 +35,9 @@ and parameters.
 | `--model`                        | `-m`  | string  | `auto`    | Model to use. See [Model Selection](#model-selection) for available values.                                |
 | `--prompt`                       | `-p`  | string  | -         | Prompt text. Appended to stdin input if provided. **Deprecated:** Use positional arguments instead.        |
 | `--prompt-interactive`           | `-i`  | string  | -         | Execute prompt and continue in interactive mode                                                            |
 | `--ralph-wiggum`                 | -     | boolean | `false`   | Enable Ralph Wiggum iterative loop mode                                                                    |
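 | `--completion-promise`           | -     | string  | -         | String that ends the Ralph Wiggum loop when it appears in the model's response                             |
 | `--max-iterations`               | -     | number  | `10`      | Maximum loop iterations for Ralph Wiggum mode                                                              |
 | `--memory-file`                  | -     | string  | `memories.md` | Task-specific memory file for Ralph Wiggum mode                                                        |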
 | `--sandbox`                      | `-s`  | boolean | `false`   | Run in a sandboxed environment for safer execution                                                         |
 | `--approval-mode`                | -     | string  | `default` | Approval mode for tool execution. Choices: `default`, `auto_edit`, `yolo`                                  |
 | `--yolo`                         | `-y`  | boolean | `false`   | **Deprecated.** Auto-approve all actions. Use `--approval-mode=yolo` instead.                              |
@@ -0,0 +1,141 @@
 # Ralph Wiggum mode
 Ralph Wiggum mode is an iterative automation technique that lets Gemini CLI
 repeatedly execute a prompt until a specific goal is met. This mode is designed
 for tasks that benefit from persistent refinement, such as fixing failing tests
 or performing complex refactoring.
 > **Note:** This is a preview feature currently under active development.
 ## Overview
 Inspired by the "Ralph Wiggum" technique, this mode treats failures as data and
 uses a feedback loop to reach a successful state. When you enable Ralph Wiggum
 mode, Gemini CLI enters YOLO (auto-approval) mode and continues to process the
 provided prompt until it detects your specified completion string in the model's
 output or reaches the maximum number of iterations.
 ## Usage
 To use Ralph Wiggum mode, you must provide a prompt using the `-p` or `--prompt`
 flag. You then configure the loop behavior using the following flags:
 | Flag                   | Description                                                |
 | :--------------------- | :--------------------------------------------------------- |
 | `--ralph-wiggum`       | Enables the Ralph Wiggum iterative loop mode.              |
 | `--completion-promise` | The string to look for in the output to signal completion. |
 | `--max-iterations`     | The maximum number of times to run the loop (default: 10). |
 | `--memory-file`        | Task-specific memory file (default: `memories.md`).        |
 ### Example
 The following command attempts to fix tests by running the loop up to 5 times
 until the string "TESTS PASSED" appears in the output, using a specific memory
 file for this task:
 ```bash
 gemini -p "Fix the tests in packages/core" \
  --ralph-wiggum \
  --completion-promise "TESTS PASSED" \
  --max-iterations 5 \
  --memory-file "fix-core-tests.md"
 ```
 ## How it works
 When you run Gemini CLI with the `--ralph-wiggum` flag, the following process
 occurs:
 1.  **Enforces YOLO mode:** The tool automatically sets the approval mode to
    `yolo`. This ensures that tool calls (like writing files or running shell
    commands) are approved automatically to allow the automation to proceed
    without human intervention.
 2.  **Iterative execution:** The CLI executes the provided prompt in a loop.
 3.  **Completion check:** After each iteration, the CLI scans the full text of
    the assistant's response for the string provided in `--completion-promise`.
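 4.  **Loop termination:**
    - If the completion string is found, the loop exits successfully.
    - If the completion string is not found, the CLI starts a new iteration
      using the same initial prompt, with the current memory file contents
      prepended.
    - If the number of iterations reaches the `--max-iterations` limit, the loop
      stops and the CLI prints the summary table.
    - If you omit `--completion-promise`, no iteration can succeed, so the loop
      always runs until the `--max-iterations` limit.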
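The control flow above can be sketched as a small loop. This is a simplified, hypothetical model rather than the CLI's source: `runOnce` stands in for one non-interactive run that resolves to the assistant's response text, and memory-file handling and the summary table are left out.

```typescript
// Hypothetical stand-in for one non-interactive run: takes the prompt and
// resolves to the assistant's full response text.
type RunOnce = (prompt: string) => Promise<string>;

interface LoopResult {
  iterations: number;
  succeeded: boolean;
}

async function ralphLoop(
  runOnce: RunOnce,
  prompt: string,
  completionPromise: string | undefined,
  maxIterations = 10,
): Promise<LoopResult> {
  let iterations = 0;
  while (iterations < maxIterations) {
    iterations++;
    const output = await runOnce(prompt);
    // Without a completion promise no iteration can succeed, so the loop
    // always runs until maxIterations.
    if (completionPromise && output.includes(completionPromise)) {
      return { iterations, succeeded: true };
    }
  }
  return { iterations, succeeded: false };
}

// Usage: a fake runner that only reports success on its third call.
let calls = 0;
const fakeRun: RunOnce = async () =>
  ++calls >= 3 ? 'ALL TESTS PASSED' : 'still failing';

ralphLoop(fakeRun, 'Fix the failing tests', 'ALL TESTS PASSED', 5).then(
  (result) => console.log(result), // { iterations: 3, succeeded: true }
);
```

Checking for a plain substring keeps the contract simple, which is why a unique completion string matters: any response that happens to contain it ends the loop.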
 ## Persistent context (Memories)
 To help the agent learn from previous attempts, Ralph Wiggum mode uses a
 memory file in your current working directory: `memories.md` by default, or
 the file you name with `--memory-file`.
 - **Automatic creation:** If the file doesn't exist, the CLI creates it with a
  default header that records the task prompt.
 - **Context injection:** At the start of each iteration, the CLI reads the
  memory file and prepends its content to your prompt.
 - **Usage:** You (or the agent, via tool use) can write notes, error logs, or
  successful patterns into this file. This allows the agent to "remember" what
  failed in iteration 1 and avoid repeating the same mistake in iteration 2.
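A minimal sketch of the memory-file handling, modeled on the header and prompt layout used in this change (the real header contains an extra line of instructions). The helpers `ensureMemoryFile` and `buildIterationInput` are hypothetical names, and the usage writes to a temporary directory instead of the working directory.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Hypothetical helper: create the memory file with a short header if missing.
function ensureMemoryFile(dir: string, fileName: string, task: string): string {
  const filePath = path.resolve(dir, fileName);
  if (!fs.existsSync(filePath)) {
    fs.writeFileSync(filePath, `# Ralph Wiggum Memories\n\nTask: ${task}\n\n`);
  }
  return filePath;
}

// Hypothetical helper: prepend the memory file's contents to the task prompt.
function buildIterationInput(
  filePath: string,
  fileName: string,
  task: string,
): string {
  const memories = fs.existsSync(filePath)
    ? fs.readFileSync(filePath, 'utf-8')
    : '';
  if (!memories.trim()) {
    return task;
  }
  return `Context from previous iterations (${fileName}):\n${memories}\n\nTask:\n${task}`;
}

// Usage: a note appended during iteration 1 is part of iteration 2's prompt.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'ralph-'));
const memoryPath = ensureMemoryFile(dir, 'fix-core-tests.md', 'Fix the tests');
fs.appendFileSync(memoryPath, '- Iteration 1: the snapshot update did not help.\n');
console.log(buildIterationInput(memoryPath, 'fix-core-tests.md', 'Fix the tests'));
```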
 ## Summary statistics
 At the end of the execution, Ralph Wiggum mode provides a summary table in the
 terminal. This table details the performance of each iteration, including:
 - **Iteration number:** The sequence of the run.
 - **Status:** Whether the iteration met the completion promise ("Success") or
  failed to do so ("Failed").
 - **Tests Passed/Failed:** If the assistant's response text contains
  recognizable test runner patterns (such as those from Vitest, Jest, or
  Mocha), the CLI extracts and displays the number of passing and failing
  tests. Tool output that the model does not repeat in its response is not
  scanned.
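To illustrate the kind of pattern matching involved, the following sketch reads counts from a Jest-style summary line (for example `Tests:       1 failed, 3 passed, 4 total`) and from Mocha's `N passing` / `N failing` lines. `parseTestStats` is a hypothetical, simplified helper, not the CLI's exact extraction logic; it reads each count on its own because Jest lists failures before passes.

```typescript
interface TestStats {
  passed?: number;
  failed?: number;
}

// Return the first captured integer for `pattern`, or undefined if absent.
function firstCount(text: string, pattern: RegExp): number | undefined {
  const match = pattern.exec(text);
  return match ? Number(match[1]) : undefined;
}

function parseTestStats(output: string): TestStats {
  // Jest-style summary: counts appear in varying order on one "Tests:" line.
  const jestLine = /^.*\bTests:\s+\d+.*$/m.exec(output)?.[0];
  if (jestLine) {
    return {
      passed: firstCount(jestLine, /(\d+)\s+passed/),
      failed: firstCount(jestLine, /(\d+)\s+failed/),
    };
  }
  // Mocha-style summary lines.
  return {
    passed: firstCount(output, /(\d+)\s+passing/),
    failed: firstCount(output, /(\d+)\s+failing/),
  };
}

console.log(parseTestStats('Tests:       1 failed, 3 passed, 4 total'));
// { passed: 3, failed: 1 }
console.log(parseTestStats('  12 passing (40ms)\n  2 failing'));
// { passed: 12, failed: 2 }
```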
 ### Example summary table
 ```text
 --- Ralph Wiggum Mode Summary ---
 | Iteration | Status  | Tests Passed | Tests Failed |
 |-----------|---------|--------------|--------------|
 | 1         | Failed  | 2            | 10           |
 | 2         | Failed  | 8            | 4            |
 | 3         | Success | 12           | 0            |
 ---------------------------------
 ```
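Each row of the table is built from fixed-width columns. The hypothetical `formatRow` helper below mirrors the `padEnd` widths used by the summary printer in this change (9, 7, 12, and 12 characters) and prints `-` when a count could not be extracted.

```typescript
interface IterationRow {
  iteration: number;
  status: 'Success' | 'Failed';
  testsPassed?: number;
  testsFailed?: number;
}

function formatRow(row: IterationRow): string {
  // Missing statistics are shown as "-".
  const passed = row.testsPassed?.toString() ?? '-';
  const failed = row.testsFailed?.toString() ?? '-';
  return `| ${row.iteration.toString().padEnd(9)} | ${row.status.padEnd(7)} | ${passed.padEnd(12)} | ${failed.padEnd(12)} |`;
}

console.log(formatRow({ iteration: 1, status: 'Failed', testsPassed: 2, testsFailed: 10 }));
console.log(formatRow({ iteration: 2, status: 'Success' }));
```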
 ## Best practices
 To get the most out of Ralph Wiggum mode, we recommend the following:
 - **Clear completion criteria:** Ensure your prompt instructs the model to emit
  a specific, unique string (like "ALL TESTS PASSED") only when the task is
  truly complete.
 - **Incremental goals:** Use prompts that encourage the model to make small,
  verifiable changes in each iteration.
 - **Safety nets:** Always set a reasonable `--max-iterations` limit to prevent
  unintended long-running processes.
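One way to apply these practices from a script is to generate the prompt and the flags together, so that the string the prompt asks for always matches `--completion-promise`. The `buildRalphArgs` helper below is a hypothetical sketch that only builds the argument list from the flags documented on this page; it does not run the CLI.

```typescript
interface RalphTask {
  task: string;
  completionPromise: string;
  maxIterations: number;
  memoryFile: string;
}

// Hypothetical helper: build the argument list for `gemini` so that the
// string named in the prompt always matches --completion-promise.
function buildRalphArgs(t: RalphTask): string[] {
  const prompt =
    `${t.task}. Make one small, verifiable change per iteration. ` +
    `Reply with "${t.completionPromise}" only when the task is fully complete.`;
  return [
    '-p',
    prompt,
    '--ralph-wiggum',
    '--completion-promise',
    t.completionPromise,
    '--max-iterations',
    String(t.maxIterations),
    '--memory-file',
    t.memoryFile,
  ];
}

const args = buildRalphArgs({
  task: 'Fix the failing tests in packages/core',
  completionPromise: 'ALL TESTS PASSED',
  maxIterations: 5,
  memoryFile: 'fix-core-tests.md',
});
console.log(args);
```

The resulting array can be passed directly to Node's `child_process.spawn('gemini', args)`, which avoids shell quoting issues with the prompt text.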
 ## Development and rebuilding
 If you're modifying Ralph Wiggum mode or enabling it in a development
 environment, you must recompile the TypeScript source code.
 ### Full rebuild
 To build all packages in the monorepo, run the following command from the root
 directory:
 ```bash
 npm run build
 ```
 ### Fast CLI rebuild
 If you've already performed a full build and are only making changes to the CLI
 package, you can run a targeted build:
 ```bash
 npm run build -w @google/gemini-cli
 ```
 ### Running in development
 After rebuilding, test your changes using the `npm run start` script:
 ```bash
 npm run start -- -p "Your task" --ralph-wiggum --completion-promise "SUCCESS"
 ```
@@ -45,6 +45,7 @@
      { "label": "Custom commands", "slug": "docs/cli/custom-commands" },
      { "label": "Enterprise features", "slug": "docs/cli/enterprise" },
      { "label": "Headless mode & scripting", "slug": "docs/cli/headless" },
      { "label": "Ralph Wiggum mode", "slug": "docs/ralph-wiggum" },
      { "label": "Sandboxing", "slug": "docs/cli/sandbox" },
      { "label": "System prompt override", "slug": "docs/cli/system-prompt" },
      { "label": "Telemetry", "slug": "docs/cli/telemetry" }
@@ -70,6 +70,11 @@ export interface CliArgs {
  prompt: string | undefined;
  promptInteractive: string | undefined;
  ralphWiggum: boolean | undefined;
  completionPromise: string | undefined;
  maxIterations: number | undefined;
  memoryFile: string | undefined;
  yolo: boolean | undefined;
  approvalMode: string | undefined;
  allowedMcpServerNames: string[] | undefined;
@@ -141,6 +146,31 @@ export async function parseArguments(
          description: 'Run in sandbox?',
        })
        .option('ralph-wiggum', {
          alias: 'ralphWiggum',
          type: 'boolean',
          description:
            'Enable Ralph Wiggum mode (iterative loop with YOLO mode).',
        })
        .option('completion-promise', {
          alias: 'completionPromise',
          type: 'string',
          description:
            'The string to look for in the output to signal completion in Ralph Wiggum mode.',
        })
        .option('max-iterations', {
          alias: 'maxIterations',
          type: 'number',
          description: 'Maximum number of iterations for Ralph Wiggum mode.',
        })
        .option('memory-file', {
          alias: 'memoryFile',
          type: 'string',
          description:
            'Task-specific memory file for Ralph Wiggum mode (defaults to memories.md).',
          default: 'memories.md',
        })
        .option('yolo', {
          alias: 'y',
          type: 'boolean',
@@ -476,6 +476,10 @@ describe('gemini.tsx main function kitty protocol', () => {
      prompt: undefined,
      promptInteractive: undefined,
      query: undefined,
      ralphWiggum: undefined,
      completionPromise: undefined,
      maxIterations: undefined,
      memoryFile: undefined,
      yolo: undefined,
      approvalMode: undefined,
      allowedMcpServerNames: undefined,
@@ -24,7 +24,7 @@ import { loadSettings, SettingScope } from './config/settings.js';
 import { getStartupWarnings } from './utils/startupWarnings.js';
 import { getUserStartupWarnings } from './utils/userStartupWarnings.js';
 import { ConsolePatcher } from './ui/utils/ConsolePatcher.js';
-import { runNonInteractive } from './nonInteractiveCli.js';
+import { runNonInteractive, runRalphWiggum } from './nonInteractiveCli.js';
 import {
  cleanupCheckpoints,
  registerCleanup,
@@ -740,13 +740,25 @@ export async function main() {
    initializeOutputListenersAndFlush();
-    await runNonInteractive({
-      config,
-      settings,
-      input,
-      prompt_id,
-      resumedSessionData,
-    });
+    if (argv.ralphWiggum) {
+      await runRalphWiggum({
+        config,
+        settings,
+        input,
+        prompt_id,
+        resumedSessionData,
+        completionPromise: argv.completionPromise,
+        maxIterations: argv.maxIterations,
+        memoryFile: argv.memoryFile,
+      });
+    } else {
+      await runNonInteractive({
+        config,
+        settings,
+        input,
+        prompt_id,
+        resumedSessionData,
+      });
+    }
    // Call cleanup before process.exit, which causes cleanup to not run
    await runExitCleanup();
    process.exit(ExitCodes.SUCCESS);
@@ -55,13 +55,187 @@ interface RunNonInteractiveParams {
  resumedSessionData?: ResumedSessionData;
 }
 interface IterationResult {
  iteration: number;
  status: 'Success' | 'Failed';
  testsPassed?: number;
  testsFailed?: number;
  testsTotal?: number;
 }
 function extractTestStats(output: string): {
  passed?: number;
  failed?: number;
  total?: number;
 } {
  const countOf = (text: string, pattern: RegExp): number | undefined => {
    const match = text.match(pattern);
    return match ? parseInt(match[1], 10) : undefined;
  };
  // Jest prints e.g. "Tests:       1 failed, 3 passed, 4 total" and Vitest
  // prints e.g. "Tests  1 failed | 3 passed (4)". The counts appear in varying
  // order, so find the summary line first and read each count separately.
  const summaryLine = output.match(
    /^.*\bTests:?\s+\d+\s+(?:passed|failed|skipped|todo)\b.*$/im,
  )?.[0];
  if (summaryLine) {
    return {
      // A summary line omits zero counts, so default them to 0.
      passed: countOf(summaryLine, /(\d+)\s+passed/i) ?? 0,
      failed: countOf(summaryLine, /(\d+)\s+failed/i) ?? 0,
      // Jest reports "4 total"; Vitest reports the total in parentheses.
      total:
        countOf(summaryLine, /(\d+)\s+total/i) ??
        countOf(summaryLine, /\((\d+)\)/),
    };
  }
  // Fallback: Mocha ("3 passing", "1 failing") or generic
  // ("Passed: 3", "Failed: 1") summaries.
  return {
    passed:
      countOf(output, /(\d+)\s+passing/i) ?? countOf(output, /Passed:\s*(\d+)/i),
    failed:
      countOf(output, /(\d+)\s+failing/i) ?? countOf(output, /Failed:\s*(\d+)/i),
    total: undefined,
  };
 }
 function printSummary(results: IterationResult[]) {
  process.stderr.write('\n--- Ralph Wiggum Mode Summary ---\n');
  process.stderr.write(
    '| Iteration | Status  | Tests Passed | Tests Failed |\n',
  );
  process.stderr.write(
    '|-----------|---------|--------------|--------------|\n',
  );
  for (const result of results) {
    const passed = result.testsPassed !== undefined ? result.testsPassed : '-';
    const failed = result.testsFailed !== undefined ? result.testsFailed : '-';
    process.stderr.write(
      `| ${result.iteration.toString().padEnd(9)} | ${result.status.padEnd(7)} | ${passed.toString().padEnd(12)} | ${failed.toString().padEnd(12)} |\n`,
    );
  }
  process.stderr.write('---------------------------------\n\n');
 }
 import fs from 'node:fs';
 import path from 'node:path';
 export async function runRalphWiggum({
  config,
  settings,
  input,
  prompt_id,
  resumedSessionData,
  completionPromise,
  maxIterations,
  memoryFile,
 }: RunNonInteractiveParams & {
  completionPromise?: string;
  maxIterations?: number;
  memoryFile?: string;
 }): Promise<void> {
  const effectiveMaxIterations = maxIterations ?? 10;
  let iterations = 0;
  let currentResumedSessionData = resumedSessionData;
  const results: IterationResult[] = [];
  const effectiveMemoryFile = memoryFile || 'memories.md';
  // path.resolve keeps an absolute --memory-file path intact.
  const memoriesPath = path.resolve(process.cwd(), effectiveMemoryFile);
  if (!fs.existsSync(memoriesPath)) {
    fs.writeFileSync(
      memoriesPath,
      `# Ralph Wiggum Memories\n\nTask: ${input}\n\nUse this file (${effectiveMemoryFile}) to store notes on what worked and what didn't work across iterations. The agent will read this at the start of each run.\n\n`,
    );
  }
  process.stderr.write(
    `[Ralph Wiggum] Starting loop. Max iterations: ${effectiveMaxIterations}\n`,
  );
  while (iterations < effectiveMaxIterations) {
    iterations++;
    process.stderr.write(
      `[Ralph Wiggum] Iteration ${iterations}/${effectiveMaxIterations}\n`,
    );
    let currentInput = input;
    try {
      if (fs.existsSync(memoriesPath)) {
        const memories = fs.readFileSync(memoriesPath, 'utf-8');
        if (memories.trim()) {
          currentInput = `Context from previous iterations (${effectiveMemoryFile}):\n${memories}\n\nTask:\n${input}`;
          process.stderr.write(
            `[Ralph Wiggum] Loaded context from ${effectiveMemoryFile}\n`,
          );
        }
      }
    } catch (error) {
      process.stderr.write(
        `[Ralph Wiggum] Failed to read ${effectiveMemoryFile}: ${error}\n`,
      );
    }
    const output = await runNonInteractive({
      config,
      settings,
      input: currentInput,
      prompt_id,
      resumedSessionData: currentResumedSessionData,
    });
    const stats = extractTestStats(output);
    const success = completionPromise
      ? output.includes(completionPromise)
      : false;
    results.push({
      iteration: iterations,
      status: success ? 'Success' : 'Failed',
      testsPassed: stats.passed,
      testsFailed: stats.failed,
      testsTotal: stats.total,
    });
    if (success) {
      process.stderr.write(
        `[Ralph Wiggum] Completion promise "${completionPromise}" met. Exiting.\n`,
      );
      printSummary(results);
      return;
    }
    // Only the first iteration resumes the session passed in; later
    // iterations run without resumedSessionData.
    currentResumedSessionData = undefined;
  }
  process.stderr.write(
    `[Ralph Wiggum] Max iterations reached without meeting completion promise.\n`,
  );
  printSummary(results);
 }
 export async function runNonInteractive({
  config,
  settings,
  input,
  prompt_id,
  resumedSessionData,
-}: RunNonInteractiveParams): Promise<void> {
+}: RunNonInteractiveParams): Promise<string> {
  return promptIdContext.run(prompt_id, async () => {
    const consolePatcher = new ConsolePatcher({
      stderr: true,
@@ -181,6 +355,9 @@ export async function runNonInteractive({
      }
    };
    // Store accumulated response text to return
    let fullResponseText = '';
    let errorToHandle: unknown | undefined;
    try {
      consolePatcher.patch();
@@ -316,6 +493,13 @@ export async function runNonInteractive({
            const isRaw =
              config.getRawOutput() || config.getAcceptRawOutputRisk();
            const output = isRaw ? event.value : stripAnsi(event.value);
            // Accumulate full response
            if (event.value) {
              fullResponseText += event.value;
              responseText += output;
            }
            if (streamFormatter) {
              streamFormatter.emitEvent({
                type: JsonStreamEventType.MESSAGE,
@@ -325,7 +509,7 @@ export async function runNonInteractive({
                delta: true,
              });
            } else if (config.getOutputFormat() === OutputFormat.JSON) {
-              responseText += output;
              // responseText was already accumulated above
            } else {
              if (event.value) {
                textOutput.write(output);
@@ -381,7 +565,7 @@ export async function runNonInteractive({
                ),
              });
            }
-            return;
+            return fullResponseText;
          } else if (event.type === GeminiEventType.AgentExecutionBlocked) {
            const blockMessage = `Agent execution blocked: ${event.value.systemMessage?.trim() || event.value.reason}`;
            if (config.getOutputFormat() === OutputFormat.TEXT) {
@@ -488,7 +672,7 @@ export async function runNonInteractive({
            } else {
              textOutput.ensureTrailingNewline(); // Ensure a final newline
            }
-            return;
+            return fullResponseText;
          }
          currentMessages = [{ role: 'user', parts: toolResponseParts }];
@@ -512,7 +696,7 @@ export async function runNonInteractive({
          } else {
            textOutput.ensureTrailingNewline(); // Ensure a final newline
          }
-          return;
+          return fullResponseText;
        }
      }
    } catch (error) {
@@ -528,5 +712,6 @@ export async function runNonInteractive({
    if (errorToHandle) {
      handleError(errorToHandle, config);
    }
    return fullResponseText;
  });
 }