diff --git a/README.md b/README.md index 22e258e289..9eaba19eee 100644 --- a/README.md +++ b/README.md @@ -124,6 +124,8 @@ npm install -g @google/gemini-cli@nightly ### Advanced Capabilities +- **Automated Iterative Loops**: Use [Ralph Wiggum mode](./docs/ralph-wiggum.md) + to repeatedly execute prompts until a goal is met (e.g., fixing tests). - Ground your queries with built-in [Google Search](https://ai.google.dev/gemini-api/docs/grounding) for real-time information @@ -256,6 +258,33 @@ use `--output-format stream-json` to get newline-delimited JSON events: gemini -p "Run tests and deploy" --output-format stream-json ``` +### Ralph Wiggum Mode (Iterative Automation) + +Ralph Wiggum mode is an advanced automation feature that allows Gemini CLI to +repeatedly execute a prompt in a loop until a specific goal is achieved. This is +ideal for tasks like fixing failing tests or complex refactoring. + +To use Ralph Wiggum mode, provide a prompt and a **completion promise** (a +string to look for in the output). The CLI will: + +1. Enter **YOLO mode** to auto-approve all tool calls. +2. Run the prompt and check the response for your completion string. +3. If not found, it repeats the process up to a specified **max iterations**. +4. **Persistent Context**: It uses a **memory file** (`memories.md` by default) + to pass notes between iterations. **Note:** Use a unique `--memory-file` for + different tasks in the same directory to ensure context isolation. + +```bash +gemini -p "Fix the failing tests in this repo" \ + --ralph-wiggum \ + --completion-promise "ALL TESTS PASSED" \ + --max-iterations 5 \ + --memory-file "task-fix-tests.md" +``` + +At the end of the run, a summary table displays the result of each iteration and +extracted test statistics. 
+ ### Quick Examples #### Start a new project diff --git a/docs/cli/cli-reference.md b/docs/cli/cli-reference.md index d1094a15e2..440762ef5b 100644 --- a/docs/cli/cli-reference.md +++ b/docs/cli/cli-reference.md @@ -35,6 +35,9 @@ and parameters. | `--model` | `-m` | string | `auto` | Model to use. See [Model Selection](#model-selection) for available values. | | `--prompt` | `-p` | string | - | Prompt text. Appended to stdin input if provided. **Deprecated:** Use positional arguments instead. | | `--prompt-interactive` | `-i` | string | - | Execute prompt and continue in interactive mode | +| `--ralph-wiggum` | - | boolean | `false` | Enable Ralph Wiggum iterative loop mode | +| `--completion-promise` | - | string | - | String to look for to signal completion in Ralph Wiggum mode | +| `--max-iterations` | - | number | `10` | Maximum loop iterations for Ralph Wiggum mode | | `--sandbox` | `-s` | boolean | `false` | Run in a sandboxed environment for safer execution | | `--approval-mode` | - | string | `default` | Approval mode for tool execution. Choices: `default`, `auto_edit`, `yolo` | | `--yolo` | `-y` | boolean | `false` | **Deprecated.** Auto-approve all actions. Use `--approval-mode=yolo` instead. | diff --git a/docs/ralph-wiggum.md b/docs/ralph-wiggum.md new file mode 100644 index 0000000000..5f1a1b7847 --- /dev/null +++ b/docs/ralph-wiggum.md @@ -0,0 +1,141 @@ +# Ralph Wiggum mode + +Ralph Wiggum mode is an iterative automation technique that lets Gemini CLI +repeatedly execute a prompt until a specific goal is met. This mode is designed +for tasks that benefit from persistent refinement, such as fixing failing tests +or performing complex refactoring. + +> **Note:** This is a preview feature currently under active development. + +## Overview + +Inspired by the "Ralph Wiggum" technique, this mode treats failures as data and +uses a feedback loop to reach a successful state. 
When you enable Ralph Wiggum +mode, Gemini CLI enters YOLO (auto-approval) mode and continues to process the +provided prompt until it detects your specified completion string in the model's +output or reaches the maximum number of iterations. + +## Usage + +To use Ralph Wiggum mode, you must provide a prompt using the `-p` or `--prompt` +flag. You then configure the loop behavior using the following flags: + +| Flag | Description | +| :--------------------- | :--------------------------------------------------------- | +| `--ralph-wiggum` | Enables the Ralph Wiggum iterative loop mode. | +| `--completion-promise` | The string to look for in the output to signal completion. | +| `--max-iterations` | The maximum number of times to run the loop (default: 10). | +| `--memory-file` | Task-specific memory file (default: `memories.md`). | + +### Example + +The following command attempts to fix tests by running the loop up to 5 times +until the string "TESTS PASSED" appears in the output, using a specific memory +file for this task: + +```bash +gemini -p "Fix the tests in packages/core" \ + --ralph-wiggum \ + --completion-promise "TESTS PASSED" \ + --max-iterations 5 \ + --memory-file "fix-core-tests.md" +``` + +## How it works + +When you run Gemini CLI with the `--ralph-wiggum` flag, the following process +occurs: + +1. **Enforces YOLO mode:** The tool automatically sets the approval mode to + `yolo`. This ensures that tool calls (like writing files or running shell + commands) are approved automatically to allow the automation to proceed + without human intervention. +2. **Iterative execution:** The CLI executes the provided prompt in a loop. +3. **Completion check:** After each iteration, the CLI scans the full text of + the assistant's response for the string provided in `--completion-promise`. +4. **Loop termination:** + - If the completion string is found, the loop exits successfully. 
+ - If the completion string is not found, the CLI starts a new iteration + using the same initial prompt. + - If the number of iterations reaches the `--max-iterations` limit, the loop + stops. + +## Persistent context (Memories) + +To help the agent learn from previous attempts, Ralph Wiggum mode uses a memory +file (`memories.md` unless `--memory-file` is set) in your working directory. + +- **Automatic creation:** If the file doesn't exist, the CLI creates it with a + default header. +- **Context injection:** At the start of each iteration, the content of + the memory file is read and prepended to your prompt. +- **Usage:** You (or the agent, via tool use) can write notes, error logs, or + successful patterns into this file. This allows the agent to "remember" what + failed in iteration 1 and avoid repeating the same mistake in iteration 2. + +## Summary statistics + +At the end of the execution, Ralph Wiggum mode provides a summary table in the +terminal. This table details the performance of each iteration, including: + +- **Iteration number:** The sequence of the run. +- **Status:** Whether the iteration met the completion promise ("Success") or + failed to do so ("Failed"). +- **Tests Passed/Failed:** If the output contains recognizable test runner + patterns (such as those from Vitest, Jest, or Mocha), the CLI extracts and + displays the number of passing and failing tests. + +### Example summary table + +```text +--- Ralph Wiggum Mode Summary --- +| Iteration | Status | Tests Passed | Tests Failed | +|-----------|---------|--------------|--------------| +| 1 | Failed | 2 | 10 | +| 2 | Failed | 8 | 4 | +| 3 | Success | 12 | 0 | +--------------------------------- +``` + +## Best practices + +To get the most out of Ralph Wiggum mode, we recommend the following: + +- **Clear completion criteria:** Ensure your prompt instructs the model to emit + a specific, unique string (like "ALL TESTS PASSED") only when the task is + truly complete. 
+- **Incremental goals:** Use prompts that encourage the model to make small, + verifiable changes in each iteration. +- **Safety nets:** Always set a reasonable `--max-iterations` limit to prevent + unintended long-running processes. + +## Development and rebuilding + +If you're modifying Ralph Wiggum mode or enabling it in a development +environment, you must recompile the TypeScript source code. + +### Full rebuild + +To build all packages in the monorepo, run the following command from the root +directory: + +```bash +npm run build +``` + +### Fast CLI rebuild + +If you've already performed a full build and are only making changes to the CLI +package, you can run a targeted build: + +```bash +npm run build -w @google/gemini-cli +``` + +### Running in development + +After rebuilding, test your changes using the `npm run start` script: + +```bash +npm run start -- -p "Your task" --ralph-wiggum --completion-promise "SUCCESS" +``` diff --git a/docs/sidebar.json b/docs/sidebar.json index dfbfba80e7..46d720b44e 100644 --- a/docs/sidebar.json +++ b/docs/sidebar.json @@ -45,6 +45,7 @@ { "label": "Custom commands", "slug": "docs/cli/custom-commands" }, { "label": "Enterprise features", "slug": "docs/cli/enterprise" }, { "label": "Headless mode & scripting", "slug": "docs/cli/headless" }, + { "label": "Ralph Wiggum mode", "slug": "docs/ralph-wiggum" }, { "label": "Sandboxing", "slug": "docs/cli/sandbox" }, { "label": "System prompt override", "slug": "docs/cli/system-prompt" }, { "label": "Telemetry", "slug": "docs/cli/telemetry" } diff --git a/packages/cli/src/config/config.ts b/packages/cli/src/config/config.ts index 6ddaada892..ae841099ad 100755 --- a/packages/cli/src/config/config.ts +++ b/packages/cli/src/config/config.ts @@ -70,6 +70,11 @@ export interface CliArgs { prompt: string | undefined; promptInteractive: string | undefined; + ralphWiggum: boolean | undefined; + completionPromise: string | undefined; + maxIterations: number | undefined; + memoryFile: string 
| undefined; + yolo: boolean | undefined; approvalMode: string | undefined; allowedMcpServerNames: string[] | undefined; @@ -141,6 +146,31 @@ export async function parseArguments( description: 'Run in sandbox?', }) + .option('ralph-wiggum', { + alias: 'ralphWiggum', + type: 'boolean', + description: + 'Enable Ralph Wiggum mode (iterative loop with YOLO mode).', + }) + .option('completion-promise', { + alias: 'completionPromise', + type: 'string', + description: + 'The string to look for in the output to signal completion in Ralph Wiggum mode.', + }) + .option('max-iterations', { + alias: 'maxIterations', + type: 'number', + description: 'Maximum number of iterations for Ralph Wiggum mode.', + }) + .option('memory-file', { + alias: 'memoryFile', + type: 'string', + description: + 'Task-specific memory file for Ralph Wiggum mode (defaults to memories.md).', + default: 'memories.md', + }) + .option('yolo', { alias: 'y', type: 'boolean', diff --git a/packages/cli/src/gemini.test.tsx b/packages/cli/src/gemini.test.tsx index 41f9978d7c..ecb74c791f 100644 --- a/packages/cli/src/gemini.test.tsx +++ b/packages/cli/src/gemini.test.tsx @@ -476,6 +476,10 @@ describe('gemini.tsx main function kitty protocol', () => { prompt: undefined, promptInteractive: undefined, query: undefined, + ralphWiggum: undefined, + completionPromise: undefined, + maxIterations: undefined, + memoryFile: undefined, yolo: undefined, approvalMode: undefined, allowedMcpServerNames: undefined, diff --git a/packages/cli/src/gemini.tsx b/packages/cli/src/gemini.tsx index 494b857656..ab64420b3a 100644 --- a/packages/cli/src/gemini.tsx +++ b/packages/cli/src/gemini.tsx @@ -24,7 +24,7 @@ import { loadSettings, SettingScope } from './config/settings.js'; import { getStartupWarnings } from './utils/startupWarnings.js'; import { getUserStartupWarnings } from './utils/userStartupWarnings.js'; import { ConsolePatcher } from './ui/utils/ConsolePatcher.js'; -import { runNonInteractive } from './nonInteractiveCli.js'; 
+import { runNonInteractive, runRalphWiggum } from './nonInteractiveCli.js'; import { cleanupCheckpoints, registerCleanup, @@ -740,13 +740,26 @@ export async function main() { initializeOutputListenersAndFlush(); - await runNonInteractive({ - config, - settings, - input, - prompt_id, - resumedSessionData, - }); + if (argv.ralphWiggum) { + await runRalphWiggum({ + config, + settings, + input, + prompt_id, + resumedSessionData, + completionPromise: argv.completionPromise, + maxIterations: argv.maxIterations, + memoryFile: argv.memoryFile, + }); + } else { + await runNonInteractive({ + config, + settings, + input, + prompt_id, + resumedSessionData, + }); + } // Call cleanup before process.exit, which causes cleanup to not run await runExitCleanup(); process.exit(ExitCodes.SUCCESS); diff --git a/packages/cli/src/nonInteractiveCli.ts b/packages/cli/src/nonInteractiveCli.ts index a2ca92a4e8..9afba010d7 100644 --- a/packages/cli/src/nonInteractiveCli.ts +++ b/packages/cli/src/nonInteractiveCli.ts @@ -55,13 +55,187 @@ interface RunNonInteractiveParams { resumedSessionData?: ResumedSessionData; } +interface IterationResult { + iteration: number; + status: 'Success' | 'Failed'; + testsPassed?: number; + testsFailed?: number; + testsTotal?: number; +} + +function extractTestStats(output: string): { + passed?: number; + failed?: number; + total?: number; +} { + // Common patterns for test runners (Vitest, Jest, Mocha, etc.) 
+ const patterns = [ + // Vitest/Jest: "Tests: 3 passed, 1 failed, 4 total" + /Tests:\s*(?:(\d+)\s+passed)?(?:,\s*)?(?:(\d+)\s+failed)?(?:,\s*)?(?:(\d+)\s+total)?/i, + // Mocha: "3 passing (10ms)" + /(\d+)\s+passing/i, + // Mocha: "1 failing" + /(\d+)\s+failing/i, + // Generic: "Passed: 3, Failed: 1" + /Passed:\s*(\d+)/i, + /Failed:\s*(\d+)/i, + ]; + + let passed: number | undefined; + let failed: number | undefined; + let total: number | undefined; + + // Try Vitest/Jest pattern first as it is most comprehensive + const vitestMatch = output.match(patterns[0]); + if (vitestMatch && (vitestMatch[1] || vitestMatch[2] || vitestMatch[3])) { + passed = vitestMatch[1] ? parseInt(vitestMatch[1], 10) : 0; + failed = vitestMatch[2] ? parseInt(vitestMatch[2], 10) : 0; + total = vitestMatch[3] ? parseInt(vitestMatch[3], 10) : 0; + return { passed, failed, total }; + } + + // Fallback to individual patterns + const passingMatch = output.match(patterns[1]); + if (passingMatch) { + passed = parseInt(passingMatch[1], 10); + } else { + const passedMatch = output.match(patterns[3]); + if (passedMatch) passed = parseInt(passedMatch[1], 10); + } + + const failingMatch = output.match(patterns[2]); + if (failingMatch) { + failed = parseInt(failingMatch[1], 10); + } else { + const failedMatch = output.match(patterns[4]); + if (failedMatch) failed = parseInt(failedMatch[1], 10); + } + + return { passed, failed, total }; +} + +function printSummary(results: IterationResult[]) { + process.stderr.write('\n--- Ralph Wiggum Mode Summary ---\n'); + process.stderr.write( + '| Iteration | Status | Tests Passed | Tests Failed |\n', + ); + process.stderr.write( + '|-----------|---------|--------------|--------------|\n', + ); + for (const result of results) { + const passed = result.testsPassed !== undefined ? result.testsPassed : '-'; + const failed = result.testsFailed !== undefined ? 
result.testsFailed : '-'; + process.stderr.write( + `| ${result.iteration.toString().padEnd(9)} | ${result.status.padEnd(7)} | ${passed.toString().padEnd(12)} | ${failed.toString().padEnd(12)} |\n`, + ); + } + process.stderr.write('---------------------------------\n\n'); +} + +import fs from 'node:fs'; +import path from 'node:path'; + +// ... (existing imports) + +export async function runRalphWiggum({ + config, + settings, + input, + prompt_id, + resumedSessionData, + completionPromise, + maxIterations, + memoryFile, +}: RunNonInteractiveParams & { + completionPromise?: string; + maxIterations?: number; + memoryFile?: string; +}): Promise<void> { + const effectiveMaxIterations = maxIterations ?? 10; + let iterations = 0; + let currentResumedSessionData = resumedSessionData; + const results: IterationResult[] = []; + const effectiveMemoryFile = memoryFile || 'memories.md'; + const memoriesPath = path.join(process.cwd(), effectiveMemoryFile); + + if (!fs.existsSync(memoriesPath)) { + fs.writeFileSync( + memoriesPath, + `# Ralph Wiggum Memories\n\nTask: ${input}\n\nUse this file (${effectiveMemoryFile}) to store notes on what worked and what didn't work across iterations. The agent will read this at the start of each run.\n\n`, + ); + } + + process.stderr.write( + `[Ralph Wiggum] Starting loop. 
Max iterations: ${effectiveMaxIterations}\n`, + ); + + while (iterations < effectiveMaxIterations) { + iterations++; + process.stderr.write( + `[Ralph Wiggum] Iteration ${iterations}/${effectiveMaxIterations}\n`, + ); + + let currentInput = input; + try { + if (fs.existsSync(memoriesPath)) { + const memories = fs.readFileSync(memoriesPath, 'utf-8'); + if (memories.trim()) { + currentInput = `Context from previous iterations (${effectiveMemoryFile}):\n${memories}\n\nTask:\n${input}`; + process.stderr.write( + `[Ralph Wiggum] Loaded context from ${effectiveMemoryFile}\n`, + ); + } + } + } catch (error) { + process.stderr.write( + `[Ralph Wiggum] Failed to read ${effectiveMemoryFile}: ${error}\n`, + ); + } + + const output = await runNonInteractive({ + config, + settings, + input: currentInput, + prompt_id, + resumedSessionData: currentResumedSessionData, + }); + + const stats = extractTestStats(output); + const success = + completionPromise && output.includes(completionPromise) ? true : false; + + results.push({ + iteration: iterations, + status: success ? 'Success' : 'Failed', + testsPassed: stats.passed, + testsFailed: stats.failed, + testsTotal: stats.total, + }); + + if (success) { + process.stderr.write( + `[Ralph Wiggum] Completion promise "${completionPromise}" met. 
Exiting.\n`, + ); + printSummary(results); + return; + } + + // Clear resumedSessionData so we don't try to resume partially through + currentResumedSessionData = undefined; + } + process.stderr.write( + `[Ralph Wiggum] Max iterations reached without meeting completion promise.\n`, + ); + printSummary(results); +} + export async function runNonInteractive({ config, settings, input, prompt_id, resumedSessionData, -}: RunNonInteractiveParams): Promise<void> { +}: RunNonInteractiveParams): Promise<string> { return promptIdContext.run(prompt_id, async () => { const consolePatcher = new ConsolePatcher({ stderr: true, @@ -181,6 +355,9 @@ export async function runNonInteractive({ } }; + // Store accumulated response text to return + let fullResponseText = ''; + let errorToHandle: unknown | undefined; try { consolePatcher.patch(); @@ -316,6 +493,13 @@ export async function runNonInteractive({ const isRaw = config.getRawOutput() || config.getAcceptRawOutputRisk(); const output = isRaw ? event.value : stripAnsi(event.value); + + // Accumulate full response + if (event.value) { + fullResponseText += event.value; + responseText += output; + } + if (streamFormatter) { streamFormatter.emitEvent({ type: JsonStreamEventType.MESSAGE, @@ -325,7 +509,7 @@ export async function runNonInteractive({ delta: true, }); } else if (config.getOutputFormat() === OutputFormat.JSON) { - responseText += output; + // responseText is already updated } else { if (event.value) { textOutput.write(output); @@ -381,7 +565,7 @@ export async function runNonInteractive({ ), }); } - return; + return fullResponseText; } else if (event.type === GeminiEventType.AgentExecutionBlocked) { const blockMessage = `Agent execution blocked: ${event.value.systemMessage?.trim() || event.value.reason}`; if (config.getOutputFormat() === OutputFormat.TEXT) { @@ -488,7 +672,7 @@ export async function runNonInteractive({ } else { textOutput.ensureTrailingNewline(); // Ensure a final newline } - return; + return fullResponseText; } 
currentMessages = [{ role: 'user', parts: toolResponseParts }]; @@ -512,7 +696,7 @@ export async function runNonInteractive({ } else { textOutput.ensureTrailingNewline(); // Ensure a final newline } - return; + return fullResponseText; } } } catch (error) { @@ -528,5 +712,6 @@ export async function runNonInteractive({ if (errorToHandle) { handleError(errorToHandle, config); } + return fullResponseText; }); }
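
Review note on `extractTestStats`: the combined `Tests:` pattern only captures counts that appear in the order passed, failed, total. A summary line such as `Tests: 1 failed, 3 passed, 4 total` would therefore be recorded as 0 passed and 1 failed. The sketch below is one possible alternative and is not part of this patch. It assumes a single summary line beginning with `Tests:` and reads each count independently. `extractSummaryCounts` and `TestStats` are hypothetical names introduced for illustration.

```typescript
// Hypothetical helper (not in the patch): parses a "Tests: ..." summary line
// without depending on the order in which the counts are printed.
interface TestStats {
  passed?: number;
  failed?: number;
  total?: number;
}

function extractSummaryCounts(output: string): TestStats {
  // Assumption: the runner prints one summary line that starts with "Tests:".
  const summary = output
    .split('\n')
    .find((line) => /^\s*Tests:/i.test(line));
  if (summary === undefined) {
    return {};
  }

  // Read a single labelled count such as "3 passed" from the summary line.
  const count = (label: string): number | undefined => {
    const match = summary.match(new RegExp(`(\\d+)\\s+${label}\\b`, 'i'));
    return match ? Number(match[1]) : undefined;
  };

  return {
    passed: count('passed'),
    failed: count('failed'),
    total: count('total'),
  };
}

// Both orderings yield the same counts.
console.log(extractSummaryCounts('Tests: 1 failed, 3 passed, 4 total'));
console.log(extractSummaryCounts('Tests: 3 passed, 1 failed, 4 total'));
```

Counts that are missing from the summary line stay `undefined`, which lines up with how `printSummary` renders absent values as `-`.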