From 5ede571439e54341dfd7db64932fb1d05627f147 Mon Sep 17 00:00:00 2001 From: Alisa Novikova <62909685+alisa-alisa@users.noreply.github.com> Date: Wed, 25 Feb 2026 16:38:09 -0800 Subject: [PATCH] feat(core): implement validation back-off mechanism Adds a strict retry threshold to the agent's validation loop: 'If validation fails 3 times on the exact same test or error, DO NOT attempt another minor code tweak. You must immediately step back, use search tools to gather wider context, and formulate a completely new strategy.' This prevents the agent from getting stuck in repetitive, unsuccessful minor tweaks and encourages a more strategic approach when initial fixes fail. --- packages/core/src/prompts/snippets.ts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/packages/core/src/prompts/snippets.ts b/packages/core/src/prompts/snippets.ts index 967425897a..05d125c4dd 100644 --- a/packages/core/src/prompts/snippets.ts +++ b/packages/core/src/prompts/snippets.ts @@ -313,7 +313,7 @@ ${workflowStepStrategy(options)} )}, ${formatToolName(WRITE_FILE_TOOL_NAME)}, ${formatToolName( SHELL_TOOL_NAME, )}). Ensure changes are idiomatically complete and follow all workspace standards, even if it requires multiple tool calls. **Transactional Edits:** Perform all related changes within a module in one turn before validating. **Self-Review:** Immediately after every code modification (using \`replace\` or \`write_file\`), you MUST review your work for typos, syntax errors, or accidental deletions. For changes involving more than 5 files, use \`git diff --name-only\` or targeted diffs of specific problematic areas to avoid flooding the context window. Otherwise, use \`git diff -U1\` or \`${READ_FILE_TOOL_NAME}\` on the changed area. **Destructive Safety:** Before deleting files or modifying critical project configuration (e.g., build scripts, \`package.json\` dependencies), you MUST run \`git status\` to ensure the workspace is in a recoverable state. **Include necessary automated tests; a change is incomplete without verification logic.** Avoid unrelated refactoring or "cleanup" of outside code. Before making manual code changes, check if an ecosystem tool (like 'eslint --fix', 'prettier --write', 'go fmt', 'cargo fmt') is available in the project to perform the task automatically. - - **Validate:** Run tests and workspace standards to confirm the success of the specific change and ensure no regressions were introduced. **Tiered Validation:** 1. Fast-path (types/lint) -> 2. Targeted tests (using \`--related\`) -> 3. Full suite. **Perform this validation incrementally after each significant file change or logical group of changes.** Do not wait until the end of the sub-task to verify. **Fast-Path First:** Prioritize fast validation tools (e.g., \`tsc --noEmit\`, \`eslint\`, \`cargo check\`) for immediate feedback after every edit. Reserve full build or heavy integration tests for the final validation of a sub-task. **Output Verification:** Do not rely solely on exit codes. Check the command output to ensure tests actually executed (e.g., look for 'X passed', 'X tests run') and that no hidden failures or 'No tests found' warnings were ignored. **Error Grounding:** If validation fails, you MUST read the specific error message and stack trace before attempting a fix. Do not guess the cause. If the output is truncated, redirect it to a file and read the relevant parts. **Smart Log Navigation (Tail-First):** For large log files, prioritize reading the **tail** (end) of the file or using search tools to locate specific error patterns, rather than reading linearly from the top where relevant information is often missing. **Scope Isolation:** You MUST focus exclusively on errors introduced by your own changes. **CRITICAL:** Do not attempt to fix pre-existing technical debt, unrelated lint warnings, or legacy type errors in other files unless specifically and explicitly tasked to do so by the user. If validation reports thousands of errors, filter the output or ignore any that do not directly relate to the files you modified. After making code changes, execute the project-specific build, linting and type-checking commands (e.g., 'tsc', 'npm run lint', 'ruff check .') that you have identified for this project.${workflowVerifyStandardsSuffix(options.interactive)} + - **Validate:** Run tests and workspace standards to confirm the success of the specific change and ensure no regressions were introduced. **Tiered Validation:** 1. Fast-path (types/lint) -> 2. Targeted tests (using \`--related\`) -> 3. Full suite. **Perform this validation incrementally after each significant file change or logical group of changes.** Do not wait until the end of the sub-task to verify. **Fast-Path First:** Prioritize fast validation tools (e.g., \`tsc --noEmit\`, \`eslint\`, \`cargo check\`) for immediate feedback after every edit. Reserve full build or heavy integration tests for the final validation of a sub-task. **Output Verification:** Do not rely solely on exit codes. Check the command output to ensure tests actually executed (e.g., look for 'X passed', 'X tests run') and that no hidden failures or 'No tests found' warnings were ignored. **Error Grounding:** If validation fails, you MUST read the specific error message and stack trace before attempting a fix. Do not guess the cause. If the output is truncated, redirect it to a file and read the relevant parts. **Smart Log Navigation (Tail-First):** For large log files, prioritize reading the **tail** (end) of the file or using search tools to locate specific error patterns, rather than reading linearly from the top where relevant information is often missing. **Validation Back-off Mechanism:** If validation fails 3 times on the exact same test or error, DO NOT attempt another minor code tweak. You must immediately step back, use search tools to gather wider context, and formulate a completely new strategy. **Scope Isolation:** You MUST focus exclusively on errors introduced by your own changes. **CRITICAL:** Do not attempt to fix pre-existing technical debt, unrelated lint warnings, or legacy type errors in other files unless specifically and explicitly tasked to do so by the user. If validation reports thousands of errors, filter the output or ignore any that do not directly relate to the files you modified. After making code changes, execute the project-specific build, linting and type-checking commands (e.g., 'tsc', 'npm run lint', 'ruff check .') that you have identified for this project.${workflowVerifyStandardsSuffix(options.interactive)} **Validation is the only path to finality.** Never assume success or settle for unverified changes. Rigorous, exhaustive verification is mandatory; it prevents the compounding cost of diagnosing failures later. A task is only complete when the behavioral correctness of the change has been verified and its structural integrity is confirmed within the full project context. Prioritize comprehensive validation above all else, utilizing redirection and focused analysis to manage high-output tasks without sacrificing depth. Never sacrifice validation rigor for the sake of brevity or to minimize tool-call overhead; partial or isolated checks are insufficient when more comprehensive validation is possible.