Commit Graph

56 Commits

Author SHA1 Message Date
Alisa Novikova 901e94cba8 chore(core): simplify agent mandates to improve efficiency and reduce turn count 2026-03-03 01:00:26 -08:00
Alisa Novikova 0b06a9ae04 feat(core): implement circular behavior detection mandate
Adds a self-awareness mandate to the agent's planning phase:
'Before attempting a fix for a validation error, review your recent tool calls. If you are repeatedly applying similar regex replacements or edits to the same block of code without the validation error changing, you are in a loop. Stop, revert your changes to a known good state, and rethink your approach.'

This helps the agent identify and break out of unproductive loops during debugging and implementation.
2026-03-03 00:50:59 -08:00
Alisa Novikova 5ede571439 feat(core): implement validation back-off mechanism
Adds a strict retry threshold to the agent's validation loop:
'If validation fails 3 times on the exact same test or error, DO NOT attempt another minor code tweak. You must immediately step back, use search tools to gather wider context, and formulate a completely new strategy.'

This prevents the agent from getting stuck in repetitive, unsuccessful minor tweaks and encourages a more strategic approach when initial fixes fail.
2026-03-03 00:50:59 -08:00
Alisa Novikova 1df5178800 feat(core): prioritize existing test infrastructure and timebox test setup
Introduces three critical mandates to the agent's testing and validation workflow:
1. **Prioritize Existing Infrastructure:** Strictly prefer running the project's existing test suite over writing custom reproduction scripts to avoid environment/import difficulties.
2. **Timebox Test Setup:** Abandon custom reproduction scripts if they fail to set up within 3-5 turns due to environment or import errors, falling back to static analysis and built-in tests.
3. **Mandate Exhaustive Validation:** Explicitly requires running relevant existing project tests to prevent regressions, ensuring a passing custom reproduction script is treated as a necessary but not sufficient condition for completion.

These changes prevent 'Early Exhaustion' by reducing the complexity of standalone test setup in frameworks like Django.
2026-03-03 00:50:59 -08:00
Alisa Novikova c3215aed93 feat(core): implement self-validation workflow with prompt-verbatim restoration
This commit upgrades the agent with a robust self-validation workflow while
ensuring 100% semantic and verbatim coverage of the original system prompt.
By moving to an additive model, we preserve the original reasoning anchors
(lead-ins, heuristics, and formatting) while injecting critical autonomous
engineering mandates.

Self-Validation Workflow Injections:
- Research Phase: Parallel Discovery (combining manifests/logic) and High-Signal Grep.
- Bug Fixing: Negative Verification (confirming repro failure) and Coverage Expansion.
- Implementation: Transactional Edits (logical batching of module changes).
- Validation Loop: Tiered Validation (Fixers -> Fast-Path -> Related Tests) and Smart Log Navigation.

Technical Verification:
- Verbatim restoration verified against 66 core tests and 14 snapshots.
- New behavioral eval suite passed (evals/self_validation_workflow.eval.ts).
- Full 'npm run preflight' validation successful.
2026-03-03 00:50:59 -08:00
Alisa Novikova 61b35ff745 feat(core): comprehensive agent self-validation and engineering mandates
Major upgrade to the agent's self-validation, safety, and project integrity
capabilities through five iterations of system prompt enhancements:

Workflow & Quality Mandates:
1. Incremental Validation: Mandates building, linting, and testing after
   every significant file change to maintain a "green" state.
2. Mandatory Reproduction: Requires creating a failing test case to confirm
   a bug before fixing, and explicitly verifying the failure (Negative Verification).
3. Test Persistence & Locality: Requires integrating repro cases into the
   permanent test suite, preferably by amending existing related test files.
4. Script Discovery: Mandates identifying project-specific validation
   commands from configuration files (package.json, Makefile, etc.).
5. Self-Review: Mandates running `git diff` after every edit, using
   `--name-only` for large changes to preserve context window tokens.
6. Fast-Path Validation: Prioritizes lightweight checks (e.g., `tsc --noEmit`)
   for frequent feedback, reserving heavy builds for final verification.
7. Output Verification: Requires checking command output (not just exit codes)
   to prevent false positives from empty test runs or hidden warnings.

Semantic Integrity & Dependency Safety:
8. Global Usage Discovery: Mandates searching the entire workspace for all
   usages (via `grep_search`) before modifying exported symbols or APIs.
9. Dependency Integrity: Requires verifying that new imports are explicitly
   declared in the project's dependency manifest (e.g., package.json).
10. Configuration Sync: Mandates updating build/environment configs
    (tsconfig, Dockerfile, etc.) to support new file types or entry points.
11. Documentation Sync: Requires searching for and updating documentation
    references when public APIs or CLI interfaces change.
12. Anti-Silencing Mandate: Prohibits using `any`, `@ts-ignore`, or lint
    suppressions to resolve validation errors.

Diagnostics, Safety & Runtime Verification:
13. Error Grounding: Mandates reading full error logs and stack traces upon
    failure. Includes Smart Log Navigation to prioritize the tail of large files.
14. Scope Isolation: Instructs the agent to focus only on errors introduced
    by its changes and ignore unrelated legacy technical debt.
15. Destructive Safety: Mandates a `git status` check before deleting files
    or modifying critical project configurations.
16. Non-Blocking Smoke Tests: Requires briefly running applications to
    verify boot stability, using background/timeout strategies for servers.

Includes 15 new behavioral evaluations verifying these mandates and updated
snapshots in packages/core/src/core/prompts.test.ts.
2026-03-03 00:50:59 -08:00
Sandy Tao a153ff587b refactor(core): Extract tool parameter names as constants (#20460) 2026-02-28 21:27:54 +00:00
Adib234 23905bcd77 fix(plan): prevent agent from using ask_user for shell command confirmation (#20504) 2026-02-27 17:51:47 +00:00
Adib234 25ade7bcb7 feat(plan): update planning workflow to encourage multi-select with descriptions of options (#20491) 2026-02-27 15:42:37 +00:00
Jerop Kipruto aa98cafca7 feat(plan): adapt planning workflow based on complexity of task (#20465)
Co-authored-by: Gal Zahavi <38544478+galz10@users.noreply.github.com>
2026-02-26 22:58:19 +00:00
joshualitt 611d934829 feat(core): Enable generalist agent (#19665) 2026-02-26 16:38:49 +00:00
Sandy Tao 39938000a9 feat(core): rename grep_search include parameter to include_pattern (#20328) 2026-02-26 04:16:21 +00:00
Jerop Kipruto baccda969d feat(plan): summarize work after executing a plan (#19432) 2026-02-24 17:35:32 +00:00
Jerop Kipruto 182c858e67 feat(policy): centralize plan mode tool visibility in policy engine (#20178)
Co-authored-by: Mahima Shanware <mshanware@google.com>
2026-02-24 17:17:43 +00:00
Adam Weidman 547f5d45f5 feat(core): migrate read_file to 1-based start_line/end_line parameters (#19526) 2026-02-20 22:59:18 +00:00
matt korwel 6cfd29ef9b feat(plan): enforce read-only constraints in Plan Mode (#19433)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Jerop Kipruto <jerop@google.com>
2026-02-20 19:33:04 +00:00
joshualitt 87f5dd15d6 feat(core): experimental in-progress steering hints (2 of 2) (#19307) 2026-02-18 22:05:50 +00:00
Jerop Kipruto 8f6a711a3a fix(core): clarify plan mode constraints and exit mechanism (#19438) 2026-02-18 20:09:59 +00:00
Christian Gunderman ce84b3cb5f Use ranged reads and limited searches and fuzzy editing improvements (#19240) 2026-02-17 23:54:08 +00:00
Adib234 14aabbbe8b feat(plan): support project exploration without planning when in plan mode (#18992) 2026-02-17 16:52:59 +00:00
N. Taylor Mullen 39d36108d7 feat(core): support custom reasoning models by default (#19227) 2026-02-16 20:47:58 +00:00
N. Taylor Mullen 6eec9f3350 fix(core): Encourage non-interactive flags for scaffolding commands (#18804) 2026-02-15 20:26:59 +00:00
N. Taylor Mullen 27a1bae03b feat(core): refine Plan Mode system prompt for agentic execution (#18799) 2026-02-12 17:37:47 +00:00
Christian Gunderman 6c1773170e More grep prompt tweaks (#18846) 2026-02-11 21:55:27 +00:00
Christian Gunderman 2a08456ed0 Update prompt and grep tool definition to limit context size (#18780) 2026-02-11 19:20:51 +00:00
Jerop Kipruto 49d55d972e feat(core): formalize 5-phase sequential planning workflow (#18759) 2026-02-11 03:02:20 +00:00
N. Taylor Mullen cb4e1e684d chore(core): update activate_skill prompt verbiage to be more direct (#18605) 2026-02-10 22:17:42 +00:00
Christian Gunderman 8b762111a8 Fix issue where Gemini CLI creates tests in a new file (#18409) 2026-02-10 20:53:29 +00:00
N. Taylor Mullen 55571de066 feat: redact disabled tools from system prompt (#13597) (#18613) 2026-02-10 19:00:36 +00:00
Jack Wotherspoon 740f0e4c3d fix: allow ask_user tool in yolo mode (#18541) 2026-02-10 18:56:51 +00:00
N. Taylor Mullen 41bbe6ca0a fix(core): standardize tool formatting in system prompts (#18615) 2026-02-10 15:30:08 +00:00
N. Taylor Mullen 2ae5e1ae20 feat(core): optimize sub-agents system prompt intro (#18608) 2026-02-10 08:25:42 +00:00
N. Taylor Mullen 92a5f725a1 refactor(core): refine Security & System Integrity section in system prompt (#18601) 2026-02-10 04:32:36 +00:00
joshualitt 89d4556c45 feat(core): Render memory hierarchically in context. (#18350) 2026-02-10 02:01:59 +00:00
N. Taylor Mullen cc2798018b feat: handle multiple dynamic context filenames in system prompt (#18598) 2026-02-10 00:37:08 +00:00
N. Taylor Mullen aebc107d2c feat: move shell efficiency guidelines to tool description (#18614) 2026-02-09 18:51:13 +00:00
N. Taylor Mullen d45a45d565 chore: strengthen validation guidance in system prompt (#18544) 2026-02-09 05:32:46 +00:00
N. Taylor Mullen cb73fbf384 feat(core): transition sub-agents to XML format and improve definitions (#18555) 2026-02-09 02:25:04 +00:00
N. Taylor Mullen 97a4e62dfa feat(core): conditionally include ctrl+f prompt based on interactive shell setting (#18561) 2026-02-09 00:23:22 +00:00
N. Taylor Mullen 92012365ca fix(core): correct escaped interpolation in system prompt (#18557) 2026-02-08 21:08:17 +00:00
N. Taylor Mullen 86bd7dbd4f chore: remove feedback instruction from system prompt (#18560) 2026-02-08 02:22:50 +00:00
N. Taylor Mullen eee95c509d refactor(core): remove memory tool instructions from Gemini 3 prompt (#18559) 2026-02-08 01:57:53 +00:00
Jerop Kipruto be6723ebcc chore: remove redundant planning prompt from final shell (#18528) 2026-02-07 19:45:09 +00:00
N. Taylor Mullen 9178b31629 feat(core): overhaul system prompt for rigor, integrity, and intent alignment (#17263) 2026-02-07 03:13:07 +00:00
Jerop Kipruto dc09b4988d feat(plan): integrate planning artifacts and tools into primary workflows (#18375) 2026-02-05 20:07:33 +00:00
Jerop Kipruto 6860556afe feat(plan): add guidance on iterating on approved plans vs creating new plans (#18346) 2026-02-05 19:11:45 +00:00
Jerop Kipruto 4a6e3eb646 feat(plan): support replace tool in plan mode to edit plans (#18379) 2026-02-05 17:51:35 +00:00
Tommaso Sciortino e4c80e6382 fix: Windows Specific Agent Quality & System Prompt (#18351) 2026-02-05 17:50:12 +00:00
Christian Gunderman a0b6602d09 Fix issue where agent gets stuck at interactive commands. (#18272) 2026-02-04 07:02:09 +00:00
Jerop Kipruto d866e7e6e7 feat(plan): unify workflow location in system prompt to optimize caching (#18258) 2026-02-04 03:11:28 +00:00