Commit Graph

4979 Commits

Author SHA1 Message Date
Alisa Novikova
5ede571439 feat(core): implement validation back-off mechanism
Adds a strict retry threshold to the agent's validation loop:
'If validation fails 3 times on the exact same test or error, DO NOT attempt another minor code tweak. You must immediately step back, use search tools to gather wider context, and formulate a completely new strategy.'

This prevents the agent from getting stuck in repetitive, unsuccessful minor tweaks and encourages a more strategic approach when initial fixes fail.
2026-03-03 00:50:59 -08:00
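The back-off rule quoted in 5ede571439 can be pictured as a small streak counter. The sketch below is illustrative only: `ValidationBackoff`, `recordFailure`, `recordSuccess`, and the `NextStep` labels are hypothetical names, not taken from the commit. It also assumes that "3 times on the exact same test or error" means three consecutive failures with the same signature.

```typescript
// Hypothetical helper; none of these names come from the commit itself.
type NextStep = "retry_minor_fix" | "step_back_and_research";

class ValidationBackoff {
  private readonly threshold: number;
  private lastSignature: string | null = null;
  private consecutiveFailures = 0;

  // The default of 3 mirrors the threshold stated in the commit message.
  constructor(threshold: number = 3) {
    this.threshold = threshold;
  }

  // Record a failed validation run. `signature` is a caller-chosen key,
  // for example the failing test name or a normalized error message.
  recordFailure(signature: string): NextStep {
    if (signature === this.lastSignature) {
      this.consecutiveFailures += 1;
    } else {
      // A different test or error starts a new streak.
      this.lastSignature = signature;
      this.consecutiveFailures = 1;
    }
    return this.consecutiveFailures >= this.threshold
      ? "step_back_and_research"
      : "retry_minor_fix";
  }

  // A passing run clears the streak.
  recordSuccess(): void {
    this.lastSignature = null;
    this.consecutiveFailures = 0;
  }
}
```

Keying the counter on a signature rather than on the total failure count keeps unrelated failures from triggering the strategy change. The commit leaves open whether the failures must be consecutive.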
Alisa Novikova
1df5178800 feat(core): prioritize existing test infrastructure and timebox test setup
Introduces three critical mandates to the agent's testing and validation workflow:
1. **Prioritize Existing Infrastructure:** Strictly prefer running the project's existing test suite over writing custom reproduction scripts to avoid environment/import difficulties.
2. **Timebox Test Setup:** Abandon custom reproduction scripts if they fail to set up within 3-5 turns due to environment or import errors, falling back to static analysis and built-in tests.
3. **Mandate Exhaustive Validation:** Explicitly requires running relevant existing project tests to prevent regressions, ensuring a passing custom reproduction script is treated as a necessary but not sufficient condition for completion.

These changes prevent 'Early Exhaustion' by reducing the complexity of standalone test setup in frameworks like Django.
2026-03-03 00:50:59 -08:00
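The timeboxing mandate in 1df5178800 amounts to a turn budget for getting a custom reproduction script to run. A minimal sketch follows, assuming each turn is classified as either a setup error or a successful script run. `decideReproStrategy`, `TurnOutcome`, and `SetupDecision` are hypothetical names, and the default budget of 5 is simply the upper end of the "3-5 turns" range in the commit message.

```typescript
// Hypothetical classification of one agent turn spent on a repro script.
type TurnOutcome = "setup_error" | "script_ran";

type SetupDecision =
  | "keep_trying"
  | "use_script"
  | "fall_back_to_existing_tests";

// Walk the turns in order and stop at the first decisive event:
// either the script ran, or the setup budget was used up.
function decideReproStrategy(
  turns: readonly TurnOutcome[],
  maxSetupTurns: number = 5,
): SetupDecision {
  let failedSetupTurns = 0;
  for (const outcome of turns) {
    if (outcome === "script_ran") {
      return "use_script";
    }
    failedSetupTurns += 1;
    if (failedSetupTurns >= maxSetupTurns) {
      // Budget exhausted: abandon the script in favor of static analysis
      // and the project's existing tests, as the commit describes.
      return "fall_back_to_existing_tests";
    }
  }
  return "keep_trying";
}
```

Even when the result is `"use_script"`, the third mandate still applies: a passing reproduction script is necessary but not sufficient, so the relevant existing project tests must also run.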
Alisa Novikova
616062bdf6 feat(core): implement self-validation workflow with exact verbatim restoration
This commit upgrades the agent with a robust self-validation workflow while
ensuring 100% verbatim coverage of the original system prompt text.
By moving to an additive model, we preserve all original reasoning anchors,
instructional lead-ins, and senior engineering heuristics while injecting
critical autonomous mandates.

Verbatim Restoration:
- All 'Context Efficiency' guidelines, lead-ins, and scenarios (Search/Understand/Navigate).
- All 'Engineering Standards' regarding style mimicry, abstractions, and debt isolation.
- Full 'Primary Workflows' sequence and formatting.

Self-Validation Workflow Injections:
- Research Phase: Parallel Discovery (manifests + logic) and High-Signal Grep.
- Bug Fixing: Negative Verification (confirming repro failure) and Coverage Expansion.
- Implementation: Transactional Edits (logical batching of module changes).
- Validation Loop: Tiered Validation (Fixers -> Fast-Path -> Related Tests) and Smart Log Navigation (Tail-First).

Technical Verification:
- Verified against 67 core prompt tests and 14 snapshots.
- New behavioral eval suite passed (evals/self_validation_workflow.eval.ts).
- Full 'npm run preflight' successful.
2026-03-03 00:50:59 -08:00
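Two of the injections listed in 616062bdf6, Tiered Validation (Fixers -> Fast-Path -> Related Tests) and tail-first log navigation, can be sketched as plain functions. This is a simplified illustration, not the prompt's actual mechanics. `Tier`, `runTiers`, and `tailLines` are hypothetical names, and each tier is modelled as a synchronous callback, whereas real commands would run asynchronously.

```typescript
// Hypothetical model of one validation tier: a name plus a check
// that returns true when the tier passes.
interface Tier {
  name: string;
  run: () => boolean;
}

interface TierResult {
  passed: boolean;
  failedTier: string | null;
  ranTiers: string[];
}

// Run tiers cheapest-first and stop at the first failure, so the more
// expensive tiers only run once the cheaper ones are green.
function runTiers(tiers: readonly Tier[]): TierResult {
  const ranTiers: string[] = [];
  for (const tier of tiers) {
    ranTiers.push(tier.name);
    if (!tier.run()) {
      return { passed: false, failedTier: tier.name, ranTiers };
    }
  }
  return { passed: true, failedTier: null, ranTiers };
}

// Tail-first navigation: return only the last `maxLines` lines of a log,
// where failures and summaries usually appear.
function tailLines(log: string, maxLines: number): string {
  const lines = log.split("\n");
  return lines.slice(Math.max(0, lines.length - maxLines)).join("\n");
}
```

A caller would list the tiers in the order the commit names them, for example `runTiers([fixers, fastPath, relatedTests])`, and apply `tailLines` to the output of whichever tier failed.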
Alisa Novikova
c3215aed93 feat(core): implement self-validation workflow with prompt-verbatim restoration
This commit upgrades the agent with a robust self-validation workflow while
ensuring 100% semantic and verbatim coverage of the original system prompt.
By moving to an additive model, we preserve the original reasoning anchors
(lead-ins, heuristics, and formatting) while injecting critical autonomous
engineering mandates.

Self-Validation Workflow Injections:
- Research Phase: Parallel Discovery (combining manifests/logic) and High-Signal Grep.
- Bug Fixing: Negative Verification (confirming repro failure) and Coverage Expansion.
- Implementation: Transactional Edits (logical batching of module changes).
- Validation Loop: Tiered Validation (Fixers -> Fast-Path -> Related Tests) and Smart Log Navigation.

Technical Verification:
- Verbatim restoration verified against 66 core tests and 14 snapshots.
- New behavioral eval suite passed (evals/self_validation_workflow.eval.ts).
- Full 'npm run preflight' validation successful.
2026-03-03 00:50:59 -08:00
Alisa Novikova
61b35ff745 feat(core): comprehensive agent self-validation and engineering mandates
Major upgrade to the agent's self-validation, safety, and project integrity
capabilities through five iterations of system prompt enhancements:

Workflow & Quality Mandates:
1. Incremental Validation: Mandates building, linting, and testing after
   every significant file change to maintain a "green" state.
2. Mandatory Reproduction: Requires creating a failing test case to confirm
   a bug before fixing, and explicitly verifying the failure (Negative Verification).
3. Test Persistence & Locality: Requires integrating repro cases into the
   permanent test suite, preferably by amending existing related test files.
4. Script Discovery: Mandates identifying project-specific validation
   commands from configuration files (package.json, Makefile, etc.).
5. Self-Review: Mandates running `git diff` after every edit, using
   `--name-only` for large changes to preserve context window tokens.
6. Fast-Path Validation: Prioritizes lightweight checks (e.g., `tsc --noEmit`)
   for frequent feedback, reserving heavy builds for final verification.
7. Output Verification: Requires checking command output (not just exit codes)
   to prevent false-positives from empty test runs or hidden warnings.

Semantic Integrity & Dependency Safety:
8. Global Usage Discovery: Mandates searching the entire workspace for all
   usages (via `grep_search`) before modifying exported symbols or APIs.
9. Dependency Integrity: Requires verifying that new imports are explicitly
   declared in the project's dependency manifest (e.g., package.json).
10. Configuration Sync: Mandates updating build/environment configs
    (tsconfig, Dockerfile, etc.) to support new file types or entry points.
11. Documentation Sync: Requires searching for and updating documentation
    references when public APIs or CLI interfaces change.
12. Anti-Silencing Mandate: Prohibits using `any`, `@ts-ignore`, or lint
    suppressions to resolve validation errors.

Diagnostics, Safety & Runtime Verification:
13. Error Grounding: Mandates reading full error logs and stack traces upon
    failure. Includes Smart Log Navigation to prioritize the tail of large files.
14. Scope Isolation: Instructs the agent to focus only on errors introduced
    by its changes and ignore unrelated legacy technical debt.
15. Destructive Safety: Mandates a `git status` check before deleting files
    or modifying critical project configurations.
16. Non-Blocking Smoke Tests: Requires briefly running applications to
    verify boot stability, using background/timeout strategies for servers.

Includes 15 new behavioral evaluations verifying these mandates and updated
snapshots in packages/core/src/core/prompts.test.ts.
2026-03-03 00:50:59 -08:00
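Mandate 7 in 61b35ff745 (Output Verification) guards against a test command that exits with code 0 but ran no tests. A minimal sketch of such a check follows. `verifyTestRun` and `RunVerdict` are hypothetical names, and the default pattern `/(\d+)\s+passed/` is an assumed summary format, not the output of any particular test runner. Callers would supply a pattern that matches their own runner.

```typescript
interface RunVerdict {
  ok: boolean;
  reason: string;
}

// Treat a run as trustworthy only if the exit code is 0 AND the output
// shows that at least one test actually executed.
// `executedPattern` must capture the executed-test count in group 1;
// the default is an assumed format used purely for illustration.
function verifyTestRun(
  exitCode: number,
  output: string,
  executedPattern: RegExp = /(\d+)\s+passed/,
): RunVerdict {
  if (exitCode !== 0) {
    return { ok: false, reason: "non-zero exit code" };
  }
  const match = executedPattern.exec(output);
  if (match === null) {
    return { ok: false, reason: "no test summary found in output" };
  }
  const executed = Number.parseInt(match[1] ?? "", 10);
  if (Number.isNaN(executed) || executed === 0) {
    return { ok: false, reason: "zero tests executed" };
  }
  return { ok: true, reason: `${executed} tests executed` };
}
```

Returning a reason alongside the boolean keeps the failure mode visible, which matters here because the false positive the mandate targets looks like a success at the exit-code level.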
Bryan Morgan
208291f391 fix(ci): handle empty APP_ID in stale PR closer (#20919) 2026-03-03 00:14:36 -05:00
Jacob Richman
8303edbb54 Code review fixes as a pr (#20612) 2026-03-03 04:32:50 +00:00
Aswin Ashok
0d69f9f7fa Build binary (#18933)
Co-authored-by: Gal Zahavi <38544478+galz10@users.noreply.github.com>
2026-03-03 01:02:19 +00:00
Christian Gunderman
46231a1755 ci(evals): only run evals in CI if prompts or tools changed (#20898)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-03 00:29:31 +00:00
Sandy Tao
2e7722d6a3 fix(core): restrict "System: Please continue" invalid stream retry to Gemini 2 models (#20897) 2026-03-02 23:21:13 +00:00
Yuna Seol
69e15a50d1 fix(core): skip telemetry logging for AbortError exceptions (#19477)
Co-authored-by: Yuna Seol <yunaseol@google.com>
2026-03-02 23:14:31 +00:00
Christian Gunderman
25f59a0099 Add some dos and don'ts to behavioral evals README. (#20629) 2026-03-02 23:14:00 +00:00
Adib234
01927a36d1 feat(plan): support annotating plans with feedback for iteration (#20876) 2026-03-02 23:03:59 +00:00
Shreya Keshive
06ddfa5c4c feat(admin): enable 30 day default retention for chat history & remove warning (#20853) 2026-03-02 22:44:49 +00:00
Christian Gunderman
3f7ef816f1 fix(core): increase default headers timeout to 5 minutes (#20890) 2026-03-02 22:36:58 +00:00
Jerop Kipruto
d05ba11a31 refactor(core): replace manual syncPlanModeTools with declarative policy rules (#20596) 2026-03-02 22:30:50 +00:00
Hamdanbinhashim
e43b1cff58 docs: fix broken markdown links in main README.md (#20300) 2026-03-02 21:51:52 +00:00
Allen Hutchison
bb6d1a2775 feat(core): add tool name validation in TOML policy files (#19281)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-02 21:47:21 +00:00
Nayana Parameswarappa
dd9ccc9807 Adding MCPOAuthProvider implementing the MCPSDK OAuthClientProvider (#20121) 2026-03-02 21:37:44 +00:00
Pyush Sinha
8133d63ac6 refactor(cli): fully remove React anti patterns, improve type safety and fix UX oversights in SettingsDialog.tsx (#18963)
Co-authored-by: Jacob Richman <jacob314@gmail.com>
2026-03-02 21:30:58 +00:00
Sandy Tao
18d0375a7f feat(core): support authenticated A2A agent card discovery (#20622)
Co-authored-by: Adam Weidman <adamfweidman@google.com>
Co-authored-by: Adam Weidman <65992621+adamfweidman@users.noreply.github.com>
2026-03-02 21:29:31 +00:00
Keith Guerin
31ca57ec94 feat: redesign header to be compact with ASCII icon (#18713)
Co-authored-by: Jacob Richman <jacob314@gmail.com>
2026-03-02 21:12:17 +00:00
Abhi
b7a8f0d1f9 fix(core): ensure subagents use qualified MCP tool names (#20801) 2026-03-02 21:12:13 +00:00
Abdul Tawab
1502e5cbc3 style(cli) : Dialog pattern for /hooks Command (#17930) 2026-03-02 21:12:05 +00:00
Christian Gunderman
7ca3a33f8b Subagent activity UX. (#17570) 2026-03-02 21:04:31 +00:00
Sandy Tao
ce5a2d0760 feat(core): truncate large MCP tool output (#19365) 2026-03-02 21:01:49 +00:00
Sam Roberts
aa321b3d8c Update CODEOWNERS for README.md reviewers (#20860) 2026-03-02 20:54:05 +00:00
David Pierce
3a7a6e1540 Add install as an option when extension is selected. (#20358) 2026-03-02 20:41:16 +00:00
Tommaso Sciortino
66530e44c8 document node limitation for shift+tab (#20877) 2026-03-02 20:31:52 +00:00
Christian Gunderman
b034dcd412 Do not block CI on evals (#20870) 2026-03-02 20:31:02 +00:00
Aishanee Shah
659301ff83 feat(core): centralize read_file limits and update gemini-3 description (#20619) 2026-03-02 20:11:58 +00:00
Sandy Tao
446a4316c4 feat(core): implement HTTP authentication support for A2A remote agents (#20510)
Co-authored-by: Adam Weidman <adamfweidman@google.com>
2026-03-02 19:59:48 +00:00
Tommaso Sciortino
48412a068e Add /unassign support (#20864)
Co-authored-by: Jacob Richman <jacob314@gmail.com>
2026-03-02 19:54:26 +00:00
Adib234
2e1efaebe4 fix(plan): deflake plan mode integration tests (#20477) 2026-03-02 19:51:44 +00:00
Sandy Tao
7c9fceba7f fix(core): reduce LLM-based loop detection false positives (#20701) 2026-03-02 19:08:15 +00:00
Adam Weidman
740efa2ac2 Merge User and Agent Card Descriptions #20849 (#20850) 2026-03-02 17:59:29 +00:00
Abhi
703759cfae fix(cli): allow sub-agent confirmation requests in UI while preventing background flicker (#20722) 2026-03-01 02:39:25 +00:00
Sehoon Shon
0063581e47 feat(skills): add github-issue-creator skill (#20709) 2026-02-28 23:22:22 +00:00
Sehoon Shon
6757d4b5c5 fix(cli): resolve autoThemeSwitching when background hasn't changed but theme mismatches (#20706) 2026-02-28 23:22:10 +00:00
Sandy Tao
a153ff587b refactor(core): Extract tool parameter names as constants (#20460) 2026-02-28 21:27:54 +00:00
N. Taylor Mullen
cd3a8c3f07 fix(cli): reset themeManager between tests to ensure isolation (#20598) 2026-02-28 19:45:31 +00:00
kartik
b2214a6676 fix: acp/zed race condition between MCP initialisation and prompt (#20205)
Signed-off-by: Kartik Angiras <angiraskartik@gmail.com>
2026-02-28 17:33:08 +00:00
gemini-cli-robot
6c65a2d813 Changelog for v0.32.0-preview.0 (#20627)
Co-authored-by: gemini-cli-robot <224641728+gemini-cli-robot@users.noreply.github.com>
2026-02-28 16:03:50 +00:00
Jagjeevan Kashid
fae0639ba2 fix: use full paths for ACP diff payloads (#19539)
Signed-off-by: Jagjeevan Kashid <jagjeevandev97@gmail.com>
2026-02-28 15:54:44 +00:00
gemini-cli-robot
76f70d65ff Changelog for v0.31.0 (#20634)
Co-authored-by: gemini-cli-robot <224641728+gemini-cli-robot@users.noreply.github.com>
2026-02-28 03:45:07 +00:00
gemini-cli-robot
fb6ff847dd chore/release: bump version to 0.33.0-nightly.20260228.1ca5c05d0 (#20644) 2026-02-28 02:13:48 +00:00
Gal Zahavi
1ca5c05d0d fix(github): use robot PAT for automated PRs to pass CLA check (#20641) 2026-02-28 01:13:58 +00:00
Gal Zahavi
0c6c9c6a62 chore(release): bump version to 0.33.0-nightly.20260227.ba149afa0 (#20637) 2026-02-28 00:51:22 +00:00
Sehoon Shon
a1367e9cdd fix(core): parse raw ASCII buffer strings in Gaxios errors (#20626) 2026-02-27 23:57:32 +00:00
Tommaso Sciortino
c89d4f9c6c docs: add Windows PowerShell equivalents for environments and scripting (#20333) 2026-02-27 23:41:47 +00:00