Commit Graph

4982 Commits

Author SHA1 Message Date
Alisa Novikova
901e94cba8 chore(core): simplify agent mandates to improve efficiency and reduce turn count 2026-03-03 01:00:26 -08:00
Alisa Novikova
adc62a76e0 test(core): update snapshots for new agent behavior mandates
Updates core prompt snapshots to include:
- Priority for existing test infrastructure
- Timeboxed test setup (3-5 turn limit)
- Mandate for exhaustive validation against regressions
- Validation back-off mechanism (retry threshold)
- Detection of circular/looping behavior
2026-03-03 00:50:59 -08:00
Alisa Novikova
0b06a9ae04 feat(core): implement circular behavior detection mandate
Adds a self-awareness mandate to the agent's planning phase:
'Before attempting a fix for a validation error, review your recent tool calls. If you are repeatedly applying similar regex replacements or edits to the same block of code without the validation error changing, you are in a loop. Stop, revert your changes to a known good state, and rethink your approach.'

This helps the agent identify and break out of unproductive loops during debugging and implementation.
2026-03-03 00:50:59 -08:00
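The loop check quoted in commit 0b06a9ae04 is a prompt instruction, not code, but its logic can be illustrated with a small sketch. Everything named here (`Attempt`, `isLooping`, the `window` of 3) is a hypothetical illustration and is not taken from the repository: the idea is simply that several recent edits aimed at the same block, all followed by an unchanged validation error, suggest a loop.

```typescript
// Hypothetical record of one fix attempt; not a type from the repository.
interface Attempt {
  target: string; // assumed: file path plus an identifier for the edited block
  error: string;  // validation error observed after the edit
}

// Returns true when the last `window` attempts all hit the same target
// and all ended with the same validation error. The default of 3 is an
// assumption chosen for illustration.
function isLooping(history: Attempt[], window: number = 3): boolean {
  if (window <= 0 || history.length < window) return false;
  const recent = history.slice(-window);
  const first = recent[0];
  if (first === undefined) return false;
  return recent.every(
    (a) => a.target === first.target && a.error === first.error,
  );
}
```

When a check like this fires, the mandate's response is to stop, revert to a known good state, and rethink the approach rather than try another similar edit.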
Alisa Novikova
5ede571439 feat(core): implement validation back-off mechanism
Adds a strict retry threshold to the agent's validation loop:
'If validation fails 3 times on the exact same test or error, DO NOT attempt another minor code tweak. You must immediately step back, use search tools to gather wider context, and formulate a completely new strategy.'

This prevents the agent from getting stuck in repetitive, unsuccessful minor tweaks and encourages a more strategic approach when initial fixes fail.
2026-03-03 00:50:59 -08:00
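The retry threshold described in commit 5ede571439 is likewise enforced through prompt text. As a rough sketch of the same rule in code, assuming a hypothetical `ValidationBackoff` class that is not part of the repository, one could count failures per test or error signature and signal a change of strategy on the third failure:

```typescript
// Hypothetical helper illustrating the "3 failures, then step back" rule.
class ValidationBackoff {
  private readonly failures = new Map<string, number>();
  private readonly threshold: number;

  constructor(threshold: number = 3) {
    this.threshold = threshold;
  }

  // Records a failed validation for the given test or error signature.
  // Returns true once that signature has failed `threshold` times,
  // meaning no further minor tweak should be attempted.
  recordFailure(signature: string): boolean {
    const count = (this.failures.get(signature) ?? 0) + 1;
    this.failures.set(signature, count);
    return count >= this.threshold;
  }

  // A passing run clears the counter for that signature.
  recordSuccess(signature: string): void {
    this.failures.delete(signature);
  }
}
```

Keying the counter on the exact signature mirrors the commit's wording ("the exact same test or error"): a different failure starts its own count.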
Alisa Novikova
1df5178800 feat(core): prioritize existing test infrastructure and timebox test setup
Introduces three critical mandates to the agent's testing and validation workflow:
1. **Prioritize Existing Infrastructure:** Strictly prefer running the project's existing test suite over writing custom reproduction scripts to avoid environment/import difficulties.
2. **Timebox Test Setup:** Abandon custom reproduction scripts if they fail to set up within 3-5 turns due to environment or import errors, falling back to static analysis and built-in tests.
3. **Mandate Exhaustive Validation:** Explicitly requires running relevant existing project tests to prevent regressions, ensuring a passing custom reproduction script is treated as a necessary but not sufficient condition for completion.

These changes prevent 'Early Exhaustion' by reducing the complexity of standalone test setup in frameworks like Django.
2026-03-03 00:50:59 -08:00
Alisa Novikova
616062bdf6 feat(core): implement self-validation workflow with exact verbatim restoration
This commit upgrades the agent with a robust self-validation workflow while
ensuring 100% verbatim coverage of the original system prompt text.
By moving to an additive model, we preserve all original reasoning anchors,
instructional lead-ins, and senior engineering heuristics while injecting
critical autonomous mandates.

Verbatim Restoration:
- All 'Context Efficiency' guidelines, lead-ins, and scenarios (Search/Understand/Navigate).
- All 'Engineering Standards' regarding style mimicry, abstractions, and debt isolation.
- Full 'Primary Workflows' sequence and formatting.

Self-Validation Workflow Injections:
- Research Phase: Parallel Discovery (manifests + logic) and High-Signal Grep.
- Bug Fixing: Negative Verification (confirming repro failure) and Coverage Expansion.
- Implementation: Transactional Edits (logical batching of module changes).
- Validation Loop: Tiered Validation (Fixers -> Fast-Path -> Related Tests) and Smart Log Navigation (Tail-First).

Technical Verification:
- Verified against 67 core prompt tests and 14 snapshots.
- New behavioral eval suite passed (evals/self_validation_workflow.eval.ts).
- Full 'npm run preflight' successful.
2026-03-03 00:50:59 -08:00
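The "Tiered Validation (Fixers -> Fast-Path -> Related Tests)" ordering named in commit 616062bdf6 can be pictured as running cheap stages first and stopping at the first failure. The sketch below is only an illustration under that assumption; `Stage` and `runTiered` are hypothetical names, and the stages are plain callbacks rather than real commands.

```typescript
// Hypothetical stage description: a label plus a check that reports success.
interface Stage {
  name: string;
  run: () => boolean;
}

// Runs stages in the given order and returns the name of the first stage
// that fails, or null when every stage passes. Later (more expensive)
// stages are skipped once an earlier one fails.
function runTiered(stages: Stage[]): string | null {
  for (const stage of stages) {
    if (!stage.run()) return stage.name;
  }
  return null;
}
```

For example, passing stages named `fixers`, `fast-path`, and `related-tests` in that order means the related tests run only after the two cheaper tiers succeed.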
Alisa Novikova
c3215aed93 feat(core): implement self-validation workflow with prompt-verbatim restoration
This commit upgrades the agent with a robust self-validation workflow while
ensuring 100% semantic and verbatim coverage of the original system prompt.
By moving to an additive model, we preserve the original reasoning anchors
(lead-ins, heuristics, and formatting) while injecting critical autonomous
engineering mandates.

Self-Validation Workflow Injections:
- Research Phase: Parallel Discovery (combining manifests/logic) and High-Signal Grep.
- Bug Fixing: Negative Verification (confirming repro failure) and Coverage Expansion.
- Implementation: Transactional Edits (logical batching of module changes).
- Validation Loop: Tiered Validation (Fixers -> Fast-Path -> Related Tests) and Smart Log Navigation.

Technical Verification:
- Verbatim restoration verified against 66 core tests and 14 snapshots.
- New behavioral eval suite passed (evals/self_validation_workflow.eval.ts).
- Full 'npm run preflight' validation successful.
2026-03-03 00:50:59 -08:00
Alisa Novikova
61b35ff745 feat(core): comprehensive agent self-validation and engineering mandates
Major upgrade to the agent's self-validation, safety, and project integrity
capabilities through five iterations of system prompt enhancements:

Workflow & Quality Mandates:
1. Incremental Validation: Mandates building, linting, and testing after
   every significant file change to maintain a "green" state.
2. Mandatory Reproduction: Requires creating a failing test case to confirm
   a bug before fixing, and explicitly verifying the failure (Negative Verification).
3. Test Persistence & Locality: Requires integrating repro cases into the
   permanent test suite, preferably by amending existing related test files.
4. Script Discovery: Mandates identifying project-specific validation
   commands from configuration files (package.json, Makefile, etc.).
5. Self-Review: Mandates running `git diff` after every edit, using
   `--name-only` for large changes to preserve context window tokens.
6. Fast-Path Validation: Prioritizes lightweight checks (e.g., `tsc --noEmit`)
   for frequent feedback, reserving heavy builds for final verification.
7. Output Verification: Requires checking command output (not just exit codes)
   to prevent false-positives from empty test runs or hidden warnings.

Semantic Integrity & Dependency Safety:
8. Global Usage Discovery: Mandates searching the entire workspace for all
   usages (via `grep_search`) before modifying exported symbols or APIs.
9. Dependency Integrity: Requires verifying that new imports are explicitly
   declared in the project's dependency manifest (e.g., package.json).
10. Configuration Sync: Mandates updating build/environment configs
    (tsconfig, Dockerfile, etc.) to support new file types or entry points.
11. Documentation Sync: Requires searching for and updating documentation
    references when public APIs or CLI interfaces change.
12. Anti-Silencing Mandate: Prohibits using `any`, `@ts-ignore`, or lint
    suppressions to resolve validation errors.

Diagnostics, Safety & Runtime Verification:
13. Error Grounding: Mandates reading full error logs and stack traces upon
    failure. Includes Smart Log Navigation to prioritize the tail of large files.
14. Scope Isolation: Instructs the agent to focus only on errors introduced
    by its changes and ignore unrelated legacy technical debt.
15. Destructive Safety: Mandates a `git status` check before deleting files
    or modifying critical project configurations.
16. Non-Blocking Smoke Tests: Requires briefly running applications to
    verify boot stability, using background/timeout strategies for servers.

Includes 15 new behavioral evaluations verifying these mandates and updated
snapshots in packages/core/src/core/prompts.test.ts.
2026-03-03 00:50:59 -08:00
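Item 7 in commit 61b35ff745 (Output Verification) warns that an exit code of 0 can hide an empty test run. A minimal sketch of that idea follows. The `RunResult` shape, the `looksLikeRealPass` name, and especially the "N passed" summary pattern are assumptions made for illustration; real test runners format their summaries differently, so the pattern would need adapting.

```typescript
// Hypothetical result of running a validation command.
interface RunResult {
  exitCode: number;
  output: string;
}

// Treats a run as a genuine pass only if it exited with 0 AND the output
// reports at least one passing test. The summary format matched here
// ("12 passed" or "3 tests passed") is an assumed example.
function looksLikeRealPass(result: RunResult): boolean {
  if (result.exitCode !== 0) return false;
  const match = /(\d+)\s+(?:tests?\s+)?passed/i.exec(result.output);
  if (match === null) return false;
  return Number(match[1]) > 0;
}
```

With this check, a zero exit code accompanied by empty output or a "0 passed" summary is not accepted as success.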
Bryan Morgan
208291f391 fix(ci): handle empty APP_ID in stale PR closer (#20919) 2026-03-03 00:14:36 -05:00
Jacob Richman
8303edbb54 Code review fixes as a pr (#20612) 2026-03-03 04:32:50 +00:00
Aswin Ashok
0d69f9f7fa Build binary (#18933)
Co-authored-by: Gal Zahavi <38544478+galz10@users.noreply.github.com>
2026-03-03 01:02:19 +00:00
Christian Gunderman
46231a1755 ci(evals): only run evals in CI if prompts or tools changed (#20898)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-03 00:29:31 +00:00
Sandy Tao
2e7722d6a3 fix(core): restrict "System: Please continue" invalid stream retry to Gemini 2 models (#20897) 2026-03-02 23:21:13 +00:00
Yuna Seol
69e15a50d1 fix(core): skip telemetry logging for AbortError exceptions (#19477)
Co-authored-by: Yuna Seol <yunaseol@google.com>
2026-03-02 23:14:31 +00:00
Christian Gunderman
25f59a0099 Add some dos and don'ts to behavioral evals README. (#20629) 2026-03-02 23:14:00 +00:00
Adib234
01927a36d1 feat(plan): support annotating plans with feedback for iteration (#20876) 2026-03-02 23:03:59 +00:00
Shreya Keshive
06ddfa5c4c feat(admin): enable 30 day default retention for chat history & remove warning (#20853) 2026-03-02 22:44:49 +00:00
Christian Gunderman
3f7ef816f1 fix(core): increase default headers timeout to 5 minutes (#20890) 2026-03-02 22:36:58 +00:00
Jerop Kipruto
d05ba11a31 refactor(core): replace manual syncPlanModeTools with declarative policy rules (#20596) 2026-03-02 22:30:50 +00:00
Hamdanbinhashim
e43b1cff58 docs: fix broken markdown links in main README.md (#20300) 2026-03-02 21:51:52 +00:00
Allen Hutchison
bb6d1a2775 feat(core): add tool name validation in TOML policy files (#19281)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-02 21:47:21 +00:00
Nayana Parameswarappa
dd9ccc9807 Adding MCPOAuthProvider implementing the MCPSDK OAuthClientProvider (#20121) 2026-03-02 21:37:44 +00:00
Pyush Sinha
8133d63ac6 refactor(cli): fully remove React anti patterns, improve type safety and fix UX oversights in SettingsDialog.tsx (#18963)
Co-authored-by: Jacob Richman <jacob314@gmail.com>
2026-03-02 21:30:58 +00:00
Sandy Tao
18d0375a7f feat(core): support authenticated A2A agent card discovery (#20622)
Co-authored-by: Adam Weidman <adamfweidman@google.com>
Co-authored-by: Adam Weidman <65992621+adamfweidman@users.noreply.github.com>
2026-03-02 21:29:31 +00:00
Keith Guerin
31ca57ec94 feat: redesign header to be compact with ASCII icon (#18713)
Co-authored-by: Jacob Richman <jacob314@gmail.com>
2026-03-02 21:12:17 +00:00
Abhi
b7a8f0d1f9 fix(core): ensure subagents use qualified MCP tool names (#20801) 2026-03-02 21:12:13 +00:00
Abdul Tawab
1502e5cbc3 style(cli) : Dialog pattern for /hooks Command (#17930) 2026-03-02 21:12:05 +00:00
Christian Gunderman
7ca3a33f8b Subagent activity UX. (#17570) 2026-03-02 21:04:31 +00:00
Sandy Tao
ce5a2d0760 feat(core): truncate large MCP tool output (#19365) 2026-03-02 21:01:49 +00:00
Sam Roberts
aa321b3d8c Update CODEOWNERS for README.md reviewers (#20860) 2026-03-02 20:54:05 +00:00
David Pierce
3a7a6e1540 Add install as an option when extension is selected. (#20358) 2026-03-02 20:41:16 +00:00
Tommaso Sciortino
66530e44c8 document node limitation for shift+tab (#20877) 2026-03-02 20:31:52 +00:00
Christian Gunderman
b034dcd412 Do not block CI on evals (#20870) 2026-03-02 20:31:02 +00:00
Aishanee Shah
659301ff83 feat(core): centralize read_file limits and update gemini-3 description (#20619) 2026-03-02 20:11:58 +00:00
Sandy Tao
446a4316c4 feat(core): implement HTTP authentication support for A2A remote agents (#20510)
Co-authored-by: Adam Weidman <adamfweidman@google.com>
2026-03-02 19:59:48 +00:00
Tommaso Sciortino
48412a068e Add /unassign support (#20864)
Co-authored-by: Jacob Richman <jacob314@gmail.com>
2026-03-02 19:54:26 +00:00
Adib234
2e1efaebe4 fix(plan): deflake plan mode integration tests (#20477) 2026-03-02 19:51:44 +00:00
Sandy Tao
7c9fceba7f fix(core): reduce LLM-based loop detection false positives (#20701) 2026-03-02 19:08:15 +00:00
Adam Weidman
740efa2ac2 Merge User and Agent Card Descriptions #20849 (#20850) 2026-03-02 17:59:29 +00:00
Abhi
703759cfae fix(cli): allow sub-agent confirmation requests in UI while preventing background flicker (#20722) 2026-03-01 02:39:25 +00:00
Sehoon Shon
0063581e47 feat(skills): add github-issue-creator skill (#20709) 2026-02-28 23:22:22 +00:00
Sehoon Shon
6757d4b5c5 fix(cli): resolve autoThemeSwitching when background hasn't changed but theme mismatches (#20706) 2026-02-28 23:22:10 +00:00
Sandy Tao
a153ff587b refactor(core): Extract tool parameter names as constants (#20460) 2026-02-28 21:27:54 +00:00
N. Taylor Mullen
cd3a8c3f07 fix(cli): reset themeManager between tests to ensure isolation (#20598) 2026-02-28 19:45:31 +00:00
kartik
b2214a6676 fix: acp/zed race condition between MCP initialisation and prompt (#20205)
Signed-off-by: Kartik Angiras <angiraskartik@gmail.com>
2026-02-28 17:33:08 +00:00
gemini-cli-robot
6c65a2d813 Changelog for v0.32.0-preview.0 (#20627)
Co-authored-by: gemini-cli-robot <224641728+gemini-cli-robot@users.noreply.github.com>
2026-02-28 16:03:50 +00:00
Jagjeevan Kashid
fae0639ba2 fix: use full paths for ACP diff payloads (#19539)
Signed-off-by: Jagjeevan Kashid <jagjeevandev97@gmail.com>
2026-02-28 15:54:44 +00:00
gemini-cli-robot
76f70d65ff Changelog for v0.31.0 (#20634)
Co-authored-by: gemini-cli-robot <224641728+gemini-cli-robot@users.noreply.github.com>
2026-02-28 03:45:07 +00:00
gemini-cli-robot
fb6ff847dd chore/release: bump version to 0.33.0-nightly.20260228.1ca5c05d0 (#20644) 2026-02-28 02:13:48 +00:00
Gal Zahavi
1ca5c05d0d fix(github): use robot PAT for automated PRs to pass CLA check (#20641) 2026-02-28 01:13:58 +00:00