Commit Graph

3840 Commits

Author SHA1 Message Date
Alisa Novikova
5ede571439 feat(core): implement validation back-off mechanism
Adds a strict retry threshold to the agent's validation loop:
'If validation fails 3 times on the exact same test or error, DO NOT attempt another minor code tweak. You must immediately step back, use search tools to gather wider context, and formulate a completely new strategy.'

This prevents the agent from getting stuck in repetitive, unsuccessful minor tweaks and encourages a more strategic approach when initial fixes fail.
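The back-off rule is a natural-language instruction in the system prompt, not code. To show only the threshold logic, here is a hypothetical tracker. `ValidationBackoff`, `recordFailure`, and the "signature" key are illustrative names, not part of the codebase. It counts failures per test or error signature, and on the third failure for the same signature it signals a strategy change instead of another tweak.

```typescript
// Hypothetical illustration: the commit adds a prompt instruction, not this class.
type NextStep = 'tweak' | 'replan';

class ValidationBackoff {
  // Failure counts keyed by test name or error signature.
  private readonly failures = new Map<string, number>();

  // Number of failures on the same signature before a new strategy is required.
  constructor(private readonly threshold = 3) {}

  /** Record a failed validation run and decide what the agent should do next. */
  recordFailure(signature: string): NextStep {
    const count = (this.failures.get(signature) ?? 0) + 1;
    if (count >= this.threshold) {
      // Threshold reached: stop tweaking, gather wider context, and re-plan.
      // Reset the count so the new strategy starts with a clean slate.
      this.failures.delete(signature);
      return 'replan';
    }
    this.failures.set(signature, count);
    return 'tweak';
  }
}
```

Failures on other signatures do not advance the count, so an unrelated error in between does not trigger a premature re-plan.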
2026-03-03 00:50:59 -08:00
Alisa Novikova
1df5178800 feat(core): prioritize existing test infrastructure and timebox test setup
Introduces three critical mandates to the agent's testing and validation workflow:
1. **Prioritize Existing Infrastructure:** Strictly prefer running the project's existing test suite over writing custom reproduction scripts to avoid environment/import difficulties.
2. **Timebox Test Setup:** Abandon custom reproduction scripts if they fail to set up within 3-5 turns due to environment or import errors, falling back to static analysis and built-in tests.
3. **Mandate Exhaustive Validation:** Explicitly requires running the relevant existing project tests to prevent regressions, so that a passing custom reproduction script is treated as necessary but not sufficient for completion.

These changes prevent 'Early Exhaustion' by reducing the complexity of standalone test setup in frameworks like Django.
2026-03-03 00:50:59 -08:00
Alisa Novikova
616062bdf6 feat(core): implement self-validation workflow with exact verbatim restoration
This commit upgrades the agent with a robust self-validation workflow while
ensuring 100% verbatim coverage of the original system prompt text.
By moving to an additive model, we preserve all original reasoning anchors,
instructional lead-ins, and senior engineering heuristics while injecting
critical autonomous mandates.

Verbatim Restoration:
- All 'Context Efficiency' guidelines, lead-ins, and scenarios (Search/Understand/Navigate).
- All 'Engineering Standards' regarding style mimicry, abstractions, and debt isolation.
- Full 'Primary Workflows' sequence and formatting.

Self-Validation Workflow Injections:
- Research Phase: Parallel Discovery (manifests + logic) and High-Signal Grep.
- Bug Fixing: Negative Verification (confirming repro failure) and Coverage Expansion.
- Implementation: Transactional Edits (logical batching of module changes).
- Validation Loop: Tiered Validation (Fixers -> Fast-Path -> Related Tests) and Smart Log Navigation (Tail-First).
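Tiered Validation orders checks from cheapest to most expensive. A minimal sketch of that ordering follows, assuming each tier can be modeled as a function that reports pass/fail plus its output. `ValidationTier`, `runTiered`, and the tier names are hypothetical; in practice each `run` would shell out to a project command (for example a formatter, `tsc --noEmit`, then the related tests).

```typescript
// Hypothetical sketch: the commit describes a prompt-level workflow, not this API.
interface TierResult {
  ok: boolean;
  output: string;
}

interface ValidationTier {
  name: string; // e.g. 'fixers', 'fast-path', 'related-tests'
  run: () => TierResult;
}

interface ValidationReport {
  passed: boolean;
  failedTier?: string;
  output?: string;
  tiersRun: string[];
}

/** Run tiers cheapest-first and stop at the first failure. */
function runTiered(tiers: ValidationTier[]): ValidationReport {
  const tiersRun: string[] = [];
  for (const tier of tiers) {
    tiersRun.push(tier.name);
    const result = tier.run();
    if (!result.ok) {
      // A cheap tier failed, so skip the expensive ones until it is fixed.
      return { passed: false, failedTier: tier.name, output: result.output, tiersRun };
    }
  }
  return { passed: true, tiersRun };
}
```

Stopping at the first failing tier keeps the feedback loop short: a type error caught by the fast path never pays for a full test run.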

Technical Verification:
- Verified against 67 core prompt tests and 14 snapshots.
- New behavioral eval suite passed (evals/self_validation_workflow.eval.ts).
- Full 'npm run preflight' successful.
2026-03-03 00:50:59 -08:00
Alisa Novikova
c3215aed93 feat(core): implement self-validation workflow with prompt-verbatim restoration
This commit upgrades the agent with a robust self-validation workflow while
ensuring 100% semantic and verbatim coverage of the original system prompt.
By moving to an additive model, we preserve the original reasoning anchors
(lead-ins, heuristics, and formatting) while injecting critical autonomous
engineering mandates.

Self-Validation Workflow Injections:
- Research Phase: Parallel Discovery (combining manifests/logic) and High-Signal Grep.
- Bug Fixing: Negative Verification (confirming repro failure) and Coverage Expansion.
- Implementation: Transactional Edits (logical batching of module changes).
- Validation Loop: Tiered Validation (Fixers -> Fast-Path -> Related Tests) and Smart Log Navigation.

Technical Verification:
- Verbatim restoration verified against 66 core tests and 14 snapshots.
- New behavioral eval suite passed (evals/self_validation_workflow.eval.ts).
- Full 'npm run preflight' validation successful.
2026-03-03 00:50:59 -08:00
Alisa Novikova
61b35ff745 feat(core): comprehensive agent self-validation and engineering mandates
Major upgrade to the agent's self-validation, safety, and project integrity
capabilities through five iterations of system prompt enhancements:

Workflow & Quality Mandates:
1. Incremental Validation: Mandates building, linting, and testing after
   every significant file change to maintain a "green" state.
2. Mandatory Reproduction: Requires creating a failing test case to confirm
   a bug before fixing, and explicitly verifying the failure (Negative Verification).
3. Test Persistence & Locality: Requires integrating repro cases into the
   permanent test suite, preferably by amending existing related test files.
4. Script Discovery: Mandates identifying project-specific validation
   commands from configuration files (package.json, Makefile, etc.).
5. Self-Review: Mandates running `git diff` after every edit, using
   `--name-only` for large changes to preserve context window tokens.
6. Fast-Path Validation: Prioritizes lightweight checks (e.g., `tsc --noEmit`)
   for frequent feedback, reserving heavy builds for final verification.
7. Output Verification: Requires checking command output (not just exit codes)
   to prevent false-positives from empty test runs or hidden warnings.
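Mandates 2 and 7 combine into a simple before/after check on a reproduction test. The sketch below is hypothetical (`TestRunSummary` and `checkReproduction` are not from the codebase) and assumes the number of executed tests has already been parsed from the runner's output rather than inferred from the exit code.

```typescript
// Hypothetical types: the commit describes prompt mandates, not this API.
interface TestRunSummary {
  exitCode: number;
  testsRun: number; // parsed from the runner's output, not inferred from the exit code
}

/**
 * Checks a reproduction test run before and after a fix.
 * Returns a list of problems; an empty list means the fix is verified.
 */
function checkReproduction(before: TestRunSummary, after: TestRunSummary): string[] {
  const problems: string[] = [];
  // Output Verification: a zero exit code from an empty run is a false positive.
  if (before.testsRun === 0) problems.push('no tests ran before the fix');
  if (after.testsRun === 0) problems.push('no tests ran after the fix');
  // Negative Verification: the repro must fail before the fix is applied.
  if (before.exitCode === 0) problems.push('repro passed before the fix, so it does not reproduce the bug');
  // The same repro must pass once the fix is in place.
  if (after.exitCode !== 0) problems.push('repro still fails after the fix');
  return problems;
}
```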

Semantic Integrity & Dependency Safety:
8. Global Usage Discovery: Mandates searching the entire workspace for all
   usages (via `grep_search`) before modifying exported symbols or APIs.
9. Dependency Integrity: Requires verifying that new imports are explicitly
   declared in the project's dependency manifest (e.g., package.json).
10. Configuration Sync: Mandates updating build/environment configs
    (tsconfig, Dockerfile, etc.) to support new file types or entry points.
11. Documentation Sync: Requires searching for and updating documentation
    references when public APIs or CLI interfaces change.
12. Anti-Silencing Mandate: Prohibits using `any`, `@ts-ignore`, or lint
    suppressions to resolve validation errors.

Diagnostics, Safety & Runtime Verification:
13. Error Grounding: Mandates reading full error logs and stack traces upon
    failure. Includes Smart Log Navigation to prioritize the tail of large log files.
14. Scope Isolation: Instructs the agent to focus only on errors introduced
    by its changes and ignore unrelated legacy technical debt.
15. Destructive Safety: Mandates a `git status` check before deleting files
    or modifying critical project configurations.
16. Non-Blocking Smoke Tests: Requires briefly running applications to
    verify boot stability, using background/timeout strategies for servers.

Includes 15 new behavioral evaluations verifying these mandates and updated
snapshots in packages/core/src/core/prompts.test.ts.
2026-03-03 00:50:59 -08:00
Jacob Richman
8303edbb54 Code review fixes as a pr (#20612) 2026-03-03 04:32:50 +00:00
Aswin Ashok
0d69f9f7fa Build binary (#18933)
Co-authored-by: Gal Zahavi <38544478+galz10@users.noreply.github.com>
2026-03-03 01:02:19 +00:00
Sandy Tao
2e7722d6a3 fix(core): restrict "System: Please continue" invalid stream retry to Gemini 2 models (#20897) 2026-03-02 23:21:13 +00:00
Yuna Seol
69e15a50d1 fix(core): skip telemetry logging for AbortError exceptions (#19477)
Co-authored-by: Yuna Seol <yunaseol@google.com>
2026-03-02 23:14:31 +00:00
Adib234
01927a36d1 feat(plan): support annotating plans with feedback for iteration (#20876) 2026-03-02 23:03:59 +00:00
Shreya Keshive
06ddfa5c4c feat(admin): enable 30 day default retention for chat history & remove warning (#20853) 2026-03-02 22:44:49 +00:00
Christian Gunderman
3f7ef816f1 fix(core): increase default headers timeout to 5 minutes (#20890) 2026-03-02 22:36:58 +00:00
Jerop Kipruto
d05ba11a31 refactor(core): replace manual syncPlanModeTools with declarative policy rules (#20596) 2026-03-02 22:30:50 +00:00
Allen Hutchison
bb6d1a2775 feat(core): add tool name validation in TOML policy files (#19281)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-02 21:47:21 +00:00
Nayana Parameswarappa
dd9ccc9807 Adding MCPOAuthProvider implementing the MCP SDK OAuthClientProvider (#20121) 2026-03-02 21:37:44 +00:00
Pyush Sinha
8133d63ac6 refactor(cli): fully remove React anti-patterns, improve type safety and fix UX oversights in SettingsDialog.tsx (#18963)
Co-authored-by: Jacob Richman <jacob314@gmail.com>
2026-03-02 21:30:58 +00:00
Sandy Tao
18d0375a7f feat(core): support authenticated A2A agent card discovery (#20622)
Co-authored-by: Adam Weidman <adamfweidman@google.com>
Co-authored-by: Adam Weidman <65992621+adamfweidman@users.noreply.github.com>
2026-03-02 21:29:31 +00:00
Keith Guerin
31ca57ec94 feat: redesign header to be compact with ASCII icon (#18713)
Co-authored-by: Jacob Richman <jacob314@gmail.com>
2026-03-02 21:12:17 +00:00
Abhi
b7a8f0d1f9 fix(core): ensure subagents use qualified MCP tool names (#20801) 2026-03-02 21:12:13 +00:00
Abdul Tawab
1502e5cbc3 style(cli): Dialog pattern for /hooks Command (#17930) 2026-03-02 21:12:05 +00:00
Christian Gunderman
7ca3a33f8b Subagent activity UX. (#17570) 2026-03-02 21:04:31 +00:00
Sandy Tao
ce5a2d0760 feat(core): truncate large MCP tool output (#19365) 2026-03-02 21:01:49 +00:00
David Pierce
3a7a6e1540 Add install as an option when extension is selected. (#20358) 2026-03-02 20:41:16 +00:00
Aishanee Shah
659301ff83 feat(core): centralize read_file limits and update gemini-3 description (#20619) 2026-03-02 20:11:58 +00:00
Sandy Tao
446a4316c4 feat(core): implement HTTP authentication support for A2A remote agents (#20510)
Co-authored-by: Adam Weidman <adamfweidman@google.com>
2026-03-02 19:59:48 +00:00
Adib234
2e1efaebe4 fix(plan): deflake plan mode integration tests (#20477) 2026-03-02 19:51:44 +00:00
Sandy Tao
7c9fceba7f fix(core): reduce LLM-based loop detection false positives (#20701) 2026-03-02 19:08:15 +00:00
Adam Weidman
740efa2ac2 Merge User and Agent Card Descriptions #20849 (#20850) 2026-03-02 17:59:29 +00:00
Abhi
703759cfae fix(cli): allow sub-agent confirmation requests in UI while preventing background flicker (#20722) 2026-03-01 02:39:25 +00:00
Sehoon Shon
6757d4b5c5 fix(cli): resolve autoThemeSwitching when background hasn't changed but theme mismatches (#20706) 2026-02-28 23:22:10 +00:00
Sandy Tao
a153ff587b refactor(core): Extract tool parameter names as constants (#20460) 2026-02-28 21:27:54 +00:00
N. Taylor Mullen
cd3a8c3f07 fix(cli): reset themeManager between tests to ensure isolation (#20598) 2026-02-28 19:45:31 +00:00
kartik
b2214a6676 fix: acp/zed race condition between MCP initialisation and prompt (#20205)
Signed-off-by: Kartik Angiras <angiraskartik@gmail.com>
2026-02-28 17:33:08 +00:00
Jagjeevan Kashid
fae0639ba2 fix: use full paths for ACP diff payloads (#19539)
Signed-off-by: Jagjeevan Kashid <jagjeevandev97@gmail.com>
2026-02-28 15:54:44 +00:00
gemini-cli-robot
fb6ff847dd chore/release: bump version to 0.33.0-nightly.20260228.1ca5c05d0 (#20644) 2026-02-28 02:13:48 +00:00
Gal Zahavi
0c6c9c6a62 chore(release): bump version to 0.33.0-nightly.20260227.ba149afa0 (#20637) 2026-02-28 00:51:22 +00:00
Sehoon Shon
a1367e9cdd fix(core): parse raw ASCII buffer strings in Gaxios errors (#20626) 2026-02-27 23:57:32 +00:00
nityam
ba149afa0b fix: merge duplicate imports in a2a-server package (2/4) (#19781) 2026-02-27 21:13:30 +00:00
nityam
f6533c0250 fix: merge duplicate imports in sdk and test-utils packages (1/4) (#19777) 2026-02-27 21:13:15 +00:00
Abhi
966b9059d0 feat(core): enable contiguous parallel admission for Kind.Agent tools (#20583) 2026-02-27 21:08:10 +00:00
Spencer
20d884da2f fix(core): reduce intrusive MCP errors and deduplicate diagnostics (#20232) 2026-02-27 20:04:36 +00:00
Dmitry Lyalin
7f8ce8657c Add low/full CLI error verbosity mode for cleaner UI (#20399) 2026-02-27 19:15:10 +00:00
Jacob Richman
e00e8f4728 fix(cli): Shell autocomplete polish (#20411) 2026-02-27 19:03:37 +00:00
Abhi
c914fd0700 fix(cli): prevent sub-agent tool calls from leaking into UI (#20580) 2026-02-27 19:00:19 +00:00
Jerop Kipruto
5d24d6a9e1 fix(ui): persist expansion in AskUser dialog when navigating options (#20559) 2026-02-27 18:30:16 +00:00
Gaurav
ea48bd9414 feat: better error messages (#20577)
Co-authored-by: Gal Zahavi <38544478+galz10@users.noreply.github.com>
2026-02-27 18:18:16 +00:00
Gaurav
b2d6844f9b feat(billing): implement G1 AI credits overage flow with billing telemetry (#18590) 2026-02-27 18:15:06 +00:00
Sehoon Shon
fdd844b405 fix(core): disable retries for code assist streaming requests (#20561) 2026-02-27 18:04:43 +00:00
Adib234
23905bcd77 fix(plan): prevent agent from using ask_user for shell command confirmation (#20504) 2026-02-27 17:51:47 +00:00
Dev Randalpura
ec39aa17c2 Moved markdown parsing logic to a separate util file (#20526) 2026-02-27 17:43:18 +00:00