1977 Commits

Author SHA1 Message Date
Alisa Novikova
5ede571439 feat(core): implement validation back-off mechanism
Adds a strict retry threshold to the agent's validation loop:
'If validation fails 3 times on the exact same test or error, DO NOT attempt another minor code tweak. You must immediately step back, use search tools to gather wider context, and formulate a completely new strategy.'

This prevents the agent from getting stuck in repetitive, unsuccessful minor tweaks and encourages a more strategic approach when initial fixes fail.
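The back-off rule is added as prompt text, not as code. As a hypothetical illustration only, the same threshold can be expressed as a small TypeScript tracker. `ValidationBackoff`, `record`, and the failure `signature` string (for example a test name or a normalized error message) are names assumed for this sketch and are not part of the codebase.

```typescript
// Hypothetical sketch: the commit adds this rule as prompt text, not code.
// Counts consecutive validation failures that share the same signature
// (e.g. the same failing test name or error message).

const MAX_SAME_FAILURES = 3; // threshold quoted in the commit message

type Action = "retry-minor-fix" | "step-back-and-research" | "done";

class ValidationBackoff {
  private lastSignature: string | null = null;
  private count = 0;

  // Record one validation result and decide what the agent should do next.
  record(passed: boolean, signature = ""): Action {
    if (passed) {
      this.lastSignature = null;
      this.count = 0;
      return "done";
    }
    if (signature === this.lastSignature) {
      this.count += 1;
    } else {
      // A different failure starts a new streak.
      this.lastSignature = signature;
      this.count = 1;
    }
    if (this.count >= MAX_SAME_FAILURES) {
      // Reset so the new strategy gets its own budget of attempts.
      this.count = 0;
      this.lastSignature = null;
      return "step-back-and-research";
    }
    return "retry-minor-fix";
  }
}
```

Resetting the streak after a step-back is a design choice in this sketch: the new strategy gets a fresh budget of three attempts before the agent is told to step back again.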
2026-03-03 00:50:59 -08:00
Alisa Novikova
1df5178800 feat(core): prioritize existing test infrastructure and timebox test setup
Introduces three critical mandates to the agent's testing and validation workflow:
1. **Prioritize Existing Infrastructure:** Strictly prefer running the project's existing test suite over writing custom reproduction scripts to avoid environment/import difficulties.
2. **Timebox Test Setup:** Abandon custom reproduction scripts if they fail to set up within 3-5 turns due to environment or import errors, falling back to static analysis and built-in tests.
3. **Mandate Exhaustive Validation:** Explicitly requires running the relevant existing project tests to prevent regressions, so that a passing custom reproduction script is treated as necessary but not sufficient for completion.

These changes prevent 'Early Exhaustion' by reducing the complexity of standalone test setup in frameworks like Django.
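The timebox in mandate 2 can be read as a turn budget for setting up a custom reproduction script. The sketch below is a hypothetical illustration of that rule, not the agent's actual logic. It assumes a budget of 5 turns (the upper end of the stated 3-5 range) and uses made-up names (`nextPlan`, `SetupOutcome`, `Plan`).

```typescript
// Hypothetical sketch: the commit expresses this timebox as prompt text.
// Each entry records how one setup turn for a custom repro script ended.
type SetupOutcome = "ready" | "env-error" | "import-error";
type Plan =
  | "run-repro-and-existing-tests"
  | "keep-trying"
  | "fall-back-to-existing-tests";

// Assumed budget: the upper end of the 3-5 turn range in the commit message.
const SETUP_TURN_BUDGET = 5;

function nextPlan(outcomes: SetupOutcome[]): Plan {
  // Mandate 3: a working repro script is necessary but not sufficient,
  // so the existing project tests still run afterwards.
  if (outcomes.some((o) => o === "ready")) return "run-repro-and-existing-tests";
  // Every turn so far ended in an environment or import error.
  return outcomes.length >= SETUP_TURN_BUDGET
    ? "fall-back-to-existing-tests"
    : "keep-trying";
}
```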
2026-03-03 00:50:59 -08:00
Alisa Novikova
616062bdf6 feat(core): implement self-validation workflow with exact verbatim restoration
This commit upgrades the agent with a robust self-validation workflow while
ensuring 100% verbatim coverage of the original system prompt text.
By moving to an additive model, we preserve all original reasoning anchors,
instructional lead-ins, and senior engineering heuristics while injecting
critical autonomous mandates.

Verbatim Restoration:
- All 'Context Efficiency' guidelines, lead-ins, and scenarios (Search/Understand/Navigate).
- All 'Engineering Standards' regarding style mimicry, abstractions, and debt isolation.
- Full 'Primary Workflows' sequence and formatting.

Self-Validation Workflow Injections:
- Research Phase: Parallel Discovery (manifests + logic) and High-Signal Grep.
- Bug Fixing: Negative Verification (confirming repro failure) and Coverage Expansion.
- Implementation: Transactional Edits (logical batching of module changes).
- Validation Loop: Tiered Validation (Fixers -> Fast-Path -> Related Tests) and Smart Log Navigation (Tail-First).
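Tiered Validation, as listed above, orders checks from cheapest to most expensive. A minimal sketch, assuming each tier is a caller-supplied synchronous check (real tiers would shell out to project commands such as a lint fixer, `tsc --noEmit`, or a related test file), runs the tiers in order and stops at the first failure. `Tier`, `TierResult`, and `runTiers` are hypothetical names.

```typescript
// Hypothetical sketch of Tiered Validation. Tier checks are placeholders
// supplied by the caller; real ones would run project commands.
interface Tier {
  name: string;
  run: () => boolean; // true when the tier passes
}

interface TierResult {
  passed: boolean;
  ran: string[]; // tiers actually executed, in order
  failedAt?: string; // first failing tier, if any
}

function runTiers(tiers: Tier[]): TierResult {
  const ran: string[] = [];
  for (const tier of tiers) {
    ran.push(tier.name);
    if (!tier.run()) {
      // Stop here: slower tiers add little while a cheaper one still fails.
      return { passed: false, ran, failedAt: tier.name };
    }
  }
  return { passed: true, ran };
}

// Example ordering taken from the commit: Fixers -> Fast-Path -> Related Tests.
const result = runTiers([
  { name: "fixers", run: () => true },
  { name: "fast-path", run: () => false },
  { name: "related-tests", run: () => true },
]);
console.log(result.failedAt); // prints: fast-path
```

Stopping at the first failing tier keeps feedback fast: there is little value in running related tests while the fast-path type check is still red.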

Technical Verification:
- Verified against 67 core prompt tests and 14 snapshots.
- New behavioral eval suite passed (evals/self_validation_workflow.eval.ts).
- Full 'npm run preflight' successful.
2026-03-03 00:50:59 -08:00
Alisa Novikova
c3215aed93 feat(core): implement self-validation workflow with prompt-verbatim restoration
This commit upgrades the agent with a robust self-validation workflow while
ensuring 100% semantic and verbatim coverage of the original system prompt.
By moving to an additive model, we preserve the original reasoning anchors
(lead-ins, heuristics, and formatting) while injecting critical autonomous
engineering mandates.

Self-Validation Workflow Injections:
- Research Phase: Parallel Discovery (combining manifests/logic) and High-Signal Grep.
- Bug Fixing: Negative Verification (confirming repro failure) and Coverage Expansion.
- Implementation: Transactional Edits (logical batching of module changes).
- Validation Loop: Tiered Validation (Fixers -> Fast-Path -> Related Tests) and Smart Log Navigation.

Technical Verification:
- Verbatim restoration verified against 66 core tests and 14 snapshots.
- New behavioral eval suite passed (evals/self_validation_workflow.eval.ts).
- Full 'npm run preflight' validation successful.
2026-03-03 00:50:59 -08:00
Alisa Novikova
61b35ff745 feat(core): comprehensive agent self-validation and engineering mandates
Major upgrade to the agent's self-validation, safety, and project integrity
capabilities through five iterations of system prompt enhancements:

Workflow & Quality Mandates:
1. Incremental Validation: Mandates building, linting, and testing after
   every significant file change to maintain a "green" state.
2. Mandatory Reproduction: Requires creating a failing test case to confirm
   a bug before fixing, and explicitly verifying the failure (Negative Verification).
3. Test Persistence & Locality: Requires integrating repro cases into the
   permanent test suite, preferably by amending existing related test files.
4. Script Discovery: Mandates identifying project-specific validation
   commands from configuration files (package.json, Makefile, etc.).
5. Self-Review: Mandates running `git diff` after every edit, using
   `--name-only` for large changes to preserve context window tokens.
6. Fast-Path Validation: Prioritizes lightweight checks (e.g., `tsc --noEmit`)
   for frequent feedback, reserving heavy builds for final verification.
7. Output Verification: Requires checking command output (not just exit codes)
   to prevent false-positives from empty test runs or hidden warnings.

Semantic Integrity & Dependency Safety:
8. Global Usage Discovery: Mandates searching the entire workspace for all
   usages (via `grep_search`) before modifying exported symbols or APIs.
9. Dependency Integrity: Requires verifying that new imports are explicitly
   declared in the project's dependency manifest (e.g., package.json).
10. Configuration Sync: Mandates updating build/environment configs
    (tsconfig, Dockerfile, etc.) to support new file types or entry points.
11. Documentation Sync: Requires searching for and updating documentation
    references when public APIs or CLI interfaces change.
12. Anti-Silencing Mandate: Prohibits using `any`, `@ts-ignore`, or lint
    suppressions to resolve validation errors.

Diagnostics, Safety & Runtime Verification:
13. Error Grounding: Mandates reading full error logs and stack traces upon
    failure. Includes Smart Log Navigation to prioritize the tail of large files.
14. Scope Isolation: Instructs the agent to focus only on errors introduced
    by its changes and ignore unrelated legacy technical debt.
15. Destructive Safety: Mandates a `git status` check before deleting files
    or modifying critical project configurations.
16. Non-Blocking Smoke Tests: Requires briefly running applications to
    verify boot stability, using background/timeout strategies for servers.
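Item 13's Smart Log Navigation prioritizes the tail of large log files, where test runners and compilers typically print their summaries and the final stack trace. A hypothetical sketch of that idea returns only the last `maxLines` lines of a log, with a marker noting how much was skipped. The function name `tailFirst` and the default of 50 lines are assumptions for illustration.

```typescript
// Hypothetical sketch of tail-first log reading. The default of 50 lines
// is an arbitrary assumption for illustration.
function tailFirst(
  log: string,
  maxLines = 50,
): { excerpt: string; truncated: boolean } {
  const lines = log.split("\n");
  // A trailing newline produces one empty final element; drop it.
  if (lines.length > 0 && lines[lines.length - 1] === "") lines.pop();
  if (lines.length <= maxLines) {
    return { excerpt: lines.join("\n"), truncated: false };
  }
  const skipped = lines.length - maxLines;
  // Keep only the last maxLines lines, where summaries and stack traces tend to be.
  const tail = lines.slice(skipped);
  return {
    excerpt: `[... ${skipped} earlier lines omitted ...]\n${tail.join("\n")}`,
    truncated: true,
  };
}
```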

Includes 15 new behavioral evaluations verifying these mandates and updated
snapshots in packages/core/src/core/prompts.test.ts.
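Item 7 (Output Verification) asks the agent to read command output instead of trusting exit codes alone. The sketch below is hypothetical: `verifyOutput`, `CommandResult`, and the regular expressions are illustrative assumptions rather than the exact messages printed by any specific test runner. It flags rather than decides, returning the reasons a run deserves a closer look.

```typescript
// Hypothetical sketch of Output Verification. The patterns are illustrative
// assumptions, not the exact messages of any particular test runner.
interface CommandResult {
  exitCode: number;
  output: string; // combined stdout and stderr
}

const EMPTY_RUN_PATTERNS: RegExp[] = [
  /no tests?( files)? (were )?found/i, // e.g. "No tests found"
  /\b0 tests?\b/i, // e.g. "0 tests" (but not "10 tests")
];
const WARNING_PATTERN = /\bwarn(ings?)?\b/i; // "warn", "warning", "warnings"

function verifyOutput(result: CommandResult): { ok: boolean; reasons: string[] } {
  const reasons: string[] = [];
  if (result.exitCode !== 0) {
    reasons.push(`non-zero exit code ${result.exitCode}`);
  }
  // A zero exit code can still hide an empty test run...
  if (EMPTY_RUN_PATTERNS.some((p) => p.test(result.output))) {
    reasons.push("output suggests no tests were executed");
  }
  // ...or warnings that deserve a look before declaring success.
  if (WARNING_PATTERN.test(result.output)) {
    reasons.push("output contains warnings");
  }
  return { ok: reasons.length === 0, reasons };
}
```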
2026-03-03 00:50:59 -08:00
Aswin Ashok
0d69f9f7fa Build binary (#18933)
Co-authored-by: Gal Zahavi <38544478+galz10@users.noreply.github.com>
2026-03-03 01:02:19 +00:00
Sandy Tao
2e7722d6a3 fix(core): restrict "System: Please continue" invalid stream retry to Gemini 2 models (#20897) 2026-03-02 23:21:13 +00:00
Yuna Seol
69e15a50d1 fix(core): skip telemetry logging for AbortError exceptions (#19477)
Co-authored-by: Yuna Seol <yunaseol@google.com>
2026-03-02 23:14:31 +00:00
Christian Gunderman
3f7ef816f1 fix(core): increase default headers timeout to 5 minutes (#20890) 2026-03-02 22:36:58 +00:00
Jerop Kipruto
d05ba11a31 refactor(core): replace manual syncPlanModeTools with declarative policy rules (#20596) 2026-03-02 22:30:50 +00:00
Allen Hutchison
bb6d1a2775 feat(core): add tool name validation in TOML policy files (#19281)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-02 21:47:21 +00:00
Nayana Parameswarappa
dd9ccc9807 Adding MCPOAuthProvider implementing the MCP SDK OAuthClientProvider (#20121) 2026-03-02 21:37:44 +00:00
Sandy Tao
18d0375a7f feat(core): support authenticated A2A agent card discovery (#20622)
Co-authored-by: Adam Weidman <adamfweidman@google.com>
Co-authored-by: Adam Weidman <65992621+adamfweidman@users.noreply.github.com>
2026-03-02 21:29:31 +00:00
Abhi
b7a8f0d1f9 fix(core): ensure subagents use qualified MCP tool names (#20801) 2026-03-02 21:12:13 +00:00
Christian Gunderman
7ca3a33f8b Subagent activity UX. (#17570) 2026-03-02 21:04:31 +00:00
Sandy Tao
ce5a2d0760 feat(core): truncate large MCP tool output (#19365) 2026-03-02 21:01:49 +00:00
Aishanee Shah
659301ff83 feat(core): centralize read_file limits and update gemini-3 description (#20619) 2026-03-02 20:11:58 +00:00
Sandy Tao
446a4316c4 feat(core): implement HTTP authentication support for A2A remote agents (#20510)
Co-authored-by: Adam Weidman <adamfweidman@google.com>
2026-03-02 19:59:48 +00:00
Adib234
2e1efaebe4 fix(plan): deflake plan mode integration tests (#20477) 2026-03-02 19:51:44 +00:00
Sandy Tao
7c9fceba7f fix(core): reduce LLM-based loop detection false positives (#20701) 2026-03-02 19:08:15 +00:00
Adam Weidman
740efa2ac2 Merge User and Agent Card Descriptions #20849 (#20850) 2026-03-02 17:59:29 +00:00
Sandy Tao
a153ff587b refactor(core): Extract tool parameter names as constants (#20460) 2026-02-28 21:27:54 +00:00
N. Taylor Mullen
cd3a8c3f07 fix(cli): reset themeManager between tests to ensure isolation (#20598) 2026-02-28 19:45:31 +00:00
kartik
b2214a6676 fix: acp/zed race condition between MCP initialisation and prompt (#20205)
Signed-off-by: Kartik Angiras <angiraskartik@gmail.com>
2026-02-28 17:33:08 +00:00
gemini-cli-robot
fb6ff847dd chore/release: bump version to 0.33.0-nightly.20260228.1ca5c05d0 (#20644) 2026-02-28 02:13:48 +00:00
Gal Zahavi
0c6c9c6a62 chore(release): bump version to 0.33.0-nightly.20260227.ba149afa0 (#20637) 2026-02-28 00:51:22 +00:00
Sehoon Shon
a1367e9cdd fix(core): parse raw ASCII buffer strings in Gaxios errors (#20626) 2026-02-27 23:57:32 +00:00
nityam
ba149afa0b fix: merge duplicate imports in a2a-server package (2/4) (#19781) 2026-02-27 21:13:30 +00:00
Abhi
966b9059d0 feat(core): enable contiguous parallel admission for Kind.Agent tools (#20583) 2026-02-27 21:08:10 +00:00
Spencer
20d884da2f fix(core): reduce intrusive MCP errors and deduplicate diagnostics (#20232) 2026-02-27 20:04:36 +00:00
Gaurav
ea48bd9414 feat: better error messages (#20577)
Co-authored-by: Gal Zahavi <38544478+galz10@users.noreply.github.com>
2026-02-27 18:18:16 +00:00
Gaurav
b2d6844f9b feat(billing): implement G1 AI credits overage flow with billing telemetry (#18590) 2026-02-27 18:15:06 +00:00
Sehoon Shon
fdd844b405 fix(core): disable retries for code assist streaming requests (#20561) 2026-02-27 18:04:43 +00:00
Adib234
23905bcd77 fix(plan): prevent agent from using ask_user for shell command confirmation (#20504) 2026-02-27 17:51:47 +00:00
Sehoon Shon
e709789067 fix(core): handle optional response fields from code assist API (#20345) 2026-02-27 16:52:37 +00:00
Abhijit Balaji
32e777f838 fix(core): revert auto-save of policies to user space (#20531) 2026-02-27 16:03:36 +00:00
Pyush Sinha
d7320f5425 refactor(core,cli): useAlternateBuffer read from config (#20346)
Co-authored-by: Jacob Richman <jacob314@gmail.com>
2026-02-27 15:55:02 +00:00
Adib234
25ade7bcb7 feat(plan): update planning workflow to encourage multi-select with descriptions of options (#20491) 2026-02-27 15:42:37 +00:00
christine betts
58df1c6237 Fix extension MCP server env var loading (#20374) 2026-02-27 14:49:10 +00:00
Bryan Morgan
522e95439c fix(core): apply retry logic to CodeAssistServer for all users (#20507) 2026-02-27 09:26:53 -05:00
christine betts
e17f927a69 Add support for policy engine in extensions (#20049)
Co-authored-by: Jerop Kipruto <jerop@google.com>
2026-02-27 03:29:33 +00:00
heaventourist
b1befee8fb feat(telemetry): Instrument traces with more attributes and make them available to OTEL users (#20237)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Jerop Kipruto <jerop@google.com>
Co-authored-by: MD. MOHIBUR RAHMAN <35300157+mrpmohiburrahman@users.noreply.github.com>
Co-authored-by: Jeffrey Ying <jeffrey.ying86@live.com>
Co-authored-by: Bryan Morgan <bryanmorgan@google.com>
Co-authored-by: joshualitt <joshualitt@google.com>
Co-authored-by: Dev Randalpura <devrandalpura@google.com>
Co-authored-by: Google Admin <github-admin@google.com>
Co-authored-by: Ben Knutson <benknutson@google.com>
2026-02-27 02:26:16 +00:00
Tommaso Sciortino
4b7ce1fe67 Avoid overaggressive unescaping (#20520) 2026-02-27 01:50:21 +00:00
Siddharth Diwan
9b7852f11c [Gemma x Gemini CLI] Add an Experimental Gemma Router that uses a LiteRT-LM shim into the Composite Model Classifier Strategy (#17231)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Allen Hutchison <adh@google.com>
2026-02-26 23:43:43 +00:00
Bryan Morgan
6dc9d5ff11 feat(core): increase fetch timeout and fix [object Object] error stringification (#20441)
Co-authored-by: Gal Zahavi <38544478+galz10@users.noreply.github.com>
2026-02-26 23:41:09 +00:00
Jerop Kipruto
aa98cafca7 feat(plan): adapt planning workflow based on complexity of task (#20465)
Co-authored-by: Gal Zahavi <38544478+galz10@users.noreply.github.com>
2026-02-26 22:58:19 +00:00
krishdef7
f700c923d9 fix(core): flush transcript for pure tool-call responses to ensure BeforeTool hooks see complete state (#20419)
Co-authored-by: Bryan Morgan <bryanmorgan@google.com>
2026-02-26 22:39:36 +00:00
Sehoon Shon
edb1fdea30 fix(cli): support quota error fallbacks for all authentication types (#20475)
Co-authored-by: Gal Zahavi <38544478+galz10@users.noreply.github.com>
2026-02-26 22:39:25 +00:00
Adam Weidman
10c5bd8ce9 feat(core): improve A2A content extraction (#20487)
Co-authored-by: Gal Zahavi <38544478+galz10@users.noreply.github.com>
2026-02-26 22:38:30 +00:00
joshualitt
611d934829 feat(core): Enable generalist agent (#19665) 2026-02-26 16:38:49 +00:00