Commit Graph

1855 Commits

Author SHA1 Message Date
Alisa Novikova
901e94cba8 chore(core): simplify agent mandates to improve efficiency and reduce turn count 2026-03-03 01:00:26 -08:00
Alisa Novikova
adc62a76e0 test(core): update snapshots for new agent behavior mandates
Updates core prompt snapshots to include:
- Priority for existing test infrastructure
- Timeboxed test setup (3-5 turn limit)
- Mandate for exhaustive validation against regressions
- Validation back-off mechanism (retry threshold)
- Detection of circular/looping behavior
2026-03-03 00:50:59 -08:00
Alisa Novikova
0b06a9ae04 feat(core): implement circular behavior detection mandate
Adds a self-awareness mandate to the agent's planning phase:
'Before attempting a fix for a validation error, review your recent tool calls. If you are repeatedly applying similar regex replacements or edits to the same block of code without the validation error changing, you are in a loop. Stop, revert your changes to a known good state, and rethink your approach.'

This helps the agent identify and break out of unproductive loops during debugging and implementation.
2026-03-03 00:50:59 -08:00
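The loop check quoted in 0b06a9ae04 can be illustrated with a minimal sketch. It is not code from this repository: the `EditAttempt` shape, the `isLooping` name, and the window of three attempts are all assumptions made for illustration. The idea is that an agent is probably looping when its recent edits keep landing on the same block of one file while the validation error stays identical.

```typescript
// Hypothetical record of one edit attempt; not a type from the repository.
interface EditAttempt {
  file: string;
  startLine: number;
  endLine: number;
  validationError: string; // error text observed after the edit was applied
}

// True when both attempts touch the same file and their line ranges intersect.
function rangesOverlap(a: EditAttempt, b: EditAttempt): boolean {
  return a.file === b.file && a.startLine <= b.endLine && b.startLine <= a.endLine;
}

// Returns true when each of the last `window` attempts overlaps the first
// attempt in that window and reports the same validation error.
function isLooping(history: EditAttempt[], window: number = 3): boolean {
  if (window <= 0 || history.length < window) return false;
  const recent = history.slice(-window);
  const first = recent[0];
  return recent.every(
    (attempt) =>
      rangesOverlap(first, attempt) &&
      attempt.validationError === first.validationError,
  );
}
```

A caller would append one `EditAttempt` per edit and, when `isLooping` returns true, revert to a known good state instead of editing again.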
Alisa Novikova
5ede571439 feat(core): implement validation back-off mechanism
Adds a strict retry threshold to the agent's validation loop:
'If validation fails 3 times on the exact same test or error, DO NOT attempt another minor code tweak. You must immediately step back, use search tools to gather wider context, and formulate a completely new strategy.'

This prevents the agent from getting stuck in repetitive, unsuccessful minor tweaks and encourages a more strategic approach when initial fixes fail.
2026-03-03 00:50:59 -08:00
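The retry threshold described in 5ede571439 is a prompt-level instruction, but the same rule can be sketched as a small counter. Everything below is an assumption made for illustration: the `ValidationBackoff` class, the `NextStep` labels, and the use of a string signature to identify "the exact same test or error" are hypothetical names, not part of the codebase.

```typescript
// Hypothetical labels for what the agent should do after a failed validation.
type NextStep = "tweak" | "rethink";

class ValidationBackoff {
  // Failure counts keyed by a signature such as a test name or error text.
  private readonly failures = new Map<string, number>();
  private readonly threshold: number;

  constructor(threshold: number = 3) {
    this.threshold = threshold;
  }

  // Records one failure and reports whether another minor tweak is allowed.
  recordFailure(signature: string): NextStep {
    const count = (this.failures.get(signature) ?? 0) + 1;
    this.failures.set(signature, count);
    return count >= this.threshold ? "rethink" : "tweak";
  }

  // Clears the counter once the test or error no longer fails.
  recordSuccess(signature: string): void {
    this.failures.delete(signature);
  }
}
```

With the default threshold, the third failure on one signature returns `"rethink"`, which corresponds to stepping back and gathering wider context rather than attempting a fourth small edit.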
Alisa Novikova
1df5178800 feat(core): prioritize existing test infrastructure and timebox test setup
Introduces three critical mandates to the agent's testing and validation workflow:
1. **Prioritize Existing Infrastructure:** Strictly prefer running the project's existing test suite over writing custom reproduction scripts to avoid environment/import difficulties.
2. **Timebox Test Setup:** Abandon custom reproduction scripts if they fail to set up within 3-5 turns due to environment or import errors, falling back to static analysis and built-in tests.
3. **Mandate Exhaustive Validation:** Explicitly requires running relevant existing project tests to prevent regressions, ensuring a passing custom reproduction script is treated as a necessary but not sufficient condition for completion.

These changes prevent 'Early Exhaustion' by reducing the complexity of standalone test setup in frameworks like Django.
2026-03-03 00:50:59 -08:00
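The second mandate in 1df5178800 (timeboxing custom reproduction scripts) can be sketched as a turn budget. This covers only the timebox rule, and all names here (`SetupOutcome`, `Strategy`, `chooseStrategy`) are hypothetical. The default budget of 5 turns is an assumption that takes the upper end of the "3-5 turns" range in the commit message.

```typescript
// Hypothetical outcome of one turn spent setting up a custom repro script.
type SetupOutcome = "ok" | "environment_error" | "import_error";

// Hypothetical labels for the two approaches the commit message contrasts.
type Strategy = "custom_repro" | "existing_tests_and_static_analysis";

// Decides whether to keep using the custom script or fall back.
function chooseStrategy(
  setupAttempts: SetupOutcome[],
  maxTurns: number = 5,
): Strategy {
  // Only attempts made inside the budget count as a successful setup.
  const withinBudget = setupAttempts.slice(0, maxTurns);
  if (withinBudget.some((outcome) => outcome === "ok")) {
    return "custom_repro";
  }
  // Budget exhausted without a working setup: abandon the custom script.
  if (setupAttempts.length >= maxTurns) {
    return "existing_tests_and_static_analysis";
  }
  // Budget remains, so the custom script may still be attempted.
  return "custom_repro";
}
```

The fallback label mirrors the commit's wording: once the budget is spent, the agent relies on static analysis and the project's built-in tests.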
Alisa Novikova
616062bdf6 feat(core): implement self-validation workflow with exact verbatim restoration
This commit upgrades the agent with a robust self-validation workflow while
ensuring 100% verbatim coverage of the original system prompt text.
By moving to an additive model, we preserve all original reasoning anchors,
instructional lead-ins, and senior engineering heuristics while injecting
critical autonomous mandates.

Verbatim Restoration:
- All 'Context Efficiency' guidelines, lead-ins, and scenarios (Search/Understand/Navigate).
- All 'Engineering Standards' regarding style mimicry, abstractions, and debt isolation.
- Full 'Primary Workflows' sequence and formatting.

Self-Validation Workflow Injections:
- Research Phase: Parallel Discovery (manifests + logic) and High-Signal Grep.
- Bug Fixing: Negative Verification (confirming repro failure) and Coverage Expansion.
- Implementation: Transactional Edits (logical batching of module changes).
- Validation Loop: Tiered Validation (Fixers -> Fast-Path -> Related Tests) and Smart Log Navigation (Tail-First).

Technical Verification:
- Verified against 67 core prompt tests and 14 snapshots.
- New behavioral eval suite passed (evals/self_validation_workflow.eval.ts).
- Full 'npm run preflight' successful.
2026-03-03 00:50:59 -08:00
Alisa Novikova
c3215aed93 feat(core): implement self-validation workflow with prompt-verbatim restoration
This commit upgrades the agent with a robust self-validation workflow while
ensuring 100% semantic and verbatim coverage of the original system prompt.
By moving to an additive model, we preserve the original reasoning anchors
(lead-ins, heuristics, and formatting) while injecting critical autonomous
engineering mandates.

Self-Validation Workflow Injections:
- Research Phase: Parallel Discovery (combining manifests/logic) and High-Signal Grep.
- Bug Fixing: Negative Verification (confirming repro failure) and Coverage Expansion.
- Implementation: Transactional Edits (logical batching of module changes).
- Validation Loop: Tiered Validation (Fixers -> Fast-Path -> Related Tests) and Smart Log Navigation.

Technical Verification:
- Verbatim restoration verified against 66 core tests and 14 snapshots.
- New behavioral eval suite passed (evals/self_validation_workflow.eval.ts).
- Full 'npm run preflight' validation successful.
2026-03-03 00:50:59 -08:00
Alisa Novikova
61b35ff745 feat(core): comprehensive agent self-validation and engineering mandates
Major upgrade to the agent's self-validation, safety, and project integrity
capabilities through five iterations of system prompt enhancements:

Workflow & Quality Mandates:
1. Incremental Validation: Mandates building, linting, and testing after
   every significant file change to maintain a "green" state.
2. Mandatory Reproduction: Requires creating a failing test case to confirm
   a bug before fixing, and explicitly verifying the failure (Negative Verification).
3. Test Persistence & Locality: Requires integrating repro cases into the
   permanent test suite, preferably by amending existing related test files.
4. Script Discovery: Mandates identifying project-specific validation
   commands from configuration files (package.json, Makefile, etc.).
5. Self-Review: Mandates running `git diff` after every edit, using
   `--name-only` for large changes to preserve context window tokens.
6. Fast-Path Validation: Prioritizes lightweight checks (e.g., `tsc --noEmit`)
   for frequent feedback, reserving heavy builds for final verification.
7. Output Verification: Requires checking command output (not just exit codes)
   to prevent false-positives from empty test runs or hidden warnings.

Semantic Integrity & Dependency Safety:
8. Global Usage Discovery: Mandates searching the entire workspace for all
   usages (via `grep_search`) before modifying exported symbols or APIs.
9. Dependency Integrity: Requires verifying that new imports are explicitly
   declared in the project's dependency manifest (e.g., package.json).
10. Configuration Sync: Mandates updating build/environment configs
    (tsconfig, Dockerfile, etc.) to support new file types or entry points.
11. Documentation Sync: Requires searching for and updating documentation
    references when public APIs or CLI interfaces change.
12. Anti-Silencing Mandate: Prohibits using `any`, `@ts-ignore`, or lint
    suppressions to resolve validation errors.

Diagnostics, Safety & Runtime Verification:
13. Error Grounding: Mandates reading full error logs and stack traces upon
    failure. Includes Smart Log Navigation to prioritize the tail of large files.
14. Scope Isolation: Instructs the agent to focus only on errors introduced
    by its changes and ignore unrelated legacy technical debt.
15. Destructive Safety: Mandates a `git status` check before deleting files
    or modifying critical project configurations.
16. Non-Blocking Smoke Tests: Requires briefly running applications to
    verify boot stability, using background/timeout strategies for servers.

Includes 15 new behavioral evaluations verifying these mandates and updated
snapshots in packages/core/src/core/prompts.test.ts.
2026-03-03 00:50:59 -08:00
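Item 13 of 61b35ff745 mentions Smart Log Navigation, which prioritizes the tail of large log files. A minimal sketch of tail-first reading follows. The `tailLines` helper and its default of 50 lines are assumptions for illustration and do not come from the repository.

```typescript
// Returns at most `maxLines` lines from the end of a log, oldest first.
function tailLines(log: string, maxLines: number = 50): string[] {
  if (maxLines <= 0) return [];
  const lines = log.split(/\r?\n/);
  // A trailing newline yields one empty final element; drop it.
  if (lines.length > 0 && lines[lines.length - 1] === "") {
    lines.pop();
  }
  return lines.slice(-maxLines);
}
```

Build and test tools often print the failing assertion or a summary near the end of their output, so reading the tail first tends to surface the relevant error without loading the whole file into the context window.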
Aswin Ashok
0d69f9f7fa Build binary (#18933)
Co-authored-by: Gal Zahavi <38544478+galz10@users.noreply.github.com>
2026-03-03 01:02:19 +00:00
Sandy Tao
2e7722d6a3 fix(core): restrict "System: Please continue" invalid stream retry to Gemini 2 models (#20897) 2026-03-02 23:21:13 +00:00
Yuna Seol
69e15a50d1 fix(core): skip telemetry logging for AbortError exceptions (#19477)
Co-authored-by: Yuna Seol <yunaseol@google.com>
2026-03-02 23:14:31 +00:00
Christian Gunderman
3f7ef816f1 fix(core): increase default headers timeout to 5 minutes (#20890) 2026-03-02 22:36:58 +00:00
Jerop Kipruto
d05ba11a31 refactor(core): replace manual syncPlanModeTools with declarative policy rules (#20596) 2026-03-02 22:30:50 +00:00
Allen Hutchison
bb6d1a2775 feat(core): add tool name validation in TOML policy files (#19281)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-02 21:47:21 +00:00
Nayana Parameswarappa
dd9ccc9807 Adding MCPOAuthProvider implementing the MCPSDK OAuthClientProvider (#20121) 2026-03-02 21:37:44 +00:00
Sandy Tao
18d0375a7f feat(core): support authenticated A2A agent card discovery (#20622)
Co-authored-by: Adam Weidman <adamfweidman@google.com>
Co-authored-by: Adam Weidman <65992621+adamfweidman@users.noreply.github.com>
2026-03-02 21:29:31 +00:00
Abhi
b7a8f0d1f9 fix(core): ensure subagents use qualified MCP tool names (#20801) 2026-03-02 21:12:13 +00:00
Christian Gunderman
7ca3a33f8b Subagent activity UX. (#17570) 2026-03-02 21:04:31 +00:00
Sandy Tao
ce5a2d0760 feat(core): truncate large MCP tool output (#19365) 2026-03-02 21:01:49 +00:00
Aishanee Shah
659301ff83 feat(core): centralize read_file limits and update gemini-3 description (#20619) 2026-03-02 20:11:58 +00:00
Sandy Tao
446a4316c4 feat(core): implement HTTP authentication support for A2A remote agents (#20510)
Co-authored-by: Adam Weidman <adamfweidman@google.com>
2026-03-02 19:59:48 +00:00
Adib234
2e1efaebe4 fix(plan): deflake plan mode integration tests (#20477) 2026-03-02 19:51:44 +00:00
Sandy Tao
7c9fceba7f fix(core): reduce LLM-based loop detection false positives (#20701) 2026-03-02 19:08:15 +00:00
Adam Weidman
740efa2ac2 Merge User and Agent Card Descriptions #20849 (#20850) 2026-03-02 17:59:29 +00:00
Sandy Tao
a153ff587b refactor(core): Extract tool parameter names as constants (#20460) 2026-02-28 21:27:54 +00:00
N. Taylor Mullen
cd3a8c3f07 fix(cli): reset themeManager between tests to ensure isolation (#20598) 2026-02-28 19:45:31 +00:00
kartik
b2214a6676 fix: acp/zed race condition between MCP initialisation and prompt (#20205)
Signed-off-by: Kartik Angiras <angiraskartik@gmail.com>
2026-02-28 17:33:08 +00:00
Sehoon Shon
a1367e9cdd fix(core): parse raw ASCII buffer strings in Gaxios errors (#20626) 2026-02-27 23:57:32 +00:00
nityam
ba149afa0b fix: merge duplicate imports in a2a-server package (2/4) (#19781) 2026-02-27 21:13:30 +00:00
Abhi
966b9059d0 feat(core): enable contiguous parallel admission for Kind.Agent tools (#20583) 2026-02-27 21:08:10 +00:00
Spencer
20d884da2f fix(core): reduce intrusive MCP errors and deduplicate diagnostics (#20232) 2026-02-27 20:04:36 +00:00
Gaurav
ea48bd9414 feat: better error messages (#20577)
Co-authored-by: Gal Zahavi <38544478+galz10@users.noreply.github.com>
2026-02-27 18:18:16 +00:00
Gaurav
b2d6844f9b feat(billing): implement G1 AI credits overage flow with billing telemetry (#18590) 2026-02-27 18:15:06 +00:00
Sehoon Shon
fdd844b405 fix(core): disable retries for code assist streaming requests (#20561) 2026-02-27 18:04:43 +00:00
Adib234
23905bcd77 fix(plan): prevent agent from using ask_user for shell command confirmation (#20504) 2026-02-27 17:51:47 +00:00
Sehoon Shon
e709789067 fix(core): handle optional response fields from code assist API (#20345) 2026-02-27 16:52:37 +00:00
Abhijit Balaji
32e777f838 fix(core): revert auto-save of policies to user space (#20531) 2026-02-27 16:03:36 +00:00
Pyush Sinha
d7320f5425 refactor(core,cli): useAlternateBuffer read from config (#20346)
Co-authored-by: Jacob Richman <jacob314@gmail.com>
2026-02-27 15:55:02 +00:00
Adib234
25ade7bcb7 feat(plan): update planning workflow to encourage multi-select with descriptions of options (#20491) 2026-02-27 15:42:37 +00:00
christine betts
58df1c6237 Fix extension MCP server env var loading (#20374) 2026-02-27 14:49:10 +00:00
Bryan Morgan
522e95439c fix(core): apply retry logic to CodeAssistServer for all users (#20507) 2026-02-27 09:26:53 -05:00
christine betts
e17f927a69 Add support for policy engine in extensions (#20049)
Co-authored-by: Jerop Kipruto <jerop@google.com>
2026-02-27 03:29:33 +00:00
heaventourist
b1befee8fb feat(telemetry) Instrument traces with more attributes and make them available to OTEL users (#20237)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Jerop Kipruto <jerop@google.com>
Co-authored-by: MD. MOHIBUR RAHMAN <35300157+mrpmohiburrahman@users.noreply.github.com>
Co-authored-by: Jeffrey Ying <jeffrey.ying86@live.com>
Co-authored-by: Bryan Morgan <bryanmorgan@google.com>
Co-authored-by: joshualitt <joshualitt@google.com>
Co-authored-by: Dev Randalpura <devrandalpura@google.com>
Co-authored-by: Google Admin <github-admin@google.com>
Co-authored-by: Ben Knutson <benknutson@google.com>
2026-02-27 02:26:16 +00:00
Tommaso Sciortino
4b7ce1fe67 Avoid overaggressive unescaping (#20520) 2026-02-27 01:50:21 +00:00
Siddharth Diwan
9b7852f11c [Gemma x Gemini CLI] Add an Experimental Gemma Router that uses a LiteRT-LM shim into the Composite Model Classifier Strategy (#17231)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Allen Hutchison <adh@google.com>
2026-02-26 23:43:43 +00:00
Bryan Morgan
6dc9d5ff11 feat(core): increase fetch timeout and fix [object Object] error stringification (#20441)
Co-authored-by: Gal Zahavi <38544478+galz10@users.noreply.github.com>
2026-02-26 23:41:09 +00:00
Jerop Kipruto
aa98cafca7 feat(plan): adapt planning workflow based on complexity of task (#20465)
Co-authored-by: Gal Zahavi <38544478+galz10@users.noreply.github.com>
2026-02-26 22:58:19 +00:00
krishdef7
f700c923d9 fix(core): flush transcript for pure tool-call responses to ensure BeforeTool hooks see complete state (#20419)
Co-authored-by: Bryan Morgan <bryanmorgan@google.com>
2026-02-26 22:39:36 +00:00
Sehoon Shon
edb1fdea30 fix(cli): support quota error fallbacks for all authentication types (#20475)
Co-authored-by: Gal Zahavi <38544478+galz10@users.noreply.github.com>
2026-02-26 22:39:25 +00:00
Adam Weidman
10c5bd8ce9 feat(core): improve A2A content extraction (#20487)
Co-authored-by: Gal Zahavi <38544478+galz10@users.noreply.github.com>
2026-02-26 22:38:30 +00:00