Commit Graph

86 Commits

Author SHA1 Message Date
Mahima Shanware 8d584e4d96 fix(core): handle nested plan files by resolving paths correctly
This fixes a bug where path.basename was incorrectly stripping directory structures from plan files (e.g., trying to write to plans/nested/file.md would incorrectly write to plans/file.md). By using path.resolve and verifying with isSubpath, nested files are now handled securely and correctly.
2026-04-10 18:33:02 +00:00
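The fix described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `resolveWithBasename`, `resolvePlanPath`, and this `isSubpath` implementation are hypothetical stand-ins, and the sketch assumes the requested path is given relative to the plans directory. Only `path.basename`, `path.resolve`, and the name `isSubpath` come from the commit message.

```typescript
import * as path from "node:path";

// Hypothetical stand-in for the isSubpath check named in the commit message:
// true when `child` is `parent` itself or is located somewhere inside it.
function isSubpath(parent: string, child: string): boolean {
  const rel = path.relative(parent, child);
  if (rel === "") return true;
  const escapes = rel === ".." || rel.startsWith(".." + path.sep);
  return !escapes && !path.isAbsolute(rel);
}

// The buggy pattern as the commit describes it: path.basename keeps only the
// final segment, so "nested/file.md" collapses to "file.md".
function resolveWithBasename(plansDir: string, requested: string): string {
  return path.join(plansDir, path.basename(requested));
}

// The corrected pattern: resolve the full requested path against the plans
// directory, then refuse anything that ends up outside that directory.
function resolvePlanPath(plansDir: string, requested: string): string {
  const root = path.resolve(plansDir);
  const target = path.resolve(root, requested);
  if (!isSubpath(root, target)) {
    throw new Error(`Plan path escapes the plans directory: ${requested}`);
  }
  return target;
}

// Usage: the nested directory survives, while traversal attempts are rejected.
const plansDir = path.resolve("plans");
console.log(resolveWithBasename(plansDir, "nested/file.md"));
console.log(resolvePlanPath(plansDir, "nested/file.md"));
try {
  resolvePlanPath(plansDir, "../outside.md");
} catch (err) {
  console.log((err as Error).message);
}
```

Resolving first and checking containment afterwards keeps nested directories intact while still blocking `..` segments and absolute paths, which is the safety property that stripping to a basename was presumably meant to provide.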
Mahima Shanware a99cd0be28 Merge branch 'main' into worktree-con-plan-bug
Resolves conflict in packages/core/src/tools/enter-plan-mode.test.ts by removing an assertion for directory creation, which has been centralized in config.ts in this branch.
2026-04-10 18:00:53 +00:00
Mahima Shanware 74ce3eef0c fix(evals): refine plan mode eval prompt to ensure toolchain completion
Align the 'foo' test prompt with existing project conventions while ensuring the model has the 'informal agreement' signal required to proceed to formal approval and implementation.
2026-04-09 18:13:35 +00:00
Abhi b238a453e3 feat(core): refactor subagent tool to unified invoke_subagent tool (#24489) 2026-04-09 16:48:24 +00:00
Christian Gunderman f1bb2af6de Generalize evals infra to support more types of evals, organization and queuing of named suites (#24941) 2026-04-08 23:57:26 +00:00
Christian Gunderman d2b775f9a7 Add an eval for and fix unsafe cloning behavior. (#24457) 2026-04-07 03:17:44 +00:00
Christian Gunderman 8f131ffef7 Fix issue where topic headers can be posted back to back (#24759) 2026-04-06 18:36:22 +00:00
Samee Zahid 4fb3790051 feat(core): discourage update topic tool for simple tasks (#24640)
Co-authored-by: Samee Zahid <sameez@google.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-04-03 23:52:24 +00:00
Coco Sheng f510394721 Implement background process monitoring and inspection tools (#23799) 2026-04-02 15:01:00 +00:00
Alisa 973092df50 feat: implement high-signal PR regression check for evaluations (#23937) 2026-04-02 05:14:43 +00:00
anj-s 43cf63e189 fix: update task tracker storage location in system prompt (#24034) 2026-04-01 18:29:09 +00:00
Jerop Kipruto ca43f8c291 feat(core): prioritize discussion before formal plan approval (#24423) 2026-04-01 15:55:47 +00:00
David Pierce 94f9480a3a fix(core): resolve Plan Mode deadlock during plan file creation due to sandbox restrictions (#24047) 2026-03-31 22:06:50 +00:00
ruomeng 07e2053e12 feat(plan): promote planning feature to stable (#24282) 2026-03-31 16:10:13 +00:00
Christian Gunderman 117a2d3844 fix(evals): add update_topic behavioral eval (#24223) 2026-03-30 22:02:53 +00:00
Abhi d9d2ce36f2 test(evals): add comprehensive subagent delegation evaluations (#24132) 2026-03-29 23:13:50 +00:00
Alisa 2e03e3aed5 feat(evals): add reliability harvester and 500/503 retry support (#23626) 2026-03-26 01:48:45 +00:00
Samee Zahid 84f40768a1 feat(evals): centralize test agents into test-utils for reuse (#23616)
Co-authored-by: Samee Zahid <sameez@google.com>
2026-03-24 19:50:48 +00:00
Adib234 bf80e27dbc test(evals): fix overlapping act() deadlock in app-test-helper (#23666) 2026-03-24 19:12:22 +00:00
Christian Gunderman 6b7dc4d822 refactor(core): stop gemini CLI from producing unsafe casts (#23611) 2026-03-24 16:19:59 +00:00
Adib234 dcedc42979 fix(plan): sandbox path resolution in Plan Mode to prevent hallucinations (#22737)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-24 13:19:29 +00:00
Gal Zahavi 36e6445dba feat(sandbox): dynamic macOS sandbox expansion and worktree support (#23301) 2026-03-24 04:48:13 +00:00
Samee Zahid 57a66f5f0d feat(evals): add behavioral evaluations for subagent routing (#23272)
Co-authored-by: Samee Zahid <sameez@google.com>
2026-03-24 01:19:21 +00:00
Sandy Tao f784e192fa eval(save_memory): add multi-turn interactive evals for memoryManager (#23572) 2026-03-23 22:58:55 +00:00
Abhi db14cdf92b feat(skills): add behavioral-evals skill with fixing and promoting guides (#23349) 2026-03-23 21:06:43 +00:00
Abhi 0df9498674 fix(core): refine CliHelpAgent description for better delegation (#23310) 2026-03-21 06:24:37 +00:00
Sandy Tao d3766875f8 fix(evals): remove tool restrictions and add compile-time guards (#23312) 2026-03-21 03:45:33 +00:00
Christian Gunderman 28935d1e6b Retry evals on API error. (#23322) 2026-03-21 02:52:19 +00:00
Alisa fbb17ebf58 Disabling failing test while investigating (#23311) 2026-03-20 22:52:35 +00:00
ruomeng 1725ec346b feat(plan): support plan mode in non-interactive mode (#22670) 2026-03-18 20:00:26 +00:00
Christian Gunderman fe8d93c75a Promote stable tests. (#22253) 2026-03-13 21:32:00 +00:00
Adib234 263b8cd3b3 fix(plan): Fix AskUser evals (#22074) 2026-03-13 13:30:19 +00:00
anj-s 2dd037682c Add behavioral evals for tracker (#20069) 2026-03-10 18:51:54 +00:00
Abhi 4669148a4c feat(core): add concurrency safety guidance for subagent delegation (#17753) (#21278) 2026-03-06 18:09:45 +00:00
Adib234 fe332bbef7 feat(evals): add behavioral evals for ask_user tool (#20620) 2026-03-03 17:51:15 +00:00
Christian Gunderman 25f59a0099 Add some dos and don'ts to behavioral evals README. (#20629) 2026-03-02 23:14:00 +00:00
Christian Gunderman 05ef2eb362 Promote stable tests to CI blocking. (#20581) 2026-02-27 21:08:12 +00:00
Christian Gunderman b2b6092c24 Add slash command for promoting behavioral evals to CI blocking (#20575)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-27 19:11:30 +00:00
Christian Gunderman 514d431049 Demote unreliable test. (#20571) 2026-02-27 16:48:46 +00:00
joshualitt 611d934829 feat(core): Enable generalist agent (#19665) 2026-02-26 16:38:49 +00:00
Sandy Tao 39938000a9 feat(core): rename grep_search include parameter to include_pattern (#20328) 2026-02-26 04:16:21 +00:00
Christian Gunderman 56c8d7e985 Stabilize tests. (#20095) 2026-02-24 00:01:39 +00:00
Alisa 27b7fc04de Search updates (#19482)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-20 17:54:28 +00:00
joshualitt 87f5dd15d6 feat(core): experimental in-progress steering hints (2 of 2) (#19307) 2026-02-18 22:05:50 +00:00
Jerop Kipruto 8f6a711a3a fix(core): clarify plan mode constraints and exit mechanism (#19438) 2026-02-18 20:09:59 +00:00
christine betts 8a8826654c Disable failing eval test (#19455) 2026-02-18 19:27:21 +00:00
N. Taylor Mullen f1aa38b258 test(evals): add behavioral tests for tool output masking (#19172) 2026-02-18 05:07:25 +00:00
Christian Gunderman ce84b3cb5f Use ranged reads and limited searches and fuzzy editing improvements (#19240) 2026-02-17 23:54:08 +00:00
joshualitt 55c628e967 feat(core): experimental in-progress steering hints (1 of 3) (#19008) 2026-02-17 22:59:33 +00:00
N. Taylor Mullen 6eec9f3350 fix(core): Encourage non-interactive flags for scaffolding commands (#18804) 2026-02-15 20:26:59 +00:00