Mahima Shanware
8d584e4d96
fix(core): handle nested plan files by resolving paths correctly
...
This fixes a bug where path.basename stripped the directory structure from plan file paths (e.g., a write to plans/nested/file.md would land in plans/file.md instead). Paths are now resolved with path.resolve and verified with isSubpath, so nested files are handled securely and correctly.
2026-04-10 18:33:02 +00:00
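The fix described in 8d584e4d96 can be illustrated with a minimal sketch. This is not the code from that commit: `resolvePlanPath` is a hypothetical name, and the `isSubpath` body shown here is an assumed implementation built on Node's `path.relative`; only the use of `path.resolve` plus an `isSubpath` check comes from the commit message.

```typescript
import * as path from "node:path";

// Assumed implementation (not taken from the repository): true when
// `child` is located inside `parent`, or is `parent` itself.
function isSubpath(parent: string, child: string): boolean {
  const rel = path.relative(parent, child);
  if (rel === "") return true; // same directory
  const escapes = rel === ".." || rel.startsWith(".." + path.sep);
  // An absolute `rel` can occur on Windows when the drives differ.
  return !escapes && !path.isAbsolute(rel);
}

// Hypothetical helper: resolve a requested plan file against the plans
// directory, keeping nested directories and rejecting escaping paths.
function resolvePlanPath(plansDir: string, requested: string): string {
  const root = path.resolve(plansDir);
  // The buggy approach, path.join(root, path.basename(requested)),
  // would flatten "nested/file.md" to "file.md".
  const target = path.resolve(root, requested);
  if (!isSubpath(root, target)) {
    throw new Error(`Plan path escapes the plans directory: ${requested}`);
  }
  return target;
}
```

Under these assumptions, `resolvePlanPath("plans", "nested/file.md")` keeps the `nested` directory in the result, while `resolvePlanPath("plans", "../secrets.md")` throws instead of writing outside the plans directory.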
Mahima Shanware
a99cd0be28
Merge branch 'main' into worktree-con-plan-bug
...
Resolves conflict in packages/core/src/tools/enter-plan-mode.test.ts by removing an assertion for directory creation, which has been centralized in config.ts in this branch.
2026-04-10 18:00:53 +00:00
Mahima Shanware
74ce3eef0c
fix(evals): refine plan mode eval prompt to ensure toolchain completion
...
Align the 'foo' test prompt with existing project conventions while ensuring the model has the 'informal agreement' signal required to proceed to formal approval and implementation.
2026-04-09 18:13:35 +00:00
Abhi
b238a453e3
feat(core): refactor subagent tool to unified invoke_subagent tool (#24489)
2026-04-09 16:48:24 +00:00
Christian Gunderman
f1bb2af6de
Generalize evals infra to support more types of evals, organization and queuing of named suites (#24941)
2026-04-08 23:57:26 +00:00
Christian Gunderman
d2b775f9a7
Add an eval for and fix unsafe cloning behavior. (#24457)
2026-04-07 03:17:44 +00:00
Christian Gunderman
8f131ffef7
Fix issue where topic headers can be posted back to back (#24759)
2026-04-06 18:36:22 +00:00
Samee Zahid
4fb3790051
feat(core): discourage update topic tool for simple tasks (#24640)
...
Co-authored-by: Samee Zahid <sameez@google.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-04-03 23:52:24 +00:00
Coco Sheng
f510394721
Implement background process monitoring and inspection tools (#23799)
2026-04-02 15:01:00 +00:00
Alisa
973092df50
feat: implement high-signal PR regression check for evaluations (#23937)
2026-04-02 05:14:43 +00:00
anj-s
43cf63e189
fix: update task tracker storage location in system prompt (#24034)
2026-04-01 18:29:09 +00:00
Jerop Kipruto
ca43f8c291
feat(core): prioritize discussion before formal plan approval (#24423)
2026-04-01 15:55:47 +00:00
David Pierce
94f9480a3a
fix(core): resolve Plan Mode deadlock during plan file creation due to sandbox restrictions (#24047)
2026-03-31 22:06:50 +00:00
ruomeng
07e2053e12
feat(plan): promote planning feature to stable (#24282)
2026-03-31 16:10:13 +00:00
Christian Gunderman
117a2d3844
fix(evals): add update_topic behavioral eval (#24223)
2026-03-30 22:02:53 +00:00
Abhi
d9d2ce36f2
test(evals): add comprehensive subagent delegation evaluations (#24132)
2026-03-29 23:13:50 +00:00
Alisa
2e03e3aed5
feat(evals): add reliability harvester and 500/503 retry support (#23626)
2026-03-26 01:48:45 +00:00
Samee Zahid
84f40768a1
feat(evals): centralize test agents into test-utils for reuse (#23616)
...
Co-authored-by: Samee Zahid <sameez@google.com>
2026-03-24 19:50:48 +00:00
Adib234
bf80e27dbc
test(evals): fix overlapping act() deadlock in app-test-helper (#23666)
2026-03-24 19:12:22 +00:00
Christian Gunderman
6b7dc4d822
refactor(core): stop gemini CLI from producing unsafe casts (#23611)
2026-03-24 16:19:59 +00:00
Adib234
dcedc42979
fix(plan): sandbox path resolution in Plan Mode to prevent hallucinations (#22737)
...
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-24 13:19:29 +00:00
Gal Zahavi
36e6445dba
feat(sandbox): dynamic macOS sandbox expansion and worktree support (#23301)
2026-03-24 04:48:13 +00:00
Samee Zahid
57a66f5f0d
feat(evals): add behavioral evaluations for subagent routing (#23272)
...
Co-authored-by: Samee Zahid <sameez@google.com>
2026-03-24 01:19:21 +00:00
Sandy Tao
f784e192fa
eval(save_memory): add multi-turn interactive evals for memoryManager (#23572)
2026-03-23 22:58:55 +00:00
Abhi
db14cdf92b
feat(skills): add behavioral-evals skill with fixing and promoting guides (#23349)
2026-03-23 21:06:43 +00:00
Abhi
0df9498674
fix(core): refine CliHelpAgent description for better delegation (#23310)
2026-03-21 06:24:37 +00:00
Sandy Tao
d3766875f8
fix(evals): remove tool restrictions and add compile-time guards (#23312)
2026-03-21 03:45:33 +00:00
Christian Gunderman
28935d1e6b
Retry evals on API error. (#23322)
2026-03-21 02:52:19 +00:00
Alisa
fbb17ebf58
Disabling failing test while investigating (#23311)
2026-03-20 22:52:35 +00:00
ruomeng
1725ec346b
feat(plan): support plan mode in non-interactive mode (#22670)
2026-03-18 20:00:26 +00:00
Christian Gunderman
fe8d93c75a
Promote stable tests. (#22253)
2026-03-13 21:32:00 +00:00
Adib234
263b8cd3b3
fix(plan): Fix AskUser evals (#22074)
2026-03-13 13:30:19 +00:00
anj-s
2dd037682c
Add behavioral evals for tracker (#20069)
2026-03-10 18:51:54 +00:00
Abhi
4669148a4c
feat(core): add concurrency safety guidance for subagent delegation (#17753) (#21278)
2026-03-06 18:09:45 +00:00
Adib234
fe332bbef7
feat(evals): add behavioral evals for ask_user tool (#20620)
2026-03-03 17:51:15 +00:00
Christian Gunderman
25f59a0099
Add some dos and don'ts to behavioral evals README. (#20629)
2026-03-02 23:14:00 +00:00
Christian Gunderman
05ef2eb362
Promote stable tests to CI blocking. (#20581)
2026-02-27 21:08:12 +00:00
Christian Gunderman
b2b6092c24
Add slash command for promoting behavioral evals to CI blocking (#20575)
...
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-27 19:11:30 +00:00
Christian Gunderman
514d431049
Demote unreliable test. (#20571)
2026-02-27 16:48:46 +00:00
joshualitt
611d934829
feat(core): Enable generalist agent (#19665)
2026-02-26 16:38:49 +00:00
Sandy Tao
39938000a9
feat(core): rename grep_search include parameter to include_pattern (#20328)
2026-02-26 04:16:21 +00:00
Christian Gunderman
56c8d7e985
Stabilize tests. (#20095)
2026-02-24 00:01:39 +00:00
Alisa
27b7fc04de
Search updates (#19482)
...
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-20 17:54:28 +00:00
joshualitt
87f5dd15d6
feat(core): experimental in-progress steering hints (2 of 2) (#19307)
2026-02-18 22:05:50 +00:00
Jerop Kipruto
8f6a711a3a
fix(core): clarify plan mode constraints and exit mechanism (#19438)
2026-02-18 20:09:59 +00:00
christine betts
8a8826654c
Disable failing eval test (#19455)
2026-02-18 19:27:21 +00:00
N. Taylor Mullen
f1aa38b258
test(evals): add behavioral tests for tool output masking (#19172)
2026-02-18 05:07:25 +00:00
Christian Gunderman
ce84b3cb5f
Use ranged reads and limited searches and fuzzy editing improvements (#19240)
2026-02-17 23:54:08 +00:00
joshualitt
55c628e967
feat(core): experimental in-progress steering hints (1 of 3) (#19008)
2026-02-17 22:59:33 +00:00
N. Taylor Mullen
6eec9f3350
fix(core): Encourage non-interactive flags for scaffolding commands (#18804)
2026-02-15 20:26:59 +00:00