cocosheng-g
fe50e580fa
fix(ci): isolate workflow evals, revert unrelated changes, and fix aggregation
2026-02-03 23:04:50 -05:00
cocosheng-g
9da1542071
feat(ci): isolate workflow evals into independent nightly job
...
- Splits 'Evals: Nightly' into 'evals' (general capabilities) and 'workflow-evals' (specific workflow simulations).
- 'workflow-evals' runs only on 'gemini-2.5-pro' (the target model).
- 'evals' excludes workflow tests to prevent noise/skewed metrics on other models.
- Removes code-level 'targetModels' restrictions in favor of CI configuration.
- Updates aggregation script to handle skipped tests correctly (though exclusion avoids them).
2026-02-03 22:37:15 -05:00
cocosheng-g
d23499db90
fix(scripts): exclude skipped tests from pass rate calculation
...
Ensures that models skipping irrelevant tests (like workflow evals) don't suffer a drop in reported pass rate.
2026-02-03 21:56:37 -05:00
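A minimal sketch of the pass-rate rule described in d23499db90, assuming a hypothetical result shape (EvalResult, EvalStatus, and passRate are illustrative names; the actual aggregation script is not shown in this log). Skipped tests are dropped from both the numerator and the denominator, so a model that skips workflow evals is scored only on the tests it ran.

```typescript
// Hypothetical result shape; the real aggregation script's types are not shown in the log.
type EvalStatus = 'passed' | 'failed' | 'skipped';

interface EvalResult {
  name: string;
  status: EvalStatus;
}

// Pass rate over tests that actually ran. Skipped tests are excluded from
// both the numerator and the denominator.
function passRate(results: EvalResult[]): number | null {
  const executed = results.filter((r) => r.status !== 'skipped');
  if (executed.length === 0) {
    return null; // nothing ran, so there is no rate to report
  }
  const passed = executed.filter((r) => r.status === 'passed').length;
  return passed / executed.length;
}

// Illustrative data: one pass, one failure, one skipped workflow eval.
const sample: EvalResult[] = [
  { name: 'general-1', status: 'passed' },
  { name: 'general-2', status: 'failed' },
  { name: 'workflow-triage', status: 'skipped' },
];

const rate = passRate(sample); // 1 of 2 executed tests passed
```

Counting the skipped test as a failure would report 1 of 3 instead, which is the drop the commit message says it avoids.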
cocosheng-g
8c82777ea0
fix(evals): use direct path to tsx binary in dedup mocks
...
Avoids reliance on 'npx' which might be flaky or prompt for installation in CI environments.
2026-02-03 21:39:16 -05:00
cocosheng-g
b4762750e2
fix(dedup): clarify tool sequencing in prompts to fix regression
...
Explicitly instruct the model to call the domain tool first, then echo the output, to avoid it skipping the tool call.
2026-02-03 21:24:55 -05:00
cocosheng-g
36ce66933e
chore: remove old eval files moved to workflows/
2026-02-03 21:03:24 -05:00
cocosheng-g
ff4e816a70
refactor(evals): isolate workflow evals and target specific models
2026-02-03 21:02:55 -05:00
cocosheng-g
a3b7f5d5e1
fix(dedup): clarify tool usage for logging output
2026-02-03 20:34:42 -05:00
cocosheng-g
d9a0e9b9c5
fix(triage): tune prompt to return json text and harden eval json extraction
2026-02-03 19:55:00 -05:00
cocosheng-g
5c2f477adf
feat(evals): add comprehensive workflow evaluations and tune prompts (Issue #219)
...
- Established evals for all agent workflows (triage, dedup, refresh).
- Refactored all evals to use modern --output-format=json flag for robust validation.
- Tuned prompts for strict JSON compliance and corrected spam handling in scheduled triage.
- Expanded edge case coverage for false positives, security leaks, and mixed batches.
2026-02-03 19:31:56 -05:00
cocosheng-g
2daee0d066
feat(evals): add more edge case tests
2026-02-03 19:31:56 -05:00
cocosheng-g
aa4b1c0056
fix(evals): address robustness feedback
2026-02-03 19:31:56 -05:00
cocosheng-g
9f8f31cce9
fix(evals): address review feedback on triage tests
2026-02-03 19:31:56 -05:00
cocosheng-g
259a3e7891
fix(workflows): tune triage prompt and add robustness evals
2026-02-03 19:31:56 -05:00
Jack Wotherspoon
d1cde575d9
fix: remove ask_user tool from non-interactive modes (#18154)
2026-02-03 23:41:36 +00:00
Gal Zahavi
71f46f1160
fix: enforce folder trust for workspace settings, skills, and context (#17596)
...
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-03 22:53:31 +00:00
Sehoon Shon
d63c34b6e1
feat(ui): move user identity display to header (#18216)
2026-02-03 21:51:21 +00:00
Shreya Keshive
7a6dfa3704
fix(sandbox): propagate GOOGLE_GEMINI_BASE_URL & GOOGLE_VERTEX_BASE_URL env vars (#18231)
2026-02-03 21:40:41 +00:00
Emily Hedlund
69f8273481
feat(core): require user consent before MCP server OAuth (#18132)
2026-02-03 21:26:00 +00:00
Shreya Keshive
1fc59484b1
feat(admin): add support for MCP configuration via admin controls (pt1) (#18223)
2026-02-03 21:19:14 +00:00
Adib234
53027af94c
Add telemetry to rewind (#18122)
2026-02-03 21:17:29 +00:00
Jack Wotherspoon
69c0585ab2
feat: Add markdown rendering to ask_user tool (#18211)
2026-02-03 21:04:38 +00:00
christine betts
3e954930f1
Fix handling of empty settings (#18131)
2026-02-03 20:39:20 +00:00
christine betts
2cf3a14439
Reload skills when extensions change (#18225)
2026-02-03 20:31:14 +00:00
Adam Weidman
0f918f0cc8
feat(a2a): Add pluggable auth provider infrastructure (#17934)
2026-02-03 20:22:07 +00:00
David Pierce
75dbf9022c
A2a admin setting (#17868)
2026-02-03 20:16:20 +00:00
Jerop Kipruto
675ca07c8b
chore(core): explicitly state plan storage path in prompt (#18222)
2026-02-03 19:55:43 +00:00
Adam Weidman
40f505e257
docs: document GEMINI_CLI_HOME environment variable (#18219)
2026-02-03 19:49:37 +00:00
Serghei
5407878138
fix(core): Respect user's .gitignore preference (#15482)
...
Co-authored-by: Gaurav <39389231+gsquared94@users.noreply.github.com>
2026-02-03 19:37:21 +00:00
christine betts
14e2e09d10
Match on extension ID when stopping extensions (#18218)
2026-02-03 19:29:15 +00:00
Adib234
0365f13caa
feat(plan): use custom deny messages in plan mode policies (#18195)
2026-02-03 19:23:22 +00:00
Coco Sheng
3183e4137a
fix(test): improve test isolation and enable subagent evaluations (#18138)
2026-02-03 19:05:26 +00:00
Jerop Kipruto
4aa295994d
feat(plan): add exit_plan_mode ui and prompt (#18162)
2026-02-03 18:04:07 +00:00
Sehoon Shon
e1bd1d239f
Set default max attempts to 3 and use the common variable (#18209)
2026-02-03 17:47:13 +00:00
Adam Weidman
b84585d0c8
feat(core): Add A2A auth config types (#18205)
2026-02-03 17:44:22 +00:00
Bryan Morgan
0e7944df4f
refactor: localize ACP error parsing logic to cli package (#18193)
2026-02-03 16:32:20 +00:00
christine betts
d8837ec95e
Remove MCP servers on extension uninstall (#18121)
...
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-03 16:18:09 +00:00
Adam Weidman
dc37b190fa
refactor(core): robust trimPreservingTrailingNewline and regression test (#18196)
2026-02-03 15:51:09 +00:00
Alexander Farber
19b1a74c99
feat(core): add draft-2020-12 JSON Schema support with lenient fallback (#15060)
...
Co-authored-by: A.K.M. Adib <adibakm@google.com>
Co-authored-by: Jack Wotherspoon <jackwoth@google.com>
2026-02-03 15:49:08 +00:00
Adib234
1d045792ce
Add link to rewind doc in commands.md (#17961)
2026-02-03 15:06:33 +00:00
matt korwel
a8b4c38c89
chore(core): reassign telemetry keys to avoid server conflict (#18161)
...
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-03 06:43:57 -06:00
N. Taylor Mullen
ad8796b02d
feat(core): add .agents/skills directory alias for skill discovery (#18151)
2026-02-03 06:07:36 +00:00
Bryan Morgan
e7bfd2bf83
fix(cli): resolve environment loading and auth validation issues in ACP mode (#18025)
...
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-03 05:54:10 +00:00
Gaurav
1b274b081d
fix(core): prioritize detailed error messages for code assist setup (#17852)
2026-02-03 04:27:55 +00:00
Sandy Tao
5b254c379c
feat(core): rename search_file_content tool to grep_search and add legacy alias (#18003)
2026-02-03 04:18:24 +00:00
Jerop Kipruto
ed26ea49e9
feat(plan): add core logic and exit_plan_mode tool definition (#18110)
2026-02-03 03:30:03 +00:00
Adib234
01e33465bd
feat(plan): handle inconsistency in schedulers (#17813)
2026-02-03 03:10:04 +00:00
Jack Wotherspoon
18cce6a9ab
fix: improve Ctrl+R reverse search (#18075)
2026-02-03 01:03:28 +00:00
Gal Zahavi
18d7d1a92c
feat: update review-frontend-and-fix slash command to review-and-fix (#18146)
2026-02-03 00:44:20 +00:00
Shreya Keshive
0dd0b83612
fix(ide): no-op refactoring that moves the connection logic to helper functions (#18118)
2026-02-03 00:42:29 +00:00