Commit Graph

4421 Commits

Author SHA1 Message Date
cocosheng-g
fe50e580fa fix(ci): isolate workflow evals, revert unrelated changes, and fix aggregation 2026-02-03 23:04:50 -05:00
cocosheng-g
9da1542071 feat(ci): isolate workflow evals into independent nightly job
- Splits 'Evals: Nightly' into 'evals' (general capabilities) and 'workflow-evals' (specific workflow simulations).
- 'workflow-evals' runs only on 'gemini-2.5-pro' (the target model).
- 'evals' excludes workflow tests to prevent noise/skewed metrics on other models.
- Removes code-level 'targetModels' restrictions in favor of CI configuration.
- Updates aggregation script to handle skipped tests correctly (though exclusion avoids them).
2026-02-03 22:37:15 -05:00
cocosheng-g
d23499db90 fix(scripts): exclude skipped tests from pass rate calculation
Ensures that models skipping irrelevant tests (like workflow evals) don't suffer a drop in reported pass rate.
2026-02-03 21:56:37 -05:00
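
The pass-rate fix in d23499db90 can be illustrated with a minimal sketch. The aggregation script itself is not shown in this log, so the `TestResult` shape and the `passRate` helper below are hypothetical names chosen for illustration; only the idea of leaving skipped tests out of the denominator comes from the commit message.

```typescript
// Hypothetical result shape; the real aggregation script's types are not shown in this log.
type TestStatus = 'passed' | 'failed' | 'skipped';

interface TestResult {
  name: string;
  status: TestStatus;
}

// Pass rate over the tests that actually ran. Skipped tests are left out of
// both numerator and denominator. Returns null when nothing ran, so an
// all-skipped run is not reported as 0% or 100%.
function passRate(results: TestResult[]): number | null {
  const executed = results.filter((r) => r.status !== 'skipped');
  if (executed.length === 0) {
    return null;
  }
  const passed = executed.filter((r) => r.status === 'passed').length;
  return passed / executed.length;
}

// Example data (invented): 3 passed, 1 failed, 2 skipped.
const sample: TestResult[] = [
  { name: 'a', status: 'passed' },
  { name: 'b', status: 'passed' },
  { name: 'c', status: 'passed' },
  { name: 'd', status: 'failed' },
  { name: 'e', status: 'skipped' },
  { name: 'f', status: 'skipped' },
];

const sampleRate = passRate(sample); // 3 of 4 executed tests passed
```

With skipped tests counted in the denominator, the same data would give 3/6 instead of 3/4, which is the drop in reported pass rate that the commit message describes.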
cocosheng-g
8c82777ea0 fix(evals): use direct path to tsx binary in dedup mocks
Avoids reliance on 'npx' which might be flaky or prompt for installation in CI environments.
2026-02-03 21:39:16 -05:00
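
One way to call a locally installed binary without going through 'npx' is sketched below. This is an assumption about the general approach, not the code from 8c82777ea0: the `localBinPath` helper and the `/repo` root are hypothetical, and the sketch relies only on npm's convention of linking package executables into `node_modules/.bin`.

```typescript
import * as path from 'node:path';

// Hypothetical helper: builds the path to a locally installed executable.
// npm links package binaries into <project>/node_modules/.bin, so this path
// exists once the package is installed and never triggers an install prompt.
// (On Windows the shim in that directory is named <binName>.cmd.)
function localBinPath(projectRoot: string, binName: string): string {
  return path.join(projectRoot, 'node_modules', '.bin', binName);
}

// Example with an invented project root.
const tsxPath = localBinPath('/repo', 'tsx');
```

A mock script would then spawn `tsxPath` directly, so the run fails fast if the dependency is missing rather than waiting on a download or prompt.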
cocosheng-g
b4762750e2 fix(dedup): clarify tool sequencing in prompts to fix regression
Explicitly instruct the model to call the domain tool first, then echo the output, to avoid it skipping the tool call.
2026-02-03 21:24:55 -05:00
cocosheng-g
36ce66933e chore: remove old eval files moved to workflows/ 2026-02-03 21:03:24 -05:00
cocosheng-g
ff4e816a70 refactor(evals): isolate workflow evals and target specific models 2026-02-03 21:02:55 -05:00
cocosheng-g
a3b7f5d5e1 fix(dedup): clarify tool usage for logging output 2026-02-03 20:34:42 -05:00
cocosheng-g
d9a0e9b9c5 fix(triage): tune prompt to return json text and harden eval json extraction 2026-02-03 19:55:00 -05:00
cocosheng-g
5c2f477adf feat(evals): add comprehensive workflow evaluations and tune prompts (Issue #219)
- Established evals for all agent workflows (triage, dedup, refresh).
- Refactored all evals to use modern --output-format=json flag for robust validation.
- Tuned prompts for strict JSON compliance and corrected spam handling in scheduled triage.
- Expanded edge case coverage for false positives, security leaks, and mixed batches.
2026-02-03 19:31:56 -05:00
cocosheng-g
2daee0d066 feat(evals): add more edge case tests 2026-02-03 19:31:56 -05:00
cocosheng-g
aa4b1c0056 fix(evals): address robustness feedback 2026-02-03 19:31:56 -05:00
cocosheng-g
9f8f31cce9 fix(evals): address review feedback on triage tests 2026-02-03 19:31:56 -05:00
cocosheng-g
259a3e7891 fix(workflows): tune triage prompt and add robustness evals 2026-02-03 19:31:56 -05:00
Jack Wotherspoon
d1cde575d9 fix: remove ask_user tool from non-interactive modes (#18154) 2026-02-03 23:41:36 +00:00
Gal Zahavi
71f46f1160 fix: enforce folder trust for workspace settings, skills, and context (#17596)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-03 22:53:31 +00:00
Sehoon Shon
d63c34b6e1 feat(ui): move user identity display to header (#18216) 2026-02-03 21:51:21 +00:00
Shreya Keshive
7a6dfa3704 fix(sandbox): propagate GOOGLE_GEMINI_BASE_URL&GOOGLE_VERTEX_BASE_URL env vars (#18231) 2026-02-03 21:40:41 +00:00
Emily Hedlund
69f8273481 feat(core): require user consent before MCP server OAuth (#18132) 2026-02-03 21:26:00 +00:00
Shreya Keshive
1fc59484b1 feat(admin): add support for MCP configuration via admin controls (pt1) (#18223) 2026-02-03 21:19:14 +00:00
Adib234
53027af94c Add telemetry to rewind (#18122) 2026-02-03 21:17:29 +00:00
Jack Wotherspoon
69c0585ab2 feat: Add markdown rendering to ask_user tool (#18211) 2026-02-03 21:04:38 +00:00
christine betts
3e954930f1 Fix handling of empty settings (#18131) 2026-02-03 20:39:20 +00:00
christine betts
2cf3a14439 Reload skills when extensions change (#18225) 2026-02-03 20:31:14 +00:00
Adam Weidman
0f918f0cc8 feat(a2a): Add pluggable auth provider infrastructure (#17934) 2026-02-03 20:22:07 +00:00
David Pierce
75dbf9022c A2a admin setting (#17868) 2026-02-03 20:16:20 +00:00
Jerop Kipruto
675ca07c8b chore(core): explicitly state plan storage path in prompt (#18222) 2026-02-03 19:55:43 +00:00
Adam Weidman
40f505e257 docs: document GEMINI_CLI_HOME environment variable (#18219) 2026-02-03 19:49:37 +00:00
Serghei
5407878138 fix(core): Respect user's .gitignore preference (#15482)
Co-authored-by: Gaurav <39389231+gsquared94@users.noreply.github.com>
2026-02-03 19:37:21 +00:00
christine betts
14e2e09d10 Match on extension ID when stopping extensions (#18218) 2026-02-03 19:29:15 +00:00
Adib234
0365f13caa feat(plan): use custom deny messages in plan mode policies (#18195) 2026-02-03 19:23:22 +00:00
Coco Sheng
3183e4137a fix(test): improve test isolation and enable subagent evaluations (#18138) 2026-02-03 19:05:26 +00:00
Jerop Kipruto
4aa295994d feat(plan): add exit_plan_mode ui and prompt (#18162) 2026-02-03 18:04:07 +00:00
Sehoon Shon
e1bd1d239f Set default max attempts to 3 and use the common variable (#18209) 2026-02-03 17:47:13 +00:00
Adam Weidman
b84585d0c8 feat(core): Add A2A auth config types (#18205) 2026-02-03 17:44:22 +00:00
Bryan Morgan
0e7944df4f refactor: localize ACP error parsing logic to cli package (#18193) 2026-02-03 16:32:20 +00:00
christine betts
d8837ec95e Remove MCP servers on extension uninstall (#18121)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-03 16:18:09 +00:00
Adam Weidman
dc37b190fa refactor(core): robust trimPreservingTrailingNewline and regression test (#18196) 2026-02-03 15:51:09 +00:00
Alexander Farber
19b1a74c99 feat(core): add draft-2020-12 JSON Schema support with lenient fallback (#15060)
Co-authored-by: A.K.M. Adib <adibakm@google.com>
Co-authored-by: Jack Wotherspoon <jackwoth@google.com>
2026-02-03 15:49:08 +00:00
Adib234
1d045792ce Add link to rewind doc in commands.md (#17961) 2026-02-03 15:06:33 +00:00
matt korwel
a8b4c38c89 chore(core): reassign telemetry keys to avoid server conflict (#18161)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-03 06:43:57 -06:00
N. Taylor Mullen
ad8796b02d feat(core): add .agents/skills directory alias for skill discovery (#18151) 2026-02-03 06:07:36 +00:00
Bryan Morgan
e7bfd2bf83 fix(cli): resolve environment loading and auth validation issues in ACP mode (#18025)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-03 05:54:10 +00:00
Gaurav
1b274b081d fix(core): prioritize detailed error messages for code assist setup (#17852) 2026-02-03 04:27:55 +00:00
Sandy Tao
5b254c379c feat(core): rename search_file_content tool to grep_search and add legacy alias (#18003) 2026-02-03 04:18:24 +00:00
Jerop Kipruto
ed26ea49e9 feat(plan): add core logic and exit_plan_mode tool definition (#18110) 2026-02-03 03:30:03 +00:00
Adib234
01e33465bd feat(plan): handle inconsistency in schedulers (#17813) 2026-02-03 03:10:04 +00:00
Jack Wotherspoon
18cce6a9ab fix: improve Ctrl+R reverse search (#18075) 2026-02-03 01:03:28 +00:00
Gal Zahavi
18d7d1a92c feat: update review-frontend-and-fix slash command to review-and-fix (#18146) 2026-02-03 00:44:20 +00:00
Shreya Keshive
0dd0b83612 fix(ide): no-op refactoring that moves the connection logic to helper functions (#18118) 2026-02-03 00:42:29 +00:00