390 Commits

Author SHA1 Message Date
cocosheng-g
9da1542071 feat(ci): isolate workflow evals into independent nightly job
- Splits 'Evals: Nightly' into 'evals' (general capabilities) and 'workflow-evals' (specific workflow simulations).
- 'workflow-evals' runs only on 'gemini-2.5-pro' (the target model).
- 'evals' excludes workflow tests to prevent noise/skewed metrics on other models.
- Removes code-level 'targetModels' restrictions in favor of CI configuration.
- Updates aggregation script to handle skipped tests correctly (though exclusion avoids them).
2026-02-03 22:37:15 -05:00
cocosheng-g
b4762750e2 fix(dedup): clarify tool sequencing in prompts to fix regression
Explicitly instruct the model to call the domain tool first, then echo the output, to avoid it skipping the tool call.
2026-02-03 21:24:55 -05:00
cocosheng-g
a3b7f5d5e1 fix(dedup): clarify tool usage for logging output 2026-02-03 20:34:42 -05:00
cocosheng-g
d9a0e9b9c5 fix(triage): tune prompt to return json text and harden eval json extraction 2026-02-03 19:55:00 -05:00
cocosheng-g
5c2f477adf feat(evals): add comprehensive workflow evaluations and tune prompts (Issue #219)
- Established evals for all agent workflows (triage, dedup, refresh).
- Refactored all evals to use modern --output-format=json flag for robust validation.
- Tuned prompts for strict JSON compliance and corrected spam handling in scheduled triage.
- Expanded edge case coverage for false positives, security leaks, and mixed batches.
2026-02-03 19:31:56 -05:00
cocosheng-g
259a3e7891 fix(workflows): tune triage prompt and add robustness evals 2026-02-03 19:31:56 -05:00
Bryan Morgan
707b3e85d5 fix(ci): prevent stale PR closer from incorrectly closing new PRs (#18069) 2026-02-01 15:13:29 -05:00
Bryan Morgan
b0f38104d7 fix(workflow): update maintainer check logic to be inclusive and case-insensitive (#18009) 2026-01-30 22:32:58 -05:00
Sehoon Shon
12531a06f8 run npx pointing to the specific commit SHA (#17970)
Co-authored-by: Bryan Morgan <bryanmorgan@google.com>
2026-01-30 19:42:29 +00:00
Bryan Morgan
94f4e027f8 feat(ci): add npx smoke test to verify installability (#17927) 2026-01-30 16:09:59 +00:00
steven semyung oh
6807a2fa28 Add a email privacy note to bug_report template (#17474)
Co-authored-by: Adib234 <30782825+Adib234@users.noreply.github.com>
Co-authored-by: Bryan Morgan <bryanmorgan@google.com>
2026-01-28 13:37:54 +00:00
Bryan Morgan
fd36d65723 chore(workflow): remove redundant label-enforcer workflow (#17460)
just removing this workflow
2026-01-24 16:54:56 -05:00
Bryan Morgan
05e73c4193 feat(workflow): expand stale-exempt labels to include help wanted and Public Roadmap (#17459) 2026-01-24 16:39:15 -05:00
Bryan Morgan
a76c2986c2 feat(workflow): add stale pull request closer with linked-issue enforcement (#17449) 2026-01-24 13:07:56 -05:00
Jacob Richman
1f9f3dd1c2 Fix pr-triage.sh script to update pull requests with tags "help wanted" and "maintainer only" (#17324) 2026-01-23 02:57:21 +00:00
Christian Gunderman
ba8c64459b ci: allow failure in evals-nightly run step (#17319) 2026-01-22 18:23:15 +00:00
Christian Gunderman
982a093791 Enable the ability to queue specific nightly eval tests (#17262) 2026-01-22 03:07:33 +00:00
Bryan Morgan
7399c623d8 fix(github): improve label-workstream-rollup efficiency and fix bugs (#17219) 2026-01-21 12:25:48 -05:00
Bryan Morgan
9e3d36c19b fix(github): improve label-workstream-rollup efficiency with GraphQL (#17217) 2026-01-21 12:10:19 -05:00
Christian Gunderman
c43b04b44c Run evals for all models. (#17123) 2026-01-21 16:38:37 +00:00
Bryan Morgan
8605d0d024 feat(workflows): support recursive workstream labeling and new IDs (#17207) 2026-01-21 11:15:38 -05:00
Bryan Morgan
05c0a8eac3 fix(workflows): use author_association for maintainer check (#17060) 2026-01-20 03:05:15 +00:00
김현수
4b4bdd10b6 fix(automation): fix jq quoting error in pr-triage.sh (#16958)
Co-authored-by: Bryan Morgan <bryanmorgan@google.com>
2026-01-19 21:10:35 +00:00
Sehoon Shon
59da61602f remove need-triage label from bug_report template (#16864) 2026-01-16 19:12:28 +00:00
Bryan Morgan
c6cf3a4234 chore(workflows): rename label-workstream-rollup workflow (#16818) 2026-01-15 23:48:56 -05:00
Bryan Morgan
420a419f5e fix(infra): use GraphQL to detect direct parents in rollup workflow (#16811) 2026-01-15 23:38:27 -05:00
Bryan Morgan
8dde66c0dd fix(infra): update maintainer rollup label to 'workstream-rollup' (#16809) 2026-01-15 21:37:11 -05:00
Bryan Morgan
48fdb9872f fix(automation): robust label enforcement with permission checks (#16762) 2026-01-15 19:53:08 +00:00
Bryan Morgan
2b6bfe4097 feat(automation): enforce '🔒 maintainer only' and fix bot loop (#16751) 2026-01-15 16:40:33 +00:00
Bryan Morgan
d545a3b614 fix(automation): prevent label-enforcer loop by ignoring all bots (#16746)
Co-authored-by: Sehoon Shon <sshon@google.com>
2026-01-15 15:50:36 +00:00
Bryan Morgan
a8631a109e fix(automation): correct status/need-issue label matching wildcard (#16727) 2026-01-15 14:26:00 +00:00
Bryan Morgan
53f54436c9 chore(automation): enforce 'help wanted' label permissions and update guidelines (#16707) 2026-01-15 05:36:58 +00:00
Bryan Morgan
467e869326 chore(automation): ensure status/need-triage is applied and never cleared automatically (#16657) 2026-01-15 01:58:50 +00:00
Patrick Schimpl
b3527dc9e4 chore: update dependabot configuration (#13507)
Co-authored-by: Tommaso Sciortino <sciortino@gmail.com>
2026-01-15 00:02:54 +00:00
Bryan Morgan
b14cf1dc30 chore(automation): improve scheduled issue triage discovery and throughput (#16652) 2026-01-14 21:55:19 +00:00
Bryan Morgan
b3eecc3a50 chore(automation): remove automated PR size and complexity labeler (#16648) 2026-01-14 21:04:55 +00:00
Bryan Morgan
1212161d1d chore(automation): recursive labeling for workstream descendants (#16609)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-14 20:56:16 +00:00
Christian Gunderman
66e7b479ae Aggregate test results. (#16581) 2026-01-14 07:08:05 +00:00
Christian Gunderman
8030404b08 Behavioral evals framework. (#16047) 2026-01-14 04:49:17 +00:00
Bryan Morgan
2306e60be4 perf(workflows): optimize PR triage script for faster execution (#16355)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-12 19:49:20 +00:00
Bryan Morgan
33e3ed0f6c fix(workflows): resolve triage workflow failures and actionlint errors (#16338) 2026-01-10 19:31:17 -05:00
Bryan Morgan
446058cb1c fix: fallback to GITHUB_TOKEN if App ID is missing 2026-01-10 16:46:27 -05:00
Bryan Morgan
d130d99ff0 fix: Add event-driven trigger to issue triage workflow (#16334) 2026-01-10 16:23:41 -05:00
Bryan Morgan
72dae7e0ee Triage action cleanup (#16319) 2026-01-10 19:58:36 +00:00
Wesley Tanaka
f1ca7fa40a ci: guard nightly release workflow from running on forks (#15463)
Co-authored-by: wtanaka.com <wtanaka@users.noreply.github.com>
Co-authored-by: Tommaso Sciortino <sciortino@gmail.com>
2026-01-09 00:17:58 +00:00
Wesley Tanaka
6166d7f6ec ci: guard links workflow from running on forks (#15461)
Co-authored-by: Tommaso Sciortino <sciortino@gmail.com>
2026-01-09 00:16:00 +00:00
sangwook
e51f3e11f1 fix: remove unsupported 'enabled' key from workflow config (#15611)
Co-authored-by: Tommaso Sciortino <sciortino@gmail.com>
2026-01-08 22:45:41 +00:00
Jacob Richman
41cc6cf105 Reduce nags about PRs that reference issues but don't fix them. (#16112)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-08 00:53:03 +00:00
Tommaso Sciortino
dd04b46e86 Fix CI for forks (#16113) 2026-01-08 00:18:33 +00:00
Jacob Richman
bd77515fd9 fix(workflows): fix and limit labels for pr-triage.sh script (#16096)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-07 22:58:42 +00:00