cocosheng-g
9da1542071
feat(ci): isolate workflow evals into independent nightly job
- Splits 'Evals: Nightly' into 'evals' (general capabilities) and 'workflow-evals' (specific workflow simulations).
- 'workflow-evals' runs only on 'gemini-2.5-pro' (the target model).
- 'evals' excludes workflow tests to prevent noise/skewed metrics on other models.
- Removes code-level 'targetModels' restrictions in favor of CI configuration.
- Updates aggregation script to handle skipped tests correctly (though exclusion avoids them).
2026-02-03 22:37:15 -05:00
cocosheng-g
b4762750e2
fix(dedup): clarify tool sequencing in prompts to fix regression
Explicitly instruct the model to call the domain tool first, then echo the output, to avoid it skipping the tool call.
2026-02-03 21:24:55 -05:00
cocosheng-g
a3b7f5d5e1
fix(dedup): clarify tool usage for logging output
2026-02-03 20:34:42 -05:00
cocosheng-g
d9a0e9b9c5
fix(triage): tune prompt to return json text and harden eval json extraction
2026-02-03 19:55:00 -05:00
cocosheng-g
5c2f477adf
feat(evals): add comprehensive workflow evaluations and tune prompts (Issue #219)
- Established evals for all agent workflows (triage, dedup, refresh).
- Refactored all evals to use modern --output-format=json flag for robust validation.
- Tuned prompts for strict JSON compliance and corrected spam handling in scheduled triage.
- Expanded edge case coverage for false positives, security leaks, and mixed batches.
2026-02-03 19:31:56 -05:00
cocosheng-g
259a3e7891
fix(workflows): tune triage prompt and add robustness evals
2026-02-03 19:31:56 -05:00
Bryan Morgan
707b3e85d5
fix(ci): prevent stale PR closer from incorrectly closing new PRs (#18069)
2026-02-01 15:13:29 -05:00
Bryan Morgan
b0f38104d7
fix(workflow): update maintainer check logic to be inclusive and case-insensitive (#18009)
2026-01-30 22:32:58 -05:00
Sehoon Shon
12531a06f8
run npx pointing to the specific commit SHA (#17970)
Co-authored-by: Bryan Morgan <bryanmorgan@google.com>
2026-01-30 19:42:29 +00:00
Bryan Morgan
94f4e027f8
feat(ci): add npx smoke test to verify installability (#17927)
2026-01-30 16:09:59 +00:00
steven semyung oh
6807a2fa28
Add an email privacy note to bug_report template (#17474)
Co-authored-by: Adib234 <30782825+Adib234@users.noreply.github.com>
Co-authored-by: Bryan Morgan <bryanmorgan@google.com>
2026-01-28 13:37:54 +00:00
Bryan Morgan
fd36d65723
chore(workflow): remove redundant label-enforcer workflow (#17460)
just removing this workflow
2026-01-24 16:54:56 -05:00
Bryan Morgan
05e73c4193
feat(workflow): expand stale-exempt labels to include help wanted and Public Roadmap (#17459)
2026-01-24 16:39:15 -05:00
Bryan Morgan
a76c2986c2
feat(workflow): add stale pull request closer with linked-issue enforcement (#17449)
2026-01-24 13:07:56 -05:00
Jacob Richman
1f9f3dd1c2
Fix pr-triage.sh script to update pull requests with tags "help wanted" and "maintainer only" (#17324)
2026-01-23 02:57:21 +00:00
Christian Gunderman
ba8c64459b
ci: allow failure in evals-nightly run step (#17319)
2026-01-22 18:23:15 +00:00
Christian Gunderman
982a093791
Enable the ability to queue specific nightly eval tests (#17262)
2026-01-22 03:07:33 +00:00
Bryan Morgan
7399c623d8
fix(github): improve label-workstream-rollup efficiency and fix bugs (#17219)
2026-01-21 12:25:48 -05:00
Bryan Morgan
9e3d36c19b
fix(github): improve label-workstream-rollup efficiency with GraphQL (#17217)
2026-01-21 12:10:19 -05:00
Christian Gunderman
c43b04b44c
Run evals for all models. (#17123)
2026-01-21 16:38:37 +00:00
Bryan Morgan
8605d0d024
feat(workflows): support recursive workstream labeling and new IDs (#17207)
2026-01-21 11:15:38 -05:00
Bryan Morgan
05c0a8eac3
fix(workflows): use author_association for maintainer check (#17060)
2026-01-20 03:05:15 +00:00
김현수
4b4bdd10b6
fix(automation): fix jq quoting error in pr-triage.sh (#16958)
Co-authored-by: Bryan Morgan <bryanmorgan@google.com>
2026-01-19 21:10:35 +00:00
Sehoon Shon
59da61602f
remove need-triage label from bug_report template (#16864)
2026-01-16 19:12:28 +00:00
Bryan Morgan
c6cf3a4234
chore(workflows): rename label-workstream-rollup workflow (#16818)
2026-01-15 23:48:56 -05:00
Bryan Morgan
420a419f5e
fix(infra): use GraphQL to detect direct parents in rollup workflow (#16811)
2026-01-15 23:38:27 -05:00
Bryan Morgan
8dde66c0dd
fix(infra): update maintainer rollup label to 'workstream-rollup' (#16809)
2026-01-15 21:37:11 -05:00
Bryan Morgan
48fdb9872f
fix(automation): robust label enforcement with permission checks (#16762)
2026-01-15 19:53:08 +00:00
Bryan Morgan
2b6bfe4097
feat(automation): enforce '🔒 maintainer only' and fix bot loop (#16751)
2026-01-15 16:40:33 +00:00
Bryan Morgan
d545a3b614
fix(automation): prevent label-enforcer loop by ignoring all bots (#16746)
Co-authored-by: Sehoon Shon <sshon@google.com>
2026-01-15 15:50:36 +00:00
Bryan Morgan
a8631a109e
fix(automation): correct status/need-issue label matching wildcard (#16727)
2026-01-15 14:26:00 +00:00
Bryan Morgan
53f54436c9
chore(automation): enforce 'help wanted' label permissions and update guidelines (#16707)
2026-01-15 05:36:58 +00:00
Bryan Morgan
467e869326
chore(automation): ensure status/need-triage is applied and never cleared automatically (#16657)
2026-01-15 01:58:50 +00:00
Patrick Schimpl
b3527dc9e4
chore: update dependabot configuration (#13507)
Co-authored-by: Tommaso Sciortino <sciortino@gmail.com>
2026-01-15 00:02:54 +00:00
Bryan Morgan
b14cf1dc30
chore(automation): improve scheduled issue triage discovery and throughput (#16652)
2026-01-14 21:55:19 +00:00
Bryan Morgan
b3eecc3a50
chore(automation): remove automated PR size and complexity labeler (#16648)
2026-01-14 21:04:55 +00:00
Bryan Morgan
1212161d1d
chore(automation): recursive labeling for workstream descendants (#16609)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-14 20:56:16 +00:00
Christian Gunderman
66e7b479ae
Aggregate test results. (#16581)
2026-01-14 07:08:05 +00:00
Christian Gunderman
8030404b08
Behavioral evals framework. (#16047)
2026-01-14 04:49:17 +00:00
Bryan Morgan
2306e60be4
perf(workflows): optimize PR triage script for faster execution (#16355)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-12 19:49:20 +00:00
Bryan Morgan
33e3ed0f6c
fix(workflows): resolve triage workflow failures and actionlint errors (#16338)
2026-01-10 19:31:17 -05:00
Bryan Morgan
446058cb1c
fix: fallback to GITHUB_TOKEN if App ID is missing
2026-01-10 16:46:27 -05:00
Bryan Morgan
d130d99ff0
fix: Add event-driven trigger to issue triage workflow (#16334)
2026-01-10 16:23:41 -05:00
Bryan Morgan
72dae7e0ee
Triage action cleanup (#16319)
2026-01-10 19:58:36 +00:00
Wesley Tanaka
f1ca7fa40a
ci: guard nightly release workflow from running on forks (#15463)
Co-authored-by: wtanaka.com <wtanaka@users.noreply.github.com>
Co-authored-by: Tommaso Sciortino <sciortino@gmail.com>
2026-01-09 00:17:58 +00:00
Wesley Tanaka
6166d7f6ec
ci: guard links workflow from running on forks (#15461)
Co-authored-by: Tommaso Sciortino <sciortino@gmail.com>
2026-01-09 00:16:00 +00:00
sangwook
e51f3e11f1
fix: remove unsupported 'enabled' key from workflow config (#15611)
Co-authored-by: Tommaso Sciortino <sciortino@gmail.com>
2026-01-08 22:45:41 +00:00
Jacob Richman
41cc6cf105
Reduce nags about PRs that reference issues but don't fix them. (#16112)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-08 00:53:03 +00:00
Tommaso Sciortino
dd04b46e86
Fix CI for forks (#16113)
2026-01-08 00:18:33 +00:00
Jacob Richman
bd77515fd9
fix(workflows): fix and limit labels for pr-triage.sh script (#16096)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-07 22:58:42 +00:00