cocosheng-g
9da1542071
feat(ci): isolate workflow evals into independent nightly job
...
- Splits 'Evals: Nightly' into 'evals' (general capabilities) and 'workflow-evals' (specific workflow simulations).
- 'workflow-evals' runs only on 'gemini-2.5-pro' (the target model).
- 'evals' excludes workflow tests to prevent noise/skewed metrics on other models.
- Removes code-level 'targetModels' restrictions in favor of CI configuration.
- Updates aggregation script to handle skipped tests correctly (though exclusion avoids them).
2026-02-03 22:37:15 -05:00
cocosheng-g
b4762750e2
fix(dedup): clarify tool sequencing in prompts to fix regression
...
Explicitly instruct the model to call the domain tool first, then echo the output, to avoid it skipping the tool call.
2026-02-03 21:24:55 -05:00
cocosheng-g
a3b7f5d5e1
fix(dedup): clarify tool usage for logging output
2026-02-03 20:34:42 -05:00
cocosheng-g
d9a0e9b9c5
fix(triage): tune prompt to return json text and harden eval json extraction
2026-02-03 19:55:00 -05:00
cocosheng-g
5c2f477adf
feat(evals): add comprehensive workflow evaluations and tune prompts (Issue #219)
...
- Established evals for all agent workflows (triage, dedup, refresh).
- Refactored all evals to use modern --output-format=json flag for robust validation.
- Tuned prompts for strict JSON compliance and corrected spam handling in scheduled triage.
- Expanded edge case coverage for false positives, security leaks, and mixed batches.
2026-02-03 19:31:56 -05:00
cocosheng-g
259a3e7891
fix(workflows): tune triage prompt and add robustness evals
2026-02-03 19:31:56 -05:00
Bryan Morgan
707b3e85d5
fix(ci): prevent stale PR closer from incorrectly closing new PRs (#18069)
2026-02-01 15:13:29 -05:00
Bryan Morgan
b0f38104d7
fix(workflow): update maintainer check logic to be inclusive and case-insensitive (#18009)
2026-01-30 22:32:58 -05:00
Sehoon Shon
12531a06f8
run npx pointing to the specific commit SHA (#17970)
...
Co-authored-by: Bryan Morgan <bryanmorgan@google.com>
2026-01-30 19:42:29 +00:00
Bryan Morgan
94f4e027f8
feat(ci): add npx smoke test to verify installability (#17927)
2026-01-30 16:09:59 +00:00
Bryan Morgan
fd36d65723
chore(workflow): remove redundant label-enforcer workflow (#17460)
...
just removing this workflow
2026-01-24 16:54:56 -05:00
Bryan Morgan
05e73c4193
feat(workflow): expand stale-exempt labels to include help wanted and Public Roadmap (#17459)
2026-01-24 16:39:15 -05:00
Bryan Morgan
a76c2986c2
feat(workflow): add stale pull request closer with linked-issue enforcement (#17449)
2026-01-24 13:07:56 -05:00
Christian Gunderman
ba8c64459b
ci: allow failure in evals-nightly run step (#17319)
2026-01-22 18:23:15 +00:00
Christian Gunderman
982a093791
Enable the ability to queue specific nightly eval tests (#17262)
2026-01-22 03:07:33 +00:00
Bryan Morgan
7399c623d8
fix(github): improve label-workstream-rollup efficiency and fix bugs (#17219)
2026-01-21 12:25:48 -05:00
Bryan Morgan
9e3d36c19b
fix(github): improve label-workstream-rollup efficiency with GraphQL (#17217)
2026-01-21 12:10:19 -05:00
Christian Gunderman
c43b04b44c
Run evals for all models. (#17123)
2026-01-21 16:38:37 +00:00
Bryan Morgan
8605d0d024
feat(workflows): support recursive workstream labeling and new IDs (#17207)
2026-01-21 11:15:38 -05:00
Bryan Morgan
05c0a8eac3
fix(workflows): use author_association for maintainer check (#17060)
2026-01-20 03:05:15 +00:00
Bryan Morgan
c6cf3a4234
chore(workflows): rename label-workstream-rollup workflow (#16818)
2026-01-15 23:48:56 -05:00
Bryan Morgan
420a419f5e
fix(infra): use GraphQL to detect direct parents in rollup workflow (#16811)
2026-01-15 23:38:27 -05:00
Bryan Morgan
48fdb9872f
fix(automation): robust label enforcement with permission checks (#16762)
2026-01-15 19:53:08 +00:00
Bryan Morgan
2b6bfe4097
feat(automation): enforce '🔒 maintainer only' and fix bot loop (#16751)
2026-01-15 16:40:33 +00:00
Bryan Morgan
d545a3b614
fix(automation): prevent label-enforcer loop by ignoring all bots (#16746)
...
Co-authored-by: Sehoon Shon <sshon@google.com>
2026-01-15 15:50:36 +00:00
Bryan Morgan
53f54436c9
chore(automation): enforce 'help wanted' label permissions and update guidelines (#16707)
2026-01-15 05:36:58 +00:00
Bryan Morgan
467e869326
chore(automation): ensure status/need-triage is applied and never cleared automatically (#16657)
2026-01-15 01:58:50 +00:00
Bryan Morgan
b14cf1dc30
chore(automation): improve scheduled issue triage discovery and throughput (#16652)
2026-01-14 21:55:19 +00:00
Bryan Morgan
b3eecc3a50
chore(automation): remove automated PR size and complexity labeler (#16648)
2026-01-14 21:04:55 +00:00
Bryan Morgan
1212161d1d
chore(automation): recursive labeling for workstream descendants (#16609)
...
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-14 20:56:16 +00:00
Christian Gunderman
66e7b479ae
Aggregate test results. (#16581)
2026-01-14 07:08:05 +00:00
Christian Gunderman
8030404b08
Behavioral evals framework. (#16047)
2026-01-14 04:49:17 +00:00
Bryan Morgan
2306e60be4
perf(workflows): optimize PR triage script for faster execution (#16355)
...
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-12 19:49:20 +00:00
Bryan Morgan
33e3ed0f6c
fix(workflows): resolve triage workflow failures and actionlint errors (#16338)
2026-01-10 19:31:17 -05:00
Bryan Morgan
446058cb1c
fix: fallback to GITHUB_TOKEN if App ID is missing
2026-01-10 16:46:27 -05:00
Bryan Morgan
d130d99ff0
fix: Add event-driven trigger to issue triage workflow (#16334)
2026-01-10 16:23:41 -05:00
Bryan Morgan
72dae7e0ee
Triage action cleanup (#16319)
2026-01-10 19:58:36 +00:00
Wesley Tanaka
f1ca7fa40a
ci: guard nightly release workflow from running on forks (#15463)
...
Co-authored-by: wtanaka.com <wtanaka@users.noreply.github.com>
Co-authored-by: Tommaso Sciortino <sciortino@gmail.com>
2026-01-09 00:17:58 +00:00
Wesley Tanaka
6166d7f6ec
ci: guard links workflow from running on forks (#15461)
...
Co-authored-by: Tommaso Sciortino <sciortino@gmail.com>
2026-01-09 00:16:00 +00:00
sangwook
e51f3e11f1
fix: remove unsupported 'enabled' key from workflow config (#15611)
...
Co-authored-by: Tommaso Sciortino <sciortino@gmail.com>
2026-01-08 22:45:41 +00:00
Tommaso Sciortino
dd04b46e86
Fix CI for forks (#16113)
2026-01-08 00:18:33 +00:00
N. Taylor Mullen
d5996fea99
Optimize CI workflow: Parallelize jobs and cache linters (#16054)
...
Co-authored-by: matt korwel <matt.korwel@gmail.com>
2026-01-07 21:50:22 +00:00
Jerop Kipruto
8f5bf33eac
ci(github-actions): triage all new issues automatically (#16018)
2026-01-07 00:15:00 +00:00
Jerop Kipruto
7eeb7bd74c
fix: limit scheduled issue triage queries to prevent "argument list too long" error (#16021)
2026-01-07 00:03:23 +00:00
joshualitt
1e31427da8
Remove trailing whitespace in YAML. (#16036)
2026-01-06 15:41:58 -08:00
Bryan Morgan
7feb2f8f42
Add 'reopened' type to issue labeling workflow
2026-01-06 18:17:12 -05:00
Bryan Morgan
2122604b32
Refactor parent issue check to use URLs
2026-01-06 18:00:39 -05:00
Bryan Morgan
d4b4aede2f
Add debugging logs for issue parent checks
...
Added debugging logs to inspect issue object and parent information.
2026-01-06 17:56:39 -05:00
Bryan Morgan
4b5c044272
Fix label-backlog-child-issues workflow logic
2026-01-06 17:48:08 -05:00
Bryan Morgan
86b5995f12
Add workflow to label child issues for rollup (#16002)
2026-01-06 16:00:47 -05:00