Commit Graph

7 Commits

Author | SHA1 | Message | Date
Alisa Novikova | c1083b91c6 | feat(evals): implement production-grade PR impact analysis with isolation and policy-aware filtering | 2026-03-19 14:23:40 -07:00
Alisa Novikova | 8f9f412327 | fix(evals): filter out infrastructure failures from baseline calculation 2 | 2026-03-18 22:39:53 -07:00
Alisa Novikova | 5a27d970dc | feat(evals): add actionable status and CTAs to PR impact report | 2026-03-18 17:19:39 -07:00
Alisa Novikova | 6bef72cddd | feat(evals): add PR impact analysis workflow | 2026-03-18 15:30:07 -07:00
Christian Gunderman | 54885214a1 | feat(evals): add overall pass rate row to eval nightly summary table (#20905) | 2026-03-04 18:58:18 +00:00
Christian Gunderman | c43b04b44c | Run evals for all models. (#17123) | 2026-01-21 16:38:37 +00:00
Christian Gunderman | 66e7b479ae | Aggregate test results. (#16581) | 2026-01-14 07:08:05 +00:00