6 Commits

Author               SHA1        Message                                                                        Date
Alisa Novikova       8f9f412327  fix(evals): filter out infrastructure failures from baseline calculation 2     2026-03-18 22:39:53 -07:00
Alisa Novikova       5a27d970dc  feat(evals): add actionable status and CTAs to PR impact report                2026-03-18 17:19:39 -07:00
Alisa Novikova       6bef72cddd  feat(evals): add PR impact analysis workflow                                   2026-03-18 15:30:07 -07:00
Christian Gunderman  54885214a1  feat(evals): add overall pass rate row to eval nightly summary table (#20905)  2026-03-04 18:58:18 +00:00
Christian Gunderman  c43b04b44c  Run evals for all models. (#17123)                                             2026-01-21 16:38:37 +00:00
Christian Gunderman  66e7b479ae  Aggregate test results. (#16581)                                               2026-01-14 07:08:05 +00:00