Commit Graph

4 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| cocosheng-g | fe50e580fa | fix(ci): isolate workflow evals, revert unrelated changes, and fix aggregation | 2026-02-03 23:04:50 -05:00 |
| cocosheng-g | d23499db90 | fix(scripts): exclude skipped tests from pass rate calculation | 2026-02-03 21:56:37 -05:00 |
| | | Ensures that models skipping irrelevant tests (like workflow evals) don't suffer a drop in reported pass rate. | |
| Christian Gunderman | c43b04b44c | Run evals for all models. (#17123) | 2026-01-21 16:38:37 +00:00 |
| Christian Gunderman | 66e7b479ae | Aggregate test results. (#16581) | 2026-01-14 07:08:05 +00:00 |
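
The message for d23499db90 describes excluding skipped tests from the pass rate so that a model which skips irrelevant tests is not penalized. The following is a minimal sketch of that idea, not the repository's actual script: the `ModelResults` type and both function names are hypothetical, and it assumes the pass rate is computed as passed tests divided by executed (passed plus failed) tests.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelResults:
    """Hypothetical per-model test counts (not the repository's real schema)."""
    passed: int
    failed: int
    skipped: int


def pass_rate_excluding_skipped(r: ModelResults) -> Optional[float]:
    """Pass rate over executed tests only; skipped tests are ignored."""
    executed = r.passed + r.failed
    if executed == 0:
        # Nothing ran, so a pass rate is undefined rather than 0%.
        return None
    return r.passed / executed


def pass_rate_including_skipped(r: ModelResults) -> Optional[float]:
    """For comparison: skipped tests count against the model."""
    total = r.passed + r.failed + r.skipped
    if total == 0:
        return None
    return r.passed / total


# A model that skips 10 irrelevant tests and passes 8 of the 10 it runs.
results = ModelResults(passed=8, failed=2, skipped=10)
print(pass_rate_excluding_skipped(results))
print(pass_rate_including_skipped(results))
```

With skipped tests counted in the denominator, the same model would report a lower rate even though it failed no additional tests, which is the drop the commit message refers to.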