Alisa Novikova | c1083b91c6 | feat(evals): implement production-grade PR impact analysis with isolation and policy-aware filtering | 2026-03-19 14:23:40 -07:00
Alisa Novikova | 8f9f412327 | fix(evals): filter out infrastructure failures from baseline calculation 2 | 2026-03-18 22:39:53 -07:00
Alisa Novikova | 5a27d970dc | feat(evals): add actionable status and CTAs to PR impact report | 2026-03-18 17:19:39 -07:00
Alisa Novikova | 6bef72cddd | feat(evals): add PR impact analysis workflow | 2026-03-18 15:30:07 -07:00
Christian Gunderman | 54885214a1 | feat(evals): add overall pass rate row to eval nightly summary table (#20905) | 2026-03-04 18:58:18 +00:00
Christian Gunderman | c43b04b44c | Run evals for all models. (#17123) | 2026-01-21 16:38:37 +00:00
Christian Gunderman | 66e7b479ae | Aggregate test results. (#16581) | 2026-01-14 07:08:05 +00:00