Christian Gunderman | f1bb2af6de | Generalize evals infra to support more types of evals, organization and queuing of named suites (#24941) | 2026-04-08 23:57:26 +00:00
Alisa | 2e03e3aed5 | feat(evals): add reliability harvester and 500/503 retry support (#23626) | 2026-03-26 01:48:45 +00:00
Christian Gunderman | 28935d1e6b | Retry evals on API error. (#23322) | 2026-03-21 02:52:19 +00:00
joshualitt | 55c628e967 | feat(core): experimental in-progress steering hints (1 of 3) (#19008) | 2026-02-17 22:59:33 +00:00
Christian Gunderman | 66e7b479ae | Aggregate test results. (#16581) | 2026-01-14 07:08:05 +00:00
Christian Gunderman | 8030404b08 | Behavioral evals framework. (#16047) | 2026-01-14 04:49:17 +00:00