6 Commits

Author SHA1 Message Date
cocosheng-g d9a0e9b9c5 fix(triage): tune prompt to return json text and harden eval json extraction 2026-02-03 19:55:00 -05:00
cocosheng-g 5c2f477adf feat(evals): add comprehensive workflow evaluations and tune prompts (Issue #219) 2026-02-03 19:31:56 -05:00
    - Established evals for all agent workflows (triage, dedup, refresh).
    - Refactored all evals to use modern --output-format=json flag for robust validation.
    - Tuned prompts for strict JSON compliance and corrected spam handling in scheduled triage.
    - Expanded edge case coverage for false positives, security leaks, and mixed batches.
cocosheng-g 2daee0d066 feat(evals): add more edge case tests 2026-02-03 19:31:56 -05:00
cocosheng-g aa4b1c0056 fix(evals): address robustness feedback 2026-02-03 19:31:56 -05:00
cocosheng-g 9f8f31cce9 fix(evals): address review feedback on triage tests 2026-02-03 19:31:56 -05:00
cocosheng-g 259a3e7891 fix(workflows): tune triage prompt and add robustness evals 2026-02-03 19:31:56 -05:00