# Promoting Behavioral Evals Use this guide when asked to analyze nightly results and promote incubated tests to stable suites. --- ## 1. 🔍 Investigate candidates 1. **Audit Nightly Logs**: Use the `gh` CLI to fetch results from `evals-nightly.yml` (Direct URL: `https://github.com/google-gemini/gemini-cli/actions/workflows/evals-nightly.yml`). - **Tip**: The aggregate summary from the most recent run integrates the last 7 runs of history automatically. - **Safety**: DO NOT push changes or start remote runs. All verification is local. 2. **Assess Stability**: Identify tests that pass **100% of the time** across ALL enabled models over the **last 7 nightly runs** in a row. - _100% means the test passed 3/3 times for every model and run._ 3. **Promotion Targets**: Tests meeting this criteria are candidates for promotion from `USUALLY_PASSES` to `ALWAYS_PASSES`. --- ## 2. 🚥 Promotion Steps 1. **Locate File**: Locate the eval file in the `evals/` directory. 2. **Update Policy**: Modify the policy argument to `ALWAYS_PASSES`. ```typescript evalTest('ALWAYS_PASSES', { ... }) ``` 3. **Targeting**: Follow guidelines in `evals/README.md` regarding stable suite organization. 4. **Constraint**: Your final change must be **minimal and targeted** strictly to promoting the test status. Do not refactor the test or setup fixtures. --- ## 3. ✅ Verify 1. **Run Prompted Tests**: Run the promoted test locally using non-interactive Vitest to confirm structure validity. 2. **Verify Suite Inclusion**: Check that the test is successfully picked up by standard runnable ranges. --- ## 4. 📊 Report Provide a summary of: - Which tests were promoted. - Provide the success rate evidence (e.g., 7/7 runs passed for all models). - If no candidates qualified, list the next closest candidates and their current pass rate.