Mirror of https://github.com/google-gemini/gemini-cli.git, synced 2026-03-13 15:40:57 -07:00
feat(evals): add comprehensive workflow evaluations and tune prompts (Issue #219)
- Established evals for all agent workflows (triage, dedup, refresh).
- Refactored all evals to use modern --output-format=json flag for robust validation.
- Tuned prompts for strict JSON compliance and corrected spam handling in scheduled triage.
- Expanded edge case coverage for false positives, security leaks, and mixed batches.
@@ -257,6 +257,13 @@ jobs:
 area/unknown
 - Description: Issues that do not clearly fit into any other defined area/ category, or where information is too limited to make a determination. Use this when no other area is appropriate.
 
+## Final Instructions
+
+- Output ONLY valid JSON format.
+- Do NOT include any introductory or concluding remarks, explanations, or additional text.
+- Do NOT include any thoughts or reasoning outside the JSON block.
+- Ensure the output is a single JSON object with a "labels_to_set" array.
+
 - name: 'Apply Labels to Issue'
 if: |-
 ${{ steps.gemini_issue_analysis.outputs.summary != '' }}
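The Final Instructions require the single-issue triage prompt to return exactly one JSON object containing a `labels_to_set` array. Below is a minimal TypeScript sketch of how a consumer, such as an eval, might enforce that contract strictly. `parseSingleIssueTriage` is a hypothetical helper and is not part of the repository. The sketch also assumes the raw model text is passed in unchanged.

```typescript
// Hypothetical helper (not in the repository): strictly validates the
// single-issue triage response described by the Final Instructions.
interface SingleIssueTriage {
  labels_to_set: string[];
}

function parseSingleIssueTriage(raw: string): SingleIssueTriage {
  // JSON.parse throws a SyntaxError if any prose surrounds the JSON.
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    throw new Error("expected a single JSON object");
  }
  const labels = (data as Record<string, unknown>)["labels_to_set"];
  if (!Array.isArray(labels) || !labels.every((l) => typeof l === "string")) {
    throw new Error('expected "labels_to_set" to be an array of strings');
  }
  return { labels_to_set: labels as string[] };
}

// Usage: a compliant response parses; anything else throws.
const result = parseSingleIssueTriage('{"labels_to_set": ["area/unknown"]}');
console.log(result.labels_to_set);
```

Using a strict `JSON.parse` is a deliberate choice here. Introductory or concluding text, which the prompt forbids, causes the parse to fail instead of being silently stripped.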
@@ -159,7 +159,7 @@ jobs:
 }
 ]
 ```
-If an issue cannot be classified, do not include it in the output array.
+If an issue cannot be classified (e.g. spam), classify it as area/unknown.
 9. For each issue please check if CLI version is present, this is usually in the output of the /about command and will look like 0.1.5
 - Anything more than 6 versions older than the most recent should add the status/need-retesting label
 10. If you see that the issue doesn't look like it has sufficient information recommend the status/need-information label and leave a comment politely requesting the relevant information, eg.. if repro steps are missing request for repro steps. if version information is missing request for version information into the explanation section below.
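Item 9 leaves "more than 6 versions older than the most recent" open to interpretation. The sketch below reads it as "more than six published releases are newer than the reported version." It also assumes a list of released versions (`releases`) is available from some source, such as the repository's tags. `needsRetesting`, `parseVersion`, and `compareVersions` are hypothetical names used only for illustration.

```typescript
// Hypothetical sketch of item 9: flag a report as stale when more than
// `maxBehind` known releases are newer than the version it mentions.
type Version = [number, number, number];

// Extracts the first "x.y.z" found in free text such as /about output.
function parseVersion(text: string): Version | null {
  const m = /(\d+)\.(\d+)\.(\d+)/.exec(text);
  return m ? [Number(m[1]), Number(m[2]), Number(m[3])] : null;
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

function needsRetesting(reported: string, releases: string[], maxBehind = 6): boolean {
  const current = parseVersion(reported);
  // No version found: item 10 (status/need-information) applies instead.
  if (current === null) return false;
  const newer = releases
    .map(parseVersion)
    .filter((v): v is Version => v !== null && compareVersions(v, current) > 0);
  return newer.length > maxBehind;
}

// Usage with an assumed release history of 0.1.0 through 0.1.12.
const releases = Array.from({ length: 13 }, (_, i) => `0.1.${i}`);
console.log(needsRetesting("CLI Version 0.1.5", releases)); // true: 0.1.6 to 0.1.12 are newer
```

Counting releases, rather than subtracting patch numbers, avoids miscounting when versions are skipped or when a minor-version bump resets the patch number.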
@@ -207,7 +207,7 @@ jobs:
 area/enterprise: Telemetry, Policy, Quota / Licensing
 area/extensions: Gemini CLI extensions capability
 area/non-interactive: GitHub Actions, SDK, 3P Integrations, Shell Scripting, Command line automation
-area/platform: Build infra, Release mgmt, Testing, Eval infra, Capacity, Quota mgmt
+area/platform: Build infra, Release mgmt, Automated testing infrastructure (evals), Capacity, Quota mgmt. NOT for local test failures.
 area/security: security related issues
 
 Additional Context:
@@ -215,6 +215,13 @@ jobs:
 - This product is designed to use different models eg.. using pro, downgrading to flash etc.
 - When users report that they dont expect the model to change those would be categorized as feature requests.
 
+## Final Instructions
+
+- Output ONLY valid JSON format.
+- Do NOT include any introductory or concluding remarks, explanations, or additional text.
+- Do NOT include any thoughts or reasoning outside the JSON block.
+- Ensure the output is a single JSON array of objects.
+
 - name: 'Apply Labels to Issues'
 if: |-
 ${{ steps.gemini_issue_analysis.outcome == 'success' &&
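Two changes shape the scheduled-triage contract. The output must be a single JSON array of objects. Issues that cannot be classified, such as spam, now receive an `area/unknown` entry instead of being left out. As a result, every issue in a batch should come back with an entry.

Below is a minimal sketch of a check for both properties. It assumes each entry carries `issue_number` and `labels_to_set` fields; both names are assumptions, because the full JSON example is not visible in this diff. `parseBatchTriage` and `missingIssues` are hypothetical helpers.

```typescript
// Hypothetical validator for the scheduled-triage response. The field
// names issue_number and labels_to_set are assumptions, since the full
// JSON example is not visible in this diff.
interface TriageEntry {
  issue_number: number;
  labels_to_set: string[];
}

function parseBatchTriage(raw: string): TriageEntry[] {
  const data: unknown = JSON.parse(raw); // rejects any surrounding prose
  if (!Array.isArray(data)) {
    throw new Error("expected a single JSON array of objects");
  }
  return data.map((entry: unknown, i: number): TriageEntry => {
    if (typeof entry !== "object" || entry === null || Array.isArray(entry)) {
      throw new Error(`entry ${i} is not an object`);
    }
    const e = entry as Record<string, unknown>;
    const num = e["issue_number"];
    const labels = e["labels_to_set"];
    if (typeof num !== "number") {
      throw new Error(`entry ${i} has no numeric issue_number`);
    }
    if (!Array.isArray(labels) || !labels.every((l) => typeof l === "string")) {
      throw new Error(`entry ${i} has an invalid labels_to_set`);
    }
    return { issue_number: num, labels_to_set: labels as string[] };
  });
}

// With spam now labelled area/unknown rather than omitted, every requested
// issue should have an entry; this returns the ones that do not.
function missingIssues(entries: TriageEntry[], requested: number[]): number[] {
  const seen = new Set(entries.map((e) => e.issue_number));
  return requested.filter((n) => !seen.has(n));
}

const batch = parseBatchTriage(
  '[{"issue_number": 101, "labels_to_set": ["kind/bug"]},' +
    ' {"issue_number": 102, "labels_to_set": ["area/unknown"]}]',
);
console.log(missingIssues(batch, [101, 102, 103])); // [ 103 ]
```

Under the old prompt, an empty result from `missingIssues` could not be expected, because spam was intentionally dropped. Under the new wording, any missing issue points to a prompt-compliance failure.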