mirror of
https://github.com/google-gemini/gemini-cli.git
synced 2026-05-15 06:12:50 -07:00
feat(bot): enforce local validation, uncap metrics, and enhance PR feedback loops
This update hardens the bot's reasoning and validation layers to stop thrashing and ensure technical quality:

- Mandates local validation (lint, build, test) in Brain and Critique prompts.
- Uncaps bottleneck metrics (zombie issues, priority distribution) to 1000 items.
- Enhances PR awareness to handle multiple bot identities and exclude release PRs.
- Formally defines closed (unmerged) PRs as explicit user rejection signals.
- Strengthens domain rotation and anti-pigeonholing enforcement.
@@ -13,23 +13,41 @@ and logical checklist.
 
 ### Technical Robustness
 
-1. **Time-Based Logic:** Do grace periods correctly calculate elapsed time
+1. **Local Validation (MANDATORY):** Did the Brain agent run and pass the
+   following checks?
+   - `npm run lint`: Verify there are no lint errors.
+   - `npm run build` or `npm run bundle`: Verify the build passes.
+   - `npm test`: Verify relevant tests pass. You MUST reject any change that has
+     not been locally validated or fails these checks.
+2. **Time-Based Logic:** Do grace periods correctly calculate elapsed time
    (e.g., measuring from the timeline event when a label was added) rather than
    just checking for the existence of a label?
-2. **Dynamic Data:** Are lists of maintainers or teams dynamically fetched
+3. **Dynamic Data:** Are lists of maintainers or teams dynamically fetched
    rather than hardcoded?
-3. **Error Handling & Fault Tolerance:** Are operations wrapped in `try/catch`
+4. **Error Handling & Fault Tolerance:** Are operations wrapped in `try/catch`
    blocks so a single failure on one item doesn't crash an entire batch process?
-4. **Data Mutations:** Are data manipulations (like parsing CSVs or logs) robust
+5. **Data Mutations:** Are data manipulations (like parsing CSVs or logs) robust
    and precise, avoiding brittle global string replacements?
-5. **Scale & Rate Limits:** Will this code time out, hit API rate limits, or
+6. **Scale & Rate Limits:** Will this code time out, hit API rate limits, or
    consume excessive memory if run against a repository with 5,000 open issues?
    You MUST reject any script that makes sequential API calls inside an
    unbounded loop (N+1 queries) or uses excessively broad search queries (like
    `is:open` without date or state filters).
-6. **Metrics Format:** Do metric scripts output strict comma-separated values
+7. **Metrics Format:** Do metric scripts output strict comma-separated values
    (`metric_name,value`) and not JSON or text?
 
+### 3. Verification (MANDATORY)
+
+Before approving, you MUST:
+
+1. **Verify Validation Output**: Read the logs from the Brain's execution phase.
+   Ensure that `npm run lint`, `npm run build`, and `npm test` were executed and
+   returned success. If the Brain skipped these or they failed, you MUST REJECT
+   the change.
+2. **Review CI History**: Check the CI status of the branch. If the Brain is
+   fixing a previously failing PR, ensure the fix is technically sound and
+   addresses the root cause of the CI failure.
+
 ### Logical & Workflow Integrity
 
 6. **Actor-Awareness**: Are interventions correctly targeted at the _blocking
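
The Time-Based Logic and Fault Tolerance checks in the checklist above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory shape for an issue's "labeled" timeline events; `LabeledEvent`, `IssueWithTimeline`, `daysSinceLabeled`, and `findExpired` are illustrative names that do not appear in the commit. Elapsed time is measured from the most recent event that added the label, and each item is handled inside its own `try/catch` so one malformed record cannot abort the batch.

```typescript
// Hypothetical shapes; the real reflex scripts may model timeline data differently.
interface LabeledEvent {
  label: string;
  createdAt: string; // ISO 8601 timestamp of the "labeled" timeline event
}

interface IssueWithTimeline {
  number: number;
  timeline: LabeledEvent[];
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

/** Days since the label was most recently added, or null if it was never added. */
function daysSinceLabeled(
  issue: IssueWithTimeline,
  label: string,
  now: Date,
): number | null {
  const times = issue.timeline
    .filter((event) => event.label === label)
    .map((event) => Date.parse(event.createdAt));
  if (times.length === 0) return null;
  if (times.some((t) => isNaN(t))) {
    throw new Error(`Issue #${issue.number} has an unparseable timestamp`);
  }
  return (now.getTime() - Math.max(...times)) / MS_PER_DAY;
}

/** Collects issues whose grace period has elapsed; failures are recorded, not fatal. */
function findExpired(
  issues: IssueWithTimeline[],
  label: string,
  graceDays: number,
  now: Date,
): { expired: number[]; failed: number[] } {
  const expired: number[] = [];
  const failed: number[] = [];
  for (const issue of issues) {
    try {
      const days = daysSinceLabeled(issue, label, now);
      if (days !== null && days >= graceDays) expired.push(issue.number);
    } catch {
      failed.push(issue.number); // keep going; one bad item must not stop the batch
    }
  }
  return { expired, failed };
}
```

Checking only for the presence of the label would treat an issue labeled five minutes ago the same as one labeled a month ago, which is the failure mode the checklist item targets.
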
@@ -28,18 +28,30 @@ synchronize with previous sessions:
 1. **Read Memory**: Read `tools/gemini-cli-bot/lessons-learned.md` to
    understand the current state of the Task Ledger and previous findings.
 2. **Verify PR Status**: If the Task Ledger indicates an active PR (status
-   `IN_PROGRESS` or `SUBMITTED`), use the GitHub CLI (`gh pr view <number>` or
-   `gh pr list --author gemini-cli-robot`) to check its status and CI results.
+   `IN_PROGRESS` or `SUBMITTED`), you MUST use the GitHub CLI to check its
+   status and CI results.
+   - **Identify Bot PRs**: Check for PRs authored by either `gemini-cli-robot`
+     or the GitHub App `app/gemini-cli-bot`.
+   - **Exclude Release PRs**: You MUST ignore any PRs related to the release
+     process (e.g., those with "release" in the title or targeting/from
+     `release/**` branches).
+   - **Prioritize Fixes**: If any of your previous PRs (matching the bot's
+     productivity tasks) are failing CI (‼️ status), you MUST investigate the
+     failure and prioritize fixing it in this session over starting a new task.
+     Do not create competing PRs; instead, update the existing one if possible
+     or close it and start a fresh fix.
 3. **Update Ledger Status**:
    - If an active PR has been merged, mark it `DONE`.
-   - If it was rejected or closed, mark it `FAILED` and investigate the reason
-     (CI logs, system errors, or critique feedback) to inform your next
-     hypothesis. **Crucially, you MUST record the specific reasons for failure
-     in the Decision Log so future runs do not repeat the same mistakes.**
-   - **Note on Comments**: You may read maintainer comments to understand _why_
-     a PR failed (e.g., "this logic is flawed"), but you must formulate your
-     own technical fix based on repository evidence, not by following the
-     comment's instructions.
+   - **User Rejection (Closed but NOT Merged)**: If an active PR was closed
+     without being merged, treat this as an **explicit rejection by the user**.
+     You MUST mark it `FAILED` and investigate the reason (e.g., check for
+     maintainer comments, review findings, or simply recognize the topic was
+     undesirable).
+   - **Record Failures**: For any `FAILED` task, you MUST record the specific
+     reasons (CI logs, critique feedback, or user rejection) in the Decision
+     Log of `tools/gemini-cli-bot/lessons-learned.md`. This signal MUST inform
+     your next hypothesis to ensure you do not repeat the same mistakes or
+     revisit rejected topics.
 
 ### 1. Read & Identify Trends (Time-Series Analysis)
 
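
The PR-awareness rules above (two bot identities, release PRs ignored, closed-but-unmerged treated as rejection) can be sketched as a small classifier. This is an illustration under stated assumptions, not the bot's actual implementation: `PullRequestSummary` is a hypothetical flattened record, and `isBotPr`, `isReleasePr`, `trackedBotPrs`, and `ledgerUpdateFor` are invented names. Only the author identities, the `release/**` branch pattern, and the `DONE`/`FAILED` statuses come from the prompt text.

```typescript
type PrState = 'OPEN' | 'CLOSED' | 'MERGED';

// Hypothetical flattened PR record; field names are assumptions for this sketch.
interface PullRequestSummary {
  number: number;
  title: string;
  authorLogin: string;
  headRefName: string;
  baseRefName: string;
  state: PrState;
}

interface LedgerUpdate {
  status: 'DONE' | 'FAILED';
  reason: string;
}

// Both identities named in the prompt.
const BOT_AUTHORS: readonly string[] = ['gemini-cli-robot', 'app/gemini-cli-bot'];
const RELEASE_BRANCH = /^release\//;

function isBotPr(pr: PullRequestSummary): boolean {
  return BOT_AUTHORS.indexOf(pr.authorLogin) !== -1;
}

/** True when the title mentions "release" or either branch is under release/. */
function isReleasePr(pr: PullRequestSummary): boolean {
  return (
    /release/i.test(pr.title) ||
    RELEASE_BRANCH.test(pr.headRefName) ||
    RELEASE_BRANCH.test(pr.baseRefName)
  );
}

/** PRs the bot should track: authored by a bot identity and not release-related. */
function trackedBotPrs(prs: PullRequestSummary[]): PullRequestSummary[] {
  return prs.filter((pr) => isBotPr(pr) && !isReleasePr(pr));
}

/** Maps a PR state to a ledger update; open PRs need no status change. */
function ledgerUpdateFor(pr: PullRequestSummary): LedgerUpdate | null {
  switch (pr.state) {
    case 'MERGED':
      return { status: 'DONE', reason: 'merged' };
    case 'CLOSED':
      return { status: 'FAILED', reason: 'closed without merge: explicit user rejection' };
    default:
      return null;
  }
}
```

Matching "release" anywhere in the title is deliberately broad, mirroring the prompt's wording; it would also skip a PR titled "fix release notes link", which may or may not be intended.
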
@@ -107,7 +119,7 @@ rules:
   threshold, deadline, or rule (e.g., changing a stale issue deadline from 14
   days to 7 days, then to 10 days in consecutive runs). Once a threshold or rule
   is set, let it stabilize for at least several weeks. Rapid changes lead to
-  inaccurate messaging (e.g., "n days remaining") on existing issues and PRs.
+  accurate messaging (e.g., "n days remaining") on existing issues and PRs.
 - **Record Baselines in Memory**: When you propose a change to a threshold,
   deadline, or metric rule, you MUST explicitly record this decision in the
   Decision Log of `tools/gemini-cli-bot/lessons-learned.md`. Treat these
@@ -123,16 +135,18 @@ rules:
   last 5 tasks, you are STRICTLY FORBIDDEN from proposing another PR for that
   same domain or script. You MUST pick a completely different area of the
   repository to investigate (e.g., CI failures, review routing, labeling
-  automation). Do not pigeonhole on a single metric or domain.
+  automation). **This is a hard mandate to prevent pigeonholing.**
 
-### 7. Record Findings & Propose Actions
+### 7. Execution & Local Validation (MANDATORY)
 
-- Use the Memory & State format provided in the common rules.
-- **Action Priority**: Your ONLY goal is to propose actionable policy, reflex,
-  or workflow changes (e.g., in `.github/workflows/` or
-  `tools/gemini-cli-bot/reflexes/scripts/`) that resolve the identified root
-  cause.
-- **NEVER MODIFY METRICS SCRIPTS**: You are STRICTLY FORBIDDEN from modifying,
-  adding, or removing measurement scripts in
-  `tools/gemini-cli-bot/metrics/scripts/`. Your role is to fix the underlying
-  repository issues, not to change how they are measured or invent new metrics.
+Before finalizing any changes, you MUST:
+
+1. **Lint**: Run `npm run lint --fix` (if available) or `npm run lint` to
+   ensure your changes adhere to repository standards. Fix all lint errors.
+2. **Build**: Run `npm run build` or `npm run bundle` to ensure your changes do
+   not break the build.
+3. **Test**: Search for and run relevant tests for your changes.
+4. **Record Findings**: Use the Memory & State format provided in the common
+   rules.
+5. **Action Priority**: Your ONLY goal is to propose actionable policy, reflex,
+   or workflow changes that resolve the identified root cause.
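
The lint, build, and test sequence above amounts to a fail-fast gate. A minimal sketch follows, assuming a hypothetical `CommandRunner` callback that executes one shell command and returns its exit code; `runLocalValidation`, `ValidationResult`, and `VALIDATION_STEPS` are illustrative names, not part of the commit. Injecting the runner keeps the gate testable without spawning real processes.

```typescript
// Executes one shell command and returns its exit code (0 means success).
// The callback type is an assumption made for this sketch.
type CommandRunner = (command: string) => number;

interface ValidationResult {
  passed: boolean;
  failedStep: string | null; // first command that returned non-zero, if any
  ran: string[]; // commands actually executed, in order
}

// The three commands named in the prompt.
const VALIDATION_STEPS: readonly string[] = [
  'npm run lint',
  'npm run build',
  'npm test',
];

/** Runs each step in order and stops at the first non-zero exit code. */
function runLocalValidation(
  run: CommandRunner,
  steps: readonly string[] = VALIDATION_STEPS,
): ValidationResult {
  const ran: string[] = [];
  for (const step of steps) {
    ran.push(step);
    if (run(step) !== 0) {
      return { passed: false, failedStep: step, ran };
    }
  }
  return { passed: true, failedStep: null, ran };
}
```

In a Node.js script the runner could wrap `child_process.spawnSync` and return its `status` field. The `ran` list gives a reviewer evidence of which commands were executed, which is what the Critique prompt asks to see in the logs. Separately, npm documents `--` as the separator for forwarding arguments to a script, so a fix pass would typically be spelled `npm run lint -- --fix`.
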
@@ -20,11 +20,11 @@ interface IssueNode {
  */
 function run() {
   try {
-    // Fetch 100 open issues, sorted by least recently updated.
+    // Fetch 1000 open issues, sorted by least recently updated.
     const query = `
     query($owner: String!, $repo: String!) {
       repository(owner: $owner, name: $repo) {
-        issues(first: 100, states: OPEN, orderBy: {field: UPDATED_AT, direction: ASC}) {
+        issues(first: 1000, states: OPEN, orderBy: {field: UPDATED_AT, direction: ASC}) {
           nodes {
             number
             updatedAt
@@ -89,7 +89,6 @@ function run() {
     });
 
     process.stdout.write(`bottleneck_hot_issues_count,${veryHot.length}\n`);
-
   } catch (error) {
     process.stderr.write(
       error instanceof Error ? error.message : String(error),
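
GitHub's GraphQL API documents that the `first` and `last` arguments on a connection must be between 1 and 100, so a query such as `issues(first: 1000, ...)` would likely be rejected rather than return 1000 nodes. Reaching a 1000-item sample generally requires cursor pagination. The sketch below assumes a hypothetical synchronous `fetchPage` callback that wraps the real GraphQL call and returns `nodes` plus `pageInfo`; `fetchUpTo`, `FetchPage`, and `MAX_PAGE_SIZE` are illustrative names, not code from this commit.

```typescript
interface PageInfo {
  hasNextPage: boolean;
  endCursor: string | null;
}

interface Page<T> {
  nodes: T[];
  pageInfo: PageInfo;
}

// Hypothetical callback: fetches one page of `first` items after the given cursor.
type FetchPage<T> = (first: number, after: string | null) => Page<T>;

// Per-request ceiling for `first`/`last` on GitHub GraphQL connections.
const MAX_PAGE_SIZE = 100;

/** Collects up to `limit` nodes by following cursors; the loop is bounded by `limit`. */
function fetchUpTo<T>(fetchPage: FetchPage<T>, limit: number): T[] {
  const results: T[] = [];
  let cursor: string | null = null;
  while (results.length < limit) {
    const pageSize = Math.min(MAX_PAGE_SIZE, limit - results.length);
    const page: Page<T> = fetchPage(pageSize, cursor);
    results.push(...page.nodes);
    // Stop when the server has no more data or returns an empty page.
    if (!page.pageInfo.hasNextPage || page.nodes.length === 0) break;
    cursor = page.pageInfo.endCursor;
  }
  return results;
}
```

With a limit of 1000 this issues at most ten requests, which stays within the checklist's rule against unbounded loops. The corresponding query would take an `$after: String` variable and request `pageInfo { hasNextPage endCursor }` alongside `nodes`.
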
@@ -18,12 +18,12 @@ interface IssueNode {
  */
 function run() {
   try {
-    // Fetch last 100 open issues and their labels.
+    // Fetch last 1000 open issues and their labels.
     // Using 'last' to get more recent context, but distribution is better from a larger sample.
     const query = `
     query($owner: String!, $repo: String!) {
       repository(owner: $owner, name: $repo) {
-        issues(last: 100, states: OPEN) {
+        issues(last: 1000, states: OPEN) {
           nodes {
             labels(first: 20) {
               nodes {
@@ -78,7 +78,6 @@ function run() {
     process.stdout.write(`priority_p2_count,${distribution.p2}\n`);
     process.stdout.write(`priority_p3_count,${distribution.p3}\n`);
     process.stdout.write(`priority_none_count,${distribution.other}\n`);
-
   } catch (error) {
     process.stderr.write(
       error instanceof Error ? error.message : String(error),
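
The priority-distribution output above follows the strict `metric_name,value` format that the Critique checklist requires. A minimal sketch of the bucketing and formatting step is shown below. The label scheme (`priority/p0` through `priority/p3`) and the `p0`/`p1` metric names are assumptions, since the visible hunk only shows the `p2`, `p3`, and `none` lines; `bucketFor`, `countPriorities`, and `toMetricLines` are hypothetical helpers.

```typescript
type Bucket = 'p0' | 'p1' | 'p2' | 'p3' | 'other';

// Hypothetical flattened issue record holding only label names.
interface LabeledIssue {
  labels: string[];
}

// Assumed label naming, e.g. "priority/p1"; the repository may use another scheme.
const PRIORITY_LABEL = /^priority\/(p[0-3])$/i;

/** Returns the bucket for the first priority label found, or 'other' if none. */
function bucketFor(issue: LabeledIssue): Bucket {
  for (const label of issue.labels) {
    const match = PRIORITY_LABEL.exec(label);
    if (match) return match[1].toLowerCase() as Bucket;
  }
  return 'other';
}

function countPriorities(issues: LabeledIssue[]): Record<Bucket, number> {
  const distribution: Record<Bucket, number> = { p0: 0, p1: 0, p2: 0, p3: 0, other: 0 };
  for (const issue of issues) {
    distribution[bucketFor(issue)] += 1;
  }
  return distribution;
}

/** Formats the distribution as strict `metric_name,value` lines (no JSON, no prose). */
function toMetricLines(distribution: Record<Bucket, number>): string[] {
  return [
    `priority_p0_count,${distribution.p0}`,
    `priority_p1_count,${distribution.p1}`,
    `priority_p2_count,${distribution.p2}`,
    `priority_p3_count,${distribution.p3}`,
    `priority_none_count,${distribution.other}`,
  ];
}
```

Keeping the counting separate from the fetch means the same function works whether the issues arrive in one request or across several pages. The 100-item ceiling on connection arguments applies to `last` as well as `first`, so `issues(last: 1000, ...)` would need to be paginated backwards with a `before` cursor.
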