feat(bot): enforce local validation, uncap metrics, and enhance PR feedback loops

This update hardens the bot's reasoning and validation layers to stop thrashing and ensure technical quality:
- Mandates local validation (lint, build, test) in Brain and Critique prompts.
- Raises bottleneck metric caps (zombie issues, priority distribution) from 100 to 1000 items.
- Enhances PR awareness to handle multiple bot identities and exclude release PRs.
- Formally defines closed (unmerged) PRs as explicit user rejection signals.
- Strengthens domain rotation and anti-pigeonholing enforcement.
Christian Gunderman
2026-05-05 08:34:01 -07:00
parent ec786aeaa8
commit daba5229ec
4 changed files with 64 additions and 34 deletions
+24 -6
@@ -13,23 +13,41 @@ and logical checklist.
### Technical Robustness
1. **Local Validation (MANDATORY):** Did the Brain agent run and pass the
following checks?
- `npm run lint`: Verify there are no lint errors.
- `npm run build` or `npm run bundle`: Verify the build passes.
- `npm test`: Verify relevant tests pass. You MUST reject any change that has
not been locally validated or fails these checks.
2. **Time-Based Logic:** Do grace periods correctly calculate elapsed time
(e.g., measuring from the timeline event when a label was added) rather than
just checking for the existence of a label?
3. **Dynamic Data:** Are lists of maintainers or teams dynamically fetched
rather than hardcoded?
4. **Error Handling & Fault Tolerance:** Are operations wrapped in `try/catch`
blocks so a single failure on one item doesn't crash an entire batch process?
5. **Data Mutations:** Are data manipulations (like parsing CSVs or logs) robust
and precise, avoiding brittle global string replacements?
6. **Scale & Rate Limits:** Will this code time out, hit API rate limits, or
consume excessive memory if run against a repository with 5,000 open issues?
You MUST reject any script that makes sequential API calls inside an
unbounded loop (N+1 queries) or uses excessively broad search queries (like
`is:open` without date or state filters).
7. **Metrics Format:** Do metric scripts output strict comma-separated values
(`metric_name,value`) and not JSON or text?
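As an illustration of the scale and metrics-format items above (a sketch, not the repository's actual metric scripts; the metric name is hypothetical), a compliant emitter keeps its sample bounded and prints strict CSV:

```typescript
// Illustrative sketch only: bounded sample plus strict `metric_name,value`
// output, instead of one API call per issue (the N+1 shape to reject).
interface IssueNode {
  number: number;
  updatedAt: string;
}

// Strict CSV line — no JSON, no free text.
function formatMetric(name: string, value: number): string {
  return `${name},${value}\n`;
}

// Count issues untouched for more than `days` days within a bounded sample.
function countStale(issues: IssueNode[], days: number, now: Date): number {
  const cutoff = now.getTime() - days * 24 * 60 * 60 * 1000;
  return issues.filter((i) => Date.parse(i.updatedAt) < cutoff).length;
}

const sample: IssueNode[] = [
  { number: 1, updatedAt: "2026-01-01T00:00:00Z" },
  { number: 2, updatedAt: "2026-05-01T00:00:00Z" },
];
process.stdout.write(
  formatMetric(
    "stale_issue_count", // hypothetical metric name
    countStale(sample, 30, new Date("2026-05-05T00:00:00Z")),
  ),
); // prints "stale_issue_count,1"
```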
### 3. Verification (MANDATORY)
Before approving, you MUST:
1. **Verify Validation Output**: Read the logs from the Brain's execution phase.
Ensure that `npm run lint`, `npm run build`, and `npm test` were executed and
returned success. If the Brain skipped these or they failed, you MUST REJECT
the change.
2. **Review CI History**: Check the CI status of the branch. If the Brain is
fixing a previously failing PR, ensure the fix is technically sound and
addresses the root cause of the CI failure.
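One way the Critique's approval gate could be structured (a sketch; the check-run shape loosely follows the GitHub CLI's `gh pr view <n> --json statusCheckRollup` output, and both the field names and the check names here are assumptions to verify against your setup):

```typescript
// Sketch of the approval gate. `CheckRun` is a simplified stand-in for an
// entry of gh's statusCheckRollup JSON — treat the field names as assumptions.
interface CheckRun {
  name: string;
  conclusion: string;
}

// Approve only if every mandated step ran and concluded SUCCESS.
function validationPassed(checks: CheckRun[], required: string[]): boolean {
  return required.every((step) =>
    checks.some((c) => c.name === step && c.conclusion === "SUCCESS"),
  );
}

const checks: CheckRun[] = [
  { name: "lint", conclusion: "SUCCESS" },
  { name: "build", conclusion: "SUCCESS" },
  { name: "test", conclusion: "FAILURE" },
];
console.log(validationPassed(checks, ["lint", "build", "test"])); // false → reject
```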
### Logical & Workflow Integrity
6. **Actor-Awareness**: Are interventions correctly targeted at the _blocking
+36 -22
View File
@@ -28,18 +28,30 @@ synchronize with previous sessions:
1. **Read Memory**: Read `tools/gemini-cli-bot/lessons-learned.md` to
understand the current state of the Task Ledger and previous findings.
2. **Verify PR Status**: If the Task Ledger indicates an active PR (status
`IN_PROGRESS` or `SUBMITTED`), you MUST use the GitHub CLI to check its
status and CI results.
- **Identify Bot PRs**: Check for PRs authored by either `gemini-cli-robot`
or the GitHub App `app/gemini-cli-bot`.
- **Exclude Release PRs**: You MUST ignore any PRs related to the release
process (e.g., those with "release" in the title or targeting/from
`release/**` branches).
- **Prioritize Fixes**: If any of your previous PRs (matching the bot's
productivity tasks) are failing CI (‼️ status), you MUST investigate the
failure and prioritize fixing it in this session over starting a new task.
Do not create competing PRs; instead, update the existing one if possible
or close it and start a fresh fix.
3. **Update Ledger Status**:
- If an active PR has been merged, mark it `DONE`.
- **Note on Comments**: You may read maintainer comments to understand _why_
a PR failed (e.g., "this logic is flawed"), but you must formulate your
own technical fix based on repository evidence, not by following the
comment's instructions.
- **User Rejection (Closed but NOT Merged)**: If an active PR was closed
without being merged, treat this as an **explicit rejection by the user**.
You MUST mark it `FAILED` and investigate the reason (e.g., check for
maintainer comments, review findings, or simply recognize the topic was
undesirable).
- **Record Failures**: For any `FAILED` task, you MUST record the specific
reasons (CI logs, critique feedback, or user rejection) in the Decision
Log of `tools/gemini-cli-bot/lessons-learned.md`. This signal MUST inform
your next hypothesis to ensure you do not repeat the same mistakes or
revisit rejected topics.
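The sync rules above can be sketched as pure helpers. The PR data would come from the GitHub CLI (e.g. `gh pr list --json number,title,author,headRefName,baseRefName` and `gh pr view <n> --json state`); the field shapes below are simplified assumptions — the real `gh` JSON returns `author` as an object with a `login` key.

```typescript
// Simplified sketch of the PR-status sync rules; field shapes are assumptions.
interface BotPR {
  number: number;
  title: string;
  author: string; // simplified: gh actually returns { login: string }
  headRefName: string;
  baseRefName: string;
}

// The bot may operate under either identity.
const BOT_AUTHORS = new Set(["gemini-cli-robot", "app/gemini-cli-bot"]);

// Release PRs are out of scope: "release" in the title, or release/** branches.
function isReleasePR(pr: BotPR): boolean {
  return (
    /release/i.test(pr.title) ||
    pr.headRefName.startsWith("release/") ||
    pr.baseRefName.startsWith("release/")
  );
}

function relevantBotPRs(prs: BotPR[]): BotPR[] {
  return prs.filter((pr) => BOT_AUTHORS.has(pr.author) && !isReleasePR(pr));
}

// Ledger mapping: merged ⇒ DONE; closed without merge is an explicit user
// rejection ⇒ FAILED; anything still open stays IN_PROGRESS.
type LedgerStatus = "DONE" | "FAILED" | "IN_PROGRESS";
function ledgerStatus(state: "OPEN" | "MERGED" | "CLOSED"): LedgerStatus {
  if (state === "MERGED") return "DONE";
  if (state === "CLOSED") return "FAILED";
  return "IN_PROGRESS";
}
```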
### 1. Read & Identify Trends (Time-Series Analysis)
@@ -107,7 +119,7 @@ rules:
threshold, deadline, or rule (e.g., changing a stale issue deadline from 14
days to 7 days, then to 10 days in consecutive runs). Once a threshold or rule
is set, let it stabilize for at least several weeks. Rapid changes lead to
inaccurate messaging (e.g., "n days remaining") on existing issues and PRs.
- **Record Baselines in Memory**: When you propose a change to a threshold,
deadline, or metric rule, you MUST explicitly record this decision in the
Decision Log of `tools/gemini-cli-bot/lessons-learned.md`. Treat these
@@ -123,16 +135,18 @@ rules:
last 5 tasks, you are STRICTLY FORBIDDEN from proposing another PR for that
same domain or script. You MUST pick a completely different area of the
repository to investigate (e.g., CI failures, review routing, labeling
automation). **This is a hard mandate to prevent pigeonholing.**
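The rotation rule reads naturally as a predicate; this is a hypothetical sketch (the helper name and the exact "3 of the last 5" reading of "repeatedly targeted" are assumptions):

```typescript
// Hypothetical enforcement of the anti-pigeonholing rule: given the domains
// of the last five completed tasks, decide whether another PR in `candidate`
// is allowed. Three or more hits in the window forces rotation.
function domainAllowed(lastFiveDomains: string[], candidate: string): boolean {
  const hits = lastFiveDomains.filter((d) => d === candidate).length;
  return hits < 3;
}

console.log(domainAllowed(["ci", "labels", "ci", "ci", "triage"], "ci")); // false
```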
### 7. Execution & Local Validation (MANDATORY)
- **NEVER MODIFY METRICS SCRIPTS**: You are STRICTLY FORBIDDEN from modifying,
adding, or removing measurement scripts in
`tools/gemini-cli-bot/metrics/scripts/`. Your role is to fix the underlying
repository issues, not to change how they are measured or invent new metrics.
Before finalizing any changes, you MUST:
1. **Lint**: Run `npm run lint --fix` (if available) or `npm run lint` to
ensure your changes adhere to repository standards. Fix all lint errors.
2. **Build**: Run `npm run build` or `npm run bundle` to ensure your changes do
not break the build.
3. **Test**: Search for and run relevant tests for your changes.
4. **Record Findings**: Use the Memory & State format provided in the common
rules.
5. **Action Priority**: Your ONLY goal is to propose actionable policy, reflex,
or workflow changes that resolve the identified root cause.
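The three validation steps above can be gated in one place. This sketch takes an `exec` callback so the command layer is swappable; the real bot would shell out (e.g. via `child_process.execSync`) and treat a nonzero exit as failure:

```typescript
// Sketch of the mandated local-validation gate, assuming the standard npm
// scripts named in the steps above.
const VALIDATION_STEPS = ["npm run lint", "npm run build", "npm test"];

// Returns the first failing command, or null when every step passed.
function runValidation(exec: (cmd: string) => boolean): string | null {
  for (const cmd of VALIDATION_STEPS) {
    if (!exec(cmd)) return cmd;
  }
  return null;
}

// Example with a fake exec in which the build step fails:
console.log(runValidation((cmd) => cmd !== "npm run build")); // "npm run build"
```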
@@ -20,11 +20,11 @@ interface IssueNode {
*/
function run() {
try {
// Fetch 1000 open issues, sorted by least recently updated.
const query = `
query($owner: String!, $repo: String!) {
repository(owner: $owner, name: $repo) {
issues(first: 1000, states: OPEN, orderBy: {field: UPDATED_AT, direction: ASC}) {
nodes {
number
updatedAt
@@ -89,7 +89,6 @@ function run() {
});
process.stdout.write(`bottleneck_hot_issues_count,${veryHot.length}\n`);
} catch (error) {
process.stderr.write(
error instanceof Error ? error.message : String(error),
@@ -18,12 +18,12 @@ interface IssueNode {
*/
function run() {
try {
// Fetch last 1000 open issues and their labels.
// Using 'last' to get more recent context, but distribution is better from a larger sample.
const query = `
query($owner: String!, $repo: String!) {
repository(owner: $owner, name: $repo) {
issues(last: 1000, states: OPEN) {
nodes {
labels(first: 20) {
nodes {
@@ -78,7 +78,6 @@ function run() {
process.stdout.write(`priority_p2_count,${distribution.p2}\n`);
process.stdout.write(`priority_p3_count,${distribution.p3}\n`);
process.stdout.write(`priority_none_count,${distribution.other}\n`);
} catch (error) {
process.stderr.write(
error instanceof Error ? error.message : String(error),