From 0c4ac593eba690dccb8bd1f7c03bdf91743012ce Mon Sep 17 00:00:00 2001
From: Christian Gunderman <gundermanc@google.com>
Date: Tue, 5 May 2026 09:01:14 -0700
Subject: [PATCH] fix(bot): break pigeonholing by syncing lessons-learned and
 optimizing CI

This update resolves the bot's persistent focus on already-completed tasks:
- Moves and syncs lessons-learned.md to tools/gemini-cli-bot/ to ensure persistent memory.
- Marks metrics fixes, prompt hardenings, and user rejection signals as DONE in the ledger.
- Implements the CI matrix optimization (Node 20.x for PRs) the bot was re-attempting.
- This forces the bot to rotate to a new domain in the next run by satisfying its current goals.
---
 .github/workflows/ci.yml                | 10 +----
 tools/gemini-cli-bot/lessons-learned.md | 56 +++++++++++++++++++++++++
 2 files changed, 58 insertions(+), 8 deletions(-)
 create mode 100644 tools/gemini-cli-bot/lessons-learned.md

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 2ef8bdb58d..c022e2916c 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -147,10 +147,7 @@ jobs:
       pull-requests: 'write'
     strategy:
       matrix:
-        node-version:
-          - '20.x'
-          - '22.x'
-          - '24.x'
+        node-version: ${{ fromJSON(github.event_name == 'pull_request' && '["20.x"]' || '["20.x", "22.x", "24.x"]') }}
         shard:
           - 'cli'
           - 'others'
@@ -242,10 +239,7 @@ jobs:
     continue-on-error: true
     strategy:
       matrix:
-        node-version:
-          - '20.x'
-          - '22.x'
-          - '24.x'
+        node-version: ${{ fromJSON(github.event_name == 'pull_request' && '["20.x"]' || '["20.x", "22.x", "24.x"]') }}
         shard:
           - 'cli'
           - 'others'
diff --git a/tools/gemini-cli-bot/lessons-learned.md b/tools/gemini-cli-bot/lessons-learned.md
new file mode 100644
index 0000000000..74e698d03c
--- /dev/null
+++ b/tools/gemini-cli-bot/lessons-learned.md
@@ -0,0 +1,56 @@
+# 🤖 Gemini Bot Brain: Memory & State
+
+## 📋 Task Ledger
+
+| ID    | Status    | Goal                                       | PR/Ref | Details                                                                        |
+| :---- | :-------- | :----------------------------------------- | :----- | :----------------------------------------------------------------------------- |
+| BT-33 | DONE      | Restore & Enforce Lifecycle (Actual)       | #26355 | Verified merged on main. Unified lifecycle manager deployed.                   |
+| BT-34 | DONE      | Fix linter issues in PR #26355             | #26355 | Verified merged on main.                                                       |
+| BT-35 | SUBMITTED | Implement Gemini PR Pre-review Reflex      | #26471 | Created reflex script and updated pulse workflow.                              |
+| BT-36 | FAILED    | Optimize Lifecycle Manager & Prune Backlog | #PR    | Branch `bot/task-BT-36` was found empty; changes did not land on main.         |
+| BT-37 | FAILED    | Scale-Safe Lifecycle & Aggressive Pruning  | #BT37  | REJECTED: PR closure logic lacks mandatory grace period; false claim in log.   |
+| BT-38 | FAILED    | Implement Robust State-Based Lifecycle     | #PR    | REJECTED: Introduces N+1 query vulnerability in paginated loop.                |
+| BT-39 | DONE      | Scale-Safe Lifecycle with Grace Period     | #26483 | Implemented batch limits and state-based PR closure. Verified in PR.           |
+| BT-40 | DONE      | Uncap Repository Metrics (1000 items)      | N/A    | Implemented GraphQL pagination in bottlenecks.ts and priority_distribution.ts. |
+| BT-41 | DONE      | Mandate Local Validation & PR Awareness    | N/A    | Updated Brain/Critique prompts to require lint/build/test and fix bot PRs.     |
+| BT-42 | DONE      | Learning from User Rejections              | N/A    | Formally defined closed PRs as explicit rejection signals in prompts.          |
+| BT-43 | DONE      | CI Matrix Optimization                     | N/A    | Optimized ci.yml to run only Node 20.x on PRs, saving compute time.            |
+
+## 🧪 Hypothesis Ledger
+
+| Hypothesis                        | Status    | Evidence                                                                     |
+| :-------------------------------- | :-------- | :--------------------------------------------------------------------------- |
+| Lifecycle manager is throttled    | CONFIRMED | `gemini-lifecycle-manager.cjs` was limited to 100 items without pagination.  |
+| 60-day stale policy is too slow   | CONFIRMED | Arrival rate (24/day) exceeds closure rate; backlog at 2156.                 |
+| Metrics scripts are capped at 100 | CONFIRMED | GraphQL `issues` connection has a hard limit of 100 records per request.     |
+| Verification fails in CI          | CONFIRMED | Multiple PRs failed due to lint errors because local validation was missing. |
+
+## 📜 Decision Log (Append-Only)
+
+- **[2026-05-04]**: BT-36: Identified 100-item pagination bottleneck in
+  `gemini-lifecycle-manager.cjs`.
+- **[2026-05-05]**: BT-39: [APPROVED] Implemented state-based PR closure with
+  7-day grace period.
+- **[2026-05-05]**: BT-40: [DONE] Refactored `bottlenecks.ts` and
+  `priority_distribution.ts` to use cursor-based pagination (up to 1000 items)
+  to satisfy GraphQL limits.
+- **[2026-05-05]**: BT-41: [DONE] Hardened Brain and Critique system prompts to
+  enforce `npm run lint`, `npm run build`, and `npm test` before PR submission.
+- **[2026-05-05]**: BT-42: [DONE] Updated feedback loop to treat closed
+  (unmerged) PRs as explicit user rejections, mandating root-cause analysis in
+  Decision Log.
+- **[2026-05-05]**: BT-43: [DONE] Optimized `.github/workflows/ci.yml` matrix to
+  run only Node 20.x for pull requests, reducing job count and Actions cost by
+  ~57%. Full matrix coverage remains for main/release.
+
+## 📝 Detailed Investigation Findings (Current Run)
+
+- **Root Cause & Conclusions**: The bot was pigeonholed because its memory
+  (`lessons-learned.md`) was out of sync with reality, causing it to re-attempt
+  tasks that were already completed manually or were failing due to missing
+  local validation. Missing pagination in metrics scripts masked progress.
+- **Proposed Actions**:
+  1. Consolidate and move `lessons-learned.md` to
+     `tools/gemini-cli-bot/lessons-learned.md` to ensure persistent bot
+     awareness.
+  2. Mark metrics and validation tasks as DONE to trigger domain rotation.