This PR implements a more robust issue deduplication workflow for the backlog, addressing critical feedback from the previous iteration.
## Changes
1. **State Tracking with `status/checked-for-duplicates` label**:
   - Added a new label, `status/checked-for-duplicates`, that is applied to every issue the bot processes, even when no duplicates are found.
   - Updated the backlog search query to exclude issues with this label (`-label:status/checked-for-duplicates`).
   - This prevents the bot from re-processing the same already-checked issues every day, which removes the infinite-loop risk.
2. **Optimized Batch and Turn Limits**:
   - Reduced the processing batch size from 50 to 20 issues.
   - Increased `maxSessionTurns` from 50 to 100.
   - Together, these changes give Gemini enough turns to fetch details for potential duplicates and analyze them thoroughly without hitting session limits.
3. **Safety Truncation for Issue Bodies**:
   - Added a `jq` step to truncate issue bodies to the first 2000 characters before passing them to Gemini.
   - This guards against environment variable overflows on GitHub Actions runners when an issue has an extremely large description.
4. **Automatic Label Creation**:
   - Updated the `github-script` step to ensure that both the `status/checked-for-duplicates` and `status/possible-duplicate` labels exist in the repository before attempting to apply them.
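
The query exclusion and body truncation described above can be sketched in TypeScript as follows. This is an illustration only, not the workflow's code: the workflow truncates with `jq`, and the helper names `buildBacklogQuery` and `truncateBody`, as well as the `repo:`, `is:issue`, and `is:open` qualifiers, are assumptions. Only the `-label:status/checked-for-duplicates` exclusion and the 2000-character limit come from this PR.

```typescript
// Hypothetical helper names; only the label and the 2000-character limit
// are taken from this PR description.
const CHECKED_LABEL = "status/checked-for-duplicates";
const MAX_BODY_CHARS = 2000;

// Build a GitHub search query that skips issues the bot has already processed.
// The `repo:`, `is:issue`, and `is:open` qualifiers are assumed for illustration.
function buildBacklogQuery(repo: string): string {
  return `repo:${repo} is:issue is:open -label:${CHECKED_LABEL}`;
}

// Keep only the first `limit` code points of an issue body.
// Array.from splits on code points, so surrogate pairs are never cut in half.
function truncateBody(body: string | null, limit: number = MAX_BODY_CHARS): string {
  return Array.from(body ?? "").slice(0, limit).join("");
}
```

Because every processed issue receives the label, each run of a query like this returns only issues that have not yet been checked, so the backlog shrinks monotonically.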
## Impact
- **Efficiency**: Clears the backlog systematically without redundant processing.
- **Reliability**: Reduces the risk of session timeouts and environment variable overflows.
- **Visibility**: Clearly indicates which issues have been reviewed for duplicates.
## Validation
- Ran `npm run lint` to ensure no regressions in repository standards.
- Manually verified the `jq` truncation logic and GraphQL search query syntax.

This update resolves the bot's persistent focus on already-completed tasks:
- Moves and syncs `lessons-learned.md` to `tools/gemini-cli-bot/` to ensure persistent memory.
- Marks metrics fixes, prompt hardenings, and user rejection signals as DONE in the ledger.
- Implements the CI matrix optimization (Node 20.x for PRs) that the bot kept re-attempting.
- Satisfying these current goals forces the bot to rotate to a new domain on the next run.

This update hardens the bot's reasoning and validation layers to stop thrashing and ensure technical quality:
- Mandates local validation (lint, build, test) in Brain and Critique prompts.
- Raises the cap on bottleneck metrics (zombie issues, priority distribution) to 1000 items.
- Enhances PR awareness to handle multiple bot identities and exclude release PRs.
- Formally defines closed (unmerged) PRs as explicit user rejection signals.
- Strengthens domain rotation and anti-pigeonholing enforcement.

Fixes the throughput metrics script and introduces new visibility into backlog bottlenecks and priority distribution.
### Changes
- **Throughput Fixes**: Resolved a `ReferenceError` where `isMaintainer` was not correctly scoped, fixed a malformed license header, and added a new metric for `issue_arrival_rate_per_day` to enable growth-vs-closure analysis.
- **Backlog Bottlenecks**: Introduced `bottlenecks.ts` to identify "Zombie" issues (no activity for more than 30 days) and "Hot" issues (high activity).
- **Priority Distribution**: Introduced `priority_distribution.ts` to track the count of open issues by priority level (P0-P3).
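
As a rough illustration of the two new metrics, the sketch below classifies "Zombie" issues and counts open issues per priority level. It is not the content of `bottlenecks.ts` or `priority_distribution.ts`: the `IssueSummary` shape, the function names, and the `priority/p0` style of label naming are all assumptions. Only the 30-day threshold and the P0-P3 levels come from this PR.

```typescript
// Hypothetical issue shape; the real scripts may use different fields.
interface IssueSummary {
  number: number;
  updatedAt: string; // ISO 8601 timestamp of the last activity
  labels: string[];
}

const ZOMBIE_DAYS = 30;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// An issue counts as a "Zombie" when its last activity is more than 30 days old.
function isZombie(issue: IssueSummary, now: Date): boolean {
  const idleMs = now.getTime() - new Date(issue.updatedAt).getTime();
  return idleMs > ZOMBIE_DAYS * MS_PER_DAY;
}

// Count issues per priority level (P0-P3).
// The "priority/pN" label format is assumed for illustration.
function priorityDistribution(issues: IssueSummary[]): Record<string, number> {
  const counts: Record<string, number> = { P0: 0, P1: 0, P2: 0, P3: 0 };
  for (const issue of issues) {
    for (const label of issue.labels) {
      const level = /^priority\/(p[0-3])$/i.exec(label)?.[1]?.toUpperCase();
      if (level !== undefined) {
        counts[level] = (counts[level] ?? 0) + 1;
      }
    }
  }
  return counts;
}
```

Passing `now` explicitly keeps the zombie check deterministic, which makes it straightforward to test against fixed timestamps.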
### Impact
These metrics provide the data needed to confirm whether the repository is experiencing systemic backlog growth (Arrival Rate > Throughput) and to identify which segments of the backlog require urgent triage.
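
The growth check (Arrival Rate > Throughput) can be sketched as below. This is a minimal illustration under stated assumptions: the `WindowCounts` shape and function names are hypothetical, and treating throughput as issues closed per day is an assumption, since the throughput script may define it differently.

```typescript
// Hypothetical input: issues opened and closed over a window of `days` days.
interface WindowCounts {
  opened: number;
  closed: number;
  days: number;
}

// Issues opened per day over the window (the idea behind `issue_arrival_rate_per_day`).
function arrivalRatePerDay(w: WindowCounts): number {
  return w.days > 0 ? w.opened / w.days : 0;
}

// Issues closed per day over the window (assumed definition of throughput).
function throughputPerDay(w: WindowCounts): number {
  return w.days > 0 ? w.closed / w.days : 0;
}

// The backlog is growing when issues arrive faster than they are closed.
function isBacklogGrowing(w: WindowCounts): boolean {
  return arrivalRatePerDay(w) > throughputPerDay(w);
}
```

For example, a 7-day window with 70 issues opened and 35 closed gives an arrival rate of 10 per day against a throughput of 5 per day, so the backlog is growing.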