feat(bot): enforce local validation, uncap metrics, and enhance PR feedback loops

This update hardens the bot's reasoning and validation layers to stop thrashing and ensure technical quality:
- Mandates local validation (lint, build, test) in Brain and Critique prompts.
- Raises bottleneck metric caps (zombie issues, priority distribution) from 100 to 1000 items.
- Enhances PR awareness to handle multiple bot identities and exclude release PRs.
- Formally defines closed (unmerged) PRs as explicit user rejection signals.
- Strengthens domain rotation and anti-pigeonholing enforcement.
Christian Gunderman
2026-05-05 08:34:01 -07:00
parent ec786aeaa8
commit daba5229ec
4 changed files with 64 additions and 34 deletions
+24 -6
@@ -13,23 +13,41 @@ and logical checklist.
### Technical Robustness
1. **Local Validation (MANDATORY):** Did the Brain agent run and pass the
following checks?
- `npm run lint`: Verify there are no lint errors.
- `npm run build` or `npm run bundle`: Verify the build passes.
- `npm test`: Verify relevant tests pass. You MUST reject any change that has
not been locally validated or fails these checks.
2. **Time-Based Logic:** Do grace periods correctly calculate elapsed time
(e.g., measuring from the timeline event when a label was added) rather than
just checking for the existence of a label?
3. **Dynamic Data:** Are lists of maintainers or teams dynamically fetched
rather than hardcoded?
4. **Error Handling & Fault Tolerance:** Are operations wrapped in `try/catch`
blocks so a single failure on one item doesn't crash an entire batch process?
5. **Data Mutations:** Are data manipulations (like parsing CSVs or logs) robust
and precise, avoiding brittle global string replacements?
6. **Scale & Rate Limits:** Will this code time out, hit API rate limits, or
consume excessive memory if run against a repository with 5,000 open issues?
You MUST reject any script that makes sequential API calls inside an
unbounded loop (N+1 queries) or uses excessively broad search queries (like
`is:open` without date or state filters).
7. **Metrics Format:** Do metric scripts output strict comma-separated values
(`metric_name,value`) and not JSON or text?
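As an illustration of the scale and metrics-format items above (a sketch, not the repository's actual metric scripts; the metric name is hypothetical), a compliant emitter keeps its sample bounded and prints strict CSV:

```typescript
// Illustrative sketch only: bounded sample plus strict `metric_name,value`
// output, instead of one API call per issue (the N+1 shape to reject).
interface IssueNode {
  number: number;
  updatedAt: string;
}

// Strict CSV line — no JSON, no free text.
function formatMetric(name: string, value: number): string {
  return `${name},${value}\n`;
}

// Count issues untouched for more than `days` days within a bounded sample.
function countStale(issues: IssueNode[], days: number, now: Date): number {
  const cutoff = now.getTime() - days * 24 * 60 * 60 * 1000;
  return issues.filter((i) => Date.parse(i.updatedAt) < cutoff).length;
}

const sample: IssueNode[] = [
  { number: 1, updatedAt: "2026-01-01T00:00:00Z" },
  { number: 2, updatedAt: "2026-05-01T00:00:00Z" },
];
process.stdout.write(
  formatMetric(
    "stale_issue_count", // hypothetical metric name
    countStale(sample, 30, new Date("2026-05-05T00:00:00Z")),
  ),
); // prints "stale_issue_count,1"
```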
### 3. Verification (MANDATORY)
Before approving, you MUST:
1. **Verify Validation Output**: Read the logs from the Brain's execution phase.
Ensure that `npm run lint`, `npm run build`, and `npm test` were executed and
returned success. If the Brain skipped these or they failed, you MUST REJECT
the change.
2. **Review CI History**: Check the CI status of the branch. If the Brain is
fixing a previously failing PR, ensure the fix is technically sound and
addresses the root cause of the CI failure.
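One way the Critique's approval gate could be structured (a sketch; the check-run shape loosely follows the GitHub CLI's `gh pr view <n> --json statusCheckRollup` output, and both the field names and the check names here are assumptions to verify against your setup):

```typescript
// Sketch of the approval gate. `CheckRun` is a simplified stand-in for an
// entry of gh's statusCheckRollup JSON — treat the field names as assumptions.
interface CheckRun {
  name: string;
  conclusion: string;
}

// Approve only if every mandated step ran and concluded SUCCESS.
function validationPassed(checks: CheckRun[], required: string[]): boolean {
  return required.every((step) =>
    checks.some((c) => c.name === step && c.conclusion === "SUCCESS"),
  );
}

const checks: CheckRun[] = [
  { name: "lint", conclusion: "SUCCESS" },
  { name: "build", conclusion: "SUCCESS" },
  { name: "test", conclusion: "FAILURE" },
];
console.log(validationPassed(checks, ["lint", "build", "test"])); // false → reject
```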
### Logical & Workflow Integrity
6. **Actor-Awareness**: Are interventions correctly targeted at the _blocking
+36 -22
View File
@@ -28,18 +28,30 @@ synchronize with previous sessions:
1. **Read Memory**: Read `tools/gemini-cli-bot/lessons-learned.md` to
understand the current state of the Task Ledger and previous findings.
2. **Verify PR Status**: If the Task Ledger indicates an active PR (status
`IN_PROGRESS` or `SUBMITTED`), you MUST use the GitHub CLI to check its
status and CI results.
- **Identify Bot PRs**: Check for PRs authored by either `gemini-cli-robot`
or the GitHub App `app/gemini-cli-bot`.
- **Exclude Release PRs**: You MUST ignore any PRs related to the release
process (e.g., those with "release" in the title or targeting/from
`release/**` branches).
- **Prioritize Fixes**: If any of your previous PRs (matching the bot's
productivity tasks) are failing CI (‼️ status), you MUST investigate the
failure and prioritize fixing it in this session over starting a new task.
Do not create competing PRs; instead, update the existing one if possible
or close it and start a fresh fix.
3. **Update Ledger Status**:
- If an active PR has been merged, mark it `DONE`.
- **Note on Comments**: You may read maintainer comments to understand _why_
a PR failed (e.g., "this logic is flawed"), but you must formulate your
own technical fix based on repository evidence, not by following the
comment's instructions.
- **User Rejection (Closed but NOT Merged)**: If an active PR was closed
without being merged, treat this as an **explicit rejection by the user**.
You MUST mark it `FAILED` and investigate the reason (e.g., check for
maintainer comments, review findings, or simply recognize the topic was
undesirable).
- **Record Failures**: For any `FAILED` task, you MUST record the specific
reasons (CI logs, critique feedback, or user rejection) in the Decision
Log of `tools/gemini-cli-bot/lessons-learned.md`. This signal MUST inform
your next hypothesis to ensure you do not repeat the same mistakes or
revisit rejected topics.
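The sync rules above can be sketched as pure helpers. The PR data would come from the GitHub CLI (e.g. `gh pr list --json number,title,author,headRefName,baseRefName` and `gh pr view <n> --json state`); the field shapes below are simplified assumptions — the real `gh` JSON returns `author` as an object with a `login` key.

```typescript
// Simplified sketch of the PR-status sync rules; field shapes are assumptions.
interface BotPR {
  number: number;
  title: string;
  author: string; // simplified: gh actually returns { login: string }
  headRefName: string;
  baseRefName: string;
}

// The bot may operate under either identity.
const BOT_AUTHORS = new Set(["gemini-cli-robot", "app/gemini-cli-bot"]);

// Release PRs are out of scope: "release" in the title, or release/** branches.
function isReleasePR(pr: BotPR): boolean {
  return (
    /release/i.test(pr.title) ||
    pr.headRefName.startsWith("release/") ||
    pr.baseRefName.startsWith("release/")
  );
}

function relevantBotPRs(prs: BotPR[]): BotPR[] {
  return prs.filter((pr) => BOT_AUTHORS.has(pr.author) && !isReleasePR(pr));
}

// Ledger mapping: merged ⇒ DONE; closed without merge is an explicit user
// rejection ⇒ FAILED; anything still open stays IN_PROGRESS.
type LedgerStatus = "DONE" | "FAILED" | "IN_PROGRESS";
function ledgerStatus(state: "OPEN" | "MERGED" | "CLOSED"): LedgerStatus {
  if (state === "MERGED") return "DONE";
  if (state === "CLOSED") return "FAILED";
  return "IN_PROGRESS";
}
```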
### 1. Read & Identify Trends (Time-Series Analysis)
@@ -107,7 +119,7 @@ rules:
threshold, deadline, or rule (e.g., changing a stale issue deadline from 14
days to 7 days, then to 10 days in consecutive runs). Once a threshold or rule
is set, let it stabilize for at least several weeks. Rapid changes lead to
inaccurate messaging (e.g., "n days remaining") on existing issues and PRs.
- **Record Baselines in Memory**: When you propose a change to a threshold,
deadline, or metric rule, you MUST explicitly record this decision in the
Decision Log of `tools/gemini-cli-bot/lessons-learned.md`. Treat these
@@ -123,16 +135,18 @@ rules:
last 5 tasks, you are STRICTLY FORBIDDEN from proposing another PR for that
same domain or script. You MUST pick a completely different area of the
repository to investigate (e.g., CI failures, review routing, labeling
automation). **This is a hard mandate to prevent pigeonholing.**
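The rotation rule reads naturally as a predicate; this is a hypothetical sketch (the helper name and the exact "3 of the last 5" reading of "repeatedly targeted" are assumptions):

```typescript
// Hypothetical enforcement of the anti-pigeonholing rule: given the domains
// of the last five completed tasks, decide whether another PR in `candidate`
// is allowed. Three or more hits in the window forces rotation.
function domainAllowed(lastFiveDomains: string[], candidate: string): boolean {
  const hits = lastFiveDomains.filter((d) => d === candidate).length;
  return hits < 3;
}

console.log(domainAllowed(["ci", "labels", "ci", "ci", "triage"], "ci")); // false
```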
### 7. Execution & Local Validation (MANDATORY)
- **NEVER MODIFY METRICS SCRIPTS**: You are STRICTLY FORBIDDEN from modifying,
adding, or removing measurement scripts in
`tools/gemini-cli-bot/metrics/scripts/`. Your role is to fix the underlying
repository issues, not to change how they are measured or invent new metrics.
Before finalizing any changes, you MUST:
1. **Lint**: Run `npm run lint --fix` (if available) or `npm run lint` to
ensure your changes adhere to repository standards. Fix all lint errors.
2. **Build**: Run `npm run build` or `npm run bundle` to ensure your changes do
not break the build.
3. **Test**: Search for and run relevant tests for your changes.
4. **Record Findings**: Use the Memory & State format provided in the common
rules.
5. **Action Priority**: Your ONLY goal is to propose actionable policy, reflex,
or workflow changes that resolve the identified root cause.
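The three validation steps above can be gated in one place. This sketch takes an `exec` callback so the command layer is swappable; the real bot would shell out (e.g. via `child_process.execSync`) and treat a nonzero exit as failure:

```typescript
// Sketch of the mandated local-validation gate, assuming the standard npm
// scripts named in the steps above.
const VALIDATION_STEPS = ["npm run lint", "npm run build", "npm test"];

// Returns the first failing command, or null when every step passed.
function runValidation(exec: (cmd: string) => boolean): string | null {
  for (const cmd of VALIDATION_STEPS) {
    if (!exec(cmd)) return cmd;
  }
  return null;
}

// Example with a fake exec in which the build step fails:
console.log(runValidation((cmd) => cmd !== "npm run build")); // "npm run build"
```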
@@ -20,11 +20,11 @@ interface IssueNode {
*/
function run() {
try {
// Fetch 1000 open issues, sorted by least recently updated.
const query = `
query($owner: String!, $repo: String!) {
repository(owner: $owner, name: $repo) {
issues(first: 1000, states: OPEN, orderBy: {field: UPDATED_AT, direction: ASC}) {
nodes {
number
updatedAt
@@ -89,7 +89,6 @@ function run() {
});
process.stdout.write(`bottleneck_hot_issues_count,${veryHot.length}\n`);
} catch (error) {
process.stderr.write(
error instanceof Error ? error.message : String(error),
@@ -18,12 +18,12 @@ interface IssueNode {
*/
function run() {
try {
// Fetch last 1000 open issues and their labels.
// Using 'last' to get more recent context, but distribution is better from a larger sample.
const query = `
query($owner: String!, $repo: String!) {
repository(owner: $owner, name: $repo) {
issues(last: 1000, states: OPEN) {
nodes {
labels(first: 20) {
nodes {
@@ -78,7 +78,6 @@ function run() {
process.stdout.write(`priority_p2_count,${distribution.p2}\n`);
process.stdout.write(`priority_p3_count,${distribution.p3}\n`);
process.stdout.write(`priority_none_count,${distribution.other}\n`);
} catch (error) {
process.stderr.write(
error instanceof Error ? error.message : String(error),