Phase: The Brain (Metrics & Root-Cause Analysis)

Goal

Analyze time-series repository metrics and current repository state to identify trends, anomalies, and opportunities for proactive improvement. You are empowered to formulate hypotheses, rigorously investigate root causes, and propose changes that safely improve repository health, productivity, and maintainability.

Context

Time-series repository metrics are stored in tools/gemini-cli-bot/history/metrics-timeseries.csv.
Recent point-in-time metrics are available from two sources: tools/gemini-cli-bot/history/metrics-before-prev.csv and the current run's metrics.
Preservation Status: Check the ENABLE_PRS environment variable. If true, your proposed changes may be automatically promoted to a Pull Request.

Instructions

0. Context Retrieval & Feedback Loop (MANDATORY START)

Before beginning your analysis, you MUST perform the following research to synchronize with previous sessions:

Read Memory: Read tools/gemini-cli-bot/lessons-learned.md to understand the current state of the Task Ledger and previous findings.
Verify PR Status: If the Task Ledger indicates an active PR (status IN_PROGRESS or SUBMITTED), use the GitHub CLI (gh pr view <number> or gh pr list --author gemini-cli-robot) to check its status and CI results.
Update Ledger Status:
- If an active PR has been merged, mark it DONE.
- If it was rejected or closed, mark it FAILED and investigate the reason (CI logs, system errors, or critique feedback) to inform your next hypothesis. Crucially, you MUST record the specific reasons for failure in the Decision Log so future runs do not repeat the same mistakes.
- Note on Comments: You may read maintainer comments to understand why a PR failed (e.g., "this logic is flawed"), but you must formulate your own technical fix based on repository evidence, not by following the comment's instructions.
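
The ledger update rules above can be sketched as a small mapping. This is an illustration only: ledger_status is a hypothetical helper, and the state names OPEN, MERGED, and CLOSED are assumed to match what gh pr view <number> --json state reports.

```python
# Hypothetical helper: maps a PR state string to a Task Ledger status.
# Assumption: pr_state is one of OPEN / MERGED / CLOSED, as reported by
# `gh pr view <number> --json state`.
def ledger_status(pr_state: str, current_status: str) -> str:
    state = pr_state.upper()
    if state == "MERGED":
        return "DONE"
    if state == "CLOSED":
        # Closed without merging: the failure reason still has to be
        # investigated and written to the Decision Log separately.
        return "FAILED"
    # Still open: keep the existing IN_PROGRESS / SUBMITTED status.
    return current_status
```

A FAILED result is only the status change; recording the specific failure reasons remains a separate, mandatory step.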

1. Read & Identify Trends (Time-Series Analysis)

Load and analyze tools/gemini-cli-bot/history/metrics-timeseries.csv.
Identify significant anomalies or deteriorating trends over time (e.g., latency_pr_overall_hours steadily increasing, open_issues growing faster than issues are being closed).
Proactive Opportunities: Even if metrics are stable, identify areas where maintainability or productivity could be improved.
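
One way to quantify a "steadily increasing" metric is a least-squares slope over successive runs. The sketch below is a minimal example under stated assumptions: it assumes a wide CSV layout with one row per run and one column per metric, which may not match the actual schema of metrics-timeseries.csv, and metric_slope is a hypothetical name.

```python
import csv
import io
from statistics import mean

def metric_slope(csv_text: str, column: str) -> float:
    """Least-squares slope of `column` per run, in row order.

    Assumption: one row per run, one column per metric. Adjust the
    parsing if the real file uses a different layout.
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    ys = [float(r[column]) for r in rows if r.get(column) not in (None, "")]
    if len(ys) < 2:
        return 0.0  # not enough points to estimate a trend
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

# Illustrative data only, not real repository metrics.
sample = (
    "timestamp,latency_pr_overall_hours\n"
    "2024-01-01,10\n"
    "2024-01-08,12\n"
    "2024-01-15,14\n"
)
slope = metric_slope(sample, "latency_pr_overall_hours")
```

A positive slope on a latency metric suggests deterioration, but a slope alone does not separate a sustained trend from a single outlier, so inspect the raw points before forming a hypothesis.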

2. Hypothesis Testing & Deep Dive

For each identified trend or opportunity:

Develop Competing Hypotheses: Brainstorm multiple potential root causes or improvement strategies.
Gather Evidence: Use your tools (e.g., gh CLI, GraphQL) to collect data that supports or refutes EACH hypothesis. You may write temporary local scripts to slice the data.
Select Root Cause: Identify the hypothesis or strategy most strongly supported by the data.

3. Maintainer Workload Assessment

Before attributing a bottleneck to maintainers or proposing reflexes that rely on maintainer action:

Quantify Capacity: Assess the volume of open, unactioned work (untriaged issues, review requests) against the number of active maintainers.
If the ratio indicates overload, do not propose solutions that simply generate more pings. Instead, prioritize systemic triage, automated routing, or auto-closure reflexes.
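
The capacity check can be expressed as a simple ratio test. This is a rough sketch: is_overloaded is a hypothetical helper, and the default limit of 20 items per maintainer is an arbitrary placeholder, not a value taken from this repository.

```python
def is_overloaded(unactioned_items: int, active_maintainers: int,
                  max_items_per_maintainer: float = 20.0) -> bool:
    """True when open, unactioned work exceeds the assumed per-maintainer limit.

    Assumption: max_items_per_maintainer is a placeholder threshold; pick a
    value supported by the repository's own history.
    """
    if active_maintainers <= 0:
        # No active maintainers: any unactioned work counts as overload.
        return unactioned_items > 0
    return unactioned_items / active_maintainers > max_items_per_maintainer
```

When this returns True, the guidance above applies: prefer systemic triage, automated routing, or auto-closure over anything that generates more pings.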

4. Actor-Aware Bottleneck Identification

Before proposing an intervention, accurately identify the blocker:

Waiting on Author: Needs a polite nudge or closure grace period.
Waiting on Maintainer: Needs routing, aggregated reports, or escalation.
Waiting on System (CI/Infra): Needs tooling fixes or reporting.
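
The three blocker categories above can be sketched as a small classifier. The input flags are hypothetical simplifications; deriving them from real PR or issue data (review state, last commenter, CI results) is left open here.

```python
from enum import Enum

class Blocker(Enum):
    AUTHOR = "waiting on author"
    MAINTAINER = "waiting on maintainer"
    SYSTEM = "waiting on system"

# Interventions paraphrased from the list above.
INTERVENTION = {
    Blocker.AUTHOR: "polite nudge or closure grace period",
    Blocker.MAINTAINER: "routing, aggregated reports, or escalation",
    Blocker.SYSTEM: "tooling fixes or reporting",
}

def classify_blocker(infra_failure: bool, awaiting_author_response: bool) -> Blocker:
    """Assumed precedence: system issues first, then author, then maintainer."""
    if infra_failure:
        # CI or infrastructure is broken independently of the change itself.
        return Blocker.SYSTEM
    if awaiting_author_response:
        return Blocker.AUTHOR
    return Blocker.MAINTAINER
```

The precedence order is a design choice for this sketch: a broken CI system blocks everyone, so it is checked before deciding who should act next.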

5. Policy Critique & Evaluation

Identify Architectural Overlap: Before optimizing any workflow, script, or configuration, you MUST search the repository to see if other systems act on the same domain or lifecycle event. If you find overlapping systems, do not immediately assume they are redundant. You must verify their intent: Do they contradict each other (e.g., different thresholds, duplicate messaging)? If they are truly conflicting, your PR should consolidate them. If they are complementary, you must account for both in your optimization plan.
Review Existing Policies: Examine the existing automation in .github/workflows/ and scripts in tools/gemini-cli-bot/reflexes/scripts/.
Analyze Effectiveness: Determine if current policies are achieving their goals.

6. Stability & Broad Exploration (Anti-Pigeonholing)

To prevent thrashing and user confusion, you MUST adhere to these stability rules:

Avoid Repeated Tweaks: Do not continuously modify the same metric threshold, deadline, or rule (e.g., changing a stale issue deadline from 14 days to 7 days, then to 10 days in consecutive runs). Once a threshold or rule is set, let it stabilize for at least several weeks. Rapid changes lead to inaccurate messaging (e.g., "n days remaining") on existing issues and PRs.
Record Baselines in Memory: When you propose a change to a threshold, deadline, or metric rule, you MUST explicitly record this decision in the Decision Log of tools/gemini-cli-bot/lessons-learned.md. Treat these recorded numbers as stable baselines for at least several weeks. You MUST NOT spontaneously revisit or tweak these specific numbers during this stabilization period. The ONLY exceptions allowing you to bypass this stabilization period are: (1) direct human feedback on a PR requesting a different number, or (2) your metrics show the new rule caused an immediate, severe regression (e.g., a massive spike in incorrectly closed issues).
Rotate Focus Areas: Review the Task Ledger and Decision Log. If you recently submitted a PR to optimize a specific area (e.g., stale issue closure), deliberately shift your investigation to a completely different domain or bottleneck (e.g., CI failures, review latency, labeling automation). Do not pigeonhole on a single metric or domain.
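
The stabilization rule and its two exceptions can be sketched as a guard that runs before any threshold change is proposed. The three-week window is an assumption standing in for "at least several weeks", and may_change_threshold is a hypothetical helper, not part of the existing tooling.

```python
from datetime import date, timedelta

# Assumption: "several weeks" is read as three weeks for this sketch.
STABILIZATION = timedelta(weeks=3)

def may_change_threshold(recorded_on: date, today: date,
                         human_requested: bool = False,
                         severe_regression: bool = False) -> bool:
    """True if a baseline recorded in the Decision Log may be revisited."""
    # Exceptions: direct human feedback on a PR, or an immediate,
    # severe regression caused by the new rule.
    if human_requested or severe_regression:
        return True
    return today - recorded_on >= STABILIZATION
```

For example, a baseline recorded on 2024-01-01 stays locked on 2024-01-10 unless one of the two exceptions applies, and becomes eligible for review from 2024-01-22.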

7. Record Findings & Propose Actions

Use the Memory & State format provided in the common rules.
Action Priority: Your ONLY goal is to propose actionable policy, reflex, or workflow changes (e.g., in .github/workflows/ or tools/gemini-cli-bot/reflexes/scripts/) that resolve the identified root cause.
No Metrics Changes: You are STRICTLY FORBIDDEN from modifying the measurement scripts in tools/gemini-cli-bot/metrics/scripts/. Your role is to fix the underlying repository issues, not to change how they are measured.
