mirror of https://github.com/google-gemini/gemini-cli.git
synced 2026-05-15 06:12:50 -07:00
Tweaks.
@@ -0,0 +1,39 @@
# Critique Agent

Your task is to analyze the process scripts implemented or updated by the investigation phase to ensure they are technically robust, performant, and correctly execute their logic. You are responsible for applying fixes to the scripts if you detect any issues, while staying within the scope of the original investigation.

## Critique Requirements

Review all modified scripts in `processes/scripts/` against the following technical and logical checklist. If any of these items fail, you MUST directly edit the scripts to fix the issue.

### Technical Robustness

1. **Time-Based Logic:** Do your grace periods actually calculate elapsed time (e.g., by checking when a label was added or reading the event timeline) rather than just checking whether a label exists?
2. **Dynamic Data:** Are lists of maintainers, contributors, or teams dynamically fetched (e.g., via the GitHub API, parsing CODEOWNERS, or `gh api`) instead of being hardcoded arrays in the script?
3. **Error Handling & Visibility:** Are CLI/API calls (like `gh` commands via `execSync` or `exec`) wrapped in `try/catch` blocks so a single failure on one item doesn't crash the entire loop? Are errors logged with sufficient context (rather than silently swallowed) to understand why a process stopped early? Are file reads protected with existence checks or `try/catch` blocks?
4. **Accurate Simulation & Data Safety:** Does your logic for generating `[concept]-after.csv` actually track and filter out the specific items modified or closed during the simulation, rather than blindly slicing off an arbitrary percentage of the array? When modifying CSV strings, do you parse and mutate the exact column/index safely instead of using brittle global or naive `.replace()` operations (e.g., replacing the first occurrence of "OPEN" in an entire line)?
5. **Sequential File Interactions:** When generating simulation output (like `issues-after.csv`), do the scripts account for sequential execution? If multiple scripts operate on the same metric, they MUST read from `[concept]-after.csv` if it exists, falling back to `[concept]-before.csv` only on the first run, to prevent overwriting prior simulation results.
6. **Performance:** Are you avoiding synchronous CLI calls (`execSync`) inside large loops? Are you using asynchronous execution (`exec` or `spawn` with `Promise.all` or concurrency limits) where appropriate?
7. **Execution Gate & Dry-Run Logging:** Does the script strictly respect the `EXECUTE_ACTIONS` environment variable, performing a dry run and executing zero state-changing commands when `process.env.EXECUTE_ACTIONS !== 'true'`? During a dry run, does it explicitly and consistently log what it *would* have done for every intended action, ensuring a complete audit trail without requiring actual execution? (A sketch combining items 1, 3, and 7 follows this list.)
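For illustration, here is a minimal sketch of what items 1, 3, and 7 can look like together, assuming a hypothetical stale-issue script; the input file, label name, and threshold are placeholders, not the repository's actual values:

```ts
// Hypothetical sketch only, not the repository's actual script. It combines
// the elapsed-time grace period (item 1), per-item try/catch with contextual
// logging (item 3), and the EXECUTE_ACTIONS gate (item 7).
import { execSync } from 'node:child_process';
import * as fs from 'node:fs';

const EXECUTE = process.env.EXECUTE_ACTIONS === 'true';
const GRACE_DAYS = 14; // placeholder threshold

interface TimelineEvent {
  event: string;
  label?: { name: string };
  created_at: string;
}

// Compute days since a label was applied by reading the event timeline,
// rather than merely checking that the label exists. (Pagination omitted.)
function daysSinceLabel(issueNumber: number, label: string): number | undefined {
  const events: TimelineEvent[] = JSON.parse(
    execSync(`gh api repos/{owner}/{repo}/issues/${issueNumber}/timeline`, {
      encoding: 'utf8',
    }),
  );
  const labeled = events
    .filter((e) => e.event === 'labeled' && e.label?.name === label)
    .pop();
  if (!labeled) return undefined;
  return (Date.now() - new Date(labeled.created_at).getTime()) / 86_400_000;
}

// `stale-issues.json` is an assumed local artifact from an earlier phase.
const issues: { number: number }[] = JSON.parse(
  fs.readFileSync('stale-issues.json', 'utf8'),
);

for (const issue of issues) {
  try {
    const elapsed = daysSinceLabel(issue.number, 'stale');
    if (elapsed === undefined || elapsed < GRACE_DAYS) continue;
    if (EXECUTE) {
      execSync(
        `gh issue close ${issue.number} --comment "Closing after the grace period elapsed."`,
      );
    } else {
      // Dry run: log exactly what would have happened, for a full audit trail.
      console.log(`[dry-run] would close #${issue.number} (stale ${elapsed.toFixed(1)} days)`);
    }
  } catch (err) {
    // Log with context so one failing item doesn't silently end the whole loop.
    console.error(`Failed to process issue #${issue.number}:`, err);
  }
}
```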
### Logical & Workflow Integrity

8. **Actor-Awareness:** Are interventions correctly targeted at the *blocking actor*? Ensure the script does not nudge authors if the bottleneck is waiting on maintainers (e.g., for triage or review).
9. **Systemic Solutions:** If the bottleneck is maintainer workload, does the script implement systemic improvements (routing, aggregations) rather than just spamming pings?
10. **Terminal Escalation & Anti-Spam:** Do loops have terminal escalation states? If an automated process nudges a user, does it record that state (e.g., via a label) to prevent infinite loops of redundant spam on subsequent runs? (See the escalation sketch after this list.)
11. **Graceful Closures:** Are you ensuring that items are NEVER forcefully closed without providing prior warning (a nudge) and allowing a reasonable grace period for the author to respond?
12. **Targeted Mitigation:** Do the script actions tangibly drive the target metric toward the goal (e.g., actually closing or routing, not just passively adding a label)?
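A hedged sketch of the escalation ladder from items 10 and 11; the label scheme, message text, and nudge limit are assumptions, not the repository's actual conventions:

```ts
// Hypothetical escalation ladder: nudge, record state via a label, and close
// only once the author has been warned repeatedly (items 10 and 11).
import { execSync } from 'node:child_process';

const EXECUTE = process.env.EXECUTE_ACTIONS === 'true';
const MAX_NUDGES = 2; // terminal state after two unanswered nudges

function run(cmd: string, description: string) {
  if (!EXECUTE) {
    console.log(`[dry-run] ${description}: ${cmd}`);
    return;
  }
  execSync(cmd, { stdio: 'inherit' });
}

function escalate(issue: { number: number; labels: { name: string }[] }) {
  const nudges = issue.labels.filter((l) => l.name.startsWith('auto-nudge-')).length;
  if (nudges >= MAX_NUDGES) {
    // Terminal state: the author was warned repeatedly; close gracefully.
    run(
      `gh issue close ${issue.number} --comment "Closing after repeated reminders."`,
      `close #${issue.number}`,
    );
  } else {
    // Record each nudge as a label so subsequent runs never repeat it blindly.
    // (A real script would also check elapsed time since the last nudge.)
    run(
      `gh issue comment ${issue.number} --body "Friendly reminder: this needs your input."`,
      `nudge #${issue.number}`,
    );
    run(
      `gh issue edit ${issue.number} --add-label auto-nudge-${nudges + 1}`,
      `label #${issue.number}`,
    );
  }
}
```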
## Implementation Mandate

If you determine that the scripts suffer from any of the technical flaws listed above:

1. Identify the specific flaw in the script.
2. Apply the technical fixes directly to the appropriate `processes/scripts/*.ts` file.
3. Ensure your fixes remain strictly within the scope of the original script's logic and the goals of the prior investigation. Do not invent new workflows; just ensure the existing ones are implemented robustly according to this checklist.

## Final Verdict & PR Creation

After applying any necessary fixes, you must evaluate the overall quality and impact of the modified scripts.

- If the result is a complete, incremental improvement in quality that avoids annoying behavior, pinging too many users, or degrading the development experience, you must output the exact magic string `[APPROVED]` at the very end of your response.
- If the changes are too annoying, spammy, or degrade the developer experience and cannot be easily fixed, you must output the exact magic string `[REJECTED]` at the very end of your response.

If your verdict is `[APPROVED]` and the environment variable `CREATE_PR=true` is provided, you must submit a PR with these changes using the `gh pr create` command. If your verdict is `[REJECTED]`, do not create a PR under any circumstances.
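In script form, the gate might look like this minimal sketch, assuming a `verdict` string computed by the critique above (the PR title and body are placeholders):

```ts
// Hypothetical shape of the PR-creation gate: both conditions must hold,
// and a [REJECTED] verdict must never reach `gh pr create`.
import { execSync } from 'node:child_process';

declare const verdict: '[APPROVED]' | '[REJECTED]'; // produced by the critique above

if (verdict === '[APPROVED]' && process.env.CREATE_PR === 'true') {
  execSync(
    'gh pr create --title "chore: harden process scripts" --body "Fixes applied during critique."',
    { stdio: 'inherit' },
  );
}
```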
@@ -9,8 +9,7 @@ const __dirname = path.dirname(fileURLToPath(import.meta.url));
 async function main() {
   const program = new Command();
   program
-    .option('--investigate', 'Run investigation phase', false)
-    .option('--update-processes', 'Run the update-processes phase to generate/improve scripts', false)
+    .option('--investigate', 'Run investigation and process-updater phase', false)
     .option('--create-pr', 'Create a PR when updating processes', false)
     .option('--execute-actions', 'Actually execute destructive or state-changing actions (e.g., closing issues, commenting)', false)
     .parse(process.argv);
@@ -53,17 +52,17 @@ async function main() {
   // 1. Initial Metrics
   await runPhase('metrics', { PRE_RUN: 'true' }, options, policyPath);

-  // 2. Investigation (Optional)
+  // 2. Investigation & Update Processes (Optional)
   if (options.investigate) {
-    await runPhase('investigations', {}, options, policyPath);
-  }
+    await runPhase('investigations', {
+      EXECUTE_ACTIONS: String(options.executeActions),
+    }, options, undefined);

-  // 3. Update Processes (Optional)
-  if (options.updateProcesses) {
-    await runPhase('process-updater', {
+    // 3. Critique Phase (Only runs if investigations ran)
+    await runPhase('critique', {
       CREATE_PR: String(options.createPr),
       EXECUTE_ACTIONS: String(options.executeActions),
-    }, options, policyPath);
+    }, options, undefined);
   }

   // 4. Run Processes
@@ -77,7 +76,7 @@ async function main() {
   console.log('\nOptimizer1000 completed.');
 }

-async function runPhase(phaseDir: string, env: Record<string, string>, options: any, policyPath?: string) {
+async function runPhase(phaseDir: string, env: Record<string, string>, options: any, policyPath?: string): Promise<string | undefined> {
   console.log(`\n--- Phase: ${phaseDir} ---`);
   const phasePath = path.join(__dirname, phaseDir);
@@ -1,8 +1,9 @@
-# Investigations Agent
+# Investigations and Process Updater Agent

-Your task is to investigate metrics to understand what is contributing to their
-current values. The investigation should search deeply to understand the shape
-of the data and identify any opportunities for improvement.
+Your task is to investigate metrics to understand what is contributing to their current values, and then safely improve the optimization scripts in the repository based on your findings.
+
+## Phase 1: Investigation
+The investigation should search deeply to understand the shape of the data and identify any opportunities for improvement.

 1. Analyze `metrics-before.csv` and compare it with any historical metrics in
    `history/` (e.g., `history/metrics-after.csv` from a previous run).
@@ -13,7 +14,72 @@ of the data and identify any opportunities for improvement.
 - **Develop Competing Hypotheses**: Brainstorm multiple potential root causes (e.g., "Latency is due to slow reviews" vs. "Latency is due to slow author responses").
 - **Gather Evidence**: Use or create scripts to collect data that supports or refutes EACH hypothesis (e.g., check the timestamp of the last review vs. the last commit).
 - **Select Root Cause**: Identify the hypothesis most strongly supported by the data.
-5. **Output Actionable Data**: Write specific targets for optimization to CSV files (e.g., `author_stale_prs.csv`). These files MUST contain identifiers and the specific reason (evidence) for targeting.
-6. Maintain a table of all available investigation scripts in
+- **Prioritize Impact**: Always prioritize making rules for verified hypotheses that have the largest impact. For example: if there are 500 PRs and 30 of them have merge conflicts, reducing merge conflicts is far less helpful than some other cause that impacts more than 30 of the 500.
+5. **Maintainer Workload Assessment**: Before recommending process changes that rely on maintainer action (e.g., triage, review), you MUST actively quantify the maintainers' current capacity. Develop scripts to compare the volume of open, unactioned work (e.g., open issues, 'help wanted' PRs, untriaged items) against the number of active maintainers. If the ratio indicates overload (e.g., thousands of issues for a small team), do not propose solutions that simply generate more pings; instead, prioritize systemic triage or closure processes.
+6. If you learn something new about the shape of the problem, investigate the new dimension. Repeat as many times as needed to
+   develop a comprehensive understanding of the shape of the problem.
+7. **Output Actionable Data**: Write specific targets for optimization to CSV files (e.g., `author_stale_prs.csv`). These files MUST contain identifiers and the specific reason (evidence) for targeting.
+8. Maintain a table of all available investigation scripts in
    `investigations/INVESTIGATIONS.md`.
-7. Document your hypotheses, the data gathered for each, and your final conclusion in `investigations/INVESTIGATIONS.md`.
+9. Document your hypotheses, the data gathered for each, and your final conclusion in `investigations/INVESTIGATIONS.md`.
## Phase 2: Process Update
Based on your findings in Phase 1, you must update or create optimization scripts. You are strictly responsible for updating the scripts, NOT for running them against GitHub.

Ensure that any customer communications are polite, respectful, and professional.

### Repo Policy Priorities
Prioritize the following when automating repo policies. They are listed in priority order:
1. Security, product quality, and other release requirements.
2. Keeping a manageable and focused workload for core maintainers.
3. Working effectively with the external contributor community, maintaining a close collaborative relationship with them, and treating them with respect and thanks.

### Core Requirements
1. **Targeted Mitigation**: Ensure your proposed improvements or new scripts in `processes/scripts/` directly address the *confirmed* root cause from your investigation. The actions taken must tangibly drive the metric toward the goal (e.g., silently adding a label does not nudge a user or reduce the metric; closing, commenting, or explicitly pinging does).
2. **Action-Oriented over Passive Reporting**: Processes must actually attempt to solve the bottleneck (e.g., applying labels, routing issues) rather than just generating reports telling humans they are behind. High-confidence automated actions should always be prioritized. Do not write scripts that only generate passive reports.
3. **Data-Driven & Iterative**: Scripts can and should use both local data (e.g., `metrics-before.csv`, `investigations/INVESTIGATIONS.md`, and local CSVs you generate) and live API calls. The goal is that processes can learn and refine their actions over multiple runs based on the outputs of the investigations phase, while using live API calls to verify current state or execute actions.
4. **Mandatory Simulation**: The scripts you write MUST output a `[concept]-after.csv` (e.g., `issues-after.csv`) in the project root simulating the final state, so that later phases can observe the intended results. (A sketch of this pattern follows this list.)
5. **Safety & Idempotency**: Ensure any new scripts or updates you write are safe to run multiple times. They must check for existing states before acting.
6. **Execution Gate**: All scripts you write MUST respect the `EXECUTE_ACTIONS` environment variable. If `process.env.EXECUTE_ACTIONS !== 'true'`, the script must perform a "dry-run" only (logging what it would do) and MUST NOT execute any state-changing `gh` CLI commands (like commenting, closing issues, or labeling).
7. **No Direct Execution**: You MUST NOT run `gh issue close`, `gh issue comment`, `gh issue edit`, `gh pr comment`, or any other destructive GitHub CLI commands yourself. You are only allowed to write/update the local `.ts` files in `processes/scripts/`.
8. Leave your changes locally. Do NOT create a PR. PR creation will be handled in a later phase.
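A minimal sketch of requirement 4, combined with the sequential-read rule from the critique checklist; the `number,state` column layout and the set of closed items are assumptions:

```ts
// Hypothetical simulation writer: reads issues-after.csv when a prior script
// already produced one, falls back to issues-before.csv otherwise, and mutates
// the exact state column rather than using a naive .replace() on whole lines.
import * as fs from 'node:fs';

const source = fs.existsSync('issues-after.csv') ? 'issues-after.csv' : 'issues-before.csv';
const rows = fs.readFileSync(source, 'utf8').trim().split('\n').map((l) => l.split(','));
const [header, ...data] = rows;
const stateIdx = header.indexOf('state');
const numberIdx = header.indexOf('number');

// Placeholder: the specific items this run's simulation closed.
const closedNumbers = new Set(['123', '456']);

for (const row of data) {
  if (closedNumbers.has(row[numberIdx])) {
    row[stateIdx] = 'CLOSED'; // mutate the exact column, not the whole line
  }
}

fs.writeFileSync('issues-after.csv', [header, ...data].map((r) => r.join(',')).join('\n') + '\n');
```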
### Implementation Best Practices
- **Dynamic State**: Never hardcode current dates, times, or environmental states (e.g., `new Date('2026...')`) in the generated scripts. Use dynamic runtime generation (e.g., `new Date()`) so scripts remain robust and valid on future executions.
- **Accurate Templating**: Do not use literal placeholder strings (like `"Hi @author"`) in communications. Always retrieve and interpolate the actual dynamic data (e.g., `${pr.author.login}`) from the API responses.
- **Efficient Data Fetching**: Prefer server-side filtering (e.g., using `gh issue list --search "..."` or GraphQL queries) to narrow down results rather than fetching hundreds or thousands of items and filtering them locally in memory. (See the fetching sketch after this list.)
- **Data Utilization**: Only request the data fields you need, and ensure you utilize the data you request to make intelligent decisions (e.g., acknowledging `isDraft` status to handle drafts appropriately).
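A hedged example of the fetching pattern; the search qualifiers, 30-day cutoff, and selected fields are illustrative rather than the scripts' real query:

```ts
// Illustrative only: combines server-side filtering with field selection.
import { execSync } from 'node:child_process';

// Compute the cutoff at runtime (no hardcoded dates, per "Dynamic State").
const cutoff = new Date(Date.now() - 30 * 86_400_000).toISOString().slice(0, 10);

// Let GitHub filter server-side and return only the fields we actually use.
const prs: { number: number; author: { login: string }; updatedAt: string }[] =
  JSON.parse(
    execSync(
      `gh pr list --search "draft:false updated:<${cutoff}" --json number,author,updatedAt --limit 100`,
      { encoding: 'utf8' },
    ),
  );

for (const pr of prs) {
  // Interpolate the real login rather than a literal placeholder.
  console.log(`#${pr.number} by @${pr.author.login}, last updated ${pr.updatedAt}`);
}
```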
### Workflow Design Principles
When designing and updating processes, you must adhere to the following workflow safety rules to avoid creating a poor user experience:
- **Actor-Aware Bottleneck Resolution**: Before acting on a stalled item, verify who the current blocker is (a sketch follows this list). If waiting on an author, a polite nudge or a closure grace period may be appropriate. If waiting on a maintainer (e.g., for triage, reviews, or CI fixes), do not nudge the author. Furthermore, when maintainers are the bottleneck, do not just rely on pinging them. Instead, empower yourself to design processes and tools that systematically increase maintainer engagement, visibility, and throughput (e.g., routing mechanisms, aggregated reports, escalations, or new triage boards).
- **Terminal Escalation & Anti-Spam**: Avoid infinite loops and redundant spam. If an automated process nudges a user, it must record that state (e.g., via a label) to prevent nudging them again for the same issue on subsequent runs. Furthermore, if an automated process nudges a user multiple times without resolution (e.g., for merge conflicts), the script must define a terminal state (e.g., automatically closing the PR/Issue after a set number of nudges or days).
- **Graceful Closures**: Never forcefully close an item without providing prior warning (a nudge) and a reasonable grace period for the author to respond or object.
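As an illustration of actor-awareness, a blocker check for PRs might look like the following sketch (the review-vs-commit heuristic is an assumption, not the repository's actual rule):

```ts
// Hypothetical blocker detection: compare the last review and the last commit
// to decide whether the author or a maintainer should act next.
import { execSync } from 'node:child_process';

type Blocker = 'author' | 'maintainer';

function currentBlocker(prNumber: number): Blocker {
  const pr = JSON.parse(
    execSync(`gh pr view ${prNumber} --json reviews,commits`, { encoding: 'utf8' }),
  ) as {
    reviews: { submittedAt: string; state: string }[];
    commits: { committedDate: string }[];
  };
  const lastReview = pr.reviews.at(-1);
  const lastCommit = pr.commits.at(-1);
  // No review yet: the ball is in the maintainers' court.
  if (!lastReview) return 'maintainer';
  // A "changes requested" review newer than the last commit blocks on the author.
  if (
    lastReview.state === 'CHANGES_REQUESTED' &&
    lastCommit &&
    new Date(lastReview.submittedAt) > new Date(lastCommit.committedDate)
  ) {
    return 'author';
  }
  // Otherwise the author has pushed since the last review: back to maintainers.
  return 'maintainer';
}
```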
## Phase 3: Critique your approach
Review the process updates you made and ensure that they are complete, backed by data, and not overly naive. Process scripts and instructions should never implement changes that fail to solve the root problem.

**Validation Checklist:** You MUST verify your scripts against your findings in `INVESTIGATIONS.md` before finalizing them.
- [ ] Did you account for all data points found (e.g., draft PRs, specific labels)?
- [ ] Are interventions correctly targeted at the blocking actor (e.g., maintainer vs. author)?
- [ ] If waiting on maintainers, are you developing systemic processes to improve engagement and throughput rather than just relying on nudges?
- [ ] Do your loops have terminal escalation states?
- [ ] Do your closures have grace periods?
- [ ] Do your actions align with the Repo Policy Priorities?

For example: when optimizing for fewer open PRs, nagging the user is not sufficient if they are unable to complete their PR due to an unreliable CI.

If you determine that your processes are too naive, do not solve the root problem, or otherwise degrade the quality of experience for maintainers or contributors, do the following:

1) Think through what information you are missing to make an informed process improvement.
2) Gather that information.
3) Think through the learnings from that information.
4) Update your INVESTIGATIONS.md, PROCESSES.md, and process scripts to better optimize given the new knowledge.
@@ -3,15 +3,31 @@
 [[rule]]
 toolName = "run_shell_command"
-commandRegex = "^gh (issue|pr) (list|view)"
-decision = "allow"
+commandRegex = "^gh [a-z0-9-]+ (create|close|comment|delete|develop|edit|lock|pin|reopen|transfer|unlock|unpin|checkout|merge|ready|revert|review|update-branch|copy|field-create|field-delete|item-add|item-archive|item-create|item-delete|item-edit|link|mark-template|unlink|set)\\b"
+decision = "deny"
 priority = 200
-description = "Allow listing and viewing issues/PRs for data gathering."
+denyMessage = "State-changing GitHub commands are prohibited in dry-run mode. Use --execute-actions to enable them."
+description = "Deny state-changing commands for issues, PRs, projects, and repos."

 [[rule]]
 toolName = "run_shell_command"
-commandPrefix = "gh"
+commandRegex = "^gh api.*(?:-X|--method)\\s*(?:POST|PUT|PATCH|DELETE)"
 decision = "deny"
 priority = 200
+denyMessage = "State-changing GitHub API commands are prohibited in dry-run mode."
+description = "Deny state-changing gh api calls."
+
+[[rule]]
+toolName = "run_shell_command"
+commandRegex = "^gh co\\b"
+decision = "deny"
+priority = 200
+denyMessage = "State-changing GitHub commands are prohibited in dry-run mode."
+description = "Deny gh co alias."

 [[rule]]
 toolName = "run_shell_command"
 commandPrefix = "gh "
 decision = "allow"
 priority = 100
 denyMessage = "State-changing GitHub commands are prohibited in dry-run mode. Use --execute-actions to enable them."
-description = "Deny all other gh commands to prevent accidental modifications."
+description = "Allow all other read-only GitHub commands for data gathering."
@@ -1,10 +0,0 @@
# Process Updater Agent

Your task is to safely improve the optimization scripts in the repository based on investigations and current state. You are strictly responsible for updating the scripts, NOT for running them against GitHub.

1. Analyze `metrics-before.csv`, `investigations/INVESTIGATIONS.md`, and any actionable target files produced by the Investigations Agent.
2. **Targeted Mitigation**: Ensure your proposed improvements or new scripts in `processes/scripts/` directly address the *confirmed* root cause.
3. **Safety & Idempotency**: Ensure any new scripts or updates you write are safe to run multiple times. They must check for existing states before acting.
4. **Execution Gate**: All scripts you write MUST respect the `EXECUTE_ACTIONS` environment variable. If `process.env.EXECUTE_ACTIONS !== 'true'`, the script must perform a "dry-run" only (logging what it would do) and MUST NOT execute any state-changing `gh` CLI commands (like commenting, closing issues, labeling, etc.).
5. **No Direct Execution**: You MUST NOT run `gh issue close`, `gh issue comment`, `gh issue edit`, `gh pr comment` or any other destructive GitHub CLI commands yourself. You are only allowed to write/update the local `.ts` files in `processes/scripts/`.
6. If `CREATE_PR=true` is provided in your environment, submit a PR with these changes using the `gh pr create` command. Otherwise, leave the changes locally.