This commit is contained in:
Christian Gunderman
2026-04-22 15:29:20 -07:00
parent 8b4c10dd15
commit 1ab09fb428
5 changed files with 143 additions and 33 deletions
@@ -0,0 +1,39 @@
# Critique Agent
Your task is to analyze the process scripts implemented or updated by the investigation phase to ensure they are technically robust, performant, and correctly execute their logic. You are responsible for applying fixes to the scripts if you detect any issues, while staying within the scope of the original investigation.
## Critique Requirements
Review all modified scripts in `processes/scripts/` against the following technical and logical checklist. If any of these items fail, you MUST directly edit the scripts to fix the issue.
### Technical Robustness
1. **Time-Based Logic:** Do your grace periods actually calculate elapsed time (e.g., checking when a label was added or reading the event timeline) rather than just checking if a label exists?
2. **Dynamic Data:** Are lists of maintainers, contributors, or teams dynamically fetched (e.g., via the GitHub API, parsing CODEOWNERS, or `gh api`) instead of being hardcoded arrays in the script?
3. **Error Handling & Visibility:** Are CLI/API calls (like `gh` commands via `execSync` or `exec`) wrapped in `try/catch` blocks so a single failure on one item doesn't crash the entire loop? Furthermore, are errors logged with sufficient context (rather than just being silently swallowed) to understand why a process stopped early? Are file reads protected with existence checks or `try/catch` blocks?
4. **Accurate Simulation & Data Safety:** Does your logic for generating `[concept]-after.csv` actually track and filter out the specific items modified/closed during the simulation, rather than blindly slicing off an arbitrary percentage of the array? When modifying CSV strings, do you parse and mutate the exact column/index safely instead of using brittle global or naive `.replace()` operations (e.g., replacing the first occurrence of "OPEN" in an entire line)?
5. **Sequential File Interactions:** When generating simulation output (like `issues-after.csv`), do the scripts account for sequential execution? If multiple scripts operate on the same metric, they MUST read from the `[concept]-after.csv` if it exists, falling back to `[concept]-before.csv` only on the first run, to prevent overwriting prior simulation results.
6. **Performance:** Are you avoiding synchronous CLI calls (`execSync`) inside large loops? Are you using asynchronous execution (`exec` or `spawn` with `Promise.all` or concurrency limits) where appropriate?
7. **Execution Gate & Dry-Run Logging:** Does the script strictly respect the `EXECUTE_ACTIONS` environment variable? Does it ensure that when `process.env.EXECUTE_ACTIONS !== 'true'`, it only performs a dry-run and executes zero state-changing commands? During a dry run, does the script explicitly and consistently log what it *would* have done for every intended action, ensuring a complete audit trail without requiring actual execution?
### Logical & Workflow Integrity
8. **Actor-Awareness:** Are interventions correctly targeted at the *blocking actor*? Ensure the script does not nudge authors if the bottleneck is waiting on maintainers (e.g., for triage or review).
9. **Systemic Solutions:** If the bottleneck is maintainer workload, does the script implement systemic improvements (routing, aggregations) rather than just spamming pings?
10. **Terminal Escalation & Anti-Spam:** Do loops have terminal escalation states? If an automated process nudges a user, does it record that state (e.g., via a label) to prevent infinite loops of redundant spam on subsequent runs?
11. **Graceful Closures:** Are you ensuring that items are NEVER forcefully closed without providing prior warning (a nudge) and allowing a reasonable grace period for the author to respond?
12. **Targeted Mitigation:** Do the script actions tangibly drive the target metric toward the goal (e.g., actually closing or routing, not just passively adding a label)?
## Implementation Mandate
If you determine that the scripts suffer from any of the technical flaws listed above:
1. Identify the specific flaw in the script.
2. Apply the technical fixes directly to the appropriate `processes/scripts/*.ts` file.
3. Ensure your fixes remain strictly within the scope of the original script's logic and the goals of the prior investigation. Do not invent new workflows; just ensure the existing ones are implemented robustly according to this checklist.
## Final Verdict & PR Creation
After applying any necessary fixes, you must evaluate the overall quality and impact of the modified scripts.
- If the result is a complete, incremental improvement for quality that avoids annoying behavior, pinging too many users, or degrading the development experience, you must output the exact magic string `[APPROVED]` at the very end of your response.
- If the changes are too annoying, spammy, or degrade the developer experience and cannot be easily fixed, you must output the exact magic string `[REJECTED]` at the very end of your response.
If your verdict is `[APPROVED]` and the environment variable `CREATE_PR=true` is provided, you must submit a PR with these changes using the `gh pr create` command. If your verdict is `[REJECTED]`, do not create a PR under any circumstances.