chore(optimizer): update processes based on investigation

Christian Gunderman
2026-04-21 17:50:55 -07:00
parent aa30764f0a
commit b29db20cf4
11 changed files with 272 additions and 15 deletions
@@ -1,9 +1,48 @@
-# Deep-Dive Investigations
+# Optimizer Investigations
-This file documents ad hoc investigations performed to understand contributing factors to metrics.
+This document contains findings from the analysis of project metrics, focusing on issues, PRs, and general project health.
-| Investigation | Metric | Script | Findings |
-|---------------|--------|--------|----------|
-| Issue Labels | open_issues | `investigate_issues.cjs` | 1000 open issues. 60% (609) are stuck in `status/need-triage`. Other prominent labels: `area/agent` (339), `area/core` (271). High number of `status/possible-duplicate` (207). |
-| PR Labels | open_community_prs | `investigate_prs.cjs` | 485 total open PRs. Major categories: `area/core` (215), `help wanted` (204). Many lack linked issues (`status/need-issue`: 86). |
-| Metrics Comparison | all | N/A | Current metrics (open_issues: 1000, open_community_prs: 336, completed_community_prs: 1136) match the latest `metrics-after.csv` in the root exactly. Metrics are currently static/unchanged compared to recent runs. |
## 1. Metrics Overview
The current metrics baseline (`metrics-before.csv`) is as follows:
- **Completed Community PRs:** 1136
- **Open Community PRs:** 336 (Note: Total open PRs fetched via script is 486)
- **Open Issues:** 1000
- **PR Latency:** 40.84 hours (average time to merge the last 100 PRs)
- **Test Flakiness:** 374 CI workflow failures over the past 7 days
*Historical Comparison:* There were no historical metrics in the `history/` directory to compare against, so we cannot determine if these are improving or worsening over time.
## 2. Issues Analysis
We analyzed issue labels with the existing `investigate_issues.cjs` script and wrote two additional scripts, `investigate_age.cjs` and `investigate_assignees.cjs`, to extract age and assignee data.
**Key Findings:**
- **Triage Bottleneck:** 60.9% of open issues (609 out of 1000) have the `status/need-triage` label.
- **Unassigned Issues:** An overwhelming 85.6% of open issues (856 out of 1000) are `UNASSIGNED`. This indicates a major gap in routing or taking ownership of issues.
- **Age Distribution:**
- `< 1 week`: 128
- `1-4 weeks`: 488
- `1-3 months`: 384
- `> 3 months`: 0 (in our sampled batch)
- Most issues in the batch (872 of 1000, 87.2%) are between 1 week and 3 months old. The lack of assignment and triage likely contributes to issues stagnating in the `1-4 weeks` and `1-3 months` buckets.
- **Common Areas:** The most affected areas are `area/agent` (338) and `area/core` (271).
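The percentages quoted in these findings can be re-derived from the raw bucket counts. The snippet below is a minimal sketch of that arithmetic; the helper name `toShares` is hypothetical and not part of the investigation scripts, and the counts are copied from the age distribution above.

```javascript
// Issue age counts copied from the findings above.
const issueAgeCounts = {
  '< 1 week': 128,
  '1-4 weeks': 488,
  '1-3 months': 384,
};

// Hypothetical helper: convert raw counts into percentage shares,
// rounded to one decimal place.
function toShares(counts) {
  const total = Object.values(counts).reduce((sum, n) => sum + n, 0);
  const shares = {};
  for (const [bucket, n] of Object.entries(counts)) {
    shares[bucket] = total > 0 ? Math.round((n / total) * 1000) / 10 : 0;
  }
  return { total, shares };
}

const issueShares = toShares(issueAgeCounts);
console.log(issueShares.total, issueShares.shares);
```

Running the same helper over the PR counts would give the equivalent breakdown for pull requests.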
## 3. Pull Requests Analysis
We analyzed open PRs using the existing `investigate_prs.cjs` script and our new age distribution script.
**Key Findings:**
- **Needs Help/Issues:** A large chunk of PRs are labeled `help wanted` (204) and `status/need-issue` (86).
- **Age Distribution:**
- `< 1 week`: 71
- `1-4 weeks`: 222
- `1-3 months`: 193
- Most open PRs (415 of 486, 85.4%) are between 1 week and 3 months old and, like issues, remain unresolved. The reported `pr_latency` (40.84 hours) covers only the last 100 merged PRs, so it does not measure how long these open PRs have been waiting.
- **Common Areas:** `area/core` represents the largest subset of PRs (215).
## 4. Conclusion
The metrics suggest that the project has a significant backlog and high latency. The primary contributors seem to be:
1. **Lack of Triage & Assignment:** Issues are opened but not assigned, leaving them with the `status/need-triage` label for weeks to months.
2. **PR Stagnation:** Many PRs are open and likely lacking review, leading to a build-up in the 1-4 week and 1-3 month buckets. The `help wanted` and `status/need-issue` labels suggest many PRs might be incomplete or lacking context, which slows down the review process.
@@ -0,0 +1,11 @@
type,age_bucket,count
issue,"< 1 week",128
issue,"1-4 weeks",488
issue,"1-3 months",384
issue,"3-6 months",0
issue,"> 6 months",0
pr,"< 1 week",71
pr,"1-4 weeks",222
pr,"1-3 months",193
pr,"3-6 months",0
pr,"> 6 months",0
@@ -0,0 +1,68 @@
assignee,count
"UNASSIGNED",856
"mahimashanware",14
"moisgobg",14
"mbleigh",7
"devr0306",6
"mattKorwel",6
"joshualitt",6
"keithguerin",5
"kschaab",5
"sripasg",4
"alisa-alisa",4
"sehoon38",4
"SandyTao520",4
"abhipatel12",3
"akh64bit",3
"tusaryan",3
"Anjaligarhwal",2
"ruomengz",2
"jasonmatthewsuhari",2
"BharathC0",2
"jacob314",2
"spencer426",2
"gsquared94",2
"gundermanc",2
"jkcinouye",2
"galz10",2
"yunaseoul",2
"Br1an67",2
"Gitanaskhan26",2
"adamfweidman",2
"euxaristia",2
"TravisHaa",2
"kamal2730",1
"KoushikAD1234",1
"Anshikakalpana",1
"abhaysinghs772",1
"Adib234",1
"jackwotherspoon",1
"g-samroberts",1
"husenzhang",1
"cocosheng-g",1
"renuka16032007",1
"cynthialong0-0",1
"krishdef7",1
"Aarchi-07",1
"rwmyers",1
"clocky",1
"ehedlund",1
"Abhijit-2592",1
"chrisjcthomas",1
"chrstnb",1
"anj-s",1
"scidomino",1
"sahilkirad",1
"ARYANKUMAR1",1
"SupunGeethanjana",1
"jk-kashe",1
"rmedranollamas",1
"AjayBora002",1
"manas-raj999",1
"student-ankitpandit",1
"ak91456",1
"elliotllliu",1
"daehyeok",1
"1nonlyasta",1
"skainguyen1412",1
"AshwinSaklecha",1
@@ -1,8 +1,8 @@
label,count
"status/need-triage",609
-"area/agent",339
+"area/agent",338
"area/core",271
-"status/possible-duplicate",207
+"status/possible-duplicate",206
"🔒 maintainer only",188
"area/platform",130
"type/bug",109
@@ -7,7 +7,7 @@ label,count
"area/agent",67
"priority/p1",55
"priority/p3",31
-"NO_LABEL",27
+"NO_LABEL",28
"area/extensions",27
"area/platform",19
"area/security",13
@@ -0,0 +1,60 @@
const { execSync } = require('child_process');
const fs = require('fs');
const path = require('path');

function run() {
  try {
    // Note: --limit caps each query at 1000 items, so a larger backlog is truncated.
    console.log('Fetching open issues for age analysis...');
    const issueOutput = execSync('gh issue list --state open --json createdAt --limit 1000', { encoding: 'utf-8' });
    const issues = JSON.parse(issueOutput);

    console.log('Fetching open PRs for age analysis...');
    const prOutput = execSync('gh pr list --state open --json createdAt --limit 1000', { encoding: 'utf-8' });
    const prs = JSON.parse(prOutput);

    const now = new Date();

    // Count items per age bucket based on their createdAt timestamp.
    const calculateAgeBuckets = (items) => {
      const buckets = {
        '< 1 week': 0,
        '1-4 weeks': 0,
        '1-3 months': 0,
        '3-6 months': 0,
        '> 6 months': 0
      };
      for (const item of items) {
        const created = new Date(item.createdAt);
        const diffTime = Math.abs(now - created);
        // Math.ceil rounds partial days up, so an item 6.5 days old counts as 7 days.
        const diffDays = Math.ceil(diffTime / (1000 * 60 * 60 * 24));
        if (diffDays < 7) buckets['< 1 week']++;
        else if (diffDays < 30) buckets['1-4 weeks']++;
        else if (diffDays < 90) buckets['1-3 months']++;
        else if (diffDays < 180) buckets['3-6 months']++;
        else buckets['> 6 months']++;
      }
      return buckets;
    };

    const issueBuckets = calculateAgeBuckets(issues);
    const prBuckets = calculateAgeBuckets(prs);

    // Emit one CSV row per (type, bucket) pair.
    let csvContent = 'type,age_bucket,count\n';
    for (const [bucket, count] of Object.entries(issueBuckets)) {
      csvContent += `issue,"${bucket}",${count}\n`;
    }
    for (const [bucket, count] of Object.entries(prBuckets)) {
      csvContent += `pr,"${bucket}",${count}\n`;
    }

    const csvPath = path.join(__dirname, '..', 'age_distribution.csv');
    fs.writeFileSync(csvPath, csvContent, 'utf8');
    console.log(`Saved findings to ${csvPath}`);
  } catch (error) {
    console.error('Error fetching age data:', error.message);
  }
}

run();
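The bucketing logic in the script above can be exercised without calling `gh` by pulling it into a pure function. The following is a sketch under stated assumptions, not the committed implementation: the names `bucketFor` and `countAgeBuckets` are hypothetical, `now` is passed in so results are reproducible, and ages are rounded down with `Math.floor` (whole elapsed days), whereas the script rounds up with `Math.ceil`.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical helper: map an age in whole days to a bucket label.
// The thresholds mirror the ones used in the script above.
function bucketFor(ageDays) {
  if (ageDays < 7) return '< 1 week';
  if (ageDays < 30) return '1-4 weeks';
  if (ageDays < 90) return '1-3 months';
  if (ageDays < 180) return '3-6 months';
  return '> 6 months';
}

// Hypothetical helper: count items per bucket. `items` is an array of
// objects with an ISO 8601 `createdAt` string; `now` is a Date.
function countAgeBuckets(items, now) {
  const buckets = {
    '< 1 week': 0,
    '1-4 weeks': 0,
    '1-3 months': 0,
    '3-6 months': 0,
    '> 6 months': 0,
  };
  for (const item of items) {
    const ageMs = now.getTime() - new Date(item.createdAt).getTime();
    // Assumption: whole elapsed days, rounded down.
    const ageDays = Math.floor(ageMs / DAY_MS);
    buckets[bucketFor(ageDays)]++;
  }
  return buckets;
}
```

Separating the counting from the `gh` call keeps the thresholds testable with fixed timestamps.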
@@ -0,0 +1,38 @@
const { execSync } = require('child_process');
const fs = require('fs');
const path = require('path');

function run() {
  try {
    // Note: --limit caps the query at 1000 issues, so a larger backlog is truncated.
    console.log('Fetching open issues for assignee analysis...');
    const output = execSync('gh issue list --state open --json assignees --limit 1000', { encoding: 'utf-8' });
    const issues = JSON.parse(output);

    // An issue with several assignees is counted once per assignee.
    const assigneeCounts = {};
    for (const issue of issues) {
      if (issue.assignees && issue.assignees.length > 0) {
        for (const assignee of issue.assignees) {
          assigneeCounts[assignee.login] = (assigneeCounts[assignee.login] || 0) + 1;
        }
      } else {
        assigneeCounts['UNASSIGNED'] = (assigneeCounts['UNASSIGNED'] || 0) + 1;
      }
    }

    // Sort by count, highest first.
    const sortedAssignees = Object.entries(assigneeCounts).sort((a, b) => b[1] - a[1]);

    let csvContent = 'assignee,count\n';
    for (const [assignee, count] of sortedAssignees) {
      csvContent += `"${assignee}",${count}\n`;
    }

    const csvPath = path.join(__dirname, '..', 'issue_assignees.csv');
    fs.writeFileSync(csvPath, csvContent, 'utf8');
    console.log(`Saved findings to ${csvPath}`);
  } catch (error) {
    console.error('Error fetching assignee data:', error.message);
  }
}

run();
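The counting and CSV steps above can also be written as pure functions that take the parsed `gh` JSON as input. This is a sketch rather than the committed code: `countAssignees` and `toCsv` are hypothetical names, and it swaps the plain object for a `Map`, an assumption made here so that a login matching an inherited property name such as `constructor` cannot collide with `Object.prototype`.

```javascript
// Hypothetical helper: tally issues per assignee login. Issues without
// assignees are counted under 'UNASSIGNED'. An issue with several
// assignees is counted once per assignee, as in the script above.
function countAssignees(issues) {
  const counts = new Map();
  const bump = (key) => counts.set(key, (counts.get(key) || 0) + 1);
  for (const issue of issues) {
    const assignees = issue.assignees || [];
    if (assignees.length === 0) bump('UNASSIGNED');
    for (const assignee of assignees) bump(assignee.login);
  }
  // Highest count first; ties keep insertion order (Array.prototype.sort is stable).
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Hypothetical helper: render rows as CSV, doubling embedded quotes.
function toCsv(rows) {
  const quote = (value) => `"${String(value).replace(/"/g, '""')}"`;
  const lines = rows.map(([name, count]) => `${quote(name)},${count}`);
  return ['assignee,count', ...lines].join('\n') + '\n';
}
```

Because multi-assignee issues are counted more than once, the per-assignee counts can sum to more than the number of issues fetched.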
@@ -8,3 +8,4 @@ This file documents the metrics tracked by `optimizer1000`.
| open_community_prs | Number of open community PRs in the repo | `metrics/scripts/open_community_prs.js` | Lower is better |
| completed_community_prs | Number of completed community PRs in the repo | `metrics/scripts/completed_community_prs.js` | Greater is better |
| test_flakiness | Number of CI workflow failures over the past 7 days | `metrics/scripts/test_flakiness.js` | Lower is better |
| pr_latency | Average time (in hours) to merge the last 100 PRs | `metrics/scripts/pr_latency.js` | Lower is better |
@@ -0,0 +1,33 @@
/* eslint-env node */
import { execSync } from 'node:child_process';

try {
  // Resolve the current repository as "owner/name".
  const repoInfo = execSync('gh repo view --json nameWithOwner', { encoding: 'utf-8' });
  const repo = JSON.parse(repoInfo).nameWithOwner;

  // Fetch creation and merge timestamps for up to 100 merged PRs.
  const output = execSync(`gh pr list --state merged --repo ${repo} --limit 100 --json createdAt,mergedAt`, { encoding: 'utf-8' });
  const prs = JSON.parse(output);

  let totalLatencyMs = 0;
  let count = 0;
  for (const pr of prs) {
    // Skip entries that are missing either timestamp.
    if (pr.createdAt && pr.mergedAt) {
      const created = new Date(pr.createdAt).getTime();
      const merged = new Date(pr.mergedAt).getTime();
      totalLatencyMs += (merged - created);
      count++;
    }
  }

  // Average latency in hours; 0 when no PR had both timestamps.
  const avgLatencyHours = count > 0 ? (totalLatencyMs / count) / (1000 * 60 * 60) : 0;

  // Report the metric as JSON, rounded to two decimal places.
  process.stdout.write(JSON.stringify({
    metric: 'pr_latency',
    value: Math.round(avgLatencyHours * 100) / 100,
    timestamp: new Date().toISOString()
  }));
} catch (err) {
  process.stderr.write(err.message);
  process.exit(1);
}
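The averaging step in the script above can be checked against hand-computed values by isolating it from the `gh` calls. The snippet below is a minimal sketch; `averageLatencyHours` is a hypothetical name, and its input is assumed to have the same shape as the `--json createdAt,mergedAt` output.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Hypothetical helper: mean time from creation to merge, in hours,
// rounded to two decimal places. PRs missing either timestamp are
// skipped, and an empty input yields 0, matching the script above.
function averageLatencyHours(prs) {
  let totalMs = 0;
  let count = 0;
  for (const pr of prs) {
    if (!pr.createdAt || !pr.mergedAt) continue;
    totalMs += new Date(pr.mergedAt).getTime() - new Date(pr.createdAt).getTime();
    count++;
  }
  if (count === 0) return 0;
  return Math.round((totalMs / count / HOUR_MS) * 100) / 100;
}

// Example: one PR merged after 24 hours and one after 48 hours.
const exampleLatency = averageLatencyHours([
  { createdAt: '2026-04-01T00:00:00Z', mergedAt: '2026-04-02T00:00:00Z' },
  { createdAt: '2026-04-01T00:00:00Z', mergedAt: '2026-04-03T00:00:00Z' },
]);
console.log(exampleLatency); // 36
```

A mean is sensitive to a few long-lived PRs, so a median over the same data may be worth comparing; that is a suggestion, not something the commit does.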
@@ -3,8 +3,8 @@ import readline from 'readline';
import { execSync } from 'child_process';
async function processPRs() {
- const prsFile = 'prs-before.csv';
- const afterFile = 'prs-after.csv';
+ const prsFile = 'open-community-prs-before.csv';
+ const afterFile = 'open-community-prs-after.csv';
if (!fs.existsSync(prsFile)) return 0;
// Counter-metric: 'active_contributors'
@@ -62,7 +62,7 @@ async function processPRs() {
if (commitMode) {
try {
execSync(`gh pr close ${number} --comment "Closing PR as it has been marked Stale with no recent activity."`);
- } catch(e) {}
+ } catch { /* ignore */ }
}
} else {
const needsIssue = pr.labels.some(l => l.name === 'status/need-issue');
@@ -71,7 +71,7 @@ async function processPRs() {
if (commitMode) {
try {
execSync(`gh pr edit ${number} --add-label "Stale"`);
- } catch(e) {}
+ } catch { /* ignore */ }
}
}
}
@@ -54,6 +54,7 @@ async function processIssues() {
if (issue && state.includes('OPEN')) {
const isPossibleDuplicate = issue.labels.some(l => l.name === 'status/possible-duplicate');
+ const isUnassigned = !issue.assignees || issue.assignees.length === 0;
// We implement a phased rollout. Instead of closing possible duplicates immediately,
// we apply a 'stale-candidate' label. We do not close them yet to preserve project health.
@@ -62,10 +63,16 @@ async function processIssues() {
// In commit mode, we would apply the label.
try {
execSync(`gh issue edit ${number} --add-label "stale-candidate"`);
- } catch(e) {}
+ } catch { /* ignore */ }
}
// We do NOT change state to closed in the CSV simulation either. It remains open.
}
+ if (isUnassigned && commitMode) {
+ try {
+ execSync(`gh issue edit ${number} --add-label "needs-assignee"`);
+ } catch { /* ignore */ }
+ }
}
outStream.write(`${parts[0]},${state}\n`);
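The labeling rules in the hunks above can be separated from the `gh` calls so that they are testable in isolation. The sketch below makes several assumptions: `labelsToAdd` and `ghEditArgs` are hypothetical names, the `stale-candidate` rule is assumed to depend only on the `status/possible-duplicate` label (the full condition is not visible in the hunk), and the argument array is meant for `execFileSync('gh', args)`, which passes arguments without shell interpolation.

```javascript
// Hypothetical helper: decide which labels an open issue should receive.
// Assumption: possible duplicates get 'stale-candidate' and issues with
// no assignees get 'needs-assignee'; nothing is closed.
function labelsToAdd(issue) {
  const names = (issue.labels || []).map((label) => label.name);
  const labels = [];
  if (names.includes('status/possible-duplicate')) {
    labels.push('stale-candidate');
  }
  if (!issue.assignees || issue.assignees.length === 0) {
    labels.push('needs-assignee');
  }
  return labels;
}

// Hypothetical helper: build the argument list for `gh issue edit`,
// one call per label, suitable for child_process.execFileSync('gh', args).
function ghEditArgs(number, label) {
  return ['issue', 'edit', String(number), '--add-label', label];
}

// Example: an unassigned possible duplicate receives both labels.
const exampleLabels = labelsToAdd({
  labels: [{ name: 'status/possible-duplicate' }],
  assignees: [],
});
console.log(exampleLabels.map((label) => ghEditArgs(123, label)));
```

In commit mode, the caller would run one `gh issue edit` per returned label; in simulation mode it would only log the decision.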