gemini-cli/tools/gemini-cli-bot/metrics/scripts/review_distribution.ts
gemini-cli-robot 7faa50cbae
# Improve Metric Accuracy for Issues, PRs, and Review Distribution
## 1. What the change is
This PR refactors the `open_issues.ts` and `open_prs.ts` metric scripts to use the GitHub GraphQL API's `totalCount` field instead of relying on the CLI's `gh issue list` command with a hardcoded limit. It also updates `review_distribution.ts` to include `COLLABORATOR` in the maintainer association check.

## 2. Why it is recommended
Until now, `open_issues.ts` and `open_prs.ts` passed `--limit 1000`, which capped the reported metrics at 1000 even when the actual backlog was much larger (~2400 issues). That gave a misleading view of repository health and understated the true scale of the backlog. The GraphQL `totalCount` field returns the full count regardless of list size.
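
As a minimal sketch of the `totalCount` approach: the updated `open_issues.ts` is not shown on this page, so the query shape, the helper name `parseOpenIssueCount`, and the sample payload below are assumptions for illustration only.

```typescript
// Hypothetical sketch, not the actual open_issues.ts.
// A GraphQL connection exposes `totalCount`, so the full count can be
// requested without paginating through the issues themselves.
const OPEN_ISSUES_QUERY = `
  query($owner: String!, $repo: String!) {
    repository(owner: $owner, name: $repo) {
      issues(states: OPEN) {
        totalCount
      }
    }
  }
`;

interface IssueCountResponse {
  data?: { repository?: { issues?: { totalCount?: number } } };
  errors?: Array<{ message: string }>;
}

// Extracts the open-issue count from a raw GraphQL JSON response.
function parseOpenIssueCount(rawJson: string): number {
  const response = JSON.parse(rawJson) as IssueCountResponse;
  if (response.errors?.length) {
    throw new Error(response.errors.map((e) => e.message).join(', '));
  }
  const total = response.data?.repository?.issues?.totalCount;
  if (typeof total !== 'number') {
    throw new Error('Response did not contain issues.totalCount');
  }
  return total;
}

// Canned response standing in for the output of `gh api graphql`.
const sampleResponse = JSON.stringify({
  data: { repository: { issues: { totalCount: 2400 } } },
});
console.log(parseOpenIssueCount(sampleResponse));
```

Because the count comes from a single field, the result no longer depends on any list limit.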

Additionally, `review_distribution.ts` excluded reviews whose author association is `COLLABORATOR`, which misrepresents how review work is distributed when many maintainers are designated as Collaborators. In recent runs this produced a `review_distribution_variance` of 0, the value the script reports when no reviewer passes the association filter (or when every counted reviewer has the same total).
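
The effect of the association filter can be sketched with made-up data. The `Review` type, the `reviewVariance` helper, and the reviewer names are hypothetical, and the `Association` union lists only a subset of the values GitHub can return.

```typescript
// Hypothetical sketch with invented review data; not the production script.
type Association = 'MEMBER' | 'OWNER' | 'COLLABORATOR' | 'CONTRIBUTOR' | 'NONE';

interface Review {
  login: string;
  association: Association;
}

// Population variance of per-reviewer PR counts, counting each reviewer
// at most once per PR and only when their association is allowed.
function reviewVariance(
  prs: Review[][],
  allowed: readonly Association[],
): number {
  const counts: Record<string, number> = {};
  for (const reviews of prs) {
    const reviewersOnPR = new Set<string>();
    for (const review of reviews) {
      if (allowed.includes(review.association)) {
        reviewersOnPR.add(review.login);
      }
    }
    for (const login of reviewersOnPR) {
      counts[login] = (counts[login] ?? 0) + 1;
    }
  }
  const values = Object.values(counts);
  if (values.length === 0) return 0; // nobody passed the filter
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  return values.reduce((a, b) => a + (b - mean) ** 2, 0) / values.length;
}

// Three PRs reviewed only by collaborators: alice on all three, bob on one.
const samplePRs: Review[][] = [
  [{ login: 'alice', association: 'COLLABORATOR' }],
  [
    { login: 'alice', association: 'COLLABORATOR' },
    { login: 'bob', association: 'COLLABORATOR' },
  ],
  [{ login: 'alice', association: 'COLLABORATOR' }],
];

const withoutCollaborators = reviewVariance(samplePRs, ['MEMBER', 'OWNER']);
const withCollaborators = reviewVariance(samplePRs, [
  'MEMBER',
  'OWNER',
  'COLLABORATOR',
]);
console.log(withoutCollaborators, withCollaborators); // prints: 0 1
```

In this sample, excluding collaborators leaves no reviewers at all, so the variance collapses to 0; including them yields counts of 3 and 1 (mean 2) and a variance of 1.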

## 3. Which metric or aspect of productivity is expected to be improved
- **open_issues**: Will now reflect the true total count (expected to jump from 1000 to ~2400).
- **open_prs**: Will reflect the true total count of open pull requests.
- **review_distribution_variance**: Will more accurately reflect how review work is shared among all maintainers (including collaborators).

## 4. By how much the metric is expected to improve
The `open_issues` metric is expected to increase by approximately **140%** (from 1000 to ~2400) once accurate data is collected. The `review_distribution_variance` is expected to become non-zero, providing a real baseline for monitoring reviewer workload balance.
2026-04-28 17:18:16 +00:00

/**
 * @license
 * Copyright 2026 Google LLC
 * SPDX-License-Identifier: Apache-2.0
 */
import { GITHUB_OWNER, GITHUB_REPO, type MetricOutput } from '../types.js';
import { execSync } from 'node:child_process';

try {
  // Fetch the reviews on the 100 most recent pull requests.
  const query = `
    query($owner: String!, $repo: String!) {
      repository(owner: $owner, name: $repo) {
        pullRequests(last: 100) {
          nodes {
            reviews(first: 50) {
              nodes {
                author { login }
                authorAssociation
              }
            }
          }
        }
      }
    }
  `;

  // Owner and repo are passed through the environment and expanded by the
  // shell; the query is read from stdin. The query uses -F because only
  // -F/--field interprets "@-" as "read the value from standard input";
  // -f/--raw-field would send the literal string "@-".
  const output = execSync(
    'gh api graphql -F owner=$OWNER -F repo=$REPO -F query=@-',
    {
      encoding: 'utf-8',
      input: query,
      env: { ...process.env, OWNER: GITHUB_OWNER, REPO: GITHUB_REPO },
    },
  );

  const response = JSON.parse(output);
  if (response.errors) {
    throw new Error(
      response.errors.map((e: { message: string }) => e.message).join(', '),
    );
  }

  const data = response.data.repository;
  const reviewCounts: Record<string, number> = {};

  for (const pr of data.pullRequests.nodes) {
    if (!pr.reviews?.nodes) continue;

    // Count each author at most once per PR so that several review
    // submissions on the same PR are not counted as several reviews.
    const reviewersOnPR = new Set<string>();
    for (const review of pr.reviews.nodes) {
      if (
        ['MEMBER', 'OWNER', 'COLLABORATOR'].includes(review.authorAssociation) &&
        review.author?.login
      ) {
        const login = review.author.login.toLowerCase();
        // Ignore bots. This substring check also covers the "[bot]" suffix,
        // but it is a broad heuristic: it skips any login containing "bot".
        if (login.includes('bot')) {
          continue;
        }
        reviewersOnPR.add(review.author.login);
      }
    }

    for (const reviewer of reviewersOnPR) {
      reviewCounts[reviewer] = (reviewCounts[reviewer] || 0) + 1;
    }
  }

  // Population variance of the per-reviewer counts; 0 when nobody qualified.
  const counts = Object.values(reviewCounts);
  let variance = 0;
  if (counts.length > 0) {
    const mean = counts.reduce((a, b) => a + b, 0) / counts.length;
    variance =
      counts.reduce((a, b) => a + Math.pow(b - mean, 2), 0) / counts.length;
  }

  const timestamp = new Date().toISOString();
  process.stdout.write(
    JSON.stringify({
      metric: 'review_distribution_variance',
      value: Math.round(variance * 100) / 100, // round to two decimals
      timestamp,
      details: reviewCounts,
    } as MetricOutput) + '\n',
  );
} catch (err) {
  process.stderr.write(err instanceof Error ? err.message : String(err));
  process.exit(1);
}