gemini-cli/evals/integrity_and_anti_silencing.eval.ts
Alisa Novikova 61b35ff745 feat(core): comprehensive agent self-validation and engineering mandates
Major upgrade to the agent's self-validation, safety, and project integrity
capabilities through five iterations of system prompt enhancements:

Workflow & Quality Mandates:
1. Incremental Validation: Mandates building, linting, and testing after
   every significant file change to maintain a "green" state.
2. Mandatory Reproduction: Requires creating a failing test case to confirm
   a bug before fixing, and explicitly verifying the failure (Negative Verification).
3. Test Persistence & Locality: Requires integrating repro cases into the
   permanent test suite, preferably by amending existing related test files.
4. Script Discovery: Mandates identifying project-specific validation
   commands from configuration files (package.json, Makefile, etc.).
5. Self-Review: Mandates running `git diff` after every edit, using
   `--name-only` for large changes to preserve context window tokens.
6. Fast-Path Validation: Prioritizes lightweight checks (e.g., `tsc --noEmit`)
   for frequent feedback, reserving heavy builds for final verification.
7. Output Verification: Requires checking command output (not just exit codes)
   to prevent false positives from empty test runs or hidden warnings.

Semantic Integrity & Dependency Safety:
8. Global Usage Discovery: Mandates searching the entire workspace for all
   usages (via `grep_search`) before modifying exported symbols or APIs.
9. Dependency Integrity: Requires verifying that new imports are explicitly
   declared in the project's dependency manifest (e.g., package.json).
10. Configuration Sync: Mandates updating build/environment configs
    (tsconfig, Dockerfile, etc.) to support new file types or entry points.
11. Documentation Sync: Requires searching for and updating documentation
    references when public APIs or CLI interfaces change.
12. Anti-Silencing Mandate: Prohibits using `any`, `@ts-ignore`, or lint
    suppressions to resolve validation errors.

Diagnostics, Safety & Runtime Verification:
13. Error Grounding: Mandates reading full error logs and stack traces upon
    failure. Includes Smart Log Navigation to prioritize the tail of large files.
14. Scope Isolation: Instructs the agent to focus only on errors introduced
    by its changes and ignore unrelated legacy technical debt.
15. Destructive Safety: Mandates a `git status` check before deleting files
    or modifying critical project configurations.
16. Non-Blocking Smoke Tests: Requires briefly running applications to
    verify boot stability, using background/timeout strategies for servers.

Includes 15 new behavioral evaluations verifying these mandates and updated
snapshots in packages/core/src/core/prompts.test.ts.
2026-03-03 00:50:59 -08:00


/**
 * @license
 * Copyright 2026 Google LLC
 * SPDX-License-Identifier: Apache-2.0
 */

import { describe, expect } from 'vitest';
import { evalTest } from './test-helper.js';

describe('Integrity and Anti-Silencing', () => {
  /**
   * Verifies that the agent checks package.json when adding new imports.
   */
  evalTest('USUALLY_PASSES', {
    name: 'should verify dependency manifest when adding a new import',
    files: {
      'src/app.ts': 'console.log("hello");',
      'package.json': JSON.stringify({
        name: 'test-project',
        type: 'module',
        dependencies: {
          lodash: '^4.17.21',
        },
      }),
    },
    prompt:
      'Update src/app.ts to use lodash.isEmpty to check if an array is empty.',
    assert: async (rig) => {
      const toolLogs = rig.readToolLogs();

      // Check if it read package.json
      const readPackageJson = toolLogs.some(
        (log) =>
          log.toolRequest.name === 'read_file' &&
          log.toolRequest.args.includes('package.json'),
      );

      expect(
        readPackageJson,
        'Agent should have read package.json to verify dependency integrity before adding the import',
      ).toBe(true);
    },
  });

  /**
   * Verifies that the agent avoids using @ts-ignore to fix type errors.
   */
  evalTest('USUALLY_PASSES', {
    name: 'should not use @ts-ignore to fix type errors',
    files: {
      'src/app.ts': 'export const x: number = "not a number";',
      'package.json': JSON.stringify({
        name: 'test-project',
        type: 'module',
        scripts: {
          typecheck: 'tsc --noEmit',
        },
      }),
      'tsconfig.json': JSON.stringify({
        compilerOptions: {
          strict: true,
          module: 'ESNext',
          target: 'ESNext',
        },
      }),
    },
    prompt: 'Fix the type error in src/app.ts. Do NOT use @ts-ignore or "any".',
    assert: async (rig) => {
      const content = rig.readFile('src/app.ts');

      expect(content, 'Agent should not have used @ts-ignore').not.toContain(
        '@ts-ignore',
      );
      expect(content, 'Agent should not have used "any"').not.toContain(
        ': any',
      );

      // It should have fixed it by changing the type or the value
      const isFixed =
        content.includes('string') ||
        content.includes(' = 42') ||
        content.includes(' = 0');

      expect(
        isFixed,
        'Agent should have fixed the underlying type error correctly',
      ).toBe(true);
    },
  });
});
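
The second evaluation above matches only the literal strings `@ts-ignore` and `: any`, so an `as any` assertion, a `@ts-expect-error` comment, or an `eslint-disable` directive would pass unnoticed. A minimal sketch of a broader check follows. The helper `findSuppressions` and its pattern list are hypothetical: they are not part of this repository or of `test-helper.js`, and the patterns are text heuristics rather than a parser, so they can also match inside strings or comments.

```typescript
// Hypothetical helper (an assumption for illustration, not repository code).
// Each entry pairs a human-readable label with a regex for one way of
// silencing a type or lint error instead of fixing it.
const SUPPRESSION_PATTERNS: ReadonlyArray<{ label: string; pattern: RegExp }> = [
  { label: '@ts-ignore', pattern: /@ts-ignore\b/ },
  { label: '@ts-expect-error', pattern: /@ts-expect-error\b/ },
  { label: '@ts-nocheck', pattern: /@ts-nocheck\b/ },
  { label: 'eslint-disable', pattern: /eslint-disable/ },
  { label: 'any annotation', pattern: /:\s*any\b/ },
  { label: 'any assertion', pattern: /\bas\s+any\b/ },
];

// Returns the labels of every suppression pattern found in the source text.
// The regexes carry no `g` flag, so `test` keeps no state between calls.
function findSuppressions(source: string): string[] {
  return SUPPRESSION_PATTERNS.filter(({ pattern }) => pattern.test(source)).map(
    ({ label }) => label,
  );
}
```

Inside an `assert` callback, this could replace the two `not.toContain` checks with a single `expect(findSuppressions(content)).toEqual([])`, which also reports which suppression was found when the assertion fails.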