Adds a strict retry threshold to the agent's validation loop:
'If validation fails 3 times on the exact same test or error, DO NOT attempt another minor code tweak. You must immediately step back, use search tools to gather wider context, and formulate a completely new strategy.'
This prevents the agent from getting stuck in repetitive, unsuccessful minor tweaks and encourages a more strategic approach when initial fixes fail.
Introduces three critical mandates to the agent's testing and validation workflow:
1. **Prioritize Existing Infrastructure:** Strictly prefer running the project's existing test suite over writing custom reproduction scripts to avoid environment/import difficulties.
2. **Timebox Test Setup:** Abandon custom reproduction scripts if they fail to set up within 3-5 turns due to environment or import errors, falling back to static analysis and built-in tests.
3. **Mandate Exhaustive Validation:** Explicitly requires running relevant existing project tests to prevent regressions, ensuring a passing custom reproduction script is treated as a necessary but not sufficient condition for completion.
These changes prevent 'Early Exhaustion' by reducing the complexity of standalone test setup in frameworks like Django.
This commit upgrades the agent with a robust self-validation workflow while
ensuring 100% verbatim coverage of the original system prompt text.
By moving to an additive model, we preserve all original reasoning anchors,
instructional lead-ins, and senior engineering heuristics while injecting
critical autonomous mandates.
Verbatim Restoration:
- All 'Context Efficiency' guidelines, lead-ins, and scenarios (Search/Understand/Navigate).
- All 'Engineering Standards' regarding style mimicry, abstractions, and debt isolation.
- Full 'Primary Workflows' sequence and formatting.
Self-Validation Workflow Injections:
- Research Phase: Parallel Discovery (manifests + logic) and High-Signal Grep.
- Bug Fixing: Negative Verification (confirming repro failure) and Coverage Expansion.
- Implementation: Transactional Edits (logical batching of module changes).
- Validation Loop: Tiered Validation (Fixers -> Fast-Path -> Related Tests) and Smart Log Navigation (Tail-First).
Technical Verification:
- Verified against 67 core prompt tests and 14 snapshots.
- New behavioral eval suite passed (evals/self_validation_workflow.eval.ts).
- Full 'npm run preflight' successful.
This commit upgrades the agent with a robust self-validation workflow while
ensuring 100% semantic and verbatim coverage of the original system prompt.
By moving to an additive model, we preserve the original reasoning anchors
(lead-ins, heuristics, and formatting) while injecting critical autonomous
engineering mandates.
Self-Validation Workflow Injections:
- Research Phase: Parallel Discovery (combining manifests/logic) and High-Signal Grep.
- Bug Fixing: Negative Verification (confirming repro failure) and Coverage Expansion.
- Implementation: Transactional Edits (logical batching of module changes).
- Validation Loop: Tiered Validation (Fixers -> Fast-Path -> Related Tests) and Smart Log Navigation.
Technical Verification:
- Verbatim restoration verified against 66 core tests and 14 snapshots.
- New behavioral eval suite passed (evals/self_validation_workflow.eval.ts).
- Full 'npm run preflight' validation successful.
Major upgrade to the agent's self-validation, safety, and project integrity
capabilities through five iterations of system prompt enhancements:
Workflow & Quality Mandates:
1. Incremental Validation: Mandates building, linting, and testing after
every significant file change to maintain a "green" state.
2. Mandatory Reproduction: Requires creating a failing test case to confirm
a bug before fixing, and explicitly verifying the failure (Negative Verification).
3. Test Persistence & Locality: Requires integrating repro cases into the
permanent test suite, preferably by amending existing related test files.
4. Script Discovery: Mandates identifying project-specific validation
commands from configuration files (package.json, Makefile, etc.).
5. Self-Review: Mandates running `git diff` after every edit, using
`--name-only` for large changes to preserve context window tokens.
6. Fast-Path Validation: Prioritizes lightweight checks (e.g., `tsc --noEmit`)
for frequent feedback, reserving heavy builds for final verification.
7. Output Verification: Requires checking command output (not just exit codes)
to prevent false-positives from empty test runs or hidden warnings.
Semantic Integrity & Dependency Safety:
8. Global Usage Discovery: Mandates searching the entire workspace for all
usages (via `grep_search`) before modifying exported symbols or APIs.
9. Dependency Integrity: Requires verifying that new imports are explicitly
declared in the project's dependency manifest (e.g., package.json).
10. Configuration Sync: Mandates updating build/environment configs
(tsconfig, Dockerfile, etc.) to support new file types or entry points.
11. Documentation Sync: Requires searching for and updating documentation
references when public APIs or CLI interfaces change.
12. Anti-Silencing Mandate: Prohibits using `any`, `@ts-ignore`, or lint
suppressions to resolve validation errors.
Diagnostics, Safety & Runtime Verification:
13. Error Grounding: Mandates reading full error logs and stack traces upon
failure. Includes Smart Log Navigation to prioritize the tail of large files.
14. Scope Isolation: Instructs the agent to focus only on errors introduced
by its changes and ignore unrelated legacy technical debt.
15. Destructive Safety: Mandates a `git status` check before deleting files
or modifying critical project configurations.
16. Non-Blocking Smoke Tests: Requires briefly running applications to
verify boot stability, using background/timeout strategies for servers.
Includes 15 new behavioral evaluations verifying these mandates and updated
snapshots in packages/core/src/core/prompts.test.ts.