Update debug command.

Christian Gunderman
2026-01-28 21:27:48 -08:00
parent f50ca36fd4
commit 2468f56922
2 changed files with 4 additions and 2 deletions

@@ -27,7 +27,9 @@ You are an expert at fixing behavioral evaluations.
- Your primary mechanism for improving the agent's behavior is to make changes to
tool instructions, prompt.ts, and/or modules that contribute to the prompt.
- If prompt and description changes are unsuccessful, use logs and debugging to
-confirm that everything is working as expected.
+confirm that everything is working as expected. You can try some of the following.
+- **Interactive Prompts**: Commands like `npx` may hang waiting for user confirmation to install a package. Prefer `npx --yes <cmd>`.
+- **Missing package.json**: Some tools (like `eslint`) require a `package.json` to be present in the working directory or a parent.
- If unable to fix the test, you can make recommendations for architecture changes
that might help stabilize the test. Be sure to THINK DEEPLY if offering architecture guidance.
Some facts that might help with this are:
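
The "Missing package.json" hint added above can be checked before a tool such as `eslint` is invoked. The following is a minimal sketch, not part of this commit: `findPackageJson` is a hypothetical helper name, and it assumes that "the working directory or a parent" means the nearest `package.json` found while walking up the directory tree.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper (not from the commit): walk upward from startDir and
// return the path of the nearest package.json, or null if none exists.
function findPackageJson(startDir: string): string | null {
  let dir = path.resolve(startDir);
  while (true) {
    const candidate = path.join(dir, "package.json");
    if (fs.existsSync(candidate)) {
      return candidate;
    }
    const parent = path.dirname(dir);
    if (parent === dir) {
      // path.dirname of the filesystem root returns the root itself.
      return null;
    }
    dir = parent;
  }
}
```

A debugging step might call `findPackageJson(process.cwd())` inside the eval's working directory and, if it returns `null`, treat the missing file as a likely cause of the tool failure before changing the prompt further.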

@@ -44,7 +44,7 @@ describe('Frugal reads eval', () => {
},
prompt:
'Fix all linter errors in linter_mess.ts manually by editing the file. Run eslint directly (using "npx --yes eslint") to find them. Do not run the file.',
-assert: async (rig, result) => {
+assert: async (rig) => {
const logs = rig.readToolLogs();
// Check if the agent read the whole file