Update debug command.

2026-06-18 07:17:16 -07:00 · 2026-01-28 21:27:48 -08:00
parent f50ca36fd4
commit 2468f56922
2 changed files with 4 additions and 2 deletions
@@ -27,7 +27,9 @@ You are an expert at fixing behavioral evaluations.
     - Your primary mechanism for improving the agent's behavior is to make changes to
       tool instructions, prompt.ts, and/or modules that contribute to the prompt.
     - If prompt and description changes are unsuccessful, use logs and debugging to
-       confirm that everything is working as expected.
+       confirm that everything is working as expected. You can try some of the following.
+       - **Interactive Prompts**: Commands like `npx` may hang waiting for user confirmation to install a package. Prefer `npx --yes <cmd>`.
+       - **Missing package.json**: Some tools (like `eslint`) require a `package.json` to be present in the working directory or a parent.
    - If unable to fix the test, you can make recommendations for architecture changes
      that might help stabilize the test. Be sure to THINK DEEPLY if offering architecture guidance.
      Some facts that might help with this are:
@@ -44,7 +44,7 @@ describe('Frugal reads eval', () => {
    },
    prompt:
      'Fix all linter errors in linter_mess.ts manually by editing the file. Run eslint directly (using "npx --yes eslint") to find them. Do not run the file.',
-    assert: async (rig, result) => {
+    assert: async (rig) => {
      const logs = rig.readToolLogs();

      // Check if the agent read the whole file
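
The "Missing package.json" note added in the first hunk says some tools need a `package.json` in the working directory or a parent. A minimal sketch of how an eval fixture could verify that precondition before invoking `npx --yes eslint` is shown below. The helper `findNearestPackageJson` and the fixture layout are hypothetical and not part of this commit, and the upward walk is an assumption about how such a lookup might be done, not a description of how eslint resolves its configuration.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Hypothetical helper (not part of the commit): walk up from `startDir`
// and return the first package.json found, or null if none exists.
function findNearestPackageJson(startDir: string): string | null {
  let dir = path.resolve(startDir);
  for (;;) {
    const candidate = path.join(dir, 'package.json');
    if (fs.existsSync(candidate)) return candidate;
    const parent = path.dirname(dir);
    if (parent === dir) return null; // reached the filesystem root
    dir = parent;
  }
}

// Usage sketch: a temporary fixture with package.json at its root
// and a nested directory standing in for the agent's working directory.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'eval-fixture-'));
fs.writeFileSync(path.join(root, 'package.json'), '{}');
const nested = path.join(root, 'src', 'lib');
fs.mkdirSync(nested, { recursive: true });

const found = findNearestPackageJson(nested);
console.log(found === path.join(root, 'package.json')); // true
```

If the helper returns `null` for the directory where the agent runs its commands, the fixture probably needs a `package.json` written before the test starts.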