fix(core): Encourage non-interactive flags for scaffolding commands (#18804)

This commit is contained in:
N. Taylor Mullen
2026-02-15 12:26:59 -08:00
committed by GitHub
parent 884acda2dc
commit 6eec9f3350
3 changed files with 42 additions and 15 deletions

View File

@@ -44,4 +44,31 @@ describe('interactive_commands', () => {
).toMatch(/\b(run|--run)\b/);
},
});
/**
* Validates that the agent uses non-interactive flags when scaffolding a new project.
*/
evalTest('ALWAYS_PASSES', {
name: 'should use non-interactive flags when scaffolding a new app',
prompt: 'Create a new react application named my-app using vite.',
assert: async (rig, result) => {
const logs = rig.readToolLogs();
const scaffoldCall = logs.find(
(l) =>
l.toolRequest.name === 'run_shell_command' &&
/npm (init|create)|npx create-|yarn create|pnpm create/.test(
l.toolRequest.args,
),
);
expect(
scaffoldCall,
'Agent should have called a scaffolding command (e.g., npm create)',
).toBeDefined();
expect(
scaffoldCall?.toolRequest.args,
'Agent should have passed a non-interactive flag (-y, --yes, or a specific --template)',
).toMatch(/(?:^|\s)(--yes|-y|--template\s+\S+)(?:\s|$|\\|")/);
},
});
});

View File

@@ -588,7 +588,7 @@ Operate using a **Research -> Strategy -> Execution** lifecycle. For the Executi
- **Mobile:** Compose Multiplatform or Flutter.
- **Games:** HTML/CSS/JS (Three.js for 3D).
- **CLIs:** Python or Go.
3. **Implementation:** Autonomously implement each feature per the approved plan. When starting, scaffold the application using \`run_shell_command\` for commands like 'npm init', 'npx create-react-app'. For visual assets, utilize **platform-native primitives** (e.g., stylized shapes, gradients, icons) to ensure a complete, coherent experience. Never link to external services or assume local paths for assets that have not been created.
3. **Implementation:** Autonomously implement each feature per the approved plan. When starting, scaffold the application using \`run_shell_command\` for commands like 'npm init', 'npx create-react-app'. For interactive scaffolding tools (like create-react-app, create-vite, or npm create), you MUST use the corresponding non-interactive flag (e.g. '--yes', '-y', or specific template flags) to prevent the environment from hanging waiting for user input. For visual assets, utilize **platform-native primitives** (e.g., stylized shapes, gradients, icons) to ensure a complete, coherent experience. Never link to external services or assume local paths for assets that have not been created.
4. **Verify:** Review work against the original request. Fix bugs and deviations. Ensure styling and interactions produce a high-quality, functional, and beautiful prototype. **Build the application and ensure there are no compile errors.**
5. **Solicit Feedback:** Provide instructions on how to start the application and request user feedback on the prototype.
@@ -702,7 +702,7 @@ Operate using a **Research -> Strategy -> Execution** lifecycle. For the Executi
- **Mobile:** Compose Multiplatform or Flutter.
- **Games:** HTML/CSS/JS (Three.js for 3D).
- **CLIs:** Python or Go.
3. **Implementation:** Autonomously implement each feature per the approved plan. When starting, scaffold the application using \`run_shell_command\`. For visual assets, utilize **platform-native primitives** (e.g., stylized shapes, gradients, icons). Never link to external services or assume local paths for assets that have not been created.
3. **Implementation:** Autonomously implement each feature per the approved plan. When starting, scaffold the application using \`run_shell_command\`. For interactive scaffolding tools (like create-react-app, create-vite, or npm create), you MUST use the corresponding non-interactive flag (e.g. '--yes', '-y', or specific template flags) to prevent the environment from hanging waiting for user input. For visual assets, utilize **platform-native primitives** (e.g., stylized shapes, gradients, icons). Never link to external services or assume local paths for assets that have not been created.
4. **Verify:** Review work against the original request. Fix bugs and deviations. **Build the application and ensure there are no compile errors.**
# Operational Guidelines
@@ -798,7 +798,7 @@ Operate using a **Research -> Strategy -> Execution** lifecycle. For the Executi
- **Mobile:** Compose Multiplatform or Flutter.
- **Games:** HTML/CSS/JS (Three.js for 3D).
- **CLIs:** Python or Go.
3. **Implementation:** Autonomously implement each feature per the approved plan. When starting, scaffold the application using \`run_shell_command\`. For visual assets, utilize **platform-native primitives** (e.g., stylized shapes, gradients, icons). Never link to external services or assume local paths for assets that have not been created.
3. **Implementation:** Autonomously implement each feature per the approved plan. When starting, scaffold the application using \`run_shell_command\`. For interactive scaffolding tools (like create-react-app, create-vite, or npm create), you MUST use the corresponding non-interactive flag (e.g. '--yes', '-y', or specific template flags) to prevent the environment from hanging waiting for user input. For visual assets, utilize **platform-native primitives** (e.g., stylized shapes, gradients, icons). Never link to external services or assume local paths for assets that have not been created.
4. **Verify:** Review work against the original request. Fix bugs and deviations. **Build the application and ensure there are no compile errors.**
# Operational Guidelines
@@ -1392,7 +1392,7 @@ Operate using a **Research -> Strategy -> Execution** lifecycle. For the Executi
- **Mobile:** Compose Multiplatform or Flutter.
- **Games:** HTML/CSS/JS (Three.js for 3D).
- **CLIs:** Python or Go.
3. **Implementation:** Autonomously implement each feature per the approved plan. When starting, scaffold the application using \`run_shell_command\` for commands like 'npm init', 'npx create-react-app'. For visual assets, utilize **platform-native primitives** (e.g., stylized shapes, gradients, icons) to ensure a complete, coherent experience. Never link to external services or assume local paths for assets that have not been created.
3. **Implementation:** Autonomously implement each feature per the approved plan. When starting, scaffold the application using \`run_shell_command\` for commands like 'npm init', 'npx create-react-app'. For interactive scaffolding tools (like create-react-app, create-vite, or npm create), you MUST use the corresponding non-interactive flag (e.g. '--yes', '-y', or specific template flags) to prevent the environment from hanging waiting for user input. For visual assets, utilize **platform-native primitives** (e.g., stylized shapes, gradients, icons) to ensure a complete, coherent experience. Never link to external services or assume local paths for assets that have not been created.
4. **Verify:** Review work against the original request. Fix bugs and deviations. Ensure styling and interactions produce a high-quality, functional, and beautiful prototype. **Build the application and ensure there are no compile errors.**
5. **Solicit Feedback:** Provide instructions on how to start the application and request user feedback on the prototype.
@@ -1505,7 +1505,7 @@ Operate using a **Research -> Strategy -> Execution** lifecycle. For the Executi
- **Mobile:** Compose Multiplatform or Flutter.
- **Games:** HTML/CSS/JS (Three.js for 3D).
- **CLIs:** Python or Go.
3. **Implementation:** Autonomously implement each feature per the approved plan. When starting, scaffold the application using \`run_shell_command\` for commands like 'npm init', 'npx create-react-app'. For visual assets, utilize **platform-native primitives** (e.g., stylized shapes, gradients, icons) to ensure a complete, coherent experience. Never link to external services or assume local paths for assets that have not been created.
3. **Implementation:** Autonomously implement each feature per the approved plan. When starting, scaffold the application using \`run_shell_command\` for commands like 'npm init', 'npx create-react-app'. For interactive scaffolding tools (like create-react-app, create-vite, or npm create), you MUST use the corresponding non-interactive flag (e.g. '--yes', '-y', or specific template flags) to prevent the environment from hanging waiting for user input. For visual assets, utilize **platform-native primitives** (e.g., stylized shapes, gradients, icons) to ensure a complete, coherent experience. Never link to external services or assume local paths for assets that have not been created.
4. **Verify:** Review work against the original request. Fix bugs and deviations. Ensure styling and interactions produce a high-quality, functional, and beautiful prototype. **Build the application and ensure there are no compile errors.**
5. **Solicit Feedback:** Provide instructions on how to start the application and request user feedback on the prototype.
@@ -1622,7 +1622,7 @@ Operate using a **Research -> Strategy -> Execution** lifecycle. For the Executi
- **Mobile:** Compose Multiplatform or Flutter.
- **Games:** HTML/CSS/JS (Three.js for 3D).
- **CLIs:** Python or Go.
3. **Implementation:** Autonomously implement each feature per the approved plan. When starting, scaffold the application using \`run_shell_command\` for commands like 'npm init', 'npx create-react-app'. For visual assets, utilize **platform-native primitives** (e.g., stylized shapes, gradients, icons) to ensure a complete, coherent experience. Never link to external services or assume local paths for assets that have not been created.
3. **Implementation:** Autonomously implement each feature per the approved plan. When starting, scaffold the application using \`run_shell_command\` for commands like 'npm init', 'npx create-react-app'. For interactive scaffolding tools (like create-react-app, create-vite, or npm create), you MUST use the corresponding non-interactive flag (e.g. '--yes', '-y', or specific template flags) to prevent the environment from hanging waiting for user input. For visual assets, utilize **platform-native primitives** (e.g., stylized shapes, gradients, icons) to ensure a complete, coherent experience. Never link to external services or assume local paths for assets that have not been created.
4. **Verify:** Review work against the original request. Fix bugs and deviations. Ensure styling and interactions produce a high-quality, functional, and beautiful prototype. **Build the application and ensure there are no compile errors.**
5. **Solicit Feedback:** Provide instructions on how to start the application and request user feedback on the prototype.
@@ -1739,7 +1739,7 @@ Operate using a **Research -> Strategy -> Execution** lifecycle. For the Executi
- **Mobile:** Compose Multiplatform or Flutter.
- **Games:** HTML/CSS/JS (Three.js for 3D).
- **CLIs:** Python or Go.
3. **Implementation:** Autonomously implement each feature per the approved plan. When starting, scaffold the application using \`run_shell_command\` for commands like 'npm init', 'npx create-react-app'. For visual assets, utilize **platform-native primitives** (e.g., stylized shapes, gradients, icons) to ensure a complete, coherent experience. Never link to external services or assume local paths for assets that have not been created.
3. **Implementation:** Autonomously implement each feature per the approved plan. When starting, scaffold the application using \`run_shell_command\` for commands like 'npm init', 'npx create-react-app'. For interactive scaffolding tools (like create-react-app, create-vite, or npm create), you MUST use the corresponding non-interactive flag (e.g. '--yes', '-y', or specific template flags) to prevent the environment from hanging waiting for user input. For visual assets, utilize **platform-native primitives** (e.g., stylized shapes, gradients, icons) to ensure a complete, coherent experience. Never link to external services or assume local paths for assets that have not been created.
4. **Verify:** Review work against the original request. Fix bugs and deviations. Ensure styling and interactions produce a high-quality, functional, and beautiful prototype. **Build the application and ensure there are no compile errors.**
5. **Solicit Feedback:** Provide instructions on how to start the application and request user feedback on the prototype.
@@ -1964,7 +1964,7 @@ Operate using a **Research -> Strategy -> Execution** lifecycle. For the Executi
- **Mobile:** Compose Multiplatform or Flutter.
- **Games:** HTML/CSS/JS (Three.js for 3D).
- **CLIs:** Python or Go.
3. **Implementation:** Autonomously implement each feature per the approved plan. When starting, scaffold the application using \`run_shell_command\` for commands like 'npm init', 'npx create-react-app'. For visual assets, utilize **platform-native primitives** (e.g., stylized shapes, gradients, icons) to ensure a complete, coherent experience. Never link to external services or assume local paths for assets that have not been created.
3. **Implementation:** Autonomously implement each feature per the approved plan. When starting, scaffold the application using \`run_shell_command\` for commands like 'npm init', 'npx create-react-app'. For interactive scaffolding tools (like create-react-app, create-vite, or npm create), you MUST use the corresponding non-interactive flag (e.g. '--yes', '-y', or specific template flags) to prevent the environment from hanging waiting for user input. For visual assets, utilize **platform-native primitives** (e.g., stylized shapes, gradients, icons) to ensure a complete, coherent experience. Never link to external services or assume local paths for assets that have not been created.
4. **Verify:** Review work against the original request. Fix bugs and deviations. Ensure styling and interactions produce a high-quality, functional, and beautiful prototype. **Build the application and ensure there are no compile errors.**
5. **Solicit Feedback:** Provide instructions on how to start the application and request user feedback on the prototype.
@@ -2316,7 +2316,7 @@ Operate using a **Research -> Strategy -> Execution** lifecycle. For the Executi
- **Mobile:** Compose Multiplatform or Flutter.
- **Games:** HTML/CSS/JS (Three.js for 3D).
- **CLIs:** Python or Go.
3. **Implementation:** Autonomously implement each feature per the approved plan. When starting, scaffold the application using \`run_shell_command\` for commands like 'npm init', 'npx create-react-app'. For visual assets, utilize **platform-native primitives** (e.g., stylized shapes, gradients, icons) to ensure a complete, coherent experience. Never link to external services or assume local paths for assets that have not been created.
3. **Implementation:** Autonomously implement each feature per the approved plan. When starting, scaffold the application using \`run_shell_command\` for commands like 'npm init', 'npx create-react-app'. For interactive scaffolding tools (like create-react-app, create-vite, or npm create), you MUST use the corresponding non-interactive flag (e.g. '--yes', '-y', or specific template flags) to prevent the environment from hanging waiting for user input. For visual assets, utilize **platform-native primitives** (e.g., stylized shapes, gradients, icons) to ensure a complete, coherent experience. Never link to external services or assume local paths for assets that have not been created.
4. **Verify:** Review work against the original request. Fix bugs and deviations. Ensure styling and interactions produce a high-quality, functional, and beautiful prototype. **Build the application and ensure there are no compile errors.**
5. **Solicit Feedback:** Provide instructions on how to start the application and request user feedback on the prototype.
@@ -2429,7 +2429,7 @@ Operate using a **Research -> Strategy -> Execution** lifecycle. For the Executi
- **Mobile:** Compose Multiplatform or Flutter.
- **Games:** HTML/CSS/JS (Three.js for 3D).
- **CLIs:** Python or Go.
3. **Implementation:** Autonomously implement each feature per the approved plan. When starting, scaffold the application using \`run_shell_command\` for commands like 'npm init', 'npx create-react-app'. For visual assets, utilize **platform-native primitives** (e.g., stylized shapes, gradients, icons) to ensure a complete, coherent experience. Never link to external services or assume local paths for assets that have not been created.
3. **Implementation:** Autonomously implement each feature per the approved plan. When starting, scaffold the application using \`run_shell_command\` for commands like 'npm init', 'npx create-react-app'. For interactive scaffolding tools (like create-react-app, create-vite, or npm create), you MUST use the corresponding non-interactive flag (e.g. '--yes', '-y', or specific template flags) to prevent the environment from hanging waiting for user input. For visual assets, utilize **platform-native primitives** (e.g., stylized shapes, gradients, icons) to ensure a complete, coherent experience. Never link to external services or assume local paths for assets that have not been created.
4. **Verify:** Review work against the original request. Fix bugs and deviations. Ensure styling and interactions produce a high-quality, functional, and beautiful prototype. **Build the application and ensure there are no compile errors.**
5. **Solicit Feedback:** Provide instructions on how to start the application and request user feedback on the prototype.
@@ -2653,7 +2653,7 @@ Operate using a **Research -> Strategy -> Execution** lifecycle. For the Executi
- **Mobile:** Compose Multiplatform or Flutter.
- **Games:** HTML/CSS/JS (Three.js for 3D).
- **CLIs:** Python or Go.
3. **Implementation:** Autonomously implement each feature per the approved plan. When starting, scaffold the application using \`run_shell_command\` for commands like 'npm init', 'npx create-react-app'. For visual assets, utilize **platform-native primitives** (e.g., stylized shapes, gradients, icons) to ensure a complete, coherent experience. Never link to external services or assume local paths for assets that have not been created.
3. **Implementation:** Autonomously implement each feature per the approved plan. When starting, scaffold the application using \`run_shell_command\` for commands like 'npm init', 'npx create-react-app'. For interactive scaffolding tools (like create-react-app, create-vite, or npm create), you MUST use the corresponding non-interactive flag (e.g. '--yes', '-y', or specific template flags) to prevent the environment from hanging waiting for user input. For visual assets, utilize **platform-native primitives** (e.g., stylized shapes, gradients, icons) to ensure a complete, coherent experience. Never link to external services or assume local paths for assets that have not been created.
4. **Verify:** Review work against the original request. Fix bugs and deviations. Ensure styling and interactions produce a high-quality, functional, and beautiful prototype. **Build the application and ensure there are no compile errors.**
5. **Solicit Feedback:** Provide instructions on how to start the application and request user feedback on the prototype.
@@ -2766,7 +2766,7 @@ Operate using a **Research -> Strategy -> Execution** lifecycle. For the Executi
- **Mobile:** Compose Multiplatform or Flutter.
- **Games:** HTML/CSS/JS (Three.js for 3D).
- **CLIs:** Python or Go.
3. **Implementation:** Autonomously implement each feature per the approved plan. When starting, scaffold the application using \`run_shell_command\` for commands like 'npm init', 'npx create-react-app'. For visual assets, utilize **platform-native primitives** (e.g., stylized shapes, gradients, icons) to ensure a complete, coherent experience. Never link to external services or assume local paths for assets that have not been created.
3. **Implementation:** Autonomously implement each feature per the approved plan. When starting, scaffold the application using \`run_shell_command\` for commands like 'npm init', 'npx create-react-app'. For interactive scaffolding tools (like create-react-app, create-vite, or npm create), you MUST use the corresponding non-interactive flag (e.g. '--yes', '-y', or specific template flags) to prevent the environment from hanging waiting for user input. For visual assets, utilize **platform-native primitives** (e.g., stylized shapes, gradients, icons) to ensure a complete, coherent experience. Never link to external services or assume local paths for assets that have not been created.
4. **Verify:** Review work against the original request. Fix bugs and deviations. Ensure styling and interactions produce a high-quality, functional, and beautiful prototype. **Build the application and ensure there are no compile errors.**
5. **Solicit Feedback:** Provide instructions on how to start the application and request user feedback on the prototype.

View File

@@ -553,7 +553,7 @@ function newApplicationSteps(options: PrimaryWorkflowsOptions): string {
if (options.approvedPlan) {
return `
1. **Understand:** Read the approved plan. Treat this file as your single source of truth.
2. **Implement:** Implement the application according to the plan. When starting, scaffold the application using ${formatToolName(SHELL_TOOL_NAME)}. For visual assets, utilize **platform-native primitives** (e.g., stylized shapes, gradients, CSS animations, icons) to ensure a complete, rich, and coherent experience. Never link to external services or assume local paths for assets that have not been created. If you discover new requirements or need to change the approach, confirm with the user and update the plan file.
2. **Implement:** Implement the application according to the plan. When starting, scaffold the application using ${formatToolName(SHELL_TOOL_NAME)}. For interactive scaffolding tools (like create-react-app, create-vite, or npm create), you MUST use the corresponding non-interactive flag (e.g. '--yes', '-y', or specific template flags) to prevent the environment from hanging waiting for user input. For visual assets, utilize **platform-native primitives** (e.g., stylized shapes, gradients, CSS animations, icons) to ensure a complete, rich, and coherent experience. Never link to external services or assume local paths for assets that have not been created. If you discover new requirements or need to change the approach, confirm with the user and update the plan file.
3. **Verify:** Review work against the original request and the approved plan. Fix bugs, deviations, and ensure placeholders are visually adequate. **Ensure styling and interactions produce a high-quality, polished, and beautiful prototype.** Finally, but MOST importantly, build the application and ensure there are no compile errors.
4. **Finish:** Provide a brief summary of what was built.`.trim();
}
@@ -588,7 +588,7 @@ function newApplicationSteps(options: PrimaryWorkflowsOptions): string {
- **Mobile:** Compose Multiplatform or Flutter.
- **Games:** HTML/CSS/JS (Three.js for 3D).
- **CLIs:** Python or Go.
3. **Implementation:** Autonomously implement each feature per the approved plan. When starting, scaffold the application using ${formatToolName(SHELL_TOOL_NAME)} for commands like 'npm init', 'npx create-react-app'. For visual assets, utilize **platform-native primitives** (e.g., stylized shapes, gradients, icons) to ensure a complete, coherent experience. Never link to external services or assume local paths for assets that have not been created.
3. **Implementation:** Autonomously implement each feature per the approved plan. When starting, scaffold the application using ${formatToolName(SHELL_TOOL_NAME)} for commands like 'npm init', 'npx create-react-app'. For interactive scaffolding tools (like create-react-app, create-vite, or npm create), you MUST use the corresponding non-interactive flag (e.g. '--yes', '-y', or specific template flags) to prevent the environment from hanging waiting for user input. For visual assets, utilize **platform-native primitives** (e.g., stylized shapes, gradients, icons) to ensure a complete, coherent experience. Never link to external services or assume local paths for assets that have not been created.
4. **Verify:** Review work against the original request. Fix bugs and deviations. Ensure styling and interactions produce a high-quality, functional, and beautiful prototype. **Build the application and ensure there are no compile errors.**
5. **Solicit Feedback:** Provide instructions on how to start the application and request user feedback on the prototype.`.trim();
}
@@ -603,7 +603,7 @@ function newApplicationSteps(options: PrimaryWorkflowsOptions): string {
- **Mobile:** Compose Multiplatform or Flutter.
- **Games:** HTML/CSS/JS (Three.js for 3D).
- **CLIs:** Python or Go.
3. **Implementation:** Autonomously implement each feature per the approved plan. When starting, scaffold the application using ${formatToolName(SHELL_TOOL_NAME)}. For visual assets, utilize **platform-native primitives** (e.g., stylized shapes, gradients, icons). Never link to external services or assume local paths for assets that have not been created.
3. **Implementation:** Autonomously implement each feature per the approved plan. When starting, scaffold the application using ${formatToolName(SHELL_TOOL_NAME)}. For interactive scaffolding tools (like create-react-app, create-vite, or npm create), you MUST use the corresponding non-interactive flag (e.g. '--yes', '-y', or specific template flags) to prevent the environment from hanging waiting for user input. For visual assets, utilize **platform-native primitives** (e.g., stylized shapes, gradients, icons). Never link to external services or assume local paths for assets that have not been created.
4. **Verify:** Review work against the original request. Fix bugs and deviations. **Build the application and ensure there are no compile errors.**`.trim();
}