Pipe stderr from npx chrome-devtools-mcp instead of inheriting it.
The server's banner warnings were leaking into the terminal and
corrupting the Ink-based UI in alternate buffer mode. Piped output
is forwarded to debugLogger so it remains visible with --debug.
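
A minimal sketch of the forwarding idea, not the actual implementation: stderr chunks are buffered and handed to a logger one complete line at a time. The names ChunkSource, createLineForwarder, and forwardStderr are hypothetical, and the logger is a plain callback because the debugLogger API is not shown here.

```typescript
// Hypothetical structural type for a readable stream such as a child's stderr.
interface ChunkSource {
  on(event: 'data', listener: (chunk: { toString(): string }) => void): unknown;
}

// Buffers incoming chunks and calls `log` once per complete, non-empty line.
function createLineForwarder(log: (line: string) => void) {
  let pending = '';
  return {
    push(chunk: { toString(): string }): void {
      pending += chunk.toString();
      const lines = pending.split(/\r?\n/);
      pending = lines.pop() ?? ''; // keep the trailing partial line for later
      for (const line of lines) {
        if (line.trim() !== '') log(line);
      }
    },
    // Emits whatever is left when the stream ends without a final newline.
    flush(): void {
      if (pending.trim() !== '') log(pending);
      pending = '';
    },
  };
}

// Wires a piped stderr stream to the logger instead of the terminal.
function forwardStderr(stderr: ChunkSource, log: (line: string) => void): void {
  const forwarder = createLineForwarder(log);
  stderr.on('data', (chunk) => forwarder.push(chunk));
}
```

With Node's child_process.spawn, passing stdio: ['pipe', 'pipe', 'pipe'] makes child.stderr a readable stream that could be handed to a function like forwardStderr, so nothing reaches the terminal directly.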

Updated the browser_agent description from a primitive-focused listing
(navigating, filling, clicking) to a goal-oriented description that
emphasizes autonomy, multi-step reasoning, and dynamic feedback
interpretation. This encourages the parent agent to delegate entire
tasks in a single call rather than micromanaging individual browser
actions.

The system prompt always included the VISUAL IDENTIFICATION section,
which tells the model about analyze_screenshot, even when visualModel
was not configured. As a result, the model tried to call the tool even
though it was not registered.

- Convert BROWSER_SYSTEM_PROMPT to buildBrowserSystemPrompt(visionEnabled)
- Pass vision state from factory to definition builder
- Remove analyze_screenshot reference from click_at tool description
- Add tests for conditional prompt inclusion/exclusion
- Fix misleading test comment about tool count
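
The constant-to-builder conversion listed above could look roughly like this. It is a sketch only: the section texts are placeholders, since the real prompt wording is not part of this description, and BASE_SECTION and VISUAL_SECTION are hypothetical names.

```typescript
// Placeholder text; the real prompt wording is not shown in this description.
const BASE_SECTION = 'You control a browser through the accessibility tree.';

// Placeholder for the VISUAL IDENTIFICATION section.
const VISUAL_SECTION = [
  'VISUAL IDENTIFICATION:',
  'Use analyze_screenshot when an element is missing from the accessibility tree.',
].join('\n');

// Builds the prompt and includes the vision section only when vision is enabled,
// so the model never hears about a tool that is not registered.
function buildBrowserSystemPrompt(visionEnabled: boolean): string {
  const sections = [BASE_SECTION];
  if (visionEnabled) {
    sections.push(VISUAL_SECTION);
  }
  return sections.join('\n\n');
}
```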

- Add submitKey parameter to the type_text tool for pressing Enter, Tab,
etc. after typing, eliminating a separate model round-trip per value entry
- Update system prompt and tool hints to guide model toward type_text
with submitKey instead of per-character press_key calls
- Refactor connection error handling into createConnectionError() with
session-mode-aware remediation messages for profile locks, timeouts,
and generic failures
- Update terminal failure prompts to pass through error remediation
verbatim instead of hardcoding instructions
- Add tests for profile-lock, timeout, and generic connection errors
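
To illustrate how type_text with submitKey collapses several model round-trips into one tool call, here is a hedged sketch. planKeyPresses, typeText, and the PressKey callback are hypothetical names; the real tool presumably dispatches press_key through the MCP client, which is not modelled here.

```typescript
// Hypothetical callback standing in for one press_key call to the MCP server.
type PressKey = (key: string) => Promise<void>;

// Expands a string into the ordered list of keys to press, followed by the
// optional submit key (for example 'Enter' or 'Tab').
function planKeyPresses(text: string, submitKey?: string): string[] {
  const keys = Array.from(text); // iterate by code point, not UTF-16 unit
  if (submitKey) {
    keys.push(submitKey);
  }
  return keys;
}

// Presses every planned key in order and returns how many keys were sent.
async function typeText(
  pressKey: PressKey,
  text: string,
  submitKey?: string,
): Promise<number> {
  const keys = planKeyPresses(text, submitKey);
  for (const key of keys) {
    await pressKey(key); // sequential, so characters arrive in order
  }
  return keys.length;
}
```

From the model's point of view, typing a cell value and moving to the next cell is then a single call such as typeText(pressKey, '42', 'Tab').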

Vision (screenshot analysis + coordinate-based interactions) is now
disabled by default. Set visualModel in browser_agent customConfig
to enable it, e.g. visualModel: 'gemini-2.5-computer-use-preview-10-2025'.

- Add custom type_text tool that types a full string by internally
calling press_key for each character, turning N model round-trips
into 1. This speeds up text input in complex web apps.
- Move tool-specific usage rules from system prompt to individual
tool descriptions via augmentToolDescription() for better
organization and token efficiency.
- Add terminal failure handling instructions to system prompt
(Chrome connection errors, browser crashes, repeated errors)
with specific remediation steps.
- Add complex web app guidance (spreadsheets, rich editors) to
system prompt, recommending type_text + keyboard navigation.
- Fix augmentToolDescription key ordering so more-specific keys
(fill_form, click_at) match before shorter keys (fill, click).
- Remove non-existent tool references (scroll, type_text as MCP tool)
and add click_at hint for vision tool.
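
The key-ordering fix above can be pictured with the following sketch. It assumes hints are matched by substring against the tool name and sorts keys longest-first; the actual matching rule and hint texts are not shown in this description, so TOOL_HINTS and its contents are hypothetical.

```typescript
// Hypothetical hint table; the real hint wording is not part of this description.
const TOOL_HINTS: Record<string, string> = {
  fill: 'Use for a single input field.',
  click: 'Use a uid from the latest snapshot.',
  fill_form: 'Use to fill several fields in one call.',
  click_at: 'Use only with coordinates from the vision tool.',
};

// Appends the hint for the most specific matching key. Sorting by descending
// length ensures 'fill_form' is tried before 'fill', and 'click_at' before 'click'.
function augmentToolDescription(toolName: string, description: string): string {
  const keys = Object.keys(TOOL_HINTS).sort((a, b) => b.length - a.length);
  const match = keys.find((key) => toolName.includes(key));
  return match ? `${description} ${TOOL_HINTS[match]}` : description;
}
```

Without the longest-first ordering, a tool named fill_form would pick up the hint for fill, because the shorter key also matches.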

Fix chrome-devtools-mcp CLI flags:
- --existing (invalid) → --autoConnect for existing session mode
- --profile-path (invalid) → --userDataDir for custom profile path
- Default session mode changed from 'isolated' to 'persistent'

Add 'persistent' session mode (new default), which uses a persistent
Chrome profile at ~/.cache/chrome-devtools-mcp/chrome-profile.

Add connection timeout and actionable error for 'existing' mode when
Chrome remote debugging is not enabled.
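
As a rough sketch of how the session modes might map to the corrected flags: --autoConnect and --userDataDir come from the fix above, while the --isolated flag for 'isolated' mode and the names SessionMode and buildMcpArgs are assumptions made for illustration.

```typescript
type SessionMode = 'persistent' | 'isolated' | 'existing';

// Builds the argument list passed to npx for a given session mode.
function buildMcpArgs(
  mode: SessionMode = 'persistent', // new default per this change
  profilePath?: string,
): string[] {
  const args = ['chrome-devtools-mcp'];
  if (mode === 'existing') {
    args.push('--autoConnect'); // attach to an already running Chrome
  } else if (mode === 'isolated') {
    args.push('--isolated'); // assumption: flag name not confirmed here
  }
  // 'persistent' adds no flag: the server's default profile directory is used.
  if (profilePath) {
    args.push('--userDataDir', profilePath); // custom profile path
  }
  return args;
}
```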

Implement the visual agent using the LocalAgentDefinition pattern:
- VisualAgentDefinition: Agent metadata for coordinate-based visual tasks
- delegateToVisualAgent.ts: Tool for semantic agent to delegate visual tasks
- Uses gemini-2.5-computer-use-preview-10-2025 model for Computer Use capability

The visual agent handles tasks requiring visual identification or precise
coordinate-based actions that cannot be done via the accessibility tree.

Add extensible browser agent configuration using the agents.overrides pattern:
- Extended AgentOverride interface with customConfig field for agent-specific settings
- Added BrowserAgentCustomConfig type for browser-specific configuration
- Added getAgentOverride() and getBrowserAgentConfig() methods to Config class
- Settings configured via agents.overrides.browser_agent.customConfig
- Updated settings schema with customConfig in AgentOverride definition

This establishes the foundational pattern for configuring the browser agent
through the standard agents.overrides infrastructure.
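
A simplified sketch of the lookup path described above. The real getAgentOverride() and getBrowserAgentConfig() are methods on the Config class; here they are free functions over a hypothetical Settings shape, and only the visualModel field is taken from this description.

```typescript
// Only visualModel is confirmed by this description; other fields are omitted.
interface BrowserAgentCustomConfig {
  visualModel?: string;
}

interface AgentOverride {
  customConfig?: Record<string, unknown>;
}

// Hypothetical, trimmed-down settings shape.
interface Settings {
  agents?: { overrides?: Record<string, AgentOverride> };
}

function getAgentOverride(settings: Settings, name: string): AgentOverride | undefined {
  return settings.agents?.overrides?.[name];
}

// Reads agents.overrides.browser_agent.customConfig and keeps only known,
// well-typed fields. Vision stays off when visualModel is absent.
function getBrowserAgentConfig(settings: Settings): BrowserAgentCustomConfig {
  const raw: Record<string, unknown> =
    getAgentOverride(settings, 'browser_agent')?.customConfig ?? {};
  const config: BrowserAgentCustomConfig = {};
  const visualModel = raw['visualModel'];
  if (typeof visualModel === 'string') {
    config.visualModel = visualModel;
  }
  return config;
}

// Example settings that enable vision for the browser agent.
const exampleSettings: Settings = {
  agents: {
    overrides: {
      browser_agent: {
        customConfig: { visualModel: 'gemini-2.5-computer-use-preview-10-2025' },
      },
    },
  },
};
```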