diff --git a/.gemini/skills/agent-tui/SKILL.md b/.gemini/skills/agent-tui/SKILL.md new file mode 100644 index 0000000000..e6715173eb --- /dev/null +++ b/.gemini/skills/agent-tui/SKILL.md @@ -0,0 +1,330 @@ +--- +name: agent-tui +description: > + Main Agents: Do NOT use this skill directly. If you need to test the TUI, invoke the `tui_tester` subagent. + Drive terminal UI (TUI) applications programmatically for testing, automation, and inspection. + Use when: automating CLI/TUI interactions, regression testing terminal apps, or verifying interactive behavior. + Also use when: user asks "what is agent-tui", "what does agent-tui do", "demo agent-tui", "show me agent-tui", "how does agent-tui work", or wants to see it in action. +--- + +## 🚨 CRITICAL: macOS Daemon Workaround & Gemini CLI Usage 🚨 + +When using `agent-tui` in this macOS environment, the default background daemonization process crashes, causing `Connection refused (os error 61)` errors. + +**You MUST start the daemon manually shielded from TTY hangups before running any `agent-tui` commands.** Using `nohup` is insufficient; you must use `tmux` to provide a fully isolated pseudo-terminal. + +To support parallel runs, **only restart the daemon if it is not currently running:** + +```bash +# Check if daemon is alive, start it in tmux if it is not +if ! agent-tui sessions >/dev/null 2>&1; then + tmux kill-session -t agent-tui 2>/dev/null || true + agent-tui daemon stop 2>/dev/null || true + rm -f /tmp/agent-tui* + tmux new-session -d -s agent-tui 'agent-tui daemon start --foreground > /tmp/agent-tui-daemon.log 2>&1' + sleep 1 +fi +``` + +### Session ID vs PID (Crucial for Reconnection) + +When `agent-tui run` returns JSON, it includes both a `session_id` and a `pid`. The `pid` is purely informational (the OS process ID of the child command). You **do not** use the `pid` to reconnect or issue commands. You must always use the `session_id` (e.g., `--session `). + +If the daemon crashes (`os error 61`), the pseudo-terminal is destroyed. Even if the child `pid` survives as an orphan, you cannot reconnect to it. You must restart the daemon using the workaround above and start a completely new session. + +### Testing the Gemini CLI + +When testing the Gemini CLI with `agent-tui`, there are several strict requirements to ensure deterministic and accurate behavior: + +1. **Build Before Running**: `agent-tui` runs the built JS files, not TypeScript. You **MUST** run `npm run build` or `npm run build:all` after making code changes and before launching the CLI with `agent-tui`. +2. **Bypass Trust Modals**: Always pass `GEMINI_CLI_TRUST_WORKSPACE=true` in the environment. If you don't, any new project-level agents or extensions will trigger a full-screen "Acknowledge and Enable" modal. This modal steals focus, swallows automation keystrokes, and causes `agent-tui wait` commands to time out. +3. **Isolated Environments**: If you need to test without real user credentials or existing agents interfering, isolate the global settings using `GEMINI_CLI_HOME=`. +4. **Testing State Deltas (e.g., Reloads)**: If you are testing features that report deltas (e.g., `/agents reload` outputting "1 new local subagent"), you **MUST**: + - Start the CLI *first* so it establishes its baseline registry. + - Use a separate shell command (outside of `agent-tui`) to write the new agent `.md`/`.toml` file. + - Use `agent-tui type` and `press` to trigger the `/agents reload` command inside the running session. + - (If you add the files before starting the CLI, they become part of the baseline and won't trigger the delta logic). + +```bash +# Example: Standard isolated run (sandboxed config + bypass trust modals) +env GEMINI_CLI_TRUST_WORKSPACE=true GEMINI_CLI_HOME=test-gemini-home agent-tui run -d "$(pwd)" node packages/cli/dist/index.js +``` + +# Terminal Automation Mastery + +## Prerequisites + +- **Supported OS**: macOS or Linux (Windows not supported yet). +- **Verify install**: + +```bash +agent-tui --version +``` + +If not installed, use one of: + +```bash +# Recommended: one-line install (macOS/Linux) +curl -fsSL https://raw.githubusercontent.com/pproenca/agent-tui/master/install.sh | sh +``` + +```bash +# Package manager +npm i -g agent-tui +pnpm add -g agent-tui +bun add -g agent-tui +``` + +```bash +# Build from source +cargo install --git https://github.com/pproenca/agent-tui.git --path cli/crates/agent-tui +``` + +If you used the install script, ensure `~/.local/bin` is on your PATH. + +## Philosophy: Why Terminal Automation Is Different + +Terminal UIs are **stateless from the observer's perspective**. Unlike web browsers with a persistent DOM, terminal automation works with a constantly-refreshed character grid. This fundamental difference shapes everything: + +| Web Automation | Terminal Automation | +|----------------|---------------------| +| DOM persists across interactions | Screen buffer is redrawn constantly | +| Selectors are stable | Text positions may shift | +| Query once, act many times | Must re-verify before EVERY action | +| Network events signal completion | Must detect visual stability | + +**The Core Insight**: agent-tui gives you vision without memory. Each screenshot is a fresh observation. Previous state means nothing after the UI changes. This isn't a limitationβ€”it's the nature of terminal interaction. + +## Mental Model: The Feedback Loop + +Think of terminal automation as a **closed-loop control system**: + +``` + β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” + β”‚ β”‚ + β–Ό β”‚ +OBSERVE ──► DECIDE ──► ACT ──► WAIT ──► VERIFY β”€β”€β”€β”˜ + β”‚ β”‚ + β”‚ β”‚ + └─────── NEVER skip β—„β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ +``` + +**Each phase is mandatory.** Skipping verification is the #1 cause of flaky automation. + +### The "Fresh Eyes" Principle + +Every time you need to interact with the UI: + +1. **Take a fresh screenshot** β€” your previous one is now stale +2. **Locate your target visually** β€” text positions may have changed +3. **Verify the state** β€” the UI may have changed unexpectedly +4. **Act only when stable** β€” animations and loading states cause failures + +This feels slower, but it's the only reliable approach. Optimistic reuse of stale state causes intermittent failures that are painful to debug. + +## Critical Rules (Non-Negotiable) + +> **RULE 1: Atomic Execution (No Pipelining)** +> You are FORBIDDEN from chaining commands with `&&` (e.g., `type "x" && press Enter && wait`). Modals or UI updates can intercept your keystrokes. You MUST execute one atomic action, wait, screenshot, and verify before taking the next action in a new turn. + +> **RULE 2: Re-snapshot after EVERY action** +> The UI state is invalidated by any change. Always take a fresh screenshot before acting again. + +> **RULE 3: Never act on unstable UI** +> If the UI is animating, loading, or transitioning, `wait --stable` first. Acting during transitions because race conditions. + +> **RULE 4: Verify before claiming success** +> Use `wait "expected text" --assert` to confirm outcomes. Don't assume an action workedβ€”prove it. + +> **RULE 5: Error Recovery** +> If a `wait` command times out, DO NOT blindly restart or kill the session. Execute `screenshot` to visually diagnose what unexpected UI element (modal, error dialog, lost focus) intercepted the flow. + +> **RULE 6: Clean up sessions** +> Always end with `agent-tui kill`. Orphaned sessions consume resources and can interfere with future runs. + +## Decision Framework + +### Which Screenshot Mode? + +Use `screenshot --format json` when parsing automation output, or plain `screenshot` for human readable text. + +### How to Wait? + +``` +What are you waiting for? +β”‚ +β”œβ”€β–Ί Specific text to appear +β”‚ └─► `wait "text" --assert` (fails if not found) +β”‚ +β”œβ”€β–Ί Specific text to disappear +β”‚ └─► `wait "text" --gone --assert` +β”‚ +β”œβ”€β–Ί UI to stop changing (animations, loading) +β”‚ └─► `wait --stable` +β”‚ +└─► Multiple conditions + └─► Chain waits sequentially +``` + +### How to Act? + +``` +What do you need to do? +β”‚ +β”œβ”€β–Ί Type text into the terminal +β”‚ └─► `type "text"` +β”‚ +β”œβ”€β–Ί Send keyboard shortcuts/navigation +β”‚ └─► `press Ctrl+C` or `press ArrowDown Enter` +``` + +## Core Workflow + +The canonical automation loop: + +```bash +# 1. START: Launch the TUI app +agent-tui run [-- args...] + +# 2. OBSERVE: Get current UI state +agent-tui screenshot --format json + +# 3. DECIDE: Based on text, determine next action +# (This happens in your head/code) + +# 4. ACT: Execute the action +agent-tui type "text" +agent-tui press Enter + +# 5. WAIT: Synchronize with UI changes +agent-tui wait "Expected" --assert # or wait --stable + +# 6. VERIFY: Confirm the outcome (often combined with step 5) +# If verification fails, handle the error + +# 7. REPEAT: Go back to step 2 until done + +# 8. CLEANUP: Always clean up +agent-tui kill +``` + +## Anti-Patterns (What NOT to Do) + +### ❌ Acting During Animation/Loading + +```bash +# WRONG: Acting immediately on dynamic UI +agent-tui run my-app +agent-tui screenshot --format json # UI might still be loading! +agent-tui type "value" # ❌ Might miss the input field + +# RIGHT: Wait for stability first +agent-tui run my-app +agent-tui wait --stable # Let UI settle +agent-tui screenshot --format json # Now it's reliable +agent-tui type "value" +``` + +### ❌ Assuming Success Without Verification + +```bash +# WRONG: Assuming the type worked +agent-tui type "value" +agent-tui press Enter +# ...proceed as if success... # ❌ What if it failed silently? + +# RIGHT: Verify the outcome +agent-tui type "value" +agent-tui press Enter +agent-tui wait "Success" --assert # βœ“ Proves the action worked +``` + +### ❌ Skipping Cleanup + +```bash +# WRONG: Forgetting to kill the session +agent-tui run my-app +# ...do stuff... +# script ends # ❌ Session left running! + +# RIGHT: Always clean up +agent-tui run my-app +# ...do stuff... +agent-tui kill # βœ“ Clean exit +``` + +## Before You Start: Clarify Requirements + +Before automating any TUI, gather this information: + +1. **Command**: What exactly to run? (`my-app --flag` or `npm start`?) +2. **Success criteria**: What text/state indicates success? +3. **Input sequence**: What keystrokes/data to enter, in what order? +4. **Safety**: Is it safe to submit forms, delete data, etc.? +5. **Auth**: Does it need login? Test credentials? +6. **Live preview**: Does the user want to watch? (`agent-tui live start --open`) + +If any of these are unclear, ask before running. + +## Demo Mode: Showing What agent-tui Can Do + +When a user asks what agent-tui is, wants a demo, or asks "show me how it works": + +1. **Don't explainβ€”demonstrate.** Actions speak louder than words. +2. **Use the live preview** so they can watch in real-time. +3. **Run `top`**β€”it's universal and shows dynamic real-time updates. + +**Quick demo trigger phrases:** +- "What is agent-tui?" / "What does agent-tui do?" +- "Demo agent-tui" / "Show me agent-tui" +- "How does agent-tui work?" / "See it in action" + +## Failure Recovery + +| Symptom | Diagnosis | Solution | +|---------|-----------|----------| +| "Text not found" | Stale view or text moved | Re-snapshot, locate text again | +| Wait times out | UI didn't reach expected state | Check screenshot, verify expectations | +| "Daemon not running" | Daemon crashed or not started | `agent-tui daemon start` | +| Unexpected layout | Wrong terminal size | `agent-tui resize --cols 120 --rows 40` | +| Session unresponsive | App crashed or hung | `agent-tui kill`, then re-run | +| Repeated failures | Something fundamentally wrong | Stop after 3-5 attempts, ask user | + +## Self-Discovery: Use --help + +You don't need to memorize every flag. The CLI is self-documenting: + +```bash +agent-tui --help # List all commands +agent-tui run --help # Options for 'run' +agent-tui screenshot --help # Options for 'screenshot' +agent-tui wait --help # Options for 'wait' +``` + +**When in doubt, ask the CLI.** This skill teaches *when* and *why* to use commands. For exact flags and syntax, `--help` is authoritative. + +## Quick Reference + +```bash +# Start app +agent-tui run [-- args] # Launch TUI under control + +# Observe +agent-tui screenshot # Plain text view +agent-tui screenshot --format json # Machine-readable output + +# Act +agent-tui press Enter # Press key(s) +agent-tui press Ctrl+C # Keyboard shortcuts +agent-tui type "text" # Type text + +# Wait/Verify +agent-tui wait "text" --assert # Wait for text, fail if not found +agent-tui wait "text" --gone --assert # Wait for text to disappear +agent-tui wait --stable # Wait for UI to stop changing + +# Manage +agent-tui sessions # List active sessions +agent-tui live start --open # Start live preview +agent-tui kill # End current session +``` diff --git a/.gemini/skills/tui-tester/SKILL.md b/.gemini/skills/tui-tester/SKILL.md new file mode 100644 index 0000000000..d9aed662d0 --- /dev/null +++ b/.gemini/skills/tui-tester/SKILL.md @@ -0,0 +1,65 @@ +--- +name: tui-tester +description: Expert guidance for testing Gemini CLI behavior and visual output using terminal automation. +--- + +# TUI Tester Skill + +This skill provides the operational manual for verifying Gemini CLI behavioral changes and visual output using terminal automation. + +## Core Responsibilities + +- **Verify Behavior**: Confirm that code changes result in the expected terminal interactions. +- **Visual Validation**: Ensure the TUI renders correctly across different terminal sizes and states. +- **Regression Testing**: Use automation to prevent breaking existing interactive workflows. + +## Critical Protocol + +When performing TUI testing, you must adhere to these strict rules: + +### 1. Initialization +**YOUR ABSOLUTE FIRST ACTION MUST BE:** +Activate the `agent-tui` skill. This provides the underlying tools needed for terminal automation. + +### 2. Environment Setup (macOS / Parallel Safe) +Ensure the global daemon is running and the live preview is open: +```bash +if ! agent-tui sessions >/dev/null 2>&1; then + tmux kill-session -t agent-tui 2>/dev/null || true + agent-tui daemon stop 2>/dev/null || true + rm -f /tmp/agent-tui* + tmux new-session -d -s agent-tui 'agent-tui daemon start --foreground > /tmp/agent-tui-daemon.log 2>&1' + sleep 1 +fi +agent-tui live start --open +``` + +### 3. Session Management +- **Session IDs**: Always use the `session_id` returned by `agent-tui run` for subsequent interactions. +- **Atomic Execution**: Execute exactly one command per turn. Do not pipeline actions. +- **The Loop**: Action -> Wait -> Screenshot -> Verify -> Next Action. + +### 4. Gemini CLI Specifics +- **Build First**: Always run `npm run build` or `npm run build:all` before testing local changes. +- **Bypass Trust**: Set `GEMINI_CLI_TRUST_WORKSPACE=true` to avoid focus-stealing modals. +- **Isolate Config**: Use `GEMINI_CLI_HOME` to prevent interference with your personal settings. + +## Workflow Example + +```bash +# Start the CLI +env GEMINI_CLI_TRUST_WORKSPACE=true agent-tui run node packages/cli/dist/index.js + +# Wait for the prompt +agent-tui wait "β”‚" --assert + +# Send a command +agent-tui type "/help" +agent-tui press Enter + +# Verify output +agent-tui wait "Available Commands" --assert +``` + +## Error Recovery +If a wait times out, take a fresh screenshot to diagnose the state. If you see `os error 61`, restart the daemon using the tmux method.