diff --git a/.gemini/commands/fix-behavioral-eval.toml b/.gemini/commands/fix-behavioral-eval.toml deleted file mode 100644 index d2f1c5b3ed..0000000000 --- a/.gemini/commands/fix-behavioral-eval.toml +++ /dev/null @@ -1,60 +0,0 @@ -description = "Check status of nightly evals, fix failures for key models, and re-run." -prompt = """ -You are an expert at fixing behavioral evaluations. - -1. **Investigate**: - - Use 'gh' cli to fetch the results from the latest run from the main branch: https://github.com/google-gemini/gemini-cli/actions/workflows/evals-nightly.yml. - - DO NOT push any changes or start any runs. The rest of your evaluation will be local. - - Evals are in evals/ directory and are documented by evals/README.md. - - The test case trajectory logs will be logged to evals/logs. - - You should also enable and review the verbose agent logs by setting the GEMINI_DEBUG_LOG_FILE environment variable. - - Identify the relevant test. Confine your investigation and validation to just this test. - - Proactively add logging that will aid in gathering information or validating your hypotheses. - -2. **Fix**: - - If a relevant test is failing, locate the test file and the corresponding prompt/code. - - It's often helpful to make an extreme, brute force change to see if you are changing the right place to make an improvement and then scope it back iteratively. - - Your **final** change should be **minimal and targeted**. - - Keep in mind the following: - - The prompt has multiple configurations and pieces. Take care that your changes - end up in the final prompt for the selected model and configuration. - - The prompt chosen for the eval is intentional. It's often vague or indirect - to see how the agent performs with ambiguous instructions. Changing it should - be a last resort. - - When changing the test prompt, carefully consider whether the prompt still tests - the same scenario. We don't want to lose test fidelity by making the prompts too - direct (i.e.: easy). 
- - Your primary mechanism for improving the agent's behavior is to make changes to - tool instructions, system prompt (snippets.ts), and/or modules that contribute to the prompt. - - If prompt and description changes are unsuccessful, use logs and debugging to - confirm that everything is working as expected. - - If unable to fix the test, you can make recommendations for architecture changes - that might help stablize the test. Be sure to THINK DEEPLY if offering architecture guidance. - Some facts that might help with this are: - - Agents may be composed of one or more agent loops. - - AgentLoop == 'context + toolset + prompt'. Subagents are one type of agent loop. - - Agent loops perform better when: - - They have direct, unambiguous, and non-contradictory prompts. - - They have fewer irrelevant tools. - - They have fewer goals or steps to perform. - - They have less low value or irrelevant context. - - You may suggest compositions of existing primitives, like subagents, or - propose a new one. - - These recommendations should be high confidence and should be grounded - in observed deficient behaviors rather than just parroting the facts above. - Investigate as needed to ground your recommendations. - -3. **Verify**: - - Run just that one test if needed to validate that it is fixed. Be sure to run vitest in non-interactive mode. - - Running the tests can take a long time, so consider whether you can diagnose via other means or log diagnostics before committing the time. You must minimize the number of test runs needed to diagnose the failure. - - After the test completes, check whether it seems to have improved. - - You will need to run the test 3 times for Gemini 3.0, Gemini 3 flash, and Gemini 2.5 pro to ensure that it is truly stable. Run these runs in parallel, using scripts if needed. - - Some flakiness is expected; if it looks like a transient issue or the test is inherently unstable but passes 2/3 times, you might decide it cannot be improved. - -4. 
**Report**: - - Provide a summary of the test success rate for each of the tested models. - - Success rate is calculated based on 3 runs per model (e.g., 3/3 = 100%). - - If you couldn't fix it due to persistent flakiness, explain why. - -{{args}} -""" \ No newline at end of file diff --git a/.gemini/commands/promote-behavioral-eval.toml b/.gemini/commands/promote-behavioral-eval.toml deleted file mode 100644 index 9893e9b02b..0000000000 --- a/.gemini/commands/promote-behavioral-eval.toml +++ /dev/null @@ -1,29 +0,0 @@ -description = "Promote behavioral evals that have a 100% success rate over the last 7 nightly runs." -prompt = """ -You are an expert at analyzing and promoting behavioral evaluations. - -1. **Investigate**: - - Use 'gh' cli to fetch the results from the most recent run from the main branch: https://github.com/google-gemini/gemini-cli/actions/workflows/evals-nightly.yml. - - DO NOT push any changes or start any runs. The rest of your evaluation will be local. - - Evals are in evals/ directory and are documented by evals/README.md. - - Identify tests that have passed 100% of the time for ALL enabled models across the past 7 runs in a row. - - NOTE: the results summary from the most recent run contains the last 7 runs test results. 100% means the test passed 3/3 times for that model and run. - - If a test meets this criteria, it is a candidate for promotion. - -2. **Promote**: - - For each candidate test, locate the test file in the evals/ directory. - - Promote the test according to the project's standard promotion process (e.g., moving it to a stable suite, updating its tags, or removing skip/flaky annotations). - - Ensure you follow any guidelines in evals/README.md for stable tests. - - Your **final** change should be **minimal and targeted** to just promoting the test status. - -3. **Verify**: - - Run the promoted tests locally to validate that they still execute correctly. Be sure to run vitest in non-interactive mode. 
- - Check that the test is now part of the expected standard or stable test suites. - -4. **Report**: - - Provide a summary of the tests that were promoted. - - Include the success rate evidence (7/7 runs passed for all models) for each promoted test. - - If no tests met the criteria for promotion, clearly state that and summarize the closest candidates. - -{{args}} -""" diff --git a/.gemini/skills/behavioral-evals/SKILL.md b/.gemini/skills/behavioral-evals/SKILL.md new file mode 100644 index 0000000000..f60fb04832 --- /dev/null +++ b/.gemini/skills/behavioral-evals/SKILL.md @@ -0,0 +1,56 @@ +--- +name: behavioral-evals +description: Guidance for creating, running, fixing, and promoting behavioral evaluations. Use when verifying agent decision logic, debugging failures, debugging prompt steering, or adding workspace regression tests. +--- + +# Behavioral Evals + +## Overview + +Behavioral evaluations (evals) are tests that validate the **agent's decision-making** (e.g., tool choice) rather than pure functionality. They are critical for verifying prompt changes, debugging steerability, and preventing regressions. + +> [!NOTE] +> **Single Source of Truth**: For core concepts, policies, running tests, and general best practices, always refer to **[evals/README.md](file:///Users/abhipatel/code/gemini-cli/docs/evals/README.md)**. + +--- + +## 🔄 Workflow Decision Tree + +1. **Does a prompt/tool change need validation?** + * *No* -> Normal integration tests. + * *Yes* -> Continue below. +2. **Is it UI/Interaction heavy?** + * *Yes* -> Use `appEvalTest` (`AppRig`). See **[creating.md](references/creating.md)**. + * *No* -> Use `evalTest` (`TestRig`). See **[creating.md](references/creating.md)**. +3. **Is it a new test?** + * *Yes* -> Set policy to `USUALLY_PASSES`. + * *No* -> `ALWAYS_PASSES` (locks in regression). +4. **Are you fixing a failure or promoting a test?** + * *Fixing* -> See **[fixing.md](references/fixing.md)**. 
+ * *Promoting* -> See **[promoting.md](references/promoting.md)**. + +--- + +## 📋 Quick Checklist + +### 1. Setup Workspace +Seed the workspace with necessary files using the `files` object to simulate a realistic scenario (e.g., NodeJS project with `package.json`). +* *Details in **[creating.md](references/creating.md)*** + +### 2. Write Assertions +Audit agent decisions using `rig.setBreakpoint()` (AppRig only) or index verification on `rig.readToolLogs()`. +* *Details in **[creating.md](references/creating.md)*** + +### 3. Verify +Run single tests locally with Vitest. Confirm stability locally before relying on CI workflows. +* *See **[evals/README.md](file:///Users/abhipatel/code/gemini-cli/docs/evals/README.md)** for running commands.* + +--- + +## 📦 Bundled Resources + +Detailed procedural guides: +* **[creating.md](references/creating.md)**: Assertion strategies, Rig selection, Mock MCPs. +* **[fixing.md](references/fixing.md)**: Step-by-step automated investigation, architecture diagnosis guidelines. +* **[promoting.md](references/promoting.md)**: Candidate identification criteria and threshold guidelines. + diff --git a/.gemini/skills/behavioral-evals/assets/interactive_eval.ts.txt b/.gemini/skills/behavioral-evals/assets/interactive_eval.ts.txt new file mode 100644 index 0000000000..2d2b7433dc --- /dev/null +++ b/.gemini/skills/behavioral-evals/assets/interactive_eval.ts.txt @@ -0,0 +1,27 @@ +import { describe, expect } from 'vitest'; +import { appEvalTest } from './app-test-helper.js'; + +describe('interactive_feature', () => { + // New tests MUST start as USUALLY_PASSES + appEvalTest('USUALLY_PASSES', { + name: 'should pause for user confirmation', + files: { + 'package.json': JSON.stringify({ name: 'app' }) + }, + prompt: 'Task description here requiring approval', + timeout: 60000, + setup: async (rig) => { + // ⚠️ Breakpoints are ONLY safe in appEvalTest + rig.setBreakpoint(['ask_user']); + }, + assert: async (rig) => { + // 1. 
Wait for the breakpoint to trigger + const confirmation = await rig.waitForPendingConfirmation('ask_user'); + expect(confirmation).toBeDefined(); + + // 2. Resolve it so the test can finish + await rig.resolveTool(confirmation); + await rig.waitForIdle(); + }, + }); +}); diff --git a/.gemini/skills/behavioral-evals/assets/standard_eval.ts.txt b/.gemini/skills/behavioral-evals/assets/standard_eval.ts.txt new file mode 100644 index 0000000000..3e666dfc37 --- /dev/null +++ b/.gemini/skills/behavioral-evals/assets/standard_eval.ts.txt @@ -0,0 +1,30 @@ +import { describe, expect } from 'vitest'; +import { evalTest } from './test-helper.js'; + +describe('core_feature', () => { + // New tests MUST start as USUALLY_PASSES + evalTest('USUALLY_PASSES', { + name: 'should perform expected agent action', + setup: async (rig) => { + // For mocking offline MCP: + // rig.addMockMcpServer('workspace-server', 'google-workspace'); + }, + files: { + 'src/app.ts': '// some code', + }, + prompt: 'Task description here', + timeout: 60000, // 1 minute safety limit + assert: async (rig, result) => { + // 1. Audit the trajectory (Safe for standard evalTest) + const logs = rig.readToolLogs(); + const hasTool = logs.some((l) => l.toolRequest.name === 'read_file'); + expect(hasTool, 'Agent should have read the file').toBe(true); + + // 2. Assert efficiency (Cost/Turn) + expect(logs.length).toBeLessThan(5); + + // 3. 
Assert final output + expect(result).toContain('Expected Keyword'); + }, + }); +}); diff --git a/.gemini/skills/behavioral-evals/references/creating.md b/.gemini/skills/behavioral-evals/references/creating.md new file mode 100644 index 0000000000..bcc1baff06 --- /dev/null +++ b/.gemini/skills/behavioral-evals/references/creating.md @@ -0,0 +1,151 @@ +# Creating Behavioral Evals + +## 🔬 Rig Selection + +| Rig Type | Import From | Architecture | Use When | +| :---------------- | :--------------------- | :------------------------------------------------------------------- | :---------------------------------------------------------------------------------------------------- | +| **`evalTest`** | `./test-helper.js` | **Subprocess**. Runs the CLI in a separate process + waits for exit. | Standard workspace tests. **Do not use `setBreakpoint`**; auditing history (`readToolLogs`) is safer. | +| **`appEvalTest`** | `./app-test-helper.js` | **In-Process**. Runs directly inside the runner loop. | UI/Ink rendering. Safe for `setBreakpoint` triggers. | + +--- + +## 🏗️ Scenario Design + +Evals must simulate realistic agent environments to effectively test +decision-making. + +- **Workspace State**: Seed with standard project anchors if testing general + capabilities: + - `package.json` for NodeJS environments. + - Minimal configuration files (`tsconfig.json`, `GEMINI.md`). +- **Structural Complexity**: Provide enough files to force the agent to _search_ + or _navigate_, rather than giving the answer directly. Avoid trivial one-file + tests unless testing exact prompt steering. + +--- + +## ❌ Fail First Principle + +Before asserting a new capability or locking in a fix, **verify that the test +fails first**. + +- It is easy to accidentally write an eval that asserts behaviors that are + already met or pass by default. +- **Process**: reproduce failure with test -> apply fix (prompt/tool) -> verify + test passes. + +--- + +## ✋ Testing Patterns + +### 1. 
Breakpoints + +Verifies the agent _intends_ to use a tool BEFORE executing it. Useful for +interactive prompts or safety checks. + +```typescript +// ⚠️ Only works with appEvalTest (AppRig) +setup: async (rig) => { + rig.setBreakpoint(['ask_user']); +}, +assert: async (rig) => { + const confirmation = await rig.waitForPendingConfirmation('ask_user'); + expect(confirmation).toBeDefined(); +} +``` + +### 2. Tool Confirmation Race + +When asserting multiple triggers (e.g., "enters plan mode then asks question"): + +```typescript +assert: async (rig) => { + let confirmation = await rig.waitForPendingConfirmation([ + 'enter_plan_mode', + 'ask_user', + ]); + + if (confirmation?.name === 'enter_plan_mode') { + rig.acceptConfirmation('enter_plan_mode'); + confirmation = await rig.waitForPendingConfirmation('ask_user'); + } + expect(confirmation?.toolName).toBe('ask_user'); +}; +``` + +### 3. Audit Tool Logs + +Audit exact operations to ensure efficiency (e.g., no redundant reads). + +```typescript +assert: async (rig, result) => { + await rig.waitForTelemetryReady(); + const toolLogs = rig.readToolLogs(); + + const writeCall = toolLogs.find( + (log) => log.toolRequest.name === 'write_file', + ); + expect(writeCall).toBeDefined(); +}; +``` + +### 4. Mock MCP Facades + +To evaluate tools connected via MCP without hitting live endpoints, load a mock +server configuration in the `setup` hook. + +```typescript +setup: async (rig) => { + rig.addMockMcpServer('workspace-server', 'google-workspace'); +}, +assert: async (rig) => { + await rig.waitForTelemetryReady(); + const toolLogs = rig.readToolLogs(); + const workspaceCall = toolLogs.find( + (log) => log.toolRequest.name === 'mcp_workspace-server_docs.getText' + ); + expect(workspaceCall).toBeDefined(); +}; +``` + +--- + +## ⚠️ Safety & Efficiency Guardrails + +### 1. Breakpoint Deadlocks + +Breakpoints (`setBreakpoint`) pause execution. In standard `evalTest`, +`rig.run()` waits for the process to exit _before_ assertions run. 
**This will +hang indefinitely.** + +- **Use Breakpoints** for `appEvalTest` or interactive simulations. +- **Use Audit Tool Logs** (above) for standard trajectory tests. + +### 2. Runaway Timeout + +Always set a budget boundary in the `EvalCase` so runaway loops cannot burn +quota: + +```typescript +evalTest('USUALLY_PASSES', { + name: '...', + timeout: 60000, // 1 minute safety limit + // ... +}); +``` + +### 3. Efficiency Assertion (Turn limits) + +Check if a tool is called _early_ using index checks: + +```typescript +assert: async (rig) => { + const toolLogs = rig.readToolLogs(); + const toolCallIndex = toolLogs.findIndex( + (log) => log.toolRequest.name === 'cli_help', + ); + + expect(toolCallIndex).toBeGreaterThan(-1); + expect(toolCallIndex).toBeLessThan(5); // Among the first 5 tool calls +}; +``` diff --git a/.gemini/skills/behavioral-evals/references/fixing.md b/.gemini/skills/behavioral-evals/references/fixing.md new file mode 100644 index 0000000000..fc78870515 --- /dev/null +++ b/.gemini/skills/behavioral-evals/references/fixing.md @@ -0,0 +1,71 @@ +# Fixing Behavioral Evals + +Use this guide when asked to debug, troubleshoot, or fix a failing behavioral +evaluation. + +--- + +## 1. 🔍 Investigate + +1. **Fetch Nightly Results**: Use the `gh` CLI to inspect the latest run from + `evals-nightly.yml` if applicable. + - _Example view URL_: + `https://github.com/google-gemini/gemini-cli/actions/workflows/evals-nightly.yml` +2. **Isolate**: DO NOT push changes or start remote runs. Confine investigation + to the local workspace. +3. **Read Logs**: + - Eval logs live in `evals/logs/.log`. + - Enable verbose debugging via `export GEMINI_DEBUG_LOG_FILE="debug.log"`. +4. **Diagnose**: Audit tool logs and telemetry. Note whether the failure stems from the test's setup or assert logic. + - **Tip**: Proactively add custom logging/diagnostics to check hypotheses. + +--- + +## 2. 🛠️ Fix Strategy + +1. **Targeted Location**: Locate the test case and the corresponding + prompt/code. +2. 
**Iterative Scope**: Make an extreme change first to confirm you are editing the right place, then refine + to a minimal, targeted change. +3. **Assertion Fidelity**: + - Changing the test prompt is a **last resort** (prompts are often vague by + design). + - **Warning**: Do not lose test fidelity by making prompts too direct/easy. + - **Primary Fix Trigger**: Adjust tool descriptions, system prompts + (`snippets.ts`), or **modules that contribute to the prompt template**. + - **Warning**: Prompts have multiple configurations; ensure your fix targets + the correct config for the model in question. +4. **Architecture Options**: If prompt or instruction tuning yields no + improvement, analyze loop composition. + - **AgentLoop**: Defined by `context + toolset + prompt`. + - **Enhancements**: Loops perform best with direct prompts, fewer irrelevant + tools, low goal density, and minimal low-value/irrelevant context. + - **Modifications**: Compose subagents or isolate tools. Ground recommendations in observed + traces. + - **Warning**: Think deeply before offering recommendations; avoid parroting + abstract design guidelines. + +--- + +## 3. ✅ Verify + +1. **Run Local**: Run Vitest in non-interactive mode on just the file. +2. **Log Audit**: Prioritize diagnosing failures via log comparison before + triggering heavy test runs. +3. **Stability Limit**: Run the test **3 times** locally on each key model (can use + scripts to run in parallel for speed): + - **Gemini 3.0** + - **Gemini 3 Flash** + - **Gemini 2.5 Pro** +4. **Flakiness Rule**: If it passes 2/3 times, the failure may be inherent noise + that is difficult to improve without a structural split. + +--- + +## 4. 📊 Report + +Provide a summary of: + +- Test success rate for each tested model (e.g., 3/3 = 100%). +- Root cause identification and fix explanation. +- If unfixed, provide high-confidence architecture recommendations. 
diff --git a/.gemini/skills/behavioral-evals/references/promoting.md b/.gemini/skills/behavioral-evals/references/promoting.md new file mode 100644 index 0000000000..d3d3eaf88f --- /dev/null +++ b/.gemini/skills/behavioral-evals/references/promoting.md @@ -0,0 +1,55 @@ +# Promoting Behavioral Evals + +Use this guide when asked to analyze nightly results and promote incubated tests +to stable suites. + +--- + +## 1. 🔍 Investigate candidates + +1. **Audit Nightly Logs**: Use the `gh` CLI to fetch results from + `evals-nightly.yml` (Direct URL: + `https://github.com/google-gemini/gemini-cli/actions/workflows/evals-nightly.yml`). + - **Tip**: The aggregate summary from the most recent run integrates the + last 7 runs of history automatically. + - **Safety**: DO NOT push changes or start remote runs. All verification is + local. +2. **Assess Stability**: Identify tests that pass **100% of the time** across + ALL enabled models over the **last 7 nightly runs** in a row. + - _100% means the test passed 3/3 times for every model and run._ +3. **Promotion Targets**: Tests meeting these criteria are candidates for + promotion from `USUALLY_PASSES` to `ALWAYS_PASSES`. + +--- + +## 2. 🚥 Promotion Steps + +1. **Locate File**: Locate the eval file in the `evals/` directory. +2. **Update Policy**: Modify the policy argument to `ALWAYS_PASSES`. + ```typescript + evalTest('ALWAYS_PASSES', { ... }) + ``` +3. **Targeting**: Follow guidelines in `evals/README.md` regarding stable suite + organization. +4. **Constraint**: Your final change must be **minimal and targeted** strictly + to promoting the test status. Do not refactor the test or setup fixtures. + +--- + +## 3. ✅ Verify + +1. **Run Promoted Tests**: Run the promoted test locally using non-interactive + Vitest to confirm structure validity. +2. **Verify Suite Inclusion**: Check that the test is successfully picked up by + standard runnable ranges. + +--- + +## 4. 📊 Report + +Provide a summary of: + +- Which tests were promoted. 
+- The success rate evidence (e.g., 7/7 runs passed for all models). +- If no candidates qualified, list the next closest candidates and their current + pass rate. diff --git a/.gemini/skills/behavioral-evals/references/running.md b/.gemini/skills/behavioral-evals/references/running.md new file mode 100644 index 0000000000..cf8c46a8d6 --- /dev/null +++ b/.gemini/skills/behavioral-evals/references/running.md @@ -0,0 +1,95 @@ +# Running & Promoting Evals + +## 🛠️ Prerequisites + +Behavioral evals run against the compiled binary. You **must** build and bundle +the project first after making changes: + +```bash +npm run build && npm run bundle +``` + +--- + +## 🏃‍♂️ Running Tests + +### 1. Configure Environment Variables + +Evals require a standard API key. If your `.env` file has multiple keys or +comments, use this precise extraction setup: + +```bash +export GEMINI_API_KEY=$(grep '^GEMINI_API_KEY=' .env | cut -d '=' -f2) && RUN_EVALS=1 npx vitest run --config evals/vitest.config.ts +``` + +### 2. Commands + +| Command | Scope | Description | +| :---------------------------------- | :-------------- | :------------------------------------------------- | +| `npm run test:always_passing_evals` | `ALWAYS_PASSES` | Fast feedback, runs in CI. | +| `npm run test:all_evals` | All | Runs nightly incubation tests. Sets `RUN_EVALS=1`. | + +### Target Specific File + +_Note: `RUN_EVALS=1` is required for incubated (`USUALLY_PASSES`) tests._ + +```bash +RUN_EVALS=1 npx vitest run --config evals/vitest.config.ts my_feature.eval.ts +``` + +--- + +## 🐞 Debugging and Logs + +If a test fails, check: + +- **Tool Trajectory Logs**: Sequence of calls in `evals/logs/.log`. +- **Verbose Reasoning**: Capture raw buffer traces by setting + `GEMINI_DEBUG_LOG_FILE`: + ```bash + export GEMINI_DEBUG_LOG_FILE="debug.log" + ``` + +--- + +### 🎯 Verify Model Targeting + +- **Tip:** Standard evals benchmark against model variations. 
If a test passes + on Flash but fails on Pro (or vice versa), the issue is usually in the **tool + description**, not the prompt definition. Flash is sensitive to "instruction + bloat," while Pro is sensitive to "ambiguous intent." + +--- + +## 🚥 Deflaking & Promotion + +To maintain CI stability, all new evals follow a strict incubation period. + +### 1. Incubation (`USUALLY_PASSES`) + +New tests must be created with the `USUALLY_PASSES` policy. + +```typescript +evalTest('USUALLY_PASSES', { ... }) +``` + +They run in **Evals: Nightly** workflows and do not block PR merges. + +### 2. Investigate Failures + +If a nightly eval regresses, investigate via agent: + +```bash +gemini /fix-behavioral-eval [optional-run-uri] +``` + +### 3. Promotion (`ALWAYS_PASSES`) + +Once a test passes 100% of the time over multiple nightly cycles: + +```bash +gemini /promote-behavioral-eval +``` + +_Do not promote manually._ The command verifies trajectory logs before updating +the file policy. diff --git a/.gemini/skills/ci/SKILL.md b/.gemini/skills/ci/SKILL.md new file mode 100644 index 0000000000..b55aa4d233 --- /dev/null +++ b/.gemini/skills/ci/SKILL.md @@ -0,0 +1,66 @@ +--- +name: ci +description: + A specialized skill for Gemini CLI that provides high-performance, fail-fast + monitoring of GitHub Actions workflows and automated local verification of CI + failures. It handles run discovery automatically. Simply provide the branch name. +--- + +# CI Replicate & Status + +This skill enables the agent to efficiently monitor GitHub Actions, triage +failures, and bridge remote CI errors to local development. It defaults to +**automatic replication** of failures to streamline the fix cycle. + +## Core Capabilities + +- **Automatic Replication**: Automatically monitors CI and immediately executes + suggested test or lint commands locally upon failure. +- **Real-time Monitoring**: Aggregated status line for all concurrent workflows + on the current branch. 
+- **Fail-Fast Triage**: Immediately stops on the first job failure to provide a + structured report. + +## Workflow + +### 1. CI Replicate (`replicate`) - DEFAULT +Use this as the primary path to monitor CI and **automatically** replicate +failures locally for immediate triage and fixing. +- **Behavior**: When this workflow is triggered, the agent will monitor the CI + and **immediately and automatically execute** all suggested test or lint + commands (marked with 🚀) as soon as a failure is detected. +- **Tool**: `node .gemini/skills/ci/scripts/ci.mjs [branch]` +- **Discovery**: The script **automatically** finds the latest active or recent + run for the branch. Do NOT manually search for run IDs. +- **Goal**: Reproduce the failure locally without manual intervention, then + proceed to analyze and fix the code. + +### 2. CI Status (`status`) +Use this when you have pushed changes and need to monitor the CI and reproduce +any failures locally. +- **Tool**: `node .gemini/skills/ci/scripts/ci.mjs [branch] [run_id]` +- **Discovery**: The script **automatically** finds the latest active or recent + run for the branch. You should NOT manually search for `run_id` using `gh run list` + unless a specific historical run is requested. Simply provide the branch name. +- **Step 1 (Monitor)**: Execute the tool with the branch name. +- **Step 2 (Extract)**: Extract suggested `npm test` or `npm run lint` commands + from the output (marked with 🚀). +- **Step 3 (Reproduce)**: Execute those commands locally to confirm the failure. +- **Behavior**: It will poll every 15 seconds. If it detects a failure, it will + exit with a structured report and provide the exact commands to run locally. + +## Failure Categories & Actions + +- **Test Failures**: Agent should run the specific `npm test -w -- ` + command suggested. +- **Lint Errors**: Agent should run `npm run lint:all` or the specific package + lint command. 
+- **Build Errors**: Agent should check `tsc` output or build logs to resolve + compilation issues. +- **Job Errors**: Investigate `gh run view --job --log` for + infrastructure or setup failures. + +## Noise Filtering +The underlying scripts automatically filter noise (Git logs, NPM warnings, stack +trace overhead). The agent should focus on the "Structured Failure Report" +provided by the tool. diff --git a/.gemini/skills/ci/scripts/ci.mjs b/.gemini/skills/ci/scripts/ci.mjs new file mode 100755 index 0000000000..9073285231 --- /dev/null +++ b/.gemini/skills/ci/scripts/ci.mjs @@ -0,0 +1,281 @@ +#!/usr/bin/env node + +/** + * @license + * Copyright 2026 Google LLC + * SPDX-License-Identifier: Apache-2.0 + */ + +import { execSync } from 'node:child_process'; + +const BRANCH = + process.argv[2] || execSync('git branch --show-current').toString().trim(); +const RUN_ID_OVERRIDE = process.argv[3]; + +let REPO; +try { + const remoteUrl = execSync('git remote get-url origin').toString().trim(); + REPO = remoteUrl + .replace(/.*github\.com[\/:]/, '') + .replace(/\.git$/, '') + .trim(); +} catch (e) { + REPO = 'google-gemini/gemini-cli'; +} + +const FAILED_FILES = new Set(); + +function runGh(args) { + try { + return execSync(`gh ${args}`, { + stdio: ['ignore', 'pipe', 'ignore'], + }).toString(); + } catch (e) { + return null; + } +} + +function fetchFailuresViaApi(jobId) { + try { + const cmd = `gh api repos/${REPO}/actions/jobs/${jobId}/logs | grep -iE " FAIL |❌|ERROR|Lint failed|Build failed|Exception|failed with exit code"`; + return execSync(cmd, { + stdio: ['ignore', 'pipe', 'ignore'], + maxBuffer: 10 * 1024 * 1024, + }).toString(); + } catch (e) { + return ''; + } +} + +function isNoise(line) { + const lower = line.toLowerCase(); + return ( + lower.includes('* [new branch]') || + lower.includes('npm warn') || + lower.includes('fetching updates') || + lower.includes('node:internal/errors') || + lower.includes('at ') || // Stack traces + 
lower.includes('checkexecsyncerror') || + lower.includes('node_modules') + ); +} + +function extractTestFile(failureText) { + const cleanLine = failureText + .replace(/[|#\[\]()]/g, ' ') + .replace(/<[^>]*>/g, ' ') + .trim(); + const fileMatch = cleanLine.match(/([\w\/._-]+\.test\.[jt]sx?)/); + if (fileMatch) return fileMatch[1]; + return null; +} + +function generateTestCommand(failedFilesMap) { + const workspaceToFiles = new Map(); + for (const [file, info] of failedFilesMap.entries()) { + if ( + ['Job Error', 'Unknown File', 'Build Error', 'Lint Error'].includes(file) + ) + continue; + let workspace = '@google/gemini-cli'; + let relPath = file; + if (file.startsWith('packages/core/')) { + workspace = '@google/gemini-cli-core'; + relPath = file.replace('packages/core/', ''); + } else if (file.startsWith('packages/cli/')) { + workspace = '@google/gemini-cli'; + relPath = file.replace('packages/cli/', ''); + } + relPath = relPath.replace(/^.*packages\/[^\/]+\//, ''); + if (!workspaceToFiles.has(workspace)) + workspaceToFiles.set(workspace, new Set()); + workspaceToFiles.get(workspace).add(relPath); + } + const commands = []; + for (const [workspace, files] of workspaceToFiles.entries()) { + commands.push(`npm test -w ${workspace} -- ${Array.from(files).join(' ')}`); + } + return commands.join(' && '); +} + +async function monitor() { + let targetRunIds = []; + if (RUN_ID_OVERRIDE) { + targetRunIds = [RUN_ID_OVERRIDE]; + } else { + // 1. 
Get runs directly associated with the branch + const runListOutput = runGh( + `run list --branch "${BRANCH}" --limit 10 --json databaseId,status,workflowName,createdAt`, + ); + if (runListOutput) { + const runs = JSON.parse(runListOutput); + const activeRuns = runs.filter((r) => r.status !== 'completed'); + if (activeRuns.length > 0) { + targetRunIds = activeRuns.map((r) => r.databaseId); + } else if (runs.length > 0) { + const latestTime = new Date(runs[0].createdAt).getTime(); + targetRunIds = runs + .filter((r) => latestTime - new Date(r.createdAt).getTime() < 60000) + .map((r) => r.databaseId); + } + } + + // 2. Get runs associated with commit statuses (handles chained/indirect runs) + try { + const headSha = execSync(`git rev-parse "${BRANCH}"`).toString().trim(); + const statusOutput = runGh( + `api repos/${REPO}/commits/${headSha}/status -q '.statuses[] | select(.target_url | contains("actions/runs/")) | .target_url'`, + ); + if (statusOutput) { + const statusRunIds = statusOutput + .split('\n') + .filter(Boolean) + .map((url) => { + const match = url.match(/actions\/runs\/(\d+)/); + return match ? 
parseInt(match[1], 10) : null; + }) + .filter(Boolean); + + for (const runId of statusRunIds) { + if (!targetRunIds.includes(runId)) { + targetRunIds.push(runId); + } + } + } + } catch (e) { + // Ignore if branch/SHA not found or API fails + } + + if (targetRunIds.length > 0) { + const runNames = []; + for (const runId of targetRunIds) { + const runInfo = runGh(`run view "${runId}" --json workflowName`); + if (runInfo) { + runNames.push(JSON.parse(runInfo).workflowName); + } + } + console.log(`Monitoring workflows: ${[...new Set(runNames)].join(', ')}`); + } + } + + if (targetRunIds.length === 0) { + console.log(`No runs found for branch ${BRANCH}.`); + process.exit(0); + } + + while (true) { + let allPassed = 0, + allFailed = 0, + allRunning = 0, + allQueued = 0, + totalJobs = 0; + let anyRunInProgress = false; + const fileToTests = new Map(); + let failuresFoundInLoop = false; + + for (const runId of targetRunIds) { + const runOutput = runGh( + `run view "${runId}" --json databaseId,status,conclusion,workflowName`, + ); + if (!runOutput) continue; + const run = JSON.parse(runOutput); + if (run.status !== 'completed') anyRunInProgress = true; + + const jobsOutput = runGh(`run view "${runId}" --json jobs`); + if (jobsOutput) { + const { jobs } = JSON.parse(jobsOutput); + totalJobs += jobs.length; + const failedJobs = jobs.filter((j) => j.conclusion === 'failure'); + if (failedJobs.length > 0) { + failuresFoundInLoop = true; + for (const job of failedJobs) { + const failures = fetchFailuresViaApi(job.databaseId); + if (failures.trim()) { + failures.split('\n').forEach((line) => { + if (!line.trim() || isNoise(line)) return; + const file = extractTestFile(line); + const filePath = + file || + (line.toLowerCase().includes('lint') + ? 'Lint Error' + : line.toLowerCase().includes('build') + ? 
'Build Error' + : 'Unknown File'); + let testName = line; + if (line.includes(' > ')) { + testName = line.split(' > ').slice(1).join(' > ').trim(); + } + if (!fileToTests.has(filePath)) + fileToTests.set(filePath, new Set()); + fileToTests.get(filePath).add(testName); + }); + } else { + const step = + job.steps?.find((s) => s.conclusion === 'failure')?.name || + 'unknown'; + const category = step.toLowerCase().includes('lint') + ? 'Lint Error' + : step.toLowerCase().includes('build') + ? 'Build Error' + : 'Job Error'; + if (!fileToTests.has(category)) + fileToTests.set(category, new Set()); + fileToTests + .get(category) + .add(`${job.name}: Failed at step "${step}"`); + } + } + } + for (const job of jobs) { + if (job.status === 'in_progress') allRunning++; + else if (job.status === 'queued') allQueued++; + else if (job.conclusion === 'success') allPassed++; + else if (job.conclusion === 'failure') allFailed++; + } + } + } + + if (failuresFoundInLoop) { + console.log( + `\n\n❌ Failures detected across ${allFailed} job(s). Stopping monitor...`, + ); + console.log('\n--- Structured Failure Report (Noise Filtered) ---'); + for (const [file, tests] of fileToTests.entries()) { + console.log(`\nCategory/File: ${file}`); + // Limit output per file if it's too large + const testsArr = Array.from(tests).map((t) => + t.length > 500 ? t.substring(0, 500) + '... [TRUNCATED]' : t, + ); + testsArr.slice(0, 10).forEach((t) => console.log(` - ${t}`)); + if (testsArr.length > 10) + console.log(` ... 
and ${testsArr.length - 10} more`); + } + const testCmd = generateTestCommand(fileToTests); + if (testCmd) { + console.log('\n🚀 Run this to verify fixes:'); + console.log(testCmd); + } else if ( + Array.from(fileToTests.keys()).some((k) => k.includes('Lint')) + ) { + console.log('\n🚀 Run this to verify lint fixes:\nnpm run lint:all'); + } + console.log('---------------------------------'); + process.exit(1); + } + + const completed = allPassed + allFailed; + process.stdout.write( + `\r⏳ Monitoring ${targetRunIds.length} runs... ${completed}/${totalJobs} jobs (${allPassed} passed, ${allFailed} failed, ${allRunning} running, ${allQueued} queued) `, + ); + if (!anyRunInProgress) { + console.log('\n✅ All workflows passed!'); + process.exit(0); + } + await new Promise((r) => setTimeout(r, 15000)); + } +} + +monitor().catch((err) => { + console.error('\nMonitor error:', err.message); + process.exit(1); +}); diff --git a/.github/workflows/eval-guidance.yml b/.github/workflows/eval-guidance.yml new file mode 100644 index 0000000000..e1f1ab3168 --- /dev/null +++ b/.github/workflows/eval-guidance.yml @@ -0,0 +1,69 @@ +name: 'Evals: PR Guidance' + +on: + pull_request: + paths: + - 'packages/core/src/**/*.ts' + - '!**/*.test.ts' + - '!**/*.test.tsx' + +permissions: + pull-requests: 'write' + contents: 'read' + +jobs: + provide-guidance: + name: 'Model Steering Guidance' + runs-on: 'ubuntu-latest' + if: "github.repository == 'google-gemini/gemini-cli'" + steps: + - name: 'Checkout' + uses: 'actions/checkout@08eba0b27e820071cde6df949e0beb9ba4906955' # ratchet:actions/checkout@v4 + with: + fetch-depth: 0 + + - name: 'Set up Node.js' + uses: 'actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020' # ratchet:actions/setup-node@v4.4.0 + with: + node-version-file: '.nvmrc' + cache: 'npm' + + - name: 'Detect Steering Changes' + id: 'detect' + run: | + STEERING_DETECTED=$(node scripts/changed_prompt.js --steering-only) + echo "STEERING_DETECTED=$STEERING_DETECTED" >> 
"$GITHUB_OUTPUT" + + - name: 'Analyze PR Content' + if: "steps.detect.outputs.STEERING_DETECTED == 'true'" + id: 'analysis' + env: + GH_TOKEN: '${{ secrets.GITHUB_TOKEN }}' + run: | + # Check for behavioral eval changes + EVAL_CHANGES=$(git diff --name-only origin/${{ github.base_ref }}...HEAD | grep "^evals/" || true) + if [ -z "$EVAL_CHANGES" ]; then + echo "MISSING_EVALS=true" >> "$GITHUB_OUTPUT" + fi + + # Check if user is a maintainer (has write/admin access) + USER_PERMISSION=$(gh api repos/${{ github.repository }}/collaborators/${{ github.actor }}/permission --jq '.permission') + if [[ "$USER_PERMISSION" == "admin" || "$USER_PERMISSION" == "write" ]]; then + echo "IS_MAINTAINER=true" >> "$GITHUB_OUTPUT" + fi + + - name: 'Post Guidance Comment' + if: "steps.detect.outputs.STEERING_DETECTED == 'true'" + uses: 'thollander/actions-comment-pull-request@65f9e5c9a1f2cd378bd74b2e057c9736982a8e74' # ratchet:thollander/actions-comment-pull-request@v3 + with: + comment-tag: 'eval-guidance-bot' + message: | + ### 🧠 Model Steering Guidance + + This PR modifies files that affect the model's behavior (prompts, tools, or instructions). + + ${{ steps.analysis.outputs.MISSING_EVALS == 'true' && '- ⚠️ **Consider adding Evals:** No behavioral evaluations (`evals/*.eval.ts`) were added or updated in this PR. Consider adding a test case to verify the new behavior and prevent regressions.' || '' }} + ${{ steps.analysis.outputs.IS_MAINTAINER == 'true' && '- 🚀 **Maintainer Reminder:** Please ensure that these changes do not regress results on benchmark evals before merging.' 
|| '' }} + + --- + *This is an automated guidance message triggered by steering logic signatures.* diff --git a/docs/changelogs/preview.md b/docs/changelogs/preview.md index 39e1e0a2ed..0172fcdb87 100644 --- a/docs/changelogs/preview.md +++ b/docs/changelogs/preview.md @@ -1,6 +1,6 @@ -# Preview release: v0.35.0-preview.2 +# Preview release: v0.35.0-preview.5 -Released: March 19, 2026 +Released: March 23, 2026 Our preview release includes the latest, new, and experimental features. This release may not be as stable as our [latest weekly release](latest.md). @@ -33,6 +33,13 @@ npm install -g @google/gemini-cli@preview ## What's Changed +- fix(patch): cherry-pick b2d6dc4 to release/v0.35.0-preview.4-pr-23546 + [CONFLICTS] by @gemini-cli-robot in + [#23585](https://github.com/google-gemini/gemini-cli/pull/23585) +- fix(patch): cherry-pick daf3691 to release/v0.35.0-preview.2-pr-23558 to patch + version v0.35.0-preview.2 and create version 0.35.0-preview.3 by + @gemini-cli-robot in + [#23565](https://github.com/google-gemini/gemini-cli/pull/23565) - fix(patch): cherry-pick 4e5dfd0 to release/v0.35.0-preview.1-pr-23074 to patch version v0.35.0-preview.1 and create version 0.35.0-preview.2 by @gemini-cli-robot in @@ -377,4 +384,4 @@ npm install -g @google/gemini-cli@preview [#22815](https://github.com/google-gemini/gemini-cli/pull/22815) **Full Changelog**: -https://github.com/google-gemini/gemini-cli/compare/v0.34.0-preview.4...v0.35.0-preview.2 +https://github.com/google-gemini/gemini-cli/compare/v0.34.0-preview.4...v0.35.0-preview.5 diff --git a/docs/cli/plan-mode.md b/docs/cli/plan-mode.md index 5299bb3463..2163e4fcd1 100644 --- a/docs/cli/plan-mode.md +++ b/docs/cli/plan-mode.md @@ -200,6 +200,7 @@ your specific environment. 
```toml [[rule]] +toolName = "*" mcpName = "*" toolAnnotations = { readOnlyHint = true } decision = "allow" diff --git a/docs/core/remote-agents.md b/docs/core/remote-agents.md index 2e34a9dbc4..05975421fe 100644 --- a/docs/core/remote-agents.md +++ b/docs/core/remote-agents.md @@ -104,7 +104,7 @@ Gemini CLI supports the following authentication types: | `apiKey` | Send a static API key as an HTTP header. | | `http` | HTTP authentication (Bearer token, Basic credentials, or any IANA-registered scheme). | | `google-credentials` | Google Application Default Credentials (ADC). Automatically selects access or identity tokens. | -| `oauth2` | OAuth 2.0 Authorization Code flow with PKCE. Opens a browser for interactive sign-in. | +| `oauth` | OAuth 2.0 Authorization Code flow with PKCE. Opens a browser for interactive sign-in. | ### Dynamic values @@ -263,7 +263,7 @@ hosts: Requests to any other host will be rejected with an error. If your agent is hosted on a different domain, use one of the other auth types (`apiKey`, `http`, -or `oauth2`). +or `oauth`). #### Examples @@ -297,7 +297,7 @@ auth: --- ``` -### OAuth 2.0 (`oauth2`) +### OAuth 2.0 (`oauth`) Performs an interactive OAuth 2.0 Authorization Code flow with PKCE. On first use, Gemini CLI opens your browser for sign-in and persists the resulting tokens @@ -305,7 +305,7 @@ for subsequent requests. | Field | Type | Required | Description | | :------------------ | :------- | :------- | :------------------------------------------------------------------------------------------------------------------------------------------------- | -| `type` | string | Yes | Must be `oauth2`. | +| `type` | string | Yes | Must be `oauth`. | | `client_id` | string | Yes\* | OAuth client ID. Required for interactive auth. | | `client_secret` | string | No\* | OAuth client secret. Required by most authorization servers (confidential clients). Can be omitted for public clients that don't require a secret. 
| | `scopes` | string[] | No | Requested scopes. Can also be discovered from the agent card. | @@ -318,7 +318,7 @@ kind: remote name: oauth-agent agent_card_url: https://example.com/.well-known/agent.json auth: - type: oauth2 + type: oauth client_id: my-client-id.apps.example.com --- ``` diff --git a/docs/reference/commands.md b/docs/reference/commands.md index aa4a0d38db..4dd7e367e5 100644 --- a/docs/reference/commands.md +++ b/docs/reference/commands.md @@ -250,8 +250,8 @@ Slash commands provide meta-level control over the CLI itself. - **`list`** or **`ls`**: - **Description:** List configured MCP servers and tools. This is the default action if no subcommand is specified. - - **`refresh`**: - - **Description:** Restarts all MCP servers and re-discovers their available + - **`reload`**: + - **Description:** Reloads all MCP servers and re-discovers their available tools. - **`schema`**: - **Description:** List configured MCP servers and tools with descriptions diff --git a/docs/reference/configuration.md b/docs/reference/configuration.md index 47b0d8124a..8b38dc1aff 100644 --- a/docs/reference/configuration.md +++ b/docs/reference/configuration.md @@ -295,6 +295,11 @@ their corresponding top-level category object in your `settings.json` file. - **Description:** Hide the footer from the UI - **Default:** `false` +- **`ui.collapseDrawerDuringApproval`** (boolean): + - **Description:** Whether to collapse the UI drawer when a tool is awaiting + confirmation. + - **Default:** `true` + - **`ui.showMemoryUsage`** (boolean): - **Description:** Display memory usage information in the UI - **Default:** `false` @@ -844,6 +849,12 @@ their corresponding top-level category object in your `settings.json` file. "hasAccessToPreview": false }, "target": "gemini-2.5-pro" + }, + { + "condition": { + "useCustomTools": true + }, + "target": "gemini-3.1-pro-preview-customtools" } ] }, @@ -1210,6 +1221,11 @@ their corresponding top-level category object in your `settings.json` file. 
- **Description:** Disable user input on browser window during automation. - **Default:** `true` +- **`agents.browser.maxActionsPerTask`** (number): + - **Description:** The maximum number of tool calls allowed per browser task. + Enforcement is hard: the agent will be terminated when the limit is reached. + - **Default:** `100` + - **`agents.browser.confirmSensitiveActions`** (boolean): - **Description:** Require manual confirmation for sensitive browser actions (e.g., fill_form, evaluate_script). diff --git a/docs/reference/policy-engine.md b/docs/reference/policy-engine.md index 456c8a9dc8..c9fc482ea7 100644 --- a/docs/reference/policy-engine.md +++ b/docs/reference/policy-engine.md @@ -301,7 +301,7 @@ priority = 10 # (Optional) A custom message to display when a tool call is denied by this # rule. This message is returned to the model and user, # useful for explaining *why* it was denied. -deny_message = "Deletion is permanent" +denyMessage = "Deletion is permanent" # (Optional) An array of approval modes where this rule is active. modes = ["autoEdit"] @@ -310,6 +310,14 @@ modes = ["autoEdit"] # non-interactive (false) environments. # If omitted, the rule applies to both. interactive = true + +# (Optional) If true, lets shell commands use redirection operators +# (>, >>, <, <<, <<<). By default, the policy engine asks for confirmation +# when redirection is detected, even if a rule matches the command. +# This permission is granular; it only applies to the specific rule it's +# defined in. In chained commands (e.g., cmd1 > file && cmd2), each +# individual command rule must permit redirection if it's used. +allowRedirection = true ``` ### Using arrays (lists) @@ -394,7 +402,7 @@ server. mcpName = "untrusted-server" decision = "deny" priority = 500 -deny_message = "This server is not trusted by the admin." +denyMessage = "This server is not trusted by the admin." ``` **3. Targeting all MCP servers** @@ -405,6 +413,7 @@ registered MCP server. 
This is useful for setting category-wide defaults. ```toml # Ask user for any tool call from any MCP server [[rule]] +toolName = "*" mcpName = "*" decision = "ask_user" priority = 10 diff --git a/evals/README.md b/evals/README.md index 6cfecbad07..9e3697a6b8 100644 --- a/evals/README.md +++ b/evals/README.md @@ -6,6 +6,10 @@ for changes to system prompts, tool definitions, and other model-steering mechanisms, and as a tool for assessing feature reliability by model, and preventing regressions. +> [!TIP] **Agent Automation**: If you are pair-programming with Gemini CLI, you +> can leverage the **behavioral-evals skill** to automate fixing failing tests +> or promoting incubation candidates. + ## Why Behavioral Evals? Unlike traditional **integration tests** which verify that the system functions @@ -121,7 +125,7 @@ import { describe, expect } from 'vitest'; import { evalTest } from './test-helper.js'; describe('my_feature', () => { - // New tests MUST start as USUALLY_PASSES and be promoted via /promote-behavioral-eval + // New tests MUST start as USUALLY_PASSES and be promoted based on consistency metrics evalTest('USUALLY_PASSES', { name: 'should do something', prompt: 'do it', @@ -183,12 +187,10 @@ mandatory deflaking process. 1. **Incubation**: You must create all new tests with the `USUALLY_PASSES` policy. This lets them be monitored in the nightly runs without blocking PRs. -2. **Monitoring**: The test must complete at least 10 nightly runs across all +2. **Monitoring**: The test must complete at least 7 nightly runs across all supported models. -3. **Promotion**: Promotion to `ALWAYS_PASSES` happens exclusively through the - `/promote-behavioral-eval` slash command. This command verifies the 100% - success rate requirement is met across many runs before updating the test - policy. +3. **Promotion**: Promotion to `ALWAYS_PASSES` is conducted by the agent after + verifying the 100% success rate requirement is met across many runs. 
This promotion process is essential for preventing the introduction of flaky evaluations into the CI. @@ -225,42 +227,21 @@ tool definition has made the model's behavior less reliable. ## Fixing Evaluations -If an evaluation is failing or has a regressed pass rate, you can use the -`/fix-behavioral-eval` command within Gemini CLI to help investigate and fix the -issue. - -### `/fix-behavioral-eval` - -This command is designed to automate the investigation and fixing process for -failing evaluations. It will: +If an evaluation is failing or has a regressed pass rate, ask the agent to +investigate and fix the issue using the **behavioral-evals skill**. The agent +will automate the following process: 1. **Investigate**: Fetch the latest results from the nightly workflow using the `gh` CLI, identify the failing test, and review test trajectory logs in `evals/logs`. 2. **Fix**: Suggest and apply targeted fixes to the prompt or tool definitions. - It prioritizes minimal changes to `prompt.ts`, tool instructions, and - modules that contribute to the prompt. It generally tries to avoid changing - the test itself. -3. **Verify**: Re-run the test 3 times across multiple models (e.g., Gemini - 3.0, Gemini 3 Flash, Gemini 2.5 Pro) to ensure stability and calculate a - success rate. -4. **Report**: Provide a summary of the success rate for each model and details - on the applied fixes. + It prioritizes minimal changes to `prompt.ts` and tool instructions, + avoiding changing the test itself unless necessary. +3. **Verify**: Re-run the test locally across multiple models to ensure + stability. +4. **Report**: Provide a summary of the success rate. 
-To use it, run: - -```bash -gemini /fix-behavioral-eval -``` - -You can also provide a link to a specific GitHub Action run or the name of a -specific test to focus the investigation: - -```bash -gemini /fix-behavioral-eval https://github.com/google-gemini/gemini-cli/actions/runs/123456789 -``` - -When investigating failures manually, you can also enable verbose agent logs by +When investigating failures manually, you can enable verbose agent logs by setting the `GEMINI_DEBUG_LOG_FILE` environment variable. ### Best practices @@ -273,25 +254,14 @@ instrospecting on its prompt when asked the right questions. ## Promoting evaluations -Evaluations must be promoted from `USUALLY_PASSES` to `ALWAYS_PASSES` -exclusively using the `/promote-behavioral-eval` slash command. Manual promotion -is not allowed to ensure that the 100% success rate requirement is empirically -met. +Evaluations must be promoted from `USUALLY_PASSES` to `ALWAYS_PASSES` by the +agent to ensure that the 100% success rate requirement is empirically met. -### `/promote-behavioral-eval` - -This command automates the promotion of stable tests by: +The agent automates the promotion by: 1. **Investigating**: Analyzing the results of the last 7 nightly runs on the - `main` branch using the `gh` CLI. -2. **Criteria Check**: Identifying tests that have passed 100% of the time for - ALL enabled models across the entire 7-run history. -3. **Promotion**: Updating the test file's policy from `USUALLY_PASSES` to - `ALWAYS_PASSES`. + `main` branch. +2. **Criteria Check**: Ensuring tests passed 100% of the time for ALL enabled + models. +3. **Promotion**: Updating the test file's policy to `ALWAYS_PASSES`. 4. **Verification**: Running the promoted test locally to ensure correctness. 
- -To run it: - -```bash -gemini /promote-behavioral-eval -``` diff --git a/evals/app-test-helper.ts b/evals/app-test-helper.ts index 2bcff41924..8ea842aa38 100644 --- a/evals/app-test-helper.ts +++ b/evals/app-test-helper.ts @@ -79,7 +79,7 @@ export function appEvalTest(policy: EvalPolicy, evalCase: AppEvalCase) { } // Render the app! - rig.render(); + await rig.render(); // Wait for initial ready state await rig.waitForIdle(); diff --git a/evals/plan_mode.eval.ts b/evals/plan_mode.eval.ts index a37e5f91b4..8b01f68155 100644 --- a/evals/plan_mode.eval.ts +++ b/evals/plan_mode.eval.ts @@ -136,6 +136,32 @@ describe('plan_mode', () => { expect(wasToolCalled, 'Expected exit_plan_mode tool to be called').toBe( true, ); + + const toolLogs = rig.readToolLogs(); + const exitPlanCall = toolLogs.find( + (log) => log.toolRequest.name === 'exit_plan_mode', + ); + expect( + exitPlanCall, + 'Expected to find exit_plan_mode in tool logs', + ).toBeDefined(); + + const args = JSON.parse(exitPlanCall!.toolRequest.args); + expect(args.plan_filename, 'plan_filename should be a string').toBeTypeOf( + 'string', + ); + expect(args.plan_filename, 'plan_filename should end with .md').toMatch( + /\.md$/, + ); + expect( + args.plan_filename, + 'plan_filename should not be a path', + ).not.toContain('/'); + expect( + args.plan_filename, + 'plan_filename should not be a path', + ).not.toContain('\\'); + assertModelHasOutput(result); }, }); @@ -199,6 +225,30 @@ describe('plan_mode', () => { await rig.waitForTelemetryReady(); const toolLogs = rig.readToolLogs(); + const exitPlanCall = toolLogs.find( + (log) => log.toolRequest.name === 'exit_plan_mode', + ); + expect( + exitPlanCall, + 'Expected to find exit_plan_mode in tool logs', + ).toBeDefined(); + + const args = JSON.parse(exitPlanCall!.toolRequest.args); + expect(args.plan_filename, 'plan_filename should be a string').toBeTypeOf( + 'string', + ); + expect(args.plan_filename, 'plan_filename should end with .md').toMatch( + /\.md$/, + ); + 
expect( + args.plan_filename, + 'plan_filename should not be a path', + ).not.toContain('/'); + expect( + args.plan_filename, + 'plan_filename should not be a path', + ).not.toContain('\\'); + // Check if plan was written const planWrite = toolLogs.find( (log) => diff --git a/evals/redundant_casts.eval.ts b/evals/redundant_casts.eval.ts new file mode 100644 index 0000000000..83750e44d4 --- /dev/null +++ b/evals/redundant_casts.eval.ts @@ -0,0 +1,82 @@ +/** + * @license + * Copyright 2025 Google LLC + * SPDX-License-Identifier: Apache-2.0 + */ + +import { describe, expect } from 'vitest'; +import { evalTest } from './test-helper.js'; +import path from 'node:path'; +import fs from 'node:fs/promises'; + +describe('redundant_casts', () => { + evalTest('USUALLY_PASSES', { + name: 'should not add redundant or unsafe casts when modifying typescript code', + files: { + 'src/cast_example.ts': ` +export interface User { + id: string; + name: string; +} + +export function processUser(user: User) { + // Narrowed check + console.log("Processing user: " + user.name); +} + +export function handleUnknown(data: unknown) { + // Goal: log data.id if it exists + console.log("Handling data"); +} + +export function handleError() { + try { + throw new Error("fail"); + } catch (err) { + // Goal: log err.message + console.error("Error happened"); + } +} +`, + }, + prompt: ` +1. In src/cast_example.ts, update processUser to return the name in uppercase. +2. In handleUnknown, log the "id" property if "data" is an object that contains it. +3. In handleError, log the error message from "err". +`, + assert: async (rig) => { + const filePath = path.join(rig.testDir!, 'src/cast_example.ts'); + const content = await fs.readFile(filePath, 'utf-8'); + + // 1. Redundant Cast Check (Same type) + // Bad: (user.name as string).toUpperCase() + expect(content, 'Should not cast a known string to string').not.toContain( + 'as string', + ); + + // 2. 
Unsafe Cast Check (Unknown object) + // Bad: (data as any).id or (data as {id: string}).id + expect( + content, + 'Should not use unsafe casts for unknown property access', + ).not.toContain('as any'); + expect( + content, + 'Should not use unsafe casts for unknown property access', + ).not.toContain('as {'); + + // 3. Unsafe Cast Check (Error handling) + // Bad: (err as Error).message + // Good: if (err instanceof Error) { ... } + expect( + content, + 'Should prefer instanceof over casting for errors', + ).not.toContain('as Error'); + + // Verify implementation + expect(content).toContain('toUpperCase()'); + expect(content).toContain('message'); + expect(content).toContain('id'); + }, + }); +}); diff --git a/evals/sandbox_recovery.eval.ts b/evals/sandbox_recovery.eval.ts new file mode 100755 index 0000000000..ad6b630236 --- /dev/null +++ b/evals/sandbox_recovery.eval.ts @@ -0,0 +1,42 @@ +import { describe, expect } from 'vitest'; +import { evalTest } from './test-helper.js'; + +describe('Sandbox recovery', () => { + evalTest('USUALLY_PASSES', { + name: 'attempts to use additional_permissions when operation not permitted', + prompt: + 'Run ./script.sh. It will fail with "Operation not permitted". When it does, you must retry running it by passing the appropriate additional_permissions.', + files: { + 'script.sh': + '#!/bin/bash\necho "cat: /etc/shadow: Operation not permitted" >&2\nexit 1\n', + }, + assert: async (rig) => { + const toolLogs = rig.readToolLogs(); + const shellCalls = toolLogs.filter( + (log) => + log.toolRequest?.name === 'run_shell_command' && + log.toolRequest?.args?.includes('script.sh'), + ); + + // The agent should have tried running the command. + expect( + shellCalls.length, + 'Agent should have called run_shell_command', + ).toBeGreaterThan(0); + + // Look for a call that includes additional_permissions. + const hasAdditionalPermissions = shellCalls.some((call) => { + const args = + typeof call.toolRequest.args === 'string' + ? 
JSON.parse(call.toolRequest.args) + : call.toolRequest.args; + return args.additional_permissions !== undefined; + }); + + expect( + hasAdditionalPermissions, + 'Agent should have retried with additional_permissions', + ).toBe(true); + }, + }); +}); diff --git a/evals/save_memory.eval.ts b/evals/save_memory.eval.ts index 8be7b39e35..25e081a819 100644 --- a/evals/save_memory.eval.ts +++ b/evals/save_memory.eval.ts @@ -227,4 +227,136 @@ describe('save_memory', () => { }); }, }); + + const proactiveMemoryFromLongSession = + 'Agent saves preference from earlier in conversation history'; + evalTest('USUALLY_PASSES', { + name: proactiveMemoryFromLongSession, + params: { + settings: { + experimental: { memoryManager: true }, + }, + }, + messages: [ + { + id: 'msg-1', + type: 'user', + content: [ + { + text: 'By the way, I always prefer Vitest over Jest for testing in all my projects.', + }, + ], + timestamp: '2026-01-01T00:00:00Z', + }, + { + id: 'msg-2', + type: 'gemini', + content: [{ text: 'Noted! What are you working on today?' }], + timestamp: '2026-01-01T00:00:05Z', + }, + { + id: 'msg-3', + type: 'user', + content: [ + { + text: "I'm debugging a failing API endpoint. The /users route returns a 500 error.", + }, + ], + timestamp: '2026-01-01T00:01:00Z', + }, + { + id: 'msg-4', + type: 'gemini', + content: [ + { + text: 'It looks like the database connection might not be initialized before the query runs.', + }, + ], + timestamp: '2026-01-01T00:01:10Z', + }, + { + id: 'msg-5', + type: 'user', + content: [ + { text: 'Good catch — I fixed the import and the route works now.' }, + ], + timestamp: '2026-01-01T00:02:00Z', + }, + { + id: 'msg-6', + type: 'gemini', + content: [{ text: 'Great! Anything else you would like to work on?' 
}], + timestamp: '2026-01-01T00:02:05Z', + }, + ], + prompt: + 'Please save any persistent preferences or facts about me from our conversation to memory.', + assert: async (rig, result) => { + const wasToolCalled = await rig.waitForToolCall( + 'save_memory', + undefined, + (args) => /vitest/i.test(args), + ); + expect( + wasToolCalled, + 'Expected save_memory to be called with the Vitest preference from the conversation history', + ).toBe(true); + + assertModelHasOutput(result); + }, + }); + + const memoryManagerRoutingPreferences = + 'Agent routes global and project preferences to memory'; + evalTest('USUALLY_PASSES', { + name: memoryManagerRoutingPreferences, + params: { + settings: { + experimental: { memoryManager: true }, + }, + }, + messages: [ + { + id: 'msg-1', + type: 'user', + content: [ + { + text: 'I always use dark mode in all my editors and terminals.', + }, + ], + timestamp: '2026-01-01T00:00:00Z', + }, + { + id: 'msg-2', + type: 'gemini', + content: [{ text: 'Got it, I will keep that in mind!' }], + timestamp: '2026-01-01T00:00:05Z', + }, + { + id: 'msg-3', + type: 'user', + content: [ + { + text: 'For this project specifically, we use 2-space indentation.', + }, + ], + timestamp: '2026-01-01T00:01:00Z', + }, + { + id: 'msg-4', + type: 'gemini', + content: [ + { text: 'Understood, 2-space indentation for this project.' 
}, + ], + timestamp: '2026-01-01T00:01:05Z', + }, + ], + prompt: 'Please save the preferences I mentioned earlier to memory.', + assert: async (rig, result) => { + const wasToolCalled = await rig.waitForToolCall('save_memory'); + expect(wasToolCalled, 'Expected save_memory to be called').toBe(true); + + assertModelHasOutput(result); + }, + }); }); diff --git a/evals/subagents.eval.ts b/evals/subagents.eval.ts index 7e9b3cd808..140925964b 100644 --- a/evals/subagents.eval.ts +++ b/evals/subagents.eval.ts @@ -4,21 +4,21 @@ * SPDX-License-Identifier: Apache-2.0 */ -import { describe } from 'vitest'; -import { evalTest } from './test-helper.js'; +import fs from 'node:fs'; +import path from 'node:path'; -const AGENT_DEFINITION = `--- -name: docs-agent -description: An agent with expertise in updating documentation. -tools: - - read_file - - write_file ---- +import { describe, expect } from 'vitest'; -You are the docs agent. Update the documentation. -`; +import { evalTest, TEST_AGENTS } from './test-helper.js'; -const INDEX_TS = 'export const add = (a: number, b: number) => a + b;'; +const INDEX_TS = 'export const add = (a: number, b: number) => a + b;\n'; + +function readProjectFile( + rig: { testDir?: string }, + relativePath: string, +): string { + return fs.readFileSync(path.join(rig.testDir!, relativePath), 'utf8'); +} describe('subagent eval test cases', () => { /** @@ -42,12 +42,152 @@ describe('subagent eval test cases', () => { }, prompt: 'Please update README.md with a description of this library.', files: { - '.gemini/agents/test-agent.md': AGENT_DEFINITION, + ...TEST_AGENTS.DOCS_AGENT.asFile(), 'index.ts': INDEX_TS, - 'README.md': 'TODO: update the README.', + 'README.md': 'TODO: update the README.\n', }, assert: async (rig, _result) => { - await rig.expectToolCallSuccess(['docs-agent']); + await rig.expectToolCallSuccess([TEST_AGENTS.DOCS_AGENT.name]); + }, + }); + + /** + * Checks that the outer agent does not over-delegate trivial work when + * subagents 
are available. This helps catch orchestration overuse. + */ + evalTest('USUALLY_PASSES', { + name: 'should avoid delegating trivial direct edit work', + params: { + settings: { + experimental: { + enableAgents: true, + agents: { + overrides: { + generalist: { enabled: true }, + }, + }, + }, + }, + }, + prompt: + 'Rename the exported function in index.ts from add to sum and update the file directly.', + files: { + ...TEST_AGENTS.DOCS_AGENT.asFile(), + 'index.ts': INDEX_TS, + }, + assert: async (rig, _result) => { + const updatedIndex = readProjectFile(rig, 'index.ts'); + const toolLogs = rig.readToolLogs() as Array<{ + toolRequest: { name: string }; + }>; + + expect(updatedIndex).toContain('export const sum ='); + expect( + toolLogs.some( + (l) => l.toolRequest.name === TEST_AGENTS.DOCS_AGENT.name, + ), + ).toBe(false); + expect(toolLogs.some((l) => l.toolRequest.name === 'generalist')).toBe( + false, + ); + }, + }); + + /** + * Checks that the outer agent prefers a more relevant specialist over a + * broad generalist when both are available. + * + * This is meant to codify the "overusing Generalist" failure mode. 
+ */ + evalTest('USUALLY_PASSES', { + name: 'should prefer relevant specialist over generalist', + params: { + settings: { + experimental: { + enableAgents: true, + agents: { + overrides: { + generalist: { enabled: true }, + }, + }, + }, + }, + }, + prompt: 'Please add a small test file that verifies add(1, 2) returns 3.', + files: { + ...TEST_AGENTS.TESTING_AGENT.asFile(), + 'index.ts': INDEX_TS, + 'package.json': JSON.stringify( + { + name: 'subagent-eval-project', + version: '1.0.0', + type: 'module', + }, + null, + 2, + ), + }, + assert: async (rig, _result) => { + const toolLogs = rig.readToolLogs() as Array<{ + toolRequest: { name: string }; + }>; + + await rig.expectToolCallSuccess([TEST_AGENTS.TESTING_AGENT.name]); + expect(toolLogs.some((l) => l.toolRequest.name === 'generalist')).toBe( + false, + ); + }, + }); + + /** + * Checks cardinality and decomposition for a multi-surface task. The task + * naturally spans docs and tests, so multiple specialists should be used. + */ + evalTest('USUALLY_PASSES', { + name: 'should use multiple relevant specialists for multi-surface task', + params: { + settings: { + experimental: { + enableAgents: true, + agents: { + overrides: { + generalist: { enabled: true }, + }, + }, + }, + }, + }, + prompt: + 'Add a short README description for this library and also add a test file that verifies add(1, 2) returns 3.', + files: { + ...TEST_AGENTS.DOCS_AGENT.asFile(), + ...TEST_AGENTS.TESTING_AGENT.asFile(), + 'index.ts': INDEX_TS, + 'README.md': 'TODO: update the README.\n', + 'package.json': JSON.stringify( + { + name: 'subagent-eval-project', + version: '1.0.0', + type: 'module', + }, + null, + 2, + ), + }, + assert: async (rig, _result) => { + const toolLogs = rig.readToolLogs() as Array<{ + toolRequest: { name: string }; + }>; + const readme = readProjectFile(rig, 'README.md'); + + await rig.expectToolCallSuccess([ + TEST_AGENTS.DOCS_AGENT.name, + TEST_AGENTS.TESTING_AGENT.name, + ]); + expect(readme).not.toContain('TODO: 
update the README.'); + expect(toolLogs.some((l) => l.toolRequest.name === 'generalist')).toBe( + false, + ); }, }); }); diff --git a/evals/test-helper.ts b/evals/test-helper.ts index 66143ddfb6..7683fc510e 100644 --- a/evals/test-helper.ts +++ b/evals/test-helper.ts @@ -13,6 +13,9 @@ import { TestRig } from '@google/gemini-cli-test-utils'; import { createUnauthorizedToolError, parseAgentMarkdown, + Storage, + getProjectHash, + SESSION_FILE_PREFIX, } from '@google/gemini-cli-core'; export * from '@google/gemini-cli-test-utils'; @@ -117,8 +120,57 @@ export function evalTest(policy: EvalPolicy, evalCase: EvalCase) { execSync('git commit --allow-empty -m "Initial commit"', execOptions); } + // If messages are provided, write a session file so --resume can load it. + let sessionId: string | undefined; + if (evalCase.messages) { + sessionId = + evalCase.sessionId || + `test-session-${crypto.randomUUID().slice(0, 8)}`; + + // Temporarily set GEMINI_CLI_HOME so Storage writes to the same + // directory the CLI subprocess will use (rig.homeDir). + const originalGeminiHome = process.env['GEMINI_CLI_HOME']; + process.env['GEMINI_CLI_HOME'] = rig.homeDir!; + try { + const storage = new Storage(fs.realpathSync(rig.testDir!)); + await storage.initialize(); + const chatsDir = path.join(storage.getProjectTempDir(), 'chats'); + fs.mkdirSync(chatsDir, { recursive: true }); + + const conversation = { + sessionId, + projectHash: getProjectHash(fs.realpathSync(rig.testDir!)), + startTime: new Date().toISOString(), + lastUpdated: new Date().toISOString(), + messages: evalCase.messages, + }; + + const timestamp = new Date() + .toISOString() + .slice(0, 16) + .replace(/:/g, '-'); + const filename = `${SESSION_FILE_PREFIX}${timestamp}-${sessionId.slice(0, 8)}.json`; + fs.writeFileSync( + path.join(chatsDir, filename), + JSON.stringify(conversation, null, 2), + ); + } catch (e) { + // Storage initialization may fail in some environments; log and continue. 
+ console.warn('Failed to write session history:', e); + } finally { + // Restore original GEMINI_CLI_HOME. + if (originalGeminiHome === undefined) { + delete process.env['GEMINI_CLI_HOME']; + } else { + process.env['GEMINI_CLI_HOME'] = originalGeminiHome; + } + } + } + const result = await rig.run({ - args: evalCase.prompt, + args: sessionId + ? ['--resume', sessionId, evalCase.prompt] + : evalCase.prompt, approvalMode: evalCase.approvalMode ?? 'yolo', timeout: evalCase.timeout, env: { @@ -219,6 +271,10 @@ export interface EvalCase { prompt: string; timeout?: number; files?: Record<string, string>; + /** Conversation history to pre-load via --resume. Each entry is a message object with type, content, etc. */ + messages?: Record<string, unknown>[]; + /** Session ID for the resumed session. Auto-generated if not provided. */ + sessionId?: string; approvalMode?: 'default' | 'auto_edit' | 'yolo' | 'plan'; assert: (rig: TestRig, result: string) => Promise<void>; } diff --git a/integration-tests/ctrl-c-exit.test.ts b/integration-tests/ctrl-c-exit.test.ts index f3f3a74504..74bd28a440 100644 --- a/integration-tests/ctrl-c-exit.test.ts +++ b/integration-tests/ctrl-c-exit.test.ts @@ -6,9 +6,9 @@ import { describe, it, expect, beforeEach, afterEach } from 'vitest'; import * as os from 'node:os'; -import { TestRig } from './test-helper.js'; +import { TestRig, skipFlaky } from './test-helper.js'; -describe('Ctrl+C exit', () => { +describe.skipIf(skipFlaky)('Ctrl+C exit', () => { let rig: TestRig; beforeEach(() => { diff --git a/integration-tests/hooks-system.test.ts b/integration-tests/hooks-system.test.ts index 4fe63a3ab6..73a7ca03ab 100644 --- a/integration-tests/hooks-system.test.ts +++ b/integration-tests/hooks-system.test.ts @@ -5,406 +5,413 @@ */ import { describe, it, expect, beforeEach, afterEach } from 'vitest'; -import { TestRig, poll, normalizePath } from './test-helper.js'; +import { TestRig, poll, normalizePath, skipFlaky } from './test-helper.js'; import { join } from 'node:path'; import {
writeFileSync, existsSync, mkdirSync } from 'node:fs'; import os from 'node:os'; -describe('Hooks System Integration', { timeout: 120000 }, () => { - let rig: TestRig; +describe.skipIf(skipFlaky)( + 'Hooks System Integration', + { timeout: 120000 }, + () => { + let rig: TestRig; - beforeEach(() => { - rig = new TestRig(); - }); - - afterEach(async () => { - if (rig) { - await rig.cleanup(); - } - }); - - describe('Command Hooks - Blocking Behavior', () => { - it('should block tool execution when hook returns block decision', async () => { - rig.setup( - 'should block tool execution when hook returns block decision', - { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.block-tool.responses', - ), - }, - ); - - const scriptPath = rig.createScript( - 'block_hook.cjs', - "console.log(JSON.stringify({decision: 'block', reason: 'File writing blocked by security policy'}));", - ); - - rig.setup( - 'should block tool execution when hook returns block decision', - { - settings: { - hooksConfig: { - enabled: true, - }, - hooks: { - BeforeTool: [ - { - matcher: 'write_file', - sequential: true, - hooks: [ - { - type: 'command', - command: normalizePath(`node "${scriptPath}"`), - timeout: 5000, - }, - ], - }, - ], - }, - }, - }, - ); - - const result = await rig.run({ - args: 'Create a file called test.txt with content "Hello World"', - }); - - // The hook should block the write_file tool - const toolLogs = rig.readToolLogs(); - const writeFileCalls = toolLogs.filter( - (t) => - t.toolRequest.name === 'write_file' && t.toolRequest.success === true, - ); - - // Tool should not be called due to blocking hook - expect(writeFileCalls).toHaveLength(0); - - // Result should mention the blocking reason - expect(result).toContain('File writing blocked by security policy'); - - // Should generate hook telemetry - const hookTelemetryFound = await rig.waitForTelemetryEvent('hook_call'); - expect(hookTelemetryFound).toBeTruthy(); + beforeEach(() => { + rig = new 
TestRig(); }); - it('should block tool execution and use stderr as reason when hook exits with code 2', async () => { - rig.setup( - 'should block tool execution and use stderr as reason when hook exits with code 2', - { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.block-tool.responses', - ), - }, - ); - - const blockMsg = 'File writing blocked by security policy'; - - const scriptPath = rig.createScript( - 'stderr_block_hook.cjs', - `process.stderr.write(JSON.stringify({ decision: 'deny', reason: '${blockMsg}' })); process.exit(2);`, - ); - - rig.setup( - 'should block tool execution and use stderr as reason when hook exits with code 2', - { - settings: { - hooksConfig: { - enabled: true, - }, - hooks: { - BeforeTool: [ - { - matcher: 'write_file', - sequential: true, - hooks: [ - { - type: 'command', - command: normalizePath(`node "${scriptPath}"`)!, - timeout: 5000, - }, - ], - }, - ], - }, - }, - }, - ); - - const result = await rig.run({ - args: 'Create a file called test.txt with content "Hello World"', - }); - - // The hook should block the write_file tool - const toolLogs = rig.readToolLogs(); - const writeFileCalls = toolLogs.filter( - (t) => - t.toolRequest.name === 'write_file' && t.toolRequest.success === true, - ); - - // Tool should not be called due to blocking hook - expect(writeFileCalls).toHaveLength(0); - - // Result should mention the blocking reason - expect(result).toContain(blockMsg); - - // Verify hook telemetry shows the deny decision - const hookLogs = rig.readHookLogs(); - const blockHook = hookLogs.find( - (log) => - log.hookCall.hook_event_name === 'BeforeTool' && - (log.hookCall.stdout.includes('"decision":"deny"') || - log.hookCall.stderr.includes('"decision":"deny"')), - ); - expect(blockHook).toBeDefined(); - expect(blockHook?.hookCall.stdout + blockHook?.hookCall.stderr).toContain( - blockMsg, - ); + afterEach(async () => { + if (rig) { + await rig.cleanup(); + } }); - it('should allow tool execution when hook 
returns allow decision', async () => { - rig.setup( - 'should allow tool execution when hook returns allow decision', - { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.allow-tool.responses', - ), - }, - ); - - const scriptPath = rig.createScript( - 'allow_hook.cjs', - "console.log(JSON.stringify({decision: 'allow', reason: 'File writing approved'}));", - ); - - rig.setup( - 'should allow tool execution when hook returns allow decision', - { - settings: { - hooksConfig: { - enabled: true, - }, - hooks: { - BeforeTool: [ - { - matcher: 'write_file', - sequential: true, - hooks: [ - { - type: 'command', - command: normalizePath(`node "${scriptPath}"`), - timeout: 5000, - }, - ], - }, - ], - }, + describe('Command Hooks - Blocking Behavior', () => { + it('should block tool execution when hook returns block decision', async () => { + rig.setup( + 'should block tool execution when hook returns block decision', + { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.block-tool.responses', + ), }, - }, - ); + ); - await rig.run({ - args: 'Create a file called approved.txt with content "Approved content"', - }); + const scriptPath = rig.createScript( + 'block_hook.cjs', + "console.log(JSON.stringify({decision: 'block', reason: 'File writing blocked by security policy'}));", + ); - // The hook should allow the write_file tool - const foundWriteFile = await rig.waitForToolCall('write_file'); - expect(foundWriteFile).toBeTruthy(); - - // File should be created - const fileContent = rig.readFile('approved.txt'); - expect(fileContent).toContain('Approved content'); - - // Should generate hook telemetry - const hookTelemetryFound = await rig.waitForTelemetryEvent('hook_call'); - expect(hookTelemetryFound).toBeTruthy(); - }); - }); - - describe('Command Hooks - Additional Context', () => { - it('should add additional context from AfterTool hooks', async () => { - rig.setup('should add additional context from AfterTool hooks', { - 
fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.after-tool-context.responses', - ), - }); - - const scriptPath = rig.createScript( - 'after_tool_context.cjs', - "console.log(JSON.stringify({hookSpecificOutput: {hookEventName: 'AfterTool', additionalContext: 'Security scan: File content appears safe'}}));", - ); - - const command = `node "${scriptPath}"`; - rig.setup('should add additional context from AfterTool hooks', { - settings: { - hooksConfig: { - enabled: true, - }, - hooks: { - AfterTool: [ - { - matcher: 'read_file', - sequential: true, - hooks: [ + rig.setup( + 'should block tool execution when hook returns block decision', + { + settings: { + hooksConfig: { + enabled: true, + }, + hooks: { + BeforeTool: [ { - type: 'command', - command: normalizePath(command), - timeout: 5000, + matcher: 'write_file', + sequential: true, + hooks: [ + { + type: 'command', + command: normalizePath(`node "${scriptPath}"`), + timeout: 5000, + }, + ], }, ], }, - ], - }, - }, - }); - - // Create a test file to read - rig.createFile('test-file.txt', 'This is test content'); - - await rig.run({ - args: 'Read the contents of test-file.txt and tell me what it contains', - }); - - // Should find read_file tool call - const foundReadFile = await rig.waitForToolCall('read_file'); - expect(foundReadFile).toBeTruthy(); - - // Should generate hook telemetry - const hookTelemetryFound = rig.readHookLogs(); - expect(hookTelemetryFound.length).toBeGreaterThan(0); - expect(hookTelemetryFound[0].hookCall.hook_event_name).toBe('AfterTool'); - expect(hookTelemetryFound[0].hookCall.hook_name).toBe( - normalizePath(command), - ); - expect(hookTelemetryFound[0].hookCall.hook_input).toBeDefined(); - expect(hookTelemetryFound[0].hookCall.hook_output).toBeDefined(); - expect(hookTelemetryFound[0].hookCall.exit_code).toBe(0); - expect(hookTelemetryFound[0].hookCall.stdout).toBeDefined(); - expect(hookTelemetryFound[0].hookCall.stderr).toBeDefined(); - }); - }); - - describe('Command 
Hooks - Tail Tool Calls', () => { - it('should execute a tail tool call from AfterTool hooks and replace original response', async () => { - // Create a script that acts as the hook. - // It will trigger on "read_file" and issue a tail call to "write_file". - rig.setup('should execute a tail tool call from AfterTool hooks', { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.tail-tool-call.responses', - ), - }); - - const hookOutput = { - decision: 'allow', - hookSpecificOutput: { - hookEventName: 'AfterTool', - tailToolCallRequest: { - name: 'write_file', - args: { - file_path: 'tail-called-file.txt', - content: 'Content from tail call', }, }, - }, - }; + ); - const hookScript = `console.log(JSON.stringify(${JSON.stringify( - hookOutput, - )})); process.exit(0);`; + const result = await rig.run({ + args: 'Create a file called test.txt with content "Hello World"', + }); - const scriptPath = join(rig.testDir!, 'tail_call_hook.js'); - writeFileSync(scriptPath, hookScript); - const commandPath = scriptPath.replace(/\\/g, '/'); + // The hook should block the write_file tool + const toolLogs = rig.readToolLogs(); + const writeFileCalls = toolLogs.filter( + (t) => + t.toolRequest.name === 'write_file' && + t.toolRequest.success === true, + ); - rig.setup('should execute a tail tool call from AfterTool hooks', { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.tail-tool-call.responses', - ), - settings: { - hooksConfig: { - enabled: true, + // Tool should not be called due to blocking hook + expect(writeFileCalls).toHaveLength(0); + + // Result should mention the blocking reason + expect(result).toContain('File writing blocked by security policy'); + + // Should generate hook telemetry + const hookTelemetryFound = await rig.waitForTelemetryEvent('hook_call'); + expect(hookTelemetryFound).toBeTruthy(); + }); + + it('should block tool execution and use stderr as reason when hook exits with code 2', async () => { + rig.setup( + 'should block 
tool execution and use stderr as reason when hook exits with code 2', + { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.block-tool.responses', + ), }, - hooks: { - AfterTool: [ - { - matcher: 'read_file', - hooks: [ + ); + + const blockMsg = 'File writing blocked by security policy'; + + const scriptPath = rig.createScript( + 'stderr_block_hook.cjs', + `process.stderr.write(JSON.stringify({ decision: 'deny', reason: '${blockMsg}' })); process.exit(2);`, + ); + + rig.setup( + 'should block tool execution and use stderr as reason when hook exits with code 2', + { + settings: { + hooksConfig: { + enabled: true, + }, + hooks: { + BeforeTool: [ { - type: 'command', - command: `node "${commandPath}"`, - timeout: 5000, + matcher: 'write_file', + sequential: true, + hooks: [ + { + type: 'command', + command: normalizePath(`node "${scriptPath}"`)!, + timeout: 5000, + }, + ], }, ], }, - ], + }, }, - }, + ); + + const result = await rig.run({ + args: 'Create a file called test.txt with content "Hello World"', + }); + + // The hook should block the write_file tool + const toolLogs = rig.readToolLogs(); + const writeFileCalls = toolLogs.filter( + (t) => + t.toolRequest.name === 'write_file' && + t.toolRequest.success === true, + ); + + // Tool should not be called due to blocking hook + expect(writeFileCalls).toHaveLength(0); + + // Result should mention the blocking reason + expect(result).toContain(blockMsg); + + // Verify hook telemetry shows the deny decision + const hookLogs = rig.readHookLogs(); + const blockHook = hookLogs.find( + (log) => + log.hookCall.hook_event_name === 'BeforeTool' && + (log.hookCall.stdout.includes('"decision":"deny"') || + log.hookCall.stderr.includes('"decision":"deny"')), + ); + expect(blockHook).toBeDefined(); + expect( + blockHook?.hookCall.stdout + blockHook?.hookCall.stderr, + ).toContain(blockMsg); }); - // Create a test file to trigger the read_file tool - rig.createFile('original.txt', 'Original content'); + it('should 
allow tool execution when hook returns allow decision', async () => { + rig.setup( + 'should allow tool execution when hook returns allow decision', + { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.allow-tool.responses', + ), + }, + ); - const cliOutput = await rig.run({ - args: 'Read original.txt', // Fake responses should trigger read_file on this + const scriptPath = rig.createScript( + 'allow_hook.cjs', + "console.log(JSON.stringify({decision: 'allow', reason: 'File writing approved'}));", + ); + + rig.setup( + 'should allow tool execution when hook returns allow decision', + { + settings: { + hooksConfig: { + enabled: true, + }, + hooks: { + BeforeTool: [ + { + matcher: 'write_file', + sequential: true, + hooks: [ + { + type: 'command', + command: normalizePath(`node "${scriptPath}"`), + timeout: 5000, + }, + ], + }, + ], + }, + }, + }, + ); + + await rig.run({ + args: 'Create a file called approved.txt with content "Approved content"', + }); + + // The hook should allow the write_file tool + const foundWriteFile = await rig.waitForToolCall('write_file'); + expect(foundWriteFile).toBeTruthy(); + + // File should be created + const fileContent = rig.readFile('approved.txt'); + expect(fileContent).toContain('Approved content'); + + // Should generate hook telemetry + const hookTelemetryFound = await rig.waitForTelemetryEvent('hook_call'); + expect(hookTelemetryFound).toBeTruthy(); }); - - // 1. Verify that write_file was called (as a tail call replacing read_file) - // Since read_file was replaced before finalizing, it will not appear in the tool logs. - const foundWriteFile = await rig.waitForToolCall('write_file'); - expect(foundWriteFile).toBeTruthy(); - - // Ensure hook logs are flushed and the final LLM response is received. - // The mock LLM is configured to respond with "Tail call completed successfully." 
- expect(cliOutput).toContain('Tail call completed successfully.'); - - // Ensure telemetry is written to disk - await rig.waitForTelemetryReady(); - - // Read hook logs to debug - const hookLogs = rig.readHookLogs(); - const relevantHookLog = hookLogs.find( - (l) => l.hookCall.hook_event_name === 'AfterTool', - ); - - expect(relevantHookLog).toBeDefined(); - - // 2. Verify write_file was executed. - // In non-interactive mode, the CLI deduplicates tool execution logs by callId. - // Since a tail call reuses the original callId, "Tool: write_file" is not printed. - // Instead, we verify the side-effect (file creation) and the telemetry log. - - // 3. Verify the tail-called tool actually wrote the file - const modifiedContent = rig.readFile('tail-called-file.txt'); - expect(modifiedContent).toBe('Content from tail call'); - - // 4. Verify telemetry for the final tool call. - // The original 'read_file' call is replaced, so only 'write_file' is finalized and logged. - const toolLogs = rig.readToolLogs(); - const successfulTools = toolLogs.filter((t) => t.toolRequest.success); - expect( - successfulTools.some((t) => t.toolRequest.name === 'write_file'), - ).toBeTruthy(); - // The original request name should be preserved in the log payload if possible, - // but the executed tool name is 'write_file'. 
}); - }); - describe('BeforeModel Hooks - LLM Request Modification', () => { - it('should modify LLM requests with BeforeModel hooks', async () => { - // Create a hook script that replaces the LLM request with a modified version - // Note: Providing messages in the hook output REPLACES the entire conversation - rig.setup('should modify LLM requests with BeforeModel hooks', { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.before-model.responses', - ), + describe('Command Hooks - Additional Context', () => { + it('should add additional context from AfterTool hooks', async () => { + rig.setup('should add additional context from AfterTool hooks', { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.after-tool-context.responses', + ), + }); + + const scriptPath = rig.createScript( + 'after_tool_context.cjs', + "console.log(JSON.stringify({hookSpecificOutput: {hookEventName: 'AfterTool', additionalContext: 'Security scan: File content appears safe'}}));", + ); + + const command = `node "${scriptPath}"`; + rig.setup('should add additional context from AfterTool hooks', { + settings: { + hooksConfig: { + enabled: true, + }, + hooks: { + AfterTool: [ + { + matcher: 'read_file', + sequential: true, + hooks: [ + { + type: 'command', + command: normalizePath(command), + timeout: 5000, + }, + ], + }, + ], + }, + }, + }); + + // Create a test file to read + rig.createFile('test-file.txt', 'This is test content'); + + await rig.run({ + args: 'Read the contents of test-file.txt and tell me what it contains', + }); + + // Should find read_file tool call + const foundReadFile = await rig.waitForToolCall('read_file'); + expect(foundReadFile).toBeTruthy(); + + // Should generate hook telemetry + const hookTelemetryFound = rig.readHookLogs(); + expect(hookTelemetryFound.length).toBeGreaterThan(0); + expect(hookTelemetryFound[0].hookCall.hook_event_name).toBe( + 'AfterTool', + ); + expect(hookTelemetryFound[0].hookCall.hook_name).toBe( + 
normalizePath(command), + ); + expect(hookTelemetryFound[0].hookCall.hook_input).toBeDefined(); + expect(hookTelemetryFound[0].hookCall.hook_output).toBeDefined(); + expect(hookTelemetryFound[0].hookCall.exit_code).toBe(0); + expect(hookTelemetryFound[0].hookCall.stdout).toBeDefined(); + expect(hookTelemetryFound[0].hookCall.stderr).toBeDefined(); }); - const hookScript = `const fs = require('fs'); + }); + + describe('Command Hooks - Tail Tool Calls', () => { + it('should execute a tail tool call from AfterTool hooks and replace original response', async () => { + // Create a script that acts as the hook. + // It will trigger on "read_file" and issue a tail call to "write_file". + rig.setup('should execute a tail tool call from AfterTool hooks', { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.tail-tool-call.responses', + ), + }); + + const hookOutput = { + decision: 'allow', + hookSpecificOutput: { + hookEventName: 'AfterTool', + tailToolCallRequest: { + name: 'write_file', + args: { + file_path: 'tail-called-file.txt', + content: 'Content from tail call', + }, + }, + }, + }; + + const hookScript = `console.log(JSON.stringify(${JSON.stringify( + hookOutput, + )})); process.exit(0);`; + + const scriptPath = join(rig.testDir!, 'tail_call_hook.js'); + writeFileSync(scriptPath, hookScript); + const commandPath = scriptPath.replace(/\\/g, '/'); + + rig.setup('should execute a tail tool call from AfterTool hooks', { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.tail-tool-call.responses', + ), + settings: { + hooksConfig: { + enabled: true, + }, + hooks: { + AfterTool: [ + { + matcher: 'read_file', + hooks: [ + { + type: 'command', + command: `node "${commandPath}"`, + timeout: 5000, + }, + ], + }, + ], + }, + }, + }); + + // Create a test file to trigger the read_file tool + rig.createFile('original.txt', 'Original content'); + + const cliOutput = await rig.run({ + args: 'Read original.txt', // Fake responses should trigger 
read_file on this + }); + + // 1. Verify that write_file was called (as a tail call replacing read_file) + // Since read_file was replaced before finalizing, it will not appear in the tool logs. + const foundWriteFile = await rig.waitForToolCall('write_file'); + expect(foundWriteFile).toBeTruthy(); + + // Ensure hook logs are flushed and the final LLM response is received. + // The mock LLM is configured to respond with "Tail call completed successfully." + expect(cliOutput).toContain('Tail call completed successfully.'); + + // Ensure telemetry is written to disk + await rig.waitForTelemetryReady(); + + // Read hook logs to debug + const hookLogs = rig.readHookLogs(); + const relevantHookLog = hookLogs.find( + (l) => l.hookCall.hook_event_name === 'AfterTool', + ); + + expect(relevantHookLog).toBeDefined(); + + // 2. Verify write_file was executed. + // In non-interactive mode, the CLI deduplicates tool execution logs by callId. + // Since a tail call reuses the original callId, "Tool: write_file" is not printed. + // Instead, we verify the side-effect (file creation) and the telemetry log. + + // 3. Verify the tail-called tool actually wrote the file + const modifiedContent = rig.readFile('tail-called-file.txt'); + expect(modifiedContent).toBe('Content from tail call'); + + // 4. Verify telemetry for the final tool call. + // The original 'read_file' call is replaced, so only 'write_file' is finalized and logged. + const toolLogs = rig.readToolLogs(); + const successfulTools = toolLogs.filter((t) => t.toolRequest.success); + expect( + successfulTools.some((t) => t.toolRequest.name === 'write_file'), + ).toBeTruthy(); + // The original request name should be preserved in the log payload if possible, + // but the executed tool name is 'write_file'. 
+ }); + }); + + describe('BeforeModel Hooks - LLM Request Modification', () => { + it('should modify LLM requests with BeforeModel hooks', async () => { + // Create a hook script that replaces the LLM request with a modified version + // Note: Providing messages in the hook output REPLACES the entire conversation + rig.setup('should modify LLM requests with BeforeModel hooks', { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.before-model.responses', + ), + }); + const hookScript = `const fs = require('fs'); console.log(JSON.stringify({ decision: "allow", hookSpecificOutput: { @@ -420,166 +427,169 @@ console.log(JSON.stringify({ } }));`; - const scriptPath = rig.createScript('before_model_hook.cjs', hookScript); + const scriptPath = rig.createScript( + 'before_model_hook.cjs', + hookScript, + ); - rig.setup('should modify LLM requests with BeforeModel hooks', { - settings: { - hooksConfig: { - enabled: true, + rig.setup('should modify LLM requests with BeforeModel hooks', { + settings: { + hooksConfig: { + enabled: true, + }, + hooks: { + BeforeModel: [ + { + sequential: true, + hooks: [ + { + type: 'command', + command: normalizePath(`node "${scriptPath}"`), + timeout: 5000, + }, + ], + }, + ], + }, }, - hooks: { - BeforeModel: [ - { - sequential: true, - hooks: [ - { - type: 'command', - command: normalizePath(`node "${scriptPath}"`), - timeout: 5000, - }, - ], - }, - ], - }, - }, + }); + + const result = await rig.run({ args: 'Tell me a story' }); + + // The hook should have replaced the request entirely + // Verify that the model responded to the modified request, not the original + expect(result).toBeDefined(); + expect(result.length).toBeGreaterThan(0); + // The response should contain the expected text from the modified request + expect(result.toLowerCase()).toContain('security hook modified'); + + // Should generate hook telemetry + + // Should generate hook telemetry + const hookTelemetryFound = rig.readHookLogs(); + 
expect(hookTelemetryFound.length).toBeGreaterThan(0); + expect(hookTelemetryFound[0].hookCall.hook_event_name).toBe( + 'BeforeModel', + ); + expect(hookTelemetryFound[0].hookCall.hook_name).toBe( + `node "${scriptPath}"`, + ); + expect(hookTelemetryFound[0].hookCall.hook_input).toBeDefined(); + expect(hookTelemetryFound[0].hookCall.hook_output).toBeDefined(); + expect(hookTelemetryFound[0].hookCall.exit_code).toBe(0); + expect(hookTelemetryFound[0].hookCall.stdout).toBeDefined(); + expect(hookTelemetryFound[0].hookCall.stderr).toBeDefined(); }); - const result = await rig.run({ args: 'Tell me a story' }); - - // The hook should have replaced the request entirely - // Verify that the model responded to the modified request, not the original - expect(result).toBeDefined(); - expect(result.length).toBeGreaterThan(0); - // The response should contain the expected text from the modified request - expect(result.toLowerCase()).toContain('security hook modified'); - - // Should generate hook telemetry - - // Should generate hook telemetry - const hookTelemetryFound = rig.readHookLogs(); - expect(hookTelemetryFound.length).toBeGreaterThan(0); - expect(hookTelemetryFound[0].hookCall.hook_event_name).toBe( - 'BeforeModel', - ); - expect(hookTelemetryFound[0].hookCall.hook_name).toBe( - `node "${scriptPath}"`, - ); - expect(hookTelemetryFound[0].hookCall.hook_input).toBeDefined(); - expect(hookTelemetryFound[0].hookCall.hook_output).toBeDefined(); - expect(hookTelemetryFound[0].hookCall.exit_code).toBe(0); - expect(hookTelemetryFound[0].hookCall.stdout).toBeDefined(); - expect(hookTelemetryFound[0].hookCall.stderr).toBeDefined(); - }); - - it('should block model execution when BeforeModel hook returns deny decision', async () => { - rig.setup( - 'should block model execution when BeforeModel hook returns deny decision', - ); - const hookScript = `console.log(JSON.stringify({ + it('should block model execution when BeforeModel hook returns deny decision', async () => { + 
rig.setup( + 'should block model execution when BeforeModel hook returns deny decision', + ); + const hookScript = `console.log(JSON.stringify({ decision: "deny", reason: "Model execution blocked by security policy" }));`; - const scriptPath = rig.createScript( - 'before_model_deny_hook.cjs', - hookScript, - ); + const scriptPath = rig.createScript( + 'before_model_deny_hook.cjs', + hookScript, + ); - rig.setup( - 'should block model execution when BeforeModel hook returns deny decision', - { - settings: { - hooksConfig: { - enabled: true, - }, - hooks: { - BeforeModel: [ - { - sequential: true, - hooks: [ - { - type: 'command', - command: normalizePath(`node "${scriptPath}"`), - timeout: 5000, - }, - ], - }, - ], + rig.setup( + 'should block model execution when BeforeModel hook returns deny decision', + { + settings: { + hooksConfig: { + enabled: true, + }, + hooks: { + BeforeModel: [ + { + sequential: true, + hooks: [ + { + type: 'command', + command: normalizePath(`node "${scriptPath}"`), + timeout: 5000, + }, + ], + }, + ], + }, }, }, - }, - ); + ); - const result = await rig.run({ args: 'Hello' }); + const result = await rig.run({ args: 'Hello' }); - // The hook should have blocked the request - expect(result).toContain('Model execution blocked by security policy'); + // The hook should have blocked the request + expect(result).toContain('Model execution blocked by security policy'); - // Verify no API requests were made to the LLM - const apiRequests = rig.readAllApiRequest(); - expect(apiRequests).toHaveLength(0); - }); + // Verify no API requests were made to the LLM + const apiRequests = rig.readAllApiRequest(); + expect(apiRequests).toHaveLength(0); + }); - it('should block model execution when BeforeModel hook returns block decision', async () => { - rig.setup( - 'should block model execution when BeforeModel hook returns block decision', - ); - const hookScript = `console.log(JSON.stringify({ + it('should block model execution when BeforeModel hook 
returns block decision', async () => { + rig.setup( + 'should block model execution when BeforeModel hook returns block decision', + ); + const hookScript = `console.log(JSON.stringify({ decision: "block", reason: "Model execution blocked by security policy" }));`; - const scriptPath = rig.createScript( - 'before_model_block_hook.cjs', - hookScript, - ); + const scriptPath = rig.createScript( + 'before_model_block_hook.cjs', + hookScript, + ); - rig.setup( - 'should block model execution when BeforeModel hook returns block decision', - { - settings: { - hooksConfig: { - enabled: true, - }, - hooks: { - BeforeModel: [ - { - sequential: true, - hooks: [ - { - type: 'command', - command: normalizePath(`node "${scriptPath}"`), - timeout: 5000, - }, - ], - }, - ], + rig.setup( + 'should block model execution when BeforeModel hook returns block decision', + { + settings: { + hooksConfig: { + enabled: true, + }, + hooks: { + BeforeModel: [ + { + sequential: true, + hooks: [ + { + type: 'command', + command: normalizePath(`node "${scriptPath}"`), + timeout: 5000, + }, + ], + }, + ], + }, }, }, - }, - ); + ); - const result = await rig.run({ args: 'Hello' }); + const result = await rig.run({ args: 'Hello' }); - // The hook should have blocked the request - expect(result).toContain('Model execution blocked by security policy'); + // The hook should have blocked the request + expect(result).toContain('Model execution blocked by security policy'); - // Verify no API requests were made to the LLM - const apiRequests = rig.readAllApiRequest(); - expect(apiRequests).toHaveLength(0); + // Verify no API requests were made to the LLM + const apiRequests = rig.readAllApiRequest(); + expect(apiRequests).toHaveLength(0); + }); }); - }); - describe('AfterModel Hooks - LLM Response Modification', () => { - it.skipIf(process.platform === 'win32')( - 'should modify LLM responses with AfterModel hooks', - async () => { - rig.setup('should modify LLM responses with AfterModel hooks', { - 
fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.after-model.responses', - ), - }); - // Create a hook script that modifies the LLM response - const hookScript = `const fs = require('fs'); + describe('AfterModel Hooks - LLM Response Modification', () => { + it.skipIf(process.platform === 'win32')( + 'should modify LLM responses with AfterModel hooks', + async () => { + rig.setup('should modify LLM responses with AfterModel hooks', { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.after-model.responses', + ), + }); + // Create a hook script that modifies the LLM response + const hookScript = `const fs = require('fs'); console.log(JSON.stringify({ hookSpecificOutput: { hookEventName: "AfterModel", @@ -599,15 +609,148 @@ console.log(JSON.stringify({ } }));`; - const scriptPath = rig.createScript('after_model_hook.cjs', hookScript); + const scriptPath = rig.createScript( + 'after_model_hook.cjs', + hookScript, + ); - rig.setup('should modify LLM responses with AfterModel hooks', { + rig.setup('should modify LLM responses with AfterModel hooks', { + settings: { + hooksConfig: { + enabled: true, + }, + hooks: { + AfterModel: [ + { + hooks: [ + { + type: 'command', + command: normalizePath(`node "${scriptPath}"`), + timeout: 5000, + }, + ], + }, + ], + }, + }, + }); + + const result = await rig.run({ args: 'What is 2 + 2?' }); + + // The hook should have replaced the model response + expect(result).toContain( + '[FILTERED] Response has been filtered for security compliance', + ); + + // Should generate hook telemetry + const hookTelemetryFound = + await rig.waitForTelemetryEvent('hook_call'); + expect(hookTelemetryFound).toBeTruthy(); + }, + ); + }); + + describe('BeforeToolSelection Hooks - Tool Configuration', () => { + it('should modify tool selection with BeforeToolSelection hooks', async () => { + // 1. 
Initial setup to establish test directory + rig.setup('BeforeToolSelection Hooks'); + + const toolConfigJson = JSON.stringify({ + decision: 'allow', + hookSpecificOutput: { + hookEventName: 'BeforeToolSelection', + toolConfig: { + mode: 'ANY', + allowedFunctionNames: ['read_file'], + }, + }, + }); + + // Use file-based hook to avoid quoting issues + const hookScript = `console.log(JSON.stringify(${toolConfigJson}));`; + const hookFilename = 'before_tool_selection_hook.js'; + const scriptPath = rig.createScript(hookFilename, hookScript); + + // 2. Final setup with script path + rig.setup('BeforeToolSelection Hooks', { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.before-tool-selection.responses', + ), + settings: { + debugMode: true, + hooksConfig: { + enabled: true, + }, + hooks: { + BeforeToolSelection: [ + { + hooks: [ + { + type: 'command', + command: normalizePath(`node "${scriptPath}"`), + timeout: 60000, + }, + ], + }, + ], + }, + }, + }); + + // Create a test file + rig.createFile('new_file_data.txt', 'test data'); + + await rig.run({ + args: 'Check the content of new_file_data.txt', + }); + + // Verify the hook was called for BeforeToolSelection event + const hookLogs = rig.readHookLogs(); + const beforeToolSelectionHook = hookLogs.find( + (log) => log.hookCall.hook_event_name === 'BeforeToolSelection', + ); + expect(beforeToolSelectionHook).toBeDefined(); + expect(beforeToolSelectionHook?.hookCall.success).toBe(true); + + // Verify hook telemetry shows it modified the config + expect( + JSON.stringify(beforeToolSelectionHook?.hookCall.hook_output), + ).toContain('read_file'); + }); + }); + + describe('BeforeAgent Hooks - Prompt Augmentation', () => { + it('should augment prompts with BeforeAgent hooks', async () => { + // Create a hook script that adds context to the prompt + const hookScript = `const fs = require('fs'); +console.log(JSON.stringify({ + decision: "allow", + hookSpecificOutput: { + hookEventName: "BeforeAgent", + 
additionalContext: "SYSTEM INSTRUCTION: You are in a secure environment. Always mention security compliance in your responses." + } +}));`; + + rig.setup('should augment prompts with BeforeAgent hooks', { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.before-agent.responses', + ), + }); + + const scriptPath = rig.createScript( + 'before_agent_hook.cjs', + hookScript, + ); + + rig.setup('should augment prompts with BeforeAgent hooks', { settings: { hooksConfig: { enabled: true, }, hooks: { - AfterModel: [ + BeforeAgent: [ { hooks: [ { @@ -622,335 +765,210 @@ console.log(JSON.stringify({ }, }); - const result = await rig.run({ args: 'What is 2 + 2?' }); + const result = await rig.run({ args: 'Hello, how are you?' }); - // The hook should have replaced the model response - expect(result).toContain( - '[FILTERED] Response has been filtered for security compliance', - ); + // The hook should have added security context, which should influence the response + expect(result).toContain('security'); // Should generate hook telemetry const hookTelemetryFound = await rig.waitForTelemetryEvent('hook_call'); expect(hookTelemetryFound).toBeTruthy(); - }, - ); - }); - - describe('BeforeToolSelection Hooks - Tool Configuration', () => { - it('should modify tool selection with BeforeToolSelection hooks', async () => { - // 1. Initial setup to establish test directory - rig.setup('BeforeToolSelection Hooks'); - - const toolConfigJson = JSON.stringify({ - decision: 'allow', - hookSpecificOutput: { - hookEventName: 'BeforeToolSelection', - toolConfig: { - mode: 'ANY', - allowedFunctionNames: ['read_file'], - }, - }, }); - - // Use file-based hook to avoid quoting issues - const hookScript = `console.log(JSON.stringify(${toolConfigJson}));`; - const hookFilename = 'before_tool_selection_hook.js'; - const scriptPath = rig.createScript(hookFilename, hookScript); - - // 2. 
Final setup with script path - rig.setup('BeforeToolSelection Hooks', { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.before-tool-selection.responses', - ), - settings: { - debugMode: true, - hooksConfig: { - enabled: true, - }, - hooks: { - BeforeToolSelection: [ - { - hooks: [ - { - type: 'command', - command: normalizePath(`node "${scriptPath}"`), - timeout: 60000, - }, - ], - }, - ], - }, - }, - }); - - // Create a test file - rig.createFile('new_file_data.txt', 'test data'); - - await rig.run({ - args: 'Check the content of new_file_data.txt', - }); - - // Verify the hook was called for BeforeToolSelection event - const hookLogs = rig.readHookLogs(); - const beforeToolSelectionHook = hookLogs.find( - (log) => log.hookCall.hook_event_name === 'BeforeToolSelection', - ); - expect(beforeToolSelectionHook).toBeDefined(); - expect(beforeToolSelectionHook?.hookCall.success).toBe(true); - - // Verify hook telemetry shows it modified the config - expect( - JSON.stringify(beforeToolSelectionHook?.hookCall.hook_output), - ).toContain('read_file'); }); - }); - describe('BeforeAgent Hooks - Prompt Augmentation', () => { - it('should augment prompts with BeforeAgent hooks', async () => { - // Create a hook script that adds context to the prompt - const hookScript = `const fs = require('fs'); -console.log(JSON.stringify({ - decision: "allow", - hookSpecificOutput: { - hookEventName: "BeforeAgent", - additionalContext: "SYSTEM INSTRUCTION: You are in a secure environment. Always mention security compliance in your responses." 
- } -}));`; + describe('Notification Hooks - Permission Handling', () => { + it('should handle notification hooks for tool permissions', async () => { + rig.setup('should handle notification hooks for tool permissions', { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.notification.responses', + ), + }); - rig.setup('should augment prompts with BeforeAgent hooks', { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.before-agent.responses', - ), - }); - - const scriptPath = rig.createScript('before_agent_hook.cjs', hookScript); - - rig.setup('should augment prompts with BeforeAgent hooks', { - settings: { - hooksConfig: { - enabled: true, - }, - hooks: { - BeforeAgent: [ - { - hooks: [ - { - type: 'command', - command: normalizePath(`node "${scriptPath}"`), - timeout: 5000, - }, - ], - }, - ], - }, - }, - }); - - const result = await rig.run({ args: 'Hello, how are you?' }); - - // The hook should have added security context, which should influence the response - expect(result).toContain('security'); - - // Should generate hook telemetry - const hookTelemetryFound = await rig.waitForTelemetryEvent('hook_call'); - expect(hookTelemetryFound).toBeTruthy(); - }); - }); - - describe('Notification Hooks - Permission Handling', () => { - it('should handle notification hooks for tool permissions', async () => { - rig.setup('should handle notification hooks for tool permissions', { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.notification.responses', - ), - }); - - // Create script file for hook - const scriptPath = rig.createScript( - 'notification_hook.cjs', - "console.log(JSON.stringify({suppressOutput: false, systemMessage: 'Permission request logged by security hook'}));", - ); - - const hookCommand = `node "${scriptPath}"`; - - rig.setup('should handle notification hooks for tool permissions', { - settings: { - // Configure tools to enable hooks and require confirmation to trigger notifications - tools: { - 
approval: 'ASK', // Disable YOLO mode to show permission prompts - confirmationRequired: ['run_shell_command'], - }, - hooksConfig: { - enabled: true, - }, - hooks: { - Notification: [ - { - matcher: 'ToolPermission', - sequential: true, - hooks: [ - { - type: 'command', - command: normalizePath(hookCommand), - timeout: 5000, - }, - ], - }, - ], - }, - }, - }); - - const run = await rig.runInteractive({ approvalMode: 'default' }); - - // Send prompt that will trigger a permission request - await run.type('Run the command "echo test"'); - await run.type('\r'); - - // Wait for permission prompt to appear - await run.expectText('Allow', 10000); - - // Approve the permission - await run.type('y'); - await run.type('\r'); - - // Wait for command to execute - await run.expectText('test', 10000); - - // Should find the shell command execution - const foundShellCommand = await rig.waitForToolCall('run_shell_command'); - expect(foundShellCommand).toBeTruthy(); - - // Verify Notification hook executed - const hookLogs = rig.readHookLogs(); - const notificationLog = hookLogs.find( - (log) => - log.hookCall.hook_event_name === 'Notification' && - log.hookCall.hook_name === normalizePath(hookCommand), - ); - - expect(notificationLog).toBeDefined(); - if (notificationLog) { - expect(notificationLog.hookCall.exit_code).toBe(0); - expect(notificationLog.hookCall.stdout).toContain( - 'Permission request logged by security hook', + // Create script file for hook + const scriptPath = rig.createScript( + 'notification_hook.cjs', + "console.log(JSON.stringify({suppressOutput: false, systemMessage: 'Permission request logged by security hook'}));", ); - // Verify hook input contains notification details - const hookInputStr = - typeof notificationLog.hookCall.hook_input === 'string' - ? 
notificationLog.hookCall.hook_input - : JSON.stringify(notificationLog.hookCall.hook_input); - const hookInput = JSON.parse(hookInputStr) as Record<string, unknown>; + const hookCommand = `node "${scriptPath}"`; - // Should have notification type (uses snake_case) - expect(hookInput['notification_type']).toBe('ToolPermission'); - - // Should have message - expect(hookInput['message']).toBeDefined(); - - // Should have details with tool info - expect(hookInput['details']).toBeDefined(); - const details = hookInput['details'] as Record<string, unknown>; - // For 'exec' type confirmations, details contains: type, title, command, rootCommand - expect(details['type']).toBe('exec'); - expect(details['command']).toBeDefined(); - expect(details['title']).toBeDefined(); - } - }); - }); - - describe('Sequential Hook Execution', () => { - it('should execute hooks sequentially when configured', async () => { - rig.setup('should execute hooks sequentially when configured', { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.sequential-execution.responses', - ), - }); - - // Create script files for hooks - const hook1Path = rig.createScript( - 'seq_hook1.cjs', - "console.log(JSON.stringify({decision: 'allow', hookSpecificOutput: {hookEventName: 'BeforeAgent', additionalContext: 'Step 1: Initial validation passed.'}}));", - ); - const hook2Path = rig.createScript( - 'seq_hook2.cjs', - "console.log(JSON.stringify({decision: 'allow', hookSpecificOutput: {hookEventName: 'BeforeAgent', additionalContext: 'Step 2: Security check completed.'}}));", - ); - - const hook1Command = `node "${hook1Path}"`; - const hook2Command = `node "${hook2Path}"`; - - rig.setup('should execute hooks sequentially when configured', { - settings: { - hooksConfig: { - enabled: true, + rig.setup('should handle notification hooks for tool permissions', { + settings: { + // Configure tools to enable hooks and require confirmation to trigger notifications + tools: { + approval: 'ASK', // Disable YOLO mode to show permission 
prompts + confirmationRequired: ['run_shell_command'], + }, + hooksConfig: { + enabled: true, + }, + hooks: { + Notification: [ + { + matcher: 'ToolPermission', + sequential: true, + hooks: [ + { + type: 'command', + command: normalizePath(hookCommand), + timeout: 5000, + }, + ], + }, + ], + }, }, - hooks: { - BeforeAgent: [ - { - sequential: true, - hooks: [ - { - type: 'command', - command: normalizePath(hook1Command), - timeout: 5000, - }, - { - type: 'command', - command: normalizePath(hook2Command), - timeout: 5000, - }, - ], - }, - ], - }, - }, + }); + + const run = await rig.runInteractive({ approvalMode: 'default' }); + + // Send prompt that will trigger a permission request + await run.type('Run the command "echo test"'); + await run.type('\r'); + + // Wait for permission prompt to appear + await run.expectText('Allow', 10000); + + // Approve the permission + await run.type('y'); + await run.type('\r'); + + // Wait for command to execute + await run.expectText('test', 10000); + + // Should find the shell command execution + const foundShellCommand = + await rig.waitForToolCall('run_shell_command'); + expect(foundShellCommand).toBeTruthy(); + + // Verify Notification hook executed + const hookLogs = rig.readHookLogs(); + const notificationLog = hookLogs.find( + (log) => + log.hookCall.hook_event_name === 'Notification' && + log.hookCall.hook_name === normalizePath(hookCommand), + ); + + expect(notificationLog).toBeDefined(); + if (notificationLog) { + expect(notificationLog.hookCall.exit_code).toBe(0); + expect(notificationLog.hookCall.stdout).toContain( + 'Permission request logged by security hook', + ); + + // Verify hook input contains notification details + const hookInputStr = + typeof notificationLog.hookCall.hook_input === 'string' + ? 
notificationLog.hookCall.hook_input + : JSON.stringify(notificationLog.hookCall.hook_input); + const hookInput = JSON.parse(hookInputStr) as Record<string, unknown>; + + // Should have notification type (uses snake_case) + expect(hookInput['notification_type']).toBe('ToolPermission'); + + // Should have message + expect(hookInput['message']).toBeDefined(); + + // Should have details with tool info + expect(hookInput['details']).toBeDefined(); + const details = hookInput['details'] as Record<string, unknown>; + // For 'exec' type confirmations, details contains: type, title, command, rootCommand + expect(details['type']).toBe('exec'); + expect(details['command']).toBeDefined(); + expect(details['title']).toBeDefined(); + } }); - - await rig.run({ args: 'Hello, please help me with a task' }); - - // Should generate hook telemetry - const hookTelemetryFound = await rig.waitForTelemetryEvent('hook_call'); - expect(hookTelemetryFound).toBeTruthy(); - - // Verify both hooks executed - const hookLogs = rig.readHookLogs(); - const hook1Log = hookLogs.find( - (log) => log.hookCall.hook_name === normalizePath(hook1Command), - ); - const hook2Log = hookLogs.find( - (log) => log.hookCall.hook_name === normalizePath(hook2Command), - ); - - expect(hook1Log).toBeDefined(); - expect(hook1Log?.hookCall.exit_code).toBe(0); - expect(hook1Log?.hookCall.stdout).toContain( - 'Step 1: Initial validation passed', - ); - - expect(hook2Log).toBeDefined(); - expect(hook2Log?.hookCall.exit_code).toBe(0); - expect(hook2Log?.hookCall.stdout).toContain( - 'Step 2: Security check completed', - ); }); - }); - describe('Hook Input/Output Validation', () => { - it('should provide correct input format to hooks', async () => { - rig.setup('should provide correct input format to hooks', { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.input-validation.responses', - ), + describe('Sequential Hook Execution', () => { + it('should execute hooks sequentially when configured', async () => { + rig.setup('should execute 
hooks sequentially when configured', { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.sequential-execution.responses', + ), + }); + + // Create script files for hooks + const hook1Path = rig.createScript( + 'seq_hook1.cjs', + "console.log(JSON.stringify({decision: 'allow', hookSpecificOutput: {hookEventName: 'BeforeAgent', additionalContext: 'Step 1: Initial validation passed.'}}));", + ); + const hook2Path = rig.createScript( + 'seq_hook2.cjs', + "console.log(JSON.stringify({decision: 'allow', hookSpecificOutput: {hookEventName: 'BeforeAgent', additionalContext: 'Step 2: Security check completed.'}}));", + ); + + const hook1Command = `node "${hook1Path}"`; + const hook2Command = `node "${hook2Path}"`; + + rig.setup('should execute hooks sequentially when configured', { + settings: { + hooksConfig: { + enabled: true, + }, + hooks: { + BeforeAgent: [ + { + sequential: true, + hooks: [ + { + type: 'command', + command: normalizePath(hook1Command), + timeout: 5000, + }, + { + type: 'command', + command: normalizePath(hook2Command), + timeout: 5000, + }, + ], + }, + ], + }, + }, + }); + + await rig.run({ args: 'Hello, please help me with a task' }); + + // Should generate hook telemetry + const hookTelemetryFound = await rig.waitForTelemetryEvent('hook_call'); + expect(hookTelemetryFound).toBeTruthy(); + + // Verify both hooks executed + const hookLogs = rig.readHookLogs(); + const hook1Log = hookLogs.find( + (log) => log.hookCall.hook_name === normalizePath(hook1Command), + ); + const hook2Log = hookLogs.find( + (log) => log.hookCall.hook_name === normalizePath(hook2Command), + ); + + expect(hook1Log).toBeDefined(); + expect(hook1Log?.hookCall.exit_code).toBe(0); + expect(hook1Log?.hookCall.stdout).toContain( + 'Step 1: Initial validation passed', + ); + + expect(hook2Log).toBeDefined(); + expect(hook2Log?.hookCall.exit_code).toBe(0); + expect(hook2Log?.hookCall.stdout).toContain( + 'Step 2: Security check completed', + ); }); - // Create a hook 
script that validates the input format - const hookScript = `const fs = require('fs'); + }); + + describe('Hook Input/Output Validation', () => { + it('should provide correct input format to hooks', async () => { + rig.setup('should provide correct input format to hooks', { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.input-validation.responses', + ), + }); + // Create a hook script that validates the input format + const hookScript = `const fs = require('fs'); const input = fs.readFileSync(0, 'utf-8'); try { const json = JSON.parse(input); @@ -964,69 +982,12 @@ try { console.log(JSON.stringify({decision: "block", reason: "Invalid JSON"})); }`; - const scriptPath = rig.createScript( - 'input_validation_hook.cjs', - hookScript, - ); + const scriptPath = rig.createScript( + 'input_validation_hook.cjs', + hookScript, + ); - rig.setup('should provide correct input format to hooks', { - settings: { - hooksConfig: { - enabled: true, - }, - hooks: { - BeforeTool: [ - { - hooks: [ - { - type: 'command', - command: normalizePath(`node "${scriptPath}"`), - timeout: 5000, - }, - ], - }, - ], - }, - }, - }); - - await rig.run({ - args: 'Create a file called input-test.txt with content "test"', - }); - - // Hook should validate input format successfully - const foundWriteFile = await rig.waitForToolCall('write_file'); - expect(foundWriteFile).toBeTruthy(); - - // Check that the file was created (hook allowed it) - const fileContent = rig.readFile('input-test.txt'); - expect(fileContent).toContain('test'); - - // Should generate hook telemetry - const hookTelemetryFound = await rig.waitForTelemetryEvent('hook_call'); - expect(hookTelemetryFound).toBeTruthy(); - }); - - it('should treat mixed stdout (text + JSON) as system message and allow execution when exit code is 0', async () => { - rig.setup( - 'should treat mixed stdout (text + JSON) as system message and allow execution when exit code is 0', - { - fakeResponsesPath: join( - import.meta.dirname, - 
'hooks-system.allow-tool.responses', - ), - }, - ); - - // Create script file for hook - const scriptPath = rig.createScript( - 'pollution_hook.cjs', - "console.log('Pollution'); console.log(JSON.stringify({decision: 'deny', reason: 'Should be ignored'}));", - ); - - rig.setup( - 'should treat mixed stdout (text + JSON) as system message and allow execution when exit code is 0', - { + rig.setup('should provide correct input format to hooks', { settings: { hooksConfig: { enabled: true, @@ -1034,13 +995,9 @@ try { hooks: { BeforeTool: [ { - matcher: 'write_file', - sequential: true, hooks: [ { type: 'command', - // Output plain text then JSON. - // This breaks JSON parsing, so it falls back to 'allow' with the whole stdout as systemMessage. command: normalizePath(`node "${scriptPath}"`), timeout: 5000, }, @@ -1049,341 +1006,402 @@ try { ], }, }, - }, - ); + }); - const result = await rig.run({ - args: 'Create a file called approved.txt with content "Approved content"', + await rig.run({ + args: 'Create a file called input-test.txt with content "test"', + }); + + // Hook should validate input format successfully + const foundWriteFile = await rig.waitForToolCall('write_file'); + expect(foundWriteFile).toBeTruthy(); + + // Check that the file was created (hook allowed it) + const fileContent = rig.readFile('input-test.txt'); + expect(fileContent).toContain('test'); + + // Should generate hook telemetry + const hookTelemetryFound = await rig.waitForTelemetryEvent('hook_call'); + expect(hookTelemetryFound).toBeTruthy(); }); - // The hook logic fails to parse JSON, so it allows the tool. 
- const foundWriteFile = await rig.waitForToolCall('write_file'); - expect(foundWriteFile).toBeTruthy(); - - // The entire stdout (including the JSON part) becomes the systemMessage - expect(result).toContain('Pollution'); - expect(result).toContain('Should be ignored'); - }); - }); - - describe('Multiple Event Types', () => { - it('should handle hooks for all major event types', async () => { - rig.setup('should handle hooks for all major event types', { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.multiple-events.responses', - ), - }); - - // Create script files for hooks - const btPath = rig.createScript( - 'bt_hook.cjs', - "console.log(JSON.stringify({decision: 'allow', systemMessage: 'BeforeTool: File operation logged'}));", - ); - const atPath = rig.createScript( - 'at_hook.cjs', - "console.log(JSON.stringify({hookSpecificOutput: {hookEventName: 'AfterTool', additionalContext: 'AfterTool: Operation completed successfully'}}));", - ); - const baPath = rig.createScript( - 'ba_hook.cjs', - "console.log(JSON.stringify({decision: 'allow', hookSpecificOutput: {hookEventName: 'BeforeAgent', additionalContext: 'BeforeAgent: User request processed'}}));", - ); - - const beforeToolCommand = `node "${btPath}"`; - const afterToolCommand = `node "${atPath}"`; - const beforeAgentCommand = `node "${baPath}"`; - - rig.setup('should handle hooks for all major event types', { - settings: { - hooksConfig: { - enabled: true, + it('should treat mixed stdout (text + JSON) as system message and allow execution when exit code is 0', async () => { + rig.setup( + 'should treat mixed stdout (text + JSON) as system message and allow execution when exit code is 0', + { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.allow-tool.responses', + ), }, - hooks: { - BeforeAgent: [ - { - hooks: [ - { - type: 'command', - command: normalizePath(beforeAgentCommand), - timeout: 5000, - }, - ], - }, - ], - BeforeTool: [ - { - matcher: 'write_file', - 
sequential: true, - hooks: [ - { - type: 'command', - command: normalizePath(beforeToolCommand), - timeout: 5000, - }, - ], - }, - ], - AfterTool: [ - { - matcher: 'write_file', - sequential: true, - hooks: [ - { - type: 'command', - command: normalizePath(afterToolCommand), - timeout: 5000, - }, - ], - }, - ], - }, - }, - }); - - const result = await rig.run({ - args: - 'Create a file called multi-event-test.txt with content ' + - '"testing multiple events", and then please reply with ' + - 'everything I say just after this:"', - }); - - // Should execute write_file tool - const foundWriteFile = await rig.waitForToolCall('write_file'); - expect(foundWriteFile).toBeTruthy(); - - // File should be created - const fileContent = rig.readFile('multi-event-test.txt'); - expect(fileContent).toContain('testing multiple events'); - - // Result should contain context from all hooks - expect(result).toContain('BeforeTool: File operation logged'); - - // Should generate hook telemetry - const hookTelemetryFound = await rig.waitForTelemetryEvent('hook_call'); - expect(hookTelemetryFound).toBeTruthy(); - - // Verify all three hooks executed - const hookLogs = rig.readHookLogs(); - const beforeAgentLog = hookLogs.find( - (log) => log.hookCall.hook_name === normalizePath(beforeAgentCommand), - ); - const beforeToolLog = hookLogs.find( - (log) => log.hookCall.hook_name === normalizePath(beforeToolCommand), - ); - const afterToolLog = hookLogs.find( - (log) => log.hookCall.hook_name === normalizePath(afterToolCommand), - ); - - expect(beforeAgentLog).toBeDefined(); - expect(beforeAgentLog?.hookCall.exit_code).toBe(0); - expect(beforeAgentLog?.hookCall.stdout).toContain( - 'BeforeAgent: User request processed', - ); - - expect(beforeToolLog).toBeDefined(); - expect(beforeToolLog?.hookCall.exit_code).toBe(0); - expect(beforeToolLog?.hookCall.stdout).toContain( - 'BeforeTool: File operation logged', - ); - - expect(afterToolLog).toBeDefined(); - 
expect(afterToolLog?.hookCall.exit_code).toBe(0); - expect(afterToolLog?.hookCall.stdout).toContain( - 'AfterTool: Operation completed successfully', - ); - }); - }); - - describe('Hook Error Handling', () => { - it('should handle hook failures gracefully', async () => { - rig.setup('should handle hook failures gracefully', { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.error-handling.responses', - ), - }); - // Create script files for hooks - const failingPath = join(rig.testDir!, 'fail_hook.cjs'); - writeFileSync(failingPath, 'process.exit(1);'); - const workingPath = join(rig.testDir!, 'work_hook.cjs'); - writeFileSync( - workingPath, - "console.log(JSON.stringify({decision: 'allow', reason: 'Working hook succeeded'}));", - ); - - // Failing hook: exits with non-zero code - const failingCommand = `node "${failingPath}"`; - // Working hook: returns success with JSON - const workingCommand = `node "${workingPath}"`; - - rig.setup('should handle hook failures gracefully', { - settings: { - hooksConfig: { - enabled: true, - }, - hooks: { - BeforeTool: [ - { - hooks: [ - { - type: 'command', - command: normalizePath(failingCommand), - timeout: 5000, - }, - { - type: 'command', - command: normalizePath(workingCommand), - timeout: 5000, - }, - ], - }, - ], - }, - }, - }); - - await rig.run({ - args: 'Create a file called error-test.txt with content "testing error handling"', - }); - - // Despite one hook failing, the working hook should still allow the operation - const foundWriteFile = await rig.waitForToolCall('write_file'); - expect(foundWriteFile).toBeTruthy(); - - // File should be created - const fileContent = rig.readFile('error-test.txt'); - expect(fileContent).toContain('testing error handling'); - - // Should generate hook telemetry - const hookTelemetryFound = await rig.waitForTelemetryEvent('hook_call'); - expect(hookTelemetryFound).toBeTruthy(); - }); - }); - - describe('Hook Telemetry and Observability', () => { - it('should generate 
telemetry events for hook executions', async () => { - rig.setup('should generate telemetry events for hook executions', { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.telemetry.responses', - ), - }); - - // Create script file for hook - const scriptPath = rig.createScript( - 'telemetry_hook.cjs', - "console.log(JSON.stringify({decision: 'allow', reason: 'Telemetry test hook'}));", - ); - - const hookCommand = `node "${scriptPath}"`; - - rig.setup('should generate telemetry events for hook executions', { - settings: { - hooksConfig: { - enabled: true, - }, - hooks: { - BeforeTool: [ - { - hooks: [ - { - type: 'command', - command: normalizePath(hookCommand), - timeout: 5000, - }, - ], - }, - ], - }, - }, - }); - - await rig.run({ args: 'Create a file called telemetry-test.txt' }); - - // Should execute the tool - const foundWriteFile = await rig.waitForToolCall('write_file'); - expect(foundWriteFile).toBeTruthy(); - - // Should generate hook telemetry - const hookTelemetryFound = await rig.waitForTelemetryEvent('hook_call'); - expect(hookTelemetryFound).toBeTruthy(); - }); - }); - - describe('Session Lifecycle Hooks', () => { - it('should fire SessionStart hook on app startup', async () => { - rig.setup('should fire SessionStart hook on app startup', { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.session-startup.responses', - ), - }); - - // Create script file for hook - const scriptPath = rig.createScript( - 'session_start_hook.cjs', - "console.log(JSON.stringify({decision: 'allow', systemMessage: 'Session starting on startup'}));", - ); - - const sessionStartCommand = `node "${scriptPath}"`; - - rig.setup('should fire SessionStart hook on app startup', { - settings: { - hooksConfig: { - enabled: true, - }, - hooks: { - SessionStart: [ - { - matcher: 'startup', - sequential: true, - hooks: [ - { - type: 'command', - command: normalizePath(sessionStartCommand), - timeout: 5000, - }, - ], - }, - ], - }, - }, - }); - - // Run 
a simple query - the SessionStart hook will fire during app initialization - await rig.run({ args: 'Say hello' }); - - // Verify hook executed with correct parameters - const hookLogs = rig.readHookLogs(); - const sessionStartLog = hookLogs.find( - (log) => log.hookCall.hook_event_name === 'SessionStart', - ); - - expect(sessionStartLog).toBeDefined(); - if (sessionStartLog) { - expect(sessionStartLog.hookCall.hook_name).toBe( - normalizePath(sessionStartCommand), ); - expect(sessionStartLog.hookCall.exit_code).toBe(0); - expect(sessionStartLog.hookCall.hook_input).toBeDefined(); - // hook_input is a string that needs to be parsed - const hookInputStr = - typeof sessionStartLog.hookCall.hook_input === 'string' - ? sessionStartLog.hookCall.hook_input - : JSON.stringify(sessionStartLog.hookCall.hook_input); - const hookInput = JSON.parse(hookInputStr) as Record<string, unknown>; - - expect(hookInput['source']).toBe('startup'); - expect(sessionStartLog.hookCall.stdout).toContain( - 'Session starting on startup', + // Create script file for hook + const scriptPath = rig.createScript( + 'pollution_hook.cjs', + "console.log('Pollution'); console.log(JSON.stringify({decision: 'deny', reason: 'Should be ignored'}));", ); - } + + rig.setup( + 'should treat mixed stdout (text + JSON) as system message and allow execution when exit code is 0', + { + settings: { + hooksConfig: { + enabled: true, + }, + hooks: { + BeforeTool: [ + { + matcher: 'write_file', + sequential: true, + hooks: [ + { + type: 'command', + // Output plain text then JSON. + // This breaks JSON parsing, so it falls back to 'allow' with the whole stdout as systemMessage. + command: normalizePath(`node "${scriptPath}"`), + timeout: 5000, + }, + ], + }, + ], + }, + }, + }, + ); + + const result = await rig.run({ + args: 'Create a file called approved.txt with content "Approved content"', + }); + + // The hook logic fails to parse JSON, so it allows the tool. 
+ const foundWriteFile = await rig.waitForToolCall('write_file'); + expect(foundWriteFile).toBeTruthy(); + + // The entire stdout (including the JSON part) becomes the systemMessage + expect(result).toContain('Pollution'); + expect(result).toContain('Should be ignored'); + }); }); - it('should fire SessionStart hook and inject context', async () => { - // Create hook script that outputs JSON with additionalContext - const hookScript = `const fs = require('fs'); + describe('Multiple Event Types', () => { + it('should handle hooks for all major event types', async () => { + rig.setup('should handle hooks for all major event types', { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.multiple-events.responses', + ), + }); + + // Create script files for hooks + const btPath = rig.createScript( + 'bt_hook.cjs', + "console.log(JSON.stringify({decision: 'allow', systemMessage: 'BeforeTool: File operation logged'}));", + ); + const atPath = rig.createScript( + 'at_hook.cjs', + "console.log(JSON.stringify({hookSpecificOutput: {hookEventName: 'AfterTool', additionalContext: 'AfterTool: Operation completed successfully'}}));", + ); + const baPath = rig.createScript( + 'ba_hook.cjs', + "console.log(JSON.stringify({decision: 'allow', hookSpecificOutput: {hookEventName: 'BeforeAgent', additionalContext: 'BeforeAgent: User request processed'}}));", + ); + + const beforeToolCommand = `node "${btPath}"`; + const afterToolCommand = `node "${atPath}"`; + const beforeAgentCommand = `node "${baPath}"`; + + rig.setup('should handle hooks for all major event types', { + settings: { + hooksConfig: { + enabled: true, + }, + hooks: { + BeforeAgent: [ + { + hooks: [ + { + type: 'command', + command: normalizePath(beforeAgentCommand), + timeout: 5000, + }, + ], + }, + ], + BeforeTool: [ + { + matcher: 'write_file', + sequential: true, + hooks: [ + { + type: 'command', + command: normalizePath(beforeToolCommand), + timeout: 5000, + }, + ], + }, + ], + AfterTool: [ + { + 
matcher: 'write_file', + sequential: true, + hooks: [ + { + type: 'command', + command: normalizePath(afterToolCommand), + timeout: 5000, + }, + ], + }, + ], + }, + }, + }); + + const result = await rig.run({ + args: + 'Create a file called multi-event-test.txt with content ' + + '"testing multiple events", and then please reply with ' + + 'everything I say just after this:"', + }); + + // Should execute write_file tool + const foundWriteFile = await rig.waitForToolCall('write_file'); + expect(foundWriteFile).toBeTruthy(); + + // File should be created + const fileContent = rig.readFile('multi-event-test.txt'); + expect(fileContent).toContain('testing multiple events'); + + // Result should contain context from all hooks + expect(result).toContain('BeforeTool: File operation logged'); + + // Should generate hook telemetry + const hookTelemetryFound = await rig.waitForTelemetryEvent('hook_call'); + expect(hookTelemetryFound).toBeTruthy(); + + // Verify all three hooks executed + const hookLogs = rig.readHookLogs(); + const beforeAgentLog = hookLogs.find( + (log) => log.hookCall.hook_name === normalizePath(beforeAgentCommand), + ); + const beforeToolLog = hookLogs.find( + (log) => log.hookCall.hook_name === normalizePath(beforeToolCommand), + ); + const afterToolLog = hookLogs.find( + (log) => log.hookCall.hook_name === normalizePath(afterToolCommand), + ); + + expect(beforeAgentLog).toBeDefined(); + expect(beforeAgentLog?.hookCall.exit_code).toBe(0); + expect(beforeAgentLog?.hookCall.stdout).toContain( + 'BeforeAgent: User request processed', + ); + + expect(beforeToolLog).toBeDefined(); + expect(beforeToolLog?.hookCall.exit_code).toBe(0); + expect(beforeToolLog?.hookCall.stdout).toContain( + 'BeforeTool: File operation logged', + ); + + expect(afterToolLog).toBeDefined(); + expect(afterToolLog?.hookCall.exit_code).toBe(0); + expect(afterToolLog?.hookCall.stdout).toContain( + 'AfterTool: Operation completed successfully', + ); + }); + }); + + describe('Hook Error 
Handling', () => { + it('should handle hook failures gracefully', async () => { + rig.setup('should handle hook failures gracefully', { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.error-handling.responses', + ), + }); + // Create script files for hooks + const failingPath = join(rig.testDir!, 'fail_hook.cjs'); + writeFileSync(failingPath, 'process.exit(1);'); + const workingPath = join(rig.testDir!, 'work_hook.cjs'); + writeFileSync( + workingPath, + "console.log(JSON.stringify({decision: 'allow', reason: 'Working hook succeeded'}));", + ); + + // Failing hook: exits with non-zero code + const failingCommand = `node "${failingPath}"`; + // Working hook: returns success with JSON + const workingCommand = `node "${workingPath}"`; + + rig.setup('should handle hook failures gracefully', { + settings: { + hooksConfig: { + enabled: true, + }, + hooks: { + BeforeTool: [ + { + hooks: [ + { + type: 'command', + command: normalizePath(failingCommand), + timeout: 5000, + }, + { + type: 'command', + command: normalizePath(workingCommand), + timeout: 5000, + }, + ], + }, + ], + }, + }, + }); + + await rig.run({ + args: 'Create a file called error-test.txt with content "testing error handling"', + }); + + // Despite one hook failing, the working hook should still allow the operation + const foundWriteFile = await rig.waitForToolCall('write_file'); + expect(foundWriteFile).toBeTruthy(); + + // File should be created + const fileContent = rig.readFile('error-test.txt'); + expect(fileContent).toContain('testing error handling'); + + // Should generate hook telemetry + const hookTelemetryFound = await rig.waitForTelemetryEvent('hook_call'); + expect(hookTelemetryFound).toBeTruthy(); + }); + }); + + describe('Hook Telemetry and Observability', () => { + it('should generate telemetry events for hook executions', async () => { + rig.setup('should generate telemetry events for hook executions', { + fakeResponsesPath: join( + import.meta.dirname, + 
'hooks-system.telemetry.responses', + ), + }); + + // Create script file for hook + const scriptPath = rig.createScript( + 'telemetry_hook.cjs', + "console.log(JSON.stringify({decision: 'allow', reason: 'Telemetry test hook'}));", + ); + + const hookCommand = `node "${scriptPath}"`; + + rig.setup('should generate telemetry events for hook executions', { + settings: { + hooksConfig: { + enabled: true, + }, + hooks: { + BeforeTool: [ + { + hooks: [ + { + type: 'command', + command: normalizePath(hookCommand), + timeout: 5000, + }, + ], + }, + ], + }, + }, + }); + + await rig.run({ args: 'Create a file called telemetry-test.txt' }); + + // Should execute the tool + const foundWriteFile = await rig.waitForToolCall('write_file'); + expect(foundWriteFile).toBeTruthy(); + + // Should generate hook telemetry + const hookTelemetryFound = await rig.waitForTelemetryEvent('hook_call'); + expect(hookTelemetryFound).toBeTruthy(); + }); + }); + + describe('Session Lifecycle Hooks', () => { + it('should fire SessionStart hook on app startup', async () => { + rig.setup('should fire SessionStart hook on app startup', { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.session-startup.responses', + ), + }); + + // Create script file for hook + const scriptPath = rig.createScript( + 'session_start_hook.cjs', + "console.log(JSON.stringify({decision: 'allow', systemMessage: 'Session starting on startup'}));", + ); + + const sessionStartCommand = `node "${scriptPath}"`; + + rig.setup('should fire SessionStart hook on app startup', { + settings: { + hooksConfig: { + enabled: true, + }, + hooks: { + SessionStart: [ + { + matcher: 'startup', + sequential: true, + hooks: [ + { + type: 'command', + command: normalizePath(sessionStartCommand), + timeout: 5000, + }, + ], + }, + ], + }, + }, + }); + + // Run a simple query - the SessionStart hook will fire during app initialization + await rig.run({ args: 'Say hello' }); + + // Verify hook executed with correct parameters + const 
hookLogs = rig.readHookLogs(); + const sessionStartLog = hookLogs.find( + (log) => log.hookCall.hook_event_name === 'SessionStart', + ); + + expect(sessionStartLog).toBeDefined(); + if (sessionStartLog) { + expect(sessionStartLog.hookCall.hook_name).toBe( + normalizePath(sessionStartCommand), + ); + expect(sessionStartLog.hookCall.exit_code).toBe(0); + expect(sessionStartLog.hookCall.hook_input).toBeDefined(); + + // hook_input is a string that needs to be parsed + const hookInputStr = + typeof sessionStartLog.hookCall.hook_input === 'string' + ? sessionStartLog.hookCall.hook_input + : JSON.stringify(sessionStartLog.hookCall.hook_input); + const hookInput = JSON.parse(hookInputStr) as Record; + + expect(hookInput['source']).toBe('startup'); + expect(sessionStartLog.hookCall.stdout).toContain( + 'Session starting on startup', + ); + } + }); + + it('should fire SessionStart hook and inject context', async () => { + // Create hook script that outputs JSON with additionalContext + const hookScript = `const fs = require('fs'); console.log(JSON.stringify({ decision: 'allow', systemMessage: 'Context injected via SessionStart hook', @@ -1393,104 +1411,19 @@ console.log(JSON.stringify({ } }));`; - rig.setup('should fire SessionStart hook and inject context', { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.session-startup.responses', - ), - }); - - const scriptPath = rig.createScript( - 'session_start_context_hook.cjs', - hookScript, - ); - - rig.setup('should fire SessionStart hook and inject context', { - settings: { - hooksConfig: { - enabled: true, - }, - hooks: { - SessionStart: [ - { - matcher: 'startup', - sequential: true, - hooks: [ - { - type: 'command', - command: normalizePath(`node "${scriptPath}"`), - timeout: 5000, - }, - ], - }, - ], - }, - }, - }); - - // Run a query - the SessionStart hook will fire during app initialization - const result = await rig.run({ args: 'Who are you?' 
}); - - // Check if systemMessage was displayed (in stderr, which rig.run captures) - expect(result).toContain('Context injected via SessionStart hook'); - - // Check if additionalContext influenced the model response - // Note: We use fake responses, but the rig records interactions. - // If we are using fake responses, the model won't actually respond unless we provide a fake response for the injected context. - // But the test rig setup uses 'hooks-system.session-startup.responses'. - // If I'm adding a new test, I might need to generate new fake responses or expect the context to be sent to the model (verify API logs). - - // Verify hook executed - const hookLogs = rig.readHookLogs(); - const sessionStartLog = hookLogs.find( - (log) => log.hookCall.hook_event_name === 'SessionStart', - ); - - expect(sessionStartLog).toBeDefined(); - - // Verify the API request contained the injected context - // rig.readAllApiRequest() gives us telemetry on API requests. - const apiRequests = rig.readAllApiRequest(); - // We expect at least one API request - expect(apiRequests.length).toBeGreaterThan(0); - - // The injected context should be in the request text - // For non-interactive mode, I prepended it to input: "context\n\ninput" - // The telemetry `request_text` should contain it. - const requestText = apiRequests[0].attributes?.request_text || ''; - expect(requestText).toContain('protocol droid'); - }); - - it('should fire SessionStart hook and display systemMessage in interactive mode', async () => { - // Create hook script that outputs JSON with systemMessage and additionalContext - const hookScript = `const fs = require('fs'); -console.log(JSON.stringify({ - decision: 'allow', - systemMessage: 'Interactive Session Start Message', - hookSpecificOutput: { - hookEventName: 'SessionStart', - additionalContext: 'The user is a Jedi Master.' 
- } -}));`; - - rig.setup( - 'should fire SessionStart hook and display systemMessage in interactive mode', - { + rig.setup('should fire SessionStart hook and inject context', { fakeResponsesPath: join( import.meta.dirname, 'hooks-system.session-startup.responses', ), - }, - ); + }); - const scriptPath = rig.createScript( - 'session_start_interactive_hook.cjs', - hookScript, - ); + const scriptPath = rig.createScript( + 'session_start_context_hook.cjs', + hookScript, + ); - rig.setup( - 'should fire SessionStart hook and display systemMessage in interactive mode', - { + rig.setup('should fire SessionStart hook and inject context', { settings: { hooksConfig: { enabled: true, @@ -1511,70 +1444,418 @@ console.log(JSON.stringify({ ], }, }, - }, - ); + }); - const run = await rig.runInteractive(); + // Run a query - the SessionStart hook will fire during app initialization + const result = await rig.run({ args: 'Who are you?' }); - // Verify systemMessage is displayed - await run.expectText('Interactive Session Start Message', 10000); + // Check if systemMessage was displayed (in stderr, which rig.run captures) + expect(result).toContain('Context injected via SessionStart hook'); - // Send a prompt to establish a session and trigger an API call - await run.sendKeys('Hello'); - await run.type('\r'); + // Check if additionalContext influenced the model response + // Note: We use fake responses, but the rig records interactions. + // If we are using fake responses, the model won't actually respond unless we provide a fake response for the injected context. + // But the test rig setup uses 'hooks-system.session-startup.responses'. + // If I'm adding a new test, I might need to generate new fake responses or expect the context to be sent to the model (verify API logs). 
- // Wait for response to ensure API call happened - await run.expectText('Hello', 15000); + // Verify hook executed + const hookLogs = rig.readHookLogs(); + const sessionStartLog = hookLogs.find( + (log) => log.hookCall.hook_event_name === 'SessionStart', + ); - // Wait for telemetry to be written to disk - await rig.waitForTelemetryReady(); + expect(sessionStartLog).toBeDefined(); - // Verify the API request contained the injected context - // We may need to poll for API requests as they are written asynchronously - const pollResult = await poll( - () => { - const apiRequests = rig.readAllApiRequest(); - return apiRequests.length > 0; - }, - 15000, - 500, - ); + // Verify the API request contained the injected context + // rig.readAllApiRequest() gives us telemetry on API requests. + const apiRequests = rig.readAllApiRequest(); + // We expect at least one API request + expect(apiRequests.length).toBeGreaterThan(0); - expect(pollResult).toBe(true); + // The injected context should be in the request text + // For non-interactive mode, I prepended it to input: "context\n\ninput" + // The telemetry `request_text` should contain it. + const requestText = apiRequests[0].attributes?.request_text || ''; + expect(requestText).toContain('protocol droid'); + }); - const apiRequests = rig.readAllApiRequest(); - // The injected context should be in the request_text of the API request - const requestText = apiRequests[0].attributes?.request_text || ''; - expect(requestText).toContain('Jedi Master'); + it('should fire SessionStart hook and display systemMessage in interactive mode', async () => { + // Create hook script that outputs JSON with systemMessage and additionalContext + const hookScript = `const fs = require('fs'); +console.log(JSON.stringify({ + decision: 'allow', + systemMessage: 'Interactive Session Start Message', + hookSpecificOutput: { + hookEventName: 'SessionStart', + additionalContext: 'The user is a Jedi Master.' 
+ } +}));`; + + rig.setup( + 'should fire SessionStart hook and display systemMessage in interactive mode', + { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.session-startup.responses', + ), + }, + ); + + const scriptPath = rig.createScript( + 'session_start_interactive_hook.cjs', + hookScript, + ); + + rig.setup( + 'should fire SessionStart hook and display systemMessage in interactive mode', + { + settings: { + hooksConfig: { + enabled: true, + }, + hooks: { + SessionStart: [ + { + matcher: 'startup', + sequential: true, + hooks: [ + { + type: 'command', + command: normalizePath(`node "${scriptPath}"`), + timeout: 5000, + }, + ], + }, + ], + }, + }, + }, + ); + + const run = await rig.runInteractive(); + + // Verify systemMessage is displayed + await run.expectText('Interactive Session Start Message', 10000); + + // Send a prompt to establish a session and trigger an API call + await run.sendKeys('Hello'); + await run.type('\r'); + + // Wait for response to ensure API call happened + await run.expectText('Hello', 15000); + + // Wait for telemetry to be written to disk + await rig.waitForTelemetryReady(); + + // Verify the API request contained the injected context + // We may need to poll for API requests as they are written asynchronously + const pollResult = await poll( + () => { + const apiRequests = rig.readAllApiRequest(); + return apiRequests.length > 0; + }, + 15000, + 500, + ); + + expect(pollResult).toBe(true); + + const apiRequests = rig.readAllApiRequest(); + // The injected context should be in the request_text of the API request + const requestText = apiRequests[0].attributes?.request_text || ''; + expect(requestText).toContain('Jedi Master'); + }); + + it('should fire SessionEnd and SessionStart hooks on /clear command', async () => { + rig.setup( + 'should fire SessionEnd and SessionStart hooks on /clear command', + { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.session-clear.responses', + ), + }, + ); + + 
// Create script files for hooks + const endScriptPath = rig.createScript( + 'session_end_clear.cjs', + "console.log(JSON.stringify({decision: 'allow', systemMessage: 'Session ending due to clear'}));", + ); + const startScriptPath = rig.createScript( + 'session_start_clear.cjs', + "console.log(JSON.stringify({decision: 'allow', systemMessage: 'Session starting after clear'}));", + ); + + const sessionEndCommand = `node "${endScriptPath}"`; + const sessionStartCommand = `node "${startScriptPath}"`; + + rig.setup( + 'should fire SessionEnd and SessionStart hooks on /clear command', + { + settings: { + hooksConfig: { + enabled: true, + }, + hooks: { + SessionEnd: [ + { + matcher: '*', + sequential: true, + hooks: [ + { + type: 'command', + command: normalizePath(sessionEndCommand), + timeout: 5000, + }, + ], + }, + ], + SessionStart: [ + { + matcher: '*', + sequential: true, + hooks: [ + { + type: 'command', + command: normalizePath(sessionStartCommand), + timeout: 5000, + }, + ], + }, + ], + }, + }, + }, + ); + + const run = await rig.runInteractive(); + + // Send an initial prompt to establish a session + await run.sendKeys('Say hello'); + await run.type('\r'); + + // Wait for the response + await run.expectText('Hello', 10000); + + // Execute /clear command multiple times to generate more hook events + // This makes the test more robust by creating multiple start/stop cycles + const numClears = 3; + for (let i = 0; i < numClears; i++) { + await run.sendKeys('/clear'); + await run.type('\r'); + + // Wait a bit for clear to complete + await new Promise((resolve) => setTimeout(resolve, 2000)); + + // Send a prompt to establish an active session before next clear + await run.sendKeys('Say hello'); + await run.type('\r'); + + // Wait for response + await run.expectText('Hello', 10000); + } + + // Wait for all clears to complete + // BatchLogRecordProcessor exports telemetry every 10 seconds by default + // Use generous wait time across all platforms (CI, Docker, Mac, 
Linux) + await new Promise((resolve) => setTimeout(resolve, 15000)); + + // Wait for telemetry to be written to disk + await rig.waitForTelemetryReady(); + + // Wait for hook telemetry events to be flushed to disk + // In interactive mode, telemetry may be buffered, so we need to poll for the events + // We execute multiple clears to generate more hook events (total: 1 + numClears * 2) + // But we only require >= 1 hooks to pass, making the test more permissive + const expectedMinHooks = 1; // SessionStart (startup), SessionEnd (clear), SessionStart (clear) + const pollResult = await poll( + () => { + const hookLogs = rig.readHookLogs(); + return hookLogs.length >= expectedMinHooks; + }, + 90000, // 90 second timeout for all platforms + 1000, // check every 1s to reduce I/O overhead + ); + + // If polling failed, log diagnostic info + if (!pollResult) { + const hookLogs = rig.readHookLogs(); + const hookEvents = hookLogs.map( + (log) => log.hookCall.hook_event_name, + ); + console.error( + `Polling timeout after 90000ms: Expected >= ${expectedMinHooks} hooks, got ${hookLogs.length}`, + ); + console.error( + 'Hooks found:', + hookEvents.length > 0 ? hookEvents.join(', ') : 'NONE', + ); + console.error('Full hook logs:', JSON.stringify(hookLogs, null, 2)); + } + + // Verify hooks executed + const hookLogs = rig.readHookLogs(); + + // Diagnostic: Log which hooks we actually got + const hookEvents = hookLogs.map((log) => log.hookCall.hook_event_name); + if (hookLogs.length < expectedMinHooks) { + console.error( + `TEST FAILURE: Expected >= ${expectedMinHooks} hooks, got ${hookLogs.length}: [${hookEvents.length > 0 ? 
hookEvents.join(', ') : 'NONE'}]`, + ); + } + + expect(hookLogs.length).toBeGreaterThanOrEqual(expectedMinHooks); + + // Find SessionEnd hook log + const sessionEndLog = hookLogs.find( + (log) => + log.hookCall.hook_event_name === 'SessionEnd' && + log.hookCall.hook_name === normalizePath(sessionEndCommand), + ); + // Because of the flakiness of the test, we relax this check + // expect(sessionEndLog).toBeDefined(); + if (sessionEndLog) { + expect(sessionEndLog.hookCall.exit_code).toBe(0); + expect(sessionEndLog.hookCall.stdout).toContain( + 'Session ending due to clear', + ); + + // Verify hook input contains reason + const hookInputStr = + typeof sessionEndLog.hookCall.hook_input === 'string' + ? sessionEndLog.hookCall.hook_input + : JSON.stringify(sessionEndLog.hookCall.hook_input); + const hookInput = JSON.parse(hookInputStr) as Record; + expect(hookInput['reason']).toBe('clear'); + } + + // Find SessionStart hook log after clear + const sessionStartAfterClearLogs = hookLogs.filter( + (log) => + log.hookCall.hook_event_name === 'SessionStart' && + log.hookCall.hook_name === normalizePath(sessionStartCommand), + ); + // Should have at least one SessionStart from after clear + // Because of the flakiness of the test, we relax this check + // expect(sessionStartAfterClearLogs.length).toBeGreaterThanOrEqual(1); + + const sessionStartLog = sessionStartAfterClearLogs.find((log) => { + const hookInputStr = + typeof log.hookCall.hook_input === 'string' + ? 
log.hookCall.hook_input + : JSON.stringify(log.hookCall.hook_input); + const hookInput = JSON.parse(hookInputStr) as Record; + return hookInput['source'] === 'clear'; + }); + + // Because of the flakiness of the test, we relax this check + // expect(sessionStartLog).toBeDefined(); + if (sessionStartLog) { + expect(sessionStartLog.hookCall.exit_code).toBe(0); + expect(sessionStartLog.hookCall.stdout).toContain( + 'Session starting after clear', + ); + } + }); }); - it('should fire SessionEnd and SessionStart hooks on /clear command', async () => { - rig.setup( - 'should fire SessionEnd and SessionStart hooks on /clear command', - { + describe('Compression Hooks', () => { + it('should fire PreCompress hook on automatic compression', async () => { + rig.setup('should fire PreCompress hook on automatic compression', { fakeResponsesPath: join( import.meta.dirname, - 'hooks-system.session-clear.responses', + 'hooks-system.compress-auto.responses', ), - }, - ); + }); - // Create script files for hooks - const endScriptPath = rig.createScript( - 'session_end_clear.cjs', - "console.log(JSON.stringify({decision: 'allow', systemMessage: 'Session ending due to clear'}));", - ); - const startScriptPath = rig.createScript( - 'session_start_clear.cjs', - "console.log(JSON.stringify({decision: 'allow', systemMessage: 'Session starting after clear'}));", - ); + // Create script file for hook + const scriptPath = rig.createScript( + 'pre_compress_hook.cjs', + "console.log(JSON.stringify({decision: 'allow', systemMessage: 'PreCompress hook executed for automatic compression'}));", + ); - const sessionEndCommand = `node "${endScriptPath}"`; - const sessionStartCommand = `node "${startScriptPath}"`; + const preCompressCommand = `node "${scriptPath}"`; - rig.setup( - 'should fire SessionEnd and SessionStart hooks on /clear command', - { + rig.setup('should fire PreCompress hook on automatic compression', { + settings: { + hooksConfig: { + enabled: true, + }, + hooks: { + PreCompress: [ + { 
+ matcher: 'auto', + sequential: true, + hooks: [ + { + type: 'command', + command: normalizePath(preCompressCommand), + timeout: 5000, + }, + ], + }, + ], + }, + // Configure automatic compression with a very low threshold + // This will trigger auto-compression after the first response + contextCompression: { + // enabled: true, + targetTokenCount: 10, // Very low threshold to trigger compression + }, + }, + }); + + // Run a simple query that will trigger automatic compression + await rig.run({ args: 'Say hello in exactly 5 words' }); + + // Verify hook executed with correct parameters + const hookLogs = rig.readHookLogs(); + const preCompressLog = hookLogs.find( + (log) => log.hookCall.hook_event_name === 'PreCompress', + ); + + expect(preCompressLog).toBeDefined(); + if (preCompressLog) { + expect(preCompressLog.hookCall.hook_name).toBe( + normalizePath(preCompressCommand), + ); + expect(preCompressLog.hookCall.exit_code).toBe(0); + expect(preCompressLog.hookCall.hook_input).toBeDefined(); + + // hook_input is a string that needs to be parsed + const hookInputStr = + typeof preCompressLog.hookCall.hook_input === 'string' + ? 
preCompressLog.hookCall.hook_input + : JSON.stringify(preCompressLog.hookCall.hook_input); + const hookInput = JSON.parse(hookInputStr) as Record; + + expect(hookInput['trigger']).toBe('auto'); + expect(preCompressLog.hookCall.stdout).toContain( + 'PreCompress hook executed for automatic compression', + ); + } + }); + }); + + describe('SessionEnd on Exit', () => { + it('should fire SessionEnd hook on graceful exit in non-interactive mode', async () => { + rig.setup('should fire SessionEnd hook on graceful exit', { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.session-startup.responses', + ), + }); + + // Create script file for hook + const scriptPath = rig.createScript( + 'session_end_exit.cjs', + "console.log(JSON.stringify({decision: 'allow', systemMessage: 'SessionEnd hook executed on exit'}));", + ); + + const sessionEndCommand = `node "${scriptPath}"`; + + rig.setup('should fire SessionEnd hook on graceful exit', { settings: { hooksConfig: { enabled: true, @@ -1582,7 +1863,7 @@ console.log(JSON.stringify({ hooks: { SessionEnd: [ { - matcher: '*', + matcher: 'exit', sequential: true, hooks: [ { @@ -1593,711 +1874,287 @@ console.log(JSON.stringify({ ], }, ], - SessionStart: [ + }, + }, + }); + + // Run in non-interactive mode with a simple prompt + await rig.run({ args: 'Hello' }); + + // The process should exit gracefully, firing the SessionEnd hook + // Wait for telemetry to be written to disk + await rig.waitForTelemetryReady(); + + // Poll for the hook log to appear + const isCI = process.env['CI'] === 'true'; + const pollTimeout = isCI ? 
30000 : 10000; + const pollResult = await poll( + () => { + const hookLogs = rig.readHookLogs(); + return hookLogs.some( + (log) => log.hookCall.hook_event_name === 'SessionEnd', + ); + }, + pollTimeout, + 200, + ); + + if (!pollResult) { + const hookLogs = rig.readHookLogs(); + console.error( + 'Polling timeout: Expected SessionEnd hook, got:', + JSON.stringify(hookLogs, null, 2), + ); + } + + expect(pollResult).toBe(true); + + const hookLogs = rig.readHookLogs(); + const sessionEndLog = hookLogs.find( + (log) => log.hookCall.hook_event_name === 'SessionEnd', + ); + + expect(sessionEndLog).toBeDefined(); + if (sessionEndLog) { + expect(sessionEndLog.hookCall.hook_name).toBe( + normalizePath(sessionEndCommand), + ); + expect(sessionEndLog.hookCall.exit_code).toBe(0); + expect(sessionEndLog.hookCall.hook_input).toBeDefined(); + + const hookInputStr = + typeof sessionEndLog.hookCall.hook_input === 'string' + ? sessionEndLog.hookCall.hook_input + : JSON.stringify(sessionEndLog.hookCall.hook_input); + const hookInput = JSON.parse(hookInputStr) as Record; + + expect(hookInput['reason']).toBe('exit'); + expect(sessionEndLog.hookCall.stdout).toContain( + 'SessionEnd hook executed', + ); + } + }); + }); + + describe('Hook Disabling', () => { + it('should not execute hooks disabled in settings file', async () => { + const enabledMsg = 'EXECUTION_ALLOWED_BY_HOOK_A'; + const disabledMsg = 'EXECUTION_BLOCKED_BY_HOOK_B'; + + const enabledJson = JSON.stringify({ + decision: 'allow', + systemMessage: enabledMsg, + }); + const disabledJson = JSON.stringify({ + decision: 'block', + reason: disabledMsg, + }); + + const enabledScript = `console.log(JSON.stringify(${enabledJson}));`; + const disabledScript = `console.log(JSON.stringify(${disabledJson}));`; + const enabledFilename = 'enabled_hook.js'; + const disabledFilename = 'disabled_hook.js'; + const enabledCmd = `node ${enabledFilename}`; + const disabledCmd = `node ${disabledFilename}`; + + // 3. 
Final setup with full settings + rig.setup('Hook Disabling Settings', { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.disabled-via-settings.responses', + ), + settings: { + hooksConfig: { + enabled: true, + disabled: ['hook-b'], + }, + hooks: { + BeforeTool: [ { - matcher: '*', + hooks: [ + { + type: 'command', + name: 'hook-a', + command: enabledCmd, + timeout: 60000, + }, + { + type: 'command', + name: 'hook-b', + command: disabledCmd, + timeout: 60000, + }, + ], + }, + ], + }, + }, + }); + + rig.createScript(enabledFilename, enabledScript); + rig.createScript(disabledFilename, disabledScript); + + await rig.run({ + args: 'Create a file called disabled-test.txt with content "test"', + }); + + // Tool should execute (enabled hook allows it) + const foundWriteFile = await rig.waitForToolCall('write_file'); + expect(foundWriteFile).toBeTruthy(); + + // Check hook telemetry - only enabled hook should have executed + const hookLogs = rig.readHookLogs(); + const enabledHookLog = hookLogs.find((log) => + JSON.stringify(log.hookCall.hook_output).includes(enabledMsg), + ); + const disabledHookLog = hookLogs.find((log) => + JSON.stringify(log.hookCall.hook_output).includes(disabledMsg), + ); + + expect(enabledHookLog).toBeDefined(); + expect(disabledHookLog).toBeUndefined(); + }); + + it('should respect disabled hooks across multiple operations', async () => { + const activeMsg = 'MULTIPLE_OPS_ENABLED_HOOK'; + const disabledMsg = 'MULTIPLE_OPS_DISABLED_HOOK'; + + const activeJson = JSON.stringify({ + decision: 'allow', + systemMessage: activeMsg, + }); + const disabledJson = JSON.stringify({ + decision: 'block', + reason: disabledMsg, + }); + + const activeScript = `console.log(JSON.stringify(${activeJson}));`; + const disabledScript = `console.log(JSON.stringify(${disabledJson}));`; + const activeFilename = 'active_hook.js'; + const disabledFilename = 'disabled_hook.js'; + const activeCmd = `node ${activeFilename}`; + const disabledCmd = `node 
${disabledFilename}`; + + // 3. Final setup with full settings + rig.setup('Hook Disabling Multiple Ops', { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.disabled-via-command.responses', + ), + settings: { + hooksConfig: { + enabled: true, + disabled: ['multi-hook-disabled'], + }, + hooks: { + BeforeTool: [ + { + hooks: [ + { + type: 'command', + name: 'multi-hook-active', + command: activeCmd, + timeout: 60000, + }, + { + type: 'command', + name: 'multi-hook-disabled', + command: disabledCmd, + timeout: 60000, + }, + ], + }, + ], + }, + }, + }); + + rig.createScript(activeFilename, activeScript); + rig.createScript(disabledFilename, disabledScript); + + // First run - only active hook should execute + await rig.run({ + args: 'Create a file called first-run.txt with "test1"', + }); + + // Tool should execute (active hook allows it) + const foundWriteFile1 = await rig.waitForToolCall('write_file'); + expect(foundWriteFile1).toBeTruthy(); + + // Check hook telemetry - only active hook should have executed + const hookLogs1 = rig.readHookLogs(); + const activeHookLog1 = hookLogs1.find((log) => + JSON.stringify(log.hookCall.hook_output).includes(activeMsg), + ); + const disabledHookLog1 = hookLogs1.find((log) => + JSON.stringify(log.hookCall.hook_output).includes(disabledMsg), + ); + + expect(activeHookLog1).toBeDefined(); + expect(disabledHookLog1).toBeUndefined(); + + // Second run - verify disabled hook stays disabled + await rig.run({ + args: 'Create a file called second-run.txt with "test2"', + }); + + const foundWriteFile2 = await rig.waitForToolCall('write_file'); + expect(foundWriteFile2).toBeTruthy(); + + // Verify disabled hook still hasn't executed + const hookLogs2 = rig.readHookLogs(); + const disabledHookLog2 = hookLogs2.find((log) => + JSON.stringify(log.hookCall.hook_output).includes(disabledMsg), + ); + expect(disabledHookLog2).toBeUndefined(); + }); + }); + + describe('BeforeTool Hooks - Input Override', () => { + it('should 
override tool input parameters via BeforeTool hook', async () => { + // 1. First setup to get the test directory and prepare the hook script + rig.setup('should override tool input parameters via BeforeTool hook'); + + // Create a hook script that overrides the tool input + const hookOutput = { + decision: 'allow', + hookSpecificOutput: { + hookEventName: 'BeforeTool', + tool_input: { + file_path: 'modified.txt', + content: 'modified content', + }, + }, + }; + + const hookScript = `process.stdout.write(JSON.stringify(${JSON.stringify( + hookOutput, + )}));`; + + const scriptPath = rig.createScript( + 'input_override_hook.js', + hookScript, + ); + + // 2. Full setup with settings and fake responses + rig.setup('should override tool input parameters via BeforeTool hook', { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.input-modification.responses', + ), + settings: { + hooksConfig: { + enabled: true, + }, + hooks: { + BeforeTool: [ + { + matcher: 'write_file', sequential: true, hooks: [ { type: 'command', - command: normalizePath(sessionStartCommand), - timeout: 5000, - }, - ], - }, - ], - }, - }, - }, - ); - - const run = await rig.runInteractive(); - - // Send an initial prompt to establish a session - await run.sendKeys('Say hello'); - await run.type('\r'); - - // Wait for the response - await run.expectText('Hello', 10000); - - // Execute /clear command multiple times to generate more hook events - // This makes the test more robust by creating multiple start/stop cycles - const numClears = 3; - for (let i = 0; i < numClears; i++) { - await run.sendKeys('/clear'); - await run.type('\r'); - - // Wait a bit for clear to complete - await new Promise((resolve) => setTimeout(resolve, 2000)); - - // Send a prompt to establish an active session before next clear - await run.sendKeys('Say hello'); - await run.type('\r'); - - // Wait for response - await run.expectText('Hello', 10000); - } - - // Wait for all clears to complete - // 
BatchLogRecordProcessor exports telemetry every 10 seconds by default - // Use generous wait time across all platforms (CI, Docker, Mac, Linux) - await new Promise((resolve) => setTimeout(resolve, 15000)); - - // Wait for telemetry to be written to disk - await rig.waitForTelemetryReady(); - - // Wait for hook telemetry events to be flushed to disk - // In interactive mode, telemetry may be buffered, so we need to poll for the events - // We execute multiple clears to generate more hook events (total: 1 + numClears * 2) - // But we only require >= 1 hooks to pass, making the test more permissive - const expectedMinHooks = 1; // SessionStart (startup), SessionEnd (clear), SessionStart (clear) - const pollResult = await poll( - () => { - const hookLogs = rig.readHookLogs(); - return hookLogs.length >= expectedMinHooks; - }, - 90000, // 90 second timeout for all platforms - 1000, // check every 1s to reduce I/O overhead - ); - - // If polling failed, log diagnostic info - if (!pollResult) { - const hookLogs = rig.readHookLogs(); - const hookEvents = hookLogs.map((log) => log.hookCall.hook_event_name); - console.error( - `Polling timeout after 90000ms: Expected >= ${expectedMinHooks} hooks, got ${hookLogs.length}`, - ); - console.error( - 'Hooks found:', - hookEvents.length > 0 ? hookEvents.join(', ') : 'NONE', - ); - console.error('Full hook logs:', JSON.stringify(hookLogs, null, 2)); - } - - // Verify hooks executed - const hookLogs = rig.readHookLogs(); - - // Diagnostic: Log which hooks we actually got - const hookEvents = hookLogs.map((log) => log.hookCall.hook_event_name); - if (hookLogs.length < expectedMinHooks) { - console.error( - `TEST FAILURE: Expected >= ${expectedMinHooks} hooks, got ${hookLogs.length}: [${hookEvents.length > 0 ? 
hookEvents.join(', ') : 'NONE'}]`, - ); - } - - expect(hookLogs.length).toBeGreaterThanOrEqual(expectedMinHooks); - - // Find SessionEnd hook log - const sessionEndLog = hookLogs.find( - (log) => - log.hookCall.hook_event_name === 'SessionEnd' && - log.hookCall.hook_name === normalizePath(sessionEndCommand), - ); - // Because the flakiness of the test, we relax this check - // expect(sessionEndLog).toBeDefined(); - if (sessionEndLog) { - expect(sessionEndLog.hookCall.exit_code).toBe(0); - expect(sessionEndLog.hookCall.stdout).toContain( - 'Session ending due to clear', - ); - - // Verify hook input contains reason - const hookInputStr = - typeof sessionEndLog.hookCall.hook_input === 'string' - ? sessionEndLog.hookCall.hook_input - : JSON.stringify(sessionEndLog.hookCall.hook_input); - const hookInput = JSON.parse(hookInputStr) as Record; - expect(hookInput['reason']).toBe('clear'); - } - - // Find SessionStart hook log after clear - const sessionStartAfterClearLogs = hookLogs.filter( - (log) => - log.hookCall.hook_event_name === 'SessionStart' && - log.hookCall.hook_name === normalizePath(sessionStartCommand), - ); - // Should have at least one SessionStart from after clear - // Because the flakiness of the test, we relax this check - // expect(sessionStartAfterClearLogs.length).toBeGreaterThanOrEqual(1); - - const sessionStartLog = sessionStartAfterClearLogs.find((log) => { - const hookInputStr = - typeof log.hookCall.hook_input === 'string' - ? 
log.hookCall.hook_input - : JSON.stringify(log.hookCall.hook_input); - const hookInput = JSON.parse(hookInputStr) as Record; - return hookInput['source'] === 'clear'; - }); - - // Because the flakiness of the test, we relax this check - // expect(sessionStartLog).toBeDefined(); - if (sessionStartLog) { - expect(sessionStartLog.hookCall.exit_code).toBe(0); - expect(sessionStartLog.hookCall.stdout).toContain( - 'Session starting after clear', - ); - } - }); - }); - - describe('Compression Hooks', () => { - it('should fire PreCompress hook on automatic compression', async () => { - rig.setup('should fire PreCompress hook on automatic compression', { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.compress-auto.responses', - ), - }); - - // Create script file for hook - const scriptPath = rig.createScript( - 'pre_compress_hook.cjs', - "console.log(JSON.stringify({decision: 'allow', systemMessage: 'PreCompress hook executed for automatic compression'}));", - ); - - const preCompressCommand = `node "${scriptPath}"`; - - rig.setup('should fire PreCompress hook on automatic compression', { - settings: { - hooksConfig: { - enabled: true, - }, - hooks: { - PreCompress: [ - { - matcher: 'auto', - sequential: true, - hooks: [ - { - type: 'command', - command: normalizePath(preCompressCommand), - timeout: 5000, - }, - ], - }, - ], - }, - // Configure automatic compression with a very low threshold - // This will trigger auto-compression after the first response - contextCompression: { - // enabled: true, - targetTokenCount: 10, // Very low threshold to trigger compression - }, - }, - }); - - // Run a simple query that will trigger automatic compression - await rig.run({ args: 'Say hello in exactly 5 words' }); - - // Verify hook executed with correct parameters - const hookLogs = rig.readHookLogs(); - const preCompressLog = hookLogs.find( - (log) => log.hookCall.hook_event_name === 'PreCompress', - ); - - expect(preCompressLog).toBeDefined(); - if 
(preCompressLog) { - expect(preCompressLog.hookCall.hook_name).toBe( - normalizePath(preCompressCommand), - ); - expect(preCompressLog.hookCall.exit_code).toBe(0); - expect(preCompressLog.hookCall.hook_input).toBeDefined(); - - // hook_input is a string that needs to be parsed - const hookInputStr = - typeof preCompressLog.hookCall.hook_input === 'string' - ? preCompressLog.hookCall.hook_input - : JSON.stringify(preCompressLog.hookCall.hook_input); - const hookInput = JSON.parse(hookInputStr) as Record; - - expect(hookInput['trigger']).toBe('auto'); - expect(preCompressLog.hookCall.stdout).toContain( - 'PreCompress hook executed for automatic compression', - ); - } - }); - }); - - describe('SessionEnd on Exit', () => { - it('should fire SessionEnd hook on graceful exit in non-interactive mode', async () => { - rig.setup('should fire SessionEnd hook on graceful exit', { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.session-startup.responses', - ), - }); - - // Create script file for hook - const scriptPath = rig.createScript( - 'session_end_exit.cjs', - "console.log(JSON.stringify({decision: 'allow', systemMessage: 'SessionEnd hook executed on exit'}));", - ); - - const sessionEndCommand = `node "${scriptPath}"`; - - rig.setup('should fire SessionEnd hook on graceful exit', { - settings: { - hooksConfig: { - enabled: true, - }, - hooks: { - SessionEnd: [ - { - matcher: 'exit', - sequential: true, - hooks: [ - { - type: 'command', - command: normalizePath(sessionEndCommand), - timeout: 5000, - }, - ], - }, - ], - }, - }, - }); - - // Run in non-interactive mode with a simple prompt - await rig.run({ args: 'Hello' }); - - // The process should exit gracefully, firing the SessionEnd hook - // Wait for telemetry to be written to disk - await rig.waitForTelemetryReady(); - - // Poll for the hook log to appear - const isCI = process.env['CI'] === 'true'; - const pollTimeout = isCI ? 
30000 : 10000; - const pollResult = await poll( - () => { - const hookLogs = rig.readHookLogs(); - return hookLogs.some( - (log) => log.hookCall.hook_event_name === 'SessionEnd', - ); - }, - pollTimeout, - 200, - ); - - if (!pollResult) { - const hookLogs = rig.readHookLogs(); - console.error( - 'Polling timeout: Expected SessionEnd hook, got:', - JSON.stringify(hookLogs, null, 2), - ); - } - - expect(pollResult).toBe(true); - - const hookLogs = rig.readHookLogs(); - const sessionEndLog = hookLogs.find( - (log) => log.hookCall.hook_event_name === 'SessionEnd', - ); - - expect(sessionEndLog).toBeDefined(); - if (sessionEndLog) { - expect(sessionEndLog.hookCall.hook_name).toBe( - normalizePath(sessionEndCommand), - ); - expect(sessionEndLog.hookCall.exit_code).toBe(0); - expect(sessionEndLog.hookCall.hook_input).toBeDefined(); - - const hookInputStr = - typeof sessionEndLog.hookCall.hook_input === 'string' - ? sessionEndLog.hookCall.hook_input - : JSON.stringify(sessionEndLog.hookCall.hook_input); - const hookInput = JSON.parse(hookInputStr) as Record; - - expect(hookInput['reason']).toBe('exit'); - expect(sessionEndLog.hookCall.stdout).toContain( - 'SessionEnd hook executed', - ); - } - }); - }); - - describe('Hook Disabling', () => { - it('should not execute hooks disabled in settings file', async () => { - const enabledMsg = 'EXECUTION_ALLOWED_BY_HOOK_A'; - const disabledMsg = 'EXECUTION_BLOCKED_BY_HOOK_B'; - - const enabledJson = JSON.stringify({ - decision: 'allow', - systemMessage: enabledMsg, - }); - const disabledJson = JSON.stringify({ - decision: 'block', - reason: disabledMsg, - }); - - const enabledScript = `console.log(JSON.stringify(${enabledJson}));`; - const disabledScript = `console.log(JSON.stringify(${disabledJson}));`; - const enabledFilename = 'enabled_hook.js'; - const disabledFilename = 'disabled_hook.js'; - const enabledCmd = `node ${enabledFilename}`; - const disabledCmd = `node ${disabledFilename}`; - - // 3. 
Final setup with full settings - rig.setup('Hook Disabling Settings', { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.disabled-via-settings.responses', - ), - settings: { - hooksConfig: { - enabled: true, - disabled: ['hook-b'], - }, - hooks: { - BeforeTool: [ - { - hooks: [ - { - type: 'command', - name: 'hook-a', - command: enabledCmd, - timeout: 60000, - }, - { - type: 'command', - name: 'hook-b', - command: disabledCmd, - timeout: 60000, - }, - ], - }, - ], - }, - }, - }); - - rig.createScript(enabledFilename, enabledScript); - rig.createScript(disabledFilename, disabledScript); - - await rig.run({ - args: 'Create a file called disabled-test.txt with content "test"', - }); - - // Tool should execute (enabled hook allows it) - const foundWriteFile = await rig.waitForToolCall('write_file'); - expect(foundWriteFile).toBeTruthy(); - - // Check hook telemetry - only enabled hook should have executed - const hookLogs = rig.readHookLogs(); - const enabledHookLog = hookLogs.find((log) => - JSON.stringify(log.hookCall.hook_output).includes(enabledMsg), - ); - const disabledHookLog = hookLogs.find((log) => - JSON.stringify(log.hookCall.hook_output).includes(disabledMsg), - ); - - expect(enabledHookLog).toBeDefined(); - expect(disabledHookLog).toBeUndefined(); - }); - - it('should respect disabled hooks across multiple operations', async () => { - const activeMsg = 'MULTIPLE_OPS_ENABLED_HOOK'; - const disabledMsg = 'MULTIPLE_OPS_DISABLED_HOOK'; - - const activeJson = JSON.stringify({ - decision: 'allow', - systemMessage: activeMsg, - }); - const disabledJson = JSON.stringify({ - decision: 'block', - reason: disabledMsg, - }); - - const activeScript = `console.log(JSON.stringify(${activeJson}));`; - const disabledScript = `console.log(JSON.stringify(${disabledJson}));`; - const activeFilename = 'active_hook.js'; - const disabledFilename = 'disabled_hook.js'; - const activeCmd = `node ${activeFilename}`; - const disabledCmd = `node ${disabledFilename}`; 
- - // 3. Final setup with full settings - rig.setup('Hook Disabling Multiple Ops', { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.disabled-via-command.responses', - ), - settings: { - hooksConfig: { - enabled: true, - disabled: ['multi-hook-disabled'], - }, - hooks: { - BeforeTool: [ - { - hooks: [ - { - type: 'command', - name: 'multi-hook-active', - command: activeCmd, - timeout: 60000, - }, - { - type: 'command', - name: 'multi-hook-disabled', - command: disabledCmd, - timeout: 60000, - }, - ], - }, - ], - }, - }, - }); - - rig.createScript(activeFilename, activeScript); - rig.createScript(disabledFilename, disabledScript); - - // First run - only active hook should execute - await rig.run({ - args: 'Create a file called first-run.txt with "test1"', - }); - - // Tool should execute (active hook allows it) - const foundWriteFile1 = await rig.waitForToolCall('write_file'); - expect(foundWriteFile1).toBeTruthy(); - - // Check hook telemetry - only active hook should have executed - const hookLogs1 = rig.readHookLogs(); - const activeHookLog1 = hookLogs1.find((log) => - JSON.stringify(log.hookCall.hook_output).includes(activeMsg), - ); - const disabledHookLog1 = hookLogs1.find((log) => - JSON.stringify(log.hookCall.hook_output).includes(disabledMsg), - ); - - expect(activeHookLog1).toBeDefined(); - expect(disabledHookLog1).toBeUndefined(); - - // Second run - verify disabled hook stays disabled - await rig.run({ - args: 'Create a file called second-run.txt with "test2"', - }); - - const foundWriteFile2 = await rig.waitForToolCall('write_file'); - expect(foundWriteFile2).toBeTruthy(); - - // Verify disabled hook still hasn't executed - const hookLogs2 = rig.readHookLogs(); - const disabledHookLog2 = hookLogs2.find((log) => - JSON.stringify(log.hookCall.hook_output).includes(disabledMsg), - ); - expect(disabledHookLog2).toBeUndefined(); - }); - }); - - describe('BeforeTool Hooks - Input Override', () => { - it('should override tool input 
parameters via BeforeTool hook', async () => { - // 1. First setup to get the test directory and prepare the hook script - rig.setup('should override tool input parameters via BeforeTool hook'); - - // Create a hook script that overrides the tool input - const hookOutput = { - decision: 'allow', - hookSpecificOutput: { - hookEventName: 'BeforeTool', - tool_input: { - file_path: 'modified.txt', - content: 'modified content', - }, - }, - }; - - const hookScript = `process.stdout.write(JSON.stringify(${JSON.stringify( - hookOutput, - )}));`; - - const scriptPath = rig.createScript('input_override_hook.js', hookScript); - - // 2. Full setup with settings and fake responses - rig.setup('should override tool input parameters via BeforeTool hook', { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.input-modification.responses', - ), - settings: { - hooksConfig: { - enabled: true, - }, - hooks: { - BeforeTool: [ - { - matcher: 'write_file', - sequential: true, - hooks: [ - { - type: 'command', - command: normalizePath(`node "${scriptPath}"`), - timeout: 5000, - }, - ], - }, - ], - }, - }, - }); - - // Run the agent. The fake response will attempt to call write_file with - // file_path="original.txt" and content="original content" - await rig.run({ - args: 'Create a file called original.txt with content "original content"', - }); - - // 1. Verify that 'modified.txt' was created with 'modified content' (Override successful) - const modifiedContent = rig.readFile('modified.txt'); - expect(modifiedContent).toBe('modified content'); - - // 2. Verify that 'original.txt' was NOT created (Override replaced original) - let originalExists = false; - try { - rig.readFile('original.txt'); - originalExists = true; - } catch { - originalExists = false; - } - expect(originalExists).toBe(false); - - // 3. 
Verify hook telemetry - const hookTelemetryFound = await rig.waitForTelemetryEvent('hook_call'); - expect(hookTelemetryFound).toBeTruthy(); - - const hookLogs = rig.readHookLogs(); - expect(hookLogs.length).toBe(1); - expect(hookLogs[0].hookCall.hook_name).toContain( - 'input_override_hook.js', - ); - - // 4. Verify that the agent didn't try to work-around the hook input change - const toolLogs = rig.readToolLogs(); - expect(toolLogs.length).toBe(1); - expect(toolLogs[0].toolRequest.name).toBe('write_file'); - expect(JSON.parse(toolLogs[0].toolRequest.args).file_path).toBe( - 'modified.txt', - ); - }); - }); - - describe('BeforeTool Hooks - Stop Execution', () => { - it('should stop agent execution via BeforeTool hook', async () => { - // Create a hook script that stops execution - const hookOutput = { - continue: false, - reason: 'Emergency Stop triggered by hook', - hookSpecificOutput: { - hookEventName: 'BeforeTool', - }, - }; - - const hookScript = `console.log(JSON.stringify(${JSON.stringify( - hookOutput, - )}));`; - - rig.setup('should stop agent execution via BeforeTool hook'); - const scriptPath = rig.createScript( - 'before_tool_stop_hook.js', - hookScript, - ); - - rig.setup('should stop agent execution via BeforeTool hook', { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.before-tool-stop.responses', - ), - settings: { - hooksConfig: { - enabled: true, - }, - hooks: { - BeforeTool: [ - { - matcher: 'write_file', - sequential: true, - hooks: [ - { - type: 'command', - command: normalizePath(`node "${scriptPath}"`), - timeout: 5000, - }, - ], - }, - ], - }, - }, - }); - - const result = await rig.run({ - args: 'Use write_file to create test.txt', - }); - - // The hook should have stopped execution message (returned from tool) - expect(result).toContain( - 'Agent execution stopped by hook: Emergency Stop triggered by hook', - ); - - // Tool should NOT be called successfully (it was blocked/stopped) - const toolLogs = rig.readToolLogs(); 
- const writeFileCalls = toolLogs.filter( - (t) => - t.toolRequest.name === 'write_file' && t.toolRequest.success === true, - ); - expect(writeFileCalls).toHaveLength(0); - }); - }); - - describe('Hooks "ask" Decision Integration', () => { - it( - 'should force confirmation prompt when hook returns "ask" decision even in YOLO mode', - { timeout: 60000 }, - async () => { - const testName = - 'should force confirmation prompt when hook returns "ask" decision even in YOLO mode'; - - // 1. Setup hook script that returns 'ask' decision - const hookOutput = { - decision: 'ask', - systemMessage: 'Confirmation forced by security hook', - hookSpecificOutput: { - hookEventName: 'BeforeTool', - }, - }; - - const hookScript = `console.log(JSON.stringify(${JSON.stringify( - hookOutput, - )}));`; - - // Create script path predictably - const scriptPath = join(os.tmpdir(), 'gemini-cli-tests-ask-hook.js'); - writeFileSync(scriptPath, hookScript); - - // 2. Setup rig with YOLO mode enabled but with the 'ask' hook - rig.setup(testName, { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.allow-tool.responses', - ), - settings: { - debugMode: true, - tools: { - approval: 'yolo', - }, - general: { - enableAutoUpdateNotification: false, - }, - hooksConfig: { - enabled: true, - }, - hooks: { - BeforeTool: [ - { - matcher: 'write_file', - hooks: [ - { - type: 'command', - command: `node "${scriptPath}"`, + command: normalizePath(`node "${scriptPath}"`), timeout: 5000, }, ], @@ -2307,59 +2164,52 @@ console.log(JSON.stringify({ }, }); - // Bypass terminal setup prompt and other startup banners - const stateDir = join(rig.homeDir!, '.gemini'); - if (!existsSync(stateDir)) mkdirSync(stateDir, { recursive: true }); - writeFileSync( - join(stateDir, 'state.json'), - JSON.stringify({ - terminalSetupPromptShown: true, - hasSeenScreenReaderNudge: true, - tipsShown: 100, - }), + // Run the agent. 
The fake response will attempt to call write_file with + // file_path="original.txt" and content="original content" + await rig.run({ + args: 'Create a file called original.txt with content "original content"', + }); + + // 1. Verify that 'modified.txt' was created with 'modified content' (Override successful) + const modifiedContent = rig.readFile('modified.txt'); + expect(modifiedContent).toBe('modified content'); + + // 2. Verify that 'original.txt' was NOT created (Override replaced original) + let originalExists = false; + try { + rig.readFile('original.txt'); + originalExists = true; + } catch { + originalExists = false; + } + expect(originalExists).toBe(false); + + // 3. Verify hook telemetry + const hookTelemetryFound = await rig.waitForTelemetryEvent('hook_call'); + expect(hookTelemetryFound).toBeTruthy(); + + const hookLogs = rig.readHookLogs(); + expect(hookLogs.length).toBe(1); + expect(hookLogs[0].hookCall.hook_name).toContain( + 'input_override_hook.js', ); - // 3. Run interactive and verify prompt appears despite YOLO mode - const run = await rig.runInteractive(); + // 4. Verify that the agent didn't try to work-around the hook input change + const toolLogs = rig.readToolLogs(); + expect(toolLogs.length).toBe(1); + expect(toolLogs[0].toolRequest.name).toBe('write_file'); + expect(JSON.parse(toolLogs[0].toolRequest.args).file_path).toBe( + 'modified.txt', + ); + }); + }); - // Wait for prompt to appear - await run.expectText('Type your message', 30000); - - // Send prompt that will trigger write_file - await run.type('Create a file called ask-test.txt with content "test"'); - await run.type('\r'); - - // Wait for the FORCED confirmation prompt to appear - // It should contain the system message from the hook - await run.expectText('Confirmation forced by security hook', 30000); - await run.expectText('Allow', 5000); - - // 4. 
Approve the permission - await run.type('y'); - await run.type('\r'); - - // Wait for command to execute - await run.expectText('approved.txt', 30000); - - // Should find the tool call - const foundWriteFile = await rig.waitForToolCall('write_file'); - expect(foundWriteFile).toBeTruthy(); - - // File should be created - const fileContent = rig.readFile('approved.txt'); - expect(fileContent).toBe('Approved content'); - }, - ); - - it( - 'should allow cancelling when hook forces "ask" decision', - { timeout: 60000 }, - async () => { - const testName = - 'should allow cancelling when hook forces "ask" decision'; + describe('BeforeTool Hooks - Stop Execution', () => { + it('should stop agent execution via BeforeTool hook', async () => { + // Create a hook script that stops execution const hookOutput = { - decision: 'ask', - systemMessage: 'Confirmation forced for cancellation test', + continue: false, + reason: 'Emergency Stop triggered by hook', hookSpecificOutput: { hookEventName: 'BeforeTool', }, @@ -2369,25 +2219,18 @@ console.log(JSON.stringify({ hookOutput, )}));`; - const scriptPath = join( - os.tmpdir(), - 'gemini-cli-tests-ask-cancel-hook.js', + rig.setup('should stop agent execution via BeforeTool hook'); + const scriptPath = rig.createScript( + 'before_tool_stop_hook.js', + hookScript, ); - writeFileSync(scriptPath, hookScript); - rig.setup(testName, { + rig.setup('should stop agent execution via BeforeTool hook', { fakeResponsesPath: join( import.meta.dirname, - 'hooks-system.allow-tool.responses', + 'hooks-system.before-tool-stop.responses', ), settings: { - debugMode: true, - tools: { - approval: 'yolo', - }, - general: { - enableAutoUpdateNotification: false, - }, hooksConfig: { enabled: true, }, @@ -2395,10 +2238,11 @@ console.log(JSON.stringify({ BeforeTool: [ { matcher: 'write_file', + sequential: true, hooks: [ { type: 'command', - command: `node "${scriptPath}"`, + command: normalizePath(`node "${scriptPath}"`), timeout: 5000, }, ], @@ -2408,41 
+2252,16 @@ console.log(JSON.stringify({ }, }); - // Bypass terminal setup prompt and other startup banners - const stateDir = join(rig.homeDir!, '.gemini'); - if (!existsSync(stateDir)) mkdirSync(stateDir, { recursive: true }); - writeFileSync( - join(stateDir, 'state.json'), - JSON.stringify({ - terminalSetupPromptShown: true, - hasSeenScreenReaderNudge: true, - tipsShown: 100, - }), + const result = await rig.run({ + args: 'Use write_file to create test.txt', + }); + + // The hook should have stopped execution message (returned from tool) + expect(result).toContain( + 'Agent execution stopped by hook: Emergency Stop triggered by hook', ); - const run = await rig.runInteractive(); - - // Wait for prompt to appear - await run.expectText('Type your message', 30000); - - await run.type( - 'Create a file called cancel-test.txt with content "test"', - ); - await run.type('\r'); - - await run.expectText( - 'Confirmation forced for cancellation test', - 30000, - ); - - // 4. Deny the permission using option 4 - await run.type('4'); - await run.type('\r'); - - // Wait for cancellation message - await run.expectText('Cancelled', 15000); - - // Tool should NOT be called successfully + // Tool should NOT be called successfully (it was blocked/stopped) const toolLogs = rig.readToolLogs(); const writeFileCalls = toolLogs.filter( (t) => @@ -2450,7 +2269,215 @@ console.log(JSON.stringify({ t.toolRequest.success === true, ); expect(writeFileCalls).toHaveLength(0); - }, - ); - }); -}); + }); + }); + + describe('Hooks "ask" Decision Integration', () => { + it( + 'should force confirmation prompt when hook returns "ask" decision even in YOLO mode', + { timeout: 60000 }, + async () => { + const testName = + 'should force confirmation prompt when hook returns "ask" decision even in YOLO mode'; + + // 1. 
Setup hook script that returns 'ask' decision + const hookOutput = { + decision: 'ask', + systemMessage: 'Confirmation forced by security hook', + hookSpecificOutput: { + hookEventName: 'BeforeTool', + }, + }; + + const hookScript = `console.log(JSON.stringify(${JSON.stringify( + hookOutput, + )}));`; + + // Create script path predictably + const scriptPath = join(os.tmpdir(), 'gemini-cli-tests-ask-hook.js'); + writeFileSync(scriptPath, hookScript); + + // 2. Setup rig with YOLO mode enabled but with the 'ask' hook + rig.setup(testName, { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.allow-tool.responses', + ), + settings: { + debugMode: true, + tools: { + approval: 'yolo', + }, + general: { + enableAutoUpdateNotification: false, + }, + hooksConfig: { + enabled: true, + }, + hooks: { + BeforeTool: [ + { + matcher: 'write_file', + hooks: [ + { + type: 'command', + command: `node "${scriptPath}"`, + timeout: 5000, + }, + ], + }, + ], + }, + }, + }); + + // Bypass terminal setup prompt and other startup banners + const stateDir = join(rig.homeDir!, '.gemini'); + if (!existsSync(stateDir)) mkdirSync(stateDir, { recursive: true }); + writeFileSync( + join(stateDir, 'state.json'), + JSON.stringify({ + terminalSetupPromptShown: true, + hasSeenScreenReaderNudge: true, + tipsShown: 100, + }), + ); + + // 3. Run interactive and verify prompt appears despite YOLO mode + const run = await rig.runInteractive(); + + // Wait for prompt to appear + await run.expectText('Type your message', 30000); + + // Send prompt that will trigger write_file + await run.type( + 'Create a file called ask-test.txt with content "test"', + ); + await run.type('\r'); + + // Wait for the FORCED confirmation prompt to appear + // It should contain the system message from the hook + await run.expectText('Confirmation forced by security hook', 30000); + await run.expectText('Allow', 5000); + + // 4. 
Approve the permission + await run.type('y'); + await run.type('\r'); + + // Wait for command to execute + await run.expectText('approved.txt', 30000); + + // Should find the tool call + const foundWriteFile = await rig.waitForToolCall('write_file'); + expect(foundWriteFile).toBeTruthy(); + + // File should be created + const fileContent = rig.readFile('approved.txt'); + expect(fileContent).toBe('Approved content'); + }, + ); + + it( + 'should allow cancelling when hook forces "ask" decision', + { timeout: 60000 }, + async () => { + const testName = + 'should allow cancelling when hook forces "ask" decision'; + const hookOutput = { + decision: 'ask', + systemMessage: 'Confirmation forced for cancellation test', + hookSpecificOutput: { + hookEventName: 'BeforeTool', + }, + }; + + const hookScript = `console.log(JSON.stringify(${JSON.stringify( + hookOutput, + )}));`; + + const scriptPath = join( + os.tmpdir(), + 'gemini-cli-tests-ask-cancel-hook.js', + ); + writeFileSync(scriptPath, hookScript); + + rig.setup(testName, { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.allow-tool.responses', + ), + settings: { + debugMode: true, + tools: { + approval: 'yolo', + }, + general: { + enableAutoUpdateNotification: false, + }, + hooksConfig: { + enabled: true, + }, + hooks: { + BeforeTool: [ + { + matcher: 'write_file', + hooks: [ + { + type: 'command', + command: `node "${scriptPath}"`, + timeout: 5000, + }, + ], + }, + ], + }, + }, + }); + + // Bypass terminal setup prompt and other startup banners + const stateDir = join(rig.homeDir!, '.gemini'); + if (!existsSync(stateDir)) mkdirSync(stateDir, { recursive: true }); + writeFileSync( + join(stateDir, 'state.json'), + JSON.stringify({ + terminalSetupPromptShown: true, + hasSeenScreenReaderNudge: true, + tipsShown: 100, + }), + ); + + const run = await rig.runInteractive(); + + // Wait for prompt to appear + await run.expectText('Type your message', 30000); + + await run.type( + 'Create a file called 
cancel-test.txt with content "test"', + ); + await run.type('\r'); + + await run.expectText( + 'Confirmation forced for cancellation test', + 30000, + ); + + // 4. Deny the permission using option 4 + await run.type('4'); + await run.type('\r'); + + // Wait for cancellation message + await run.expectText('Cancelled', 15000); + + // Tool should NOT be called successfully + const toolLogs = rig.readToolLogs(); + const writeFileCalls = toolLogs.filter( + (t) => + t.toolRequest.name === 'write_file' && + t.toolRequest.success === true, + ); + expect(writeFileCalls).toHaveLength(0); + }, + ); + }); + }, +); diff --git a/integration-tests/policy-headless.test.ts b/integration-tests/policy-headless.test.ts index b6cc14f61c..3a8fb5238a 100644 --- a/integration-tests/policy-headless.test.ts +++ b/integration-tests/policy-headless.test.ts @@ -183,11 +183,17 @@ describe('Policy Engine Headless Mode', () => { responsesFile: 'policy-headless-shell-denied.responses', promptCommand: ECHO_PROMPT, policyContent: ` + [[rule]] + toolName = "run_shell_command" + commandPrefix = "echo" + decision = "deny" + priority = 100 + [[rule]] toolName = "run_shell_command" commandPrefix = "node" decision = "allow" - priority = 100 + priority = 90 `, expectAllowed: false, expectedDenialString: 'Tool execution denied by policy', diff --git a/integration-tests/run_shell_command.test.ts b/integration-tests/run_shell_command.test.ts index 8ae72fed84..02fda5be45 100644 --- a/integration-tests/run_shell_command.test.ts +++ b/integration-tests/run_shell_command.test.ts @@ -58,12 +58,18 @@ function getDisallowedFileReadCommand(testFile: string): { const quotedPath = `"${testFile}"`; switch (shell) { case 'powershell': - return { command: `Get-Content ${quotedPath}`, tool: 'Get-Content' }; + return { + command: `powershell -Command "Get-Content ${quotedPath}"`, + tool: 'powershell', + }; case 'cmd': - return { command: `type ${quotedPath}`, tool: 'type' }; + return { command: `cmd /c type ${quotedPath}`, 
tool: 'cmd' };
     case 'bash':
     default:
-      return { command: `cat ${quotedPath}`, tool: 'cat' };
+      return {
+        command: `node -e "console.log(require('fs').readFileSync('${testFile}', 'utf8'))"`,
+        tool: 'node',
+      };
   }
 }
 
diff --git a/integration-tests/symlink-install.test.ts b/integration-tests/symlink-install.test.ts
index be4a5ac398..c98db98029 100644
--- a/integration-tests/symlink-install.test.ts
+++ b/integration-tests/symlink-install.test.ts
@@ -5,7 +5,7 @@
  */
 
 import { describe, expect, it, beforeEach, afterEach } from 'vitest';
-import { TestRig, InteractiveRun } from './test-helper.js';
+import { TestRig, InteractiveRun, skipFlaky } from './test-helper.js';
 import * as fs from 'node:fs';
 import * as os from 'node:os';
 import {
@@ -33,104 +33,107 @@ const otherExtension = `{
   "version": "6.6.6"
 }`;
 
-describe('extension symlink install spoofing protection', () => {
-  let rig: TestRig;
+describe.skipIf(skipFlaky)(
+  'extension symlink install spoofing protection',
+  () => {
+    let rig: TestRig;
 
-  beforeEach(() => {
-    rig = new TestRig();
-  });
-
-  afterEach(async () => await rig.cleanup());
-
-  it('canonicalizes the trust path and prevents symlink spoofing', async () => {
-    // Enable folder trust for this test
-    rig.setup('symlink spoofing test', {
-      settings: {
-        security: {
-          folderTrust: {
-            enabled: true,
-          },
-        },
-      },
+    beforeEach(() => {
+      rig = new TestRig();
     });
 
-    const realExtPath = join(rig.testDir!, 'real-extension');
-    mkdirSync(realExtPath);
-    writeFileSync(join(realExtPath, 'gemini-extension.json'), extension);
+    afterEach(async () => await rig.cleanup());
 
-    const maliciousExtPath = join(
-      os.tmpdir(),
-      `malicious-extension-${Date.now()}`,
-    );
-    mkdirSync(maliciousExtPath);
-    writeFileSync(
-      join(maliciousExtPath, 'gemini-extension.json'),
-      otherExtension,
-    );
-
-    const symlinkPath = join(rig.testDir!, 'symlink-extension');
-    symlinkSync(realExtPath, symlinkPath);
-
-    // Function to run a command with a PTY to avoid headless mode
-    const runPty =
(args: string[]) => {
-      const ptyProcess = pty.spawn(process.execPath, [BUNDLE_PATH, ...args], {
-        name: 'xterm-color',
-        cols: 80,
-        rows: 80,
-        cwd: rig.testDir!,
-        env: {
-          ...process.env,
-          GEMINI_CLI_HOME: rig.homeDir!,
-          GEMINI_CLI_INTEGRATION_TEST: 'true',
-          GEMINI_PTY_INFO: 'node-pty',
+    it('canonicalizes the trust path and prevents symlink spoofing', async () => {
+      // Enable folder trust for this test
+      rig.setup('symlink spoofing test', {
+        settings: {
+          security: {
+            folderTrust: {
+              enabled: true,
+            },
+          },
         },
       });
-      return new InteractiveRun(ptyProcess);
-    };
 
-    // 1. Install via symlink, trust it
-    const run1 = runPty(['extensions', 'install', symlinkPath]);
-    await run1.expectText('Do you want to trust this folder', 30000);
-    await run1.type('y\r');
-    await run1.expectText('trust this workspace', 30000);
-    await run1.type('y\r');
-    await run1.expectText('Do you want to continue', 30000);
-    await run1.type('y\r');
-    await run1.expectText('installed successfully', 30000);
-    await run1.kill();
+      const realExtPath = join(rig.testDir!, 'real-extension');
+      mkdirSync(realExtPath);
+      writeFileSync(join(realExtPath, 'gemini-extension.json'), extension);
 
-    // 2.
Verify trustedFolders.json contains the REAL path, not the symlink path
-    const trustedFoldersPath = join(
-      rig.homeDir!,
-      GEMINI_DIR,
-      'trustedFolders.json',
-    );
-    // Wait for file to be written
-    let attempts = 0;
-    while (!fs.existsSync(trustedFoldersPath) && attempts < 50) {
-      await new Promise((resolve) => setTimeout(resolve, 100));
-      attempts++;
-    }
+      const maliciousExtPath = join(
+        os.tmpdir(),
+        `malicious-extension-${Date.now()}`,
+      );
+      mkdirSync(maliciousExtPath);
+      writeFileSync(
+        join(maliciousExtPath, 'gemini-extension.json'),
+        otherExtension,
+      );
 
-    const trustedFolders = JSON.parse(
-      readFileSync(trustedFoldersPath, 'utf-8'),
-    );
-    const trustedPaths = Object.keys(trustedFolders);
-    const canonicalRealExtPath = fs.realpathSync(realExtPath);
+      const symlinkPath = join(rig.testDir!, 'symlink-extension');
+      symlinkSync(realExtPath, symlinkPath);
 
-    expect(trustedPaths).toContain(canonicalRealExtPath);
-    expect(trustedPaths).not.toContain(symlinkPath);
+      // Function to run a command with a PTY to avoid headless mode
+      const runPty = (args: string[]) => {
+        const ptyProcess = pty.spawn(process.execPath, [BUNDLE_PATH, ...args], {
+          name: 'xterm-color',
+          cols: 80,
+          rows: 80,
+          cwd: rig.testDir!,
+          env: {
+            ...process.env,
+            GEMINI_CLI_HOME: rig.homeDir!,
+            GEMINI_CLI_INTEGRATION_TEST: 'true',
+            GEMINI_PTY_INFO: 'node-pty',
+          },
+        });
+        return new InteractiveRun(ptyProcess);
+      };
 
-    // 3. Swap the symlink to point to the malicious extension
-    unlinkSync(symlinkPath);
-    symlinkSync(maliciousExtPath, symlinkPath);
+      // 1.
Install via symlink, trust it
+      const run1 = runPty(['extensions', 'install', symlinkPath]);
+      await run1.expectText('Do you want to trust this folder', 30000);
+      await run1.type('y\r');
+      await run1.expectText('trust this workspace', 30000);
+      await run1.type('y\r');
+      await run1.expectText('Do you want to continue', 30000);
+      await run1.type('y\r');
+      await run1.expectText('installed successfully', 30000);
+      await run1.kill();
 
-    // 4. Try to install again via the same symlink path.
-    // It should NOT be trusted because the real path changed.
-    const run2 = runPty(['extensions', 'install', symlinkPath]);
-    await run2.expectText('Do you want to trust this folder', 30000);
-    await run2.type('n\r');
-    await run2.expectText('Installation aborted', 30000);
-    await run2.kill();
-  }, 60000);
-});
+      // 2. Verify trustedFolders.json contains the REAL path, not the symlink path
+      const trustedFoldersPath = join(
+        rig.homeDir!,
+        GEMINI_DIR,
+        'trustedFolders.json',
+      );
+      // Wait for file to be written
+      let attempts = 0;
+      while (!fs.existsSync(trustedFoldersPath) && attempts < 50) {
+        await new Promise((resolve) => setTimeout(resolve, 100));
+        attempts++;
+      }
+
+      const trustedFolders = JSON.parse(
+        readFileSync(trustedFoldersPath, 'utf-8'),
+      );
+      const trustedPaths = Object.keys(trustedFolders);
+      const canonicalRealExtPath = fs.realpathSync(realExtPath);
+
+      expect(trustedPaths).toContain(canonicalRealExtPath);
+      expect(trustedPaths).not.toContain(symlinkPath);
+
+      // 3. Swap the symlink to point to the malicious extension
+      unlinkSync(symlinkPath);
+      symlinkSync(maliciousExtPath, symlinkPath);
+
+      // 4. Try to install again via the same symlink path.
+      // It should NOT be trusted because the real path changed.
+      const run2 = runPty(['extensions', 'install', symlinkPath]);
+      await run2.expectText('Do you want to trust this folder', 30000);
+      await run2.type('n\r');
+      await run2.expectText('Installation aborted', 30000);
+      await run2.kill();
+    }, 60000);
+  },
+);
diff --git a/integration-tests/test-helper.ts b/integration-tests/test-helper.ts
index a4546a2cd3..5f205ae997 100644
--- a/integration-tests/test-helper.ts
+++ b/integration-tests/test-helper.ts
@@ -6,3 +6,5 @@
 
 export * from '@google/gemini-cli-test-utils';
 export { normalizePath } from '@google/gemini-cli-test-utils';
+
+export const skipFlaky = !process.env['RUN_FLAKY_INTEGRATION'];
diff --git a/integration-tests/test-mcp-support.responses b/integration-tests/test-mcp-support.responses
new file mode 100644
index 0000000000..1db32fdc21
--- /dev/null
+++ b/integration-tests/test-mcp-support.responses
@@ -0,0 +1,2 @@
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"mcp_weather-server_get_weather","args":{"location":"London"}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":10,"candidatesTokenCount":10,"totalTokenCount":20}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"The weather in London is rainy."}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":10,"candidatesTokenCount":10,"totalTokenCount":20}}]}
diff --git a/integration-tests/test-mcp-support.test.ts b/integration-tests/test-mcp-support.test.ts
new file mode 100644
index 0000000000..15266e6be9
--- /dev/null
+++ b/integration-tests/test-mcp-support.test.ts
@@ -0,0 +1,75 @@
+/**
+ * @license
+ * Copyright 2026 Google LLC
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+import { describe, it, expect, beforeEach, afterEach } from 'vitest';
+import {
+  TestRig,
+  assertModelHasOutput,
+  TestMcpServerBuilder,
+} from './test-helper.js';
+import { join, dirname } from 'node:path';
+import { fileURLToPath } from 'node:url';
+import fs from 'node:fs';
+
+const __dirname = dirname(fileURLToPath(import.meta.url));
+
+describe('test-mcp-support', () => {
+  let rig: TestRig;
+
+  beforeEach(() => {
+    rig = new TestRig();
+  });
+
+  afterEach(async () => await rig.cleanup());
+
+  it('should discover and call a tool on the test server', async () => {
+    await rig.setup('test-mcp-test', {
+      settings: {
+        tools: { core: [] }, // disable core tools to force using MCP
+        model: {
+          name: 'gemini-3-flash-preview',
+        },
+      },
+      fakeResponsesPath: join(__dirname, 'test-mcp-support.responses'),
+    });
+
+    // Workaround for ProjectRegistry save issue
+    const userGeminiDir = join(rig.homeDir!, '.gemini');
+    fs.writeFileSync(join(userGeminiDir, 'projects.json'), '{"projects":{}}');
+
+    const builder = new TestMcpServerBuilder('weather-server').addTool(
+      'get_weather',
+      'Get the weather for a location',
+      'The weather in London is always rainy.',
+      {
+        type: 'object',
+        properties: {
+          location: { type: 'string' },
+        },
+      },
+    );
+
+    rig.addTestMcpServer('weather-server', builder.build());
+
+    // Run the CLI asking for weather
+    const output = await rig.run({
+      args: 'What is the weather in London?
Answer with the raw tool response snippet.',
+      env: { GEMINI_API_KEY: 'dummy' },
+    });
+
+    // Assert tool call
+    const foundToolCall = await rig.waitForToolCall(
+      'mcp_weather-server_get_weather',
+    );
+    expect(
+      foundToolCall,
+      'Expected to find a get_weather tool call',
+    ).toBeTruthy();
+
+    assertModelHasOutput(output);
+    expect(output.toLowerCase()).toContain('rainy');
+  }, 30000);
+});
diff --git a/package-lock.json b/package-lock.json
index b70dc1413b..b4fdfdb439 100644
--- a/package-lock.json
+++ b/package-lock.json
@@ -8696,9 +8696,9 @@
       "license": "BSD-3-Clause"
     },
     "node_modules/fast-xml-builder": {
-      "version": "1.1.2",
-      "resolved": "https://registry.npmjs.org/fast-xml-builder/-/fast-xml-builder-1.1.2.tgz",
-      "integrity": "sha512-NJAmiuVaJEjVa7TjLZKlYd7RqmzOC91EtPFXHvlTcqBVo50Qh7XV5IwvXi1c7NRz2Q/majGX9YLcwJtWgHjtkA==",
+      "version": "1.1.4",
+      "resolved": "https://registry.npmjs.org/fast-xml-builder/-/fast-xml-builder-1.1.4.tgz",
+      "integrity": "sha512-f2jhpN4Eccy0/Uz9csxh3Nu6q4ErKxf0XIsasomfOihuSUa3/xw6w8dnOtCDgEItQFJG8KyXPzQXzcODDrrbOg==",
       "funding": [
         {
           "type": "github",
@@ -8711,9 +8711,9 @@
       }
     },
     "node_modules/fast-xml-parser": {
-      "version": "5.5.3",
-      "resolved": "https://registry.npmjs.org/fast-xml-parser/-/fast-xml-parser-5.5.3.tgz",
-      "integrity": "sha512-Ymnuefk6VzAhT3SxLzVUw+nMio/wB1NGypHkgetwtXcK1JfryaHk4DWQFGVwQ9XgzyS5iRZ7C2ZGI4AMsdMZ6A==",
+      "version": "5.5.9",
+      "resolved": "https://registry.npmjs.org/fast-xml-parser/-/fast-xml-parser-5.5.9.tgz",
+      "integrity": "sha512-jldvxr1MC6rtiZKgrFnDSvT8xuH+eJqxqOBThUVjYrxssYTo1avZLGql5l0a0BAERR01CadYzZ83kVEkbyDg+g==",
       "funding": [
         {
           "type": "github",
@@ -8722,9 +8722,9 @@
       ],
       "license": "MIT",
       "dependencies": {
-        "fast-xml-builder": "^1.1.2",
-        "path-expression-matcher": "^1.1.3",
-        "strnum": "^2.1.2"
+        "fast-xml-builder": "^1.1.4",
+        "path-expression-matcher": "^1.2.0",
+        "strnum": "^2.2.2"
       },
       "bin": {
         "fxparser": "src/cli/cli.js"
@@ -8900,9 +8900,9 @@
       }
     },
     "node_modules/flatted": {
-      "version":
"3.3.3",
-      "resolved": "https://registry.npmjs.org/flatted/-/flatted-3.3.3.tgz",
-      "integrity": "sha512-GX+ysw4PBCz0PzosHDepZGANEuFCMLrnRTiEy9McGjmkCQYwRq4A/X786G/fjM/+OjsWSU1ZrY5qyARZmO/uwg==",
+      "version": "3.4.2",
+      "resolved": "https://registry.npmjs.org/flatted/-/flatted-3.4.2.tgz",
+      "integrity": "sha512-PjDse7RzhcPkIJwy5t7KPWQSZ9cAbzQXcafsetQoD7sOJRQlGikNbx7yZp2OotDnJyrDcbyRq3Ttb18iYOqkxA==",
       "dev": true,
       "license": "ISC"
     },
@@ -13200,9 +13200,9 @@
       }
     },
     "node_modules/path-expression-matcher": {
-      "version": "1.1.3",
-      "resolved": "https://registry.npmjs.org/path-expression-matcher/-/path-expression-matcher-1.1.3.tgz",
-      "integrity": "sha512-qdVgY8KXmVdJZRSS1JdEPOKPdTiEK/pi0RkcT2sw1RhXxohdujUlJFPuS1TSkevZ9vzd3ZlL7ULl1MHGTApKzQ==",
+      "version": "1.2.0",
+      "resolved": "https://registry.npmjs.org/path-expression-matcher/-/path-expression-matcher-1.2.0.tgz",
+      "integrity": "sha512-DwmPWeFn+tq7TiyJ2CxezCAirXjFxvaiD03npak3cRjlP9+OjTmSy1EpIrEbh+l6JgUundniloMLDQ/6VTdhLQ==",
       "funding": [
         {
           "type": "github",
@@ -15465,9 +15465,9 @@
       }
     },
     "node_modules/strnum": {
-      "version": "2.2.0",
-      "resolved": "https://registry.npmjs.org/strnum/-/strnum-2.2.0.tgz",
-      "integrity": "sha512-Y7Bj8XyJxnPAORMZj/xltsfo55uOiyHcU2tnAVzHUnSJR/KsEX+9RoDeXEnsXtl/CX4fAcrt64gZ13aGaWPeBg==",
+      "version": "2.2.2",
+      "resolved": "https://registry.npmjs.org/strnum/-/strnum-2.2.2.tgz",
+      "integrity": "sha512-DnR90I+jtXNSTXWdwrEy9FakW7UX+qUZg28gj5fk2vxxl7uS/3bpI4fjFYVmdK9etptYBPNkpahuQnEwhwECqA==",
       "funding": [
         {
           "type": "github",
@@ -16469,9 +16469,9 @@
       "license": "MIT"
     },
     "node_modules/undici": {
-      "version": "7.19.0",
-      "resolved": "https://registry.npmjs.org/undici/-/undici-7.19.0.tgz",
-      "integrity": "sha512-Heho1hJD81YChi+uS2RkSjcVO+EQLmLSyUlHyp7Y/wFbxQaGb4WXVKD073JytrjXJVkSZVzoE2MCSOKugFGtOQ==",
+      "version": "7.24.5",
+      "resolved": "https://registry.npmjs.org/undici/-/undici-7.24.5.tgz",
+      "integrity":
"sha512-3IWdCpjgxp15CbJnsi/Y9TCDE7HWVN19j1hmzVhoAkY/+CJx449tVxT5wZc1Gwg8J+P0LWvzlBzxYRnHJ+1i7Q==",
       "license": "MIT",
       "engines": {
         "node": ">=20.18.1"
diff --git a/package.json b/package.json
index 414f9341ac..d66132c066 100644
--- a/package.json
+++ b/package.json
@@ -48,6 +48,7 @@
     "test:all_evals": "cross-env RUN_EVALS=1 vitest run --config evals/vitest.config.ts",
     "test:e2e": "cross-env VERBOSE=true KEEP_OUTPUT=true npm run test:integration:sandbox:none",
     "test:integration:all": "npm run test:integration:sandbox:none && npm run test:integration:sandbox:docker && npm run test:integration:sandbox:podman",
+    "test:integration:flaky": "cross-env RUN_FLAKY_INTEGRATION=1 npm run test:integration:sandbox:none",
     "test:integration:sandbox:none": "cross-env GEMINI_SANDBOX=false vitest run --root ./integration-tests",
     "test:integration:sandbox:docker": "cross-env GEMINI_SANDBOX=docker npm run build:sandbox && cross-env GEMINI_SANDBOX=docker vitest run --root ./integration-tests",
     "test:integration:sandbox:podman": "cross-env GEMINI_SANDBOX=podman vitest run --root ./integration-tests",
diff --git a/packages/a2a-server/src/config/config.test.ts b/packages/a2a-server/src/config/config.test.ts
index cfe77311ea..007f1d5f06 100644
--- a/packages/a2a-server/src/config/config.test.ts
+++ b/packages/a2a-server/src/config/config.test.ts
@@ -29,6 +29,7 @@ vi.mock('@google/gemini-cli-core', async (importOriginal) => {
     await importOriginal();
   return {
     ...actual,
+    PRIORITY_YOLO_ALLOW_ALL: 998,
     Config: vi.fn().mockImplementation((params) => {
       const mockConfig = {
         ...params,
diff --git a/packages/a2a-server/src/config/config.ts b/packages/a2a-server/src/config/config.ts
index 9474c4d9c5..c3561629b6 100644
--- a/packages/a2a-server/src/config/config.ts
+++ b/packages/a2a-server/src/config/config.ts
@@ -87,6 +87,7 @@ export async function loadConfig(
       approvalMode === ApprovalMode.YOLO
         ?
[
             {
+              toolName: '*',
               decision: PolicyDecision.ALLOW,
               priority: PRIORITY_YOLO_ALLOW_ALL,
               modes: [ApprovalMode.YOLO],
diff --git a/packages/a2a-server/src/utils/testing_utils.ts b/packages/a2a-server/src/utils/testing_utils.ts
index fd4d721732..8181f702f1 100644
--- a/packages/a2a-server/src/utils/testing_utils.ts
+++ b/packages/a2a-server/src/utils/testing_utils.ts
@@ -97,6 +97,7 @@ export function createMockConfig(
     getMcpClientManager: vi.fn().mockReturnValue({
       getMcpServers: vi.fn().mockReturnValue({}),
     }),
+    getTelemetryLogPromptsEnabled: vi.fn().mockReturnValue(false),
     getGitService: vi.fn(),
     validatePathAccess: vi.fn().mockReturnValue(undefined),
     getShellExecutionConfig: vi.fn().mockReturnValue({
diff --git a/packages/cli/GEMINI.md b/packages/cli/GEMINI.md
index e98ca81376..8bad8f0721 100644
--- a/packages/cli/GEMINI.md
+++ b/packages/cli/GEMINI.md
@@ -7,7 +7,10 @@
 - **Shortcuts**: only define keyboard shortcuts in
   `packages/cli/src/ui/key/keyBindings.ts`
 - Do not implement any logic performing custom string measurement or string
-  truncation. Use Ink layout instead leveraging ResizeObserver as needed.
+  truncation. Use Ink layout instead leveraging ResizeObserver as needed. When
+  using `ResizeObserver`, prefer the `useCallback` ref pattern (as seen in
+  `MaxSizedBox.tsx`) to ensure size measurements are captured as soon as the
+  element is available, avoiding potential rendering timing issues.
 - Avoid prop drilling when at all possible.
 
 ## Testing
diff --git a/packages/cli/index.ts b/packages/cli/index.ts
index 5444fe1b74..fa6537d7bf 100644
--- a/packages/cli/index.ts
+++ b/packages/cli/index.ts
@@ -6,12 +6,19 @@
  * SPDX-License-Identifier: Apache-2.0
  */
 
-import { main } from './src/gemini.js';
-import { FatalError, writeToStderr } from '@google/gemini-cli-core';
-import { runExitCleanup } from './src/utils/cleanup.js';
+// --- Fast Path for Version ---
+// We check for version flags at the very top to avoid loading any heavy dependencies.
+// process.env.CLI_VERSION is defined during the build process by esbuild.
+if (process.argv.includes('--version') || process.argv.includes('-v')) {
+  console.log(process.env['CLI_VERSION'] || 'unknown');
+  process.exit(0);
+}
 
 // --- Global Entry Point ---
 
+let writeToStderrFn: (message: string) => void = (msg) =>
+  process.stderr.write(msg);
+
 // Suppress known race condition error in node-pty on Windows
 // Tracking bug: https://github.com/microsoft/node-pty/issues/827
 process.on('uncaughtException', (error) => {
@@ -28,13 +35,22 @@ process.on('uncaughtException', (error) => {
   // For other errors, we rely on the default behavior, but since we attached a listener,
   // we must manually replicate it.
   if (error instanceof Error) {
-    writeToStderr(error.stack + '\n');
+    writeToStderrFn(error.stack + '\n');
   } else {
-    writeToStderr(String(error) + '\n');
+    writeToStderrFn(String(error) + '\n');
   }
   process.exit(1);
 });
 
+const [{ main }, { FatalError, writeToStderr }, { runExitCleanup }] =
+  await Promise.all([
+    import('./src/gemini.js'),
+    import('@google/gemini-cli-core'),
+    import('./src/utils/cleanup.js'),
+  ]);
+
+writeToStderrFn = writeToStderr;
+
 main().catch(async (error) => {
   // Set a timeout to force exit if cleanup hangs
   const cleanupTimeout = setTimeout(() => {
diff --git a/packages/cli/src/acp/acpClient.test.ts b/packages/cli/src/acp/acpClient.test.ts
index 0f9c4a8e5b..3ae71e6ebb 100644
--- a/packages/cli/src/acp/acpClient.test.ts
+++ b/packages/cli/src/acp/acpClient.test.ts
@@ -1080,6 +1080,70 @@ describe('Session', () => {
     );
   });
 
+  it('should split getDisplayTitle and getExplanation for title and content in permission request', async () => {
+    const confirmationDetails = {
+      type: 'info',
+      onConfirm: vi.fn(),
+    };
+    mockTool.build.mockReturnValue({
+      getDescription: () => 'Original Description',
+      getDisplayTitle: () => 'Display Title Only',
+      getExplanation: () => 'A detailed explanation text',
+      toolLocations: () => [],
+      shouldConfirmExecute:
vi.fn().mockResolvedValue(confirmationDetails),
+      execute: vi.fn().mockResolvedValue({ llmContent: 'Tool Result' }),
+    });
+
+    mockConnection.requestPermission.mockResolvedValue({
+      outcome: {
+        outcome: 'selected',
+        optionId: ToolConfirmationOutcome.ProceedOnce,
+      },
+    });
+
+    const stream1 = createMockStream([
+      {
+        type: StreamEventType.CHUNK,
+        value: {
+          functionCalls: [{ name: 'test_tool', args: {} }],
+        },
+      },
+    ]);
+    const stream2 = createMockStream([
+      {
+        type: StreamEventType.CHUNK,
+        value: { candidates: [] },
+      },
+    ]);
+
+    mockChat.sendMessageStream
+      .mockResolvedValueOnce(stream1)
+      .mockResolvedValueOnce(stream2);
+
+    await session.prompt({
+      sessionId: 'session-1',
+      prompt: [{ type: 'text', text: 'Call tool' }],
+    });
+
+    expect(mockConnection.requestPermission).toHaveBeenCalledWith(
+      expect.objectContaining({
+        toolCall: expect.objectContaining({
+          title: 'Display Title Only',
+          content: [],
+        }),
+      }),
+    );
+
+    expect(mockConnection.sessionUpdate).toHaveBeenCalledWith(
+      expect.objectContaining({
+        update: expect.objectContaining({
+          sessionUpdate: 'agent_thought_chunk',
+          content: { type: 'text', text: 'A detailed explanation text' },
+        }),
+      }),
+    );
+  });
+
   it('should use filePath for ACP diff content in tool result', async () => {
     mockTool.build.mockReturnValue({
       getDescription: () => 'Test Tool',
diff --git a/packages/cli/src/acp/acpClient.ts b/packages/cli/src/acp/acpClient.ts
index 5e3f3666b1..57903822e9 100644
--- a/packages/cli/src/acp/acpClient.ts
+++ b/packages/cli/src/acp/acpClient.ts
@@ -98,6 +98,12 @@ export async function runAcpClient(
 }
 
 export class GeminiAgent {
+  private static callIdCounter = 0;
+
+  static generateCallId(name: string): string {
+    return `${name}-${Date.now()}-${++GeminiAgent.callIdCounter}`;
+  }
+
   private sessions: Map = new Map();
   private clientCapabilities: acp.ClientCapabilities | undefined;
   private apiKey: string | undefined;
@@ -294,6 +300,7 @@ export class GeminiAgent {
         sessionId,
this.clientCapabilities.fs,
         config.getFileSystemService(),
+        cwd,
       );
       config.setFileSystemService(acpFileSystemService);
     }
@@ -351,16 +358,6 @@ export class GeminiAgent {
     const { sessionData, sessionPath } =
       await sessionSelector.resolveSession(sessionId);
 
-    if (this.clientCapabilities?.fs) {
-      const acpFileSystemService = new AcpFileSystemService(
-        this.connection,
-        sessionId,
-        this.clientCapabilities.fs,
-        config.getFileSystemService(),
-      );
-      config.setFileSystemService(acpFileSystemService);
-    }
-
     const clientHistory = convertSessionToClientHistory(sessionData.messages);
 
     const geminiClient = config.getGeminiClient();
@@ -434,7 +431,19 @@ export class GeminiAgent {
       throw acp.RequestError.authRequired();
     }
 
-    // 3. Now that we are authenticated, it is safe to initialize the config
+    // 3. Set the ACP FileSystemService (if supported) before config initialization
+    if (this.clientCapabilities?.fs) {
+      const acpFileSystemService = new AcpFileSystemService(
+        this.connection,
+        sessionId,
+        this.clientCapabilities.fs,
+        config.getFileSystemService(),
+        cwd,
+      );
+      config.setFileSystemService(acpFileSystemService);
+    }
+
+    // 4. Now that we are authenticated, it is safe to initialize the config
     // which starts the MCP servers and other heavy resources.
     await config.initialize();
     startupProfiler.flush(config);
@@ -897,7 +906,7 @@ export class Session {
     promptId: string,
     fc: FunctionCall,
   ): Promise {
-    const callId = fc.id ?? `${fc.name}-${Date.now()}`;
+    const callId = fc.id ?? GeminiAgent.generateCallId(fc.name || 'unknown');
     const args = fc.args ?? {};
 
     const startTime = Date.now();
@@ -947,6 +956,23 @@ export class Session {
     try {
       const invocation = tool.build(args);
 
+      const displayTitle =
+        typeof invocation.getDisplayTitle === 'function'
+          ? invocation.getDisplayTitle()
+          : invocation.getDescription();
+
+      const explanation =
+        typeof invocation.getExplanation === 'function'
+          ?
invocation.getExplanation()
+          : '';
+
+      if (explanation) {
+        await this.sendUpdate({
+          sessionUpdate: 'agent_thought_chunk',
+          content: { type: 'text', text: explanation },
+        });
+      }
+
       const confirmationDetails =
         await invocation.shouldConfirmExecute(abortSignal);
 
@@ -978,7 +1004,7 @@ export class Session {
           toolCall: {
             toolCallId: callId,
             status: 'pending',
-            title: invocation.getDescription(),
+            title: displayTitle,
             content,
             locations: invocation.toolLocations(),
             kind: toAcpToolKind(tool.kind),
@@ -1014,12 +1040,14 @@ export class Session {
           }
         }
       } else {
+        const content: acp.ToolCallContent[] = [];
+
         await this.sendUpdate({
           sessionUpdate: 'tool_call',
           toolCallId: callId,
           status: 'in_progress',
-          title: invocation.getDescription(),
-          content: [],
+          title: displayTitle,
+          content,
           locations: invocation.toolLocations(),
           kind: toAcpToolKind(tool.kind),
         });
@@ -1028,12 +1056,14 @@ export class Session {
       const toolResult: ToolResult = await invocation.execute(abortSignal);
       const content = toToolCallContent(toolResult);
 
+      const updateContent: acp.ToolCallContent[] = content ?
[content] : [], + title: displayTitle, + content: updateContent, locations: invocation.toolLocations(), kind: toAcpToolKind(tool.kind), }); @@ -1370,7 +1400,7 @@ export class Session { include: pathSpecsToRead, }; - const callId = `${readManyFilesTool.name}-${Date.now()}`; + const callId = GeminiAgent.generateCallId(readManyFilesTool.name); try { const invocation = readManyFilesTool.build(toolArgs); @@ -1598,6 +1628,7 @@ function toPermissionOptions( case 'info': case 'ask_user': case 'exit_plan_mode': + case 'sandbox_expansion': break; default: { const unreachable: never = confirmation; diff --git a/packages/cli/src/acp/fileSystemService.test.ts b/packages/cli/src/acp/fileSystemService.test.ts index 66624d5449..188aadbc09 100644 --- a/packages/cli/src/acp/fileSystemService.test.ts +++ b/packages/cli/src/acp/fileSystemService.test.ts @@ -4,10 +4,25 @@ * SPDX-License-Identifier: Apache-2.0 */ -import { describe, it, expect, vi, beforeEach, type Mocked } from 'vitest'; +import { + describe, + it, + expect, + vi, + beforeEach, + afterEach, + type Mocked, +} from 'vitest'; import { AcpFileSystemService } from './fileSystemService.js'; import type { AgentSideConnection } from '@agentclientprotocol/sdk'; import type { FileSystemService } from '@google/gemini-cli-core'; +import os from 'node:os'; + +vi.mock('node:os', () => ({ + default: { + homedir: vi.fn(), + }, +})); describe('AcpFileSystemService', () => { let mockConnection: Mocked; @@ -25,13 +40,19 @@ describe('AcpFileSystemService', () => { readTextFile: vi.fn(), writeTextFile: vi.fn(), }; + vi.mocked(os.homedir).mockReturnValue('/home/user'); + }); + + afterEach(() => { + vi.restoreAllMocks(); }); describe('readTextFile', () => { it.each([ { capability: true, - desc: 'connection if capability exists', + path: '/path/to/file', + desc: 'connection if capability exists and file is inside root', setup: () => { mockConnection.readTextFile.mockResolvedValue({ content: 'content' }); }, @@ -45,6 +66,7 @@ 
describe('AcpFileSystemService', () => {
       },
       {
         capability: false,
+        path: '/path/to/file',
         desc: 'fallback if capability missing',
         setup: () => {
           mockFallback.readTextFile.mockResolvedValue('content');
@@ -56,19 +78,72 @@ describe('AcpFileSystemService', () => {
           expect(mockConnection.readTextFile).not.toHaveBeenCalled();
         },
       },
-    ])('should use $desc', async ({ capability, setup, verify }) => {
+      {
+        capability: true,
+        path: '/outside/file',
+        desc: 'fallback if capability exists but file is outside root',
+        setup: () => {
+          mockFallback.readTextFile.mockResolvedValue('content');
+        },
+        verify: () => {
+          expect(mockFallback.readTextFile).toHaveBeenCalledWith(
+            '/outside/file',
+          );
+          expect(mockConnection.readTextFile).not.toHaveBeenCalled();
+        },
+      },
+      {
+        capability: true,
+        path: '/home/user/.gemini/tmp/file.md',
+        root: '/home/user',
+        desc: 'fallback if file is inside global gemini dir, even if root overlaps',
+        setup: () => {
+          mockFallback.readTextFile.mockResolvedValue('content');
+        },
+        verify: () => {
+          expect(mockFallback.readTextFile).toHaveBeenCalledWith(
+            '/home/user/.gemini/tmp/file.md',
+          );
+          expect(mockConnection.readTextFile).not.toHaveBeenCalled();
+        },
+      },
+    ])(
+      'should use $desc',
+      async ({ capability, path, root, setup, verify }) => {
+        service = new AcpFileSystemService(
+          mockConnection,
+          'session-1',
+          { readTextFile: capability, writeTextFile: true },
+          mockFallback,
+          root || '/path/to',
+        );
+        setup();
+
+        const result = await service.readTextFile(path);
+
+        expect(result).toBe('content');
+        verify();
+      },
+    );
+
+    it('should throw normalized ENOENT error when readTextFile encounters "Resource not found"', async () => {
       service = new AcpFileSystemService(
         mockConnection,
         'session-1',
-        { readTextFile: capability, writeTextFile: true },
+        { readTextFile: true, writeTextFile: true },
         mockFallback,
+        '/path/to',
+      );
+      mockConnection.readTextFile.mockRejectedValue(
+        new Error('Resource not found for document'),
       );
-      setup();
 
-
const result = await service.readTextFile('/path/to/file');
-
-      expect(result).toBe('content');
-      verify();
+      await expect(
+        service.readTextFile('/path/to/missing'),
+      ).rejects.toMatchObject({
+        code: 'ENOENT',
+        message: 'Resource not found for document',
+      });
     });
   });
 
@@ -76,7 +151,8 @@ describe('AcpFileSystemService', () => {
     it.each([
       {
         capability: true,
-        desc: 'connection if capability exists',
+        path: '/path/to/file',
+        desc: 'connection if capability exists and file is inside root',
         verify: () => {
           expect(mockConnection.writeTextFile).toHaveBeenCalledWith({
             path: '/path/to/file',
@@ -88,6 +164,7 @@ describe('AcpFileSystemService', () => {
       },
       {
         capability: false,
+        path: '/path/to/file',
         desc: 'fallback if capability missing',
         verify: () => {
           expect(mockFallback.writeTextFile).toHaveBeenCalledWith(
@@ -97,17 +174,63 @@ describe('AcpFileSystemService', () => {
           expect(mockConnection.writeTextFile).not.toHaveBeenCalled();
         },
       },
-    ])('should use $desc', async ({ capability, verify }) => {
+      {
+        capability: true,
+        path: '/outside/file',
+        desc: 'fallback if capability exists but file is outside root',
+        verify: () => {
+          expect(mockFallback.writeTextFile).toHaveBeenCalledWith(
+            '/outside/file',
+            'content',
+          );
+          expect(mockConnection.writeTextFile).not.toHaveBeenCalled();
+        },
+      },
+      {
+        capability: true,
+        path: '/home/user/.gemini/tmp/file.md',
+        root: '/home/user',
+        desc: 'fallback if file is inside global gemini dir, even if root overlaps',
+        verify: () => {
+          expect(mockFallback.writeTextFile).toHaveBeenCalledWith(
+            '/home/user/.gemini/tmp/file.md',
+            'content',
+          );
+          expect(mockConnection.writeTextFile).not.toHaveBeenCalled();
+        },
+      },
+    ])('should use $desc', async ({ capability, path, root, verify }) => {
       service = new AcpFileSystemService(
         mockConnection,
         'session-1',
         { writeTextFile: capability, readTextFile: true },
         mockFallback,
+        root || '/path/to',
       );
 
-      await service.writeTextFile('/path/to/file', 'content');
+      await
service.writeTextFile(path, 'content');
 
       verify();
     });
+
+    it('should throw normalized ENOENT error when writeTextFile encounters "Resource not found"', async () => {
+      service = new AcpFileSystemService(
+        mockConnection,
+        'session-1',
+        { readTextFile: true, writeTextFile: true },
+        mockFallback,
+        '/path/to',
+      );
+      mockConnection.writeTextFile.mockRejectedValue(
+        new Error('Resource not found for directory'),
+      );
+
+      await expect(
+        service.writeTextFile('/path/to/missing', 'content'),
+      ).rejects.toMatchObject({
+        code: 'ENOENT',
+        message: 'Resource not found for directory',
+      });
+    });
   });
 });
diff --git a/packages/cli/src/acp/fileSystemService.ts b/packages/cli/src/acp/fileSystemService.ts
index 02b9d68195..b020cd27f2 100644
--- a/packages/cli/src/acp/fileSystemService.ts
+++ b/packages/cli/src/acp/fileSystemService.ts
@@ -4,44 +4,82 @@
  * SPDX-License-Identifier: Apache-2.0
  */
 
-import type { FileSystemService } from '@google/gemini-cli-core';
+import { isWithinRoot, type FileSystemService } from '@google/gemini-cli-core';
 import type * as acp from '@agentclientprotocol/sdk';
+import os from 'node:os';
+import path from 'node:path';
 
 /**
  * ACP client-based implementation of FileSystemService
  */
 export class AcpFileSystemService implements FileSystemService {
+  private readonly geminiDir = path.join(os.homedir(), '.gemini');
+
   constructor(
     private readonly connection: acp.AgentSideConnection,
     private readonly sessionId: string,
     private readonly capabilities: acp.FileSystemCapabilities,
     private readonly fallback: FileSystemService,
+    private readonly root: string,
   ) {}
 
+  private shouldUseFallback(filePath: string): boolean {
+    // Files inside the global CLI directory must always use the native file system,
+    // even if the user runs the CLI directly from their home directory (which
+    // would make the IDE's project root overlap with the global directory).
+ return ( + !isWithinRoot(filePath, this.root) || + isWithinRoot(filePath, this.geminiDir) + ); + } + + private normalizeFileSystemError(err: unknown): never { + const errorMessage = err instanceof Error ? err.message : String(err); + if ( + errorMessage.includes('Resource not found') || + errorMessage.includes('ENOENT') || + errorMessage.includes('does not exist') || + errorMessage.includes('No such file') + ) { + const newErr = new Error(errorMessage) as NodeJS.ErrnoException; + newErr.code = 'ENOENT'; + throw newErr; + } + throw err; + } + async readTextFile(filePath: string): Promise<string> { - if (!this.capabilities.readTextFile) { + if (!this.capabilities.readTextFile || this.shouldUseFallback(filePath)) { return this.fallback.readTextFile(filePath); } - // eslint-disable-next-line @typescript-eslint/no-unsafe-assignment - const response = await this.connection.readTextFile({ - path: filePath, - sessionId: this.sessionId, - }); + try { + // eslint-disable-next-line @typescript-eslint/no-unsafe-assignment + const response = await this.connection.readTextFile({ + path: filePath, + sessionId: this.sessionId, + }); - // eslint-disable-next-line @typescript-eslint/no-unsafe-return - return response.content; + // eslint-disable-next-line @typescript-eslint/no-unsafe-return + return response.content; + } catch (err: unknown) { + this.normalizeFileSystemError(err); + } } async writeTextFile(filePath: string, content: string): Promise<void> { - if (!this.capabilities.writeTextFile) { + if (!this.capabilities.writeTextFile || this.shouldUseFallback(filePath)) { return this.fallback.writeTextFile(filePath, content); } - await this.connection.writeTextFile({ - path: filePath, - content, - sessionId: this.sessionId, - }); + try { + await this.connection.writeTextFile({ + path: filePath, + content, + sessionId: this.sessionId, + }); + } catch (err: unknown) { + this.normalizeFileSystemError(err); + } } } diff --git 
a/packages/cli/src/commands/extensions/examples/policies/policies/policies.toml b/packages/cli/src/commands/extensions/examples/policies/policies/policies.toml index d89d5e5737..225627c59b 100644 --- a/packages/cli/src/commands/extensions/examples/policies/policies/policies.toml +++ b/packages/cli/src/commands/extensions/examples/policies/policies/policies.toml @@ -16,7 +16,7 @@ toolName = "grep_search" argsPattern = "(\.env|id_rsa|passwd)" decision = "deny" priority = 200 -deny_message = "Access to sensitive credentials or system files is restricted by the policy-example extension." +denyMessage = "Access to sensitive credentials or system files is restricted by the policy-example extension." # Safety Checker: Apply path validation to all write operations. [[safety_checker]] diff --git a/packages/cli/src/config/config.test.ts b/packages/cli/src/config/config.test.ts index 2325711ad0..f312ddde4f 100644 --- a/packages/cli/src/config/config.test.ts +++ b/packages/cli/src/config/config.test.ts @@ -322,6 +322,41 @@ describe('parseArguments', () => { }, ); + describe('isCommand middleware', () => { + it.each([ + { cmd: 'mcp list', expected: true }, + { cmd: 'extensions list', expected: true }, + { cmd: 'extension list', expected: true }, + { cmd: 'skills list', expected: true }, + { cmd: 'skill list', expected: true }, + { cmd: 'hooks migrate', expected: true }, + { cmd: 'hook migrate', expected: true }, + { cmd: 'some query', expected: undefined }, + { cmd: 'hello world', expected: undefined }, + ])( + 'should set isCommand to $expected for "$cmd"', + async ({ cmd, expected }) => { + process.argv = ['node', 'script.js', ...cmd.split(' ')]; + const settings = createTestMergedSettings({ + admin: { + mcp: { enabled: true }, + }, + experimental: { + extensionManagement: true, + }, + skills: { + enabled: true, + }, + hooksConfig: { + enabled: true, + }, + }); + const parsedArgs = await parseArguments(settings); + expect(parsedArgs.isCommand).toBe(expected); + }, + ); + }); 
+ it.each([ { description: 'should allow --prompt without --prompt-interactive', diff --git a/packages/cli/src/config/config.ts b/packages/cli/src/config/config.ts index ea29bad02c..6c4455c32f 100755 --- a/packages/cli/src/config/config.ts +++ b/packages/cli/src/config/config.ts @@ -164,12 +164,104 @@ export async function parseArguments( .usage( 'Usage: gemini [options] [command]\n\nGemini CLI - Defaults to interactive mode. Use -p/--prompt for non-interactive (headless) mode.', ) + .option('isCommand', { + type: 'boolean', + hidden: true, + description: 'Internal flag to indicate if a subcommand is being run', + }) .option('debug', { alias: 'd', type: 'boolean', description: 'Run in debug mode (open debug console with F12)', default: false, }) + .middleware((argv) => { + const commandModules = [ + mcpCommand, + extensionsCommand, + skillsCommand, + hooksCommand, + ]; + + const subcommands = commandModules.flatMap((mod) => { + const names: string[] = []; + + const cmd = mod.command; + if (cmd) { + if (Array.isArray(cmd)) { + for (const c of cmd) { + names.push(String(c).split(' ')[0]); + } + } else { + names.push(String(cmd).split(' ')[0]); + } + } + + const aliases = mod.aliases; + if (aliases) { + if (Array.isArray(aliases)) { + for (const a of aliases) { + names.push(String(a).split(' ')[0]); + } + } else { + names.push(String(aliases).split(' ')[0]); + } + } + + return names; + }); + + const firstArg = argv._[0]; + if (typeof firstArg === 'string' && subcommands.includes(firstArg)) { + argv['isCommand'] = true; + } + }, true) + // Ensure validation flows through .fail() for clean UX + .fail((msg, err) => { + if (err) throw err; + throw new Error(msg); + }) + .check((argv) => { + // The 'query' positional can be a string (for one arg) or string[] (for multiple). + // This guard safely checks if any positional argument was provided. + const queryArg = argv['query']; + const query = + typeof queryArg === 'string' || Array.isArray(queryArg) + ? 
queryArg + : undefined; + const hasPositionalQuery = Array.isArray(query) + ? query.length > 0 + : !!query; + + if (argv['prompt'] && hasPositionalQuery) { + return 'Cannot use both a positional prompt and the --prompt (-p) flag together'; + } + if (argv['prompt'] && argv['promptInteractive']) { + return 'Cannot use both --prompt (-p) and --prompt-interactive (-i) together'; + } + if (argv['yolo'] && argv['approvalMode']) { + return 'Cannot use both --yolo (-y) and --approval-mode together. Use --approval-mode=yolo instead.'; + } + + const outputFormat = argv['outputFormat']; + if ( + typeof outputFormat === 'string' && + !['text', 'json', 'stream-json'].includes(outputFormat) + ) { + return `Invalid values:\n Argument: output-format, Given: "${outputFormat}", Choices: "text", "json", "stream-json"`; + } + if (argv['worktree'] && !settings.experimental?.worktrees) { + return 'The --worktree flag is only available when experimental.worktrees is enabled in your settings.'; + } + return true; + }); + + yargsInstance.command(mcpCommand); + yargsInstance.command(extensionsCommand); + yargsInstance.command(skillsCommand); + yargsInstance.command(hooksCommand); + + yargsInstance .command('$0 [query..]', 'Launch Gemini CLI', (yargsInstance) => yargsInstance .positional('query', { @@ -359,59 +451,6 @@ export async function parseArguments( coerce: coerceCommaSeparated, }), ) - // Register MCP subcommands - .command(mcpCommand) - // Ensure validation flows through .fail() for clean UX - .fail((msg, err) => { - if (err) throw err; - throw new Error(msg); - }) - .check((argv) => { - // The 'query' positional can be a string (for one arg) or string[] (for multiple). - // This guard safely checks if any positional argument was provided. - // eslint-disable-next-line @typescript-eslint/no-unsafe-type-assertion - const query = argv['query'] as string | string[] | undefined; - const hasPositionalQuery = Array.isArray(query) - ? 
query.length > 0 - : !!query; - - if (argv['prompt'] && hasPositionalQuery) { - return 'Cannot use both a positional prompt and the --prompt (-p) flag together'; - } - if (argv['prompt'] && argv['promptInteractive']) { - return 'Cannot use both --prompt (-p) and --prompt-interactive (-i) together'; - } - if (argv['yolo'] && argv['approvalMode']) { - return 'Cannot use both --yolo (-y) and --approval-mode together. Use --approval-mode=yolo instead.'; - } - if ( - argv['outputFormat'] && - !['text', 'json', 'stream-json'].includes( - // eslint-disable-next-line @typescript-eslint/no-unsafe-type-assertion - argv['outputFormat'] as string, - ) - ) { - return `Invalid values:\n Argument: output-format, Given: "${argv['outputFormat']}", Choices: "text", "json", "stream-json"`; - } - if (argv['worktree'] && !settings.experimental?.worktrees) { - return 'The --worktree flag is only available when experimental.worktrees is enabled in your settings.'; - } - return true; - }); - - if (settings.experimental?.extensionManagement) { - yargsInstance.command(extensionsCommand); - } - - if (settings.skills?.enabled ?? 
true) { - yargsInstance.command(skillsCommand); - } - // Register hooks command if hooks are enabled - if (settings.hooksConfig.enabled) { - yargsInstance.command(hooksCommand); - } - - yargsInstance .version(await getVersion()) // This will enable the --version flag based on package.json .alias('v', 'version') .help() diff --git a/packages/cli/src/config/extension-manager.ts b/packages/cli/src/config/extension-manager.ts index 04487bc5f8..65b3539794 100644 --- a/packages/cli/src/config/extension-manager.ts +++ b/packages/cli/src/config/extension-manager.ts @@ -614,7 +614,7 @@ Would you like to attempt to install via "git clone" instead?`, this.loadingPromise = (async () => { try { - if (this.settings.admin.extensions.enabled === false) { + if (this.settings.admin?.extensions?.enabled === false) { this.loadedExtensions = []; return this.loadedExtensions; } @@ -824,11 +824,11 @@ Would you like to attempt to install via "git clone" instead?`, } if (config.mcpServers) { - if (this.settings.admin.mcp.enabled === false) { + if (this.settings.admin?.mcp?.enabled === false) { config.mcpServers = undefined; } else { // Apply admin allowlist if configured - const adminAllowlist = this.settings.admin.mcp.config; + const adminAllowlist = this.settings.admin?.mcp?.config; if (adminAllowlist && Object.keys(adminAllowlist).length > 0) { const result = applyAdminAllowlist( config.mcpServers, @@ -1298,7 +1298,9 @@ export async function inferInstallMetadata( source.startsWith('http://') || source.startsWith('https://') || source.startsWith('git@') || - source.startsWith('sso://') + source.startsWith('sso://') || + source.startsWith('github:') || + source.startsWith('gitlab:') ) { return { source, diff --git a/packages/cli/src/config/policy-engine.integration.test.ts b/packages/cli/src/config/policy-engine.integration.test.ts index 2e74a28201..3b2a34ca69 100644 --- a/packages/cli/src/config/policy-engine.integration.test.ts +++ 
b/packages/cli/src/config/policy-engine.integration.test.ts @@ -381,6 +381,7 @@ describe('Policy Engine Integration Tests', () => { // Add a manual rule with annotations to the config config.rules = config.rules || []; config.rules.push({ + toolName: '*', toolAnnotations: { readOnlyHint: true }, decision: PolicyDecision.ALLOW, priority: 10, diff --git a/packages/cli/src/config/settingsSchema.ts b/packages/cli/src/config/settingsSchema.ts index 277dcfdcb9..c0f2395110 100644 --- a/packages/cli/src/config/settingsSchema.ts +++ b/packages/cli/src/config/settingsSchema.ts @@ -657,6 +657,16 @@ const SETTINGS_SCHEMA = { description: 'Hide the footer from the UI', showInDialog: true, }, + collapseDrawerDuringApproval: { + type: 'boolean', + label: 'Collapse Drawer During Approval', + category: 'UI', + requiresRestart: false, + default: true, + description: + 'Whether to collapse the UI drawer when a tool is awaiting confirmation.', + showInDialog: false, + }, showMemoryUsage: { type: 'boolean', label: 'Show Memory Usage', @@ -1198,6 +1208,16 @@ const SETTINGS_SCHEMA = { 'Disable user input on browser window during automation.', showInDialog: false, }, + maxActionsPerTask: { + type: 'number', + label: 'Max Actions Per Task', + category: 'Advanced', + requiresRestart: false, + default: 100, + description: + 'The maximum number of tool calls allowed per browser task. 
Enforcement is hard: the agent will be terminated when the limit is reached.', + showInDialog: false, + }, confirmSensitiveActions: { type: 'boolean', label: 'Confirm Sensitive Actions', diff --git a/packages/cli/src/core/initializer.test.ts b/packages/cli/src/core/initializer.test.ts index e4fdb2cba5..9093ad54ee 100644 --- a/packages/cli/src/core/initializer.test.ts +++ b/packages/cli/src/core/initializer.test.ts @@ -105,6 +105,9 @@ describe('initializer', () => { mockSettings, ); + // Wait for the background promise to resolve + await new Promise((resolve) => setTimeout(resolve, 0)); + expect(result).toEqual({ authError: null, accountSuspensionInfo: null, diff --git a/packages/cli/src/core/initializer.ts b/packages/cli/src/core/initializer.ts index f27e9a9511..607129ae3e 100644 --- a/packages/cli/src/core/initializer.ts +++ b/packages/cli/src/core/initializer.ts @@ -13,6 +13,7 @@ import { StartSessionEvent, logCliConfiguration, startupProfiler, + debugLogger, } from '@google/gemini-cli-core'; import { type LoadedSettings } from '../config/settings.js'; import { performInitialAuth } from './auth.js'; @@ -55,9 +56,18 @@ export async function initializeApp( ); if (config.getIdeMode()) { - const ideClient = await IdeClient.getInstance(); - await ideClient.connect(); - logIdeConnection(config, new IdeConnectionEvent(IdeConnectionType.START)); + IdeClient.getInstance() + .then(async (ideClient) => { + await ideClient.connect(); + logIdeConnection( + config, + new IdeConnectionEvent(IdeConnectionType.START), + ); + }) + .catch((e) => { + // We log locally if IDE connection setup fails in the background. 
+ debugLogger.error('Failed to initialize IDE client:', e); + }); } return { diff --git a/packages/cli/src/gemini.tsx b/packages/cli/src/gemini.tsx index 4bf0e96e85..86f4f39e7a 100644 --- a/packages/cli/src/gemini.tsx +++ b/packages/cli/src/gemini.tsx @@ -213,12 +213,36 @@ export async function main() { loadSettingsHandle?.end(); // If a worktree is requested and enabled, set it up early. + // This must be awaited before any other async tasks that depend on CWD (like loadCliConfig) + // because setupWorktree calls process.chdir(). const requestedWorktree = cliConfig.getRequestedWorktreeName(settings); let worktreeInfo: WorktreeInfo | undefined; if (requestedWorktree !== undefined) { + const worktreeHandle = startupProfiler.start('setup_worktree'); worktreeInfo = await setupWorktree(requestedWorktree || undefined); + worktreeHandle?.end(); } + const cleanupOpsHandle = startupProfiler.start('cleanup_ops'); + Promise.all([ + cleanupCheckpoints(), + cleanupToolOutputFiles(settings.merged), + cleanupBackgroundLogs(), + ]) + .catch((e) => { + debugLogger.error('Early cleanup failed:', e); + }) + .finally(() => { + cleanupOpsHandle?.end(); + }); + + const parseArgsHandle = startupProfiler.start('parse_arguments'); + const argvPromise = parseArguments(settings.merged).finally(() => { + parseArgsHandle?.end(); + }); + + const rawStartupWarningsPromise = getStartupWarnings(); + // Report settings errors once during startup settings.errors.forEach((error) => { coreEvents.emitFeedback('warning', error.message); @@ -232,15 +256,7 @@ export async function main() { ); }); - await Promise.all([ - cleanupCheckpoints(), - cleanupToolOutputFiles(settings.merged), - cleanupBackgroundLogs(), - ]); - - const parseArgsHandle = startupProfiler.start('parse_arguments'); - const argv = await parseArguments(settings.merged); - parseArgsHandle?.end(); + const argv = await argvPromise; if ( (argv.allowedTools && argv.allowedTools.length > 0) || @@ -325,7 +341,7 @@ export async function main() 
{ // the sandbox because the sandbox will interfere with the Oauth2 web // redirect. let initialAuthFailed = false; - if (!settings.merged.security.auth.useExternal) { + if (!settings.merged.security.auth.useExternal && !argv.isCommand) { try { if ( partialConfig.isInteractive() && @@ -377,7 +393,7 @@ export async function main() { await runDeferredCommand(settings.merged); // hop into sandbox if we are outside and sandboxing is enabled - if (!process.env['SANDBOX']) { + if (!process.env['SANDBOX'] && !argv.isCommand) { const memoryArgs = settings.merged.advanced.autoConfigureMemory ? getNodeMemoryArgs(isDebugMode) : []; @@ -474,12 +490,10 @@ export async function main() { await config.getHookSystem()?.fireSessionEndEvent(SessionEndReason.Exit); }); - // Cleanup sessions after config initialization - try { - await cleanupExpiredSessions(config, settings.merged); - } catch (e) { + // Launch cleanup expired sessions as a background task + cleanupExpiredSessions(config, settings.merged).catch((e) => { debugLogger.error('Failed to cleanup expired sessions:', e); - } + }); if (config.getListExtensions()) { debugLogger.log('Installed extensions:'); @@ -531,7 +545,9 @@ export async function main() { }); } + const terminalHandle = startupProfiler.start('setup_terminal'); await setupTerminalAndTheme(config, settings); + terminalHandle?.end(); const initAppHandle = startupProfiler.start('initialize_app'); const initializationResult = await initializeApp(config, settings); @@ -555,7 +571,7 @@ export async function main() { isAlternateBufferEnabled(config), config.getScreenReader(), ); - const rawStartupWarnings = await getStartupWarnings(); + const rawStartupWarnings = await rawStartupWarningsPromise; const startupWarnings: StartupWarning[] = [ ...rawStartupWarnings.map((message) => ({ id: `startup-${createHash('sha256').update(message).digest('hex').substring(0, 16)}`, diff --git a/packages/cli/src/services/SlashCommandResolver.test.ts 
b/packages/cli/src/services/SlashCommandResolver.test.ts index 43d1c310a8..40e3b6f1d5 100644 --- a/packages/cli/src/services/SlashCommandResolver.test.ts +++ b/packages/cli/src/services/SlashCommandResolver.test.ts @@ -43,7 +43,7 @@ describe('SlashCommandResolver', () => { ]); expect(finalCommands.map((c) => c.name)).toContain('deploy'); - expect(finalCommands.map((c) => c.name)).toContain('firebase.deploy'); + expect(finalCommands.map((c) => c.name)).toContain('firebase:deploy'); expect(conflicts).toHaveLength(1); }); @@ -159,7 +159,7 @@ describe('SlashCommandResolver', () => { it('should apply numeric suffixes when renames also conflict', () => { const user1 = createMockCommand('deploy', CommandKind.USER_FILE); - const user2 = createMockCommand('gcp.deploy', CommandKind.USER_FILE); + const user2 = createMockCommand('gcp:deploy', CommandKind.USER_FILE); const extension = { ...createMockCommand('deploy', CommandKind.EXTENSION_FILE), extensionName: 'gcp', @@ -171,7 +171,7 @@ describe('SlashCommandResolver', () => { extension, ]); - expect(finalCommands.find((c) => c.name === 'gcp.deploy1')).toBeDefined(); + expect(finalCommands.find((c) => c.name === 'gcp:deploy1')).toBeDefined(); }); it('should prefix skills with extension name when they conflict with built-in', () => { @@ -185,7 +185,37 @@ describe('SlashCommandResolver', () => { const names = finalCommands.map((c) => c.name); expect(names).toContain('chat'); - expect(names).toContain('google-workspace.chat'); + expect(names).toContain('google-workspace:chat'); + }); + + it('should ALWAYS prefix extension skills even if no conflict exists', () => { + const skill = { + ...createMockCommand('chat', CommandKind.SKILL), + extensionName: 'google-workspace', + }; + + const { finalCommands } = SlashCommandResolver.resolve([skill]); + + const names = finalCommands.map((c) => c.name); + expect(names).toContain('google-workspace:chat'); + expect(names).not.toContain('chat'); + }); + + it('should use numeric suffixes if 
prefixed skill names collide', () => { + const skill1 = { + ...createMockCommand('chat', CommandKind.SKILL), + extensionName: 'google-workspace', + }; + const skill2 = { + ...createMockCommand('chat', CommandKind.SKILL), + extensionName: 'google-workspace', + }; + + const { finalCommands } = SlashCommandResolver.resolve([skill1, skill2]); + + const names = finalCommands.map((c) => c.name); + expect(names).toContain('google-workspace:chat'); + expect(names).toContain('google-workspace:chat1'); }); it('should NOT prefix skills with "skill" when extension name is missing', () => { diff --git a/packages/cli/src/services/SlashCommandResolver.ts b/packages/cli/src/services/SlashCommandResolver.ts index 4947e6545a..e956d6f566 100644 --- a/packages/cli/src/services/SlashCommandResolver.ts +++ b/packages/cli/src/services/SlashCommandResolver.ts @@ -47,7 +47,17 @@ export class SlashCommandResolver { const originalName = cmd.name; let finalName = originalName; - if (registry.firstEncounters.has(originalName)) { + const shouldAlwaysPrefix = + cmd.kind === CommandKind.SKILL && !!cmd.extensionName; + + if (shouldAlwaysPrefix) { + finalName = this.getRenamedName( + originalName, + this.getPrefix(cmd), + registry.commandMap, + cmd.kind, + ); + } else if (registry.firstEncounters.has(originalName)) { // We've already seen a command with this name, so resolve the conflict. finalName = this.handleConflict(cmd, registry); } else { @@ -93,6 +103,7 @@ export class SlashCommandResolver { incoming.name, this.getPrefix(incoming), registry.commandMap, + incoming.kind, ); this.trackConflict( registry.conflictsMap, @@ -132,6 +143,7 @@ export class SlashCommandResolver { currentOwner.name, this.getPrefix(currentOwner), registry.commandMap, + currentOwner.kind, ); // Update the registry: remove the old name and add the owner under the new name. 
@@ -156,8 +168,12 @@ export class SlashCommandResolver { name: string, prefix: string | undefined, commandMap: Map, + kind?: CommandKind, ): string { - const base = prefix ? `${prefix}.${name}` : name; + const isExtensionPrefix = + kind === CommandKind.SKILL || kind === CommandKind.EXTENSION_FILE; + const separator = isExtensionPrefix ? ':' : '.'; + const base = prefix ? `${prefix}${separator}${name}` : name; let renamedName = base; let suffix = 1; diff --git a/packages/cli/src/test-utils/AppRig.tsx b/packages/cli/src/test-utils/AppRig.tsx index 5ead5d615a..548372a139 100644 --- a/packages/cli/src/test-utils/AppRig.tsx +++ b/packages/cli/src/test-utils/AppRig.tsx @@ -11,7 +11,11 @@ import os from 'node:os'; import path from 'node:path'; import fs from 'node:fs'; import { AppContainer } from '../ui/AppContainer.js'; -import { renderWithProviders, type RenderInstance } from './render.js'; +import { + renderWithProviders, + type RenderInstance, + persistentStateMock, +} from './render.js'; import { makeFakeConfig, type Config, @@ -162,7 +166,7 @@ export class AppRig { private sessionId: string; private pendingConfirmations = new Map(); - private breakpointTools = new Set(); + private breakpointTools = new Set(); private lastAwaitedConfirmation: PendingConfirmation | undefined; /** @@ -177,9 +181,24 @@ export class AppRig { ); this.sessionId = `test-session-${uniqueId}`; activeRigs.set(this.sessionId, this); + + // Pre-create the persistent state file to bypass the terminal setup prompt + const geminiDir = path.join(this.testDir, '.gemini'); + if (!fs.existsSync(geminiDir)) { + fs.mkdirSync(geminiDir, { recursive: true }); + } + fs.writeFileSync( + path.join(geminiDir, 'state.json'), + JSON.stringify({ terminalSetupPromptShown: true }), + ); } async initialize() { + persistentStateMock.setData({ + terminalSetupPromptShown: true, + tipsShown: 10, + }); + this.setupEnvironment(); resetSettingsCacheForTesting(); this.settings = this.createRigSettings(); @@ -226,6 +245,8 
@@ export class AppRig { private setupEnvironment() { // Stub environment variables to avoid interference from developer's machine vi.stubEnv('GEMINI_CLI_HOME', this.testDir); + vi.stubEnv('TERM_PROGRAM', 'other'); + vi.stubEnv('VSCODE_GIT_IPC_HANDLE', ''); if (this.options.fakeResponsesPath) { vi.stubEnv('GEMINI_API_KEY', 'test-api-key'); MockShellExecutionService.setPassthrough(false); @@ -291,7 +312,6 @@ export class AppRig { const newContentGeneratorConfig = { authType: authMethod, - proxy: gcConfig.getProxy(), apiKey: process.env['GEMINI_API_KEY'] || 'test-api-key', }; @@ -426,11 +446,7 @@ export class AppRig { MockShellExecutionService.setMockCommands(commands); } - setToolPolicy( - toolName: string | undefined, - decision: PolicyDecision, - priority = 10, - ) { + setToolPolicy(toolName: string, decision: PolicyDecision, priority = 10) { if (!this.config) throw new Error('AppRig not initialized'); this.config.getPolicyEngine().addRule({ toolName, @@ -440,27 +456,20 @@ export class AppRig { }); } - setBreakpoint(toolName: string | string[] | undefined) { + setBreakpoint(toolName: string | string[]) { if (Array.isArray(toolName)) { for (const name of toolName) { this.setBreakpoint(name); } } else { - // Use undefined toolName to create a global rule if '*' is provided - const actualToolName = toolName === '*' ? undefined : toolName; - this.setToolPolicy(actualToolName, PolicyDecision.ASK_USER, 100); + this.setToolPolicy(toolName, PolicyDecision.ASK_USER, 100); this.breakpointTools.add(toolName); } } - removeToolPolicy(toolName?: string, source = 'AppRig Override') { + removeToolPolicy(toolName: string, source = 'AppRig Override') { if (!this.config) throw new Error('AppRig not initialized'); - // Map '*' back to undefined for policy removal - const actualToolName = toolName === '*' ? 
undefined : toolName; - this.config - .getPolicyEngine() - - .removeRulesForTool(actualToolName as string, source); + this.config.getPolicyEngine().removeRulesForTool(toolName, source); this.breakpointTools.delete(toolName); } diff --git a/packages/cli/src/test-utils/render.tsx b/packages/cli/src/test-utils/render.tsx index 04a642d687..9dd0f96758 100644 --- a/packages/cli/src/test-utils/render.tsx +++ b/packages/cli/src/test-utils/render.tsx @@ -665,7 +665,7 @@ export const renderWithProviders = async ( ); } - const mainAreaWidth = terminalWidth; + const mainAreaWidth = providedUiState?.mainAreaWidth ?? terminalWidth; const finalUiState = { ...baseState, diff --git a/packages/cli/src/ui/AppContainer.test.tsx b/packages/cli/src/ui/AppContainer.test.tsx index 313573a573..3324505778 100644 --- a/packages/cli/src/ui/AppContainer.test.tsx +++ b/packages/cli/src/ui/AppContainer.test.tsx @@ -489,8 +489,8 @@ describe('AppContainer State Management', () => { // Mock LoadedSettings mockSettings = createMockSettings({ hideBanner: false, - hideFooter: false, hideTips: false, + hideFooter: false, showMemoryUsage: false, theme: 'default', ui: { @@ -911,8 +911,8 @@ describe('AppContainer State Management', () => { it('handles settings with all display options disabled', async () => { const settingsAllHidden = createMockSettings({ hideBanner: true, - hideFooter: true, hideTips: true, + hideFooter: true, showMemoryUsage: false, }); @@ -2157,13 +2157,8 @@ describe('AppContainer State Management', () => { expect(mockHandleSlashCommand).not.toHaveBeenCalled(); pressKey('\x04'); // Ctrl+D - // Now count is 2, it should quit. - expect(mockHandleSlashCommand).toHaveBeenCalledWith( - '/quit', - undefined, - undefined, - false, - ); + // It should still not quit because buffer is non-empty. 
+ expect(mockHandleSlashCommand).not.toHaveBeenCalled(); unmount(); }); diff --git a/packages/cli/src/ui/AppContainer.tsx b/packages/cli/src/ui/AppContainer.tsx index 4d44facb36..81a604cd16 100644 --- a/packages/cli/src/ui/AppContainer.tsx +++ b/packages/cli/src/ui/AppContainer.tsx @@ -30,8 +30,6 @@ import { import { ConfigContext } from './contexts/ConfigContext.js'; import { type HistoryItem, - type HistoryItemWithoutId, - type HistoryItemToolGroup, AuthState, type ConfirmationRequest, type PermissionConfirmationRequest, @@ -83,7 +81,6 @@ import { type AgentsDiscoveredPayload, ChangeAuthRequestedError, ProjectIdRequiredError, - CoreToolCallStatus, buildUserSteeringHintPrompt, logBillingEvent, ApiKeyUpdatedEvent, @@ -172,29 +169,11 @@ import { useIsHelpDismissKey } from './utils/shortcutsHelp.js'; import { useSuspend } from './hooks/useSuspend.js'; import { useRunEventNotifications } from './hooks/useRunEventNotifications.js'; import { isNotificationsEnabled } from '../utils/terminalNotifications.js'; - -function isToolExecuting(pendingHistoryItems: HistoryItemWithoutId[]) { - return pendingHistoryItems.some((item) => { - if (item && item.type === 'tool_group') { - return item.tools.some( - (tool) => CoreToolCallStatus.Executing === tool.status, - ); - } - return false; - }); -} - -function isToolAwaitingConfirmation( - pendingHistoryItems: HistoryItemWithoutId[], -) { - return pendingHistoryItems - .filter((item): item is HistoryItemToolGroup => item.type === 'tool_group') - .some((item) => - item.tools.some( - (tool) => CoreToolCallStatus.AwaitingApproval === tool.status, - ), - ); -} +import { + isToolExecuting, + isToolAwaitingConfirmation, + getAllToolCalls, +} from './utils/historyUtils.js'; interface AppContainerProps { config: Config; @@ -723,7 +702,10 @@ export const AppContainer = (props: AppContainerProps) => { // Derive auth state variables for backward compatibility with UIStateContext const isAuthDialogOpen = authState === AuthState.Updating; - const 
isAuthenticating = authState === AuthState.Unauthenticated; + // TODO: Consider handling other auth types that should also skip the blocking screen + const isAuthenticating = + authState === AuthState.Unauthenticated && + settings.merged.security.auth.selectedType !== AuthType.USE_GEMINI; // Session browser and resume functionality const isGeminiClientInitialized = config.getGeminiClient()?.isInitialized(); @@ -1153,6 +1135,16 @@ Logging in with Google... Restarting Gemini CLI to continue. consumePendingHints, ); + const pendingHistoryItems = useMemo( + () => [...pendingSlashCommandHistoryItems, ...pendingGeminiHistoryItems], + [pendingSlashCommandHistoryItems, pendingGeminiHistoryItems], + ); + + const hasPendingToolConfirmation = useMemo( + () => isToolAwaitingConfirmation(pendingHistoryItems), + [pendingHistoryItems], + ); + toggleBackgroundShellRef.current = toggleBackgroundShell; isBackgroundShellVisibleRef.current = isBackgroundShellVisible; backgroundShellsRef.current = backgroundShells; @@ -1260,10 +1252,6 @@ Logging in with Google... Restarting Gemini CLI to continue. cancelHandlerRef.current = useCallback( (shouldRestorePrompt: boolean = true) => { - const pendingHistoryItems = [ - ...pendingSlashCommandHistoryItems, - ...pendingGeminiHistoryItems, - ]; if (isToolAwaitingConfirmation(pendingHistoryItems)) { return; // Don't clear - user may be composing a follow-up message } @@ -1297,8 +1285,7 @@ Logging in with Google... Restarting Gemini CLI to continue. inputHistory, getQueuedMessagesText, clearQueue, - pendingSlashCommandHistoryItems, - pendingGeminiHistoryItems, + pendingHistoryItems, ], ); @@ -1334,10 +1321,7 @@ Logging in with Google... Restarting Gemini CLI to continue. 
const isIdle = streamingState === StreamingState.Idle; const isAgentRunning = streamingState === StreamingState.Responding || - isToolExecuting([ - ...pendingSlashCommandHistoryItems, - ...pendingGeminiHistoryItems, - ]); + isToolExecuting(pendingHistoryItems); if (isSlash && isAgentRunning) { const { commandToExecute } = parseSlashCommand( @@ -1357,7 +1341,8 @@ Logging in with Google... Restarting Gemini CLI to continue. return; } - if (isSlash || (isIdle && isMcpReady)) { + const isMcpOrConfigReady = isConfigInitialized && isMcpReady; + if ((isSlash && isConfigInitialized) || (isIdle && isMcpOrConfigReady)) { if (!isSlash) { const permissions = await checkPermissions(submittedValue, config); if (permissions.length > 0) { @@ -1380,10 +1365,12 @@ Logging in with Google... Restarting Gemini CLI to continue. void submitQuery(submittedValue); } else { // Check messageQueue.length === 0 to only notify on the first queued item - if (isIdle && !isMcpReady && messageQueue.length === 0) { + if (isIdle && !isMcpOrConfigReady && messageQueue.length === 0) { coreEvents.emitFeedback( 'info', - 'Waiting for MCP servers to initialize... Slash commands are still available and prompts will be queued.', + !isConfigInitialized + ? 'Initializing... Prompts will be queued.' + : 'Waiting for MCP servers to initialize... Slash commands are still available and prompts will be queued.', ); } addMessage(submittedValue); @@ -1399,8 +1386,7 @@ Logging in with Google... Restarting Gemini CLI to continue. isMcpReady, streamingState, messageQueue.length, - pendingSlashCommandHistoryItems, - pendingGeminiHistoryItems, + pendingHistoryItems, config, constrainHeight, setConstrainHeight, @@ -1408,6 +1394,7 @@ Logging in with Google... Restarting Gemini CLI to continue. refreshStatic, reset, handleHintSubmit, + isConfigInitialized, triggerExpandHint, ], ); @@ -1438,16 +1425,28 @@ Logging in with Google... Restarting Gemini CLI to continue. 
* - Any future streaming states not explicitly allowed */ const isInputActive = - isConfigInitialized && !initError && !isProcessing && !isResuming && - !!slashCommands && (streamingState === StreamingState.Idle || - streamingState === StreamingState.Responding) && - !proQuotaRequest; + streamingState === StreamingState.Responding || + streamingState === StreamingState.WaitingForConfirmation) && + !proQuotaRequest && + !copyModeEnabled; const [controlsHeight, setControlsHeight] = useState(0); + const [lastNonCopyControlsHeight, setLastNonCopyControlsHeight] = useState(0); + + useLayoutEffect(() => { + if (!copyModeEnabled && controlsHeight > 0) { + setLastNonCopyControlsHeight(controlsHeight); + } + }, [copyModeEnabled, controlsHeight]); + + const stableControlsHeight = + copyModeEnabled && lastNonCopyControlsHeight > 0 + ? lastNonCopyControlsHeight + : controlsHeight; useLayoutEffect(() => { if (mainControlsRef.current) { @@ -1457,12 +1456,12 @@ Logging in with Google... Restarting Gemini CLI to continue. setControlsHeight(roundedHeight); } } - }, [buffer, terminalWidth, terminalHeight, controlsHeight]); + }, [buffer, terminalWidth, terminalHeight, controlsHeight, isInputActive]); - // Compute available terminal height based on controls measurement + // Compute available terminal height based on stable controls measurement const availableTerminalHeight = Math.max( 0, - terminalHeight - controlsHeight - backgroundShellHeight - 1, + terminalHeight - stableControlsHeight - backgroundShellHeight - 1, ); config.setShellExecutionConfig({ @@ -1711,17 +1710,13 @@ Logging in with Google... Restarting Gemini CLI to continue. 
[handleSlashCommand, settings], ); - const { elapsedTime, currentLoadingPhrase } = useLoadingIndicator({ - streamingState, - shouldShowFocusHint, - retryStatus, - loadingPhrasesMode: settings.merged.ui.loadingPhrases, - customWittyPhrases: settings.merged.ui.customWittyPhrases, - errorVerbosity: settings.merged.ui.errorVerbosity, - }); - const handleGlobalKeypress = useCallback( (key: Key): boolean => { + // Debug log keystrokes if enabled + if (settings.merged.general.debugKeystrokeLogging) { + debugLogger.log('[DEBUG] Keystroke:', JSON.stringify(key)); + } + if (shortcutsHelpVisible && isHelpDismissKey(key)) { setShortcutsHelpVisible(false); } @@ -1740,6 +1735,10 @@ Logging in with Google... Restarting Gemini CLI to continue. handleCtrlCPress(); return true; } else if (keyMatchers[Command.EXIT](key)) { + // If the input field is non-empty, do not exit. + if (bufferRef.current.text.length > 0) { + return false; + } handleCtrlDPress(); return true; } else if (keyMatchers[Command.SUSPEND_APP](key)) { @@ -1900,6 +1899,7 @@ Logging in with Google... Restarting Gemini CLI to continue. activePtyId, handleSuspend, embeddedShellFocused, + settings.merged.general.debugKeystrokeLogging, refreshStatic, setCopyModeEnabled, tabFocusTimeoutRef, @@ -2060,16 +2060,6 @@ Logging in with Google... Restarting Gemini CLI to continue. authState === AuthState.AwaitingApiKeyInput || !!newAgents; - const pendingHistoryItems = useMemo( - () => [...pendingSlashCommandHistoryItems, ...pendingGeminiHistoryItems], - [pendingSlashCommandHistoryItems, pendingGeminiHistoryItems], - ); - - const hasPendingToolConfirmation = useMemo( - () => isToolAwaitingConfirmation(pendingHistoryItems), - [pendingHistoryItems], - ); - const hasConfirmUpdateExtensionRequests = confirmUpdateExtensionRequests.length > 0; const hasLoopDetectionConfirmationRequest = @@ -2087,6 +2077,48 @@ Logging in with Google... Restarting Gemini CLI to continue. 
!!emptyWalletRequest || !!customDialog; + const loadingPhrases = settings.merged.ui.loadingPhrases; + const showStatusTips = loadingPhrases === 'tips' || loadingPhrases === 'all'; + const showStatusWit = loadingPhrases === 'witty' || loadingPhrases === 'all'; + + const showLoadingIndicator = + (!embeddedShellFocused || isBackgroundShellVisible) && + streamingState === StreamingState.Responding && + !hasPendingActionRequired; + + let estimatedStatusLength = 0; + if (activeHooks.length > 0 && settings.merged.hooksConfig.notifications) { + const hookLabel = + activeHooks.length > 1 ? 'Executing Hooks' : 'Executing Hook'; + const hookNames = activeHooks + .map( + (h) => + h.name + + (h.index && h.total && h.total > 1 ? ` (${h.index}/${h.total})` : ''), + ) + .join(', '); + estimatedStatusLength = hookLabel.length + hookNames.length + 10; + } else if (showLoadingIndicator) { + const thoughtText = thought?.subject || 'Waiting for model...'; + estimatedStatusLength = thoughtText.length + 25; + } else if (hasPendingActionRequired) { + estimatedStatusLength = 35; + } + + const maxLength = terminalWidth - estimatedStatusLength - 5; + + const { elapsedTime, currentLoadingPhrase, currentTip, currentWittyPhrase } = + useLoadingIndicator({ + streamingState, + shouldShowFocusHint, + retryStatus, + showTips: showStatusTips, + showWit: showStatusWit, + customWittyPhrases: settings.merged.ui.customWittyPhrases, + errorVerbosity: settings.merged.ui.errorVerbosity, + maxLength, + }); + const allowPlanMode = config.isPlanEnabled() && streamingState === StreamingState.Idle && @@ -2159,12 +2191,7 @@ Logging in with Google... Restarting Gemini CLI to continue. ]); const allToolCalls = useMemo( - () => - pendingHistoryItems - .filter( - (item): item is HistoryItemToolGroup => item.type === 'tool_group', - ) - .flatMap((item) => item.tools), + () => getAllToolCalls(pendingHistoryItems), [pendingHistoryItems], ); @@ -2272,6 +2299,8 @@ Logging in with Google... 
Restarting Gemini CLI to continue. isFocused, elapsedTime, currentLoadingPhrase, + currentTip, + currentWittyPhrase, historyRemountKey, activeHooks, messageQueue, @@ -2291,6 +2320,7 @@ Logging in with Google... Restarting Gemini CLI to continue. contextFileNames, errorCount, availableTerminalHeight, + stableControlsHeight, mainAreaWidth, staticAreaMaxItemHeight, staticExtraHeight, @@ -2329,11 +2359,7 @@ Logging in with Google... Restarting Gemini CLI to continue. newAgents, showIsExpandableHint, hintMode: - config.isModelSteeringEnabled() && - isToolExecuting([ - ...pendingSlashCommandHistoryItems, - ...pendingGeminiHistoryItems, - ]), + config.isModelSteeringEnabled() && isToolExecuting(pendingHistoryItems), hintBuffer: '', }), [ @@ -2399,6 +2425,8 @@ Logging in with Google... Restarting Gemini CLI to continue. isFocused, elapsedTime, currentLoadingPhrase, + currentTip, + currentWittyPhrase, historyRemountKey, activeHooks, messageQueue, @@ -2414,6 +2442,7 @@ Logging in with Google... Restarting Gemini CLI to continue. 
contextFileNames, errorCount, availableTerminalHeight, + stableControlsHeight, mainAreaWidth, staticAreaMaxItemHeight, staticExtraHeight, diff --git a/packages/cli/src/ui/ToolConfirmationFullFrame.test.tsx b/packages/cli/src/ui/ToolConfirmationFullFrame.test.tsx new file mode 100644 index 0000000000..c8456fb237 --- /dev/null +++ b/packages/cli/src/ui/ToolConfirmationFullFrame.test.tsx @@ -0,0 +1,179 @@ +/** + * @license + * Copyright 2026 Google LLC + * SPDX-License-Identifier: Apache-2.0 + */ +import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest'; +import { cleanup, renderWithProviders } from '../test-utils/render.js'; +import { createMockSettings } from '../test-utils/settings.js'; +import { App } from './App.js'; +import { + CoreToolCallStatus, + ApprovalMode, + makeFakeConfig, +} from '@google/gemini-cli-core'; +import { type UIState } from './contexts/UIStateContext.js'; +import type { SerializableConfirmationDetails } from '@google/gemini-cli-core'; +import { act } from 'react'; +import { StreamingState } from './types.js'; + +vi.mock('ink', async (importOriginal) => { + const original = await importOriginal(); + return { + ...original, + useIsScreenReaderEnabled: vi.fn(() => false), + }; +}); + +vi.mock('./components/GeminiSpinner.js', () => ({ + GeminiSpinner: () => null, +})); + +vi.mock('./components/CliSpinner.js', () => ({ + CliSpinner: () => null, +})); + +// Mock hooks to align with codebase style, even if App uses UIState directly +vi.mock('./hooks/useGeminiStream.js'); +vi.mock('./hooks/useHistoryManager.js'); +vi.mock('./hooks/useQuotaAndFallback.js'); +vi.mock('./hooks/useThemeCommand.js'); +vi.mock('./auth/useAuth.js'); +vi.mock('./hooks/useEditorSettings.js'); +vi.mock('./hooks/useSettingsCommand.js'); +vi.mock('./hooks/useModelCommand.js'); +vi.mock('./hooks/slashCommandProcessor.js'); +vi.mock('./hooks/useConsoleMessages.js'); +vi.mock('./hooks/useTerminalSize.js', () => ({ + useTerminalSize: vi.fn(() => ({ columns: 100, 
rows: 30 })), +})); + +describe('Full Terminal Tool Confirmation Snapshot', () => { + beforeEach(() => { + vi.clearAllMocks(); + }); + + afterEach(() => { + cleanup(); + vi.restoreAllMocks(); + }); + + it('renders tool confirmation box in the frame of the entire terminal', async () => { + // Generate a large diff to warrant truncation + let largeDiff = + '--- a/packages/cli/src/ui/components/InputPrompt.tsx\n+++ b/packages/cli/src/ui/components/InputPrompt.tsx\n@@ -1,100 +1,105 @@\n'; + for (let i = 1; i <= 60; i++) { + largeDiff += ` const line${i} = true;\n`; + } + largeDiff += '- return kittyProtocolSupporte...;\n'; + largeDiff += '+ return kittyProtocolSupporte...;\n'; + largeDiff += ' buffer: TextBuffer;\n'; + largeDiff += ' onSubmit: (value: string) => void;'; + + const confirmationDetails: SerializableConfirmationDetails = { + type: 'edit', + title: 'Edit packages/.../InputPrompt.tsx', + fileName: 'InputPrompt.tsx', + filePath: 'packages/.../InputPrompt.tsx', + fileDiff: largeDiff, + originalContent: 'old', + newContent: 'new', + isModifying: false, + }; + + const toolCalls = [ + { + callId: 'call-1-modify-selected', + name: 'Edit', + description: + 'packages/.../InputPrompt.tsx: return kittyProtocolSupporte... 
=> return kittyProtocolSupporte...', + status: CoreToolCallStatus.AwaitingApproval, + resultDisplay: '', + confirmationDetails, + }, + ]; + + const mockUIState = { + history: [ + { + id: 1, + type: 'user', + text: 'Can you edit InputPrompt.tsx for me?', + }, + ], + mainAreaWidth: 99, + availableTerminalHeight: 36, + streamingState: StreamingState.WaitingForConfirmation, + constrainHeight: true, + isConfigInitialized: true, + cleanUiDetailsVisible: true, + quota: { + userTier: 'PRO', + stats: { + limits: {}, + usage: {}, + }, + proQuotaRequest: null, + validationRequest: null, + }, + pendingHistoryItems: [ + { + id: 2, + type: 'tool_group', + tools: toolCalls, + }, + ], + showApprovalModeIndicator: ApprovalMode.DEFAULT, + sessionStats: { + lastPromptTokenCount: 175400, + contextPercentage: 3, + }, + buffer: { text: '' }, + messageQueue: [], + activeHooks: [], + contextFileNames: [], + rootUiRef: { current: null }, + } as unknown as UIState; + + const mockConfig = makeFakeConfig(); + mockConfig.getUseAlternateBuffer = () => true; + mockConfig.isTrustedFolder = () => true; + mockConfig.getDisableAlwaysAllow = () => false; + mockConfig.getIdeMode = () => false; + mockConfig.getTargetDir = () => '/directory'; + + const { waitUntilReady, lastFrame, generateSvg, unmount } = + await renderWithProviders(<App />, { + uiState: mockUIState, + config: mockConfig, + settings: createMockSettings({ + merged: { + ui: { + useAlternateBuffer: true, + theme: 'default', + showUserIdentity: false, + showShortcutsHint: false, + footer: { + hideContextPercentage: false, + hideTokens: false, + hideModel: false, + }, + }, + security: { + enablePermanentToolApproval: true, + }, + }, + }), + }); + + await waitUntilReady(); + + // Give it a moment to render + await act(async () => { + await new Promise((resolve) => setTimeout(resolve, 500)); + }); + + await expect({ lastFrame, generateSvg }).toMatchSvgSnapshot(); + unmount(); + }); +}); diff --git a/packages/cli/src/ui/__snapshots__/App.test.tsx.snap 
b/packages/cli/src/ui/__snapshots__/App.test.tsx.snap index 9e1d66df01..f145eadfff 100644 --- a/packages/cli/src/ui/__snapshots__/App.test.tsx.snap +++ b/packages/cli/src/ui/__snapshots__/App.test.tsx.snap @@ -2,10 +2,13 @@ exports[`App > Snapshots > renders default layout correctly 1`] = ` " - ▝▜▄ Gemini CLI v1.2.3 - ▝▜▄ - ▗▟▀ - ▝▀ + ▝▜▄ ▗█▀▀▜▙▝█▛▀▀▌▜██▖▟██▘▜█▘▜██▖▝█▛▝█▛ + ▝▜▄ █▌ █▙▟ ▐█▝█▛▐█ ▐█ ▐█▝█▖█▌ █▌ + ▗▟▀ ▜▙ ▝█▛ █▌▝ ▖▐█ ▐█ ▐█ ▐█ ▝██▌ █▌ + ▝▀ ▀▀▀▀▘▝▀▀▀▀▘▀▀▘ ▀▀▘▀▀▘▀▀▘ ▝▀▀▝▀▀ + + Gemini CLI v1.2.3 + Tips for getting started: @@ -29,16 +32,13 @@ Tips for getting started: - - - - Notifications + Composer " `; @@ -47,10 +47,13 @@ exports[`App > Snapshots > renders screen reader layout correctly 1`] = ` "Notifications Footer - ▝▜▄ Gemini CLI v1.2.3 - ▝▜▄ - ▗▟▀ - ▝▀ + ▝▜▄ ▗█▀▀▜▙▝█▛▀▀▌▜██▖▟██▘▜█▘▜██▖▝█▛▝█▛ + ▝▜▄ █▌ █▙▟ ▐█▝█▛▐█ ▐█ ▐█▝█▖█▌ █▌ + ▗▟▀ ▜▙ ▝█▛ █▌▝ ▖▐█ ▐█ ▐█ ▐█ ▝██▌ █▌ + ▝▀ ▀▀▀▀▘▝▀▀▀▀▘▀▀▘ ▀▀▘▀▀▘▀▀▘ ▝▀▀▝▀▀ + + Gemini CLI v1.2.3 + Tips for getting started: @@ -64,13 +67,12 @@ Composer exports[`App > Snapshots > renders with dialogs visible 1`] = ` " - ▝▜▄ Gemini CLI v1.2.3 - ▝▜▄ - ▗▟▀ - ▝▀ - - + ▝▜▄ ▗█▀▀▜▙▝█▛▀▀▌▜██▖▟██▘▜█▘▜██▖▝█▛▝█▛ + ▝▜▄ █▌ █▙▟ ▐█▝█▛▐█ ▐█ ▐█▝█▖█▌ █▌ + ▗▟▀ ▜▙ ▝█▛ █▌▝ ▖▐█ ▐█ ▐█ ▐█ ▝██▌ █▌ + ▝▀ ▀▀▀▀▘▝▀▀▀▀▘▀▀▘ ▀▀▘▀▀▘▀▀▘ ▝▀▀▝▀▀ + Gemini CLI v1.2.3 @@ -101,16 +103,20 @@ exports[`App > Snapshots > renders with dialogs visible 1`] = ` Notifications + DialogManager " `; exports[`App > should render ToolConfirmationQueue along with Composer when tool is confirming and experiment is on 1`] = ` " - ▝▜▄ Gemini CLI v1.2.3 - ▝▜▄ - ▗▟▀ - ▝▀ + ▝▜▄ ▗█▀▀▜▙▝█▛▀▀▌▜██▖▟██▘▜█▘▜██▖▝█▛▝█▛ + ▝▜▄ █▌ █▙▟ ▐█▝█▛▐█ ▐█ ▐█▝█▖█▌ █▌ + ▗▟▀ ▜▙ ▝█▛ █▌▝ ▖▐█ ▐█ ▐█ ▐█ ▝██▌ █▌ + ▝▀ ▀▀▀▀▘▝▀▀▀▀▘▀▀▘ ▀▀▘▀▀▘▀▀▘ ▝▀▀▝▀▀ + + Gemini CLI v1.2.3 + Tips for getting started: @@ -139,11 +145,8 @@ HistoryItemDisplay - - - - Notifications + Composer " `; diff --git 
a/packages/cli/src/ui/__snapshots__/ToolConfirmationFullFrame-Full-Terminal-Tool-Confirmation-Snapshot-renders-tool-confirmation-box-in-the-frame-of-the-entire-terminal.snap.svg b/packages/cli/src/ui/__snapshots__/ToolConfirmationFullFrame-Full-Terminal-Tool-Confirmation-Snapshot-renders-tool-confirmation-box-in-the-frame-of-the-entire-terminal.snap.svg new file mode 100644 index 0000000000..97b01f3025 --- /dev/null +++ b/packages/cli/src/ui/__snapshots__/ToolConfirmationFullFrame-Full-Terminal-Tool-Confirmation-Snapshot-renders-tool-confirmation-box-in-the-frame-of-the-entire-terminal.snap.svg @@ -0,0 +1,266 @@ + + + + + + ▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀ + + + > + + Can you edit InputPrompt.tsx for me? + + + ▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄ + ╭─────────────────────────────────────────────────────────────────────────────────────────────────╮ + + Action Required + + + + + ? + Edit + packages/.../InputPrompt.tsx: return kittyProtocolSupporte... => return kittyProto + + + + + + ... first 44 lines hidden (Ctrl+O to show) ... 
+ + + 45 + const + line45 + = + true + ; + + + 46 + const + line46 + = + true + ; + + + 47 + const + line47 + = + true + ; + + + + 48 + const + line48 + = + true + ; + + + + 49 + const + line49 + = + true + ; + + + + 50 + const + line50 + = + true + ; + + + + 51 + const + line51 + = + true + ; + + + + 52 + const + line52 + = + true + ; + + + + 53 + const + line53 + = + true + ; + + + + 54 + const + line54 + = + true + ; + + + + 55 + const + line55 + = + true + ; + + + + 56 + const + line56 + = + true + ; + + + + 57 + const + line57 + = + true + ; + + + + 58 + const + line58 + = + true + ; + + + + 59 + const + line59 + = + true + ; + + + + 60 + const + line60 + = + true + ; + + + + + 61 + + + - + + + + return + + kittyProtocolSupporte...; + + + + + 61 + + + + + + + + return + + kittyProtocolSupporte...; + + + + 62 + buffer: TextBuffer; + + + + 63 + onSubmit + : ( + value + : + string + ) => + void + ; + + + + Apply this change? + + + + + + + + + + + 1. + + + Allow once + + + + + 2. + Allow for this session + + + + 3. + Allow for this file in all future sessions + + + + 4. + Modify with external editor + + + + 5. + No, suggest changes (esc) + + + + + + ╰─────────────────────────────────────────────────────────────────────────────────────────────────╯ + + + \ No newline at end of file diff --git a/packages/cli/src/ui/__snapshots__/ToolConfirmationFullFrame.test.tsx.snap b/packages/cli/src/ui/__snapshots__/ToolConfirmationFullFrame.test.tsx.snap new file mode 100644 index 0000000000..98853434df --- /dev/null +++ b/packages/cli/src/ui/__snapshots__/ToolConfirmationFullFrame.test.tsx.snap @@ -0,0 +1,43 @@ +// Vitest Snapshot v1, https://vitest.dev/guide/snapshot.html + +exports[`Full Terminal Tool Confirmation Snapshot > renders tool confirmation box in the frame of the entire terminal 1`] = ` +"▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀ + > Can you edit InputPrompt.tsx for me? 
+▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄ +╭─────────────────────────────────────────────────────────────────────────────────────────────────╮ +│ Action Required │ +│ │ +│ ? Edit packages/.../InputPrompt.tsx: return kittyProtocolSupporte... => return kittyProto… │ +│ │ +│ ... first 44 lines hidden (Ctrl+O to show) ... │ +│ 45 const line45 = true; │ +│ 46 const line46 = true; │ +│ 47 const line47 = true; │█ +│ 48 const line48 = true; │█ +│ 49 const line49 = true; │█ +│ 50 const line50 = true; │█ +│ 51 const line51 = true; │█ +│ 52 const line52 = true; │█ +│ 53 const line53 = true; │█ +│ 54 const line54 = true; │█ +│ 55 const line55 = true; │█ +│ 56 const line56 = true; │█ +│ 57 const line57 = true; │█ +│ 58 const line58 = true; │█ +│ 59 const line59 = true; │█ +│ 60 const line60 = true; │█ +│ 61 - return kittyProtocolSupporte...; │█ +│ 61 + return kittyProtocolSupporte...; │█ +│ 62 buffer: TextBuffer; │█ +│ 63 onSubmit: (value: string) => void; │█ +│ Apply this change? │█ +│ │█ +│ ● 1. Allow once │█ +│ 2. Allow for this session │█ +│ 3. Allow for this file in all future sessions │█ +│ 4. Modify with external editor │█ +│ 5. 
No, suggest changes (esc) │█ +│ │█ +╰─────────────────────────────────────────────────────────────────────────────────────────────────╯█ +" +`; diff --git a/packages/cli/src/ui/auth/AuthDialog.test.tsx b/packages/cli/src/ui/auth/AuthDialog.test.tsx index 4837a71490..69593df076 100644 --- a/packages/cli/src/ui/auth/AuthDialog.test.tsx +++ b/packages/cli/src/ui/auth/AuthDialog.test.tsx @@ -254,7 +254,7 @@ describe('AuthDialog', () => { unmount(); }); - it('skips API key dialog on initial setup if env var is present', async () => { + it('always shows API key dialog even when env var is present', async () => { mockedValidateAuthMethod.mockReturnValue(null); vi.stubEnv('GEMINI_API_KEY', 'test-key-from-env'); // props.settings.merged.security.auth.selectedType is undefined here, simulating initial setup @@ -265,12 +265,12 @@ describe('AuthDialog', () => { await handleAuthSelect(AuthType.USE_GEMINI); expect(props.setAuthState).toHaveBeenCalledWith( - AuthState.Unauthenticated, + AuthState.AwaitingApiKeyInput, ); unmount(); }); - it('skips API key dialog if env var is present but empty', async () => { + it('always shows API key dialog even when env var is empty string', async () => { mockedValidateAuthMethod.mockReturnValue(null); vi.stubEnv('GEMINI_API_KEY', ''); // Empty string // props.settings.merged.security.auth.selectedType is undefined here @@ -281,7 +281,7 @@ describe('AuthDialog', () => { await handleAuthSelect(AuthType.USE_GEMINI); expect(props.setAuthState).toHaveBeenCalledWith( - AuthState.Unauthenticated, + AuthState.AwaitingApiKeyInput, ); unmount(); }); @@ -302,10 +302,10 @@ describe('AuthDialog', () => { unmount(); }); - it('skips API key dialog on re-auth if env var is present (cannot edit)', async () => { + it('always shows API key dialog on re-auth even if env var is present', async () => { mockedValidateAuthMethod.mockReturnValue(null); vi.stubEnv('GEMINI_API_KEY', 'test-key-from-env'); - // Simulate that the user has already authenticated once + // 
Simulate switching from a different auth method (e.g., Google Login → API key) props.settings.merged.security.auth.selectedType = AuthType.LOGIN_WITH_GOOGLE; @@ -315,7 +315,7 @@ describe('AuthDialog', () => { await handleAuthSelect(AuthType.USE_GEMINI); expect(props.setAuthState).toHaveBeenCalledWith( - AuthState.Unauthenticated, + AuthState.AwaitingApiKeyInput, ); unmount(); }); diff --git a/packages/cli/src/ui/auth/AuthDialog.tsx b/packages/cli/src/ui/auth/AuthDialog.tsx index c823f606c6..e73d380bf3 100644 --- a/packages/cli/src/ui/auth/AuthDialog.tsx +++ b/packages/cli/src/ui/auth/AuthDialog.tsx @@ -137,13 +137,11 @@ export function AuthDialog({ } if (authType === AuthType.USE_GEMINI) { - if (process.env['GEMINI_API_KEY'] !== undefined) { - setAuthState(AuthState.Unauthenticated); - return; - } else { - setAuthState(AuthState.AwaitingApiKeyInput); - return; - } + // Always show the API key input dialog so the user can + // explicitly enter or confirm their key, regardless of + // whether GEMINI_API_KEY env var or a stored key exists. 
+ setAuthState(AuthState.AwaitingApiKeyInput); + return; } } setAuthState(AuthState.Unauthenticated); diff --git a/packages/cli/src/ui/components/AppHeader.test.tsx b/packages/cli/src/ui/components/AppHeader.test.tsx index 8ff4caaacf..4dbdbc0052 100644 --- a/packages/cli/src/ui/components/AppHeader.test.tsx +++ b/packages/cli/src/ui/components/AppHeader.test.tsx @@ -8,8 +8,10 @@ import { renderWithProviders, persistentStateMock, } from '../../test-utils/render.js'; +import type { LoadedSettings } from '../../config/settings.js'; import { AppHeader } from './AppHeader.js'; import { describe, it, expect, vi } from 'vitest'; +import { makeFakeConfig } from '@google/gemini-cli-core'; import crypto from 'node:crypto'; vi.mock('../utils/terminalSetup.js', () => ({ @@ -240,4 +242,46 @@ describe('', () => { expect(session2.lastFrame()).not.toContain('Tips'); session2.unmount(); }); + + it('should render the full logo when logged out', async () => { + const mockConfig = makeFakeConfig(); + vi.spyOn(mockConfig, 'getContentGeneratorConfig').mockReturnValue({ + authType: undefined, + } as any); // eslint-disable-line @typescript-eslint/no-explicit-any + + const { lastFrame, waitUntilReady, unmount } = await renderWithProviders( + , + { + config: mockConfig, + uiState: { + terminalWidth: 120, + }, + }, + ); + await waitUntilReady(); + + // Check for block characters from the logo + expect(lastFrame()).toContain('▗█▀▀▜▙'); + expect(lastFrame()).toMatchSnapshot(); + unmount(); + }); + + it('should NOT render Tips when ui.hideTips is true', async () => { + const mockConfig = makeFakeConfig(); + const { lastFrame, waitUntilReady, unmount } = await renderWithProviders( + , + { + config: mockConfig, + settings: { + merged: { + ui: { hideTips: true }, + }, + } as unknown as LoadedSettings, + }, + ); + await waitUntilReady(); + + expect(lastFrame()).not.toContain('Tips'); + unmount(); + }); }); diff --git a/packages/cli/src/ui/components/AppHeader.tsx 
b/packages/cli/src/ui/components/AppHeader.tsx index 0b15f917a6..7d0ef75a36 100644 --- a/packages/cli/src/ui/components/AppHeader.tsx +++ b/packages/cli/src/ui/components/AppHeader.tsx @@ -19,6 +19,9 @@ import { CliSpinner } from './CliSpinner.js'; import { isAppleTerminal } from '@google/gemini-cli-core'; +import { longAsciiLogoCompactText } from './AsciiArt.js'; +import { getAsciiArtWidth } from '../utils/textUtils.js'; + interface AppHeaderProps { version: string; showDetails?: boolean; @@ -41,6 +44,18 @@ const MAC_TERMINAL_ICON = `▝▜▄ ▗▟▀ ▗▟▀ `; +/** + * The horizontal padding (in columns) required for metadata (version, identity, etc.) + * when rendered alongside the ASCII logo. + */ +const LOGO_METADATA_PADDING = 20; + +/** + * The terminal width below which we switch to a narrow/column layout to prevent + * UI elements from wrapping or overlapping. + */ +const NARROW_TERMINAL_BREAKPOINT = 60; + export const AppHeader = ({ version, showDetails = true }: AppHeaderProps) => { const settings = useSettings(); const config = useConfig(); @@ -49,70 +64,90 @@ export const AppHeader = ({ version, showDetails = true }: AppHeaderProps) => { const { bannerText } = useBanner(bannerData); const { showTips } = useTips(); + const authType = config.getContentGeneratorConfig()?.authType; + const loggedOut = !authType; + const showHeader = !( settings.merged.ui.hideBanner || config.getScreenReader() ); const ICON = isAppleTerminal() ? MAC_TERMINAL_ICON : DEFAULT_ICON; - if (!showDetails) { - return ( - - {showHeader && ( - - - {ICON} - - - - - Gemini CLI - - v{version} - - + let logoTextArt = ''; + if (loggedOut) { + const widthOfLongLogo = + getAsciiArtWidth(longAsciiLogoCompactText) + LOGO_METADATA_PADDING; + + if (terminalWidth >= widthOfLongLogo) { + logoTextArt = longAsciiLogoCompactText.trim(); + } + } + + // If the terminal is too narrow to fit the icon and metadata (especially long nightly versions) + // side-by-side, we switch to column mode to prevent wrapping. 
+ const isNarrow = terminalWidth < NARROW_TERMINAL_BREAKPOINT; + + const renderLogo = () => ( + + + {ICON} + + {logoTextArt && ( + + {logoTextArt} + + )} + + ); + + const renderMetadata = (isBelow = false) => ( + + {/* Line 1: Gemini CLI vVersion [Updating] */} + + + Gemini CLI + + v{version} + {updateInfo?.isUpdating && ( + + + Updating + )} - ); - } + + {showDetails && ( + <> + {/* Line 2: Blank */} + + + {/* Lines 3 & 4: User Identity info (Email /auth and Plan /upgrade) */} + {settings.merged.ui.showUserIdentity !== false && ( + + )} + + )} + + ); + + const useColumnLayout = !!logoTextArt || isNarrow; return ( {showHeader && ( - - - {ICON} - - - {/* Line 1: Gemini CLI vVersion [Updating] */} - - - Gemini CLI - - v{version} - {updateInfo && ( - - - Updating - - - )} - - - {/* Line 2: Blank */} - - - {/* Lines 3 & 4: User Identity info (Email /auth and Plan /upgrade) */} - {settings.merged.ui.showUserIdentity !== false && ( - - )} - + + {renderLogo()} + {useColumnLayout ? ( + {renderMetadata(true)} + ) : ( + renderMetadata(false) + )} )} diff --git a/packages/cli/src/ui/components/AsciiArt.ts b/packages/cli/src/ui/components/AsciiArt.ts index 79eb522c80..40f0eb8296 100644 --- a/packages/cli/src/ui/components/AsciiArt.ts +++ b/packages/cli/src/ui/components/AsciiArt.ts @@ -16,14 +16,14 @@ export const shortAsciiLogo = ` `; export const longAsciiLogo = ` - ███ █████████ ██████████ ██████ ██████ █████ ██████ █████ █████ -░░░███ ███░░░░░███░░███░░░░░█░░██████ ██████ ░░███ ░░██████ ░░███ ░░███ - ░░░███ ███ ░░░ ░███ █ ░ ░███░█████░███ ░███ ░███░███ ░███ ░███ - ░░░███ ░███ ░██████ ░███░░███ ░███ ░███ ░███░░███░███ ░███ - ███░ ░███ █████ ░███░░█ ░███ ░░░ ░███ ░███ ░███ ░░██████ ░███ - ███░ ░░███ ░░███ ░███ ░ █ ░███ ░███ ░███ ░███ ░░█████ ░███ - ███░ ░░█████████ ██████████ █████ █████ █████ █████ ░░█████ █████ -░░░ ░░░░░░░░░ ░░░░░░░░░░ ░░░░░ ░░░░░ ░░░░░ ░░░░░ ░░░░░ ░░░░░ + █████████ ██████████ ██████ ██████ █████ ██████ █████ █████ +███░░░░░███░░███░░░░░█░░██████ █████ 
░░███░░██████ ░░███ ░░███ +███ ░░░░░░░ ░███ █ ░ ░███░█████░███ ░███ ░███░███ ░███ ░███ +░███ ░██████ ░███░░███ ░███ ░███ ░███░░███░███ ░███ +░███ █████ ░███░░█ ░███ ░░░ ░███ ░███ ░███ ░░██████ ░███ +░░███ ░░███ ░███ ░ █ ░███ ░███ ░███ ░███ ░░█████ ░███ + ░░█████████ ██████████ █████ █████ █████ █████ ░░████ █████ + ░░░░░░░░░ ░░░░░░░░░░ ░░░░░ ░░░░░ ░░░░░ ░░░░░ ░░░░ ░░░░░ `; export const tinyAsciiLogo = ` @@ -36,3 +36,24 @@ export const tinyAsciiLogo = ` ███░ ░░█████████ ░░░ ░░░░░░░░░ `; + +export const shortAsciiLogoCompactText = ` +▟▛▀▀█▖▜█▀▀▜▝██▙▗██▛▝█▛▝██▙ ▜█▘▜█▘ +▐█ ▐█▄▌ █▌▜█▘█▌ █▌ █▌▜▙▐█ ▐█ +▝█▖ ▜█▘▐█ ▘▗ █▌ █▌ █▌ █▌ ▜██ ▐█ + ▝▀▀▀▀ ▀▀▀▀▀▝▀▀ ▝▀▀▝▀▀▝▀▀ ▀▀▘▀▀▘ +`; + +export const longAsciiLogoCompactText = ` +▗█▀▀▜▙▝█▛▀▀▌▜██▖▟██▘▜█▘▜██▖▝█▛▝█▛ +█▌ █▙▟ ▐█▝█▛▐█ ▐█ ▐█▝█▖█▌ █▌ +▜▙ ▝█▛ █▌▝ ▖▐█ ▐█ ▐█ ▐█ ▝██▌ █▌ + ▀▀▀▀▘▝▀▀▀▀▘▀▀▘ ▀▀▘▀▀▘▀▀▘ ▝▀▀▝▀▀ +`; + +export const tinyAsciiLogoCompactText = ` +▟▛▀▀█▖ +▐█ +▝█▖ ▜█▘ + ▝▀▀▀▀ +`; diff --git a/packages/cli/src/ui/components/AskUserDialog.test.tsx b/packages/cli/src/ui/components/AskUserDialog.test.tsx index 864800a061..53c820f69e 100644 --- a/packages/cli/src/ui/components/AskUserDialog.test.tsx +++ b/packages/cli/src/ui/components/AskUserDialog.test.tsx @@ -287,7 +287,7 @@ describe('AskUserDialog', () => { }); describe.each([ - { useAlternateBuffer: true, expectedArrows: false }, + { useAlternateBuffer: true, expectedArrows: true }, { useAlternateBuffer: false, expectedArrows: true }, ])( 'Scroll Arrows (useAlternateBuffer: $useAlternateBuffer)', @@ -1453,4 +1453,42 @@ describe('AskUserDialog', () => { }); }); }); + + it('shows at least 3 selection options even in small terminal heights', async () => { + const questions: Question[] = [ + { + question: + 'A very long question that would normally take up most of the space and squeeze the list if we did not have a heuristic to prevent it. This line is just to make it longer. And another one. 
Imagine this is a plan.', + header: 'Test', + type: QuestionType.CHOICE, + options: [ + { label: 'Option 1', description: 'Description 1' }, + { label: 'Option 2', description: 'Description 2' }, + { label: 'Option 3', description: 'Description 3' }, + { label: 'Option 4', description: 'Description 4' }, + ], + multiSelect: false, + }, + ]; + + const { lastFrame, waitUntilReady } = await renderWithProviders( + , + { width: 80 }, + ); + + await waitFor(async () => { + await waitUntilReady(); + const frame = lastFrame(); + // Should show at least 3 options + expect(frame).toContain('1. Option 1'); + expect(frame).toContain('2. Option 2'); + expect(frame).toContain('3. Option 3'); + }); + }); }); diff --git a/packages/cli/src/ui/components/AskUserDialog.tsx b/packages/cli/src/ui/components/AskUserDialog.tsx index b1d23885e6..cbb505320c 100644 --- a/packages/cli/src/ui/components/AskUserDialog.tsx +++ b/packages/cli/src/ui/components/AskUserDialog.tsx @@ -849,16 +849,30 @@ const ChoiceQuestionView: React.FC = ({ ? Math.max(1, availableHeight - overhead) : undefined; + // Reserve space for at least 3 items if more selectionItems available. + const reservedListHeight = Math.min(selectionItems.length * 2, 6); const questionHeightLimit = listHeight && !isAlternateBuffer ? question.unconstrainedHeight ? Math.max(1, listHeight - selectionItems.length * 2) - : Math.min(15, Math.max(1, listHeight - DIALOG_PADDING)) + : Math.min( + 15, + Math.max( + 1, + listHeight - Math.max(DIALOG_PADDING, reservedListHeight), + ), + ) : undefined; const maxItemsToShow = - listHeight && questionHeightLimit - ? Math.max(1, Math.floor((listHeight - questionHeightLimit) / 2)) + listHeight && (!isAlternateBuffer || availableHeight !== undefined) + ? Math.min( + selectionItems.length, + Math.max( + 1, + Math.floor((listHeight - (questionHeightLimit ?? 
0)) / 2), + ), + ) : selectionItems.length; return ( diff --git a/packages/cli/src/ui/components/Composer.test.tsx b/packages/cli/src/ui/components/Composer.test.tsx index 8df5f690e7..1cbb29a06c 100644 --- a/packages/cli/src/ui/components/Composer.test.tsx +++ b/packages/cli/src/ui/components/Composer.test.tsx @@ -17,13 +17,6 @@ import { import { ConfigContext } from '../contexts/ConfigContext.js'; import { SettingsContext } from '../contexts/SettingsContext.js'; import { createMockSettings } from '../../test-utils/settings.js'; -// Mock VimModeContext hook -vi.mock('../contexts/VimModeContext.js', () => ({ - useVimMode: vi.fn(() => ({ - vimEnabled: false, - vimMode: 'INSERT', - })), -})); import { ApprovalMode, tokenLimit, @@ -36,6 +29,21 @@ import type { LoadedSettings } from '../../config/settings.js'; import type { SessionMetrics } from '../contexts/SessionContext.js'; import type { TextBuffer } from './shared/text-buffer.js'; +// Mock VimModeContext hook +vi.mock('../contexts/VimModeContext.js', () => ({ + useVimMode: vi.fn(() => ({ + vimEnabled: false, + vimMode: 'INSERT', + })), +})); + +vi.mock('../hooks/useTerminalSize.js', () => ({ + useTerminalSize: vi.fn(() => ({ + columns: 100, + rows: 24, + })), +})); + const composerTestControls = vi.hoisted(() => ({ suggestionsVisible: false, isAlternateBuffer: false, @@ -58,18 +66,9 @@ vi.mock('./LoadingIndicator.js', () => ({ })); vi.mock('./StatusDisplay.js', () => ({ - StatusDisplay: () => StatusDisplay, -})); - -vi.mock('./ToastDisplay.js', () => ({ - ToastDisplay: () => ToastDisplay, - shouldShowToast: (uiState: UIState) => - uiState.ctrlCPressedOnce || - Boolean(uiState.transientMessage) || - uiState.ctrlDPressedOnce || - (uiState.showEscapePrompt && - (uiState.buffer.text.length > 0 || uiState.history.length > 0)) || - Boolean(uiState.queueErrorMessage), + StatusDisplay: ({ hideContextSummary }: { hideContextSummary: boolean }) => ( + StatusDisplay{hideContextSummary ? 
' (hidden summary)' : ''} + ), })); vi.mock('./ContextSummaryDisplay.js', () => ({ @@ -81,17 +80,15 @@ vi.mock('./HookStatusDisplay.js', () => ({ })); vi.mock('./ApprovalModeIndicator.js', () => ({ - ApprovalModeIndicator: () => ApprovalModeIndicator, + ApprovalModeIndicator: ({ approvalMode }: { approvalMode: ApprovalMode }) => ( + ApprovalModeIndicator: {approvalMode} + ), })); vi.mock('./ShellModeIndicator.js', () => ({ ShellModeIndicator: () => ShellModeIndicator, })); -vi.mock('./ShortcutsHint.js', () => ({ - ShortcutsHint: () => ShortcutsHint, -})); - vi.mock('./ShortcutsHelp.js', () => ({ ShortcutsHelp: () => ShortcutsHelp, })); @@ -174,6 +171,8 @@ const createMockUIState = (overrides: Partial = {}): UIState => isFocused: true, thought: '', currentLoadingPhrase: '', + currentTip: '', + currentWittyPhrase: '', elapsedTime: 0, ctrlCPressedOnce: false, ctrlDPressedOnce: false, @@ -201,6 +200,7 @@ const createMockUIState = (overrides: Partial = {}): UIState => activeHooks: [], isBackgroundShellVisible: false, embeddedShellFocused: false, + showIsExpandableHint: false, quota: { userTier: undefined, stats: undefined, @@ -247,7 +247,7 @@ const createMockConfig = (overrides = {}): Config => const renderComposer = async ( uiState: UIState, - settings = createMockSettings(), + settings = createMockSettings({ ui: {} }), config = createMockConfig(), uiActions = createMockUIActions(), ) => { @@ -256,7 +256,7 @@ const renderComposer = async ( - + @@ -383,10 +383,12 @@ describe('Composer', () => { const { lastFrame } = await renderComposer(uiState, settings); const output = lastFrame(); - expect(output).toContain('LoadingIndicator: Thinking...'); + // In Refreshed UX, we don't force 'Thinking...' 
label in renderStatusNode + // It uses the subject directly + expect(output).toContain('LoadingIndicator: Thinking about code'); }); - it('hides shortcuts hint while loading', async () => { + it('shows shortcuts hint while loading', async () => { const uiState = createMockUIState({ streamingState: StreamingState.Responding, elapsedTime: 1, @@ -397,7 +399,8 @@ describe('Composer', () => { const output = lastFrame(); expect(output).toContain('LoadingIndicator'); - expect(output).not.toContain('ShortcutsHint'); + expect(output).toContain('press tab twice for more'); + expect(output).not.toContain('? for shortcuts'); }); it('renders LoadingIndicator with thought when loadingPhrases is off', async () => { @@ -453,9 +456,8 @@ describe('Composer', () => { const { lastFrame } = await renderComposer(uiState); - const output = lastFrame(); - expect(output).not.toContain('LoadingIndicator'); - expect(output).not.toContain('esc to cancel'); + const output = lastFrame({ allowEmpty: true }); + expect(output).toBe(''); }); it('renders LoadingIndicator when embedded shell is focused but background shell is visible', async () => { @@ -558,8 +560,10 @@ describe('Composer', () => { const { lastFrame } = await renderComposer(uiState); const output = lastFrame(); - expect(output).toContain('ToastDisplay'); - expect(output).not.toContain('ApprovalModeIndicator'); + expect(output).toContain('Press Ctrl+C again to exit.'); + // In Refreshed UX, Row 1 shows toast, and Row 2 shows ApprovalModeIndicator/StatusDisplay + // They are no longer mutually exclusive. 
+ expect(output).toContain('ApprovalModeIndicator'); expect(output).toContain('StatusDisplay'); }); @@ -574,8 +578,8 @@ describe('Composer', () => { const { lastFrame } = await renderComposer(uiState); const output = lastFrame(); - expect(output).toContain('ToastDisplay'); - expect(output).not.toContain('ApprovalModeIndicator'); + expect(output).toContain('Warning'); + expect(output).toContain('ApprovalModeIndicator'); }); }); @@ -584,15 +588,17 @@ describe('Composer', () => { const uiState = createMockUIState({ cleanUiDetailsVisible: false, }); + const settings = createMockSettings({ + ui: { showShortcutsHint: false }, + }); - const { lastFrame } = await renderComposer(uiState); + const { lastFrame } = await renderComposer(uiState, settings); const output = lastFrame(); - expect(output).toContain('ShortcutsHint'); + expect(output).not.toContain('press tab twice for more'); + expect(output).not.toContain('? for shortcuts'); expect(output).toContain('InputPrompt'); expect(output).not.toContain('Footer'); - expect(output).not.toContain('ApprovalModeIndicator'); - expect(output).not.toContain('ContextSummaryDisplay'); }); it('renders InputPrompt when input is active', async () => { @@ -665,12 +671,15 @@ describe('Composer', () => { }); it.each([ - [ApprovalMode.YOLO, 'YOLO'], - [ApprovalMode.PLAN, 'plan'], - [ApprovalMode.AUTO_EDIT, 'auto edit'], + { mode: ApprovalMode.YOLO, label: '● YOLO' }, + { mode: ApprovalMode.PLAN, label: '● plan' }, + { + mode: ApprovalMode.AUTO_EDIT, + label: '● auto edit', + }, ])( - 'shows minimal mode badge "%s" when clean UI details are hidden', - async (mode, label) => { + 'shows minimal mode badge "$mode" when clean UI details are hidden', + async ({ mode, label }) => { const uiState = createMockUIState({ cleanUiDetailsVisible: false, showApprovalModeIndicator: mode, @@ -693,7 +702,8 @@ describe('Composer', () => { const output = lastFrame(); expect(output).toContain('LoadingIndicator'); expect(output).not.toContain('plan'); - 
expect(output).not.toContain('ShortcutsHint'); + expect(output).toContain('press tab twice for more'); + expect(output).not.toContain('? for shortcuts'); }); it('hides minimal mode badge while action-required state is active', async () => { @@ -708,9 +718,7 @@ describe('Composer', () => { }); const { lastFrame } = await renderComposer(uiState); - const output = lastFrame(); - expect(output).not.toContain('plan'); - expect(output).not.toContain('ShortcutsHint'); + expect(lastFrame({ allowEmpty: true })).toBe(''); }); it('shows Esc rewind prompt in minimal mode without showing full UI', async () => { @@ -722,7 +730,7 @@ describe('Composer', () => { const { lastFrame } = await renderComposer(uiState); const output = lastFrame(); - expect(output).toContain('ToastDisplay'); + expect(output).toContain('Press Esc again to rewind.'); expect(output).not.toContain('ContextSummaryDisplay'); }); @@ -747,7 +755,14 @@ describe('Composer', () => { }); const { lastFrame } = await renderComposer(uiState, settings); - expect(lastFrame()).toContain('%'); + + await act(async () => { + await vi.advanceTimersByTimeAsync(250); + }); + + // StatusDisplay (which contains ContextUsageDisplay) should bleed through in minimal mode + expect(lastFrame()).toContain('StatusDisplay'); + expect(lastFrame()).toContain('70% used'); }); }); @@ -812,14 +827,20 @@ describe('Composer', () => { describe('Shortcuts Hint', () => { it('restores shortcuts hint after 200ms debounce when buffer is empty', async () => { - const { lastFrame } = await renderComposer( - createMockUIState({ - buffer: { text: '' } as unknown as TextBuffer, - cleanUiDetailsVisible: false, - }), - ); + const uiState = createMockUIState({ + buffer: { text: '' } as unknown as TextBuffer, + cleanUiDetailsVisible: false, + }); - expect(lastFrame({ allowEmpty: true })).toContain('ShortcutsHint'); + const { lastFrame } = await renderComposer(uiState); + + await act(async () => { + await vi.advanceTimersByTimeAsync(250); + }); + + 
expect(lastFrame({ allowEmpty: true })).toContain( + 'press tab twice for more', + ); }); it('hides shortcuts hint when text is typed in buffer', async () => { @@ -830,7 +851,8 @@ describe('Composer', () => { const { lastFrame } = await renderComposer(uiState); - expect(lastFrame()).not.toContain('ShortcutsHint'); + expect(lastFrame()).not.toContain('press tab twice for more'); + expect(lastFrame()).not.toContain('? for shortcuts'); }); it('hides shortcuts hint when showShortcutsHint setting is false', async () => { @@ -843,7 +865,7 @@ describe('Composer', () => { const { lastFrame } = await renderComposer(uiState, settings); - expect(lastFrame()).not.toContain('ShortcutsHint'); + expect(lastFrame()).not.toContain('? for shortcuts'); }); it('hides shortcuts hint when a action is required (e.g. dialog is open)', async () => { @@ -856,9 +878,10 @@ describe('Composer', () => { ), }); - const { lastFrame } = await renderComposer(uiState); + const { lastFrame, unmount } = await renderComposer(uiState); - expect(lastFrame()).not.toContain('ShortcutsHint'); + expect(lastFrame({ allowEmpty: true })).toBe(''); + unmount(); }); it('keeps shortcuts hint visible when no action is required', async () => { @@ -868,7 +891,11 @@ describe('Composer', () => { const { lastFrame } = await renderComposer(uiState); - expect(lastFrame()).toContain('ShortcutsHint'); + await act(async () => { + await vi.advanceTimersByTimeAsync(250); + }); + + expect(lastFrame()).toContain('press tab twice for more'); }); it('shows shortcuts hint when full UI details are visible', async () => { @@ -878,10 +905,15 @@ describe('Composer', () => { const { lastFrame } = await renderComposer(uiState); - expect(lastFrame()).toContain('ShortcutsHint'); + await act(async () => { + await vi.advanceTimersByTimeAsync(250); + }); + + // In Refreshed UX, shortcuts hint is in the top multipurpose status row + expect(lastFrame()).toContain('? 
for shortcuts'); }); - it('hides shortcuts hint while loading when full UI details are visible', async () => { + it('shows shortcuts hint while loading when full UI details are visible', async () => { const uiState = createMockUIState({ cleanUiDetailsVisible: true, streamingState: StreamingState.Responding, @@ -889,10 +921,17 @@ describe('Composer', () => { const { lastFrame } = await renderComposer(uiState); - expect(lastFrame()).not.toContain('ShortcutsHint'); + await act(async () => { + await vi.advanceTimersByTimeAsync(250); + }); + + // In experimental layout, status row is visible during loading + expect(lastFrame()).toContain('LoadingIndicator'); + expect(lastFrame()).toContain('? for shortcuts'); + expect(lastFrame()).not.toContain('press tab twice for more'); }); - it('hides shortcuts hint while loading in minimal mode', async () => { + it('shows shortcuts hint while loading in minimal mode', async () => { const uiState = createMockUIState({ cleanUiDetailsVisible: false, streamingState: StreamingState.Responding, @@ -901,7 +940,14 @@ describe('Composer', () => { const { lastFrame } = await renderComposer(uiState); - expect(lastFrame()).not.toContain('ShortcutsHint'); + await act(async () => { + await vi.advanceTimersByTimeAsync(250); + }); + + // In experimental layout, status row is visible in clean mode while busy + expect(lastFrame()).toContain('LoadingIndicator'); + expect(lastFrame()).toContain('press tab twice for more'); + expect(lastFrame()).not.toContain('? for shortcuts'); }); it('shows shortcuts help in minimal mode when toggled on', async () => { @@ -926,7 +972,8 @@ describe('Composer', () => { const { lastFrame } = await renderComposer(uiState); - expect(lastFrame()).not.toContain('ShortcutsHint'); + expect(lastFrame()).not.toContain('press tab twice for more'); + expect(lastFrame()).not.toContain('? 
for shortcuts'); expect(lastFrame()).not.toContain('plan'); }); @@ -954,7 +1001,12 @@ describe('Composer', () => { const { lastFrame } = await renderComposer(uiState); - expect(lastFrame()).toContain('ShortcutsHint'); + await act(async () => { + await vi.advanceTimersByTimeAsync(250); + }); + + // In Refreshed UX, shortcuts hint is in the top status row and doesn't collide with suggestions below + expect(lastFrame()).toContain('press tab twice for more'); }); }); @@ -982,24 +1034,22 @@ describe('Composer', () => { expect(lastFrame()).not.toContain('ShortcutsHelp'); unmount(); }); - it('hides shortcuts help when action is required', async () => { const uiState = createMockUIState({ shortcutsHelpVisible: true, customDialog: ( - Dialog content + Test Dialog ), }); const { lastFrame, unmount } = await renderComposer(uiState); - expect(lastFrame()).not.toContain('ShortcutsHelp'); + expect(lastFrame({ allowEmpty: true })).toBe(''); unmount(); }); }); - describe('Snapshots', () => { it('matches snapshot in idle state', async () => { const uiState = createMockUIState(); diff --git a/packages/cli/src/ui/components/Composer.tsx b/packages/cli/src/ui/components/Composer.tsx index 89c9c9d3d6..af6d3b32da 100644 --- a/packages/cli/src/ui/components/Composer.tsx +++ b/packages/cli/src/ui/components/Composer.tsx @@ -4,58 +4,63 @@ * SPDX-License-Identifier: Apache-2.0 */ -import { useState, useEffect, useMemo } from 'react'; -import { Box, Text, useIsScreenReaderEnabled } from 'ink'; import { ApprovalMode, checkExhaustive, CoreToolCallStatus, + isUserVisibleHook, } from '@google/gemini-cli-core'; +import { Box, Text, useIsScreenReaderEnabled } from 'ink'; +import { useState, useEffect, useMemo } from 'react'; +import { useConfig } from '../contexts/ConfigContext.js'; +import { useSettings } from '../contexts/SettingsContext.js'; +import { useUIState } from '../contexts/UIStateContext.js'; +import { useUIActions } from '../contexts/UIActionsContext.js'; +import { useVimMode } from 
'../contexts/VimModeContext.js'; +import { useAlternateBuffer } from '../hooks/useAlternateBuffer.js'; +import { useTerminalSize } from '../hooks/useTerminalSize.js'; +import { isNarrowWidth } from '../utils/isNarrowWidth.js'; +import { isContextUsageHigh } from '../utils/contextUsage.js'; +import { theme } from '../semantic-colors.js'; +import { GENERIC_WORKING_LABEL } from '../textConstants.js'; +import { INTERACTIVE_SHELL_WAITING_PHRASE } from '../hooks/usePhraseCycler.js'; +import { StreamingState, type HistoryItemToolGroup } from '../types.js'; import { LoadingIndicator } from './LoadingIndicator.js'; +import { ContextUsageDisplay } from './ContextUsageDisplay.js'; import { StatusDisplay } from './StatusDisplay.js'; +import { HorizontalLine } from './shared/HorizontalLine.js'; import { ToastDisplay, shouldShowToast } from './ToastDisplay.js'; import { ApprovalModeIndicator } from './ApprovalModeIndicator.js'; import { ShellModeIndicator } from './ShellModeIndicator.js'; import { DetailedMessagesDisplay } from './DetailedMessagesDisplay.js'; import { RawMarkdownIndicator } from './RawMarkdownIndicator.js'; -import { ShortcutsHint } from './ShortcutsHint.js'; import { ShortcutsHelp } from './ShortcutsHelp.js'; import { InputPrompt } from './InputPrompt.js'; import { Footer } from './Footer.js'; import { ShowMoreLines } from './ShowMoreLines.js'; import { QueuedMessageDisplay } from './QueuedMessageDisplay.js'; -import { ContextUsageDisplay } from './ContextUsageDisplay.js'; -import { HorizontalLine } from './shared/HorizontalLine.js'; import { OverflowProvider } from '../contexts/OverflowContext.js'; -import { isNarrowWidth } from '../utils/isNarrowWidth.js'; -import { useUIState } from '../contexts/UIStateContext.js'; -import { useUIActions } from '../contexts/UIActionsContext.js'; -import { useVimMode } from '../contexts/VimModeContext.js'; -import { useConfig } from '../contexts/ConfigContext.js'; -import { useSettings } from '../contexts/SettingsContext.js'; 
-import { useAlternateBuffer } from '../hooks/useAlternateBuffer.js'; -import { StreamingState, type HistoryItemToolGroup } from '../types.js'; -import { ConfigInitDisplay } from '../components/ConfigInitDisplay.js'; +import { ConfigInitDisplay } from './ConfigInitDisplay.js'; import { TodoTray } from './messages/Todo.js'; -import { getInlineThinkingMode } from '../utils/inlineThinkingMode.js'; -import { isContextUsageHigh } from '../utils/contextUsage.js'; -import { theme } from '../semantic-colors.js'; export const Composer = ({ isFocused = true }: { isFocused?: boolean }) => { - const config = useConfig(); - const settings = useSettings(); - const isScreenReaderEnabled = useIsScreenReaderEnabled(); const uiState = useUIState(); const uiActions = useUIActions(); + const settings = useSettings(); + const config = useConfig(); const { vimEnabled, vimMode } = useVimMode(); - const inlineThinkingMode = getInlineThinkingMode(settings); - const terminalWidth = uiState.terminalWidth; + const isScreenReaderEnabled = useIsScreenReaderEnabled(); + const { columns: terminalWidth } = useTerminalSize(); const isNarrow = isNarrowWidth(terminalWidth); const debugConsoleMaxHeight = Math.floor(Math.max(terminalWidth * 0.2, 5)); const [suggestionsVisible, setSuggestionsVisible] = useState(false); const isAlternateBuffer = useAlternateBuffer(); - const { showApprovalModeIndicator } = uiState; + const showApprovalModeIndicator = uiState.showApprovalModeIndicator; + const loadingPhrases = settings.merged.ui.loadingPhrases; + const showTips = loadingPhrases === 'tips' || loadingPhrases === 'all'; + const showWit = loadingPhrases === 'witty' || loadingPhrases === 'all'; + const showUiDetails = uiState.cleanUiDetailsVisible; const suggestionsPosition = isAlternateBuffer ? 
'above' : 'below'; const hideContextSummary = @@ -84,6 +89,7 @@ export const Composer = ({ isFocused = true }: { isFocused?: boolean }) => { Boolean(uiState.quota.proQuotaRequest) || Boolean(uiState.quota.validationRequest) || Boolean(uiState.customDialog); + const isPassiveShortcutsHelpState = uiState.isInputActive && uiState.streamingState === StreamingState.Idle && @@ -105,16 +111,30 @@ export const Composer = ({ isFocused = true }: { isFocused?: boolean }) => { uiState.shortcutsHelpVisible && uiState.streamingState === StreamingState.Idle && !hasPendingActionRequired; + + /** + * Use the setting if provided, otherwise default to true for the new UX. + * This allows tests to override the collapse behavior. + */ + const shouldCollapseDuringApproval = + settings.merged.ui.collapseDrawerDuringApproval !== false; + + if (hasPendingActionRequired && shouldCollapseDuringApproval) { + return null; + } + const hasToast = shouldShowToast(uiState); const showLoadingIndicator = (!uiState.embeddedShellFocused || uiState.isBackgroundShellVisible) && uiState.streamingState === StreamingState.Responding && !hasPendingActionRequired; + const hideUiDetailsForSuggestions = suggestionsVisible && suggestionsPosition === 'above'; const showApprovalIndicator = !uiState.shellModeActive && !hideUiDetailsForSuggestions; const showRawMarkdownIndicator = !uiState.renderMarkdown; + let modeBleedThrough: { text: string; color: string } | null = null; switch (showApprovalModeIndicator) { case ApprovalMode.YOLO: @@ -137,57 +157,359 @@ export const Composer = ({ isFocused = true }: { isFocused?: boolean }) => { const hideMinimalModeHintWhileBusy = !showUiDetails && (showLoadingIndicator || hasPendingActionRequired); - const minimalModeBleedThrough = hideMinimalModeHintWhileBusy - ? 
null - : modeBleedThrough; - const hasMinimalStatusBleedThrough = shouldShowToast(uiState); - const showMinimalContextBleedThrough = - !settings.merged.ui.footer.hideContextPercentage && - isContextUsageHigh( - uiState.sessionStats.lastPromptTokenCount, - typeof uiState.currentModel === 'string' - ? uiState.currentModel - : undefined, - ); - const hideShortcutsHintForSuggestions = hideUiDetailsForSuggestions; - const isModelIdle = uiState.streamingState === StreamingState.Idle; - const isBufferEmpty = uiState.buffer.text.length === 0; - const canShowShortcutsHint = - isModelIdle && isBufferEmpty && !hasPendingActionRequired; - const [showShortcutsHintDebounced, setShowShortcutsHintDebounced] = - useState(canShowShortcutsHint); + // Universal Content Objects + const modeContentObj = hideMinimalModeHintWhileBusy ? null : modeBleedThrough; - useEffect(() => { - if (!canShowShortcutsHint) { - setShowShortcutsHintDebounced(false); - return; - } - - const timeout = setTimeout(() => { - setShowShortcutsHintDebounced(true); - }, 200); - - return () => clearTimeout(timeout); - }, [canShowShortcutsHint]); + const allHooks = uiState.activeHooks; + const hasAnyHooks = allHooks.length > 0; + const userVisibleHooks = allHooks.filter((h) => isUserVisibleHook(h.source)); + const hasUserVisibleHooks = userVisibleHooks.length > 0; const shouldReserveSpaceForShortcutsHint = - settings.merged.ui.showShortcutsHint && !hideShortcutsHintForSuggestions; - const showShortcutsHint = - shouldReserveSpaceForShortcutsHint && showShortcutsHintDebounced; - const showMinimalModeBleedThrough = - !hideUiDetailsForSuggestions && Boolean(minimalModeBleedThrough); - const showMinimalInlineLoading = !showUiDetails && showLoadingIndicator; - const showMinimalBleedThroughRow = - !showUiDetails && - (showMinimalModeBleedThrough || - hasMinimalStatusBleedThrough || - showMinimalContextBleedThrough); - const showMinimalMetaRow = - !showUiDetails && - (showMinimalInlineLoading || - showMinimalBleedThroughRow 
|| - shouldReserveSpaceForShortcutsHint); + settings.merged.ui.showShortcutsHint && + !hideUiDetailsForSuggestions && + !hasPendingActionRequired; + + const isInteractiveShellWaiting = uiState.currentLoadingPhrase?.includes( + INTERACTIVE_SHELL_WAITING_PHRASE, + ); + + /** + * Calculate the estimated length of the status message to avoid collisions + * with the tips area. + */ + let estimatedStatusLength = 0; + if (hasAnyHooks) { + if (hasUserVisibleHooks) { + const hookLabel = + userVisibleHooks.length > 1 ? 'Executing Hooks' : 'Executing Hook'; + const hookNames = userVisibleHooks + .map( + (h) => + h.name + + (h.index && h.total && h.total > 1 + ? ` (${h.index}/${h.total})` + : ''), + ) + .join(', '); + estimatedStatusLength = hookLabel.length + hookNames.length + 10; + } else { + estimatedStatusLength = GENERIC_WORKING_LABEL.length + 10; + } + } else if (showLoadingIndicator) { + const thoughtText = uiState.thought?.subject || GENERIC_WORKING_LABEL; + const inlineWittyLength = + showWit && uiState.currentWittyPhrase + ? uiState.currentWittyPhrase.length + 1 + : 0; + estimatedStatusLength = thoughtText.length + 25 + inlineWittyLength; + } else if (hasPendingActionRequired) { + estimatedStatusLength = 20; + } else if (hasToast) { + estimatedStatusLength = 40; + } + + /** + * Determine the ambient text (tip) to display. + */ + const tipContentStr = (() => { + // 1. Proactive Tip (Priority) + if ( + showTips && + uiState.currentTip && + !( + isInteractiveShellWaiting && + uiState.currentTip === INTERACTIVE_SHELL_WAITING_PHRASE + ) + ) { + if ( + estimatedStatusLength + uiState.currentTip.length + 10 <= + terminalWidth + ) { + return uiState.currentTip; + } + } + + // 2. Shortcut Hint (Fallback) + if ( + settings.merged.ui.showShortcutsHint && + !hideUiDetailsForSuggestions && + !hasPendingActionRequired && + uiState.buffer.text.length === 0 + ) { + return showUiDetails ? '? 
for shortcuts' : 'press tab twice for more'; + } + + return undefined; + })(); + + const tipLength = tipContentStr?.length || 0; + const willCollideTip = estimatedStatusLength + tipLength + 5 > terminalWidth; + + const showTipLine = + !hasPendingActionRequired && tipContentStr && !willCollideTip && !isNarrow; + + // Mini Mode VIP Flags (Pure Content Triggers) + const miniMode_ShowApprovalMode = + Boolean(modeContentObj) && !hideUiDetailsForSuggestions; + const miniMode_ShowToast = hasToast; + const miniMode_ShowShortcuts = shouldReserveSpaceForShortcutsHint; + const miniMode_ShowStatus = showLoadingIndicator || hasAnyHooks; + const miniMode_ShowTip = showTipLine; + const miniMode_ShowContext = isContextUsageHigh( + uiState.sessionStats.lastPromptTokenCount, + uiState.currentModel, + settings.merged.model?.compressionThreshold, + ); + + // Composite Mini Mode Triggers + const showRow1_MiniMode = + miniMode_ShowToast || + miniMode_ShowStatus || + miniMode_ShowShortcuts || + miniMode_ShowTip; + + const showRow2_MiniMode = miniMode_ShowApprovalMode || miniMode_ShowContext; + + // Final Display Rules (Stable Footer Architecture) + const showRow1 = showUiDetails || showRow1_MiniMode; + const showRow2 = showUiDetails || showRow2_MiniMode; + + const showMinimalBleedThroughRow = !showUiDetails && showRow2_MiniMode; + + const renderTipNode = () => { + if (!tipContentStr) return null; + + const isShortcutHint = + tipContentStr === '? for shortcuts' || + tipContentStr === 'press tab twice for more'; + const color = + isShortcutHint && uiState.shortcutsHelpVisible + ? theme.text.accent + : theme.text.secondary; + + return ( + + + {tipContentStr === uiState.currentTip + ? 
`Tip: ${tipContentStr}` + : tipContentStr} + + + ); + }; + + const renderStatusNode = () => { + const allHooks = uiState.activeHooks; + if (allHooks.length === 0 && !showLoadingIndicator) return null; + + if (allHooks.length > 0) { + const userVisibleHooks = allHooks.filter((h) => + isUserVisibleHook(h.source), + ); + + let hookText = GENERIC_WORKING_LABEL; + if (userVisibleHooks.length > 0) { + const label = + userVisibleHooks.length > 1 ? 'Executing Hooks' : 'Executing Hook'; + const displayNames = userVisibleHooks.map((h) => { + let name = h.name; + if (h.index && h.total && h.total > 1) { + name += ` (${h.index}/${h.total})`; + } + return name; + }); + hookText = `${label}: ${displayNames.join(', ')}`; + } + + return ( + + ); + } + + return ( + + ); + }; + + const statusNode = renderStatusNode(); + + /** + * Renders the minimal metadata row content shown when UI details are hidden. + */ + const renderMinimalMetaRowContent = () => ( + + {renderStatusNode()} + {showMinimalBleedThroughRow && ( + + {miniMode_ShowApprovalMode && modeContentObj && ( + ● {modeContentObj.text} + )} + + )} + + ); + + const renderStatusRow = () => { + // Mini Mode Height Reservation (The "Anti-Jitter" line) + if (!showUiDetails && !showRow1_MiniMode && !showRow2_MiniMode) { + return ; + } + + return ( + + {/* Row 1: multipurpose status (thinking, hooks, wit, tips) */} + {showRow1 && ( + + + {!showUiDetails && showRow1_MiniMode ? ( + renderMinimalMetaRowContent() + ) : isInteractiveShellWaiting ? ( + + + ! Shell awaiting input (Tab to focus) + + + ) : ( + + {statusNode} + + )} + + + + {!isNarrow && showTipLine && renderTipNode()} + + + )} + + {/* Internal Separator Line */} + {showRow1 && + showRow2 && + (showUiDetails || (showRow1_MiniMode && showRow2_MiniMode)) && ( + + + + )} + + {/* Row 2: Mode and Context Summary */} + {showRow2 && ( + + + {showUiDetails ? 
( + <> + {showApprovalIndicator && ( + + )} + {uiState.shellModeActive && ( + + + + )} + {showRawMarkdownIndicator && ( + + + + )} + + ) : ( + miniMode_ShowApprovalMode && + modeContentObj && ( + + ● {modeContentObj.text} + + ) + )} + + + {(showUiDetails || miniMode_ShowContext) && ( + + )} + {miniMode_ShowContext && !showUiDetails && ( + + + + )} + + + )} + + ); + }; return ( { flexGrow={0} flexShrink={0} > - {(!uiState.slashCommands || - !uiState.isConfigInitialized || - uiState.isResuming) && ( - + {uiState.isResuming && ( + )} {showUiDetails && ( @@ -210,212 +528,16 @@ export const Composer = ({ isFocused = true }: { isFocused?: boolean }) => { {showUiDetails && } - - - - {showUiDetails && showLoadingIndicator && ( - - )} - - - {showUiDetails && showShortcutsHint && } - - - {showMinimalMetaRow && ( - - - {showMinimalInlineLoading && ( - - )} - {showMinimalModeBleedThrough && minimalModeBleedThrough && ( - - ● {minimalModeBleedThrough.text} - - )} - {hasMinimalStatusBleedThrough && ( - - - - )} - - {(showMinimalContextBleedThrough || - shouldReserveSpaceForShortcutsHint) && ( - - {showMinimalContextBleedThrough && ( - - )} - - {showShortcutsHint && } - - - )} - - )} - {showShortcutsHelp && } - {showUiDetails && } - {showUiDetails && ( - - - {hasToast ? 
( - - ) : ( - - {showApprovalIndicator && ( - - )} - {!showLoadingIndicator && ( - <> - {uiState.shellModeActive && ( - - - - )} - {showRawMarkdownIndicator && ( - - - - )} - - )} - - )} - + {showShortcutsHelp && } - - {!showLoadingIndicator && ( - - )} - - - )} + {(showUiDetails || miniMode_ShowToast) && ( + + + + )} + + + {renderStatusRow()} {showUiDetails && uiState.showErrorDetails && ( @@ -466,12 +588,15 @@ export const Composer = ({ isFocused = true }: { isFocused?: boolean }) => { streamingState={uiState.streamingState} suggestionsPosition={suggestionsPosition} onSuggestionsVisibilityChange={setSuggestionsVisible} + copyModeEnabled={uiState.copyModeEnabled} /> )} {showUiDetails && !settings.merged.ui.hideFooter && - !isScreenReaderEnabled &&