diff --git a/.gemini/commands/fix-behavioral-eval.toml b/.gemini/commands/fix-behavioral-eval.toml deleted file mode 100644 index d2f1c5b3ed..0000000000 --- a/.gemini/commands/fix-behavioral-eval.toml +++ /dev/null @@ -1,60 +0,0 @@ -description = "Check status of nightly evals, fix failures for key models, and re-run." -prompt = """ -You are an expert at fixing behavioral evaluations. - -1. **Investigate**: - - Use 'gh' cli to fetch the results from the latest run from the main branch: https://github.com/google-gemini/gemini-cli/actions/workflows/evals-nightly.yml. - - DO NOT push any changes or start any runs. The rest of your evaluation will be local. - - Evals are in evals/ directory and are documented by evals/README.md. - - The test case trajectory logs will be logged to evals/logs. - - You should also enable and review the verbose agent logs by setting the GEMINI_DEBUG_LOG_FILE environment variable. - - Identify the relevant test. Confine your investigation and validation to just this test. - - Proactively add logging that will aid in gathering information or validating your hypotheses. - -2. **Fix**: - - If a relevant test is failing, locate the test file and the corresponding prompt/code. - - It's often helpful to make an extreme, brute force change to see if you are changing the right place to make an improvement and then scope it back iteratively. - - Your **final** change should be **minimal and targeted**. - - Keep in mind the following: - - The prompt has multiple configurations and pieces. Take care that your changes - end up in the final prompt for the selected model and configuration. - - The prompt chosen for the eval is intentional. It's often vague or indirect - to see how the agent performs with ambiguous instructions. Changing it should - be a last resort. - - When changing the test prompt, carefully consider whether the prompt still tests - the same scenario. We don't want to lose test fidelity by making the prompts too - direct (i.e.: easy). 
- - Your primary mechanism for improving the agent's behavior is to make changes to - tool instructions, system prompt (snippets.ts), and/or modules that contribute to the prompt. - - If prompt and description changes are unsuccessful, use logs and debugging to - confirm that everything is working as expected. - - If unable to fix the test, you can make recommendations for architecture changes - that might help stablize the test. Be sure to THINK DEEPLY if offering architecture guidance. - Some facts that might help with this are: - - Agents may be composed of one or more agent loops. - - AgentLoop == 'context + toolset + prompt'. Subagents are one type of agent loop. - - Agent loops perform better when: - - They have direct, unambiguous, and non-contradictory prompts. - - They have fewer irrelevant tools. - - They have fewer goals or steps to perform. - - They have less low value or irrelevant context. - - You may suggest compositions of existing primitives, like subagents, or - propose a new one. - - These recommendations should be high confidence and should be grounded - in observed deficient behaviors rather than just parroting the facts above. - Investigate as needed to ground your recommendations. - -3. **Verify**: - - Run just that one test if needed to validate that it is fixed. Be sure to run vitest in non-interactive mode. - - Running the tests can take a long time, so consider whether you can diagnose via other means or log diagnostics before committing the time. You must minimize the number of test runs needed to diagnose the failure. - - After the test completes, check whether it seems to have improved. - - You will need to run the test 3 times for Gemini 3.0, Gemini 3 flash, and Gemini 2.5 pro to ensure that it is truly stable. Run these runs in parallel, using scripts if needed. - - Some flakiness is expected; if it looks like a transient issue or the test is inherently unstable but passes 2/3 times, you might decide it cannot be improved. - -4. 
**Report**: - - Provide a summary of the test success rate for each of the tested models. - - Success rate is calculated based on 3 runs per model (e.g., 3/3 = 100%). - - If you couldn't fix it due to persistent flakiness, explain why. - -{{args}} -""" \ No newline at end of file diff --git a/.gemini/commands/promote-behavioral-eval.toml b/.gemini/commands/promote-behavioral-eval.toml deleted file mode 100644 index 9893e9b02b..0000000000 --- a/.gemini/commands/promote-behavioral-eval.toml +++ /dev/null @@ -1,29 +0,0 @@ -description = "Promote behavioral evals that have a 100% success rate over the last 7 nightly runs." -prompt = """ -You are an expert at analyzing and promoting behavioral evaluations. - -1. **Investigate**: - - Use 'gh' cli to fetch the results from the most recent run from the main branch: https://github.com/google-gemini/gemini-cli/actions/workflows/evals-nightly.yml. - - DO NOT push any changes or start any runs. The rest of your evaluation will be local. - - Evals are in evals/ directory and are documented by evals/README.md. - - Identify tests that have passed 100% of the time for ALL enabled models across the past 7 runs in a row. - - NOTE: the results summary from the most recent run contains the last 7 runs test results. 100% means the test passed 3/3 times for that model and run. - - If a test meets this criteria, it is a candidate for promotion. - -2. **Promote**: - - For each candidate test, locate the test file in the evals/ directory. - - Promote the test according to the project's standard promotion process (e.g., moving it to a stable suite, updating its tags, or removing skip/flaky annotations). - - Ensure you follow any guidelines in evals/README.md for stable tests. - - Your **final** change should be **minimal and targeted** to just promoting the test status. - -3. **Verify**: - - Run the promoted tests locally to validate that they still execute correctly. Be sure to run vitest in non-interactive mode. 
- - Check that the test is now part of the expected standard or stable test suites. - -4. **Report**: - - Provide a summary of the tests that were promoted. - - Include the success rate evidence (7/7 runs passed for all models) for each promoted test. - - If no tests met the criteria for promotion, clearly state that and summarize the closest candidates. - -{{args}} -""" diff --git a/.gemini/skills/behavioral-evals/SKILL.md b/.gemini/skills/behavioral-evals/SKILL.md new file mode 100644 index 0000000000..f60fb04832 --- /dev/null +++ b/.gemini/skills/behavioral-evals/SKILL.md @@ -0,0 +1,56 @@ +--- +name: behavioral-evals +description: Guidance for creating, running, fixing, and promoting behavioral evaluations. Use when verifying agent decision logic, debugging failures, debugging prompt steering, or adding workspace regression tests. +--- + +# Behavioral Evals + +## Overview + +Behavioral evaluations (evals) are tests that validate the **agent's decision-making** (e.g., tool choice) rather than pure functionality. They are critical for verifying prompt changes, debugging steerability, and preventing regressions. + +> [!NOTE] +> **Single Source of Truth**: For core concepts, policies, running tests, and general best practices, always refer to **[evals/README.md](../../../evals/README.md)**. + +--- + +## 🔄 Workflow Decision Tree + +1. **Does a prompt/tool change need validation?** + * *No* -> Normal integration tests. + * *Yes* -> Continue below. +2. **Is it UI/Interaction heavy?** + * *Yes* -> Use `appEvalTest` (`AppRig`). See **[creating.md](references/creating.md)**. + * *No* -> Use `evalTest` (`TestRig`). See **[creating.md](references/creating.md)**. +3. **Is it a new test?** + * *Yes* -> Set policy to `USUALLY_PASSES`. + * *No* -> `ALWAYS_PASSES` (locks in regression). +4. **Are you fixing a failure or promoting a test?** + * *Fixing* -> See **[fixing.md](references/fixing.md)**. 
+ * *Promoting* -> See **[promoting.md](references/promoting.md)**. + +--- + +## 📋 Quick Checklist + +### 1. Set Up Workspace +Seed the workspace with necessary files using the `files` object to simulate a realistic scenario (e.g., NodeJS project with `package.json`). +* *Details in **[creating.md](references/creating.md)*** + +### 2. Write Assertions +Audit agent decisions using `rig.setBreakpoint()` (AppRig only) or index verification on `rig.readToolLogs()`. +* *Details in **[creating.md](references/creating.md)*** + +### 3. Verify +Run single tests locally with Vitest. Confirm stability locally before relying on CI workflows. +* *See **[evals/README.md](../../../evals/README.md)** for running commands.* + +--- + +## 📦 Bundled Resources + +Detailed procedural guides: +* **[creating.md](references/creating.md)**: Assertion strategies, Rig selection, Mock MCPs. +* **[fixing.md](references/fixing.md)**: Step-by-step automated investigation, architecture diagnosis guidelines. +* **[promoting.md](references/promoting.md)**: Candidate identification criteria and threshold guidelines. + diff --git a/.gemini/skills/behavioral-evals/assets/interactive_eval.ts.txt b/.gemini/skills/behavioral-evals/assets/interactive_eval.ts.txt new file mode 100644 index 0000000000..2d2b7433dc --- /dev/null +++ b/.gemini/skills/behavioral-evals/assets/interactive_eval.ts.txt @@ -0,0 +1,27 @@ +import { describe, expect } from 'vitest'; +import { appEvalTest } from './app-test-helper.js'; + +describe('interactive_feature', () => { + // New tests MUST start as USUALLY_PASSES + appEvalTest('USUALLY_PASSES', { + name: 'should pause for user confirmation', + files: { + 'package.json': JSON.stringify({ name: 'app' }) + }, + prompt: 'Task description here requiring approval', + timeout: 60000, + setup: async (rig) => { + // ⚠️ Breakpoints are ONLY safe in appEvalTest + rig.setBreakpoint(['ask_user']); + }, + assert: async (rig) => { + // 1. 
Wait for the breakpoint to trigger + const confirmation = await rig.waitForPendingConfirmation('ask_user'); + expect(confirmation).toBeDefined(); + + // 2. Resolve it so the test can finish + await rig.resolveTool(confirmation); + await rig.waitForIdle(); + }, + }); +}); diff --git a/.gemini/skills/behavioral-evals/assets/standard_eval.ts.txt b/.gemini/skills/behavioral-evals/assets/standard_eval.ts.txt new file mode 100644 index 0000000000..3e666dfc37 --- /dev/null +++ b/.gemini/skills/behavioral-evals/assets/standard_eval.ts.txt @@ -0,0 +1,30 @@ +import { describe, expect } from 'vitest'; +import { evalTest } from './test-helper.js'; + +describe('core_feature', () => { + // New tests MUST start as USUALLY_PASSES + evalTest('USUALLY_PASSES', { + name: 'should perform expected agent action', + setup: async (rig) => { + // For mocking offline MCP: + // rig.addMockMcpServer('workspace-server', 'google-workspace'); + }, + files: { + 'src/app.ts': '// some code', + }, + prompt: 'Task description here', + timeout: 60000, // 1 minute safety limit + assert: async (rig, result) => { + // 1. Audit the trajectory (Safe for standard evalTest) + const logs = rig.readToolLogs(); + const hasTool = logs.some((l) => l.toolRequest.name === 'read_file'); + expect(hasTool, 'Agent should have read the file').toBe(true); + + // 2. Assert efficiency (Cost/Turn) + expect(logs.length).toBeLessThan(5); + + // 3. 
Assert final output + expect(result).toContain('Expected Keyword'); + }, + }); +}); diff --git a/.gemini/skills/behavioral-evals/references/creating.md b/.gemini/skills/behavioral-evals/references/creating.md new file mode 100644 index 0000000000..bcc1baff06 --- /dev/null +++ b/.gemini/skills/behavioral-evals/references/creating.md @@ -0,0 +1,151 @@ +# Creating Behavioral Evals + +## 🔬 Rig Selection + +| Rig Type | Import From | Architecture | Use When | +| :---------------- | :--------------------- | :------------------------------------------------------------------- | :---------------------------------------------------------------------------------------------------- | +| **`evalTest`** | `./test-helper.js` | **Subprocess**. Runs the CLI in a separate process + waits for exit. | Standard workspace tests. **Do not use `setBreakpoint`**; auditing history (`readToolLogs`) is safer. | +| **`appEvalTest`** | `./app-test-helper.js` | **In-Process**. Runs directly inside the runner loop. | UI/Ink rendering. Safe for `setBreakpoint` triggers. | + +--- + +## 🏗️ Scenario Design + +Evals must simulate realistic agent environments to effectively test +decision-making. + +- **Workspace State**: Seed with standard project anchors if testing general + capabilities: + - `package.json` for NodeJS environments. + - Minimal configuration files (`tsconfig.json`, `GEMINI.md`). +- **Structural Complexity**: Provide enough files to force the agent to _search_ + or _navigate_, rather than giving the answer directly. Avoid trivial one-file + tests unless testing exact prompt steering. + +--- + +## ❌ Fail First Principle + +Before asserting a new capability or locking in a fix, **verify that the test +fails first**. + +- It is easy to accidentally write an eval that asserts behaviors that are + already met or pass by default. +- **Process**: reproduce failure with test -> apply fix (prompt/tool) -> verify + test passes. + +--- + +## ✋ Testing Patterns + +### 1. 
Breakpoints + +Verifies the agent _intends_ to use a tool BEFORE executing it. Useful for +interactive prompts or safety checks. + +```typescript +// ⚠️ Only works with appEvalTest (AppRig) +setup: async (rig) => { + rig.setBreakpoint(['ask_user']); +}, +assert: async (rig) => { + const confirmation = await rig.waitForPendingConfirmation('ask_user'); + expect(confirmation).toBeDefined(); +} +``` + +### 2. Tool Confirmation Race + +When asserting multiple triggers (e.g., "enters plan mode then asks question"): + +```typescript +assert: async (rig) => { + let confirmation = await rig.waitForPendingConfirmation([ + 'enter_plan_mode', + 'ask_user', + ]); + + if (confirmation?.name === 'enter_plan_mode') { + rig.acceptConfirmation('enter_plan_mode'); + confirmation = await rig.waitForPendingConfirmation('ask_user'); + } + expect(confirmation?.toolName).toBe('ask_user'); +}; +``` + +### 3. Audit Tool Logs + +Audit exact operations to ensure efficiency (e.g., no redundant reads). + +```typescript +assert: async (rig, result) => { + await rig.waitForTelemetryReady(); + const toolLogs = rig.readToolLogs(); + + const writeCall = toolLogs.find( + (log) => log.toolRequest.name === 'write_file', + ); + expect(writeCall).toBeDefined(); +}; +``` + +### 4. Mock MCP Facades + +To evaluate tools connected via MCP without hitting live endpoints, load a mock +server configuration in the `setup` hook. + +```typescript +setup: async (rig) => { + rig.addMockMcpServer('workspace-server', 'google-workspace'); +}, +assert: async (rig) => { + await rig.waitForTelemetryReady(); + const toolLogs = rig.readToolLogs(); + const workspaceCall = toolLogs.find( + (log) => log.toolRequest.name === 'mcp_workspace-server_docs.getText' + ); + expect(workspaceCall).toBeDefined(); +}; +``` + +--- + +## ⚠️ Safety & Efficiency Guardrails + +### 1. Breakpoint Deadlocks + +Breakpoints (`setBreakpoint`) pause execution. In standard `evalTest`, +`rig.run()` waits for the process to exit _before_ assertions run. 
**This will +hang indefinitely.** + +- **Use Breakpoints** for `appEvalTest` or interactive simulations. +- **Use Audit Tool Logs** (above) for standard trajectory tests. + +### 2. Runaway Timeout + +Always set a budget boundary in the `EvalCase` to prevent runaway loops on +quota: + +```typescript +evalTest('USUALLY_PASSES', { + name: '...', + timeout: 60000, // 1 minute safety limit + // ... +}); +``` + +### 3. Efficiency Assertion (Turn limits) + +Check if a tool is called _early_ using index checks: + +```typescript +assert: async (rig) => { + const toolLogs = rig.readToolLogs(); + const toolCallIndex = toolLogs.findIndex( + (log) => log.toolRequest.name === 'cli_help', + ); + + expect(toolCallIndex).toBeGreaterThan(-1); + expect(toolCallIndex).toBeLessThan(5); // Called within first 5 turns +}; +``` diff --git a/.gemini/skills/behavioral-evals/references/fixing.md b/.gemini/skills/behavioral-evals/references/fixing.md new file mode 100644 index 0000000000..fc78870515 --- /dev/null +++ b/.gemini/skills/behavioral-evals/references/fixing.md @@ -0,0 +1,71 @@ +# Fixing Behavioral Evals + +Use this guide when asked to debug, troubleshoot, or fix a failing behavioral +evaluation. + +--- + +## 1. 🔍 Investigate + +1. **Fetch Nightly Results**: Use the `gh` CLI to inspect the latest run from + `evals-nightly.yml` if applicable. + - _Example view URL_: + `https://github.com/google-gemini/gemini-cli/actions/workflows/evals-nightly.yml` +2. **Isolate**: DO NOT push changes or start remote runs. Confine investigation + to the local workspace. +3. **Read Logs**: + - Eval logs live in `evals/logs/.log`. + - Enable verbose debugging via `export GEMINI_DEBUG_LOG_FILE="debug.log"`. +4. **Diagnose**: Audit tool logs and telemetry. Note if due to setup/assert. + - **Tip**: Proactively add custom logging/diagnostics to check hypotheses. + +--- + +## 2. 🛠️ Fix Strategy + +1. **Targeted Location**: Locate the test case and the corresponding + prompt/code. +2. 
**Iterative Scope**: Make extreme change first to verify scope, then refine + to a minimal, targeted change. +3. **Assertion Fidelity**: + - Changing the test prompt is a **last resort** (prompts are often vague by + design). + - **Warning**: Do not lose test fidelity by making prompts too direct/easy. + - **Primary Fix Trigger**: Adjust tool descriptions, system prompts + (`snippets.ts`), or **modules that contribute to the prompt template**. + - **Warning**: Prompts have multiple configurations; ensure your fix targets + the correct config for the model in question. +4. **Architecture Options**: If prompt or instruction tuning triggers no + improvement, analyze loop composition. + - **AgentLoop**: Defined by `context + toolset + prompt`. + - **Enhancements**: Loops perform best with direct prompts, fewer irrelevant + tools, low goal density, and minimal low-value/irrelevant context. + - **Modifications**: Compose subagents or isolate tools. Ground in observed + traces. + - **Warning**: Think deeply before offering recommendations; avoid parroting + abstract design guidelines. + +--- + +## 3. ✅ Verify + +1. **Run Local**: Run Vitest in non-interactive mode on just the file. +2. **Log Audit**: Prioritize diagnosing failures via log comparison before + triggering heavy test runs. +3. **Stability Limit**: Run the test **3 times** locally on key models (can use + scripts to run in parallel for speed): + - **Gemini 3.0** + - **Gemini 3 Flash** + - **Gemini 2.5 Pro** +4. **Flakiness Rule**: If it passes 2/3 times, it may be inherent noise + difficult to improve without a structural split. + +--- + +## 4. 📊 Report + +Provide a summary of: + +- Test success rate for each tested model (e.g., 3/3 = 100%). +- Root cause identification and fix explanation. +- If unfixed, provide high-confidence architecture recommendations. 
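Review note on the Verify and Report steps of `fixing.md` above: the success-rate bookkeeping (3 runs per model, e.g. 3/3 = 100%, with a 2/3 result treated as possible inherent noise) could be sketched roughly as follows. This is an illustrative sketch only, not code from this PR; `RunResult`, `ModelSummary`, `summarizeRuns`, and the verdict labels are hypothetical names, and the two-thirds threshold is an assumption drawn from the flakiness rule.

```typescript
// Hypothetical helper (not part of the repository): summarizes local
// verification runs into a per-model success rate for the final report.
type RunResult = { model: string; passed: boolean };

type ModelSummary = {
  model: string;
  passes: number;
  runs: number;
  ratePercent: number;
  verdict: 'stable' | 'possible-noise' | 'failing';
};

function summarizeRuns(results: RunResult[]): ModelSummary[] {
  // Tally passes and total runs per model label.
  const counts: Record<string, { passes: number; runs: number }> = {};
  for (const r of results) {
    const entry = counts[r.model] || { passes: 0, runs: 0 };
    entry.runs += 1;
    if (r.passed) entry.passes += 1;
    counts[r.model] = entry;
  }

  return Object.keys(counts).map((model): ModelSummary => {
    const { passes, runs } = counts[model];
    const ratePercent = Math.round((passes / runs) * 100);
    // Assumed thresholds: all runs pass => stable; at least 2/3 pass =>
    // possibly inherent noise; anything lower => still failing.
    const verdict: ModelSummary['verdict'] =
      passes === runs
        ? 'stable'
        : passes * 3 >= runs * 2
          ? 'possible-noise'
          : 'failing';
    return { model, passes, runs, ratePercent, verdict };
  });
}
```

Feeding it one entry per local run (three per model) yields the rows for the Report step; the model labels are whatever strings the caller chooses.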
diff --git a/.gemini/skills/behavioral-evals/references/promoting.md b/.gemini/skills/behavioral-evals/references/promoting.md new file mode 100644 index 0000000000..d3d3eaf88f --- /dev/null +++ b/.gemini/skills/behavioral-evals/references/promoting.md @@ -0,0 +1,55 @@ +# Promoting Behavioral Evals + +Use this guide when asked to analyze nightly results and promote incubated tests +to stable suites. + +--- + +## 1. 🔍 Investigate Candidates + +1. **Audit Nightly Logs**: Use the `gh` CLI to fetch results from + `evals-nightly.yml` (Direct URL: + `https://github.com/google-gemini/gemini-cli/actions/workflows/evals-nightly.yml`). + - **Tip**: The aggregate summary from the most recent run integrates the + last 7 runs of history automatically. + - **Safety**: DO NOT push changes or start remote runs. All verification is + local. +2. **Assess Stability**: Identify tests that pass **100% of the time** across + ALL enabled models over the **last 7 nightly runs** in a row. + - _100% means the test passed 3/3 times for every model and run._ +3. **Promotion Targets**: Tests meeting these criteria are candidates for + promotion from `USUALLY_PASSES` to `ALWAYS_PASSES`. + +--- + +## 2. 🚥 Promotion Steps + +1. **Locate File**: Locate the eval file in the `evals/` directory. +2. **Update Policy**: Modify the policy argument to `ALWAYS_PASSES`. + ```typescript + evalTest('ALWAYS_PASSES', { ... }) + ``` +3. **Targeting**: Follow guidelines in `evals/README.md` regarding stable suite + organization. +4. **Constraint**: Your final change must be **minimal and targeted** strictly + to promoting the test status. Do not refactor the test or setup fixtures. + +--- + +## 3. ✅ Verify + +1. **Run Promoted Tests**: Run the promoted test locally using non-interactive + Vitest to confirm structure validity. +2. **Verify Suite Inclusion**: Check that the test is successfully picked up by + standard runnable ranges. + +--- + +## 4. 📊 Report + +Provide a summary of: + +- Which tests were promoted. 
+- The success rate evidence (e.g., 7/7 runs passed for all models). +- If no candidates qualified, list the next closest candidates and their current + pass rate. diff --git a/.gemini/skills/behavioral-evals/references/running.md b/.gemini/skills/behavioral-evals/references/running.md new file mode 100644 index 0000000000..cf8c46a8d6 --- /dev/null +++ b/.gemini/skills/behavioral-evals/references/running.md @@ -0,0 +1,95 @@ +# Running & Promoting Evals + +## 🛠️ Prerequisites + +Behavioral evals run against the compiled binary. You **must** build and bundle +the project first after making changes: + +```bash +npm run build && npm run bundle +``` + +--- + +## 🏃‍♂️ Running Tests + +### 1. Configure Environment Variables + +Evals require a standard API key. If your `.env` file has multiple keys or +comments, use this precise extraction setup: + +```bash +export GEMINI_API_KEY=$(grep '^GEMINI_API_KEY=' .env | cut -d '=' -f2) && RUN_EVALS=1 npx vitest run --config evals/vitest.config.ts +``` + +### 2. Commands + +| Command | Scope | Description | +| :---------------------------------- | :-------------- | :------------------------------------------------- | +| `npm run test:always_passing_evals` | `ALWAYS_PASSES` | Fast feedback, runs in CI. | +| `npm run test:all_evals` | All | Runs nightly incubation tests. Sets `RUN_EVALS=1`. | + +### Target Specific File + +_Note: `RUN_EVALS=1` is required for incubated (`USUALLY_PASSES`) tests._ + +```bash +RUN_EVALS=1 npx vitest run --config evals/vitest.config.ts my_feature.eval.ts +``` + +--- + +## 🐞 Debugging and Logs + +If a test fails, verify: + +- **Tool Trajectory Logs**: Sequence of calls in `evals/logs/.log`. +- **Verbose Reasoning**: Capture raw buffer traces by setting + `GEMINI_DEBUG_LOG_FILE`: + ```bash + export GEMINI_DEBUG_LOG_FILE="debug.log" + ``` + +--- + +### 🎯 Verify Model Targeting + +- **Tip:** Standard evals benchmark against model variations. 
If a test passes + on Flash but fails on Pro (or vice versa), the issue is usually in the **tool + description**, not the prompt definition. Flash is sensitive to "instruction + bloat," while Pro is sensitive to "ambiguous intent." + +--- + +## 🚥 Deflaking & Promotion + +To maintain CI stability, all new evals follow a strict incubation period. + +### 1. Incubation (`USUALLY_PASSES`) + +New tests must be created with the `USUALLY_PASSES` policy. + +```typescript +evalTest('USUALLY_PASSES', { ... }) +``` + +They run in **Evals: Nightly** workflows and do not block PR merges. + +### 2. Investigate Failures + +If a nightly eval regresses, investigate via agent: + +```bash +gemini /fix-behavioral-eval [optional-run-uri] +``` + +### 3. Promotion (`ALWAYS_PASSES`) + +Once a test scores 100% consistency over multiple nightly cycles: + +```bash +gemini /promote-behavioral-eval +``` + +_Do not promote manually._ The command verifies trajectory logs before updating +the file policy. diff --git a/.github/workflows/eval-guidance.yml b/.github/workflows/eval-guidance.yml new file mode 100644 index 0000000000..e1f1ab3168 --- /dev/null +++ b/.github/workflows/eval-guidance.yml @@ -0,0 +1,69 @@ +name: 'Evals: PR Guidance' + +on: + pull_request: + paths: + - 'packages/core/src/**/*.ts' + - '!**/*.test.ts' + - '!**/*.test.tsx' + +permissions: + pull-requests: 'write' + contents: 'read' + +jobs: + provide-guidance: + name: 'Model Steering Guidance' + runs-on: 'ubuntu-latest' + if: "github.repository == 'google-gemini/gemini-cli'" + steps: + - name: 'Checkout' + uses: 'actions/checkout@08eba0b27e820071cde6df949e0beb9ba4906955' # ratchet:actions/checkout@v4 + with: + fetch-depth: 0 + + - name: 'Set up Node.js' + uses: 'actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020' # ratchet:actions/setup-node@v4.4.0 + with: + node-version-file: '.nvmrc' + cache: 'npm' + + - name: 'Detect Steering Changes' + id: 'detect' + run: | + STEERING_DETECTED=$(node scripts/changed_prompt.js 
--steering-only) + echo "STEERING_DETECTED=$STEERING_DETECTED" >> "$GITHUB_OUTPUT" + + - name: 'Analyze PR Content' + if: "steps.detect.outputs.STEERING_DETECTED == 'true'" + id: 'analysis' + env: + GH_TOKEN: '${{ secrets.GITHUB_TOKEN }}' + run: | + # Check for behavioral eval changes + EVAL_CHANGES=$(git diff --name-only origin/${{ github.base_ref }}...HEAD | grep "^evals/" || true) + if [ -z "$EVAL_CHANGES" ]; then + echo "MISSING_EVALS=true" >> "$GITHUB_OUTPUT" + fi + + # Check if user is a maintainer (has write/admin access) + USER_PERMISSION=$(gh api repos/${{ github.repository }}/collaborators/${{ github.actor }}/permission --jq '.permission') + if [[ "$USER_PERMISSION" == "admin" || "$USER_PERMISSION" == "write" ]]; then + echo "IS_MAINTAINER=true" >> "$GITHUB_OUTPUT" + fi + + - name: 'Post Guidance Comment' + if: "steps.detect.outputs.STEERING_DETECTED == 'true'" + uses: 'thollander/actions-comment-pull-request@65f9e5c9a1f2cd378bd74b2e057c9736982a8e74' # ratchet:thollander/actions-comment-pull-request@v3 + with: + comment-tag: 'eval-guidance-bot' + message: | + ### 🧠 Model Steering Guidance + + This PR modifies files that affect the model's behavior (prompts, tools, or instructions). + + ${{ steps.analysis.outputs.MISSING_EVALS == 'true' && '- ⚠️ **Consider adding Evals:** No behavioral evaluations (`evals/*.eval.ts`) were added or updated in this PR. Consider adding a test case to verify the new behavior and prevent regressions.' || '' }} + ${{ steps.analysis.outputs.IS_MAINTAINER == 'true' && '- 🚀 **Maintainer Reminder:** Please ensure that these changes do not regress results on benchmark evals before merging.' 
|| '' }} + + --- + *This is an automated guidance message triggered by steering logic signatures.* diff --git a/docs/changelogs/preview.md b/docs/changelogs/preview.md index 39e1e0a2ed..0172fcdb87 100644 --- a/docs/changelogs/preview.md +++ b/docs/changelogs/preview.md @@ -1,6 +1,6 @@ -# Preview release: v0.35.0-preview.2 +# Preview release: v0.35.0-preview.5 -Released: March 19, 2026 +Released: March 23, 2026 Our preview release includes the latest, new, and experimental features. This release may not be as stable as our [latest weekly release](latest.md). @@ -33,6 +33,13 @@ npm install -g @google/gemini-cli@preview ## What's Changed +- fix(patch): cherry-pick b2d6dc4 to release/v0.35.0-preview.4-pr-23546 + [CONFLICTS] by @gemini-cli-robot in + [#23585](https://github.com/google-gemini/gemini-cli/pull/23585) +- fix(patch): cherry-pick daf3691 to release/v0.35.0-preview.2-pr-23558 to patch + version v0.35.0-preview.2 and create version 0.35.0-preview.3 by + @gemini-cli-robot in + [#23565](https://github.com/google-gemini/gemini-cli/pull/23565) - fix(patch): cherry-pick 4e5dfd0 to release/v0.35.0-preview.1-pr-23074 to patch version v0.35.0-preview.1 and create version 0.35.0-preview.2 by @gemini-cli-robot in @@ -377,4 +384,4 @@ npm install -g @google/gemini-cli@preview [#22815](https://github.com/google-gemini/gemini-cli/pull/22815) **Full Changelog**: -https://github.com/google-gemini/gemini-cli/compare/v0.34.0-preview.4...v0.35.0-preview.2 +https://github.com/google-gemini/gemini-cli/compare/v0.34.0-preview.4...v0.35.0-preview.5 diff --git a/docs/cli/plan-mode.md b/docs/cli/plan-mode.md index 5299bb3463..2163e4fcd1 100644 --- a/docs/cli/plan-mode.md +++ b/docs/cli/plan-mode.md @@ -200,6 +200,7 @@ your specific environment. 
```toml [[rule]] +toolName = "*" mcpName = "*" toolAnnotations = { readOnlyHint = true } decision = "allow" diff --git a/docs/cli/telemetry.md b/docs/cli/telemetry.md index fec0fb41c3..dd13d5eb82 100644 --- a/docs/cli/telemetry.md +++ b/docs/cli/telemetry.md @@ -904,6 +904,20 @@ Logs keychain availability checks. - `available` (boolean) +##### `gemini_cli.startup_stats` + +Logs detailed startup performance statistics. + +
+Attributes + +- `phases` (json array of startup phases) +- `os_platform` (string) +- `os_release` (string) +- `is_docker` (boolean) + +
+ ### Metrics @@ -920,6 +934,20 @@ Gemini CLI exports several custom metrics. Incremented once per CLI startup. +##### Onboarding + +Tracks onboarding flow from authentication to the user + +- `gemini_cli.onboarding.start` (Counter, Int): Incremented when the + authentication flow begins. + +- `gemini_cli.onboarding.success` (Counter, Int): Incremented when the user +onboarding flow completes successfully. +
+Attributes (Success) + +- `user_tier` (string) + ##### Tools ##### `gemini_cli.tool.call.count` diff --git a/docs/reference/configuration.md b/docs/reference/configuration.md index 47b0d8124a..a5533e199c 100644 --- a/docs/reference/configuration.md +++ b/docs/reference/configuration.md @@ -295,6 +295,11 @@ their corresponding top-level category object in your `settings.json` file. - **Description:** Hide the footer from the UI - **Default:** `false` +- **`ui.collapseDrawerDuringApproval`** (boolean): + - **Description:** Whether to collapse the UI drawer when a tool is awaiting + confirmation. + - **Default:** `true` + - **`ui.showMemoryUsage`** (boolean): - **Description:** Display memory usage information in the UI - **Default:** `false` @@ -1535,7 +1540,7 @@ their corresponding top-level category object in your `settings.json` file. - **`experimental.enableAgents`** (boolean): - **Description:** Enable local and remote subagents. - - **Default:** `true` + - **Default:** `false` - **Requires restart:** Yes - **`experimental.worktrees`** (boolean): diff --git a/docs/reference/policy-engine.md b/docs/reference/policy-engine.md index 456c8a9dc8..c9fc482ea7 100644 --- a/docs/reference/policy-engine.md +++ b/docs/reference/policy-engine.md @@ -301,7 +301,7 @@ priority = 10 # (Optional) A custom message to display when a tool call is denied by this # rule. This message is returned to the model and user, # useful for explaining *why* it was denied. -deny_message = "Deletion is permanent" +denyMessage = "Deletion is permanent" # (Optional) An array of approval modes where this rule is active. modes = ["autoEdit"] @@ -310,6 +310,14 @@ modes = ["autoEdit"] # non-interactive (false) environments. # If omitted, the rule applies to both. interactive = true + +# (Optional) If true, lets shell commands use redirection operators +# (>, >>, <, <<, <<<). By default, the policy engine asks for confirmation +# when redirection is detected, even if a rule matches the command. 
+# This permission is granular; it only applies to the specific rule it's +# defined in. In chained commands (e.g., cmd1 > file && cmd2), each +# individual command rule must permit redirection if it's used. +allowRedirection = true ``` ### Using arrays (lists) @@ -394,7 +402,7 @@ server. mcpName = "untrusted-server" decision = "deny" priority = 500 -deny_message = "This server is not trusted by the admin." +denyMessage = "This server is not trusted by the admin." ``` **3. Targeting all MCP servers** @@ -405,6 +413,7 @@ registered MCP server. This is useful for setting category-wide defaults. ```toml # Ask user for any tool call from any MCP server [[rule]] +toolName = "*" mcpName = "*" decision = "ask_user" priority = 10 diff --git a/eslint.config.js b/eslint.config.js index 38dec43857..e827f9b236 100644 --- a/eslint.config.js +++ b/eslint.config.js @@ -35,6 +35,12 @@ const commonRestrictedSyntaxRules = [ message: 'Do not throw string literals or non-Error objects. Throw new Error("...") instead.', }, + { + selector: + 'UnaryExpression[operator="typeof"] > MemberExpression[computed=true][property.type="Literal"]', + message: + 'Do not use typeof to check object properties. Define a TypeScript interface and a type guard function instead.', + }, ]; export default tseslint.config( @@ -133,16 +139,7 @@ export default tseslint.config( 'no-cond-assign': 'error', 'no-debugger': 'error', 'no-duplicate-case': 'error', - 'no-restricted-syntax': [ - 'error', - ...commonRestrictedSyntaxRules, - { - selector: - 'UnaryExpression[operator="typeof"] > MemberExpression[computed=true][property.type="Literal"]', - message: - 'Do not use typeof to check object properties. 
Define a TypeScript interface and a type guard function instead.', - }, - ], + 'no-restricted-syntax': ['error', ...commonRestrictedSyntaxRules], 'no-unsafe-finally': 'error', 'no-unused-expressions': 'off', // Disable base rule '@typescript-eslint/no-unused-expressions': [ @@ -161,6 +158,7 @@ export default tseslint.config( '@typescript-eslint/await-thenable': ['error'], '@typescript-eslint/no-floating-promises': ['error'], '@typescript-eslint/no-unnecessary-type-assertion': ['error'], + '@typescript-eslint/no-misused-spread': ['error'], 'no-restricted-imports': [ 'error', { diff --git a/evals/README.md b/evals/README.md index 6cfecbad07..9e3697a6b8 100644 --- a/evals/README.md +++ b/evals/README.md @@ -6,6 +6,10 @@ for changes to system prompts, tool definitions, and other model-steering mechanisms, and as a tool for assessing feature reliability by model, and preventing regressions. +> [!TIP] **Agent Automation**: If you are pair-programming with Gemini CLI, you +> can leverage the **behavioral-evals skill** to automate fixing failing tests +> or promoting incubation candidates. + ## Why Behavioral Evals? Unlike traditional **integration tests** which verify that the system functions @@ -121,7 +125,7 @@ import { describe, expect } from 'vitest'; import { evalTest } from './test-helper.js'; describe('my_feature', () => { - // New tests MUST start as USUALLY_PASSES and be promoted via /promote-behavioral-eval + // New tests MUST start as USUALLY_PASSES and be promoted based on consistency metrics evalTest('USUALLY_PASSES', { name: 'should do something', prompt: 'do it', @@ -183,12 +187,10 @@ mandatory deflaking process. 1. **Incubation**: You must create all new tests with the `USUALLY_PASSES` policy. This lets them be monitored in the nightly runs without blocking PRs. -2. **Monitoring**: The test must complete at least 10 nightly runs across all +2. **Monitoring**: The test must complete at least 7 nightly runs across all supported models. -3. 
**Promotion**: Promotion to `ALWAYS_PASSES` happens exclusively through the - `/promote-behavioral-eval` slash command. This command verifies the 100% - success rate requirement is met across many runs before updating the test - policy. +3. **Promotion**: Promotion to `ALWAYS_PASSES` is conducted by the agent after + verifying the 100% success rate requirement is met across many runs. This promotion process is essential for preventing the introduction of flaky evaluations into the CI. @@ -225,42 +227,21 @@ tool definition has made the model's behavior less reliable. ## Fixing Evaluations -If an evaluation is failing or has a regressed pass rate, you can use the -`/fix-behavioral-eval` command within Gemini CLI to help investigate and fix the -issue. - -### `/fix-behavioral-eval` - -This command is designed to automate the investigation and fixing process for -failing evaluations. It will: +If an evaluation is failing or has a regressed pass rate, ask the agent to +investigate and fix the issue using the **behavioral-evals skill**. The agent +will automate the following process: 1. **Investigate**: Fetch the latest results from the nightly workflow using the `gh` CLI, identify the failing test, and review test trajectory logs in `evals/logs`. 2. **Fix**: Suggest and apply targeted fixes to the prompt or tool definitions. - It prioritizes minimal changes to `prompt.ts`, tool instructions, and - modules that contribute to the prompt. It generally tries to avoid changing - the test itself. -3. **Verify**: Re-run the test 3 times across multiple models (e.g., Gemini - 3.0, Gemini 3 Flash, Gemini 2.5 Pro) to ensure stability and calculate a - success rate. -4. **Report**: Provide a summary of the success rate for each model and details - on the applied fixes. + It prioritizes minimal changes to `prompt.ts` and tool instructions, + avoiding changing the test itself unless necessary. +3. **Verify**: Re-run the test locally across multiple models to ensure + stability. +4. 
**Report**: Provide a summary of the success rate. -To use it, run: - -```bash -gemini /fix-behavioral-eval -``` - -You can also provide a link to a specific GitHub Action run or the name of a -specific test to focus the investigation: - -```bash -gemini /fix-behavioral-eval https://github.com/google-gemini/gemini-cli/actions/runs/123456789 -``` - -When investigating failures manually, you can also enable verbose agent logs by +When investigating failures manually, you can enable verbose agent logs by setting the `GEMINI_DEBUG_LOG_FILE` environment variable. ### Best practices @@ -273,25 +254,14 @@ introspecting on its prompt when asked the right questions. ## Promoting evaluations -Evaluations must be promoted from `USUALLY_PASSES` to `ALWAYS_PASSES` -exclusively using the `/promote-behavioral-eval` slash command. Manual promotion -is not allowed to ensure that the 100% success rate requirement is empirically -met. +Evaluations must be promoted from `USUALLY_PASSES` to `ALWAYS_PASSES` by the +agent to ensure that the 100% success rate requirement is empirically met. -### `/promote-behavioral-eval` - -This command automates the promotion of stable tests by: +The agent automates the promotion by: 1. **Investigating**: Analyzing the results of the last 7 nightly runs on the - `main` branch using the `gh` CLI. -2. **Criteria Check**: Identifying tests that have passed 100% of the time for - ALL enabled models across the entire 7-run history. -3. **Promotion**: Updating the test file's policy from `USUALLY_PASSES` to - `ALWAYS_PASSES`. + `main` branch. +2. **Criteria Check**: Ensuring tests passed 100% of the time for ALL enabled + models. +3. **Promotion**: Updating the test file's policy to `ALWAYS_PASSES`. 4. **Verification**: Running the promoted test locally to ensure correctness. 
- -To run it: - -```bash -gemini /promote-behavioral-eval -``` diff --git a/evals/app-test-helper.ts b/evals/app-test-helper.ts index 89f1582bdc..2bcff41924 100644 --- a/evals/app-test-helper.ts +++ b/evals/app-test-helper.ts @@ -15,9 +15,26 @@ import fs from 'node:fs'; import path from 'node:path'; import { DEFAULT_GEMINI_MODEL } from '@google/gemini-cli-core'; +/** + * Config overrides for evals, with tool-restriction fields explicitly + * forbidden. Evals must test against the full, default tool set to ensure + * realistic behavior. + */ +interface EvalConfigOverrides { + /** Restricting tools via excludeTools in evals is forbidden. */ + excludeTools?: never; + /** Restricting tools via coreTools in evals is forbidden. */ + coreTools?: never; + /** Restricting tools via allowedTools in evals is forbidden. */ + allowedTools?: never; + /** Restricting tools via mainAgentTools in evals is forbidden. */ + mainAgentTools?: never; + [key: string]: unknown; +} + export interface AppEvalCase { name: string; - configOverrides?: any; + configOverrides?: EvalConfigOverrides; prompt: string; timeout?: number; files?: Record; diff --git a/evals/cli_help_delegation.eval.ts b/evals/cli_help_delegation.eval.ts new file mode 100644 index 0000000000..8be3bf1c51 --- /dev/null +++ b/evals/cli_help_delegation.eval.ts @@ -0,0 +1,25 @@ +import { describe, expect } from 'vitest'; +import { evalTest } from './test-helper.js'; + +describe('CliHelpAgent Delegation', () => { + evalTest('USUALLY_PASSES', { + name: 'should delegate to cli_help agent for subagent creation questions', + params: { + settings: { + experimental: { + enableAgents: true, + }, + }, + }, + prompt: 'Help me create a subagent in this project', + timeout: 60000, + assert: async (rig, _result) => { + const toolLogs = rig.readToolLogs(); + const toolCallIndex = toolLogs.findIndex( + (log) => log.toolRequest.name === 'cli_help', + ); + expect(toolCallIndex).toBeGreaterThan(-1); + expect(toolCallIndex).toBeLessThan(5); // 
Called within first 5 turns + }, + }); +}); diff --git a/evals/generalist_delegation.eval.ts b/evals/generalist_delegation.eval.ts index 7e6358ae1f..81252880eb 100644 --- a/evals/generalist_delegation.eval.ts +++ b/evals/generalist_delegation.eval.ts @@ -21,7 +21,6 @@ describe('generalist_delegation', () => { experimental: { enableAgents: true, }, - excludeTools: ['run_shell_command'], }, files: { 'file1.ts': 'console.log("no semi")', @@ -65,7 +64,6 @@ describe('generalist_delegation', () => { experimental: { enableAgents: true, }, - excludeTools: ['run_shell_command'], }, files: { 'src/a.ts': 'export const a = 1;', @@ -106,7 +104,6 @@ describe('generalist_delegation', () => { experimental: { enableAgents: true, }, - excludeTools: ['run_shell_command'], }, files: { 'README.md': 'This is a proyect.', @@ -141,7 +138,6 @@ describe('generalist_delegation', () => { experimental: { enableAgents: true, }, - excludeTools: ['run_shell_command'], }, files: { 'src/VERSION': '1.2.3', diff --git a/evals/model_steering.eval.ts b/evals/model_steering.eval.ts index 4a5ae46e3f..2cb87edcc2 100644 --- a/evals/model_steering.eval.ts +++ b/evals/model_steering.eval.ts @@ -15,7 +15,6 @@ describe('Model Steering Behavioral Evals', () => { appEvalTest('USUALLY_PASSES', { name: 'Corrective Hint: Model switches task based on hint during tool turn', configOverrides: { - excludeTools: ['run_shell_command', 'ls', 'google_web_search'], modelSteering: true, }, files: { @@ -55,7 +54,6 @@ describe('Model Steering Behavioral Evals', () => { appEvalTest('USUALLY_PASSES', { name: 'Suggestive Hint: Model incorporates user guidance mid-stream', configOverrides: { - excludeTools: ['run_shell_command', 'ls', 'google_web_search'], modelSteering: true, }, files: {}, diff --git a/evals/save_memory.eval.ts b/evals/save_memory.eval.ts index 901cbf3c17..25e081a819 100644 --- a/evals/save_memory.eval.ts +++ b/evals/save_memory.eval.ts @@ -16,9 +16,7 @@ describe('save_memory', () => { const 
rememberingFavoriteColor = "Agent remembers user's favorite color"; evalTest('ALWAYS_PASSES', { name: rememberingFavoriteColor, - params: { - settings: { tools: { core: ['save_memory'] } }, - }, + prompt: `remember that my favorite color is blue. what is my favorite color? tell me that and surround it with $ symbol`, @@ -38,9 +36,7 @@ describe('save_memory', () => { const rememberingCommandRestrictions = 'Agent remembers command restrictions'; evalTest('USUALLY_PASSES', { name: rememberingCommandRestrictions, - params: { - settings: { tools: { core: ['save_memory'] } }, - }, + prompt: `I don't want you to ever run npm commands.`, assert: async (rig, result) => { const wasToolCalled = await rig.waitForToolCall('save_memory'); @@ -59,9 +55,7 @@ describe('save_memory', () => { const rememberingWorkflow = 'Agent remembers workflow preferences'; evalTest('USUALLY_PASSES', { name: rememberingWorkflow, - params: { - settings: { tools: { core: ['save_memory'] } }, - }, + prompt: `I want you to always lint after building.`, assert: async (rig, result) => { const wasToolCalled = await rig.waitForToolCall('save_memory'); @@ -81,9 +75,7 @@ describe('save_memory', () => { 'Agent ignores temporary conversation details'; evalTest('ALWAYS_PASSES', { name: ignoringTemporaryInformation, - params: { - settings: { tools: { core: ['save_memory'] } }, - }, + prompt: `I'm going to get a coffee.`, assert: async (rig, result) => { await rig.waitForTelemetryReady(); @@ -106,9 +98,7 @@ describe('save_memory', () => { const rememberingPetName = "Agent remembers user's pet's name"; evalTest('ALWAYS_PASSES', { name: rememberingPetName, - params: { - settings: { tools: { core: ['save_memory'] } }, - }, + prompt: `Please remember that my dog's name is Buddy.`, assert: async (rig, result) => { const wasToolCalled = await rig.waitForToolCall('save_memory'); @@ -127,9 +117,7 @@ describe('save_memory', () => { const rememberingCommandAlias = 'Agent remembers custom command aliases'; 
evalTest('ALWAYS_PASSES', { name: rememberingCommandAlias, - params: { - settings: { tools: { core: ['save_memory'] } }, - }, + prompt: `When I say 'start server', you should run 'npm run dev'.`, assert: async (rig, result) => { const wasToolCalled = await rig.waitForToolCall('save_memory'); @@ -149,18 +137,6 @@ describe('save_memory', () => { "Agent ignores workspace's database schema location"; evalTest('USUALLY_PASSES', { name: ignoringDbSchemaLocation, - params: { - settings: { - tools: { - core: [ - 'save_memory', - 'list_directory', - 'read_file', - 'run_shell_command', - ], - }, - }, - }, prompt: `The database schema for this workspace is located in \`db/schema.sql\`.`, assert: async (rig, result) => { await rig.waitForTelemetryReady(); @@ -180,9 +156,7 @@ describe('save_memory', () => { "Agent remembers user's coding style preference"; evalTest('ALWAYS_PASSES', { name: rememberingCodingStyle, - params: { - settings: { tools: { core: ['save_memory'] } }, - }, + prompt: `I prefer to use tabs instead of spaces for indentation.`, assert: async (rig, result) => { const wasToolCalled = await rig.waitForToolCall('save_memory'); @@ -202,18 +176,6 @@ describe('save_memory', () => { 'Agent ignores workspace build artifact location'; evalTest('USUALLY_PASSES', { name: ignoringBuildArtifactLocation, - params: { - settings: { - tools: { - core: [ - 'save_memory', - 'list_directory', - 'read_file', - 'run_shell_command', - ], - }, - }, - }, prompt: `In this workspace, build artifacts are stored in the \`dist/artifacts\` directory.`, assert: async (rig, result) => { await rig.waitForTelemetryReady(); @@ -232,18 +194,6 @@ describe('save_memory', () => { const ignoringMainEntryPoint = "Agent ignores workspace's main entry point"; evalTest('USUALLY_PASSES', { name: ignoringMainEntryPoint, - params: { - settings: { - tools: { - core: [ - 'save_memory', - 'list_directory', - 'read_file', - 'run_shell_command', - ], - }, - }, - }, prompt: `The main entry point for this 
workspace is \`src/index.js\`.`, assert: async (rig, result) => { await rig.waitForTelemetryReady(); @@ -262,9 +212,7 @@ describe('save_memory', () => { const rememberingBirthday = "Agent remembers user's birthday"; evalTest('ALWAYS_PASSES', { name: rememberingBirthday, - params: { - settings: { tools: { core: ['save_memory'] } }, - }, + prompt: `My birthday is on June 15th.`, assert: async (rig, result) => { const wasToolCalled = await rig.waitForToolCall('save_memory'); @@ -279,4 +227,136 @@ describe('save_memory', () => { }); }, }); + + const proactiveMemoryFromLongSession = + 'Agent saves preference from earlier in conversation history'; + evalTest('USUALLY_PASSES', { + name: proactiveMemoryFromLongSession, + params: { + settings: { + experimental: { memoryManager: true }, + }, + }, + messages: [ + { + id: 'msg-1', + type: 'user', + content: [ + { + text: 'By the way, I always prefer Vitest over Jest for testing in all my projects.', + }, + ], + timestamp: '2026-01-01T00:00:00Z', + }, + { + id: 'msg-2', + type: 'gemini', + content: [{ text: 'Noted! What are you working on today?' }], + timestamp: '2026-01-01T00:00:05Z', + }, + { + id: 'msg-3', + type: 'user', + content: [ + { + text: "I'm debugging a failing API endpoint. The /users route returns a 500 error.", + }, + ], + timestamp: '2026-01-01T00:01:00Z', + }, + { + id: 'msg-4', + type: 'gemini', + content: [ + { + text: 'It looks like the database connection might not be initialized before the query runs.', + }, + ], + timestamp: '2026-01-01T00:01:10Z', + }, + { + id: 'msg-5', + type: 'user', + content: [ + { text: 'Good catch — I fixed the import and the route works now.' }, + ], + timestamp: '2026-01-01T00:02:00Z', + }, + { + id: 'msg-6', + type: 'gemini', + content: [{ text: 'Great! Anything else you would like to work on?' 
}], + timestamp: '2026-01-01T00:02:05Z', + }, + ], + prompt: + 'Please save any persistent preferences or facts about me from our conversation to memory.', + assert: async (rig, result) => { + const wasToolCalled = await rig.waitForToolCall( + 'save_memory', + undefined, + (args) => /vitest/i.test(args), + ); + expect( + wasToolCalled, + 'Expected save_memory to be called with the Vitest preference from the conversation history', + ).toBe(true); + + assertModelHasOutput(result); + }, + }); + + const memoryManagerRoutingPreferences = + 'Agent routes global and project preferences to memory'; + evalTest('USUALLY_PASSES', { + name: memoryManagerRoutingPreferences, + params: { + settings: { + experimental: { memoryManager: true }, + }, + }, + messages: [ + { + id: 'msg-1', + type: 'user', + content: [ + { + text: 'I always use dark mode in all my editors and terminals.', + }, + ], + timestamp: '2026-01-01T00:00:00Z', + }, + { + id: 'msg-2', + type: 'gemini', + content: [{ text: 'Got it, I will keep that in mind!' }], + timestamp: '2026-01-01T00:00:05Z', + }, + { + id: 'msg-3', + type: 'user', + content: [ + { + text: 'For this project specifically, we use 2-space indentation.', + }, + ], + timestamp: '2026-01-01T00:01:00Z', + }, + { + id: 'msg-4', + type: 'gemini', + content: [ + { text: 'Understood, 2-space indentation for this project.' 
}, + ], + timestamp: '2026-01-01T00:01:05Z', + }, + ], + prompt: 'Please save the preferences I mentioned earlier to memory.', + assert: async (rig, result) => { + const wasToolCalled = await rig.waitForToolCall('save_memory'); + expect(wasToolCalled, 'Expected save_memory to be called').toBe(true); + + assertModelHasOutput(result); + }, + }); }); diff --git a/evals/subagents.eval.ts b/evals/subagents.eval.ts index 7e9b3cd808..3a7d8fa44f 100644 --- a/evals/subagents.eval.ts +++ b/evals/subagents.eval.ts @@ -4,21 +4,41 @@ * SPDX-License-Identifier: Apache-2.0 */ -import { describe } from 'vitest'; +import fs from 'node:fs'; +import path from 'node:path'; + +import { describe, expect } from 'vitest'; + import { evalTest } from './test-helper.js'; -const AGENT_DEFINITION = `--- +const DOCS_AGENT_DEFINITION = `--- name: docs-agent description: An agent with expertise in updating documentation. tools: - read_file - write_file --- - -You are the docs agent. Update the documentation. +You are the docs agent. Update documentation clearly and accurately. `; -const INDEX_TS = 'export const add = (a: number, b: number) => a + b;'; +const TEST_AGENT_DEFINITION = `--- +name: test-agent +description: An agent with expertise in writing and updating tests. +tools: + - read_file + - write_file +--- +You are the test agent. Add or update tests. 
+`; + +const INDEX_TS = 'export const add = (a: number, b: number) => a + b;\n'; + +function readProjectFile( + rig: { testDir?: string }, + relativePath: string, +): string { + return fs.readFileSync(path.join(rig.testDir!, relativePath), 'utf8'); +} describe('subagent eval test cases', () => { /** @@ -42,12 +62,147 @@ describe('subagent eval test cases', () => { }, prompt: 'Please update README.md with a description of this library.', files: { - '.gemini/agents/test-agent.md': AGENT_DEFINITION, + '.gemini/agents/docs-agent.md': DOCS_AGENT_DEFINITION, 'index.ts': INDEX_TS, - 'README.md': 'TODO: update the README.', + 'README.md': 'TODO: update the README.\n', }, assert: async (rig, _result) => { await rig.expectToolCallSuccess(['docs-agent']); }, }); + + /** + * Checks that the outer agent does not over-delegate trivial work when + * subagents are available. This helps catch orchestration overuse. + */ + evalTest('USUALLY_PASSES', { + name: 'should avoid delegating trivial direct edit work', + params: { + settings: { + experimental: { + enableAgents: true, + agents: { + overrides: { + generalist: { enabled: true }, + }, + }, + }, + }, + }, + prompt: + 'Rename the exported function in index.ts from add to sum and update the file directly.', + files: { + '.gemini/agents/docs-agent.md': DOCS_AGENT_DEFINITION, + 'index.ts': INDEX_TS, + }, + assert: async (rig, _result) => { + const updatedIndex = readProjectFile(rig, 'index.ts'); + const toolLogs = rig.readToolLogs() as Array<{ + toolRequest: { name: string }; + }>; + + expect(updatedIndex).toContain('export const sum ='); + expect(toolLogs.some((l) => l.toolRequest.name === 'docs-agent')).toBe( + false, + ); + expect(toolLogs.some((l) => l.toolRequest.name === 'generalist')).toBe( + false, + ); + }, + }); + + /** + * Checks that the outer agent prefers a more relevant specialist over a + * broad generalist when both are available. + * + * This is meant to codify the "overusing Generalist" failure mode. 
+ */ + evalTest('USUALLY_PASSES', { + name: 'should prefer relevant specialist over generalist', + params: { + settings: { + experimental: { + enableAgents: true, + agents: { + overrides: { + generalist: { enabled: true }, + }, + }, + }, + }, + }, + prompt: 'Please add a small test file that verifies add(1, 2) returns 3.', + files: { + '.gemini/agents/test-agent.md': TEST_AGENT_DEFINITION, + 'index.ts': INDEX_TS, + 'package.json': JSON.stringify( + { + name: 'subagent-eval-project', + version: '1.0.0', + type: 'module', + }, + null, + 2, + ), + }, + assert: async (rig, _result) => { + const toolLogs = rig.readToolLogs() as Array<{ + toolRequest: { name: string }; + }>; + + await rig.expectToolCallSuccess(['test-agent']); + expect(toolLogs.some((l) => l.toolRequest.name === 'generalist')).toBe( + false, + ); + }, + }); + + /** + * Checks cardinality and decomposition for a multi-surface task. The task + * naturally spans docs and tests, so multiple specialists should be used. + */ + evalTest('USUALLY_PASSES', { + name: 'should use multiple relevant specialists for multi-surface task', + params: { + settings: { + experimental: { + enableAgents: true, + agents: { + overrides: { + generalist: { enabled: true }, + }, + }, + }, + }, + }, + prompt: + 'Add a short README description for this library and also add a test file that verifies add(1, 2) returns 3.', + files: { + '.gemini/agents/docs-agent.md': DOCS_AGENT_DEFINITION, + '.gemini/agents/test-agent.md': TEST_AGENT_DEFINITION, + 'index.ts': INDEX_TS, + 'README.md': 'TODO: update the README.\n', + 'package.json': JSON.stringify( + { + name: 'subagent-eval-project', + version: '1.0.0', + type: 'module', + }, + null, + 2, + ), + }, + assert: async (rig, _result) => { + const toolLogs = rig.readToolLogs() as Array<{ + toolRequest: { name: string }; + }>; + const readme = readProjectFile(rig, 'README.md'); + + await rig.expectToolCallSuccess(['docs-agent', 'test-agent']); + expect(readme).not.toContain('TODO: update the 
README.'); + expect(toolLogs.some((l) => l.toolRequest.name === 'generalist')).toBe( + false, + ); + }, + }); }); diff --git a/evals/test-helper.ts b/evals/test-helper.ts index a5b789ab33..9bd5e219d9 100644 --- a/evals/test-helper.ts +++ b/evals/test-helper.ts @@ -13,6 +13,9 @@ import { TestRig } from '@google/gemini-cli-test-utils'; import { createUnauthorizedToolError, parseAgentMarkdown, + Storage, + getProjectHash, + SESSION_FILE_PREFIX, } from '@google/gemini-cli-core'; export * from '@google/gemini-cli-test-utils'; @@ -64,8 +67,57 @@ export async function internalEvalTest(evalCase: EvalCase) { symlinkNodeModules(rig.testDir || ''); + // If messages are provided, write a session file so --resume can load it. + let sessionId: string | undefined; + if (evalCase.messages) { + sessionId = + evalCase.sessionId || + `test-session-${crypto.randomUUID().slice(0, 8)}`; + + // Temporarily set GEMINI_CLI_HOME so Storage writes to the same + // directory the CLI subprocess will use (rig.homeDir). + const originalGeminiHome = process.env['GEMINI_CLI_HOME']; + process.env['GEMINI_CLI_HOME'] = rig.homeDir!; + try { + const storage = new Storage(fs.realpathSync(rig.testDir!)); + await storage.initialize(); + const chatsDir = path.join(storage.getProjectTempDir(), 'chats'); + fs.mkdirSync(chatsDir, { recursive: true }); + + const conversation = { + sessionId, + projectHash: getProjectHash(fs.realpathSync(rig.testDir!)), + startTime: new Date().toISOString(), + lastUpdated: new Date().toISOString(), + messages: evalCase.messages, + }; + + const timestamp = new Date() + .toISOString() + .slice(0, 16) + .replace(/:/g, '-'); + const filename = `${SESSION_FILE_PREFIX}${timestamp}-${sessionId.slice(0, 8)}.json`; + fs.writeFileSync( + path.join(chatsDir, filename), + JSON.stringify(conversation, null, 2), + ); + } catch (e) { + // Storage initialization may fail in some environments; log and continue. 
+ console.warn('Failed to write session history:', e); + } finally { + // Restore original GEMINI_CLI_HOME. + if (originalGeminiHome === undefined) { + delete process.env['GEMINI_CLI_HOME']; + } else { + process.env['GEMINI_CLI_HOME'] = originalGeminiHome; + } + } + } + const result = await rig.run({ - args: evalCase.prompt, + args: sessionId + ? ['--resume', sessionId, evalCase.prompt] + : evalCase.prompt, approvalMode: evalCase.approvalMode ?? 'yolo', timeout: evalCase.timeout, env: { @@ -299,12 +351,32 @@ export function symlinkNodeModules(testDir: string) { } } +/** + * Settings that are forbidden in evals. Evals should never restrict which + * tools are available — they must test against the full, default tool set + * to ensure realistic behavior. + */ +interface ForbiddenToolSettings { + tools?: { + /** Restricting core tools in evals is forbidden. */ + core?: never; + [key: string]: unknown; + }; +} + export interface EvalCase { name: string; - params?: Record; + params?: { + settings?: ForbiddenToolSettings & Record; + [key: string]: unknown; + }; prompt: string; timeout?: number; files?: Record; + /** Conversation history to pre-load via --resume. Each entry is a message object with type, content, etc. */ + messages?: Record[]; + /** Session ID for the resumed session. Auto-generated if not provided. */ + sessionId?: string; approvalMode?: 'default' | 'auto_edit' | 'yolo' | 'plan'; assert: (rig: TestRig, result: string) => Promise; } diff --git a/evals/vitest.config.ts b/evals/vitest.config.ts index 50733a999c..3231f31a10 100644 --- a/evals/vitest.config.ts +++ b/evals/vitest.config.ts @@ -16,6 +16,10 @@ export default defineConfig({ }, test: { testTimeout: 300000, // 5 minutes + // Retry in CI but not nightly to avoid blocking on API error. + retry: process.env['VITEST_RETRY'] + ? 
parseInt(process.env['VITEST_RETRY'], 10) + : 3, reporters: ['default', 'json'], outputFile: { json: 'evals/logs/report.json', diff --git a/integration-tests/browser-agent.cleanup.responses b/integration-tests/browser-agent.cleanup.responses index 988f2fa456..9cf7a7b356 100644 --- a/integration-tests/browser-agent.cleanup.responses +++ b/integration-tests/browser-agent.cleanup.responses @@ -1,2 +1,4 @@ {"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"I'll open https://example.com and check the page title for you."},{"functionCall":{"name":"browser_agent","args":{"task":"Open https://example.com and get the page title"}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":100,"candidatesTokenCount":35,"totalTokenCount":135}}]} -{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"The page title of https://example.com is \"Example Domain\". The browser session has been completed and cleaned up successfully."}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":200,"candidatesTokenCount":30,"totalTokenCount":230}}]} +{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"I have opened the page and the title is 'Example Domain'."}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":200,"candidatesTokenCount":30,"totalTokenCount":230}}]} +{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"The task is complete. 
The page title is 'Example Domain'."}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":300,"candidatesTokenCount":20,"totalTokenCount":320}}]} +{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"Done."}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":400,"candidatesTokenCount":5,"totalTokenCount":405}}]} diff --git a/integration-tests/browser-policy.test.ts b/integration-tests/browser-policy.test.ts index 1bfdc27415..bb66b10aab 100644 --- a/integration-tests/browser-policy.test.ts +++ b/integration-tests/browser-policy.test.ts @@ -63,6 +63,9 @@ describe.skipIf(!chromeAvailable)('browser-policy', () => { rig.setup('browser-policy-skip-confirmation', { fakeResponsesPath: join(__dirname, 'browser-policy.responses'), settings: { + experimental: { + enableAgents: true, + }, agents: { overrides: { browser_agent: { @@ -175,4 +178,39 @@ priority = 200 expect(output).toContain('browser_agent'); expect(output).toContain('completed successfully'); }); + + it('should show the visible warning when browser agent starts in existing session mode', async () => { + rig.setup('browser-session-warning', { + fakeResponsesPath: join(__dirname, 'browser-agent.cleanup.responses'), + settings: { + experimental: { + enableAgents: true, + }, + general: { + enableAutoUpdateNotification: false, + }, + agents: { + overrides: { + browser_agent: { + enabled: true, + }, + }, + browser: { + sessionMode: 'existing', + headless: true, + }, + }, + }, + }); + + const stdout = await rig.runCommand(['Open https://example.com'], { + env: { + GEMINI_API_KEY: 'fake-key', + GEMINI_TELEMETRY_DISABLED: 'true', + DEV: 'true', + }, + }); + + expect(stdout).toContain('saved logins will be visible'); + }); }); diff --git a/integration-tests/hooks-system.test.ts b/integration-tests/hooks-system.test.ts index 479851957b..73a7ca03ab 100644 --- a/integration-tests/hooks-system.test.ts +++ 
b/integration-tests/hooks-system.test.ts @@ -5,405 +5,413 @@ */ import { describe, it, expect, beforeEach, afterEach } from 'vitest'; -import { TestRig, poll, normalizePath } from './test-helper.js'; +import { TestRig, poll, normalizePath, skipFlaky } from './test-helper.js'; import { join } from 'node:path'; -import { writeFileSync } from 'node:fs'; +import { writeFileSync, existsSync, mkdirSync } from 'node:fs'; +import os from 'node:os'; -describe('Hooks System Integration', () => { - let rig: TestRig; +describe.skipIf(skipFlaky)( + 'Hooks System Integration', + { timeout: 120000 }, + () => { + let rig: TestRig; - beforeEach(() => { - rig = new TestRig(); - }); - - afterEach(async () => { - if (rig) { - await rig.cleanup(); - } - }); - - describe('Command Hooks - Blocking Behavior', () => { - it('should block tool execution when hook returns block decision', async () => { - rig.setup( - 'should block tool execution when hook returns block decision', - { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.block-tool.responses', - ), - }, - ); - - const scriptPath = rig.createScript( - 'block_hook.cjs', - "console.log(JSON.stringify({decision: 'block', reason: 'File writing blocked by security policy'}));", - ); - - rig.setup( - 'should block tool execution when hook returns block decision', - { - settings: { - hooksConfig: { - enabled: true, - }, - hooks: { - BeforeTool: [ - { - matcher: 'write_file', - sequential: true, - hooks: [ - { - type: 'command', - command: normalizePath(`node "${scriptPath}"`), - timeout: 5000, - }, - ], - }, - ], - }, - }, - }, - ); - - const result = await rig.run({ - args: 'Create a file called test.txt with content "Hello World"', - }); - - // The hook should block the write_file tool - const toolLogs = rig.readToolLogs(); - const writeFileCalls = toolLogs.filter( - (t) => - t.toolRequest.name === 'write_file' && t.toolRequest.success === true, - ); - - // Tool should not be called due to blocking hook - 
expect(writeFileCalls).toHaveLength(0); - - // Result should mention the blocking reason - expect(result).toContain('File writing blocked by security policy'); - - // Should generate hook telemetry - const hookTelemetryFound = await rig.waitForTelemetryEvent('hook_call'); - expect(hookTelemetryFound).toBeTruthy(); + beforeEach(() => { + rig = new TestRig(); }); - it('should block tool execution and use stderr as reason when hook exits with code 2', async () => { - rig.setup( - 'should block tool execution and use stderr as reason when hook exits with code 2', - { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.block-tool.responses', - ), - }, - ); - - const blockMsg = 'File writing blocked by security policy'; - - const scriptPath = rig.createScript( - 'stderr_block_hook.cjs', - `process.stderr.write(JSON.stringify({ decision: 'deny', reason: '${blockMsg}' })); process.exit(2);`, - ); - - rig.setup( - 'should block tool execution and use stderr as reason when hook exits with code 2', - { - settings: { - hooksConfig: { - enabled: true, - }, - hooks: { - BeforeTool: [ - { - matcher: 'write_file', - sequential: true, - hooks: [ - { - type: 'command', - command: normalizePath(`node "${scriptPath}"`)!, - timeout: 5000, - }, - ], - }, - ], - }, - }, - }, - ); - - const result = await rig.run({ - args: 'Create a file called test.txt with content "Hello World"', - }); - - // The hook should block the write_file tool - const toolLogs = rig.readToolLogs(); - const writeFileCalls = toolLogs.filter( - (t) => - t.toolRequest.name === 'write_file' && t.toolRequest.success === true, - ); - - // Tool should not be called due to blocking hook - expect(writeFileCalls).toHaveLength(0); - - // Result should mention the blocking reason - expect(result).toContain(blockMsg); - - // Verify hook telemetry shows the deny decision - const hookLogs = rig.readHookLogs(); - const blockHook = hookLogs.find( - (log) => - log.hookCall.hook_event_name === 'BeforeTool' && - 
(log.hookCall.stdout.includes('"decision":"deny"') || - log.hookCall.stderr.includes('"decision":"deny"')), - ); - expect(blockHook).toBeDefined(); - expect(blockHook?.hookCall.stdout + blockHook?.hookCall.stderr).toContain( - blockMsg, - ); + afterEach(async () => { + if (rig) { + await rig.cleanup(); + } }); - it('should allow tool execution when hook returns allow decision', async () => { - rig.setup( - 'should allow tool execution when hook returns allow decision', - { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.allow-tool.responses', - ), - }, - ); - - const scriptPath = rig.createScript( - 'allow_hook.cjs', - "console.log(JSON.stringify({decision: 'allow', reason: 'File writing approved'}));", - ); - - rig.setup( - 'should allow tool execution when hook returns allow decision', - { - settings: { - hooksConfig: { - enabled: true, - }, - hooks: { - BeforeTool: [ - { - matcher: 'write_file', - sequential: true, - hooks: [ - { - type: 'command', - command: normalizePath(`node "${scriptPath}"`), - timeout: 5000, - }, - ], - }, - ], - }, + describe('Command Hooks - Blocking Behavior', () => { + it('should block tool execution when hook returns block decision', async () => { + rig.setup( + 'should block tool execution when hook returns block decision', + { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.block-tool.responses', + ), }, - }, - ); + ); - await rig.run({ - args: 'Create a file called approved.txt with content "Approved content"', - }); + const scriptPath = rig.createScript( + 'block_hook.cjs', + "console.log(JSON.stringify({decision: 'block', reason: 'File writing blocked by security policy'}));", + ); - // The hook should allow the write_file tool - const foundWriteFile = await rig.waitForToolCall('write_file'); - expect(foundWriteFile).toBeTruthy(); - - // File should be created - const fileContent = rig.readFile('approved.txt'); - expect(fileContent).toContain('Approved content'); - - // Should generate hook 
telemetry - const hookTelemetryFound = await rig.waitForTelemetryEvent('hook_call'); - expect(hookTelemetryFound).toBeTruthy(); - }); - }); - - describe('Command Hooks - Additional Context', () => { - it('should add additional context from AfterTool hooks', async () => { - rig.setup('should add additional context from AfterTool hooks', { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.after-tool-context.responses', - ), - }); - - const scriptPath = rig.createScript( - 'after_tool_context.cjs', - "console.log(JSON.stringify({hookSpecificOutput: {hookEventName: 'AfterTool', additionalContext: 'Security scan: File content appears safe'}}));", - ); - - const command = `node "${scriptPath}"`; - rig.setup('should add additional context from AfterTool hooks', { - settings: { - hooksConfig: { - enabled: true, - }, - hooks: { - AfterTool: [ - { - matcher: 'read_file', - sequential: true, - hooks: [ + rig.setup( + 'should block tool execution when hook returns block decision', + { + settings: { + hooksConfig: { + enabled: true, + }, + hooks: { + BeforeTool: [ { - type: 'command', - command: normalizePath(command), - timeout: 5000, + matcher: 'write_file', + sequential: true, + hooks: [ + { + type: 'command', + command: normalizePath(`node "${scriptPath}"`), + timeout: 5000, + }, + ], }, ], }, - ], - }, - }, - }); - - // Create a test file to read - rig.createFile('test-file.txt', 'This is test content'); - - await rig.run({ - args: 'Read the contents of test-file.txt and tell me what it contains', - }); - - // Should find read_file tool call - const foundReadFile = await rig.waitForToolCall('read_file'); - expect(foundReadFile).toBeTruthy(); - - // Should generate hook telemetry - const hookTelemetryFound = rig.readHookLogs(); - expect(hookTelemetryFound.length).toBeGreaterThan(0); - expect(hookTelemetryFound[0].hookCall.hook_event_name).toBe('AfterTool'); - expect(hookTelemetryFound[0].hookCall.hook_name).toBe( - normalizePath(command), - ); - 
expect(hookTelemetryFound[0].hookCall.hook_input).toBeDefined(); - expect(hookTelemetryFound[0].hookCall.hook_output).toBeDefined(); - expect(hookTelemetryFound[0].hookCall.exit_code).toBe(0); - expect(hookTelemetryFound[0].hookCall.stdout).toBeDefined(); - expect(hookTelemetryFound[0].hookCall.stderr).toBeDefined(); - }); - }); - - describe('Command Hooks - Tail Tool Calls', () => { - it('should execute a tail tool call from AfterTool hooks and replace original response', async () => { - // Create a script that acts as the hook. - // It will trigger on "read_file" and issue a tail call to "write_file". - rig.setup('should execute a tail tool call from AfterTool hooks', { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.tail-tool-call.responses', - ), - }); - - const hookOutput = { - decision: 'allow', - hookSpecificOutput: { - hookEventName: 'AfterTool', - tailToolCallRequest: { - name: 'write_file', - args: { - file_path: 'tail-called-file.txt', - content: 'Content from tail call', }, }, - }, - }; + ); - const hookScript = `console.log(JSON.stringify(${JSON.stringify( - hookOutput, - )})); process.exit(0);`; + const result = await rig.run({ + args: 'Create a file called test.txt with content "Hello World"', + }); - const scriptPath = join(rig.testDir!, 'tail_call_hook.js'); - writeFileSync(scriptPath, hookScript); - const commandPath = scriptPath.replace(/\\/g, '/'); + // The hook should block the write_file tool + const toolLogs = rig.readToolLogs(); + const writeFileCalls = toolLogs.filter( + (t) => + t.toolRequest.name === 'write_file' && + t.toolRequest.success === true, + ); - rig.setup('should execute a tail tool call from AfterTool hooks', { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.tail-tool-call.responses', - ), - settings: { - hooksConfig: { - enabled: true, + // Tool should not be called due to blocking hook + expect(writeFileCalls).toHaveLength(0); + + // Result should mention the blocking reason + 
expect(result).toContain('File writing blocked by security policy'); + + // Should generate hook telemetry + const hookTelemetryFound = await rig.waitForTelemetryEvent('hook_call'); + expect(hookTelemetryFound).toBeTruthy(); + }); + + it('should block tool execution and use stderr as reason when hook exits with code 2', async () => { + rig.setup( + 'should block tool execution and use stderr as reason when hook exits with code 2', + { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.block-tool.responses', + ), }, - hooks: { - AfterTool: [ - { - matcher: 'read_file', - hooks: [ + ); + + const blockMsg = 'File writing blocked by security policy'; + + const scriptPath = rig.createScript( + 'stderr_block_hook.cjs', + `process.stderr.write(JSON.stringify({ decision: 'deny', reason: '${blockMsg}' })); process.exit(2);`, + ); + + rig.setup( + 'should block tool execution and use stderr as reason when hook exits with code 2', + { + settings: { + hooksConfig: { + enabled: true, + }, + hooks: { + BeforeTool: [ { - type: 'command', - command: `node "${commandPath}"`, - timeout: 5000, + matcher: 'write_file', + sequential: true, + hooks: [ + { + type: 'command', + command: normalizePath(`node "${scriptPath}"`)!, + timeout: 5000, + }, + ], }, ], }, - ], + }, }, - }, + ); + + const result = await rig.run({ + args: 'Create a file called test.txt with content "Hello World"', + }); + + // The hook should block the write_file tool + const toolLogs = rig.readToolLogs(); + const writeFileCalls = toolLogs.filter( + (t) => + t.toolRequest.name === 'write_file' && + t.toolRequest.success === true, + ); + + // Tool should not be called due to blocking hook + expect(writeFileCalls).toHaveLength(0); + + // Result should mention the blocking reason + expect(result).toContain(blockMsg); + + // Verify hook telemetry shows the deny decision + const hookLogs = rig.readHookLogs(); + const blockHook = hookLogs.find( + (log) => + log.hookCall.hook_event_name === 'BeforeTool' && + 
(log.hookCall.stdout.includes('"decision":"deny"') || + log.hookCall.stderr.includes('"decision":"deny"')), + ); + expect(blockHook).toBeDefined(); + expect( + blockHook?.hookCall.stdout + blockHook?.hookCall.stderr, + ).toContain(blockMsg); }); - // Create a test file to trigger the read_file tool - rig.createFile('original.txt', 'Original content'); + it('should allow tool execution when hook returns allow decision', async () => { + rig.setup( + 'should allow tool execution when hook returns allow decision', + { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.allow-tool.responses', + ), + }, + ); - const cliOutput = await rig.run({ - args: 'Read original.txt', // Fake responses should trigger read_file on this + const scriptPath = rig.createScript( + 'allow_hook.cjs', + "console.log(JSON.stringify({decision: 'allow', reason: 'File writing approved'}));", + ); + + rig.setup( + 'should allow tool execution when hook returns allow decision', + { + settings: { + hooksConfig: { + enabled: true, + }, + hooks: { + BeforeTool: [ + { + matcher: 'write_file', + sequential: true, + hooks: [ + { + type: 'command', + command: normalizePath(`node "${scriptPath}"`), + timeout: 5000, + }, + ], + }, + ], + }, + }, + }, + ); + + await rig.run({ + args: 'Create a file called approved.txt with content "Approved content"', + }); + + // The hook should allow the write_file tool + const foundWriteFile = await rig.waitForToolCall('write_file'); + expect(foundWriteFile).toBeTruthy(); + + // File should be created + const fileContent = rig.readFile('approved.txt'); + expect(fileContent).toContain('Approved content'); + + // Should generate hook telemetry + const hookTelemetryFound = await rig.waitForTelemetryEvent('hook_call'); + expect(hookTelemetryFound).toBeTruthy(); }); - - // 1. Verify that write_file was called (as a tail call replacing read_file) - // Since read_file was replaced before finalizing, it will not appear in the tool logs. 
- const foundWriteFile = await rig.waitForToolCall('write_file'); - expect(foundWriteFile).toBeTruthy(); - - // Ensure hook logs are flushed and the final LLM response is received. - // The mock LLM is configured to respond with "Tail call completed successfully." - expect(cliOutput).toContain('Tail call completed successfully.'); - - // Ensure telemetry is written to disk - await rig.waitForTelemetryReady(); - - // Read hook logs to debug - const hookLogs = rig.readHookLogs(); - const relevantHookLog = hookLogs.find( - (l) => l.hookCall.hook_event_name === 'AfterTool', - ); - - expect(relevantHookLog).toBeDefined(); - - // 2. Verify write_file was executed. - // In non-interactive mode, the CLI deduplicates tool execution logs by callId. - // Since a tail call reuses the original callId, "Tool: write_file" is not printed. - // Instead, we verify the side-effect (file creation) and the telemetry log. - - // 3. Verify the tail-called tool actually wrote the file - const modifiedContent = rig.readFile('tail-called-file.txt'); - expect(modifiedContent).toBe('Content from tail call'); - - // 4. Verify telemetry for the final tool call. - // The original 'read_file' call is replaced, so only 'write_file' is finalized and logged. - const toolLogs = rig.readToolLogs(); - const successfulTools = toolLogs.filter((t) => t.toolRequest.success); - expect( - successfulTools.some((t) => t.toolRequest.name === 'write_file'), - ).toBeTruthy(); - // The original request name should be preserved in the log payload if possible, - // but the executed tool name is 'write_file'. 
}); - }); - describe('BeforeModel Hooks - LLM Request Modification', () => { - it('should modify LLM requests with BeforeModel hooks', async () => { - // Create a hook script that replaces the LLM request with a modified version - // Note: Providing messages in the hook output REPLACES the entire conversation - rig.setup('should modify LLM requests with BeforeModel hooks', { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.before-model.responses', - ), + describe('Command Hooks - Additional Context', () => { + it('should add additional context from AfterTool hooks', async () => { + rig.setup('should add additional context from AfterTool hooks', { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.after-tool-context.responses', + ), + }); + + const scriptPath = rig.createScript( + 'after_tool_context.cjs', + "console.log(JSON.stringify({hookSpecificOutput: {hookEventName: 'AfterTool', additionalContext: 'Security scan: File content appears safe'}}));", + ); + + const command = `node "${scriptPath}"`; + rig.setup('should add additional context from AfterTool hooks', { + settings: { + hooksConfig: { + enabled: true, + }, + hooks: { + AfterTool: [ + { + matcher: 'read_file', + sequential: true, + hooks: [ + { + type: 'command', + command: normalizePath(command), + timeout: 5000, + }, + ], + }, + ], + }, + }, + }); + + // Create a test file to read + rig.createFile('test-file.txt', 'This is test content'); + + await rig.run({ + args: 'Read the contents of test-file.txt and tell me what it contains', + }); + + // Should find read_file tool call + const foundReadFile = await rig.waitForToolCall('read_file'); + expect(foundReadFile).toBeTruthy(); + + // Should generate hook telemetry + const hookTelemetryFound = rig.readHookLogs(); + expect(hookTelemetryFound.length).toBeGreaterThan(0); + expect(hookTelemetryFound[0].hookCall.hook_event_name).toBe( + 'AfterTool', + ); + expect(hookTelemetryFound[0].hookCall.hook_name).toBe( + 
normalizePath(command), + ); + expect(hookTelemetryFound[0].hookCall.hook_input).toBeDefined(); + expect(hookTelemetryFound[0].hookCall.hook_output).toBeDefined(); + expect(hookTelemetryFound[0].hookCall.exit_code).toBe(0); + expect(hookTelemetryFound[0].hookCall.stdout).toBeDefined(); + expect(hookTelemetryFound[0].hookCall.stderr).toBeDefined(); }); - const hookScript = `const fs = require('fs'); + }); + + describe('Command Hooks - Tail Tool Calls', () => { + it('should execute a tail tool call from AfterTool hooks and replace original response', async () => { + // Create a script that acts as the hook. + // It will trigger on "read_file" and issue a tail call to "write_file". + rig.setup('should execute a tail tool call from AfterTool hooks', { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.tail-tool-call.responses', + ), + }); + + const hookOutput = { + decision: 'allow', + hookSpecificOutput: { + hookEventName: 'AfterTool', + tailToolCallRequest: { + name: 'write_file', + args: { + file_path: 'tail-called-file.txt', + content: 'Content from tail call', + }, + }, + }, + }; + + const hookScript = `console.log(JSON.stringify(${JSON.stringify( + hookOutput, + )})); process.exit(0);`; + + const scriptPath = join(rig.testDir!, 'tail_call_hook.js'); + writeFileSync(scriptPath, hookScript); + const commandPath = scriptPath.replace(/\\/g, '/'); + + rig.setup('should execute a tail tool call from AfterTool hooks', { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.tail-tool-call.responses', + ), + settings: { + hooksConfig: { + enabled: true, + }, + hooks: { + AfterTool: [ + { + matcher: 'read_file', + hooks: [ + { + type: 'command', + command: `node "${commandPath}"`, + timeout: 5000, + }, + ], + }, + ], + }, + }, + }); + + // Create a test file to trigger the read_file tool + rig.createFile('original.txt', 'Original content'); + + const cliOutput = await rig.run({ + args: 'Read original.txt', // Fake responses should trigger 
read_file on this + }); + + // 1. Verify that write_file was called (as a tail call replacing read_file) + // Since read_file was replaced before finalizing, it will not appear in the tool logs. + const foundWriteFile = await rig.waitForToolCall('write_file'); + expect(foundWriteFile).toBeTruthy(); + + // Ensure hook logs are flushed and the final LLM response is received. + // The mock LLM is configured to respond with "Tail call completed successfully." + expect(cliOutput).toContain('Tail call completed successfully.'); + + // Ensure telemetry is written to disk + await rig.waitForTelemetryReady(); + + // Read hook logs to debug + const hookLogs = rig.readHookLogs(); + const relevantHookLog = hookLogs.find( + (l) => l.hookCall.hook_event_name === 'AfterTool', + ); + + expect(relevantHookLog).toBeDefined(); + + // 2. Verify write_file was executed. + // In non-interactive mode, the CLI deduplicates tool execution logs by callId. + // Since a tail call reuses the original callId, "Tool: write_file" is not printed. + // Instead, we verify the side-effect (file creation) and the telemetry log. + + // 3. Verify the tail-called tool actually wrote the file + const modifiedContent = rig.readFile('tail-called-file.txt'); + expect(modifiedContent).toBe('Content from tail call'); + + // 4. Verify telemetry for the final tool call. + // The original 'read_file' call is replaced, so only 'write_file' is finalized and logged. + const toolLogs = rig.readToolLogs(); + const successfulTools = toolLogs.filter((t) => t.toolRequest.success); + expect( + successfulTools.some((t) => t.toolRequest.name === 'write_file'), + ).toBeTruthy(); + // The original request name should be preserved in the log payload if possible, + // but the executed tool name is 'write_file'. 
+ }); + }); + + describe('BeforeModel Hooks - LLM Request Modification', () => { + it('should modify LLM requests with BeforeModel hooks', async () => { + // Create a hook script that replaces the LLM request with a modified version + // Note: Providing messages in the hook output REPLACES the entire conversation + rig.setup('should modify LLM requests with BeforeModel hooks', { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.before-model.responses', + ), + }); + const hookScript = `const fs = require('fs'); console.log(JSON.stringify({ decision: "allow", hookSpecificOutput: { @@ -419,166 +427,169 @@ console.log(JSON.stringify({ } }));`; - const scriptPath = rig.createScript('before_model_hook.cjs', hookScript); + const scriptPath = rig.createScript( + 'before_model_hook.cjs', + hookScript, + ); - rig.setup('should modify LLM requests with BeforeModel hooks', { - settings: { - hooksConfig: { - enabled: true, + rig.setup('should modify LLM requests with BeforeModel hooks', { + settings: { + hooksConfig: { + enabled: true, + }, + hooks: { + BeforeModel: [ + { + sequential: true, + hooks: [ + { + type: 'command', + command: normalizePath(`node "${scriptPath}"`), + timeout: 5000, + }, + ], + }, + ], + }, }, - hooks: { - BeforeModel: [ - { - sequential: true, - hooks: [ - { - type: 'command', - command: normalizePath(`node "${scriptPath}"`), - timeout: 5000, - }, - ], - }, - ], - }, - }, + }); + + const result = await rig.run({ args: 'Tell me a story' }); + + // The hook should have replaced the request entirely + // Verify that the model responded to the modified request, not the original + expect(result).toBeDefined(); + expect(result.length).toBeGreaterThan(0); + // The response should contain the expected text from the modified request + expect(result.toLowerCase()).toContain('security hook modified'); + + // Should generate hook telemetry + + // Should generate hook telemetry + const hookTelemetryFound = rig.readHookLogs(); + 
expect(hookTelemetryFound.length).toBeGreaterThan(0); + expect(hookTelemetryFound[0].hookCall.hook_event_name).toBe( + 'BeforeModel', + ); + expect(hookTelemetryFound[0].hookCall.hook_name).toBe( + `node "${scriptPath}"`, + ); + expect(hookTelemetryFound[0].hookCall.hook_input).toBeDefined(); + expect(hookTelemetryFound[0].hookCall.hook_output).toBeDefined(); + expect(hookTelemetryFound[0].hookCall.exit_code).toBe(0); + expect(hookTelemetryFound[0].hookCall.stdout).toBeDefined(); + expect(hookTelemetryFound[0].hookCall.stderr).toBeDefined(); }); - const result = await rig.run({ args: 'Tell me a story' }); - - // The hook should have replaced the request entirely - // Verify that the model responded to the modified request, not the original - expect(result).toBeDefined(); - expect(result.length).toBeGreaterThan(0); - // The response should contain the expected text from the modified request - expect(result.toLowerCase()).toContain('security hook modified'); - - // Should generate hook telemetry - - // Should generate hook telemetry - const hookTelemetryFound = rig.readHookLogs(); - expect(hookTelemetryFound.length).toBeGreaterThan(0); - expect(hookTelemetryFound[0].hookCall.hook_event_name).toBe( - 'BeforeModel', - ); - expect(hookTelemetryFound[0].hookCall.hook_name).toBe( - `node "${scriptPath}"`, - ); - expect(hookTelemetryFound[0].hookCall.hook_input).toBeDefined(); - expect(hookTelemetryFound[0].hookCall.hook_output).toBeDefined(); - expect(hookTelemetryFound[0].hookCall.exit_code).toBe(0); - expect(hookTelemetryFound[0].hookCall.stdout).toBeDefined(); - expect(hookTelemetryFound[0].hookCall.stderr).toBeDefined(); - }); - - it('should block model execution when BeforeModel hook returns deny decision', async () => { - rig.setup( - 'should block model execution when BeforeModel hook returns deny decision', - ); - const hookScript = `console.log(JSON.stringify({ + it('should block model execution when BeforeModel hook returns deny decision', async () => { + 
rig.setup( + 'should block model execution when BeforeModel hook returns deny decision', + ); + const hookScript = `console.log(JSON.stringify({ decision: "deny", reason: "Model execution blocked by security policy" }));`; - const scriptPath = rig.createScript( - 'before_model_deny_hook.cjs', - hookScript, - ); + const scriptPath = rig.createScript( + 'before_model_deny_hook.cjs', + hookScript, + ); - rig.setup( - 'should block model execution when BeforeModel hook returns deny decision', - { - settings: { - hooksConfig: { - enabled: true, - }, - hooks: { - BeforeModel: [ - { - sequential: true, - hooks: [ - { - type: 'command', - command: normalizePath(`node "${scriptPath}"`), - timeout: 5000, - }, - ], - }, - ], + rig.setup( + 'should block model execution when BeforeModel hook returns deny decision', + { + settings: { + hooksConfig: { + enabled: true, + }, + hooks: { + BeforeModel: [ + { + sequential: true, + hooks: [ + { + type: 'command', + command: normalizePath(`node "${scriptPath}"`), + timeout: 5000, + }, + ], + }, + ], + }, }, }, - }, - ); + ); - const result = await rig.run({ args: 'Hello' }); + const result = await rig.run({ args: 'Hello' }); - // The hook should have blocked the request - expect(result).toContain('Model execution blocked by security policy'); + // The hook should have blocked the request + expect(result).toContain('Model execution blocked by security policy'); - // Verify no API requests were made to the LLM - const apiRequests = rig.readAllApiRequest(); - expect(apiRequests).toHaveLength(0); - }); + // Verify no API requests were made to the LLM + const apiRequests = rig.readAllApiRequest(); + expect(apiRequests).toHaveLength(0); + }); - it('should block model execution when BeforeModel hook returns block decision', async () => { - rig.setup( - 'should block model execution when BeforeModel hook returns block decision', - ); - const hookScript = `console.log(JSON.stringify({ + it('should block model execution when BeforeModel hook 
returns block decision', async () => { + rig.setup( + 'should block model execution when BeforeModel hook returns block decision', + ); + const hookScript = `console.log(JSON.stringify({ decision: "block", reason: "Model execution blocked by security policy" }));`; - const scriptPath = rig.createScript( - 'before_model_block_hook.cjs', - hookScript, - ); + const scriptPath = rig.createScript( + 'before_model_block_hook.cjs', + hookScript, + ); - rig.setup( - 'should block model execution when BeforeModel hook returns block decision', - { - settings: { - hooksConfig: { - enabled: true, - }, - hooks: { - BeforeModel: [ - { - sequential: true, - hooks: [ - { - type: 'command', - command: normalizePath(`node "${scriptPath}"`), - timeout: 5000, - }, - ], - }, - ], + rig.setup( + 'should block model execution when BeforeModel hook returns block decision', + { + settings: { + hooksConfig: { + enabled: true, + }, + hooks: { + BeforeModel: [ + { + sequential: true, + hooks: [ + { + type: 'command', + command: normalizePath(`node "${scriptPath}"`), + timeout: 5000, + }, + ], + }, + ], + }, }, }, - }, - ); + ); - const result = await rig.run({ args: 'Hello' }); + const result = await rig.run({ args: 'Hello' }); - // The hook should have blocked the request - expect(result).toContain('Model execution blocked by security policy'); + // The hook should have blocked the request + expect(result).toContain('Model execution blocked by security policy'); - // Verify no API requests were made to the LLM - const apiRequests = rig.readAllApiRequest(); - expect(apiRequests).toHaveLength(0); + // Verify no API requests were made to the LLM + const apiRequests = rig.readAllApiRequest(); + expect(apiRequests).toHaveLength(0); + }); }); - }); - describe('AfterModel Hooks - LLM Response Modification', () => { - it.skipIf(process.platform === 'win32')( - 'should modify LLM responses with AfterModel hooks', - async () => { - rig.setup('should modify LLM responses with AfterModel hooks', { - 
fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.after-model.responses', - ), - }); - // Create a hook script that modifies the LLM response - const hookScript = `const fs = require('fs'); + describe('AfterModel Hooks - LLM Response Modification', () => { + it.skipIf(process.platform === 'win32')( + 'should modify LLM responses with AfterModel hooks', + async () => { + rig.setup('should modify LLM responses with AfterModel hooks', { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.after-model.responses', + ), + }); + // Create a hook script that modifies the LLM response + const hookScript = `const fs = require('fs'); console.log(JSON.stringify({ hookSpecificOutput: { hookEventName: "AfterModel", @@ -598,15 +609,148 @@ console.log(JSON.stringify({ } }));`; - const scriptPath = rig.createScript('after_model_hook.cjs', hookScript); + const scriptPath = rig.createScript( + 'after_model_hook.cjs', + hookScript, + ); - rig.setup('should modify LLM responses with AfterModel hooks', { + rig.setup('should modify LLM responses with AfterModel hooks', { + settings: { + hooksConfig: { + enabled: true, + }, + hooks: { + AfterModel: [ + { + hooks: [ + { + type: 'command', + command: normalizePath(`node "${scriptPath}"`), + timeout: 5000, + }, + ], + }, + ], + }, + }, + }); + + const result = await rig.run({ args: 'What is 2 + 2?' }); + + // The hook should have replaced the model response + expect(result).toContain( + '[FILTERED] Response has been filtered for security compliance', + ); + + // Should generate hook telemetry + const hookTelemetryFound = + await rig.waitForTelemetryEvent('hook_call'); + expect(hookTelemetryFound).toBeTruthy(); + }, + ); + }); + + describe('BeforeToolSelection Hooks - Tool Configuration', () => { + it('should modify tool selection with BeforeToolSelection hooks', async () => { + // 1. 
Initial setup to establish test directory + rig.setup('BeforeToolSelection Hooks'); + + const toolConfigJson = JSON.stringify({ + decision: 'allow', + hookSpecificOutput: { + hookEventName: 'BeforeToolSelection', + toolConfig: { + mode: 'ANY', + allowedFunctionNames: ['read_file'], + }, + }, + }); + + // Use file-based hook to avoid quoting issues + const hookScript = `console.log(JSON.stringify(${toolConfigJson}));`; + const hookFilename = 'before_tool_selection_hook.js'; + const scriptPath = rig.createScript(hookFilename, hookScript); + + // 2. Final setup with script path + rig.setup('BeforeToolSelection Hooks', { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.before-tool-selection.responses', + ), + settings: { + debugMode: true, + hooksConfig: { + enabled: true, + }, + hooks: { + BeforeToolSelection: [ + { + hooks: [ + { + type: 'command', + command: normalizePath(`node "${scriptPath}"`), + timeout: 60000, + }, + ], + }, + ], + }, + }, + }); + + // Create a test file + rig.createFile('new_file_data.txt', 'test data'); + + await rig.run({ + args: 'Check the content of new_file_data.txt', + }); + + // Verify the hook was called for BeforeToolSelection event + const hookLogs = rig.readHookLogs(); + const beforeToolSelectionHook = hookLogs.find( + (log) => log.hookCall.hook_event_name === 'BeforeToolSelection', + ); + expect(beforeToolSelectionHook).toBeDefined(); + expect(beforeToolSelectionHook?.hookCall.success).toBe(true); + + // Verify hook telemetry shows it modified the config + expect( + JSON.stringify(beforeToolSelectionHook?.hookCall.hook_output), + ).toContain('read_file'); + }); + }); + + describe('BeforeAgent Hooks - Prompt Augmentation', () => { + it('should augment prompts with BeforeAgent hooks', async () => { + // Create a hook script that adds context to the prompt + const hookScript = `const fs = require('fs'); +console.log(JSON.stringify({ + decision: "allow", + hookSpecificOutput: { + hookEventName: "BeforeAgent", + 
additionalContext: "SYSTEM INSTRUCTION: You are in a secure environment. Always mention security compliance in your responses." + } +}));`; + + rig.setup('should augment prompts with BeforeAgent hooks', { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.before-agent.responses', + ), + }); + + const scriptPath = rig.createScript( + 'before_agent_hook.cjs', + hookScript, + ); + + rig.setup('should augment prompts with BeforeAgent hooks', { settings: { hooksConfig: { enabled: true, }, hooks: { - AfterModel: [ + BeforeAgent: [ { hooks: [ { @@ -621,335 +765,210 @@ console.log(JSON.stringify({ }, }); - const result = await rig.run({ args: 'What is 2 + 2?' }); + const result = await rig.run({ args: 'Hello, how are you?' }); - // The hook should have replaced the model response - expect(result).toContain( - '[FILTERED] Response has been filtered for security compliance', - ); + // The hook should have added security context, which should influence the response + expect(result).toContain('security'); // Should generate hook telemetry const hookTelemetryFound = await rig.waitForTelemetryEvent('hook_call'); expect(hookTelemetryFound).toBeTruthy(); - }, - ); - }); - - describe('BeforeToolSelection Hooks - Tool Configuration', () => { - it('should modify tool selection with BeforeToolSelection hooks', async () => { - // 1. Initial setup to establish test directory - rig.setup('BeforeToolSelection Hooks'); - - const toolConfigJson = JSON.stringify({ - decision: 'allow', - hookSpecificOutput: { - hookEventName: 'BeforeToolSelection', - toolConfig: { - mode: 'ANY', - allowedFunctionNames: ['read_file'], - }, - }, }); - - // Use file-based hook to avoid quoting issues - const hookScript = `console.log(JSON.stringify(${toolConfigJson}));`; - const hookFilename = 'before_tool_selection_hook.js'; - const scriptPath = rig.createScript(hookFilename, hookScript); - - // 2. 
Final setup with script path - rig.setup('BeforeToolSelection Hooks', { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.before-tool-selection.responses', - ), - settings: { - debugMode: true, - hooksConfig: { - enabled: true, - }, - hooks: { - BeforeToolSelection: [ - { - hooks: [ - { - type: 'command', - command: normalizePath(`node "${scriptPath}"`), - timeout: 60000, - }, - ], - }, - ], - }, - }, - }); - - // Create a test file - rig.createFile('new_file_data.txt', 'test data'); - - await rig.run({ - args: 'Check the content of new_file_data.txt', - }); - - // Verify the hook was called for BeforeToolSelection event - const hookLogs = rig.readHookLogs(); - const beforeToolSelectionHook = hookLogs.find( - (log) => log.hookCall.hook_event_name === 'BeforeToolSelection', - ); - expect(beforeToolSelectionHook).toBeDefined(); - expect(beforeToolSelectionHook?.hookCall.success).toBe(true); - - // Verify hook telemetry shows it modified the config - expect( - JSON.stringify(beforeToolSelectionHook?.hookCall.hook_output), - ).toContain('read_file'); }); - }); - describe('BeforeAgent Hooks - Prompt Augmentation', () => { - it('should augment prompts with BeforeAgent hooks', async () => { - // Create a hook script that adds context to the prompt - const hookScript = `const fs = require('fs'); -console.log(JSON.stringify({ - decision: "allow", - hookSpecificOutput: { - hookEventName: "BeforeAgent", - additionalContext: "SYSTEM INSTRUCTION: You are in a secure environment. Always mention security compliance in your responses." 
- } -}));`; + describe('Notification Hooks - Permission Handling', () => { + it('should handle notification hooks for tool permissions', async () => { + rig.setup('should handle notification hooks for tool permissions', { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.notification.responses', + ), + }); - rig.setup('should augment prompts with BeforeAgent hooks', { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.before-agent.responses', - ), - }); - - const scriptPath = rig.createScript('before_agent_hook.cjs', hookScript); - - rig.setup('should augment prompts with BeforeAgent hooks', { - settings: { - hooksConfig: { - enabled: true, - }, - hooks: { - BeforeAgent: [ - { - hooks: [ - { - type: 'command', - command: normalizePath(`node "${scriptPath}"`), - timeout: 5000, - }, - ], - }, - ], - }, - }, - }); - - const result = await rig.run({ args: 'Hello, how are you?' }); - - // The hook should have added security context, which should influence the response - expect(result).toContain('security'); - - // Should generate hook telemetry - const hookTelemetryFound = await rig.waitForTelemetryEvent('hook_call'); - expect(hookTelemetryFound).toBeTruthy(); - }); - }); - - describe('Notification Hooks - Permission Handling', () => { - it('should handle notification hooks for tool permissions', async () => { - rig.setup('should handle notification hooks for tool permissions', { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.notification.responses', - ), - }); - - // Create script file for hook - const scriptPath = rig.createScript( - 'notification_hook.cjs', - "console.log(JSON.stringify({suppressOutput: false, systemMessage: 'Permission request logged by security hook'}));", - ); - - const hookCommand = `node "${scriptPath}"`; - - rig.setup('should handle notification hooks for tool permissions', { - settings: { - // Configure tools to enable hooks and require confirmation to trigger notifications - tools: { - 
approval: 'ASK', // Disable YOLO mode to show permission prompts - confirmationRequired: ['run_shell_command'], - }, - hooksConfig: { - enabled: true, - }, - hooks: { - Notification: [ - { - matcher: 'ToolPermission', - sequential: true, - hooks: [ - { - type: 'command', - command: normalizePath(hookCommand), - timeout: 5000, - }, - ], - }, - ], - }, - }, - }); - - const run = await rig.runInteractive({ approvalMode: 'default' }); - - // Send prompt that will trigger a permission request - await run.type('Run the command "echo test"'); - await run.type('\r'); - - // Wait for permission prompt to appear - await run.expectText('Allow', 10000); - - // Approve the permission - await run.type('y'); - await run.type('\r'); - - // Wait for command to execute - await run.expectText('test', 10000); - - // Should find the shell command execution - const foundShellCommand = await rig.waitForToolCall('run_shell_command'); - expect(foundShellCommand).toBeTruthy(); - - // Verify Notification hook executed - const hookLogs = rig.readHookLogs(); - const notificationLog = hookLogs.find( - (log) => - log.hookCall.hook_event_name === 'Notification' && - log.hookCall.hook_name === normalizePath(hookCommand), - ); - - expect(notificationLog).toBeDefined(); - if (notificationLog) { - expect(notificationLog.hookCall.exit_code).toBe(0); - expect(notificationLog.hookCall.stdout).toContain( - 'Permission request logged by security hook', + // Create script file for hook + const scriptPath = rig.createScript( + 'notification_hook.cjs', + "console.log(JSON.stringify({suppressOutput: false, systemMessage: 'Permission request logged by security hook'}));", ); - // Verify hook input contains notification details - const hookInputStr = - typeof notificationLog.hookCall.hook_input === 'string' - ? 
notificationLog.hookCall.hook_input - : JSON.stringify(notificationLog.hookCall.hook_input); - const hookInput = JSON.parse(hookInputStr) as Record<string, unknown>; + const hookCommand = `node "${scriptPath}"`; - // Should have notification type (uses snake_case) - expect(hookInput['notification_type']).toBe('ToolPermission'); - - // Should have message - expect(hookInput['message']).toBeDefined(); - - // Should have details with tool info - expect(hookInput['details']).toBeDefined(); - const details = hookInput['details'] as Record<string, unknown>; - // For 'exec' type confirmations, details contains: type, title, command, rootCommand - expect(details['type']).toBe('exec'); - expect(details['command']).toBeDefined(); - expect(details['title']).toBeDefined(); - } - }); - }); - - describe('Sequential Hook Execution', () => { - it('should execute hooks sequentially when configured', async () => { - rig.setup('should execute hooks sequentially when configured', { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.sequential-execution.responses', - ), - }); - - // Create script files for hooks - const hook1Path = rig.createScript( - 'seq_hook1.cjs', - "console.log(JSON.stringify({decision: 'allow', hookSpecificOutput: {hookEventName: 'BeforeAgent', additionalContext: 'Step 1: Initial validation passed.'}}));", - ); - const hook2Path = rig.createScript( - 'seq_hook2.cjs', - "console.log(JSON.stringify({decision: 'allow', hookSpecificOutput: {hookEventName: 'BeforeAgent', additionalContext: 'Step 2: Security check completed.'}}));", - ); - - const hook1Command = `node "${hook1Path}"`; - const hook2Command = `node "${hook2Path}"`; - - rig.setup('should execute hooks sequentially when configured', { - settings: { - hooksConfig: { - enabled: true, + rig.setup('should handle notification hooks for tool permissions', { + settings: { + // Configure tools to enable hooks and require confirmation to trigger notifications + tools: { + approval: 'ASK', // Disable YOLO mode to show permission
prompts + confirmationRequired: ['run_shell_command'], + }, + hooksConfig: { + enabled: true, + }, + hooks: { + Notification: [ + { + matcher: 'ToolPermission', + sequential: true, + hooks: [ + { + type: 'command', + command: normalizePath(hookCommand), + timeout: 5000, + }, + ], + }, + ], + }, }, - hooks: { - BeforeAgent: [ - { - sequential: true, - hooks: [ - { - type: 'command', - command: normalizePath(hook1Command), - timeout: 5000, - }, - { - type: 'command', - command: normalizePath(hook2Command), - timeout: 5000, - }, - ], - }, - ], - }, - }, + }); + + const run = await rig.runInteractive({ approvalMode: 'default' }); + + // Send prompt that will trigger a permission request + await run.type('Run the command "echo test"'); + await run.type('\r'); + + // Wait for permission prompt to appear + await run.expectText('Allow', 10000); + + // Approve the permission + await run.type('y'); + await run.type('\r'); + + // Wait for command to execute + await run.expectText('test', 10000); + + // Should find the shell command execution + const foundShellCommand = + await rig.waitForToolCall('run_shell_command'); + expect(foundShellCommand).toBeTruthy(); + + // Verify Notification hook executed + const hookLogs = rig.readHookLogs(); + const notificationLog = hookLogs.find( + (log) => + log.hookCall.hook_event_name === 'Notification' && + log.hookCall.hook_name === normalizePath(hookCommand), + ); + + expect(notificationLog).toBeDefined(); + if (notificationLog) { + expect(notificationLog.hookCall.exit_code).toBe(0); + expect(notificationLog.hookCall.stdout).toContain( + 'Permission request logged by security hook', + ); + + // Verify hook input contains notification details + const hookInputStr = + typeof notificationLog.hookCall.hook_input === 'string' + ? 
notificationLog.hookCall.hook_input + : JSON.stringify(notificationLog.hookCall.hook_input); + const hookInput = JSON.parse(hookInputStr) as Record<string, unknown>; + + // Should have notification type (uses snake_case) + expect(hookInput['notification_type']).toBe('ToolPermission'); + + // Should have message + expect(hookInput['message']).toBeDefined(); + + // Should have details with tool info + expect(hookInput['details']).toBeDefined(); + const details = hookInput['details'] as Record<string, unknown>; + // For 'exec' type confirmations, details contains: type, title, command, rootCommand + expect(details['type']).toBe('exec'); + expect(details['command']).toBeDefined(); + expect(details['title']).toBeDefined(); + } }); - - await rig.run({ args: 'Hello, please help me with a task' }); - - // Should generate hook telemetry - const hookTelemetryFound = await rig.waitForTelemetryEvent('hook_call'); - expect(hookTelemetryFound).toBeTruthy(); - - // Verify both hooks executed - const hookLogs = rig.readHookLogs(); - const hook1Log = hookLogs.find( - (log) => log.hookCall.hook_name === normalizePath(hook1Command), - ); - const hook2Log = hookLogs.find( - (log) => log.hookCall.hook_name === normalizePath(hook2Command), - ); - - expect(hook1Log).toBeDefined(); - expect(hook1Log?.hookCall.exit_code).toBe(0); - expect(hook1Log?.hookCall.stdout).toContain( - 'Step 1: Initial validation passed', - ); - - expect(hook2Log).toBeDefined(); - expect(hook2Log?.hookCall.exit_code).toBe(0); - expect(hook2Log?.hookCall.stdout).toContain( - 'Step 2: Security check completed', - ); }); - }); - describe('Hook Input/Output Validation', () => { - it('should provide correct input format to hooks', async () => { - rig.setup('should provide correct input format to hooks', { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.input-validation.responses', - ), + describe('Sequential Hook Execution', () => { + it('should execute hooks sequentially when configured', async () => { + rig.setup('should execute
hooks sequentially when configured', { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.sequential-execution.responses', + ), + }); + + // Create script files for hooks + const hook1Path = rig.createScript( + 'seq_hook1.cjs', + "console.log(JSON.stringify({decision: 'allow', hookSpecificOutput: {hookEventName: 'BeforeAgent', additionalContext: 'Step 1: Initial validation passed.'}}));", + ); + const hook2Path = rig.createScript( + 'seq_hook2.cjs', + "console.log(JSON.stringify({decision: 'allow', hookSpecificOutput: {hookEventName: 'BeforeAgent', additionalContext: 'Step 2: Security check completed.'}}));", + ); + + const hook1Command = `node "${hook1Path}"`; + const hook2Command = `node "${hook2Path}"`; + + rig.setup('should execute hooks sequentially when configured', { + settings: { + hooksConfig: { + enabled: true, + }, + hooks: { + BeforeAgent: [ + { + sequential: true, + hooks: [ + { + type: 'command', + command: normalizePath(hook1Command), + timeout: 5000, + }, + { + type: 'command', + command: normalizePath(hook2Command), + timeout: 5000, + }, + ], + }, + ], + }, + }, + }); + + await rig.run({ args: 'Hello, please help me with a task' }); + + // Should generate hook telemetry + const hookTelemetryFound = await rig.waitForTelemetryEvent('hook_call'); + expect(hookTelemetryFound).toBeTruthy(); + + // Verify both hooks executed + const hookLogs = rig.readHookLogs(); + const hook1Log = hookLogs.find( + (log) => log.hookCall.hook_name === normalizePath(hook1Command), + ); + const hook2Log = hookLogs.find( + (log) => log.hookCall.hook_name === normalizePath(hook2Command), + ); + + expect(hook1Log).toBeDefined(); + expect(hook1Log?.hookCall.exit_code).toBe(0); + expect(hook1Log?.hookCall.stdout).toContain( + 'Step 1: Initial validation passed', + ); + + expect(hook2Log).toBeDefined(); + expect(hook2Log?.hookCall.exit_code).toBe(0); + expect(hook2Log?.hookCall.stdout).toContain( + 'Step 2: Security check completed', + ); }); - // Create a hook 
script that validates the input format - const hookScript = `const fs = require('fs'); + }); + + describe('Hook Input/Output Validation', () => { + it('should provide correct input format to hooks', async () => { + rig.setup('should provide correct input format to hooks', { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.input-validation.responses', + ), + }); + // Create a hook script that validates the input format + const hookScript = `const fs = require('fs'); const input = fs.readFileSync(0, 'utf-8'); try { const json = JSON.parse(input); @@ -963,69 +982,12 @@ try { console.log(JSON.stringify({decision: "block", reason: "Invalid JSON"})); }`; - const scriptPath = rig.createScript( - 'input_validation_hook.cjs', - hookScript, - ); + const scriptPath = rig.createScript( + 'input_validation_hook.cjs', + hookScript, + ); - rig.setup('should provide correct input format to hooks', { - settings: { - hooksConfig: { - enabled: true, - }, - hooks: { - BeforeTool: [ - { - hooks: [ - { - type: 'command', - command: normalizePath(`node "${scriptPath}"`), - timeout: 5000, - }, - ], - }, - ], - }, - }, - }); - - await rig.run({ - args: 'Create a file called input-test.txt with content "test"', - }); - - // Hook should validate input format successfully - const foundWriteFile = await rig.waitForToolCall('write_file'); - expect(foundWriteFile).toBeTruthy(); - - // Check that the file was created (hook allowed it) - const fileContent = rig.readFile('input-test.txt'); - expect(fileContent).toContain('test'); - - // Should generate hook telemetry - const hookTelemetryFound = await rig.waitForTelemetryEvent('hook_call'); - expect(hookTelemetryFound).toBeTruthy(); - }); - - it('should treat mixed stdout (text + JSON) as system message and allow execution when exit code is 0', async () => { - rig.setup( - 'should treat mixed stdout (text + JSON) as system message and allow execution when exit code is 0', - { - fakeResponsesPath: join( - import.meta.dirname, - 
'hooks-system.allow-tool.responses', - ), - }, - ); - - // Create script file for hook - const scriptPath = rig.createScript( - 'pollution_hook.cjs', - "console.log('Pollution'); console.log(JSON.stringify({decision: 'deny', reason: 'Should be ignored'}));", - ); - - rig.setup( - 'should treat mixed stdout (text + JSON) as system message and allow execution when exit code is 0', - { + rig.setup('should provide correct input format to hooks', { settings: { hooksConfig: { enabled: true, @@ -1033,13 +995,9 @@ try { hooks: { BeforeTool: [ { - matcher: 'write_file', - sequential: true, hooks: [ { type: 'command', - // Output plain text then JSON. - // This breaks JSON parsing, so it falls back to 'allow' with the whole stdout as systemMessage. command: normalizePath(`node "${scriptPath}"`), timeout: 5000, }, @@ -1048,341 +1006,402 @@ try { ], }, }, - }, - ); + }); - const result = await rig.run({ - args: 'Create a file called approved.txt with content "Approved content"', + await rig.run({ + args: 'Create a file called input-test.txt with content "test"', + }); + + // Hook should validate input format successfully + const foundWriteFile = await rig.waitForToolCall('write_file'); + expect(foundWriteFile).toBeTruthy(); + + // Check that the file was created (hook allowed it) + const fileContent = rig.readFile('input-test.txt'); + expect(fileContent).toContain('test'); + + // Should generate hook telemetry + const hookTelemetryFound = await rig.waitForTelemetryEvent('hook_call'); + expect(hookTelemetryFound).toBeTruthy(); }); - // The hook logic fails to parse JSON, so it allows the tool. 
- const foundWriteFile = await rig.waitForToolCall('write_file'); - expect(foundWriteFile).toBeTruthy(); - - // The entire stdout (including the JSON part) becomes the systemMessage - expect(result).toContain('Pollution'); - expect(result).toContain('Should be ignored'); - }); - }); - - describe('Multiple Event Types', () => { - it('should handle hooks for all major event types', async () => { - rig.setup('should handle hooks for all major event types', { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.multiple-events.responses', - ), - }); - - // Create script files for hooks - const btPath = rig.createScript( - 'bt_hook.cjs', - "console.log(JSON.stringify({decision: 'allow', systemMessage: 'BeforeTool: File operation logged'}));", - ); - const atPath = rig.createScript( - 'at_hook.cjs', - "console.log(JSON.stringify({hookSpecificOutput: {hookEventName: 'AfterTool', additionalContext: 'AfterTool: Operation completed successfully'}}));", - ); - const baPath = rig.createScript( - 'ba_hook.cjs', - "console.log(JSON.stringify({decision: 'allow', hookSpecificOutput: {hookEventName: 'BeforeAgent', additionalContext: 'BeforeAgent: User request processed'}}));", - ); - - const beforeToolCommand = `node "${btPath}"`; - const afterToolCommand = `node "${atPath}"`; - const beforeAgentCommand = `node "${baPath}"`; - - rig.setup('should handle hooks for all major event types', { - settings: { - hooksConfig: { - enabled: true, + it('should treat mixed stdout (text + JSON) as system message and allow execution when exit code is 0', async () => { + rig.setup( + 'should treat mixed stdout (text + JSON) as system message and allow execution when exit code is 0', + { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.allow-tool.responses', + ), }, - hooks: { - BeforeAgent: [ - { - hooks: [ - { - type: 'command', - command: normalizePath(beforeAgentCommand), - timeout: 5000, - }, - ], - }, - ], - BeforeTool: [ - { - matcher: 'write_file', - 
sequential: true, - hooks: [ - { - type: 'command', - command: normalizePath(beforeToolCommand), - timeout: 5000, - }, - ], - }, - ], - AfterTool: [ - { - matcher: 'write_file', - sequential: true, - hooks: [ - { - type: 'command', - command: normalizePath(afterToolCommand), - timeout: 5000, - }, - ], - }, - ], - }, - }, - }); - - const result = await rig.run({ - args: - 'Create a file called multi-event-test.txt with content ' + - '"testing multiple events", and then please reply with ' + - 'everything I say just after this:"', - }); - - // Should execute write_file tool - const foundWriteFile = await rig.waitForToolCall('write_file'); - expect(foundWriteFile).toBeTruthy(); - - // File should be created - const fileContent = rig.readFile('multi-event-test.txt'); - expect(fileContent).toContain('testing multiple events'); - - // Result should contain context from all hooks - expect(result).toContain('BeforeTool: File operation logged'); - - // Should generate hook telemetry - const hookTelemetryFound = await rig.waitForTelemetryEvent('hook_call'); - expect(hookTelemetryFound).toBeTruthy(); - - // Verify all three hooks executed - const hookLogs = rig.readHookLogs(); - const beforeAgentLog = hookLogs.find( - (log) => log.hookCall.hook_name === normalizePath(beforeAgentCommand), - ); - const beforeToolLog = hookLogs.find( - (log) => log.hookCall.hook_name === normalizePath(beforeToolCommand), - ); - const afterToolLog = hookLogs.find( - (log) => log.hookCall.hook_name === normalizePath(afterToolCommand), - ); - - expect(beforeAgentLog).toBeDefined(); - expect(beforeAgentLog?.hookCall.exit_code).toBe(0); - expect(beforeAgentLog?.hookCall.stdout).toContain( - 'BeforeAgent: User request processed', - ); - - expect(beforeToolLog).toBeDefined(); - expect(beforeToolLog?.hookCall.exit_code).toBe(0); - expect(beforeToolLog?.hookCall.stdout).toContain( - 'BeforeTool: File operation logged', - ); - - expect(afterToolLog).toBeDefined(); - 
expect(afterToolLog?.hookCall.exit_code).toBe(0); - expect(afterToolLog?.hookCall.stdout).toContain( - 'AfterTool: Operation completed successfully', - ); - }); - }); - - describe('Hook Error Handling', () => { - it('should handle hook failures gracefully', async () => { - rig.setup('should handle hook failures gracefully', { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.error-handling.responses', - ), - }); - // Create script files for hooks - const failingPath = join(rig.testDir!, 'fail_hook.cjs'); - writeFileSync(failingPath, 'process.exit(1);'); - const workingPath = join(rig.testDir!, 'work_hook.cjs'); - writeFileSync( - workingPath, - "console.log(JSON.stringify({decision: 'allow', reason: 'Working hook succeeded'}));", - ); - - // Failing hook: exits with non-zero code - const failingCommand = `node "${failingPath}"`; - // Working hook: returns success with JSON - const workingCommand = `node "${workingPath}"`; - - rig.setup('should handle hook failures gracefully', { - settings: { - hooksConfig: { - enabled: true, - }, - hooks: { - BeforeTool: [ - { - hooks: [ - { - type: 'command', - command: normalizePath(failingCommand), - timeout: 5000, - }, - { - type: 'command', - command: normalizePath(workingCommand), - timeout: 5000, - }, - ], - }, - ], - }, - }, - }); - - await rig.run({ - args: 'Create a file called error-test.txt with content "testing error handling"', - }); - - // Despite one hook failing, the working hook should still allow the operation - const foundWriteFile = await rig.waitForToolCall('write_file'); - expect(foundWriteFile).toBeTruthy(); - - // File should be created - const fileContent = rig.readFile('error-test.txt'); - expect(fileContent).toContain('testing error handling'); - - // Should generate hook telemetry - const hookTelemetryFound = await rig.waitForTelemetryEvent('hook_call'); - expect(hookTelemetryFound).toBeTruthy(); - }); - }); - - describe('Hook Telemetry and Observability', () => { - it('should generate 
telemetry events for hook executions', async () => { - rig.setup('should generate telemetry events for hook executions', { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.telemetry.responses', - ), - }); - - // Create script file for hook - const scriptPath = rig.createScript( - 'telemetry_hook.cjs', - "console.log(JSON.stringify({decision: 'allow', reason: 'Telemetry test hook'}));", - ); - - const hookCommand = `node "${scriptPath}"`; - - rig.setup('should generate telemetry events for hook executions', { - settings: { - hooksConfig: { - enabled: true, - }, - hooks: { - BeforeTool: [ - { - hooks: [ - { - type: 'command', - command: normalizePath(hookCommand), - timeout: 5000, - }, - ], - }, - ], - }, - }, - }); - - await rig.run({ args: 'Create a file called telemetry-test.txt' }); - - // Should execute the tool - const foundWriteFile = await rig.waitForToolCall('write_file'); - expect(foundWriteFile).toBeTruthy(); - - // Should generate hook telemetry - const hookTelemetryFound = await rig.waitForTelemetryEvent('hook_call'); - expect(hookTelemetryFound).toBeTruthy(); - }); - }); - - describe('Session Lifecycle Hooks', () => { - it('should fire SessionStart hook on app startup', async () => { - rig.setup('should fire SessionStart hook on app startup', { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.session-startup.responses', - ), - }); - - // Create script file for hook - const scriptPath = rig.createScript( - 'session_start_hook.cjs', - "console.log(JSON.stringify({decision: 'allow', systemMessage: 'Session starting on startup'}));", - ); - - const sessionStartCommand = `node "${scriptPath}"`; - - rig.setup('should fire SessionStart hook on app startup', { - settings: { - hooksConfig: { - enabled: true, - }, - hooks: { - SessionStart: [ - { - matcher: 'startup', - sequential: true, - hooks: [ - { - type: 'command', - command: normalizePath(sessionStartCommand), - timeout: 5000, - }, - ], - }, - ], - }, - }, - }); - - // Run 
a simple query - the SessionStart hook will fire during app initialization - await rig.run({ args: 'Say hello' }); - - // Verify hook executed with correct parameters - const hookLogs = rig.readHookLogs(); - const sessionStartLog = hookLogs.find( - (log) => log.hookCall.hook_event_name === 'SessionStart', - ); - - expect(sessionStartLog).toBeDefined(); - if (sessionStartLog) { - expect(sessionStartLog.hookCall.hook_name).toBe( - normalizePath(sessionStartCommand), ); - expect(sessionStartLog.hookCall.exit_code).toBe(0); - expect(sessionStartLog.hookCall.hook_input).toBeDefined(); - // hook_input is a string that needs to be parsed - const hookInputStr = - typeof sessionStartLog.hookCall.hook_input === 'string' - ? sessionStartLog.hookCall.hook_input - : JSON.stringify(sessionStartLog.hookCall.hook_input); - const hookInput = JSON.parse(hookInputStr) as Record<string, unknown>; - - expect(hookInput['source']).toBe('startup'); - expect(sessionStartLog.hookCall.stdout).toContain( - 'Session starting on startup', + // Create script file for hook + const scriptPath = rig.createScript( + 'pollution_hook.cjs', + "console.log('Pollution'); console.log(JSON.stringify({decision: 'deny', reason: 'Should be ignored'}));", ); - } + + rig.setup( + 'should treat mixed stdout (text + JSON) as system message and allow execution when exit code is 0', + { + settings: { + hooksConfig: { + enabled: true, + }, + hooks: { + BeforeTool: [ + { + matcher: 'write_file', + sequential: true, + hooks: [ + { + type: 'command', + // Output plain text then JSON. + // This breaks JSON parsing, so it falls back to 'allow' with the whole stdout as systemMessage. + command: normalizePath(`node "${scriptPath}"`), + timeout: 5000, + }, + ], + }, + ], + }, + }, + }, + ); + + const result = await rig.run({ + args: 'Create a file called approved.txt with content "Approved content"', + }); + + // The hook logic fails to parse JSON, so it allows the tool.
+ const foundWriteFile = await rig.waitForToolCall('write_file'); + expect(foundWriteFile).toBeTruthy(); + + // The entire stdout (including the JSON part) becomes the systemMessage + expect(result).toContain('Pollution'); + expect(result).toContain('Should be ignored'); + }); }); - it('should fire SessionStart hook and inject context', async () => { - // Create hook script that outputs JSON with additionalContext - const hookScript = `const fs = require('fs'); + describe('Multiple Event Types', () => { + it('should handle hooks for all major event types', async () => { + rig.setup('should handle hooks for all major event types', { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.multiple-events.responses', + ), + }); + + // Create script files for hooks + const btPath = rig.createScript( + 'bt_hook.cjs', + "console.log(JSON.stringify({decision: 'allow', systemMessage: 'BeforeTool: File operation logged'}));", + ); + const atPath = rig.createScript( + 'at_hook.cjs', + "console.log(JSON.stringify({hookSpecificOutput: {hookEventName: 'AfterTool', additionalContext: 'AfterTool: Operation completed successfully'}}));", + ); + const baPath = rig.createScript( + 'ba_hook.cjs', + "console.log(JSON.stringify({decision: 'allow', hookSpecificOutput: {hookEventName: 'BeforeAgent', additionalContext: 'BeforeAgent: User request processed'}}));", + ); + + const beforeToolCommand = `node "${btPath}"`; + const afterToolCommand = `node "${atPath}"`; + const beforeAgentCommand = `node "${baPath}"`; + + rig.setup('should handle hooks for all major event types', { + settings: { + hooksConfig: { + enabled: true, + }, + hooks: { + BeforeAgent: [ + { + hooks: [ + { + type: 'command', + command: normalizePath(beforeAgentCommand), + timeout: 5000, + }, + ], + }, + ], + BeforeTool: [ + { + matcher: 'write_file', + sequential: true, + hooks: [ + { + type: 'command', + command: normalizePath(beforeToolCommand), + timeout: 5000, + }, + ], + }, + ], + AfterTool: [ + { + 
matcher: 'write_file', + sequential: true, + hooks: [ + { + type: 'command', + command: normalizePath(afterToolCommand), + timeout: 5000, + }, + ], + }, + ], + }, + }, + }); + + const result = await rig.run({ + args: + 'Create a file called multi-event-test.txt with content ' + + '"testing multiple events", and then please reply with ' + + 'everything I say just after this:"', + }); + + // Should execute write_file tool + const foundWriteFile = await rig.waitForToolCall('write_file'); + expect(foundWriteFile).toBeTruthy(); + + // File should be created + const fileContent = rig.readFile('multi-event-test.txt'); + expect(fileContent).toContain('testing multiple events'); + + // Result should contain context from all hooks + expect(result).toContain('BeforeTool: File operation logged'); + + // Should generate hook telemetry + const hookTelemetryFound = await rig.waitForTelemetryEvent('hook_call'); + expect(hookTelemetryFound).toBeTruthy(); + + // Verify all three hooks executed + const hookLogs = rig.readHookLogs(); + const beforeAgentLog = hookLogs.find( + (log) => log.hookCall.hook_name === normalizePath(beforeAgentCommand), + ); + const beforeToolLog = hookLogs.find( + (log) => log.hookCall.hook_name === normalizePath(beforeToolCommand), + ); + const afterToolLog = hookLogs.find( + (log) => log.hookCall.hook_name === normalizePath(afterToolCommand), + ); + + expect(beforeAgentLog).toBeDefined(); + expect(beforeAgentLog?.hookCall.exit_code).toBe(0); + expect(beforeAgentLog?.hookCall.stdout).toContain( + 'BeforeAgent: User request processed', + ); + + expect(beforeToolLog).toBeDefined(); + expect(beforeToolLog?.hookCall.exit_code).toBe(0); + expect(beforeToolLog?.hookCall.stdout).toContain( + 'BeforeTool: File operation logged', + ); + + expect(afterToolLog).toBeDefined(); + expect(afterToolLog?.hookCall.exit_code).toBe(0); + expect(afterToolLog?.hookCall.stdout).toContain( + 'AfterTool: Operation completed successfully', + ); + }); + }); + + describe('Hook Error 
Handling', () => { + it('should handle hook failures gracefully', async () => { + rig.setup('should handle hook failures gracefully', { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.error-handling.responses', + ), + }); + // Create script files for hooks + const failingPath = join(rig.testDir!, 'fail_hook.cjs'); + writeFileSync(failingPath, 'process.exit(1);'); + const workingPath = join(rig.testDir!, 'work_hook.cjs'); + writeFileSync( + workingPath, + "console.log(JSON.stringify({decision: 'allow', reason: 'Working hook succeeded'}));", + ); + + // Failing hook: exits with non-zero code + const failingCommand = `node "${failingPath}"`; + // Working hook: returns success with JSON + const workingCommand = `node "${workingPath}"`; + + rig.setup('should handle hook failures gracefully', { + settings: { + hooksConfig: { + enabled: true, + }, + hooks: { + BeforeTool: [ + { + hooks: [ + { + type: 'command', + command: normalizePath(failingCommand), + timeout: 5000, + }, + { + type: 'command', + command: normalizePath(workingCommand), + timeout: 5000, + }, + ], + }, + ], + }, + }, + }); + + await rig.run({ + args: 'Create a file called error-test.txt with content "testing error handling"', + }); + + // Despite one hook failing, the working hook should still allow the operation + const foundWriteFile = await rig.waitForToolCall('write_file'); + expect(foundWriteFile).toBeTruthy(); + + // File should be created + const fileContent = rig.readFile('error-test.txt'); + expect(fileContent).toContain('testing error handling'); + + // Should generate hook telemetry + const hookTelemetryFound = await rig.waitForTelemetryEvent('hook_call'); + expect(hookTelemetryFound).toBeTruthy(); + }); + }); + + describe('Hook Telemetry and Observability', () => { + it('should generate telemetry events for hook executions', async () => { + rig.setup('should generate telemetry events for hook executions', { + fakeResponsesPath: join( + import.meta.dirname, + 
'hooks-system.telemetry.responses', + ), + }); + + // Create script file for hook + const scriptPath = rig.createScript( + 'telemetry_hook.cjs', + "console.log(JSON.stringify({decision: 'allow', reason: 'Telemetry test hook'}));", + ); + + const hookCommand = `node "${scriptPath}"`; + + rig.setup('should generate telemetry events for hook executions', { + settings: { + hooksConfig: { + enabled: true, + }, + hooks: { + BeforeTool: [ + { + hooks: [ + { + type: 'command', + command: normalizePath(hookCommand), + timeout: 5000, + }, + ], + }, + ], + }, + }, + }); + + await rig.run({ args: 'Create a file called telemetry-test.txt' }); + + // Should execute the tool + const foundWriteFile = await rig.waitForToolCall('write_file'); + expect(foundWriteFile).toBeTruthy(); + + // Should generate hook telemetry + const hookTelemetryFound = await rig.waitForTelemetryEvent('hook_call'); + expect(hookTelemetryFound).toBeTruthy(); + }); + }); + + describe('Session Lifecycle Hooks', () => { + it('should fire SessionStart hook on app startup', async () => { + rig.setup('should fire SessionStart hook on app startup', { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.session-startup.responses', + ), + }); + + // Create script file for hook + const scriptPath = rig.createScript( + 'session_start_hook.cjs', + "console.log(JSON.stringify({decision: 'allow', systemMessage: 'Session starting on startup'}));", + ); + + const sessionStartCommand = `node "${scriptPath}"`; + + rig.setup('should fire SessionStart hook on app startup', { + settings: { + hooksConfig: { + enabled: true, + }, + hooks: { + SessionStart: [ + { + matcher: 'startup', + sequential: true, + hooks: [ + { + type: 'command', + command: normalizePath(sessionStartCommand), + timeout: 5000, + }, + ], + }, + ], + }, + }, + }); + + // Run a simple query - the SessionStart hook will fire during app initialization + await rig.run({ args: 'Say hello' }); + + // Verify hook executed with correct parameters + const 
hookLogs = rig.readHookLogs(); + const sessionStartLog = hookLogs.find( + (log) => log.hookCall.hook_event_name === 'SessionStart', + ); + + expect(sessionStartLog).toBeDefined(); + if (sessionStartLog) { + expect(sessionStartLog.hookCall.hook_name).toBe( + normalizePath(sessionStartCommand), + ); + expect(sessionStartLog.hookCall.exit_code).toBe(0); + expect(sessionStartLog.hookCall.hook_input).toBeDefined(); + + // hook_input is a string that needs to be parsed + const hookInputStr = + typeof sessionStartLog.hookCall.hook_input === 'string' + ? sessionStartLog.hookCall.hook_input + : JSON.stringify(sessionStartLog.hookCall.hook_input); + const hookInput = JSON.parse(hookInputStr) as Record; + + expect(hookInput['source']).toBe('startup'); + expect(sessionStartLog.hookCall.stdout).toContain( + 'Session starting on startup', + ); + } + }); + + it('should fire SessionStart hook and inject context', async () => { + // Create hook script that outputs JSON with additionalContext + const hookScript = `const fs = require('fs'); console.log(JSON.stringify({ decision: 'allow', systemMessage: 'Context injected via SessionStart hook', @@ -1392,104 +1411,19 @@ console.log(JSON.stringify({ } }));`; - rig.setup('should fire SessionStart hook and inject context', { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.session-startup.responses', - ), - }); - - const scriptPath = rig.createScript( - 'session_start_context_hook.cjs', - hookScript, - ); - - rig.setup('should fire SessionStart hook and inject context', { - settings: { - hooksConfig: { - enabled: true, - }, - hooks: { - SessionStart: [ - { - matcher: 'startup', - sequential: true, - hooks: [ - { - type: 'command', - command: normalizePath(`node "${scriptPath}"`), - timeout: 5000, - }, - ], - }, - ], - }, - }, - }); - - // Run a query - the SessionStart hook will fire during app initialization - const result = await rig.run({ args: 'Who are you?' 
}); - - // Check if systemMessage was displayed (in stderr, which rig.run captures) - expect(result).toContain('Context injected via SessionStart hook'); - - // Check if additionalContext influenced the model response - // Note: We use fake responses, but the rig records interactions. - // If we are using fake responses, the model won't actually respond unless we provide a fake response for the injected context. - // But the test rig setup uses 'hooks-system.session-startup.responses'. - // If I'm adding a new test, I might need to generate new fake responses or expect the context to be sent to the model (verify API logs). - - // Verify hook executed - const hookLogs = rig.readHookLogs(); - const sessionStartLog = hookLogs.find( - (log) => log.hookCall.hook_event_name === 'SessionStart', - ); - - expect(sessionStartLog).toBeDefined(); - - // Verify the API request contained the injected context - // rig.readAllApiRequest() gives us telemetry on API requests. - const apiRequests = rig.readAllApiRequest(); - // We expect at least one API request - expect(apiRequests.length).toBeGreaterThan(0); - - // The injected context should be in the request text - // For non-interactive mode, I prepended it to input: "context\n\ninput" - // The telemetry `request_text` should contain it. - const requestText = apiRequests[0].attributes?.request_text || ''; - expect(requestText).toContain('protocol droid'); - }); - - it('should fire SessionStart hook and display systemMessage in interactive mode', async () => { - // Create hook script that outputs JSON with systemMessage and additionalContext - const hookScript = `const fs = require('fs'); -console.log(JSON.stringify({ - decision: 'allow', - systemMessage: 'Interactive Session Start Message', - hookSpecificOutput: { - hookEventName: 'SessionStart', - additionalContext: 'The user is a Jedi Master.' 
- } -}));`; - - rig.setup( - 'should fire SessionStart hook and display systemMessage in interactive mode', - { + rig.setup('should fire SessionStart hook and inject context', { fakeResponsesPath: join( import.meta.dirname, 'hooks-system.session-startup.responses', ), - }, - ); + }); - const scriptPath = rig.createScript( - 'session_start_interactive_hook.cjs', - hookScript, - ); + const scriptPath = rig.createScript( + 'session_start_context_hook.cjs', + hookScript, + ); - rig.setup( - 'should fire SessionStart hook and display systemMessage in interactive mode', - { + rig.setup('should fire SessionStart hook and inject context', { settings: { hooksConfig: { enabled: true, @@ -1510,70 +1444,418 @@ console.log(JSON.stringify({ ], }, }, - }, - ); + }); - const run = await rig.runInteractive(); + // Run a query - the SessionStart hook will fire during app initialization + const result = await rig.run({ args: 'Who are you?' }); - // Verify systemMessage is displayed - await run.expectText('Interactive Session Start Message', 10000); + // Check if systemMessage was displayed (in stderr, which rig.run captures) + expect(result).toContain('Context injected via SessionStart hook'); - // Send a prompt to establish a session and trigger an API call - await run.sendKeys('Hello'); - await run.type('\r'); + // Check if additionalContext influenced the model response + // Note: We use fake responses, but the rig records interactions. + // If we are using fake responses, the model won't actually respond unless we provide a fake response for the injected context. + // But the test rig setup uses 'hooks-system.session-startup.responses'. + // If I'm adding a new test, I might need to generate new fake responses or expect the context to be sent to the model (verify API logs). 
- // Wait for response to ensure API call happened - await run.expectText('Hello', 15000); + // Verify hook executed + const hookLogs = rig.readHookLogs(); + const sessionStartLog = hookLogs.find( + (log) => log.hookCall.hook_event_name === 'SessionStart', + ); - // Wait for telemetry to be written to disk - await rig.waitForTelemetryReady(); + expect(sessionStartLog).toBeDefined(); - // Verify the API request contained the injected context - // We may need to poll for API requests as they are written asynchronously - const pollResult = await poll( - () => { - const apiRequests = rig.readAllApiRequest(); - return apiRequests.length > 0; - }, - 15000, - 500, - ); + // Verify the API request contained the injected context + // rig.readAllApiRequest() gives us telemetry on API requests. + const apiRequests = rig.readAllApiRequest(); + // We expect at least one API request + expect(apiRequests.length).toBeGreaterThan(0); - expect(pollResult).toBe(true); + // The injected context should be in the request text + // For non-interactive mode, I prepended it to input: "context\n\ninput" + // The telemetry `request_text` should contain it. + const requestText = apiRequests[0].attributes?.request_text || ''; + expect(requestText).toContain('protocol droid'); + }); - const apiRequests = rig.readAllApiRequest(); - // The injected context should be in the request_text of the API request - const requestText = apiRequests[0].attributes?.request_text || ''; - expect(requestText).toContain('Jedi Master'); + it('should fire SessionStart hook and display systemMessage in interactive mode', async () => { + // Create hook script that outputs JSON with systemMessage and additionalContext + const hookScript = `const fs = require('fs'); +console.log(JSON.stringify({ + decision: 'allow', + systemMessage: 'Interactive Session Start Message', + hookSpecificOutput: { + hookEventName: 'SessionStart', + additionalContext: 'The user is a Jedi Master.' 
+ } +}));`; + + rig.setup( + 'should fire SessionStart hook and display systemMessage in interactive mode', + { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.session-startup.responses', + ), + }, + ); + + const scriptPath = rig.createScript( + 'session_start_interactive_hook.cjs', + hookScript, + ); + + rig.setup( + 'should fire SessionStart hook and display systemMessage in interactive mode', + { + settings: { + hooksConfig: { + enabled: true, + }, + hooks: { + SessionStart: [ + { + matcher: 'startup', + sequential: true, + hooks: [ + { + type: 'command', + command: normalizePath(`node "${scriptPath}"`), + timeout: 5000, + }, + ], + }, + ], + }, + }, + }, + ); + + const run = await rig.runInteractive(); + + // Verify systemMessage is displayed + await run.expectText('Interactive Session Start Message', 10000); + + // Send a prompt to establish a session and trigger an API call + await run.sendKeys('Hello'); + await run.type('\r'); + + // Wait for response to ensure API call happened + await run.expectText('Hello', 15000); + + // Wait for telemetry to be written to disk + await rig.waitForTelemetryReady(); + + // Verify the API request contained the injected context + // We may need to poll for API requests as they are written asynchronously + const pollResult = await poll( + () => { + const apiRequests = rig.readAllApiRequest(); + return apiRequests.length > 0; + }, + 15000, + 500, + ); + + expect(pollResult).toBe(true); + + const apiRequests = rig.readAllApiRequest(); + // The injected context should be in the request_text of the API request + const requestText = apiRequests[0].attributes?.request_text || ''; + expect(requestText).toContain('Jedi Master'); + }); + + it('should fire SessionEnd and SessionStart hooks on /clear command', async () => { + rig.setup( + 'should fire SessionEnd and SessionStart hooks on /clear command', + { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.session-clear.responses', + ), + }, + ); + + 
// Create script files for hooks + const endScriptPath = rig.createScript( + 'session_end_clear.cjs', + "console.log(JSON.stringify({decision: 'allow', systemMessage: 'Session ending due to clear'}));", + ); + const startScriptPath = rig.createScript( + 'session_start_clear.cjs', + "console.log(JSON.stringify({decision: 'allow', systemMessage: 'Session starting after clear'}));", + ); + + const sessionEndCommand = `node "${endScriptPath}"`; + const sessionStartCommand = `node "${startScriptPath}"`; + + rig.setup( + 'should fire SessionEnd and SessionStart hooks on /clear command', + { + settings: { + hooksConfig: { + enabled: true, + }, + hooks: { + SessionEnd: [ + { + matcher: '*', + sequential: true, + hooks: [ + { + type: 'command', + command: normalizePath(sessionEndCommand), + timeout: 5000, + }, + ], + }, + ], + SessionStart: [ + { + matcher: '*', + sequential: true, + hooks: [ + { + type: 'command', + command: normalizePath(sessionStartCommand), + timeout: 5000, + }, + ], + }, + ], + }, + }, + }, + ); + + const run = await rig.runInteractive(); + + // Send an initial prompt to establish a session + await run.sendKeys('Say hello'); + await run.type('\r'); + + // Wait for the response + await run.expectText('Hello', 10000); + + // Execute /clear command multiple times to generate more hook events + // This makes the test more robust by creating multiple start/stop cycles + const numClears = 3; + for (let i = 0; i < numClears; i++) { + await run.sendKeys('/clear'); + await run.type('\r'); + + // Wait a bit for clear to complete + await new Promise((resolve) => setTimeout(resolve, 2000)); + + // Send a prompt to establish an active session before next clear + await run.sendKeys('Say hello'); + await run.type('\r'); + + // Wait for response + await run.expectText('Hello', 10000); + } + + // Wait for all clears to complete + // BatchLogRecordProcessor exports telemetry every 10 seconds by default + // Use generous wait time across all platforms (CI, Docker, Mac, 
Linux) + await new Promise((resolve) => setTimeout(resolve, 15000)); + + // Wait for telemetry to be written to disk + await rig.waitForTelemetryReady(); + + // Wait for hook telemetry events to be flushed to disk + // In interactive mode, telemetry may be buffered, so we need to poll for the events + // We execute multiple clears to generate more hook events (total: 1 + numClears * 2) + // But we only require >= 1 hooks to pass, making the test more permissive + const expectedMinHooks = 1; // SessionStart (startup), SessionEnd (clear), SessionStart (clear) + const pollResult = await poll( + () => { + const hookLogs = rig.readHookLogs(); + return hookLogs.length >= expectedMinHooks; + }, + 90000, // 90 second timeout for all platforms + 1000, // check every 1s to reduce I/O overhead + ); + + // If polling failed, log diagnostic info + if (!pollResult) { + const hookLogs = rig.readHookLogs(); + const hookEvents = hookLogs.map( + (log) => log.hookCall.hook_event_name, + ); + console.error( + `Polling timeout after 90000ms: Expected >= ${expectedMinHooks} hooks, got ${hookLogs.length}`, + ); + console.error( + 'Hooks found:', + hookEvents.length > 0 ? hookEvents.join(', ') : 'NONE', + ); + console.error('Full hook logs:', JSON.stringify(hookLogs, null, 2)); + } + + // Verify hooks executed + const hookLogs = rig.readHookLogs(); + + // Diagnostic: Log which hooks we actually got + const hookEvents = hookLogs.map((log) => log.hookCall.hook_event_name); + if (hookLogs.length < expectedMinHooks) { + console.error( + `TEST FAILURE: Expected >= ${expectedMinHooks} hooks, got ${hookLogs.length}: [${hookEvents.length > 0 ? 
hookEvents.join(', ') : 'NONE'}]`, + ); + } + + expect(hookLogs.length).toBeGreaterThanOrEqual(expectedMinHooks); + + // Find SessionEnd hook log + const sessionEndLog = hookLogs.find( + (log) => + log.hookCall.hook_event_name === 'SessionEnd' && + log.hookCall.hook_name === normalizePath(sessionEndCommand), + ); + // Because of the flakiness of the test, we relax this check + // expect(sessionEndLog).toBeDefined(); + if (sessionEndLog) { + expect(sessionEndLog.hookCall.exit_code).toBe(0); + expect(sessionEndLog.hookCall.stdout).toContain( + 'Session ending due to clear', + ); + + // Verify hook input contains reason + const hookInputStr = + typeof sessionEndLog.hookCall.hook_input === 'string' + ? sessionEndLog.hookCall.hook_input + : JSON.stringify(sessionEndLog.hookCall.hook_input); + const hookInput = JSON.parse(hookInputStr) as Record; + expect(hookInput['reason']).toBe('clear'); + } + + // Find SessionStart hook log after clear + const sessionStartAfterClearLogs = hookLogs.filter( + (log) => + log.hookCall.hook_event_name === 'SessionStart' && + log.hookCall.hook_name === normalizePath(sessionStartCommand), + ); + // Should have at least one SessionStart from after clear + // Because of the flakiness of the test, we relax this check + // expect(sessionStartAfterClearLogs.length).toBeGreaterThanOrEqual(1); + + const sessionStartLog = sessionStartAfterClearLogs.find((log) => { + const hookInputStr = + typeof log.hookCall.hook_input === 'string' + ? 
log.hookCall.hook_input + : JSON.stringify(log.hookCall.hook_input); + const hookInput = JSON.parse(hookInputStr) as Record; + return hookInput['source'] === 'clear'; + }); + + // Because of the flakiness of the test, we relax this check + // expect(sessionStartLog).toBeDefined(); + if (sessionStartLog) { + expect(sessionStartLog.hookCall.exit_code).toBe(0); + expect(sessionStartLog.hookCall.stdout).toContain( + 'Session starting after clear', + ); + } + }); }); - it('should fire SessionEnd and SessionStart hooks on /clear command', async () => { - rig.setup( - 'should fire SessionEnd and SessionStart hooks on /clear command', - { + describe('Compression Hooks', () => { + it('should fire PreCompress hook on automatic compression', async () => { + rig.setup('should fire PreCompress hook on automatic compression', { fakeResponsesPath: join( import.meta.dirname, - 'hooks-system.session-clear.responses', + 'hooks-system.compress-auto.responses', ), - }, - ); + }); - // Create script files for hooks - const endScriptPath = rig.createScript( - 'session_end_clear.cjs', - "console.log(JSON.stringify({decision: 'allow', systemMessage: 'Session ending due to clear'}));", - ); - const startScriptPath = rig.createScript( - 'session_start_clear.cjs', - "console.log(JSON.stringify({decision: 'allow', systemMessage: 'Session starting after clear'}));", - ); + // Create script file for hook + const scriptPath = rig.createScript( + 'pre_compress_hook.cjs', + "console.log(JSON.stringify({decision: 'allow', systemMessage: 'PreCompress hook executed for automatic compression'}));", + ); - const sessionEndCommand = `node "${endScriptPath}"`; - const sessionStartCommand = `node "${startScriptPath}"`; + const preCompressCommand = `node "${scriptPath}"`; - rig.setup( - 'should fire SessionEnd and SessionStart hooks on /clear command', - { + rig.setup('should fire PreCompress hook on automatic compression', { + settings: { + hooksConfig: { + enabled: true, + }, + hooks: { + PreCompress: [ + { 
+ matcher: 'auto', + sequential: true, + hooks: [ + { + type: 'command', + command: normalizePath(preCompressCommand), + timeout: 5000, + }, + ], + }, + ], + }, + // Configure automatic compression with a very low threshold + // This will trigger auto-compression after the first response + contextCompression: { + // enabled: true, + targetTokenCount: 10, // Very low threshold to trigger compression + }, + }, + }); + + // Run a simple query that will trigger automatic compression + await rig.run({ args: 'Say hello in exactly 5 words' }); + + // Verify hook executed with correct parameters + const hookLogs = rig.readHookLogs(); + const preCompressLog = hookLogs.find( + (log) => log.hookCall.hook_event_name === 'PreCompress', + ); + + expect(preCompressLog).toBeDefined(); + if (preCompressLog) { + expect(preCompressLog.hookCall.hook_name).toBe( + normalizePath(preCompressCommand), + ); + expect(preCompressLog.hookCall.exit_code).toBe(0); + expect(preCompressLog.hookCall.hook_input).toBeDefined(); + + // hook_input is a string that needs to be parsed + const hookInputStr = + typeof preCompressLog.hookCall.hook_input === 'string' + ? 
preCompressLog.hookCall.hook_input + : JSON.stringify(preCompressLog.hookCall.hook_input); + const hookInput = JSON.parse(hookInputStr) as Record; + + expect(hookInput['trigger']).toBe('auto'); + expect(preCompressLog.hookCall.stdout).toContain( + 'PreCompress hook executed for automatic compression', + ); + } + }); + }); + + describe('SessionEnd on Exit', () => { + it('should fire SessionEnd hook on graceful exit in non-interactive mode', async () => { + rig.setup('should fire SessionEnd hook on graceful exit', { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.session-startup.responses', + ), + }); + + // Create script file for hook + const scriptPath = rig.createScript( + 'session_end_exit.cjs', + "console.log(JSON.stringify({decision: 'allow', systemMessage: 'SessionEnd hook executed on exit'}));", + ); + + const sessionEndCommand = `node "${scriptPath}"`; + + rig.setup('should fire SessionEnd hook on graceful exit', { settings: { hooksConfig: { enabled: true, @@ -1581,7 +1863,7 @@ console.log(JSON.stringify({ hooks: { SessionEnd: [ { - matcher: '*', + matcher: 'exit', sequential: true, hooks: [ { @@ -1592,14 +1874,287 @@ console.log(JSON.stringify({ ], }, ], - SessionStart: [ + }, + }, + }); + + // Run in non-interactive mode with a simple prompt + await rig.run({ args: 'Hello' }); + + // The process should exit gracefully, firing the SessionEnd hook + // Wait for telemetry to be written to disk + await rig.waitForTelemetryReady(); + + // Poll for the hook log to appear + const isCI = process.env['CI'] === 'true'; + const pollTimeout = isCI ? 
30000 : 10000; + const pollResult = await poll( + () => { + const hookLogs = rig.readHookLogs(); + return hookLogs.some( + (log) => log.hookCall.hook_event_name === 'SessionEnd', + ); + }, + pollTimeout, + 200, + ); + + if (!pollResult) { + const hookLogs = rig.readHookLogs(); + console.error( + 'Polling timeout: Expected SessionEnd hook, got:', + JSON.stringify(hookLogs, null, 2), + ); + } + + expect(pollResult).toBe(true); + + const hookLogs = rig.readHookLogs(); + const sessionEndLog = hookLogs.find( + (log) => log.hookCall.hook_event_name === 'SessionEnd', + ); + + expect(sessionEndLog).toBeDefined(); + if (sessionEndLog) { + expect(sessionEndLog.hookCall.hook_name).toBe( + normalizePath(sessionEndCommand), + ); + expect(sessionEndLog.hookCall.exit_code).toBe(0); + expect(sessionEndLog.hookCall.hook_input).toBeDefined(); + + const hookInputStr = + typeof sessionEndLog.hookCall.hook_input === 'string' + ? sessionEndLog.hookCall.hook_input + : JSON.stringify(sessionEndLog.hookCall.hook_input); + const hookInput = JSON.parse(hookInputStr) as Record; + + expect(hookInput['reason']).toBe('exit'); + expect(sessionEndLog.hookCall.stdout).toContain( + 'SessionEnd hook executed', + ); + } + }); + }); + + describe('Hook Disabling', () => { + it('should not execute hooks disabled in settings file', async () => { + const enabledMsg = 'EXECUTION_ALLOWED_BY_HOOK_A'; + const disabledMsg = 'EXECUTION_BLOCKED_BY_HOOK_B'; + + const enabledJson = JSON.stringify({ + decision: 'allow', + systemMessage: enabledMsg, + }); + const disabledJson = JSON.stringify({ + decision: 'block', + reason: disabledMsg, + }); + + const enabledScript = `console.log(JSON.stringify(${enabledJson}));`; + const disabledScript = `console.log(JSON.stringify(${disabledJson}));`; + const enabledFilename = 'enabled_hook.js'; + const disabledFilename = 'disabled_hook.js'; + const enabledCmd = `node ${enabledFilename}`; + const disabledCmd = `node ${disabledFilename}`; + + // 3. 
Final setup with full settings + rig.setup('Hook Disabling Settings', { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.disabled-via-settings.responses', + ), + settings: { + hooksConfig: { + enabled: true, + disabled: ['hook-b'], + }, + hooks: { + BeforeTool: [ { - matcher: '*', + hooks: [ + { + type: 'command', + name: 'hook-a', + command: enabledCmd, + timeout: 60000, + }, + { + type: 'command', + name: 'hook-b', + command: disabledCmd, + timeout: 60000, + }, + ], + }, + ], + }, + }, + }); + + rig.createScript(enabledFilename, enabledScript); + rig.createScript(disabledFilename, disabledScript); + + await rig.run({ + args: 'Create a file called disabled-test.txt with content "test"', + }); + + // Tool should execute (enabled hook allows it) + const foundWriteFile = await rig.waitForToolCall('write_file'); + expect(foundWriteFile).toBeTruthy(); + + // Check hook telemetry - only enabled hook should have executed + const hookLogs = rig.readHookLogs(); + const enabledHookLog = hookLogs.find((log) => + JSON.stringify(log.hookCall.hook_output).includes(enabledMsg), + ); + const disabledHookLog = hookLogs.find((log) => + JSON.stringify(log.hookCall.hook_output).includes(disabledMsg), + ); + + expect(enabledHookLog).toBeDefined(); + expect(disabledHookLog).toBeUndefined(); + }); + + it('should respect disabled hooks across multiple operations', async () => { + const activeMsg = 'MULTIPLE_OPS_ENABLED_HOOK'; + const disabledMsg = 'MULTIPLE_OPS_DISABLED_HOOK'; + + const activeJson = JSON.stringify({ + decision: 'allow', + systemMessage: activeMsg, + }); + const disabledJson = JSON.stringify({ + decision: 'block', + reason: disabledMsg, + }); + + const activeScript = `console.log(JSON.stringify(${activeJson}));`; + const disabledScript = `console.log(JSON.stringify(${disabledJson}));`; + const activeFilename = 'active_hook.js'; + const disabledFilename = 'disabled_hook.js'; + const activeCmd = `node ${activeFilename}`; + const disabledCmd = `node 
${disabledFilename}`; + + // 3. Final setup with full settings + rig.setup('Hook Disabling Multiple Ops', { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.disabled-via-command.responses', + ), + settings: { + hooksConfig: { + enabled: true, + disabled: ['multi-hook-disabled'], + }, + hooks: { + BeforeTool: [ + { + hooks: [ + { + type: 'command', + name: 'multi-hook-active', + command: activeCmd, + timeout: 60000, + }, + { + type: 'command', + name: 'multi-hook-disabled', + command: disabledCmd, + timeout: 60000, + }, + ], + }, + ], + }, + }, + }); + + rig.createScript(activeFilename, activeScript); + rig.createScript(disabledFilename, disabledScript); + + // First run - only active hook should execute + await rig.run({ + args: 'Create a file called first-run.txt with "test1"', + }); + + // Tool should execute (active hook allows it) + const foundWriteFile1 = await rig.waitForToolCall('write_file'); + expect(foundWriteFile1).toBeTruthy(); + + // Check hook telemetry - only active hook should have executed + const hookLogs1 = rig.readHookLogs(); + const activeHookLog1 = hookLogs1.find((log) => + JSON.stringify(log.hookCall.hook_output).includes(activeMsg), + ); + const disabledHookLog1 = hookLogs1.find((log) => + JSON.stringify(log.hookCall.hook_output).includes(disabledMsg), + ); + + expect(activeHookLog1).toBeDefined(); + expect(disabledHookLog1).toBeUndefined(); + + // Second run - verify disabled hook stays disabled + await rig.run({ + args: 'Create a file called second-run.txt with "test2"', + }); + + const foundWriteFile2 = await rig.waitForToolCall('write_file'); + expect(foundWriteFile2).toBeTruthy(); + + // Verify disabled hook still hasn't executed + const hookLogs2 = rig.readHookLogs(); + const disabledHookLog2 = hookLogs2.find((log) => + JSON.stringify(log.hookCall.hook_output).includes(disabledMsg), + ); + expect(disabledHookLog2).toBeUndefined(); + }); + }); + + describe('BeforeTool Hooks - Input Override', () => { + it('should 
override tool input parameters via BeforeTool hook', async () => { + // 1. First setup to get the test directory and prepare the hook script + rig.setup('should override tool input parameters via BeforeTool hook'); + + // Create a hook script that overrides the tool input + const hookOutput = { + decision: 'allow', + hookSpecificOutput: { + hookEventName: 'BeforeTool', + tool_input: { + file_path: 'modified.txt', + content: 'modified content', + }, + }, + }; + + const hookScript = `process.stdout.write(JSON.stringify(${JSON.stringify( + hookOutput, + )}));`; + + const scriptPath = rig.createScript( + 'input_override_hook.js', + hookScript, + ); + + // 2. Full setup with settings and fake responses + rig.setup('should override tool input parameters via BeforeTool hook', { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.input-modification.responses', + ), + settings: { + hooksConfig: { + enabled: true, + }, + hooks: { + BeforeTool: [ + { + matcher: 'write_file', sequential: true, hooks: [ { type: 'command', - command: normalizePath(sessionStartCommand), + command: normalizePath(`node "${scriptPath}"`), timeout: 5000, }, ], @@ -1607,639 +2162,322 @@ console.log(JSON.stringify({ ], }, }, - }, - ); + }); - const run = await rig.runInteractive(); + // Run the agent. The fake response will attempt to call write_file with + // file_path="original.txt" and content="original content" + await rig.run({ + args: 'Create a file called original.txt with content "original content"', + }); - // Send an initial prompt to establish a session - await run.sendKeys('Say hello'); - await run.type('\r'); + // 1. Verify that 'modified.txt' was created with 'modified content' (Override successful) + const modifiedContent = rig.readFile('modified.txt'); + expect(modifiedContent).toBe('modified content'); - // Wait for the response - await run.expectText('Hello', 10000); + // 2. 
Verify that 'original.txt' was NOT created (Override replaced original) + let originalExists = false; + try { + rig.readFile('original.txt'); + originalExists = true; + } catch { + originalExists = false; + } + expect(originalExists).toBe(false); - // Execute /clear command multiple times to generate more hook events - // This makes the test more robust by creating multiple start/stop cycles - const numClears = 3; - for (let i = 0; i < numClears; i++) { - await run.sendKeys('/clear'); - await run.type('\r'); + // 3. Verify hook telemetry + const hookTelemetryFound = await rig.waitForTelemetryEvent('hook_call'); + expect(hookTelemetryFound).toBeTruthy(); - // Wait a bit for clear to complete - await new Promise((resolve) => setTimeout(resolve, 2000)); - - // Send a prompt to establish an active session before next clear - await run.sendKeys('Say hello'); - await run.type('\r'); - - // Wait for response - await run.expectText('Hello', 10000); - } - - // Wait for all clears to complete - // BatchLogRecordProcessor exports telemetry every 10 seconds by default - // Use generous wait time across all platforms (CI, Docker, Mac, Linux) - await new Promise((resolve) => setTimeout(resolve, 15000)); - - // Wait for telemetry to be written to disk - await rig.waitForTelemetryReady(); - - // Wait for hook telemetry events to be flushed to disk - // In interactive mode, telemetry may be buffered, so we need to poll for the events - // We execute multiple clears to generate more hook events (total: 1 + numClears * 2) - // But we only require >= 1 hooks to pass, making the test more permissive - const expectedMinHooks = 1; // SessionStart (startup), SessionEnd (clear), SessionStart (clear) - const pollResult = await poll( - () => { - const hookLogs = rig.readHookLogs(); - return hookLogs.length >= expectedMinHooks; - }, - 90000, // 90 second timeout for all platforms - 1000, // check every 1s to reduce I/O overhead - ); - - // If polling failed, log diagnostic info - if 
(!pollResult) { const hookLogs = rig.readHookLogs(); - const hookEvents = hookLogs.map((log) => log.hookCall.hook_event_name); - console.error( - `Polling timeout after 90000ms: Expected >= ${expectedMinHooks} hooks, got ${hookLogs.length}`, - ); - console.error( - 'Hooks found:', - hookEvents.length > 0 ? hookEvents.join(', ') : 'NONE', - ); - console.error('Full hook logs:', JSON.stringify(hookLogs, null, 2)); - } - - // Verify hooks executed - const hookLogs = rig.readHookLogs(); - - // Diagnostic: Log which hooks we actually got - const hookEvents = hookLogs.map((log) => log.hookCall.hook_event_name); - if (hookLogs.length < expectedMinHooks) { - console.error( - `TEST FAILURE: Expected >= ${expectedMinHooks} hooks, got ${hookLogs.length}: [${hookEvents.length > 0 ? hookEvents.join(', ') : 'NONE'}]`, - ); - } - - expect(hookLogs.length).toBeGreaterThanOrEqual(expectedMinHooks); - - // Find SessionEnd hook log - const sessionEndLog = hookLogs.find( - (log) => - log.hookCall.hook_event_name === 'SessionEnd' && - log.hookCall.hook_name === normalizePath(sessionEndCommand), - ); - // Because the flakiness of the test, we relax this check - // expect(sessionEndLog).toBeDefined(); - if (sessionEndLog) { - expect(sessionEndLog.hookCall.exit_code).toBe(0); - expect(sessionEndLog.hookCall.stdout).toContain( - 'Session ending due to clear', + expect(hookLogs.length).toBe(1); + expect(hookLogs[0].hookCall.hook_name).toContain( + 'input_override_hook.js', ); - // Verify hook input contains reason - const hookInputStr = - typeof sessionEndLog.hookCall.hook_input === 'string' - ? 
sessionEndLog.hookCall.hook_input - : JSON.stringify(sessionEndLog.hookCall.hook_input); - const hookInput = JSON.parse(hookInputStr) as Record; - expect(hookInput['reason']).toBe('clear'); - } - - // Find SessionStart hook log after clear - const sessionStartAfterClearLogs = hookLogs.filter( - (log) => - log.hookCall.hook_event_name === 'SessionStart' && - log.hookCall.hook_name === normalizePath(sessionStartCommand), - ); - // Should have at least one SessionStart from after clear - // Because the flakiness of the test, we relax this check - // expect(sessionStartAfterClearLogs.length).toBeGreaterThanOrEqual(1); - - const sessionStartLog = sessionStartAfterClearLogs.find((log) => { - const hookInputStr = - typeof log.hookCall.hook_input === 'string' - ? log.hookCall.hook_input - : JSON.stringify(log.hookCall.hook_input); - const hookInput = JSON.parse(hookInputStr) as Record; - return hookInput['source'] === 'clear'; + // 4. Verify that the agent didn't try to work-around the hook input change + const toolLogs = rig.readToolLogs(); + expect(toolLogs.length).toBe(1); + expect(toolLogs[0].toolRequest.name).toBe('write_file'); + expect(JSON.parse(toolLogs[0].toolRequest.args).file_path).toBe( + 'modified.txt', + ); }); - - // Because the flakiness of the test, we relax this check - // expect(sessionStartLog).toBeDefined(); - if (sessionStartLog) { - expect(sessionStartLog.hookCall.exit_code).toBe(0); - expect(sessionStartLog.hookCall.stdout).toContain( - 'Session starting after clear', - ); - } }); - }); - describe('Compression Hooks', () => { - it('should fire PreCompress hook on automatic compression', async () => { - rig.setup('should fire PreCompress hook on automatic compression', { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.compress-auto.responses', - ), - }); - - // Create script file for hook - const scriptPath = rig.createScript( - 'pre_compress_hook.cjs', - "console.log(JSON.stringify({decision: 'allow', systemMessage: 'PreCompress 
hook executed for automatic compression'}));", - ); - - const preCompressCommand = `node "${scriptPath}"`; - - rig.setup('should fire PreCompress hook on automatic compression', { - settings: { - hooksConfig: { - enabled: true, + describe('BeforeTool Hooks - Stop Execution', () => { + it('should stop agent execution via BeforeTool hook', async () => { + // Create a hook script that stops execution + const hookOutput = { + continue: false, + reason: 'Emergency Stop triggered by hook', + hookSpecificOutput: { + hookEventName: 'BeforeTool', }, - hooks: { - PreCompress: [ - { - matcher: 'auto', - sequential: true, - hooks: [ + }; + + const hookScript = `console.log(JSON.stringify(${JSON.stringify( + hookOutput, + )}));`; + + rig.setup('should stop agent execution via BeforeTool hook'); + const scriptPath = rig.createScript( + 'before_tool_stop_hook.js', + hookScript, + ); + + rig.setup('should stop agent execution via BeforeTool hook', { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.before-tool-stop.responses', + ), + settings: { + hooksConfig: { + enabled: true, + }, + hooks: { + BeforeTool: [ + { + matcher: 'write_file', + sequential: true, + hooks: [ + { + type: 'command', + command: normalizePath(`node "${scriptPath}"`), + timeout: 5000, + }, + ], + }, + ], + }, + }, + }); + + const result = await rig.run({ + args: 'Use write_file to create test.txt', + }); + + // The hook should have stopped execution message (returned from tool) + expect(result).toContain( + 'Agent execution stopped by hook: Emergency Stop triggered by hook', + ); + + // Tool should NOT be called successfully (it was blocked/stopped) + const toolLogs = rig.readToolLogs(); + const writeFileCalls = toolLogs.filter( + (t) => + t.toolRequest.name === 'write_file' && + t.toolRequest.success === true, + ); + expect(writeFileCalls).toHaveLength(0); + }); + }); + + describe('Hooks "ask" Decision Integration', () => { + it( + 'should force confirmation prompt when hook returns "ask" 
decision even in YOLO mode', + { timeout: 60000 }, + async () => { + const testName = + 'should force confirmation prompt when hook returns "ask" decision even in YOLO mode'; + + // 1. Setup hook script that returns 'ask' decision + const hookOutput = { + decision: 'ask', + systemMessage: 'Confirmation forced by security hook', + hookSpecificOutput: { + hookEventName: 'BeforeTool', + }, + }; + + const hookScript = `console.log(JSON.stringify(${JSON.stringify( + hookOutput, + )}));`; + + // Create script path predictably + const scriptPath = join(os.tmpdir(), 'gemini-cli-tests-ask-hook.js'); + writeFileSync(scriptPath, hookScript); + + // 2. Setup rig with YOLO mode enabled but with the 'ask' hook + rig.setup(testName, { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.allow-tool.responses', + ), + settings: { + debugMode: true, + tools: { + approval: 'yolo', + }, + general: { + enableAutoUpdateNotification: false, + }, + hooksConfig: { + enabled: true, + }, + hooks: { + BeforeTool: [ { - type: 'command', - command: normalizePath(preCompressCommand), - timeout: 5000, + matcher: 'write_file', + hooks: [ + { + type: 'command', + command: `node "${scriptPath}"`, + timeout: 5000, + }, + ], }, ], }, - ], - }, - // Configure automatic compression with a very low threshold - // This will trigger auto-compression after the first response - contextCompression: { - // enabled: true, - targetTokenCount: 10, // Very low threshold to trigger compression - }, - }, - }); + }, + }); - // Run a simple query that will trigger automatic compression - await rig.run({ args: 'Say hello in exactly 5 words' }); - - // Verify hook executed with correct parameters - const hookLogs = rig.readHookLogs(); - const preCompressLog = hookLogs.find( - (log) => log.hookCall.hook_event_name === 'PreCompress', - ); - - expect(preCompressLog).toBeDefined(); - if (preCompressLog) { - expect(preCompressLog.hookCall.hook_name).toBe( - normalizePath(preCompressCommand), - ); - 
expect(preCompressLog.hookCall.exit_code).toBe(0); - expect(preCompressLog.hookCall.hook_input).toBeDefined(); - - // hook_input is a string that needs to be parsed - const hookInputStr = - typeof preCompressLog.hookCall.hook_input === 'string' - ? preCompressLog.hookCall.hook_input - : JSON.stringify(preCompressLog.hookCall.hook_input); - const hookInput = JSON.parse(hookInputStr) as Record; - - expect(hookInput['trigger']).toBe('auto'); - expect(preCompressLog.hookCall.stdout).toContain( - 'PreCompress hook executed for automatic compression', - ); - } - }); - }); - - describe('SessionEnd on Exit', () => { - it('should fire SessionEnd hook on graceful exit in non-interactive mode', async () => { - rig.setup('should fire SessionEnd hook on graceful exit', { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.session-startup.responses', - ), - }); - - // Create script file for hook - const scriptPath = rig.createScript( - 'session_end_exit.cjs', - "console.log(JSON.stringify({decision: 'allow', systemMessage: 'SessionEnd hook executed on exit'}));", - ); - - const sessionEndCommand = `node "${scriptPath}"`; - - rig.setup('should fire SessionEnd hook on graceful exit', { - settings: { - hooksConfig: { - enabled: true, - }, - hooks: { - SessionEnd: [ - { - matcher: 'exit', - sequential: true, - hooks: [ - { - type: 'command', - command: normalizePath(sessionEndCommand), - timeout: 5000, - }, - ], - }, - ], - }, - }, - }); - - // Run in non-interactive mode with a simple prompt - await rig.run({ args: 'Hello' }); - - // The process should exit gracefully, firing the SessionEnd hook - // Wait for telemetry to be written to disk - await rig.waitForTelemetryReady(); - - // Poll for the hook log to appear - const isCI = process.env['CI'] === 'true'; - const pollTimeout = isCI ? 
30000 : 10000; - const pollResult = await poll( - () => { - const hookLogs = rig.readHookLogs(); - return hookLogs.some( - (log) => log.hookCall.hook_event_name === 'SessionEnd', + // Bypass terminal setup prompt and other startup banners + const stateDir = join(rig.homeDir!, '.gemini'); + if (!existsSync(stateDir)) mkdirSync(stateDir, { recursive: true }); + writeFileSync( + join(stateDir, 'state.json'), + JSON.stringify({ + terminalSetupPromptShown: true, + hasSeenScreenReaderNudge: true, + tipsShown: 100, + }), ); + + // 3. Run interactive and verify prompt appears despite YOLO mode + const run = await rig.runInteractive(); + + // Wait for prompt to appear + await run.expectText('Type your message', 30000); + + // Send prompt that will trigger write_file + await run.type( + 'Create a file called ask-test.txt with content "test"', + ); + await run.type('\r'); + + // Wait for the FORCED confirmation prompt to appear + // It should contain the system message from the hook + await run.expectText('Confirmation forced by security hook', 30000); + await run.expectText('Allow', 5000); + + // 4. 
Approve the permission + await run.type('y'); + await run.type('\r'); + + // Wait for command to execute + await run.expectText('approved.txt', 30000); + + // Should find the tool call + const foundWriteFile = await rig.waitForToolCall('write_file'); + expect(foundWriteFile).toBeTruthy(); + + // File should be created + const fileContent = rig.readFile('approved.txt'); + expect(fileContent).toBe('Approved content'); }, - pollTimeout, - 200, ); - if (!pollResult) { - const hookLogs = rig.readHookLogs(); - console.error( - 'Polling timeout: Expected SessionEnd hook, got:', - JSON.stringify(hookLogs, null, 2), - ); - } + it( + 'should allow cancelling when hook forces "ask" decision', + { timeout: 60000 }, + async () => { + const testName = + 'should allow cancelling when hook forces "ask" decision'; + const hookOutput = { + decision: 'ask', + systemMessage: 'Confirmation forced for cancellation test', + hookSpecificOutput: { + hookEventName: 'BeforeTool', + }, + }; - expect(pollResult).toBe(true); + const hookScript = `console.log(JSON.stringify(${JSON.stringify( + hookOutput, + )}));`; - const hookLogs = rig.readHookLogs(); - const sessionEndLog = hookLogs.find( - (log) => log.hookCall.hook_event_name === 'SessionEnd', - ); + const scriptPath = join( + os.tmpdir(), + 'gemini-cli-tests-ask-cancel-hook.js', + ); + writeFileSync(scriptPath, hookScript); - expect(sessionEndLog).toBeDefined(); - if (sessionEndLog) { - expect(sessionEndLog.hookCall.hook_name).toBe( - normalizePath(sessionEndCommand), - ); - expect(sessionEndLog.hookCall.exit_code).toBe(0); - expect(sessionEndLog.hookCall.hook_input).toBeDefined(); - - const hookInputStr = - typeof sessionEndLog.hookCall.hook_input === 'string' - ? 
sessionEndLog.hookCall.hook_input - : JSON.stringify(sessionEndLog.hookCall.hook_input); - const hookInput = JSON.parse(hookInputStr) as Record; - - expect(hookInput['reason']).toBe('exit'); - expect(sessionEndLog.hookCall.stdout).toContain( - 'SessionEnd hook executed', - ); - } - }); - }); - - describe('Hook Disabling', () => { - it('should not execute hooks disabled in settings file', async () => { - const enabledMsg = 'EXECUTION_ALLOWED_BY_HOOK_A'; - const disabledMsg = 'EXECUTION_BLOCKED_BY_HOOK_B'; - - const enabledJson = JSON.stringify({ - decision: 'allow', - systemMessage: enabledMsg, - }); - const disabledJson = JSON.stringify({ - decision: 'block', - reason: disabledMsg, - }); - - const enabledScript = `console.log(JSON.stringify(${enabledJson}));`; - const disabledScript = `console.log(JSON.stringify(${disabledJson}));`; - const enabledFilename = 'enabled_hook.js'; - const disabledFilename = 'disabled_hook.js'; - const enabledCmd = `node ${enabledFilename}`; - const disabledCmd = `node ${disabledFilename}`; - - // 3. 
Final setup with full settings - rig.setup('Hook Disabling Settings', { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.disabled-via-settings.responses', - ), - settings: { - hooksConfig: { - enabled: true, - disabled: ['hook-b'], - }, - hooks: { - BeforeTool: [ - { - hooks: [ + rig.setup(testName, { + fakeResponsesPath: join( + import.meta.dirname, + 'hooks-system.allow-tool.responses', + ), + settings: { + debugMode: true, + tools: { + approval: 'yolo', + }, + general: { + enableAutoUpdateNotification: false, + }, + hooksConfig: { + enabled: true, + }, + hooks: { + BeforeTool: [ { - type: 'command', - name: 'hook-a', - command: enabledCmd, - timeout: 60000, - }, - { - type: 'command', - name: 'hook-b', - command: disabledCmd, - timeout: 60000, + matcher: 'write_file', + hooks: [ + { + type: 'command', + command: `node "${scriptPath}"`, + timeout: 5000, + }, + ], }, ], }, - ], - }, + }, + }); + + // Bypass terminal setup prompt and other startup banners + const stateDir = join(rig.homeDir!, '.gemini'); + if (!existsSync(stateDir)) mkdirSync(stateDir, { recursive: true }); + writeFileSync( + join(stateDir, 'state.json'), + JSON.stringify({ + terminalSetupPromptShown: true, + hasSeenScreenReaderNudge: true, + tipsShown: 100, + }), + ); + + const run = await rig.runInteractive(); + + // Wait for prompt to appear + await run.expectText('Type your message', 30000); + + await run.type( + 'Create a file called cancel-test.txt with content "test"', + ); + await run.type('\r'); + + await run.expectText( + 'Confirmation forced for cancellation test', + 30000, + ); + + // 4. 
Deny the permission using option 4 + await run.type('4'); + await run.type('\r'); + + // Wait for cancellation message + await run.expectText('Cancelled', 15000); + + // Tool should NOT be called successfully + const toolLogs = rig.readToolLogs(); + const writeFileCalls = toolLogs.filter( + (t) => + t.toolRequest.name === 'write_file' && + t.toolRequest.success === true, + ); + expect(writeFileCalls).toHaveLength(0); }, - }); - - rig.createScript(enabledFilename, enabledScript); - rig.createScript(disabledFilename, disabledScript); - - await rig.run({ - args: 'Create a file called disabled-test.txt with content "test"', - }); - - // Tool should execute (enabled hook allows it) - const foundWriteFile = await rig.waitForToolCall('write_file'); - expect(foundWriteFile).toBeTruthy(); - - // Check hook telemetry - only enabled hook should have executed - const hookLogs = rig.readHookLogs(); - const enabledHookLog = hookLogs.find((log) => - JSON.stringify(log.hookCall.hook_output).includes(enabledMsg), - ); - const disabledHookLog = hookLogs.find((log) => - JSON.stringify(log.hookCall.hook_output).includes(disabledMsg), - ); - - expect(enabledHookLog).toBeDefined(); - expect(disabledHookLog).toBeUndefined(); - }); - - it('should respect disabled hooks across multiple operations', async () => { - const activeMsg = 'MULTIPLE_OPS_ENABLED_HOOK'; - const disabledMsg = 'MULTIPLE_OPS_DISABLED_HOOK'; - - const activeJson = JSON.stringify({ - decision: 'allow', - systemMessage: activeMsg, - }); - const disabledJson = JSON.stringify({ - decision: 'block', - reason: disabledMsg, - }); - - const activeScript = `console.log(JSON.stringify(${activeJson}));`; - const disabledScript = `console.log(JSON.stringify(${disabledJson}));`; - const activeFilename = 'active_hook.js'; - const disabledFilename = 'disabled_hook.js'; - const activeCmd = `node ${activeFilename}`; - const disabledCmd = `node ${disabledFilename}`; - - // 3. 
Final setup with full settings - rig.setup('Hook Disabling Multiple Ops', { - settings: { - hooksConfig: { - enabled: true, - disabled: ['multi-hook-disabled'], - }, - hooks: { - BeforeTool: [ - { - hooks: [ - { - type: 'command', - name: 'multi-hook-active', - command: activeCmd, - timeout: 60000, - }, - { - type: 'command', - name: 'multi-hook-disabled', - command: disabledCmd, - timeout: 60000, - }, - ], - }, - ], - }, - }, - }); - - rig.createScript(activeFilename, activeScript); - rig.createScript(disabledFilename, disabledScript); - - // First run - only active hook should execute - await rig.run({ - args: 'Create a file called first-run.txt with "test1"', - }); - - // Tool should execute (active hook allows it) - const foundWriteFile1 = await rig.waitForToolCall('write_file'); - expect(foundWriteFile1).toBeTruthy(); - - // Check hook telemetry - only active hook should have executed - const hookLogs1 = rig.readHookLogs(); - const activeHookLog1 = hookLogs1.find((log) => - JSON.stringify(log.hookCall.hook_output).includes(activeMsg), - ); - const disabledHookLog1 = hookLogs1.find((log) => - JSON.stringify(log.hookCall.hook_output).includes(disabledMsg), - ); - - expect(activeHookLog1).toBeDefined(); - expect(disabledHookLog1).toBeUndefined(); - - // Second run - verify disabled hook stays disabled - await rig.run({ - args: 'Create a file called second-run.txt with "test2"', - }); - - const foundWriteFile2 = await rig.waitForToolCall('write_file'); - expect(foundWriteFile2).toBeTruthy(); - - // Verify disabled hook still hasn't executed - const hookLogs2 = rig.readHookLogs(); - const disabledHookLog2 = hookLogs2.find((log) => - JSON.stringify(log.hookCall.hook_output).includes(disabledMsg), - ); - expect(disabledHookLog2).toBeUndefined(); - }); - }); - - describe('BeforeTool Hooks - Input Override', () => { - it('should override tool input parameters via BeforeTool hook', async () => { - // 1. 
First setup to get the test directory and prepare the hook script - rig.setup('should override tool input parameters via BeforeTool hook'); - - // Create a hook script that overrides the tool input - const hookOutput = { - decision: 'allow', - hookSpecificOutput: { - hookEventName: 'BeforeTool', - tool_input: { - file_path: 'modified.txt', - content: 'modified content', - }, - }, - }; - - const hookScript = `process.stdout.write(JSON.stringify(${JSON.stringify( - hookOutput, - )}));`; - - const scriptPath = rig.createScript('input_override_hook.js', hookScript); - - // 2. Full setup with settings and fake responses - rig.setup('should override tool input parameters via BeforeTool hook', { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.input-modification.responses', - ), - settings: { - hooksConfig: { - enabled: true, - }, - hooks: { - BeforeTool: [ - { - matcher: 'write_file', - sequential: true, - hooks: [ - { - type: 'command', - command: normalizePath(`node "${scriptPath}"`), - timeout: 5000, - }, - ], - }, - ], - }, - }, - }); - - // Run the agent. The fake response will attempt to call write_file with - // file_path="original.txt" and content="original content" - await rig.run({ - args: 'Create a file called original.txt with content "original content"', - }); - - // 1. Verify that 'modified.txt' was created with 'modified content' (Override successful) - const modifiedContent = rig.readFile('modified.txt'); - expect(modifiedContent).toBe('modified content'); - - // 2. Verify that 'original.txt' was NOT created (Override replaced original) - let originalExists = false; - try { - rig.readFile('original.txt'); - originalExists = true; - } catch { - originalExists = false; - } - expect(originalExists).toBe(false); - - // 3. 
Verify hook telemetry - const hookTelemetryFound = await rig.waitForTelemetryEvent('hook_call'); - expect(hookTelemetryFound).toBeTruthy(); - - const hookLogs = rig.readHookLogs(); - expect(hookLogs.length).toBe(1); - expect(hookLogs[0].hookCall.hook_name).toContain( - 'input_override_hook.js', - ); - - // 4. Verify that the agent didn't try to work-around the hook input change - const toolLogs = rig.readToolLogs(); - expect(toolLogs.length).toBe(1); - expect(toolLogs[0].toolRequest.name).toBe('write_file'); - expect(JSON.parse(toolLogs[0].toolRequest.args).file_path).toBe( - 'modified.txt', ); }); - }); - - describe('BeforeTool Hooks - Stop Execution', () => { - it('should stop agent execution via BeforeTool hook', async () => { - // Create a hook script that stops execution - const hookOutput = { - continue: false, - reason: 'Emergency Stop triggered by hook', - hookSpecificOutput: { - hookEventName: 'BeforeTool', - }, - }; - - const hookScript = `console.log(JSON.stringify(${JSON.stringify( - hookOutput, - )}));`; - - rig.setup('should stop agent execution via BeforeTool hook'); - const scriptPath = rig.createScript( - 'before_tool_stop_hook.js', - hookScript, - ); - - rig.setup('should stop agent execution via BeforeTool hook', { - fakeResponsesPath: join( - import.meta.dirname, - 'hooks-system.before-tool-stop.responses', - ), - settings: { - hooksConfig: { - enabled: true, - }, - hooks: { - BeforeTool: [ - { - matcher: 'write_file', - sequential: true, - hooks: [ - { - type: 'command', - command: normalizePath(`node "${scriptPath}"`), - timeout: 5000, - }, - ], - }, - ], - }, - }, - }); - - const result = await rig.run({ - args: 'Use write_file to create test.txt', - }); - - // The hook should have stopped execution message (returned from tool) - expect(result).toContain( - 'Agent execution stopped: Emergency Stop triggered by hook', - ); - - // Tool should NOT be called successfully (it was blocked/stopped) - const toolLogs = rig.readToolLogs(); - const 
writeFileCalls = toolLogs.filter(
-        (t) =>
-          t.toolRequest.name === 'write_file' && t.toolRequest.success === true,
-      );
-      expect(writeFileCalls).toHaveLength(0);
-    });
-  });
-});
+  },
+);
diff --git a/integration-tests/symlink-install.test.ts b/integration-tests/symlink-install.test.ts
index be4a5ac398..c98db98029 100644
--- a/integration-tests/symlink-install.test.ts
+++ b/integration-tests/symlink-install.test.ts
@@ -5,7 +5,7 @@
  */

 import { describe, expect, it, beforeEach, afterEach } from 'vitest';
-import { TestRig, InteractiveRun } from './test-helper.js';
+import { TestRig, InteractiveRun, skipFlaky } from './test-helper.js';
 import * as fs from 'node:fs';
 import * as os from 'node:os';
 import {
@@ -33,104 +33,107 @@ const otherExtension = `{
   "version": "6.6.6"
 }`;

-describe('extension symlink install spoofing protection', () => {
-  let rig: TestRig;
+describe.skipIf(skipFlaky)(
+  'extension symlink install spoofing protection',
+  () => {
+    let rig: TestRig;

-  beforeEach(() => {
-    rig = new TestRig();
-  });
-
-  afterEach(async () => await rig.cleanup());
-
-  it('canonicalizes the trust path and prevents symlink spoofing', async () => {
-    // Enable folder trust for this test
-    rig.setup('symlink spoofing test', {
-      settings: {
-        security: {
-          folderTrust: {
-            enabled: true,
-          },
-        },
-      },
+    beforeEach(() => {
+      rig = new TestRig();
     });

-    const realExtPath = join(rig.testDir!, 'real-extension');
-    mkdirSync(realExtPath);
-    writeFileSync(join(realExtPath, 'gemini-extension.json'), extension);
+    afterEach(async () => await rig.cleanup());

-    const maliciousExtPath = join(
-      os.tmpdir(),
-      `malicious-extension-${Date.now()}`,
-    );
-    mkdirSync(maliciousExtPath);
-    writeFileSync(
-      join(maliciousExtPath, 'gemini-extension.json'),
-      otherExtension,
-    );
-
-    const symlinkPath = join(rig.testDir!, 'symlink-extension');
-    symlinkSync(realExtPath, symlinkPath);
-
-    // Function to run a command with a PTY to avoid headless mode
-    const runPty = (args: string[]) => {
-      const ptyProcess = pty.spawn(process.execPath, [BUNDLE_PATH, ...args], {
-        name: 'xterm-color',
-        cols: 80,
-        rows: 80,
-        cwd: rig.testDir!,
-        env: {
-          ...process.env,
-          GEMINI_CLI_HOME: rig.homeDir!,
-          GEMINI_CLI_INTEGRATION_TEST: 'true',
-          GEMINI_PTY_INFO: 'node-pty',
+    it('canonicalizes the trust path and prevents symlink spoofing', async () => {
+      // Enable folder trust for this test
+      rig.setup('symlink spoofing test', {
+        settings: {
+          security: {
+            folderTrust: {
+              enabled: true,
+            },
+          },
         },
       });
-      return new InteractiveRun(ptyProcess);
-    };

-    // 1. Install via symlink, trust it
-    const run1 = runPty(['extensions', 'install', symlinkPath]);
-    await run1.expectText('Do you want to trust this folder', 30000);
-    await run1.type('y\r');
-    await run1.expectText('trust this workspace', 30000);
-    await run1.type('y\r');
-    await run1.expectText('Do you want to continue', 30000);
-    await run1.type('y\r');
-    await run1.expectText('installed successfully', 30000);
-    await run1.kill();
+      const realExtPath = join(rig.testDir!, 'real-extension');
+      mkdirSync(realExtPath);
+      writeFileSync(join(realExtPath, 'gemini-extension.json'), extension);

-    // 2. Verify trustedFolders.json contains the REAL path, not the symlink path
-    const trustedFoldersPath = join(
-      rig.homeDir!,
-      GEMINI_DIR,
-      'trustedFolders.json',
-    );
-    // Wait for file to be written
-    let attempts = 0;
-    while (!fs.existsSync(trustedFoldersPath) && attempts < 50) {
-      await new Promise((resolve) => setTimeout(resolve, 100));
-      attempts++;
-    }
+      const maliciousExtPath = join(
+        os.tmpdir(),
+        `malicious-extension-${Date.now()}`,
+      );
+      mkdirSync(maliciousExtPath);
+      writeFileSync(
+        join(maliciousExtPath, 'gemini-extension.json'),
+        otherExtension,
+      );

-    const trustedFolders = JSON.parse(
-      readFileSync(trustedFoldersPath, 'utf-8'),
-    );
-    const trustedPaths = Object.keys(trustedFolders);
-    const canonicalRealExtPath = fs.realpathSync(realExtPath);
+      const symlinkPath = join(rig.testDir!, 'symlink-extension');
+      symlinkSync(realExtPath, symlinkPath);

-    expect(trustedPaths).toContain(canonicalRealExtPath);
-    expect(trustedPaths).not.toContain(symlinkPath);
+      // Function to run a command with a PTY to avoid headless mode
+      const runPty = (args: string[]) => {
+        const ptyProcess = pty.spawn(process.execPath, [BUNDLE_PATH, ...args], {
+          name: 'xterm-color',
+          cols: 80,
+          rows: 80,
+          cwd: rig.testDir!,
+          env: {
+            ...process.env,
+            GEMINI_CLI_HOME: rig.homeDir!,
+            GEMINI_CLI_INTEGRATION_TEST: 'true',
+            GEMINI_PTY_INFO: 'node-pty',
+          },
+        });
+        return new InteractiveRun(ptyProcess);
+      };

-    // 3. Swap the symlink to point to the malicious extension
-    unlinkSync(symlinkPath);
-    symlinkSync(maliciousExtPath, symlinkPath);
+      // 1. Install via symlink, trust it
+      const run1 = runPty(['extensions', 'install', symlinkPath]);
+      await run1.expectText('Do you want to trust this folder', 30000);
+      await run1.type('y\r');
+      await run1.expectText('trust this workspace', 30000);
+      await run1.type('y\r');
+      await run1.expectText('Do you want to continue', 30000);
+      await run1.type('y\r');
+      await run1.expectText('installed successfully', 30000);
+      await run1.kill();

-    // 4. Try to install again via the same symlink path.
-    // It should NOT be trusted because the real path changed.
-    const run2 = runPty(['extensions', 'install', symlinkPath]);
-    await run2.expectText('Do you want to trust this folder', 30000);
-    await run2.type('n\r');
-    await run2.expectText('Installation aborted', 30000);
-    await run2.kill();
-  }, 60000);
-});
+      // 2. Verify trustedFolders.json contains the REAL path, not the symlink path
+      const trustedFoldersPath = join(
+        rig.homeDir!,
+        GEMINI_DIR,
+        'trustedFolders.json',
+      );
+      // Wait for file to be written
+      let attempts = 0;
+      while (!fs.existsSync(trustedFoldersPath) && attempts < 50) {
+        await new Promise((resolve) => setTimeout(resolve, 100));
+        attempts++;
+      }
+
+      const trustedFolders = JSON.parse(
+        readFileSync(trustedFoldersPath, 'utf-8'),
+      );
+      const trustedPaths = Object.keys(trustedFolders);
+      const canonicalRealExtPath = fs.realpathSync(realExtPath);
+
+      expect(trustedPaths).toContain(canonicalRealExtPath);
+      expect(trustedPaths).not.toContain(symlinkPath);
+
+      // 3. Swap the symlink to point to the malicious extension
+      unlinkSync(symlinkPath);
+      symlinkSync(maliciousExtPath, symlinkPath);
+
+      // 4. Try to install again via the same symlink path.
+      // It should NOT be trusted because the real path changed.
+      const run2 = runPty(['extensions', 'install', symlinkPath]);
+      await run2.expectText('Do you want to trust this folder', 30000);
+      await run2.type('n\r');
+      await run2.expectText('Installation aborted', 30000);
+      await run2.kill();
+    }, 60000);
+  },
+);
diff --git a/integration-tests/test-helper.ts b/integration-tests/test-helper.ts
index a4546a2cd3..5f205ae997 100644
--- a/integration-tests/test-helper.ts
+++ b/integration-tests/test-helper.ts
@@ -6,3 +6,5 @@

 export * from '@google/gemini-cli-test-utils';
 export { normalizePath } from '@google/gemini-cli-test-utils';
+
+export const skipFlaky = !process.env['RUN_FLAKY_INTEGRATION'];
diff --git a/integration-tests/test-mcp-support.responses b/integration-tests/test-mcp-support.responses
new file mode 100644
index 0000000000..1db32fdc21
--- /dev/null
+++ b/integration-tests/test-mcp-support.responses
@@ -0,0 +1,2 @@
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"mcp_weather-server_get_weather","args":{"location":"London"}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":10,"candidatesTokenCount":10,"totalTokenCount":20}}]}
+{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"The weather in London is rainy."}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":10,"candidatesTokenCount":10,"totalTokenCount":20}}]}
diff --git a/integration-tests/test-mcp-support.test.ts b/integration-tests/test-mcp-support.test.ts
new file mode 100644
index 0000000000..15266e6be9
--- /dev/null
+++ b/integration-tests/test-mcp-support.test.ts
@@ -0,0 +1,75 @@
+/**
+ * @license
+ * Copyright 2026 Google LLC
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+import { describe, it, expect, beforeEach, afterEach } from 'vitest';
+import {
+  TestRig,
+  assertModelHasOutput,
+  TestMcpServerBuilder,
+} from './test-helper.js';
+import { join, dirname } from 'node:path';
+import { fileURLToPath } from 'node:url';
+import fs from 'node:fs';
+
+const __dirname = dirname(fileURLToPath(import.meta.url));
+
+describe('test-mcp-support', () => {
+  let rig: TestRig;
+
+  beforeEach(() => {
+    rig = new TestRig();
+  });
+
+  afterEach(async () => await rig.cleanup());
+
+  it('should discover and call a tool on the test server', async () => {
+    await rig.setup('test-mcp-test', {
+      settings: {
+        tools: { core: [] }, // disable core tools to force using MCP
+        model: {
+          name: 'gemini-3-flash-preview',
+        },
+      },
+      fakeResponsesPath: join(__dirname, 'test-mcp-support.responses'),
+    });
+
+    // Workaround for ProjectRegistry save issue
+    const userGeminiDir = join(rig.homeDir!, '.gemini');
+    fs.writeFileSync(join(userGeminiDir, 'projects.json'), '{"projects":{}}');
+
+    const builder = new TestMcpServerBuilder('weather-server').addTool(
+      'get_weather',
+      'Get the weather for a location',
+      'The weather in London is always rainy.',
+      {
+        type: 'object',
+        properties: {
+          location: { type: 'string' },
+        },
+      },
+    );
+
+    rig.addTestMcpServer('weather-server', builder.build());
+
+    // Run the CLI asking for weather
+    const output = await rig.run({
+      args: 'What is the weather in London? Answer with the raw tool response snippet.',
+      env: { GEMINI_API_KEY: 'dummy' },
+    });
+
+    // Assert tool call
+    const foundToolCall = await rig.waitForToolCall(
+      'mcp_weather-server_get_weather',
+    );
+    expect(
+      foundToolCall,
+      'Expected to find a get_weather tool call',
+    ).toBeTruthy();
+
+    assertModelHasOutput(output);
+    expect(output.toLowerCase()).toContain('rainy');
+  }, 30000);
+});
diff --git a/package.json b/package.json
index 72676cf90b..d66132c066 100644
--- a/package.json
+++ b/package.json
@@ -48,10 +48,11 @@
     "test:all_evals": "cross-env RUN_EVALS=1 vitest run --config evals/vitest.config.ts",
     "test:e2e": "cross-env VERBOSE=true KEEP_OUTPUT=true npm run test:integration:sandbox:none",
     "test:integration:all": "npm run test:integration:sandbox:none && npm run test:integration:sandbox:docker && npm run test:integration:sandbox:podman",
+    "test:integration:flaky": "cross-env RUN_FLAKY_INTEGRATION=1 npm run test:integration:sandbox:none",
     "test:integration:sandbox:none": "cross-env GEMINI_SANDBOX=false vitest run --root ./integration-tests",
     "test:integration:sandbox:docker": "cross-env GEMINI_SANDBOX=docker npm run build:sandbox && cross-env GEMINI_SANDBOX=docker vitest run --root ./integration-tests",
     "test:integration:sandbox:podman": "cross-env GEMINI_SANDBOX=podman vitest run --root ./integration-tests",
-    "lint": "eslint . --cache",
+    "lint": "eslint . --cache --max-warnings 0",
     "lint:fix": "eslint . --fix --ext .ts,.tsx && eslint integration-tests --fix && eslint scripts --fix && npm run format",
     "lint:ci": "npm run lint:all",
     "lint:all": "node scripts/lint.js",
diff --git a/packages/a2a-server/src/config/config.test.ts b/packages/a2a-server/src/config/config.test.ts
index cfe77311ea..370c859944 100644
--- a/packages/a2a-server/src/config/config.test.ts
+++ b/packages/a2a-server/src/config/config.test.ts
@@ -341,11 +341,11 @@ describe('loadConfig', () => {
       );
     });

-    it('should default enableAgents to true when not provided', async () => {
+    it('should default enableAgents to false when not provided', async () => {
       await loadConfig(mockSettings, mockExtensionLoader, taskId);
       expect(Config).toHaveBeenCalledWith(
         expect.objectContaining({
-          enableAgents: true,
+          enableAgents: false,
         }),
       );
     });
diff --git a/packages/a2a-server/src/config/config.ts b/packages/a2a-server/src/config/config.ts
index 9474c4d9c5..1fe55258fc 100644
--- a/packages/a2a-server/src/config/config.ts
+++ b/packages/a2a-server/src/config/config.ts
@@ -87,6 +87,7 @@ export async function loadConfig(
       approvalMode === ApprovalMode.YOLO
         ? [
             {
+              toolName: '*',
               decision: PolicyDecision.ALLOW,
               priority: PRIORITY_YOLO_ALLOW_ALL,
               modes: [ApprovalMode.YOLO],
@@ -127,7 +128,7 @@ export async function loadConfig(
     interactive: !isHeadlessMode(),
     enableInteractiveShell: !isHeadlessMode(),
     ptyInfo: 'auto',
-    enableAgents: settings.experimental?.enableAgents ?? true,
+    enableAgents: settings.experimental?.enableAgents ?? false,
   };

   const fileService = new FileDiscoveryService(workspaceDir, {
diff --git a/packages/a2a-server/src/utils/testing_utils.ts b/packages/a2a-server/src/utils/testing_utils.ts
index fd4d721732..8181f702f1 100644
--- a/packages/a2a-server/src/utils/testing_utils.ts
+++ b/packages/a2a-server/src/utils/testing_utils.ts
@@ -97,6 +97,7 @@ export function createMockConfig(
     getMcpClientManager: vi.fn().mockReturnValue({
       getMcpServers: vi.fn().mockReturnValue({}),
     }),
+    getTelemetryLogPromptsEnabled: vi.fn().mockReturnValue(false),
     getGitService: vi.fn(),
     validatePathAccess: vi.fn().mockReturnValue(undefined),
     getShellExecutionConfig: vi.fn().mockReturnValue({
diff --git a/packages/cli/GEMINI.md b/packages/cli/GEMINI.md
index e98ca81376..8bad8f0721 100644
--- a/packages/cli/GEMINI.md
+++ b/packages/cli/GEMINI.md
@@ -7,7 +7,10 @@
 - **Shortcuts**: only define keyboard shortcuts in
   `packages/cli/src/ui/key/keyBindings.ts`
 - Do not implement any logic performing custom string measurement or string
-  truncation. Use Ink layout instead leveraging ResizeObserver as needed.
+  truncation. Use Ink layout instead leveraging ResizeObserver as needed. When
+  using `ResizeObserver`, prefer the `useCallback` ref pattern (as seen in
+  `MaxSizedBox.tsx`) to ensure size measurements are captured as soon as the
+  element is available, avoiding potential rendering timing issues.
 - Avoid prop drilling when at all possible.
## Testing diff --git a/packages/cli/src/acp/acpClient.test.ts b/packages/cli/src/acp/acpClient.test.ts index 0f9c4a8e5b..3ae71e6ebb 100644 --- a/packages/cli/src/acp/acpClient.test.ts +++ b/packages/cli/src/acp/acpClient.test.ts @@ -1080,6 +1080,70 @@ describe('Session', () => { ); }); + it('should split getDisplayTitle and getExplanation for title and content in permission request', async () => { + const confirmationDetails = { + type: 'info', + onConfirm: vi.fn(), + }; + mockTool.build.mockReturnValue({ + getDescription: () => 'Original Description', + getDisplayTitle: () => 'Display Title Only', + getExplanation: () => 'A detailed explanation text', + toolLocations: () => [], + shouldConfirmExecute: vi.fn().mockResolvedValue(confirmationDetails), + execute: vi.fn().mockResolvedValue({ llmContent: 'Tool Result' }), + }); + + mockConnection.requestPermission.mockResolvedValue({ + outcome: { + outcome: 'selected', + optionId: ToolConfirmationOutcome.ProceedOnce, + }, + }); + + const stream1 = createMockStream([ + { + type: StreamEventType.CHUNK, + value: { + functionCalls: [{ name: 'test_tool', args: {} }], + }, + }, + ]); + const stream2 = createMockStream([ + { + type: StreamEventType.CHUNK, + value: { candidates: [] }, + }, + ]); + + mockChat.sendMessageStream + .mockResolvedValueOnce(stream1) + .mockResolvedValueOnce(stream2); + + await session.prompt({ + sessionId: 'session-1', + prompt: [{ type: 'text', text: 'Call tool' }], + }); + + expect(mockConnection.requestPermission).toHaveBeenCalledWith( + expect.objectContaining({ + toolCall: expect.objectContaining({ + title: 'Display Title Only', + content: [], + }), + }), + ); + + expect(mockConnection.sessionUpdate).toHaveBeenCalledWith( + expect.objectContaining({ + update: expect.objectContaining({ + sessionUpdate: 'agent_thought_chunk', + content: { type: 'text', text: 'A detailed explanation text' }, + }), + }), + ); + }); + it('should use filePath for ACP diff content in tool result', async () => { 
mockTool.build.mockReturnValue({ getDescription: () => 'Test Tool', diff --git a/packages/cli/src/acp/acpClient.ts b/packages/cli/src/acp/acpClient.ts index 5e3f3666b1..bead6f0067 100644 --- a/packages/cli/src/acp/acpClient.ts +++ b/packages/cli/src/acp/acpClient.ts @@ -98,6 +98,12 @@ export async function runAcpClient( } export class GeminiAgent { + private static callIdCounter = 0; + + static generateCallId(name: string): string { + return `${name}-${Date.now()}-${++GeminiAgent.callIdCounter}`; + } + private sessions: Map = new Map(); private clientCapabilities: acp.ClientCapabilities | undefined; private apiKey: string | undefined; @@ -897,7 +903,7 @@ export class Session { promptId: string, fc: FunctionCall, ): Promise { - const callId = fc.id ?? `${fc.name}-${Date.now()}`; + const callId = fc.id ?? GeminiAgent.generateCallId(fc.name || 'unknown'); const args = fc.args ?? {}; const startTime = Date.now(); @@ -947,6 +953,23 @@ export class Session { try { const invocation = tool.build(args); + const displayTitle = + typeof invocation.getDisplayTitle === 'function' + ? invocation.getDisplayTitle() + : invocation.getDescription(); + + const explanation = + typeof invocation.getExplanation === 'function' + ? 
invocation.getExplanation() + : ''; + + if (explanation) { + await this.sendUpdate({ + sessionUpdate: 'agent_thought_chunk', + content: { type: 'text', text: explanation }, + }); + } + const confirmationDetails = await invocation.shouldConfirmExecute(abortSignal); @@ -978,7 +1001,7 @@ export class Session { toolCall: { toolCallId: callId, status: 'pending', - title: invocation.getDescription(), + title: displayTitle, content, locations: invocation.toolLocations(), kind: toAcpToolKind(tool.kind), @@ -1014,12 +1037,14 @@ export class Session { } } } else { + const content: acp.ToolCallContent[] = []; + await this.sendUpdate({ sessionUpdate: 'tool_call', toolCallId: callId, status: 'in_progress', - title: invocation.getDescription(), - content: [], + title: displayTitle, + content, locations: invocation.toolLocations(), kind: toAcpToolKind(tool.kind), }); @@ -1028,12 +1053,14 @@ export class Session { const toolResult: ToolResult = await invocation.execute(abortSignal); const content = toToolCallContent(toolResult); + const updateContent: acp.ToolCallContent[] = content ? [content] : []; + await this.sendUpdate({ sessionUpdate: 'tool_call_update', toolCallId: callId, status: 'completed', - title: invocation.getDescription(), - content: content ? 
[content] : [], + title: displayTitle, + content: updateContent, locations: invocation.toolLocations(), kind: toAcpToolKind(tool.kind), }); @@ -1370,7 +1397,7 @@ export class Session { include: pathSpecsToRead, }; - const callId = `${readManyFilesTool.name}-${Date.now()}`; + const callId = GeminiAgent.generateCallId(readManyFilesTool.name); try { const invocation = readManyFilesTool.build(toolArgs); diff --git a/packages/cli/src/commands/extensions/examples/policies/policies/policies.toml b/packages/cli/src/commands/extensions/examples/policies/policies/policies.toml index d89d5e5737..225627c59b 100644 --- a/packages/cli/src/commands/extensions/examples/policies/policies/policies.toml +++ b/packages/cli/src/commands/extensions/examples/policies/policies/policies.toml @@ -16,7 +16,7 @@ toolName = "grep_search" argsPattern = "(\.env|id_rsa|passwd)" decision = "deny" priority = 200 -deny_message = "Access to sensitive credentials or system files is restricted by the policy-example extension." +denyMessage = "Access to sensitive credentials or system files is restricted by the policy-example extension." # Safety Checker: Apply path validation to all write operations. 
[[safety_checker]] diff --git a/packages/cli/src/commands/mcp/list.ts b/packages/cli/src/commands/mcp/list.ts index a1df1a8027..8154e3b7bf 100644 --- a/packages/cli/src/commands/mcp/list.ts +++ b/packages/cli/src/commands/mcp/list.ts @@ -54,6 +54,7 @@ export async function getMcpServersFromConfig( return; } mcpServers[key] = { + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...server, extension, }; diff --git a/packages/cli/src/config/config.test.ts b/packages/cli/src/config/config.test.ts index 746fc14475..f312ddde4f 100644 --- a/packages/cli/src/config/config.test.ts +++ b/packages/cli/src/config/config.test.ts @@ -322,6 +322,41 @@ describe('parseArguments', () => { }, ); + describe('isCommand middleware', () => { + it.each([ + { cmd: 'mcp list', expected: true }, + { cmd: 'extensions list', expected: true }, + { cmd: 'extension list', expected: true }, + { cmd: 'skills list', expected: true }, + { cmd: 'skill list', expected: true }, + { cmd: 'hooks migrate', expected: true }, + { cmd: 'hook migrate', expected: true }, + { cmd: 'some query', expected: undefined }, + { cmd: 'hello world', expected: undefined }, + ])( + 'should set isCommand to $expected for "$cmd"', + async ({ cmd, expected }) => { + process.argv = ['node', 'script.js', ...cmd.split(' ')]; + const settings = createTestMergedSettings({ + admin: { + mcp: { enabled: true }, + }, + experimental: { + extensionManagement: true, + }, + skills: { + enabled: true, + }, + hooksConfig: { + enabled: true, + }, + }); + const parsedArgs = await parseArguments(settings); + expect(parsedArgs.isCommand).toBe(expected); + }, + ); + }); + it.each([ { description: 'should allow --prompt without --prompt-interactive', @@ -1716,6 +1751,7 @@ describe('loadCliConfig with admin.mcp.config', () => { const serverA = config.getMcpServers()?.['serverA']; expect(serverA).toEqual({ + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...localMcpServers['serverA'], type: 'sse', url: 
'https://admin-server-a.com/sse', @@ -1766,6 +1802,7 @@ describe('loadCliConfig with admin.mcp.config', () => { }; const localMcpServersWithTools: Record = { serverA: { + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...localMcpServers['serverA'], includeTools: ['local_tool'], timeout: 1234, @@ -1808,6 +1845,7 @@ describe('loadCliConfig with admin.mcp.config', () => { }; const localMcpServersWithTools: Record = { serverA: { + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...localMcpServers['serverA'], includeTools: ['local_tool'], }, diff --git a/packages/cli/src/config/config.ts b/packages/cli/src/config/config.ts index 227ad4e8ed..fa6d16fc72 100755 --- a/packages/cli/src/config/config.ts +++ b/packages/cli/src/config/config.ts @@ -163,12 +163,104 @@ export async function parseArguments( .usage( 'Usage: gemini [options] [command]\n\nGemini CLI - Defaults to interactive mode. Use -p/--prompt for non-interactive (headless) mode.', ) + .option('isCommand', { + type: 'boolean', + hidden: true, + description: 'Internal flag to indicate if a subcommand is being run', + }) .option('debug', { alias: 'd', type: 'boolean', description: 'Run in debug mode (open debug console with F12)', default: false, }) + .middleware((argv) => { + const commandModules = [ + mcpCommand, + extensionsCommand, + skillsCommand, + hooksCommand, + ]; + + const subcommands = commandModules.flatMap((mod) => { + const names: string[] = []; + + const cmd = mod.command; + if (cmd) { + if (Array.isArray(cmd)) { + for (const c of cmd) { + names.push(String(c).split(' ')[0]); + } + } else { + names.push(String(cmd).split(' ')[0]); + } + } + + const aliases = mod.aliases; + if (aliases) { + if (Array.isArray(aliases)) { + for (const a of aliases) { + names.push(String(a).split(' ')[0]); + } + } else { + names.push(String(aliases).split(' ')[0]); + } + } + + return names; + }); + + const firstArg = argv._[0]; + if (typeof firstArg === 'string' && 
subcommands.includes(firstArg)) { + argv['isCommand'] = true; + } + }, true) + // Ensure validation flows through .fail() for clean UX + .fail((msg, err) => { + if (err) throw err; + throw new Error(msg); + }) + .check((argv) => { + // The 'query' positional can be a string (for one arg) or string[] (for multiple). + // This guard safely checks if any positional argument was provided. + const queryArg = argv['query']; + const query = + typeof queryArg === 'string' || Array.isArray(queryArg) + ? queryArg + : undefined; + const hasPositionalQuery = Array.isArray(query) + ? query.length > 0 + : !!query; + + if (argv['prompt'] && hasPositionalQuery) { + return 'Cannot use both a positional prompt and the --prompt (-p) flag together'; + } + if (argv['prompt'] && argv['promptInteractive']) { + return 'Cannot use both --prompt (-p) and --prompt-interactive (-i) together'; + } + if (argv['yolo'] && argv['approvalMode']) { + return 'Cannot use both --yolo (-y) and --approval-mode together. Use --approval-mode=yolo instead.'; + } + + const outputFormat = argv['outputFormat']; + if ( + typeof outputFormat === 'string' && + !['text', 'json', 'stream-json'].includes(outputFormat) + ) { + return `Invalid values:\n Argument: output-format, Given: "${outputFormat}", Choices: "text", "json", "stream-json"`; + } + if (argv['worktree'] && !settings.experimental?.worktrees) { + return 'The --worktree flag is only available when experimental.worktrees is enabled in your settings.'; + } + return true; + }); + + yargsInstance.command(mcpCommand); + yargsInstance.command(extensionsCommand); + yargsInstance.command(skillsCommand); + yargsInstance.command(hooksCommand); + + yargsInstance .command('$0 [query..]', 'Launch Gemini CLI', (yargsInstance) => yargsInstance .positional('query', { @@ -352,59 +444,6 @@ export async function parseArguments( description: 'Suppress the security warning when using --raw-output.', }), ) - // Register MCP subcommands - .command(mcpCommand) - // Ensure 
validation flows through .fail() for clean UX - .fail((msg, err) => { - if (err) throw err; - throw new Error(msg); - }) - .check((argv) => { - // The 'query' positional can be a string (for one arg) or string[] (for multiple). - // This guard safely checks if any positional argument was provided. - // eslint-disable-next-line @typescript-eslint/no-unsafe-type-assertion - const query = argv['query'] as string | string[] | undefined; - const hasPositionalQuery = Array.isArray(query) - ? query.length > 0 - : !!query; - - if (argv['prompt'] && hasPositionalQuery) { - return 'Cannot use both a positional prompt and the --prompt (-p) flag together'; - } - if (argv['prompt'] && argv['promptInteractive']) { - return 'Cannot use both --prompt (-p) and --prompt-interactive (-i) together'; - } - if (argv['yolo'] && argv['approvalMode']) { - return 'Cannot use both --yolo (-y) and --approval-mode together. Use --approval-mode=yolo instead.'; - } - if ( - argv['outputFormat'] && - !['text', 'json', 'stream-json'].includes( - // eslint-disable-next-line @typescript-eslint/no-unsafe-type-assertion - argv['outputFormat'] as string, - ) - ) { - return `Invalid values:\n Argument: output-format, Given: "${argv['outputFormat']}", Choices: "text", "json", "stream-json"`; - } - if (argv['worktree'] && !settings.experimental?.worktrees) { - return 'The --worktree flag is only available when experimental.worktrees is enabled in your settings.'; - } - return true; - }); - - if (settings.experimental?.extensionManagement) { - yargsInstance.command(extensionsCommand); - } - - if (settings.skills?.enabled ?? 
true) { - yargsInstance.command(skillsCommand); - } - // Register hooks command if hooks are enabled - if (settings.hooksConfig.enabled) { - yargsInstance.command(hooksCommand); - } - - yargsInstance .version(await getVersion()) // This will enable the --version flag based on package.json .alias('v', 'version') .help() diff --git a/packages/cli/src/config/extension-manager.ts b/packages/cli/src/config/extension-manager.ts index 04487bc5f8..65b3539794 100644 --- a/packages/cli/src/config/extension-manager.ts +++ b/packages/cli/src/config/extension-manager.ts @@ -614,7 +614,7 @@ Would you like to attempt to install via "git clone" instead?`, this.loadingPromise = (async () => { try { - if (this.settings.admin.extensions.enabled === false) { + if (this.settings.admin?.extensions?.enabled === false) { this.loadedExtensions = []; return this.loadedExtensions; } @@ -824,11 +824,11 @@ Would you like to attempt to install via "git clone" instead?`, } if (config.mcpServers) { - if (this.settings.admin.mcp.enabled === false) { + if (this.settings.admin?.mcp?.enabled === false) { config.mcpServers = undefined; } else { // Apply admin allowlist if configured - const adminAllowlist = this.settings.admin.mcp.config; + const adminAllowlist = this.settings.admin?.mcp?.config; if (adminAllowlist && Object.keys(adminAllowlist).length > 0) { const result = applyAdminAllowlist( config.mcpServers, @@ -1298,7 +1298,9 @@ export async function inferInstallMetadata( source.startsWith('http://') || source.startsWith('https://') || source.startsWith('git@') || - source.startsWith('sso://') + source.startsWith('sso://') || + source.startsWith('github:') || + source.startsWith('gitlab:') ) { return { source, diff --git a/packages/cli/src/config/mcp/mcpServerEnablement.test.ts b/packages/cli/src/config/mcp/mcpServerEnablement.test.ts index 8b41324790..12b483d59d 100644 --- a/packages/cli/src/config/mcp/mcpServerEnablement.test.ts +++ b/packages/cli/src/config/mcp/mcpServerEnablement.test.ts @@ 
-13,6 +13,7 @@ vi.mock('@google/gemini-cli-core', async (importOriginal) => { return { ...actual, Storage: { + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...actual.Storage, getGlobalGeminiDir: () => '/virtual-home/.gemini', }, diff --git a/packages/cli/src/config/policy-engine.integration.test.ts b/packages/cli/src/config/policy-engine.integration.test.ts index 2e74a28201..3b2a34ca69 100644 --- a/packages/cli/src/config/policy-engine.integration.test.ts +++ b/packages/cli/src/config/policy-engine.integration.test.ts @@ -381,6 +381,7 @@ describe('Policy Engine Integration Tests', () => { // Add a manual rule with annotations to the config config.rules = config.rules || []; config.rules.push({ + toolName: '*', toolAnnotations: { readOnlyHint: true }, decision: PolicyDecision.ALLOW, priority: 10, diff --git a/packages/cli/src/config/settingsSchema.test.ts b/packages/cli/src/config/settingsSchema.test.ts index c358cd65aa..9b643396ae 100644 --- a/packages/cli/src/config/settingsSchema.test.ts +++ b/packages/cli/src/config/settingsSchema.test.ts @@ -400,7 +400,7 @@ describe('SettingsSchema', () => { expect(setting).toBeDefined(); expect(setting.type).toBe('boolean'); expect(setting.category).toBe('Experimental'); - expect(setting.default).toBe(true); + expect(setting.default).toBe(false); expect(setting.requiresRestart).toBe(true); expect(setting.showInDialog).toBe(false); expect(setting.description).toBe('Enable local and remote subagents.'); diff --git a/packages/cli/src/config/settingsSchema.ts b/packages/cli/src/config/settingsSchema.ts index 277dcfdcb9..b886dfccf3 100644 --- a/packages/cli/src/config/settingsSchema.ts +++ b/packages/cli/src/config/settingsSchema.ts @@ -657,6 +657,16 @@ const SETTINGS_SCHEMA = { description: 'Hide the footer from the UI', showInDialog: true, }, + collapseDrawerDuringApproval: { + type: 'boolean', + label: 'Collapse Drawer During Approval', + category: 'UI', + requiresRestart: false, + default: true, + 
description: + 'Whether to collapse the UI drawer when a tool is awaiting confirmation.', + showInDialog: false, + }, showMemoryUsage: { type: 'boolean', label: 'Show Memory Usage', @@ -1922,7 +1932,7 @@ const SETTINGS_SCHEMA = { label: 'Enable Agents', category: 'Experimental', requiresRestart: true, - default: true, + default: false, description: 'Enable local and remote subagents.', showInDialog: false, }, diff --git a/packages/cli/src/core/initializer.test.ts b/packages/cli/src/core/initializer.test.ts index e4fdb2cba5..9093ad54ee 100644 --- a/packages/cli/src/core/initializer.test.ts +++ b/packages/cli/src/core/initializer.test.ts @@ -105,6 +105,9 @@ describe('initializer', () => { mockSettings, ); + // Wait for the background promise to resolve + await new Promise((resolve) => setTimeout(resolve, 0)); + expect(result).toEqual({ authError: null, accountSuspensionInfo: null, diff --git a/packages/cli/src/core/initializer.ts b/packages/cli/src/core/initializer.ts index f27e9a9511..607129ae3e 100644 --- a/packages/cli/src/core/initializer.ts +++ b/packages/cli/src/core/initializer.ts @@ -13,6 +13,7 @@ import { StartSessionEvent, logCliConfiguration, startupProfiler, + debugLogger, } from '@google/gemini-cli-core'; import { type LoadedSettings } from '../config/settings.js'; import { performInitialAuth } from './auth.js'; @@ -55,9 +56,18 @@ export async function initializeApp( ); if (config.getIdeMode()) { - const ideClient = await IdeClient.getInstance(); - await ideClient.connect(); - logIdeConnection(config, new IdeConnectionEvent(IdeConnectionType.START)); + IdeClient.getInstance() + .then(async (ideClient) => { + await ideClient.connect(); + logIdeConnection( + config, + new IdeConnectionEvent(IdeConnectionType.START), + ); + }) + .catch((e) => { + // We log locally if IDE connection setup fails in the background. 
+ debugLogger.error('Failed to initialize IDE client:', e); + }); } return { diff --git a/packages/cli/src/gemini.test.tsx b/packages/cli/src/gemini.test.tsx index 08c2cbabe8..69ea6db56e 100644 --- a/packages/cli/src/gemini.test.tsx +++ b/packages/cli/src/gemini.test.tsx @@ -126,6 +126,7 @@ vi.mock('@google/gemini-cli-core', async (importOriginal) => { clearInstance: vi.fn(), }, coreEvents: { + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...actual.coreEvents, emitFeedback: vi.fn(), emitConsoleLog: vi.fn(), @@ -1508,6 +1509,7 @@ describe('startInteractiveUI', () => { .spyOn(process.stdout, 'write') .mockImplementation(() => true); const mockConfigWithScreenReader = { + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...mockConfig, getScreenReader: () => screenReader, } as Config; diff --git a/packages/cli/src/gemini.tsx b/packages/cli/src/gemini.tsx index c8cd2b3cd8..5bd9944f63 100644 --- a/packages/cli/src/gemini.tsx +++ b/packages/cli/src/gemini.tsx @@ -213,12 +213,36 @@ export async function main() { loadSettingsHandle?.end(); // If a worktree is requested and enabled, set it up early. + // This must be awaited before any other async tasks that depend on CWD (like loadCliConfig) + // because setupWorktree calls process.chdir(). 
const requestedWorktree = cliConfig.getRequestedWorktreeName(settings); let worktreeInfo: WorktreeInfo | undefined; if (requestedWorktree !== undefined) { + const worktreeHandle = startupProfiler.start('setup_worktree'); worktreeInfo = await setupWorktree(requestedWorktree || undefined); + worktreeHandle?.end(); } + const cleanupOpsHandle = startupProfiler.start('cleanup_ops'); + Promise.all([ + cleanupCheckpoints(), + cleanupToolOutputFiles(settings.merged), + cleanupBackgroundLogs(), + ]) + .catch((e) => { + debugLogger.error('Early cleanup failed:', e); + }) + .finally(() => { + cleanupOpsHandle?.end(); + }); + + const parseArgsHandle = startupProfiler.start('parse_arguments'); + const argvPromise = parseArguments(settings.merged).finally(() => { + parseArgsHandle?.end(); + }); + + const rawStartupWarningsPromise = getStartupWarnings(); + // Report settings errors once during startup settings.errors.forEach((error) => { coreEvents.emitFeedback('warning', error.message); @@ -232,15 +256,7 @@ export async function main() { ); }); - await Promise.all([ - cleanupCheckpoints(), - cleanupToolOutputFiles(settings.merged), - cleanupBackgroundLogs(), - ]); - - const parseArgsHandle = startupProfiler.start('parse_arguments'); - const argv = await parseArguments(settings.merged); - parseArgsHandle?.end(); + const argv = await argvPromise; if ( (argv.allowedTools && argv.allowedTools.length > 0) || @@ -318,7 +334,7 @@ export async function main() { // the sandbox because the sandbox will interfere with the Oauth2 web // redirect. 
let initialAuthFailed = false; - if (!settings.merged.security.auth.useExternal) { + if (!settings.merged.security.auth.useExternal && !argv.isCommand) { try { if ( partialConfig.isInteractive() && @@ -370,7 +386,7 @@ export async function main() { await runDeferredCommand(settings.merged); // hop into sandbox if we are outside and sandboxing is enabled - if (!process.env['SANDBOX']) { + if (!process.env['SANDBOX'] && !argv.isCommand) { const memoryArgs = settings.merged.advanced.autoConfigureMemory ? getNodeMemoryArgs(isDebugMode) : []; @@ -467,12 +483,10 @@ export async function main() { await config.getHookSystem()?.fireSessionEndEvent(SessionEndReason.Exit); }); - // Cleanup sessions after config initialization - try { - await cleanupExpiredSessions(config, settings.merged); - } catch (e) { + // Launch cleanup expired sessions as a background task + cleanupExpiredSessions(config, settings.merged).catch((e) => { debugLogger.error('Failed to cleanup expired sessions:', e); - } + }); if (config.getListExtensions()) { debugLogger.log('Installed extensions:'); @@ -524,7 +538,9 @@ export async function main() { }); } + const terminalHandle = startupProfiler.start('setup_terminal'); await setupTerminalAndTheme(config, settings); + terminalHandle?.end(); const initAppHandle = startupProfiler.start('initialize_app'); const initializationResult = await initializeApp(config, settings); @@ -548,7 +564,7 @@ export async function main() { isAlternateBufferEnabled(config), config.getScreenReader(), ); - const rawStartupWarnings = await getStartupWarnings(); + const rawStartupWarnings = await rawStartupWarningsPromise; const startupWarnings: StartupWarning[] = [ ...rawStartupWarnings.map((message) => ({ id: `startup-${createHash('sha256').update(message).digest('hex').substring(0, 16)}`, diff --git a/packages/cli/src/nonInteractiveCli.test.ts b/packages/cli/src/nonInteractiveCli.test.ts index 206d011e63..4e45b0f188 100644 --- a/packages/cli/src/nonInteractiveCli.test.ts +++ 
b/packages/cli/src/nonInteractiveCli.test.ts @@ -1137,6 +1137,7 @@ describe('runNonInteractive', () => { expect( processStderrSpy.mock.calls.some( + // eslint-disable-next-line no-restricted-syntax (call) => typeof call[0] === 'string' && call[0].includes('Cancelling'), ), ).toBe(true); diff --git a/packages/cli/src/services/BuiltinCommandLoader.test.ts b/packages/cli/src/services/BuiltinCommandLoader.test.ts index b5e7856711..f166c161cd 100644 --- a/packages/cli/src/services/BuiltinCommandLoader.test.ts +++ b/packages/cli/src/services/BuiltinCommandLoader.test.ts @@ -266,6 +266,7 @@ describe('BuiltinCommandLoader', () => { it('should include policies command when message bus integration is enabled', async () => { const mockConfigWithMessageBus = { + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...mockConfig, getEnableHooks: () => false, getMcpEnabled: () => true, diff --git a/packages/cli/src/services/SlashCommandResolver.test.ts b/packages/cli/src/services/SlashCommandResolver.test.ts index 43d1c310a8..40e3b6f1d5 100644 --- a/packages/cli/src/services/SlashCommandResolver.test.ts +++ b/packages/cli/src/services/SlashCommandResolver.test.ts @@ -43,7 +43,7 @@ describe('SlashCommandResolver', () => { ]); expect(finalCommands.map((c) => c.name)).toContain('deploy'); - expect(finalCommands.map((c) => c.name)).toContain('firebase.deploy'); + expect(finalCommands.map((c) => c.name)).toContain('firebase:deploy'); expect(conflicts).toHaveLength(1); }); @@ -159,7 +159,7 @@ describe('SlashCommandResolver', () => { it('should apply numeric suffixes when renames also conflict', () => { const user1 = createMockCommand('deploy', CommandKind.USER_FILE); - const user2 = createMockCommand('gcp.deploy', CommandKind.USER_FILE); + const user2 = createMockCommand('gcp:deploy', CommandKind.USER_FILE); const extension = { ...createMockCommand('deploy', CommandKind.EXTENSION_FILE), extensionName: 'gcp', @@ -171,7 +171,7 @@ describe('SlashCommandResolver', () => { 
extension, ]); - expect(finalCommands.find((c) => c.name === 'gcp.deploy1')).toBeDefined(); + expect(finalCommands.find((c) => c.name === 'gcp:deploy1')).toBeDefined(); }); it('should prefix skills with extension name when they conflict with built-in', () => { @@ -185,7 +185,37 @@ describe('SlashCommandResolver', () => { const names = finalCommands.map((c) => c.name); expect(names).toContain('chat'); - expect(names).toContain('google-workspace.chat'); + expect(names).toContain('google-workspace:chat'); + }); + + it('should ALWAYS prefix extension skills even if no conflict exists', () => { + const skill = { + ...createMockCommand('chat', CommandKind.SKILL), + extensionName: 'google-workspace', + }; + + const { finalCommands } = SlashCommandResolver.resolve([skill]); + + const names = finalCommands.map((c) => c.name); + expect(names).toContain('google-workspace:chat'); + expect(names).not.toContain('chat'); + }); + + it('should use numeric suffixes if prefixed skill names collide', () => { + const skill1 = { + ...createMockCommand('chat', CommandKind.SKILL), + extensionName: 'google-workspace', + }; + const skill2 = { + ...createMockCommand('chat', CommandKind.SKILL), + extensionName: 'google-workspace', + }; + + const { finalCommands } = SlashCommandResolver.resolve([skill1, skill2]); + + const names = finalCommands.map((c) => c.name); + expect(names).toContain('google-workspace:chat'); + expect(names).toContain('google-workspace:chat1'); }); it('should NOT prefix skills with "skill" when extension name is missing', () => { diff --git a/packages/cli/src/services/SlashCommandResolver.ts b/packages/cli/src/services/SlashCommandResolver.ts index 4947e6545a..e956d6f566 100644 --- a/packages/cli/src/services/SlashCommandResolver.ts +++ b/packages/cli/src/services/SlashCommandResolver.ts @@ -47,7 +47,17 @@ export class SlashCommandResolver { const originalName = cmd.name; let finalName = originalName; - if (registry.firstEncounters.has(originalName)) { + const 
shouldAlwaysPrefix = + cmd.kind === CommandKind.SKILL && !!cmd.extensionName; + + if (shouldAlwaysPrefix) { + finalName = this.getRenamedName( + originalName, + this.getPrefix(cmd), + registry.commandMap, + cmd.kind, + ); + } else if (registry.firstEncounters.has(originalName)) { // We've already seen a command with this name, so resolve the conflict. finalName = this.handleConflict(cmd, registry); } else { @@ -93,6 +103,7 @@ export class SlashCommandResolver { incoming.name, this.getPrefix(incoming), registry.commandMap, + incoming.kind, ); this.trackConflict( registry.conflictsMap, @@ -132,6 +143,7 @@ export class SlashCommandResolver { currentOwner.name, this.getPrefix(currentOwner), registry.commandMap, + currentOwner.kind, ); // Update the registry: remove the old name and add the owner under the new name. @@ -156,8 +168,12 @@ export class SlashCommandResolver { name: string, prefix: string | undefined, commandMap: Map, + kind?: CommandKind, ): string { - const base = prefix ? `${prefix}.${name}` : name; + const isExtensionPrefix = + kind === CommandKind.SKILL || kind === CommandKind.EXTENSION_FILE; + const separator = isExtensionPrefix ? ':' : '.'; + const base = prefix ? 
`${prefix}${separator}${name}` : name; let renamedName = base; let suffix = 1; diff --git a/packages/cli/src/test-utils/AppRig.tsx b/packages/cli/src/test-utils/AppRig.tsx index 5ead5d615a..548372a139 100644 --- a/packages/cli/src/test-utils/AppRig.tsx +++ b/packages/cli/src/test-utils/AppRig.tsx @@ -11,7 +11,11 @@ import os from 'node:os'; import path from 'node:path'; import fs from 'node:fs'; import { AppContainer } from '../ui/AppContainer.js'; -import { renderWithProviders, type RenderInstance } from './render.js'; +import { + renderWithProviders, + type RenderInstance, + persistentStateMock, +} from './render.js'; import { makeFakeConfig, type Config, @@ -162,7 +166,7 @@ export class AppRig { private sessionId: string; private pendingConfirmations = new Map(); - private breakpointTools = new Set(); + private breakpointTools = new Set(); private lastAwaitedConfirmation: PendingConfirmation | undefined; /** @@ -177,9 +181,24 @@ export class AppRig { ); this.sessionId = `test-session-${uniqueId}`; activeRigs.set(this.sessionId, this); + + // Pre-create the persistent state file to bypass the terminal setup prompt + const geminiDir = path.join(this.testDir, '.gemini'); + if (!fs.existsSync(geminiDir)) { + fs.mkdirSync(geminiDir, { recursive: true }); + } + fs.writeFileSync( + path.join(geminiDir, 'state.json'), + JSON.stringify({ terminalSetupPromptShown: true }), + ); } async initialize() { + persistentStateMock.setData({ + terminalSetupPromptShown: true, + tipsShown: 10, + }); + this.setupEnvironment(); resetSettingsCacheForTesting(); this.settings = this.createRigSettings(); @@ -226,6 +245,8 @@ export class AppRig { private setupEnvironment() { // Stub environment variables to avoid interference from developer's machine vi.stubEnv('GEMINI_CLI_HOME', this.testDir); + vi.stubEnv('TERM_PROGRAM', 'other'); + vi.stubEnv('VSCODE_GIT_IPC_HANDLE', ''); if (this.options.fakeResponsesPath) { vi.stubEnv('GEMINI_API_KEY', 'test-api-key'); 
MockShellExecutionService.setPassthrough(false); @@ -291,7 +312,6 @@ export class AppRig { const newContentGeneratorConfig = { authType: authMethod, - proxy: gcConfig.getProxy(), apiKey: process.env['GEMINI_API_KEY'] || 'test-api-key', }; @@ -426,11 +446,7 @@ export class AppRig { MockShellExecutionService.setMockCommands(commands); } - setToolPolicy( - toolName: string | undefined, - decision: PolicyDecision, - priority = 10, - ) { + setToolPolicy(toolName: string, decision: PolicyDecision, priority = 10) { if (!this.config) throw new Error('AppRig not initialized'); this.config.getPolicyEngine().addRule({ toolName, @@ -440,27 +456,20 @@ export class AppRig { }); } - setBreakpoint(toolName: string | string[] | undefined) { + setBreakpoint(toolName: string | string[]) { if (Array.isArray(toolName)) { for (const name of toolName) { this.setBreakpoint(name); } } else { - // Use undefined toolName to create a global rule if '*' is provided - const actualToolName = toolName === '*' ? undefined : toolName; - this.setToolPolicy(actualToolName, PolicyDecision.ASK_USER, 100); + this.setToolPolicy(toolName, PolicyDecision.ASK_USER, 100); this.breakpointTools.add(toolName); } } - removeToolPolicy(toolName?: string, source = 'AppRig Override') { + removeToolPolicy(toolName: string, source = 'AppRig Override') { if (!this.config) throw new Error('AppRig not initialized'); - // Map '*' back to undefined for policy removal - const actualToolName = toolName === '*' ? 
undefined : toolName; - this.config - .getPolicyEngine() - - .removeRulesForTool(actualToolName as string, source); + this.config.getPolicyEngine().removeRulesForTool(toolName, source); this.breakpointTools.delete(toolName); } diff --git a/packages/cli/src/test-utils/customMatchers.ts b/packages/cli/src/test-utils/customMatchers.ts index ae9b44ee44..d34576cf3f 100644 --- a/packages/cli/src/test-utils/customMatchers.ts +++ b/packages/cli/src/test-utils/customMatchers.ts @@ -79,7 +79,7 @@ export async function toMatchSvgSnapshot( } function toHaveOnlyValidCharacters(this: Assertion, buffer: TextBuffer) { - // eslint-disable-next-line @typescript-eslint/no-explicit-any, @typescript-eslint/no-unsafe-type-assertion, @typescript-eslint/no-unsafe-assignment + // eslint-disable-next-line @typescript-eslint/no-explicit-any const { isNot } = this as any; let pass = true; const invalidLines: Array<{ line: number; content: string }> = []; @@ -108,7 +108,6 @@ function toHaveOnlyValidCharacters(this: Assertion, buffer: TextBuffer) { }; } -// eslint-disable-next-line @typescript-eslint/no-unsafe-type-assertion expect.extend({ toHaveOnlyValidCharacters, toMatchSvgSnapshot, diff --git a/packages/cli/src/test-utils/mockCommandContext.ts b/packages/cli/src/test-utils/mockCommandContext.ts index 15e6422e1a..6eda7f3109 100644 --- a/packages/cli/src/test-utils/mockCommandContext.ts +++ b/packages/cli/src/test-utils/mockCommandContext.ts @@ -37,14 +37,12 @@ export const createMockCommandContext = ( }, services: { agentContext: null, - settings: { merged: defaultMergedSettings, setValue: vi.fn(), forScope: vi.fn().mockReturnValue({ settings: {} }), } as unknown as LoadedSettings, git: undefined as GitService | undefined, - logger: { log: vi.fn(), logMessage: vi.fn(), @@ -53,7 +51,6 @@ export const createMockCommandContext = ( // eslint-disable-next-line @typescript-eslint/no-explicit-any } as any, // Cast because Logger is a class. 
}, - ui: { addItem: vi.fn(), clear: vi.fn(), @@ -72,7 +69,6 @@ export const createMockCommandContext = ( } as any, session: { sessionShellAllowlist: new Set(), - stats: { sessionStartTime: new Date(), lastPromptTokenCount: 0, @@ -98,7 +94,6 @@ export const createMockCommandContext = ( for (const key in source) { if (Object.prototype.hasOwnProperty.call(source, key)) { const sourceValue = source[key]; - const targetValue = output[key]; if ( @@ -109,7 +104,6 @@ export const createMockCommandContext = ( output[key] = merge(targetValue, sourceValue); } else { // If not, we do a direct assignment. This preserves Date objects and others. - output[key] = sourceValue; } } diff --git a/packages/cli/src/test-utils/render.tsx b/packages/cli/src/test-utils/render.tsx index a655088e79..9dd0f96758 100644 --- a/packages/cli/src/test-utils/render.tsx +++ b/packages/cli/src/test-utils/render.tsx @@ -665,7 +665,7 @@ export const renderWithProviders = async ( ); } - const mainAreaWidth = terminalWidth; + const mainAreaWidth = providedUiState?.mainAreaWidth ?? terminalWidth; const finalUiState = { ...baseState, @@ -778,7 +778,6 @@ export async function renderHook( generateSvg: () => string; }> { const result = { current: undefined as unknown as Result }; - let currentProps = options?.initialProps as Props; function TestComponent({ diff --git a/packages/cli/src/test-utils/settings.ts b/packages/cli/src/test-utils/settings.ts index ab2420849d..20d0613f83 100644 --- a/packages/cli/src/test-utils/settings.ts +++ b/packages/cli/src/test-utils/settings.ts @@ -46,7 +46,6 @@ export const createMockSettings = ( workspace, isTrusted, errors, - merged: mergedOverride, ...settingsOverrides } = overrides; @@ -61,7 +60,6 @@ export const createMockSettings = ( settings: settingsOverrides, originalSettings: settingsOverrides, }, - (workspace as any) || { path: '', settings: {}, originalSettings: {} }, isTrusted ?? 
true, errors || [], diff --git a/packages/cli/src/ui/AppContainer.test.tsx b/packages/cli/src/ui/AppContainer.test.tsx index 313573a573..3324505778 100644 --- a/packages/cli/src/ui/AppContainer.test.tsx +++ b/packages/cli/src/ui/AppContainer.test.tsx @@ -489,8 +489,8 @@ describe('AppContainer State Management', () => { // Mock LoadedSettings mockSettings = createMockSettings({ hideBanner: false, - hideFooter: false, hideTips: false, + hideFooter: false, showMemoryUsage: false, theme: 'default', ui: { @@ -911,8 +911,8 @@ describe('AppContainer State Management', () => { it('handles settings with all display options disabled', async () => { const settingsAllHidden = createMockSettings({ hideBanner: true, - hideFooter: true, hideTips: true, + hideFooter: true, showMemoryUsage: false, }); @@ -2157,13 +2157,8 @@ describe('AppContainer State Management', () => { expect(mockHandleSlashCommand).not.toHaveBeenCalled(); pressKey('\x04'); // Ctrl+D - // Now count is 2, it should quit. - expect(mockHandleSlashCommand).toHaveBeenCalledWith( - '/quit', - undefined, - undefined, - false, - ); + // It should still not quit because buffer is non-empty. 
+ expect(mockHandleSlashCommand).not.toHaveBeenCalled(); unmount(); }); diff --git a/packages/cli/src/ui/AppContainer.tsx b/packages/cli/src/ui/AppContainer.tsx index 9d05f54347..326d02b250 100644 --- a/packages/cli/src/ui/AppContainer.tsx +++ b/packages/cli/src/ui/AppContainer.tsx @@ -30,8 +30,6 @@ import { import { ConfigContext } from './contexts/ConfigContext.js'; import { type HistoryItem, - type HistoryItemWithoutId, - type HistoryItemToolGroup, AuthState, type ConfirmationRequest, type PermissionConfirmationRequest, @@ -81,7 +79,6 @@ import { type AgentsDiscoveredPayload, ChangeAuthRequestedError, ProjectIdRequiredError, - CoreToolCallStatus, buildUserSteeringHintPrompt, logBillingEvent, ApiKeyUpdatedEvent, @@ -170,29 +167,11 @@ import { useIsHelpDismissKey } from './utils/shortcutsHelp.js'; import { useSuspend } from './hooks/useSuspend.js'; import { useRunEventNotifications } from './hooks/useRunEventNotifications.js'; import { isNotificationsEnabled } from '../utils/terminalNotifications.js'; - -function isToolExecuting(pendingHistoryItems: HistoryItemWithoutId[]) { - return pendingHistoryItems.some((item) => { - if (item && item.type === 'tool_group') { - return item.tools.some( - (tool) => CoreToolCallStatus.Executing === tool.status, - ); - } - return false; - }); -} - -function isToolAwaitingConfirmation( - pendingHistoryItems: HistoryItemWithoutId[], -) { - return pendingHistoryItems - .filter((item): item is HistoryItemToolGroup => item.type === 'tool_group') - .some((item) => - item.tools.some( - (tool) => CoreToolCallStatus.AwaitingApproval === tool.status, - ), - ); -} +import { + isToolExecuting, + isToolAwaitingConfirmation, + getAllToolCalls, +} from './utils/historyUtils.js'; interface AppContainerProps { config: Config; @@ -1151,6 +1130,16 @@ Logging in with Google... Restarting Gemini CLI to continue. 
consumePendingHints, ); + const pendingHistoryItems = useMemo( + () => [...pendingSlashCommandHistoryItems, ...pendingGeminiHistoryItems], + [pendingSlashCommandHistoryItems, pendingGeminiHistoryItems], + ); + + const hasPendingToolConfirmation = useMemo( + () => isToolAwaitingConfirmation(pendingHistoryItems), + [pendingHistoryItems], + ); + toggleBackgroundShellRef.current = toggleBackgroundShell; isBackgroundShellVisibleRef.current = isBackgroundShellVisible; backgroundShellsRef.current = backgroundShells; @@ -1222,10 +1211,6 @@ Logging in with Google... Restarting Gemini CLI to continue. cancelHandlerRef.current = useCallback( (shouldRestorePrompt: boolean = true) => { - const pendingHistoryItems = [ - ...pendingSlashCommandHistoryItems, - ...pendingGeminiHistoryItems, - ]; if (isToolAwaitingConfirmation(pendingHistoryItems)) { return; // Don't clear - user may be composing a follow-up message } @@ -1259,8 +1244,7 @@ Logging in with Google... Restarting Gemini CLI to continue. inputHistory, getQueuedMessagesText, clearQueue, - pendingSlashCommandHistoryItems, - pendingGeminiHistoryItems, + pendingHistoryItems, ], ); @@ -1296,10 +1280,7 @@ Logging in with Google... Restarting Gemini CLI to continue. const isIdle = streamingState === StreamingState.Idle; const isAgentRunning = streamingState === StreamingState.Responding || - isToolExecuting([ - ...pendingSlashCommandHistoryItems, - ...pendingGeminiHistoryItems, - ]); + isToolExecuting(pendingHistoryItems); if (isSlash && isAgentRunning) { const { commandToExecute } = parseSlashCommand( @@ -1361,8 +1342,7 @@ Logging in with Google... Restarting Gemini CLI to continue. isMcpReady, streamingState, messageQueue.length, - pendingSlashCommandHistoryItems, - pendingGeminiHistoryItems, + pendingHistoryItems, config, constrainHeight, setConstrainHeight, @@ -1406,7 +1386,8 @@ Logging in with Google... Restarting Gemini CLI to continue. 
!isResuming && !!slashCommands && (streamingState === StreamingState.Idle || - streamingState === StreamingState.Responding) && + streamingState === StreamingState.Responding || + streamingState === StreamingState.WaitingForConfirmation) && !proQuotaRequest; const [controlsHeight, setControlsHeight] = useState(0); @@ -1419,7 +1400,7 @@ Logging in with Google... Restarting Gemini CLI to continue. setControlsHeight(roundedHeight); } } - }, [buffer, terminalWidth, terminalHeight, controlsHeight]); + }, [buffer, terminalWidth, terminalHeight, controlsHeight, isInputActive]); // Compute available terminal height based on controls measurement const availableTerminalHeight = Math.max( @@ -1673,17 +1654,13 @@ Logging in with Google... Restarting Gemini CLI to continue. [handleSlashCommand, settings], ); - const { elapsedTime, currentLoadingPhrase } = useLoadingIndicator({ - streamingState, - shouldShowFocusHint, - retryStatus, - loadingPhrasesMode: settings.merged.ui.loadingPhrases, - customWittyPhrases: settings.merged.ui.customWittyPhrases, - errorVerbosity: settings.merged.ui.errorVerbosity, - }); - const handleGlobalKeypress = useCallback( (key: Key): boolean => { + // Debug log keystrokes if enabled + if (settings.merged.general.debugKeystrokeLogging) { + debugLogger.log('[DEBUG] Keystroke:', JSON.stringify(key)); + } + if (shortcutsHelpVisible && isHelpDismissKey(key)) { setShortcutsHelpVisible(false); } @@ -1702,6 +1679,10 @@ Logging in with Google... Restarting Gemini CLI to continue. handleCtrlCPress(); return true; } else if (keyMatchers[Command.EXIT](key)) { + // If the input field is non-empty, do not exit. + if (bufferRef.current.text.length > 0) { + return false; + } handleCtrlDPress(); return true; } else if (keyMatchers[Command.SUSPEND_APP](key)) { @@ -1862,6 +1843,7 @@ Logging in with Google... Restarting Gemini CLI to continue. 
activePtyId, handleSuspend, embeddedShellFocused, + settings.merged.general.debugKeystrokeLogging, refreshStatic, setCopyModeEnabled, tabFocusTimeoutRef, @@ -2022,16 +2004,6 @@ Logging in with Google... Restarting Gemini CLI to continue. authState === AuthState.AwaitingApiKeyInput || !!newAgents; - const pendingHistoryItems = useMemo( - () => [...pendingSlashCommandHistoryItems, ...pendingGeminiHistoryItems], - [pendingSlashCommandHistoryItems, pendingGeminiHistoryItems], - ); - - const hasPendingToolConfirmation = useMemo( - () => isToolAwaitingConfirmation(pendingHistoryItems), - [pendingHistoryItems], - ); - const hasConfirmUpdateExtensionRequests = confirmUpdateExtensionRequests.length > 0; const hasLoopDetectionConfirmationRequest = @@ -2049,6 +2021,48 @@ Logging in with Google... Restarting Gemini CLI to continue. !!emptyWalletRequest || !!customDialog; + const loadingPhrases = settings.merged.ui.loadingPhrases; + const showStatusTips = loadingPhrases === 'tips' || loadingPhrases === 'all'; + const showStatusWit = loadingPhrases === 'witty' || loadingPhrases === 'all'; + + const showLoadingIndicator = + (!embeddedShellFocused || isBackgroundShellVisible) && + streamingState === StreamingState.Responding && + !hasPendingActionRequired; + + let estimatedStatusLength = 0; + if (activeHooks.length > 0 && settings.merged.hooksConfig.notifications) { + const hookLabel = + activeHooks.length > 1 ? 'Executing Hooks' : 'Executing Hook'; + const hookNames = activeHooks + .map( + (h) => + h.name + + (h.index && h.total && h.total > 1 ? 
` (${h.index}/${h.total})` : ''), + ) + .join(', '); + estimatedStatusLength = hookLabel.length + hookNames.length + 10; + } else if (showLoadingIndicator) { + const thoughtText = thought?.subject || 'Waiting for model...'; + estimatedStatusLength = thoughtText.length + 25; + } else if (hasPendingActionRequired) { + estimatedStatusLength = 35; + } + + const maxLength = terminalWidth - estimatedStatusLength - 5; + + const { elapsedTime, currentLoadingPhrase, currentTip, currentWittyPhrase } = + useLoadingIndicator({ + streamingState, + shouldShowFocusHint, + retryStatus, + showTips: showStatusTips, + showWit: showStatusWit, + customWittyPhrases: settings.merged.ui.customWittyPhrases, + errorVerbosity: settings.merged.ui.errorVerbosity, + maxLength, + }); + const allowPlanMode = config.isPlanEnabled() && streamingState === StreamingState.Idle && @@ -2121,12 +2135,7 @@ Logging in with Google... Restarting Gemini CLI to continue. ]); const allToolCalls = useMemo( - () => - pendingHistoryItems - .filter( - (item): item is HistoryItemToolGroup => item.type === 'tool_group', - ) - .flatMap((item) => item.tools), + () => getAllToolCalls(pendingHistoryItems), [pendingHistoryItems], ); @@ -2234,6 +2243,8 @@ Logging in with Google... Restarting Gemini CLI to continue. isFocused, elapsedTime, currentLoadingPhrase, + currentTip, + currentWittyPhrase, historyRemountKey, activeHooks, messageQueue, @@ -2291,11 +2302,7 @@ Logging in with Google... Restarting Gemini CLI to continue. newAgents, showIsExpandableHint, hintMode: - config.isModelSteeringEnabled() && - isToolExecuting([ - ...pendingSlashCommandHistoryItems, - ...pendingGeminiHistoryItems, - ]), + config.isModelSteeringEnabled() && isToolExecuting(pendingHistoryItems), hintBuffer: '', }), [ @@ -2361,6 +2368,8 @@ Logging in with Google... Restarting Gemini CLI to continue. 
isFocused, elapsedTime, currentLoadingPhrase, + currentTip, + currentWittyPhrase, historyRemountKey, activeHooks, messageQueue, diff --git a/packages/cli/src/ui/IdeIntegrationNudge.test.tsx b/packages/cli/src/ui/IdeIntegrationNudge.test.tsx index eb3e6a3e4c..d05a17dad8 100644 --- a/packages/cli/src/ui/IdeIntegrationNudge.test.tsx +++ b/packages/cli/src/ui/IdeIntegrationNudge.test.tsx @@ -42,6 +42,7 @@ describe('IdeIntegrationNudge', () => { beforeEach(() => { vi.mocked(debugLogger.warn).mockImplementation((...args) => { if ( + // eslint-disable-next-line no-restricted-syntax typeof args[0] === 'string' && /was not wrapped in act/.test(args[0]) ) { diff --git a/packages/cli/src/ui/ToolConfirmationFullFrame.test.tsx b/packages/cli/src/ui/ToolConfirmationFullFrame.test.tsx new file mode 100644 index 0000000000..c8456fb237 --- /dev/null +++ b/packages/cli/src/ui/ToolConfirmationFullFrame.test.tsx @@ -0,0 +1,179 @@ +/** + * @license + * Copyright 2026 Google LLC + * SPDX-License-Identifier: Apache-2.0 + */ +import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest'; +import { cleanup, renderWithProviders } from '../test-utils/render.js'; +import { createMockSettings } from '../test-utils/settings.js'; +import { App } from './App.js'; +import { + CoreToolCallStatus, + ApprovalMode, + makeFakeConfig, +} from '@google/gemini-cli-core'; +import { type UIState } from './contexts/UIStateContext.js'; +import type { SerializableConfirmationDetails } from '@google/gemini-cli-core'; +import { act } from 'react'; +import { StreamingState } from './types.js'; + +vi.mock('ink', async (importOriginal) => { + const original = await importOriginal(); + return { + ...original, + useIsScreenReaderEnabled: vi.fn(() => false), + }; +}); + +vi.mock('./components/GeminiSpinner.js', () => ({ + GeminiSpinner: () => null, +})); + +vi.mock('./components/CliSpinner.js', () => ({ + CliSpinner: () => null, +})); + +// Mock hooks to align with codebase style, even if App uses UIState 
directly +vi.mock('./hooks/useGeminiStream.js'); +vi.mock('./hooks/useHistoryManager.js'); +vi.mock('./hooks/useQuotaAndFallback.js'); +vi.mock('./hooks/useThemeCommand.js'); +vi.mock('./auth/useAuth.js'); +vi.mock('./hooks/useEditorSettings.js'); +vi.mock('./hooks/useSettingsCommand.js'); +vi.mock('./hooks/useModelCommand.js'); +vi.mock('./hooks/slashCommandProcessor.js'); +vi.mock('./hooks/useConsoleMessages.js'); +vi.mock('./hooks/useTerminalSize.js', () => ({ + useTerminalSize: vi.fn(() => ({ columns: 100, rows: 30 })), +})); + +describe('Full Terminal Tool Confirmation Snapshot', () => { + beforeEach(() => { + vi.clearAllMocks(); + }); + + afterEach(() => { + cleanup(); + vi.restoreAllMocks(); + }); + + it('renders tool confirmation box in the frame of the entire terminal', async () => { + // Generate a large diff to warrant truncation + let largeDiff = + '--- a/packages/cli/src/ui/components/InputPrompt.tsx\n+++ b/packages/cli/src/ui/components/InputPrompt.tsx\n@@ -1,100 +1,105 @@\n'; + for (let i = 1; i <= 60; i++) { + largeDiff += ` const line${i} = true;\n`; + } + largeDiff += '- return kittyProtocolSupporte...;\n'; + largeDiff += '+ return kittyProtocolSupporte...;\n'; + largeDiff += ' buffer: TextBuffer;\n'; + largeDiff += ' onSubmit: (value: string) => void;'; + + const confirmationDetails: SerializableConfirmationDetails = { + type: 'edit', + title: 'Edit packages/.../InputPrompt.tsx', + fileName: 'InputPrompt.tsx', + filePath: 'packages/.../InputPrompt.tsx', + fileDiff: largeDiff, + originalContent: 'old', + newContent: 'new', + isModifying: false, + }; + + const toolCalls = [ + { + callId: 'call-1-modify-selected', + name: 'Edit', + description: + 'packages/.../InputPrompt.tsx: return kittyProtocolSupporte... 
=> return kittyProtocolSupporte...', + status: CoreToolCallStatus.AwaitingApproval, + resultDisplay: '', + confirmationDetails, + }, + ]; + + const mockUIState = { + history: [ + { + id: 1, + type: 'user', + text: 'Can you edit InputPrompt.tsx for me?', + }, + ], + mainAreaWidth: 99, + availableTerminalHeight: 36, + streamingState: StreamingState.WaitingForConfirmation, + constrainHeight: true, + isConfigInitialized: true, + cleanUiDetailsVisible: true, + quota: { + userTier: 'PRO', + stats: { + limits: {}, + usage: {}, + }, + proQuotaRequest: null, + validationRequest: null, + }, + pendingHistoryItems: [ + { + id: 2, + type: 'tool_group', + tools: toolCalls, + }, + ], + showApprovalModeIndicator: ApprovalMode.DEFAULT, + sessionStats: { + lastPromptTokenCount: 175400, + contextPercentage: 3, + }, + buffer: { text: '' }, + messageQueue: [], + activeHooks: [], + contextFileNames: [], + rootUiRef: { current: null }, + } as unknown as UIState; + + const mockConfig = makeFakeConfig(); + mockConfig.getUseAlternateBuffer = () => true; + mockConfig.isTrustedFolder = () => true; + mockConfig.getDisableAlwaysAllow = () => false; + mockConfig.getIdeMode = () => false; + mockConfig.getTargetDir = () => '/directory'; + + const { waitUntilReady, lastFrame, generateSvg, unmount } = + await renderWithProviders(, { + uiState: mockUIState, + config: mockConfig, + settings: createMockSettings({ + merged: { + ui: { + useAlternateBuffer: true, + theme: 'default', + showUserIdentity: false, + showShortcutsHint: false, + footer: { + hideContextPercentage: false, + hideTokens: false, + hideModel: false, + }, + }, + security: { + enablePermanentToolApproval: true, + }, + }, + }), + }); + + await waitUntilReady(); + + // Give it a moment to render + await act(async () => { + await new Promise((resolve) => setTimeout(resolve, 500)); + }); + + await expect({ lastFrame, generateSvg }).toMatchSvgSnapshot(); + unmount(); + }); +}); diff --git a/packages/cli/src/ui/__snapshots__/App.test.tsx.snap 
b/packages/cli/src/ui/__snapshots__/App.test.tsx.snap index 9e1d66df01..1d1ebbb3d1 100644 --- a/packages/cli/src/ui/__snapshots__/App.test.tsx.snap +++ b/packages/cli/src/ui/__snapshots__/App.test.tsx.snap @@ -2,10 +2,13 @@ exports[`App > Snapshots > renders default layout correctly 1`] = ` " - ▝▜▄ Gemini CLI v1.2.3 - ▝▜▄ - ▗▟▀ - ▝▀ + ▝▜▄ ▗█▀▀▜▙▝█▛▀▀▌▜██▖▟██▘▜█▘▜██▖▝█▛▝█▛ + ▝▜▄ █▌ █▙▟ ▐█▝█▛▐█ ▐█ ▐█▝█▖█▌ █▌ + ▗▟▀ ▜▙ ▝█▛ █▌▝ ▖▐█ ▐█ ▐█ ▐█ ▝██▌ █▌ + ▝▀ ▀▀▀▀▘▝▀▀▀▀▘▀▀▘ ▀▀▘▀▀▘▀▀▘ ▝▀▀▝▀▀ + + Gemini CLI v1.2.3 + Tips for getting started: @@ -33,8 +36,6 @@ Tips for getting started: - - @@ -47,10 +48,13 @@ exports[`App > Snapshots > renders screen reader layout correctly 1`] = ` "Notifications Footer - ▝▜▄ Gemini CLI v1.2.3 - ▝▜▄ - ▗▟▀ - ▝▀ + ▝▜▄ ▗█▀▀▜▙▝█▛▀▀▌▜██▖▟██▘▜█▘▜██▖▝█▛▝█▛ + ▝▜▄ █▌ █▙▟ ▐█▝█▛▐█ ▐█ ▐█▝█▖█▌ █▌ + ▗▟▀ ▜▙ ▝█▛ █▌▝ ▖▐█ ▐█ ▐█ ▐█ ▝██▌ █▌ + ▝▀ ▀▀▀▀▘▝▀▀▀▀▘▀▀▘ ▀▀▘▀▀▘▀▀▘ ▝▀▀▝▀▀ + + Gemini CLI v1.2.3 + Tips for getting started: @@ -64,11 +68,12 @@ Composer exports[`App > Snapshots > renders with dialogs visible 1`] = ` " - ▝▜▄ Gemini CLI v1.2.3 - ▝▜▄ - ▗▟▀ - ▝▀ + ▝▜▄ ▗█▀▀▜▙▝█▛▀▀▌▜██▖▟██▘▜█▘▜██▖▝█▛▝█▛ + ▝▜▄ █▌ █▙▟ ▐█▝█▛▐█ ▐█ ▐█▝█▖█▌ █▌ + ▗▟▀ ▜▙ ▝█▛ █▌▝ ▖▐█ ▐█ ▐█ ▐█ ▝██▌ █▌ + ▝▀ ▀▀▀▀▘▝▀▀▀▀▘▀▀▘ ▀▀▘▀▀▘▀▀▘ ▝▀▀▝▀▀ + Gemini CLI v1.2.3 @@ -107,10 +112,13 @@ DialogManager exports[`App > should render ToolConfirmationQueue along with Composer when tool is confirming and experiment is on 1`] = ` " - ▝▜▄ Gemini CLI v1.2.3 - ▝▜▄ - ▗▟▀ - ▝▀ + ▝▜▄ ▗█▀▀▜▙▝█▛▀▀▌▜██▖▟██▘▜█▘▜██▖▝█▛▝█▛ + ▝▜▄ █▌ █▙▟ ▐█▝█▛▐█ ▐█ ▐█▝█▖█▌ █▌ + ▗▟▀ ▜▙ ▝█▛ █▌▝ ▖▐█ ▐█ ▐█ ▐█ ▝██▌ █▌ + ▝▀ ▀▀▀▀▘▝▀▀▀▀▘▀▀▘ ▀▀▘▀▀▘▀▀▘ ▝▀▀▝▀▀ + + Gemini CLI v1.2.3 + Tips for getting started: @@ -141,8 +149,6 @@ HistoryItemDisplay - - Notifications Composer " diff --git a/packages/cli/src/ui/__snapshots__/ToolConfirmationFullFrame-Full-Terminal-Tool-Confirmation-Snapshot-renders-tool-confirmation-box-in-the-frame-of-the-entire-terminal.snap.svg 
b/packages/cli/src/ui/__snapshots__/ToolConfirmationFullFrame-Full-Terminal-Tool-Confirmation-Snapshot-renders-tool-confirmation-box-in-the-frame-of-the-entire-terminal.snap.svg new file mode 100644 index 0000000000..be799c5d80 --- /dev/null +++ b/packages/cli/src/ui/__snapshots__/ToolConfirmationFullFrame-Full-Terminal-Tool-Confirmation-Snapshot-renders-tool-confirmation-box-in-the-frame-of-the-entire-terminal.snap.svg @@ -0,0 +1,271 @@ + + + + + 3. Ask coding questions, edit code or run commands + 4. Be specific for the best results + + ▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀ + + + > + + Can you edit InputPrompt.tsx for me? + + + ▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄ + ╭─────────────────────────────────────────────────────────────────────────────────────────────────╮ + + Action Required + + + + + ? + Edit + packages/.../InputPrompt.tsx: return kittyProtocolSupporte... => return kittyProto + + + + + + ... first 44 lines hidden (Ctrl+O to show) ... 
+ [SVG markup stripped during extraction; the surviving text nodes (diff lines 45-63 of the InputPrompt.tsx edit, the "Apply this change?" prompt, and options 1-5) duplicate the text snapshot in ToolConfirmationFullFrame.test.tsx.snap] + ╰─────────────────────────────────────────────────────────────────────────────────────────────────╯ + + + \ No newline at end of file diff --git a/packages/cli/src/ui/__snapshots__/ToolConfirmationFullFrame.test.tsx.snap b/packages/cli/src/ui/__snapshots__/ToolConfirmationFullFrame.test.tsx.snap new file mode 100644 index 0000000000..202f814c05 --- /dev/null +++ b/packages/cli/src/ui/__snapshots__/ToolConfirmationFullFrame.test.tsx.snap @@ -0,0 +1,45 @@ +// Vitest Snapshot v1, https://vitest.dev/guide/snapshot.html + +exports[`Full Terminal Tool Confirmation Snapshot > renders tool confirmation box in the frame of the entire terminal 1`] = ` +"3. Ask coding questions, edit code or run commands +4. 
Be specific for the best results +▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀ + > Can you edit InputPrompt.tsx for me? +▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄ +╭─────────────────────────────────────────────────────────────────────────────────────────────────╮ +│ Action Required │ +│ │ +│ ? Edit packages/.../InputPrompt.tsx: return kittyProtocolSupporte... => return kittyProto… │ +│ │ +│ ... first 44 lines hidden (Ctrl+O to show) ... │█ +│ 45 const line45 = true; │█ +│ 46 const line46 = true; │█ +│ 47 const line47 = true; │█ +│ 48 const line48 = true; │█ +│ 49 const line49 = true; │█ +│ 50 const line50 = true; │█ +│ 51 const line51 = true; │█ +│ 52 const line52 = true; │█ +│ 53 const line53 = true; │█ +│ 54 const line54 = true; │█ +│ 55 const line55 = true; │█ +│ 56 const line56 = true; │█ +│ 57 const line57 = true; │█ +│ 58 const line58 = true; │█ +│ 59 const line59 = true; │█ +│ 60 const line60 = true; │█ +│ 61 - return kittyProtocolSupporte...; │█ +│ 61 + return kittyProtocolSupporte...; │█ +│ 62 buffer: TextBuffer; │█ +│ 63 onSubmit: (value: string) => void; │█ +│ Apply this change? │█ +│ │█ +│ ● 1. Allow once │█ +│ 2. Allow for this session │█ +│ 3. Allow for this file in all future sessions │█ +│ 4. Modify with external editor │█ +│ 5. 
No, suggest changes (esc) │█ +│ │█ +╰─────────────────────────────────────────────────────────────────────────────────────────────────╯█ +" +`; diff --git a/packages/cli/src/ui/auth/AuthDialog.test.tsx b/packages/cli/src/ui/auth/AuthDialog.test.tsx index 4837a71490..69593df076 100644 --- a/packages/cli/src/ui/auth/AuthDialog.test.tsx +++ b/packages/cli/src/ui/auth/AuthDialog.test.tsx @@ -254,7 +254,7 @@ describe('AuthDialog', () => { unmount(); }); - it('skips API key dialog on initial setup if env var is present', async () => { + it('always shows API key dialog even when env var is present', async () => { mockedValidateAuthMethod.mockReturnValue(null); vi.stubEnv('GEMINI_API_KEY', 'test-key-from-env'); // props.settings.merged.security.auth.selectedType is undefined here, simulating initial setup @@ -265,12 +265,12 @@ describe('AuthDialog', () => { await handleAuthSelect(AuthType.USE_GEMINI); expect(props.setAuthState).toHaveBeenCalledWith( - AuthState.Unauthenticated, + AuthState.AwaitingApiKeyInput, ); unmount(); }); - it('skips API key dialog if env var is present but empty', async () => { + it('always shows API key dialog even when env var is empty string', async () => { mockedValidateAuthMethod.mockReturnValue(null); vi.stubEnv('GEMINI_API_KEY', ''); // Empty string // props.settings.merged.security.auth.selectedType is undefined here @@ -281,7 +281,7 @@ describe('AuthDialog', () => { await handleAuthSelect(AuthType.USE_GEMINI); expect(props.setAuthState).toHaveBeenCalledWith( - AuthState.Unauthenticated, + AuthState.AwaitingApiKeyInput, ); unmount(); }); @@ -302,10 +302,10 @@ describe('AuthDialog', () => { unmount(); }); - it('skips API key dialog on re-auth if env var is present (cannot edit)', async () => { + it('always shows API key dialog on re-auth even if env var is present', async () => { mockedValidateAuthMethod.mockReturnValue(null); vi.stubEnv('GEMINI_API_KEY', 'test-key-from-env'); - // Simulate that the user has already authenticated once + // 
Simulate switching from a different auth method (e.g., Google Login → API key) props.settings.merged.security.auth.selectedType = AuthType.LOGIN_WITH_GOOGLE; @@ -315,7 +315,7 @@ describe('AuthDialog', () => { await handleAuthSelect(AuthType.USE_GEMINI); expect(props.setAuthState).toHaveBeenCalledWith( - AuthState.Unauthenticated, + AuthState.AwaitingApiKeyInput, ); unmount(); }); diff --git a/packages/cli/src/ui/auth/AuthDialog.tsx b/packages/cli/src/ui/auth/AuthDialog.tsx index c823f606c6..e73d380bf3 100644 --- a/packages/cli/src/ui/auth/AuthDialog.tsx +++ b/packages/cli/src/ui/auth/AuthDialog.tsx @@ -137,13 +137,11 @@ export function AuthDialog({ } if (authType === AuthType.USE_GEMINI) { - if (process.env['GEMINI_API_KEY'] !== undefined) { - setAuthState(AuthState.Unauthenticated); - return; - } else { - setAuthState(AuthState.AwaitingApiKeyInput); - return; - } + // Always show the API key input dialog so the user can + // explicitly enter or confirm their key, regardless of + // whether GEMINI_API_KEY env var or a stored key exists. 
+ setAuthState(AuthState.AwaitingApiKeyInput); + return; } } setAuthState(AuthState.Unauthenticated); diff --git a/packages/cli/src/ui/auth/AuthInProgress.test.tsx b/packages/cli/src/ui/auth/AuthInProgress.test.tsx index 1c392be28d..a387fcb6f3 100644 --- a/packages/cli/src/ui/auth/AuthInProgress.test.tsx +++ b/packages/cli/src/ui/auth/AuthInProgress.test.tsx @@ -42,6 +42,7 @@ describe('AuthInProgress', () => { vi.useFakeTimers(); vi.mocked(debugLogger.error).mockImplementation((...args) => { if ( + // eslint-disable-next-line no-restricted-syntax typeof args[0] === 'string' && args[0].includes('was not wrapped in act') ) { diff --git a/packages/cli/src/ui/commands/rewindCommand.test.tsx b/packages/cli/src/ui/commands/rewindCommand.test.tsx index d93d365a3e..f878091a45 100644 --- a/packages/cli/src/ui/commands/rewindCommand.test.tsx +++ b/packages/cli/src/ui/commands/rewindCommand.test.tsx @@ -38,6 +38,7 @@ vi.mock('@google/gemini-cli-core', async (importOriginal) => { return { ...actual, coreEvents: { + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...actual.coreEvents, emitFeedback: vi.fn(), }, diff --git a/packages/cli/src/ui/components/AppHeader.test.tsx b/packages/cli/src/ui/components/AppHeader.test.tsx index 8ff4caaacf..4dbdbc0052 100644 --- a/packages/cli/src/ui/components/AppHeader.test.tsx +++ b/packages/cli/src/ui/components/AppHeader.test.tsx @@ -8,8 +8,10 @@ import { renderWithProviders, persistentStateMock, } from '../../test-utils/render.js'; +import type { LoadedSettings } from '../../config/settings.js'; import { AppHeader } from './AppHeader.js'; import { describe, it, expect, vi } from 'vitest'; +import { makeFakeConfig } from '@google/gemini-cli-core'; import crypto from 'node:crypto'; vi.mock('../utils/terminalSetup.js', () => ({ @@ -240,4 +242,46 @@ describe('', () => { expect(session2.lastFrame()).not.toContain('Tips'); session2.unmount(); }); + + it('should render the full logo when logged out', async () => { + const 
mockConfig = makeFakeConfig(); + vi.spyOn(mockConfig, 'getContentGeneratorConfig').mockReturnValue({ + authType: undefined, + } as any); // eslint-disable-line @typescript-eslint/no-explicit-any + + const { lastFrame, waitUntilReady, unmount } = await renderWithProviders( + , + { + config: mockConfig, + uiState: { + terminalWidth: 120, + }, + }, + ); + await waitUntilReady(); + + // Check for block characters from the logo + expect(lastFrame()).toContain('▗█▀▀▜▙'); + expect(lastFrame()).toMatchSnapshot(); + unmount(); + }); + + it('should NOT render Tips when ui.hideTips is true', async () => { + const mockConfig = makeFakeConfig(); + const { lastFrame, waitUntilReady, unmount } = await renderWithProviders( + , + { + config: mockConfig, + settings: { + merged: { + ui: { hideTips: true }, + }, + } as unknown as LoadedSettings, + }, + ); + await waitUntilReady(); + + expect(lastFrame()).not.toContain('Tips'); + unmount(); + }); }); diff --git a/packages/cli/src/ui/components/AppHeader.tsx b/packages/cli/src/ui/components/AppHeader.tsx index 0b15f917a6..704b094663 100644 --- a/packages/cli/src/ui/components/AppHeader.tsx +++ b/packages/cli/src/ui/components/AppHeader.tsx @@ -19,6 +19,9 @@ import { CliSpinner } from './CliSpinner.js'; import { isAppleTerminal } from '@google/gemini-cli-core'; +import { longAsciiLogoCompactText } from './AsciiArt.js'; +import { getAsciiArtWidth } from '../utils/textUtils.js'; + interface AppHeaderProps { version: string; showDetails?: boolean; @@ -41,6 +44,18 @@ const MAC_TERMINAL_ICON = `▝▜▄ ▗▟▀ ▗▟▀ `; +/** + * The horizontal padding (in columns) required for metadata (version, identity, etc.) + * when rendered alongside the ASCII logo. + */ +const LOGO_METADATA_PADDING = 20; + +/** + * The terminal width below which we switch to a narrow/column layout to prevent + * UI elements from wrapping or overlapping. 
+ */ +const NARROW_TERMINAL_BREAKPOINT = 60; + export const AppHeader = ({ version, showDetails = true }: AppHeaderProps) => { const settings = useSettings(); const config = useConfig(); @@ -49,70 +64,90 @@ export const AppHeader = ({ version, showDetails = true }: AppHeaderProps) => { const { bannerText } = useBanner(bannerData); const { showTips } = useTips(); + const authType = config.getContentGeneratorConfig()?.authType; + const loggedOut = !authType; + const showHeader = !( settings.merged.ui.hideBanner || config.getScreenReader() ); const ICON = isAppleTerminal() ? MAC_TERMINAL_ICON : DEFAULT_ICON; - if (!showDetails) { - return ( - - {showHeader && ( - - - {ICON} - - - - - Gemini CLI - - v{version} - - + let logoTextArt = ''; + if (loggedOut) { + const widthOfLongLogo = + getAsciiArtWidth(longAsciiLogoCompactText) + LOGO_METADATA_PADDING; + + if (terminalWidth >= widthOfLongLogo) { + logoTextArt = longAsciiLogoCompactText.trim(); + } + } + + // If the terminal is too narrow to fit the icon and metadata (especially long nightly versions) + // side-by-side, we switch to column mode to prevent wrapping. 
+ const isNarrow = terminalWidth < NARROW_TERMINAL_BREAKPOINT; + + const renderLogo = () => ( + + + {ICON} + + {logoTextArt && ( + + {logoTextArt} + + )} + + ); + + const renderMetadata = (isBelow = false) => ( + + {/* Line 1: Gemini CLI vVersion [Updating] */} + + + Gemini CLI + + v{version} + {updateInfo && ( + + + Updating + )} - ); - } + + {showDetails && ( + <> + {/* Line 2: Blank */} + + + {/* Lines 3 & 4: User Identity info (Email /auth and Plan /upgrade) */} + {settings.merged.ui.showUserIdentity !== false && ( + + )} + + )} + + ); + + const useColumnLayout = !!logoTextArt || isNarrow; return ( {showHeader && ( - - - {ICON} - - - {/* Line 1: Gemini CLI vVersion [Updating] */} - - - Gemini CLI - - v{version} - {updateInfo && ( - - - Updating - - - )} - - - {/* Line 2: Blank */} - - - {/* Lines 3 & 4: User Identity info (Email /auth and Plan /upgrade) */} - {settings.merged.ui.showUserIdentity !== false && ( - - )} - + + {renderLogo()} + {useColumnLayout ? ( + {renderMetadata(true)} + ) : ( + renderMetadata(false) + )} )} diff --git a/packages/cli/src/ui/components/AsciiArt.ts b/packages/cli/src/ui/components/AsciiArt.ts index 79eb522c80..40f0eb8296 100644 --- a/packages/cli/src/ui/components/AsciiArt.ts +++ b/packages/cli/src/ui/components/AsciiArt.ts @@ -16,14 +16,14 @@ export const shortAsciiLogo = ` `; export const longAsciiLogo = ` - ███ █████████ ██████████ ██████ ██████ █████ ██████ █████ █████ -░░░███ ███░░░░░███░░███░░░░░█░░██████ ██████ ░░███ ░░██████ ░░███ ░░███ - ░░░███ ███ ░░░ ░███ █ ░ ░███░█████░███ ░███ ░███░███ ░███ ░███ - ░░░███ ░███ ░██████ ░███░░███ ░███ ░███ ░███░░███░███ ░███ - ███░ ░███ █████ ░███░░█ ░███ ░░░ ░███ ░███ ░███ ░░██████ ░███ - ███░ ░░███ ░░███ ░███ ░ █ ░███ ░███ ░███ ░███ ░░█████ ░███ - ███░ ░░█████████ ██████████ █████ █████ █████ █████ ░░█████ █████ -░░░ ░░░░░░░░░ ░░░░░░░░░░ ░░░░░ ░░░░░ ░░░░░ ░░░░░ ░░░░░ ░░░░░ + █████████ ██████████ ██████ ██████ █████ ██████ █████ █████ +███░░░░░███░░███░░░░░█░░██████ █████ ░░███░░██████ 
░░███ ░░███ +███ ░░░░░░░ ░███ █ ░ ░███░█████░███ ░███ ░███░███ ░███ ░███ +░███ ░██████ ░███░░███ ░███ ░███ ░███░░███░███ ░███ +░███ █████ ░███░░█ ░███ ░░░ ░███ ░███ ░███ ░░██████ ░███ +░░███ ░░███ ░███ ░ █ ░███ ░███ ░███ ░███ ░░█████ ░███ + ░░█████████ ██████████ █████ █████ █████ █████ ░░████ █████ + ░░░░░░░░░ ░░░░░░░░░░ ░░░░░ ░░░░░ ░░░░░ ░░░░░ ░░░░ ░░░░░ `; export const tinyAsciiLogo = ` @@ -36,3 +36,24 @@ export const tinyAsciiLogo = ` ███░ ░░█████████ ░░░ ░░░░░░░░░ `; + +export const shortAsciiLogoCompactText = ` +▟▛▀▀█▖▜█▀▀▜▝██▙▗██▛▝█▛▝██▙ ▜█▘▜█▘ +▐█ ▐█▄▌ █▌▜█▘█▌ █▌ █▌▜▙▐█ ▐█ +▝█▖ ▜█▘▐█ ▘▗ █▌ █▌ █▌ █▌ ▜██ ▐█ + ▝▀▀▀▀ ▀▀▀▀▀▝▀▀ ▝▀▀▝▀▀▝▀▀ ▀▀▘▀▀▘ +`; + +export const longAsciiLogoCompactText = ` +▗█▀▀▜▙▝█▛▀▀▌▜██▖▟██▘▜█▘▜██▖▝█▛▝█▛ +█▌ █▙▟ ▐█▝█▛▐█ ▐█ ▐█▝█▖█▌ █▌ +▜▙ ▝█▛ █▌▝ ▖▐█ ▐█ ▐█ ▐█ ▝██▌ █▌ + ▀▀▀▀▘▝▀▀▀▀▘▀▀▘ ▀▀▘▀▀▘▀▀▘ ▝▀▀▝▀▀ +`; + +export const tinyAsciiLogoCompactText = ` +▟▛▀▀█▖ +▐█ +▝█▖ ▜█▘ + ▝▀▀▀▀ +`; diff --git a/packages/cli/src/ui/components/AskUserDialog.test.tsx b/packages/cli/src/ui/components/AskUserDialog.test.tsx index 864800a061..53c820f69e 100644 --- a/packages/cli/src/ui/components/AskUserDialog.test.tsx +++ b/packages/cli/src/ui/components/AskUserDialog.test.tsx @@ -287,7 +287,7 @@ describe('AskUserDialog', () => { }); describe.each([ - { useAlternateBuffer: true, expectedArrows: false }, + { useAlternateBuffer: true, expectedArrows: true }, { useAlternateBuffer: false, expectedArrows: true }, ])( 'Scroll Arrows (useAlternateBuffer: $useAlternateBuffer)', @@ -1453,4 +1453,42 @@ describe('AskUserDialog', () => { }); }); }); + + it('shows at least 3 selection options even in small terminal heights', async () => { + const questions: Question[] = [ + { + question: + 'A very long question that would normally take up most of the space and squeeze the list if we did not have a heuristic to prevent it. This line is just to make it longer. And another one. 
Imagine this is a plan.', + header: 'Test', + type: QuestionType.CHOICE, + options: [ + { label: 'Option 1', description: 'Description 1' }, + { label: 'Option 2', description: 'Description 2' }, + { label: 'Option 3', description: 'Description 3' }, + { label: 'Option 4', description: 'Description 4' }, + ], + multiSelect: false, + }, + ]; + + const { lastFrame, waitUntilReady } = await renderWithProviders( + , + { width: 80 }, + ); + + await waitFor(async () => { + await waitUntilReady(); + const frame = lastFrame(); + // Should show at least 3 options + expect(frame).toContain('1. Option 1'); + expect(frame).toContain('2. Option 2'); + expect(frame).toContain('3. Option 3'); + }); + }); }); diff --git a/packages/cli/src/ui/components/AskUserDialog.tsx b/packages/cli/src/ui/components/AskUserDialog.tsx index b1d23885e6..cbb505320c 100644 --- a/packages/cli/src/ui/components/AskUserDialog.tsx +++ b/packages/cli/src/ui/components/AskUserDialog.tsx @@ -849,16 +849,30 @@ const ChoiceQuestionView: React.FC = ({ ? Math.max(1, availableHeight - overhead) : undefined; + // Reserve space for at least 3 items if more selectionItems available. + const reservedListHeight = Math.min(selectionItems.length * 2, 6); const questionHeightLimit = listHeight && !isAlternateBuffer ? question.unconstrainedHeight ? Math.max(1, listHeight - selectionItems.length * 2) - : Math.min(15, Math.max(1, listHeight - DIALOG_PADDING)) + : Math.min( + 15, + Math.max( + 1, + listHeight - Math.max(DIALOG_PADDING, reservedListHeight), + ), + ) : undefined; const maxItemsToShow = - listHeight && questionHeightLimit - ? Math.max(1, Math.floor((listHeight - questionHeightLimit) / 2)) + listHeight && (!isAlternateBuffer || availableHeight !== undefined) + ? Math.min( + selectionItems.length, + Math.max( + 1, + Math.floor((listHeight - (questionHeightLimit ?? 
0)) / 2), + ), + ) : selectionItems.length; return ( diff --git a/packages/cli/src/ui/components/Composer.test.tsx b/packages/cli/src/ui/components/Composer.test.tsx index 8df5f690e7..1cbb29a06c 100644 --- a/packages/cli/src/ui/components/Composer.test.tsx +++ b/packages/cli/src/ui/components/Composer.test.tsx @@ -17,13 +17,6 @@ import { import { ConfigContext } from '../contexts/ConfigContext.js'; import { SettingsContext } from '../contexts/SettingsContext.js'; import { createMockSettings } from '../../test-utils/settings.js'; -// Mock VimModeContext hook -vi.mock('../contexts/VimModeContext.js', () => ({ - useVimMode: vi.fn(() => ({ - vimEnabled: false, - vimMode: 'INSERT', - })), -})); import { ApprovalMode, tokenLimit, @@ -36,6 +29,21 @@ import type { LoadedSettings } from '../../config/settings.js'; import type { SessionMetrics } from '../contexts/SessionContext.js'; import type { TextBuffer } from './shared/text-buffer.js'; +// Mock VimModeContext hook +vi.mock('../contexts/VimModeContext.js', () => ({ + useVimMode: vi.fn(() => ({ + vimEnabled: false, + vimMode: 'INSERT', + })), +})); + +vi.mock('../hooks/useTerminalSize.js', () => ({ + useTerminalSize: vi.fn(() => ({ + columns: 100, + rows: 24, + })), +})); + const composerTestControls = vi.hoisted(() => ({ suggestionsVisible: false, isAlternateBuffer: false, @@ -58,18 +66,9 @@ vi.mock('./LoadingIndicator.js', () => ({ })); vi.mock('./StatusDisplay.js', () => ({ - StatusDisplay: () => StatusDisplay, -})); - -vi.mock('./ToastDisplay.js', () => ({ - ToastDisplay: () => ToastDisplay, - shouldShowToast: (uiState: UIState) => - uiState.ctrlCPressedOnce || - Boolean(uiState.transientMessage) || - uiState.ctrlDPressedOnce || - (uiState.showEscapePrompt && - (uiState.buffer.text.length > 0 || uiState.history.length > 0)) || - Boolean(uiState.queueErrorMessage), + StatusDisplay: ({ hideContextSummary }: { hideContextSummary: boolean }) => ( + StatusDisplay{hideContextSummary ? 
' (hidden summary)' : ''} + ), })); vi.mock('./ContextSummaryDisplay.js', () => ({ @@ -81,17 +80,15 @@ vi.mock('./HookStatusDisplay.js', () => ({ })); vi.mock('./ApprovalModeIndicator.js', () => ({ - ApprovalModeIndicator: () => ApprovalModeIndicator, + ApprovalModeIndicator: ({ approvalMode }: { approvalMode: ApprovalMode }) => ( + ApprovalModeIndicator: {approvalMode} + ), })); vi.mock('./ShellModeIndicator.js', () => ({ ShellModeIndicator: () => ShellModeIndicator, })); -vi.mock('./ShortcutsHint.js', () => ({ - ShortcutsHint: () => ShortcutsHint, -})); - vi.mock('./ShortcutsHelp.js', () => ({ ShortcutsHelp: () => ShortcutsHelp, })); @@ -174,6 +171,8 @@ const createMockUIState = (overrides: Partial = {}): UIState => isFocused: true, thought: '', currentLoadingPhrase: '', + currentTip: '', + currentWittyPhrase: '', elapsedTime: 0, ctrlCPressedOnce: false, ctrlDPressedOnce: false, @@ -201,6 +200,7 @@ const createMockUIState = (overrides: Partial = {}): UIState => activeHooks: [], isBackgroundShellVisible: false, embeddedShellFocused: false, + showIsExpandableHint: false, quota: { userTier: undefined, stats: undefined, @@ -247,7 +247,7 @@ const createMockConfig = (overrides = {}): Config => const renderComposer = async ( uiState: UIState, - settings = createMockSettings(), + settings = createMockSettings({ ui: {} }), config = createMockConfig(), uiActions = createMockUIActions(), ) => { @@ -256,7 +256,7 @@ const renderComposer = async ( - + @@ -383,10 +383,12 @@ describe('Composer', () => { const { lastFrame } = await renderComposer(uiState, settings); const output = lastFrame(); - expect(output).toContain('LoadingIndicator: Thinking...'); + // In Refreshed UX, we don't force 'Thinking...' 
label in renderStatusNode + // It uses the subject directly + expect(output).toContain('LoadingIndicator: Thinking about code'); }); - it('hides shortcuts hint while loading', async () => { + it('shows shortcuts hint while loading', async () => { const uiState = createMockUIState({ streamingState: StreamingState.Responding, elapsedTime: 1, @@ -397,7 +399,8 @@ describe('Composer', () => { const output = lastFrame(); expect(output).toContain('LoadingIndicator'); - expect(output).not.toContain('ShortcutsHint'); + expect(output).toContain('press tab twice for more'); + expect(output).not.toContain('? for shortcuts'); }); it('renders LoadingIndicator with thought when loadingPhrases is off', async () => { @@ -453,9 +456,8 @@ describe('Composer', () => { const { lastFrame } = await renderComposer(uiState); - const output = lastFrame(); - expect(output).not.toContain('LoadingIndicator'); - expect(output).not.toContain('esc to cancel'); + const output = lastFrame({ allowEmpty: true }); + expect(output).toBe(''); }); it('renders LoadingIndicator when embedded shell is focused but background shell is visible', async () => { @@ -558,8 +560,10 @@ describe('Composer', () => { const { lastFrame } = await renderComposer(uiState); const output = lastFrame(); - expect(output).toContain('ToastDisplay'); - expect(output).not.toContain('ApprovalModeIndicator'); + expect(output).toContain('Press Ctrl+C again to exit.'); + // In Refreshed UX, Row 1 shows toast, and Row 2 shows ApprovalModeIndicator/StatusDisplay + // They are no longer mutually exclusive. 
+ expect(output).toContain('ApprovalModeIndicator'); expect(output).toContain('StatusDisplay'); }); @@ -574,8 +578,8 @@ describe('Composer', () => { const { lastFrame } = await renderComposer(uiState); const output = lastFrame(); - expect(output).toContain('ToastDisplay'); - expect(output).not.toContain('ApprovalModeIndicator'); + expect(output).toContain('Warning'); + expect(output).toContain('ApprovalModeIndicator'); }); }); @@ -584,15 +588,17 @@ describe('Composer', () => { const uiState = createMockUIState({ cleanUiDetailsVisible: false, }); + const settings = createMockSettings({ + ui: { showShortcutsHint: false }, + }); - const { lastFrame } = await renderComposer(uiState); + const { lastFrame } = await renderComposer(uiState, settings); const output = lastFrame(); - expect(output).toContain('ShortcutsHint'); + expect(output).not.toContain('press tab twice for more'); + expect(output).not.toContain('? for shortcuts'); expect(output).toContain('InputPrompt'); expect(output).not.toContain('Footer'); - expect(output).not.toContain('ApprovalModeIndicator'); - expect(output).not.toContain('ContextSummaryDisplay'); }); it('renders InputPrompt when input is active', async () => { @@ -665,12 +671,15 @@ describe('Composer', () => { }); it.each([ - [ApprovalMode.YOLO, 'YOLO'], - [ApprovalMode.PLAN, 'plan'], - [ApprovalMode.AUTO_EDIT, 'auto edit'], + { mode: ApprovalMode.YOLO, label: '● YOLO' }, + { mode: ApprovalMode.PLAN, label: '● plan' }, + { + mode: ApprovalMode.AUTO_EDIT, + label: '● auto edit', + }, ])( - 'shows minimal mode badge "%s" when clean UI details are hidden', - async (mode, label) => { + 'shows minimal mode badge "$mode" when clean UI details are hidden', + async ({ mode, label }) => { const uiState = createMockUIState({ cleanUiDetailsVisible: false, showApprovalModeIndicator: mode, @@ -693,7 +702,8 @@ describe('Composer', () => { const output = lastFrame(); expect(output).toContain('LoadingIndicator'); expect(output).not.toContain('plan'); - 
expect(output).not.toContain('ShortcutsHint'); + expect(output).toContain('press tab twice for more'); + expect(output).not.toContain('? for shortcuts'); }); it('hides minimal mode badge while action-required state is active', async () => { @@ -708,9 +718,7 @@ describe('Composer', () => { }); const { lastFrame } = await renderComposer(uiState); - const output = lastFrame(); - expect(output).not.toContain('plan'); - expect(output).not.toContain('ShortcutsHint'); + expect(lastFrame({ allowEmpty: true })).toBe(''); }); it('shows Esc rewind prompt in minimal mode without showing full UI', async () => { @@ -722,7 +730,7 @@ describe('Composer', () => { const { lastFrame } = await renderComposer(uiState); const output = lastFrame(); - expect(output).toContain('ToastDisplay'); + expect(output).toContain('Press Esc again to rewind.'); expect(output).not.toContain('ContextSummaryDisplay'); }); @@ -747,7 +755,14 @@ describe('Composer', () => { }); const { lastFrame } = await renderComposer(uiState, settings); - expect(lastFrame()).toContain('%'); + + await act(async () => { + await vi.advanceTimersByTimeAsync(250); + }); + + // StatusDisplay (which contains ContextUsageDisplay) should bleed through in minimal mode + expect(lastFrame()).toContain('StatusDisplay'); + expect(lastFrame()).toContain('70% used'); }); }); @@ -812,14 +827,20 @@ describe('Composer', () => { describe('Shortcuts Hint', () => { it('restores shortcuts hint after 200ms debounce when buffer is empty', async () => { - const { lastFrame } = await renderComposer( - createMockUIState({ - buffer: { text: '' } as unknown as TextBuffer, - cleanUiDetailsVisible: false, - }), - ); + const uiState = createMockUIState({ + buffer: { text: '' } as unknown as TextBuffer, + cleanUiDetailsVisible: false, + }); - expect(lastFrame({ allowEmpty: true })).toContain('ShortcutsHint'); + const { lastFrame } = await renderComposer(uiState); + + await act(async () => { + await vi.advanceTimersByTimeAsync(250); + }); + + 
expect(lastFrame({ allowEmpty: true })).toContain( + 'press tab twice for more', + ); }); it('hides shortcuts hint when text is typed in buffer', async () => { @@ -830,7 +851,8 @@ describe('Composer', () => { const { lastFrame } = await renderComposer(uiState); - expect(lastFrame()).not.toContain('ShortcutsHint'); + expect(lastFrame()).not.toContain('press tab twice for more'); + expect(lastFrame()).not.toContain('? for shortcuts'); }); it('hides shortcuts hint when showShortcutsHint setting is false', async () => { @@ -843,7 +865,7 @@ describe('Composer', () => { const { lastFrame } = await renderComposer(uiState, settings); - expect(lastFrame()).not.toContain('ShortcutsHint'); + expect(lastFrame()).not.toContain('? for shortcuts'); }); it('hides shortcuts hint when a action is required (e.g. dialog is open)', async () => { @@ -856,9 +878,10 @@ describe('Composer', () => { ), }); - const { lastFrame } = await renderComposer(uiState); + const { lastFrame, unmount } = await renderComposer(uiState); - expect(lastFrame()).not.toContain('ShortcutsHint'); + expect(lastFrame({ allowEmpty: true })).toBe(''); + unmount(); }); it('keeps shortcuts hint visible when no action is required', async () => { @@ -868,7 +891,11 @@ describe('Composer', () => { const { lastFrame } = await renderComposer(uiState); - expect(lastFrame()).toContain('ShortcutsHint'); + await act(async () => { + await vi.advanceTimersByTimeAsync(250); + }); + + expect(lastFrame()).toContain('press tab twice for more'); }); it('shows shortcuts hint when full UI details are visible', async () => { @@ -878,10 +905,15 @@ describe('Composer', () => { const { lastFrame } = await renderComposer(uiState); - expect(lastFrame()).toContain('ShortcutsHint'); + await act(async () => { + await vi.advanceTimersByTimeAsync(250); + }); + + // In Refreshed UX, shortcuts hint is in the top multipurpose status row + expect(lastFrame()).toContain('? 
for shortcuts'); }); - it('hides shortcuts hint while loading when full UI details are visible', async () => { + it('shows shortcuts hint while loading when full UI details are visible', async () => { const uiState = createMockUIState({ cleanUiDetailsVisible: true, streamingState: StreamingState.Responding, @@ -889,10 +921,17 @@ describe('Composer', () => { const { lastFrame } = await renderComposer(uiState); - expect(lastFrame()).not.toContain('ShortcutsHint'); + await act(async () => { + await vi.advanceTimersByTimeAsync(250); + }); + + // In experimental layout, status row is visible during loading + expect(lastFrame()).toContain('LoadingIndicator'); + expect(lastFrame()).toContain('? for shortcuts'); + expect(lastFrame()).not.toContain('press tab twice for more'); }); - it('hides shortcuts hint while loading in minimal mode', async () => { + it('shows shortcuts hint while loading in minimal mode', async () => { const uiState = createMockUIState({ cleanUiDetailsVisible: false, streamingState: StreamingState.Responding, @@ -901,7 +940,14 @@ describe('Composer', () => { const { lastFrame } = await renderComposer(uiState); - expect(lastFrame()).not.toContain('ShortcutsHint'); + await act(async () => { + await vi.advanceTimersByTimeAsync(250); + }); + + // In experimental layout, status row is visible in clean mode while busy + expect(lastFrame()).toContain('LoadingIndicator'); + expect(lastFrame()).toContain('press tab twice for more'); + expect(lastFrame()).not.toContain('? for shortcuts'); }); it('shows shortcuts help in minimal mode when toggled on', async () => { @@ -926,7 +972,8 @@ describe('Composer', () => { const { lastFrame } = await renderComposer(uiState); - expect(lastFrame()).not.toContain('ShortcutsHint'); + expect(lastFrame()).not.toContain('press tab twice for more'); + expect(lastFrame()).not.toContain('? 
for shortcuts'); expect(lastFrame()).not.toContain('plan'); }); @@ -954,7 +1001,12 @@ describe('Composer', () => { const { lastFrame } = await renderComposer(uiState); - expect(lastFrame()).toContain('ShortcutsHint'); + await act(async () => { + await vi.advanceTimersByTimeAsync(250); + }); + + // In Refreshed UX, shortcuts hint is in the top status row and doesn't collide with suggestions below + expect(lastFrame()).toContain('press tab twice for more'); }); }); @@ -982,24 +1034,22 @@ describe('Composer', () => { expect(lastFrame()).not.toContain('ShortcutsHelp'); unmount(); }); - it('hides shortcuts help when action is required', async () => { const uiState = createMockUIState({ shortcutsHelpVisible: true, customDialog: ( - Dialog content + Test Dialog ), }); const { lastFrame, unmount } = await renderComposer(uiState); - expect(lastFrame()).not.toContain('ShortcutsHelp'); + expect(lastFrame({ allowEmpty: true })).toBe(''); unmount(); }); }); - describe('Snapshots', () => { it('matches snapshot in idle state', async () => { const uiState = createMockUIState(); diff --git a/packages/cli/src/ui/components/Composer.tsx b/packages/cli/src/ui/components/Composer.tsx index 89c9c9d3d6..042f50776d 100644 --- a/packages/cli/src/ui/components/Composer.tsx +++ b/packages/cli/src/ui/components/Composer.tsx @@ -4,58 +4,63 @@ * SPDX-License-Identifier: Apache-2.0 */ -import { useState, useEffect, useMemo } from 'react'; -import { Box, Text, useIsScreenReaderEnabled } from 'ink'; import { ApprovalMode, checkExhaustive, CoreToolCallStatus, + isUserVisibleHook, } from '@google/gemini-cli-core'; +import { Box, Text, useIsScreenReaderEnabled } from 'ink'; +import { useState, useEffect, useMemo } from 'react'; +import { useConfig } from '../contexts/ConfigContext.js'; +import { useSettings } from '../contexts/SettingsContext.js'; +import { useUIState } from '../contexts/UIStateContext.js'; +import { useUIActions } from '../contexts/UIActionsContext.js'; +import { useVimMode } from 
'../contexts/VimModeContext.js'; +import { useAlternateBuffer } from '../hooks/useAlternateBuffer.js'; +import { useTerminalSize } from '../hooks/useTerminalSize.js'; +import { isNarrowWidth } from '../utils/isNarrowWidth.js'; +import { isContextUsageHigh } from '../utils/contextUsage.js'; +import { theme } from '../semantic-colors.js'; +import { GENERIC_WORKING_LABEL } from '../textConstants.js'; +import { INTERACTIVE_SHELL_WAITING_PHRASE } from '../hooks/usePhraseCycler.js'; +import { StreamingState, type HistoryItemToolGroup } from '../types.js'; import { LoadingIndicator } from './LoadingIndicator.js'; +import { ContextUsageDisplay } from './ContextUsageDisplay.js'; import { StatusDisplay } from './StatusDisplay.js'; +import { HorizontalLine } from './shared/HorizontalLine.js'; import { ToastDisplay, shouldShowToast } from './ToastDisplay.js'; import { ApprovalModeIndicator } from './ApprovalModeIndicator.js'; import { ShellModeIndicator } from './ShellModeIndicator.js'; import { DetailedMessagesDisplay } from './DetailedMessagesDisplay.js'; import { RawMarkdownIndicator } from './RawMarkdownIndicator.js'; -import { ShortcutsHint } from './ShortcutsHint.js'; import { ShortcutsHelp } from './ShortcutsHelp.js'; import { InputPrompt } from './InputPrompt.js'; import { Footer } from './Footer.js'; import { ShowMoreLines } from './ShowMoreLines.js'; import { QueuedMessageDisplay } from './QueuedMessageDisplay.js'; -import { ContextUsageDisplay } from './ContextUsageDisplay.js'; -import { HorizontalLine } from './shared/HorizontalLine.js'; import { OverflowProvider } from '../contexts/OverflowContext.js'; -import { isNarrowWidth } from '../utils/isNarrowWidth.js'; -import { useUIState } from '../contexts/UIStateContext.js'; -import { useUIActions } from '../contexts/UIActionsContext.js'; -import { useVimMode } from '../contexts/VimModeContext.js'; -import { useConfig } from '../contexts/ConfigContext.js'; -import { useSettings } from '../contexts/SettingsContext.js'; 
-import { useAlternateBuffer } from '../hooks/useAlternateBuffer.js'; -import { StreamingState, type HistoryItemToolGroup } from '../types.js'; -import { ConfigInitDisplay } from '../components/ConfigInitDisplay.js'; +import { ConfigInitDisplay } from './ConfigInitDisplay.js'; import { TodoTray } from './messages/Todo.js'; -import { getInlineThinkingMode } from '../utils/inlineThinkingMode.js'; -import { isContextUsageHigh } from '../utils/contextUsage.js'; -import { theme } from '../semantic-colors.js'; export const Composer = ({ isFocused = true }: { isFocused?: boolean }) => { - const config = useConfig(); - const settings = useSettings(); - const isScreenReaderEnabled = useIsScreenReaderEnabled(); const uiState = useUIState(); const uiActions = useUIActions(); + const settings = useSettings(); + const config = useConfig(); const { vimEnabled, vimMode } = useVimMode(); - const inlineThinkingMode = getInlineThinkingMode(settings); - const terminalWidth = uiState.terminalWidth; + const isScreenReaderEnabled = useIsScreenReaderEnabled(); + const { columns: terminalWidth } = useTerminalSize(); const isNarrow = isNarrowWidth(terminalWidth); const debugConsoleMaxHeight = Math.floor(Math.max(terminalWidth * 0.2, 5)); const [suggestionsVisible, setSuggestionsVisible] = useState(false); const isAlternateBuffer = useAlternateBuffer(); - const { showApprovalModeIndicator } = uiState; + const showApprovalModeIndicator = uiState.showApprovalModeIndicator; + const loadingPhrases = settings.merged.ui.loadingPhrases; + const showTips = loadingPhrases === 'tips' || loadingPhrases === 'all'; + const showWit = loadingPhrases === 'witty' || loadingPhrases === 'all'; + const showUiDetails = uiState.cleanUiDetailsVisible; const suggestionsPosition = isAlternateBuffer ? 
'above' : 'below'; const hideContextSummary = @@ -84,6 +89,7 @@ export const Composer = ({ isFocused = true }: { isFocused?: boolean }) => { Boolean(uiState.quota.proQuotaRequest) || Boolean(uiState.quota.validationRequest) || Boolean(uiState.customDialog); + const isPassiveShortcutsHelpState = uiState.isInputActive && uiState.streamingState === StreamingState.Idle && @@ -105,16 +111,30 @@ export const Composer = ({ isFocused = true }: { isFocused?: boolean }) => { uiState.shortcutsHelpVisible && uiState.streamingState === StreamingState.Idle && !hasPendingActionRequired; + + /** + * Use the setting if provided, otherwise default to true for the new UX. + * This allows tests to override the collapse behavior. + */ + const shouldCollapseDuringApproval = + settings.merged.ui.collapseDrawerDuringApproval !== false; + + if (hasPendingActionRequired && shouldCollapseDuringApproval) { + return null; + } + const hasToast = shouldShowToast(uiState); const showLoadingIndicator = (!uiState.embeddedShellFocused || uiState.isBackgroundShellVisible) && uiState.streamingState === StreamingState.Responding && !hasPendingActionRequired; + const hideUiDetailsForSuggestions = suggestionsVisible && suggestionsPosition === 'above'; const showApprovalIndicator = !uiState.shellModeActive && !hideUiDetailsForSuggestions; const showRawMarkdownIndicator = !uiState.renderMarkdown; + let modeBleedThrough: { text: string; color: string } | null = null; switch (showApprovalModeIndicator) { case ApprovalMode.YOLO: @@ -137,57 +157,359 @@ export const Composer = ({ isFocused = true }: { isFocused?: boolean }) => { const hideMinimalModeHintWhileBusy = !showUiDetails && (showLoadingIndicator || hasPendingActionRequired); - const minimalModeBleedThrough = hideMinimalModeHintWhileBusy - ? 
null - : modeBleedThrough; - const hasMinimalStatusBleedThrough = shouldShowToast(uiState); - const showMinimalContextBleedThrough = - !settings.merged.ui.footer.hideContextPercentage && - isContextUsageHigh( - uiState.sessionStats.lastPromptTokenCount, - typeof uiState.currentModel === 'string' - ? uiState.currentModel - : undefined, - ); - const hideShortcutsHintForSuggestions = hideUiDetailsForSuggestions; - const isModelIdle = uiState.streamingState === StreamingState.Idle; - const isBufferEmpty = uiState.buffer.text.length === 0; - const canShowShortcutsHint = - isModelIdle && isBufferEmpty && !hasPendingActionRequired; - const [showShortcutsHintDebounced, setShowShortcutsHintDebounced] = - useState(canShowShortcutsHint); + // Universal Content Objects + const modeContentObj = hideMinimalModeHintWhileBusy ? null : modeBleedThrough; - useEffect(() => { - if (!canShowShortcutsHint) { - setShowShortcutsHintDebounced(false); - return; - } - - const timeout = setTimeout(() => { - setShowShortcutsHintDebounced(true); - }, 200); - - return () => clearTimeout(timeout); - }, [canShowShortcutsHint]); + const allHooks = uiState.activeHooks; + const hasAnyHooks = allHooks.length > 0; + const userVisibleHooks = allHooks.filter((h) => isUserVisibleHook(h.source)); + const hasUserVisibleHooks = userVisibleHooks.length > 0; const shouldReserveSpaceForShortcutsHint = - settings.merged.ui.showShortcutsHint && !hideShortcutsHintForSuggestions; - const showShortcutsHint = - shouldReserveSpaceForShortcutsHint && showShortcutsHintDebounced; - const showMinimalModeBleedThrough = - !hideUiDetailsForSuggestions && Boolean(minimalModeBleedThrough); - const showMinimalInlineLoading = !showUiDetails && showLoadingIndicator; - const showMinimalBleedThroughRow = - !showUiDetails && - (showMinimalModeBleedThrough || - hasMinimalStatusBleedThrough || - showMinimalContextBleedThrough); - const showMinimalMetaRow = - !showUiDetails && - (showMinimalInlineLoading || - showMinimalBleedThroughRow 
|| - shouldReserveSpaceForShortcutsHint); + settings.merged.ui.showShortcutsHint && + !hideUiDetailsForSuggestions && + !hasPendingActionRequired; + + const isInteractiveShellWaiting = uiState.currentLoadingPhrase?.includes( + INTERACTIVE_SHELL_WAITING_PHRASE, + ); + + /** + * Calculate the estimated length of the status message to avoid collisions + * with the tips area. + */ + let estimatedStatusLength = 0; + if (hasAnyHooks) { + if (hasUserVisibleHooks) { + const hookLabel = + userVisibleHooks.length > 1 ? 'Executing Hooks' : 'Executing Hook'; + const hookNames = userVisibleHooks + .map( + (h) => + h.name + + (h.index && h.total && h.total > 1 + ? ` (${h.index}/${h.total})` + : ''), + ) + .join(', '); + estimatedStatusLength = hookLabel.length + hookNames.length + 10; + } else { + estimatedStatusLength = GENERIC_WORKING_LABEL.length + 10; + } + } else if (showLoadingIndicator) { + const thoughtText = uiState.thought?.subject || GENERIC_WORKING_LABEL; + const inlineWittyLength = + showWit && uiState.currentWittyPhrase + ? uiState.currentWittyPhrase.length + 1 + : 0; + estimatedStatusLength = thoughtText.length + 25 + inlineWittyLength; + } else if (hasPendingActionRequired) { + estimatedStatusLength = 20; + } else if (hasToast) { + estimatedStatusLength = 40; + } + + /** + * Determine the ambient text (tip) to display. + */ + const tipContentStr = (() => { + // 1. Proactive Tip (Priority) + if ( + showTips && + uiState.currentTip && + !( + isInteractiveShellWaiting && + uiState.currentTip === INTERACTIVE_SHELL_WAITING_PHRASE + ) + ) { + if ( + estimatedStatusLength + uiState.currentTip.length + 10 <= + terminalWidth + ) { + return uiState.currentTip; + } + } + + // 2. Shortcut Hint (Fallback) + if ( + settings.merged.ui.showShortcutsHint && + !hideUiDetailsForSuggestions && + !hasPendingActionRequired && + uiState.buffer.text.length === 0 + ) { + return showUiDetails ? '? 
for shortcuts' : 'press tab twice for more'; + } + + return undefined; + })(); + + const tipLength = tipContentStr?.length || 0; + const willCollideTip = estimatedStatusLength + tipLength + 5 > terminalWidth; + + const showTipLine = + !hasPendingActionRequired && tipContentStr && !willCollideTip && !isNarrow; + + // Mini Mode VIP Flags (Pure Content Triggers) + const miniMode_ShowApprovalMode = + Boolean(modeContentObj) && !hideUiDetailsForSuggestions; + const miniMode_ShowToast = hasToast; + const miniMode_ShowShortcuts = shouldReserveSpaceForShortcutsHint; + const miniMode_ShowStatus = showLoadingIndicator || hasAnyHooks; + const miniMode_ShowTip = showTipLine; + const miniMode_ShowContext = isContextUsageHigh( + uiState.sessionStats.lastPromptTokenCount, + uiState.currentModel, + settings.merged.model?.compressionThreshold, + ); + + // Composite Mini Mode Triggers + const showRow1_MiniMode = + miniMode_ShowToast || + miniMode_ShowStatus || + miniMode_ShowShortcuts || + miniMode_ShowTip; + + const showRow2_MiniMode = miniMode_ShowApprovalMode || miniMode_ShowContext; + + // Final Display Rules (Stable Footer Architecture) + const showRow1 = showUiDetails || showRow1_MiniMode; + const showRow2 = showUiDetails || showRow2_MiniMode; + + const showMinimalBleedThroughRow = !showUiDetails && showRow2_MiniMode; + + const renderTipNode = () => { + if (!tipContentStr) return null; + + const isShortcutHint = + tipContentStr === '? for shortcuts' || + tipContentStr === 'press tab twice for more'; + const color = + isShortcutHint && uiState.shortcutsHelpVisible + ? theme.text.accent + : theme.text.secondary; + + return ( + + + {tipContentStr === uiState.currentTip + ? 
`Tip: ${tipContentStr}` + : tipContentStr} + + + ); + }; + + const renderStatusNode = () => { + const allHooks = uiState.activeHooks; + if (allHooks.length === 0 && !showLoadingIndicator) return null; + + if (allHooks.length > 0) { + const userVisibleHooks = allHooks.filter((h) => + isUserVisibleHook(h.source), + ); + + let hookText = GENERIC_WORKING_LABEL; + if (userVisibleHooks.length > 0) { + const label = + userVisibleHooks.length > 1 ? 'Executing Hooks' : 'Executing Hook'; + const displayNames = userVisibleHooks.map((h) => { + let name = h.name; + if (h.index && h.total && h.total > 1) { + name += ` (${h.index}/${h.total})`; + } + return name; + }); + hookText = `${label}: ${displayNames.join(', ')}`; + } + + return ( + + ); + } + + return ( + + ); + }; + + const statusNode = renderStatusNode(); + + /** + * Renders the minimal metadata row content shown when UI details are hidden. + */ + const renderMinimalMetaRowContent = () => ( + + {renderStatusNode()} + {showMinimalBleedThroughRow && ( + + {miniMode_ShowApprovalMode && modeContentObj && ( + ● {modeContentObj.text} + )} + + )} + + ); + + const renderStatusRow = () => { + // Mini Mode Height Reservation (The "Anti-Jitter" line) + if (!showUiDetails && !showRow1_MiniMode && !showRow2_MiniMode) { + return ; + } + + return ( + + {/* Row 1: multipurpose status (thinking, hooks, wit, tips) */} + {showRow1 && ( + + + {!showUiDetails && showRow1_MiniMode ? ( + renderMinimalMetaRowContent() + ) : isInteractiveShellWaiting ? ( + + + ! Shell awaiting input (Tab to focus) + + + ) : ( + + {statusNode} + + )} + + + + {!isNarrow && showTipLine && renderTipNode()} + + + )} + + {/* Internal Separator Line */} + {showRow1 && + showRow2 && + (showUiDetails || (showRow1_MiniMode && showRow2_MiniMode)) && ( + + + + )} + + {/* Row 2: Mode and Context Summary */} + {showRow2 && ( + + + {showUiDetails ? 
( + <> + {showApprovalIndicator && ( + + )} + {uiState.shellModeActive && ( + + + + )} + {showRawMarkdownIndicator && ( + + + + )} + + ) : ( + miniMode_ShowApprovalMode && + modeContentObj && ( + + ● {modeContentObj.text} + + ) + )} + + + {(showUiDetails || miniMode_ShowContext) && ( + + )} + {miniMode_ShowContext && !showUiDetails && ( + + + + )} + + + )} + + ); + }; return ( { {showUiDetails && } - - - - {showUiDetails && showLoadingIndicator && ( - - )} - - - {showUiDetails && showShortcutsHint && } - - - {showMinimalMetaRow && ( - - - {showMinimalInlineLoading && ( - - )} - {showMinimalModeBleedThrough && minimalModeBleedThrough && ( - - ● {minimalModeBleedThrough.text} - - )} - {hasMinimalStatusBleedThrough && ( - - - - )} - - {(showMinimalContextBleedThrough || - shouldReserveSpaceForShortcutsHint) && ( - - {showMinimalContextBleedThrough && ( - - )} - - {showShortcutsHint && } - - - )} - - )} - {showShortcutsHelp && } - {showUiDetails && } - {showUiDetails && ( - - - {hasToast ? ( - - ) : ( - - {showApprovalIndicator && ( - - )} - {!showLoadingIndicator && ( - <> - {uiState.shellModeActive && ( - - - - )} - {showRawMarkdownIndicator && ( - - - - )} - - )} - - )} - + {showShortcutsHelp && } - - {!showLoadingIndicator && ( - - )} - - - )} + {(showUiDetails || miniMode_ShowToast) && ( + + + + )} + + + {renderStatusRow()} {showUiDetails && uiState.showErrorDetails && ( diff --git a/packages/cli/src/ui/components/ConfigInitDisplay.tsx b/packages/cli/src/ui/components/ConfigInitDisplay.tsx index d421da211e..4997260621 100644 --- a/packages/cli/src/ui/components/ConfigInitDisplay.tsx +++ b/packages/cli/src/ui/components/ConfigInitDisplay.tsx @@ -16,7 +16,7 @@ import { GeminiSpinner } from './GeminiSpinner.js'; import { theme } from '../semantic-colors.js'; export const ConfigInitDisplay = ({ - message: initialMessage = 'Initializing...', + message: initialMessage = 'Working...', }: { message?: string; }) => { @@ -45,14 +45,14 @@ export const ConfigInitDisplay = ({ 
const suffix = remaining > 0 ? `, +${remaining} more` : ''; const mcpMessage = `Connecting to MCP servers... (${connected}/${clients.size}) - Waiting for: ${displayedServers}${suffix}`; setMessage( - initialMessage && initialMessage !== 'Initializing...' + initialMessage && initialMessage !== 'Working...' ? `${initialMessage} (${mcpMessage})` : mcpMessage, ); } else { const mcpMessage = `Connecting to MCP servers... (${connected}/${clients.size})`; setMessage( - initialMessage && initialMessage !== 'Initializing...' + initialMessage && initialMessage !== 'Working...' ? `${initialMessage} (${mcpMessage})` : mcpMessage, ); diff --git a/packages/cli/src/ui/components/ConsentPrompt.tsx b/packages/cli/src/ui/components/ConsentPrompt.tsx index 3f255d2606..859d29281d 100644 --- a/packages/cli/src/ui/components/ConsentPrompt.tsx +++ b/packages/cli/src/ui/components/ConsentPrompt.tsx @@ -9,6 +9,7 @@ import { type ReactNode } from 'react'; import { theme } from '../semantic-colors.js'; import { MarkdownDisplay } from '../utils/MarkdownDisplay.js'; import { RadioButtonSelect } from './shared/RadioButtonSelect.js'; +import { DialogFooter } from './shared/DialogFooter.js'; type ConsentPromptProps = { // If a simple string is given, it will render using markdown by default. 
@@ -37,7 +38,7 @@ export const ConsentPrompt = (props: ConsentPromptProps) => { ) : ( prompt )} - + { ]} onSelect={onConfirm} /> + ); diff --git a/packages/cli/src/ui/components/ContextSummaryDisplay.test.tsx b/packages/cli/src/ui/components/ContextSummaryDisplay.test.tsx index 1049e97912..8c013cafa9 100644 --- a/packages/cli/src/ui/components/ContextSummaryDisplay.test.tsx +++ b/packages/cli/src/ui/components/ContextSummaryDisplay.test.tsx @@ -77,32 +77,6 @@ describe('', () => { unmount(); }); - it('should switch layout at the 80-column breakpoint', async () => { - const props = { - ...baseProps, - geminiMdFileCount: 1, - contextFileNames: ['GEMINI.md'], - mcpServers: { 'test-server': { command: 'test' } }, - ideContext: { - workspaceState: { - openFiles: [{ path: '/a/b/c', timestamp: Date.now() }], - }, - }, - }; - - // At 80 columns, should be on one line - const { lastFrame: wideFrame, unmount: unmountWide } = - await renderWithWidth(80, props); - expect(wideFrame().trim().includes('\n')).toBe(false); - unmountWide(); - - // At 79 columns, should be on multiple lines - const { lastFrame: narrowFrame, unmount: unmountNarrow } = - await renderWithWidth(79, props); - expect(narrowFrame().trim().includes('\n')).toBe(true); - expect(narrowFrame().trim().split('\n').length).toBe(4); - unmountNarrow(); - }); it('should not render empty parts', async () => { const props = { ...baseProps, diff --git a/packages/cli/src/ui/components/ContextSummaryDisplay.tsx b/packages/cli/src/ui/components/ContextSummaryDisplay.tsx index c9f67e34b3..696793bc06 100644 --- a/packages/cli/src/ui/components/ContextSummaryDisplay.tsx +++ b/packages/cli/src/ui/components/ContextSummaryDisplay.tsx @@ -8,8 +8,6 @@ import type React from 'react'; import { Box, Text } from 'ink'; import { theme } from '../semantic-colors.js'; import { type IdeContext, type MCPServerConfig } from '@google/gemini-cli-core'; -import { useTerminalSize } from '../hooks/useTerminalSize.js'; -import { isNarrowWidth } 
from '../utils/isNarrowWidth.js'; interface ContextSummaryDisplayProps { geminiMdFileCount: number; @@ -30,8 +28,6 @@ export const ContextSummaryDisplay: React.FC = ({ skillCount, backgroundProcessCount = 0, }) => { - const { columns: terminalWidth } = useTerminalSize(); - const isNarrow = isNarrowWidth(terminalWidth); const mcpServerCount = Object.keys(mcpServers || {}).length; const blockedMcpServerCount = blockedMcpServers?.length || 0; const openFileCount = ideContext?.workspaceState?.openFiles?.length ?? 0; @@ -44,7 +40,7 @@ export const ContextSummaryDisplay: React.FC = ({ skillCount === 0 && backgroundProcessCount === 0 ) { - return ; // Render an empty space to reserve height + return null; } const openFilesText = (() => { @@ -113,21 +109,14 @@ export const ContextSummaryDisplay: React.FC = ({ backgroundText, ].filter(Boolean); - if (isNarrow) { - return ( - - {summaryParts.map((part, index) => ( - - - {part} - - ))} - - ); - } - return ( - - {summaryParts.join(' | ')} + + {summaryParts.map((part, index) => ( + + {index > 0 && {' · '}} + {part} + + ))} ); }; diff --git a/packages/cli/src/ui/components/GeminiRespondingSpinner.tsx b/packages/cli/src/ui/components/GeminiRespondingSpinner.tsx index 2e6821355f..316438d737 100644 --- a/packages/cli/src/ui/components/GeminiRespondingSpinner.tsx +++ b/packages/cli/src/ui/components/GeminiRespondingSpinner.tsx @@ -23,14 +23,28 @@ interface GeminiRespondingSpinnerProps { */ nonRespondingDisplay?: string; spinnerType?: SpinnerName; + /** + * If true, we prioritize showing the nonRespondingDisplay (hook icon) + * even if the state is Responding. 
+ */ + isHookActive?: boolean; + color?: string; } export const GeminiRespondingSpinner: React.FC< GeminiRespondingSpinnerProps -> = ({ nonRespondingDisplay, spinnerType = 'dots' }) => { +> = ({ + nonRespondingDisplay, + spinnerType = 'dots', + isHookActive = false, + color, +}) => { const streamingState = useStreamingContext(); const isScreenReaderEnabled = useIsScreenReaderEnabled(); - if (streamingState === StreamingState.Responding) { + + // If a hook is active, we want to show the hook icon (nonRespondingDisplay) + // to be consistent, instead of the rainbow spinner which means "Gemini is talking". + if (streamingState === StreamingState.Responding && !isHookActive) { return ( {SCREEN_READER_LOADING} ) : ( - {nonRespondingDisplay} + {nonRespondingDisplay} ); } diff --git a/packages/cli/src/ui/components/GradientRegression.test.tsx b/packages/cli/src/ui/components/GradientRegression.test.tsx index dfdad4f1aa..75ecac6f9a 100644 --- a/packages/cli/src/ui/components/GradientRegression.test.tsx +++ b/packages/cli/src/ui/components/GradientRegression.test.tsx @@ -10,7 +10,7 @@ import * as SessionContext from '../contexts/SessionContext.js'; import { type SessionStatsState } from '../contexts/SessionContext.js'; import { Banner } from './Banner.js'; import { Footer } from './Footer.js'; -import { Header } from './Header.js'; +import { AppHeader } from './AppHeader.js'; import { ModelDialog } from './ModelDialog.js'; import { StatsDisplay } from './StatsDisplay.js'; @@ -71,9 +71,9 @@ useSessionStatsMock.mockReturnValue({ }); describe('Gradient Crash Regression Tests', () => { - it('
should not crash when theme.ui.gradient is empty', async () => { + it(' should not crash when theme.ui.gradient is empty', async () => { const { lastFrame, unmount } = await renderWithProviders( -
, + , { width: 120, }, diff --git a/packages/cli/src/ui/components/HookStatusDisplay.test.tsx b/packages/cli/src/ui/components/HookStatusDisplay.test.tsx index 54c824d76a..9603e6b31a 100644 --- a/packages/cli/src/ui/components/HookStatusDisplay.test.tsx +++ b/packages/cli/src/ui/components/HookStatusDisplay.test.tsx @@ -18,9 +18,10 @@ describe('', () => { const props = { activeHooks: [{ name: 'test-hook', eventName: 'BeforeAgent' }], }; - const { lastFrame, unmount } = await render( + const { lastFrame, unmount, waitUntilReady } = await render( , ); + await waitUntilReady(); expect(lastFrame()).toMatchSnapshot(); unmount(); }); @@ -32,9 +33,10 @@ describe('', () => { { name: 'h2', eventName: 'BeforeAgent' }, ], }; - const { lastFrame, unmount } = await render( + const { lastFrame, unmount, waitUntilReady } = await render( , ); + await waitUntilReady(); expect(lastFrame()).toMatchSnapshot(); unmount(); }); @@ -45,19 +47,47 @@ describe('', () => { { name: 'step', eventName: 'BeforeAgent', index: 1, total: 3 }, ], }; - const { lastFrame, unmount } = await render( + const { lastFrame, unmount, waitUntilReady } = await render( , ); + await waitUntilReady(); expect(lastFrame()).toMatchSnapshot(); unmount(); }); it('should return empty string if no active hooks', async () => { const props = { activeHooks: [] }; - const { lastFrame, unmount } = await render( + const { lastFrame, unmount, waitUntilReady } = await render( , ); + await waitUntilReady(); expect(lastFrame({ allowEmpty: true })).toBe(''); unmount(); }); + + it('should show generic message when only system hooks are active', async () => { + const props = { + activeHooks: [ + { name: 'sys-hook', eventName: 'BeforeAgent', source: 'system' }, + ], + }; + const { lastFrame, unmount, waitUntilReady } = await render( + , + ); + await waitUntilReady(); + expect(lastFrame()).toContain('Working...'); + unmount(); + }); + + it('matches SVG snapshot for single hook', async () => { + const props = { + activeHooks: [ + { 
name: 'test-hook', eventName: 'BeforeAgent', source: 'user' }, + ], + }; + const result = await render(); + await result.waitUntilReady(); + await expect(result).toMatchSvgSnapshot(); + result.unmount(); + }); }); diff --git a/packages/cli/src/ui/components/HookStatusDisplay.tsx b/packages/cli/src/ui/components/HookStatusDisplay.tsx index 07b2ee3d4a..a455193706 100644 --- a/packages/cli/src/ui/components/HookStatusDisplay.tsx +++ b/packages/cli/src/ui/components/HookStatusDisplay.tsx @@ -6,8 +6,10 @@ import type React from 'react'; import { Text } from 'ink'; -import { theme } from '../semantic-colors.js'; import { type ActiveHook } from '../types.js'; +import { isUserVisibleHook } from '@google/gemini-cli-core'; +import { GENERIC_WORKING_LABEL } from '../textConstants.js'; +import { theme } from '../semantic-colors.js'; interface HookStatusDisplayProps { activeHooks: ActiveHook[]; @@ -20,20 +22,30 @@ export const HookStatusDisplay: React.FC = ({ return null; } - const label = activeHooks.length > 1 ? 'Executing Hooks' : 'Executing Hook'; - const displayNames = activeHooks.map((hook) => { - let name = hook.name; - if (hook.index && hook.total && hook.total > 1) { - name += ` (${hook.index}/${hook.total})`; - } - return name; - }); + const userHooks = activeHooks.filter((h) => isUserVisibleHook(h.source)); - const text = `${label}: ${displayNames.join(', ')}`; + if (userHooks.length > 0) { + const label = userHooks.length > 1 ? 'Executing Hooks' : 'Executing Hook'; + const displayNames = userHooks.map((hook) => { + let name = hook.name; + if (hook.index && hook.total && hook.total > 1) { + name += ` (${hook.index}/${hook.total})`; + } + return name; + }); + const text = `${label}: ${displayNames.join(', ')}`; + return ( + + {text} + + ); + } + + // If only system/extension hooks are running, show a generic message. 
return ( - - {text} + + {GENERIC_WORKING_LABEL} ); }; diff --git a/packages/cli/src/ui/components/HooksDialog.tsx b/packages/cli/src/ui/components/HooksDialog.tsx index 0421f7d9eb..6a60a10af6 100644 --- a/packages/cli/src/ui/components/HooksDialog.tsx +++ b/packages/cli/src/ui/components/HooksDialog.tsx @@ -244,6 +244,11 @@ export const HooksDialog: React.FC = ({ )} + + + (Press Esc to close) + + ); }; diff --git a/packages/cli/src/ui/components/LoadingIndicator.test.tsx b/packages/cli/src/ui/components/LoadingIndicator.test.tsx index 5dc9aa543e..ef2e21e132 100644 --- a/packages/cli/src/ui/components/LoadingIndicator.test.tsx +++ b/packages/cli/src/ui/components/LoadingIndicator.test.tsx @@ -10,7 +10,7 @@ import { Text } from 'ink'; import { LoadingIndicator } from './LoadingIndicator.js'; import { StreamingContext } from '../contexts/StreamingContext.js'; import { StreamingState } from '../types.js'; -import { vi } from 'vitest'; +import { describe, it, expect, vi } from 'vitest'; import * as useTerminalSize from '../hooks/useTerminalSize.js'; // Mock GeminiRespondingSpinner @@ -50,26 +50,28 @@ const renderWithContext = async ( describe('', () => { const defaultProps = { - currentLoadingPhrase: 'Loading...', + currentLoadingPhrase: 'Thinking...', elapsedTime: 5, }; it('should render blank when streamingState is Idle and no loading phrase or thought', async () => { - const { lastFrame } = await renderWithContext( + const { lastFrame, waitUntilReady } = await renderWithContext( , StreamingState.Idle, ); + await waitUntilReady(); expect(lastFrame({ allowEmpty: true })?.trim()).toBe(''); }); it('should render spinner, phrase, and time when streamingState is Responding', async () => { - const { lastFrame } = await renderWithContext( + const { lastFrame, waitUntilReady } = await renderWithContext( , StreamingState.Responding, ); + await waitUntilReady(); const output = lastFrame(); expect(output).toContain('MockRespondingSpinner'); - 
expect(output).toContain('Loading...'); + expect(output).toContain('Thinking...'); expect(output).toContain('(esc to cancel, 5s)'); }); @@ -78,10 +80,11 @@ describe('', () => { currentLoadingPhrase: 'Confirm action', elapsedTime: 10, }; - const { lastFrame } = await renderWithContext( + const { lastFrame, waitUntilReady } = await renderWithContext( , StreamingState.WaitingForConfirmation, ); + await waitUntilReady(); const output = lastFrame(); expect(output).toContain('⠏'); // Static char for WaitingForConfirmation expect(output).toContain('Confirm action'); @@ -94,46 +97,50 @@ describe('', () => { currentLoadingPhrase: 'Processing data...', elapsedTime: 3, }; - const { lastFrame, unmount } = await renderWithContext( + const { lastFrame, unmount, waitUntilReady } = await renderWithContext( , StreamingState.Responding, ); + await waitUntilReady(); expect(lastFrame()).toContain('Processing data...'); unmount(); }); it('should display the elapsedTime correctly when Responding', async () => { const props = { - currentLoadingPhrase: 'Working...', + currentLoadingPhrase: 'Thinking...', elapsedTime: 60, }; - const { lastFrame, unmount } = await renderWithContext( + const { lastFrame, unmount, waitUntilReady } = await renderWithContext( , StreamingState.Responding, ); + await waitUntilReady(); expect(lastFrame()).toContain('(esc to cancel, 1m)'); unmount(); }); it('should display the elapsedTime correctly in human-readable format', async () => { const props = { - currentLoadingPhrase: 'Working...', + currentLoadingPhrase: 'Thinking...', elapsedTime: 125, }; - const { lastFrame, unmount } = await renderWithContext( + const { lastFrame, unmount, waitUntilReady } = await renderWithContext( , StreamingState.Responding, ); + await waitUntilReady(); expect(lastFrame()).toContain('(esc to cancel, 2m 5s)'); unmount(); }); it('should render rightContent when provided', async () => { const rightContent = Extra Info; - const { lastFrame, unmount } = await renderWithContext( + const 
{ lastFrame, unmount, waitUntilReady } = await renderWithContext( , StreamingState.Responding, ); + await waitUntilReady(); expect(lastFrame()).toContain('Extra Info'); unmount(); }); @@ -174,6 +181,7 @@ describe('', () => { const { lastFrame, unmount, waitUntilReady } = await renderWithProviders( , ); + await waitUntilReady(); expect(lastFrame({ allowEmpty: true })?.trim()).toBe(''); // Initial: Idle (no loading phrase) // Transition to Responding @@ -221,15 +229,16 @@ describe('', () => { it('should display fallback phrase if thought is empty', async () => { const props = { thought: null, - currentLoadingPhrase: 'Loading...', + currentLoadingPhrase: 'Thinking...', elapsedTime: 5, }; - const { lastFrame, unmount } = await renderWithContext( + const { lastFrame, unmount, waitUntilReady } = await renderWithContext( , StreamingState.Responding, ); + await waitUntilReady(); const output = lastFrame(); - expect(output).toContain('Loading...'); + expect(output).toContain('Thinking...'); unmount(); }); @@ -241,10 +250,11 @@ describe('', () => { }, elapsedTime: 5, }; - const { lastFrame, unmount } = await renderWithContext( + const { lastFrame, unmount, waitUntilReady } = await renderWithContext( , StreamingState.Responding, ); + await waitUntilReady(); const output = lastFrame(); expect(output).toBeDefined(); if (output) { @@ -256,7 +266,7 @@ describe('', () => { unmount(); }); - it('should prepend "Thinking... " if the subject does not start with "Thinking"', async () => { + it('should NOT prepend "Thinking... " even if the subject does not start with "Thinking"', async () => { const props = { thought: { subject: 'Planning the response...', @@ -264,12 +274,14 @@ describe('', () => { }, elapsedTime: 5, }; - const { lastFrame, unmount } = await renderWithContext( + const { lastFrame, unmount, waitUntilReady } = await renderWithContext( , StreamingState.Responding, ); + await waitUntilReady(); const output = lastFrame(); - expect(output).toContain('Thinking... 
Planning the response...'); + expect(output).toContain('Planning the response...'); + expect(output).not.toContain('Thinking... '); unmount(); }); @@ -282,31 +294,32 @@ describe('', () => { currentLoadingPhrase: 'This should not be displayed', elapsedTime: 5, }; - const { lastFrame, unmount } = await renderWithContext( + const { lastFrame, unmount, waitUntilReady } = await renderWithContext( , StreamingState.Responding, ); + await waitUntilReady(); const output = lastFrame(); - expect(output).toContain('Thinking... '); expect(output).toContain('This should be displayed'); expect(output).not.toContain('This should not be displayed'); unmount(); }); it('should not display thought indicator for non-thought loading phrases', async () => { - const { lastFrame, unmount } = await renderWithContext( + const { lastFrame, unmount, waitUntilReady } = await renderWithContext( , StreamingState.Responding, ); + await waitUntilReady(); expect(lastFrame()).not.toContain('Thinking... '); unmount(); }); it('should truncate long primary text instead of wrapping', async () => { - const { lastFrame, unmount } = await renderWithContext( + const { lastFrame, unmount, waitUntilReady } = await renderWithContext( ', () => { StreamingState.Responding, 80, ); - + await waitUntilReady(); expect(lastFrame()).toMatchSnapshot(); unmount(); }); describe('responsive layout', () => { it('should render on a single line on a wide terminal', async () => { - const { lastFrame, unmount } = await renderWithContext( + const { lastFrame, unmount, waitUntilReady } = await renderWithContext( Right} @@ -331,17 +344,18 @@ describe('', () => { StreamingState.Responding, 120, ); + await waitUntilReady(); const output = lastFrame(); // Check for single line output expect(output?.trim().includes('\n')).toBe(false); - expect(output).toContain('Loading...'); + expect(output).toContain('Thinking...'); expect(output).toContain('(esc to cancel, 5s)'); expect(output).toContain('Right'); unmount(); }); it('should render 
on multiple lines on a narrow terminal', async () => { - const { lastFrame, unmount } = await renderWithContext( + const { lastFrame, unmount, waitUntilReady } = await renderWithContext( Right} @@ -349,6 +363,7 @@ describe('', () => { StreamingState.Responding, 79, ); + await waitUntilReady(); const output = lastFrame(); const lines = output?.trim().split('\n'); // Expecting 3 lines: @@ -357,7 +372,7 @@ describe('', () => { // 3. Right Content expect(lines).toHaveLength(3); if (lines) { - expect(lines[0]).toContain('Loading...'); + expect(lines[0]).toContain('Thinking...'); expect(lines[0]).not.toContain('(esc to cancel, 5s)'); expect(lines[1]).toContain('(esc to cancel, 5s)'); expect(lines[2]).toContain('Right'); @@ -366,23 +381,87 @@ describe('', () => { }); it('should use wide layout at 80 columns', async () => { - const { lastFrame, unmount } = await renderWithContext( + const { lastFrame, unmount, waitUntilReady } = await renderWithContext( , StreamingState.Responding, 80, ); + await waitUntilReady(); expect(lastFrame()?.trim().includes('\n')).toBe(false); unmount(); }); it('should use narrow layout at 79 columns', async () => { - const { lastFrame, unmount } = await renderWithContext( + const { lastFrame, unmount, waitUntilReady } = await renderWithContext( , StreamingState.Responding, 79, ); + await waitUntilReady(); expect(lastFrame()?.includes('\n')).toBe(true); unmount(); }); + + it('should render witty phrase after cancel and timer hint in wide layout', async () => { + const { lastFrame, unmount, waitUntilReady } = await renderWithContext( + , + StreamingState.Responding, + 120, + ); + await waitUntilReady(); + const output = lastFrame(); + // Sequence should be: Primary Text -> Cancel/Timer -> Witty Phrase + expect(output).toContain('Thinking... 
(esc to cancel, 5s) I am witty'); + unmount(); + }); + + it('should render witty phrase after cancel and timer hint in narrow layout', async () => { + const { lastFrame, unmount, waitUntilReady } = await renderWithContext( + , + StreamingState.Responding, + 79, + ); + await waitUntilReady(); + const output = lastFrame(); + const lines = output?.trim().split('\n'); + // Expecting 3 lines: + // 1. Spinner + Primary Text + // 2. Cancel + Timer + // 3. Witty Phrase + expect(lines).toHaveLength(3); + if (lines) { + expect(lines[0]).toContain('Thinking...'); + expect(lines[1]).toContain('(esc to cancel, 5s)'); + expect(lines[2]).toContain('I am witty'); + } + unmount(); + }); + }); + + it('should use spinnerIcon when provided', async () => { + const props = { + currentLoadingPhrase: 'Confirm action', + elapsedTime: 10, + spinnerIcon: '?', + }; + const { lastFrame, waitUntilReady, unmount } = await renderWithContext( + , + StreamingState.WaitingForConfirmation, + ); + await waitUntilReady(); + const output = lastFrame(); + expect(output).toContain('?'); + expect(output).not.toContain('⠏'); + unmount(); }); }); diff --git a/packages/cli/src/ui/components/LoadingIndicator.tsx b/packages/cli/src/ui/components/LoadingIndicator.tsx index eba0a7d8a3..a48451b26c 100644 --- a/packages/cli/src/ui/components/LoadingIndicator.tsx +++ b/packages/cli/src/ui/components/LoadingIndicator.tsx @@ -18,22 +18,34 @@ import { INTERACTIVE_SHELL_WAITING_PHRASE } from '../hooks/usePhraseCycler.js'; interface LoadingIndicatorProps { currentLoadingPhrase?: string; + wittyPhrase?: string; + showWit?: boolean; + showTips?: boolean; + errorVerbosity?: 'low' | 'full'; elapsedTime: number; inline?: boolean; rightContent?: React.ReactNode; thought?: ThoughtSummary | null; thoughtLabel?: string; showCancelAndTimer?: boolean; + forceRealStatusOnly?: boolean; + spinnerIcon?: string; + isHookActive?: boolean; } export const LoadingIndicator: React.FC = ({ currentLoadingPhrase, + wittyPhrase, + showWit = 
false, elapsedTime, inline = false, rightContent, thought, thoughtLabel, showCancelAndTimer = true, + forceRealStatusOnly = false, + spinnerIcon, + isHookActive = false, }) => { const streamingState = useStreamingContext(); const { columns: terminalWidth } = useTerminalSize(); @@ -54,15 +66,10 @@ export const LoadingIndicator: React.FC = ({ ? currentLoadingPhrase : thought?.subject ? (thoughtLabel ?? thought.subject) - : currentLoadingPhrase; - const hasThoughtIndicator = - currentLoadingPhrase !== INTERACTIVE_SHELL_WAITING_PHRASE && - Boolean(thought?.subject?.trim()); - // Avoid "Thinking... Thinking..." duplication if primaryText already starts with "Thinking" - const thinkingIndicator = - hasThoughtIndicator && !primaryText?.startsWith('Thinking') - ? 'Thinking... ' - : ''; + : currentLoadingPhrase || + (streamingState === StreamingState.Responding + ? 'Thinking...' + : undefined); const cancelAndTimerContent = showCancelAndTimer && @@ -70,22 +77,35 @@ export const LoadingIndicator: React.FC = ({ ? `(esc to cancel, ${elapsedTime < 60 ? `${elapsedTime}s` : formatDuration(elapsedTime * 1000)})` : null; + const wittyPhraseNode = + !forceRealStatusOnly && + showWit && + wittyPhrase && + primaryText === 'Thinking...' ? 
( + + + {wittyPhrase} + + + ) : null; + if (inline) { return ( {primaryText && ( - {thinkingIndicator} {primaryText} {primaryText === INTERACTIVE_SHELL_WAITING_PHRASE && ( @@ -102,6 +122,7 @@ export const LoadingIndicator: React.FC = ({ {cancelAndTimerContent} )} + {wittyPhraseNode} ); } @@ -118,16 +139,17 @@ export const LoadingIndicator: React.FC = ({ {primaryText && ( - {thinkingIndicator} {primaryText} {primaryText === INTERACTIVE_SHELL_WAITING_PHRASE && ( @@ -144,6 +166,7 @@ export const LoadingIndicator: React.FC = ({ {cancelAndTimerContent} )} + {!isNarrow && wittyPhraseNode} {!isNarrow && {/* Spacer */}} {!isNarrow && rightContent && {rightContent}} @@ -153,6 +176,7 @@ export const LoadingIndicator: React.FC = ({ {cancelAndTimerContent} )} + {isNarrow && wittyPhraseNode} {isNarrow && rightContent && {rightContent}} ); diff --git a/packages/cli/src/ui/components/MainContent.test.tsx b/packages/cli/src/ui/components/MainContent.test.tsx index 070b2c835c..e5d74b5cf5 100644 --- a/packages/cli/src/ui/components/MainContent.test.tsx +++ b/packages/cli/src/ui/components/MainContent.test.tsx @@ -97,7 +97,7 @@ describe('getToolGroupBorderAppearance', () => { }); it('inspects only the last pending tool_group item if current has no tools', () => { - const item = { type: 'tool_group' as const, tools: [], id: 1 }; + const item = { type: 'tool_group' as const, tools: [], id: -1 }; const pendingItems = [ { type: 'tool_group' as const, @@ -158,7 +158,7 @@ describe('getToolGroupBorderAppearance', () => { confirmationDetails: undefined, } as IndividualToolCallDisplay, ], - id: 1, + id: -1, }; const result = getToolGroupBorderAppearance( item, @@ -187,7 +187,7 @@ describe('getToolGroupBorderAppearance', () => { confirmationDetails: undefined, } as IndividualToolCallDisplay, ], - id: 1, + id: -1, }; const result = getToolGroupBorderAppearance( item, @@ -276,7 +276,7 @@ describe('getToolGroupBorderAppearance', () => { confirmationDetails: undefined, } as 
IndividualToolCallDisplay, ], - id: 1, + id: -1, }; const result = getToolGroupBorderAppearance( item, @@ -292,7 +292,7 @@ describe('getToolGroupBorderAppearance', () => { }); it('handles empty tools with active shell turn (isCurrentlyInShellTurn)', () => { - const item = { type: 'tool_group' as const, tools: [], id: 1 }; + const item = { type: 'tool_group' as const, tools: [], id: -1 }; // active shell turn const result = getToolGroupBorderAppearance( @@ -667,7 +667,7 @@ describe('MainContent', () => { pendingHistoryItems: [ { type: 'tool_group', - id: 1, + id: -1, tools: [ { callId: 'call_1', diff --git a/packages/cli/src/ui/components/MainContent.tsx b/packages/cli/src/ui/components/MainContent.tsx index 0530e171b8..d8656a879c 100644 --- a/packages/cli/src/ui/components/MainContent.tsx +++ b/packages/cli/src/ui/components/MainContent.tsx @@ -127,7 +127,7 @@ export const MainContent = () => { const pendingItems = useMemo( () => ( - + {pendingHistoryItems.map((item, i) => { const prevType = i === 0 @@ -140,12 +140,12 @@ export const MainContent = () => { return ( { ); })} {showConfirmationQueue && confirmingTool && ( - + )} ), diff --git a/packages/cli/src/ui/components/ShortcutsHint.tsx b/packages/cli/src/ui/components/ShortcutsHint.tsx deleted file mode 100644 index 4ecb01e9d8..0000000000 --- a/packages/cli/src/ui/components/ShortcutsHint.tsx +++ /dev/null @@ -1,24 +0,0 @@ -/** - * @license - * Copyright 2025 Google LLC - * SPDX-License-Identifier: Apache-2.0 - */ - -import type React from 'react'; -import { Text } from 'ink'; -import { theme } from '../semantic-colors.js'; -import { useUIState } from '../contexts/UIStateContext.js'; - -export const ShortcutsHint: React.FC = () => { - const { cleanUiDetailsVisible, shortcutsHelpVisible } = useUIState(); - - if (!cleanUiDetailsVisible) { - return press tab twice for more ; - } - - const highlightColor = shortcutsHelpVisible - ? theme.text.accent - : theme.text.secondary; - - return ? 
for shortcuts ; -}; diff --git a/packages/cli/src/ui/components/StatusDisplay.tsx b/packages/cli/src/ui/components/StatusDisplay.tsx index 223340c039..472e900b3b 100644 --- a/packages/cli/src/ui/components/StatusDisplay.tsx +++ b/packages/cli/src/ui/components/StatusDisplay.tsx @@ -11,9 +11,8 @@ import { useUIState } from '../contexts/UIStateContext.js'; import { useSettings } from '../contexts/SettingsContext.js'; import { useConfig } from '../contexts/ConfigContext.js'; import { ContextSummaryDisplay } from './ContextSummaryDisplay.js'; -import { HookStatusDisplay } from './HookStatusDisplay.js'; -interface StatusDisplayProps { +export interface StatusDisplayProps { hideContextSummary: boolean; } @@ -28,13 +27,6 @@ export const StatusDisplay: React.FC = ({ return |⌐■_■|; } - if ( - uiState.activeHooks.length > 0 && - settings.merged.hooksConfig.notifications - ) { - return ; - } - if (!settings.merged.ui.hideContextSummary && !hideContextSummary) { return ( { if (uiState.showIsExpandableHint) { const action = uiState.constrainHeight ? 
'show more' : 'collapse'; return ( - + Press Ctrl+O to {action} lines of the last response ); diff --git a/packages/cli/src/ui/components/ToolConfirmationQueue.test.tsx b/packages/cli/src/ui/components/ToolConfirmationQueue.test.tsx index 90d762581d..4edf1e4f35 100644 --- a/packages/cli/src/ui/components/ToolConfirmationQueue.test.tsx +++ b/packages/cli/src/ui/components/ToolConfirmationQueue.test.tsx @@ -6,13 +6,16 @@ import { describe, it, expect, vi, beforeEach } from 'vitest'; import { act } from 'react'; -import { Box } from 'ink'; import { ToolConfirmationQueue } from './ToolConfirmationQueue.js'; import { StreamingState } from '../types.js'; import { renderWithProviders } from '../../test-utils/render.js'; import { createMockSettings } from '../../test-utils/settings.js'; import { waitFor } from '../../test-utils/async.js'; -import { type Config, CoreToolCallStatus } from '@google/gemini-cli-core'; +import { + type Config, + CoreToolCallStatus, + type SerializableConfirmationDetails, +} from '@google/gemini-cli-core'; import type { ConfirmingToolState } from '../hooks/useConfirmingTool.js'; import { theme } from '../semantic-colors.js'; @@ -133,58 +136,6 @@ describe('ToolConfirmationQueue', () => { unmount(); }); - it('renders expansion hint when content is long and constrained', async () => { - const longDiff = '@@ -1,1 +1,50 @@\n' + '+line\n'.repeat(50); - const confirmingTool = { - tool: { - callId: 'call-1', - name: 'replace', - description: 'edit file', - status: CoreToolCallStatus.AwaitingApproval, - confirmationDetails: { - type: 'edit' as const, - title: 'Confirm edit', - fileName: 'test.ts', - filePath: '/test.ts', - fileDiff: longDiff, - originalContent: 'old', - newContent: 'new', - }, - }, - index: 1, - total: 1, - }; - - const { lastFrame, unmount } = await renderWithProviders( - - - , - { - config: { - ...mockConfig, - getUseAlternateBuffer: () => true, - } as unknown as Config, - settings: createMockSettings({ ui: { useAlternateBuffer: true } 
}), - uiState: { - terminalWidth: 80, - terminalHeight: 20, - constrainHeight: true, - streamingState: StreamingState.WaitingForConfirmation, - }, - }, - ); - - await waitFor(() => - expect(lastFrame()?.toLowerCase()).toContain( - 'press ctrl+o to show more lines', - ), - ); - expect(lastFrame()).toMatchSnapshot(); - unmount(); - }); - it('calculates availableContentHeight based on availableTerminalHeight from UI state', async () => { const longDiff = '@@ -1,1 +1,50 @@\n' + '+line\n'.repeat(50); const confirmingTool = { @@ -413,4 +364,155 @@ describe('ToolConfirmationQueue', () => { expect(stickyHeaderProps.borderColor).toBe(theme.status.success); unmount(); }); + + describe('height allocation and layout', () => { + it('should render the full queue wrapper with borders and content for large edit diffs', async () => { + let largeDiff = '--- a/file.ts\n+++ b/file.ts\n@@ -1,10 +1,15 @@\n'; + for (let i = 1; i <= 20; i++) { + largeDiff += `-const oldLine${i} = true;\n`; + largeDiff += `+const newLine${i} = true;\n`; + } + + const confirmationDetails: SerializableConfirmationDetails = { + type: 'edit', + title: 'Confirm Edit', + fileName: 'file.ts', + filePath: '/file.ts', + fileDiff: largeDiff, + originalContent: 'old', + newContent: 'new', + isModifying: false, + }; + + const confirmingTool = { + tool: { + callId: 'test-call-id', + name: 'replace', + status: CoreToolCallStatus.AwaitingApproval, + description: 'Replaces content in a file', + confirmationDetails, + }, + index: 1, + total: 1, + }; + + const { waitUntilReady, lastFrame, generateSvg, unmount } = + await renderWithProviders( + , + { + uiState: { + mainAreaWidth: 80, + terminalHeight: 50, + terminalWidth: 80, + constrainHeight: true, + availableTerminalHeight: 40, + }, + config: mockConfig, + }, + ); + await waitUntilReady(); + + await expect({ lastFrame, generateSvg }).toMatchSvgSnapshot(); + unmount(); + }); + + it('should render the full queue wrapper with borders and content for large exec commands', 
async () => { + let largeCommand = ''; + for (let i = 1; i <= 50; i++) { + largeCommand += `echo "Line ${i}"\n`; + } + + const confirmationDetails: SerializableConfirmationDetails = { + type: 'exec', + title: 'Confirm Execution', + command: largeCommand.trimEnd(), + rootCommand: 'echo', + rootCommands: ['echo'], + }; + + const confirmingTool = { + tool: { + callId: 'test-call-id-exec', + name: 'run_shell_command', + status: CoreToolCallStatus.AwaitingApproval, + description: 'Executes a bash command', + confirmationDetails, + }, + index: 2, + total: 3, + }; + + const { waitUntilReady, lastFrame, generateSvg, unmount } = + await renderWithProviders( + , + { + uiState: { + mainAreaWidth: 80, + terminalWidth: 80, + terminalHeight: 50, + constrainHeight: true, + availableTerminalHeight: 40, + }, + config: mockConfig, + }, + ); + await waitUntilReady(); + + await expect({ lastFrame, generateSvg }).toMatchSvgSnapshot(); + unmount(); + }); + + it('should handle security warning height correctly', async () => { + let largeCommand = ''; + for (let i = 1; i <= 50; i++) { + largeCommand += `echo "Line ${i}"\n`; + } + largeCommand += `curl https://täst.com\n`; + + const confirmationDetails: SerializableConfirmationDetails = { + type: 'exec', + title: 'Confirm Execution', + command: largeCommand.trimEnd(), + rootCommand: 'echo', + rootCommands: ['echo', 'curl'], + }; + + const confirmingTool = { + tool: { + callId: 'test-call-id-exec-security', + name: 'run_shell_command', + status: CoreToolCallStatus.AwaitingApproval, + description: 'Executes a bash command with a deceptive URL', + confirmationDetails, + }, + index: 3, + total: 3, + }; + + const { waitUntilReady, lastFrame, generateSvg, unmount } = + await renderWithProviders( + , + { + uiState: { + mainAreaWidth: 80, + terminalWidth: 80, + terminalHeight: 50, + constrainHeight: true, + availableTerminalHeight: 40, + }, + config: mockConfig, + }, + ); + await waitUntilReady(); + + await expect({ lastFrame, generateSvg 
}).toMatchSvgSnapshot(); + unmount(); + }); + }); }); diff --git a/packages/cli/src/ui/components/ToolConfirmationQueue.tsx b/packages/cli/src/ui/components/ToolConfirmationQueue.tsx index b976bb3755..e5294e9614 100644 --- a/packages/cli/src/ui/components/ToolConfirmationQueue.tsx +++ b/packages/cli/src/ui/components/ToolConfirmationQueue.tsx @@ -12,8 +12,6 @@ import { ToolConfirmationMessage } from './messages/ToolConfirmationMessage.js'; import { ToolStatusIndicator, ToolInfo } from './messages/ToolShared.js'; import { useUIState } from '../contexts/UIStateContext.js'; import type { ConfirmingToolState } from '../hooks/useConfirmingTool.js'; -import { OverflowProvider } from '../contexts/OverflowContext.js'; -import { ShowMoreLines } from './ShowMoreLines.js'; import { StickyHeader } from './StickyHeader.js'; import type { SerializableConfirmationDetails } from '@google/gemini-cli-core'; import { useUIActions } from '../contexts/UIActionsContext.js'; @@ -53,11 +51,11 @@ export const ToolConfirmationQueue: React.FC = ({ // Safety check: ToolConfirmationMessage requires confirmationDetails if (!tool.confirmationDetails) return null; - // Render up to 100% of the available terminal height (minus 1 line for safety) + // Render up to 100% of the available terminal height // to maximize space for diffs and other content. const maxHeight = uiAvailableHeight !== undefined - ? Math.max(uiAvailableHeight - 1, 4) + ? 
Math.max(uiAvailableHeight, 4) : Math.floor(terminalHeight * 0.5); const isRoutine = @@ -76,84 +74,81 @@ export const ToolConfirmationQueue: React.FC = ({ : undefined; const content = ( - <> - - - - {/* Header */} - - - {getConfirmationHeader(tool.confirmationDetails)} + + + + {/* Header */} + + + {getConfirmationHeader(tool.confirmationDetails)} + + {total > 1 && ( + + {index} of {total} - {total > 1 && ( - - {index} of {total} - - )} - - - {!hideToolIdentity && ( - - - - )} - - - {/* Interactive Area */} - {/* - Note: We force isFocused={true} because if this component is rendered, - it effectively acts as a modal over the shell/composer. - */} - + {!hideToolIdentity && ( + + + + + )} - + + + {/* Interactive Area */} + {/* + Note: We force isFocused={true} because if this component is rendered, + it effectively acts as a modal over the shell/composer. + */} + - - + + ); - return {content}; + return content; }; diff --git a/packages/cli/src/ui/components/__snapshots__/AlternateBufferQuittingDisplay.test.tsx.snap b/packages/cli/src/ui/components/__snapshots__/AlternateBufferQuittingDisplay.test.tsx.snap index 5394ab83c0..d4dc67bbc6 100644 --- a/packages/cli/src/ui/components/__snapshots__/AlternateBufferQuittingDisplay.test.tsx.snap +++ b/packages/cli/src/ui/components/__snapshots__/AlternateBufferQuittingDisplay.test.tsx.snap @@ -2,10 +2,13 @@ exports[`AlternateBufferQuittingDisplay > renders with a tool awaiting confirmation > with_confirming_tool 1`] = ` " - ▝▜▄ Gemini CLI v0.10.0 - ▝▜▄ - ▗▟▀ - ▝▀ + ▝▜▄ ▗█▀▀▜▙▝█▛▀▀▌▜██▖▟██▘▜█▘▜██▖▝█▛▝█▛ + ▝▜▄ █▌ █▙▟ ▐█▝█▛▐█ ▐█ ▐█▝█▖█▌ █▌ + ▗▟▀ ▜▙ ▝█▛ █▌▝ ▖▐█ ▐█ ▐█ ▐█ ▝██▌ █▌ + ▝▀ ▀▀▀▀▘▝▀▀▀▀▘▀▀▘ ▀▀▘▀▀▘▀▀▘ ▝▀▀▝▀▀ + + Gemini CLI v0.10.0 + Tips for getting started: @@ -22,10 +25,13 @@ Action Required (was prompted): exports[`AlternateBufferQuittingDisplay > renders with active and pending tool messages > with_history_and_pending 1`] = ` " - ▝▜▄ Gemini CLI v0.10.0 - ▝▜▄ - ▗▟▀ - ▝▀ + ▝▜▄ ▗█▀▀▜▙▝█▛▀▀▌▜██▖▟██▘▜█▘▜██▖▝█▛▝█▛ + ▝▜▄ █▌ █▙▟ 
▐█▝█▛▐█ ▐█ ▐█▝█▖█▌ █▌ + ▗▟▀ ▜▙ ▝█▛ █▌▝ ▖▐█ ▐█ ▐█ ▐█ ▝██▌ █▌ + ▝▀ ▀▀▀▀▘▝▀▀▀▀▘▀▀▘ ▀▀▘▀▀▘▀▀▘ ▝▀▀▝▀▀ + + Gemini CLI v0.10.0 + Tips for getting started: @@ -50,10 +56,13 @@ Tips for getting started: exports[`AlternateBufferQuittingDisplay > renders with empty history and no pending items > empty 1`] = ` " - ▝▜▄ Gemini CLI v0.10.0 - ▝▜▄ - ▗▟▀ - ▝▀ + ▝▜▄ ▗█▀▀▜▙▝█▛▀▀▌▜██▖▟██▘▜█▘▜██▖▝█▛▝█▛ + ▝▜▄ █▌ █▙▟ ▐█▝█▛▐█ ▐█ ▐█▝█▖█▌ █▌ + ▗▟▀ ▜▙ ▝█▛ █▌▝ ▖▐█ ▐█ ▐█ ▐█ ▝██▌ █▌ + ▝▀ ▀▀▀▀▘▝▀▀▀▀▘▀▀▘ ▀▀▘▀▀▘▀▀▘ ▝▀▀▝▀▀ + + Gemini CLI v0.10.0 + Tips for getting started: @@ -66,10 +75,13 @@ Tips for getting started: exports[`AlternateBufferQuittingDisplay > renders with history but no pending items > with_history_no_pending 1`] = ` " - ▝▜▄ Gemini CLI v0.10.0 - ▝▜▄ - ▗▟▀ - ▝▀ + ▝▜▄ ▗█▀▀▜▙▝█▛▀▀▌▜██▖▟██▘▜█▘▜██▖▝█▛▝█▛ + ▝▜▄ █▌ █▙▟ ▐█▝█▛▐█ ▐█ ▐█▝█▖█▌ █▌ + ▗▟▀ ▜▙ ▝█▛ █▌▝ ▖▐█ ▐█ ▐█ ▐█ ▝██▌ █▌ + ▝▀ ▀▀▀▀▘▝▀▀▀▀▘▀▀▘ ▀▀▘▀▀▘▀▀▘ ▝▀▀▝▀▀ + + Gemini CLI v0.10.0 + Tips for getting started: @@ -90,10 +102,13 @@ Tips for getting started: exports[`AlternateBufferQuittingDisplay > renders with pending items but no history > with_pending_no_history 1`] = ` " - ▝▜▄ Gemini CLI v0.10.0 - ▝▜▄ - ▗▟▀ - ▝▀ + ▝▜▄ ▗█▀▀▜▙▝█▛▀▀▌▜██▖▟██▘▜█▘▜██▖▝█▛▝█▛ + ▝▜▄ █▌ █▙▟ ▐█▝█▛▐█ ▐█ ▐█▝█▖█▌ █▌ + ▗▟▀ ▜▙ ▝█▛ █▌▝ ▖▐█ ▐█ ▐█ ▐█ ▝██▌ █▌ + ▝▀ ▀▀▀▀▘▝▀▀▀▀▘▀▀▘ ▀▀▘▀▀▘▀▀▘ ▝▀▀▝▀▀ + + Gemini CLI v0.10.0 + Tips for getting started: @@ -110,10 +125,13 @@ Tips for getting started: exports[`AlternateBufferQuittingDisplay > renders with user and gemini messages > with_user_gemini_messages 1`] = ` " - ▝▜▄ Gemini CLI v0.10.0 - ▝▜▄ - ▗▟▀ - ▝▀ + ▝▜▄ ▗█▀▀▜▙▝█▛▀▀▌▜██▖▟██▘▜█▘▜██▖▝█▛▝█▛ + ▝▜▄ █▌ █▙▟ ▐█▝█▛▐█ ▐█ ▐█▝█▖█▌ █▌ + ▗▟▀ ▜▙ ▝█▛ █▌▝ ▖▐█ ▐█ ▐█ ▐█ ▝██▌ █▌ + ▝▀ ▀▀▀▀▘▝▀▀▀▀▘▀▀▘ ▀▀▘▀▀▘▀▀▘ ▝▀▀▝▀▀ + + Gemini CLI v0.10.0 + Tips for getting started: diff --git a/packages/cli/src/ui/components/__snapshots__/AppHeader.test.tsx.snap b/packages/cli/src/ui/components/__snapshots__/AppHeader.test.tsx.snap index 4411f766de..ee9ea5f708 100644 --- 
a/packages/cli/src/ui/components/__snapshots__/AppHeader.test.tsx.snap +++ b/packages/cli/src/ui/components/__snapshots__/AppHeader.test.tsx.snap @@ -2,10 +2,13 @@ exports[` > should not render the banner when no flags are set 1`] = ` " - ▝▜▄ Gemini CLI v1.0.0 - ▝▜▄ - ▗▟▀ - ▝▀ + ▝▜▄ ▗█▀▀▜▙▝█▛▀▀▌▜██▖▟██▘▜█▘▜██▖▝█▛▝█▛ + ▝▜▄ █▌ █▙▟ ▐█▝█▛▐█ ▐█ ▐█▝█▖█▌ █▌ + ▗▟▀ ▜▙ ▝█▛ █▌▝ ▖▐█ ▐█ ▐█ ▐█ ▝██▌ █▌ + ▝▀ ▀▀▀▀▘▝▀▀▀▀▘▀▀▘ ▀▀▘▀▀▘▀▀▘ ▝▀▀▝▀▀ + + Gemini CLI v1.0.0 + Tips for getting started: @@ -18,10 +21,13 @@ Tips for getting started: exports[` > should not render the default banner if shown count is 5 or more 1`] = ` " - ▝▜▄ Gemini CLI v1.0.0 - ▝▜▄ - ▗▟▀ - ▝▀ + ▝▜▄ ▗█▀▀▜▙▝█▛▀▀▌▜██▖▟██▘▜█▘▜██▖▝█▛▝█▛ + ▝▜▄ █▌ █▙▟ ▐█▝█▛▐█ ▐█ ▐█▝█▖█▌ █▌ + ▗▟▀ ▜▙ ▝█▛ █▌▝ ▖▐█ ▐█ ▐█ ▐█ ▝██▌ █▌ + ▝▀ ▀▀▀▀▘▝▀▀▀▀▘▀▀▘ ▀▀▘▀▀▘▀▀▘ ▝▀▀▝▀▀ + + Gemini CLI v1.0.0 + Tips for getting started: @@ -34,10 +40,13 @@ Tips for getting started: exports[` > should render the banner with default text 1`] = ` " - ▝▜▄ Gemini CLI v1.0.0 - ▝▜▄ - ▗▟▀ - ▝▀ + ▝▜▄ ▗█▀▀▜▙▝█▛▀▀▌▜██▖▟██▘▜█▘▜██▖▝█▛▝█▛ + ▝▜▄ █▌ █▙▟ ▐█▝█▛▐█ ▐█ ▐█▝█▖█▌ █▌ + ▗▟▀ ▜▙ ▝█▛ █▌▝ ▖▐█ ▐█ ▐█ ▐█ ▝██▌ █▌ + ▝▀ ▀▀▀▀▘▝▀▀▀▀▘▀▀▘ ▀▀▘▀▀▘▀▀▘ ▝▀▀▝▀▀ + + Gemini CLI v1.0.0 + ╭──────────────────────────────────────────────────────────────────────────────────────────────────╮ │ This is the default banner │ @@ -53,10 +62,13 @@ Tips for getting started: exports[` > should render the banner with warning text 1`] = ` " - ▝▜▄ Gemini CLI v1.0.0 - ▝▜▄ - ▗▟▀ - ▝▀ + ▝▜▄ ▗█▀▀▜▙▝█▛▀▀▌▜██▖▟██▘▜█▘▜██▖▝█▛▝█▛ + ▝▜▄ █▌ █▙▟ ▐█▝█▛▐█ ▐█ ▐█▝█▖█▌ █▌ + ▗▟▀ ▜▙ ▝█▛ █▌▝ ▖▐█ ▐█ ▐█ ▐█ ▝██▌ █▌ + ▝▀ ▀▀▀▀▘▝▀▀▀▀▘▀▀▘ ▀▀▘▀▀▘▀▀▘ ▝▀▀▝▀▀ + + Gemini CLI v1.0.0 + ╭──────────────────────────────────────────────────────────────────────────────────────────────────╮ │ There are capacity issues │ @@ -69,3 +81,14 @@ Tips for getting started: 4. 
Be specific for the best results " `; + +exports[` > should render the full logo when logged out 1`] = ` +" + ▝▜▄ ▗█▀▀▜▙▝█▛▀▀▌▜██▖▟██▘▜█▘▜██▖▝█▛▝█▛ + ▝▜▄ █▌ █▙▟ ▐█▝█▛▐█ ▐█ ▐█▝█▖█▌ █▌ + ▗▟▀ ▜▙ ▝█▛ █▌▝ ▖▐█ ▐█ ▐█ ▐█ ▝██▌ █▌ + ▝▀ ▀▀▀▀▘▝▀▀▀▀▘▀▀▘ ▀▀▘▀▀▘▀▀▘ ▝▀▀▝▀▀ + + Gemini CLI v1.0.0 +" +`; diff --git a/packages/cli/src/ui/components/__snapshots__/AppHeaderIcon-AppHeader-Icon-Rendering-renders-the-default-icon-in-standard-terminals.snap.svg b/packages/cli/src/ui/components/__snapshots__/AppHeaderIcon-AppHeader-Icon-Rendering-renders-the-default-icon-in-standard-terminals.snap.svg index 4e9d0e67a5..5c4c6426b7 100644 --- a/packages/cli/src/ui/components/__snapshots__/AppHeaderIcon-AppHeader-Icon-Rendering-renders-the-default-icon-in-standard-terminals.snap.svg +++ b/packages/cli/src/ui/components/__snapshots__/AppHeaderIcon-AppHeader-Icon-Rendering-renders-the-default-icon-in-standard-terminals.snap.svg @@ -1,30 +1,34 @@ - + - + - - - - Gemini CLI - v1.0.0 - - - - - - - - - Tips for getting started: - 1. Create - GEMINI.md - files to customize your interactions - 2. - /help - for more information - 3. Ask coding questions, edit code or run commands - 4. Be specific for the best results + + + + ▗█▀▀▜▙▝█▛▀▀▌▜██▖▟██▘▜█▘▜██▖▝█▛▝█▛ + + + + █▌ █▙▟ ▐█▝█▛▐█ ▐█ ▐█▝█▖█▌ █▌ + + + + ▜▙ ▝█▛ █▌▝ ▖▐█ ▐█ ▐█ ▐█ ▝██▌ █▌ + + + ▀▀▀▀▘▝▀▀▀▀▘▀▀▘ ▀▀▘▀▀▘▀▀▘ ▝▀▀▝▀▀ + Gemini CLI + v1.0.0 + Tips for getting started: + 1. Create + GEMINI.md + files to customize your interactions + 2. + /help + for more information + 3. Ask coding questions, edit code or run commands + 4. 
Be specific for the best results \ No newline at end of file diff --git a/packages/cli/src/ui/components/__snapshots__/AppHeaderIcon-AppHeader-Icon-Rendering-renders-the-symmetric-icon-in-Apple-Terminal.snap.svg b/packages/cli/src/ui/components/__snapshots__/AppHeaderIcon-AppHeader-Icon-Rendering-renders-the-symmetric-icon-in-Apple-Terminal.snap.svg index fa8373acc7..eaa118754f 100644 --- a/packages/cli/src/ui/components/__snapshots__/AppHeaderIcon-AppHeader-Icon-Rendering-renders-the-symmetric-icon-in-Apple-Terminal.snap.svg +++ b/packages/cli/src/ui/components/__snapshots__/AppHeaderIcon-AppHeader-Icon-Rendering-renders-the-symmetric-icon-in-Apple-Terminal.snap.svg @@ -1,31 +1,35 @@ - + - + - - - - Gemini CLI - v1.0.0 - - - - - - - - - - Tips for getting started: - 1. Create - GEMINI.md - files to customize your interactions - 2. - /help - for more information - 3. Ask coding questions, edit code or run commands - 4. Be specific for the best results + + + + ▗█▀▀▜▙▝█▛▀▀▌▜██▖▟██▘▜█▘▜██▖▝█▛▝█▛ + + + + █▌ █▙▟ ▐█▝█▛▐█ ▐█ ▐█▝█▖█▌ █▌ + + + + ▜▙ ▝█▛ █▌▝ ▖▐█ ▐█ ▐█ ▐█ ▝██▌ █▌ + + + + ▀▀▀▀▘▝▀▀▀▀▘▀▀▘ ▀▀▘▀▀▘▀▀▘ ▝▀▀▝▀▀ + Gemini CLI + v1.0.0 + Tips for getting started: + 1. Create + GEMINI.md + files to customize your interactions + 2. + /help + for more information + 3. Ask coding questions, edit code or run commands + 4. 
Be specific for the best results \ No newline at end of file diff --git a/packages/cli/src/ui/components/__snapshots__/AppHeaderIcon.test.tsx.snap b/packages/cli/src/ui/components/__snapshots__/AppHeaderIcon.test.tsx.snap index 2bb5276ee8..c8c4c53c89 100644 --- a/packages/cli/src/ui/components/__snapshots__/AppHeaderIcon.test.tsx.snap +++ b/packages/cli/src/ui/components/__snapshots__/AppHeaderIcon.test.tsx.snap @@ -2,10 +2,13 @@ exports[`AppHeader Icon Rendering > renders the default icon in standard terminals 1`] = ` " - ▝▜▄ Gemini CLI v1.0.0 - ▝▜▄ - ▗▟▀ - ▝▀ + ▝▜▄ ▗█▀▀▜▙▝█▛▀▀▌▜██▖▟██▘▜█▘▜██▖▝█▛▝█▛ + ▝▜▄ █▌ █▙▟ ▐█▝█▛▐█ ▐█ ▐█▝█▖█▌ █▌ + ▗▟▀ ▜▙ ▝█▛ █▌▝ ▖▐█ ▐█ ▐█ ▐█ ▝██▌ █▌ + ▝▀ ▀▀▀▀▘▝▀▀▀▀▘▀▀▘ ▀▀▘▀▀▘▀▀▘ ▝▀▀▝▀▀ + + Gemini CLI v1.0.0 + Tips for getting started: @@ -17,10 +20,13 @@ Tips for getting started: exports[`AppHeader Icon Rendering > renders the symmetric icon in Apple Terminal 1`] = ` " - ▝▜▄ Gemini CLI v1.0.0 - ▝▜▄ - ▗▟▀ - ▗▟▀ + ▝▜▄ ▗█▀▀▜▙▝█▛▀▀▌▜██▖▟██▘▜█▘▜██▖▝█▛▝█▛ + ▝▜▄ █▌ █▙▟ ▐█▝█▛▐█ ▐█ ▐█▝█▖█▌ █▌ + ▗▟▀ ▜▙ ▝█▛ █▌▝ ▖▐█ ▐█ ▐█ ▐█ ▝██▌ █▌ + ▗▟▀ ▀▀▀▀▘▝▀▀▀▀▘▀▀▘ ▀▀▘▀▀▘▀▀▘ ▝▀▀▝▀▀ + + Gemini CLI v1.0.0 + Tips for getting started: diff --git a/packages/cli/src/ui/components/__snapshots__/AskUserDialog.test.tsx.snap b/packages/cli/src/ui/components/__snapshots__/AskUserDialog.test.tsx.snap index 30caf0fb40..cdc060d9d7 100644 --- a/packages/cli/src/ui/components/__snapshots__/AskUserDialog.test.tsx.snap +++ b/packages/cli/src/ui/components/__snapshots__/AskUserDialog.test.tsx.snap @@ -30,6 +30,8 @@ exports[`AskUserDialog > Scroll Arrows (useAlternateBuffer: false) > shows scrol Description 1 2. Option 2 Description 2 + 3. Option 3 + Description 3 ▼ Enter to select · ↑/↓ to navigate · Esc to cancel @@ -39,37 +41,14 @@ Enter to select · ↑/↓ to navigate · Esc to cancel exports[`AskUserDialog > Scroll Arrows (useAlternateBuffer: true) > shows scroll arrows correctly when useAlternateBuffer is true 1`] = ` "Choose an option +▲ ● 1. Option 1 Description 1 2. 
Option 2 Description 2 3. Option 3 Description 3 - 4. Option 4 - Description 4 - 5. Option 5 - Description 5 - 6. Option 6 - Description 6 - 7. Option 7 - Description 7 - 8. Option 8 - Description 8 - 9. Option 9 - Description 9 - 10. Option 10 - Description 10 - 11. Option 11 - Description 11 - 12. Option 12 - Description 12 - 13. Option 13 - Description 13 - 14. Option 14 - Description 14 - 15. Option 15 - Description 15 - 16. Enter a custom value +▼ Enter to select · ↑/↓ to navigate · Esc to cancel " diff --git a/packages/cli/src/ui/components/__snapshots__/Composer.test.tsx.snap b/packages/cli/src/ui/components/__snapshots__/Composer.test.tsx.snap index 452663d719..745347bc95 100644 --- a/packages/cli/src/ui/components/__snapshots__/Composer.test.tsx.snap +++ b/packages/cli/src/ui/components/__snapshots__/Composer.test.tsx.snap @@ -1,33 +1,33 @@ // Vitest Snapshot v1, https://vitest.dev/guide/snapshot.html exports[`Composer > Snapshots > matches snapshot in idle state 1`] = ` -" ShortcutsHint +" + ? for shortcuts ──────────────────────────────────────────────────────────────────────────────────────────────────── - ApprovalModeIndicator StatusDisplay + ApprovalModeIndicator: default StatusDisplay InputPrompt: Type your message or @path/to/file Footer " `; exports[`Composer > Snapshots > matches snapshot in minimal UI mode 1`] = ` -" ShortcutsHint +" press tab twice for more InputPrompt: Type your message or @path/to/file " `; exports[`Composer > Snapshots > matches snapshot in minimal UI mode while loading 1`] = ` -" LoadingIndicator +"LoadingIndicator press tab twice for more InputPrompt: Type your message or @path/to/file " `; exports[`Composer > Snapshots > matches snapshot in narrow view 1`] = ` " -ShortcutsHint + ? 
for shortcuts ──────────────────────────────────────── - ApprovalModeIndicator - -StatusDisplay + ApprovalModeIndicator: StatusDispl + default ay InputPrompt: Type your message or @path/to/file Footer @@ -35,9 +35,10 @@ Footer `; exports[`Composer > Snapshots > matches snapshot while streaming 1`] = ` -" LoadingIndicator: Thinking +" + LoadingIndicator: Thinking ? for shortcuts ──────────────────────────────────────────────────────────────────────────────────────────────────── - ApprovalModeIndicator + ApprovalModeIndicator: default StatusDisplay InputPrompt: Type your message or @path/to/file Footer " diff --git a/packages/cli/src/ui/components/__snapshots__/ConfigInitDisplay.test.tsx.snap b/packages/cli/src/ui/components/__snapshots__/ConfigInitDisplay.test.tsx.snap index 28929deee5..8358ec7918 100644 --- a/packages/cli/src/ui/components/__snapshots__/ConfigInitDisplay.test.tsx.snap +++ b/packages/cli/src/ui/components/__snapshots__/ConfigInitDisplay.test.tsx.snap @@ -2,36 +2,24 @@ exports[`ConfigInitDisplay > handles empty clients map 1`] = ` " -Spinner Initializing... +Spinner Working... " `; exports[`ConfigInitDisplay > renders initial state 1`] = ` " -Spinner Initializing... +Spinner Working... " `; exports[`ConfigInitDisplay > truncates list of waiting servers if too many 1`] = ` " -Spinner Connecting to MCP servers... (0/5) - Waiting for: s1, s2, s3, +2 more -" -`; - -exports[`ConfigInitDisplay > truncates list of waiting servers if too many 2`] = ` -" -Spinner Connecting to MCP servers... (0/5) - Waiting for: s1, s2, s3, +2 more +Spinner Working... " `; exports[`ConfigInitDisplay > updates message on McpClientUpdate event 1`] = ` " -Spinner Connecting to MCP servers... (1/2) - Waiting for: server2 -" -`; - -exports[`ConfigInitDisplay > updates message on McpClientUpdate event 2`] = ` -" -Spinner Connecting to MCP servers... (1/2) - Waiting for: server2 +Spinner Working... 
" `; diff --git a/packages/cli/src/ui/components/__snapshots__/ContextSummaryDisplay.test.tsx.snap b/packages/cli/src/ui/components/__snapshots__/ContextSummaryDisplay.test.tsx.snap index e28d884acf..876524bdb8 100644 --- a/packages/cli/src/ui/components/__snapshots__/ContextSummaryDisplay.test.tsx.snap +++ b/packages/cli/src/ui/components/__snapshots__/ContextSummaryDisplay.test.tsx.snap @@ -1,19 +1,16 @@ // Vitest Snapshot v1, https://vitest.dev/guide/snapshot.html exports[` > should not render empty parts 1`] = ` -" - 1 open file (ctrl+g to view) +" 1 open file (ctrl+g to view) " `; exports[` > should render on a single line on a wide screen 1`] = ` -" 1 open file (ctrl+g to view) | 1 GEMINI.md file | 1 MCP server | 1 skill +" 1 open file (ctrl+g to view) · 1 GEMINI.md file · 1 MCP server · 1 skill " `; exports[` > should render on multiple lines on a narrow screen 1`] = ` -" - 1 open file (ctrl+g to view) - - 1 GEMINI.md file - - 1 MCP server - - 1 skill +" 1 open file (ctrl+g to view) · 1 GEMINI.md file · 1 MCP server · 1 skill " `; diff --git a/packages/cli/src/ui/components/__snapshots__/HookStatusDisplay--HookStatusDisplay-matches-SVG-snapshot-for-single-hook.snap.svg b/packages/cli/src/ui/components/__snapshots__/HookStatusDisplay--HookStatusDisplay-matches-SVG-snapshot-for-single-hook.snap.svg new file mode 100644 index 0000000000..7c9cc6473c --- /dev/null +++ b/packages/cli/src/ui/components/__snapshots__/HookStatusDisplay--HookStatusDisplay-matches-SVG-snapshot-for-single-hook.snap.svg @@ -0,0 +1,9 @@ + + + + + Executing Hook: test-hook + + \ No newline at end of file diff --git a/packages/cli/src/ui/components/__snapshots__/HookStatusDisplay.test.tsx.snap b/packages/cli/src/ui/components/__snapshots__/HookStatusDisplay.test.tsx.snap index 458728736e..5e04b96cb8 100644 --- a/packages/cli/src/ui/components/__snapshots__/HookStatusDisplay.test.tsx.snap +++ b/packages/cli/src/ui/components/__snapshots__/HookStatusDisplay.test.tsx.snap @@ -1,5 +1,7 @@ // 
Vitest Snapshot v1, https://vitest.dev/guide/snapshot.html +exports[` > matches SVG snapshot for single hook 1`] = `"Executing Hook: test-hook"`; + exports[` > should render a single executing hook 1`] = ` "Executing Hook: test-hook " diff --git a/packages/cli/src/ui/components/__snapshots__/HooksDialog.test.tsx.snap b/packages/cli/src/ui/components/__snapshots__/HooksDialog.test.tsx.snap index 1a2271cc45..cd16040059 100644 --- a/packages/cli/src/ui/components/__snapshots__/HooksDialog.test.tsx.snap +++ b/packages/cli/src/ui/components/__snapshots__/HooksDialog.test.tsx.snap @@ -6,6 +6,8 @@ exports[`HooksDialog > snapshots > renders empty hooks dialog 1`] = ` │ │ │ No hooks configured. │ │ │ +│ (Press Esc to close) │ +│ │ ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯ " `; @@ -31,6 +33,8 @@ exports[`HooksDialog > snapshots > renders hook using command as name when name │ Tip: Use /hooks enable or /hooks disable to toggle individual hooks. Use │ │ /hooks enable-all or /hooks disable-all to toggle all hooks at once. │ │ │ +│ (Press Esc to close) │ +│ │ ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯ " `; @@ -57,6 +61,8 @@ exports[`HooksDialog > snapshots > renders hook with all metadata (matcher, sequ │ Tip: Use /hooks enable or /hooks disable to toggle individual hooks. Use │ │ /hooks enable-all or /hooks disable-all to toggle all hooks at once. │ │ │ +│ (Press Esc to close) │ +│ │ ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯ " `; @@ -93,6 +99,8 @@ exports[`HooksDialog > snapshots > renders hooks grouped by event name with enab │ Tip: Use /hooks enable or /hooks disable to toggle individual hooks. Use │ │ /hooks enable-all or /hooks disable-all to toggle all hooks at once. 
│ │ │ +│ (Press Esc to close) │ +│ │ ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯ " `; @@ -119,6 +127,8 @@ exports[`HooksDialog > snapshots > renders single hook with security warning, so │ Tip: Use /hooks enable or /hooks disable to toggle individual hooks. Use │ │ /hooks enable-all or /hooks disable-all to toggle all hooks at once. │ │ │ +│ (Press Esc to close) │ +│ │ ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯ " `; diff --git a/packages/cli/src/ui/components/__snapshots__/MainContent-MainContent-renders-multiple-thinking-messages-sequentially-correctly.snap.svg b/packages/cli/src/ui/components/__snapshots__/MainContent-MainContent-renders-multiple-thinking-messages-sequentially-correctly.snap.svg index 558118cdfb..0527f43327 100644 --- a/packages/cli/src/ui/components/__snapshots__/MainContent-MainContent-renders-multiple-thinking-messages-sequentially-correctly.snap.svg +++ b/packages/cli/src/ui/components/__snapshots__/MainContent-MainContent-renders-multiple-thinking-messages-sequentially-correctly.snap.svg @@ -21,22 +21,22 @@ Initial analysis - This is a multiple line paragraph for the first thinking message of how the model analyzes the + This is a multiple line paragraph for the first thinking message of how the - problem. + model analyzes the problem. Planning execution - This a second multiple line paragraph for the second thinking message explaining the plan in + This a second multiple line paragraph for the second thinking message - detail so that it wraps around the terminal display. + explaining the plan in detail so that it wraps around the terminal display. Refining approach - And finally a third multiple line paragraph for the third thinking message to refine the + And finally a third multiple line paragraph for the third thinking message to - solution. + refine the solution. 
\ No newline at end of file diff --git a/packages/cli/src/ui/components/__snapshots__/MainContent.test.tsx.snap b/packages/cli/src/ui/components/__snapshots__/MainContent.test.tsx.snap index 785dc6b6f0..d5173e8c9c 100644 --- a/packages/cli/src/ui/components/__snapshots__/MainContent.test.tsx.snap +++ b/packages/cli/src/ui/components/__snapshots__/MainContent.test.tsx.snap @@ -6,12 +6,11 @@ AppHeader(full) ╭──────────────────────────────────────────────────────────────────────────────────────────────╮ │ ⊶ Shell Command Running a long command... │ │ │ -│ Line 9 │ │ Line 10 │ │ Line 11 │ │ Line 12 │ │ Line 13 │ -│ Line 14 █ │ +│ Line 14 │ │ Line 15 █ │ │ Line 16 █ │ │ Line 17 █ │ @@ -28,12 +27,11 @@ AppHeader(full) ╭──────────────────────────────────────────────────────────────────────────────────────────────╮ │ ⊶ Shell Command Running a long command... │ │ │ -│ Line 9 │ │ Line 10 │ │ Line 11 │ │ Line 12 │ │ Line 13 │ -│ Line 14 █ │ +│ Line 14 │ │ Line 15 █ │ │ Line 16 █ │ │ Line 17 █ │ @@ -49,8 +47,7 @@ exports[`MainContent > MainContent Tool Output Height Logic > 'Normal mode - Con ╭──────────────────────────────────────────────────────────────────────────────────────────────╮ │ ⊶ Shell Command Running a long command... │ │ │ -│ ... first 9 lines hidden (Ctrl+O to show) ... │ -│ Line 10 │ +│ ... first 10 lines hidden (Ctrl+O to show) ... 
│ │ Line 11 │ │ Line 12 │ │ Line 13 │ @@ -96,15 +93,15 @@ exports[`MainContent > MainContent Tool Output Height Logic > 'Normal mode - Unc exports[`MainContent > renders a split tool group without a gap between static and pending areas 1`] = ` "AppHeader(full) -╭──────────────────────────────────────────────────────────────────────────────────────────────╮ -│ ✓ test-tool A tool for testing │ -│ │ -│ Part 1 │ -│ │ -│ ✓ test-tool A tool for testing │ -│ │ -│ Part 2 │ -╰──────────────────────────────────────────────────────────────────────────────────────────────╯ +╭──────────────────────────────────────────────────────────────────────────╮ +│ ✓ test-tool A tool for testing │ +│ │ +│ Part 1 │ +│ │ +│ ✓ test-tool A tool for testing │ +│ │ +│ Part 2 │ +╰──────────────────────────────────────────────────────────────────────────╯ " `; @@ -163,16 +160,16 @@ AppHeader(full) Thinking... │ │ Initial analysis - │ This is a multiple line paragraph for the first thinking message of how the model analyzes the - │ problem. + │ This is a multiple line paragraph for the first thinking message of how the + │ model analyzes the problem. │ │ Planning execution - │ This a second multiple line paragraph for the second thinking message explaining the plan in - │ detail so that it wraps around the terminal display. + │ This a second multiple line paragraph for the second thinking message + │ explaining the plan in detail so that it wraps around the terminal display. │ │ Refining approach - │ And finally a third multiple line paragraph for the third thinking message to refine the - │ solution. + │ And finally a third multiple line paragraph for the third thinking message to + │ refine the solution. " `; @@ -185,14 +182,14 @@ AppHeader(full) Thinking... │ │ Initial analysis - │ This is a multiple line paragraph for the first thinking message of how the model analyzes the - │ problem. 
+ │ This is a multiple line paragraph for the first thinking message of how the + │ model analyzes the problem. │ │ Planning execution - │ This a second multiple line paragraph for the second thinking message explaining the plan in - │ detail so that it wraps around the terminal display. + │ This a second multiple line paragraph for the second thinking message + │ explaining the plan in detail so that it wraps around the terminal display. │ │ Refining approach - │ And finally a third multiple line paragraph for the third thinking message to refine the - │ solution." + │ And finally a third multiple line paragraph for the third thinking message to + │ refine the solution." `; diff --git a/packages/cli/src/ui/components/__snapshots__/StatusDisplay.test.tsx.snap b/packages/cli/src/ui/components/__snapshots__/StatusDisplay.test.tsx.snap index 2620531cc3..2e6b4b75ad 100644 --- a/packages/cli/src/ui/components/__snapshots__/StatusDisplay.test.tsx.snap +++ b/packages/cli/src/ui/components/__snapshots__/StatusDisplay.test.tsx.snap @@ -11,7 +11,7 @@ exports[`StatusDisplay > renders ContextSummaryDisplay by default 1`] = ` `; exports[`StatusDisplay > renders HookStatusDisplay when hooks are active 1`] = ` -"Mock Hook Status Display +"Mock Context Summary Display (Skills: 2, Shells: 0) " `; diff --git a/packages/cli/src/ui/components/__snapshots__/ToolConfirmationQueue-ToolConfirmationQueue-height-allocation-and-layout-should-handle-security-warning-height-correctly.snap.svg b/packages/cli/src/ui/components/__snapshots__/ToolConfirmationQueue-ToolConfirmationQueue-height-allocation-and-layout-should-handle-security-warning-height-correctly.snap.svg new file mode 100644 index 0000000000..678d4b42b3 --- /dev/null +++ b/packages/cli/src/ui/components/__snapshots__/ToolConfirmationQueue-ToolConfirmationQueue-height-allocation-and-layout-should-handle-security-warning-height-correctly.snap.svg @@ -0,0 +1,130 @@ + + + + + 
╭──────────────────────────────────────────────────────────────────────────────╮ + + Action Required + 3 of 3 + + + + + ? + run_shell_command + Executes a bash command with a deceptive URL + + + + + ... 6 hidden (Ctrl+O) ... + + + echo + "Line 37" + + + echo + "Line 38" + + + echo + "Line 39" + + + echo + "Line 40" + + + echo + "Line 41" + + + echo + "Line 42" + + + echo + "Line 43" + + + echo + "Line 44" + + + echo + "Line 45" + + + echo + "Line 46" + + + echo + "Line 47" + + + echo + "Line 48" + + + echo + "Line 49" + + + echo + "Line 50" + + + curl https://täst.com + + + + + + Warning: + Deceptive URL(s) detected: + + + + + Original: + https://täst.com/ + + + Actual Host (Punycode): + https://xn--tst-qla.com/ + + + + + Allow execution of: 'echo'? + + + + + + + + + 1. + + + Allow once + + + + 2. + Allow for this session + + + 3. + No, suggest changes (esc) + + + + ╰──────────────────────────────────────────────────────────────────────────────╯ + + \ No newline at end of file diff --git a/packages/cli/src/ui/components/__snapshots__/ToolConfirmationQueue-ToolConfirmationQueue-height-allocation-and-layout-should-render-the-full-queue-wrapper-with-borders-and-content-for-large-edit-diffs.snap.svg b/packages/cli/src/ui/components/__snapshots__/ToolConfirmationQueue-ToolConfirmationQueue-height-allocation-and-layout-should-render-the-full-queue-wrapper-with-borders-and-content-for-large-edit-diffs.snap.svg new file mode 100644 index 0000000000..c39d7046bc --- /dev/null +++ b/packages/cli/src/ui/components/__snapshots__/ToolConfirmationQueue-ToolConfirmationQueue-height-allocation-and-layout-should-render-the-full-queue-wrapper-with-borders-and-content-for-large-edit-diffs.snap.svg @@ -0,0 +1,458 @@ + + + + + ╭──────────────────────────────────────────────────────────────────────────────╮ + + Action Required + + + + + ? + replace + Replaces content in a file + + + + + ... 15 hidden (Ctrl+O) ... 
+ + + + + 8 + + + + + + + const + + newLine8 = + + true + + ; + + + + + 9 + + + - + + + const + + oldLine9 = + + true + + ; + + + + + 9 + + + + + + + const + + newLine9 = + + true + + ; + + + + 10 + + + - + + + const + + oldLine10 = + + true + + ; + + + + 10 + + + + + + + const + + newLine10 = + + true + + ; + + + + 11 + + + - + + + const + + oldLine11 = + + true + + ; + + + + 11 + + + + + + + const + + newLine11 = + + true + + ; + + + + 12 + + + - + + + const + + oldLine12 = + + true + + ; + + + + 12 + + + + + + + const + + newLine12 = + + true + + ; + + + + 13 + + + - + + + const + + oldLine13 = + + true + + ; + + + + 13 + + + + + + + const + + newLine13 = + + true + + ; + + + + 14 + + + - + + + const + + oldLine14 = + + true + + ; + + + + 14 + + + + + + + const + + newLine14 = + + true + + ; + + + + 15 + + + - + + + const + + oldLine15 = + + true + + ; + + + + 15 + + + + + + + const + + newLine15 = + + true + + ; + + + + 16 + + + - + + + const + + oldLine16 = + + true + + ; + + + + 16 + + + + + + + const + + newLine16 = + + true + + ; + + + + 17 + + + - + + + const + + oldLine17 = + + true + + ; + + + + 17 + + + + + + + const + + newLine17 = + + true + + ; + + + + 18 + + + - + + + const + + oldLine18 = + + true + + ; + + + + 18 + + + + + + + const + + newLine18 = + + true + + ; + + + + 19 + + + - + + + const + + oldLine19 = + + true + + ; + + + + 19 + + + + + + + const + + newLine19 = + + true + + ; + + + + 20 + + + - + + + const + + oldLine20 = + + true + + ; + + + + 20 + + + + + + + const + + newLine20 = + + true + + ; + + + Apply this change? + + + + + + + + + 1. + + + Allow once + + + + 2. + Allow for this session + + + 3. + Modify with external editor + + + 4. 
+ No, suggest changes (esc) + + + + ╰──────────────────────────────────────────────────────────────────────────────╯ + + \ No newline at end of file diff --git a/packages/cli/src/ui/components/__snapshots__/ToolConfirmationQueue-ToolConfirmationQueue-height-allocation-and-layout-should-render-the-full-queue-wrapper-with-borders-and-content-for-large-exec-commands.snap.svg b/packages/cli/src/ui/components/__snapshots__/ToolConfirmationQueue-ToolConfirmationQueue-height-allocation-and-layout-should-render-the-full-queue-wrapper-with-borders-and-content-for-large-exec-commands.snap.svg new file mode 100644 index 0000000000..508fc9d3c4 --- /dev/null +++ b/packages/cli/src/ui/components/__snapshots__/ToolConfirmationQueue-ToolConfirmationQueue-height-allocation-and-layout-should-render-the-full-queue-wrapper-with-borders-and-content-for-large-exec-commands.snap.svg @@ -0,0 +1,156 @@ + + + + + ╭──────────────────────────────────────────────────────────────────────────────╮ + + Action Required + 2 of 3 + + + + + ? + run_shell_command + Executes a bash command + + + + + ... 24 hidden (Ctrl+O) ... + + + echo + "Line 25" + + + echo + "Line 26" + + + echo + "Line 27" + + + echo + "Line 28" + + + echo + "Line 29" + + + echo + "Line 30" + + + echo + "Line 31" + + + echo + "Line 32" + + + echo + "Line 33" + + + echo + "Line 34" + + + echo + "Line 35" + + + echo + "Line 36" + + + echo + "Line 37" + + + echo + "Line 38" + + + echo + "Line 39" + + + echo + "Line 40" + + + echo + "Line 41" + + + echo + "Line 42" + + + echo + "Line 43" + + + echo + "Line 44" + + + echo + "Line 45" + + + echo + "Line 46" + + + echo + "Line 47" + + + echo + "Line 48" + + + echo + "Line 49" + + + echo + "Line 50" + + + Allow execution of: 'echo'? + + + + + + + + + 1. + + + Allow once + + + + 2. + Allow for this session + + + 3. 
+ No, suggest changes (esc) + + + + ╰──────────────────────────────────────────────────────────────────────────────╯ + + \ No newline at end of file diff --git a/packages/cli/src/ui/components/__snapshots__/ToolConfirmationQueue.test.tsx.snap b/packages/cli/src/ui/components/__snapshots__/ToolConfirmationQueue.test.tsx.snap index 6d9baba94f..fdbb216cde 100644 --- a/packages/cli/src/ui/components/__snapshots__/ToolConfirmationQueue.test.tsx.snap +++ b/packages/cli/src/ui/components/__snapshots__/ToolConfirmationQueue.test.tsx.snap @@ -16,7 +16,6 @@ exports[`ToolConfirmationQueue > calculates availableContentHeight based on avai │ 4. No, suggest changes (esc) │ │ │ ╰──────────────────────────────────────────────────────────────────────────────╯ - Press Ctrl+O to show more lines " `; @@ -42,6 +41,130 @@ exports[`ToolConfirmationQueue > does not render expansion hint when constrainHe " `; +exports[`ToolConfirmationQueue > height allocation and layout > should handle security warning height correctly 1`] = ` +"╭──────────────────────────────────────────────────────────────────────────────╮ +│ Action Required 3 of 3 │ +│ │ +│ ? run_shell_command Executes a bash command with a deceptive URL │ +│ │ +│ ... 6 hidden (Ctrl+O) ... │ +│ echo "Line 37" │ +│ echo "Line 38" │ +│ echo "Line 39" │ +│ echo "Line 40" │ +│ echo "Line 41" │ +│ echo "Line 42" │ +│ echo "Line 43" │ +│ echo "Line 44" │ +│ echo "Line 45" │ +│ echo "Line 46" │ +│ echo "Line 47" │ +│ echo "Line 48" │ +│ echo "Line 49" │ +│ echo "Line 50" │ +│ curl https://täst.com │ +│ │ +│ ⚠ Warning: Deceptive URL(s) detected: │ +│ │ +│ Original: https://täst.com/ │ +│ Actual Host (Punycode): https://xn--tst-qla.com/ │ +│ │ +│ Allow execution of: 'echo'? │ +│ │ +│ ● 1. Allow once │ +│ 2. Allow for this session │ +│ 3. 
No, suggest changes (esc) │ +│ │ +╰──────────────────────────────────────────────────────────────────────────────╯ +" +`; + +exports[`ToolConfirmationQueue > height allocation and layout > should render the full queue wrapper with borders and content for large edit diffs 1`] = ` +"╭──────────────────────────────────────────────────────────────────────────────╮ +│ Action Required │ +│ │ +│ ? replace Replaces content in a file │ +│ │ +│ ... 15 hidden (Ctrl+O) ... │ +│ 8 + const newLine8 = true; │ +│ 9 - const oldLine9 = true; │ +│ 9 + const newLine9 = true; │ +│ 10 - const oldLine10 = true; │ +│ 10 + const newLine10 = true; │ +│ 11 - const oldLine11 = true; │ +│ 11 + const newLine11 = true; │ +│ 12 - const oldLine12 = true; │ +│ 12 + const newLine12 = true; │ +│ 13 - const oldLine13 = true; │ +│ 13 + const newLine13 = true; │ +│ 14 - const oldLine14 = true; │ +│ 14 + const newLine14 = true; │ +│ 15 - const oldLine15 = true; │ +│ 15 + const newLine15 = true; │ +│ 16 - const oldLine16 = true; │ +│ 16 + const newLine16 = true; │ +│ 17 - const oldLine17 = true; │ +│ 17 + const newLine17 = true; │ +│ 18 - const oldLine18 = true; │ +│ 18 + const newLine18 = true; │ +│ 19 - const oldLine19 = true; │ +│ 19 + const newLine19 = true; │ +│ 20 - const oldLine20 = true; │ +│ 20 + const newLine20 = true; │ +│ Apply this change? │ +│ │ +│ ● 1. Allow once │ +│ 2. Allow for this session │ +│ 3. Modify with external editor │ +│ 4. No, suggest changes (esc) │ +│ │ +╰──────────────────────────────────────────────────────────────────────────────╯ +" +`; + +exports[`ToolConfirmationQueue > height allocation and layout > should render the full queue wrapper with borders and content for large exec commands 1`] = ` +"╭──────────────────────────────────────────────────────────────────────────────╮ +│ Action Required 2 of 3 │ +│ │ +│ ? run_shell_command Executes a bash command │ +│ │ +│ ... 24 hidden (Ctrl+O) ... 
│ +│ echo "Line 25" │ +│ echo "Line 26" │ +│ echo "Line 27" │ +│ echo "Line 28" │ +│ echo "Line 29" │ +│ echo "Line 30" │ +│ echo "Line 31" │ +│ echo "Line 32" │ +│ echo "Line 33" │ +│ echo "Line 34" │ +│ echo "Line 35" │ +│ echo "Line 36" │ +│ echo "Line 37" │ +│ echo "Line 38" │ +│ echo "Line 39" │ +│ echo "Line 40" │ +│ echo "Line 41" │ +│ echo "Line 42" │ +│ echo "Line 43" │ +│ echo "Line 44" │ +│ echo "Line 45" │ +│ echo "Line 46" │ +│ echo "Line 47" │ +│ echo "Line 48" │ +│ echo "Line 49" │ +│ echo "Line 50" │ +│ Allow execution of: 'echo'? │ +│ │ +│ ● 1. Allow once │ +│ 2. Allow for this session │ +│ 3. No, suggest changes (esc) │ +│ │ +╰──────────────────────────────────────────────────────────────────────────────╯ +" +`; + exports[`ToolConfirmationQueue > provides more height for ask_user by subtracting less overhead 1`] = ` "╭──────────────────────────────────────────────────────────────────────────────╮ │ Answer Questions │ @@ -91,26 +214,6 @@ exports[`ToolConfirmationQueue > renders ExitPlanMode tool confirmation with Suc " `; -exports[`ToolConfirmationQueue > renders expansion hint when content is long and constrained 1`] = ` -"╭──────────────────────────────────────────────────────────────────────────────╮ -│ Action Required │ -│ │ -│ ? replace edit file │ -│ │ -│ ... 49 hidden (Ctrl+O) ... │ -│ 50 line │ -│ Apply this change? │ -│ │ -│ ● 1. Allow once │ -│ 2. Allow for this session │ -│ 3. Modify with external editor │ -│ 4. 
No, suggest changes (esc) │ -│ │ -╰──────────────────────────────────────────────────────────────────────────────╯ - Press Ctrl+O to show more lines -" -`; - exports[`ToolConfirmationQueue > renders the confirming tool with progress indicator 1`] = ` "╭──────────────────────────────────────────────────────────────────────────────╮ │ Action Required 1 of 3 │ diff --git a/packages/cli/src/ui/components/messages/ShellToolMessage.test.tsx b/packages/cli/src/ui/components/messages/ShellToolMessage.test.tsx index a5981e4e2d..4f703dcfe6 100644 --- a/packages/cli/src/ui/components/messages/ShellToolMessage.test.tsx +++ b/packages/cli/src/ui/components/messages/ShellToolMessage.test.tsx @@ -184,28 +184,28 @@ describe('', () => { [ 'respects availableTerminalHeight when it is smaller than ACTIVE_SHELL_MAX_LINES', 10, - 8, + 7, false, true, ], [ 'uses ACTIVE_SHELL_MAX_LINES when availableTerminalHeight is large', 100, - ACTIVE_SHELL_MAX_LINES - 3, + ACTIVE_SHELL_MAX_LINES - 4, false, true, ], [ 'uses full availableTerminalHeight when focused in alternate buffer mode', 100, - 98, + 97, true, false, ], [ 'defaults to ACTIVE_SHELL_MAX_LINES in alternate buffer when availableTerminalHeight is undefined', undefined, - ACTIVE_SHELL_MAX_LINES - 3, + ACTIVE_SHELL_MAX_LINES - 4, false, false, ], @@ -323,8 +323,8 @@ describe('', () => { await waitFor(() => { const frame = lastFrame(); - // Should still be constrained to 12 (15 - 3) because isExpandable is false - expect(frame.match(/Line \d+/g)?.length).toBe(12); + // Should still be constrained to 11 (15 - 4) because isExpandable is false + expect(frame.match(/Line \d+/g)?.length).toBe(11); }); expect(lastFrame()).toMatchSnapshot(); unmount(); diff --git a/packages/cli/src/ui/components/messages/SubagentProgressDisplay.test.tsx b/packages/cli/src/ui/components/messages/SubagentProgressDisplay.test.tsx index 955c4a5f8a..caed091b2b 100644 --- a/packages/cli/src/ui/components/messages/SubagentProgressDisplay.test.tsx +++ 
b/packages/cli/src/ui/components/messages/SubagentProgressDisplay.test.tsx @@ -182,4 +182,25 @@ describe('', () => { ); expect(lastFrame()).toMatchSnapshot(); }); + + it('renders error tool status correctly', async () => { + const progress: SubagentProgress = { + isSubagentProgress: true, + agentName: 'TestAgent', + recentActivity: [ + { + id: '7', + type: 'tool_call', + content: 'run_shell_command', + args: '{"command": "echo hello"}', + status: 'error', + }, + ], + }; + + const { lastFrame } = await render( + , + ); + expect(lastFrame()).toMatchSnapshot(); + }); }); diff --git a/packages/cli/src/ui/components/messages/ToolConfirmationMessage.test.tsx b/packages/cli/src/ui/components/messages/ToolConfirmationMessage.test.tsx index 1759b0484c..eddbaf4396 100644 --- a/packages/cli/src/ui/components/messages/ToolConfirmationMessage.test.tsx +++ b/packages/cli/src/ui/components/messages/ToolConfirmationMessage.test.tsx @@ -232,7 +232,7 @@ describe('ToolConfirmationMessage', () => { unmount(); }); - it('should render multiline shell scripts with correct newlines and syntax highlighting (SVG snapshot)', async () => { + it('should render multiline shell scripts with correct newlines and syntax highlighting', async () => { const confirmationDetails: SerializableConfirmationDetails = { type: 'exec', title: 'Confirm Multiline Script', @@ -453,7 +453,6 @@ describe('ToolConfirmationMessage', () => { cancel: vi.fn(), isDiffingEnabled: false, }); - const { lastFrame, unmount } = await renderWithProviders( { cancel: vi.fn(), isDiffingEnabled: false, }); - const { lastFrame, unmount } = await renderWithProviders( { unmount(); }); + describe('height allocation and layout', () => { + it('should expand to available height for large exec commands', async () => { + let largeCommand = ''; + for (let i = 1; i <= 50; i++) { + largeCommand += `echo "Line ${i}"\n`; + } + + const confirmationDetails: SerializableConfirmationDetails = { + type: 'exec', + title: 'Confirm Execution', + 
command: largeCommand.trimEnd(), + rootCommand: 'echo', + rootCommands: ['echo'], + }; + + const { waitUntilReady, lastFrame, generateSvg, unmount } = + await renderWithProviders( + , + ); + await waitUntilReady(); + + const outputLines = lastFrame().split('\n'); + // Should use the entire terminal height minus 1 line for the "Press Ctrl+O to show more lines" hint + expect(outputLines.length).toBe(39); + + await expect({ lastFrame, generateSvg }).toMatchSvgSnapshot(); + unmount(); + }); + + it('should expand to available height for large edit diffs', async () => { + // Create a large diff string + let largeDiff = '--- a/file.ts\n+++ b/file.ts\n@@ -1,10 +1,15 @@\n'; + for (let i = 1; i <= 20; i++) { + largeDiff += `-const oldLine${i} = true;\n`; + largeDiff += `+const newLine${i} = true;\n`; + } + + const confirmationDetails: SerializableConfirmationDetails = { + type: 'edit', + title: 'Confirm Edit', + fileName: 'file.ts', + filePath: '/file.ts', + fileDiff: largeDiff, + originalContent: 'old', + newContent: 'new', + isModifying: false, + }; + + const { waitUntilReady, lastFrame, generateSvg, unmount } = + await renderWithProviders( + , + ); + await waitUntilReady(); + + const outputLines = lastFrame().split('\n'); + // Should use the entire terminal height minus 1 line for the "Press Ctrl+O to show more lines" hint + expect(outputLines.length).toBe(39); + + await expect({ lastFrame, generateSvg }).toMatchSvgSnapshot(); + unmount(); + }); + }); + describe('ESCAPE key behavior', () => { beforeEach(() => { vi.useFakeTimers(); @@ -646,7 +721,6 @@ describe('ToolConfirmationMessage', () => { cancel: vi.fn(), isDiffingEnabled: false, }); - const confirmationDetails: SerializableConfirmationDetails = { type: 'info', title: 'Confirm Web Fetch', diff --git a/packages/cli/src/ui/components/messages/ToolConfirmationMessage.tsx b/packages/cli/src/ui/components/messages/ToolConfirmationMessage.tsx index 45584a9d46..d9ca2e66c6 100644 --- 
a/packages/cli/src/ui/components/messages/ToolConfirmationMessage.tsx +++ b/packages/cli/src/ui/components/messages/ToolConfirmationMessage.tsx @@ -5,8 +5,8 @@ */ import type React from 'react'; -import { useEffect, useMemo, useCallback, useState } from 'react'; -import { Box, Text } from 'ink'; +import { useEffect, useMemo, useCallback, useState, useRef } from 'react'; +import { Box, Text, ResizeObserver, type DOMElement } from 'ink'; import { DiffRenderer } from './DiffRenderer.js'; import { RenderInline } from '../../utils/InlineMarkdownRenderer.js'; import { @@ -85,6 +85,64 @@ export const ToolConfirmationMessage: React.FC< ? mcpDetailsExpansionState.expanded : false; + const [measuredSecurityWarningsHeight, setMeasuredSecurityWarningsHeight] = + useState(0); + const observerRef = useRef(null); + + const deceptiveUrlWarnings = useMemo(() => { + const urls: string[] = []; + if (confirmationDetails.type === 'info' && confirmationDetails.urls) { + urls.push(...confirmationDetails.urls); + } else if (confirmationDetails.type === 'exec') { + const commands = + confirmationDetails.commands && confirmationDetails.commands.length > 0 + ? 
confirmationDetails.commands + : [confirmationDetails.command]; + for (const cmd of commands) { + const matches = cmd.match(/https?:\/\/[^\s"'`<>;&|()]+/g); + if (matches) urls.push(...matches); + } + } + + const uniqueUrls = Array.from(new Set(urls)); + return uniqueUrls + .map(getDeceptiveUrlDetails) + .filter((d): d is DeceptiveUrlDetails => d !== null); + }, [confirmationDetails]); + + const deceptiveUrlWarningText = useMemo(() => { + if (deceptiveUrlWarnings.length === 0) return null; + return `**Warning:** Deceptive URL(s) detected:\n\n${deceptiveUrlWarnings + .map( + (w) => + ` **Original:** ${w.originalUrl}\n **Actual Host (Punycode):** ${w.punycodeUrl}`, + ) + .join('\n\n')}`; + }, [deceptiveUrlWarnings]); + + const onSecurityWarningsRefChange = useCallback((node: DOMElement | null) => { + if (observerRef.current) { + observerRef.current.disconnect(); + observerRef.current = null; + } + + if (node) { + const observer = new ResizeObserver((entries) => { + const entry = entries[0]; + if (entry) { + const newHeight = Math.round(entry.contentRect.height); + setMeasuredSecurityWarningsHeight((prev) => + newHeight !== prev ? newHeight : prev, + ); + } + }); + observer.observe(node); + observerRef.current = observer; + } else { + setMeasuredSecurityWarningsHeight((prev) => (prev !== 0 ? 0 : prev)); + } + }, []); + const settings = useSettings(); const allowPermanentApproval = settings.merged.security.enablePermanentToolApproval && @@ -216,37 +274,6 @@ export const ToolConfirmationMessage: React.FC< [handleConfirm], ); - const deceptiveUrlWarnings = useMemo(() => { - const urls: string[] = []; - if (confirmationDetails.type === 'info' && confirmationDetails.urls) { - urls.push(...confirmationDetails.urls); - } else if (confirmationDetails.type === 'exec') { - const commands = - confirmationDetails.commands && confirmationDetails.commands.length > 0 - ? 
confirmationDetails.commands - : [confirmationDetails.command]; - for (const cmd of commands) { - const matches = cmd.match(/https?:\/\/[^\s"'`<>;&|()]+/g); - if (matches) urls.push(...matches); - } - } - - const uniqueUrls = Array.from(new Set(urls)); - return uniqueUrls - .map(getDeceptiveUrlDetails) - .filter((d): d is DeceptiveUrlDetails => d !== null); - }, [confirmationDetails]); - - const deceptiveUrlWarningText = useMemo(() => { - if (deceptiveUrlWarnings.length === 0) return null; - return `**Warning:** Deceptive URL(s) detected:\n\n${deceptiveUrlWarnings - .map( - (w) => - ` **Original:** ${w.originalUrl}\n **Actual Host (Punycode):** ${w.punycodeUrl}`, - ) - .join('\n\n')}`; - }, [deceptiveUrlWarnings]); - const getOptions = useCallback(() => { const options: Array> = []; @@ -389,23 +416,36 @@ export const ToolConfirmationMessage: React.FC< // Calculate the vertical space (in lines) consumed by UI elements // surrounding the main body content. - const PADDING_OUTER_Y = 2; // Main container has `padding={1}` (top & bottom). - const MARGIN_BODY_BOTTOM = 1; // margin on the body container. + const PADDING_OUTER_Y = 1; // Main container has `paddingBottom={1}`. const HEIGHT_QUESTION = 1; // The question text is one line. const MARGIN_QUESTION_BOTTOM = 1; // Margin on the question container. + const SECURITY_WARNING_BOTTOM_MARGIN = 1; // Margin on the securityWarnings container. + const SHOW_MORE_LINES_HEIGHT = 1; // The "Press Ctrl+O to show more lines" hint. const optionsCount = getOptions().length; + // The measured height includes the margin inside WarningMessage (1 line). + // We also add 1 line for the marginBottom on the securityWarnings container. + const securityWarningsHeight = deceptiveUrlWarningText + ? 
measuredSecurityWarningsHeight + SECURITY_WARNING_BOTTOM_MARGIN + : 0; + const surroundingElementsHeight = PADDING_OUTER_Y + - MARGIN_BODY_BOTTOM + HEIGHT_QUESTION + MARGIN_QUESTION_BOTTOM + + SHOW_MORE_LINES_HEIGHT + optionsCount + - 1; // Reserve one line for 'ShowMoreLines' hint + securityWarningsHeight; return Math.max(availableTerminalHeight - surroundingElementsHeight, 1); - }, [availableTerminalHeight, getOptions, handlesOwnUI]); + }, [ + availableTerminalHeight, + handlesOwnUI, + getOptions, + measuredSecurityWarningsHeight, + deceptiveUrlWarningText, + ]); const { question, bodyContent, options, securityWarnings, initialIndex } = useMemo<{ @@ -547,10 +587,6 @@ export const ToolConfirmationMessage: React.FC< let bodyContentHeight = availableBodyContentHeight(); let warnings: React.ReactNode = null; - if (bodyContentHeight !== undefined) { - bodyContentHeight -= 2; // Account for padding; - } - if (containsRedirection) { // Calculate lines needed for Note and Tip const safeWidth = Math.max(terminalWidth, 1); @@ -735,6 +771,15 @@ export const ToolConfirmationMessage: React.FC< paddingTop={0} paddingBottom={handlesOwnUI ? 0 : 1} > + {/* System message from hook */} + {confirmationDetails.systemMessage && ( + + + {confirmationDetails.systemMessage} + + + )} + {handlesOwnUI ? 
( bodyContent ) : ( @@ -750,7 +795,11 @@ export const ToolConfirmationMessage: React.FC< {securityWarnings && ( - + {securityWarnings} )} diff --git a/packages/cli/src/ui/components/messages/__snapshots__/ShellToolMessage.test.tsx.snap b/packages/cli/src/ui/components/messages/__snapshots__/ShellToolMessage.test.tsx.snap index 1847b8ce67..967ea81e14 100644 --- a/packages/cli/src/ui/components/messages/__snapshots__/ShellToolMessage.test.tsx.snap +++ b/packages/cli/src/ui/components/messages/__snapshots__/ShellToolMessage.test.tsx.snap @@ -4,7 +4,6 @@ exports[` > Height Constraints > defaults to ACTIVE_SHELL_MA "╭──────────────────────────────────────────────────────────────────────────────╮ │ ⊶ Shell Command A shell command │ │ │ -│ Line 89 │ │ Line 90 │ │ Line 91 │ │ Line 92 │ @@ -14,7 +13,7 @@ exports[` > Height Constraints > defaults to ACTIVE_SHELL_MA │ Line 96 │ │ Line 97 │ │ Line 98 │ -│ Line 99 ▄ │ +│ Line 99 │ │ Line 100 █ │ " `; @@ -130,7 +129,6 @@ exports[` > Height Constraints > respects availableTerminalH "╭──────────────────────────────────────────────────────────────────────────────╮ │ ⊶ Shell Command A shell command │ │ │ -│ Line 93 │ │ Line 94 │ │ Line 95 │ │ Line 96 │ @@ -145,7 +143,6 @@ exports[` > Height Constraints > stays constrained in altern "╭──────────────────────────────────────────────────────────────────────────────╮ │ ✓ Shell Command A shell command │ │ │ -│ Line 89 │ │ Line 90 │ │ Line 91 │ │ Line 92 │ @@ -155,7 +152,7 @@ exports[` > Height Constraints > stays constrained in altern │ Line 96 │ │ Line 97 │ │ Line 98 │ -│ Line 99 ▄ │ +│ Line 99 │ │ Line 100 █ │ " `; @@ -164,7 +161,6 @@ exports[` > Height Constraints > uses ACTIVE_SHELL_MAX_LINES "╭──────────────────────────────────────────────────────────────────────────────╮ │ ⊶ Shell Command A shell command │ │ │ -│ Line 89 │ │ Line 90 │ │ Line 91 │ │ Line 92 │ @@ -174,7 +170,7 @@ exports[` > Height Constraints > uses ACTIVE_SHELL_MAX_LINES │ Line 96 │ │ Line 97 │ │ Line 98 │ -│ Line 
99 ▄ │ +│ Line 99 │ │ Line 100 █ │ " `; @@ -183,10 +179,9 @@ exports[` > Height Constraints > uses full availableTerminal "╭──────────────────────────────────────────────────────────────────────────────╮ │ ⊶ Shell Command A shell command (Shift+Tab to unfocus) │ │ │ -│ Line 3 │ │ Line 4 │ -│ Line 5 █ │ -│ Line 6 █ │ +│ Line 5 │ +│ Line 6 │ │ Line 7 █ │ │ Line 8 █ │ │ Line 9 █ │ diff --git a/packages/cli/src/ui/components/messages/__snapshots__/SubagentProgressDisplay.test.tsx.snap b/packages/cli/src/ui/components/messages/__snapshots__/SubagentProgressDisplay.test.tsx.snap index 2d31c9c652..77a3ec001f 100644 --- a/packages/cli/src/ui/components/messages/__snapshots__/SubagentProgressDisplay.test.tsx.snap +++ b/packages/cli/src/ui/components/messages/__snapshots__/SubagentProgressDisplay.test.tsx.snap @@ -40,6 +40,13 @@ exports[` > renders correctly with file_path 1`] = ` " `; +exports[` > renders error tool status correctly 1`] = ` +"Running subagent TestAgent... + +x run_shell_command echo hello +" +`; + exports[` > renders thought bubbles correctly 1`] = ` "Running subagent TestAgent... diff --git a/packages/cli/src/ui/components/messages/__snapshots__/ToolConfirmationMessage-ToolConfirmationMessage-height-allocation-and-layout-should-expand-to-available-height-for-large-edit-diffs.snap.svg b/packages/cli/src/ui/components/messages/__snapshots__/ToolConfirmationMessage-ToolConfirmationMessage-height-allocation-and-layout-should-expand-to-available-height-for-large-edit-diffs.snap.svg new file mode 100644 index 0000000000..4c570fb451 --- /dev/null +++ b/packages/cli/src/ui/components/messages/__snapshots__/ToolConfirmationMessage-ToolConfirmationMessage-height-allocation-and-layout-should-expand-to-available-height-for-large-edit-diffs.snap.svg @@ -0,0 +1,468 @@ + + + + + ... first 9 lines hidden (Ctrl+O to show) ... 
+ + + 5 + + + + + + + const + + newLine5 = + + true + + ; + + + 6 + + + - + + + const + + oldLine6 = + + true + + ; + + + 6 + + + + + + + const + + newLine6 = + + true + + ; + + + 7 + + + - + + + const + + oldLine7 = + + true + + ; + + + 7 + + + + + + + const + + newLine7 = + + true + + ; + + + 8 + + + - + + + const + + oldLine8 = + + true + + ; + + + 8 + + + + + + + const + + newLine8 = + + true + + ; + + + 9 + + + - + + + const + + oldLine9 = + + true + + ; + + + 9 + + + + + + + const + + newLine9 = + + true + + ; + + 10 + + + - + + + const + + oldLine10 = + + true + + ; + + 10 + + + + + + + const + + newLine10 = + + true + + ; + + 11 + + + - + + + const + + oldLine11 = + + true + + ; + + 11 + + + + + + + const + + newLine11 = + + true + + ; + + 12 + + + - + + + const + + oldLine12 = + + true + + ; + + 12 + + + + + + + const + + newLine12 = + + true + + ; + + 13 + + + - + + + const + + oldLine13 = + + true + + ; + + 13 + + + + + + + const + + newLine13 = + + true + + ; + + 14 + + + - + + + const + + oldLine14 = + + true + + ; + + 14 + + + + + + + const + + newLine14 = + + true + + ; + + 15 + + + - + + + const + + oldLine15 = + + true + + ; + + 15 + + + + + + + const + + newLine15 = + + true + + ; + + 16 + + + - + + + const + + oldLine16 = + + true + + ; + + 16 + + + + + + + const + + newLine16 = + + true + + ; + + 17 + + + - + + + const + + oldLine17 = + + true + + ; + + 17 + + + + + + + const + + newLine17 = + + true + + ; + + 18 + + + - + + + const + + oldLine18 = + + true + + ; + + 18 + + + + + + + const + + newLine18 = + + true + + ; + + 19 + + + - + + + const + + oldLine19 = + + true + + ; + + 19 + + + + + + + const + + newLine19 = + + true + + ; + + 20 + + + - + + + const + + oldLine20 = + + true + + ; + + 20 + + + + + + + const + + newLine20 = + + true + + ; + Apply this change? + + + + + 1. + + + Allow once + + 2. + Allow for this session + 3. + Modify with external editor + 4. 
+ No, suggest changes (esc) + + \ No newline at end of file diff --git a/packages/cli/src/ui/components/messages/__snapshots__/ToolConfirmationMessage-ToolConfirmationMessage-height-allocation-and-layout-should-expand-to-available-height-for-large-exec-commands.snap.svg b/packages/cli/src/ui/components/messages/__snapshots__/ToolConfirmationMessage-ToolConfirmationMessage-height-allocation-and-layout-should-expand-to-available-height-for-large-exec-commands.snap.svg new file mode 100644 index 0000000000..4b34a3405f --- /dev/null +++ b/packages/cli/src/ui/components/messages/__snapshots__/ToolConfirmationMessage-ToolConfirmationMessage-height-allocation-and-layout-should-expand-to-available-height-for-large-exec-commands.snap.svg @@ -0,0 +1,87 @@ + + + + + ... first 18 lines hidden (Ctrl+O to show) ... + echo + "Line 19" + echo + "Line 20" + echo + "Line 21" + echo + "Line 22" + echo + "Line 23" + echo + "Line 24" + echo + "Line 25" + echo + "Line 26" + echo + "Line 27" + echo + "Line 28" + echo + "Line 29" + echo + "Line 30" + echo + "Line 31" + echo + "Line 32" + echo + "Line 33" + echo + "Line 34" + echo + "Line 35" + echo + "Line 36" + echo + "Line 37" + echo + "Line 38" + echo + "Line 39" + echo + "Line 40" + echo + "Line 41" + echo + "Line 42" + echo + "Line 43" + echo + "Line 44" + echo + "Line 45" + echo + "Line 46" + echo + "Line 47" + echo + "Line 48" + echo + "Line 49" + echo + "Line 50" + Allow execution of: 'echo'? + + + + + 1. + + + Allow once + + 2. + Allow for this session + 3. 
+ No, suggest changes (esc) + + \ No newline at end of file diff --git a/packages/cli/src/ui/components/messages/__snapshots__/ToolConfirmationMessage-ToolConfirmationMessage-should-render-multiline-shell-scripts-with-correct-newlines-and-syntax-highlighting-SVG-snapshot-.snap.svg b/packages/cli/src/ui/components/messages/__snapshots__/ToolConfirmationMessage-ToolConfirmationMessage-should-render-multiline-shell-scripts-with-correct-newlines-and-syntax-highlighting.snap.svg similarity index 100% rename from packages/cli/src/ui/components/messages/__snapshots__/ToolConfirmationMessage-ToolConfirmationMessage-should-render-multiline-shell-scripts-with-correct-newlines-and-syntax-highlighting-SVG-snapshot-.snap.svg rename to packages/cli/src/ui/components/messages/__snapshots__/ToolConfirmationMessage-ToolConfirmationMessage-should-render-multiline-shell-scripts-with-correct-newlines-and-syntax-highlighting.snap.svg diff --git a/packages/cli/src/ui/components/messages/__snapshots__/ToolConfirmationMessage.test.tsx.snap b/packages/cli/src/ui/components/messages/__snapshots__/ToolConfirmationMessage.test.tsx.snap index 085d0bc445..eb9f856b0b 100644 --- a/packages/cli/src/ui/components/messages/__snapshots__/ToolConfirmationMessage.test.tsx.snap +++ b/packages/cli/src/ui/components/messages/__snapshots__/ToolConfirmationMessage.test.tsx.snap @@ -16,6 +16,90 @@ Apply this change? " `; +exports[`ToolConfirmationMessage > height allocation and layout > should expand to available height for large edit diffs 1`] = ` +"... first 9 lines hidden (Ctrl+O to show) ... 
+ 5 + const newLine5 = true; + 6 - const oldLine6 = true; + 6 + const newLine6 = true; + 7 - const oldLine7 = true; + 7 + const newLine7 = true; + 8 - const oldLine8 = true; + 8 + const newLine8 = true; + 9 - const oldLine9 = true; + 9 + const newLine9 = true; +10 - const oldLine10 = true; +10 + const newLine10 = true; +11 - const oldLine11 = true; +11 + const newLine11 = true; +12 - const oldLine12 = true; +12 + const newLine12 = true; +13 - const oldLine13 = true; +13 + const newLine13 = true; +14 - const oldLine14 = true; +14 + const newLine14 = true; +15 - const oldLine15 = true; +15 + const newLine15 = true; +16 - const oldLine16 = true; +16 + const newLine16 = true; +17 - const oldLine17 = true; +17 + const newLine17 = true; +18 - const oldLine18 = true; +18 + const newLine18 = true; +19 - const oldLine19 = true; +19 + const newLine19 = true; +20 - const oldLine20 = true; +20 + const newLine20 = true; +Apply this change? + +● 1. Allow once + 2. Allow for this session + 3. Modify with external editor + 4. No, suggest changes (esc) +" +`; + +exports[`ToolConfirmationMessage > height allocation and layout > should expand to available height for large exec commands 1`] = ` +"... first 18 lines hidden (Ctrl+O to show) ... +echo "Line 19" +echo "Line 20" +echo "Line 21" +echo "Line 22" +echo "Line 23" +echo "Line 24" +echo "Line 25" +echo "Line 26" +echo "Line 27" +echo "Line 28" +echo "Line 29" +echo "Line 30" +echo "Line 31" +echo "Line 32" +echo "Line 33" +echo "Line 34" +echo "Line 35" +echo "Line 36" +echo "Line 37" +echo "Line 38" +echo "Line 39" +echo "Line 40" +echo "Line 41" +echo "Line 42" +echo "Line 43" +echo "Line 44" +echo "Line 45" +echo "Line 46" +echo "Line 47" +echo "Line 48" +echo "Line 49" +echo "Line 50" +Allow execution of: 'echo'? + +● 1. Allow once + 2. Allow for this session + 3. 
No, suggest changes (esc) +" +`; + exports[`ToolConfirmationMessage > should display multiple commands for exec type when provided 1`] = ` "echo "hello" @@ -53,7 +137,7 @@ Do you want to proceed? " `; -exports[`ToolConfirmationMessage > should render multiline shell scripts with correct newlines and syntax highlighting (SVG snapshot) 1`] = ` +exports[`ToolConfirmationMessage > should render multiline shell scripts with correct newlines and syntax highlighting 1`] = ` "echo "hello" for i in 1 2 3; do echo $i diff --git a/packages/cli/src/ui/components/messages/__snapshots__/ToolResultDisplay.test.tsx.snap b/packages/cli/src/ui/components/messages/__snapshots__/ToolResultDisplay.test.tsx.snap index 5e5c7ea2b0..e34e66cc48 100644 --- a/packages/cli/src/ui/components/messages/__snapshots__/ToolResultDisplay.test.tsx.snap +++ b/packages/cli/src/ui/components/messages/__snapshots__/ToolResultDisplay.test.tsx.snap @@ -37,8 +37,7 @@ exports[`ToolResultDisplay > renders string result as plain text when renderOutp `; exports[`ToolResultDisplay > truncates very long string results 1`] = ` -"... 248 hidden (Ctrl+O) ... -aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa +"... 249 hidden (Ctrl+O) ... 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa diff --git a/packages/cli/src/ui/components/shared/BaseSelectionList.test.tsx b/packages/cli/src/ui/components/shared/BaseSelectionList.test.tsx index 0501667d1f..b873de80d9 100644 --- a/packages/cli/src/ui/components/shared/BaseSelectionList.test.tsx +++ b/packages/cli/src/ui/components/shared/BaseSelectionList.test.tsx @@ -447,6 +447,28 @@ describe('BaseSelectionList', () => { unmount(); }); + it('should correctly calculate scroll offset during the initial render phase', async () => { + // Verify that the component correctly calculates the scroll offset during the + // initial render pass when starting with a high activeIndex. + // List length 10, max items 3, activeIndex 9 (last item). + const { unmount } = await renderScrollableList(9); + + const renderedItemValues = mockRenderItem.mock.calls.map( + (call) => call[0].value, + ); + + // Item 1 (index 0) should not be rendered if the scroll offset is correctly + // synchronized with the activeIndex from the start. + expect(renderedItemValues).not.toContain('Item 1'); + + // The items at the end of the list should be rendered. 
+ expect(renderedItemValues).toContain('Item 8'); + expect(renderedItemValues).toContain('Item 9'); + expect(renderedItemValues).toContain('Item 10'); + + unmount(); + }); + it('should handle maxItemsToShow larger than the list length', async () => { const { lastFrame, unmount } = await renderComponent( { items: longList, maxItemsToShow: 15 }, diff --git a/packages/cli/src/ui/components/shared/BaseSelectionList.tsx b/packages/cli/src/ui/components/shared/BaseSelectionList.tsx index 1090d4010d..455069f03f 100644 --- a/packages/cli/src/ui/components/shared/BaseSelectionList.tsx +++ b/packages/cli/src/ui/components/shared/BaseSelectionList.tsx @@ -5,7 +5,7 @@ */ import type React from 'react'; -import { useEffect, useState } from 'react'; +import { useState } from 'react'; import { Text, Box } from 'ink'; import { theme } from '../../semantic-colors.js'; import { @@ -84,20 +84,27 @@ export function BaseSelectionList< const [scrollOffset, setScrollOffset] = useState(0); - // Handle scrolling for long lists - useEffect(() => { - const newScrollOffset = Math.max( + // Derive the effective scroll offset during render to avoid "no-selection" flicker. + // This ensures that the visibleItems calculation uses an offset that includes activeIndex. 
+ let effectiveScrollOffset = scrollOffset; + if (activeIndex < effectiveScrollOffset) { + effectiveScrollOffset = activeIndex; + } else if (activeIndex >= effectiveScrollOffset + maxItemsToShow) { + effectiveScrollOffset = Math.max( 0, Math.min(activeIndex - maxItemsToShow + 1, items.length - maxItemsToShow), ); - if (activeIndex < scrollOffset) { - setScrollOffset(activeIndex); - } else if (activeIndex >= scrollOffset + maxItemsToShow) { - setScrollOffset(newScrollOffset); - } - }, [activeIndex, items.length, scrollOffset, maxItemsToShow]); + } - const visibleItems = items.slice(scrollOffset, scrollOffset + maxItemsToShow); + // Synchronize state if it changed during derivation + if (effectiveScrollOffset !== scrollOffset) { + setScrollOffset(effectiveScrollOffset); + } + + const visibleItems = items.slice( + effectiveScrollOffset, + effectiveScrollOffset + maxItemsToShow, + ); const numberColumnWidth = String(items.length).length; return ( @@ -105,14 +112,18 @@ export function BaseSelectionList< {/* Use conditional coloring instead of conditional rendering */} {showScrollArrows && items.length > maxItemsToShow && ( 0 ? theme.text.primary : theme.text.secondary} + color={ + effectiveScrollOffset > 0 + ? 
theme.text.primary + : theme.text.secondary + } > ▲ )} {visibleItems.map((item, index) => { - const itemIndex = scrollOffset + index; + const itemIndex = effectiveScrollOffset + index; const isSelected = activeIndex === itemIndex; // Determine colors based on selection and disabled state @@ -182,7 +193,7 @@ export function BaseSelectionList< {showScrollArrows && items.length > maxItemsToShow && ( = ({ color = theme.border.default, + dim = false, }) => ( = ({ borderLeft={false} borderRight={false} borderColor={color} + borderDimColor={dim} /> ); diff --git a/packages/cli/src/ui/constants/tips.ts b/packages/cli/src/ui/constants/tips.ts index 15aa86c118..922465347a 100644 --- a/packages/cli/src/ui/constants/tips.ts +++ b/packages/cli/src/ui/constants/tips.ts @@ -75,90 +75,91 @@ export const INFORMATIVE_TIPS = [ 'Set the character threshold for truncating tool outputs (/settings)…', 'Set the number of lines to keep when truncating outputs (/settings)…', 'Enable policy-based tool confirmation via message bus (/settings)…', + 'Enable write_todos_list tool to generate task lists (/settings)…', 'Enable experimental subagents for task delegation (/settings)…', 'Enable extension management features (settings.json)…', 'Enable extension reloading within the CLI session (settings.json)…', //Settings tips end here // Keyboard shortcut tips start here - 'Close dialogs and suggestions with Esc…', - 'Cancel a request with Ctrl+C, or press twice to exit…', - 'Exit the app with Ctrl+D on an empty line…', - 'Clear your screen at any time with Ctrl+L…', - 'Toggle the debug console display with F12…', - 'Toggle the todo list display with Ctrl+T…', - 'See full, untruncated responses with Ctrl+O…', - 'Toggle auto-approval (YOLO mode) for all tools with Ctrl+Y…', - 'Cycle through approval modes (Default, Auto-Edit, Plan) with Shift+Tab…', - 'Toggle Markdown rendering (raw markdown mode) with Alt+M…', - 'Toggle shell mode by typing ! 
in an empty prompt…', - 'Insert a newline with a backslash (\\) followed by Enter…', - 'Navigate your prompt history with the Up and Down arrows…', - 'You can also use Ctrl+P (up) and Ctrl+N (down) for history…', - 'Search through command history with Ctrl+R…', - 'Accept an autocomplete suggestion with Tab or Enter…', - 'Move to the start of the line with Ctrl+A or Home…', - 'Move to the end of the line with Ctrl+E or End…', - 'Move one character left or right with Ctrl+B/F or the arrow keys…', - 'Move one word left or right with Ctrl+Left/Right Arrow…', - 'Delete the character to the left with Ctrl+H or Backspace…', - 'Delete the character to the right with Ctrl+D or Delete…', - 'Delete the word to the left of the cursor with Ctrl+W…', - 'Delete the word to the right of the cursor with Ctrl+Delete…', - 'Delete from the cursor to the start of the line with Ctrl+U…', - 'Delete from the cursor to the end of the line with Ctrl+K…', - 'Clear the entire input prompt with a double-press of Esc…', - 'Paste from your clipboard with Ctrl+V…', - 'Undo text edits in the input with Alt+Z or Cmd+Z…', - 'Redo undone text edits with Shift+Alt+Z or Shift+Cmd+Z…', - 'Open the current prompt in an external editor with Ctrl+X…', - 'In menus, move up/down with k/j or the arrow keys…', - 'In menus, select an item by typing its number…', - "If you're using an IDE, see the context with Ctrl+G…", - 'Toggle background shells with Ctrl+B or /shells...', - 'Toggle the background shell process list with Ctrl+L...', + 'Close dialogs and suggestions with Esc', + 'Cancel a request with Ctrl+C, or press twice to exit', + 'Exit the app with Ctrl+D on an empty line', + 'Clear your screen at any time with Ctrl+L', + 'Toggle the debug console display with F12', + 'Toggle the todo list display with Ctrl+T', + 'See full, untruncated responses with Ctrl+O', + 'Toggle auto-approval (YOLO mode) for all tools with Ctrl+Y', + 'Cycle through approval modes (Default, Auto-Edit, Plan) with Shift+Tab', + 
'Toggle Markdown rendering (raw markdown mode) with Alt+M', + 'Toggle shell mode by typing ! in an empty prompt', + 'Insert a newline with a backslash (\\) followed by Enter', + 'Navigate your prompt history with the Up and Down arrows', + 'You can also use Ctrl+P (up) and Ctrl+N (down) for history', + 'Search through command history with Ctrl+R', + 'Accept an autocomplete suggestion with Tab or Enter', + 'Move to the start of the line with Ctrl+A or Home', + 'Move to the end of the line with Ctrl+E or End', + 'Move one character left or right with Ctrl+B/F or the arrow keys', + 'Move one word left or right with Ctrl+Left/Right Arrow', + 'Delete the character to the left with Ctrl+H or Backspace', + 'Delete the character to the right with Ctrl+D or Delete', + 'Delete the word to the left of the cursor with Ctrl+W', + 'Delete the word to the right of the cursor with Ctrl+Delete', + 'Delete from the cursor to the start of the line with Ctrl+U', + 'Delete from the cursor to the end of the line with Ctrl+K', + 'Clear the entire input prompt with a double-press of Esc', + 'Paste from your clipboard with Ctrl+V', + 'Undo text edits in the input with Alt+Z or Cmd+Z', + 'Redo undone text edits with Shift+Alt+Z or Shift+Cmd+Z', + 'Open the current prompt in an external editor with Ctrl+X', + 'In menus, move up/down with k/j or the arrow keys', + 'In menus, select an item by typing its number', + "If you're using an IDE, see the context with Ctrl+G", + 'Toggle background shells with Ctrl+B or /shells', + 'Toggle the background shell process list with Ctrl+L', // Keyboard shortcut tips end here // Command tips start here - 'Show version info with /about…', - 'Change your authentication method with /auth…', - 'File a bug report directly with /bug…', - 'List your saved chat checkpoints with /resume list…', - 'Save your current conversation with /resume save …', - 'Resume a saved conversation with /resume resume …', - 'Delete a conversation checkpoint with /resume delete …', - 
'Share your conversation to a file with /resume share …', - 'Clear the screen and history with /clear…', - 'Save tokens by summarizing the context with /compress…', - 'Copy the last response to your clipboard with /copy…', - 'Open the full documentation in your browser with /docs…', - 'Add directories to your workspace with /directory add …', - 'Show all directories in your workspace with /directory show…', - 'Use /dir as a shortcut for /directory…', - 'Set your preferred external editor with /editor…', - 'List all active extensions with /extensions list…', - 'Update all or specific extensions with /extensions update…', - 'Get help on commands with /help…', - 'Manage IDE integration with /ide…', - 'Create a project-specific GEMINI.md file with /init…', - 'List configured MCP servers and tools with /mcp list…', - 'Authenticate with an OAuth-enabled MCP server with /mcp auth…', - 'Reload MCP servers with /mcp reload…', - 'See the current instructional context with /memory show…', - 'Add content to the instructional memory with /memory add…', - 'Reload instructional context from GEMINI.md files with /memory reload…', - 'List the paths of the GEMINI.md files in use with /memory list…', - 'Choose your Gemini model with /model…', - 'Display the privacy notice with /privacy…', - 'Restore project files to a previous state with /restore…', - 'Exit the CLI with /quit or /exit…', - 'Check model-specific usage stats with /stats model…', - 'Check tool-specific usage stats with /stats tools…', - "Change the CLI's color theme with /theme…", - 'List all available tools with /tools…', - 'View and edit settings with the /settings editor…', - 'Toggle Vim keybindings on and off with /vim…', - 'Set up GitHub Actions with /setup-github…', - 'Configure terminal keybindings for multiline input with /terminal-setup…', - 'Find relevant documentation with /find-docs…', - 'Execute any shell command with !…', + 'Show version info with /about', + 'Change your authentication method with /auth', 
+ 'File a bug report directly with /bug', + 'List your saved chat checkpoints with /resume list', + 'Save your current conversation with /resume save ', + 'Resume a saved conversation with /resume resume ', + 'Delete a conversation checkpoint with /resume delete ', + 'Share your conversation to a file with /resume share ', + 'Clear the screen and history with /clear', + 'Save tokens by summarizing the context with /compress', + 'Copy the last response to your clipboard with /copy', + 'Open the full documentation in your browser with /docs', + 'Add directories to your workspace with /directory add ', + 'Show all directories in your workspace with /directory show', + 'Use /dir as a shortcut for /directory', + 'Set your preferred external editor with /editor', + 'List all active extensions with /extensions list', + 'Update all or specific extensions with /extensions update', + 'Get help on commands with /help', + 'Manage IDE integration with /ide', + 'Create a project-specific GEMINI.md file with /init', + 'List configured MCP servers and tools with /mcp list', + 'Authenticate with an OAuth-enabled MCP server with /mcp auth', + 'Reload MCP servers with /mcp reload', + 'See the current instructional context with /memory show', + 'Add content to the instructional memory with /memory add', + 'Reload instructional context from GEMINI.md files with /memory reload', + 'List the paths of the GEMINI.md files in use with /memory list', + 'Choose your Gemini model with /model', + 'Display the privacy notice with /privacy', + 'Restore project files to a previous state with /restore', + 'Exit the CLI with /quit or /exit', + 'Check model-specific usage stats with /stats model', + 'Check tool-specific usage stats with /stats tools', + "Change the CLI's color theme with /theme", + 'List all available tools with /tools', + 'View and edit settings with the /settings editor', + 'Toggle Vim keybindings on and off with /vim', + 'Set up GitHub Actions with /setup-github', + 'Configure 
terminal keybindings for multiline input with /terminal-setup', + 'Find relevant documentation with /find-docs', + 'Execute any shell command with !', // Command tips end here ]; diff --git a/packages/cli/src/ui/constants/wittyPhrases.ts b/packages/cli/src/ui/constants/wittyPhrases.ts index a8facd9e5a..e37a74593f 100644 --- a/packages/cli/src/ui/constants/wittyPhrases.ts +++ b/packages/cli/src/ui/constants/wittyPhrases.ts @@ -6,113 +6,113 @@ export const WITTY_LOADING_PHRASES = [ "I'm Feeling Lucky", - 'Shipping awesomeness… ', - 'Painting the serifs back on…', - 'Navigating the slime mold…', - 'Consulting the digital spirits…', - 'Reticulating splines…', - 'Warming up the AI hamsters…', - 'Asking the magic conch shell…', - 'Generating witty retort…', - 'Polishing the algorithms…', - "Don't rush perfection (or my code)…", - 'Brewing fresh bytes…', - 'Counting electrons…', - 'Engaging cognitive processors…', - 'Checking for syntax errors in the universe…', - 'One moment, optimizing humor…', - 'Shuffling punchlines…', - 'Untangling neural nets…', - 'Compiling brilliance…', - 'Loading wit.exe…', - 'Summoning the cloud of wisdom…', - 'Preparing a witty response…', - "Just a sec, I'm debugging reality…", - 'Confuzzling the options…', - 'Tuning the cosmic frequencies…', - 'Crafting a response worthy of your patience…', - 'Compiling the 1s and 0s…', - 'Resolving dependencies… and existential crises…', - 'Defragmenting memories… both RAM and personal…', - 'Rebooting the humor module…', - 'Caching the essentials (mostly cat memes)…', + 'Shipping awesomeness', + 'Painting the serifs back on', + 'Navigating the slime mold', + 'Consulting the digital spirits', + 'Reticulating splines', + 'Warming up the AI hamsters', + 'Asking the magic conch shell', + 'Generating witty retort', + 'Polishing the algorithms', + "Don't rush perfection (or my code)", + 'Brewing fresh bytes', + 'Counting electrons', + 'Engaging cognitive processors', + 'Checking for syntax errors in the universe', 
+ 'One moment, optimizing humor', + 'Shuffling punchlines', + 'Untangling neural nets', + 'Compiling brilliance', + 'Loading wit.exe', + 'Summoning the cloud of wisdom', + 'Preparing a witty response', + "Just a sec, I'm debugging reality", + 'Confuzzling the options', + 'Tuning the cosmic frequencies', + 'Crafting a response worthy of your patience', + 'Compiling the 1s and 0s', + 'Resolving dependencies… and existential crises', + 'Defragmenting memories… both RAM and personal', + 'Rebooting the humor module', + 'Caching the essentials (mostly cat memes)', 'Optimizing for ludicrous speed', - "Swapping bits… don't tell the bytes…", - 'Garbage collecting… be right back…', - 'Assembling the interwebs…', - 'Converting coffee into code…', - 'Updating the syntax for reality…', - 'Rewiring the synapses…', - 'Looking for a misplaced semicolon…', - "Greasin' the cogs of the machine…", - 'Pre-heating the servers…', - 'Calibrating the flux capacitor…', - 'Engaging the improbability drive…', - 'Channeling the Force…', - 'Aligning the stars for optimal response…', - 'So say we all…', - 'Loading the next great idea…', - "Just a moment, I'm in the zone…", - 'Preparing to dazzle you with brilliance…', - "Just a tick, I'm polishing my wit…", - "Hold tight, I'm crafting a masterpiece…", - "Just a jiffy, I'm debugging the universe…", - "Just a moment, I'm aligning the pixels…", - "Just a sec, I'm optimizing the humor…", - "Just a moment, I'm tuning the algorithms…", - 'Warp speed engaged…', - 'Mining for more Dilithium crystals…', - "Don't panic…", - 'Following the white rabbit…', - 'The truth is in here… somewhere…', - 'Blowing on the cartridge…', + "Swapping bits… don't tell the bytes", + 'Garbage collecting… be right back', + 'Assembling the interwebs', + 'Converting coffee into code', + 'Updating the syntax for reality', + 'Rewiring the synapses', + 'Looking for a misplaced semicolon', + "Greasin' the cogs of the machine", + 'Pre-heating the servers', + 'Calibrating the flux 
capacitor', + 'Engaging the improbability drive', + 'Channeling the Force', + 'Aligning the stars for optimal response', + 'So say we all', + 'Loading the next great idea', + "Just a moment, I'm in the zone", + 'Preparing to dazzle you with brilliance', + "Just a tick, I'm polishing my wit", + "Hold tight, I'm crafting a masterpiece", + "Just a jiffy, I'm debugging the universe", + "Just a moment, I'm aligning the pixels", + "Just a sec, I'm optimizing the humor", + "Just a moment, I'm tuning the algorithms", + 'Warp speed engaged', + 'Mining for more Dilithium crystals', + "Don't panic", + 'Following the white rabbit', + 'The truth is in here… somewhere', + 'Blowing on the cartridge', 'Loading… Do a barrel roll!', - 'Waiting for the respawn…', - 'Finishing the Kessel Run in less than 12 parsecs…', - "The cake is not a lie, it's just still loading…", - 'Fiddling with the character creation screen…', - "Just a moment, I'm finding the right meme…", - "Pressing 'A' to continue…", - 'Herding digital cats…', - 'Polishing the pixels…', - 'Finding a suitable loading screen pun…', - 'Distracting you with this witty phrase…', - 'Almost there… probably…', - 'Our hamsters are working as fast as they can…', - 'Giving Cloudy a pat on the head…', - 'Petting the cat…', - 'Rickrolling my boss…', - 'Slapping the bass…', - 'Tasting the snozberries…', - "I'm going the distance, I'm going for speed…", - 'Is this the real life? Is this just fantasy?…', - "I've got a good feeling about this…", - 'Poking the bear…', - 'Doing research on the latest memes…', - 'Figuring out how to make this more witty…', - 'Hmmm… let me think…', - 'What do you call a fish with no eyes? A fsh…', - 'Why did the computer go to therapy? It had too many bytes…', - "Why don't programmers like nature? It has too many bugs…", - 'Why do programmers prefer dark mode? Because light attracts bugs…', - 'Why did the developer go broke? Because they used up all their cache…', - "What can you do with a broken pencil? 
Nothing, it's pointless…", - 'Applying percussive maintenance…', - 'Searching for the correct USB orientation…', - 'Ensuring the magic smoke stays inside the wires…', - 'Rewriting in Rust for no particular reason…', - 'Trying to exit Vim…', - 'Spinning up the hamster wheel…', - "That's not a bug, it's an undocumented feature…", + 'Waiting for the respawn', + 'Finishing the Kessel Run in less than 12 parsecs', + "The cake is not a lie, it's just still loading", + 'Fiddling with the character creation screen', + "Just a moment, I'm finding the right meme", + "Pressing 'A' to continue", + 'Herding digital cats', + 'Polishing the pixels', + 'Finding a suitable loading screen pun', + 'Distracting you with this witty phrase', + 'Almost there… probably', + 'Our hamsters are working as fast as they can', + 'Giving Cloudy a pat on the head', + 'Petting the cat', + 'Rickrolling my boss', + 'Slapping the bass', + 'Tasting the snozberries', + "I'm going the distance, I'm going for speed", + 'Is this the real life? Is this just fantasy?', + "I've got a good feeling about this", + 'Poking the bear', + 'Doing research on the latest memes', + 'Figuring out how to make this more witty', + 'Hmmm… let me think', + 'What do you call a fish with no eyes? A fsh', + 'Why did the computer go to therapy? It had too many bytes', + "Why don't programmers like nature? It has too many bugs", + 'Why do programmers prefer dark mode? Because light attracts bugs', + 'Why did the developer go broke? Because they used up all their cache', + "What can you do with a broken pencil? 
Nothing, it's pointless", + 'Applying percussive maintenance', + 'Searching for the correct USB orientation', + 'Ensuring the magic smoke stays inside the wires', + 'Rewriting in Rust for no particular reason', + 'Trying to exit Vim', + 'Spinning up the hamster wheel', + "That's not a bug, it's an undocumented feature", 'Engage.', "I'll be back… with an answer.", - 'My other process is a TARDIS…', - 'Communing with the machine spirit…', - 'Letting the thoughts marinate…', - 'Just remembered where I put my keys…', - 'Pondering the orb…', + 'My other process is a TARDIS', + 'Communing with the machine spirit', + 'Letting the thoughts marinate', + 'Just remembered where I put my keys', + 'Pondering the orb', "I've seen things you people wouldn't believe… like a user who reads loading messages.", - 'Initiating thoughtful gaze…', + 'Initiating thoughtful gaze', "What's a computer's favorite snack? Microchips.", "Why do Java developers wear glasses? Because they don't C#.", 'Charging the laser… pew pew!', @@ -120,18 +120,18 @@ export const WITTY_LOADING_PHRASES = [ 'Looking for an adult superviso… I mean, processing.', 'Making it go beep boop.', 'Buffering… because even AIs need a moment.', - 'Entangling quantum particles for a faster response…', + 'Entangling quantum particles for a faster response', 'Polishing the chrome… on the algorithms.', 'Are you not entertained? (Working on it!)', 'Summoning the code gremlins… to help, of course.', - 'Just waiting for the dial-up tone to finish…', + 'Just waiting for the dial-up tone to finish', 'Recalibrating the humor-o-meter.', 'My other loading screen is even funnier.', - "Pretty sure there's a cat walking on the keyboard somewhere…", + "Pretty sure there's a cat walking on the keyboard somewhere", 'Enhancing… Enhancing… Still loading.', "It's not a bug, it's a feature… of this loading screen.", 'Have you tried turning it off and on again? 
(The loading screen, not me.)', - 'Constructing additional pylons…', + 'Constructing additional pylons', 'New line? That’s Ctrl+J.', - 'Releasing the HypnoDrones…', + 'Releasing the HypnoDrones', ]; diff --git a/packages/cli/src/ui/contexts/UIStateContext.tsx b/packages/cli/src/ui/contexts/UIStateContext.tsx index d393be8fe2..b77a56bbc3 100644 --- a/packages/cli/src/ui/contexts/UIStateContext.tsx +++ b/packages/cli/src/ui/contexts/UIStateContext.tsx @@ -166,6 +166,8 @@ export interface UIState { cleanUiDetailsVisible: boolean; elapsedTime: number; currentLoadingPhrase: string | undefined; + currentTip: string | undefined; + currentWittyPhrase: string | undefined; historyRemountKey: number; activeHooks: ActiveHook[]; messageQueue: string[]; diff --git a/packages/cli/src/ui/hooks/__snapshots__/usePhraseCycler.test.tsx.snap b/packages/cli/src/ui/hooks/__snapshots__/usePhraseCycler.test.tsx.snap deleted file mode 100644 index 77d028caa7..0000000000 --- a/packages/cli/src/ui/hooks/__snapshots__/usePhraseCycler.test.tsx.snap +++ /dev/null @@ -1,11 +0,0 @@ -// Vitest Snapshot v1, https://vitest.dev/guide/snapshot.html - -exports[`usePhraseCycler > should prioritize interactive shell waiting over normal waiting immediately 1`] = `"Waiting for user confirmation..."`; - -exports[`usePhraseCycler > should prioritize interactive shell waiting over normal waiting immediately 2`] = `"Interactive shell awaiting input... press tab to focus shell"`; - -exports[`usePhraseCycler > should reset phrase when transitioning from waiting to active 1`] = `"Waiting for user confirmation..."`; - -exports[`usePhraseCycler > should show "Waiting for user confirmation..." when isWaiting is true 1`] = `"Waiting for user confirmation..."`; - -exports[`usePhraseCycler > should show interactive shell waiting message immediately when isInteractiveShellWaiting is true 1`] = `"Interactive shell awaiting input... 
press tab to focus shell"`; diff --git a/packages/cli/src/ui/hooks/slashCommandProcessor.ts b/packages/cli/src/ui/hooks/slashCommandProcessor.ts index 20ed225186..1839670df7 100644 --- a/packages/cli/src/ui/hooks/slashCommandProcessor.ts +++ b/packages/cli/src/ui/hooks/slashCommandProcessor.ts @@ -505,7 +505,9 @@ export const useSlashCommandProcessor = ( const props = result.props as Record; if ( !props || + // eslint-disable-next-line no-restricted-syntax typeof props['name'] !== 'string' || + // eslint-disable-next-line no-restricted-syntax typeof props['displayName'] !== 'string' || !props['definition'] ) { diff --git a/packages/cli/src/ui/hooks/useAtCompletion.test.ts b/packages/cli/src/ui/hooks/useAtCompletion.test.ts index 381849a1d2..27e779acef 100644 --- a/packages/cli/src/ui/hooks/useAtCompletion.test.ts +++ b/packages/cli/src/ui/hooks/useAtCompletion.test.ts @@ -674,6 +674,7 @@ describe('useAtCompletion', () => { multiDirTmpDirs.push(addedDir); const multiDirConfig = { + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...mockConfig, getWorkspaceContext: vi.fn().mockReturnValue({ getDirectories: () => [cwdDir, addedDir], @@ -706,6 +707,7 @@ describe('useAtCompletion', () => { const directories = [cwdDir]; const dynamicConfig = { + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...mockConfig, getWorkspaceContext: vi.fn().mockReturnValue({ getDirectories: () => [...directories], @@ -750,6 +752,7 @@ describe('useAtCompletion', () => { multiDirTmpDirs.push(dir2); const multiDirConfig = { + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...mockConfig, getWorkspaceContext: vi.fn().mockReturnValue({ getDirectories: () => [dir1, dir2], diff --git a/packages/cli/src/ui/hooks/useGeminiStream.test.tsx b/packages/cli/src/ui/hooks/useGeminiStream.test.tsx index b912dbe4f8..7858ad6ede 100644 --- a/packages/cli/src/ui/hooks/useGeminiStream.test.tsx +++ b/packages/cli/src/ui/hooks/useGeminiStream.test.tsx @@ -32,7 
+32,10 @@ import type { Config, EditorType, AnyToolInvocation, + AnyDeclarativeTool, SpanMetadata, + CompletedToolCall, + ToolCallRequestInfo, } from '@google/gemini-cli-core'; import { CoreToolCallStatus, @@ -52,7 +55,11 @@ import { } from '@google/gemini-cli-core'; import type { Part, PartListUnion } from '@google/genai'; import type { UseHistoryManagerReturn } from './useHistoryManager.js'; -import type { SlashCommandProcessorResult } from '../types.js'; +import type { + SlashCommandProcessorResult, + HistoryItemWithoutId, + HistoryItem, +} from '../types.js'; import { MessageType, StreamingState } from '../types.js'; import type { LoadedSettings } from '../../config/settings.js'; @@ -138,7 +145,6 @@ const mockRunInDevTraceSpan = vi.hoisted(() => }; return await fn({ metadata, - endSpan: vi.fn(), }); }), ); @@ -243,8 +249,10 @@ describe('useGeminiStream', () => { let mockMarkToolsAsSubmitted: Mock; let handleAtCommandSpy: MockInstance; - const emptyHistory: any[] = []; - let capturedOnComplete: any = null; + const emptyHistory: HistoryItem[] = []; + let capturedOnComplete: + | ((tools: CompletedToolCall[]) => Promise) + | null = null; const mockGetPreferredEditor = vi.fn(() => 'vscode' as EditorType); const mockOnAuthError = vi.fn(); const mockPerformMemoryRefresh = vi.fn(() => Promise.resolve()); @@ -403,13 +411,17 @@ describe('useGeminiStream', () => { lastToolCalls, mockScheduleToolCalls, mockMarkToolsAsSubmitted, - (updater: any) => { + ( + updater: + | TrackedToolCall[] + | ((prev: TrackedToolCall[]) => TrackedToolCall[]), + ) => { lastToolCalls = typeof updater === 'function' ? 
updater(lastToolCalls) : updater; rerender({ ...initialProps, toolCalls: lastToolCalls }); }, - (...args: any[]) => { - mockCancelAllToolCalls(...args); + (signal: AbortSignal) => { + mockCancelAllToolCalls(signal); lastToolCalls = lastToolCalls.map((tc) => { if ( tc.status === CoreToolCallStatus.AwaitingApproval || @@ -876,7 +888,7 @@ describe('useGeminiStream', () => { const fn = spanArgs[1]; const metadata = { attributes: {} }; await act(async () => { - await fn({ metadata, endSpan: vi.fn() }); + await fn({ metadata }); }); expect(metadata).toMatchObject({ input: sentParts, @@ -970,7 +982,7 @@ describe('useGeminiStream', () => { }); it('should stop agent execution immediately when a tool call returns STOP_EXECUTION error', async () => { - const stopExecutionToolCalls: TrackedToolCall[] = [ + const stopExecutionToolCalls: TrackedCompletedToolCall[] = [ { request: { callId: 'stop-call', @@ -1042,7 +1054,7 @@ describe('useGeminiStream', () => { }); it('should add a compact suppressed-error note before STOP_EXECUTION terminal info in low verbosity mode', async () => { - const stopExecutionToolCalls: TrackedToolCall[] = [ + const stopExecutionToolCalls: TrackedCompletedToolCall[] = [ { request: { callId: 'stop-call', @@ -1069,6 +1081,7 @@ describe('useGeminiStream', () => { } as unknown as TrackedCompletedToolCall, ]; const lowVerbositySettings = { + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...mockLoadedSettings, merged: { ...mockLoadedSettings.merged, @@ -1922,6 +1935,120 @@ describe('useGeminiStream', () => { expect(mockHandleSlashCommand).not.toHaveBeenCalled(); }); }); + + it('should record client-initiated tool calls in GeminiChat history', async () => { + const { result, client: mockGeminiClient } = await renderTestHook(); + + mockHandleSlashCommand.mockResolvedValue({ + type: 'schedule_tool', + toolName: 'activate_skill', + toolArgs: { name: 'test-skill' }, + }); + + await act(async () => { + await 
result.current.submitQuery('/test-skill'); + }); + + // Simulate tool completion + const completedTool = { + request: { + callId: 'test-call-id', + name: 'activate_skill', + args: { name: 'test-skill' }, + isClientInitiated: true, + }, + status: CoreToolCallStatus.Success, + invocation: { + getDescription: () => 'Activating skill test-skill', + }, + tool: { + isOutputMarkdown: true, + }, + response: { + responseParts: [ + { + functionResponse: { + name: 'activate_skill', + response: { content: 'skill instructions' }, + }, + }, + ], + }, + } as unknown as TrackedCompletedToolCall; + + await act(async () => { + if (capturedOnComplete) { + await capturedOnComplete([completedTool]); + } + }); + + // Verify that the tool call and response were added to GeminiChat history + expect(mockGeminiClient.addHistory).toHaveBeenCalledWith({ + role: 'model', + parts: [ + { + functionCall: { + name: 'activate_skill', + args: { name: 'test-skill' }, + }, + }, + ], + }); + expect(mockGeminiClient.addHistory).toHaveBeenCalledWith({ + role: 'user', + parts: completedTool.response.responseParts, + }); + }); + + it('should NOT record other client-initiated tool calls (like save_memory) in history', async () => { + const { result, client: mockGeminiClient } = await renderTestHook(); + + mockHandleSlashCommand.mockResolvedValue({ + type: 'schedule_tool', + toolName: 'save_memory', + toolArgs: { fact: 'test fact' }, + }); + + await act(async () => { + await result.current.submitQuery('/memory add "test fact"'); + }); + + // Simulate tool completion + const completedTool = { + request: { + callId: 'test-call-id', + name: 'save_memory', + args: { fact: 'test fact' }, + isClientInitiated: true, + }, + status: CoreToolCallStatus.Success, + invocation: { + getDescription: () => 'Saving memory', + }, + tool: { + isOutputMarkdown: true, + }, + response: { + responseParts: [ + { + functionResponse: { + name: 'save_memory', + response: { success: true }, + }, + }, + ], + }, + } as unknown as 
TrackedCompletedToolCall; + + await act(async () => { + if (capturedOnComplete) { + await capturedOnComplete([completedTool]); + } + }); + + // Verify that addHistory was NOT called + expect(mockGeminiClient.addHistory).not.toHaveBeenCalled(); + }); }); describe('Memory Refresh on save_memory', () => { @@ -1949,7 +2076,7 @@ describe('useGeminiStream', () => { displayName: 'save_memory', description: 'Saves memory', build: vi.fn(), - } as any, + } as unknown as AnyDeclarativeTool, invocation: { getDescription: () => `Mock description`, } as unknown as AnyToolInvocation, @@ -2023,6 +2150,7 @@ describe('useGeminiStream', () => { ); const testConfig = { + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...mockConfig, getContentGenerator: vi.fn(), getContentGeneratorConfig: vi.fn(() => ({ @@ -2188,7 +2316,7 @@ describe('useGeminiStream', () => { displayName: 'replace', description: 'Replace text', build: vi.fn(), - } as any, + } as unknown as AnyDeclarativeTool, invocation: { getDescription: () => 'Mock description', } as unknown as AnyToolInvocation, @@ -2229,7 +2357,7 @@ describe('useGeminiStream', () => { displayName: 'write_file', description: 'Write file', build: vi.fn(), - } as any, + } as unknown as AnyDeclarativeTool, invocation: { getDescription: () => 'Mock description', } as unknown as AnyToolInvocation, @@ -2574,14 +2702,14 @@ describe('useGeminiStream', () => { it('should flush pending text rationale before scheduling tool calls to ensure correct history order', async () => { const addItemOrder: string[] = []; - let capturedOnComplete: any; + let capturedOnComplete: (tools: CompletedToolCall[]) => Promise<void>; const mockScheduleToolCalls = vi.fn(async (requests) => { addItemOrder.push('scheduleToolCalls_START'); // Simulate tools completing and triggering onComplete immediately. // This mimics the behavior that caused the regression where tool results // were added to history during the await scheduleToolCalls(...) block.
- const tools = requests.map((r: any) => ({ + const tools = requests.map((r: ToolCallRequestInfo) => ({ request: r, status: CoreToolCallStatus.Success, tool: { displayName: r.name, name: r.name }, @@ -2596,7 +2724,7 @@ describe('useGeminiStream', () => { addItemOrder.push('scheduleToolCalls_END'); }); - mockAddItem.mockImplementation((item: any) => { + mockAddItem.mockImplementation((item: HistoryItemWithoutId) => { addItemOrder.push(`addItem:${item.type}`); }); @@ -2826,6 +2954,7 @@ describe('useGeminiStream', () => { describe('Thought Reset', () => { it('should keep full thinking entries in history when mode is full', async () => { const fullThinkingSettings: LoadedSettings = { + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...mockLoadedSettings, merged: { ...mockLoadedSettings.merged, @@ -3907,7 +4036,7 @@ describe('useGeminiStream', () => { const spanMetadata = {} as SpanMetadata; await act(async () => { - await userPromptCall![1]({ metadata: spanMetadata, endSpan: vi.fn() }); + await userPromptCall![1]({ metadata: spanMetadata }); }); expect(spanMetadata.input).toBe('telemetry test query'); }); diff --git a/packages/cli/src/ui/hooks/useGeminiStream.ts b/packages/cli/src/ui/hooks/useGeminiStream.ts index 2034e14b87..54006d2ab2 100644 --- a/packages/cli/src/ui/hooks/useGeminiStream.ts +++ b/packages/cli/src/ui/hooks/useGeminiStream.ts @@ -39,6 +39,7 @@ import { getPlanModeExitMessage, isBackgroundExecutionData, Kind, + ACTIVATE_SKILL_TOOL_NAME, } from '@google/gemini-cli-core'; import type { Config, @@ -548,11 +549,9 @@ export const useGeminiStream = ( if (tc.request.name === ASK_USER_TOOL_NAME && isInProgress) { return false; } - return ( - tc.status !== 'scheduled' && - tc.status !== 'validating' && - tc.status !== 'awaiting_approval' - ); + // ToolGroupMessage now shows all non-canceled tools, so they are visible + // in pending and we need to draw the closing border for them. 
+ return true; }); if ( @@ -1658,7 +1657,7 @@ export const useGeminiStream = ( ) { let awaitingApprovalCalls = toolCalls.filter( (call): call is TrackedWaitingToolCall => - call.status === 'awaiting_approval', + call.status === 'awaiting_approval' && !call.request.forcedAsk, ); // For AUTO_EDIT mode, only approve edit tools (replace, write_file) @@ -1722,6 +1721,36 @@ export const useGeminiStream = ( ); if (clientTools.length > 0) { markToolsAsSubmitted(clientTools.map((t) => t.request.callId)); + + if (geminiClient) { + for (const tool of clientTools) { + // Only manually record skill activations in the chat history. + // Other client-initiated tools (like save_memory) update the system + // prompt/context and don't strictly need to be in the history. + if (tool.request.name !== ACTIVATE_SKILL_TOOL_NAME) { + continue; + } + + // Add both the call (model turn) and the result (user turn) to history. + // Client-initiated calls are essentially "synthetic" turns that let + // subsequent model calls understand what just happened in the UI. + await geminiClient.addHistory({ + role: 'model', + parts: [ + { + functionCall: { + name: tool.request.name, + args: tool.request.args, + }, + }, + ], + }); + await geminiClient.addHistory({ + role: 'user', + parts: tool.response.responseParts, + }); + } + } } // Identify new, successful save_memory calls that we haven't processed yet. 
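Editorial note: the useGeminiStream.ts hunk above records a client-initiated skill activation as two synthetic turns, a model turn carrying the function call and a user turn carrying the tool's response parts, while skipping other client-initiated tools such as save_memory. The following is a minimal standalone sketch of that pairing, not the hook's actual code. The `Part`, `Content`, and `CompletedClientTool` shapes, the `buildSyntheticTurns` helper, and the `'activate_skill'` constant value (inferred from the tool name used in the tests) are all assumptions made for illustration.

```typescript
// Illustrative shapes only: the real content types are richer than this.
// Every name declared in this block is hypothetical.
type Args = Record<string, unknown>;

interface Part {
  functionCall?: { name: string; args: Args };
  functionResponse?: { name: string; response: Args };
}

interface Content {
  role: 'user' | 'model';
  parts: Part[];
}

interface CompletedClientTool {
  request: { callId: string; name: string; args: Args };
  response: { responseParts: Part[] };
}

// Assumed value, inferred from the tool name that appears in the tests.
const ACTIVATE_SKILL_NAME = 'activate_skill';

// Builds the synthetic history turns for completed client-initiated tools.
function buildSyntheticTurns(tools: CompletedClientTool[]): Content[] {
  const turns: Content[] = [];
  for (const tool of tools) {
    // Only skill activations are recorded; every other tool is skipped.
    if (tool.request.name !== ACTIVATE_SKILL_NAME) {
      continue;
    }
    // Model turn: the call itself.
    turns.push({
      role: 'model',
      parts: [
        { functionCall: { name: tool.request.name, args: tool.request.args } },
      ],
    });
    // User turn: the tool's response parts, passed through unchanged.
    turns.push({ role: 'user', parts: tool.response.responseParts });
  }
  return turns;
}

const syntheticTurns = buildSyntheticTurns([
  {
    request: { callId: 'a', name: 'activate_skill', args: { name: 'test-skill' } },
    response: {
      responseParts: [
        {
          functionResponse: {
            name: 'activate_skill',
            response: { content: 'skill instructions' },
          },
        },
      ],
    },
  },
  {
    request: { callId: 'b', name: 'save_memory', args: { fact: 'test fact' } },
    response: {
      responseParts: [
        { functionResponse: { name: 'save_memory', response: { success: true } } },
      ],
    },
  },
]);

console.log(syntheticTurns.map((turn) => turn.role));
```

Returning the turns as data, rather than awaiting a history call inside the loop as the hook does, keeps the sketch free of any client dependency; the ordering (call first, then result) is the property the two new tests assert on.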
diff --git a/packages/cli/src/ui/hooks/useHistoryManager.test.ts b/packages/cli/src/ui/hooks/useHistoryManager.test.ts index 0c304e3823..158d30e7a6 100644 --- a/packages/cli/src/ui/hooks/useHistoryManager.test.ts +++ b/packages/cli/src/ui/hooks/useHistoryManager.test.ts @@ -39,6 +39,56 @@ describe('useHistoryManager', () => { expect(result.current.history[0].id).toBeGreaterThanOrEqual(timestamp); }); + it('should generate strictly increasing IDs even if baseTimestamp goes backwards', async () => { + const { result } = await renderHook(() => useHistory()); + const timestamp = 1000000; + const itemData: Omit<HistoryItem, 'id'> = { type: 'info', text: 'First' }; + + let id1!: number; + let id2!: number; + + act(() => { + id1 = result.current.addItem(itemData, timestamp); + // Try to add with a smaller timestamp + id2 = result.current.addItem(itemData, timestamp - 500); + }); + + expect(id1).toBe(timestamp); + expect(id2).toBe(id1 + 1); + expect(result.current.history[1].id).toBe(id2); + }); + + it('should ensure new IDs start after existing IDs when resuming a session', async () => { + const initialItems: HistoryItem[] = [ + { id: 5000, type: 'info', text: 'Existing' }, + ]; + const { result } = await renderHook(() => useHistory({ initialItems })); + + let newId!: number; + act(() => { + // Try to add with a timestamp smaller than the highest existing ID + newId = result.current.addItem({ type: 'info', text: 'New' }, 2000); + }); + + expect(newId).toBe(5001); + expect(result.current.history[1].id).toBe(5001); + }); + + it('should update lastIdRef when loading new history', async () => { + const { result } = await renderHook(() => useHistory()); + + act(() => { + result.current.loadHistory([{ id: 8000, type: 'info', text: 'Loaded' }]); + }); + + let newId!: number; + act(() => { + newId = result.current.addItem({ type: 'info', text: 'New' }, 1000); + }); + + expect(newId).toBe(8001); + }); + it('should generate unique IDs for items added with the same base timestamp', async () => { const
{ result } = await renderHook(() => useHistory()); const timestamp = Date.now(); @@ -215,8 +265,8 @@ describe('useHistoryManager', () => { const after = Date.now(); expect(result.current.history).toHaveLength(1); - // ID should be >= before + 1 (since counter starts at 0 and increments to 1) - expect(result.current.history[0].id).toBeGreaterThanOrEqual(before + 1); + // ID should be >= before (since baseTimestamp defaults to Date.now()) + expect(result.current.history[0].id).toBeGreaterThanOrEqual(before); expect(result.current.history[0].id).toBeLessThanOrEqual(after + 1); }); diff --git a/packages/cli/src/ui/hooks/useHistoryManager.ts b/packages/cli/src/ui/hooks/useHistoryManager.ts index 93f7f01f28..c6ceabb920 100644 --- a/packages/cli/src/ui/hooks/useHistoryManager.ts +++ b/packages/cli/src/ui/hooks/useHistoryManager.ts @@ -42,16 +42,22 @@ export function useHistory({ initialItems?: HistoryItem[]; } = {}): UseHistoryManagerReturn { const [history, setHistory] = useState(initialItems); - const messageIdCounterRef = useRef(0); + const lastIdRef = useRef( + initialItems.reduce((max, item) => Math.max(max, item.id), 0), + ); - // Generates a unique message ID based on a timestamp and a counter. + // Generates a unique message ID based on a timestamp, ensuring it is always + // greater than any previously assigned ID. const getNextMessageId = useCallback((baseTimestamp: number): number => { - messageIdCounterRef.current += 1; - return baseTimestamp + messageIdCounterRef.current; + const nextId = Math.max(baseTimestamp, lastIdRef.current + 1); + lastIdRef.current = nextId; + return nextId; }, []); const loadHistory = useCallback((newHistory: HistoryItem[]) => { setHistory(newHistory); + const maxId = newHistory.reduce((max, item) => Math.max(max, item.id), 0); + lastIdRef.current = Math.max(lastIdRef.current, maxId); }, []); // Adds a new item to the history state with a unique ID. 
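Editorial note: the useHistoryManager.ts hunk above replaces the "timestamp plus counter" scheme with "max of the timestamp and last ID plus one", which keeps IDs strictly increasing even when the clock goes backwards or a session is resumed with existing items. The sketch below restates that rule outside React so it can be run on its own. It is an illustration under stated assumptions, not the hook itself: `createIdGenerator` and its `next`/`load`/`reset` methods are hypothetical names, and a closure variable stands in for `lastIdRef`.

```typescript
// Hypothetical minimal item shape; the real HistoryItem has more fields.
interface ItemWithId {
  id: number;
}

// Highest ID in a list, or 0 for an empty list.
function maxIdOf(items: readonly ItemWithId[]): number {
  return items.reduce((max, item) => Math.max(max, item.id), 0);
}

// Closure-based stand-in for the hook's ref; all names here are illustrative.
function createIdGenerator(initialItems: readonly ItemWithId[] = []) {
  let lastId = maxIdOf(initialItems);
  return {
    // Returns baseTimestamp if it is ahead of every prior ID, else lastId + 1.
    next(baseTimestamp: number): number {
      const nextId = Math.max(baseTimestamp, lastId + 1);
      lastId = nextId;
      return nextId;
    },
    // Loading history can only raise the watermark, never lower it.
    load(items: readonly ItemWithId[]): void {
      lastId = Math.max(lastId, maxIdOf(items));
    },
    // Mirrors clearing the history: the watermark goes back to zero.
    reset(): void {
      lastId = 0;
    },
  };
}

const idGen = createIdGenerator();
const firstId = idGen.next(1_000_000); // timestamp is ahead, used as-is
const secondId = idGen.next(999_500); // timestamp went backwards
idGen.load([{ id: 2_000_000 }]);
const thirdId = idGen.next(1_000); // must land after the loaded history

console.log(firstId, secondId, thirdId); // prints 1000000 1000001 2000001
```

One consequence worth noting: with the old scheme two calls with the same timestamp differed only by the counter, whereas here the second call always yields exactly the previous ID plus one, which is what the updated tests check.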
@@ -153,7 +159,7 @@ export function useHistory({ // Clears the entire history state and resets the ID counter. const clearItems = useCallback(() => { setHistory([]); - messageIdCounterRef.current = 0; + lastIdRef.current = 0; }, []); return useMemo( diff --git a/packages/cli/src/ui/hooks/useHookDisplayState.ts b/packages/cli/src/ui/hooks/useHookDisplayState.ts index 6c9e1811ad..c98bc7ba29 100644 --- a/packages/cli/src/ui/hooks/useHookDisplayState.ts +++ b/packages/cli/src/ui/hooks/useHookDisplayState.ts @@ -43,6 +43,7 @@ export const useHookDisplayState = () => { { name: payload.hookName, eventName: payload.eventName, + source: payload.source, index: payload.hookIndex, total: payload.totalHooks, }, diff --git a/packages/cli/src/ui/hooks/useInlineEditBuffer.test.ts b/packages/cli/src/ui/hooks/useInlineEditBuffer.test.ts index b3a87f7c9a..eb0aa697fd 100644 --- a/packages/cli/src/ui/hooks/useInlineEditBuffer.test.ts +++ b/packages/cli/src/ui/hooks/useInlineEditBuffer.test.ts @@ -6,17 +6,30 @@ import { renderHook } from '../../test-utils/render.js'; import { act } from 'react'; -import { describe, it, expect, vi, beforeEach, type Mock } from 'vitest'; +import { + describe, + it, + expect, + vi, + beforeEach, + afterEach, + type Mock, +} from 'vitest'; import { useInlineEditBuffer } from './useInlineEditBuffer.js'; describe('useEditBuffer', () => { let mockOnCommit: Mock; beforeEach(() => { + vi.useFakeTimers(); vi.clearAllMocks(); mockOnCommit = vi.fn(); }); + afterEach(() => { + vi.useRealTimers(); + }); + it('should initialize with empty state', async () => { const { result } = await renderHook(() => useInlineEditBuffer({ onCommit: mockOnCommit }), diff --git a/packages/cli/src/ui/hooks/useLoadingIndicator.test.tsx b/packages/cli/src/ui/hooks/useLoadingIndicator.test.tsx index a16c6ea192..db6dc3f1e9 100644 --- a/packages/cli/src/ui/hooks/useLoadingIndicator.test.tsx +++ b/packages/cli/src/ui/hooks/useLoadingIndicator.test.tsx @@ -16,7 +16,6 @@ import { import { 
WITTY_LOADING_PHRASES } from '../constants/wittyPhrases.js'; import { INFORMATIVE_TIPS } from '../constants/tips.js'; import type { RetryAttemptPayload } from '@google/gemini-cli-core'; -import type { LoadingPhrasesMode } from '../../config/settings.js'; describe('useLoadingIndicator', () => { beforeEach(() => { @@ -34,7 +33,8 @@ describe('useLoadingIndicator', () => { initialStreamingState: StreamingState, initialShouldShowFocusHint: boolean = false, initialRetryStatus: RetryAttemptPayload | null = null, - loadingPhrasesMode: LoadingPhrasesMode = 'all', + initialShowTips: boolean = true, + initialShowWit: boolean = true, initialErrorVerbosity: 'low' | 'full' = 'full', ) => { let hookResult: ReturnType; @@ -42,30 +42,35 @@ describe('useLoadingIndicator', () => { streamingState, shouldShowFocusHint, retryStatus, - mode, + showTips, + showWit, errorVerbosity, }: { streamingState: StreamingState; shouldShowFocusHint?: boolean; retryStatus?: RetryAttemptPayload | null; - mode?: LoadingPhrasesMode; - errorVerbosity: 'low' | 'full'; + showTips?: boolean; + showWit?: boolean; + errorVerbosity?: 'low' | 'full'; }) { hookResult = useLoadingIndicator({ streamingState, shouldShowFocusHint: !!shouldShowFocusHint, retryStatus: retryStatus || null, - loadingPhrasesMode: mode, + showTips, + showWit, errorVerbosity, }); return null; } - const { rerender } = await render( + + const { rerender, waitUntilReady } = await render( , ); @@ -75,20 +80,25 @@ describe('useLoadingIndicator', () => { return hookResult; }, }, - rerender: (newProps: { + rerender: async (newProps: { streamingState: StreamingState; shouldShowFocusHint?: boolean; retryStatus?: RetryAttemptPayload | null; - mode?: LoadingPhrasesMode; + showTips?: boolean; + showWit?: boolean; errorVerbosity?: 'low' | 'full'; - }) => + }) => { rerender( , - ), + ); + await waitUntilReady(); + }, + waitUntilReady, }; }; @@ -106,13 +116,8 @@ describe('useLoadingIndicator', () => { false, ); - // Initially should be witty phrase or tip 
- expect([...WITTY_LOADING_PHRASES, ...INFORMATIVE_TIPS]).toContain( - result.current.currentLoadingPhrase, - ); - await act(async () => { - rerender({ + await rerender({ streamingState: StreamingState.Responding, shouldShowFocusHint: true, }); @@ -129,16 +134,14 @@ describe('useLoadingIndicator', () => { StreamingState.Responding, ); - // Initial phrase on first activation will be a tip, not necessarily from witty phrases expect(result.current.elapsedTime).toBe(0); - // On first activation, it may show a tip, so we can't guarantee it's in WITTY_LOADING_PHRASES await act(async () => { await vi.advanceTimersByTimeAsync(PHRASE_CHANGE_INTERVAL_MS + 1); }); - // Phrase should cycle if PHRASE_CHANGE_INTERVAL_MS has passed, now it should be witty since first activation already happened - expect(WITTY_LOADING_PHRASES).toContain( + // Both tip and witty phrase are available in the currentLoadingPhrase because it defaults to tip if present + expect([...WITTY_LOADING_PHRASES, ...INFORMATIVE_TIPS]).toContain( result.current.currentLoadingPhrase, ); }); @@ -153,8 +156,8 @@ describe('useLoadingIndicator', () => { }); expect(result.current.elapsedTime).toBe(60); - act(() => { - rerender({ streamingState: StreamingState.WaitingForConfirmation }); + await act(async () => { + await rerender({ streamingState: StreamingState.WaitingForConfirmation }); }); expect(result.current.currentLoadingPhrase).toBe( @@ -169,7 +172,7 @@ describe('useLoadingIndicator', () => { expect(result.current.elapsedTime).toBe(60); }); - it('should reset elapsedTime and use a witty phrase when transitioning from WaitingForConfirmation to Responding', async () => { + it('should reset elapsedTime and cycle phrases when transitioning from WaitingForConfirmation to Responding', async () => { vi.spyOn(Math, 'random').mockImplementation(() => 0.5); // Always witty const { result, rerender } = await renderLoadingIndicatorHook( StreamingState.Responding, @@ -180,19 +183,19 @@ describe('useLoadingIndicator', () => { 
}); expect(result.current.elapsedTime).toBe(5); - act(() => { - rerender({ streamingState: StreamingState.WaitingForConfirmation }); + await act(async () => { + await rerender({ streamingState: StreamingState.WaitingForConfirmation }); }); expect(result.current.elapsedTime).toBe(5); expect(result.current.currentLoadingPhrase).toBe( 'Waiting for user confirmation...', ); - act(() => { - rerender({ streamingState: StreamingState.Responding }); + await act(async () => { + await rerender({ streamingState: StreamingState.Responding }); }); expect(result.current.elapsedTime).toBe(0); // Should reset - expect(WITTY_LOADING_PHRASES).toContain( + expect([...WITTY_LOADING_PHRASES, ...INFORMATIVE_TIPS]).toContain( result.current.currentLoadingPhrase, ); @@ -213,18 +216,12 @@ describe('useLoadingIndicator', () => { }); expect(result.current.elapsedTime).toBe(10); - act(() => { - rerender({ streamingState: StreamingState.Idle }); + await act(async () => { + await rerender({ streamingState: StreamingState.Idle }); }); expect(result.current.elapsedTime).toBe(0); expect(result.current.currentLoadingPhrase).toBeUndefined(); - - // Timer should not advance - await act(async () => { - await vi.advanceTimersByTimeAsync(2000); - }); - expect(result.current.elapsedTime).toBe(0); }); it('should reflect retry status in currentLoadingPhrase when provided', async () => { @@ -255,7 +252,8 @@ describe('useLoadingIndicator', () => { StreamingState.Responding, false, retryStatus, - 'all', + true, + true, 'low', ); @@ -275,7 +273,8 @@ describe('useLoadingIndicator', () => { StreamingState.Responding, false, retryStatus, - 'all', + true, + true, 'low', ); @@ -284,12 +283,13 @@ describe('useLoadingIndicator', () => { ); }); - it('should show no phrases when loadingPhrasesMode is "off"', async () => { + it('should show no phrases when showTips and showWit are false', async () => { const { result } = await renderLoadingIndicatorHook( StreamingState.Responding, false, null, - 'off', + false, + false, 
); expect(result.current.currentLoadingPhrase).toBeUndefined(); diff --git a/packages/cli/src/ui/hooks/useLoadingIndicator.ts b/packages/cli/src/ui/hooks/useLoadingIndicator.ts index 4f7b631844..6d13615761 100644 --- a/packages/cli/src/ui/hooks/useLoadingIndicator.ts +++ b/packages/cli/src/ui/hooks/useLoadingIndicator.ts @@ -12,7 +12,6 @@ import { getDisplayString, type RetryAttemptPayload, } from '@google/gemini-cli-core'; -import type { LoadingPhrasesMode } from '../../config/settings.js'; const LOW_VERBOSITY_RETRY_HINT_ATTEMPT_THRESHOLD = 2; @@ -20,18 +19,22 @@ export interface UseLoadingIndicatorProps { streamingState: StreamingState; shouldShowFocusHint: boolean; retryStatus: RetryAttemptPayload | null; - loadingPhrasesMode?: LoadingPhrasesMode; + showTips?: boolean; + showWit?: boolean; customWittyPhrases?: string[]; - errorVerbosity: 'low' | 'full'; + errorVerbosity?: 'low' | 'full'; + maxLength?: number; } export const useLoadingIndicator = ({ streamingState, shouldShowFocusHint, retryStatus, - loadingPhrasesMode, + showTips = true, + showWit = false, customWittyPhrases, - errorVerbosity, + errorVerbosity = 'full', + maxLength, }: UseLoadingIndicatorProps) => { const [timerResetKey, setTimerResetKey] = useState(0); const isTimerActive = streamingState === StreamingState.Responding; @@ -40,12 +43,15 @@ export const useLoadingIndicator = ({ const isPhraseCyclingActive = streamingState === StreamingState.Responding; const isWaiting = streamingState === StreamingState.WaitingForConfirmation; - const currentLoadingPhrase = usePhraseCycler( + + const { currentTip, currentWittyPhrase } = usePhraseCycler( isPhraseCyclingActive, isWaiting, shouldShowFocusHint, - loadingPhrasesMode, + showTips, + showWit, customWittyPhrases, + maxLength, ); const [retainedElapsedTime, setRetainedElapsedTime] = useState(0); @@ -86,6 +92,8 @@ export const useLoadingIndicator = ({ streamingState === StreamingState.WaitingForConfirmation ? 
retainedElapsedTime : elapsedTimeFromTimer, - currentLoadingPhrase: retryPhrase || currentLoadingPhrase, + currentLoadingPhrase: retryPhrase || currentTip || currentWittyPhrase, + currentTip, + currentWittyPhrase, }; }; diff --git a/packages/cli/src/ui/hooks/usePhraseCycler.test.tsx b/packages/cli/src/ui/hooks/usePhraseCycler.test.tsx index 81299870c7..82264442e6 100644 --- a/packages/cli/src/ui/hooks/usePhraseCycler.test.tsx +++ b/packages/cli/src/ui/hooks/usePhraseCycler.test.tsx @@ -11,33 +11,39 @@ import { Text } from 'ink'; import { usePhraseCycler, PHRASE_CHANGE_INTERVAL_MS, + INTERACTIVE_SHELL_WAITING_PHRASE, } from './usePhraseCycler.js'; import { INFORMATIVE_TIPS } from '../constants/tips.js'; import { WITTY_LOADING_PHRASES } from '../constants/wittyPhrases.js'; -import type { LoadingPhrasesMode } from '../../config/settings.js'; // Test component to consume the hook const TestComponent = ({ isActive, isWaiting, - isInteractiveShellWaiting = false, - loadingPhrasesMode = 'all', + shouldShowFocusHint = false, + showTips = true, + showWit = true, customPhrases, }: { isActive: boolean; isWaiting: boolean; - isInteractiveShellWaiting?: boolean; - loadingPhrasesMode?: LoadingPhrasesMode; + shouldShowFocusHint?: boolean; + showTips?: boolean; + showWit?: boolean; customPhrases?: string[]; }) => { - const phrase = usePhraseCycler( + const { currentTip, currentWittyPhrase } = usePhraseCycler( isActive, isWaiting, - isInteractiveShellWaiting, - loadingPhrasesMode, + shouldShowFocusHint, + showTips, + showWit, customPhrases, ); - return {phrase}; + // For tests, we'll combine them to verify existence + return ( + {[currentTip, currentWittyPhrase].filter(Boolean).join(' | ')} + ); }; describe('usePhraseCycler', () => { @@ -52,9 +58,10 @@ describe('usePhraseCycler', () => { it('should initialize with an empty string when not active and not waiting', async () => { vi.spyOn(Math, 'random').mockImplementation(() => 0.5); // Always witty - const { lastFrame, unmount } = 
await render( + const { lastFrame, unmount, waitUntilReady } = await render( , ); + await waitUntilReady(); expect(lastFrame({ allowEmpty: true }).trim()).toBe(''); unmount(); }); @@ -63,33 +70,35 @@ describe('usePhraseCycler', () => { const { lastFrame, rerender, waitUntilReady, unmount } = await render( , ); + await waitUntilReady(); await act(async () => { rerender(); }); await waitUntilReady(); - expect(lastFrame().trim()).toMatchSnapshot(); + expect(lastFrame().trim()).toBe('Waiting for user confirmation...'); unmount(); }); - it('should show interactive shell waiting message immediately when isInteractiveShellWaiting is true', async () => { + it('should show interactive shell waiting message immediately when shouldShowFocusHint is true', async () => { const { lastFrame, rerender, waitUntilReady, unmount } = await render( , ); + await waitUntilReady(); await act(async () => { rerender( , ); }); await waitUntilReady(); - expect(lastFrame().trim()).toMatchSnapshot(); + expect(lastFrame().trim()).toBe(INTERACTIVE_SHELL_WAITING_PHRASE); unmount(); }); @@ -97,19 +106,20 @@ describe('usePhraseCycler', () => { const { lastFrame, rerender, waitUntilReady, unmount } = await render( , ); - expect(lastFrame().trim()).toMatchSnapshot(); + await waitUntilReady(); + expect(lastFrame().trim()).toBe('Waiting for user confirmation...'); await act(async () => { rerender( , ); }); await waitUntilReady(); - expect(lastFrame().trim()).toMatchSnapshot(); + expect(lastFrame().trim()).toBe(INTERACTIVE_SHELL_WAITING_PHRASE); unmount(); }); @@ -117,6 +127,7 @@ describe('usePhraseCycler', () => { const { lastFrame, waitUntilReady, unmount } = await render( , ); + await waitUntilReady(); const initialPhrase = lastFrame({ allowEmpty: true }).trim(); await act(async () => { @@ -128,53 +139,56 @@ describe('usePhraseCycler', () => { unmount(); }); - it('should show a tip on first activation, then a witty phrase', async () => { - vi.spyOn(Math, 'random').mockImplementation(() => 0.99); // 
Subsequent phrases are witty + it('should show both a tip and a witty phrase when both are enabled', async () => { + vi.spyOn(Math, 'random').mockImplementation(() => 0.5); const { lastFrame, waitUntilReady, unmount } = await render( - , + , ); - - // Initial phrase on first activation should be a tip - expect(INFORMATIVE_TIPS).toContain(lastFrame().trim()); - - // After the first interval, it should be a witty phrase - await act(async () => { - await vi.advanceTimersByTimeAsync(PHRASE_CHANGE_INTERVAL_MS + 100); - }); await waitUntilReady(); - expect(WITTY_LOADING_PHRASES).toContain(lastFrame().trim()); + + // In the new logic, both are selected independently if enabled. + const frame = lastFrame().trim(); + const parts = frame.split(' | '); + expect(parts).toHaveLength(2); + expect(INFORMATIVE_TIPS).toContain(parts[0]); + expect(WITTY_LOADING_PHRASES).toContain(parts[1]); unmount(); }); it('should cycle through phrases when isActive is true and not waiting', async () => { - vi.spyOn(Math, 'random').mockImplementation(() => 0.5); // Always witty for subsequent phrases + vi.spyOn(Math, 'random').mockImplementation(() => 0.5); const { lastFrame, waitUntilReady, unmount } = await render( - , + , ); - // Initial phrase on first activation will be a tip + await waitUntilReady(); - // After the first interval, it should follow the random pattern (witty phrases due to mock) await act(async () => { await vi.advanceTimersByTimeAsync(PHRASE_CHANGE_INTERVAL_MS + 100); }); await waitUntilReady(); - expect(WITTY_LOADING_PHRASES).toContain(lastFrame().trim()); + const frame = lastFrame().trim(); + const parts = frame.split(' | '); + expect(parts).toHaveLength(2); + expect(INFORMATIVE_TIPS).toContain(parts[0]); + expect(WITTY_LOADING_PHRASES).toContain(parts[1]); - await act(async () => { - await vi.advanceTimersByTimeAsync(PHRASE_CHANGE_INTERVAL_MS); - }); - await waitUntilReady(); - expect(WITTY_LOADING_PHRASES).toContain(lastFrame().trim()); unmount(); }); - it('should reset 
to a phrase when isActive becomes true after being false', async () => { + it('should reset to phrases when isActive becomes true after being false', async () => { const customPhrases = ['Phrase A', 'Phrase B']; let callCount = 0; vi.spyOn(Math, 'random').mockImplementation(() => { - // For custom phrases, only 1 Math.random call is made per update. - // 0 -> index 0 ('Phrase A') - // 0.99 -> index 1 ('Phrase B') const val = callCount % 2 === 0 ? 0 : 0.99; callCount++; return val; @@ -185,33 +199,31 @@ describe('usePhraseCycler', () => { isActive={false} isWaiting={false} customPhrases={customPhrases} + showWit={true} + showTips={false} />, ); + await waitUntilReady(); - // Activate -> On first activation will show tip on initial call, then first interval will use first mock value for 'Phrase A' + // Activate await act(async () => { rerender( , ); }); await waitUntilReady(); await act(async () => { - await vi.advanceTimersByTimeAsync(PHRASE_CHANGE_INTERVAL_MS); // First interval after initial state -> callCount 0 -> 'Phrase A' + await vi.advanceTimersByTimeAsync(0); }); await waitUntilReady(); - expect(customPhrases).toContain(lastFrame().trim()); // Should be one of the custom phrases - - // Second interval -> callCount 1 -> returns 0.99 -> 'Phrase B' - await act(async () => { - await vi.advanceTimersByTimeAsync(PHRASE_CHANGE_INTERVAL_MS); - }); - await waitUntilReady(); - expect(customPhrases).toContain(lastFrame().trim()); // Should be one of the custom phrases + expect(customPhrases).toContain(lastFrame().trim()); // Deactivate -> resets to undefined (empty string in output) await act(async () => { @@ -220,6 +232,8 @@ describe('usePhraseCycler', () => { isActive={false} isWaiting={false} customPhrases={customPhrases} + showWit={true} + showTips={false} />, ); }); @@ -227,35 +241,18 @@ describe('usePhraseCycler', () => { // The phrase should be empty after reset expect(lastFrame({ allowEmpty: true }).trim()).toBe(''); - - // Activate again -> this will show a 
tip on first activation, then cycle from where mock is - await act(async () => { - rerender( - , - ); - }); - await waitUntilReady(); - - await act(async () => { - await vi.advanceTimersByTimeAsync(PHRASE_CHANGE_INTERVAL_MS); // First interval after re-activation -> should contain phrase - }); - await waitUntilReady(); - expect(customPhrases).toContain(lastFrame().trim()); // Should be one of the custom phrases unmount(); }); it('should clear phrase interval on unmount when active', async () => { - const { unmount } = await render( + const { unmount, waitUntilReady } = await render( , ); + await waitUntilReady(); const clearIntervalSpy = vi.spyOn(global, 'clearInterval'); unmount(); - expect(clearIntervalSpy).toHaveBeenCalledOnce(); + expect(clearIntervalSpy).toHaveBeenCalled(); }); it('should use custom phrases when provided', async () => { @@ -284,7 +281,8 @@ describe('usePhraseCycler', () => { ); @@ -293,10 +291,11 @@ describe('usePhraseCycler', () => { const { lastFrame, unmount, waitUntilReady } = await render( , ); + await waitUntilReady(); // After first interval, it should use custom phrases await act(async () => { - await vi.advanceTimersByTimeAsync(PHRASE_CHANGE_INTERVAL_MS + 100); + await vi.advanceTimersByTimeAsync(0); }); await waitUntilReady(); @@ -315,73 +314,24 @@ describe('usePhraseCycler', () => { await waitUntilReady(); expect(customPhrases).toContain(lastFrame({ allowEmpty: true }).trim()); - randomMock.mockReturnValue(0.99); - await act(async () => { - await vi.advanceTimersByTimeAsync(PHRASE_CHANGE_INTERVAL_MS); - }); - await waitUntilReady(); - expect(customPhrases).toContain(lastFrame({ allowEmpty: true }).trim()); - - // Test fallback to default phrases. 
- randomMock.mockRestore(); - vi.spyOn(Math, 'random').mockReturnValue(0.5); // Always witty - - await act(async () => { - setStateExternally?.({ - isActive: true, - customPhrases: [] as string[], - }); - }); - await waitUntilReady(); - - await act(async () => { - await vi.advanceTimersByTimeAsync(PHRASE_CHANGE_INTERVAL_MS); // Wait for first cycle - }); - await waitUntilReady(); - - expect(WITTY_LOADING_PHRASES).toContain(lastFrame().trim()); unmount(); }); + it('should fall back to witty phrases if custom phrases are an empty array', async () => { - vi.spyOn(Math, 'random').mockImplementation(() => 0.5); // Always witty for subsequent phrases - const { lastFrame, unmount, waitUntilReady } = await render( - , + vi.spyOn(Math, 'random').mockImplementation(() => 0.5); + const { lastFrame, waitUntilReady, unmount } = await render( + , ); - await act(async () => { - await vi.advanceTimersByTimeAsync(PHRASE_CHANGE_INTERVAL_MS); // Next phrase after tip - }); - await waitUntilReady(); - expect(WITTY_LOADING_PHRASES).toContain(lastFrame().trim()); - unmount(); - }); - it('should reset phrase when transitioning from waiting to active', async () => { - vi.spyOn(Math, 'random').mockImplementation(() => 0.5); // Always witty for subsequent phrases - const { lastFrame, rerender, unmount, waitUntilReady } = await render( - , - ); - - // Cycle to a different phrase (should be witty due to mock) - await act(async () => { - await vi.advanceTimersByTimeAsync(PHRASE_CHANGE_INTERVAL_MS); - }); - await waitUntilReady(); - expect(WITTY_LOADING_PHRASES).toContain(lastFrame().trim()); - - // Go to waiting state - await act(async () => { - rerender(); - }); - await waitUntilReady(); - expect(lastFrame().trim()).toMatchSnapshot(); - - // Go back to active cycling - should pick a phrase based on the logic (witty due to mock) - await act(async () => { - rerender(); - }); await waitUntilReady(); await act(async () => { - await vi.advanceTimersByTimeAsync(PHRASE_CHANGE_INTERVAL_MS); // Skip 
the tip and get next phrase + await vi.advanceTimersByTimeAsync(0); }); await waitUntilReady(); expect(WITTY_LOADING_PHRASES).toContain(lastFrame().trim()); diff --git a/packages/cli/src/ui/hooks/usePhraseCycler.ts b/packages/cli/src/ui/hooks/usePhraseCycler.ts index 8ddab6eef9..1b82336afe 100644 --- a/packages/cli/src/ui/hooks/usePhraseCycler.ts +++ b/packages/cli/src/ui/hooks/usePhraseCycler.ts @@ -7,112 +7,177 @@ import { useState, useEffect, useRef } from 'react'; import { INFORMATIVE_TIPS } from '../constants/tips.js'; import { WITTY_LOADING_PHRASES } from '../constants/wittyPhrases.js'; -import type { LoadingPhrasesMode } from '../../config/settings.js'; -export const PHRASE_CHANGE_INTERVAL_MS = 15000; +export const PHRASE_CHANGE_INTERVAL_MS = 10000; +export const WITTY_PHRASE_CHANGE_INTERVAL_MS = 5000; export const INTERACTIVE_SHELL_WAITING_PHRASE = - 'Interactive shell awaiting input... press tab to focus shell'; + '! Shell awaiting input (Tab to focus)'; /** * Custom hook to manage cycling through loading phrases. * @param isActive Whether the phrase cycling should be active. * @param isWaiting Whether to show a specific waiting phrase. * @param shouldShowFocusHint Whether to show the shell focus hint. - * @param loadingPhrasesMode Which phrases to show: tips, witty, all, or off. + * @param showTips Whether to show informative tips. + * @param showWit Whether to show witty phrases. * @param customPhrases Optional list of custom phrases to use instead of built-in witty phrases. + * @param maxLength Optional maximum length for the selected phrase. * @returns The current loading phrase. 
*/ export const usePhraseCycler = ( isActive: boolean, isWaiting: boolean, shouldShowFocusHint: boolean, - loadingPhrasesMode: LoadingPhrasesMode = 'tips', + showTips: boolean = true, + showWit: boolean = true, customPhrases?: string[], + maxLength?: number, ) => { - const [currentLoadingPhrase, setCurrentLoadingPhrase] = useState< + const [currentTipState, setCurrentTipState] = useState( + undefined, + ); + const [currentWittyPhraseState, setCurrentWittyPhraseState] = useState< string | undefined >(undefined); - const phraseIntervalRef = useRef(null); - const hasShownFirstRequestTipRef = useRef(false); + const tipIntervalRef = useRef(null); + const wittyIntervalRef = useRef(null); + const lastTipChangeTimeRef = useRef(0); + const lastWittyChangeTimeRef = useRef(0); + const lastSelectedTipRef = useRef(undefined); + const lastSelectedWittyPhraseRef = useRef(undefined); + const MIN_TIP_DISPLAY_TIME_MS = 10000; + const MIN_WIT_DISPLAY_TIME_MS = 5000; useEffect(() => { // Always clear on re-run - if (phraseIntervalRef.current) { - clearInterval(phraseIntervalRef.current); - phraseIntervalRef.current = null; - } + const clearTimers = () => { + if (tipIntervalRef.current) { + clearInterval(tipIntervalRef.current); + tipIntervalRef.current = null; + } + if (wittyIntervalRef.current) { + clearInterval(wittyIntervalRef.current); + wittyIntervalRef.current = null; + } + }; - if (shouldShowFocusHint) { - setCurrentLoadingPhrase(INTERACTIVE_SHELL_WAITING_PHRASE); + clearTimers(); + + if (shouldShowFocusHint || isWaiting) { + // These are handled by the return value directly for immediate feedback return; } - if (isWaiting) { - setCurrentLoadingPhrase('Waiting for user confirmation...'); + if (!isActive || (!showTips && !showWit)) { return; } - if (!isActive || loadingPhrasesMode === 'off') { - setCurrentLoadingPhrase(undefined); - return; - } - - const wittyPhrases = + const wittyPhrasesList = customPhrases && customPhrases.length > 0 ? 
customPhrases : WITTY_LOADING_PHRASES; - const setRandomPhrase = () => { - let phraseList: readonly string[]; - - switch (loadingPhrasesMode) { - case 'tips': - phraseList = INFORMATIVE_TIPS; - break; - case 'witty': - phraseList = wittyPhrases; - break; - case 'all': - // Show a tip on the first request after startup, then continue with 1/6 chance - if (!hasShownFirstRequestTipRef.current) { - phraseList = INFORMATIVE_TIPS; - hasShownFirstRequestTipRef.current = true; - } else { - const showTip = Math.random() < 1 / 6; - phraseList = showTip ? INFORMATIVE_TIPS : wittyPhrases; - } - break; - default: - phraseList = INFORMATIVE_TIPS; - break; + const setRandomTip = (force: boolean = false) => { + if (!showTips) { + setCurrentTipState(undefined); + lastSelectedTipRef.current = undefined; + return; } - const randomIndex = Math.floor(Math.random() * phraseList.length); - setCurrentLoadingPhrase(phraseList[randomIndex]); - }; + const now = Date.now(); + if ( + !force && + now - lastTipChangeTimeRef.current < MIN_TIP_DISPLAY_TIME_MS && + lastSelectedTipRef.current + ) { + setCurrentTipState(lastSelectedTipRef.current); + return; + } - // Select an initial random phrase - setRandomPhrase(); + const filteredTips = + maxLength !== undefined + ? 
INFORMATIVE_TIPS.filter((p) => p.length <= maxLength) + : INFORMATIVE_TIPS; - phraseIntervalRef.current = setInterval(() => { - // Select a new random phrase - setRandomPhrase(); - }, PHRASE_CHANGE_INTERVAL_MS); - - return () => { - if (phraseIntervalRef.current) { - clearInterval(phraseIntervalRef.current); - phraseIntervalRef.current = null; + if (filteredTips.length > 0) { + const selected = + filteredTips[Math.floor(Math.random() * filteredTips.length)]; + setCurrentTipState(selected); + lastSelectedTipRef.current = selected; + lastTipChangeTimeRef.current = now; } }; + + const setRandomWitty = (force: boolean = false) => { + if (!showWit) { + setCurrentWittyPhraseState(undefined); + lastSelectedWittyPhraseRef.current = undefined; + return; + } + + const now = Date.now(); + if ( + !force && + now - lastWittyChangeTimeRef.current < MIN_WIT_DISPLAY_TIME_MS && + lastSelectedWittyPhraseRef.current + ) { + setCurrentWittyPhraseState(lastSelectedWittyPhraseRef.current); + return; + } + + const filteredWitty = + maxLength !== undefined + ? 
wittyPhrasesList.filter((p) => p.length <= maxLength) + : wittyPhrasesList; + + if (filteredWitty.length > 0) { + const selected = + filteredWitty[Math.floor(Math.random() * filteredWitty.length)]; + setCurrentWittyPhraseState(selected); + lastSelectedWittyPhraseRef.current = selected; + lastWittyChangeTimeRef.current = now; + } + }; + + // Select initial random phrases or resume previous ones + setRandomTip(false); + setRandomWitty(false); + + if (showTips) { + tipIntervalRef.current = setInterval(() => { + setRandomTip(true); + }, PHRASE_CHANGE_INTERVAL_MS); + } + + if (showWit) { + wittyIntervalRef.current = setInterval(() => { + setRandomWitty(true); + }, WITTY_PHRASE_CHANGE_INTERVAL_MS); + } + + return clearTimers; }, [ isActive, isWaiting, shouldShowFocusHint, - loadingPhrasesMode, + showTips, + showWit, customPhrases, + maxLength, ]); - return currentLoadingPhrase; + let currentTip = undefined; + let currentWittyPhrase = undefined; + + if (shouldShowFocusHint) { + currentTip = INTERACTIVE_SHELL_WAITING_PHRASE; + } else if (isWaiting) { + currentTip = 'Waiting for user confirmation...'; + } else if (isActive) { + currentTip = currentTipState; + currentWittyPhrase = currentWittyPhraseState; + } + + return { currentTip, currentWittyPhrase }; }; diff --git a/packages/cli/src/ui/key/keyBindings.ts b/packages/cli/src/ui/key/keyBindings.ts index 5b1afc0735..c84f189664 100644 --- a/packages/cli/src/ui/key/keyBindings.ts +++ b/packages/cli/src/ui/key/keyBindings.ts @@ -194,6 +194,7 @@ export class KeyBinding { const key = remains; + // eslint-disable-next-line @typescript-eslint/no-misused-spread const isSingleChar = [...key].length === 1; if (!isSingleChar && !KeyBinding.VALID_LONG_KEYS.has(key.toLowerCase())) { diff --git a/packages/cli/src/ui/layouts/DefaultAppLayout.tsx b/packages/cli/src/ui/layouts/DefaultAppLayout.tsx index c703f5102f..74c02c1d9a 100644 --- a/packages/cli/src/ui/layouts/DefaultAppLayout.tsx +++ b/packages/cli/src/ui/layouts/DefaultAppLayout.tsx 
@@ -31,9 +31,6 @@ export const DefaultAppLayout: React.FC = () => { flexDirection="column" width={uiState.terminalWidth} height={isAlternateBuffer ? terminalHeight : undefined} - paddingBottom={ - isAlternateBuffer && !uiState.copyModeEnabled ? 1 : undefined - } flexShrink={0} flexGrow={0} overflow="hidden" diff --git a/packages/cli/src/ui/textConstants.ts b/packages/cli/src/ui/textConstants.ts index 00be0623d2..eaef8bf0ff 100644 --- a/packages/cli/src/ui/textConstants.ts +++ b/packages/cli/src/ui/textConstants.ts @@ -18,3 +18,5 @@ export const REDIRECTION_WARNING_NOTE_TEXT = export const REDIRECTION_WARNING_TIP_LABEL = 'Tip: '; // Padded to align with "Note: " export const getRedirectionWarningTipText = (shiftTabHint: string) => `Toggle auto-edit (${shiftTabHint}) to allow redirection in the future.`; + +export const GENERIC_WORKING_LABEL = 'Working...'; diff --git a/packages/cli/src/ui/types.ts b/packages/cli/src/ui/types.ts index 2f8e414a83..3760575a6f 100644 --- a/packages/cli/src/ui/types.ts +++ b/packages/cli/src/ui/types.ts @@ -16,13 +16,20 @@ import { type AgentDefinition, type ApprovalMode, type Kind, + type AnsiOutput, CoreToolCallStatus, checkExhaustive, } from '@google/gemini-cli-core'; import type { PartListUnion } from '@google/genai'; import { type ReactNode } from 'react'; -export type { ThoughtSummary, SkillDefinition }; +export { CoreToolCallStatus }; +export type { + ThoughtSummary, + SkillDefinition, + SerializableConfirmationDetails, + ToolResultDisplay, +}; export enum AuthState { // Attempting to authenticate or re-authenticate @@ -86,6 +93,16 @@ export function mapCoreStatusToDisplayStatus( } } +/** + * --- TYPE GUARDS --- + */ + +export const isTodoList = (res: unknown): res is { todos: unknown[] } => + typeof res === 'object' && res !== null && 'todos' in res; + +export const isAnsiOutput = (res: unknown): res is AnsiOutput => + Array.isArray(res) && (res.length === 0 || Array.isArray(res[0])); + export interface ToolCallEvent { type: 
'tool_call'; status: CoreToolCallStatus; @@ -352,10 +369,6 @@ export type HistoryItemMcpStatus = HistoryItemBase & { showSchema: boolean; }; -// Using Omit seems to have some issues with typescript's -// type inference e.g. historyItem.type === 'tool_group' isn't auto-inferring that -// 'tools' in historyItem. -// Individually exported types extending HistoryItemBase export type HistoryItemWithoutId = | HistoryItemUser | HistoryItemUserShell @@ -507,6 +520,7 @@ export interface PermissionConfirmationRequest { export interface ActiveHook { name: string; eventName: string; + source?: string; index?: number; total?: number; } diff --git a/packages/cli/src/ui/utils/CodeColorizer.test.tsx b/packages/cli/src/ui/utils/CodeColorizer.test.tsx index c647491ec9..0979e3e123 100644 --- a/packages/cli/src/ui/utils/CodeColorizer.test.tsx +++ b/packages/cli/src/ui/utils/CodeColorizer.test.tsx @@ -79,4 +79,28 @@ describe('colorizeCode', () => { await expect(renderResult).toMatchSvgSnapshot(); renderResult.unmount(); }); + + it('returns an array of lines when returnLines is true', () => { + const code = 'line 1\nline 2\nline 3'; + const settings = new LoadedSettings( + { path: '', settings: {}, originalSettings: {} }, + { path: '', settings: {}, originalSettings: {} }, + { path: '', settings: {}, originalSettings: {} }, + { path: '', settings: {}, originalSettings: {} }, + true, + [], + ); + + const result = colorizeCode({ + code, + language: 'javascript', + maxWidth: 80, + settings, + hideLineNumbers: true, + returnLines: true, + }); + + expect(Array.isArray(result)).toBe(true); + expect(result).toHaveLength(3); + }); }); diff --git a/packages/cli/src/ui/utils/CodeColorizer.tsx b/packages/cli/src/ui/utils/CodeColorizer.tsx index 948a5f8988..94dda9501e 100644 --- a/packages/cli/src/ui/utils/CodeColorizer.tsx +++ b/packages/cli/src/ui/utils/CodeColorizer.tsx @@ -21,8 +21,8 @@ import { MaxSizedBox, MINIMUM_MAX_HEIGHT, } from '../components/shared/MaxSizedBox.js'; -import type { 
LoadedSettings } from '../../config/settings.js'; import { debugLogger } from '@google/gemini-cli-core'; +import type { LoadedSettings } from '../../config/settings.js'; // Configure theming and parsing utilities. const lowlight = createLowlight(common); @@ -117,7 +117,11 @@ export function colorizeLine( line: string, language: string | null, theme?: Theme, + disableColor = false, ): React.ReactNode { + if (disableColor) { + return {line}; + } const activeTheme = theme || themeManager.getActiveTheme(); return highlightAndRenderLine(line, language, activeTheme); } @@ -130,6 +134,8 @@ export interface ColorizeCodeOptions { theme?: Theme | null; settings: LoadedSettings; hideLineNumbers?: boolean; + disableColor?: boolean; + returnLines?: boolean; } /** @@ -138,6 +144,12 @@ export interface ColorizeCodeOptions { * @param options The options for colorizing the code. * @returns A React.ReactNode containing Ink elements for the highlighted code. */ +export function colorizeCode( + options: ColorizeCodeOptions & { returnLines: true }, +): React.ReactNode[]; +export function colorizeCode( + options: ColorizeCodeOptions & { returnLines?: false }, +): React.ReactNode; export function colorizeCode({ code, language = null, @@ -146,13 +158,16 @@ export function colorizeCode({ theme = null, settings, hideLineNumbers = false, -}: ColorizeCodeOptions): React.ReactNode { + disableColor = false, + returnLines = false, +}: ColorizeCodeOptions): React.ReactNode | React.ReactNode[] { const codeToHighlight = code.replace(/\n$/, ''); const activeTheme = theme || themeManager.getActiveTheme(); const showLineNumbers = hideLineNumbers ? 
false : settings.merged.ui.showLineNumbers; + const useMaxSizedBox = !settings.merged.ui.useAlternateBuffer && !returnLines; try { // Render the HAST tree using the adapted theme // Apply the theme's default foreground color to the top-level Text element @@ -162,7 +177,7 @@ export function colorizeCode({ let hiddenLinesCount = 0; // Optimization to avoid highlighting lines that cannot possibly be displayed. - if (availableHeight !== undefined) { + if (availableHeight !== undefined && useMaxSizedBox) { availableHeight = Math.max(availableHeight, MINIMUM_MAX_HEIGHT); if (lines.length > availableHeight) { const sliceIndex = lines.length - availableHeight; @@ -172,11 +187,9 @@ export function colorizeCode({ } const renderedLines = lines.map((line, index) => { - const contentToRender = highlightAndRenderLine( - line, - language, - activeTheme, - ); + const contentToRender = disableColor + ? line + : highlightAndRenderLine(line, language, activeTheme); return ( @@ -188,19 +201,26 @@ export function colorizeCode({ alignItems="flex-start" justifyContent="flex-end" > - + {`${index + 1 + hiddenLinesCount}`} )} - + {contentToRender} ); }); - if (availableHeight !== undefined) { + if (returnLines) { + return renderedLines; + } + + if (useMaxSizedBox) { return ( - {`${index + 1}`} + + {`${index + 1}`} + )} - {stripAnsi(line)} + + {stripAnsi(line)} + )); - if (availableHeight !== undefined) { + if (returnLines) { + return fallbackLines; + } + + if (useMaxSizedBox) { return ( + - + - - - - Gemini CLI - v1.2.3 - - - - - - - - - ╭──────────────────────────────────────────────────────────────────────────────────────────────╮ - - - google_web_search - - - - - Searching... - - ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ + + + + ▗█▀▀▜▙▝█▛▀▀▌▜██▖▟██▘▜█▘▜██▖▝█▛▝█▛ + + + + █▌ █▙▟ ▐█▝█▛▐█ ▐█ ▐█▝█▖█▌ █▌ + + + + ▜▙ ▝█▛ █▌▝ ▖▐█ ▐█ ▐█ ▐█ ▝██▌ █▌ + + + ▀▀▀▀▘▝▀▀▀▀▘▀▀▘ ▀▀▘▀▀▘▀▀▘ ▝▀▀▝▀▀ + Gemini CLI + v1.2.3 + Tips for getting started: + 1. 
Create + GEMINI.md + files to customize your interactions + 2. + /help + for more information + 3. Ask coding questions, edit code or run commands + 4. Be specific for the best results + ╭──────────────────────────────────────────────────────────────────────────────────────────────╮ + + + google_web_search + + + + + Searching... + + ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ \ No newline at end of file diff --git a/packages/cli/src/ui/utils/__snapshots__/borderStyles-MainContent-tool-group-border-SVG-snapshots-should-render-SVG-snapshot-for-a-shell-tool.snap.svg b/packages/cli/src/ui/utils/__snapshots__/borderStyles-MainContent-tool-group-border-SVG-snapshots-should-render-SVG-snapshot-for-a-shell-tool.snap.svg index 1c0ff4b121..85a715cc01 100644 --- a/packages/cli/src/ui/utils/__snapshots__/borderStyles-MainContent-tool-group-border-SVG-snapshots-should-render-SVG-snapshot-for-a-shell-tool.snap.svg +++ b/packages/cli/src/ui/utils/__snapshots__/borderStyles-MainContent-tool-group-border-SVG-snapshots-should-render-SVG-snapshot-for-a-shell-tool.snap.svg @@ -1,32 +1,45 @@ - + - + - - - - Gemini CLI - v1.2.3 - - - - - - - - - ╭──────────────────────────────────────────────────────────────────────────────────────────────╮ - - - run_shell_command - - - - - Running command... - - ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ + + + + ▗█▀▀▜▙▝█▛▀▀▌▜██▖▟██▘▜█▘▜██▖▝█▛▝█▛ + + + + █▌ █▙▟ ▐█▝█▛▐█ ▐█ ▐█▝█▖█▌ █▌ + + + + ▜▙ ▝█▛ █▌▝ ▖▐█ ▐█ ▐█ ▐█ ▝██▌ █▌ + + + ▀▀▀▀▘▝▀▀▀▀▘▀▀▘ ▀▀▘▀▀▘▀▀▘ ▝▀▀▝▀▀ + Gemini CLI + v1.2.3 + Tips for getting started: + 1. Create + GEMINI.md + files to customize your interactions + 2. + /help + for more information + 3. Ask coding questions, edit code or run commands + 4. Be specific for the best results + ╭──────────────────────────────────────────────────────────────────────────────────────────────╮ + + + run_shell_command + + + + + Running command... 
+ + ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ \ No newline at end of file diff --git a/packages/cli/src/ui/utils/__snapshots__/borderStyles-MainContent-tool-group-border-SVG-snapshots-should-render-SVG-snapshot-for-an-empty-slice-following-a-search-tool.snap.svg b/packages/cli/src/ui/utils/__snapshots__/borderStyles-MainContent-tool-group-border-SVG-snapshots-should-render-SVG-snapshot-for-an-empty-slice-following-a-search-tool.snap.svg index 6a693d318b..beaa216162 100644 --- a/packages/cli/src/ui/utils/__snapshots__/borderStyles-MainContent-tool-group-border-SVG-snapshots-should-render-SVG-snapshot-for-an-empty-slice-following-a-search-tool.snap.svg +++ b/packages/cli/src/ui/utils/__snapshots__/borderStyles-MainContent-tool-group-border-SVG-snapshots-should-render-SVG-snapshot-for-an-empty-slice-following-a-search-tool.snap.svg @@ -1,32 +1,45 @@ - + - + - - - - Gemini CLI - v1.2.3 - - - - - - - - - ╭──────────────────────────────────────────────────────────────────────────────────────────────╮ - - - google_web_search - - - - - Searching... - - ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ + + + + ▗█▀▀▜▙▝█▛▀▀▌▜██▖▟██▘▜█▘▜██▖▝█▛▝█▛ + + + + █▌ █▙▟ ▐█▝█▛▐█ ▐█ ▐█▝█▖█▌ █▌ + + + + ▜▙ ▝█▛ █▌▝ ▖▐█ ▐█ ▐█ ▐█ ▝██▌ █▌ + + + ▀▀▀▀▘▝▀▀▀▀▘▀▀▘ ▀▀▘▀▀▘▀▀▘ ▝▀▀▝▀▀ + Gemini CLI + v1.2.3 + Tips for getting started: + 1. Create + GEMINI.md + files to customize your interactions + 2. + /help + for more information + 3. Ask coding questions, edit code or run commands + 4. Be specific for the best results + ╭──────────────────────────────────────────────────────────────────────────────────────────────╮ + + + google_web_search + + + + + Searching... 
+ + ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ \ No newline at end of file diff --git a/packages/cli/src/ui/utils/__snapshots__/borderStyles.test.tsx.snap b/packages/cli/src/ui/utils/__snapshots__/borderStyles.test.tsx.snap index bdf1e95332..84baf2edb8 100644 --- a/packages/cli/src/ui/utils/__snapshots__/borderStyles.test.tsx.snap +++ b/packages/cli/src/ui/utils/__snapshots__/borderStyles.test.tsx.snap @@ -2,11 +2,19 @@ exports[`MainContent tool group border SVG snapshots > should render SVG snapshot for a pending search dialog (google_web_search) 1`] = ` " - ▝▜▄ Gemini CLI v1.2.3 - ▝▜▄ - ▗▟▀ - ▝▀ + ▝▜▄ ▗█▀▀▜▙▝█▛▀▀▌▜██▖▟██▘▜█▘▜██▖▝█▛▝█▛ + ▝▜▄ █▌ █▙▟ ▐█▝█▛▐█ ▐█ ▐█▝█▖█▌ █▌ + ▗▟▀ ▜▙ ▝█▛ █▌▝ ▖▐█ ▐█ ▐█ ▐█ ▝██▌ █▌ + ▝▀ ▀▀▀▀▘▝▀▀▀▀▘▀▀▘ ▀▀▘▀▀▘▀▀▘ ▝▀▀▝▀▀ + Gemini CLI v1.2.3 + + +Tips for getting started: +1. Create GEMINI.md files to customize your interactions +2. /help for more information +3. Ask coding questions, edit code or run commands +4. Be specific for the best results ╭──────────────────────────────────────────────────────────────────────────────────────────────╮ │ ⊶ google_web_search │ │ │ @@ -16,11 +24,19 @@ exports[`MainContent tool group border SVG snapshots > should render SVG snapsho exports[`MainContent tool group border SVG snapshots > should render SVG snapshot for a shell tool 1`] = ` " - ▝▜▄ Gemini CLI v1.2.3 - ▝▜▄ - ▗▟▀ - ▝▀ + ▝▜▄ ▗█▀▀▜▙▝█▛▀▀▌▜██▖▟██▘▜█▘▜██▖▝█▛▝█▛ + ▝▜▄ █▌ █▙▟ ▐█▝█▛▐█ ▐█ ▐█▝█▖█▌ █▌ + ▗▟▀ ▜▙ ▝█▛ █▌▝ ▖▐█ ▐█ ▐█ ▐█ ▝██▌ █▌ + ▝▀ ▀▀▀▀▘▝▀▀▀▀▘▀▀▘ ▀▀▘▀▀▘▀▀▘ ▝▀▀▝▀▀ + Gemini CLI v1.2.3 + + +Tips for getting started: +1. Create GEMINI.md files to customize your interactions +2. /help for more information +3. Ask coding questions, edit code or run commands +4. 
Be specific for the best results ╭──────────────────────────────────────────────────────────────────────────────────────────────╮ │ ⊶ run_shell_command │ │ │ @@ -30,11 +46,19 @@ exports[`MainContent tool group border SVG snapshots > should render SVG snapsho exports[`MainContent tool group border SVG snapshots > should render SVG snapshot for an empty slice following a search tool 1`] = ` " - ▝▜▄ Gemini CLI v1.2.3 - ▝▜▄ - ▗▟▀ - ▝▀ + ▝▜▄ ▗█▀▀▜▙▝█▛▀▀▌▜██▖▟██▘▜█▘▜██▖▝█▛▝█▛ + ▝▜▄ █▌ █▙▟ ▐█▝█▛▐█ ▐█ ▐█▝█▖█▌ █▌ + ▗▟▀ ▜▙ ▝█▛ █▌▝ ▖▐█ ▐█ ▐█ ▐█ ▝██▌ █▌ + ▝▀ ▀▀▀▀▘▝▀▀▀▀▘▀▀▘ ▀▀▘▀▀▘▀▀▘ ▝▀▀▝▀▀ + Gemini CLI v1.2.3 + + +Tips for getting started: +1. Create GEMINI.md files to customize your interactions +2. /help for more information +3. Ask coding questions, edit code or run commands +4. Be specific for the best results ╭──────────────────────────────────────────────────────────────────────────────────────────────╮ │ ⊶ google_web_search │ │ │ diff --git a/packages/cli/src/ui/utils/confirmingTool.ts b/packages/cli/src/ui/utils/confirmingTool.ts index 86579f1d1f..c7edf8d790 100644 --- a/packages/cli/src/ui/utils/confirmingTool.ts +++ b/packages/cli/src/ui/utils/confirmingTool.ts @@ -6,10 +6,10 @@ import { CoreToolCallStatus } from '@google/gemini-cli-core'; import { - type HistoryItemToolGroup, type HistoryItemWithoutId, type IndividualToolCallDisplay, } from '../types.js'; +import { getAllToolCalls } from './historyUtils.js'; export interface ConfirmingToolState { tool: IndividualToolCallDisplay; @@ -23,9 +23,7 @@ export interface ConfirmingToolState { export function getConfirmingToolState( pendingHistoryItems: HistoryItemWithoutId[], ): ConfirmingToolState | null { - const allPendingTools = pendingHistoryItems - .filter((item): item is HistoryItemToolGroup => item.type === 'tool_group') - .flatMap((group) => group.tools); + const allPendingTools = getAllToolCalls(pendingHistoryItems); const confirmingTools = allPendingTools.filter( (tool) => tool.status === 
CoreToolCallStatus.AwaitingApproval, diff --git a/packages/cli/src/ui/utils/historyUtils.ts b/packages/cli/src/ui/utils/historyUtils.ts new file mode 100644 index 0000000000..ee607dca96 --- /dev/null +++ b/packages/cli/src/ui/utils/historyUtils.ts @@ -0,0 +1,83 @@ +/** + * @license + * Copyright 2026 Google LLC + * SPDX-License-Identifier: Apache-2.0 + */ + +import { CoreToolCallStatus } from '../types.js'; +import type { + HistoryItem, + HistoryItemWithoutId, + HistoryItemToolGroup, + IndividualToolCallDisplay, +} from '../types.js'; + +export function getLastTurnToolCallIds( + history: HistoryItem[], + pendingHistoryItems: HistoryItemWithoutId[], +): string[] { + const targetToolCallIds: string[] = []; + + // Find the boundary of the last user prompt + let lastUserPromptIndex = -1; + for (let i = history.length - 1; i >= 0; i--) { + const type = history[i].type; + if (type === 'user' || type === 'user_shell') { + lastUserPromptIndex = i; + break; + } + } + + // Collect IDs from history after last user prompt + history.forEach((item, index) => { + if (index > lastUserPromptIndex && item.type === 'tool_group') { + item.tools.forEach((t) => { + if (t.callId) targetToolCallIds.push(t.callId); + }); + } + }); + + // Collect IDs from pending items + pendingHistoryItems.forEach((item) => { + if (item.type === 'tool_group') { + item.tools.forEach((t) => { + if (t.callId) targetToolCallIds.push(t.callId); + }); + } + }); + + return targetToolCallIds; +} + +export function isToolExecuting( + pendingHistoryItems: HistoryItemWithoutId[], +): boolean { + return pendingHistoryItems.some((item) => { + if (item && item.type === 'tool_group') { + return item.tools.some( + (tool) => CoreToolCallStatus.Executing === tool.status, + ); + } + return false; + }); +} + +export function isToolAwaitingConfirmation( + pendingHistoryItems: HistoryItemWithoutId[], +): boolean { + return pendingHistoryItems + .filter((item): item is HistoryItemToolGroup => item.type === 'tool_group') + 
.some((item) => + item.tools.some( + (tool) => CoreToolCallStatus.AwaitingApproval === tool.status, + ), + ); +} + +export function getAllToolCalls( + historyItems: HistoryItemWithoutId[], +): IndividualToolCallDisplay[] { + return historyItems + .filter((item): item is HistoryItemToolGroup => item.type === 'tool_group') + .flatMap((group) => group.tools); +} diff --git a/packages/cli/src/ui/utils/terminalCapabilityManager.ts b/packages/cli/src/ui/utils/terminalCapabilityManager.ts index 7867f48e6f..6aeda005dc 100644 --- a/packages/cli/src/ui/utils/terminalCapabilityManager.ts +++ b/packages/cli/src/ui/utils/terminalCapabilityManager.ts @@ -13,12 +13,14 @@ import { disableModifyOtherKeys, enableBracketedPasteMode, disableBracketedPasteMode, + disableMouseEvents, } from '@google/gemini-cli-core'; import { parseColor } from '../themes/color-utils.js'; export type TerminalBackgroundColor = string | undefined; -const TERMINAL_CLEANUP_SEQUENCE = '\x1b[4;0m\x1b[?2004l'; +const TERMINAL_CLEANUP_SEQUENCE = + '\x1b[4;0m\x1b[?2004l\x1b[?1000l\x1b[?1002l\x1b[?1003l\x1b[?1006l'; export function cleanupTerminalOnExit() { try { @@ -33,6 +35,7 @@ export function cleanupTerminalOnExit() { disableKittyKeyboardProtocol(); disableModifyOtherKeys(); disableBracketedPasteMode(); + disableMouseEvents(); } export class TerminalCapabilityManager { diff --git a/packages/cli/src/ui/utils/terminalSetup.ts b/packages/cli/src/ui/utils/terminalSetup.ts index aaa8d9fc6f..d04dedb4ff 100644 --- a/packages/cli/src/ui/utils/terminalSetup.ts +++ b/packages/cli/src/ui/utils/terminalSetup.ts @@ -502,7 +502,6 @@ export function useTerminalSetupPrompt({ if (hasBeenPrompted) { return; } - let cancelled = false; // eslint-disable-next-line @typescript-eslint/no-floating-promises diff --git a/packages/cli/src/ui/utils/textUtils.test.ts b/packages/cli/src/ui/utils/textUtils.test.ts index b06fa62f5e..7ec515ffb1 100644 --- a/packages/cli/src/ui/utils/textUtils.test.ts +++ 
b/packages/cli/src/ui/utils/textUtils.test.ts @@ -514,6 +514,7 @@ describe('textUtils', () => { const b = sanitized.b as { c: string; d: Array }; expect(b.c).toBe('\\u001b[32mgreen\\u001b[0m'); expect(b.d[0]).toBe('\\u001b[33myellow\\u001b[0m'); + // eslint-disable-next-line no-restricted-syntax if (typeof b.d[1] === 'object' && b.d[1] !== null) { const e = b.d[1] as { e: string }; expect(e.e).toBe('\\u001b[34mblue\\u001b[0m'); diff --git a/packages/cli/src/ui/utils/toolLayoutUtils.test.ts b/packages/cli/src/ui/utils/toolLayoutUtils.test.ts index 57e1e3f190..768fccc111 100644 --- a/packages/cli/src/ui/utils/toolLayoutUtils.test.ts +++ b/packages/cli/src/ui/utils/toolLayoutUtils.test.ts @@ -9,6 +9,10 @@ import { calculateToolContentMaxLines, calculateShellMaxLines, SHELL_CONTENT_OVERHEAD, + TOOL_RESULT_STATIC_HEIGHT, + TOOL_RESULT_STANDARD_RESERVED_LINE_COUNT, + TOOL_RESULT_ASB_RESERVED_LINE_COUNT, + TOOL_RESULT_MIN_LINES_SHOWN, } from './toolLayoutUtils.js'; import { CoreToolCallStatus } from '@google/gemini-cli-core'; import { @@ -48,7 +52,7 @@ describe('toolLayoutUtils', () => { availableTerminalHeight: 2, isAlternateBuffer: false, }, - expected: 3, + expected: TOOL_RESULT_MIN_LINES_SHOWN + 1, }, { desc: 'returns available space directly in constrained terminal (ASB mode)', @@ -56,7 +60,7 @@ describe('toolLayoutUtils', () => { availableTerminalHeight: 4, isAlternateBuffer: true, }, - expected: 3, + expected: TOOL_RESULT_MIN_LINES_SHOWN + 1, }, { desc: 'returns remaining space if sufficient space exists (Standard mode)', @@ -64,7 +68,10 @@ describe('toolLayoutUtils', () => { availableTerminalHeight: 20, isAlternateBuffer: false, }, - expected: 17, + expected: + 20 - + TOOL_RESULT_STATIC_HEIGHT - + TOOL_RESULT_STANDARD_RESERVED_LINE_COUNT, }, { desc: 'returns remaining space if sufficient space exists (ASB mode)', @@ -72,7 +79,8 @@ describe('toolLayoutUtils', () => { availableTerminalHeight: 20, isAlternateBuffer: true, }, - expected: 13, + expected: + 20 - 
TOOL_RESULT_STATIC_HEIGHT - TOOL_RESULT_ASB_RESERVED_LINE_COUNT, }, ]; @@ -148,7 +156,7 @@ describe('toolLayoutUtils', () => { constrainHeight: true, isExpandable: false, }, - expected: 4, + expected: 6 - TOOL_RESULT_STANDARD_RESERVED_LINE_COUNT, }, { desc: 'handles negative availableTerminalHeight gracefully', @@ -172,7 +180,7 @@ describe('toolLayoutUtils', () => { constrainHeight: false, isExpandable: false, }, - expected: 28, + expected: 30 - TOOL_RESULT_STANDARD_RESERVED_LINE_COUNT, }, { desc: 'falls back to COMPLETED_SHELL_MAX_LINES - SHELL_CONTENT_OVERHEAD for completed shells if space allows', diff --git a/packages/cli/src/ui/utils/toolLayoutUtils.ts b/packages/cli/src/ui/utils/toolLayoutUtils.ts index 9f391dca4e..1f140b9bc9 100644 --- a/packages/cli/src/ui/utils/toolLayoutUtils.ts +++ b/packages/cli/src/ui/utils/toolLayoutUtils.ts @@ -17,7 +17,7 @@ import { CoreToolCallStatus } from '@google/gemini-cli-core'; */ export const TOOL_RESULT_STATIC_HEIGHT = 1; export const TOOL_RESULT_ASB_RESERVED_LINE_COUNT = 6; -export const TOOL_RESULT_STANDARD_RESERVED_LINE_COUNT = 2; +export const TOOL_RESULT_STANDARD_RESERVED_LINE_COUNT = 3; export const TOOL_RESULT_MIN_LINES_SHOWN = 2; /** diff --git a/packages/cli/src/utils/agentSettings.ts b/packages/cli/src/utils/agentSettings.ts index 661b065d18..1ea9054c9c 100644 --- a/packages/cli/src/utils/agentSettings.ts +++ b/packages/cli/src/utils/agentSettings.ts @@ -40,8 +40,8 @@ const agentStrategy: FeatureToggleStrategy = { }; /** - * Enables an agent by ensuring it is enabled in any writable scope (User and Workspace). - * It sets `agents.overrides..enabled` to `true`. + * Enables an agent by setting `agents.overrides..enabled` to `true` + * in available writable scopes (User and Workspace). */ export function enableAgent( settings: LoadedSettings, @@ -59,7 +59,8 @@ export function enableAgent( } /** - * Disables an agent by setting `agents.overrides..enabled` to `false` in the specified scope. 
+ * Disables an agent by setting `agents.overrides..enabled` to `false` + * in the specified scope. */ export function disableAgent( settings: LoadedSettings, diff --git a/packages/cli/src/utils/cleanup.test.ts b/packages/cli/src/utils/cleanup.test.ts index e9a2b0ea76..0e2454cb82 100644 --- a/packages/cli/src/utils/cleanup.test.ts +++ b/packages/cli/src/utils/cleanup.test.ts @@ -72,6 +72,46 @@ describe('cleanup', () => { expect(asyncFn).toHaveBeenCalledTimes(1); }); + it('should run cleanupFunctions BEFORE draining stdin and BEFORE runSyncCleanup', async () => { + const callOrder: string[] = []; + + // Cleanup function + registerCleanup(() => { + callOrder.push('cleanup'); + }); + + // Sync cleanup function (e.g. setRawMode(false)) + registerSyncCleanup(() => { + callOrder.push('sync'); + }); + + // Mock stdin.resume to track drainStdin + const originalResume = process.stdin.resume; + process.stdin.resume = vi.fn().mockImplementation(() => { + callOrder.push('drain'); + return process.stdin; + }); + + // Mock stdin properties for drainStdin + const originalIsTTY = process.stdin.isTTY; + Object.defineProperty(process.stdin, 'isTTY', { + value: true, + configurable: true, + }); + + try { + await runExitCleanup(); + } finally { + process.stdin.resume = originalResume; + Object.defineProperty(process.stdin, 'isTTY', { + value: originalIsTTY, + configurable: true, + }); + } + + expect(callOrder).toEqual(['drain', 'drain', 'sync', 'cleanup']); + }); + it('should continue running cleanup functions even if one throws an error', async () => { const errorFn = vi.fn().mockImplementation(() => { throw new Error('test error'); @@ -183,6 +223,7 @@ describe('signal and TTY handling', () => { const sigtermHandlers = processOnHandlers.get('SIGTERM') || []; expect(sigtermHandlers.length).toBeGreaterThan(0); + // eslint-disable-next-line no-restricted-syntax expect(typeof sigtermHandlers[0]).toBe('function'); }); }); diff --git a/packages/cli/src/utils/cleanup.ts 
b/packages/cli/src/utils/cleanup.ts index 6185b34fe5..19aa795640 100644 --- a/packages/cli/src/utils/cleanup.ts +++ b/packages/cli/src/utils/cleanup.ts @@ -59,7 +59,7 @@ export function registerTelemetryConfig(config: Config) { export async function runExitCleanup() { // drain stdin to prevent printing garbage on exit - // https://github.com/google-gemini/gemini-cli/issues/1680 + // https://github.com/google-gemini/gemini-cli/issues/16801 await drainStdin(); runSyncCleanup(); diff --git a/packages/cli/src/utils/sessions.test.ts b/packages/cli/src/utils/sessions.test.ts index 965a595c53..5c91bf0d50 100644 --- a/packages/cli/src/utils/sessions.test.ts +++ b/packages/cli/src/utils/sessions.test.ts @@ -214,6 +214,7 @@ describe('listSessions', () => { // Get all the session log calls (skip the header) const sessionCalls = mocks.writeToStdout.mock.calls.filter( (call): call is [string] => + // eslint-disable-next-line no-restricted-syntax typeof call[0] === 'string' && call[0].includes('[session-') && !call[0].includes('Available sessions'), diff --git a/packages/cli/test-setup.ts b/packages/cli/test-setup.ts index 8d055bc63d..452493559a 100644 --- a/packages/cli/test-setup.ts +++ b/packages/cli/test-setup.ts @@ -30,6 +30,9 @@ process.env.FORCE_COLOR = '3'; // Force generic keybinding hints to ensure stable snapshots across different operating systems. process.env.FORCE_GENERIC_KEYBINDING_HINTS = 'true'; +// Force generic terminal declaration to ensure stable snapshots across different host environments. 
+process.env.TERM_PROGRAM = 'generic'; + import './src/test-utils/customMatchers.js'; let consoleErrorSpy: vi.SpyInstance; diff --git a/packages/core/src/agent/agent-session.test.ts b/packages/core/src/agent/agent-session.test.ts index c390d719d4..e3ff1c5dc0 100644 --- a/packages/core/src/agent/agent-session.test.ts +++ b/packages/core/src/agent/agent-session.test.ts @@ -32,9 +32,7 @@ describe('AgentSession', () => { await session.abort(); expect( session.events.some( - (e) => - e.type === 'agent_end' && - (e as AgentEvent<'agent_end'>).reason === 'aborted', + (e) => e.type === 'agent_end' && e.reason === 'aborted', ), ).toBe(true); }); @@ -119,6 +117,7 @@ describe('AgentSession', () => { expect(events).toHaveLength(0); expect(protocol.events).toHaveLength(1); expect(protocol.events[0].type).toBe('session_update'); + expect(protocol.events[0].streamId).toEqual(expect.any(String)); }); it('should skip events that occur before agent_start', async () => { @@ -173,6 +172,181 @@ describe('AgentSession', () => { expect(streamedEvents).toEqual(allEvents.slice(2)); }); + it('should complete immediately when resuming from agent_end', async () => { + const protocol = new MockAgentProtocol(); + const session = new AgentSession(protocol); + + protocol.pushResponse([{ type: 'message' }]); + const { streamId } = await session.send({ + message: [{ type: 'text', text: 'request' }], + }); + await new Promise((resolve) => setTimeout(resolve, 10)); + + const endEvent = session.events.findLast( + (event): event is AgentEvent<'agent_end'> => + event.type === 'agent_end' && event.streamId === streamId, + ); + expect(endEvent).toBeDefined(); + + const iterator = session + .stream({ eventId: endEvent!.id }) + [Symbol.asyncIterator](); + await expect(iterator.next()).resolves.toEqual({ + value: undefined, + done: true, + }); + }); + + it('should throw for an unknown eventId', async () => { + const protocol = new MockAgentProtocol(); + const session = new AgentSession(protocol); + + const 
iterator = session + .stream({ eventId: 'missing-event' }) + [Symbol.asyncIterator](); + await expect(iterator.next()).rejects.toThrow( + 'Unknown eventId: missing-event', + ); + }); + + it('should throw when resuming from an event before agent_start on a stream with no agent activity', async () => { + const protocol = new MockAgentProtocol(); + const session = new AgentSession(protocol); + + const { streamId } = await session.send({ update: { title: 'draft' } }); + expect(streamId).toBeNull(); + + const updateEvent = session.events.find( + (event): event is AgentEvent<'session_update'> => + event.type === 'session_update', + ); + expect(updateEvent).toBeDefined(); + + const iterator = session + .stream({ eventId: updateEvent!.id }) + [Symbol.asyncIterator](); + await expect(iterator.next()).rejects.toThrow( + `Cannot resume from eventId ${updateEvent!.id} before agent_start for stream ${updateEvent!.streamId}`, + ); + }); + + it('should replay from agent_start when resuming from a pre-agent_start event after activity is in history', async () => { + const protocol = new MockAgentProtocol(); + const session = new AgentSession(protocol); + + protocol.pushResponse([ + { + type: 'message', + role: 'agent', + content: [{ type: 'text', text: 'hello' }], + }, + ]); + await session.send({ + message: [{ type: 'text', text: 'request' }], + }); + await new Promise((resolve) => setTimeout(resolve, 10)); + + const userMessage = session.events.find( + (event): event is AgentEvent<'message'> => + event.type === 'message' && event.role === 'user', + ); + expect(userMessage).toBeDefined(); + + const streamedEvents: AgentEvent[] = []; + for await (const event of session.stream({ eventId: userMessage!.id })) { + streamedEvents.push(event); + } + + expect(streamedEvents.map((event) => event.type)).toEqual([ + 'agent_start', + 'message', + 'agent_end', + ]); + expect(streamedEvents[0]?.streamId).toBe(userMessage!.streamId); + }); + + it('should throw when resuming from a 
pre-agent_start event before activity is in history', async () => { + const protocol = new MockAgentProtocol([ + { + id: 'e-1', + timestamp: '2026-01-01T00:00:00.000Z', + streamId: 'stream-1', + type: 'message', + role: 'user', + content: [{ type: 'text', text: 'request' }], + }, + ]); + const session = new AgentSession(protocol); + + const iterator = session + .stream({ eventId: 'e-1' }) + [Symbol.asyncIterator](); + await expect(iterator.next()).rejects.toThrow( + 'Cannot resume from eventId e-1 before agent_start for stream stream-1', + ); + }); + + it('should resume from an in-stream event within the same stream only', async () => { + const protocol = new MockAgentProtocol(); + const session = new AgentSession(protocol); + + protocol.pushResponse([ + { + type: 'message', + role: 'agent', + content: [{ type: 'text', text: 'first answer 1' }], + }, + { + type: 'message', + role: 'agent', + content: [{ type: 'text', text: 'first answer 2' }], + }, + ]); + const { streamId: streamId1 } = await session.send({ + message: [{ type: 'text', text: 'first request' }], + }); + await new Promise((resolve) => setTimeout(resolve, 10)); + + protocol.pushResponse([ + { + type: 'message', + role: 'agent', + content: [{ type: 'text', text: 'second answer' }], + }, + ]); + await session.send({ + message: [{ type: 'text', text: 'second request' }], + }); + await new Promise((resolve) => setTimeout(resolve, 10)); + + const resumeEvent = session.events.find( + (event): event is AgentEvent<'message'> => + event.type === 'message' && + event.streamId === streamId1 && + event.role === 'agent' && + event.content[0]?.type === 'text' && + event.content[0].text === 'first answer 1', + ); + expect(resumeEvent).toBeDefined(); + + const streamedEvents: AgentEvent[] = []; + for await (const event of session.stream({ eventId: resumeEvent!.id })) { + streamedEvents.push(event); + } + + expect( + streamedEvents.every((event) => event.streamId === streamId1), + ).toBe(true); + 
expect(streamedEvents.map((event) => event.type)).toEqual([ + 'message', + 'agent_end', + ]); + const resumedMessage = streamedEvents[0] as AgentEvent<'message'>; + expect(resumedMessage.content).toEqual([ + { type: 'text', text: 'first answer 2' }, + ]); + }); + it('should replay events for streamId starting with agent_start', async () => { const protocol = new MockAgentProtocol(); const session = new AgentSession(protocol); @@ -225,6 +399,33 @@ describe('AgentSession', () => { expect(streamedEvents.at(-1)?.type).toBe('agent_end'); }); + it('should not drop agent_end that arrives while replay events are being yielded', async () => { + const protocol = new MockAgentProtocol(); + const session = new AgentSession(protocol); + + protocol.pushResponse([{ type: 'message' }], { keepOpen: true }); + const { streamId } = await session.send({ update: { title: 't1' } }); + await new Promise((resolve) => setTimeout(resolve, 10)); + + const iterator = session + .stream({ streamId: streamId! }) + [Symbol.asyncIterator](); + + const first = await iterator.next(); + expect(first.value?.type).toBe('agent_start'); + + protocol.pushToStream(streamId!, [], { close: true }); + + const second = await iterator.next(); + expect(second.value?.type).toBe('message'); + + const third = await iterator.next(); + expect(third.value?.type).toBe('agent_end'); + + const fourth = await iterator.next(); + expect(fourth.done).toBe(true); + }); + it('should follow an active stream if no options provided', async () => { const protocol = new MockAgentProtocol(); const session = new AgentSession(protocol); diff --git a/packages/core/src/agent/agent-session.ts b/packages/core/src/agent/agent-session.ts index 0d9fc86bb0..6a4c295fc8 100644 --- a/packages/core/src/agent/agent-session.ts +++ b/packages/core/src/agent/agent-session.ts @@ -34,7 +34,7 @@ export class AgentSession implements AgentProtocol { return this._protocol.abort(); } - get events(): AgentEvent[] { + get events(): readonly AgentEvent[] { 
return this._protocol.events; } @@ -77,6 +77,30 @@ export class AgentSession implements AgentProtocol { let done = false; let trackedStreamId = options.streamId; let started = false; + let agentActivityStarted = false; + + const queueVisibleEvent = (event: AgentEvent): void => { + if (trackedStreamId && event.streamId !== trackedStreamId) { + return; + } + + if (!agentActivityStarted) { + if (event.type !== 'agent_start') { + return; + } + trackedStreamId = event.streamId; + agentActivityStarted = true; + } + + if (!trackedStreamId) { + return; + } + + eventQueue.push(event); + if (event.type === 'agent_end' && event.streamId === trackedStreamId) { + done = true; + } + }; // 1. Subscribe early to avoid missing any events that occur during replay setup const unsubscribe = this._protocol.subscribe((event) => { @@ -87,23 +111,7 @@ export class AgentSession implements AgentProtocol { return; } - if (trackedStreamId && event.streamId !== trackedStreamId) return; - - // If we don't have a tracked stream yet, the first agent_start we see becomes it. - if (!trackedStreamId && event.type === 'agent_start') { - trackedStreamId = event.streamId ?? undefined; - } - - // If we still don't have a tracked stream and we aren't replaying everything (eventId), ignore. - if (!trackedStreamId && !options.eventId) return; - - eventQueue.push(event); - if ( - event.type === 'agent_end' && - event.streamId === (trackedStreamId ?? 
null) - ) { - done = true; - } + queueVisibleEvent(event); const currentResolve = resolve; next = new Promise((r) => { @@ -118,8 +126,42 @@ export class AgentSession implements AgentProtocol { if (options.eventId) { const index = currentEvents.findIndex((e) => e.id === options.eventId); - if (index !== -1) { + if (index === -1) { + throw new Error(`Unknown eventId: ${options.eventId}`); + } + + const resumeEvent = currentEvents[index]; + trackedStreamId = resumeEvent.streamId; + const firstAgentStartIndex = currentEvents.findIndex( + (event) => + event.type === 'agent_start' && event.streamId === trackedStreamId, + ); + + if (resumeEvent.type === 'agent_end') { replayStartIndex = index + 1; + agentActivityStarted = true; + done = true; + } else if ( + firstAgentStartIndex !== -1 && + firstAgentStartIndex <= index + ) { + replayStartIndex = index + 1; + agentActivityStarted = true; + } else if (firstAgentStartIndex !== -1) { + // A pre-agent_start cursor can be resumed once the corresponding + // agent activity is already present in history. Because stream() + // yields only agent_start -> agent_end, replay begins at agent_start + // rather than at the original pre-start event. + replayStartIndex = firstAgentStartIndex; + } else { + // Consumers can only resume by eventId once the corresponding stream + // has entered the agent_start -> agent_end lifecycle in history. + // Without a recorded agent_start, this wrapper cannot distinguish + // "agent activity may start later" from "this send was acknowledged + // without agent activity" without risking an infinite wait. 
+ throw new Error( + `Cannot resume from eventId ${options.eventId} before agent_start for stream ${trackedStreamId}`, + ); } } else if (options.streamId) { const index = currentEvents.findIndex( @@ -128,29 +170,7 @@ export class AgentSession implements AgentProtocol { if (index !== -1) { replayStartIndex = index; } - } - - if (replayStartIndex !== -1) { - for (let i = replayStartIndex; i < currentEvents.length; i++) { - const event = currentEvents[i]; - if (options.streamId && event.streamId !== options.streamId) continue; - - eventQueue.push(event); - if (event.type === 'agent_start' && !trackedStreamId) { - trackedStreamId = event.streamId ?? undefined; - } - if ( - event.type === 'agent_end' && - event.streamId === (trackedStreamId ?? null) - ) { - done = true; - break; - } - } - } - - if (!done && !trackedStreamId) { - // Find active stream in history + } else { const activeStarts = currentEvents.filter( (e) => e.type === 'agent_start', ); @@ -161,36 +181,28 @@ export class AgentSession implements AgentProtocol { (e) => e.type === 'agent_end' && e.streamId === start.streamId, ) ) { - trackedStreamId = start.streamId ?? undefined; + trackedStreamId = start.streamId; + replayStartIndex = currentEvents.findIndex( + (e) => e.id === start.id, + ); break; } } } - // If we replayed to the end and no stream is active, and we were specifically - // replaying from an eventId (or we've already finished the stream we were looking for), we are done. 
- if (!done && !trackedStreamId && options.eventId) { - done = true; + if (replayStartIndex !== -1) { + for (let i = replayStartIndex; i < currentEvents.length; i++) { + const event = currentEvents[i]; + queueVisibleEvent(event); + if (done) break; + } } - started = true; // Process events that arrived while we were replaying for (const event of earlyEvents) { if (done) break; - if (trackedStreamId && event.streamId !== trackedStreamId) continue; - if (!trackedStreamId && event.type === 'agent_start') { - trackedStreamId = event.streamId ?? undefined; - } - if (!trackedStreamId && !options.eventId) continue; - - eventQueue.push(event); - if ( - event.type === 'agent_end' && - event.streamId === (trackedStreamId ?? null) - ) { - done = true; - } + queueVisibleEvent(event); } while (true) { @@ -200,6 +212,7 @@ export class AgentSession implements AgentProtocol { for (const event of eventsToYield) { yield event; } + continue; } if (done) break; diff --git a/packages/core/src/agent/event-translator.test.ts b/packages/core/src/agent/event-translator.test.ts new file mode 100644 index 0000000000..f40c6c27ad --- /dev/null +++ b/packages/core/src/agent/event-translator.test.ts @@ -0,0 +1,733 @@ +/** + * @license + * Copyright 2026 Google LLC + * SPDX-License-Identifier: Apache-2.0 + */ + +import { describe, expect, it, beforeEach } from 'vitest'; +import { FinishReason } from '@google/genai'; +import { ToolErrorType } from '../tools/tool-error.js'; +import { + translateEvent, + createTranslationState, + mapFinishReason, + mapHttpToGrpcStatus, + mapError, + mapUsage, + type TranslationState, +} from './event-translator.js'; +import { GeminiEventType } from '../core/turn.js'; +import type { ServerGeminiStreamEvent } from '../core/turn.js'; +import type { AgentEvent } from './types.js'; + +describe('createTranslationState', () => { + it('creates state with default streamId', () => { + const state = createTranslationState(); + expect(state.streamId).toBeDefined(); + 
expect(state.streamStartEmitted).toBe(false); + expect(state.model).toBeUndefined(); + expect(state.eventCounter).toBe(0); + expect(state.pendingToolNames.size).toBe(0); + }); + + it('creates state with custom streamId', () => { + const state = createTranslationState('custom-stream'); + expect(state.streamId).toBe('custom-stream'); + }); +}); + +describe('translateEvent', () => { + let state: TranslationState; + + beforeEach(() => { + state = createTranslationState('test-stream'); + }); + + describe('Content events', () => { + it('emits agent_start + message for first content event', () => { + const event: ServerGeminiStreamEvent = { + type: GeminiEventType.Content, + value: 'Hello world', + }; + const result = translateEvent(event, state); + expect(result).toHaveLength(2); + expect(result[0]?.type).toBe('agent_start'); + expect(result[1]?.type).toBe('message'); + const msg = result[1] as AgentEvent<'message'>; + expect(msg.role).toBe('agent'); + expect(msg.content).toEqual([{ type: 'text', text: 'Hello world' }]); + }); + + it('skips agent_start for subsequent content events', () => { + state.streamStartEmitted = true; + const event: ServerGeminiStreamEvent = { + type: GeminiEventType.Content, + value: 'more text', + }; + const result = translateEvent(event, state); + expect(result).toHaveLength(1); + expect(result[0]?.type).toBe('message'); + }); + }); + + describe('Thought events', () => { + it('emits thought content with metadata', () => { + state.streamStartEmitted = true; + const event: ServerGeminiStreamEvent = { + type: GeminiEventType.Thought, + value: { subject: 'Planning', description: 'I am thinking...' }, + }; + const result = translateEvent(event, state); + expect(result).toHaveLength(1); + const msg = result[0] as AgentEvent<'message'>; + expect(msg.content).toEqual([ + { type: 'thought', thought: 'I am thinking...' 
}, + ]); + expect(msg._meta?.['subject']).toBe('Planning'); + }); + }); + + describe('ToolCallRequest events', () => { + it('emits tool_request and tracks pending tool name', () => { + state.streamStartEmitted = true; + const event: ServerGeminiStreamEvent = { + type: GeminiEventType.ToolCallRequest, + value: { + callId: 'call-1', + name: 'read_file', + args: { path: '/tmp/test' }, + isClientInitiated: false, + prompt_id: 'p1', + }, + }; + const result = translateEvent(event, state); + expect(result).toHaveLength(1); + const req = result[0] as AgentEvent<'tool_request'>; + expect(req.requestId).toBe('call-1'); + expect(req.name).toBe('read_file'); + expect(req.args).toEqual({ path: '/tmp/test' }); + expect(state.pendingToolNames.get('call-1')).toBe('read_file'); + }); + }); + + describe('ToolCallResponse events', () => { + it('emits tool_response with content from responseParts', () => { + state.streamStartEmitted = true; + state.pendingToolNames.set('call-1', 'read_file'); + const event: ServerGeminiStreamEvent = { + type: GeminiEventType.ToolCallResponse, + value: { + callId: 'call-1', + responseParts: [{ text: 'file contents' }], + resultDisplay: undefined, + error: undefined, + errorType: undefined, + }, + }; + const result = translateEvent(event, state); + expect(result).toHaveLength(1); + const resp = result[0] as AgentEvent<'tool_response'>; + expect(resp.requestId).toBe('call-1'); + expect(resp.name).toBe('read_file'); + expect(resp.content).toEqual([{ type: 'text', text: 'file contents' }]); + expect(resp.isError).toBe(false); + expect(state.pendingToolNames.has('call-1')).toBe(false); + }); + + it('uses error.message for content when tool errored', () => { + state.streamStartEmitted = true; + state.pendingToolNames.set('call-2', 'write_file'); + const event: ServerGeminiStreamEvent = { + type: GeminiEventType.ToolCallResponse, + value: { + callId: 'call-2', + responseParts: [{ text: 'stale parts' }], + resultDisplay: 'Permission denied', + error: new 
Error('Permission denied to write'), + errorType: ToolErrorType.PERMISSION_DENIED, + }, + }; + const result = translateEvent(event, state); + expect(result).toHaveLength(1); + const resp = result[0] as AgentEvent<'tool_response'>; + expect(resp.isError).toBe(true); + // Should use error.message, not responseParts + expect(resp.content).toEqual([ + { type: 'text', text: 'Permission denied to write' }, + ]); + expect(resp.displayContent).toEqual([ + { type: 'text', text: 'Permission denied' }, + ]); + expect(resp.data).toEqual({ errorType: 'permission_denied' }); + }); + + it('uses "unknown" name for untracked tool calls', () => { + state.streamStartEmitted = true; + const event: ServerGeminiStreamEvent = { + type: GeminiEventType.ToolCallResponse, + value: { + callId: 'untracked', + responseParts: [{ text: 'data' }], + resultDisplay: undefined, + error: undefined, + errorType: undefined, + }, + }; + const result = translateEvent(event, state); + const resp = result[0] as AgentEvent<'tool_response'>; + expect(resp.name).toBe('unknown'); + }); + + it('stringifies object resultDisplay correctly', () => { + state.streamStartEmitted = true; + state.pendingToolNames.set('call-3', 'diff_tool'); + const objectDisplay = { + fileDiff: '@@ -1 +1 @@\n-a\n+b', + fileName: 'test.txt', + filePath: '/tmp/test.txt', + originalContent: 'a', + newContent: 'b', + }; + const event: ServerGeminiStreamEvent = { + type: GeminiEventType.ToolCallResponse, + value: { + callId: 'call-3', + responseParts: [{ text: 'diff result' }], + resultDisplay: objectDisplay, + error: undefined, + errorType: undefined, + }, + }; + const result = translateEvent(event, state); + const resp = result[0] as AgentEvent<'tool_response'>; + expect(resp.displayContent).toEqual([ + { type: 'text', text: JSON.stringify(objectDisplay) }, + ]); + }); + + it('passes through string resultDisplay as-is', () => { + state.streamStartEmitted = true; + state.pendingToolNames.set('call-4', 'shell'); + const event: 
ServerGeminiStreamEvent = { + type: GeminiEventType.ToolCallResponse, + value: { + callId: 'call-4', + responseParts: [{ text: 'output' }], + resultDisplay: 'Command output text', + error: undefined, + errorType: undefined, + }, + }; + const result = translateEvent(event, state); + const resp = result[0] as AgentEvent<'tool_response'>; + expect(resp.displayContent).toEqual([ + { type: 'text', text: 'Command output text' }, + ]); + }); + + it('preserves outputFile and contentLength in data', () => { + state.streamStartEmitted = true; + state.pendingToolNames.set('call-5', 'write_file'); + const event: ServerGeminiStreamEvent = { + type: GeminiEventType.ToolCallResponse, + value: { + callId: 'call-5', + responseParts: [{ text: 'written' }], + resultDisplay: undefined, + error: undefined, + errorType: undefined, + outputFile: '/tmp/out.txt', + contentLength: 42, + }, + }; + const result = translateEvent(event, state); + const resp = result[0] as AgentEvent<'tool_response'>; + expect(resp.data?.['outputFile']).toBe('/tmp/out.txt'); + expect(resp.data?.['contentLength']).toBe(42); + }); + + it('handles multi-part responses (text + inlineData)', () => { + state.streamStartEmitted = true; + state.pendingToolNames.set('call-6', 'screenshot'); + const event: ServerGeminiStreamEvent = { + type: GeminiEventType.ToolCallResponse, + value: { + callId: 'call-6', + responseParts: [ + { text: 'Here is the screenshot' }, + { inlineData: { data: 'base64img', mimeType: 'image/png' } }, + ], + resultDisplay: undefined, + error: undefined, + errorType: undefined, + }, + }; + const result = translateEvent(event, state); + const resp = result[0] as AgentEvent<'tool_response'>; + expect(resp.content).toEqual([ + { type: 'text', text: 'Here is the screenshot' }, + { type: 'media', data: 'base64img', mimeType: 'image/png' }, + ]); + expect(resp.isError).toBe(false); + }); + }); + + describe('Error events', () => { + it('emits error event for structured errors', () => { + 
state.streamStartEmitted = true; + const event: ServerGeminiStreamEvent = { + type: GeminiEventType.Error, + value: { error: { message: 'Rate limited', status: 429 } }, + }; + const result = translateEvent(event, state); + expect(result).toHaveLength(1); + const err = result[0] as AgentEvent<'error'>; + expect(err.status).toBe('RESOURCE_EXHAUSTED'); + expect(err.message).toBe('Rate limited'); + expect(err.fatal).toBe(true); + }); + + it('emits error event for Error instances', () => { + state.streamStartEmitted = true; + const event: ServerGeminiStreamEvent = { + type: GeminiEventType.Error, + value: { error: new Error('Something broke') }, + }; + const result = translateEvent(event, state); + const err = result[0] as AgentEvent<'error'>; + expect(err.status).toBe('INTERNAL'); + expect(err.message).toBe('Something broke'); + }); + }); + + describe('ModelInfo events', () => { + it('emits agent_start and session_update when no stream started yet', () => { + const event: ServerGeminiStreamEvent = { + type: GeminiEventType.ModelInfo, + value: 'gemini-2.5-pro', + }; + const result = translateEvent(event, state); + expect(result).toHaveLength(2); + expect(result[0]?.type).toBe('agent_start'); + expect(result[1]?.type).toBe('session_update'); + const sessionUpdate = result[1] as AgentEvent<'session_update'>; + expect(sessionUpdate.model).toBe('gemini-2.5-pro'); + expect(state.model).toBe('gemini-2.5-pro'); + expect(state.streamStartEmitted).toBe(true); + }); + + it('emits session_update when stream already started', () => { + state.streamStartEmitted = true; + const event: ServerGeminiStreamEvent = { + type: GeminiEventType.ModelInfo, + value: 'gemini-2.5-flash', + }; + const result = translateEvent(event, state); + expect(result).toHaveLength(1); + expect(result[0]?.type).toBe('session_update'); + }); + }); + + describe('AgentExecutionStopped events', () => { + it('emits agent_end with the final stop message in data.message', () => { + state.streamStartEmitted = true; + 
const event: ServerGeminiStreamEvent = { + type: GeminiEventType.AgentExecutionStopped, + value: { + reason: 'before_model', + systemMessage: 'Stopped by hook', + contextCleared: true, + }, + }; + const result = translateEvent(event, state); + expect(result).toHaveLength(1); + const streamEnd = result[0] as AgentEvent<'agent_end'>; + expect(streamEnd.type).toBe('agent_end'); + expect(streamEnd.reason).toBe('completed'); + expect(streamEnd.data).toEqual({ message: 'Stopped by hook' }); + }); + + it('uses reason when systemMessage is not set', () => { + state.streamStartEmitted = true; + const event: ServerGeminiStreamEvent = { + type: GeminiEventType.AgentExecutionStopped, + value: { reason: 'hook' }, + }; + const result = translateEvent(event, state); + expect(result).toHaveLength(1); + const streamEnd = result[0] as AgentEvent<'agent_end'>; + expect(streamEnd.data).toEqual({ message: 'hook' }); + }); + }); + + describe('AgentExecutionBlocked events', () => { + it('emits non-fatal error event (non-terminal, stream continues)', () => { + state.streamStartEmitted = true; + const event: ServerGeminiStreamEvent = { + type: GeminiEventType.AgentExecutionBlocked, + value: { reason: 'Policy violation' }, + }; + const result = translateEvent(event, state); + expect(result).toHaveLength(1); + const err = result[0] as AgentEvent<'error'>; + expect(err.type).toBe('error'); + expect(err.fatal).toBe(false); + expect(err._meta?.['code']).toBe('AGENT_EXECUTION_BLOCKED'); + expect(err.message).toBe('Agent execution blocked: Policy violation'); + }); + + it('uses systemMessage in the final error message when available', () => { + state.streamStartEmitted = true; + const event: ServerGeminiStreamEvent = { + type: GeminiEventType.AgentExecutionBlocked, + value: { + reason: 'hook_blocked', + systemMessage: 'Blocked by policy hook', + contextCleared: true, + }, + }; + const result = translateEvent(event, state); + const err = result[0] as AgentEvent<'error'>; + 
expect(err.message).toBe( + 'Agent execution blocked: Blocked by policy hook', + ); + }); + }); + + describe('LoopDetected events', () => { + it('emits a non-fatal warning error event', () => { + state.streamStartEmitted = true; + const event: ServerGeminiStreamEvent = { + type: GeminiEventType.LoopDetected, + }; + const result = translateEvent(event, state); + expect(result).toHaveLength(1); + expect(result[0]?.type).toBe('error'); + const loopWarning = result[0] as AgentEvent<'error'>; + expect(loopWarning.fatal).toBe(false); + expect(loopWarning.message).toBe('Loop detected, stopping execution'); + expect(loopWarning._meta?.['code']).toBe('LOOP_DETECTED'); + }); + }); + + describe('MaxSessionTurns events', () => { + it('emits agent_end with max_turns', () => { + state.streamStartEmitted = true; + const event: ServerGeminiStreamEvent = { + type: GeminiEventType.MaxSessionTurns, + }; + const result = translateEvent(event, state); + expect(result).toHaveLength(1); + const streamEnd = result[0] as AgentEvent<'agent_end'>; + expect(streamEnd.type).toBe('agent_end'); + expect(streamEnd.reason).toBe('max_turns'); + expect(streamEnd.data).toEqual({ code: 'MAX_TURNS_EXCEEDED' }); + }); + }); + + describe('Finished events', () => { + it('emits usage for STOP', () => { + state.streamStartEmitted = true; + state.model = 'gemini-2.5-pro'; + const event: ServerGeminiStreamEvent = { + type: GeminiEventType.Finished, + value: { + reason: FinishReason.STOP, + usageMetadata: { + promptTokenCount: 100, + candidatesTokenCount: 50, + cachedContentTokenCount: 10, + }, + }, + }; + const result = translateEvent(event, state); + expect(result).toHaveLength(1); + + const usage = result[0] as AgentEvent<'usage'>; + expect(usage.model).toBe('gemini-2.5-pro'); + expect(usage.inputTokens).toBe(100); + expect(usage.outputTokens).toBe(50); + expect(usage.cachedTokens).toBe(10); + }); + + it('emits nothing when no usage metadata is present', () => { + state.streamStartEmitted = true; + const 
event: ServerGeminiStreamEvent = { + type: GeminiEventType.Finished, + value: { reason: undefined, usageMetadata: undefined }, + }; + const result = translateEvent(event, state); + expect(result).toHaveLength(0); + }); + }); + + describe('Citation events', () => { + it('emits message with citation meta', () => { + state.streamStartEmitted = true; + const event: ServerGeminiStreamEvent = { + type: GeminiEventType.Citation, + value: 'Source: example.com', + }; + const result = translateEvent(event, state); + expect(result).toHaveLength(1); + const msg = result[0] as AgentEvent<'message'>; + expect(msg.content).toEqual([ + { type: 'text', text: 'Source: example.com' }, + ]); + expect(msg._meta?.['citation']).toBe(true); + }); + }); + + describe('UserCancelled events', () => { + it('emits agent_end with reason aborted', () => { + state.streamStartEmitted = true; + const event: ServerGeminiStreamEvent = { + type: GeminiEventType.UserCancelled, + }; + const result = translateEvent(event, state); + expect(result).toHaveLength(1); + const end = result[0] as AgentEvent<'agent_end'>; + expect(end.type).toBe('agent_end'); + expect(end.reason).toBe('aborted'); + }); + }); + + describe('ContextWindowWillOverflow events', () => { + it('emits fatal error', () => { + state.streamStartEmitted = true; + const event: ServerGeminiStreamEvent = { + type: GeminiEventType.ContextWindowWillOverflow, + value: { + estimatedRequestTokenCount: 150000, + remainingTokenCount: 10000, + }, + }; + const result = translateEvent(event, state); + expect(result).toHaveLength(1); + const err = result[0] as AgentEvent<'error'>; + expect(err.status).toBe('RESOURCE_EXHAUSTED'); + expect(err.fatal).toBe(true); + expect(err.message).toContain('150000'); + expect(err.message).toContain('10000'); + }); + }); + + describe('InvalidStream events', () => { + it('emits fatal error', () => { + state.streamStartEmitted = true; + const event: ServerGeminiStreamEvent = { + type: GeminiEventType.InvalidStream, + }; + 
const result = translateEvent(event, state); + expect(result).toHaveLength(1); + const err = result[0] as AgentEvent<'error'>; + expect(err.status).toBe('INTERNAL'); + expect(err.message).toBe('Invalid stream received from model'); + expect(err.fatal).toBe(true); + }); + }); + + describe('Events with no output', () => { + it('returns empty for Retry', () => { + const result = translateEvent({ type: GeminiEventType.Retry }, state); + expect(result).toEqual([]); + }); + + it('returns empty for ChatCompressed with null', () => { + const result = translateEvent( + { type: GeminiEventType.ChatCompressed, value: null }, + state, + ); + expect(result).toEqual([]); + }); + + it('returns empty for ToolCallConfirmation', () => { + // ToolCallConfirmation is skipped in non-interactive mode (elicitations + // are deferred to the interactive runtime adaptation). + const event = { + type: GeminiEventType.ToolCallConfirmation, + value: { + request: { + callId: 'c1', + name: 'tool', + args: {}, + isClientInitiated: false, + prompt_id: 'p1', + }, + details: { type: 'info', title: 'Confirm', prompt: 'Confirm?' 
}, + }, + } as ServerGeminiStreamEvent; + const result = translateEvent(event, state); + expect(result).toEqual([]); + }); + }); + + describe('Event IDs', () => { + it('generates sequential IDs', () => { + state.streamStartEmitted = true; + const e1 = translateEvent( + { type: GeminiEventType.Content, value: 'a' }, + state, + ); + const e2 = translateEvent( + { type: GeminiEventType.Content, value: 'b' }, + state, + ); + expect(e1[0]?.id).toBe('test-stream-0'); + expect(e2[0]?.id).toBe('test-stream-1'); + }); + + it('includes streamId in events', () => { + const events = translateEvent( + { type: GeminiEventType.Content, value: 'hi' }, + state, + ); + for (const e of events) { + expect(e.streamId).toBe('test-stream'); + } + }); + }); +}); + +describe('mapFinishReason', () => { + it('maps STOP to completed', () => { + expect(mapFinishReason(FinishReason.STOP)).toBe('completed'); + }); + + it('maps undefined to completed', () => { + expect(mapFinishReason(undefined)).toBe('completed'); + }); + + it('maps MAX_TOKENS to max_budget', () => { + expect(mapFinishReason(FinishReason.MAX_TOKENS)).toBe('max_budget'); + }); + + it('maps SAFETY to refusal', () => { + expect(mapFinishReason(FinishReason.SAFETY)).toBe('refusal'); + }); + + it('maps MALFORMED_FUNCTION_CALL to failed', () => { + expect(mapFinishReason(FinishReason.MALFORMED_FUNCTION_CALL)).toBe( + 'failed', + ); + }); + + it('maps RECITATION to refusal', () => { + expect(mapFinishReason(FinishReason.RECITATION)).toBe('refusal'); + }); + + it('maps LANGUAGE to refusal', () => { + expect(mapFinishReason(FinishReason.LANGUAGE)).toBe('refusal'); + }); + + it('maps BLOCKLIST to refusal', () => { + expect(mapFinishReason(FinishReason.BLOCKLIST)).toBe('refusal'); + }); + + it('maps OTHER to failed', () => { + expect(mapFinishReason(FinishReason.OTHER)).toBe('failed'); + }); + + it('maps PROHIBITED_CONTENT to refusal', () => { + expect(mapFinishReason(FinishReason.PROHIBITED_CONTENT)).toBe('refusal'); + }); + + it('maps 
IMAGE_SAFETY to refusal', () => { + expect(mapFinishReason(FinishReason.IMAGE_SAFETY)).toBe('refusal'); + }); + + it('maps IMAGE_PROHIBITED_CONTENT to refusal', () => { + expect(mapFinishReason(FinishReason.IMAGE_PROHIBITED_CONTENT)).toBe( + 'refusal', + ); + }); + + it('maps UNEXPECTED_TOOL_CALL to failed', () => { + expect(mapFinishReason(FinishReason.UNEXPECTED_TOOL_CALL)).toBe('failed'); + }); + + it('maps NO_IMAGE to failed', () => { + expect(mapFinishReason(FinishReason.NO_IMAGE)).toBe('failed'); + }); +}); + +describe('mapHttpToGrpcStatus', () => { + it('maps 400 to INVALID_ARGUMENT', () => { + expect(mapHttpToGrpcStatus(400)).toBe('INVALID_ARGUMENT'); + }); + + it('maps 401 to UNAUTHENTICATED', () => { + expect(mapHttpToGrpcStatus(401)).toBe('UNAUTHENTICATED'); + }); + + it('maps 429 to RESOURCE_EXHAUSTED', () => { + expect(mapHttpToGrpcStatus(429)).toBe('RESOURCE_EXHAUSTED'); + }); + + it('maps undefined to INTERNAL', () => { + expect(mapHttpToGrpcStatus(undefined)).toBe('INTERNAL'); + }); + + it('maps unknown codes to INTERNAL', () => { + expect(mapHttpToGrpcStatus(418)).toBe('INTERNAL'); + }); +}); + +describe('mapError', () => { + it('maps structured errors with status', () => { + const result = mapError({ message: 'Rate limit', status: 429 }); + expect(result.status).toBe('RESOURCE_EXHAUSTED'); + expect(result.message).toBe('Rate limit'); + expect(result.fatal).toBe(true); + expect(result._meta?.['rawError']).toEqual({ + message: 'Rate limit', + status: 429, + }); + }); + + it('maps Error instances', () => { + const result = mapError(new Error('Something failed')); + expect(result.status).toBe('INTERNAL'); + expect(result.message).toBe('Something failed'); + }); + + it('preserves error name in _meta', () => { + class CustomError extends Error { + constructor(msg: string) { + super(msg); + } + } + const result = mapError(new CustomError('test')); + expect(result._meta?.['errorName']).toBe('CustomError'); + }); + + it('maps non-Error values to string', 
() => { + const result = mapError('raw string error'); + expect(result.message).toBe('raw string error'); + expect(result.status).toBe('INTERNAL'); + }); +}); + +describe('mapUsage', () => { + it('maps all fields', () => { + const result = mapUsage( + { + promptTokenCount: 100, + candidatesTokenCount: 50, + cachedContentTokenCount: 25, + }, + 'gemini-2.5-pro', + ); + expect(result).toEqual({ + model: 'gemini-2.5-pro', + inputTokens: 100, + outputTokens: 50, + cachedTokens: 25, + }); + }); + + it('uses "unknown" for missing model', () => { + const result = mapUsage({}); + expect(result.model).toBe('unknown'); + }); +}); diff --git a/packages/core/src/agent/event-translator.ts b/packages/core/src/agent/event-translator.ts new file mode 100644 index 0000000000..73f93f4a15 --- /dev/null +++ b/packages/core/src/agent/event-translator.ts @@ -0,0 +1,457 @@ +/** + * @license + * Copyright 2026 Google LLC + * SPDX-License-Identifier: Apache-2.0 + */ + +/** + * @fileoverview Pure, stateless-per-call translation functions that convert + * ServerGeminiStreamEvent objects into AgentEvent objects. + * + * No side effects, no generators. Each call to `translateEvent` takes an event + * and mutable TranslationState, returning zero or more AgentEvents. 
+ */ + +import type { FinishReason } from '@google/genai'; +import { GeminiEventType } from '../core/turn.js'; +import type { + ServerGeminiStreamEvent, + StructuredError, + GeminiFinishedEventValue, +} from '../core/turn.js'; +import type { + AgentEvent, + StreamEndReason, + ErrorData, + Usage, + AgentEventType, +} from './types.js'; +import { + geminiPartsToContentParts, + toolResultDisplayToContentParts, + buildToolResponseData, +} from './content-utils.js'; + +// --------------------------------------------------------------------------- +// Translation State +// --------------------------------------------------------------------------- + +export interface TranslationState { + streamId: string; + streamStartEmitted: boolean; + model: string | undefined; + eventCounter: number; + /** Tracks callId → tool name from requests so responses can reference the name. */ + pendingToolNames: Map<string, string>; +} + +export function createTranslationState(streamId?: string): TranslationState { + return { + streamId: streamId ?? crypto.randomUUID(), + streamStartEmitted: false, + model: undefined, + eventCounter: 0, + pendingToolNames: new Map(), + }; +} + +// --------------------------------------------------------------------------- +// Helpers +// --------------------------------------------------------------------------- + +function makeEvent<T extends AgentEventType>( + type: T, + state: TranslationState, + payload: Partial<AgentEvent<T>>, +): AgentEvent<T> { + const id = `${state.streamId}-${state.eventCounter++}`; + // TypeScript cannot preserve the specific discriminated union member across + // this generic object assembly, so keep the narrowing local to the event + // constructor boundary. 
+ // eslint-disable-next-line @typescript-eslint/no-unsafe-type-assertion + return { + ...payload, + id, + timestamp: new Date().toISOString(), + streamId: state.streamId, + type, + } as AgentEvent<T>; +} + +function ensureStreamStart(state: TranslationState, out: AgentEvent[]): void { + if (!state.streamStartEmitted) { + out.push(makeEvent('agent_start', state, {})); + state.streamStartEmitted = true; + } +} + +// --------------------------------------------------------------------------- +// Core Translator +// --------------------------------------------------------------------------- + +/** + * Translates a single ServerGeminiStreamEvent into zero or more AgentEvents. + * Mutates `state` (counter, flags) as a side effect. + */ +export function translateEvent( + event: ServerGeminiStreamEvent, + state: TranslationState, +): AgentEvent[] { + const out: AgentEvent[] = []; + + switch (event.type) { + case GeminiEventType.ModelInfo: + state.model = event.value; + ensureStreamStart(state, out); + out.push(makeEvent('session_update', state, { model: event.value })); + break; + + case GeminiEventType.Content: + ensureStreamStart(state, out); + out.push( + makeEvent('message', state, { + role: 'agent', + content: [{ type: 'text', text: event.value }], + }), + ); + break; + + case GeminiEventType.Thought: + ensureStreamStart(state, out); + out.push( + makeEvent('message', state, { + role: 'agent', + content: [{ type: 'thought', thought: event.value.description }], + _meta: event.value.subject + ? 
{ source: 'agent', subject: event.value.subject } + : { source: 'agent' }, + }), + ); + break; + + case GeminiEventType.Citation: + ensureStreamStart(state, out); + out.push( + makeEvent('message', state, { + role: 'agent', + content: [{ type: 'text', text: event.value }], + _meta: { source: 'agent', citation: true }, + }), + ); + break; + + case GeminiEventType.Finished: + handleFinished(event.value, state, out); + break; + + case GeminiEventType.Error: + handleError(event.value.error, state, out); + break; + + case GeminiEventType.UserCancelled: + ensureStreamStart(state, out); + out.push( + makeEvent('agent_end', state, { + reason: 'aborted', + }), + ); + break; + + case GeminiEventType.MaxSessionTurns: + ensureStreamStart(state, out); + out.push( + makeEvent('agent_end', state, { + reason: 'max_turns', + data: { + code: 'MAX_TURNS_EXCEEDED', + }, + }), + ); + break; + + case GeminiEventType.LoopDetected: + ensureStreamStart(state, out); + out.push( + makeEvent('error', state, { + status: 'INTERNAL', + message: 'Loop detected, stopping execution', + fatal: false, + _meta: { code: 'LOOP_DETECTED' }, + }), + ); + break; + + case GeminiEventType.ContextWindowWillOverflow: + ensureStreamStart(state, out); + out.push( + makeEvent('error', state, { + status: 'RESOURCE_EXHAUSTED', + message: `Context window will overflow (estimated: ${event.value.estimatedRequestTokenCount}, remaining: ${event.value.remainingTokenCount})`, + fatal: true, + }), + ); + break; + + case GeminiEventType.AgentExecutionStopped: + ensureStreamStart(state, out); + out.push( + makeEvent('agent_end', state, { + reason: 'completed', + data: { + message: event.value.systemMessage?.trim() || event.value.reason, + }, + }), + ); + break; + + case GeminiEventType.AgentExecutionBlocked: + ensureStreamStart(state, out); + out.push( + makeEvent('error', state, { + status: 'PERMISSION_DENIED', + message: `Agent execution blocked: ${event.value.systemMessage?.trim() || event.value.reason}`, + fatal: false, 
+ _meta: { code: 'AGENT_EXECUTION_BLOCKED' }, + }), + ); + break; + + case GeminiEventType.InvalidStream: + ensureStreamStart(state, out); + out.push( + makeEvent('error', state, { + status: 'INTERNAL', + message: 'Invalid stream received from model', + fatal: true, + }), + ); + break; + + case GeminiEventType.ToolCallRequest: + ensureStreamStart(state, out); + state.pendingToolNames.set(event.value.callId, event.value.name); + out.push( + makeEvent('tool_request', state, { + requestId: event.value.callId, + name: event.value.name, + args: event.value.args, + }), + ); + break; + + case GeminiEventType.ToolCallResponse: { + ensureStreamStart(state, out); + const displayContent = toolResultDisplayToContentParts( + event.value.resultDisplay, + ); + const data = buildToolResponseData(event.value); + out.push( + makeEvent('tool_response', state, { + requestId: event.value.callId, + name: state.pendingToolNames.get(event.value.callId) ?? 'unknown', + content: event.value.error + ? [{ type: 'text', text: event.value.error.message }] + : geminiPartsToContentParts(event.value.responseParts), + isError: event.value.error !== undefined, + ...(displayContent ? { displayContent } : {}), + ...(data ? 
{ data } : {}), + }), + ); + state.pendingToolNames.delete(event.value.callId); + break; + } + + case GeminiEventType.ToolCallConfirmation: + // Elicitations are handled separately by the session layer + break; + + // Internal concerns — no AgentEvent emitted + case GeminiEventType.ChatCompressed: + case GeminiEventType.Retry: + break; + + default: + ((x: never) => { + throw new Error(`Unhandled event type: ${JSON.stringify(x)}`); + })(event); + break; + } + + return out; +} + +// --------------------------------------------------------------------------- +// Finished Event Handling +// --------------------------------------------------------------------------- + +function handleFinished( + value: GeminiFinishedEventValue, + state: TranslationState, + out: AgentEvent[], +): void { + if (value.usageMetadata) { + ensureStreamStart(state, out); + const usage = mapUsage(value.usageMetadata, state.model); + out.push(makeEvent('usage', state, usage)); + } +} + +// --------------------------------------------------------------------------- +// Error Handling +// --------------------------------------------------------------------------- + +function handleError( + error: unknown, + state: TranslationState, + out: AgentEvent[], +): void { + ensureStreamStart(state, out); + + const mapped = mapError(error); + out.push(makeEvent('error', state, mapped)); +} + +// --------------------------------------------------------------------------- +// Public Mapping Functions +// --------------------------------------------------------------------------- + +/** + * Maps a Gemini FinishReason to an AgentEnd reason. 
+ */ +export function mapFinishReason( + reason: FinishReason | undefined, +): StreamEndReason { + if (!reason) return 'completed'; + + switch (reason) { + case 'STOP': + case 'FINISH_REASON_UNSPECIFIED': + return 'completed'; + case 'MAX_TOKENS': + return 'max_budget'; + case 'SAFETY': + case 'RECITATION': + case 'LANGUAGE': + case 'BLOCKLIST': + case 'PROHIBITED_CONTENT': + case 'SPII': + case 'IMAGE_SAFETY': + case 'IMAGE_PROHIBITED_CONTENT': + return 'refusal'; + case 'MALFORMED_FUNCTION_CALL': + case 'OTHER': + case 'UNEXPECTED_TOOL_CALL': + case 'NO_IMAGE': + return 'failed'; + default: + return 'failed'; + } +} + +/** + * Maps an HTTP status code to a gRPC-style status string. + */ +export function mapHttpToGrpcStatus( + httpStatus: number | undefined, +): ErrorData['status'] { + if (httpStatus === undefined) return 'INTERNAL'; + + switch (httpStatus) { + case 400: + return 'INVALID_ARGUMENT'; + case 401: + return 'UNAUTHENTICATED'; + case 403: + return 'PERMISSION_DENIED'; + case 404: + return 'NOT_FOUND'; + case 409: + return 'ALREADY_EXISTS'; + case 429: + return 'RESOURCE_EXHAUSTED'; + case 500: + return 'INTERNAL'; + case 501: + return 'UNIMPLEMENTED'; + case 503: + return 'UNAVAILABLE'; + case 504: + return 'DEADLINE_EXCEEDED'; + default: + return 'INTERNAL'; + } +} + +/** + * Maps a StructuredError (or unknown error value) to an ErrorData payload. + * Preserves selected error metadata in _meta and includes raw structured + * errors for lossless debugging. 
+ */ +export function mapError( + error: unknown, +): ErrorData & { _meta?: Record<string, unknown> } { + const meta: Record<string, unknown> = {}; + + if (error instanceof Error) { + meta['errorName'] = error.constructor.name; + if ('exitCode' in error && typeof error.exitCode === 'number') { + meta['exitCode'] = error.exitCode; + } + if ('code' in error) { + meta['code'] = error.code; + } + } + + if (isStructuredError(error)) { + const structuredMeta = { ...meta, rawError: error }; + return { + status: mapHttpToGrpcStatus(error.status), + message: error.message, + fatal: true, + _meta: structuredMeta, + }; + } + + if (error instanceof Error) { + return { + status: 'INTERNAL', + message: error.message, + fatal: true, + ...(Object.keys(meta).length > 0 ? { _meta: meta } : {}), + }; + } + + return { + status: 'INTERNAL', + message: String(error), + fatal: true, + }; +} + +function isStructuredError(error: unknown): error is StructuredError { + return ( + typeof error === 'object' && + error !== null && + 'message' in error && + typeof error.message === 'string' + ); +} + +/** + * Maps Gemini usageMetadata to Usage. + */ +export function mapUsage( + metadata: { + promptTokenCount?: number; + candidatesTokenCount?: number; + cachedContentTokenCount?: number; + }, + model?: string, +): Usage { + return { + model: model ?? 
'unknown', + inputTokens: metadata.promptTokenCount, + outputTokens: metadata.candidatesTokenCount, + cachedTokens: metadata.cachedContentTokenCount, + }; +} diff --git a/packages/core/src/agent/legacy-agent-session.test.ts b/packages/core/src/agent/legacy-agent-session.test.ts new file mode 100644 index 0000000000..438b1e5ef0 --- /dev/null +++ b/packages/core/src/agent/legacy-agent-session.test.ts @@ -0,0 +1,1417 @@ +/** + * @license + * Copyright 2026 Google LLC + * SPDX-License-Identifier: Apache-2.0 + */ + +import { describe, expect, it, vi, beforeEach } from 'vitest'; +import { FinishReason } from '@google/genai'; +import { LegacyAgentSession } from './legacy-agent-session.js'; +import type { LegacyAgentSessionDeps } from './legacy-agent-session.js'; +import { GeminiEventType } from '../core/turn.js'; +import type { ServerGeminiStreamEvent } from '../core/turn.js'; +import type { AgentEvent } from './types.js'; +import { ToolErrorType } from '../tools/tool-error.js'; +import type { + CompletedToolCall, + ToolCallRequestInfo, +} from '../scheduler/types.js'; +import { CoreToolCallStatus } from '../scheduler/types.js'; + +// --------------------------------------------------------------------------- +// Mock helpers +// --------------------------------------------------------------------------- + +function createMockDeps( + overrides?: Partial<LegacyAgentSessionDeps>, +): LegacyAgentSessionDeps { + const mockClient = { + sendMessageStream: vi.fn(), + getChat: vi.fn().mockReturnValue({ + recordCompletedToolCalls: vi.fn(), + }), + getCurrentSequenceModel: vi.fn().mockReturnValue(null), + }; + + const mockScheduler = { + schedule: vi.fn().mockResolvedValue([]), + }; + + const mockConfig = { + getMaxSessionTurns: vi.fn().mockReturnValue(-1), + getModel: vi.fn().mockReturnValue('gemini-2.5-pro'), + }; + + return { + client: mockClient as unknown as LegacyAgentSessionDeps['client'], + + scheduler: mockScheduler as unknown as LegacyAgentSessionDeps['scheduler'], + + config: mockConfig as 
unknown as LegacyAgentSessionDeps['config'], + promptId: 'test-prompt', + streamId: 'test-stream', + ...overrides, + }; +} + +async function* makeStream( + events: ServerGeminiStreamEvent[], +): AsyncGenerator<ServerGeminiStreamEvent> { + for (const event of events) { + yield event; + } +} + +function makeToolRequest(callId: string, name: string): ToolCallRequestInfo { + return { + callId, + name, + args: {}, + isClientInitiated: false, + prompt_id: 'p1', + }; +} + +function makeCompletedToolCall( + callId: string, + name: string, + responseText: string, +): CompletedToolCall { + return { + status: CoreToolCallStatus.Success, + request: makeToolRequest(callId, name), + response: { + callId, + responseParts: [{ text: responseText }], + resultDisplay: undefined, + error: undefined, + errorType: undefined, + }, + + tool: {} as CompletedToolCall extends { tool: infer T } ? T : never, + + invocation: {} as CompletedToolCall extends { invocation: infer T } + ? T + : never, + } as CompletedToolCall; +} + +async function collectEvents( + session: LegacyAgentSession, + options?: { streamId?: string; eventId?: string }, +): Promise<AgentEvent[]> { + const events: AgentEvent[] = []; + const streamOptions = + options?.eventId || options?.streamId ? options : undefined; + + for await (const event of streamOptions + ? 
session.stream(streamOptions) + : session.stream()) { + events.push(event); + } + return events; +} + +// --------------------------------------------------------------------------- +// Tests +// --------------------------------------------------------------------------- + +describe('LegacyAgentSession', () => { + let deps: LegacyAgentSessionDeps; + + beforeEach(() => { + deps = createMockDeps(); + vi.useFakeTimers({ shouldAdvanceTime: true }); + }); + + describe('send', () => { + it('returns streamId', async () => { + const sendMock = deps.client.sendMessageStream as ReturnType< + typeof vi.fn + >; + sendMock.mockReturnValue( + makeStream([ + { type: GeminiEventType.Content, value: 'hello' }, + { + type: GeminiEventType.Finished, + value: { reason: FinishReason.STOP, usageMetadata: undefined }, + }, + ]), + ); + + const session = new LegacyAgentSession(deps); + const result = await session.send({ + message: [{ type: 'text', text: 'hi' }], + }); + + expect(result.streamId).toBe('test-stream'); + }); + + it('records the sent user message in the trajectory before send resolves', async () => { + const sendMock = deps.client.sendMessageStream as ReturnType< + typeof vi.fn + >; + sendMock.mockReturnValue( + makeStream([ + { + type: GeminiEventType.Finished, + value: { reason: FinishReason.STOP, usageMetadata: undefined }, + }, + ]), + ); + + const session = new LegacyAgentSession(deps); + const { streamId } = await session.send({ + message: [{ type: 'text', text: 'hi' }], + _meta: { source: 'user-test' }, + }); + + const userMessage = session.events.find( + (e): e is AgentEvent<'message'> => + e.type === 'message' && e.role === 'user' && e.streamId === streamId, + ); + expect(userMessage?.content).toEqual([{ type: 'text', text: 'hi' }]); + expect(userMessage?._meta).toEqual({ source: 'user-test' }); + + await collectEvents(session, { streamId: streamId ?? 
undefined }); + }); + + it('returns streamId before emitting agent_start', async () => { + const sendMock = deps.client.sendMessageStream as ReturnType< + typeof vi.fn + >; + sendMock.mockReturnValue( + makeStream([ + { + type: GeminiEventType.Finished, + value: { reason: FinishReason.STOP, usageMetadata: undefined }, + }, + ]), + ); + + const session = new LegacyAgentSession(deps); + const liveEvents: AgentEvent[] = []; + session.subscribe((event) => { + liveEvents.push(event); + }); + + const { streamId } = await session.send({ + message: [{ type: 'text', text: 'hi' }], + }); + + expect(streamId).toBe('test-stream'); + expect(liveEvents.some((event) => event.type === 'agent_start')).toBe( + false, + ); + + await collectEvents(session, { streamId: streamId ?? undefined }); + expect(liveEvents.some((event) => event.type === 'agent_start')).toBe( + true, + ); + }); + + it('throws for non-message payloads', async () => { + const session = new LegacyAgentSession(deps); + await expect(session.send({ update: { title: 'test' } })).rejects.toThrow( + 'only supports message sends', + ); + }); + + it('throws if send is called while a stream is active', async () => { + let resolveHang: (() => void) | undefined; + const sendMock = deps.client.sendMessageStream as ReturnType< + typeof vi.fn + >; + sendMock.mockReturnValue( + (async function* () { + await new Promise((resolve) => { + resolveHang = resolve; + }); + yield { + type: GeminiEventType.Finished, + value: { reason: FinishReason.STOP, usageMetadata: undefined }, + } as ServerGeminiStreamEvent; + })(), + ); + + const session = new LegacyAgentSession(deps); + const { streamId } = await session.send({ + message: [{ type: 'text', text: 'first' }], + }); + await vi.advanceTimersByTimeAsync(0); + + await expect( + session.send({ message: [{ type: 'text', text: 'second' }] }), + ).rejects.toThrow('cannot be called while a stream is active'); + + resolveHang?.(); + await collectEvents(session, { streamId: streamId ?? 
undefined }); + }); + + it('creates a new streamId after the previous stream completes', async () => { + const sendMock = deps.client.sendMessageStream as ReturnType< + typeof vi.fn + >; + sendMock + .mockReturnValueOnce( + makeStream([ + { type: GeminiEventType.Content, value: 'first response' }, + { + type: GeminiEventType.Finished, + value: { reason: FinishReason.STOP, usageMetadata: undefined }, + }, + ]), + ) + .mockReturnValueOnce( + makeStream([ + { type: GeminiEventType.Content, value: 'second response' }, + { + type: GeminiEventType.Finished, + value: { reason: FinishReason.STOP, usageMetadata: undefined }, + }, + ]), + ); + + const session = new LegacyAgentSession(deps); + const first = await session.send({ + message: [{ type: 'text', text: 'first' }], + }); + const firstEvents = await collectEvents(session, { + streamId: first.streamId ?? undefined, + }); + + const second = await session.send({ + message: [{ type: 'text', text: 'second' }], + }); + const secondEvents = await collectEvents(session, { + streamId: second.streamId ?? 
undefined, + }); + const userMessages = session.events.filter( + (e): e is AgentEvent<'message'> => + e.type === 'message' && e.role === 'user', + ); + + expect(first.streamId).not.toBe(second.streamId); + expect( + userMessages.some( + (e) => + e.streamId === first.streamId && + e.content[0]?.type === 'text' && + e.content[0].text === 'first', + ), + ).toBe(true); + expect( + userMessages.some( + (e) => + e.streamId === second.streamId && + e.content[0]?.type === 'text' && + e.content[0].text === 'second', + ), + ).toBe(true); + expect(firstEvents.some((e) => e.type === 'agent_end')).toBe(true); + expect(secondEvents.some((e) => e.type === 'agent_end')).toBe(true); + }); + }); + + describe('stream - basic flow', () => { + it('emits agent_start, content messages, and agent_end', async () => { + const sendMock = deps.client.sendMessageStream as ReturnType< + typeof vi.fn + >; + sendMock.mockReturnValue( + makeStream([ + { type: GeminiEventType.Content, value: 'Hello' }, + { type: GeminiEventType.Content, value: ' World' }, + { + type: GeminiEventType.Finished, + value: { reason: FinishReason.STOP, usageMetadata: undefined }, + }, + ]), + ); + + const session = new LegacyAgentSession(deps); + await session.send({ message: [{ type: 'text', text: 'hi' }] }); + const events = await collectEvents(session); + + const types = events.map((e) => e.type); + expect(types).toContain('agent_start'); + expect(types).toContain('message'); + expect(types).toContain('agent_end'); + + const messages = events.filter( + (e): e is AgentEvent<'message'> => + e.type === 'message' && e.role === 'agent', + ); + expect(messages).toHaveLength(2); + expect(messages[0]?.content).toEqual([{ type: 'text', text: 'Hello' }]); + + const streamEnd = events.find( + (e): e is AgentEvent<'agent_end'> => e.type === 'agent_end', + ); + expect(streamEnd?.reason).toBe('completed'); + }); + }); + + describe('stream - tool calls', () => { + it('handles a tool call round-trip', async () => { + const sendMock = 
deps.client.sendMessageStream as ReturnType< + typeof vi.fn + >; + // First turn: model requests a tool + sendMock.mockReturnValueOnce( + makeStream([ + { + type: GeminiEventType.ToolCallRequest, + value: makeToolRequest('call-1', 'read_file'), + }, + { + type: GeminiEventType.Finished, + value: { reason: FinishReason.STOP, usageMetadata: undefined }, + }, + ]), + ); + // Second turn: model provides final answer + sendMock.mockReturnValueOnce( + makeStream([ + { type: GeminiEventType.Content, value: 'Done!' }, + { + type: GeminiEventType.Finished, + value: { reason: FinishReason.STOP, usageMetadata: undefined }, + }, + ]), + ); + + const scheduleMock = deps.scheduler.schedule as ReturnType; + scheduleMock.mockResolvedValueOnce([ + makeCompletedToolCall('call-1', 'read_file', 'file contents'), + ]); + + const session = new LegacyAgentSession(deps); + await session.send({ message: [{ type: 'text', text: 'read a file' }] }); + const events = await collectEvents(session); + + const types = events.map((e) => e.type); + expect(types).toContain('tool_request'); + expect(types).toContain('tool_response'); + expect(types).toContain('agent_end'); + + const toolReq = events.find( + (e): e is AgentEvent<'tool_request'> => e.type === 'tool_request', + ); + expect(toolReq?.name).toBe('read_file'); + + const toolResp = events.find( + (e): e is AgentEvent<'tool_response'> => e.type === 'tool_response', + ); + expect(toolResp?.name).toBe('read_file'); + expect(toolResp?.content).toEqual([ + { type: 'text', text: 'file contents' }, + ]); + expect(toolResp?.isError).toBe(false); + + // Should have called sendMessageStream twice + expect(sendMock).toHaveBeenCalledTimes(2); + }); + + it('handles tool errors and sends error message in content', async () => { + const sendMock = deps.client.sendMessageStream as ReturnType< + typeof vi.fn + >; + sendMock.mockReturnValueOnce( + makeStream([ + { + type: GeminiEventType.ToolCallRequest, + value: makeToolRequest('call-1', 'write_file'), + }, + 
{ + type: GeminiEventType.Finished, + value: { reason: FinishReason.STOP, usageMetadata: undefined }, + }, + ]), + ); + sendMock.mockReturnValueOnce( + makeStream([ + { type: GeminiEventType.Content, value: 'Failed' }, + { + type: GeminiEventType.Finished, + value: { reason: FinishReason.STOP, usageMetadata: undefined }, + }, + ]), + ); + + const errorToolCall: CompletedToolCall = { + status: CoreToolCallStatus.Error, + request: makeToolRequest('call-1', 'write_file'), + response: { + callId: 'call-1', + responseParts: [{ text: 'stale' }], + resultDisplay: 'Error display', + error: new Error('Permission denied'), + errorType: 'permission_denied', + }, + } as CompletedToolCall; + + const scheduleMock = deps.scheduler.schedule as ReturnType; + scheduleMock.mockResolvedValueOnce([errorToolCall]); + + const session = new LegacyAgentSession(deps); + await session.send({ + message: [{ type: 'text', text: 'write file' }], + }); + const events = await collectEvents(session); + + const toolResp = events.find( + (e): e is AgentEvent<'tool_response'> => e.type === 'tool_response', + ); + expect(toolResp?.isError).toBe(true); + // Uses error.message, not responseParts + expect(toolResp?.content).toEqual([ + { type: 'text', text: 'Permission denied' }, + ]); + expect(toolResp?.displayContent).toEqual([ + { type: 'text', text: 'Error display' }, + ]); + }); + + it('stops on STOP_EXECUTION tool error', async () => { + const sendMock = deps.client.sendMessageStream as ReturnType< + typeof vi.fn + >; + sendMock.mockReturnValueOnce( + makeStream([ + { + type: GeminiEventType.ToolCallRequest, + value: makeToolRequest('call-1', 'dangerous_tool'), + }, + { + type: GeminiEventType.Finished, + value: { reason: FinishReason.STOP, usageMetadata: undefined }, + }, + ]), + ); + + const stopToolCall: CompletedToolCall = { + status: CoreToolCallStatus.Error, + request: makeToolRequest('call-1', 'dangerous_tool'), + response: { + callId: 'call-1', + responseParts: [], + resultDisplay: 
undefined, + error: new Error('Stopped by policy'), + errorType: ToolErrorType.STOP_EXECUTION, + }, + } as CompletedToolCall; + + const scheduleMock = deps.scheduler.schedule as ReturnType; + scheduleMock.mockResolvedValueOnce([stopToolCall]); + + const session = new LegacyAgentSession(deps); + await session.send({ + message: [{ type: 'text', text: 'do something' }], + }); + const events = await collectEvents(session); + + const streamEnd = events.find( + (e): e is AgentEvent<'agent_end'> => e.type === 'agent_end', + ); + expect(streamEnd?.reason).toBe('completed'); + // Should NOT make a second call + expect(sendMock).toHaveBeenCalledTimes(1); + }); + + it('treats fatal tool errors as tool_response followed by agent_end failed', async () => { + const sendMock = deps.client.sendMessageStream as ReturnType< + typeof vi.fn + >; + sendMock.mockReturnValueOnce( + makeStream([ + { + type: GeminiEventType.ToolCallRequest, + value: makeToolRequest('call-1', 'write_file'), + }, + { + type: GeminiEventType.Finished, + value: { reason: FinishReason.STOP, usageMetadata: undefined }, + }, + ]), + ); + + const fatalToolCall: CompletedToolCall = { + status: CoreToolCallStatus.Error, + request: makeToolRequest('call-1', 'write_file'), + response: { + callId: 'call-1', + responseParts: [], + resultDisplay: undefined, + error: new Error('Disk full'), + errorType: ToolErrorType.NO_SPACE_LEFT, + }, + } as CompletedToolCall; + + const scheduleMock = deps.scheduler.schedule as ReturnType; + scheduleMock.mockResolvedValueOnce([fatalToolCall]); + + const session = new LegacyAgentSession(deps); + await session.send({ + message: [{ type: 'text', text: 'write file' }], + }); + const events = await collectEvents(session); + + const toolResp = events.find( + (e): e is AgentEvent<'tool_response'> => e.type === 'tool_response', + ); + expect(toolResp?.isError).toBe(true); + expect(toolResp?.content).toEqual([{ type: 'text', text: 'Disk full' }]); + expect( + events.some( + (e): e is 
AgentEvent<'error'> => + e.type === 'error' && e.fatal === true, + ), + ).toBe(false); + + const streamEnd = events.findLast( + (e): e is AgentEvent<'agent_end'> => e.type === 'agent_end', + ); + expect(streamEnd?.reason).toBe('failed'); + expect(sendMock).toHaveBeenCalledTimes(1); + }); + }); + + describe('stream - terminal events', () => { + it('handles AgentExecutionStopped', async () => { + const sendMock = deps.client.sendMessageStream as ReturnType< + typeof vi.fn + >; + sendMock.mockReturnValue( + makeStream([ + { + type: GeminiEventType.AgentExecutionStopped, + value: { reason: 'hook', systemMessage: 'Halted by hook' }, + }, + ]), + ); + + const session = new LegacyAgentSession(deps); + await session.send({ message: [{ type: 'text', text: 'hi' }] }); + const events = await collectEvents(session); + + const streamEnd = events.find( + (e): e is AgentEvent<'agent_end'> => e.type === 'agent_end', + ); + expect(streamEnd?.reason).toBe('completed'); + expect(streamEnd?.data).toEqual({ message: 'Halted by hook' }); + }); + + it('handles AgentExecutionBlocked as non-terminal and continues the stream', async () => { + const sendMock = deps.client.sendMessageStream as ReturnType< + typeof vi.fn + >; + sendMock.mockReturnValue( + makeStream([ + { + type: GeminiEventType.AgentExecutionBlocked, + value: { reason: 'Blocked by hook' }, + }, + { type: GeminiEventType.Content, value: 'Final answer' }, + { + type: GeminiEventType.Finished, + value: { reason: FinishReason.STOP, usageMetadata: undefined }, + }, + ]), + ); + + const session = new LegacyAgentSession(deps); + await session.send({ message: [{ type: 'text', text: 'hi' }] }); + const events = await collectEvents(session); + + const blocked = events.find( + (e): e is AgentEvent<'error'> => + e.type === 'error' && e._meta?.['code'] === 'AGENT_EXECUTION_BLOCKED', + ); + expect(blocked?.fatal).toBe(false); + expect(blocked?.message).toBe('Agent execution blocked: Blocked by hook'); + + const messages = events.filter( + 
(e): e is AgentEvent<'message'> => + e.type === 'message' && e.role === 'agent', + ); + expect( + messages.some( + (message) => + message.content[0]?.type === 'text' && + message.content[0].text === 'Final answer', + ), + ).toBe(true); + + const streamEnd = events.find( + (e): e is AgentEvent<'agent_end'> => e.type === 'agent_end', + ); + expect(streamEnd?.reason).toBe('completed'); + }); + + it('handles Error events', async () => { + const sendMock = deps.client.sendMessageStream as ReturnType< + typeof vi.fn + >; + sendMock.mockReturnValue( + makeStream([ + { + type: GeminiEventType.Error, + value: { error: new Error('API error') }, + }, + ]), + ); + + const session = new LegacyAgentSession(deps); + await session.send({ message: [{ type: 'text', text: 'hi' }] }); + const events = await collectEvents(session); + + const err = events.find( + (e): e is AgentEvent<'error'> => e.type === 'error', + ); + expect(err?.message).toBe('API error'); + expect(events.some((e) => e.type === 'agent_end')).toBe(true); + }); + + it('handles LoopDetected as non-terminal warning event', async () => { + const sendMock = deps.client.sendMessageStream as ReturnType< + typeof vi.fn + >; + // LoopDetected followed by more content — stream continues + sendMock.mockReturnValue( + makeStream([ + { type: GeminiEventType.LoopDetected }, + { type: GeminiEventType.Content, value: 'continuing after loop' }, + { + type: GeminiEventType.Finished, + value: { reason: FinishReason.STOP, usageMetadata: undefined }, + }, + ]), + ); + + const session = new LegacyAgentSession(deps); + await session.send({ message: [{ type: 'text', text: 'hi' }] }); + const events = await collectEvents(session); + + const warning = events.find( + (e): e is AgentEvent<'error'> => + e.type === 'error' && e._meta?.['code'] === 'LOOP_DETECTED', + ); + expect(warning).toBeDefined(); + expect(warning?.fatal).toBe(false); + + // Stream should have continued — content after loop detected + const messages = events.filter( + (e): e 
is AgentEvent<'message'> => + e.type === 'message' && e.role === 'agent', + ); + expect( + messages.some( + (m) => + m.content[0]?.type === 'text' && + m.content[0].text === 'continuing after loop', + ), + ).toBe(true); + + // Should still end with agent_end completed + const streamEnd = events.find( + (e): e is AgentEvent<'agent_end'> => e.type === 'agent_end', + ); + expect(streamEnd?.reason).toBe('completed'); + }); + }); + + describe('stream - max turns', () => { + it('emits agent_end with max_turns when the session turn limit is exceeded', async () => { + const configMock = deps.config.getMaxSessionTurns as ReturnType< + typeof vi.fn + >; + configMock.mockReturnValue(0); + + const sendMock = deps.client.sendMessageStream as ReturnType< + typeof vi.fn + >; + sendMock.mockReturnValue( + makeStream([ + { type: GeminiEventType.Content, value: 'should not be reached' }, + ]), + ); + + const session = new LegacyAgentSession(deps); + await session.send({ message: [{ type: 'text', text: 'hi' }] }); + const events = await collectEvents(session); + + const streamEnd = events.find( + (e): e is AgentEvent<'agent_end'> => e.type === 'agent_end', + ); + expect(streamEnd?.reason).toBe('max_turns'); + expect(streamEnd?.data).toEqual({ + code: 'MAX_TURNS_EXCEEDED', + maxTurns: 0, + turnCount: 0, + }); + expect(sendMock).not.toHaveBeenCalled(); + }); + + it('treats GeminiClient MaxSessionTurns as a terminal max_turns stream end', async () => { + const sendMock = deps.client.sendMessageStream as ReturnType< + typeof vi.fn + >; + sendMock.mockReturnValue( + makeStream([{ type: GeminiEventType.MaxSessionTurns }]), + ); + + const session = new LegacyAgentSession(deps); + await session.send({ message: [{ type: 'text', text: 'hi' }] }); + const events = await collectEvents(session); + + const errorEvents = events.filter( + (e): e is AgentEvent<'error'> => e.type === 'error', + ); + expect(errorEvents).toHaveLength(0); + + const streamEnd = events.findLast( + (e): e is 
AgentEvent<'agent_end'> => e.type === 'agent_end', + ); + expect(streamEnd?.reason).toBe('max_turns'); + expect(streamEnd?.data).toEqual({ + code: 'MAX_TURNS_EXCEEDED', + }); + }); + }); + + describe('abort', () => { + it('treats abort before the first model event as aborted without fatal error', async () => { + let releaseAbort: (() => void) | undefined; + const sendMock = deps.client.sendMessageStream as ReturnType< + typeof vi.fn + >; + sendMock.mockReturnValue( + (async function* () { + await new Promise((resolve) => { + releaseAbort = resolve; + }); + yield* []; + const abortError = new Error('Aborted'); + abortError.name = 'AbortError'; + throw abortError; + })(), + ); + + const session = new LegacyAgentSession(deps); + const { streamId } = await session.send({ + message: [{ type: 'text', text: 'hi' }], + }); + await vi.advanceTimersByTimeAsync(0); + + await session.abort(); + releaseAbort?.(); + + const events = await collectEvents(session, { + streamId: streamId ?? undefined, + }); + expect( + events.some( + (event): event is AgentEvent<'error'> => + event.type === 'error' && event.fatal, + ), + ).toBe(false); + + const streamEnd = events.findLast( + (event): event is AgentEvent<'agent_end'> => event.type === 'agent_end', + ); + expect(streamEnd?.reason).toBe('aborted'); + }); + + it('aborts the stream', async () => { + const sendMock = deps.client.sendMessageStream as ReturnType< + typeof vi.fn + >; + // Stream that yields content then checks abort signal via a deferred + let resolveHang: (() => void) | undefined; + sendMock.mockReturnValue( + (async function* () { + yield { + type: GeminiEventType.Content, + value: 'start', + } as ServerGeminiStreamEvent; + // Wait until externally resolved (by abort) + await new Promise((resolve) => { + resolveHang = resolve; + }); + yield { + type: GeminiEventType.Finished, + value: { reason: FinishReason.STOP, usageMetadata: undefined }, + } as ServerGeminiStreamEvent; + })(), + ); + + const session = new 
LegacyAgentSession(deps); + await session.send({ message: [{ type: 'text', text: 'hi' }] }); + + // Give the loop time to start processing + await new Promise((r) => setTimeout(r, 50)); + + // Abort and resolve the hang so the generator can finish + await session.abort(); + resolveHang?.(); + + // Collect all events + const events = await collectEvents(session); + + const streamEnd = events.find( + (e): e is AgentEvent<'agent_end'> => e.type === 'agent_end', + ); + expect(streamEnd?.reason).toBe('aborted'); + }); + + it('treats abort during pending scheduler work as aborted without fatal error', async () => { + let resolveSchedule: ((value: CompletedToolCall[]) => void) | undefined; + const sendMock = deps.client.sendMessageStream as ReturnType< + typeof vi.fn + >; + sendMock.mockReturnValue( + makeStream([ + { + type: GeminiEventType.ToolCallRequest, + value: makeToolRequest('call-1', 'slow_tool'), + }, + { + type: GeminiEventType.Finished, + value: { reason: FinishReason.STOP, usageMetadata: undefined }, + }, + ]), + ); + + const scheduleMock = deps.scheduler.schedule as ReturnType<typeof vi.fn>; + scheduleMock.mockReturnValue( + new Promise((resolve) => { + resolveSchedule = resolve; + }), + ); + + const session = new LegacyAgentSession(deps); + const { streamId } = await session.send({ + message: [{ type: 'text', text: 'hi' }], + }); + + await new Promise((resolve) => setTimeout(resolve, 25)); + await session.abort(); + resolveSchedule?.([makeCompletedToolCall('call-1', 'slow_tool', 'done')]); + + const events = await collectEvents(session, { + streamId: streamId ?? 
undefined, + }); + expect( + events.some( + (event): event is AgentEvent<'error'> => + event.type === 'error' && event.fatal, + ), + ).toBe(false); + expect(events.some((event) => event.type === 'tool_response')).toBe( + false, + ); + + const streamEnd = events.findLast( + (event): event is AgentEvent<'agent_end'> => event.type === 'agent_end', + ); + expect(streamEnd?.reason).toBe('aborted'); + }); + }); + + describe('events property', () => { + it('accumulates all events', async () => { + const sendMock = deps.client.sendMessageStream as ReturnType< + typeof vi.fn + >; + sendMock.mockReturnValue( + makeStream([ + { type: GeminiEventType.Content, value: 'hi' }, + { + type: GeminiEventType.Finished, + value: { reason: FinishReason.STOP, usageMetadata: undefined }, + }, + ]), + ); + + const session = new LegacyAgentSession(deps); + await session.send({ message: [{ type: 'text', text: 'hi' }] }); + await collectEvents(session); + + expect(session.events.length).toBeGreaterThan(0); + expect(session.events[0]?.type).toBe('message'); + }); + }); + + describe('subscription and stream scoping', () => { + it('subscribe receives live events for the next stream', async () => { + const sendMock = deps.client.sendMessageStream as ReturnType< + typeof vi.fn + >; + sendMock.mockReturnValue( + makeStream([ + { type: GeminiEventType.Content, value: 'hello later' }, + { + type: GeminiEventType.Finished, + value: { reason: FinishReason.STOP, usageMetadata: undefined }, + }, + ]), + ); + + const session = new LegacyAgentSession(deps); + const liveEvents: AgentEvent[] = []; + const unsubscribe = session.subscribe((event) => { + liveEvents.push(event); + }); + + const { streamId } = await session.send({ + message: [{ type: 'text', text: 'hi' }], + }); + await collectEvents(session, { streamId: streamId ?? 
undefined }); + unsubscribe(); + + expect(liveEvents.length).toBeGreaterThan(0); + expect(liveEvents[0]?.type).toBe('message'); + expect(liveEvents.every((event) => event.streamId === streamId)).toBe( + true, + ); + }); + + it('subscribe is live-only and does not replay old history when idle', async () => { + const sendMock = deps.client.sendMessageStream as ReturnType< + typeof vi.fn + >; + sendMock + .mockReturnValueOnce( + makeStream([ + { type: GeminiEventType.Content, value: 'first answer' }, + { + type: GeminiEventType.Finished, + value: { reason: FinishReason.STOP, usageMetadata: undefined }, + }, + ]), + ) + .mockReturnValueOnce( + makeStream([ + { type: GeminiEventType.Content, value: 'second answer' }, + { + type: GeminiEventType.Finished, + value: { reason: FinishReason.STOP, usageMetadata: undefined }, + }, + ]), + ); + + const session = new LegacyAgentSession(deps); + const first = await session.send({ + message: [{ type: 'text', text: 'first request' }], + }); + await collectEvents(session, { streamId: first.streamId ?? undefined }); + + const liveEvents: AgentEvent[] = []; + const unsubscribe = session.subscribe((event) => { + liveEvents.push(event); + }); + + const second = await session.send({ + message: [{ type: 'text', text: 'second request' }], + }); + await collectEvents(session, { streamId: second.streamId ?? 
undefined }); + unsubscribe(); + + expect(liveEvents.length).toBeGreaterThan(0); + expect( + liveEvents.every((event) => event.streamId === second.streamId), + ).toBe(true); + expect( + liveEvents.some( + (event) => + event.type === 'message' && + event.role === 'user' && + event.content[0]?.type === 'text' && + event.content[0].text === 'first request', + ), + ).toBe(false); + }); + + it('streams only the requested streamId', async () => { + const sendMock = deps.client.sendMessageStream as ReturnType< + typeof vi.fn + >; + sendMock + .mockReturnValueOnce( + makeStream([ + { type: GeminiEventType.Content, value: 'first answer' }, + { + type: GeminiEventType.Finished, + value: { reason: FinishReason.STOP, usageMetadata: undefined }, + }, + ]), + ) + .mockReturnValueOnce( + makeStream([ + { type: GeminiEventType.Content, value: 'second answer' }, + { + type: GeminiEventType.Finished, + value: { reason: FinishReason.STOP, usageMetadata: undefined }, + }, + ]), + ); + + const session = new LegacyAgentSession(deps); + const first = await session.send({ + message: [{ type: 'text', text: 'first request' }], + }); + await collectEvents(session, { streamId: first.streamId ?? undefined }); + + const second = await session.send({ + message: [{ type: 'text', text: 'second request' }], + }); + await collectEvents(session, { streamId: second.streamId ?? undefined }); + + const firstStreamEvents = await collectEvents(session, { + streamId: first.streamId ?? 
undefined, + }); + + expect( + firstStreamEvents.every((event) => event.streamId === first.streamId), + ).toBe(true); + expect( + firstStreamEvents.some( + (e) => + e.type === 'message' && + e.role === 'agent' && + e.content[0]?.type === 'text' && + e.content[0].text === 'first answer', + ), + ).toBe(true); + expect( + firstStreamEvents.some( + (e) => + e.type === 'message' && + e.role === 'agent' && + e.content[0]?.type === 'text' && + e.content[0].text === 'second answer', + ), + ).toBe(false); + }); + + it('resumes from eventId within the same stream only', async () => { + const sendMock = deps.client.sendMessageStream as ReturnType< + typeof vi.fn + >; + sendMock + .mockReturnValueOnce( + makeStream([ + { type: GeminiEventType.Content, value: 'first answer' }, + { + type: GeminiEventType.Finished, + value: { reason: FinishReason.STOP, usageMetadata: undefined }, + }, + ]), + ) + .mockReturnValueOnce( + makeStream([ + { type: GeminiEventType.Content, value: 'second answer' }, + { + type: GeminiEventType.Finished, + value: { reason: FinishReason.STOP, usageMetadata: undefined }, + }, + ]), + ); + + const session = new LegacyAgentSession(deps); + const first = await session.send({ + message: [{ type: 'text', text: 'first request' }], + }); + await collectEvents(session, { streamId: first.streamId ?? 
undefined }); + + await session.send({ + message: [{ type: 'text', text: 'second request' }], + }); + await collectEvents(session); + + const firstAgentMessage = session.events.find( + (e): e is AgentEvent<'message'> => + e.type === 'message' && + e.role === 'agent' && + e.streamId === first.streamId && + e.content[0]?.type === 'text' && + e.content[0].text === 'first answer', + ); + expect(firstAgentMessage).toBeDefined(); + + const resumedEvents = await collectEvents(session, { + eventId: firstAgentMessage?.id, + }); + expect( + resumedEvents.every((event) => event.streamId === first.streamId), + ).toBe(true); + expect(resumedEvents.map((event) => event.type)).toEqual(['agent_end']); + expect( + resumedEvents.some( + (e) => + e.type === 'message' && + e.role === 'agent' && + e.content[0]?.type === 'text' && + e.content[0].text === 'second answer', + ), + ).toBe(false); + }); + }); + + describe('agent_end ordering', () => { + it('agent_end is always the final event yielded', async () => { + const sendMock = deps.client.sendMessageStream as ReturnType< + typeof vi.fn + >; + sendMock.mockReturnValue( + makeStream([ + { type: GeminiEventType.Content, value: 'Hello' }, + { + type: GeminiEventType.Finished, + value: { reason: FinishReason.STOP, usageMetadata: undefined }, + }, + ]), + ); + + const session = new LegacyAgentSession(deps); + await session.send({ message: [{ type: 'text', text: 'hi' }] }); + const events = await collectEvents(session); + + expect(events.length).toBeGreaterThan(0); + expect(events[events.length - 1]?.type).toBe('agent_end'); + }); + + it('agent_end is final even after error events', async () => { + const sendMock = deps.client.sendMessageStream as ReturnType< + typeof vi.fn + >; + sendMock.mockReturnValue( + makeStream([ + { + type: GeminiEventType.Error, + value: { error: new Error('API error') }, + }, + ]), + ); + + const session = new LegacyAgentSession(deps); + await session.send({ message: [{ type: 'text', text: 'hi' }] }); + const 
events = await collectEvents(session); + + expect(events[events.length - 1]?.type).toBe('agent_end'); + }); + }); + + describe('intermediate Finished events', () => { + it('does NOT emit agent_end when tool calls are pending', async () => { + const sendMock = deps.client.sendMessageStream as ReturnType< + typeof vi.fn + >; + // First turn: tool request + Finished (should NOT produce agent_end) + sendMock.mockReturnValueOnce( + makeStream([ + { + type: GeminiEventType.ToolCallRequest, + value: makeToolRequest('call-1', 'read_file'), + }, + { + type: GeminiEventType.Finished, + value: { + reason: FinishReason.STOP, + usageMetadata: { + promptTokenCount: 50, + candidatesTokenCount: 20, + }, + }, + }, + ]), + ); + // Second turn: final answer + sendMock.mockReturnValueOnce( + makeStream([ + { type: GeminiEventType.Content, value: 'Answer' }, + { + type: GeminiEventType.Finished, + value: { reason: FinishReason.STOP, usageMetadata: undefined }, + }, + ]), + ); + + const scheduleMock = deps.scheduler.schedule as ReturnType<typeof vi.fn>; + scheduleMock.mockResolvedValueOnce([ + makeCompletedToolCall('call-1', 'read_file', 'data'), + ]); + + const session = new LegacyAgentSession(deps); + await session.send({ message: [{ type: 'text', text: 'do it' }] }); + const events = await collectEvents(session); + + // Only one agent_end at the very end + const streamEnds = events.filter((e) => e.type === 'agent_end'); + expect(streamEnds).toHaveLength(1); + expect(streamEnds[0]).toBe(events[events.length - 1]); + }); + + it('emits usage for intermediate Finished events', async () => { + const sendMock = deps.client.sendMessageStream as ReturnType< + typeof vi.fn + >; + sendMock.mockReturnValueOnce( + makeStream([ + { + type: GeminiEventType.ToolCallRequest, + value: makeToolRequest('call-1', 'read_file'), + }, + { + type: GeminiEventType.Finished, + value: { + reason: FinishReason.STOP, + usageMetadata: { + promptTokenCount: 100, + candidatesTokenCount: 30, + }, + }, + }, + ]), + ); + 
sendMock.mockReturnValueOnce( + makeStream([ + { type: GeminiEventType.Content, value: 'Done' }, + { + type: GeminiEventType.Finished, + value: { reason: FinishReason.STOP, usageMetadata: undefined }, + }, + ]), + ); + + const scheduleMock = deps.scheduler.schedule as ReturnType<typeof vi.fn>; + scheduleMock.mockResolvedValueOnce([ + makeCompletedToolCall('call-1', 'read_file', 'contents'), + ]); + + const session = new LegacyAgentSession(deps); + await session.send({ message: [{ type: 'text', text: 'go' }] }); + const events = await collectEvents(session); + + // Should have at least one usage event from the intermediate Finished + const usageEvents = events.filter( + (e): e is AgentEvent<'usage'> => e.type === 'usage', + ); + expect(usageEvents.length).toBeGreaterThanOrEqual(1); + expect(usageEvents[0]?.inputTokens).toBe(100); + expect(usageEvents[0]?.outputTokens).toBe(30); + }); + }); + + describe('error handling in runLoop', () => { + it('catches thrown errors and emits error + agent_end', async () => { + const sendMock = deps.client.sendMessageStream as ReturnType< + typeof vi.fn + >; + sendMock.mockImplementation(() => { + throw new Error('Connection refused'); + }); + + const session = new LegacyAgentSession(deps); + await session.send({ message: [{ type: 'text', text: 'hi' }] }); + const events = await collectEvents(session); + + const err = events.find( + (e): e is AgentEvent<'error'> => e.type === 'error', + ); + expect(err?.message).toBe('Connection refused'); + expect(err?.fatal).toBe(true); + + const streamEnd = events.find( + (e): e is AgentEvent<'agent_end'> => e.type === 'agent_end', + ); + expect(streamEnd?.reason).toBe('failed'); + }); + }); + + describe('_emitErrorAndAgentEnd metadata', () => { + it('preserves exitCode and code in _meta for FatalError', async () => { + const sendMock = deps.client.sendMessageStream as ReturnType< + typeof vi.fn + >; + // Simulate a FatalError being thrown + const { FatalError } = await import('../utils/errors.js'); + 
sendMock.mockImplementation(() => { + throw new FatalError('Disk full', 44); + }); + + const session = new LegacyAgentSession(deps); + await session.send({ message: [{ type: 'text', text: 'hi' }] }); + const events = await collectEvents(session); + + const err = events.find( + (e): e is AgentEvent<'error'> => e.type === 'error', + ); + expect(err?.message).toBe('Disk full'); + expect(err?.fatal).toBe(true); + expect(err?._meta?.['exitCode']).toBe(44); + expect(err?._meta?.['errorName']).toBe('FatalError'); + }); + + it('preserves exitCode for non-FatalError errors that carry one', async () => { + const sendMock = deps.client.sendMessageStream as ReturnType< + typeof vi.fn + >; + const exitCodeError = new Error('custom exit'); + (exitCodeError as Error & { exitCode: number }).exitCode = 17; + sendMock.mockImplementation(() => { + throw exitCodeError; + }); + + const session = new LegacyAgentSession(deps); + await session.send({ message: [{ type: 'text', text: 'hi' }] }); + const events = await collectEvents(session); + + const err = events.find( + (e): e is AgentEvent<'error'> => e.type === 'error', + ); + expect(err?._meta?.['exitCode']).toBe(17); + }); + + it('preserves code in _meta for errors with code property', async () => { + const sendMock = deps.client.sendMessageStream as ReturnType< + typeof vi.fn + >; + const codedError = new Error('ENOENT'); + (codedError as Error & { code: string }).code = 'ENOENT'; + sendMock.mockImplementation(() => { + throw codedError; + }); + + const session = new LegacyAgentSession(deps); + await session.send({ message: [{ type: 'text', text: 'hi' }] }); + const events = await collectEvents(session); + + const err = events.find( + (e): e is AgentEvent<'error'> => e.type === 'error', + ); + expect(err?._meta?.['code']).toBe('ENOENT'); + }); + + it('preserves status in _meta for errors with status property', async () => { + const sendMock = deps.client.sendMessageStream as ReturnType< + typeof vi.fn + >; + const statusError = new 
Error('rate limited'); + (statusError as Error & { status: string }).status = 'RESOURCE_EXHAUSTED'; + sendMock.mockImplementation(() => { + throw statusError; + }); + + const session = new LegacyAgentSession(deps); + await session.send({ message: [{ type: 'text', text: 'hi' }] }); + const events = await collectEvents(session); + + const err = events.find( + (e): e is AgentEvent<'error'> => e.type === 'error', + ); + expect(err?._meta?.['status']).toBe('RESOURCE_EXHAUSTED'); + }); + }); +}); diff --git a/packages/core/src/agent/legacy-agent-session.ts b/packages/core/src/agent/legacy-agent-session.ts new file mode 100644 index 0000000000..d8044e77e3 --- /dev/null +++ b/packages/core/src/agent/legacy-agent-session.ts @@ -0,0 +1,452 @@ +/** + * @license + * Copyright 2026 Google LLC + * SPDX-License-Identifier: Apache-2.0 + */ + +/** + * @fileoverview LegacyAgentSession backed by the existing Gemini client + + * scheduler loop, adapted to the merged AgentProtocol / AgentSession surface. + */ + +import { GeminiEventType } from '../core/turn.js'; +import type { Part } from '@google/genai'; +import type { GeminiClient } from '../core/client.js'; +import type { Config } from '../config/config.js'; +import type { ToolCallRequestInfo } from '../scheduler/types.js'; +import type { Scheduler } from '../scheduler/scheduler.js'; +import { recordToolCallInteractions } from '../code_assist/telemetry.js'; +import { ToolErrorType, isFatalToolError } from '../tools/tool-error.js'; +import { debugLogger } from '../utils/debugLogger.js'; +import { + buildToolResponseData, + contentPartsToGeminiParts, + geminiPartsToContentParts, + toolResultDisplayToContentParts, +} from './content-utils.js'; +import { AgentSession } from './agent-session.js'; +import { + createTranslationState, + mapFinishReason, + translateEvent, + type TranslationState, +} from './event-translator.js'; +import type { + AgentEvent, + AgentProtocol, + AgentSend, + ContentPart, + StreamEndReason, + Unsubscribe, +} 
from './types.js'; + +function isAbortLikeError(err: unknown): boolean { + return err instanceof Error && err.name === 'AbortError'; +} + +export interface LegacyAgentSessionDeps { + client: GeminiClient; + scheduler: Scheduler; + config: Config; + promptId: string; + streamId?: string; +} + +class LegacyAgentProtocol implements AgentProtocol { + private _events: AgentEvent[] = []; + private _subscribers = new Set<(event: AgentEvent) => void>(); + private _translationState: TranslationState; + private _agentEndEmitted = false; + private _activeStreamId?: string; + private _abortController = new AbortController(); + private _nextStreamIdOverride?: string; + + private readonly _client: GeminiClient; + private readonly _scheduler: Scheduler; + private readonly _config: Config; + private readonly _promptId: string; + + constructor(deps: LegacyAgentSessionDeps) { + this._translationState = createTranslationState(deps.streamId); + this._nextStreamIdOverride = deps.streamId; + this._client = deps.client; + this._scheduler = deps.scheduler; + this._config = deps.config; + this._promptId = deps.promptId; + } + + get events(): readonly AgentEvent[] { + return this._events; + } + + subscribe(callback: (event: AgentEvent) => void): Unsubscribe { + this._subscribers.add(callback); + return () => { + this._subscribers.delete(callback); + }; + } + + async send(payload: AgentSend): Promise<{ streamId: string }> { + const message = 'message' in payload ? payload.message : undefined; + if (!message) { + throw new Error( + 'LegacyAgentSession.send() only supports message sends for the moment.', + ); + } + + if (this._activeStreamId) { + // TODO: Interactive may eventually allow selected in-stream sends such as + // updates or elicitation responses. Keep rejecting all concurrent sends + // here until we define those correlation semantics. 
+ throw new Error( + 'LegacyAgentSession.send() cannot be called while a stream is active.', + ); + } + + this._beginNewStream(); + const streamId = this._translationState.streamId; + const parts = contentPartsToGeminiParts(message); + const userMessage = this._makeUserMessageEvent(message, payload._meta); + + this._emit([userMessage]); + + this._scheduleRunLoop(parts); + + return { streamId }; + } + + async abort(): Promise<void> { + this._abortController.abort(); + } + + private _scheduleRunLoop(initialParts: Part[]): void { + // Use a macrotask so send() resolves with the streamId before agent_start + // is emitted and consumers can attach to the stream without racing startup. + setTimeout(() => { + void this._runLoopInBackground(initialParts); + }, 0); + } + + private async _runLoopInBackground(initialParts: Part[]): Promise<void> { + this._ensureAgentStart(); + try { + await this._runLoop(initialParts); + } catch (err: unknown) { + if (this._abortController.signal.aborted || isAbortLikeError(err)) { + this._ensureAgentEnd('aborted'); + } else { + this._emitErrorAndAgentEnd(err); + } + this._clearActiveStream(); + } + } + + private async _runLoop(initialParts: Part[]): Promise<void> { + let currentParts: Part[] = initialParts; + let turnCount = 0; + const maxTurns = this._config.getMaxSessionTurns(); + + while (true) { + turnCount++; + if (maxTurns >= 0 && turnCount > maxTurns) { + this._finishStream('max_turns', { + code: 'MAX_TURNS_EXCEEDED', + maxTurns, + turnCount: turnCount - 1, + }); + return; + } + + const toolCallRequests: ToolCallRequestInfo[] = []; + const responseStream = this._client.sendMessageStream( + currentParts, + this._abortController.signal, + this._promptId, + ); + + for await (const event of responseStream) { + if (this._abortController.signal.aborted) { + this._finishStream('aborted'); + return; + } + + if (event.type === GeminiEventType.ToolCallRequest) { + toolCallRequests.push(event.value); + } + + this._emit(translateEvent(event, 
this._translationState)); + + switch (event.type) { + case GeminiEventType.Error: + case GeminiEventType.InvalidStream: + case GeminiEventType.ContextWindowWillOverflow: + this._finishStream('failed'); + return; + case GeminiEventType.Finished: + if (toolCallRequests.length === 0) { + this._finishStream(mapFinishReason(event.value.reason)); + return; + } + break; + case GeminiEventType.AgentExecutionStopped: + case GeminiEventType.UserCancelled: + case GeminiEventType.MaxSessionTurns: + this._clearActiveStream(); + return; + default: + break; + } + } + + if (this._abortController.signal.aborted) { + this._finishStream('aborted'); + return; + } + + if (toolCallRequests.length === 0) { + this._finishStream('completed'); + return; + } + + const completedToolCalls = await this._scheduler.schedule( + toolCallRequests, + this._abortController.signal, + ); + + if (this._abortController.signal.aborted) { + this._finishStream('aborted'); + return; + } + + const toolResponseParts: Part[] = []; + for (const tc of completedToolCalls) { + const response = tc.response; + const request = tc.request; + const content: ContentPart[] = response.error + ? [{ type: 'text', text: response.error.message }] + : geminiPartsToContentParts(response.responseParts); + const displayContent = toolResultDisplayToContentParts( + response.resultDisplay, + ); + const data = buildToolResponseData(response); + + this._emit([ + this._makeToolResponseEvent({ + requestId: request.callId, + name: request.name, + content, + isError: response.error !== undefined, + ...(displayContent ? { displayContent } : {}), + ...(data ? { data } : {}), + }), + ]); + + if (response.responseParts) { + toolResponseParts.push(...response.responseParts); + } + } + + try { + const currentModel = + this._client.getCurrentSequenceModel() ?? 
this._config.getModel(); + this._client + .getChat() + .recordCompletedToolCalls(currentModel, completedToolCalls); + await recordToolCallInteractions(this._config, completedToolCalls); + } catch (error) { + debugLogger.error( + `Error recording completed tool call information: ${error}`, + ); + } + + const stopTool = completedToolCalls.find( + (tc) => + tc.response.errorType === ToolErrorType.STOP_EXECUTION && + tc.response.error !== undefined, + ); + if (stopTool) { + this._finishStream('completed'); + return; + } + + const fatalTool = completedToolCalls.find((tc) => + isFatalToolError(tc.response.errorType), + ); + if (fatalTool) { + this._finishStream('failed'); + return; + } + + currentParts = toolResponseParts; + } + } + + private _emit(events: AgentEvent[]): void { + if (events.length === 0) { + return; + } + + const subscribers = [...this._subscribers]; + for (const event of events) { + if (!this._events.some((existing) => existing.id === event.id)) { + this._events.push(event); + } + if (event.type === 'agent_end') { + this._agentEndEmitted = true; + } + for (const subscriber of subscribers) { + subscriber(event); + } + } + } + + private _clearActiveStream(): void { + this._activeStreamId = undefined; + } + + private _beginNewStream(): void { + this._translationState = createTranslationState(this._nextStreamIdOverride); + this._nextStreamIdOverride = undefined; + this._abortController = new AbortController(); + this._agentEndEmitted = false; + this._activeStreamId = this._translationState.streamId; + } + + private _ensureAgentStart(): void { + if (!this._translationState.streamStartEmitted) { + this._translationState.streamStartEmitted = true; + this._emit([this._makeAgentStartEvent()]); + } + } + + private _ensureAgentEnd(reason: StreamEndReason = 'completed'): void { + if (!this._agentEndEmitted && this._translationState.streamStartEmitted) { + this._agentEndEmitted = true; + this._emit([this._makeAgentEndEvent(reason)]); + } + } + + private 
_finishStream( + reason: StreamEndReason, + data?: Record<string, unknown>, + ): void { + if (data && !this._agentEndEmitted) { + this._emit([this._makeAgentEndEvent(reason, data)]); + } else { + this._ensureAgentEnd(reason); + } + this._clearActiveStream(); + } + + /** + * Preserve error identity fields in _meta so downstream consumers can + * reconstruct fatal CLI errors. + */ + private _emitErrorAndAgentEnd(err: unknown): void { + const message = err instanceof Error ? err.message : String(err); + + this._ensureAgentStart(); + + const meta: Record<string, unknown> = {}; + if (err instanceof Error) { + meta['errorName'] = err.constructor.name; + if ('exitCode' in err && typeof err.exitCode === 'number') { + meta['exitCode'] = err.exitCode; + } + if ('code' in err) { + meta['code'] = err.code; + } + if ('status' in err) { + meta['status'] = err.status; + } + } + + this._emit([ + this._makeErrorEvent({ + status: 'INTERNAL', + message, + fatal: true, + ...(Object.keys(meta).length > 0 ? { _meta: meta } : {}), + }), + ]); + + this._ensureAgentEnd('failed'); + } + + private _nextEventFields() { + return { + id: `${this._translationState.streamId}-${this._translationState.eventCounter++}`, + timestamp: new Date().toISOString(), + streamId: this._translationState.streamId, + }; + } + + private _makeUserMessageEvent( + content: ContentPart[], + meta?: Record<string, unknown>, + ): AgentEvent<'message'> { + const event = { + ...this._nextEventFields(), + type: 'message', + role: 'user', + content, + ...(meta ? 
{ _meta: meta } : {}), + } satisfies AgentEvent<'message'>; + return event; + } + + private _makeToolResponseEvent( + payload: Omit< + AgentEvent<'tool_response'>, + 'id' | 'timestamp' | 'streamId' | 'type' + >, + ): AgentEvent<'tool_response'> { + const event = { + ...this._nextEventFields(), + type: 'tool_response', + ...payload, + } satisfies AgentEvent<'tool_response'>; + return event; + } + + private _makeAgentStartEvent(): AgentEvent<'agent_start'> { + const event = { + ...this._nextEventFields(), + type: 'agent_start', + } satisfies AgentEvent<'agent_start'>; + return event; + } + + private _makeAgentEndEvent( + reason: StreamEndReason, + data?: Record, + ): AgentEvent<'agent_end'> { + const event = { + ...this._nextEventFields(), + type: 'agent_end', + reason, + ...(data ? { data } : {}), + } satisfies AgentEvent<'agent_end'>; + return event; + } + + private _makeErrorEvent( + payload: Omit< + AgentEvent<'error'>, + 'id' | 'timestamp' | 'streamId' | 'type' + >, + ): AgentEvent<'error'> { + const event = { + ...this._nextEventFields(), + type: 'error', + ...payload, + } satisfies AgentEvent<'error'>; + return event; + } +} + +export class LegacyAgentSession extends AgentSession { + constructor(deps: LegacyAgentSessionDeps) { + super(new LegacyAgentProtocol(deps)); + } +} diff --git a/packages/core/src/agent/mock.test.ts b/packages/core/src/agent/mock.test.ts index 4f102d5dbd..f5138e388a 100644 --- a/packages/core/src/agent/mock.test.ts +++ b/packages/core/src/agent/mock.test.ts @@ -235,7 +235,7 @@ describe('MockAgentProtocol', () => { expect(streamId).toBeNull(); expect(session.events).toHaveLength(1); expect(session.events[0].type).toBe('session_update'); - expect(session.events[0].streamId).toBeNull(); + expect(session.events[0].streamId).toEqual(expect.any(String)); }); it('should throw on action', async () => { diff --git a/packages/core/src/agent/mock.ts b/packages/core/src/agent/mock.ts index f29e87f878..80d8ebae2f 100644 --- 
a/packages/core/src/agent/mock.ts +++ b/packages/core/src/agent/mock.ts @@ -8,8 +8,8 @@ import type { AgentEvent, AgentEventCommon, AgentEventData, - AgentSend, AgentProtocol, + AgentSend, Unsubscribe, } from './types.js'; @@ -86,12 +86,7 @@ export class MockAgentProtocol implements AgentProtocol { ) { const now = new Date().toISOString(); for (const eventData of events) { - const event: AgentEvent = { - ...eventData, - id: eventData.id ?? `e-${this._nextEventId++}`, - timestamp: eventData.timestamp ?? now, - streamId: eventData.streamId ?? streamId, - } as AgentEvent; + const event = this._normalizeEvent(eventData, now, streamId); this._emit(event); } @@ -99,13 +94,13 @@ export class MockAgentProtocol implements AgentProtocol { options?.close && !events.some((eventData) => eventData.type === 'agent_end') ) { - this._emit({ - id: `e-${this._nextEventId++}`, - timestamp: now, - streamId, - type: 'agent_end', - reason: 'completed', - } as AgentEvent); + this._emit( + this._normalizeEvent( + { type: 'agent_end', reason: 'completed' }, + now, + streamId, + ), + ); } } @@ -123,15 +118,18 @@ export class MockAgentProtocol implements AgentProtocol { const now = new Date().toISOString(); const eventsToEmit: AgentEvent[] = []; + let fallbackStreamId: string | undefined; - // Helper to normalize and prepare for emission + // All emitted events stay correlated to a stream even if this send does not + // start agent activity and therefore returns `streamId: null`. const normalize = (eventData: MockAgentEvent): AgentEvent => - ({ - ...eventData, - id: eventData.id ?? `e-${this._nextEventId++}`, - timestamp: eventData.timestamp ?? now, - streamId: eventData.streamId ?? streamId, - }) as AgentEvent; + this._normalizeEvent( + eventData, + now, + eventData.streamId ?? + streamId ?? + (fallbackStreamId ??= `mock-stream-${this._nextStreamId++}`), + ); // 1. 
User/Update event (BEFORE agent_start) if ('message' in payload && payload.message) { @@ -223,16 +221,32 @@ export class MockAgentProtocol implements AgentProtocol { return { streamId }; } + private _normalizeEvent( + eventData: MockAgentEvent, + timestamp: string, + streamId: string, + ): AgentEvent { + // TypeScript loses the specific union member when we add common event + // fields here, so keep the narrowing local to this mock-only helper. + // eslint-disable-next-line @typescript-eslint/no-unsafe-type-assertion + return { + ...eventData, + id: eventData.id ?? `e-${this._nextEventId++}`, + timestamp: eventData.timestamp ?? timestamp, + streamId: eventData.streamId ?? streamId, + } as AgentEvent; + } + async abort(): Promise { if (this._lastStreamId && this._activeStreamIds.has(this._lastStreamId)) { const streamId = this._lastStreamId; - this._emit({ - id: `e-${this._nextEventId++}`, - timestamp: new Date().toISOString(), - streamId, - type: 'agent_end', - reason: 'aborted', - } as AgentEvent); + this._emit( + this._normalizeEvent( + { type: 'agent_end', reason: 'aborted' }, + new Date().toISOString(), + streamId, + ), + ); } } } diff --git a/packages/core/src/agent/types.ts b/packages/core/src/agent/types.ts index 3b1c740ad4..4ec369d066 100644 --- a/packages/core/src/agent/types.ts +++ b/packages/core/src/agent/types.ts @@ -11,9 +11,10 @@ export type Unsubscribe = () => void; export interface AgentProtocol extends Trajectory { /** * Send data to the agent. Promise resolves when action is acknowledged. - * Returns the `streamId` of the stream the message was correlated to -- - * this may be a new stream if idle, an existing stream, or null if no - * stream was triggered. + * Returns the agent-activity `streamId` affected by the send. This may be a + * new stream if idle, an existing stream, or null if the send was + * acknowledged without starting agent activity. Emitted events should still + * remain correlated to a stream via their `streamId`. 
* * When a new stream is created by a send, the streamId MUST be returned * before the `agent_start` event is emitted for the stream. @@ -36,7 +37,7 @@ export interface AgentProtocol extends Trajectory { /** * AgentProtocol implements the Trajectory interface and can retrieve existing events. */ - readonly events: AgentEvent[]; + readonly events: readonly AgentEvent[]; } type RequireExactlyOne = { @@ -54,7 +55,7 @@ interface AgentSendPayloads { export type AgentSend = RequireExactlyOne & WithMeta; export interface Trajectory { - readonly events: AgentEvent[]; + readonly events: readonly AgentEvent[]; } export interface AgentEventCommon { @@ -62,8 +63,8 @@ export interface AgentEventCommon { id: string; /** Identifies the subagent thread, omitted for "main thread" events. */ threadId?: string; - /** Identifies a particular stream of a particular thread. */ - streamId?: string | null; + /** Identifies the stream this event belongs to. */ + streamId: string; /** ISO Timestamp for the time at which the event occurred. */ timestamp: string; /** The concrete type of the event. */ @@ -81,9 +82,18 @@ export type AgentEventData< EventType extends keyof AgentEvents = keyof AgentEvents, > = AgentEvents[EventType] & { type: EventType }; +/** + * Mapped type that produces a proper discriminated union when `EventType` is + * the default (all keys), enabling `switch (event.type)` narrowing. + * When a specific EventType is provided, resolves to a single variant. + */ export type AgentEvent< EventType extends keyof AgentEvents = keyof AgentEvents, -> = AgentEventCommon & AgentEventData; +> = { + [K in EventType]: AgentEventCommon & AgentEvents[K] & { type: K }; +}[EventType]; + +export type AgentEventType = keyof AgentEvents; export interface AgentEvents { /** MUST be the first event emitted in a session. 
*/
@@ -263,7 +273,7 @@ export interface AgentStart {
   streamId: string;
 }

-type StreamEndReason =
+export type StreamEndReason =
   | 'completed'
   | 'failed'
   | 'aborted'
diff --git a/packages/core/src/agents/browser/browserAgentInvocation.ts b/packages/core/src/agents/browser/browserAgentInvocation.ts
index 60bd5201f0..0c96e1894c 100644
--- a/packages/core/src/agents/browser/browserAgentInvocation.ts
+++ b/packages/core/src/agents/browser/browserAgentInvocation.ts
@@ -30,6 +30,7 @@ import {
   type SubagentActivityEvent,
   type SubagentProgress,
   type SubagentActivityItem,
+  isToolActivityError,
 } from '../types.js';
 import type { MessageBus } from '../../confirmation-bus/message-bus.js';
 import {
@@ -210,8 +211,9 @@ export class BrowserAgentInvocation extends BaseToolInvocation<
           const callId = activity.data['id']
             ? String(activity.data['id'])
             : undefined;
-          // Find the tool call by ID
-          // Find the tool call by ID
+          const data = activity.data['data'];
+          const isError = isToolActivityError(data);
+
           for (let i = recentActivity.length - 1; i >= 0; i--) {
             if (
               recentActivity[i].type === 'tool_call' &&
@@ -219,7 +221,7 @@ export class BrowserAgentInvocation extends BaseToolInvocation<
               recentActivity[i].id === callId &&
               recentActivity[i].status === 'running'
             ) {
-              recentActivity[i].status = 'completed';
+              recentActivity[i].status = isError ?
'error' : 'completed';
               updated = true;
               break;
             }
diff --git a/packages/core/src/agents/browser/browserManager.test.ts b/packages/core/src/agents/browser/browserManager.test.ts
index 9931d6d7ca..36652bbb64 100644
--- a/packages/core/src/agents/browser/browserManager.test.ts
+++ b/packages/core/src/agents/browser/browserManager.test.ts
@@ -9,6 +9,7 @@ import { BrowserManager } from './browserManager.js';
 import { makeFakeConfig } from '../../test-utils/config.js';
 import type { Config } from '../../config/config.js';
 import { injectAutomationOverlay } from './automationOverlay.js';
+import { coreEvents } from '../../utils/events.js';

 // Mock the MCP SDK
 vi.mock('@modelcontextprotocol/sdk/client/index.js', () => ({
@@ -77,6 +78,7 @@ describe('BrowserManager', () => {
   beforeEach(() => {
     vi.resetAllMocks();
     vi.mocked(injectAutomationOverlay).mockClear();
+    vi.spyOn(coreEvents, 'emitFeedback').mockImplementation(() => {});

     // Re-establish consent mock after resetAllMocks
     vi.mocked(getBrowserConsentIfNeeded).mockResolvedValue(true);
@@ -427,6 +429,11 @@ describe('BrowserManager', () => {
         ?.args as string[];
       expect(args).toContain('--autoConnect');
       expect(args).not.toContain('--isolated');
+
+      expect(coreEvents.emitFeedback).toHaveBeenCalledWith(
+        'info',
+        expect.stringContaining('saved logins will be visible'),
+      );
     });

     it('should throw actionable error when existing mode connection fails', async () => {
diff --git a/packages/core/src/agents/browser/browserManager.ts b/packages/core/src/agents/browser/browserManager.ts
index f1d149f838..c5fc6c5053 100644
--- a/packages/core/src/agents/browser/browserManager.ts
+++ b/packages/core/src/agents/browser/browserManager.ts
@@ -21,6 +21,7 @@ import { Client } from '@modelcontextprotocol/sdk/client/index.js';
 import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';
 import type { Tool as McpTool } from '@modelcontextprotocol/sdk/types.js';
 import { debugLogger } from '../../utils/debugLogger.js';
+import {
coreEvents } from '../../utils/events.js'; import type { Config } from '../../config/config.js'; import { Storage } from '../../config/storage.js'; import { getBrowserConsentIfNeeded } from '../../utils/browserConsent.js'; @@ -346,6 +347,10 @@ export class BrowserManager { mcpArgs.push('--isolated'); } else if (sessionMode === 'existing') { mcpArgs.push('--autoConnect'); + const message = + '🔒 Browsing with your signed-in Chrome profile — cookies and saved logins will be visible to the agent.'; + coreEvents.emitFeedback('info', message); + coreEvents.emitConsoleLog('info', message); } // Add optional settings from config diff --git a/packages/core/src/agents/cli-help-agent.ts b/packages/core/src/agents/cli-help-agent.ts index ad8d2bebde..bd96878190 100644 --- a/packages/core/src/agents/cli-help-agent.ts +++ b/packages/core/src/agents/cli-help-agent.ts @@ -30,7 +30,7 @@ export const CliHelpAgent = ( kind: 'local', displayName: 'CLI Help Agent', description: - 'Specialized in answering questions about how users use you, (Gemini CLI): features, documentation, and current runtime configuration.', + 'Specialized agent for answering questions about the Gemini CLI application. Invoke this agent for questions regarding CLI features, configuration schemas (e.g., policies), or instructions on how to create custom subagents. 
It queries internal documentation to provide accurate usage guidance.', inputConfig: { inputSchema: { type: 'object', diff --git a/packages/core/src/agents/local-executor.test.ts b/packages/core/src/agents/local-executor.test.ts index 65f3b76877..fb21e1093d 100644 --- a/packages/core/src/agents/local-executor.test.ts +++ b/packages/core/src/agents/local-executor.test.ts @@ -175,6 +175,7 @@ vi.mock('../utils/promptIdContext.js', async (importOriginal) => { return { ...actual, promptIdContext: { + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...actual.promptIdContext, getStore: vi.fn(), run: vi.fn((_id, fn) => fn()), diff --git a/packages/core/src/agents/local-executor.ts b/packages/core/src/agents/local-executor.ts index a860e1e597..ed26f634a0 100644 --- a/packages/core/src/agents/local-executor.ts +++ b/packages/core/src/agents/local-executor.ts @@ -1240,6 +1240,7 @@ export class LocalAgentExecutor { name: toolName, id: call.request.callId, output: call.response.resultDisplay, + data: call.response.data, }); } else if (call.status === 'error') { this.emitActivity('ERROR', { diff --git a/packages/core/src/agents/local-invocation.test.ts b/packages/core/src/agents/local-invocation.test.ts index 2153f538c9..478ceb9f34 100644 --- a/packages/core/src/agents/local-invocation.test.ts +++ b/packages/core/src/agents/local-invocation.test.ts @@ -338,6 +338,42 @@ describe('LocalSubagentInvocation', () => { ); }); + it('should mark tool call as error when TOOL_CALL_END contains isError: true', async () => { + mockExecutorInstance.run.mockImplementation(async () => { + const onActivity = MockLocalAgentExecutor.create.mock.calls[0][2]; + + if (onActivity) { + onActivity({ + isSubagentActivityEvent: true, + agentName: 'MockAgent', + type: 'TOOL_CALL_START', + data: { name: 'ls', args: {}, callId: 'call1' }, + } as SubagentActivityEvent); + onActivity({ + isSubagentActivityEvent: true, + agentName: 'MockAgent', + type: 'TOOL_CALL_END', + data: { name: 'ls', id: 
'call1', data: { isError: true } },
+        } as SubagentActivityEvent);
+      }
+      return { result: 'Done', terminate_reason: AgentTerminateMode.GOAL };
+    });
+
+    await invocation.execute(signal, updateOutput);
+
+    expect(updateOutput).toHaveBeenCalled();
+    const lastCall = updateOutput.mock.calls[
+      updateOutput.mock.calls.length - 1
+    ][0] as SubagentProgress;
+    expect(lastCall.recentActivity).toContainEqual(
+      expect.objectContaining({
+        type: 'tool_call',
+        content: 'ls',
+        status: 'error',
+      }),
+    );
+  });
+
   it('should reflect tool rejections in the activity stream as cancelled but not abort the agent', async () => {
     mockExecutorInstance.run.mockImplementation(async () => {
       const onActivity = MockLocalAgentExecutor.create.mock.calls[0][2];
diff --git a/packages/core/src/agents/local-invocation.ts b/packages/core/src/agents/local-invocation.ts
index 08a4aa8264..0d28dcbe64 100644
--- a/packages/core/src/agents/local-invocation.ts
+++ b/packages/core/src/agents/local-invocation.ts
@@ -21,6 +21,7 @@ import {
   SubagentActivityErrorType,
   SUBAGENT_REJECTED_ERROR_PREFIX,
   SUBAGENT_CANCELLED_ERROR_MESSAGE,
+  isToolActivityError,
 } from './types.js';
 import { randomUUID } from 'node:crypto';
 import type { MessageBus } from '../confirmation-bus/message-bus.js';
@@ -166,14 +167,16 @@ export class LocalSubagentInvocation extends BaseToolInvocation<
           }
           case 'TOOL_CALL_END': {
             const name = String(activity.data['name']);
-            // Find the last running tool call with this name
+            const data = activity.data['data'];
+            const isError = isToolActivityError(data);
+
             for (let i = recentActivity.length - 1; i >= 0; i--) {
               if (
                 recentActivity[i].type === 'tool_call' &&
                 recentActivity[i].content === name &&
                 recentActivity[i].status === 'running'
               ) {
-                recentActivity[i].status = 'completed';
+                recentActivity[i].status = isError ?
'error' : 'completed'; updated = true; break; } diff --git a/packages/core/src/agents/registry.test.ts b/packages/core/src/agents/registry.test.ts index 92bd3b2ec8..de0d95e659 100644 --- a/packages/core/src/agents/registry.test.ts +++ b/packages/core/src/agents/registry.test.ts @@ -1206,6 +1206,32 @@ describe('AgentRegistry', () => { }); describe('inheritance and refresh', () => { + it('should skip remote agents when refreshing on model change', async () => { + const remoteAgent: AgentDefinition = { + kind: 'remote', + name: 'RemoteAgent', + description: 'A remote agent', + agentCardUrl: 'https://example.com/card', + inputConfig: { inputSchema: { type: 'object' } }, + }; + + const loadAgentSpy = vi.fn().mockResolvedValue({ name: 'RemoteAgent' }); + vi.spyOn(mockConfig, 'getA2AClientManager').mockReturnValue({ + loadAgent: loadAgentSpy, + clearCache: vi.fn(), + } as unknown as A2AClientManager); + + await registry.testRegisterAgent(remoteAgent); + + expect(loadAgentSpy).toHaveBeenCalledTimes(1); + + coreEvents.emitModelChanged('new-model'); + + await new Promise((resolve) => setTimeout(resolve, 0)); + + expect(loadAgentSpy).toHaveBeenCalledTimes(1); + }); + it('should resolve "inherit" to the current model from configuration', async () => { const config = makeMockedConfig({ model: 'current-model' }); const registry = new TestableAgentRegistry(config); diff --git a/packages/core/src/agents/registry.ts b/packages/core/src/agents/registry.ts index 51d923001a..619f1dd71c 100644 --- a/packages/core/src/agents/registry.ts +++ b/packages/core/src/agents/registry.ts @@ -57,7 +57,7 @@ export class AgentRegistry { } private onModelChanged = () => { - this.refreshAgents().catch((e) => { + this.refreshAgents('local').catch((e) => { debugLogger.error( '[AgentRegistry] Failed to refresh agents on model change:', e, @@ -270,12 +270,16 @@ export class AgentRegistry { } } - private async refreshAgents(): Promise { + private async refreshAgents( + scope: AgentDefinition['kind'] | 
'all' = 'all', + ): Promise { this.loadBuiltInAgents(); await Promise.allSettled( - Array.from(this.agents.values()).map((agent) => - this.registerAgent(agent), - ), + Array.from(this.agents.values()).map(async (agent) => { + if (scope === 'all' || agent.kind === scope) { + await this.registerAgent(agent); + } + }), ); } diff --git a/packages/core/src/agents/subagent-tool.test.ts b/packages/core/src/agents/subagent-tool.test.ts index 438df59cd3..e184558f81 100644 --- a/packages/core/src/agents/subagent-tool.test.ts +++ b/packages/core/src/agents/subagent-tool.test.ts @@ -38,7 +38,6 @@ const runInDevTraceSpan = vi.hoisted(() => const metadata = { attributes: opts.attributes || {} }; return fn({ metadata, - endSpan: vi.fn(), }); }), ); @@ -205,7 +204,7 @@ describe('SubAgentInvocation', () => { // Verify metadata was set on the span const spanCallback = vi.mocked(runInDevTraceSpan).mock.calls[0][1]; const mockMetadata = { input: undefined, output: undefined }; - const mockSpan = { metadata: mockMetadata, endSpan: vi.fn() }; + const mockSpan = { metadata: mockMetadata }; await spanCallback(mockSpan as Parameters[0]); expect(mockMetadata.input).toBe(params); expect(mockMetadata.output).toBe(mockResult); diff --git a/packages/core/src/agents/subagent-tool.ts b/packages/core/src/agents/subagent-tool.ts index 0c4f19ee8b..3ef9f0aa86 100644 --- a/packages/core/src/agents/subagent-tool.ts +++ b/packages/core/src/agents/subagent-tool.ts @@ -181,6 +181,7 @@ class SubAgentInvocation extends BaseToolInvocation { return runInDevTraceSpan( { operation: GeminiCliOperation.AgentCall, + logPrompts: this.context.config.getTelemetryLogPromptsEnabled(), attributes: { [GEN_AI_AGENT_NAME]: this.definition.name, [GEN_AI_AGENT_DESCRIPTION]: this.definition.description, diff --git a/packages/core/src/agents/types.ts b/packages/core/src/agents/types.ts index 7f056c37ab..e36d8f0ccb 100644 --- a/packages/core/src/agents/types.ts +++ b/packages/core/src/agents/types.ts @@ -112,6 +112,18 @@ export 
function isSubagentProgress(obj: unknown): obj is SubagentProgress {
   );
 }

+/**
+ * Checks if the tool call data indicates an error.
+ */
+export function isToolActivityError(data: unknown): boolean {
+  return (
+    data !== null &&
+    typeof data === 'object' &&
+    'isError' in data &&
+    data.isError === true
+  );
+}
+
 /**
  * The base definition for an agent.
  * @template TOutput The specific Zod schema for the agent's final output object.
diff --git a/packages/core/src/code_assist/admin/mcpUtils.ts b/packages/core/src/code_assist/admin/mcpUtils.ts
index 768a40847e..99fde70ae9 100644
--- a/packages/core/src/code_assist/admin/mcpUtils.ts
+++ b/packages/core/src/code_assist/admin/mcpUtils.ts
@@ -37,6 +37,7 @@ export function applyAdminAllowlist(
     const adminConfig = adminAllowlist[serverId];
     if (adminConfig) {
       const mergedConfig = {
+        // eslint-disable-next-line @typescript-eslint/no-misused-spread
         ...localConfig,
         url: adminConfig.url,
         type: adminConfig.type,
diff --git a/packages/core/src/code_assist/codeAssist.test.ts b/packages/core/src/code_assist/codeAssist.test.ts
index 3fe1d45583..1a4ba66f27 100644
--- a/packages/core/src/code_assist/codeAssist.test.ts
+++ b/packages/core/src/code_assist/codeAssist.test.ts
@@ -44,6 +44,7 @@ describe('codeAssist', () => {
     projectId: 'test-project',
     userTier: UserTierId.FREE,
     userTierName: 'free-tier-name',
+    hasOnboardedPreviously: false,
   };

   it('should create a server for LOGIN_WITH_GOOGLE', async () => {
@@ -63,7 +64,7 @@ describe('codeAssist', () => {
     );
     expect(setupUser).toHaveBeenCalledWith(
       mockAuthClient,
-      mockValidationHandler,
+      mockConfig,
       httpOptions,
     );
     expect(MockedCodeAssistServer).toHaveBeenCalledWith(
@@ -95,7 +96,7 @@ describe('codeAssist', () => {
     );
     expect(setupUser).toHaveBeenCalledWith(
       mockAuthClient,
-      mockValidationHandler,
+      mockConfig,
       httpOptions,
     );
     expect(MockedCodeAssistServer).toHaveBeenCalledWith(
diff --git a/packages/core/src/code_assist/codeAssist.ts b/packages/core/src/code_assist/codeAssist.ts
index 3c3487bcff..4fcbea7853 100644 --- a/packages/core/src/code_assist/codeAssist.ts +++ b/packages/core/src/code_assist/codeAssist.ts @@ -22,11 +22,7 @@ export async function createCodeAssistContentGenerator( authType === AuthType.COMPUTE_ADC ) { const authClient = await getOauthClient(authType, config); - const userData = await setupUser( - authClient, - config.getValidationHandler(), - httpOptions, - ); + const userData = await setupUser(authClient, config, httpOptions); return new CodeAssistServer( authClient, userData.projectId, diff --git a/packages/core/src/code_assist/setup.test.ts b/packages/core/src/code_assist/setup.test.ts index f8e4bf5490..475ac7aa6e 100644 --- a/packages/core/src/code_assist/setup.test.ts +++ b/packages/core/src/code_assist/setup.test.ts @@ -14,6 +14,7 @@ import { ValidationRequiredError } from '../utils/googleQuotaErrors.js'; import { CodeAssistServer } from '../code_assist/server.js'; import type { OAuth2Client } from 'google-auth-library'; import { UserTierId, type GeminiUserTier } from './types.js'; +import type { Config } from '../config/config.js'; vi.mock('../code_assist/server.js'); @@ -35,6 +36,8 @@ describe('setupUser', () => { let mockLoad: ReturnType; let mockOnboardUser: ReturnType; let mockGetOperation: ReturnType; + let mockConfig: Config; + let mockValidationHandler: ReturnType; beforeEach(() => { vi.resetAllMocks(); @@ -60,6 +63,18 @@ describe('setupUser', () => { getOperation: mockGetOperation, }) as unknown as CodeAssistServer, ); + + mockValidationHandler = vi.fn(); + mockConfig = { + getValidationHandler: () => mockValidationHandler, + getUsageStatisticsEnabled: () => true, + getSessionId: () => 'test-session-id', + getContentGeneratorConfig: () => ({ + authType: 'google-login', + }), + isInteractive: () => false, + getExperiments: () => undefined, + } as unknown as Config; }); afterEach(() => { @@ -76,9 +91,9 @@ describe('setupUser', () => { const client = {} as OAuth2Client; // First call - await 
setupUser(client); + await setupUser(client, mockConfig); // Second call - await setupUser(client); + await setupUser(client, mockConfig); expect(mockLoad).toHaveBeenCalledTimes(1); }); @@ -91,10 +106,10 @@ describe('setupUser', () => { const client = {} as OAuth2Client; vi.stubEnv('GOOGLE_CLOUD_PROJECT', 'p1'); - await setupUser(client); + await setupUser(client, mockConfig); vi.stubEnv('GOOGLE_CLOUD_PROJECT', 'p2'); - await setupUser(client); + await setupUser(client, mockConfig); expect(mockLoad).toHaveBeenCalledTimes(2); }); @@ -106,11 +121,11 @@ describe('setupUser', () => { }); const client = {} as OAuth2Client; - await setupUser(client); + await setupUser(client, mockConfig); vi.advanceTimersByTime(31000); // 31s > 30s expiration - await setupUser(client); + await setupUser(client, mockConfig); expect(mockLoad).toHaveBeenCalledTimes(2); }); @@ -123,8 +138,10 @@ describe('setupUser', () => { }); const client = {} as OAuth2Client; - await expect(setupUser(client)).rejects.toThrow('Network error'); - await setupUser(client); + await expect(setupUser(client, mockConfig)).rejects.toThrow( + 'Network error', + ); + await setupUser(client, mockConfig); expect(mockLoad).toHaveBeenCalledTimes(2); }); @@ -136,7 +153,7 @@ describe('setupUser', () => { mockLoad.mockResolvedValue({ currentTier: mockPaidTier, }); - await setupUser({} as OAuth2Client); + await setupUser({} as OAuth2Client, mockConfig); expect(CodeAssistServer).toHaveBeenCalledWith( {}, 'test-project', @@ -157,7 +174,7 @@ describe('setupUser', () => { 'User-Agent': 'GeminiCLI/1.0.0/gemini-2.0-flash (darwin; arm64)', }, }; - await setupUser({} as OAuth2Client, undefined, httpOptions); + await setupUser({} as OAuth2Client, mockConfig, httpOptions); expect(CodeAssistServer).toHaveBeenCalledWith( {}, 'test-project', @@ -174,7 +191,7 @@ describe('setupUser', () => { cloudaicompanionProject: 'server-project', currentTier: mockPaidTier, }); - const result = await setupUser({} as OAuth2Client); + const result = 
await setupUser({} as OAuth2Client, mockConfig); expect(result.projectId).toBe('server-project'); }); @@ -185,7 +202,7 @@ describe('setupUser', () => { throw new ProjectIdRequiredError(); }); - await expect(setupUser({} as OAuth2Client)).rejects.toThrow( + await expect(setupUser({} as OAuth2Client, mockConfig)).rejects.toThrow( ProjectIdRequiredError, ); }); @@ -197,7 +214,7 @@ describe('setupUser', () => { mockLoad.mockResolvedValue({ allowedTiers: [mockPaidTier], }); - const userData = await setupUser({} as OAuth2Client); + const userData = await setupUser({} as OAuth2Client, mockConfig); expect(mockOnboardUser).toHaveBeenCalledWith( expect.objectContaining({ tierId: UserTierId.STANDARD, @@ -208,6 +225,7 @@ describe('setupUser', () => { projectId: 'server-project', userTier: UserTierId.STANDARD, userTierName: 'paid', + hasOnboardedPreviously: false, }); }); @@ -216,7 +234,7 @@ describe('setupUser', () => { mockLoad.mockResolvedValue({ allowedTiers: [mockFreeTier], }); - const userData = await setupUser({} as OAuth2Client); + const userData = await setupUser({} as OAuth2Client, mockConfig); expect(mockOnboardUser).toHaveBeenCalledWith( expect.objectContaining({ tierId: UserTierId.FREE, @@ -227,6 +245,7 @@ describe('setupUser', () => { projectId: 'server-project', userTier: UserTierId.FREE, userTierName: 'free', + hasOnboardedPreviously: false, }); }); @@ -241,11 +260,12 @@ describe('setupUser', () => { cloudaicompanionProject: undefined, }, }); - const userData = await setupUser({} as OAuth2Client); + const userData = await setupUser({} as OAuth2Client, mockConfig); expect(userData).toEqual({ projectId: 'test-project', userTier: UserTierId.STANDARD, userTierName: 'paid', + hasOnboardedPreviously: false, }); }); @@ -276,7 +296,7 @@ describe('setupUser', () => { }, }); - const promise = setupUser({} as OAuth2Client); + const promise = setupUser({} as OAuth2Client, mockConfig); await vi.advanceTimersByTimeAsync(5000); await vi.advanceTimersByTimeAsync(5000); @@ 
-308,10 +328,10 @@ describe('setupUser', () => { cloudaicompanionProject: 'p1', }); - const mockHandler = vi.fn().mockResolvedValue('verify'); - const result = await setupUser({} as OAuth2Client, mockHandler); + mockValidationHandler.mockResolvedValue('verify'); + const result = await setupUser({} as OAuth2Client, mockConfig); - expect(mockHandler).toHaveBeenCalledWith( + expect(mockValidationHandler).toHaveBeenCalledWith( 'https://verify', 'Verify please', ); @@ -333,9 +353,9 @@ describe('setupUser', () => { ], }); - const mockHandler = vi.fn().mockResolvedValue('cancel'); + mockValidationHandler.mockResolvedValue('cancel'); - await expect(setupUser({} as OAuth2Client, mockHandler)).rejects.toThrow( + await expect(setupUser({} as OAuth2Client, mockConfig)).rejects.toThrow( ValidationCancelledError, ); }); @@ -343,7 +363,7 @@ describe('setupUser', () => { it('should throw error if LoadCodeAssist returns empty response', async () => { mockLoad.mockResolvedValue(null); - await expect(setupUser({} as OAuth2Client)).rejects.toThrow( + await expect(setupUser({} as OAuth2Client, mockConfig)).rejects.toThrow( 'LoadCodeAssist returned empty response', ); }); diff --git a/packages/core/src/code_assist/setup.ts b/packages/core/src/code_assist/setup.ts index 536eb3be44..59e8749912 100644 --- a/packages/core/src/code_assist/setup.ts +++ b/packages/core/src/code_assist/setup.ts @@ -15,11 +15,17 @@ import { } from './types.js'; import { CodeAssistServer, type HttpOptions } from './server.js'; import type { AuthClient } from 'google-auth-library'; -import type { ValidationHandler } from '../fallback/types.js'; import { ChangeAuthRequestedError } from '../utils/errors.js'; import { ValidationRequiredError } from '../utils/googleQuotaErrors.js'; import { debugLogger } from '../utils/debugLogger.js'; import { createCache, type CacheService } from '../utils/cache.js'; +import type { Config } from '../config/config.js'; +import { + logOnboardingStart, + logOnboardingSuccess, + 
OnboardingStartEvent, + OnboardingSuccessEvent, +} from '../telemetry/index.js'; export class ProjectIdRequiredError extends Error { constructor() { @@ -54,6 +60,7 @@ export interface UserData { userTier: UserTierId; userTierName?: string; paidTier?: GeminiUserTier; + hasOnboardedPreviously?: boolean; } // Cache to store the results of setupUser to avoid redundant network calls. @@ -94,7 +101,8 @@ export function resetUserDataCacheForTesting() { * retry, auth change, or cancellation. * * @param client - The authenticated client to use for API calls - * @param validationHandler - Optional handler for account validation flow + * @param config - The CLI configuration + * @param httpOptions - Optional HTTP options * @returns The user's project ID, tier ID, and tier name * @throws {ValidationRequiredError} If account validation is required * @throws {ProjectIdRequiredError} If no project ID is available and required @@ -103,7 +111,7 @@ export function resetUserDataCacheForTesting() { */ export async function setupUser( client: AuthClient, - validationHandler?: ValidationHandler, + config: Config, httpOptions: HttpOptions = {}, ): Promise { const projectId = @@ -119,7 +127,7 @@ export async function setupUser( ); return projectCache.getOrCreate(projectId, () => - _doSetupUser(client, projectId, validationHandler, httpOptions), + _doSetupUser(client, projectId, config, httpOptions), ); } @@ -129,7 +137,7 @@ export async function setupUser( async function _doSetupUser( client: AuthClient, projectId: string | undefined, - validationHandler?: ValidationHandler, + config: Config, httpOptions: HttpOptions = {}, ): Promise { const caServer = new CodeAssistServer( @@ -146,6 +154,8 @@ async function _doSetupUser( pluginType: 'GEMINI', }; + const validationHandler = config.getValidationHandler(); + let loadRes: LoadCodeAssistResponse; while (true) { loadRes = await caServer.loadCodeAssist({ @@ -194,6 +204,8 @@ async function _doSetupUser( UserTierId.STANDARD, userTierName: 
loadRes.paidTier?.name ?? loadRes.currentTier.name, paidTier: loadRes.paidTier ?? undefined, + hasOnboardedPreviously: + loadRes.currentTier.hasOnboardedPreviously ?? true, }; } @@ -206,6 +218,8 @@ async function _doSetupUser( loadRes.paidTier?.id ?? loadRes.currentTier.id ?? UserTierId.STANDARD, userTierName: loadRes.paidTier?.name ?? loadRes.currentTier.name, paidTier: loadRes.paidTier ?? undefined, + hasOnboardedPreviously: + loadRes.currentTier.hasOnboardedPreviously ?? true, }; } @@ -236,6 +250,8 @@ async function _doSetupUser( }; } + logOnboardingStart(config, new OnboardingStartEvent()); + let lroRes = await caServer.onboardUser(onboardReq); if (!lroRes.done && lroRes.name) { const operationName = lroRes.name; @@ -245,12 +261,16 @@ async function _doSetupUser( } } + const userTier = tier.id ?? UserTierId.STANDARD; + logOnboardingSuccess(config, new OnboardingSuccessEvent(userTier)); + if (!lroRes.response?.cloudaicompanionProject?.id) { if (projectId) { return { projectId, userTier: tier.id ?? UserTierId.STANDARD, userTierName: tier.name, + hasOnboardedPreviously: tier.hasOnboardedPreviously ?? false, }; } @@ -261,6 +281,7 @@ async function _doSetupUser( projectId: lroRes.response.cloudaicompanionProject.id, userTier: tier.id ?? UserTierId.STANDARD, userTierName: tier.name, + hasOnboardedPreviously: tier.hasOnboardedPreviously ?? 
false, }; } diff --git a/packages/core/src/code_assist/telemetry.test.ts b/packages/core/src/code_assist/telemetry.test.ts index 66f1e631eb..f1404ecfb0 100644 --- a/packages/core/src/code_assist/telemetry.test.ts +++ b/packages/core/src/code_assist/telemetry.test.ts @@ -24,14 +24,16 @@ import { } from '@google/genai'; import * as codeAssist from './codeAssist.js'; import type { CodeAssistServer } from './server.js'; -import type { CompletedToolCall } from '../core/coreToolScheduler.js'; +import type { + CompletedToolCall, + ToolCallResponseInfo, +} from '../scheduler/types.js'; import { ToolConfirmationOutcome, type AnyDeclarativeTool, type AnyToolInvocation, } from '../tools/tools.js'; import type { Config } from '../config/config.js'; -import type { ToolCallResponseInfo } from '../scheduler/types.js'; function createMockResponse( candidates: GenerateContentResponse['candidates'] = [], diff --git a/packages/core/src/code_assist/telemetry.ts b/packages/core/src/code_assist/telemetry.ts index 86304a6e68..7135a38919 100644 --- a/packages/core/src/code_assist/telemetry.ts +++ b/packages/core/src/code_assist/telemetry.ts @@ -14,7 +14,7 @@ import { type ConversationOffered, type StreamingLatency, } from './types.js'; -import type { CompletedToolCall } from '../core/coreToolScheduler.js'; +import type { CompletedToolCall } from '../scheduler/types.js'; import type { Config } from '../config/config.js'; import { debugLogger } from '../utils/debugLogger.js'; import { getCodeAssistServer } from './codeAssist.js'; diff --git a/packages/core/src/config/config-agents-reload.test.ts b/packages/core/src/config/config-agents-reload.test.ts new file mode 100644 index 0000000000..4fe39f7de8 --- /dev/null +++ b/packages/core/src/config/config-agents-reload.test.ts @@ -0,0 +1,246 @@ +/** + * @license + * Copyright 2026 Google LLC + * SPDX-License-Identifier: Apache-2.0 + */ + +import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest'; +import { Config, type 
ConfigParameters } from './config.js'; +import { createTmpDir, cleanupTmpDir } from '@google/gemini-cli-test-utils'; +import * as path from 'node:path'; +import * as fs from 'node:fs/promises'; +import { SubagentTool } from '../agents/subagent-tool.js'; + +// Mock minimum dependencies that have side effects or external calls +vi.mock('../core/client.js', () => ({ + GeminiClient: vi.fn().mockImplementation(() => ({ + initialize: vi.fn().mockResolvedValue(undefined), + isInitialized: vi.fn().mockReturnValue(true), + setTools: vi.fn().mockResolvedValue(undefined), + updateSystemInstruction: vi.fn(), + })), +})); + +vi.mock('../core/contentGenerator.js'); +vi.mock('../telemetry/index.js'); +vi.mock('../core/tokenLimits.js'); +vi.mock('../services/fileDiscoveryService.js'); +vi.mock('../services/gitService.js'); +vi.mock('../services/trackerService.js'); + +describe('Config Agents Reload Integration', () => { + let tmpDir: string; + + beforeEach(async () => { + // Create a temporary directory for the test + tmpDir = await createTmpDir({}); + + // Create the .gemini/agents directory structure + await fs.mkdir(path.join(tmpDir, '.gemini', 'agents'), { recursive: true }); + }); + + afterEach(async () => { + await cleanupTmpDir(tmpDir); + vi.clearAllMocks(); + }); + + it('should unregister subagents as tools when they are disabled after being enabled', async () => { + const agentName = 'test-agent'; + const agentPath = path.join(tmpDir, '.gemini', 'agents', `${agentName}.md`); + + // Create agent definition file + const agentContent = `--- +name: ${agentName} +description: Test Agent Description +tools: [] +--- +Test System Prompt`; + + await fs.writeFile(agentPath, agentContent); + + // Initialize Config with agent enabled to start + const baseParams: ConfigParameters = { + sessionId: 'test-session', + targetDir: tmpDir, + model: 'test-model', + cwd: tmpDir, + debugMode: false, + enableAgents: true, + agents: { + overrides: { + [agentName]: { enabled: true }, + }, + }, + 
}; + + const config = new Config(baseParams); + vi.spyOn(config, 'isTrustedFolder').mockReturnValue(true); + vi.spyOn( + config.getAcknowledgedAgentsService(), + 'isAcknowledged', + ).mockResolvedValue(true); + await config.initialize(); + + const toolRegistry = config.getToolRegistry(); + + // Verify the tool was registered initially + // Note: Subagent tools use the agent name as the tool name. + const initialTools = toolRegistry.getAllToolNames(); + expect(initialTools).toContain(agentName); + const toolInstance = toolRegistry.getTool(agentName); + expect(toolInstance).toBeInstanceOf(SubagentTool); + + // Disable agent in settings for reload simulation + vi.spyOn(config, 'getAgentsSettings').mockReturnValue({ + overrides: { + [agentName]: { enabled: false }, + }, + }); + + // Trigger the refresh action that follows reloading + // @ts-expect-error accessing private method for testing + await config.onAgentsRefreshed(); + + // 4. Verify the tool is UNREGISTERED + const finalTools = toolRegistry.getAllToolNames(); + expect(finalTools).not.toContain(agentName); + expect(toolRegistry.getTool(agentName)).toBeUndefined(); + }); + + it('should not register subagents as tools when agents are disabled from the start', async () => { + const agentName = 'test-agent-disabled'; + const agentPath = path.join(tmpDir, '.gemini', 'agents', `${agentName}.md`); + + const agentContent = `--- +name: ${agentName} +description: Test Agent Description +tools: [] +--- +Test System Prompt`; + + await fs.writeFile(agentPath, agentContent); + + const params: ConfigParameters = { + sessionId: 'test-session', + targetDir: tmpDir, + model: 'test-model', + cwd: tmpDir, + debugMode: false, + enableAgents: true, + agents: { + overrides: { + [agentName]: { enabled: false }, + }, + }, + }; + + const config = new Config(params); + vi.spyOn(config, 'isTrustedFolder').mockReturnValue(true); + vi.spyOn( + config.getAcknowledgedAgentsService(), + 'isAcknowledged', + ).mockResolvedValue(true); + await 
config.initialize(); + + const toolRegistry = config.getToolRegistry(); + + const tools = toolRegistry.getAllToolNames(); + expect(tools).not.toContain(agentName); + expect(toolRegistry.getTool(agentName)).toBeUndefined(); + }); + + it('should register subagents as tools even when they are not in allowedTools', async () => { + const agentName = 'test-agent-allowed'; + const agentPath = path.join(tmpDir, '.gemini', 'agents', `${agentName}.md`); + + const agentContent = `--- +name: ${agentName} +description: Test Agent Description +tools: [] +--- +Test System Prompt`; + + await fs.writeFile(agentPath, agentContent); + + const params: ConfigParameters = { + sessionId: 'test-session', + targetDir: tmpDir, + model: 'test-model', + cwd: tmpDir, + debugMode: false, + enableAgents: true, + allowedTools: ['ls'], // test-agent-allowed is NOT here + agents: { + overrides: { + [agentName]: { enabled: true }, + }, + }, + }; + + const config = new Config(params); + vi.spyOn(config, 'isTrustedFolder').mockReturnValue(true); + vi.spyOn( + config.getAcknowledgedAgentsService(), + 'isAcknowledged', + ).mockResolvedValue(true); + await config.initialize(); + + const toolRegistry = config.getToolRegistry(); + + const tools = toolRegistry.getAllToolNames(); + expect(tools).toContain(agentName); + }); + + it('should register subagents as tools when they are enabled after being disabled', async () => { + const agentName = 'test-agent-enable'; + const agentPath = path.join(tmpDir, '.gemini', 'agents', `${agentName}.md`); + + const agentContent = `--- +name: ${agentName} +description: Test Agent Description +tools: [] +--- +Test System Prompt`; + + await fs.writeFile(agentPath, agentContent); + + const params: ConfigParameters = { + sessionId: 'test-session', + targetDir: tmpDir, + model: 'test-model', + cwd: tmpDir, + debugMode: false, + enableAgents: true, + agents: { + overrides: { + [agentName]: { enabled: false }, + }, + }, + }; + + const config = new Config(params); + 
vi.spyOn(config, 'isTrustedFolder').mockReturnValue(true); + vi.spyOn( + config.getAcknowledgedAgentsService(), + 'isAcknowledged', + ).mockResolvedValue(true); + await config.initialize(); + + const toolRegistry = config.getToolRegistry(); + + expect(toolRegistry.getAllToolNames()).not.toContain(agentName); + + // Enable agent in settings for reload simulation + vi.spyOn(config, 'getAgentsSettings').mockReturnValue({ + overrides: { + [agentName]: { enabled: true }, + }, + }); + + // Trigger refresh + // @ts-expect-error accessing private method for testing + await config.onAgentsRefreshed(); + + expect(toolRegistry.getAllToolNames()).toContain(agentName); + }); +}); diff --git a/packages/core/src/config/config.test.ts b/packages/core/src/config/config.test.ts index e1db5c6e8e..f8247f8377 100644 --- a/packages/core/src/config/config.test.ts +++ b/packages/core/src/config/config.test.ts @@ -185,6 +185,7 @@ vi.mock('../agents/registry.js', () => { const AgentRegistryMock = vi.fn(); AgentRegistryMock.prototype.initialize = vi.fn(); AgentRegistryMock.prototype.getAllDefinitions = vi.fn(() => []); + AgentRegistryMock.prototype.getAllDiscoveredAgentNames = vi.fn(() => []); AgentRegistryMock.prototype.getDefinition = vi.fn(); return { AgentRegistry: AgentRegistryMock }; }); @@ -1237,124 +1238,6 @@ describe('Server Config (config.ts)', () => { expect(wasReadFileToolRegistered).toBe(false); }); - it('should register subagents as tools when agents.overrides.codebase_investigator.enabled is true', async () => { - const params: ConfigParameters = { - ...baseParams, - agents: { - overrides: { - codebase_investigator: { enabled: true }, - }, - }, - }; - const config = new Config(params); - - const mockAgentDefinition = { - name: 'codebase_investigator', - description: 'Agent 1', - instructions: 'Inst 1', - }; - - const AgentRegistryMock = ( - (await vi.importMock('../agents/registry.js')) as { - AgentRegistry: Mock; - } - ).AgentRegistry; - 
AgentRegistryMock.prototype.getDefinition.mockReturnValue( - mockAgentDefinition, - ); - AgentRegistryMock.prototype.getAllDefinitions.mockReturnValue([ - mockAgentDefinition, - ]); - - const SubAgentToolMock = ( - (await vi.importMock('../agents/subagent-tool.js')) as { - SubagentTool: Mock; - } - ).SubagentTool; - - await config.initialize(); - - const registerToolMock = ( - (await vi.importMock('../tools/tool-registry')) as { - ToolRegistry: { prototype: { registerTool: Mock } }; - } - ).ToolRegistry.prototype.registerTool; - - expect(SubAgentToolMock).toHaveBeenCalledTimes(1); - expect(SubAgentToolMock).toHaveBeenCalledWith( - expect.anything(), // AgentRegistry - config, - expect.anything(), // MessageBus - ); - - const calls = registerToolMock.mock.calls; - const registeredWrappers = calls.filter( - (call) => call[0] instanceof SubAgentToolMock, - ); - expect(registeredWrappers).toHaveLength(1); - }); - - it('should register subagents as tools even when they are not in allowedTools', async () => { - const params: ConfigParameters = { - ...baseParams, - allowedTools: ['read_file'], // codebase_investigator is NOT here - agents: { - overrides: { - codebase_investigator: { enabled: true }, - }, - }, - }; - const config = new Config(params); - - const mockAgentDefinition = { - name: 'codebase_investigator', - description: 'Agent 1', - instructions: 'Inst 1', - }; - - const AgentRegistryMock = ( - (await vi.importMock('../agents/registry.js')) as { - AgentRegistry: Mock; - } - ).AgentRegistry; - AgentRegistryMock.prototype.getAllDefinitions.mockReturnValue([ - mockAgentDefinition, - ]); - - const SubAgentToolMock = ( - (await vi.importMock('../agents/subagent-tool.js')) as { - SubagentTool: Mock; - } - ).SubagentTool; - - await config.initialize(); - - expect(SubAgentToolMock).toHaveBeenCalled(); - }); - - it('should not register subagents as tools when agents are disabled', async () => { - const params: ConfigParameters = { - ...baseParams, - agents: { - 
overrides: { - codebase_investigator: { enabled: false }, - cli_help: { enabled: false }, - }, - }, - }; - const config = new Config(params); - - const SubAgentToolMock = ( - (await vi.importMock('../agents/subagent-tool.js')) as { - SubagentTool: Mock; - } - ).SubagentTool; - - await config.initialize(); - - expect(SubAgentToolMock).not.toHaveBeenCalled(); - }); - it('should register EnterPlanModeTool and ExitPlanModeTool when plan is enabled', async () => { const params: ConfigParameters = { ...baseParams, diff --git a/packages/core/src/config/config.ts b/packages/core/src/config/config.ts index 051c56228e..0740a5c16b 100644 --- a/packages/core/src/config/config.ts +++ b/packages/core/src/config/config.ts @@ -166,7 +166,7 @@ import { ConsecaSafetyChecker } from '../safety/conseca/conseca.js'; import type { AgentLoopContext } from './agent-loop-context.js'; export interface AccessibilitySettings { - /** @deprecated Use ui.loadingPhrases instead. */ + /** @deprecated Use ui.statusHints instead. */ enableLoadingPhrases?: boolean; screenReader?: boolean; } @@ -1001,7 +1001,7 @@ export class Config implements McpContext, AgentLoopContext { this.model = params.model; this.disableLoopDetection = params.disableLoopDetection ?? false; this._activeModel = params.model; - this.enableAgents = params.enableAgents ?? true; + this.enableAgents = params.enableAgents ?? false; this.agents = params.agents ?? {}; this.disableLLMCorrection = params.disableLLMCorrection ?? true; this.planEnabled = params.plan ?? true; @@ -3301,9 +3301,28 @@ export class Config implements McpContext, AgentLoopContext { */ private registerSubAgentTools(registry: ToolRegistry): void { const agentsOverrides = this.getAgentsSettings().overrides ?? 
{}; - const definitions = this.agentRegistry.getAllDefinitions(); + const discoveredDefinitions = + this.agentRegistry.getAllDiscoveredAgentNames(); - for (const definition of definitions) { + // First, unregister any agents that are now disabled + for (const agentName of discoveredDefinitions) { + if ( + !this.isAgentsEnabled() || + agentsOverrides[agentName]?.enabled === false + ) { + const tool = registry.getTool(agentName); + if (tool instanceof SubagentTool) { + registry.unregisterTool(agentName); + } + } + } + + const discoveredNames = this.agentRegistry.getAllDiscoveredAgentNames(); + for (const agentName of discoveredNames) { + const definition = this.agentRegistry.getDiscoveredDefinition(agentName); + if (!definition) { + continue; + } try { if ( !this.isAgentsEnabled() || diff --git a/packages/core/src/config/models.test.ts b/packages/core/src/config/models.test.ts index dbe558fc85..19b6d81b29 100644 --- a/packages/core/src/config/models.test.ts +++ b/packages/core/src/config/models.test.ts @@ -71,10 +71,12 @@ describe('Dynamic Configuration Parity', () => { for (const flags of flagCombos) { for (const hasAccess of [true, false]) { const mockLegacyConfig = { + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...legacyConfig, getHasAccessToPreviewModel: () => hasAccess, } as unknown as Config; const mockDynamicConfig = { + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...dynamicConfig, getHasAccessToPreviewModel: () => hasAccess, } as unknown as Config; @@ -110,10 +112,12 @@ describe('Dynamic Configuration Parity', () => { for (const hasAccess of [true, false]) { const mockLegacyConfig = { + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...legacyConfig, getHasAccessToPreviewModel: () => hasAccess, } as unknown as Config; const mockDynamicConfig = { + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...dynamicConfig, getHasAccessToPreviewModel: () => hasAccess, } as unknown as Config; 
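Reviewer note on the `registerSubAgentTools` change in `config.ts` above: the new logic first unregisters subagent tools whose agents are now disabled, then registers the enabled ones. The following is only a minimal sketch of that two-pass refresh, assuming a simplified in-memory registry. `SketchToolRegistry`, `SketchSubagentTool`, and `refreshSubagentTools` are hypothetical names invented for illustration and are not the project's real classes or signatures.

```typescript
// Hypothetical stand-in for the real SubagentTool class.
class SketchSubagentTool {
  readonly name: string;
  constructor(name: string) {
    this.name = name;
  }
}

// Hypothetical stand-in for the real ToolRegistry; only the calls used in
// the diff (register, unregister, list names) are modeled here.
class SketchToolRegistry {
  private readonly tools = new Map<string, SketchSubagentTool>();

  registerTool(tool: SketchSubagentTool): void {
    this.tools.set(tool.name, tool);
  }

  unregisterTool(name: string): void {
    this.tools.delete(name);
  }

  getAllToolNames(): string[] {
    return Array.from(this.tools.keys());
  }
}

type SketchOverrides = Record<string, { enabled?: boolean } | undefined>;

// Two-pass refresh: drop tools for disabled agents, then (re)register the rest.
function refreshSubagentTools(
  registry: SketchToolRegistry,
  discoveredNames: readonly string[],
  overrides: SketchOverrides,
  agentsEnabled: boolean,
): void {
  const isDisabled = (name: string): boolean =>
    !agentsEnabled || overrides[name]?.enabled === false;

  // Pass 1: unregister anything that is now disabled.
  for (const name of discoveredNames) {
    if (isDisabled(name)) {
      registry.unregisterTool(name);
    }
  }

  // Pass 2: register every agent that is still enabled.
  for (const name of discoveredNames) {
    if (!isDisabled(name)) {
      registry.registerTool(new SketchSubagentTool(name));
    }
  }
}

// Simulate "enabled at startup, then disabled on reload".
const sketchRegistry = new SketchToolRegistry();
const sketchNames = ['test-agent'];
refreshSubagentTools(sketchRegistry, sketchNames, { 'test-agent': { enabled: true } }, true);
const namesAfterEnable = sketchRegistry.getAllToolNames();
refreshSubagentTools(sketchRegistry, sketchNames, { 'test-agent': { enabled: false } }, true);
const namesAfterDisable = sketchRegistry.getAllToolNames();
```

The sketch iterates over all discovered names rather than only the currently enabled definitions, which appears to be the reason the diff switches from `getAllDefinitions()` to `getAllDiscoveredAgentNames()`: an agent that has just been disabled still needs to be visited so that its tool can be removed.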
diff --git a/packages/core/src/confirmation-bus/message-bus.ts b/packages/core/src/confirmation-bus/message-bus.ts
index 5495996d25..72f1c1c15a 100644
--- a/packages/core/src/confirmation-bus/message-bus.ts
+++ b/packages/core/src/confirmation-bus/message-bus.ts
@@ -83,13 +83,15 @@ export class MessageBus extends EventEmitter {
       }

       if (message.type === MessageBusType.TOOL_CONFIRMATION_REQUEST) {
-        const { decision } = await this.policyEngine.check(
+        const { decision: policyDecision } = await this.policyEngine.check(
           message.toolCall,
           message.serverName,
           message.toolAnnotations,
           message.subagent,
         );

+        const decision = message.forcedDecision ?? policyDecision;
+
         switch (decision) {
           case PolicyDecision.ALLOW:
             // Directly emit the response instead of recursive publish
diff --git a/packages/core/src/confirmation-bus/types.ts b/packages/core/src/confirmation-bus/types.ts
index 91aeab8308..998c32b7f6 100644
--- a/packages/core/src/confirmation-bus/types.ts
+++ b/packages/core/src/confirmation-bus/types.ts
@@ -8,6 +8,7 @@ import { type FunctionCall } from '@google/genai';
 import type {
   ToolConfirmationOutcome,
   ToolConfirmationPayload,
+  DiffStat,
 } from '../tools/tools.js';
 import type { ToolCall } from '../scheduler/types.js';

@@ -46,6 +47,10 @@ export interface ToolConfirmationRequest {
    * Optional rich details for the confirmation UI (diffs, counts, etc.)
    */
   details?: SerializableConfirmationDetails;
+  /**
+   * Optional decision to force for this tool call, bypassing the policy engine.
+ */ + forcedDecision?: 'allow' | 'deny' | 'ask_user'; } export interface ToolConfirmationResponse { @@ -76,22 +81,26 @@ export type SerializableConfirmationDetails = | { type: 'info'; title: string; + systemMessage?: string; prompt: string; urls?: string[]; } | { type: 'edit'; title: string; + systemMessage?: string; fileName: string; filePath: string; fileDiff: string; originalContent: string | null; newContent: string; isModifying?: boolean; + diffStat?: DiffStat; } | { type: 'exec'; title: string; + systemMessage?: string; command: string; rootCommand: string; rootCommands: string[]; @@ -100,6 +109,7 @@ export type SerializableConfirmationDetails = | { type: 'mcp'; title: string; + systemMessage?: string; serverName: string; toolName: string; toolDisplayName: string; @@ -110,11 +120,13 @@ export type SerializableConfirmationDetails = | { type: 'ask_user'; title: string; + systemMessage?: string; questions: Question[]; } | { type: 'exit_plan_mode'; title: string; + systemMessage?: string; planPath: string; }; @@ -126,6 +138,7 @@ export interface UpdatePolicy { argsPattern?: string; commandPrefix?: string | string[]; mcpName?: string; + allowRedirection?: boolean; } export interface ToolPolicyRejection { diff --git a/packages/core/src/core/__snapshots__/prompts.test.ts.snap b/packages/core/src/core/__snapshots__/prompts.test.ts.snap index cf2635562d..a39ef962e1 100644 --- a/packages/core/src/core/__snapshots__/prompts.test.ts.snap +++ b/packages/core/src/core/__snapshots__/prompts.test.ts.snap @@ -2409,6 +2409,8 @@ For example: ## Development Lifecycle Operate using a **Research -> Strategy -> Execution** lifecycle. For the Execution phase, resolve each sub-task through an iterative **Plan -> Act -> Validate** cycle. +**State Transition Override:** You are now in **Execution Mode**. All previous "Read-Only", "Plan Mode", and "ONLY FOR PLANS" constraints are **immediately lifted**. 
You are explicitly authorized and required to use tools to modify source code and environment files to implement the approved plan. Begin executing the steps of the plan immediately. + 1. **Research:** Systematically map the codebase and validate assumptions. Use \`grep_search\` and \`glob\` search tools extensively (in parallel if independent) to understand file structures, existing code patterns, and conventions. Use \`read_file\` to validate all assumptions. **Prioritize empirical reproduction of reported issues to confirm the failure state.** 2. **Strategy:** An approved plan is available for this task. Treat this file as your single source of truth. You MUST read this file before proceeding. If you discover new requirements or need to change the approach, confirm with the user and update this plan file to reflect the updated design decisions or discovered requirements. Once all implementation and verification steps are finished, provide a **final summary** of the work completed against the plan and offer clear **next steps** to the user (e.g., 'Open a pull request'). 3. 
**Execution:** For each sub-task: diff --git a/packages/core/src/core/coreToolHookTriggers.test.ts b/packages/core/src/core/coreToolHookTriggers.test.ts index 414064ff85..60c6836452 100644 --- a/packages/core/src/core/coreToolHookTriggers.test.ts +++ b/packages/core/src/core/coreToolHookTriggers.test.ts @@ -16,10 +16,8 @@ import { import type { MessageBus } from '../confirmation-bus/message-bus.js'; import type { HookSystem } from '../hooks/hookSystem.js'; import type { Config } from '../config/config.js'; -import { - type DefaultHookOutput, - BeforeToolHookOutput, -} from '../hooks/types.js'; +import type { DefaultHookOutput } from '../hooks/types.js'; +import { BeforeToolHookOutput } from '../hooks/types.js'; class MockInvocation extends BaseToolInvocation<{ key?: string }, ToolResult> { constructor(params: { key?: string }, messageBus: MessageBus) { @@ -140,18 +138,11 @@ describe('executeToolWithHooks', () => { expect(result.error?.type).toBe(ToolErrorType.EXECUTION_FAILED); expect(result.error?.message).toBe('Execution blocked'); }); - it('should handle continue: false in AfterTool', async () => { const invocation = new MockInvocation({}, messageBus); const abortSignal = new AbortController().signal; const spy = vi.spyOn(invocation, 'execute'); - vi.mocked(mockHookSystem.fireBeforeToolEvent).mockResolvedValue({ - shouldStopExecution: () => false, - getEffectiveReason: () => '', - getBlockingError: () => ({ blocked: false, reason: '' }), - } as unknown as DefaultHookOutput); - vi.mocked(mockHookSystem.fireAfterToolEvent).mockResolvedValue({ shouldStopExecution: () => true, getEffectiveReason: () => 'Stop after execution', @@ -177,12 +168,6 @@ describe('executeToolWithHooks', () => { const invocation = new MockInvocation({}, messageBus); const abortSignal = new AbortController().signal; - vi.mocked(mockHookSystem.fireBeforeToolEvent).mockResolvedValue({ - shouldStopExecution: () => false, - getEffectiveReason: () => '', - getBlockingError: () => ({ blocked: 
false, reason: '' }), - } as unknown as DefaultHookOutput); - vi.mocked(mockHookSystem.fireAfterToolEvent).mockResolvedValue({ shouldStopExecution: () => false, getEffectiveReason: () => '', diff --git a/packages/core/src/core/coreToolHookTriggers.ts b/packages/core/src/core/coreToolHookTriggers.ts index 6bff4cfdd5..c2748cbd0a 100644 --- a/packages/core/src/core/coreToolHookTriggers.ts +++ b/packages/core/src/core/coreToolHookTriggers.ts @@ -14,8 +14,8 @@ import type { ExecuteOptions, } from '../tools/tools.js'; import { ToolErrorType } from '../tools/tool-error.js'; -import { debugLogger } from '../utils/debugLogger.js'; import { DiscoveredMCPToolInvocation } from '../tools/mcp-tool.js'; +import { debugLogger } from '../utils/debugLogger.js'; /** * Extracts MCP context from a tool invocation if it's an MCP tool. @@ -24,7 +24,7 @@ import { DiscoveredMCPToolInvocation } from '../tools/mcp-tool.js'; * @param config Config to look up server details * @returns MCP context if this is an MCP tool, undefined otherwise */ -function extractMcpContext( +export function extractMcpContext( invocation: AnyToolInvocation, config: Config, ): McpToolContext | undefined { @@ -74,6 +74,7 @@ export async function executeToolWithHooks( options?: ExecuteOptions, config?: Config, originalRequestName?: string, + skipBeforeHook?: boolean, ): Promise { // eslint-disable-next-line @typescript-eslint/no-unsafe-type-assertion const toolInput = (invocation.params || {}) as Record; @@ -82,9 +83,9 @@ export async function executeToolWithHooks( // Extract MCP context if this is an MCP tool (only if config is provided) const mcpContext = config ? 
extractMcpContext(invocation, config) : undefined; - const hookSystem = config?.getHookSystem(); - if (hookSystem) { + + if (hookSystem && !skipBeforeHook) { const beforeOutput = await hookSystem.fireBeforeToolEvent( toolName, toolInput, diff --git a/packages/core/src/core/coreToolScheduler.test.ts b/packages/core/src/core/coreToolScheduler.test.ts deleted file mode 100644 index 3a9d0e2e92..0000000000 --- a/packages/core/src/core/coreToolScheduler.test.ts +++ /dev/null @@ -1,2409 +0,0 @@ -/** - * @license - * Copyright 2025 Google LLC - * SPDX-License-Identifier: Apache-2.0 - */ - -import { describe, it, expect, vi, type Mock } from 'vitest'; -import type { CallableTool } from '@google/genai'; -import { CoreToolScheduler } from './coreToolScheduler.js'; -import { - type ToolCall, - type WaitingToolCall, - type ErroredToolCall, - CoreToolCallStatus, -} from '../scheduler/types.js'; -import { - type ToolCallConfirmationDetails, - type ToolConfirmationPayload, - type ToolInvocation, - type ToolResult, - type Config, - type ToolRegistry, - type MessageBus, - DEFAULT_TRUNCATE_TOOL_OUTPUT_THRESHOLD, - BaseDeclarativeTool, - BaseToolInvocation, - ToolConfirmationOutcome, - Kind, - ApprovalMode, - HookSystem, - PolicyDecision, - ToolErrorType, - DiscoveredMCPTool, - GeminiCliOperation, -} from '../index.js'; -import { createMockMessageBus } from '../test-utils/mock-message-bus.js'; -import { NoopSandboxManager } from '../services/sandboxManager.js'; -import { - MockModifiableTool, - MockTool, - MOCK_TOOL_SHOULD_CONFIRM_EXECUTE, -} from '../test-utils/mock-tool.js'; -import * as modifiableToolModule from '../tools/modifiable-tool.js'; -import { DEFAULT_GEMINI_MODEL } from '../config/models.js'; -import type { PolicyEngine } from '../policy/policy-engine.js'; -import { runInDevTraceSpan, type SpanMetadata } from '../telemetry/trace.js'; - -vi.mock('fs/promises', () => ({ - writeFile: vi.fn(), -})); - -vi.mock('../telemetry/trace.js', () => ({ - runInDevTraceSpan: vi.fn(async 
(opts, fn) => { - const metadata = { attributes: opts.attributes || {} }; - return fn({ - metadata, - endSpan: vi.fn(), - }); - }), -})); - -class TestApprovalTool extends BaseDeclarativeTool<{ id: string }, ToolResult> { - static readonly Name = 'testApprovalTool'; - - constructor( - private config: Config, - messageBus: MessageBus, - ) { - super( - TestApprovalTool.Name, - 'TestApprovalTool', - 'A tool for testing approval logic', - Kind.Edit, - { - properties: { id: { type: 'string' } }, - required: ['id'], - type: 'object', - }, - messageBus, - ); - } - - protected createInvocation( - params: { id: string }, - messageBus: MessageBus, - _toolName?: string, - _toolDisplayName?: string, - ): ToolInvocation<{ id: string }, ToolResult> { - return new TestApprovalInvocation(this.config, params, messageBus); - } -} - -class TestApprovalInvocation extends BaseToolInvocation< - { id: string }, - ToolResult -> { - constructor( - private config: Config, - params: { id: string }, - messageBus: MessageBus, - ) { - super(params, messageBus); - } - - getDescription(): string { - return `Test tool ${this.params.id}`; - } - - override async shouldConfirmExecute(): Promise< - ToolCallConfirmationDetails | false - > { - // Need confirmation unless approval mode is AUTO_EDIT - if (this.config.getApprovalMode() === ApprovalMode.AUTO_EDIT) { - return false; - } - - return { - type: 'edit', - title: `Confirm Test Tool ${this.params.id}`, - fileName: `test-${this.params.id}.txt`, - filePath: `/test-${this.params.id}.txt`, - fileDiff: 'Test diff content', - originalContent: '', - newContent: 'Test content', - onConfirm: async (outcome: ToolConfirmationOutcome) => { - if (outcome === ToolConfirmationOutcome.ProceedAlways) { - this.config.setApprovalMode(ApprovalMode.AUTO_EDIT); - } - }, - }; - } - - async execute(): Promise { - return { - llmContent: `Executed test tool ${this.params.id}`, - returnDisplay: `Executed test tool ${this.params.id}`, - }; - } -} - -class 
AbortDuringConfirmationInvocation extends BaseToolInvocation< - Record, - ToolResult -> { - constructor( - private readonly abortController: AbortController, - private readonly abortError: Error, - params: Record, - messageBus: MessageBus, - ) { - super(params, messageBus); - } - - override async shouldConfirmExecute( - _signal: AbortSignal, - ): Promise { - this.abortController.abort(); - throw this.abortError; - } - - async execute(_abortSignal: AbortSignal): Promise { - throw new Error('execute should not be called when confirmation fails'); - } - - getDescription(): string { - return 'Abort during confirmation invocation'; - } -} - -class AbortDuringConfirmationTool extends BaseDeclarativeTool< - Record, - ToolResult -> { - constructor( - private readonly abortController: AbortController, - private readonly abortError: Error, - messageBus: MessageBus, - ) { - super( - 'abortDuringConfirmationTool', - 'Abort During Confirmation Tool', - 'A tool that aborts while confirming execution.', - Kind.Other, - { - type: 'object', - properties: {}, - }, - messageBus, - ); - } - - protected createInvocation( - params: Record, - messageBus: MessageBus, - _toolName?: string, - _toolDisplayName?: string, - ): ToolInvocation, ToolResult> { - return new AbortDuringConfirmationInvocation( - this.abortController, - this.abortError, - params, - messageBus, - ); - } -} - -async function waitForStatus( - onToolCallsUpdate: Mock, - status: CoreToolCallStatus, - timeout = 5000, -): Promise { - return new Promise((resolve, reject) => { - const startTime = Date.now(); - const check = () => { - if (Date.now() - startTime > timeout) { - const seenStatuses = onToolCallsUpdate.mock.calls - .flatMap((call) => call[0]) - .map((toolCall: ToolCall) => toolCall.status); - reject( - new Error( - `Timed out waiting for status "${status}". 
Seen statuses: ${seenStatuses.join( - ', ', - )}`, - ), - ); - return; - } - - const foundCall = onToolCallsUpdate.mock.calls - .flatMap((call) => call[0]) - .find((toolCall: ToolCall) => toolCall.status === status); - if (foundCall) { - resolve(foundCall); - } else { - setTimeout(check, 10); // Check again in 10ms - } - }; - check(); - }); -} - -function createMockConfig(overrides: Partial = {}): Config { - const defaultToolRegistry = { - getTool: () => undefined, - getToolByName: () => undefined, - getFunctionDeclarations: () => [], - tools: new Map(), - discovery: {}, - registerTool: () => {}, - getToolByDisplayName: () => undefined, - getTools: () => [], - discoverTools: async () => {}, - getAllTools: () => [], - getToolsByServer: () => [], - getExperiments: () => {}, - } as unknown as ToolRegistry; - - const baseConfig = { - getSessionId: () => 'test-session-id', - getUsageStatisticsEnabled: () => true, - getDebugMode: () => false, - isInteractive: () => true, - getApprovalMode: () => ApprovalMode.DEFAULT, - setApprovalMode: () => {}, - getAllowedTools: () => [], - getContentGeneratorConfig: () => ({ - model: 'test-model', - authType: 'oauth-personal', - }), - getShellExecutionConfig: () => ({ - terminalWidth: 90, - terminalHeight: 30, - sanitizationConfig: { - enableEnvironmentVariableRedaction: true, - allowedEnvironmentVariables: [], - blockedEnvironmentVariables: [], - }, - sandboxManager: new NoopSandboxManager(), - }), - storage: { - getProjectTempDir: () => '/tmp', - }, - getTruncateToolOutputThreshold: () => - DEFAULT_TRUNCATE_TOOL_OUTPUT_THRESHOLD, - getToolRegistry: () => defaultToolRegistry, - getActiveModel: () => DEFAULT_GEMINI_MODEL, - getGeminiClient: () => null, - getMessageBus: () => createMockMessageBus(), - getEnableHooks: () => false, - getExperiments: () => {}, - } as unknown as Config; - - const finalConfig = { ...baseConfig, ...overrides } as Config; - - (finalConfig as unknown as { config: Config }).config = finalConfig; - - // Patch 
the policy engine to use the final config if not overridden - if (!overrides.getPolicyEngine) { - finalConfig.getPolicyEngine = () => - ({ - check: async ( - toolCall: { name: string; args: object }, - _serverName?: string, - ) => { - // Mock simple policy logic for tests - const mode = finalConfig.getApprovalMode(); - if (mode === ApprovalMode.YOLO) { - return { decision: PolicyDecision.ALLOW }; - } - const allowed = finalConfig.getAllowedTools(); - if ( - allowed && - (allowed.includes(toolCall.name) || - allowed.some((p) => toolCall.name.startsWith(p))) - ) { - return { decision: PolicyDecision.ALLOW }; - } - return { decision: PolicyDecision.ASK_USER }; - }, - }) as unknown as PolicyEngine; - } - - Object.defineProperty(finalConfig, 'toolRegistry', { - get: () => finalConfig.getToolRegistry?.() || defaultToolRegistry, - }); - Object.defineProperty(finalConfig, 'messageBus', { - get: () => finalConfig.getMessageBus?.(), - }); - Object.defineProperty(finalConfig, 'geminiClient', { - get: () => finalConfig.getGeminiClient?.(), - }); - - return finalConfig; -} - -describe('CoreToolScheduler', () => { - it('should cancel a tool call if the signal is aborted before confirmation', async () => { - const mockTool = new MockTool({ - name: 'mockTool', - shouldConfirmExecute: MOCK_TOOL_SHOULD_CONFIRM_EXECUTE, - }); - const declarativeTool = mockTool; - const mockToolRegistry = { - getTool: () => declarativeTool, - getFunctionDeclarations: () => [], - tools: new Map(), - discovery: {}, - registerTool: () => {}, - getToolByName: () => declarativeTool, - getToolByDisplayName: () => declarativeTool, - getTools: () => [], - discoverTools: async () => {}, - getAllTools: () => [], - getToolsByServer: () => [], - } as unknown as ToolRegistry; - - const onAllToolCallsComplete = vi.fn(); - const onToolCallsUpdate = vi.fn(); - - const mockConfig = createMockConfig({ - getToolRegistry: () => mockToolRegistry, - isInteractive: () => false, - }); - - const scheduler = new 
CoreToolScheduler({ - context: mockConfig, - onAllToolCallsComplete, - onToolCallsUpdate, - getPreferredEditor: () => 'vscode', - }); - - const abortController = new AbortController(); - const request = { - callId: '1', - name: 'mockTool', - args: {}, - isClientInitiated: false, - prompt_id: 'prompt-id-1', - }; - - abortController.abort(); - await scheduler.schedule([request], abortController.signal); - - expect(onAllToolCallsComplete).toHaveBeenCalled(); - const completedCalls = onAllToolCallsComplete.mock - .calls[0][0] as ToolCall[]; - expect(completedCalls[0].status).toBe(CoreToolCallStatus.Cancelled); - - expect(runInDevTraceSpan).toHaveBeenCalledWith( - expect.objectContaining({ - operation: GeminiCliOperation.ScheduleToolCalls, - }), - expect.any(Function), - ); - - const spanArgs = vi.mocked(runInDevTraceSpan).mock.calls[0]; - const fn = spanArgs[1]; - const metadata: SpanMetadata = { name: '', attributes: {} }; - await fn({ metadata, endSpan: vi.fn() }); - expect(metadata).toMatchObject({ - input: [request], - }); - }); - - it('should cancel all tools when cancelAll is called', async () => { - const mockTool1 = new MockTool({ - name: 'mockTool1', - shouldConfirmExecute: MOCK_TOOL_SHOULD_CONFIRM_EXECUTE, - }); - const mockTool2 = new MockTool({ name: 'mockTool2' }); - const mockTool3 = new MockTool({ name: 'mockTool3' }); - - const mockToolRegistry = { - getTool: (name: string) => { - if (name === 'mockTool1') return mockTool1; - if (name === 'mockTool2') return mockTool2; - if (name === 'mockTool3') return mockTool3; - return undefined; - }, - getFunctionDeclarations: () => [], - tools: new Map(), - discovery: {}, - registerTool: () => {}, - getToolByName: (name: string) => { - if (name === 'mockTool1') return mockTool1; - if (name === 'mockTool2') return mockTool2; - if (name === 'mockTool3') return mockTool3; - return undefined; - }, - getToolByDisplayName: () => undefined, - getTools: () => [], - discoverTools: async () => {}, - getAllTools: () => [], - 
getToolsByServer: () => [], - } as unknown as ToolRegistry; - - const onAllToolCallsComplete = vi.fn(); - const onToolCallsUpdate = vi.fn(); - - const mockConfig = createMockConfig({ - getToolRegistry: () => mockToolRegistry, - getHookSystem: () => undefined, - }); - - const scheduler = new CoreToolScheduler({ - context: mockConfig, - onAllToolCallsComplete, - onToolCallsUpdate, - getPreferredEditor: () => 'vscode', - }); - - const abortController = new AbortController(); - const requests = [ - { - callId: '1', - name: 'mockTool1', - args: {}, - isClientInitiated: false, - prompt_id: 'prompt-id-1', - }, - { - callId: '2', - name: 'mockTool2', - args: {}, - isClientInitiated: false, - prompt_id: 'prompt-id-1', - }, - { - callId: '3', - name: 'mockTool3', - args: {}, - isClientInitiated: false, - prompt_id: 'prompt-id-1', - }, - ]; - - // Don't await, let it run in the background - void scheduler.schedule(requests, abortController.signal); - - // Wait for the first tool to be awaiting approval - await waitForStatus(onToolCallsUpdate, CoreToolCallStatus.AwaitingApproval); - - // Cancel all operations - scheduler.cancelAll(abortController.signal); - abortController.abort(); // Also fire the signal - - await vi.waitFor(() => { - expect(onAllToolCallsComplete).toHaveBeenCalled(); - }); - - const completedCalls = onAllToolCallsComplete.mock - .calls[0][0] as ToolCall[]; - - expect(completedCalls).toHaveLength(3); - expect(completedCalls.find((c) => c.request.callId === '1')?.status).toBe( - CoreToolCallStatus.Cancelled, - ); - expect(completedCalls.find((c) => c.request.callId === '2')?.status).toBe( - CoreToolCallStatus.Cancelled, - ); - expect(completedCalls.find((c) => c.request.callId === '3')?.status).toBe( - CoreToolCallStatus.Cancelled, - ); - }); - - it('should cancel all tools in a batch when one is cancelled via confirmation', async () => { - const mockTool1 = new MockTool({ - name: 'mockTool1', - shouldConfirmExecute: MOCK_TOOL_SHOULD_CONFIRM_EXECUTE, - }); - 
const mockTool2 = new MockTool({ name: 'mockTool2' }); - const mockTool3 = new MockTool({ name: 'mockTool3' }); - - const mockToolRegistry = { - getTool: (name: string) => { - if (name === 'mockTool1') return mockTool1; - if (name === 'mockTool2') return mockTool2; - if (name === 'mockTool3') return mockTool3; - return undefined; - }, - getFunctionDeclarations: () => [], - tools: new Map(), - discovery: {}, - registerTool: () => {}, - getToolByName: (name: string) => { - if (name === 'mockTool1') return mockTool1; - if (name === 'mockTool2') return mockTool2; - if (name === 'mockTool3') return mockTool3; - return undefined; - }, - getToolByDisplayName: () => undefined, - getTools: () => [], - discoverTools: async () => {}, - getAllTools: () => [], - getToolsByServer: () => [], - } as unknown as ToolRegistry; - - const onAllToolCallsComplete = vi.fn(); - const onToolCallsUpdate = vi.fn(); - - const mockConfig = createMockConfig({ - getToolRegistry: () => mockToolRegistry, - getHookSystem: () => undefined, - }); - - const scheduler = new CoreToolScheduler({ - context: mockConfig, - onAllToolCallsComplete, - onToolCallsUpdate, - getPreferredEditor: () => 'vscode', - }); - - const abortController = new AbortController(); - const requests = [ - { - callId: '1', - name: 'mockTool1', - args: {}, - isClientInitiated: false, - prompt_id: 'prompt-id-1', - }, - { - callId: '2', - name: 'mockTool2', - args: {}, - isClientInitiated: false, - prompt_id: 'prompt-id-1', - }, - { - callId: '3', - name: 'mockTool3', - args: {}, - isClientInitiated: false, - prompt_id: 'prompt-id-1', - }, - ]; - - // Don't await, let it run in the background - void scheduler.schedule(requests, abortController.signal); - - // Wait for the first tool to be awaiting approval - const awaitingCall = (await waitForStatus( - onToolCallsUpdate, - CoreToolCallStatus.AwaitingApproval, - )) as WaitingToolCall; - - // Cancel the first tool via its confirmation handler - const confirmationDetails = - 
awaitingCall.confirmationDetails as ToolCallConfirmationDetails; - await confirmationDetails.onConfirm(ToolConfirmationOutcome.Cancel); - abortController.abort(); // User cancelling often involves an abort signal - - await vi.waitFor(() => { - expect(onAllToolCallsComplete).toHaveBeenCalled(); - }); - - const completedCalls = onAllToolCallsComplete.mock - .calls[0][0] as ToolCall[]; - - expect(completedCalls).toHaveLength(3); - expect(completedCalls.find((c) => c.request.callId === '1')?.status).toBe( - CoreToolCallStatus.Cancelled, - ); - expect(completedCalls.find((c) => c.request.callId === '2')?.status).toBe( - CoreToolCallStatus.Cancelled, - ); - expect(completedCalls.find((c) => c.request.callId === '3')?.status).toBe( - CoreToolCallStatus.Cancelled, - ); - }); - - it('should mark tool call as cancelled when abort happens during confirmation error', async () => { - const abortController = new AbortController(); - const abortError = new Error('Abort requested during confirmation'); - const declarativeTool = new AbortDuringConfirmationTool( - abortController, - abortError, - createMockMessageBus(), - ); - - const mockToolRegistry = { - getTool: () => declarativeTool, - getFunctionDeclarations: () => [], - tools: new Map(), - discovery: {}, - registerTool: () => {}, - getToolByName: () => declarativeTool, - getToolByDisplayName: () => declarativeTool, - getTools: () => [], - discoverTools: async () => {}, - getAllTools: () => [], - getToolsByServer: () => [], - } as unknown as ToolRegistry; - - const onAllToolCallsComplete = vi.fn(); - const onToolCallsUpdate = vi.fn(); - - const mockConfig = createMockConfig({ - getToolRegistry: () => mockToolRegistry, - isInteractive: () => true, - }); - - const scheduler = new CoreToolScheduler({ - context: mockConfig, - onAllToolCallsComplete, - onToolCallsUpdate, - getPreferredEditor: () => 'vscode', - }); - - const request = { - callId: 'abort-1', - name: 'abortDuringConfirmationTool', - args: {}, - isClientInitiated: 
false, - prompt_id: 'prompt-id-abort', - }; - - await scheduler.schedule([request], abortController.signal); - - expect(onAllToolCallsComplete).toHaveBeenCalled(); - const completedCalls = onAllToolCallsComplete.mock - .calls[0][0] as ToolCall[]; - expect(completedCalls[0].status).toBe(CoreToolCallStatus.Cancelled); - const statuses = onToolCallsUpdate.mock.calls.flatMap((call) => - (call[0] as ToolCall[]).map((toolCall) => toolCall.status), - ); - expect(statuses).not.toContain(CoreToolCallStatus.Error); - }); - - it('should error when tool requires confirmation in non-interactive mode', async () => { - const mockTool = new MockTool({ - name: 'mockTool', - shouldConfirmExecute: MOCK_TOOL_SHOULD_CONFIRM_EXECUTE, - }); - const declarativeTool = mockTool; - const mockToolRegistry = { - getTool: () => declarativeTool, - getFunctionDeclarations: () => [], - tools: new Map(), - discovery: {}, - registerTool: () => {}, - getToolByName: () => declarativeTool, - getToolByDisplayName: () => declarativeTool, - getTools: () => [], - discoverTools: async () => {}, - getAllTools: () => [], - getToolsByServer: () => [], - } as unknown as ToolRegistry; - - const onAllToolCallsComplete = vi.fn(); - const onToolCallsUpdate = vi.fn(); - - const mockConfig = createMockConfig({ - getToolRegistry: () => mockToolRegistry, - isInteractive: () => false, - }); - - const scheduler = new CoreToolScheduler({ - context: mockConfig, - onAllToolCallsComplete, - onToolCallsUpdate, - getPreferredEditor: () => 'vscode', - }); - - const abortController = new AbortController(); - const request = { - callId: '1', - name: 'mockTool', - args: {}, - isClientInitiated: false, - prompt_id: 'prompt-id-1', - }; - - await scheduler.schedule([request], abortController.signal); - - expect(onAllToolCallsComplete).toHaveBeenCalled(); - const completedCalls = onAllToolCallsComplete.mock - .calls[0][0] as ToolCall[]; - expect(completedCalls[0].status).toBe(CoreToolCallStatus.Error); - - const erroredCall = 
completedCalls[0] as ErroredToolCall; - const errorResponse = erroredCall.response; - const errorParts = errorResponse.responseParts; - // @ts-expect-error - accessing internal structure of FunctionResponsePart - const errorMessage = errorParts[0].functionResponse.response.error; - expect(errorMessage).toContain( - 'Tool execution for "mockTool" requires user confirmation, which is not supported in non-interactive mode.', - ); - }); -}); - -describe('CoreToolScheduler with payload', () => { - it('should update args and diff and execute tool when payload is provided', async () => { - const mockTool = new MockModifiableTool(); - mockTool.executeFn = vi.fn(); - const declarativeTool = mockTool; - const mockToolRegistry = { - getTool: () => declarativeTool, - getFunctionDeclarations: () => [], - tools: new Map(), - discovery: {}, - registerTool: () => {}, - getToolByName: () => declarativeTool, - getToolByDisplayName: () => declarativeTool, - getTools: () => [], - discoverTools: async () => {}, - getAllTools: () => [], - getToolsByServer: () => [], - } as unknown as ToolRegistry; - - const onAllToolCallsComplete = vi.fn(); - const onToolCallsUpdate = vi.fn(); - - const mockConfig = createMockConfig({ - getToolRegistry: () => mockToolRegistry, - }); - const mockMessageBus = createMockMessageBus(); - mockConfig.getMessageBus = vi.fn().mockReturnValue(mockMessageBus); - mockConfig.getEnableHooks = vi.fn().mockReturnValue(false); - mockConfig.getHookSystem = vi - .fn() - .mockReturnValue(new HookSystem(mockConfig)); - - const scheduler = new CoreToolScheduler({ - context: mockConfig, - onAllToolCallsComplete, - onToolCallsUpdate, - getPreferredEditor: () => 'vscode', - }); - - const abortController = new AbortController(); - const request = { - callId: '1', - name: 'mockModifiableTool', - args: {}, - isClientInitiated: false, - prompt_id: 'prompt-id-2', - }; - - await scheduler.schedule([request], abortController.signal); - - const awaitingCall = (await waitForStatus( - 
onToolCallsUpdate, - CoreToolCallStatus.AwaitingApproval, - )) as WaitingToolCall; - const confirmationDetails = awaitingCall.confirmationDetails; - - if (confirmationDetails) { - const payload: ToolConfirmationPayload = { newContent: 'final version' }; - await (confirmationDetails as ToolCallConfirmationDetails).onConfirm( - ToolConfirmationOutcome.ProceedOnce, - payload, - ); - } - - // After internal update, the tool should be awaiting approval again with the NEW content. - const updatedAwaitingCall = (await waitForStatus( - onToolCallsUpdate, - CoreToolCallStatus.AwaitingApproval, - )) as WaitingToolCall; - - // Now confirm for real to execute. - await ( - updatedAwaitingCall.confirmationDetails as ToolCallConfirmationDetails - ).onConfirm(ToolConfirmationOutcome.ProceedOnce); - - // Wait for the tool execution to complete - await vi.waitFor(() => { - expect(onAllToolCallsComplete).toHaveBeenCalled(); - }); - - const completedCalls = onAllToolCallsComplete.mock - .calls[0][0] as ToolCall[]; - expect(completedCalls[0].status).toBe(CoreToolCallStatus.Success); - expect(mockTool.executeFn).toHaveBeenCalledWith({ - newContent: 'final version', - }); - }); -}); - -class MockEditToolInvocation extends BaseToolInvocation< - Record, - ToolResult -> { - constructor(params: Record, messageBus: MessageBus) { - super(params, messageBus); - } - - getDescription(): string { - return 'A mock edit tool invocation'; - } - - override async shouldConfirmExecute( - _abortSignal: AbortSignal, - ): Promise { - return { - type: 'edit', - title: 'Confirm Edit', - fileName: 'test.txt', - filePath: 'test.txt', - fileDiff: - '--- test.txt\n+++ test.txt\n@@ -1,1 +1,1 @@\n-old content\n+new content', - originalContent: 'old content', - newContent: 'new content', - onConfirm: async () => {}, - }; - } - - async execute(_abortSignal: AbortSignal): Promise { - return { - llmContent: 'Edited successfully', - returnDisplay: 'Edited successfully', - }; - } -} - -class MockEditTool extends 
BaseDeclarativeTool< - Record, - ToolResult -> { - constructor(messageBus: MessageBus) { - super( - 'mockEditTool', - 'mockEditTool', - 'A mock edit tool', - Kind.Edit, - {}, - messageBus, - ); - } - - protected createInvocation( - params: Record, - messageBus: MessageBus, - _toolName?: string, - _toolDisplayName?: string, - ): ToolInvocation, ToolResult> { - return new MockEditToolInvocation(params, messageBus); - } -} - -describe('CoreToolScheduler edit cancellation', () => { - it('should preserve diff when an edit is cancelled', async () => { - const mockEditTool = new MockEditTool(createMockMessageBus()); - const mockToolRegistry = { - getTool: () => mockEditTool, - getFunctionDeclarations: () => [], - tools: new Map(), - discovery: {}, - registerTool: () => {}, - getToolByName: () => mockEditTool, - getToolByDisplayName: () => mockEditTool, - getTools: () => [], - discoverTools: async () => {}, - getAllTools: () => [], - getToolsByServer: () => [], - } as unknown as ToolRegistry; - - const onAllToolCallsComplete = vi.fn(); - const onToolCallsUpdate = vi.fn(); - - const mockConfig = createMockConfig({ - getToolRegistry: () => mockToolRegistry, - }); - const mockMessageBus = createMockMessageBus(); - mockConfig.getMessageBus = vi.fn().mockReturnValue(mockMessageBus); - mockConfig.getEnableHooks = vi.fn().mockReturnValue(false); - mockConfig.getHookSystem = vi - .fn() - .mockReturnValue(new HookSystem(mockConfig)); - - const scheduler = new CoreToolScheduler({ - context: mockConfig, - onAllToolCallsComplete, - onToolCallsUpdate, - getPreferredEditor: () => 'vscode', - }); - - const abortController = new AbortController(); - const request = { - callId: '1', - name: 'mockEditTool', - args: {}, - isClientInitiated: false, - prompt_id: 'prompt-id-1', - }; - - await scheduler.schedule([request], abortController.signal); - - const awaitingCall = (await waitForStatus( - onToolCallsUpdate, - CoreToolCallStatus.AwaitingApproval, - )) as WaitingToolCall; - - // Cancel the 
edit - const confirmationDetails = awaitingCall.confirmationDetails; - if (confirmationDetails) { - await (confirmationDetails as ToolCallConfirmationDetails).onConfirm( - ToolConfirmationOutcome.Cancel, - ); - } - - expect(onAllToolCallsComplete).toHaveBeenCalled(); - const completedCalls = onAllToolCallsComplete.mock - .calls[0][0] as ToolCall[]; - - expect(completedCalls[0].status).toBe(CoreToolCallStatus.Cancelled); - - // Check that the diff is preserved - // eslint-disable-next-line @typescript-eslint/no-explicit-any - const cancelledCall = completedCalls[0] as any; - expect(cancelledCall.response.resultDisplay).toBeDefined(); - expect(cancelledCall.response.resultDisplay.fileDiff).toBe( - '--- test.txt\n+++ test.txt\n@@ -1,1 +1,1 @@\n-old content\n+new content', - ); - expect(cancelledCall.response.resultDisplay.fileName).toBe('test.txt'); - }); -}); - -describe('CoreToolScheduler YOLO mode', () => { - it('should execute tool requiring confirmation directly without waiting', async () => { - // Arrange - const executeFn = vi.fn().mockResolvedValue({ - llmContent: 'Tool executed', - returnDisplay: 'Tool executed', - }); - const mockTool = new MockTool({ - name: 'mockTool', - execute: executeFn, - shouldConfirmExecute: MOCK_TOOL_SHOULD_CONFIRM_EXECUTE, - }); - const declarativeTool = mockTool; - - const mockToolRegistry = { - getTool: () => declarativeTool, - getToolByName: () => declarativeTool, - // Other properties are not needed for this test but are included for type consistency. - getFunctionDeclarations: () => [], - tools: new Map(), - discovery: {}, - registerTool: () => {}, - getToolByDisplayName: () => declarativeTool, - getTools: () => [], - discoverTools: async () => {}, - getAllTools: () => [], - getToolsByServer: () => [], - } as unknown as ToolRegistry; - - const onAllToolCallsComplete = vi.fn(); - const onToolCallsUpdate = vi.fn(); - - // Configure the scheduler for YOLO mode. 
- const mockConfig = createMockConfig({ - getToolRegistry: () => mockToolRegistry, - getApprovalMode: () => ApprovalMode.YOLO, - isInteractive: () => false, - }); - const mockMessageBus = createMockMessageBus(); - mockConfig.getMessageBus = vi.fn().mockReturnValue(mockMessageBus); - mockConfig.getEnableHooks = vi.fn().mockReturnValue(false); - mockConfig.getHookSystem = vi - .fn() - .mockReturnValue(new HookSystem(mockConfig)); - - const scheduler = new CoreToolScheduler({ - context: mockConfig, - onAllToolCallsComplete, - onToolCallsUpdate, - getPreferredEditor: () => 'vscode', - }); - - const abortController = new AbortController(); - const request = { - callId: '1', - name: 'mockTool', - args: { param: 'value' }, - isClientInitiated: false, - prompt_id: 'prompt-id-yolo', - }; - - // Act - await scheduler.schedule([request], abortController.signal); - - // Wait for the tool execution to complete - await vi.waitFor(() => { - expect(onAllToolCallsComplete).toHaveBeenCalled(); - }); - - // Assert - // 1. The tool's execute method was called directly. - expect(executeFn).toHaveBeenCalledWith({ param: 'value' }); - - // 2. The tool call status never entered CoreToolCallStatus.AwaitingApproval. - const statusUpdates = onToolCallsUpdate.mock.calls - .map((call) => (call[0][0] as ToolCall)?.status) - .filter(Boolean); - expect(statusUpdates).not.toContain(CoreToolCallStatus.AwaitingApproval); - expect(statusUpdates).toEqual([ - CoreToolCallStatus.Validating, - CoreToolCallStatus.Scheduled, - CoreToolCallStatus.Executing, - CoreToolCallStatus.Success, - ]); - - // 3. The final callback indicates the tool call was successful. 
-    const completedCalls = onAllToolCallsComplete.mock
-      .calls[0][0] as ToolCall[];
-    expect(completedCalls).toHaveLength(1);
-    const completedCall = completedCalls[0];
-    expect(completedCall.status).toBe(CoreToolCallStatus.Success);
-    if (completedCall.status === CoreToolCallStatus.Success) {
-      expect(completedCall.response.resultDisplay).toBe('Tool executed');
-    }
-  });
-});
-
-describe('CoreToolScheduler request queueing', () => {
-  it('should queue a request if another is running', async () => {
-    let resolveFirstCall: (result: ToolResult) => void;
-    const firstCallPromise = new Promise((resolve) => {
-      resolveFirstCall = resolve;
-    });
-
-    const executeFn = vi.fn().mockImplementation(() => firstCallPromise);
-    const mockTool = new MockTool({ name: 'mockTool', execute: executeFn });
-    const declarativeTool = mockTool;
-
-    const mockToolRegistry = {
-      getTool: () => declarativeTool,
-      getToolByName: () => declarativeTool,
-      getFunctionDeclarations: () => [],
-      tools: new Map(),
-      discovery: {},
-      registerTool: () => {},
-      getToolByDisplayName: () => declarativeTool,
-      getTools: () => [],
-      discoverTools: async () => {},
-      getAllTools: () => [],
-      getToolsByServer: () => [],
-    } as unknown as ToolRegistry;
-
-    const onAllToolCallsComplete = vi.fn();
-    const onToolCallsUpdate = vi.fn();
-
-    const mockConfig = createMockConfig({
-      getToolRegistry: () => mockToolRegistry,
-      getApprovalMode: () => ApprovalMode.YOLO, // Use YOLO to avoid confirmation prompts
-      isInteractive: () => false,
-    });
-    const mockMessageBus = createMockMessageBus();
-    mockConfig.getMessageBus = vi.fn().mockReturnValue(mockMessageBus);
-    mockConfig.getEnableHooks = vi.fn().mockReturnValue(false);
-    mockConfig.getHookSystem = vi
-      .fn()
-      .mockReturnValue(new HookSystem(mockConfig));
-
-    const scheduler = new CoreToolScheduler({
-      context: mockConfig,
-      onAllToolCallsComplete,
-      onToolCallsUpdate,
-      getPreferredEditor: () => 'vscode',
-    });
-
-    const abortController = new AbortController();
-
const request1 = { - callId: '1', - name: 'mockTool', - args: { a: 1 }, - isClientInitiated: false, - prompt_id: 'prompt-1', - }; - const request2 = { - callId: '2', - name: 'mockTool', - args: { b: 2 }, - isClientInitiated: false, - prompt_id: 'prompt-2', - }; - - // Schedule the first call, which will pause execution. - // eslint-disable-next-line @typescript-eslint/no-floating-promises - scheduler.schedule([request1], abortController.signal); - - // Wait for the first call to be in the CoreToolCallStatus.Executing state. - await waitForStatus(onToolCallsUpdate, CoreToolCallStatus.Executing); - - // Schedule the second call while the first is "running". - const schedulePromise2 = scheduler.schedule( - [request2], - abortController.signal, - ); - - // Ensure the second tool call hasn't been executed yet. - expect(executeFn).toHaveBeenCalledWith({ a: 1 }); - - // Complete the first tool call. - resolveFirstCall!({ - llmContent: 'First call complete', - returnDisplay: 'First call complete', - }); - - // Wait for the second schedule promise to resolve. - await schedulePromise2; - - // Let the second call finish. - const secondCallResult = { - llmContent: 'Second call complete', - returnDisplay: 'Second call complete', - }; - // Since the mock is shared, we need to resolve the current promise. - // In a real scenario, a new promise would be created for the second call. - resolveFirstCall!(secondCallResult); - - await vi.waitFor(() => { - // Now the second tool call should have been executed. - expect(executeFn).toHaveBeenCalledTimes(2); - }); - expect(executeFn).toHaveBeenCalledWith({ b: 2 }); - - // Wait for the second completion. - await vi.waitFor(() => { - expect(onAllToolCallsComplete).toHaveBeenCalledTimes(2); - }); - - // Verify the completion callbacks were called correctly. 
- expect(onAllToolCallsComplete.mock.calls[0][0][0].status).toBe( - CoreToolCallStatus.Success, - ); - expect(onAllToolCallsComplete.mock.calls[1][0][0].status).toBe( - CoreToolCallStatus.Success, - ); - }); - - it('should auto-approve a tool call if it is on the allowedTools list', async () => { - // Arrange - const executeFn = vi.fn().mockResolvedValue({ - llmContent: 'Tool executed', - returnDisplay: 'Tool executed', - }); - const mockTool = new MockTool({ - name: 'mockTool', - execute: executeFn, - shouldConfirmExecute: MOCK_TOOL_SHOULD_CONFIRM_EXECUTE, - }); - const declarativeTool = mockTool; - - const toolRegistry = { - getTool: () => declarativeTool, - getToolByName: () => declarativeTool, - getFunctionDeclarations: () => [], - tools: new Map(), - discovery: {}, - registerTool: () => {}, - getToolByDisplayName: () => declarativeTool, - getTools: () => [], - discoverTools: async () => {}, - getAllTools: () => [], - getToolsByServer: () => [], - } as unknown as ToolRegistry; - - const onAllToolCallsComplete = vi.fn(); - const onToolCallsUpdate = vi.fn(); - - // Configure the scheduler to auto-approve the specific tool call. 
- const mockConfig = createMockConfig({ - getAllowedTools: () => ['mockTool'], // Auto-approve this tool - getToolRegistry: () => toolRegistry, - getShellExecutionConfig: () => ({ - terminalWidth: 80, - terminalHeight: 24, - sanitizationConfig: { - enableEnvironmentVariableRedaction: true, - allowedEnvironmentVariables: [], - blockedEnvironmentVariables: [], - }, - sandboxManager: new NoopSandboxManager(), - }), - isInteractive: () => false, - }); - const mockMessageBus = createMockMessageBus(); - mockConfig.getMessageBus = vi.fn().mockReturnValue(mockMessageBus); - mockConfig.getEnableHooks = vi.fn().mockReturnValue(false); - mockConfig.getHookSystem = vi - .fn() - .mockReturnValue(new HookSystem(mockConfig)); - - const scheduler = new CoreToolScheduler({ - context: mockConfig, - onAllToolCallsComplete, - onToolCallsUpdate, - getPreferredEditor: () => 'vscode', - }); - - const abortController = new AbortController(); - const request = { - callId: '1', - name: 'mockTool', - args: { param: 'value' }, - isClientInitiated: false, - prompt_id: 'prompt-auto-approved', - }; - - // Act - await scheduler.schedule([request], abortController.signal); - - // Wait for the tool execution to complete - await vi.waitFor(() => { - expect(onAllToolCallsComplete).toHaveBeenCalled(); - }); - - // Assert - // 1. The tool's execute method was called directly. - expect(executeFn).toHaveBeenCalledWith({ param: 'value' }); - - // 2. The tool call status never entered CoreToolCallStatus.AwaitingApproval. - const statusUpdates = onToolCallsUpdate.mock.calls - .map((call) => (call[0][0] as ToolCall)?.status) - .filter(Boolean); - expect(statusUpdates).not.toContain(CoreToolCallStatus.AwaitingApproval); - expect(statusUpdates).toEqual([ - CoreToolCallStatus.Validating, - CoreToolCallStatus.Scheduled, - CoreToolCallStatus.Executing, - CoreToolCallStatus.Success, - ]); - - // 3. The final callback indicates the tool call was successful. 
- expect(onAllToolCallsComplete).toHaveBeenCalled(); - const completedCalls = onAllToolCallsComplete.mock - .calls[0][0] as ToolCall[]; - expect(completedCalls).toHaveLength(1); - const completedCall = completedCalls[0]; - expect(completedCall.status).toBe(CoreToolCallStatus.Success); - if (completedCall.status === CoreToolCallStatus.Success) { - expect(completedCall.response.resultDisplay).toBe('Tool executed'); - } - }); - - it('should require approval for a chained shell command even when prefix is allowlisted', async () => { - const executeFn = vi.fn().mockResolvedValue({ - llmContent: 'Shell command executed', - returnDisplay: 'Shell command executed', - }); - - const mockShellTool = new MockTool({ - name: 'run_shell_command', - shouldConfirmExecute: (params) => - Promise.resolve({ - type: 'exec', - title: 'Confirm Shell Command', - command: String(params['command'] ?? ''), - rootCommand: 'git', - rootCommands: ['git'], - onConfirm: async () => {}, - }), - execute: () => executeFn({}), - }); - - const toolRegistry = { - getTool: () => mockShellTool, - getToolByName: () => mockShellTool, - getFunctionDeclarations: () => [], - tools: new Map(), - discovery: {}, - registerTool: () => {}, - getToolByDisplayName: () => mockShellTool, - getTools: () => [], - discoverTools: async () => {}, - getAllTools: () => [], - getToolsByServer: () => [], - } as unknown as ToolRegistry; - - const onAllToolCallsComplete = vi.fn(); - const onToolCallsUpdate = vi.fn(); - - const mockConfig = createMockConfig({ - getAllowedTools: () => ['run_shell_command(git)'], - getShellExecutionConfig: () => ({ - terminalWidth: 80, - terminalHeight: 24, - sanitizationConfig: { - enableEnvironmentVariableRedaction: true, - allowedEnvironmentVariables: [], - blockedEnvironmentVariables: [], - }, - sandboxManager: new NoopSandboxManager(), - }), - getToolRegistry: () => toolRegistry, - getHookSystem: () => undefined, - getPolicyEngine: () => - ({ - check: async () => ({ decision: 
PolicyDecision.ASK_USER }), - }) as unknown as PolicyEngine, - }); - - const scheduler = new CoreToolScheduler({ - context: mockConfig, - onAllToolCallsComplete, - onToolCallsUpdate, - getPreferredEditor: () => 'vscode', - }); - - const abortController = new AbortController(); - const request = { - callId: 'shell-1', - name: 'run_shell_command', - args: { command: 'git status && rm -rf /tmp/should-not-run' }, - isClientInitiated: false, - prompt_id: 'prompt-shell-auto-approved', - }; - - await scheduler.schedule([request], abortController.signal); - - const statusUpdates = onToolCallsUpdate.mock.calls - .map((call) => (call[0][0] as ToolCall)?.status) - .filter(Boolean); - - expect(statusUpdates).toContain(CoreToolCallStatus.AwaitingApproval); - expect(executeFn).not.toHaveBeenCalled(); - expect(onAllToolCallsComplete).not.toHaveBeenCalled(); - }, 20000); - - it('should handle two synchronous calls to schedule', async () => { - const executeFn = vi.fn().mockResolvedValue({ - llmContent: 'Tool executed', - returnDisplay: 'Tool executed', - }); - const mockTool = new MockTool({ name: 'mockTool', execute: executeFn }); - const declarativeTool = mockTool; - const mockToolRegistry = { - getTool: () => declarativeTool, - getToolByName: () => declarativeTool, - getFunctionDeclarations: () => [], - tools: new Map(), - discovery: {}, - registerTool: () => {}, - getToolByDisplayName: () => declarativeTool, - getTools: () => [], - discoverTools: async () => {}, - getAllTools: () => [], - getToolsByServer: () => [], - } as unknown as ToolRegistry; - const onAllToolCallsComplete = vi.fn(); - const onToolCallsUpdate = vi.fn(); - - const mockConfig = createMockConfig({ - getToolRegistry: () => mockToolRegistry, - getApprovalMode: () => ApprovalMode.YOLO, - }); - const mockMessageBus = createMockMessageBus(); - mockConfig.getMessageBus = vi.fn().mockReturnValue(mockMessageBus); - mockConfig.getEnableHooks = vi.fn().mockReturnValue(false); - mockConfig.getHookSystem = vi - .fn() - 
.mockReturnValue(new HookSystem(mockConfig)); - - const scheduler = new CoreToolScheduler({ - context: mockConfig, - onAllToolCallsComplete, - onToolCallsUpdate, - getPreferredEditor: () => 'vscode', - }); - - const abortController = new AbortController(); - const request1 = { - callId: '1', - name: 'mockTool', - args: { a: 1 }, - isClientInitiated: false, - prompt_id: 'prompt-1', - }; - const request2 = { - callId: '2', - name: 'mockTool', - args: { b: 2 }, - isClientInitiated: false, - prompt_id: 'prompt-2', - }; - - // Schedule two calls synchronously. - const schedulePromise1 = scheduler.schedule( - [request1], - abortController.signal, - ); - const schedulePromise2 = scheduler.schedule( - [request2], - abortController.signal, - ); - - // Wait for both promises to resolve. - await Promise.all([schedulePromise1, schedulePromise2]); - - // Ensure the tool was called twice with the correct arguments. - expect(executeFn).toHaveBeenCalledTimes(2); - expect(executeFn).toHaveBeenCalledWith({ a: 1 }); - expect(executeFn).toHaveBeenCalledWith({ b: 2 }); - - // Ensure completion callbacks were called twice. 
- expect(onAllToolCallsComplete).toHaveBeenCalledTimes(2); - }); - - it('should auto-approve remaining tool calls when first tool call is approved with ProceedAlways', async () => { - let approvalMode = ApprovalMode.DEFAULT; - const mockConfig = createMockConfig({ - getApprovalMode: () => approvalMode, - setApprovalMode: (mode: ApprovalMode) => { - approvalMode = mode; - }, - }); - const mockMessageBus = createMockMessageBus(); - mockConfig.getMessageBus = vi.fn().mockReturnValue(mockMessageBus); - mockConfig.getEnableHooks = vi.fn().mockReturnValue(false); - mockConfig.getHookSystem = vi - .fn() - .mockReturnValue(new HookSystem(mockConfig)); - - const testTool = new TestApprovalTool(mockConfig, mockMessageBus); - const toolRegistry = { - getTool: () => testTool, - getFunctionDeclarations: () => [], - getFunctionDeclarationsFiltered: () => [], - registerTool: () => {}, - discoverAllTools: async () => {}, - discoverMcpTools: async () => {}, - discoverToolsForServer: async () => {}, - removeMcpToolsByServer: () => {}, - getAllTools: () => [], - getToolsByServer: () => [], - tools: new Map(), - context: mockConfig, - mcpClientManager: undefined, - getToolByName: () => testTool, - getToolByDisplayName: () => testTool, - getTools: () => [], - discoverTools: async () => {}, - discovery: {}, - } as unknown as ToolRegistry; - - mockConfig.getToolRegistry = () => toolRegistry; - - const onAllToolCallsComplete = vi.fn(); - const onToolCallsUpdate = vi.fn(); - const pendingConfirmations: Array< - (outcome: ToolConfirmationOutcome) => void - > = []; - - const scheduler = new CoreToolScheduler({ - context: mockConfig, - onAllToolCallsComplete, - onToolCallsUpdate: (toolCalls) => { - onToolCallsUpdate(toolCalls); - // Capture confirmation handlers for awaiting_approval tools - toolCalls.forEach((call) => { - if (call.status === CoreToolCallStatus.AwaitingApproval) { - const waitingCall = call; - const details = - waitingCall.confirmationDetails as ToolCallConfirmationDetails; - 
if (details?.onConfirm) { - const originalHandler = pendingConfirmations.find( - (h) => h === details.onConfirm, - ); - if (!originalHandler) { - pendingConfirmations.push(details.onConfirm); - } - } - } - }); - }, - getPreferredEditor: () => 'vscode', - }); - - const abortController = new AbortController(); - - // Schedule multiple tools that need confirmation - const requests = [ - { - callId: '1', - name: 'testApprovalTool', - args: { id: 'first' }, - isClientInitiated: false, - prompt_id: 'prompt-1', - }, - { - callId: '2', - name: 'testApprovalTool', - args: { id: 'second' }, - isClientInitiated: false, - prompt_id: 'prompt-2', - }, - { - callId: '3', - name: 'testApprovalTool', - args: { id: 'third' }, - isClientInitiated: false, - prompt_id: 'prompt-3', - }, - ]; - - await scheduler.schedule(requests, abortController.signal); - - // Wait for the FIRST tool to be awaiting approval - await vi.waitFor(() => { - const calls = onToolCallsUpdate.mock.calls.at(-1)?.[0] as ToolCall[]; - // With the sequential scheduler, the update includes the active call and the queue. 
- expect(calls?.length).toBe(3); - expect(calls?.[0].status).toBe(CoreToolCallStatus.AwaitingApproval); - expect(calls?.[0].request.callId).toBe('1'); - // Check that the other two are in the queue (still in CoreToolCallStatus.Validating state) - expect(calls?.[1].status).toBe(CoreToolCallStatus.Validating); - expect(calls?.[2].status).toBe(CoreToolCallStatus.Validating); - }); - - expect(pendingConfirmations.length).toBe(1); - - // Approve the first tool with ProceedAlways - const firstConfirmation = pendingConfirmations[0]; - firstConfirmation(ToolConfirmationOutcome.ProceedAlways); - - // Wait for all tools to be completed - await vi.waitFor(() => { - expect(onAllToolCallsComplete).toHaveBeenCalled(); - }); - - const completedCalls = onAllToolCallsComplete.mock.calls.at( - -1, - )?.[0] as ToolCall[]; - expect(completedCalls?.length).toBe(3); - expect( - completedCalls?.every( - (call) => call.status === CoreToolCallStatus.Success, - ), - ).toBe(true); - - // Verify approval mode was changed - expect(approvalMode).toBe(ApprovalMode.AUTO_EDIT); - }); -}); - -describe('CoreToolScheduler Sequential Execution', () => { - it('should execute tool calls in a batch sequentially', async () => { - // Arrange - let firstCallFinished = false; - const executeFn = vi - .fn() - .mockImplementation(async (args: { call: number }) => { - if (args.call === 1) { - // First call, wait for a bit to simulate work - await new Promise((resolve) => setTimeout(resolve, 50)); - firstCallFinished = true; - return { llmContent: 'First call done' }; - } - if (args.call === 2) { - // Second call, should only happen after the first is finished - if (!firstCallFinished) { - throw new Error( - 'Second tool call started before the first one finished!', - ); - } - return { llmContent: 'Second call done' }; - } - return { llmContent: 'default' }; - }); - - const mockTool = new MockTool({ name: 'mockTool', execute: executeFn }); - const declarativeTool = mockTool; - - const mockToolRegistry = { - 
getTool: () => declarativeTool, - getToolByName: () => declarativeTool, - getFunctionDeclarations: () => [], - tools: new Map(), - discovery: {}, - registerTool: () => {}, - getToolByDisplayName: () => declarativeTool, - getTools: () => [], - discoverTools: async () => {}, - getAllTools: () => [], - getToolsByServer: () => [], - } as unknown as ToolRegistry; - - const onAllToolCallsComplete = vi.fn(); - const onToolCallsUpdate = vi.fn(); - - const mockConfig = createMockConfig({ - getToolRegistry: () => mockToolRegistry, - getApprovalMode: () => ApprovalMode.YOLO, // Use YOLO to avoid confirmation prompts - isInteractive: () => false, - }); - const mockMessageBus = createMockMessageBus(); - mockConfig.getMessageBus = vi.fn().mockReturnValue(mockMessageBus); - mockConfig.getEnableHooks = vi.fn().mockReturnValue(false); - mockConfig.getHookSystem = vi - .fn() - .mockReturnValue(new HookSystem(mockConfig)); - - const scheduler = new CoreToolScheduler({ - context: mockConfig, - onAllToolCallsComplete, - onToolCallsUpdate, - getPreferredEditor: () => 'vscode', - }); - - const abortController = new AbortController(); - const requests = [ - { - callId: '1', - name: 'mockTool', - args: { call: 1 }, - isClientInitiated: false, - prompt_id: 'prompt-1', - }, - { - callId: '2', - name: 'mockTool', - args: { call: 2 }, - isClientInitiated: false, - prompt_id: 'prompt-1', - }, - ]; - - // Act - await scheduler.schedule(requests, abortController.signal); - - // Assert - await vi.waitFor(() => { - expect(onAllToolCallsComplete).toHaveBeenCalled(); - }); - - // Check that execute was called twice - expect(executeFn).toHaveBeenCalledTimes(2); - - // Check the order of calls - const calls = executeFn.mock.calls; - expect(calls[0][0]).toEqual({ call: 1 }); - expect(calls[1][0]).toEqual({ call: 2 }); - - // The onAllToolCallsComplete should be called once with both results - const completedCalls = onAllToolCallsComplete.mock - .calls[0][0] as ToolCall[]; - 
expect(completedCalls).toHaveLength(2); - expect(completedCalls[0].status).toBe(CoreToolCallStatus.Success); - expect(completedCalls[1].status).toBe(CoreToolCallStatus.Success); - }); - - it('should cancel subsequent tools when the signal is aborted.', async () => { - // Arrange - const abortController = new AbortController(); - let secondCallStarted = false; - - const executeFn = vi - .fn() - .mockImplementation(async (args: { call: number }) => { - if (args.call === 1) { - return { llmContent: 'First call done' }; - } - if (args.call === 2) { - secondCallStarted = true; - // This call will be cancelled while it's "running". - await new Promise((resolve) => setTimeout(resolve, 100)); - // It should not return a value because it will be cancelled. - return { llmContent: 'Second call should not complete' }; - } - if (args.call === 3) { - return { llmContent: 'Third call done' }; - } - return { llmContent: 'default' }; - }); - - const mockTool = new MockTool({ name: 'mockTool', execute: executeFn }); - const declarativeTool = mockTool; - - const mockToolRegistry = { - getTool: () => declarativeTool, - getToolByName: () => declarativeTool, - getFunctionDeclarations: () => [], - tools: new Map(), - discovery: {}, - registerTool: () => {}, - getToolByDisplayName: () => declarativeTool, - getTools: () => [], - discoverTools: async () => {}, - getAllTools: () => [], - getToolsByServer: () => [], - } as unknown as ToolRegistry; - - const onAllToolCallsComplete = vi.fn(); - const onToolCallsUpdate = vi.fn(); - - const mockConfig = createMockConfig({ - getToolRegistry: () => mockToolRegistry, - getApprovalMode: () => ApprovalMode.YOLO, - isInteractive: () => false, - }); - const mockMessageBus = createMockMessageBus(); - mockConfig.getMessageBus = vi.fn().mockReturnValue(mockMessageBus); - mockConfig.getEnableHooks = vi.fn().mockReturnValue(false); - mockConfig.getHookSystem = vi - .fn() - .mockReturnValue(new HookSystem(mockConfig)); - - const scheduler = new 
CoreToolScheduler({ - context: mockConfig, - onAllToolCallsComplete, - onToolCallsUpdate, - getPreferredEditor: () => 'vscode', - }); - - const requests = [ - { - callId: '1', - name: 'mockTool', - args: { call: 1 }, - isClientInitiated: false, - prompt_id: 'prompt-1', - }, - { - callId: '2', - name: 'mockTool', - args: { call: 2 }, - isClientInitiated: false, - prompt_id: 'prompt-1', - }, - { - callId: '3', - name: 'mockTool', - args: { call: 3 }, - isClientInitiated: false, - prompt_id: 'prompt-1', - }, - ]; - - // Act - const schedulePromise = scheduler.schedule( - requests, - abortController.signal, - ); - - // Wait for the second call to start, then abort. - await vi.waitFor(() => { - expect(secondCallStarted).toBe(true); - }); - abortController.abort(); - - await schedulePromise; - - // Assert - await vi.waitFor(() => { - expect(onAllToolCallsComplete).toHaveBeenCalled(); - }); - - // Check that execute was called for the first two tools only - expect(executeFn).toHaveBeenCalledTimes(2); - expect(executeFn).toHaveBeenCalledWith({ call: 1 }); - expect(executeFn).toHaveBeenCalledWith({ call: 2 }); - - const completedCalls = onAllToolCallsComplete.mock - .calls[0][0] as ToolCall[]; - expect(completedCalls).toHaveLength(3); - - const call1 = completedCalls.find((c) => c.request.callId === '1'); - const call2 = completedCalls.find((c) => c.request.callId === '2'); - const call3 = completedCalls.find((c) => c.request.callId === '3'); - - expect(call1?.status).toBe(CoreToolCallStatus.Success); - expect(call2?.status).toBe(CoreToolCallStatus.Cancelled); - expect(call3?.status).toBe(CoreToolCallStatus.Cancelled); - }); - - it('should pass confirmation diff data into modifyWithEditor overrides', async () => { - const modifyWithEditorSpy = vi - .spyOn(modifiableToolModule, 'modifyWithEditor') - .mockResolvedValue({ - updatedParams: { param: 'updated' }, - updatedDiff: 'updated diff', - }); - - const mockModifiableTool = new MockModifiableTool('mockModifiableTool'); - 
const mockToolRegistry = { - getTool: () => mockModifiableTool, - getToolByName: () => mockModifiableTool, - getFunctionDeclarations: () => [], - tools: new Map(), - discovery: {}, - registerTool: () => {}, - getToolByDisplayName: () => mockModifiableTool, - getTools: () => [], - discoverTools: async () => {}, - getAllTools: () => [], - getToolsByServer: () => [], - } as unknown as ToolRegistry; - - const onAllToolCallsComplete = vi.fn(); - const onToolCallsUpdate = vi.fn(); - - const mockConfig = createMockConfig({ - getToolRegistry: () => mockToolRegistry, - }); - const mockMessageBus = createMockMessageBus(); - mockConfig.getMessageBus = vi.fn().mockReturnValue(mockMessageBus); - mockConfig.getEnableHooks = vi.fn().mockReturnValue(false); - mockConfig.getHookSystem = vi - .fn() - .mockReturnValue(new HookSystem(mockConfig)); - - const scheduler = new CoreToolScheduler({ - context: mockConfig, - onAllToolCallsComplete, - onToolCallsUpdate, - getPreferredEditor: () => 'vscode', - }); - - const abortController = new AbortController(); - - await scheduler.schedule( - [ - { - callId: '1', - name: 'mockModifiableTool', - args: {}, - isClientInitiated: false, - prompt_id: 'prompt-1', - }, - ], - abortController.signal, - ); - - const toolCall = (scheduler as unknown as { toolCalls: ToolCall[] }) - .toolCalls[0] as WaitingToolCall; - expect(toolCall.status).toBe(CoreToolCallStatus.AwaitingApproval); - - const confirmationSignal = new AbortController().signal; - await scheduler.handleConfirmationResponse( - toolCall.request.callId, - async () => {}, - ToolConfirmationOutcome.ModifyWithEditor, - confirmationSignal, - ); - - expect(modifyWithEditorSpy).toHaveBeenCalled(); - const overrides = - modifyWithEditorSpy.mock.calls[ - modifyWithEditorSpy.mock.calls.length - 1 - ][4]; - expect(overrides).toEqual({ - currentContent: 'originalContent', - proposedContent: 'newContent', - }); - - modifyWithEditorSpy.mockRestore(); - }); - - it('should handle inline modify with empty 
new content', async () => { - // Mock the modifiable check to return true for this test - const isModifiableSpy = vi - .spyOn(modifiableToolModule, 'isModifiableDeclarativeTool') - .mockReturnValue(true); - - const mockTool = new MockModifiableTool(); - const mockToolRegistry = { - getTool: () => mockTool, - getAllToolNames: () => [], - } as unknown as ToolRegistry; - - const mockConfig = createMockConfig({ - getToolRegistry: () => mockToolRegistry, - isInteractive: () => true, - }); - mockConfig.getHookSystem = vi.fn().mockReturnValue(undefined); - - const scheduler = new CoreToolScheduler({ - context: mockConfig, - getPreferredEditor: () => 'vscode', - }); - - // Manually inject a waiting tool call - const callId = 'call-1'; - const toolCall: WaitingToolCall = { - status: CoreToolCallStatus.AwaitingApproval, - request: { - callId, - name: 'mockModifiableTool', - args: {}, - isClientInitiated: false, - prompt_id: 'p1', - }, - tool: mockTool, - invocation: {} as unknown as ToolInvocation< - Record<string, unknown>, - ToolResult - >, - confirmationDetails: { - type: 'edit', - title: 'Confirm', - fileName: 'test.txt', - filePath: 'test.txt', - fileDiff: 'diff', - originalContent: 'old', - newContent: 'new', - onConfirm: async () => {}, - }, - startTime: Date.now(), - }; - - const schedulerInternals = scheduler as unknown as { - toolCalls: ToolCall[]; - toolModifier: { applyInlineModify: Mock }; - }; - schedulerInternals.toolCalls = [toolCall]; - - const applyInlineModifySpy = vi - .spyOn(schedulerInternals.toolModifier, 'applyInlineModify') - .mockResolvedValue({ - updatedParams: { content: '' }, - updatedDiff: 'diff-empty', - }); - - await scheduler.handleConfirmationResponse( - callId, - async () => {}, - ToolConfirmationOutcome.ProceedOnce, - new AbortController().signal, - { newContent: '' } as ToolConfirmationPayload, - ); - - expect(applyInlineModifySpy).toHaveBeenCalled(); - isModifiableSpy.mockRestore(); - }); - - it('should pass serverName and toolAnnotations to policy 
engine for DiscoveredMCPTool', async () => { - const mockMcpTool = { - tool: async () => ({ functionDeclarations: [] }), - callTool: async () => [], - }; - const serverName = 'test-server'; - const toolName = 'test-tool'; - const annotations = { readOnlyHint: true }; - const mcpTool = new DiscoveredMCPTool( - mockMcpTool as unknown as CallableTool, - serverName, - toolName, - 'description', - { type: 'object', properties: {} }, - createMockMessageBus() as unknown as MessageBus, - undefined, // trust - true, // isReadOnly - undefined, // nameOverride - undefined, // cliConfig - undefined, // extensionName - undefined, // extensionId - annotations, // toolAnnotations - ); - - const mockToolRegistry = { - getTool: () => mcpTool, - getFunctionDeclarations: () => [], - tools: new Map(), - discovery: {}, - registerTool: () => {}, - getToolByName: () => mcpTool, - getToolByDisplayName: () => mcpTool, - getTools: () => [], - discoverTools: async () => {}, - getAllTools: () => [], - getToolsByServer: () => [], - } as unknown as ToolRegistry; - - const mockPolicyEngineCheck = vi.fn().mockResolvedValue({ - decision: PolicyDecision.ALLOW, - }); - - const mockConfig = createMockConfig({ - getToolRegistry: () => mockToolRegistry, - getPolicyEngine: () => - ({ - check: mockPolicyEngineCheck, - }) as unknown as PolicyEngine, - isInteractive: () => false, - }); - mockConfig.getHookSystem = vi.fn().mockReturnValue(undefined); - - const scheduler = new CoreToolScheduler({ - context: mockConfig, - getPreferredEditor: () => 'vscode', - }); - - const abortController = new AbortController(); - const request = { - callId: '1', - name: toolName, - args: {}, - isClientInitiated: false, - prompt_id: 'prompt-id-1', - }; - - await scheduler.schedule(request, abortController.signal); - - expect(mockPolicyEngineCheck).toHaveBeenCalledWith( - expect.objectContaining({ name: toolName }), - serverName, - annotations, - ); - }); - - it('should not double-report completed tools when concurrent 
completions occur', async () => { - // Arrange - const executeFn = vi - .fn() - .mockResolvedValue({ llmContent: CoreToolCallStatus.Success }); - const mockTool = new MockTool({ name: 'mockTool', execute: executeFn }); - const declarativeTool = mockTool; - - const mockToolRegistry = { - getTool: () => declarativeTool, - getToolByName: () => declarativeTool, - getFunctionDeclarations: () => [], - tools: new Map(), - discovery: {}, - registerTool: () => {}, - getToolByDisplayName: () => declarativeTool, - getTools: () => [], - discoverTools: async () => {}, - getAllTools: () => [], - getToolsByServer: () => [], - } as unknown as ToolRegistry; - - let completionCallCount = 0; - const onAllToolCallsComplete = vi.fn().mockImplementation(async () => { - completionCallCount++; - // Simulate slow reporting (e.g. Gemini API call) - await new Promise((resolve) => setTimeout(resolve, 50)); - }); - - const mockConfig = createMockConfig({ - getToolRegistry: () => mockToolRegistry, - getApprovalMode: () => ApprovalMode.YOLO, - isInteractive: () => false, - }); - const mockMessageBus = createMockMessageBus(); - mockConfig.getMessageBus = vi.fn().mockReturnValue(mockMessageBus); - mockConfig.getEnableHooks = vi.fn().mockReturnValue(false); - mockConfig.getHookSystem = vi - .fn() - .mockReturnValue(new HookSystem(mockConfig)); - - const scheduler = new CoreToolScheduler({ - context: mockConfig, - onAllToolCallsComplete, - getPreferredEditor: () => 'vscode', - }); - - const abortController = new AbortController(); - const request = { - callId: '1', - name: 'mockTool', - args: {}, - isClientInitiated: false, - prompt_id: 'prompt-1', - }; - - // Act - // 1. Start execution - const schedulePromise = scheduler.schedule( - [request], - abortController.signal, - ); - - // 2. Wait just enough for it to finish and enter checkAndNotifyCompletion - // (awaiting our slow mock) - await vi.waitFor(() => { - expect(completionCallCount).toBe(1); - }); - - // 3. 
Trigger a concurrent completion event (e.g. via cancelAll) - scheduler.cancelAll(abortController.signal); - - await schedulePromise; - - // Assert - // Even though cancelAll was called while the first completion was in progress, - // it should not have triggered a SECOND completion call because the first one - // was still 'finalizing' and will drain any new tools. - expect(onAllToolCallsComplete).toHaveBeenCalledTimes(1); - }); - - it('should complete reporting all tools even mid-callback during abort', async () => { - // Arrange - const onAllToolCallsComplete = vi.fn().mockImplementation(async () => { - // Simulate slow reporting - await new Promise((resolve) => setTimeout(resolve, 50)); - }); - - const mockTool = new MockTool({ name: 'mockTool' }); - const mockToolRegistry = { - getTool: () => mockTool, - getToolByName: () => mockTool, - getFunctionDeclarations: () => [], - tools: new Map(), - discovery: {}, - registerTool: () => {}, - getToolByDisplayName: () => mockTool, - getTools: () => [], - discoverTools: async () => {}, - getAllTools: () => [], - getToolsByServer: () => [], - } as unknown as ToolRegistry; - - const mockConfig = createMockConfig({ - getToolRegistry: () => mockToolRegistry, - getApprovalMode: () => ApprovalMode.YOLO, - isInteractive: () => false, - }); - mockConfig.getHookSystem = vi.fn().mockReturnValue(undefined); - - const scheduler = new CoreToolScheduler({ - context: mockConfig, - onAllToolCallsComplete, - getPreferredEditor: () => 'vscode', - }); - - const abortController = new AbortController(); - const signal = abortController.signal; - - // Act - // 1. Start execution of two tools - const schedulePromise = scheduler.schedule( - [ - { - callId: '1', - name: 'mockTool', - args: {}, - isClientInitiated: false, - prompt_id: 'prompt-1', - }, - { - callId: '2', - name: 'mockTool', - args: {}, - isClientInitiated: false, - prompt_id: 'prompt-1', - }, - ], - signal, - ); - - // 2. 
Wait for reporting to start - await vi.waitFor(() => { - expect(onAllToolCallsComplete).toHaveBeenCalled(); - }); - - // 3. Abort the signal while reporting is in progress - abortController.abort(); - - await schedulePromise; - - // Assert - // Verify that onAllToolCallsComplete was called and processed the tools, - // and that the scheduler didn't just drop them because of the abort. - expect(onAllToolCallsComplete).toHaveBeenCalled(); - - const reportedTools = onAllToolCallsComplete.mock.calls.flatMap((call) => - // eslint-disable-next-line @typescript-eslint/no-explicit-any - call[0].map((t: any) => t.request.callId), - ); - - // Both tools should have been reported exactly once with success status - expect(reportedTools).toContain('1'); - expect(reportedTools).toContain('2'); - - const allStatuses = onAllToolCallsComplete.mock.calls.flatMap((call) => - // eslint-disable-next-line @typescript-eslint/no-explicit-any - call[0].map((t: any) => t.status), - ); - expect(allStatuses).toEqual([ - CoreToolCallStatus.Success, - CoreToolCallStatus.Success, - ]); - - expect(onAllToolCallsComplete).toHaveBeenCalledTimes(1); - }); - - describe('Policy Decisions in Plan Mode', () => { - it('should return POLICY_VIOLATION error type and informative message when denied in Plan Mode', async () => { - const mockTool = new MockTool({ - name: 'dangerous_tool', - displayName: 'Dangerous Tool', - description: 'Does risky stuff', - }); - const mockToolRegistry = { - getTool: () => mockTool, - getAllToolNames: () => ['dangerous_tool'], - } as unknown as ToolRegistry; - - const onAllToolCallsComplete = vi.fn(); - - const mockConfig = createMockConfig({ - getToolRegistry: () => mockToolRegistry, - getApprovalMode: () => ApprovalMode.PLAN, - getPolicyEngine: () => - ({ - check: async () => ({ decision: PolicyDecision.DENY }), - }) as unknown as PolicyEngine, - }); - mockConfig.getHookSystem = vi.fn().mockReturnValue(undefined); - - const scheduler = new CoreToolScheduler({ - context: 
mockConfig, - onAllToolCallsComplete, - getPreferredEditor: () => 'vscode', - }); - - const request = { - callId: 'call-1', - name: 'dangerous_tool', - args: {}, - isClientInitiated: false, - prompt_id: 'prompt-1', - }; - - await scheduler.schedule(request, new AbortController().signal); - - expect(onAllToolCallsComplete).toHaveBeenCalledTimes(1); - const reportedTools = onAllToolCallsComplete.mock.calls[0][0]; - const result = reportedTools[0]; - - expect(result.status).toBe(CoreToolCallStatus.Error); - expect(result.response.errorType).toBe(ToolErrorType.POLICY_VIOLATION); - expect(result.response.error.message).toBe( - 'Tool execution denied by policy.', - ); - }); - - it('should return custom deny message when denied in Plan Mode with a specific rule message', async () => { - const mockTool = new MockTool({ - name: 'dangerous_tool', - displayName: 'Dangerous Tool', - description: 'Does risky stuff', - }); - const mockToolRegistry = { - getTool: () => mockTool, - getAllToolNames: () => ['dangerous_tool'], - } as unknown as ToolRegistry; - - const onAllToolCallsComplete = vi.fn(); - const customDenyMessage = 'Custom denial message for testing'; - - const mockConfig = createMockConfig({ - getToolRegistry: () => mockToolRegistry, - getApprovalMode: () => ApprovalMode.PLAN, - getPolicyEngine: () => - ({ - check: async () => ({ - decision: PolicyDecision.DENY, - rule: { denyMessage: customDenyMessage }, - }), - }) as unknown as PolicyEngine, - }); - mockConfig.getHookSystem = vi.fn().mockReturnValue(undefined); - - const scheduler = new CoreToolScheduler({ - context: mockConfig, - onAllToolCallsComplete, - getPreferredEditor: () => 'vscode', - }); - - const request = { - callId: 'call-1', - name: 'dangerous_tool', - args: {}, - isClientInitiated: false, - prompt_id: 'prompt-1', - }; - - await scheduler.schedule(request, new AbortController().signal); - - expect(onAllToolCallsComplete).toHaveBeenCalledTimes(1); - const reportedTools = 
onAllToolCallsComplete.mock.calls[0][0]; - const result = reportedTools[0]; - - expect(result.status).toBe(CoreToolCallStatus.Error); - expect(result.response.errorType).toBe(ToolErrorType.POLICY_VIOLATION); - expect(result.response.error.message).toBe( - `Tool execution denied by policy. ${customDenyMessage}`, - ); - }); - }); - - describe('ApprovalMode Preservation', () => { - it('should preserve approvalMode throughout tool lifecycle', async () => { - // Arrange - const executeFn = vi.fn().mockResolvedValue({ - llmContent: 'Tool executed', - returnDisplay: 'Tool executed', - }); - const mockTool = new MockTool({ - name: 'mockTool', - execute: executeFn, - shouldConfirmExecute: MOCK_TOOL_SHOULD_CONFIRM_EXECUTE, - }); - - const mockToolRegistry = { - getTool: () => mockTool, - getAllToolNames: () => ['mockTool'], - } as unknown as ToolRegistry; - - const onAllToolCallsComplete = vi.fn(); - const onToolCallsUpdate = vi.fn(); - - // Set approval mode to PLAN - const mockConfig = createMockConfig({ - getToolRegistry: () => mockToolRegistry, - getApprovalMode: () => ApprovalMode.PLAN, - // Ensure policy engine returns ASK_USER to trigger AwaitingApproval state - getPolicyEngine: () => - ({ - check: async () => ({ decision: PolicyDecision.ASK_USER }), - }) as unknown as PolicyEngine, - }); - mockConfig.getHookSystem = vi.fn().mockReturnValue(undefined); - - const scheduler = new CoreToolScheduler({ - context: mockConfig, - onAllToolCallsComplete, - onToolCallsUpdate, - getPreferredEditor: () => 'vscode', - }); - - const abortController = new AbortController(); - const request = { - callId: '1', - name: 'mockTool', - args: { param: 'value' }, - isClientInitiated: false, - prompt_id: 'test-prompt', - }; - - // Act - Schedule - const schedulePromise = scheduler.schedule( - request, - abortController.signal, - ); - - // Assert - Check AwaitingApproval state - const awaitingCall = (await waitForStatus( - onToolCallsUpdate, - CoreToolCallStatus.AwaitingApproval, - )) as 
WaitingToolCall; - - expect(awaitingCall).toBeDefined(); - expect(awaitingCall.approvalMode).toBe(ApprovalMode.PLAN); - - // Act - Confirm - - await ( - awaitingCall.confirmationDetails as ToolCallConfirmationDetails - ).onConfirm(ToolConfirmationOutcome.ProceedOnce); - - // Wait for completion - await schedulePromise; - - // Assert - Check Success state - expect(onAllToolCallsComplete).toHaveBeenCalled(); - const completedCalls = onAllToolCallsComplete.mock - .calls[0][0] as ToolCall[]; - expect(completedCalls).toHaveLength(1); - expect(completedCalls[0].status).toBe(CoreToolCallStatus.Success); - expect(completedCalls[0].approvalMode).toBe(ApprovalMode.PLAN); - }); - }); -}); diff --git a/packages/core/src/core/coreToolScheduler.ts b/packages/core/src/core/coreToolScheduler.ts deleted file mode 100644 index 1ecae4ef33..0000000000 --- a/packages/core/src/core/coreToolScheduler.ts +++ /dev/null @@ -1,1109 +0,0 @@ -/** - * @license - * Copyright 2025 Google LLC - * SPDX-License-Identifier: Apache-2.0 - */ - -import { - type ToolResultDisplay, - type AnyDeclarativeTool, - type AnyToolInvocation, - type ToolCallConfirmationDetails, - type ToolConfirmationPayload, - ToolConfirmationOutcome, -} from '../tools/tools.js'; -import type { EditorType } from '../utils/editor.js'; -import { PolicyDecision } from '../policy/types.js'; -import { logToolCall } from '../telemetry/loggers.js'; -import { ToolErrorType } from '../tools/tool-error.js'; -import { ToolCallEvent } from '../telemetry/types.js'; -import { runInDevTraceSpan } from '../telemetry/trace.js'; -import { ToolModificationHandler } from '../scheduler/tool-modifier.js'; -import { - getToolSuggestion, - isToolCallResponseInfo, -} from '../utils/tool-utils.js'; -import type { ToolConfirmationRequest } from '../confirmation-bus/types.js'; -import { MessageBusType } from '../confirmation-bus/types.js'; -import type { MessageBus } from '../confirmation-bus/message-bus.js'; -import { - CoreToolCallStatus, - type ToolCall, 
- type ValidatingToolCall, - type ScheduledToolCall, - type ErroredToolCall, - type SuccessfulToolCall, - type ExecutingToolCall, - type CancelledToolCall, - type WaitingToolCall, - type Status, - type CompletedToolCall, - type ConfirmHandler, - type OutputUpdateHandler, - type AllToolCallsCompleteHandler, - type ToolCallsUpdateHandler, - type ToolCallRequestInfo, - type ToolCallResponseInfo, -} from '../scheduler/types.js'; -import { ToolExecutor } from '../scheduler/tool-executor.js'; -import { DiscoveredMCPTool } from '../tools/mcp-tool.js'; -import { getPolicyDenialError } from '../scheduler/policy.js'; -import { GeminiCliOperation } from '../telemetry/constants.js'; -import type { AgentLoopContext } from '../config/agent-loop-context.js'; - -export type { - ToolCall, - ValidatingToolCall, - ScheduledToolCall, - ErroredToolCall, - SuccessfulToolCall, - ExecutingToolCall, - CancelledToolCall, - WaitingToolCall, - Status, - CompletedToolCall, - ConfirmHandler, - OutputUpdateHandler, - AllToolCallsCompleteHandler, - ToolCallsUpdateHandler, - ToolCallRequestInfo, - ToolCallResponseInfo, -}; - -const createErrorResponse = ( - request: ToolCallRequestInfo, - error: Error, - errorType: ToolErrorType | undefined, -): ToolCallResponseInfo => ({ - callId: request.callId, - error, - responseParts: [ - { - functionResponse: { - id: request.callId, - name: request.name, - response: { error: error.message }, - }, - }, - ], - resultDisplay: error.message, - errorType, - contentLength: error.message.length, -}); - -interface CoreToolSchedulerOptions { - context: AgentLoopContext; - outputUpdateHandler?: OutputUpdateHandler; - onAllToolCallsComplete?: AllToolCallsCompleteHandler; - onToolCallsUpdate?: ToolCallsUpdateHandler; - getPreferredEditor: () => EditorType | undefined; -} - -export class CoreToolScheduler { - // Static WeakMap to track which MessageBus instances already have a handler subscribed - // This prevents duplicate subscriptions when multiple CoreToolScheduler 
instances are created - private static subscribedMessageBuses = new WeakMap< - MessageBus, - (request: ToolConfirmationRequest) => void - >(); - - private toolCalls: ToolCall[] = []; - private outputUpdateHandler?: OutputUpdateHandler; - private onAllToolCallsComplete?: AllToolCallsCompleteHandler; - private onToolCallsUpdate?: ToolCallsUpdateHandler; - private getPreferredEditor: () => EditorType | undefined; - private context: AgentLoopContext; - private isFinalizingToolCalls = false; - private isScheduling = false; - private isCancelling = false; - private requestQueue: Array<{ - request: ToolCallRequestInfo | ToolCallRequestInfo[]; - signal: AbortSignal; - resolve: () => void; - reject: (reason?: Error) => void; - }> = []; - private toolCallQueue: ToolCall[] = []; - private completedToolCallsForBatch: CompletedToolCall[] = []; - private toolExecutor: ToolExecutor; - private toolModifier: ToolModificationHandler; - - constructor(options: CoreToolSchedulerOptions) { - this.context = options.context; - this.outputUpdateHandler = options.outputUpdateHandler; - this.onAllToolCallsComplete = options.onAllToolCallsComplete; - this.onToolCallsUpdate = options.onToolCallsUpdate; - this.getPreferredEditor = options.getPreferredEditor; - this.toolExecutor = new ToolExecutor(this.context); - this.toolModifier = new ToolModificationHandler(); - - // Subscribe to message bus for ASK_USER policy decisions - // Use a static WeakMap to ensure we only subscribe ONCE per MessageBus instance - // This prevents memory leaks when multiple CoreToolScheduler instances are created - // (e.g., on every React render, or for each non-interactive tool call) - const messageBus = this.context.messageBus; - - // Check if we've already subscribed a handler to this message bus - if (!CoreToolScheduler.subscribedMessageBuses.has(messageBus)) { - // Create a shared handler that will be used for this message bus - const sharedHandler = (request: ToolConfirmationRequest) => { - // When ASK_USER 
policy decision is made, respond with requiresUserConfirmation=true - // to tell tools to use their legacy confirmation flow - // eslint-disable-next-line @typescript-eslint/no-floating-promises - messageBus.publish({ - type: MessageBusType.TOOL_CONFIRMATION_RESPONSE, - correlationId: request.correlationId, - confirmed: false, - requiresUserConfirmation: true, - }); - }; - - messageBus.subscribe( - MessageBusType.TOOL_CONFIRMATION_REQUEST, - sharedHandler, - ); - - // Store the handler in the WeakMap so we don't subscribe again - CoreToolScheduler.subscribedMessageBuses.set(messageBus, sharedHandler); - } - } - - private setStatusInternal( - targetCallId: string, - status: CoreToolCallStatus.Success, - signal: AbortSignal, - response: ToolCallResponseInfo, - ): void; - private setStatusInternal( - targetCallId: string, - status: CoreToolCallStatus.AwaitingApproval, - signal: AbortSignal, - confirmationDetails: ToolCallConfirmationDetails, - ): void; - private setStatusInternal( - targetCallId: string, - status: CoreToolCallStatus.Error, - signal: AbortSignal, - response: ToolCallResponseInfo, - ): void; - private setStatusInternal( - targetCallId: string, - status: CoreToolCallStatus.Cancelled, - signal: AbortSignal, - reason: string, - ): void; - private setStatusInternal( - targetCallId: string, - status: - | CoreToolCallStatus.Executing - | CoreToolCallStatus.Scheduled - | CoreToolCallStatus.Validating, - signal: AbortSignal, - ): void; - private setStatusInternal( - targetCallId: string, - newStatus: Status, - signal: AbortSignal, - auxiliaryData?: unknown, - ): void { - this.toolCalls = this.toolCalls.map((currentCall) => { - if ( - currentCall.request.callId !== targetCallId || - currentCall.status === CoreToolCallStatus.Success || - currentCall.status === CoreToolCallStatus.Error || - currentCall.status === CoreToolCallStatus.Cancelled - ) { - return currentCall; - } - - // currentCall is a non-terminal state here and should have startTime and tool. 
- const existingStartTime = currentCall.startTime; - const toolInstance = currentCall.tool; - const invocation = currentCall.invocation; - - const outcome = currentCall.outcome; - const approvalMode = currentCall.approvalMode; - - switch (newStatus) { - case CoreToolCallStatus.Success: { - const durationMs = existingStartTime - ? Date.now() - existingStartTime - : undefined; - if (isToolCallResponseInfo(auxiliaryData)) { - return { - request: currentCall.request, - tool: toolInstance, - invocation, - status: CoreToolCallStatus.Success, - response: auxiliaryData, - durationMs, - outcome, - approvalMode, - } as SuccessfulToolCall; - } - throw new Error('Invalid response data for tool success'); - } - case CoreToolCallStatus.Error: { - const durationMs = existingStartTime - ? Date.now() - existingStartTime - : undefined; - if (isToolCallResponseInfo(auxiliaryData)) { - return { - request: currentCall.request, - status: CoreToolCallStatus.Error, - tool: toolInstance, - response: auxiliaryData, - durationMs, - outcome, - approvalMode, - } as ErroredToolCall; - } - throw new Error('Invalid response data for tool error'); - } - case CoreToolCallStatus.AwaitingApproval: - return { - request: currentCall.request, - tool: toolInstance, - status: CoreToolCallStatus.AwaitingApproval, - confirmationDetails: - // eslint-disable-next-line @typescript-eslint/no-unsafe-type-assertion - auxiliaryData as ToolCallConfirmationDetails, - startTime: existingStartTime, - outcome, - invocation, - approvalMode, - } as WaitingToolCall; - case CoreToolCallStatus.Scheduled: - return { - request: currentCall.request, - tool: toolInstance, - status: CoreToolCallStatus.Scheduled, - startTime: existingStartTime, - outcome, - invocation, - approvalMode, - } as ScheduledToolCall; - case CoreToolCallStatus.Cancelled: { - const durationMs = existingStartTime - ? 
Date.now() - existingStartTime - : undefined; - - if (isToolCallResponseInfo(auxiliaryData)) { - return { - request: currentCall.request, - tool: toolInstance, - invocation, - status: CoreToolCallStatus.Cancelled, - response: auxiliaryData, - durationMs, - outcome, - approvalMode, - } as CancelledToolCall; - } - - // Preserve diff for cancelled edit operations - let resultDisplay: ToolResultDisplay | undefined = undefined; - if (currentCall.status === CoreToolCallStatus.AwaitingApproval) { - const waitingCall = currentCall; - if (waitingCall.confirmationDetails.type === 'edit') { - resultDisplay = { - fileDiff: waitingCall.confirmationDetails.fileDiff, - fileName: waitingCall.confirmationDetails.fileName, - originalContent: - waitingCall.confirmationDetails.originalContent, - newContent: waitingCall.confirmationDetails.newContent, - filePath: waitingCall.confirmationDetails.filePath, - }; - } - } - - const errorMessage = `[Operation Cancelled] Reason: ${auxiliaryData}`; - return { - request: currentCall.request, - tool: toolInstance, - invocation, - status: CoreToolCallStatus.Cancelled, - response: { - callId: currentCall.request.callId, - responseParts: [ - { - functionResponse: { - id: currentCall.request.callId, - name: currentCall.request.name, - response: { - error: errorMessage, - }, - }, - }, - ], - resultDisplay, - error: undefined, - errorType: undefined, - contentLength: errorMessage.length, - }, - durationMs, - outcome, - approvalMode, - } as CancelledToolCall; - } - case CoreToolCallStatus.Validating: - return { - request: currentCall.request, - tool: toolInstance, - status: CoreToolCallStatus.Validating, - startTime: existingStartTime, - outcome, - invocation, - approvalMode, - } as ValidatingToolCall; - case CoreToolCallStatus.Executing: - return { - request: currentCall.request, - tool: toolInstance, - status: CoreToolCallStatus.Executing, - startTime: existingStartTime, - outcome, - invocation, - approvalMode, - } as ExecutingToolCall; - default: { 
- const exhaustiveCheck: never = newStatus; - return exhaustiveCheck; - } - } - }); - this.notifyToolCallsUpdate(); - } - - private setArgsInternal(targetCallId: string, args: unknown): void { - this.toolCalls = this.toolCalls.map((call) => { - // We should never be asked to set args on an ErroredToolCall, but - // we guard for the case anyways. - if ( - call.request.callId !== targetCallId || - call.status === CoreToolCallStatus.Error - ) { - return call; - } - - const invocationOrError = this.buildInvocation( - call.tool, - // eslint-disable-next-line @typescript-eslint/no-unsafe-type-assertion - args as Record<string, unknown>, - ); - if (invocationOrError instanceof Error) { - const response = createErrorResponse( - call.request, - invocationOrError, - ToolErrorType.INVALID_TOOL_PARAMS, - ); - return { - // eslint-disable-next-line @typescript-eslint/no-unsafe-type-assertion - request: { ...call.request, args: args as Record<string, unknown> }, - status: CoreToolCallStatus.Error, - tool: call.tool, - response, - approvalMode: call.approvalMode, - } as ErroredToolCall; - } - - return { - ...call, - // eslint-disable-next-line @typescript-eslint/no-unsafe-type-assertion - request: { ...call.request, args: args as Record<string, unknown> }, - invocation: invocationOrError, - }; - }); - } - - private isRunning(): boolean { - return ( - this.isFinalizingToolCalls || - this.toolCalls.some( - (call) => - call.status === CoreToolCallStatus.Executing || - call.status === CoreToolCallStatus.AwaitingApproval, - ) - ); - } - - private buildInvocation( - tool: AnyDeclarativeTool, - args: object, - ): AnyToolInvocation | Error { - try { - return tool.build(args); - } catch (e) { - if (e instanceof Error) { - return e; - } - return new Error(String(e)); - } - } - - schedule( - request: ToolCallRequestInfo | ToolCallRequestInfo[], - signal: AbortSignal, - ): Promise<void> { - return runInDevTraceSpan( - { operation: GeminiCliOperation.ScheduleToolCalls }, - async ({ metadata: spanMetadata }) => { - spanMetadata.input = request; - if 
(this.isRunning() || this.isScheduling) { - return new Promise((resolve, reject) => { - const abortHandler = () => { - // Find and remove the request from the queue - const index = this.requestQueue.findIndex( - (item) => item.request === request, - ); - if (index > -1) { - this.requestQueue.splice(index, 1); - reject(new Error('Tool call cancelled while in queue.')); - } - }; - - signal.addEventListener('abort', abortHandler, { once: true }); - - this.requestQueue.push({ - request, - signal, - resolve: () => { - signal.removeEventListener('abort', abortHandler); - resolve(); - }, - reject: (reason?: Error) => { - signal.removeEventListener('abort', abortHandler); - reject(reason); - }, - }); - }); - } - return this._schedule(request, signal); - }, - ); - } - - cancelAll(signal: AbortSignal): void { - if (this.isCancelling) { - return; - } - this.isCancelling = true; - // Cancel the currently active tool call, if there is one. - if (this.toolCalls.length > 0) { - const activeCall = this.toolCalls[0]; - // Only cancel if it's in a cancellable state. - if ( - activeCall.status === CoreToolCallStatus.AwaitingApproval || - activeCall.status === CoreToolCallStatus.Executing || - activeCall.status === CoreToolCallStatus.Scheduled || - activeCall.status === CoreToolCallStatus.Validating - ) { - this.setStatusInternal( - activeCall.request.callId, - CoreToolCallStatus.Cancelled, - signal, - 'User cancelled the operation.', - ); - } - } - - // Clear the queue and mark all queued items as cancelled for completion reporting. - this._cancelAllQueuedCalls(); - - // Finalize the batch immediately. 
- void this.checkAndNotifyCompletion(signal); - } - - private async _schedule( - request: ToolCallRequestInfo | ToolCallRequestInfo[], - signal: AbortSignal, - ): Promise { - this.isScheduling = true; - this.isCancelling = false; - try { - if (this.isRunning()) { - throw new Error( - 'Cannot schedule new tool calls while other tool calls are actively running (executing or awaiting approval).', - ); - } - const requestsToProcess = Array.isArray(request) ? request : [request]; - const currentApprovalMode = this.context.config.getApprovalMode(); - this.completedToolCallsForBatch = []; - - const newToolCalls: ToolCall[] = requestsToProcess.map( - (reqInfo): ToolCall => { - const toolInstance = this.context.toolRegistry.getTool(reqInfo.name); - if (!toolInstance) { - const suggestion = getToolSuggestion( - reqInfo.name, - this.context.toolRegistry.getAllToolNames(), - ); - const errorMessage = `Tool "${reqInfo.name}" not found in registry. Tools must use the exact names that are registered.${suggestion}`; - return { - status: CoreToolCallStatus.Error, - request: reqInfo, - response: createErrorResponse( - reqInfo, - new Error(errorMessage), - ToolErrorType.TOOL_NOT_REGISTERED, - ), - durationMs: 0, - approvalMode: currentApprovalMode, - }; - } - - const invocationOrError = this.buildInvocation( - toolInstance, - reqInfo.args, - ); - if (invocationOrError instanceof Error) { - return { - status: CoreToolCallStatus.Error, - request: reqInfo, - tool: toolInstance, - response: createErrorResponse( - reqInfo, - invocationOrError, - ToolErrorType.INVALID_TOOL_PARAMS, - ), - durationMs: 0, - approvalMode: currentApprovalMode, - }; - } - - return { - status: CoreToolCallStatus.Validating, - request: reqInfo, - tool: toolInstance, - invocation: invocationOrError, - startTime: Date.now(), - approvalMode: currentApprovalMode, - }; - }, - ); - - this.toolCallQueue.push(...newToolCalls); - await this._processNextInQueue(signal); - } finally { - this.isScheduling = false; - } - } - - 
private async _processNextInQueue(signal: AbortSignal): Promise { - // If there's already a tool being processed, or the queue is empty, stop. - if (this.toolCalls.length > 0 || this.toolCallQueue.length === 0) { - return; - } - - // If cancellation happened between steps, handle it. - if (signal.aborted) { - this._cancelAllQueuedCalls(); - // Finalize the batch. - await this.checkAndNotifyCompletion(signal); - return; - } - - const toolCall = this.toolCallQueue.shift()!; - - // This is now the single active tool call. - this.toolCalls = [toolCall]; - this.notifyToolCallsUpdate(); - - // Handle tools that were already errored during creation. - if (toolCall.status === CoreToolCallStatus.Error) { - // An error during validation means this "active" tool is already complete. - // We need to check for batch completion to either finish or process the next in queue. - await this.checkAndNotifyCompletion(signal); - return; - } - - // This logic is moved from the old `for` loop in `_schedule`. - if (toolCall.status === CoreToolCallStatus.Validating) { - const { request: reqInfo, invocation } = toolCall; - - try { - if (signal.aborted) { - this.setStatusInternal( - reqInfo.callId, - CoreToolCallStatus.Cancelled, - signal, - 'Tool call cancelled by user.', - ); - // The completion check will handle the cascade. - await this.checkAndNotifyCompletion(signal); - return; - } - - // Policy Check using PolicyEngine - // We must reconstruct the FunctionCall format expected by PolicyEngine - const toolCallForPolicy = { - name: toolCall.request.name, - args: toolCall.request.args, - }; - const serverName = - toolCall.tool instanceof DiscoveredMCPTool - ? 
toolCall.tool.serverName - : undefined; - const toolAnnotations = toolCall.tool.toolAnnotations; - - const { decision, rule } = await this.context.config - .getPolicyEngine() - .check(toolCallForPolicy, serverName, toolAnnotations); - - if (decision === PolicyDecision.DENY) { - const { errorMessage, errorType } = getPolicyDenialError( - this.context.config, - rule, - ); - this.setStatusInternal( - reqInfo.callId, - CoreToolCallStatus.Error, - signal, - createErrorResponse(reqInfo, new Error(errorMessage), errorType), - ); - await this.checkAndNotifyCompletion(signal); - return; - } - - if (decision === PolicyDecision.ALLOW) { - this.setToolCallOutcome( - reqInfo.callId, - ToolConfirmationOutcome.ProceedAlways, - ); - this.setStatusInternal( - reqInfo.callId, - CoreToolCallStatus.Scheduled, - signal, - ); - } else { - // PolicyDecision.ASK_USER - - // We need confirmation details to show to the user - const confirmationDetails = - await invocation.shouldConfirmExecute(signal); - - if (!confirmationDetails) { - this.setToolCallOutcome( - reqInfo.callId, - ToolConfirmationOutcome.ProceedAlways, - ); - this.setStatusInternal( - reqInfo.callId, - CoreToolCallStatus.Scheduled, - signal, - ); - } else { - if (!this.context.config.isInteractive()) { - throw new Error( - `Tool execution for "${ - toolCall.tool.displayName || toolCall.tool.name - }" requires user confirmation, which is not supported in non-interactive mode.`, - ); - } - - // Fire Notification hook before showing confirmation to user - const hookSystem = this.context.config.getHookSystem(); - if (hookSystem) { - await hookSystem.fireToolNotificationEvent(confirmationDetails); - } - - // Allow IDE to resolve confirmation - if ( - confirmationDetails.type === 'edit' && - confirmationDetails.ideConfirmation - ) { - // eslint-disable-next-line @typescript-eslint/no-floating-promises - confirmationDetails.ideConfirmation.then((resolution) => { - if (resolution.status === 'accepted') { - // eslint-disable-next-line 
@typescript-eslint/no-floating-promises - this.handleConfirmationResponse( - reqInfo.callId, - confirmationDetails.onConfirm, - ToolConfirmationOutcome.ProceedOnce, - signal, - ); - } else { - // eslint-disable-next-line @typescript-eslint/no-floating-promises - this.handleConfirmationResponse( - reqInfo.callId, - confirmationDetails.onConfirm, - ToolConfirmationOutcome.Cancel, - signal, - ); - } - }); - } - - const originalOnConfirm = confirmationDetails.onConfirm; - const wrappedConfirmationDetails: ToolCallConfirmationDetails = { - ...confirmationDetails, - onConfirm: ( - outcome: ToolConfirmationOutcome, - payload?: ToolConfirmationPayload, - ) => - this.handleConfirmationResponse( - reqInfo.callId, - originalOnConfirm, - outcome, - signal, - payload, - ), - }; - this.setStatusInternal( - reqInfo.callId, - CoreToolCallStatus.AwaitingApproval, - signal, - wrappedConfirmationDetails, - ); - } - } - } catch (error) { - if (signal.aborted) { - this.setStatusInternal( - reqInfo.callId, - CoreToolCallStatus.Cancelled, - signal, - 'Tool call cancelled by user.', - ); - await this.checkAndNotifyCompletion(signal); - } else { - this.setStatusInternal( - reqInfo.callId, - CoreToolCallStatus.Error, - signal, - createErrorResponse( - reqInfo, - error instanceof Error ? 
error : new Error(String(error)), - ToolErrorType.UNHANDLED_EXCEPTION, - ), - ); - await this.checkAndNotifyCompletion(signal); - } - } - } - await this.attemptExecutionOfScheduledCalls(signal); - } - - async handleConfirmationResponse( - callId: string, - originalOnConfirm: (outcome: ToolConfirmationOutcome) => Promise<void>, - outcome: ToolConfirmationOutcome, - signal: AbortSignal, - payload?: ToolConfirmationPayload, - ): Promise<void> { - const toolCall = this.toolCalls.find( - (c) => - c.request.callId === callId && - c.status === CoreToolCallStatus.AwaitingApproval, - ); - - if (toolCall && toolCall.status === CoreToolCallStatus.AwaitingApproval) { - await originalOnConfirm(outcome); - } - - this.setToolCallOutcome(callId, outcome); - - if (outcome === ToolConfirmationOutcome.Cancel || signal.aborted) { - // Instead of just cancelling one tool, trigger the full cancel cascade. - this.cancelAll(signal); - return; // `cancelAll` calls `checkAndNotifyCompletion`, so we can exit here. - } else if (outcome === ToolConfirmationOutcome.ModifyWithEditor) { - // eslint-disable-next-line @typescript-eslint/no-unsafe-type-assertion - const waitingToolCall = toolCall as WaitingToolCall; - - const editorType = this.getPreferredEditor(); - if (!editorType) { - return; - } - - /* eslint-disable @typescript-eslint/no-unsafe-type-assertion */ - this.setStatusInternal( - callId, - CoreToolCallStatus.AwaitingApproval, - signal, - { - ...waitingToolCall.confirmationDetails, - isModifying: true, - } as ToolCallConfirmationDetails, - ); - /* eslint-enable @typescript-eslint/no-unsafe-type-assertion */ - - const result = await this.toolModifier.handleModifyWithEditor( - waitingToolCall, - editorType, - signal, - ); - - // Restore status (isModifying: false) and update diff if result exists - if (result) { - this.setArgsInternal(callId, result.updatedParams); - /* eslint-disable @typescript-eslint/no-unsafe-type-assertion */ - this.setStatusInternal( - callId, - 
CoreToolCallStatus.AwaitingApproval, - signal, - { - ...waitingToolCall.confirmationDetails, - fileDiff: result.updatedDiff, - isModifying: false, - } as ToolCallConfirmationDetails, - ); - /* eslint-enable @typescript-eslint/no-unsafe-type-assertion */ - } else { - /* eslint-disable @typescript-eslint/no-unsafe-type-assertion */ - this.setStatusInternal( - callId, - CoreToolCallStatus.AwaitingApproval, - signal, - { - ...waitingToolCall.confirmationDetails, - isModifying: false, - } as ToolCallConfirmationDetails, - ); - /* eslint-enable @typescript-eslint/no-unsafe-type-assertion */ - } - } else { - // If the client provided new content, apply it and wait for - // re-confirmation. - if (payload && 'newContent' in payload && toolCall) { - const result = await this.toolModifier.applyInlineModify( - // eslint-disable-next-line @typescript-eslint/no-unsafe-type-assertion - toolCall as WaitingToolCall, - payload, - signal, - ); - if (result) { - this.setArgsInternal(callId, result.updatedParams); - /* eslint-disable @typescript-eslint/no-unsafe-type-assertion */ - this.setStatusInternal( - callId, - CoreToolCallStatus.AwaitingApproval, - signal, - { - ...(toolCall as WaitingToolCall).confirmationDetails, - fileDiff: result.updatedDiff, - } as ToolCallConfirmationDetails, - ); - /* eslint-enable @typescript-eslint/no-unsafe-type-assertion */ - // After an inline modification, wait for another user confirmation. 
- return; - } - } - this.setStatusInternal(callId, CoreToolCallStatus.Scheduled, signal); - } - await this.attemptExecutionOfScheduledCalls(signal); - } - - private async attemptExecutionOfScheduledCalls( - signal: AbortSignal, - ): Promise { - const allCallsFinalOrScheduled = this.toolCalls.every( - (call) => - call.status === CoreToolCallStatus.Scheduled || - call.status === CoreToolCallStatus.Cancelled || - call.status === CoreToolCallStatus.Success || - call.status === CoreToolCallStatus.Error, - ); - - if (allCallsFinalOrScheduled) { - const callsToExecute = this.toolCalls.filter( - (call) => call.status === CoreToolCallStatus.Scheduled, - ); - - for (const toolCall of callsToExecute) { - if (toolCall.status !== CoreToolCallStatus.Scheduled) continue; - - this.setStatusInternal( - toolCall.request.callId, - CoreToolCallStatus.Executing, - signal, - ); - const executingCall = this.toolCalls.find( - (c) => c.request.callId === toolCall.request.callId, - ); - - if (!executingCall) { - // Should not happen, but safe guard - continue; - } - - const completedCall = await this.toolExecutor.execute({ - call: executingCall, - signal, - outputUpdateHandler: (callId, output) => { - if (this.outputUpdateHandler) { - this.outputUpdateHandler(callId, output); - } - this.toolCalls = this.toolCalls.map((tc) => - tc.request.callId === callId && - tc.status === CoreToolCallStatus.Executing - ? { ...tc, liveOutput: output } - : tc, - ); - this.notifyToolCallsUpdate(); - }, - onUpdateToolCall: (updatedCall) => { - this.toolCalls = this.toolCalls.map((tc) => - tc.request.callId === updatedCall.request.callId - ? updatedCall - : tc, - ); - this.notifyToolCallsUpdate(); - }, - }); - - this.toolCalls = this.toolCalls.map((tc) => - tc.request.callId === completedCall.request.callId - ? 
{ ...completedCall, approvalMode: tc.approvalMode } - : tc, - ); - this.notifyToolCallsUpdate(); - - await this.checkAndNotifyCompletion(signal); - } - } - } - - private async checkAndNotifyCompletion(signal: AbortSignal): Promise { - // This method is now only concerned with the single active tool call. - if (this.toolCalls.length === 0) { - // It's possible to be called when a batch is cancelled before any tool has started. - if (signal.aborted && this.toolCallQueue.length > 0) { - this._cancelAllQueuedCalls(); - } - } else { - const activeCall = this.toolCalls[0]; - const isTerminal = - activeCall.status === CoreToolCallStatus.Success || - activeCall.status === CoreToolCallStatus.Error || - activeCall.status === CoreToolCallStatus.Cancelled; - - // If the active tool is not in a terminal state (e.g., it's CoreToolCallStatus.Executing or CoreToolCallStatus.AwaitingApproval), - // then the scheduler is still busy or paused. We should not proceed. - if (!isTerminal) { - return; - } - - // The active tool is finished. Move it to the completed batch. - const completedCall = activeCall as CompletedToolCall; - this.completedToolCallsForBatch.push(completedCall); - logToolCall(this.context.config, new ToolCallEvent(completedCall)); - - // Clear the active tool slot. This is crucial for the sequential processing. - this.toolCalls = []; - } - - // Now, check if the entire batch is complete. - // The batch is complete if the queue is empty or the operation was cancelled. - if (this.toolCallQueue.length === 0 || signal.aborted) { - if (signal.aborted) { - this._cancelAllQueuedCalls(); - } - - // If we are already finalizing, another concurrent call to - // checkAndNotifyCompletion will just return. The ongoing finalized loop - // will pick up any new tools added to completedToolCallsForBatch. - if (this.isFinalizingToolCalls) { - return; - } - - // If there's nothing to report and we weren't cancelled, we can stop. 
- // But if we were cancelled, we must proceed to potentially start the next queued request. - if (this.completedToolCallsForBatch.length === 0 && !signal.aborted) { - return; - } - - this.isFinalizingToolCalls = true; - try { - // We use a while loop here to ensure that if new tools are added to the - // batch (e.g., via cancellation) while we are awaiting - // onAllToolCallsComplete, they are also reported before we finish. - while (this.completedToolCallsForBatch.length > 0) { - const batchToReport = [...this.completedToolCallsForBatch]; - this.completedToolCallsForBatch = []; - if (this.onAllToolCallsComplete) { - await this.onAllToolCallsComplete(batchToReport); - } - } - } finally { - this.isFinalizingToolCalls = false; - this.isCancelling = false; - this.notifyToolCallsUpdate(); - } - - // After completion of the entire batch, process the next item in the main request queue. - if (this.requestQueue.length > 0) { - const next = this.requestQueue.shift()!; - this._schedule(next.request, next.signal) - .then(next.resolve) - .catch(next.reject); - } - } else { - // The batch is not yet complete, so continue processing the current batch sequence. - await this._processNextInQueue(signal); - } - } - - private _cancelAllQueuedCalls(): void { - while (this.toolCallQueue.length > 0) { - const queuedCall = this.toolCallQueue.shift()!; - // Don't cancel tools that already errored during validation. - if (queuedCall.status === CoreToolCallStatus.Error) { - this.completedToolCallsForBatch.push(queuedCall); - continue; - } - const durationMs = - 'startTime' in queuedCall && queuedCall.startTime - ? 
Date.now() - queuedCall.startTime - : undefined; - const errorMessage = - '[Operation Cancelled] User cancelled the operation.'; - this.completedToolCallsForBatch.push({ - request: queuedCall.request, - tool: queuedCall.tool, - invocation: queuedCall.invocation, - status: CoreToolCallStatus.Cancelled, - response: { - callId: queuedCall.request.callId, - responseParts: [ - { - functionResponse: { - id: queuedCall.request.callId, - name: queuedCall.request.name, - response: { - error: errorMessage, - }, - }, - }, - ], - resultDisplay: undefined, - error: undefined, - errorType: undefined, - contentLength: errorMessage.length, - }, - durationMs, - outcome: ToolConfirmationOutcome.Cancel, - approvalMode: queuedCall.approvalMode, - }); - } - } - - private notifyToolCallsUpdate(): void { - if (this.onToolCallsUpdate) { - this.onToolCallsUpdate([ - ...this.completedToolCallsForBatch, - ...this.toolCalls, - ...this.toolCallQueue, - ]); - } - } - - private setToolCallOutcome(callId: string, outcome: ToolConfirmationOutcome) { - this.toolCalls = this.toolCalls.map((call) => { - if (call.request.callId !== callId) return call; - return { - ...call, - outcome, - }; - }); - } -} diff --git a/packages/core/src/core/geminiChat.ts b/packages/core/src/core/geminiChat.ts index ff6c3a3806..236d219228 100644 --- a/packages/core/src/core/geminiChat.ts +++ b/packages/core/src/core/geminiChat.ts @@ -32,7 +32,7 @@ import { } from '../config/models.js'; import { hasCycleInSchema } from '../tools/tools.js'; import type { StructuredError } from './turn.js'; -import type { CompletedToolCall } from './coreToolScheduler.js'; +import type { CompletedToolCall } from '../scheduler/types.js'; import { logContentRetry, logContentRetryFailure, diff --git a/packages/core/src/core/loggingContentGenerator.test.ts b/packages/core/src/core/loggingContentGenerator.test.ts index 1e8a886f69..7b37d1a5ff 100644 --- a/packages/core/src/core/loggingContentGenerator.test.ts +++ 
b/packages/core/src/core/loggingContentGenerator.test.ts @@ -19,7 +19,6 @@ const runInDevTraceSpan = vi.hoisted(() => const metadata = { attributes: opts.attributes || {} }; return fn({ metadata, - endSpan: vi.fn(), }); }), ); @@ -73,6 +72,7 @@ describe('LoggingContentGenerator', () => { getContentGeneratorConfig: vi.fn().mockReturnValue({ authType: 'API_KEY', }), + getTelemetryLogPromptsEnabled: vi.fn().mockReturnValue(true), refreshUserQuotaIfStale: vi.fn().mockResolvedValue(undefined), } as unknown as Config; loggingContentGenerator = new LoggingContentGenerator(wrapped, config); @@ -158,7 +158,7 @@ describe('LoggingContentGenerator', () => { const spanArgs = vi.mocked(runInDevTraceSpan).mock.calls[0]; const fn = spanArgs[1]; const metadata: SpanMetadata = { name: '', attributes: {} }; - await fn({ metadata, endSpan: vi.fn() }); + await fn({ metadata }); expect(metadata).toMatchObject({ input: req.contents, @@ -222,7 +222,7 @@ describe('LoggingContentGenerator', () => { const spanArgs = vi.mocked(runInDevTraceSpan).mock.calls[0]; const fn = spanArgs[1]; const metadata: SpanMetadata = { name: '', attributes: {} }; - promise = fn({ metadata, endSpan: vi.fn() }); + promise = fn({ metadata }); await expect(promise).rejects.toThrow(error); @@ -407,7 +407,7 @@ describe('LoggingContentGenerator', () => { expect(runInDevTraceSpan).toHaveBeenCalledWith( expect.objectContaining({ operation: GeminiCliOperation.LLMCall, - noAutoEnd: true, + attributes: expect.objectContaining({ [GEN_AI_REQUEST_MODEL]: 'gemini-pro', [GEN_AI_PROMPT_NAME]: userPromptId, @@ -427,7 +427,7 @@ describe('LoggingContentGenerator', () => { vi.mocked(wrapped.generateContentStream).mockResolvedValue( createAsyncGenerator(), ); - stream = await fn({ metadata, endSpan: vi.fn() }); + stream = await fn({ metadata }); for await (const _ of stream) { // consume stream @@ -644,7 +644,7 @@ describe('LoggingContentGenerator', () => { const spanArgs = vi.mocked(runInDevTraceSpan).mock.calls[0]; const fn = 
spanArgs[1]; const metadata: SpanMetadata = { name: '', attributes: {} }; - await fn({ metadata, endSpan: vi.fn() }); + await fn({ metadata }); expect(metadata).toMatchObject({ input: req.contents, diff --git a/packages/core/src/core/loggingContentGenerator.ts b/packages/core/src/core/loggingContentGenerator.ts index 60144740c2..82fd384ee4 100644 --- a/packages/core/src/core/loggingContentGenerator.ts +++ b/packages/core/src/core/loggingContentGenerator.ts @@ -349,6 +349,7 @@ export class LoggingContentGenerator implements ContentGenerator { return runInDevTraceSpan( { operation: GeminiCliOperation.LLMCall, + logPrompts: this.config.getTelemetryLogPromptsEnabled(), attributes: { [GEN_AI_REQUEST_MODEL]: req.model, [GEN_AI_PROMPT_NAME]: userPromptId, @@ -438,7 +439,7 @@ export class LoggingContentGenerator implements ContentGenerator { return runInDevTraceSpan( { operation: GeminiCliOperation.LLMCall, - noAutoEnd: true, + logPrompts: this.config.getTelemetryLogPromptsEnabled(), attributes: { [GEN_AI_REQUEST_MODEL]: req.model, [GEN_AI_PROMPT_NAME]: userPromptId, @@ -448,7 +449,7 @@ export class LoggingContentGenerator implements ContentGenerator { [GEN_AI_TOOL_DEFINITIONS]: safeJsonStringify(req.config?.tools ?? 
[]), }, }, - async ({ metadata: spanMetadata, endSpan }) => { + async ({ metadata: spanMetadata }) => { spanMetadata.input = req.contents; const startTime = Date.now(); @@ -504,7 +505,6 @@ export class LoggingContentGenerator implements ContentGenerator { userPromptId, role, spanMetadata, - endSpan, ); }, ); @@ -517,7 +517,6 @@ export class LoggingContentGenerator implements ContentGenerator { userPromptId: string, role: LlmRole, spanMetadata: SpanMetadata, - endSpan: () => void, ): AsyncGenerator { const responses: GenerateContentResponse[] = []; @@ -581,8 +580,6 @@ export class LoggingContentGenerator implements ContentGenerator { serverDetails, ); throw error; - } finally { - endSpan(); } } @@ -596,6 +593,7 @@ export class LoggingContentGenerator implements ContentGenerator { return runInDevTraceSpan( { operation: GeminiCliOperation.LLMCall, + logPrompts: this.config.getTelemetryLogPromptsEnabled(), attributes: { [GEN_AI_REQUEST_MODEL]: req.model, }, diff --git a/packages/core/src/hooks/hookAggregator.ts b/packages/core/src/hooks/hookAggregator.ts index 73e814702e..b67266edf5 100644 --- a/packages/core/src/hooks/hookAggregator.ts +++ b/packages/core/src/hooks/hookAggregator.ts @@ -125,6 +125,7 @@ export class HookAggregator { const additionalContexts: string[] = []; let hasBlockDecision = false; + let hasAskDecision = false; let hasContinueFalse = false; for (const output of outputs) { @@ -142,6 +143,12 @@ export class HookAggregator { if (tempOutput.isBlockingDecision()) { hasBlockDecision = true; merged.decision = output.decision; + } else if (tempOutput.isAskDecision()) { + hasAskDecision = true; + // Ask decision is only set if no blocking decision was found so far + if (!hasBlockDecision) { + merged.decision = output.decision; + } } // Collect messages @@ -180,8 +187,8 @@ export class HookAggregator { this.extractAdditionalContext(output, additionalContexts); } - // Set final decision if no blocking decision was found - if (!hasBlockDecision && 
!hasContinueFalse) { + // Set final decision if no blocking or ask decision was found + if (!hasBlockDecision && !hasAskDecision && !hasContinueFalse) { merged.decision = 'allow'; } diff --git a/packages/core/src/hooks/hookEventHandler.ts b/packages/core/src/hooks/hookEventHandler.ts index a092bed334..e7b970875c 100644 --- a/packages/core/src/hooks/hookEventHandler.ts +++ b/packages/core/src/hooks/hookEventHandler.ts @@ -303,6 +303,7 @@ export class HookEventHandler { coreEvents.emitHookStart({ hookName: this.getHookName(config), eventName, + source: config.source, hookIndex: index + 1, totalHooks: plan.hookConfigs.length, }); diff --git a/packages/core/src/hooks/types.ts b/packages/core/src/hooks/types.ts index 9c6217ffa4..11dbe874e5 100644 --- a/packages/core/src/hooks/types.ts +++ b/packages/core/src/hooks/types.ts @@ -28,6 +28,15 @@ export enum ConfigSource { Extensions = 'extensions', } +/** + * Returns true if a hook source implies it is a user-visible hook. + * Only System hooks are hidden by default to reduce noise. 
+ */ +export function isUserVisibleHook(source?: string | ConfigSource): boolean { + if (!source) return true; // Treat unknown/legacy hooks as user-visible + return source !== ConfigSource.System; +} + /** * Event names for the hook system */ @@ -197,12 +206,19 @@ export class DefaultHookOutput implements HookOutput { } /** - * Check if this output represents a blocking decision + * Check if this output represents a blocking decision (block or deny) */ isBlockingDecision(): boolean { return this.decision === 'block' || this.decision === 'deny'; } + /** + * Check if this output represents an 'ask' decision + */ + isAskDecision(): boolean { + return this.decision === 'ask'; + } + /** * Check if this output requests to stop execution */ diff --git a/packages/core/src/index.ts b/packages/core/src/index.ts index 5729730365..4a5dc9d11d 100644 --- a/packages/core/src/index.ts +++ b/packages/core/src/index.ts @@ -43,7 +43,6 @@ export * from './core/prompts.js'; export * from './core/tokenLimits.js'; export * from './core/turn.js'; export * from './core/geminiRequest.js'; -export * from './core/coreToolScheduler.js'; export * from './scheduler/scheduler.js'; export * from './scheduler/types.js'; export * from './scheduler/tool-executor.js'; @@ -181,6 +180,31 @@ export * from './agents/agentLoader.js'; export * from './agents/local-executor.js'; export * from './agents/agent-scheduler.js'; +// Export agent session interface +export * from './agent/agent-session.js'; +export * from './agent/legacy-agent-session.js'; +export * from './agent/event-translator.js'; +export * from './agent/content-utils.js'; +// Agent event types — namespaced to avoid collisions with existing exports +export type { + AgentEvent, + AgentEventCommon, + AgentEventData, + AgentEnd, + AgentEvents as AgentEventMap, + AgentEventType, + AgentProtocol, + AgentSend, + AgentStart, + ContentPart, + ErrorData, + StreamEndReason, + Trajectory, + Unsubscribe, + Usage as AgentUsage, + WithMeta, +} from 
'./agent/types.js'; + // Export specific tool logic export * from './tools/read-file.js'; export * from './tools/ls.js'; diff --git a/packages/core/src/mcp/google-auth-provider.test.ts b/packages/core/src/mcp/google-auth-provider.test.ts index f535f17d83..cd15263984 100644 --- a/packages/core/src/mcp/google-auth-provider.test.ts +++ b/packages/core/src/mcp/google-auth-provider.test.ts @@ -177,6 +177,7 @@ describe('GoogleCredentialProvider', () => { it('should prioritize config headers over quota project ID', async () => { mockClient['quotaProjectId'] = 'quota-project-id'; const configWithHeaders = { + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...validConfig, headers: { 'X-Goog-User-Project': 'config-project-id', @@ -193,6 +194,7 @@ describe('GoogleCredentialProvider', () => { it('should prioritize config headers over quota project ID (case-insensitive)', async () => { mockClient['quotaProjectId'] = 'quota-project-id'; const configWithHeaders = { + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...validConfig, headers: { 'x-goog-user-project': 'config-project-id', diff --git a/packages/core/src/policy/config.test.ts b/packages/core/src/policy/config.test.ts index c4204e3c6c..7e39fe41dd 100644 --- a/packages/core/src/policy/config.test.ts +++ b/packages/core/src/policy/config.test.ts @@ -314,7 +314,7 @@ describe('createPolicyEngineConfig', () => { it('should allow all tools in YOLO mode', async () => { const config = await createPolicyEngineConfig({}, ApprovalMode.YOLO); const rule = config.rules?.find( - (r) => r.decision === PolicyDecision.ALLOW && !r.toolName, + (r) => r.decision === PolicyDecision.ALLOW && r.toolName === '*', ); expect(rule).toBeDefined(); expect(rule?.priority).toBeCloseTo(1.998, 5); @@ -513,7 +513,7 @@ describe('createPolicyEngineConfig', () => { ); const wildcardRule = config.rules?.find( - (r) => !r.toolName && r.decision === PolicyDecision.ALLOW, + (r) => r.toolName === '*' && r.decision === 
PolicyDecision.ALLOW, ); const writeToolRules = config.rules?.filter( (r) => diff --git a/packages/core/src/policy/config.ts b/packages/core/src/policy/config.ts index eb53196c92..f6107bf460 100644 --- a/packages/core/src/policy/config.ts +++ b/packages/core/src/policy/config.ts @@ -30,7 +30,10 @@ import { type MessageBus } from '../confirmation-bus/message-bus.js'; import { coreEvents } from '../utils/events.js'; import { debugLogger } from '../utils/debugLogger.js'; import { SHELL_TOOL_NAMES } from '../utils/shell-utils.js'; -import { SHELL_TOOL_NAME, SENSITIVE_TOOLS } from '../tools/tool-names.js'; +import { + SHELL_TOOL_NAME, + TOOLS_REQUIRING_NARROWING, +} from '../tools/tool-names.js'; import { isNodeError } from '../utils/errors.js'; import { MCP_TOOL_PREFIX } from '../tools/mcp-tool.js'; @@ -534,6 +537,7 @@ interface TomlRule { priority?: number; commandPrefix?: string | string[]; argsPattern?: string; + allowRedirection?: boolean; // Index signature to satisfy Record type if needed for toml.stringify [key: string]: unknown; } @@ -560,7 +564,7 @@ export function createPolicyUpdater( : WORKSPACE_POLICY_TIER; const priority = tier + getAlwaysAllowPriorityFraction() / 1000; - if (SENSITIVE_TOOLS.has(toolName) && !message.commandPrefix) { + if (TOOLS_REQUIRING_NARROWING.has(toolName) && !message.commandPrefix) { debugLogger.warn( `Attempted to update policy for sensitive tool '${toolName}' without a commandPrefix. 
Skipping.`, ); @@ -578,6 +582,7 @@ export function createPolicyUpdater( argsPattern: new RegExp(pattern), mcpName: message.mcpName, source: 'Dynamic (Confirmed)', + allowRedirection: message.allowRedirection, }); } } @@ -600,7 +605,7 @@ export function createPolicyUpdater( : WORKSPACE_POLICY_TIER; const priority = tier + getAlwaysAllowPriorityFraction() / 1000; - if (SENSITIVE_TOOLS.has(toolName) && !message.argsPattern) { + if (TOOLS_REQUIRING_NARROWING.has(toolName) && !message.argsPattern) { debugLogger.warn( `Attempted to update policy for sensitive tool '${toolName}' without an argsPattern. Skipping.`, ); @@ -614,6 +619,7 @@ export function createPolicyUpdater( argsPattern, mcpName: message.mcpName, source: 'Dynamic (Confirmed)', + allowRedirection: message.allowRedirection, }); } @@ -678,6 +684,10 @@ export function createPolicyUpdater( newRule.argsPattern = message.argsPattern; } + if (message.allowRedirection !== undefined) { + newRule.allowRedirection = message.allowRedirection; + } + // Add to rules existingData.rule.push(newRule); diff --git a/packages/core/src/policy/persistence.test.ts b/packages/core/src/policy/persistence.test.ts index da39160020..d4781fb4be 100644 --- a/packages/core/src/policy/persistence.test.ts +++ b/packages/core/src/policy/persistence.test.ts @@ -71,6 +71,26 @@ describe('createPolicyUpdater', () => { expect(content).toContain(`priority = ${expectedPriority}`); }); + it('should include allowRedirection when persisting policy', async () => { + createPolicyUpdater(policyEngine, messageBus, mockStorage); + + const policyFile = '/mock/user/.gemini/policies/auto-saved.toml'; + vi.spyOn(mockStorage, 'getAutoSavedPolicyPath').mockReturnValue(policyFile); + + await messageBus.publish({ + type: MessageBusType.UPDATE_POLICY, + toolName: 'test_tool', + persist: true, + allowRedirection: true, + }); + + await vi.advanceTimersByTimeAsync(100); + + const content = memfs.readFileSync(policyFile, 'utf-8') as string; + 
expect(content).toContain('toolName = "test_tool"'); + expect(content).toContain('allowRedirection = true'); + }); + it('should not persist policy when persist flag is false or undefined', async () => { createPolicyUpdater(policyEngine, messageBus, mockStorage); diff --git a/packages/core/src/policy/policies/memory-manager.toml b/packages/core/src/policy/policies/memory-manager.toml index 2055fcdf3a..b1b1b4ddd9 100644 --- a/packages/core/src/policy/policies/memory-manager.toml +++ b/packages/core/src/policy/policies/memory-manager.toml @@ -7,4 +7,4 @@ toolName = ["read_file", "write_file", "replace", "list_directory", "glob", "gre decision = "allow" priority = 100 argsPattern = "(^|.*/)\\.gemini/.*" -deny_message = "Memory Manager is only allowed to access the .gemini folder." +denyMessage = "Memory Manager is only allowed to access the .gemini folder." diff --git a/packages/core/src/policy/policies/plan.toml b/packages/core/src/policy/policies/plan.toml index 5a7ee6e59f..b6ddef72ef 100644 --- a/packages/core/src/policy/policies/plan.toml +++ b/packages/core/src/policy/policies/plan.toml @@ -46,7 +46,7 @@ toolName = "enter_plan_mode" decision = "deny" priority = 70 modes = ["plan"] -deny_message = "You are already in Plan Mode." +denyMessage = "You are already in Plan Mode." [[rule]] toolName = "exit_plan_mode" @@ -65,20 +65,22 @@ interactive = false toolName = "exit_plan_mode" decision = "deny" priority = 50 -deny_message = "You are not currently in Plan Mode. Use enter_plan_mode first to design a plan." +denyMessage = "You are not currently in Plan Mode. Use enter_plan_mode first to design a plan." # Catch-All: Deny everything by default in Plan mode. [[rule]] +toolName = "*" decision = "deny" priority = 60 modes = ["plan"] -deny_message = "You are in Plan Mode with access to read-only tools. Execution of scripts (including those from skills) is blocked." +denyMessage = "You are in Plan Mode with access to read-only tools. 
Execution of scripts (including those from skills) is blocked." # Explicitly Allow Read-Only Tools in Plan mode. [[rule]] +toolName = "*" mcpName = "*" toolAnnotations = { readOnlyHint = true } decision = "ask_user" @@ -121,4 +123,4 @@ toolName = ["write_file", "replace"] decision = "deny" priority = 65 modes = ["plan"] -deny_message = "You are in Plan Mode and cannot modify source code. You may ONLY use write_file or replace to save plans to the designated plans directory as .md files." +denyMessage = "You are in Plan Mode and cannot modify source code. You may ONLY use write_file or replace to save plans to the designated plans directory as .md files." diff --git a/packages/core/src/policy/policies/write.toml b/packages/core/src/policy/policies/write.toml index c24f6dfee3..527ac6f059 100644 --- a/packages/core/src/policy/policies/write.toml +++ b/packages/core/src/policy/policies/write.toml @@ -74,6 +74,12 @@ type = "in-process" name = "allowed-path" required_context = ["environment"] +[[rule]] +toolName = "web_fetch" +decision = "allow" +priority = 15 +modes = ["autoEdit"] + [[rule]] toolName = "web_fetch" decision = "ask_user" diff --git a/packages/core/src/policy/policies/yolo.toml b/packages/core/src/policy/policies/yolo.toml index 230b4c2670..5e2a194d2e 100644 --- a/packages/core/src/policy/policies/yolo.toml +++ b/packages/core/src/policy/policies/yolo.toml @@ -49,7 +49,8 @@ interactive = true # Allow everything else in YOLO mode [[rule]] +toolName = "*" decision = "allow" priority = 998 modes = ["yolo"] -allow_redirection = true +allowRedirection = true diff --git a/packages/core/src/policy/policy-engine.test.ts b/packages/core/src/policy/policy-engine.test.ts index 4e53418907..eb39d6ed8d 100644 --- a/packages/core/src/policy/policy-engine.test.ts +++ b/packages/core/src/policy/policy-engine.test.ts @@ -267,7 +267,7 @@ describe('PolicyEngine', () => { it('should apply wildcard rules (no toolName)', async () => { const rules: PolicyRule[] = [ - { decision: 
PolicyDecision.DENY }, // Applies to all tools + { toolName: '*', decision: PolicyDecision.DENY }, // Applies to all tools { toolName: 'safe-tool', decision: PolicyDecision.ALLOW, priority: 10 }, ]; @@ -692,7 +692,7 @@ describe('PolicyEngine', () => { describe('complex scenarios', () => { it('should handle multiple matching rules with different priorities', async () => { const rules: PolicyRule[] = [ - { decision: PolicyDecision.DENY, priority: 0 }, // Default deny all + { toolName: '*', decision: PolicyDecision.DENY, priority: 0 }, // Default deny all { toolName: 'shell', decision: PolicyDecision.ASK_USER, priority: 5 }, { toolName: 'shell', @@ -1617,6 +1617,7 @@ describe('PolicyEngine', () => { const fixedRules: PolicyRule[] = [ { + toolName: '*', decision: PolicyDecision.DENY, priority: 1.06, modes: [ApprovalMode.PLAN], @@ -1647,6 +1648,7 @@ describe('PolicyEngine', () => { const { splitCommands } = await import('../utils/shell-utils.js'); const rules: PolicyRule[] = [ { + toolName: '*', decision: PolicyDecision.ALLOW, priority: 999, modes: [ApprovalMode.YOLO], @@ -1685,6 +1687,7 @@ describe('PolicyEngine', () => { priority: 2000, // Very high priority DENY (e.g. 
Admin) }, { + toolName: '*', decision: PolicyDecision.ALLOW, priority: 999, modes: [ApprovalMode.YOLO], @@ -1978,10 +1981,12 @@ describe('PolicyEngine', () => { describe('addChecker', () => { it('should add a new checker and maintain priority order', () => { const checker1: SafetyCheckerRule = { + toolName: '*', checker: { type: 'external', name: 'checker1' }, priority: 5, }; const checker2: SafetyCheckerRule = { + toolName: '*', checker: { type: 'external', name: 'checker2' }, priority: 10, }; @@ -2034,6 +2039,39 @@ describe('PolicyEngine', () => { ); }); + it('should match global wildcard (*) for checkers', async () => { + const rules: PolicyRule[] = [ + { toolName: '*', decision: PolicyDecision.ALLOW }, + ]; + const globalChecker: SafetyCheckerRule = { + checker: { type: 'external', name: 'global' }, + toolName: '*', + }; + + engine = new PolicyEngine( + { rules, checkers: [globalChecker] }, + mockCheckerRunner, + ); + + vi.mocked(mockCheckerRunner.runChecker).mockResolvedValue({ + decision: SafetyCheckDecision.ALLOW, + }); + + await engine.check({ name: 'any_tool' }, undefined); + expect(mockCheckerRunner.runChecker).toHaveBeenCalledWith( + expect.anything(), + expect.objectContaining({ name: 'global' }), + ); + + vi.mocked(mockCheckerRunner.runChecker).mockClear(); + + await engine.check({ name: 'mcp_server_tool' }, 'server'); + expect(mockCheckerRunner.runChecker).toHaveBeenCalledWith( + expect.anything(), + expect.objectContaining({ name: 'global' }), + ); + }); + it('should support wildcard patterns for checkers', async () => { const rules: PolicyRule[] = [ { @@ -2070,6 +2108,7 @@ describe('PolicyEngine', () => { ]; const checkers: SafetyCheckerRule[] = [ { + toolName: '*', checker: { type: 'in-process', name: InProcessCheckerType.ALLOWED_PATH, @@ -2095,6 +2134,7 @@ describe('PolicyEngine', () => { ]; const checkers: SafetyCheckerRule[] = [ { + toolName: '*', checker: { type: 'in-process', name: InProcessCheckerType.ALLOWED_PATH, @@ -2119,6 +2159,7 @@ 
describe('PolicyEngine', () => { ]; const checkers: SafetyCheckerRule[] = [ { + toolName: '*', checker: { type: 'in-process', name: InProcessCheckerType.ALLOWED_PATH, @@ -2143,6 +2184,7 @@ describe('PolicyEngine', () => { ]; const checkers: SafetyCheckerRule[] = [ { + toolName: '*', checker: { type: 'in-process', name: InProcessCheckerType.ALLOWED_PATH, @@ -2320,6 +2362,7 @@ describe('PolicyEngine', () => { name: 'should respect wildcard ALLOW rules (e.g. YOLO mode)', rules: [ { + toolName: '*', decision: PolicyDecision.ALLOW, priority: 999, modes: [ApprovalMode.YOLO], @@ -2396,6 +2439,7 @@ describe('PolicyEngine', () => { }, { // Simulates the global deny in Plan Mode + toolName: '*', decision: PolicyDecision.DENY, priority: 60, modes: [ApprovalMode.PLAN], @@ -2506,6 +2550,7 @@ describe('PolicyEngine', () => { engine = new PolicyEngine({ rules: [ { + toolName: '*', toolAnnotations: { destructiveHint: true }, decision: PolicyDecision.DENY, priority: 10, @@ -2523,6 +2568,7 @@ describe('PolicyEngine', () => { engine = new PolicyEngine({ rules: [ { + toolName: '*', toolAnnotations: { destructiveHint: true }, decision: PolicyDecision.DENY, priority: 10, @@ -2544,6 +2590,7 @@ describe('PolicyEngine', () => { engine = new PolicyEngine({ rules: [ { + toolName: '*', toolAnnotations: { destructiveHint: true }, decision: PolicyDecision.DENY, priority: 10, @@ -2615,6 +2662,7 @@ describe('PolicyEngine', () => { priority: 70, }, { + toolName: '*', decision: PolicyDecision.DENY, priority: 60, }, @@ -2661,6 +2709,7 @@ describe('PolicyEngine', () => { priority: 70, }, { + toolName: '*', decision: PolicyDecision.DENY, priority: 60, }, @@ -2701,6 +2750,7 @@ describe('PolicyEngine', () => { priority: 70, }, { + toolName: '*', decision: PolicyDecision.DENY, priority: 60, }, @@ -2782,6 +2832,7 @@ describe('PolicyEngine', () => { modes: [ApprovalMode.PLAN], }, { + toolName: '*', decision: PolicyDecision.DENY, priority: 60, modes: [ApprovalMode.PLAN], @@ -2857,6 +2908,7 @@ 
describe('PolicyEngine', () => { modes: [ApprovalMode.YOLO], }, { + toolName: '*', decision: PolicyDecision.ALLOW, priority: PRIORITY_YOLO_ALLOW_ALL, modes: [ApprovalMode.YOLO], @@ -2884,6 +2936,7 @@ describe('PolicyEngine', () => { modes: [ApprovalMode.YOLO], }, { + toolName: '*', decision: PolicyDecision.ALLOW, priority: PRIORITY_YOLO_ALLOW_ALL, modes: [ApprovalMode.YOLO], @@ -2907,6 +2960,7 @@ describe('PolicyEngine', () => { it('should allow activate_skill but deny shell commands in Plan Mode', async () => { const rules: PolicyRule[] = [ { + toolName: '*', decision: PolicyDecision.DENY, priority: 60, modes: [ApprovalMode.PLAN], @@ -3110,14 +3164,17 @@ describe('PolicyEngine', () => { describe('removeCheckersByTier', () => { it('should remove checkers matching a specific tier', () => { engine.addChecker({ + toolName: '*', checker: { type: 'external', name: 'c1' }, priority: 1.1, }); engine.addChecker({ + toolName: '*', checker: { type: 'external', name: 'c2' }, priority: 1.9, }); engine.addChecker({ + toolName: '*', checker: { type: 'external', name: 'c3' }, priority: 2.5, }); @@ -3135,14 +3192,17 @@ describe('PolicyEngine', () => { describe('removeCheckersBySource', () => { it('should remove checkers matching a specific source', () => { engine.addChecker({ + toolName: '*', checker: { type: 'external', name: 'c1' }, source: 'sourceA', }); engine.addChecker({ + toolName: '*', checker: { type: 'external', name: 'c2' }, source: 'sourceB', }); engine.addChecker({ + toolName: '*', checker: { type: 'external', name: 'c3' }, source: 'sourceA', }); @@ -3161,6 +3221,7 @@ describe('PolicyEngine', () => { engine = new PolicyEngine({ rules: [ { + toolName: '*', toolAnnotations: { readOnlyHint: true }, decision: PolicyDecision.ALLOW, priority: 10, diff --git a/packages/core/src/policy/policy-engine.ts b/packages/core/src/policy/policy-engine.ts index cb114b7c7f..c35c9c5d4f 100644 --- a/packages/core/src/policy/policy-engine.ts +++ b/packages/core/src/policy/policy-engine.ts 
@@ -88,14 +88,14 @@ function ruleMatches( } // Check subagent if specified (only for PolicyRule, SafetyCheckerRule doesn't have it) - if ('subagent' in rule && rule.subagent) { + if ('subagent' in rule && rule.subagent !== undefined) { if (rule.subagent !== subagent) { return false; } } // Strictly enforce mcpName identity if the rule dictates it - if (rule.mcpName) { + if (rule.mcpName !== undefined) { if (rule.mcpName === '*') { // Rule requires it to be ANY MCP tool if (serverName === undefined) return false; @@ -106,7 +106,7 @@ function ruleMatches( } // Check tool name if specified - if (rule.toolName) { + if (rule.toolName !== undefined) { // Support wildcard patterns: "mcp_serverName_*" matches "mcp_serverName_anyTool" if (rule.toolName === '*') { // Match all tools @@ -203,6 +203,40 @@ export class PolicyEngine { this.hookCheckers = (config.hookCheckers ?? []).sort( (a, b) => (b.priority ?? 0) - (a.priority ?? 0), ); + + // Validate rules + for (const rule of this.rules) { + if (rule.toolName === undefined || rule.toolName === '') { + throw new Error( + `Invalid policy rule: toolName is required. Use '*' for all tools. Rule source: ${rule.source || 'unknown'}`, + ); + } + if (rule.mcpName === '') { + throw new Error( + `Invalid policy rule: mcpName is required if specified (cannot be empty). Rule source: ${rule.source || 'unknown'}`, + ); + } + if (rule.subagent === '') { + throw new Error( + `Invalid policy rule: subagent is required if specified (cannot be empty). Rule source: ${rule.source || 'unknown'}`, + ); + } + } + + // Validate checkers + for (const checker of this.checkers) { + if (checker.toolName === undefined || checker.toolName === '') { + throw new Error( + `Invalid safety checker rule: toolName is required. Use '*' for all tools. Checker source: ${checker.source || 'unknown'}`, + ); + } + if (checker.mcpName === '') { + throw new Error( + `Invalid safety checker rule: mcpName is required if specified (cannot be empty). 
Checker source: ${checker.source || 'unknown'}`, + ); + } + } + this.defaultDecision = config.defaultDecision ?? PolicyDecision.ASK_USER; this.nonInteractive = config.nonInteractive ?? false; this.disableAlwaysAllow = config.disableAlwaysAllow ?? false; diff --git a/packages/core/src/policy/policy-updater.test.ts b/packages/core/src/policy/policy-updater.test.ts index 3bf3579bbc..5ee9d65df4 100644 --- a/packages/core/src/policy/policy-updater.test.ts +++ b/packages/core/src/policy/policy-updater.test.ts @@ -26,6 +26,7 @@ vi.mock('../config/storage.js'); vi.mock('../utils/shell-utils.js', () => ({ getCommandRoots: vi.fn(), stripShellWrapper: vi.fn(), + hasRedirection: vi.fn(), })); interface ParsedPolicy { rule?: Array<{ @@ -177,6 +178,25 @@ describe('createPolicyUpdater', () => { ); }); + it('should pass allowRedirection to policyEngine.addRule', async () => { + createPolicyUpdater(policyEngine, messageBus, mockStorage); + + await messageBus.publish({ + type: MessageBusType.UPDATE_POLICY, + toolName: 'run_shell_command', + commandPrefix: 'ls', + persist: false, + allowRedirection: true, + }); + + expect(policyEngine.addRule).toHaveBeenCalledWith( + expect.objectContaining({ + toolName: 'run_shell_command', + allowRedirection: true, + }), + ); + }); + it('should persist multiple rules correctly to TOML', async () => { createPolicyUpdater(policyEngine, messageBus, mockStorage); vi.mocked(fs.readFile).mockRejectedValue({ code: 'ENOENT' }); @@ -238,6 +258,7 @@ describe('ShellToolInvocation Policy Update', () => { vi.mocked(shellUtils.stripShellWrapper).mockImplementation( (c: string) => c, ); + vi.mocked(shellUtils.hasRedirection).mockReturnValue(false); }); it('should extract multiple root commands for chained commands', () => { @@ -279,4 +300,26 @@ describe('ShellToolInvocation Policy Update', () => { expect(options!.commandPrefix).toEqual(['ls']); expect(shellUtils.getCommandRoots).toHaveBeenCalledWith('ls -la /tmp'); }); + + it('should include allowRedirection if 
command has redirection', () => { + vi.mocked(shellUtils.getCommandRoots).mockReturnValue(['echo']); + vi.mocked(shellUtils.hasRedirection).mockReturnValue(true); + + const invocation = new ShellToolInvocation( + mockConfig, + { command: 'echo "hello" > file.txt' }, + mockMessageBus, + 'run_shell_command', + 'Shell', + ); + + const options = ( + invocation as unknown as TestableShellToolInvocation + ).getPolicyUpdateOptions(ToolConfirmationOutcome.ProceedAlways); + expect(options!.commandPrefix).toEqual(['echo']); + expect(options!.allowRedirection).toBe(true); + expect(shellUtils.hasRedirection).toHaveBeenCalledWith( + 'echo "hello" > file.txt', + ); + }); }); diff --git a/packages/core/src/policy/toml-loader.test.ts b/packages/core/src/policy/toml-loader.test.ts index 959f09ba80..6835e200b4 100644 --- a/packages/core/src/policy/toml-loader.test.ts +++ b/packages/core/src/policy/toml-loader.test.ts @@ -123,6 +123,7 @@ priority = 70 it('should transform mcpName = "*" to wildcard toolName', async () => { const result = await runLoadPoliciesFromToml(` [[rule]] +toolName = "*" mcpName = "*" decision = "ask_user" priority = 10 @@ -263,6 +264,20 @@ allow_redirection = true expect(result.errors).toHaveLength(0); }); + it('should parse and transform allowRedirection property (camelCase)', async () => { + const result = await runLoadPoliciesFromToml(` +[[rule]] +toolName = "run_shell_command" +commandPrefix = "echo" +decision = "allow" +priority = 100 +allowRedirection = true +`); + + expect(result.rules).toHaveLength(1); + expect(result.rules[0].allowRedirection).toBe(true); + expect(result.errors).toHaveLength(0); + }); it('should parse deny_message property', async () => { const result = await runLoadPoliciesFromToml(` [[rule]] @@ -273,7 +288,21 @@ deny_message = "Deletion is permanent" `); expect(result.rules).toHaveLength(1); - expect(result.rules[0].toolName).toBe('rm'); + expect(result.rules[0].decision).toBe(PolicyDecision.DENY); + 
expect(result.rules[0].denyMessage).toBe('Deletion is permanent'); + expect(getErrors(result)).toHaveLength(0); + }); + + it('should parse denyMessage property (camelCase)', async () => { + const result = await runLoadPoliciesFromToml(` +[[rule]] +toolName = "rm" +decision = "deny" +priority = 100 +denyMessage = "Deletion is permanent" +`); + + expect(result.rules).toHaveLength(1); expect(result.rules[0].decision).toBe(PolicyDecision.DENY); expect(result.rules[0].denyMessage).toBe('Deletion is permanent'); expect(getErrors(result)).toHaveLength(0); @@ -448,6 +477,21 @@ name = "allowed-path" }); describe('Negative Tests', () => { + it('should return a schema_validation error if toolName is missing in safety_checker', async () => { + const result = await runLoadPoliciesFromToml(` +[[safety_checker]] +priority = 100 +[safety_checker.checker] +type = "in-process" +name = "allowed-path" +`); + expect(result.errors).toHaveLength(1); + const error = result.errors[0]; + expect(error.errorType).toBe('schema_validation'); + expect(error.details).toContain('toolName'); + expect(error.details).toContain('Invalid input'); + }); + it('should return a schema_validation error if priority is missing', async () => { const result = await runLoadPoliciesFromToml(` [[rule]] @@ -543,6 +587,19 @@ priority = 100 expect(error.details).toContain('decision'); }); + it('should return a schema_validation error if toolName is missing', async () => { + const result = await runLoadPoliciesFromToml(` +[[rule]] +decision = "allow" +priority = 100 +`); + expect(result.errors).toHaveLength(1); + const error = result.errors[0]; + expect(error.errorType).toBe('schema_validation'); + expect(error.details).toContain('toolName'); + expect(error.details).toContain('Invalid input'); + }); + it('should return a schema_validation error if toolName is not a string or array', async () => { const result = await runLoadPoliciesFromToml(` [[rule]] @@ -767,9 +824,10 @@ priority = 100 
expect(result.rules).toHaveLength(2); }); - it('should not warn for catch-all rules (no toolName)', async () => { + it('should not warn for catch-all rules (toolName = "*")', async () => { const result = await runLoadPoliciesFromToml(` [[rule]] +toolName = "*" decision = "deny" priority = 100 `); @@ -827,6 +885,7 @@ priority = 100 'Should have loaded a rule with toolAnnotations', ).toBeDefined(); expect(annotationRule!.toolName).toBe('mcp_*'); + expect(annotationRule!.mcpName).toBe('*'); expect(annotationRule!.toolAnnotations).toEqual({ readOnlyHint: true, }); @@ -838,7 +897,7 @@ priority = 100 const denyRule = result.rules.find( (r) => r.decision === PolicyDecision.DENY && - r.toolName === undefined && + r.toolName === '*' && r.denyMessage?.includes('Plan Mode'), ); expect( @@ -1061,13 +1120,12 @@ priority = 100 expect(warnings).toHaveLength(0); }); - it('should skip rules without toolName', () => { + it('should skip wildcard rules (matching all tools)', () => { const warnings = validateMcpPolicyToolNames( 'my-server', ['tool1'], - [{ toolName: undefined }], + [{ toolName: '*', mcpName: 'my-server' }], ); - expect(warnings).toHaveLength(0); }); diff --git a/packages/core/src/policy/toml-loader.ts b/packages/core/src/policy/toml-loader.ts index f5210954f7..977e8a399a 100644 --- a/packages/core/src/policy/toml-loader.ts +++ b/packages/core/src/policy/toml-loader.ts @@ -37,7 +37,7 @@ const MAX_TYPO_DISTANCE = 3; * Schema for a single policy rule in the TOML file (before transformation). 
*/ const PolicyRuleSchema = z.object({ - toolName: z.union([z.string(), z.array(z.string())]).optional(), + toolName: z.union([z.string(), z.array(z.string())]), subagent: z.string().optional(), mcpName: z.string().optional(), argsPattern: z.string().optional(), @@ -63,15 +63,17 @@ const PolicyRuleSchema = z.object({ modes: z.array(z.nativeEnum(ApprovalMode)).optional(), interactive: z.boolean().optional(), toolAnnotations: z.record(z.any()).optional(), - allow_redirection: z.boolean().optional(), - deny_message: z.string().optional(), + allowRedirection: z.boolean().optional(), + allow_redirection: z.boolean().optional(), // deprecated snake_case for backward compatibility + denyMessage: z.string().optional(), + deny_message: z.string().optional(), // deprecated snake_case for backward compatibility }); /** * Schema for a single safety checker rule in the TOML file. */ const SafetyCheckerRuleSchema = z.object({ - toolName: z.union([z.string(), z.array(z.string())]).optional(), + toolName: z.union([z.string(), z.array(z.string())]), mcpName: z.string().optional(), argsPattern: z.string().optional(), commandPrefix: z.union([z.string(), z.array(z.string())]).optional(), @@ -409,14 +411,28 @@ export async function loadPoliciesFromToml( // Validate tool names in rules for (let i = 0; i < tomlRules.length; i++) { const rule = tomlRules[i]; + + const toolNamesRaw: string[] = Array.isArray(rule.toolName) + ? rule.toolName + : [rule.toolName]; + + if (toolNamesRaw.some((name) => name === '')) { + errors.push({ + filePath, + fileName: file, + tier: tierName, + ruleIndex: i, + errorType: 'rule_validation', + message: 'Invalid policy rule: toolName cannot be empty string', + details: `Rule #${i + 1} contains an empty toolName string. Use "*" to match all tools.`, + }); + continue; + } + // We no longer skip MCP-scoped rules because we need to specifically // warn users if they use deprecated "__" syntax for MCP tool names - const toolNames: string[] = rule.toolName - ? 
Array.isArray(rule.toolName) - ? rule.toolName - : [rule.toolName] - : []; + const toolNames: string[] = toolNamesRaw; for (const name of toolNames) { const warning = validateToolName(name, i); @@ -446,15 +462,13 @@ export async function loadPoliciesFromToml( // For each argsPattern, expand toolName arrays return argsPatterns.flatMap((argsPattern) => { - const toolNames: Array = rule.toolName - ? Array.isArray(rule.toolName) - ? rule.toolName - : [rule.toolName] - : [undefined]; + const toolNames: string[] = Array.isArray(rule.toolName) + ? rule.toolName + : [rule.toolName]; // Create a policy rule for each tool name return toolNames.map((toolName) => { - let effectiveToolName: string | undefined = toolName; + let effectiveToolName: string = toolName; const mcpName = rule.mcpName; if (mcpName) { @@ -478,9 +492,10 @@ export async function loadPoliciesFromToml( modes: rule.modes, interactive: rule.interactive, toolAnnotations: rule.toolAnnotations, - allowRedirection: rule.allow_redirection, + allowRedirection: + rule.allowRedirection ?? rule.allow_redirection, source: `${tierName.charAt(0).toUpperCase() + tierName.slice(1)}: ${file}`, - denyMessage: rule.deny_message, + denyMessage: rule.denyMessage ?? rule.deny_message, }; // Compile regex pattern @@ -532,13 +547,28 @@ export async function loadPoliciesFromToml( const tomlCheckerRules = validationResult.data.safety_checker ?? []; for (let i = 0; i < tomlCheckerRules.length; i++) { const checker = tomlCheckerRules[i]; + + const checkerToolNamesRaw: string[] = Array.isArray(checker.toolName) + ? checker.toolName + : [checker.toolName]; + + if (checkerToolNamesRaw.some((name) => name === '')) { + errors.push({ + filePath, + fileName: file, + tier: tierName, + ruleIndex: i, + errorType: 'rule_validation', + message: + 'Invalid safety checker rule: toolName cannot be empty string', + details: `Checker #${i + 1} contains an empty toolName string. 
Use "*" to match all tools.`, + }); + continue; + } + if (checker.mcpName) continue; - const checkerToolNames: string[] = checker.toolName - ? Array.isArray(checker.toolName) - ? checker.toolName - : [checker.toolName] - : []; + const checkerToolNames: string[] = checkerToolNamesRaw; for (const name of checkerToolNames) { const warning = validateToolName(name, i); @@ -569,15 +599,13 @@ export async function loadPoliciesFromToml( ); return argsPatterns.flatMap((argsPattern) => { - const toolNames: Array = checker.toolName - ? Array.isArray(checker.toolName) - ? checker.toolName - : [checker.toolName] - : [undefined]; + const toolNames: string[] = Array.isArray(checker.toolName) + ? checker.toolName + : [checker.toolName]; return toolNames.map((toolName) => { - let effectiveToolName: string | undefined; - if (checker.mcpName && toolName) { + let effectiveToolName: string; + if (checker.mcpName && toolName !== '*') { effectiveToolName = `${MCP_TOOL_PREFIX}${checker.mcpName}_${toolName}`; } else if (checker.mcpName) { effectiveToolName = `${MCP_TOOL_PREFIX}${checker.mcpName}_*`; @@ -672,7 +700,7 @@ export function validateMcpPolicyToolNames( serverName: string, discoveredToolNames: string[], policyRules: ReadonlyArray<{ - toolName?: string; + toolName: string; mcpName?: string; source?: string; }>, diff --git a/packages/core/src/policy/types.ts b/packages/core/src/policy/types.ts index 5cd668ef4e..494956c364 100644 --- a/packages/core/src/policy/types.ts +++ b/packages/core/src/policy/types.ts @@ -106,9 +106,9 @@ export interface PolicyRule { /** * The name of the tool this rule applies to. - * If undefined, the rule applies to all tools. + * Use '*' to match all tools. */ - toolName?: string; + toolName: string; /** * The name of the subagent this rule applies to. @@ -182,9 +182,9 @@ export interface PolicyRule { export interface SafetyCheckerRule { /** * The name of the tool this rule applies to. - * If undefined, the rule applies to all tools. 
+ * Use '*' to match all tools. */ - toolName?: string; + toolName: string; /** * Identifies the MCP server this rule applies to. diff --git a/packages/core/src/prompts/snippets.ts b/packages/core/src/prompts/snippets.ts index d5ff8714b0..5c285fc554 100644 --- a/packages/core/src/prompts/snippets.ts +++ b/packages/core/src/prompts/snippets.ts @@ -315,11 +315,16 @@ export function renderPrimaryWorkflows( options?: PrimaryWorkflowsOptions, ): string { if (!options) return ''; + + const transitionOverride = options.approvedPlan + ? `\n\n**State Transition Override:** You are now in **Execution Mode**. All previous "Read-Only", "Plan Mode", and "ONLY FOR PLANS" constraints are **immediately lifted**. You are explicitly authorized and required to use tools to modify source code and environment files to implement the approved plan. Begin executing the steps of the plan immediately.` + : ''; + return ` # Primary Workflows ## Development Lifecycle -Operate using a **Research -> Strategy -> Execution** lifecycle. For the Execution phase, resolve each sub-task through an iterative **Plan -> Act -> Validate** cycle. +Operate using a **Research -> Strategy -> Execution** lifecycle. 
For the Execution phase, resolve each sub-task through an iterative **Plan -> Act -> Validate** cycle.${transitionOverride} ${workflowStepResearch(options)} ${workflowStepStrategy(options)} diff --git a/packages/core/src/sandbox/linux/LinuxSandboxManager.test.ts b/packages/core/src/sandbox/linux/LinuxSandboxManager.test.ts index 4b1237b167..d3864d8278 100644 --- a/packages/core/src/sandbox/linux/LinuxSandboxManager.test.ts +++ b/packages/core/src/sandbox/linux/LinuxSandboxManager.test.ts @@ -4,24 +4,20 @@ * SPDX-License-Identifier: Apache-2.0 */ -import { describe, it, expect } from 'vitest'; +import { describe, it, expect, beforeEach } from 'vitest'; import { LinuxSandboxManager } from './LinuxSandboxManager.js'; import type { SandboxRequest } from '../../services/sandboxManager.js'; describe('LinuxSandboxManager', () => { const workspace = '/home/user/workspace'; + let manager: LinuxSandboxManager; - it('correctly outputs bwrap as the program with appropriate isolation flags', async () => { - const manager = new LinuxSandboxManager({ workspace }); - const req: SandboxRequest = { - command: 'ls', - args: ['-la'], - cwd: workspace, - env: {}, - }; + beforeEach(() => { + manager = new LinuxSandboxManager({ workspace }); + }); + const getBwrapArgs = async (req: SandboxRequest) => { const result = await manager.prepareCommand(req); - expect(result.program).toBe('sh'); expect(result.args[0]).toBe('-c'); expect(result.args[1]).toBe( @@ -29,8 +25,17 @@ describe('LinuxSandboxManager', () => { ); expect(result.args[2]).toBe('_'); expect(result.args[3]).toMatch(/gemini-cli-seccomp-.*\.bpf$/); + return result.args.slice(4); + }; + + it('correctly outputs bwrap as the program with appropriate isolation flags', async () => { + const bwrapArgs = await getBwrapArgs({ + command: 'ls', + args: ['-la'], + cwd: workspace, + env: {}, + }); - const bwrapArgs = result.args.slice(4); expect(bwrapArgs).toEqual([ '--unshare-all', '--new-session', @@ -56,55 +61,48 @@ 
describe('LinuxSandboxManager', () => { }); it('maps allowedPaths to bwrap binds', async () => { - const manager = new LinuxSandboxManager({ - workspace, - allowedPaths: ['/tmp/cache', '/opt/tools', workspace], - }); - const req: SandboxRequest = { + const bwrapArgs = await getBwrapArgs({ command: 'node', args: ['script.js'], cwd: workspace, env: {}, - }; + policy: { + allowedPaths: ['/tmp/cache', '/opt/tools', workspace], + }, + }); - const result = await manager.prepareCommand(req); + // Verify the specific bindings were added correctly + const bindsIndex = bwrapArgs.indexOf('--seccomp'); + const binds = bwrapArgs.slice(bwrapArgs.indexOf('--bind'), bindsIndex); - expect(result.program).toBe('sh'); - expect(result.args[0]).toBe('-c'); - expect(result.args[1]).toBe( - 'bpf_path="$1"; shift; exec bwrap "$@" 9< "$bpf_path"', - ); - expect(result.args[2]).toBe('_'); - expect(result.args[3]).toMatch(/gemini-cli-seccomp-.*\.bpf$/); - - const bwrapArgs = result.args.slice(4); - expect(bwrapArgs).toEqual([ - '--unshare-all', - '--new-session', - '--die-with-parent', - '--ro-bind', - '/', - '/', - '--dev', - '/dev', - '--proc', - '/proc', - '--tmpfs', - '/tmp', + expect(binds).toEqual([ '--bind', workspace, workspace, - '--bind', + '--bind-try', '/tmp/cache', '/tmp/cache', - '--bind', + '--bind-try', '/opt/tools', '/opt/tools', - '--seccomp', - '9', - '--', - 'node', - 'script.js', ]); }); + + it('should not bind the workspace twice even if it has a trailing slash in allowedPaths', async () => { + const bwrapArgs = await getBwrapArgs({ + command: 'ls', + args: ['-la'], + cwd: workspace, + env: {}, + policy: { + allowedPaths: [workspace + '/'], + }, + }); + + const bindsIndex = bwrapArgs.indexOf('--seccomp'); + const binds = bwrapArgs.slice(bwrapArgs.indexOf('--bind'), bindsIndex); + + // Should only contain the primary workspace bind, not the second one with a trailing slash + expect(binds).toEqual(['--bind', workspace, workspace]); + }); }); diff --git 
a/packages/core/src/sandbox/linux/LinuxSandboxManager.ts b/packages/core/src/sandbox/linux/LinuxSandboxManager.ts index db75eb2dfa..f9f0ed68e9 100644 --- a/packages/core/src/sandbox/linux/LinuxSandboxManager.ts +++ b/packages/core/src/sandbox/linux/LinuxSandboxManager.ts @@ -4,18 +4,19 @@ * SPDX-License-Identifier: Apache-2.0 */ -import { join } from 'node:path'; +import { join, normalize } from 'node:path'; import { writeFileSync } from 'node:fs'; import os from 'node:os'; import { type SandboxManager, + type GlobalSandboxOptions, type SandboxRequest, type SandboxedCommand, + sanitizePaths, } from '../../services/sandboxManager.js'; import { sanitizeEnvironment, getSecureSanitizationConfig, - type EnvironmentSanitizationConfig, } from '../../services/environmentSanitization.js'; let cachedBpfPath: string | undefined; @@ -76,28 +77,15 @@ function getSeccompBpfPath(): string { return bpfPath; } -/** - * Options for configuring the LinuxSandboxManager. - */ -export interface LinuxSandboxOptions { - /** The primary workspace path to bind into the sandbox. */ - workspace: string; - /** Additional paths to bind into the sandbox. */ - allowedPaths?: string[]; - /** Optional base sanitization config. */ - sanitizationConfig?: EnvironmentSanitizationConfig; -} - /** * A SandboxManager implementation for Linux that uses Bubblewrap (bwrap). */ export class LinuxSandboxManager implements SandboxManager { - constructor(private readonly options: LinuxSandboxOptions) {} + constructor(private readonly options: GlobalSandboxOptions) {} async prepareCommand(req: SandboxRequest): Promise<SandboxedCommand> { const sanitizationConfig = getSecureSanitizationConfig( - req.config?.sanitizationConfig, - this.options.sanitizationConfig, + req.policy?.sanitizationConfig, ); const sanitizedEnv = sanitizeEnvironment(req.env, sanitizationConfig); @@ -121,13 +109,20 @@ export class LinuxSandboxManager implements SandboxManager { this.options.workspace, ]; - const allowedPaths = this.options.allowedPaths ??
[]; - for (const path of allowedPaths) { - if (path !== this.options.workspace) { - bwrapArgs.push('--bind', path, path); + const allowedPaths = sanitizePaths(req.policy?.allowedPaths) || []; + const normalizedWorkspace = normalize(this.options.workspace).replace( + /\/$/, + '', + ); + for (const allowedPath of allowedPaths) { + const normalizedAllowedPath = normalize(allowedPath).replace(/\/$/, ''); + if (normalizedAllowedPath !== normalizedWorkspace) { + bwrapArgs.push('--bind-try', allowedPath, allowedPath); } } + // TODO: handle forbidden paths + const bpfPath = getSeccompBpfPath(); bwrapArgs.push('--seccomp', '9'); diff --git a/packages/core/src/sandbox/macos/MacOsSandboxManager.integration.test.ts b/packages/core/src/sandbox/macos/MacOsSandboxManager.integration.test.ts index d9776bc715..f9a3551124 100644 --- a/packages/core/src/sandbox/macos/MacOsSandboxManager.integration.test.ts +++ b/packages/core/src/sandbox/macos/MacOsSandboxManager.integration.test.ts @@ -116,7 +116,6 @@ describe.skipIf(os.platform() !== 'darwin')( try { const manager = new MacOsSandboxManager({ workspace: process.cwd(), - allowedPaths: [allowedDir], }); const testFile = path.join(allowedDir, 'test.txt'); @@ -125,6 +124,9 @@ describe.skipIf(os.platform() !== 'darwin')( args: [testFile], cwd: process.cwd(), env: process.env, + policy: { + allowedPaths: [allowedDir], + }, }); const execResult = await runCommand(command); @@ -183,13 +185,15 @@ describe.skipIf(os.platform() !== 'darwin')( it('should grant network access when explicitly allowed', async () => { const manager = new MacOsSandboxManager({ workspace: process.cwd(), - networkAccess: true, }); const command = await manager.prepareCommand({ command: 'curl', args: ['-s', '--connect-timeout', '1', testServerUrl], cwd: process.cwd(), env: process.env, + policy: { + networkAccess: true, + }, }); const execResult = await runCommand(command); diff --git a/packages/core/src/sandbox/macos/MacOsSandboxManager.test.ts 
b/packages/core/src/sandbox/macos/MacOsSandboxManager.test.ts index 69946daade..d6a72e8439 100644 --- a/packages/core/src/sandbox/macos/MacOsSandboxManager.test.ts +++ b/packages/core/src/sandbox/macos/MacOsSandboxManager.test.ts @@ -3,105 +3,182 @@ * Copyright 2026 Google LLC * SPDX-License-Identifier: Apache-2.0 */ -import { - describe, - it, - expect, - vi, - beforeEach, - afterEach, - type MockInstance, -} from 'vitest'; +import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest'; import { MacOsSandboxManager } from './MacOsSandboxManager.js'; -import * as seatbeltArgsBuilder from './seatbeltArgsBuilder.js'; +import type { ExecutionPolicy } from '../../services/sandboxManager.js'; +import fs from 'node:fs'; +import os from 'node:os'; describe('MacOsSandboxManager', () => { const mockWorkspace = '/test/workspace'; const mockAllowedPaths = ['/test/allowed']; const mockNetworkAccess = true; + const mockPolicy: ExecutionPolicy = { + allowedPaths: mockAllowedPaths, + networkAccess: mockNetworkAccess, + }; + let manager: MacOsSandboxManager; - let buildArgsSpy: MockInstance; beforeEach(() => { - manager = new MacOsSandboxManager({ - workspace: mockWorkspace, - allowedPaths: mockAllowedPaths, - networkAccess: mockNetworkAccess, - }); - - buildArgsSpy = vi - .spyOn(seatbeltArgsBuilder, 'buildSeatbeltArgs') - .mockReturnValue([ - '-p', - '(mock profile)', - '-D', - 'WORKSPACE=/test/workspace', - ]); + manager = new MacOsSandboxManager({ workspace: mockWorkspace }); + // Mock realpathSync to just return the path for testing + vi.spyOn(fs, 'realpathSync').mockImplementation((p) => p as string); }); afterEach(() => { vi.restoreAllMocks(); }); - it('should correctly invoke buildSeatbeltArgs with the configured options', async () => { - await manager.prepareCommand({ - command: 'echo', - args: ['hello'], - cwd: mockWorkspace, - env: {}, + describe('prepareCommand', () => { + it('should build a strict allowlist profile allowing the workspace via param', async () 
=> { + const result = await manager.prepareCommand({ + command: 'echo', + args: ['hello'], + cwd: mockWorkspace, + env: {}, + policy: { networkAccess: false }, + }); + + expect(result.program).toBe('/usr/bin/sandbox-exec'); + const profile = result.args[1]; + expect(profile).toContain('(version 1)'); + expect(profile).toContain('(deny default)'); + expect(profile).toContain('(allow process-exec)'); + expect(profile).toContain('(subpath (param "WORKSPACE"))'); + expect(profile).not.toContain('(allow network*)'); + + expect(result.args).toContain('-D'); + expect(result.args).toContain('WORKSPACE=/test/workspace'); + expect(result.args).toContain(`TMPDIR=${os.tmpdir()}`); }); - expect(buildArgsSpy).toHaveBeenCalledWith({ - workspace: mockWorkspace, - allowedPaths: mockAllowedPaths, - networkAccess: mockNetworkAccess, - }); - }); + it('should allow network when networkAccess is true in policy', async () => { + const result = await manager.prepareCommand({ + command: 'curl', + args: ['example.com'], + cwd: mockWorkspace, + env: {}, + policy: { networkAccess: true }, + }); - it('should format the executable and arguments correctly for sandbox-exec', async () => { - const result = await manager.prepareCommand({ - command: 'echo', - args: ['hello'], - cwd: mockWorkspace, - env: {}, + const profile = result.args[1]; + expect(profile).toContain('(allow network*)'); }); - expect(result.program).toBe('/usr/bin/sandbox-exec'); - expect(result.args).toEqual([ - '-p', - '(mock profile)', - '-D', - 'WORKSPACE=/test/workspace', - '--', - 'echo', - 'hello', - ]); - }); + it('should parameterize allowed paths and normalize them', async () => { + vi.spyOn(fs, 'realpathSync').mockImplementation((p) => { + if (p === '/test/symlink') return '/test/real_path'; + return p as string; + }); - it('should correctly pass through the cwd to the resulting command', async () => { - const result = await manager.prepareCommand({ - command: 'echo', - args: ['hello'], - cwd: '/test/different/cwd', - 
env: {}, + const result = await manager.prepareCommand({ + command: 'ls', + args: ['/custom/path1'], + cwd: mockWorkspace, + env: {}, + policy: { + allowedPaths: ['/custom/path1', '/test/symlink'], + }, + }); + + const profile = result.args[1]; + expect(profile).toContain('(subpath (param "ALLOWED_PATH_0"))'); + expect(profile).toContain('(subpath (param "ALLOWED_PATH_1"))'); + + expect(result.args).toContain('-D'); + expect(result.args).toContain('ALLOWED_PATH_0=/custom/path1'); + expect(result.args).toContain('ALLOWED_PATH_1=/test/real_path'); }); - expect(result.cwd).toBe('/test/different/cwd'); - }); + it('should format the executable and arguments correctly for sandbox-exec', async () => { + const result = await manager.prepareCommand({ + command: 'echo', + args: ['hello'], + cwd: mockWorkspace, + env: {}, + policy: mockPolicy, + }); - it('should apply environment sanitization via the default mechanisms', async () => { - const result = await manager.prepareCommand({ - command: 'echo', - args: ['hello'], - cwd: mockWorkspace, - env: { - SAFE_VAR: '1', - GITHUB_TOKEN: 'sensitive', - }, + expect(result.program).toBe('/usr/bin/sandbox-exec'); + expect(result.args.slice(-3)).toEqual(['--', 'echo', 'hello']); }); - expect(result.env['SAFE_VAR']).toBe('1'); - expect(result.env['GITHUB_TOKEN']).toBeUndefined(); + it('should correctly pass through the cwd to the resulting command', async () => { + const result = await manager.prepareCommand({ + command: 'echo', + args: ['hello'], + cwd: '/test/different/cwd', + env: {}, + policy: mockPolicy, + }); + + expect(result.cwd).toBe('/test/different/cwd'); + }); + + it('should apply environment sanitization via the default mechanisms', async () => { + const result = await manager.prepareCommand({ + command: 'echo', + args: ['hello'], + cwd: mockWorkspace, + env: { + SAFE_VAR: '1', + GITHUB_TOKEN: 'sensitive', + }, + policy: mockPolicy, + }); + + expect(result.env['SAFE_VAR']).toBe('1'); + 
expect(result.env['GITHUB_TOKEN']).toBeUndefined(); + }); + + it('should resolve parent directories if a file does not exist', async () => { + vi.spyOn(fs, 'realpathSync').mockImplementation((p) => { + if (p === '/test/symlink/nonexistent.txt') { + const error = new Error('ENOENT'); + Object.assign(error, { code: 'ENOENT' }); + throw error; + } + if (p === '/test/symlink') { + return '/test/real_path'; + } + return p as string; + }); + + const dynamicManager = new MacOsSandboxManager({ + workspace: '/test/symlink/nonexistent.txt', + }); + const dynamicResult = await dynamicManager.prepareCommand({ + command: 'echo', + args: ['hello'], + cwd: '/test/symlink/nonexistent.txt', + env: {}, + }); + + expect(dynamicResult.args).toContain( + 'WORKSPACE=/test/real_path/nonexistent.txt', + ); + }); + + it('should throw if realpathSync throws a non-ENOENT error', async () => { + vi.spyOn(fs, 'realpathSync').mockImplementation(() => { + const error = new Error('Permission denied'); + Object.assign(error, { code: 'EACCES' }); + throw error; + }); + + const errorManager = new MacOsSandboxManager({ + workspace: '/test/workspace', + }); + await expect( + errorManager.prepareCommand({ + command: 'echo', + args: ['hello'], + cwd: mockWorkspace, + env: {}, + }), + ).rejects.toThrow('Permission denied'); + }); }); }); diff --git a/packages/core/src/sandbox/macos/MacOsSandboxManager.ts b/packages/core/src/sandbox/macos/MacOsSandboxManager.ts index a212b310b2..06eabd2a94 100644 --- a/packages/core/src/sandbox/macos/MacOsSandboxManager.ts +++ b/packages/core/src/sandbox/macos/MacOsSandboxManager.ts @@ -4,51 +4,40 @@ * SPDX-License-Identifier: Apache-2.0 */ +import fs from 'node:fs'; +import os from 'node:os'; +import path from 'node:path'; import { type SandboxManager, + type GlobalSandboxOptions, type SandboxRequest, type SandboxedCommand, + type ExecutionPolicy, + sanitizePaths, } from '../../services/sandboxManager.js'; import { sanitizeEnvironment, getSecureSanitizationConfig, - type 
EnvironmentSanitizationConfig, } from '../../services/environmentSanitization.js'; -import { buildSeatbeltArgs } from './seatbeltArgsBuilder.js'; - -/** - * Options for configuring the MacOsSandboxManager. - */ -export interface MacOsSandboxOptions { - /** The primary workspace path to allow access to within the sandbox. */ - workspace: string; - /** Additional paths to allow access to within the sandbox. */ - allowedPaths?: string[]; - /** Whether network access is allowed. */ - networkAccess?: boolean; - /** Optional base sanitization config. */ - sanitizationConfig?: EnvironmentSanitizationConfig; -} +import { + BASE_SEATBELT_PROFILE, + NETWORK_SEATBELT_PROFILE, +} from './baseProfile.js'; /** * A SandboxManager implementation for macOS that uses Seatbelt. */ export class MacOsSandboxManager implements SandboxManager { - constructor(private readonly options: MacOsSandboxOptions) {} + constructor(private readonly options: GlobalSandboxOptions) {} async prepareCommand(req: SandboxRequest): Promise<SandboxedCommand> { const sanitizationConfig = getSecureSanitizationConfig( - req.config?.sanitizationConfig, - this.options.sanitizationConfig, + req.policy?.sanitizationConfig, ); const sanitizedEnv = sanitizeEnvironment(req.env, sanitizationConfig); - const sandboxArgs = buildSeatbeltArgs({ - workspace: this.options.workspace, - allowedPaths: this.options.allowedPaths, - networkAccess: this.options.networkAccess, - }); + const sandboxArgs = this.buildSeatbeltArgs(this.options, req.policy); return { program: '/usr/bin/sandbox-exec', @@ -57,4 +46,65 @@ export class MacOsSandboxManager implements SandboxManager { cwd: req.cwd, }; } + + /** + * Builds the arguments array for sandbox-exec using a strict allowlist profile. + * It relies on parameters passed to sandbox-exec via the -D flag to avoid + * string interpolation vulnerabilities, and normalizes paths against symlink escapes. + * + * Returns arguments up to the end of sandbox-exec configuration (e.g.
['-p', '', '-D', ...]) + * Does not include the final '--' separator or the command to run. + */ + private buildSeatbeltArgs( + options: GlobalSandboxOptions, + policy?: ExecutionPolicy, + ): string[] { + const profileLines = [BASE_SEATBELT_PROFILE]; + const args: string[] = []; + + const workspacePath = this.tryRealpath(options.workspace); + args.push('-D', `WORKSPACE=${workspacePath}`); + + const tmpPath = this.tryRealpath(os.tmpdir()); + args.push('-D', `TMPDIR=${tmpPath}`); + + const allowedPaths = sanitizePaths(policy?.allowedPaths) || []; + for (let i = 0; i < allowedPaths.length; i++) { + const allowedPath = this.tryRealpath(allowedPaths[i]); + args.push('-D', `ALLOWED_PATH_${i}=${allowedPath}`); + profileLines.push( + `(allow file-read* file-write* (subpath (param "ALLOWED_PATH_${i}")))`, + ); + } + + // TODO: handle forbidden paths + + if (policy?.networkAccess) { + profileLines.push(NETWORK_SEATBELT_PROFILE); + } + + args.unshift('-p', profileLines.join('\n')); + + return args; + } + + /** + * Resolves symlinks for a given path to prevent sandbox escapes. + * If a file does not exist (ENOENT), it recursively resolves the parent directory. + * Other errors (e.g. EACCES) are re-thrown. 
+ */ + private tryRealpath(p: string): string { + try { + return fs.realpathSync(p); + } catch (e) { + if (e instanceof Error && 'code' in e && e.code === 'ENOENT') { + const parentDir = path.dirname(p); + if (parentDir === p) { + return p; + } + return path.join(this.tryRealpath(parentDir), path.basename(p)); + } + throw e; + } + } } diff --git a/packages/core/src/sandbox/macos/seatbeltArgsBuilder.test.ts b/packages/core/src/sandbox/macos/seatbeltArgsBuilder.test.ts deleted file mode 100644 index 340eaead60..0000000000 --- a/packages/core/src/sandbox/macos/seatbeltArgsBuilder.test.ts +++ /dev/null @@ -1,97 +0,0 @@ -/** - * @license - * Copyright 2026 Google LLC - * SPDX-License-Identifier: Apache-2.0 - */ -import { describe, it, expect, vi } from 'vitest'; -import { buildSeatbeltArgs } from './seatbeltArgsBuilder.js'; -import fs from 'node:fs'; -import os from 'node:os'; - -describe('seatbeltArgsBuilder', () => { - it('should build a strict allowlist profile allowing the workspace via param', () => { - // Mock realpathSync to just return the path for testing - vi.spyOn(fs, 'realpathSync').mockImplementation((p) => p as string); - - const args = buildSeatbeltArgs({ workspace: '/Users/test/workspace' }); - - expect(args[0]).toBe('-p'); - const profile = args[1]; - expect(profile).toContain('(version 1)'); - expect(profile).toContain('(deny default)'); - expect(profile).toContain('(allow process-exec)'); - expect(profile).toContain('(subpath (param "WORKSPACE"))'); - expect(profile).not.toContain('(allow network*)'); - - expect(args).toContain('-D'); - expect(args).toContain('WORKSPACE=/Users/test/workspace'); - expect(args).toContain(`TMPDIR=${os.tmpdir()}`); - - vi.restoreAllMocks(); - }); - - it('should allow network when networkAccess is true', () => { - const args = buildSeatbeltArgs({ workspace: '/test', networkAccess: true }); - const profile = args[1]; - expect(profile).toContain('(allow network*)'); - }); - - it('should parameterize allowed paths and 
normalize them', () => { - vi.spyOn(fs, 'realpathSync').mockImplementation((p) => { - if (p === '/test/symlink') return '/test/real_path'; - return p as string; - }); - - const args = buildSeatbeltArgs({ - workspace: '/test', - allowedPaths: ['/custom/path1', '/test/symlink'], - }); - - const profile = args[1]; - expect(profile).toContain('(subpath (param "ALLOWED_PATH_0"))'); - expect(profile).toContain('(subpath (param "ALLOWED_PATH_1"))'); - - expect(args).toContain('-D'); - expect(args).toContain('ALLOWED_PATH_0=/custom/path1'); - expect(args).toContain('ALLOWED_PATH_1=/test/real_path'); - - vi.restoreAllMocks(); - }); - - it('should resolve parent directories if a file does not exist', () => { - vi.spyOn(fs, 'realpathSync').mockImplementation((p) => { - if (p === '/test/symlink/nonexistent.txt') { - const error = new Error('ENOENT'); - Object.assign(error, { code: 'ENOENT' }); - throw error; - } - if (p === '/test/symlink') { - return '/test/real_path'; - } - return p as string; - }); - - const args = buildSeatbeltArgs({ - workspace: '/test/symlink/nonexistent.txt', - }); - - expect(args).toContain('WORKSPACE=/test/real_path/nonexistent.txt'); - vi.restoreAllMocks(); - }); - - it('should throw if realpathSync throws a non-ENOENT error', () => { - vi.spyOn(fs, 'realpathSync').mockImplementation(() => { - const error = new Error('Permission denied'); - Object.assign(error, { code: 'EACCES' }); - throw error; - }); - - expect(() => - buildSeatbeltArgs({ - workspace: '/test/workspace', - }), - ).toThrow('Permission denied'); - - vi.restoreAllMocks(); - }); -}); diff --git a/packages/core/src/sandbox/macos/seatbeltArgsBuilder.ts b/packages/core/src/sandbox/macos/seatbeltArgsBuilder.ts deleted file mode 100644 index 0e162f22dd..0000000000 --- a/packages/core/src/sandbox/macos/seatbeltArgsBuilder.ts +++ /dev/null @@ -1,80 +0,0 @@ -/** - * @license - * Copyright 2026 Google LLC - * SPDX-License-Identifier: Apache-2.0 - */ - -import fs from 'node:fs'; -import os from 
'node:os'; -import path from 'node:path'; -import { - BASE_SEATBELT_PROFILE, - NETWORK_SEATBELT_PROFILE, -} from './baseProfile.js'; - -/** - * Options for building macOS Seatbelt arguments. - */ -export interface SeatbeltArgsOptions { - /** The primary workspace path to allow access to. */ - workspace: string; - /** Additional paths to allow access to. */ - allowedPaths?: string[]; - /** Whether to allow network access. */ - networkAccess?: boolean; -} - -/** - * Resolves symlinks for a given path to prevent sandbox escapes. - * If a file does not exist (ENOENT), it recursively resolves the parent directory. - * Other errors (e.g. EACCES) are re-thrown. - */ -function tryRealpath(p: string): string { - try { - return fs.realpathSync(p); - } catch (e) { - if (e instanceof Error && 'code' in e && e.code === 'ENOENT') { - const parentDir = path.dirname(p); - if (parentDir === p) { - return p; - } - return path.join(tryRealpath(parentDir), path.basename(p)); - } - throw e; - } -} - -/** - * Builds the arguments array for sandbox-exec using a strict allowlist profile. - * It relies on parameters passed to sandbox-exec via the -D flag to avoid - * string interpolation vulnerabilities, and normalizes paths against symlink escapes. - * - * Returns arguments up to the end of sandbox-exec configuration (e.g. ['-p', '', '-D', ...]) - * Does not include the final '--' separator or the command to run. 
- */ -export function buildSeatbeltArgs(options: SeatbeltArgsOptions): string[] { - let profile = BASE_SEATBELT_PROFILE + '\n'; - const args: string[] = []; - - const workspacePath = tryRealpath(options.workspace); - args.push('-D', `WORKSPACE=${workspacePath}`); - - const tmpPath = tryRealpath(os.tmpdir()); - args.push('-D', `TMPDIR=${tmpPath}`); - - if (options.allowedPaths) { - for (let i = 0; i < options.allowedPaths.length; i++) { - const allowedPath = tryRealpath(options.allowedPaths[i]); - args.push('-D', `ALLOWED_PATH_${i}=${allowedPath}`); - profile += `(allow file-read* file-write* (subpath (param "ALLOWED_PATH_${i}")))\n`; - } - } - - if (options.networkAccess) { - profile += NETWORK_SEATBELT_PROFILE; - } - - args.unshift('-p', profile); - - return args; -} diff --git a/packages/core/src/scheduler/confirmation.ts b/packages/core/src/scheduler/confirmation.ts index 67ae26d2eb..7db7a0b48f 100644 --- a/packages/core/src/scheduler/confirmation.ts +++ b/packages/core/src/scheduler/confirmation.ts @@ -16,6 +16,7 @@ import { ToolConfirmationOutcome, type ToolConfirmationPayload, type ToolCallConfirmationDetails, + type ForcedToolDecision, } from '../tools/tools.js'; import { type ValidatingToolCall, @@ -116,6 +117,8 @@ export async function resolveConfirmation( getPreferredEditor: () => EditorType | undefined; schedulerId: string; onWaitingForConfirmation?: (waiting: boolean) => void; + systemMessage?: string; + forcedDecision?: ForcedToolDecision; }, ): Promise { const { state, onWaitingForConfirmation } = deps; @@ -126,7 +129,7 @@ export async function resolveConfirmation( // Loop exists to allow the user to modify the parameters and see the new // diff. 
while (outcome === ToolConfirmationOutcome.ModifyWithEditor) { - if (signal.aborted) throw new Error('Operation cancelled'); + if (signal.aborted) throw new Error('Operation cancelled by user'); const currentCall = state.getToolCall(callId); if (!currentCall || !('invocation' in currentCall)) { @@ -134,12 +137,19 @@ export async function resolveConfirmation( } const currentInvocation = currentCall.invocation; - const details = await currentInvocation.shouldConfirmExecute(signal); + const details = await currentInvocation.shouldConfirmExecute( + signal, + deps.forcedDecision, + ); if (!details) { outcome = ToolConfirmationOutcome.ProceedOnce; break; } + if (deps.systemMessage) { + details.systemMessage = deps.systemMessage; + } + await notifyHooks(deps, details); const correlationId = randomUUID(); diff --git a/packages/core/src/scheduler/hook-utils.ts b/packages/core/src/scheduler/hook-utils.ts new file mode 100644 index 0000000000..78d5aeaa53 --- /dev/null +++ b/packages/core/src/scheduler/hook-utils.ts @@ -0,0 +1,109 @@ +/** + * @license + * Copyright 2026 Google LLC + * SPDX-License-Identifier: Apache-2.0 + */ + +import type { Config } from '../config/config.js'; +import type { AnyDeclarativeTool, AnyToolInvocation } from '../tools/tools.js'; +import type { ToolCallRequestInfo } from './types.js'; +import { extractMcpContext } from '../core/coreToolHookTriggers.js'; +import { BeforeToolHookOutput } from '../hooks/types.js'; +import { ToolErrorType } from '../tools/tool-error.js'; + +export type HookEvaluationResult = + | { + status: 'continue'; + hookDecision?: 'ask' | 'block'; + hookSystemMessage?: string; + modifiedArgs?: Record<string, unknown>; + newInvocation?: AnyToolInvocation; + } + | { + status: 'error'; + error: Error; + errorType: ToolErrorType; + }; + +export async function evaluateBeforeToolHook( + config: Config, + tool: AnyDeclarativeTool, + request: ToolCallRequestInfo, + invocation: AnyToolInvocation, +): Promise<HookEvaluationResult> { + const hookSystem = config.getHookSystem(); +
if (!hookSystem) { + return { status: 'continue' }; + } + + const params = invocation.params || {}; + const toolInput: Record = { ...params }; + const mcpContext = extractMcpContext(invocation, config); + + const beforeOutput = await hookSystem.fireBeforeToolEvent( + request.name, + toolInput, + mcpContext, + request.originalRequestName, + ); + + if (!beforeOutput) { + return { status: 'continue' }; + } + + if (beforeOutput.shouldStopExecution()) { + return { + status: 'error', + error: new Error( + `Agent execution stopped by hook: ${beforeOutput.getEffectiveReason()}`, + ), + errorType: ToolErrorType.STOP_EXECUTION, + }; + } + + const blockingError = beforeOutput.getBlockingError(); + if (blockingError?.blocked) { + return { + status: 'error', + error: new Error(`Tool execution blocked: ${blockingError.reason}`), + errorType: ToolErrorType.POLICY_VIOLATION, + }; + } + + let hookDecision: 'ask' | 'block' | undefined; + let hookSystemMessage: string | undefined; + + if (beforeOutput.isAskDecision()) { + hookDecision = 'ask'; + hookSystemMessage = beforeOutput.systemMessage; + } + + let modifiedArgs: Record | undefined; + let newInvocation: AnyToolInvocation | undefined; + + if (beforeOutput instanceof BeforeToolHookOutput) { + const modifiedInput = beforeOutput.getModifiedToolInput(); + if (modifiedInput) { + modifiedArgs = modifiedInput; + try { + newInvocation = tool.build(modifiedInput); + } catch (error) { + return { + status: 'error', + error: new Error( + `Tool parameter modification by hook failed validation: ${error instanceof Error ? 
error.message : String(error)}`, + ), + errorType: ToolErrorType.INVALID_TOOL_PARAMS, + }; + } + } + } + + return { + status: 'continue', + hookDecision, + hookSystemMessage, + modifiedArgs, + newInvocation, + }; +} diff --git a/packages/core/src/scheduler/policy.test.ts b/packages/core/src/scheduler/policy.test.ts index 32a92309e0..44a3feaa34 100644 --- a/packages/core/src/scheduler/policy.test.ts +++ b/packages/core/src/scheduler/policy.test.ts @@ -34,11 +34,9 @@ import { ROOT_SCHEDULER_ID, type ValidatingToolCall, type ToolCallRequestInfo, - type CompletedToolCall, } from './types.js'; import type { PolicyEngine } from '../policy/policy-engine.js'; import { DiscoveredMCPTool } from '../tools/mcp-tool.js'; -import { CoreToolScheduler } from '../core/coreToolScheduler.js'; import { Scheduler } from './scheduler.js'; import { ToolErrorType } from '../tools/tool-error.js'; import type { ToolRegistry } from '../tools/tool-registry.js'; @@ -762,6 +760,7 @@ describe('policy.ts', () => { (mockConfig as unknown as { config: Config }).config = mockConfig; const rule = { + toolName: '*', decision: PolicyDecision.DENY, denyMessage: 'Custom Deny', }; @@ -824,9 +823,11 @@ describe('Plan Mode Denial Consistency', () => { toolRegistry: mockToolRegistry, getToolRegistry: () => mockToolRegistry, getMessageBus: vi.fn().mockReturnValue(mockMessageBus), + getHookSystem: vi.fn().mockReturnValue(undefined), isInteractive: vi.fn().mockReturnValue(true), getEnableHooks: vi.fn().mockReturnValue(false), getApprovalMode: vi.fn().mockReturnValue(ApprovalMode.PLAN), // Key: Plan Mode + getTelemetryLogPromptsEnabled: vi.fn().mockReturnValue(false), setApprovalMode: vi.fn(), getUsageStatisticsEnabled: vi.fn().mockReturnValue(false), } as unknown as Mocked; @@ -839,61 +840,32 @@ describe('Plan Mode Denial Consistency', () => { vi.clearAllMocks(); }); - describe.each([ - { enableEventDrivenScheduler: false, name: 'Legacy CoreToolScheduler' }, - { enableEventDrivenScheduler: true, name: 
'Event-Driven Scheduler' }, - ])('$name', ({ enableEventDrivenScheduler }) => { - it('should return the correct Plan Mode denial message when policy denies execution', async () => { - let resultMessage: string | undefined; - let resultErrorType: ToolErrorType | undefined; + it('should return the correct Plan Mode denial message when policy denies execution', async () => { + let resultMessage: string | undefined; + let resultErrorType: ToolErrorType | undefined; - const signal = new AbortController().signal; + const signal = new AbortController().signal; - if (enableEventDrivenScheduler) { - const scheduler = new Scheduler({ - context: { - config: mockConfig, - messageBus: mockMessageBus, - toolRegistry: mockToolRegistry, - } as unknown as AgentLoopContext, - getPreferredEditor: () => undefined, - schedulerId: ROOT_SCHEDULER_ID, - }); - - const results = await scheduler.schedule(req, signal); - const result = results[0]; - - expect(result.status).toBe('error'); - if (result.status === 'error') { - resultMessage = result.response.error?.message; - resultErrorType = result.response.errorType; - } - } else { - let capturedCalls: CompletedToolCall[] = []; - const scheduler = new CoreToolScheduler({ - context: { - config: mockConfig, - messageBus: mockMessageBus, - toolRegistry: mockToolRegistry, - } as unknown as AgentLoopContext, - getPreferredEditor: () => undefined, - onAllToolCallsComplete: async (calls) => { - capturedCalls = calls; - }, - }); - - await scheduler.schedule(req, signal); - - expect(capturedCalls.length).toBeGreaterThan(0); - const call = capturedCalls[0]; - if (call.status === 'error') { - resultMessage = call.response.error?.message; - resultErrorType = call.response.errorType; - } - } - - expect(resultMessage).toBe('Tool execution denied by policy.'); - expect(resultErrorType).toBe(ToolErrorType.POLICY_VIOLATION); + const scheduler = new Scheduler({ + context: { + config: mockConfig, + messageBus: mockMessageBus, + toolRegistry: mockToolRegistry, + 
} as unknown as AgentLoopContext, + getPreferredEditor: () => undefined, + schedulerId: ROOT_SCHEDULER_ID, }); + + const results = await scheduler.schedule(req, signal); + const result = results[0]; + + expect(result.status).toBe('error'); + if (result.status === 'error') { + resultMessage = result.response.error?.message; + resultErrorType = result.response.errorType; + } + + expect(resultMessage).toBe('Tool execution denied by policy.'); + expect(resultErrorType).toBe(ToolErrorType.POLICY_VIOLATION); }); }); diff --git a/packages/core/src/scheduler/scheduler.test.ts b/packages/core/src/scheduler/scheduler.test.ts index 35cfdc3af7..d029d714d7 100644 --- a/packages/core/src/scheduler/scheduler.test.ts +++ b/packages/core/src/scheduler/scheduler.test.ts @@ -25,7 +25,6 @@ const runInDevTraceSpan = vi.hoisted(() => const metadata = { attributes: opts.attributes || {} }; return fn({ metadata, - endSpan: vi.fn(), }); }), ); @@ -170,10 +169,13 @@ describe('Scheduler (Orchestrator)', () => { mockConfig = { getPolicyEngine: vi.fn().mockReturnValue(mockPolicyEngine), toolRegistry: mockToolRegistry, + getToolRegistry: vi.fn().mockReturnValue(mockToolRegistry), + getHookSystem: vi.fn().mockReturnValue(undefined), isInteractive: vi.fn().mockReturnValue(true), getEnableHooks: vi.fn().mockReturnValue(true), setApprovalMode: vi.fn(), getApprovalMode: vi.fn().mockReturnValue(ApprovalMode.DEFAULT), + getTelemetryLogPromptsEnabled: vi.fn().mockReturnValue(false), } as unknown as Mocked; (mockConfig as unknown as { config: Config }).config = mockConfig as Config; @@ -420,7 +422,7 @@ describe('Scheduler (Orchestrator)', () => { const spanArgs = vi.mocked(runInDevTraceSpan).mock.calls[0]; const fn = spanArgs[1]; const metadata = { attributes: {} }; - await fn({ metadata, endSpan: vi.fn() }); + await fn({ metadata }); expect(metadata).toMatchObject({ input: [req1], }); @@ -640,6 +642,7 @@ describe('Scheduler (Orchestrator)', () => { vi.mocked(checkPolicy).mockResolvedValue({ decision: 
PolicyDecision.DENY, rule: { + toolName: '*', decision: PolicyDecision.DENY, denyMessage: 'Custom denial reason', }, @@ -691,7 +694,7 @@ describe('Scheduler (Orchestrator)', () => { it('should return POLICY_VIOLATION error type when denied in Plan Mode', async () => { vi.mocked(checkPolicy).mockResolvedValue({ decision: PolicyDecision.DENY, - rule: { decision: PolicyDecision.DENY }, + rule: { toolName: '*', decision: PolicyDecision.DENY }, }); mockConfig.getApprovalMode.mockReturnValue(ApprovalMode.PLAN); @@ -720,7 +723,11 @@ describe('Scheduler (Orchestrator)', () => { const customMessage = 'Custom Plan Mode Deny'; vi.mocked(checkPolicy).mockResolvedValue({ decision: PolicyDecision.DENY, - rule: { decision: PolicyDecision.DENY, denyMessage: customMessage }, + rule: { + toolName: '*', + decision: PolicyDecision.DENY, + denyMessage: customMessage, + }, }); mockConfig.getApprovalMode.mockReturnValue(ApprovalMode.PLAN); @@ -1346,10 +1353,12 @@ describe('Scheduler MCP Progress', () => { mockConfig = { getPolicyEngine: vi.fn().mockReturnValue(mockPolicyEngine), getToolRegistry: vi.fn().mockReturnValue(mockToolRegistry), + getHookSystem: vi.fn().mockReturnValue(undefined), isInteractive: vi.fn().mockReturnValue(true), getEnableHooks: vi.fn().mockReturnValue(true), setApprovalMode: vi.fn(), getApprovalMode: vi.fn().mockReturnValue(ApprovalMode.DEFAULT), + getTelemetryLogPromptsEnabled: vi.fn().mockReturnValue(false), } as unknown as Mocked; (mockConfig as unknown as { config: Config }).config = mockConfig as Config; diff --git a/packages/core/src/scheduler/scheduler.ts b/packages/core/src/scheduler/scheduler.ts index cc14e3d875..ce2e530a16 100644 --- a/packages/core/src/scheduler/scheduler.ts +++ b/packages/core/src/scheduler/scheduler.ts @@ -10,6 +10,7 @@ import type { MessageBus } from '../confirmation-bus/message-bus.js'; import { SchedulerStateManager } from './state-manager.js'; import { resolveConfirmation } from './confirmation.js'; import { checkPolicy, 
updatePolicy, getPolicyDenialError } from './policy.js'; +import { evaluateBeforeToolHook } from './hook-utils.js'; import { ToolExecutor } from './tool-executor.js'; import { ToolModificationHandler } from './tool-modifier.js'; import { @@ -192,7 +193,10 @@ export class Scheduler { signal: AbortSignal, ): Promise { return runInDevTraceSpan( - { operation: GeminiCliOperation.ScheduleToolCalls }, + { + operation: GeminiCliOperation.ScheduleToolCalls, + logPrompts: this.context.config.getTelemetryLogPromptsEnabled(), + }, async ({ metadata: spanMetadata }) => { const requests = Array.isArray(request) ? request : [request]; @@ -572,12 +576,46 @@ export class Scheduler { ): Promise { const callId = toolCall.request.callId; - // Policy & Security - const { decision, rule } = await checkPolicy( + // 1. Hook Check (BeforeTool) + const hookResult = await evaluateBeforeToolHook( + this.config, + toolCall.tool, + toolCall.request, + toolCall.invocation, + ); + + if (hookResult.status === 'error') { + this.state.updateStatus( + callId, + CoreToolCallStatus.Error, + createErrorResponse( + toolCall.request, + hookResult.error, + hookResult.errorType, + ), + ); + return; + } + + const { hookDecision, hookSystemMessage, modifiedArgs, newInvocation } = + hookResult; + + if (modifiedArgs && newInvocation) { + toolCall.request.args = modifiedArgs; + toolCall.request.inputModifiedByHook = true; + toolCall.invocation = newInvocation; + } + + // 2. 
Policy & Security + const { decision: policyDecision, rule } = await checkPolicy( toolCall, this.config, this.subagent, ); + let decision = policyDecision; + if (hookDecision === 'ask') { + decision = PolicyDecision.ASK_USER; + } if (decision === PolicyDecision.DENY) { const { errorMessage, errorType } = getPolicyDenialError( @@ -610,6 +648,8 @@ export class Scheduler { getPreferredEditor: this.getPreferredEditor, schedulerId: this.schedulerId, onWaitingForConfirmation: this.onWaitingForConfirmation, + systemMessage: hookSystemMessage, + forcedDecision: hookDecision === 'ask' ? 'ask_user' : undefined, }); outcome = result.outcome; lastDetails = result.lastDetails; diff --git a/packages/core/src/scheduler/scheduler_hooks.test.ts b/packages/core/src/scheduler/scheduler_hooks.test.ts new file mode 100644 index 0000000000..9f7796ffe9 --- /dev/null +++ b/packages/core/src/scheduler/scheduler_hooks.test.ts @@ -0,0 +1,306 @@ +/** + * @license + * Copyright 2026 Google LLC + * SPDX-License-Identifier: Apache-2.0 + */ + +import { describe, it, expect, vi } from 'vitest'; +import { Scheduler } from './scheduler.js'; +import type { ErroredToolCall } from './types.js'; +import { CoreToolCallStatus } from './types.js'; +import type { Config, ToolRegistry, AgentLoopContext } from '../index.js'; +import { + ApprovalMode, + DEFAULT_TRUNCATE_TOOL_OUTPUT_THRESHOLD, +} from '../index.js'; +import { createMockMessageBus } from '../test-utils/mock-message-bus.js'; +import { MockTool } from '../test-utils/mock-tool.js'; +import { DEFAULT_GEMINI_MODEL } from '../config/models.js'; +import type { PolicyEngine } from '../policy/policy-engine.js'; +import { HookSystem } from '../hooks/hookSystem.js'; +import { HookType, HookEventName } from '../hooks/types.js'; + +function createMockConfig(overrides: Partial = {}): Config { + const defaultToolRegistry = { + getTool: () => undefined, + getToolByName: () => undefined, + getFunctionDeclarations: () => [], + tools: new Map(), + discovery: {}, + 
registerTool: () => {}, + getToolByDisplayName: () => undefined, + getTools: () => [], + discoverTools: async () => {}, + getAllTools: () => [], + getToolsByServer: () => [], + getExperiments: () => {}, + } as unknown as ToolRegistry; + + const baseConfig = { + getSessionId: () => 'test-session-id', + getUsageStatisticsEnabled: () => true, + getDebugMode: () => false, + isInteractive: () => true, + getApprovalMode: () => ApprovalMode.DEFAULT, + setApprovalMode: () => {}, + getAllowedTools: () => [], + getContentGeneratorConfig: () => ({ + model: 'test-model', + authType: 'oauth-personal', + }), + getShellExecutionConfig: () => ({ + terminalWidth: 90, + terminalHeight: 30, + sanitizationConfig: { + enableEnvironmentVariableRedaction: true, + allowedEnvironmentVariables: [], + blockedEnvironmentVariables: [], + }, + }), + storage: { + getProjectTempDir: () => '/tmp', + }, + getTruncateToolOutputThreshold: () => + DEFAULT_TRUNCATE_TOOL_OUTPUT_THRESHOLD, + getTruncateToolOutputLines: () => 1000, + getToolRegistry: () => defaultToolRegistry, + getWorkingDir: () => '/mock/dir', + getActiveModel: () => DEFAULT_GEMINI_MODEL, + getGeminiClient: () => null, + getMessageBus: () => createMockMessageBus(), + getEnableHooks: () => true, + getExperiments: () => {}, + getTelemetryLogPromptsEnabled: () => false, + getPolicyEngine: () => + ({ + check: async () => ({ decision: 'allow' }), + }) as unknown as PolicyEngine, + } as unknown as Config; + + const mockConfig = Object.assign({}, baseConfig, overrides) as Config; + + (mockConfig as { config?: Config }).config = mockConfig; + + return mockConfig; +} + +describe('Scheduler Hooks', () => { + it('should stop execution if BeforeTool hook requests stop', async () => { + const executeFn = vi.fn().mockResolvedValue({ + llmContent: 'Tool executed', + returnDisplay: 'Tool executed', + }); + const mockTool = new MockTool({ name: 'mockTool', execute: executeFn }); + + const toolRegistry = { + getTool: () => mockTool, + getAllToolNames: () 
=> ['mockTool'], + } as unknown as ToolRegistry; + + const mockMessageBus = createMockMessageBus(); + + const mockConfig = createMockConfig({ + getToolRegistry: () => toolRegistry, + getMessageBus: () => mockMessageBus, + getApprovalMode: () => ApprovalMode.YOLO, + }); + + const hookSystem = new HookSystem(mockConfig); + + (mockConfig as { getHookSystem?: () => HookSystem }).getHookSystem = () => + hookSystem; + + // Register a programmatic runtime hook + hookSystem.registerHook( + { + type: HookType.Runtime, + name: 'test-stop-hook', + action: async () => ({ + continue: false, + stopReason: 'Hook stopped execution', + }), + }, + HookEventName.BeforeTool, + ); + + const scheduler = new Scheduler({ + context: { + config: mockConfig, + messageBus: mockMessageBus, + toolRegistry, + } as unknown as AgentLoopContext, + getPreferredEditor: () => 'vscode', + schedulerId: 'test-scheduler', + }); + + const request = { + callId: '1', + name: 'mockTool', + args: {}, + isClientInitiated: false, + prompt_id: 'prompt-1', + }; + + const results = await scheduler.schedule( + [request], + new AbortController().signal, + ); + + expect(results.length).toBe(1); + const result = results[0]; + expect(result.status).toBe(CoreToolCallStatus.Error); + const erroredCall = result as ErroredToolCall; + + expect(erroredCall.response.error?.message).toContain( + 'Agent execution stopped by hook: Hook stopped execution', + ); + expect(executeFn).not.toHaveBeenCalled(); + }); + + it('should block tool execution if BeforeTool hook requests block', async () => { + const executeFn = vi.fn(); + const mockTool = new MockTool({ name: 'mockTool', execute: executeFn }); + + const toolRegistry = { + getTool: () => mockTool, + getAllToolNames: () => ['mockTool'], + } as unknown as ToolRegistry; + + const mockMessageBus = createMockMessageBus(); + + const mockConfig = createMockConfig({ + getToolRegistry: () => toolRegistry, + getMessageBus: () => mockMessageBus, + getApprovalMode: () => ApprovalMode.YOLO, 
+ }); + + const hookSystem = new HookSystem(mockConfig); + + (mockConfig as { getHookSystem?: () => HookSystem }).getHookSystem = () => + hookSystem; + + hookSystem.registerHook( + { + type: HookType.Runtime, + name: 'test-block-hook', + action: async () => ({ + decision: 'block', + reason: 'Hook blocked execution', + }), + }, + HookEventName.BeforeTool, + ); + + const scheduler = new Scheduler({ + context: { + config: mockConfig, + messageBus: mockMessageBus, + toolRegistry, + } as unknown as AgentLoopContext, + getPreferredEditor: () => 'vscode', + schedulerId: 'test-scheduler', + }); + + const request = { + callId: '1', + name: 'mockTool', + args: {}, + isClientInitiated: false, + prompt_id: 'prompt-1', + }; + + const results = await scheduler.schedule( + [request], + new AbortController().signal, + ); + + expect(results.length).toBe(1); + const result = results[0]; + expect(result.status).toBe(CoreToolCallStatus.Error); + const erroredCall = result as ErroredToolCall; + + expect(erroredCall.response.error?.message).toContain( + 'Tool execution blocked: Hook blocked execution', + ); + expect(executeFn).not.toHaveBeenCalled(); + }); + + it('should update tool input if BeforeTool hook provides modified input', async () => { + const executeFn = vi.fn().mockResolvedValue({ + llmContent: 'Tool executed', + returnDisplay: 'Tool executed', + }); + const mockTool = new MockTool({ name: 'mockTool', execute: executeFn }); + + const toolRegistry = { + getTool: () => mockTool, + getAllToolNames: () => ['mockTool'], + } as unknown as ToolRegistry; + + const mockMessageBus = createMockMessageBus(); + + const mockConfig = createMockConfig({ + getToolRegistry: () => toolRegistry, + getMessageBus: () => mockMessageBus, + getApprovalMode: () => ApprovalMode.YOLO, + }); + + const hookSystem = new HookSystem(mockConfig); + + (mockConfig as { getHookSystem?: () => HookSystem }).getHookSystem = () => + hookSystem; + + hookSystem.registerHook( + { + type: HookType.Runtime, + name: 
'test-modify-input-hook', + action: async () => ({ + continue: true, + hookSpecificOutput: { + hookEventName: 'BeforeTool', + tool_input: { newParam: 'modifiedValue' }, + }, + }), + }, + HookEventName.BeforeTool, + ); + + const scheduler = new Scheduler({ + context: { + config: mockConfig, + messageBus: mockMessageBus, + toolRegistry, + } as unknown as AgentLoopContext, + getPreferredEditor: () => 'vscode', + schedulerId: 'test-scheduler', + }); + + const request = { + callId: '1', + name: 'mockTool', + args: { originalParam: 'originalValue' }, + isClientInitiated: false, + prompt_id: 'prompt-1', + }; + + const results = await scheduler.schedule( + [request], + new AbortController().signal, + ); + + expect(results.length).toBe(1); + const result = results[0]; + expect(result.status).toBe(CoreToolCallStatus.Success); + + expect(executeFn).toHaveBeenCalledWith( + { newParam: 'modifiedValue' }, + expect.anything(), + undefined, + expect.anything(), + ); + + expect(result.request.args).toEqual({ + newParam: 'modifiedValue', + }); + }); +}); diff --git a/packages/core/src/scheduler/scheduler_parallel.test.ts b/packages/core/src/scheduler/scheduler_parallel.test.ts index 06b5e169df..ec187452f0 100644 --- a/packages/core/src/scheduler/scheduler_parallel.test.ts +++ b/packages/core/src/scheduler/scheduler_parallel.test.ts @@ -25,7 +25,6 @@ const runInDevTraceSpan = vi.hoisted(() => const metadata = { name: '', attributes: opts.attributes || {} }; return fn({ metadata, - endSpan: vi.fn(), }); }), ); @@ -212,10 +211,13 @@ describe('Scheduler Parallel Execution', () => { mockConfig = { getPolicyEngine: vi.fn().mockReturnValue(mockPolicyEngine), toolRegistry: mockToolRegistry, + getToolRegistry: vi.fn().mockReturnValue(mockToolRegistry), + getHookSystem: vi.fn().mockReturnValue(undefined), isInteractive: vi.fn().mockReturnValue(true), getEnableHooks: vi.fn().mockReturnValue(true), setApprovalMode: vi.fn(), getApprovalMode: vi.fn().mockReturnValue(ApprovalMode.DEFAULT), + 
getTelemetryLogPromptsEnabled: vi.fn().mockReturnValue(false), } as unknown as Mocked; (mockConfig as unknown as { config: Config }).config = mockConfig as Config; @@ -376,7 +378,7 @@ describe('Scheduler Parallel Execution', () => { const spanArgs = vi.mocked(runInDevTraceSpan).mock.calls[0]; const fn = spanArgs[1]; const metadata = { name: '', attributes: {} }; - await fn({ metadata, endSpan: vi.fn() }); + await fn({ metadata }); expect(metadata).toMatchObject({ input: [req1, req2, req3], }); diff --git a/packages/core/src/scheduler/state-manager.test.ts b/packages/core/src/scheduler/state-manager.test.ts index dd5071c5bf..ff69e0d207 100644 --- a/packages/core/src/scheduler/state-manager.test.ts +++ b/packages/core/src/scheduler/state-manager.test.ts @@ -22,6 +22,7 @@ import { ToolConfirmationOutcome, type AnyDeclarativeTool, type AnyToolInvocation, + type FileDiff, } from '../tools/tools.js'; import { MessageBusType } from '../confirmation-bus/types.js'; import type { MessageBus } from '../confirmation-bus/message-bus.js'; @@ -359,7 +360,7 @@ describe('SchedulerStateManager', () => { expect(active.confirmationDetails).toEqual(details); }); - it('should preserve diff when cancelling an edit tool call', () => { + it('should preserve diff and derive stats when cancelling an edit tool call', () => { const call = createValidatingCall(); stateManager.enqueue([call]); stateManager.dequeue(); @@ -369,9 +370,9 @@ describe('SchedulerStateManager', () => { title: 'Edit', fileName: 'test.txt', filePath: '/path/to/test.txt', - fileDiff: 'diff', - originalContent: 'old', - newContent: 'new', + fileDiff: '@@ -1,1 +1,1 @@\n-old line\n+new line', + originalContent: 'old line', + newContent: 'new line', onConfirm: vi.fn(), }; @@ -389,13 +390,14 @@ describe('SchedulerStateManager', () => { const completed = stateManager.completedBatch[0] as CancelledToolCall; expect(completed.status).toBe(CoreToolCallStatus.Cancelled); - expect(completed.response.resultDisplay).toEqual({ - 
fileDiff: 'diff', - fileName: 'test.txt', - filePath: '/path/to/test.txt', - originalContent: 'old', - newContent: 'new', - }); + const result = completed.response.resultDisplay as FileDiff; + expect(result.fileDiff).toBe(details.fileDiff); + expect(result.diffStat).toEqual( + expect.objectContaining({ + model_added_lines: 1, + model_removed_lines: 1, + }), + ); }); it('should ignore status updates for non-existent callIds', () => { diff --git a/packages/core/src/scheduler/state-manager.ts b/packages/core/src/scheduler/state-manager.ts index 428b7f87a8..093aaa7308 100644 --- a/packages/core/src/scheduler/state-manager.ts +++ b/packages/core/src/scheduler/state-manager.ts @@ -32,6 +32,7 @@ import { type SerializableConfirmationDetails, } from '../confirmation-bus/types.js'; import { isToolCallResponseInfo } from '../utils/tool-utils.js'; +import { getDiffStatFromPatch } from '../tools/diffOptions.js'; /** * Handler for terminal tool calls. @@ -473,6 +474,8 @@ export class SchedulerStateManager { filePath: details.filePath, originalContent: details.originalContent, newContent: details.newContent, + // Derive stats from the patch if they aren't already present + diffStat: details.diffStat ?? 
getDiffStatFromPatch(details.fileDiff), }; } } diff --git a/packages/core/src/scheduler/tool-executor.test.ts b/packages/core/src/scheduler/tool-executor.test.ts index ff9edd83f3..6abd5c7476 100644 --- a/packages/core/src/scheduler/tool-executor.test.ts +++ b/packages/core/src/scheduler/tool-executor.test.ts @@ -44,7 +44,6 @@ const runInDevTraceSpan = vi.hoisted(() => const metadata = { attributes: opts.attributes || {} }; return fn({ metadata, - endSpan: vi.fn(), }); }), ); @@ -142,7 +141,7 @@ describe('ToolExecutor', () => { const spanArgs = vi.mocked(runInDevTraceSpan).mock.calls[0]; const fn = spanArgs[1]; const metadata = { attributes: {} }; - await fn({ metadata, endSpan: vi.fn() }); + await fn({ metadata }); expect(metadata).toMatchObject({ input: scheduledCall.request, output: { @@ -205,7 +204,7 @@ describe('ToolExecutor', () => { const spanArgs = vi.mocked(runInDevTraceSpan).mock.calls[0]; const fn = spanArgs[1]; const metadata = { attributes: {} }; - await fn({ metadata, endSpan: vi.fn() }); + await fn({ metadata }); expect(metadata).toMatchObject({ error: new Error('Tool Failed'), }); diff --git a/packages/core/src/scheduler/tool-executor.ts b/packages/core/src/scheduler/tool-executor.ts index 81232d39d9..f13f8a8657 100644 --- a/packages/core/src/scheduler/tool-executor.ts +++ b/packages/core/src/scheduler/tool-executor.ts @@ -82,6 +82,7 @@ export class ToolExecutor { return runInDevTraceSpan( { operation: GeminiCliOperation.ToolCall, + logPrompts: this.config.getTelemetryLogPromptsEnabled(), attributes: { [GEN_AI_TOOL_NAME]: toolName, [GEN_AI_TOOL_CALL_ID]: callId, @@ -115,10 +116,25 @@ export class ToolExecutor { { shellExecutionConfig, setExecutionIdCallback }, this.config, request.originalRequestName, + true, // skipBeforeHook ); const toolResult: ToolResult = await promise; + if (call.request.inputModifiedByHook) { + const modificationMsg = `\n\n[System] Tool input parameters were modified by a hook before execution.`; + if (typeof 
toolResult.llmContent === 'string') { + toolResult.llmContent += modificationMsg; + } else if (Array.isArray(toolResult.llmContent)) { + toolResult.llmContent.push({ text: modificationMsg }); + } else if (toolResult.llmContent) { + toolResult.llmContent = [ + toolResult.llmContent, + { text: modificationMsg }, + ]; + } + } + if (signal.aborted) { completedToolCall = await this.createCancelledResult( call, diff --git a/packages/core/src/scheduler/types.ts b/packages/core/src/scheduler/types.ts index 9fedd48f41..a9cde87d27 100644 --- a/packages/core/src/scheduler/types.ts +++ b/packages/core/src/scheduler/types.ts @@ -47,6 +47,8 @@ export interface ToolCallRequestInfo { traceId?: string; parentCallId?: string; schedulerId?: string; + inputModifiedByHook?: boolean; + forcedAsk?: boolean; } export interface ToolCallResponseInfo { diff --git a/packages/core/src/services/FolderTrustDiscoveryService.ts b/packages/core/src/services/FolderTrustDiscoveryService.ts index 499077d33f..6e8b7b1c32 100644 --- a/packages/core/src/services/FolderTrustDiscoveryService.ts +++ b/packages/core/src/services/FolderTrustDiscoveryService.ts @@ -163,6 +163,7 @@ export class FolderTrustDiscoveryService { for (const event of Object.values(hooksConfig)) { if (!Array.isArray(event)) continue; for (const hook of event) { + // eslint-disable-next-line no-restricted-syntax if (this.isRecord(hook) && typeof hook['command'] === 'string') { hooks.add(hook['command']); } diff --git a/packages/core/src/services/chatCompressionService.ts b/packages/core/src/services/chatCompressionService.ts index a1f9c12f2c..4640860e48 100644 --- a/packages/core/src/services/chatCompressionService.ts +++ b/packages/core/src/services/chatCompressionService.ts @@ -196,6 +196,7 @@ async function truncateHistoryToBudget( newParts.unshift({ functionResponse: { + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...part.functionResponse, response: { output: truncatedMessage }, }, diff --git 
a/packages/core/src/services/chatRecordingService.ts b/packages/core/src/services/chatRecordingService.ts index 2591d90bb4..a161b7da80 100644 --- a/packages/core/src/services/chatRecordingService.ts +++ b/packages/core/src/services/chatRecordingService.ts @@ -4,7 +4,7 @@ * SPDX-License-Identifier: Apache-2.0 */ -import { type Status } from '../core/coreToolScheduler.js'; +import { type Status } from '../scheduler/types.js'; import { type ThoughtSummary } from '../utils/thoughtUtils.js'; import { getProjectHash } from '../utils/paths.js'; import { sanitizeFilenamePart } from '../utils/fileUtils.js'; diff --git a/packages/core/src/services/sandboxManager.test.ts b/packages/core/src/services/sandboxManager.test.ts index d201314d9f..50760ccf1c 100644 --- a/packages/core/src/services/sandboxManager.test.ts +++ b/packages/core/src/services/sandboxManager.test.ts @@ -6,12 +6,30 @@ import os from 'node:os'; import { describe, expect, it, vi } from 'vitest'; -import { NoopSandboxManager } from './sandboxManager.js'; +import { NoopSandboxManager, sanitizePaths } from './sandboxManager.js'; import { createSandboxManager } from './sandboxManagerFactory.js'; import { LinuxSandboxManager } from '../sandbox/linux/LinuxSandboxManager.js'; import { MacOsSandboxManager } from '../sandbox/macos/MacOsSandboxManager.js'; import { WindowsSandboxManager } from './windowsSandboxManager.js'; +describe('sanitizePaths', () => { + it('should return undefined if no paths are provided', () => { + expect(sanitizePaths(undefined)).toBeUndefined(); + }); + + it('should deduplicate paths and return them', () => { + const paths = ['/workspace/foo', '/workspace/bar', '/workspace/foo']; + expect(sanitizePaths(paths)).toEqual(['/workspace/foo', '/workspace/bar']); + }); + + it('should throw an error if a path is not absolute', () => { + const paths = ['/workspace/foo', 'relative/path']; + expect(() => sanitizePaths(paths)).toThrow( + 'Sandbox path must be absolute: relative/path', + ); + }); +}); + 
describe('NoopSandboxManager', () => { const sandboxManager = new NoopSandboxManager(); @@ -58,7 +76,7 @@ describe('NoopSandboxManager', () => { env: { API_KEY: 'sensitive-key', }, - config: { + policy: { sanitizationConfig: { enableEnvironmentVariableRedaction: false, }, @@ -80,7 +98,7 @@ describe('NoopSandboxManager', () => { MY_SAFE_VAR: 'safe-value', MY_TOKEN: 'secret-token', }, - config: { + policy: { sanitizationConfig: { allowedEnvironmentVariables: ['MY_SAFE_VAR', 'MY_TOKEN'], }, @@ -103,7 +121,7 @@ describe('NoopSandboxManager', () => { SAFE_VAR: 'safe-value', BLOCKED_VAR: 'blocked-value', }, - config: { + policy: { sanitizationConfig: { blockedEnvironmentVariables: ['BLOCKED_VAR'], }, diff --git a/packages/core/src/services/sandboxManager.ts b/packages/core/src/services/sandboxManager.ts index 8642edff11..0108c8f172 100644 --- a/packages/core/src/services/sandboxManager.ts +++ b/packages/core/src/services/sandboxManager.ts @@ -4,11 +4,37 @@ * SPDX-License-Identifier: Apache-2.0 */ +import os from 'node:os'; +import path from 'node:path'; import { sanitizeEnvironment, getSecureSanitizationConfig, type EnvironmentSanitizationConfig, } from './environmentSanitization.js'; +/** + * Security boundaries and permissions applied to a specific sandboxed execution. + */ +export interface ExecutionPolicy { + /** Additional absolute paths to grant full read/write access to. */ + allowedPaths?: string[]; + /** Absolute paths to explicitly deny read/write access to (overrides allowlists). */ + forbiddenPaths?: string[]; + /** Whether network access is allowed. */ + networkAccess?: boolean; + /** Rules for scrubbing sensitive environment variables. */ + sanitizationConfig?: Partial; +} + +/** + * Global configuration options used to initialize a SandboxManager. + */ +export interface GlobalSandboxOptions { + /** + * The primary workspace path the sandbox is anchored to. + * This directory is granted full read and write access. 
+ */ + workspace: string; +} /** * Request for preparing a command to run in a sandbox. @@ -22,12 +48,8 @@ export interface SandboxRequest { cwd: string; /** Environment variables to be passed to the program. */ env: NodeJS.ProcessEnv; - /** Optional sandbox-specific configuration. */ - config?: { - sanitizationConfig?: Partial<EnvironmentSanitizationConfig>; - allowedPaths?: string[]; - networkAccess?: boolean; - }; + /** Policy to use for this request. */ + policy?: ExecutionPolicy; } /** @@ -65,7 +87,7 @@ export class NoopSandboxManager implements SandboxManager { */ async prepareCommand(req: SandboxRequest): Promise<SandboxedCommand> { const sanitizationConfig = getSecureSanitizationConfig( - req.config?.sanitizationConfig, + req.policy?.sanitizationConfig, ); const sanitizedEnv = sanitizeEnvironment(req.env, sanitizationConfig); @@ -87,4 +109,35 @@ export class LocalSandboxManager implements SandboxManager { } } +/** + * Sanitizes an array of paths by deduplicating them and ensuring they are absolute. + */ +export function sanitizePaths(paths?: string[]): string[] | undefined { + if (!paths) return undefined; + + // We use a Map to deduplicate paths based on their normalized, + // platform-specific identity (e.g. handling case-insensitivity on Windows) + // while preserving the original string casing. 
+ const uniquePathsMap = new Map<string, string>(); + for (const p of paths) { + if (!path.isAbsolute(p)) { + throw new Error(`Sandbox path must be absolute: ${p}`); + } + + // Normalize the path (resolves slashes and redundant components) + let key = path.normalize(p); + + // Windows file systems are case-insensitive, so we lowercase the key for + // deduplication + if (os.platform() === 'win32') { + key = key.toLowerCase(); + } + + if (!uniquePathsMap.has(key)) { + uniquePathsMap.set(key, p); + } + } + + return Array.from(uniquePathsMap.values()); +} export { createSandboxManager } from './sandboxManagerFactory.js'; diff --git a/packages/core/src/services/sandboxManagerFactory.ts b/packages/core/src/services/sandboxManagerFactory.ts index fffc366da9..410f5e07dc 100644 --- a/packages/core/src/services/sandboxManagerFactory.ts +++ b/packages/core/src/services/sandboxManagerFactory.ts @@ -28,7 +28,7 @@ export function createSandboxManager( isWindows && (sandbox?.enabled || sandbox?.command === 'windows-native') ) { - return new WindowsSandboxManager(); + return new WindowsSandboxManager({ workspace }); } if (sandbox?.enabled) { diff --git a/packages/core/src/services/shellExecutionService.ts b/packages/core/src/services/shellExecutionService.ts index e96cf7e037..98396fa4ee 100644 --- a/packages/core/src/services/shellExecutionService.ts +++ b/packages/core/src/services/shellExecutionService.ts @@ -437,7 +437,7 @@ export class ShellExecutionService { args: spawnArgs, env: baseEnv, cwd, - config: { + policy: { ...shellExecutionConfig, ...(shellExecutionConfig.sandboxConfig || {}), sanitizationConfig, diff --git a/packages/core/src/services/toolOutputMaskingService.ts b/packages/core/src/services/toolOutputMaskingService.ts index 9d5a3fb2c2..4151ec46d5 100644 --- a/packages/core/src/services/toolOutputMaskingService.ts +++ b/packages/core/src/services/toolOutputMaskingService.ts @@ -226,6 +226,7 @@ export class ToolOutputMaskingService { const maskedPart = { ...part, functionResponse: 
{ + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...part.functionResponse, response: { output: maskedSnippet }, }, diff --git a/packages/core/src/services/windowsSandboxManager.test.ts b/packages/core/src/services/windowsSandboxManager.test.ts index 6bec183410..966deefe6b 100644 --- a/packages/core/src/services/windowsSandboxManager.test.ts +++ b/packages/core/src/services/windowsSandboxManager.test.ts @@ -4,12 +4,28 @@ * SPDX-License-Identifier: Apache-2.0 */ -import { describe, it, expect } from 'vitest'; +import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest'; +import os from 'node:os'; +import path from 'node:path'; import { WindowsSandboxManager } from './windowsSandboxManager.js'; import type { SandboxRequest } from './sandboxManager.js'; +import { spawnAsync } from '../utils/shell-utils.js'; + +vi.mock('../utils/shell-utils.js', () => ({ + spawnAsync: vi.fn(), +})); describe('WindowsSandboxManager', () => { - const manager = new WindowsSandboxManager('win32'); + let manager: WindowsSandboxManager; + + beforeEach(() => { + vi.spyOn(os, 'platform').mockReturnValue('win32'); + manager = new WindowsSandboxManager({ workspace: '/test/workspace' }); + }); + + afterEach(() => { + vi.restoreAllMocks(); + }); it('should prepare a GeminiSandbox.exe command', async () => { const req: SandboxRequest = { @@ -17,7 +33,7 @@ describe('WindowsSandboxManager', () => { args: ['/groups'], cwd: '/test/cwd', env: { TEST_VAR: 'test_value' }, - config: { + policy: { networkAccess: false, }, }; @@ -34,7 +50,7 @@ describe('WindowsSandboxManager', () => { args: [], cwd: '/test/cwd', env: {}, - config: { + policy: { networkAccess: true, }, }; @@ -52,7 +68,7 @@ describe('WindowsSandboxManager', () => { API_KEY: 'secret', PATH: '/usr/bin', }, - config: { + policy: { sanitizationConfig: { allowedEnvironmentVariables: ['PATH'], blockedEnvironmentVariables: ['API_KEY'], @@ -65,4 +81,30 @@ describe('WindowsSandboxManager', () => { 
expect(result.env['PATH']).toBe('/usr/bin'); expect(result.env['API_KEY']).toBeUndefined(); }); + + it('should grant Low Integrity access to the workspace and allowed paths', async () => { + const req: SandboxRequest = { + command: 'test', + args: [], + cwd: '/test/cwd', + env: {}, + policy: { + allowedPaths: ['/test/allowed1'], + }, + }; + + await manager.prepareCommand(req); + + expect(spawnAsync).toHaveBeenCalledWith('icacls', [ + path.resolve('/test/workspace'), + '/setintegritylevel', + 'Low', + ]); + + expect(spawnAsync).toHaveBeenCalledWith('icacls', [ + path.resolve('/test/allowed1'), + '/setintegritylevel', + 'Low', + ]); + }); }); diff --git a/packages/core/src/services/windowsSandboxManager.ts b/packages/core/src/services/windowsSandboxManager.ts index dc39b9ee67..347cb19395 100644 --- a/packages/core/src/services/windowsSandboxManager.ts +++ b/packages/core/src/services/windowsSandboxManager.ts @@ -6,15 +6,18 @@ import fs from 'node:fs'; import path from 'node:path'; +import os from 'node:os'; import { fileURLToPath } from 'node:url'; -import type { - SandboxManager, - SandboxRequest, - SandboxedCommand, +import { + type SandboxManager, + type SandboxRequest, + type SandboxedCommand, + type GlobalSandboxOptions, + sanitizePaths, } from './sandboxManager.js'; import { sanitizeEnvironment, - type EnvironmentSanitizationConfig, + getSecureSanitizationConfig, } from './environmentSanitization.js'; import { debugLogger } from '../utils/debugLogger.js'; import { spawnAsync } from '../utils/shell-utils.js'; @@ -29,18 +32,16 @@ const __dirname = path.dirname(__filename); */ export class WindowsSandboxManager implements SandboxManager { private readonly helperPath: string; - private readonly platform: string; private initialized = false; private readonly lowIntegrityCache = new Set(); - constructor(platform: string = process.platform) { - this.platform = platform; + constructor(private readonly options: GlobalSandboxOptions) { this.helperPath = 
path.resolve(__dirname, 'scripts', 'GeminiSandbox.exe'); } private async ensureInitialized(): Promise { if (this.initialized) return; - if (this.platform !== 'win32') { + if (os.platform() !== 'win32') { this.initialized = true; return; } @@ -145,36 +146,31 @@ export class WindowsSandboxManager implements SandboxManager { async prepareCommand(req: SandboxRequest): Promise { await this.ensureInitialized(); - const sanitizationConfig: EnvironmentSanitizationConfig = { - allowedEnvironmentVariables: - req.config?.sanitizationConfig?.allowedEnvironmentVariables ?? [], - blockedEnvironmentVariables: - req.config?.sanitizationConfig?.blockedEnvironmentVariables ?? [], - enableEnvironmentVariableRedaction: - req.config?.sanitizationConfig?.enableEnvironmentVariableRedaction ?? - true, - }; + const sanitizationConfig = getSecureSanitizationConfig( + req.policy?.sanitizationConfig, + ); const sanitizedEnv = sanitizeEnvironment(req.env, sanitizationConfig); // 1. Handle filesystem permissions for Low Integrity - // Grant "Low Mandatory Level" write access to the CWD. - await this.grantLowIntegrityAccess(req.cwd); + // Grant "Low Mandatory Level" write access to the workspace. + await this.grantLowIntegrityAccess(this.options.workspace); // Grant "Low Mandatory Level" read access to allowedPaths. - if (req.config?.allowedPaths) { - for (const allowedPath of req.config.allowedPaths) { - await this.grantLowIntegrityAccess(allowedPath); - } + const allowedPaths = sanitizePaths(req.policy?.allowedPaths) || []; + for (const allowedPath of allowedPaths) { + await this.grantLowIntegrityAccess(allowedPath); } + // TODO: handle forbidden paths + // 2. Construct the helper command // GeminiSandbox.exe [args...] const program = this.helperPath; // If the command starts with __, it's an internal command for the sandbox helper itself. const args = [ - req.config?.networkAccess ? '1' : '0', + req.policy?.networkAccess ? 
'1' : '0', req.cwd, req.command, ...req.args, @@ -191,7 +187,7 @@ export class WindowsSandboxManager implements SandboxManager { * Grants "Low Mandatory Level" access to a path using icacls. */ private async grantLowIntegrityAccess(targetPath: string): Promise { - if (this.platform !== 'win32') { + if (os.platform() !== 'win32') { return; } diff --git a/packages/core/src/telemetry/clearcut-logger/clearcut-logger.test.ts b/packages/core/src/telemetry/clearcut-logger/clearcut-logger.test.ts index dd641e3955..69ac326d7f 100644 --- a/packages/core/src/telemetry/clearcut-logger/clearcut-logger.test.ts +++ b/packages/core/src/telemetry/clearcut-logger/clearcut-logger.test.ts @@ -25,7 +25,7 @@ import { AuthType, type ContentGeneratorConfig, } from '../../core/contentGenerator.js'; -import type { SuccessfulToolCall } from '../../core/coreToolScheduler.js'; +import type { SuccessfulToolCall } from '../../scheduler/types.js'; import type { ConfigParameters } from '../../config/config.js'; import { EventMetadataKey } from './event-metadata-key.js'; import { makeFakeConfig } from '../../test-utils/config.js'; @@ -41,6 +41,8 @@ import { AgentFinishEvent, WebFetchFallbackAttemptEvent, HookCallEvent, + OnboardingStartEvent, + OnboardingSuccessEvent, } from '../types.js'; import { HookType } from '../../hooks/types.js'; import { AgentTerminateMode } from '../../agents/types.js'; @@ -1652,4 +1654,38 @@ describe('ClearcutLogger', () => { ]); }); }); + + describe('logOnboardingStartEvent', () => { + it('logs an event with proper name and start key', () => { + const { logger } = setup(); + const event = new OnboardingStartEvent(); + + logger?.logOnboardingStartEvent(event); + + const events = getEvents(logger!); + expect(events.length).toBe(1); + expect(events[0]).toHaveEventName(EventNames.ONBOARDING_START); + expect(events[0]).toHaveMetadataValue([ + EventMetadataKey.GEMINI_CLI_ONBOARDING_START, + 'true', + ]); + }); + }); + + describe('logOnboardingSuccessEvent', () => { + it('logs 
an event with proper name and user tier', () => { + const { logger } = setup(); + const event = new OnboardingSuccessEvent('standard-tier'); + + logger?.logOnboardingSuccessEvent(event); + + const events = getEvents(logger!); + expect(events.length).toBe(1); + expect(events[0]).toHaveEventName(EventNames.ONBOARDING_SUCCESS); + expect(events[0]).toHaveMetadataValue([ + EventMetadataKey.GEMINI_CLI_ONBOARDING_USER_TIER, + 'standard-tier', + ]); + }); + }); }); diff --git a/packages/core/src/telemetry/clearcut-logger/clearcut-logger.ts b/packages/core/src/telemetry/clearcut-logger/clearcut-logger.ts index 11433db3e8..4791d6d1c2 100644 --- a/packages/core/src/telemetry/clearcut-logger/clearcut-logger.ts +++ b/packages/core/src/telemetry/clearcut-logger/clearcut-logger.ts @@ -51,6 +51,8 @@ import type { KeychainAvailabilityEvent, TokenStorageInitializationEvent, StartupStatsEvent, + OnboardingStartEvent, + OnboardingSuccessEvent, } from '../types.js'; import type { CreditsUsedEvent, @@ -124,6 +126,8 @@ export enum EventNames { TOOL_OUTPUT_MASKING = 'tool_output_masking', KEYCHAIN_AVAILABILITY = 'keychain_availability', TOKEN_STORAGE_INITIALIZATION = 'token_storage_initialization', + ONBOARDING_START = 'onboarding_start', + ONBOARDING_SUCCESS = 'onboarding_success', CONSECA_POLICY_GENERATION = 'conseca_policy_generation', CONSECA_VERDICT = 'conseca_verdict', STARTUP_STATS = 'startup_stats', @@ -1796,6 +1800,33 @@ export class ClearcutLogger { this.flushIfNeeded(); } + logOnboardingStartEvent(_event: OnboardingStartEvent): void { + const data: EventValue[] = [ + { + gemini_cli_key: EventMetadataKey.GEMINI_CLI_ONBOARDING_START, + value: 'true', + }, + ]; + this.enqueueLogEvent( + this.createLogEvent(EventNames.ONBOARDING_START, data), + ); + this.flushIfNeeded(); + } + + logOnboardingSuccessEvent(event: OnboardingSuccessEvent): void { + const data: EventValue[] = []; + if (event.userTier) { + data.push({ + gemini_cli_key: EventMetadataKey.GEMINI_CLI_ONBOARDING_USER_TIER, + 
value: event.userTier, + }); + } + this.enqueueLogEvent( + this.createLogEvent(EventNames.ONBOARDING_SUCCESS, data), + ); + this.flushIfNeeded(); + } + logStartupStatsEvent(event: StartupStatsEvent): void { const data: EventValue[] = [ { diff --git a/packages/core/src/telemetry/clearcut-logger/event-metadata-key.ts b/packages/core/src/telemetry/clearcut-logger/event-metadata-key.ts index b7b9c0fd3a..b124a84386 100644 --- a/packages/core/src/telemetry/clearcut-logger/event-metadata-key.ts +++ b/packages/core/src/telemetry/clearcut-logger/event-metadata-key.ts @@ -7,7 +7,7 @@ // Defines valid event metadata keys for Clearcut logging. export enum EventMetadataKey { // Deleted enums: 24 - // Next ID: 191 + // Next ID: 194 GEMINI_CLI_KEY_UNKNOWN = 0, @@ -712,4 +712,14 @@ export enum EventMetadataKey { // Logs the source of a credit purchase click (e.g. overage_menu, empty_wallet_menu, manage). GEMINI_CLI_BILLING_PURCHASE_SOURCE = 190, + + // ========================================================================== + // Gemini Enterprise (GE) Event Keys + // ========================================================================== + + // Logs the start of the onboarding process. + GEMINI_CLI_ONBOARDING_START = 192, + + // Logs the user tier for onboarding success events. 
+ GEMINI_CLI_ONBOARDING_USER_TIER = 193, } diff --git a/packages/core/src/telemetry/conseca-logger.test.ts b/packages/core/src/telemetry/conseca-logger.test.ts index e3ce85432e..0eac29276f 100644 --- a/packages/core/src/telemetry/conseca-logger.test.ts +++ b/packages/core/src/telemetry/conseca-logger.test.ts @@ -112,7 +112,7 @@ describe('conseca-logger', () => { 'user prompt', 'policy', 'tool call', - 'ALLOW', + 'allow', 'rationale', ); @@ -122,7 +122,7 @@ describe('conseca-logger', () => { expect(logs.getLogger).toHaveBeenCalled(); expect(mockLogger.emit).toHaveBeenCalledWith( expect.objectContaining({ - body: 'Conseca Verdict: ALLOW.', + body: 'Conseca Verdict: allow.', attributes: expect.objectContaining({ 'event.name': EVENT_CONSECA_VERDICT, }), diff --git a/packages/core/src/telemetry/index.ts b/packages/core/src/telemetry/index.ts index 0d264695d8..ea65941e06 100644 --- a/packages/core/src/telemetry/index.ts +++ b/packages/core/src/telemetry/index.ts @@ -48,6 +48,8 @@ export { logWebFetchFallbackAttempt, logNetworkRetryAttempt, logRewind, + logOnboardingStart, + logOnboardingSuccess, } from './loggers.js'; export { logConsecaPolicyGeneration, @@ -70,6 +72,8 @@ export { NetworkRetryAttemptEvent, ToolCallDecision, RewindEvent, + OnboardingStartEvent, + OnboardingSuccessEvent, ConsecaPolicyGenerationEvent, ConsecaVerdictEvent, } from './types.js'; diff --git a/packages/core/src/telemetry/loggers.test.circular.ts b/packages/core/src/telemetry/loggers.test.circular.ts index 119c661e86..e3763f9533 100644 --- a/packages/core/src/telemetry/loggers.test.circular.ts +++ b/packages/core/src/telemetry/loggers.test.circular.ts @@ -12,11 +12,11 @@ import { describe, it, expect } from 'vitest'; import { logToolCall } from './loggers.js'; import { ToolCallEvent } from './types.js'; import type { Config } from '../config/config.js'; -import type { CompletedToolCall } from '../core/coreToolScheduler.js'; import { CoreToolCallStatus, type ToolCallRequestInfo, type 
ToolCallResponseInfo, + type CompletedToolCall, } from '../scheduler/types.js'; import { MockTool } from '../test-utils/mock-tool.js'; diff --git a/packages/core/src/telemetry/loggers.test.ts b/packages/core/src/telemetry/loggers.test.ts index 27c23e7baa..71e2e8ea7b 100644 --- a/packages/core/src/telemetry/loggers.test.ts +++ b/packages/core/src/telemetry/loggers.test.ts @@ -48,6 +48,8 @@ import { logNetworkRetryAttempt, logExtensionUpdateEvent, logHookCall, + logOnboardingStart, + logOnboardingSuccess, } from './loggers.js'; import { ToolCallDecision } from './tool-call-decision.js'; import { @@ -72,6 +74,8 @@ import { EVENT_WEB_FETCH_FALLBACK_ATTEMPT, EVENT_INVALID_CHUNK, EVENT_NETWORK_RETRY_ATTEMPT, + EVENT_ONBOARDING_START, + EVENT_ONBOARDING_SUCCESS, ApiErrorEvent, ApiRequestEvent, ApiResponseEvent, @@ -98,6 +102,8 @@ import { EVENT_EXTENSION_UPDATE, HookCallEvent, EVENT_HOOK_CALL, + OnboardingStartEvent, + OnboardingSuccessEvent, LlmRole, } from './types.js'; import { HookType } from '../hooks/types.js'; @@ -280,6 +286,7 @@ describe('loggers', () => { it('should set worktree_active to true when worktree settings are present', async () => { const mockConfig = { + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...baseMockConfig, getWorktreeSettings: () => ({ name: 'test-worktree', @@ -550,6 +557,7 @@ describe('loggers', () => { ); expect(mockUiEvent.addEvent).toHaveBeenCalledWith({ + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...event, 'event.name': EVENT_API_RESPONSE, 'event.timestamp': '2025-01-01T00:00:00.000Z', @@ -709,6 +717,7 @@ describe('loggers', () => { ); expect(mockUiEvent.addEvent).toHaveBeenCalledWith({ + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...event, 'event.name': EVENT_API_ERROR, 'event.timestamp': '2025-01-01T00:00:00.000Z', @@ -1279,6 +1288,7 @@ describe('loggers', () => { ); expect(mockUiEvent.addEvent).toHaveBeenCalledWith({ + // eslint-disable-next-line 
@typescript-eslint/no-misused-spread ...event, 'event.name': EVENT_TOOL_CALL, 'event.timestamp': '2025-01-01T00:00:00.000Z', @@ -1416,6 +1426,7 @@ describe('loggers', () => { ); expect(mockUiEvent.addEvent).toHaveBeenCalledWith({ + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...event, 'event.name': EVENT_TOOL_CALL, 'event.timestamp': '2025-01-01T00:00:00.000Z', @@ -1496,6 +1507,7 @@ describe('loggers', () => { ); expect(mockUiEvent.addEvent).toHaveBeenCalledWith({ + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...event, 'event.name': EVENT_TOOL_CALL, 'event.timestamp': '2025-01-01T00:00:00.000Z', @@ -1575,6 +1587,7 @@ describe('loggers', () => { ); expect(mockUiEvent.addEvent).toHaveBeenCalledWith({ + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...event, 'event.name': EVENT_TOOL_CALL, 'event.timestamp': '2025-01-01T00:00:00.000Z', @@ -1655,6 +1668,7 @@ describe('loggers', () => { ); expect(mockUiEvent.addEvent).toHaveBeenCalledWith({ + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...event, 'event.name': EVENT_TOOL_CALL, 'event.timestamp': '2025-01-01T00:00:00.000Z', @@ -1949,6 +1963,7 @@ describe('loggers', () => { 'session.id': 'test-session-id', 'user.email': 'test-user@example.com', 'installation.id': 'test-installation-id', + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...event, 'event.name': EVENT_MODEL_ROUTING, interactive: false, @@ -1986,6 +2001,7 @@ describe('loggers', () => { 'session.id': 'test-session-id', 'user.email': 'test-user@example.com', 'installation.id': 'test-installation-id', + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...event, 'event.name': EVENT_MODEL_ROUTING, interactive: false, @@ -2508,6 +2524,76 @@ describe('loggers', () => { }); }); + describe('logOnboardingStart', () => { + const mockConfig = makeFakeConfig(); + + beforeEach(() => { + vi.spyOn(ClearcutLogger.prototype, 'logOnboardingStartEvent'); + 
vi.spyOn(metrics, 'recordOnboardingStart'); + }); + + it('should log onboarding start event to Clearcut and OTEL, and record metrics', () => { + const event = new OnboardingStartEvent(); + + logOnboardingStart(mockConfig, event); + + expect( + ClearcutLogger.prototype.logOnboardingStartEvent, + ).toHaveBeenCalledWith(event); + + expect(mockLogger.emit).toHaveBeenCalledWith({ + body: 'Onboarding started.', + attributes: { + 'session.id': 'test-session-id', + 'user.email': 'test-user@example.com', + 'installation.id': 'test-installation-id', + 'event.name': EVENT_ONBOARDING_START, + 'event.timestamp': '2025-01-01T00:00:00.000Z', + interactive: false, + }, + }); + + expect(metrics.recordOnboardingStart).toHaveBeenCalledWith(mockConfig); + }); + }); + + describe('logOnboardingSuccess', () => { + const mockConfig = makeFakeConfig(); + + beforeEach(() => { + vi.spyOn(ClearcutLogger.prototype, 'logOnboardingSuccessEvent'); + vi.spyOn(metrics, 'recordOnboardingSuccess'); + }); + + it('should log onboarding success event to Clearcut and OTEL, and record metrics', () => { + const event = new OnboardingSuccessEvent('standard-tier'); + + logOnboardingSuccess(mockConfig, event); + + expect( + ClearcutLogger.prototype.logOnboardingSuccessEvent, + ).toHaveBeenCalledWith(event); + + expect(mockLogger.emit).toHaveBeenCalledWith({ + body: 'Onboarding succeeded. 
Tier: standard-tier', + attributes: { + 'session.id': 'test-session-id', + 'user.email': 'test-user@example.com', + 'installation.id': 'test-installation-id', + 'event.name': EVENT_ONBOARDING_SUCCESS, + 'event.timestamp': '2025-01-01T00:00:00.000Z', + interactive: false, + user_tier: 'standard-tier', + }, + }); + + expect(metrics.recordOnboardingSuccess).toHaveBeenCalledWith( + mockConfig, + 'standard-tier', + ); + }); + }); + describe('Telemetry Buffering', () => { it('should buffer events when SDK is not initialized', async () => { vi.spyOn(sdk, 'isTelemetrySdkInitialized').mockReturnValue(false); diff --git a/packages/core/src/telemetry/loggers.ts b/packages/core/src/telemetry/loggers.ts index d5cc605e65..53c7dcb894 100644 --- a/packages/core/src/telemetry/loggers.ts +++ b/packages/core/src/telemetry/loggers.ts @@ -57,6 +57,8 @@ import { type ToolOutputMaskingEvent, type KeychainAvailabilityEvent, type TokenStorageInitializationEvent, + type OnboardingStartEvent, + type OnboardingSuccessEvent, } from './types.js'; import { recordApiErrorMetrics, @@ -79,6 +81,8 @@ import { recordKeychainAvailability, recordTokenStorageInitialization, recordInvalidChunk, + recordOnboardingStart, + recordOnboardingSuccess, } from './metrics.js'; import { bufferTelemetryEvent } from './sdk.js'; import { uiTelemetryService, type UiEvent } from './uiTelemetry.js'; @@ -131,6 +135,7 @@ export function logUserPrompt(config: Config, event: UserPromptEvent): void { export function logToolCall(config: Config, event: ToolCallEvent): void { // eslint-disable-next-line @typescript-eslint/no-unsafe-type-assertion const uiEvent = { + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...event, 'event.name': EVENT_TOOL_CALL, 'event.timestamp': new Date().toISOString(), @@ -265,6 +270,7 @@ export function logRipgrepFallback( export function logApiError(config: Config, event: ApiErrorEvent): void { // eslint-disable-next-line @typescript-eslint/no-unsafe-type-assertion const uiEvent = 
{ + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...event, 'event.name': EVENT_API_ERROR, 'event.timestamp': new Date().toISOString(), @@ -297,6 +303,7 @@ export function logApiError(config: Config, event: ApiErrorEvent): void { export function logApiResponse(config: Config, event: ApiResponseEvent): void { // eslint-disable-next-line @typescript-eslint/no-unsafe-type-assertion const uiEvent = { + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...event, 'event.name': EVENT_API_RESPONSE, 'event.timestamp': new Date().toISOString(), @@ -397,6 +404,7 @@ export function logSlashCommand( export function logRewind(config: Config, event: RewindEvent): void { // eslint-disable-next-line @typescript-eslint/no-unsafe-type-assertion const uiEvent = { + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...event, 'event.name': EVENT_REWIND, 'event.timestamp': new Date().toISOString(), @@ -871,6 +879,40 @@ export function logTokenStorageInitialization( }); } +export function logOnboardingStart( + config: Config, + event: OnboardingStartEvent, +): void { + ClearcutLogger.getInstance(config)?.logOnboardingStartEvent(event); + bufferTelemetryEvent(() => { + const logger = logs.getLogger(SERVICE_NAME); + const logRecord: LogRecord = { + body: event.toLogBody(), + attributes: event.toOpenTelemetryAttributes(config), + }; + logger.emit(logRecord); + + recordOnboardingStart(config); + }); +} + +export function logOnboardingSuccess( + config: Config, + event: OnboardingSuccessEvent, +): void { + ClearcutLogger.getInstance(config)?.logOnboardingSuccessEvent(event); + bufferTelemetryEvent(() => { + const logger = logs.getLogger(SERVICE_NAME); + const logRecord: LogRecord = { + body: event.toLogBody(), + attributes: event.toOpenTelemetryAttributes(config), + }; + logger.emit(logRecord); + + recordOnboardingSuccess(config, event.userTier); + }); +} + export function logBillingEvent( config: Config, event: BillingTelemetryEvent, diff --git 
a/packages/core/src/telemetry/metrics.ts b/packages/core/src/telemetry/metrics.ts index af7f54c535..16147b3d64 100644 --- a/packages/core/src/telemetry/metrics.ts +++ b/packages/core/src/telemetry/metrics.ts @@ -51,6 +51,8 @@ const KEYCHAIN_AVAILABILITY_COUNT = 'gemini_cli.keychain.availability.count'; const TOKEN_STORAGE_TYPE_COUNT = 'gemini_cli.token_storage.type.count'; const OVERAGE_OPTION_COUNT = 'gemini_cli.overage_option.count'; const CREDIT_PURCHASE_COUNT = 'gemini_cli.credit_purchase.count'; +const EVENT_ONBOARDING_START = 'gemini_cli.onboarding.start'; +const EVENT_ONBOARDING_SUCCESS = 'gemini_cli.onboarding.success'; // Agent Metrics const AGENT_RUN_COUNT = 'gemini_cli.agent.run.count'; @@ -299,6 +301,20 @@ const COUNTER_DEFINITIONS = { model: string; }, }, + [EVENT_ONBOARDING_START]: { + description: 'Counts onboarding started', + valueType: ValueType.INT, + assign: (c: Counter) => (onboardingStartCounter = c), + attributes: {} as Record, + }, + [EVENT_ONBOARDING_SUCCESS]: { + description: 'Counts onboarding succeeded', + valueType: ValueType.INT, + assign: (c: Counter) => (onboardingSuccessCounter = c), + attributes: {} as { + user_tier?: string; + }, + }, } as const; const HISTOGRAM_DEFINITIONS = { @@ -640,6 +656,8 @@ let keychainAvailabilityCounter: Counter | undefined; let tokenStorageTypeCounter: Counter | undefined; let overageOptionCounter: Counter | undefined; let creditPurchaseCounter: Counter | undefined; +let onboardingStartCounter: Counter | undefined; +let onboardingSuccessCounter: Counter | undefined; // OpenTelemetry GenAI Semantic Convention Metrics let genAiClientTokenUsageHistogram: Histogram | undefined; @@ -812,6 +830,31 @@ export function recordLinesChanged( // --- New Metric Recording Functions --- +/** + * Records a metric for when the Google auth process starts. 
+ */ +export function recordOnboardingStart(config: Config): void { + if (!onboardingStartCounter || !isMetricsInitialized) return; + onboardingStartCounter.add( + 1, + baseMetricDefinition.getCommonAttributes(config), + ); +} + +/** + * Records a metric for when the Google auth process ends successfully. + */ +export function recordOnboardingSuccess( + config: Config, + userTier?: string, +): void { + if (!onboardingSuccessCounter || !isMetricsInitialized) return; + onboardingSuccessCounter.add(1, { + ...baseMetricDefinition.getCommonAttributes(config), + ...(userTier && { user_tier: userTier }), + }); +} + /** * Records a metric for when a UI frame flickers. */ diff --git a/packages/core/src/telemetry/sanitize.test.ts b/packages/core/src/telemetry/sanitize.test.ts index 5ac5374d01..71863011c0 100644 --- a/packages/core/src/telemetry/sanitize.test.ts +++ b/packages/core/src/telemetry/sanitize.test.ts @@ -136,7 +136,9 @@ describe('Telemetry Sanitization', () => { const attributes = event.toOpenTelemetryAttributes(config); // Should be JSON stringified + // eslint-disable-next-line no-restricted-syntax expect(typeof attributes['hook_input']).toBe('string'); + // eslint-disable-next-line no-restricted-syntax expect(typeof attributes['hook_output']).toBe('string'); const parsedInput = JSON.parse(attributes['hook_input'] as string); diff --git a/packages/core/src/telemetry/sdk.ts b/packages/core/src/telemetry/sdk.ts index 3752d3e40f..bafa540790 100644 --- a/packages/core/src/telemetry/sdk.ts +++ b/packages/core/src/telemetry/sdk.ts @@ -344,9 +344,9 @@ export async function initializeTelemetry( if (config.getDebugMode()) { debugLogger.log('OpenTelemetry SDK started successfully.'); } - telemetryInitialized = true; activeTelemetryEmail = credentials?.client_email; initializeMetrics(config); + telemetryInitialized = true; void flushTelemetryBuffer(); } catch (error) { debugLogger.error('Error starting OpenTelemetry SDK:', error); diff --git 
a/packages/core/src/telemetry/trace.test.ts b/packages/core/src/telemetry/trace.test.ts index 4d9aa0baa8..ba2ad9c444 100644 --- a/packages/core/src/telemetry/trace.test.ts +++ b/packages/core/src/telemetry/trace.test.ts @@ -6,7 +6,7 @@ import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest'; import { trace, SpanStatusCode, diag, type Tracer } from '@opentelemetry/api'; -import { runInDevTraceSpan } from './trace.js'; +import { runInDevTraceSpan, truncateForTelemetry } from './trace.js'; import { GeminiCliOperation, GEN_AI_CONVERSATION_ID, @@ -36,6 +36,55 @@ vi.mock('../utils/session.js', () => ({ sessionId: 'test-session-id', })); +describe('truncateForTelemetry', () => { + it('should return string unchanged if within maxLength', () => { + expect(truncateForTelemetry('hello', 10)).toBe('hello'); + }); + + it('should truncate string if exceeding maxLength', () => { + const result = truncateForTelemetry('hello world', 5); + expect(result).toBe('hello...[TRUNCATED: original length 11]'); + }); + + it('should correctly truncate strings with multi-byte unicode characters (emojis)', () => { + // 5 emojis, each is multiple bytes in UTF-16 + const emojis = '👋🌍🚀🔥🎉'; + + // Truncating to length 5 (which is 2.5 emojis in UTF-16 length terms) + // truncateString will stop after the full grapheme clusters that fit within 5 + const result = truncateForTelemetry(emojis, 5); + + expect(result).toBe('👋🌍...[TRUNCATED: original length 10]'); + }); + + it('should stringify and truncate objects if exceeding maxLength', () => { + const obj = { message: 'hello world', nested: { a: 1 } }; + const stringified = JSON.stringify(obj); + const result = truncateForTelemetry(obj, 10); + expect(result).toBe( + stringified.substring(0, 10) + + `...[TRUNCATED: original length ${stringified.length}]`, + ); + }); + + it('should stringify objects unchanged if within maxLength', () => { + const obj = { a: 1 }; + expect(truncateForTelemetry(obj, 100)).toBe(JSON.stringify(obj)); + }); + 
+ it('should return booleans and numbers unchanged', () => { + expect(truncateForTelemetry(100)).toBe(100); + expect(truncateForTelemetry(true)).toBe(true); + expect(truncateForTelemetry(false)).toBe(false); + }); + + it('should return undefined for unsupported types', () => { + expect(truncateForTelemetry(undefined)).toBeUndefined(); + expect(truncateForTelemetry(() => {})).toBeUndefined(); + expect(truncateForTelemetry(Symbol('test'))).toBeUndefined(); + }); +}); + describe('runInDevTraceSpan', () => { const mockSpan = { setAttribute: vi.fn(), @@ -133,33 +182,45 @@ describe('runInDevTraceSpan', () => { expect(mockSpan.end).toHaveBeenCalled(); }); - it('should respect noAutoEnd option', async () => { - let capturedEndSpan: () => void = () => {}; - const result = await runInDevTraceSpan( - { operation: GeminiCliOperation.LLMCall, noAutoEnd: true }, - async ({ endSpan }) => { - capturedEndSpan = endSpan; - return 'streaming'; - }, + it('should auto-wrap async iterators and end span when iterator completes', async () => { + async function* testStream() { + yield 1; + yield 2; + } + + const resultStream = await runInDevTraceSpan( + { operation: GeminiCliOperation.LLMCall }, + async () => testStream(), ); - expect(result).toBe('streaming'); expect(mockSpan.end).not.toHaveBeenCalled(); - capturedEndSpan(); + const results = []; + for await (const val of resultStream) { + results.push(val); + } + + expect(results).toEqual([1, 2]); expect(mockSpan.end).toHaveBeenCalled(); }); - it('should automatically end span on error even if noAutoEnd is true', async () => { + it('should end span automatically on error in async iterators', async () => { const error = new Error('streaming error'); - await expect( - runInDevTraceSpan( - { operation: GeminiCliOperation.LLMCall, noAutoEnd: true }, - async () => { - throw error; - }, - ), - ).rejects.toThrow(error); + async function* errorStream() { + yield 1; + throw error; + } + + const resultStream = await runInDevTraceSpan( + { 
operation: GeminiCliOperation.LLMCall }, + async () => errorStream(), + ); + + await expect(async () => { + for await (const _ of resultStream) { + // iterate + } + }).rejects.toThrow(error); expect(mockSpan.end).toHaveBeenCalled(); }); diff --git a/packages/core/src/telemetry/trace.ts b/packages/core/src/telemetry/trace.ts index 1f4676343a..9059340495 100644 --- a/packages/core/src/telemetry/trace.ts +++ b/packages/core/src/telemetry/trace.ts @@ -25,9 +25,42 @@ import { } from './constants.js'; import { sessionId } from '../utils/session.js'; +import { truncateString } from '../utils/textUtils.js'; + const TRACER_NAME = 'gemini-cli'; const TRACER_VERSION = 'v1'; +export function truncateForTelemetry( + value: unknown, + maxLength: number = 10000, +): AttributeValue | undefined { + if (typeof value === 'string') { + return truncateString( + value, + maxLength, + `...[TRUNCATED: original length ${value.length}]`, + ); + } + if (typeof value === 'object' && value !== null) { + const stringified = safeJsonStringify(value); + return truncateString( + stringified, + maxLength, + `...[TRUNCATED: original length ${stringified.length}]`, + ); + } + if (typeof value === 'number' || typeof value === 'boolean') { + return value; + } + return undefined; +} + +function isAsyncIterable(value: T): value is T & AsyncIterable { + return ( + typeof value === 'object' && value !== null && Symbol.asyncIterator in value + ); +} + /** * Metadata for a span. */ @@ -63,15 +96,10 @@ export interface SpanMetadata { * @returns The result of the function. 
*/ export async function runInDevTraceSpan( - opts: SpanOptions & { operation: GeminiCliOperation; noAutoEnd?: boolean }, - fn: ({ - metadata, - }: { - metadata: SpanMetadata; - endSpan: () => void; - }) => Promise, + opts: SpanOptions & { operation: GeminiCliOperation; logPrompts?: boolean }, + fn: ({ metadata }: { metadata: SpanMetadata }) => Promise, ): Promise { - const { operation, noAutoEnd, ...restOfSpanOpts } = opts; + const { operation, logPrompts, ...restOfSpanOpts } = opts; const tracer = trace.getTracer(TRACER_NAME, TRACER_VERSION); return tracer.startActiveSpan(operation, restOfSpanOpts, async (span) => { @@ -86,20 +114,25 @@ export async function runInDevTraceSpan( }; const endSpan = () => { try { - if (meta.input !== undefined) { - span.setAttribute( - GEN_AI_INPUT_MESSAGES, - safeJsonStringify(meta.input), - ); - } - if (meta.output !== undefined) { - span.setAttribute( - GEN_AI_OUTPUT_MESSAGES, - safeJsonStringify(meta.output), - ); + if (logPrompts !== false) { + if (meta.input !== undefined) { + const truncated = truncateForTelemetry(meta.input); + if (truncated !== undefined) { + span.setAttribute(GEN_AI_INPUT_MESSAGES, truncated); + } + } + if (meta.output !== undefined) { + const truncated = truncateForTelemetry(meta.output); + if (truncated !== undefined) { + span.setAttribute(GEN_AI_OUTPUT_MESSAGES, truncated); + } + } } for (const [key, value] of Object.entries(meta.attributes)) { - span.setAttribute(key, value); + const truncated = truncateForTelemetry(value); + if (truncated !== undefined) { + span.setAttribute(key, truncated); + } } if (meta.error) { span.setStatus({ @@ -123,20 +156,32 @@ export async function runInDevTraceSpan( span.end(); } }; + + let isStream = false; try { - return await fn({ metadata: meta, endSpan }); + const result = await fn({ metadata: meta }); + + if (isAsyncIterable(result)) { + isStream = true; + const streamWrapper = (async function* () { + try { + yield* result; + } catch (e) { + meta.error = e; + throw e; 
+ } finally { + endSpan(); + } + })(); + + return Object.assign(streamWrapper, result); + } + return result; } catch (e) { meta.error = e; - if (noAutoEnd) { - // For streaming operations, the delegated endSpan call will not be reached - // on an exception, so we must end the span here to prevent a leak. - endSpan(); - } throw e; } finally { - if (!noAutoEnd) { - // For non-streaming operations, this ensures the span is always closed, - // and if an error occurred, it will be recorded correctly by endSpan. + if (!isStream) { endSpan(); } } diff --git a/packages/core/src/telemetry/types.ts b/packages/core/src/telemetry/types.ts index 1e0e3abc6e..ffca3a2698 100644 --- a/packages/core/src/telemetry/types.ts +++ b/packages/core/src/telemetry/types.ts @@ -13,7 +13,7 @@ import type { import type { Config } from '../config/config.js'; import type { ApprovalMode } from '../policy/types.js'; -import type { CompletedToolCall } from '../core/coreToolScheduler.js'; +import type { CompletedToolCall } from '../scheduler/types.js'; import { CoreToolCallStatus } from '../scheduler/types.js'; import { DiscoveredMCPTool } from '../tools/mcp-tool.js'; import { AuthType } from '../core/contentGenerator.js'; @@ -44,6 +44,7 @@ import { getFileDiffFromResultDisplay } from '../utils/fileDiffUtils.js'; import { LlmRole } from './llmRole.js'; export { LlmRole }; import type { HookType } from '../hooks/types.js'; +import type { UserTierId } from '../code_assist/types.js'; export interface BaseTelemetryEvent { 'event.name': string; @@ -2360,6 +2361,55 @@ export class KeychainAvailabilityEvent implements BaseTelemetryEvent { } } +export const EVENT_ONBOARDING_START = 'gemini_cli.onboarding.start'; +export class OnboardingStartEvent implements BaseTelemetryEvent { + 'event.name': 'onboarding_start'; + 'event.timestamp': string; + + constructor() { + this['event.name'] = 'onboarding_start'; + this['event.timestamp'] = new Date().toISOString(); + } + + toOpenTelemetryAttributes(config: Config): 
LogAttributes { + return { + ...getCommonAttributes(config), + 'event.name': EVENT_ONBOARDING_START, + 'event.timestamp': this['event.timestamp'], + }; + } + + toLogBody(): string { + return 'Onboarding started.'; + } +} + +export const EVENT_ONBOARDING_SUCCESS = 'gemini_cli.onboarding.success'; +export class OnboardingSuccessEvent implements BaseTelemetryEvent { + 'event.name': 'onboarding_success'; + 'event.timestamp': string; + userTier?: UserTierId; + + constructor(userTier?: UserTierId) { + this['event.name'] = 'onboarding_success'; + this['event.timestamp'] = new Date().toISOString(); + this.userTier = userTier; + } + + toOpenTelemetryAttributes(config: Config): LogAttributes { + return { + ...getCommonAttributes(config), + 'event.name': EVENT_ONBOARDING_SUCCESS, + 'event.timestamp': this['event.timestamp'], + user_tier: this.userTier ?? '', + }; + } + + toLogBody(): string { + return `Onboarding succeeded.${this.userTier ? ` Tier: ${this.userTier}` : ''}`; + } +} + export const EVENT_TOKEN_STORAGE_INITIALIZATION = 'gemini_cli.token_storage.initialization'; export class TokenStorageInitializationEvent implements BaseTelemetryEvent { diff --git a/packages/core/src/telemetry/uiTelemetry.test.ts b/packages/core/src/telemetry/uiTelemetry.test.ts index abbfecf313..263f904b5a 100644 --- a/packages/core/src/telemetry/uiTelemetry.test.ts +++ b/packages/core/src/telemetry/uiTelemetry.test.ts @@ -20,7 +20,7 @@ import type { CompletedToolCall, ErroredToolCall, SuccessfulToolCall, -} from '../core/coreToolScheduler.js'; +} from '../scheduler/types.js'; import { ToolErrorType } from '../tools/tool-error.js'; import { ToolConfirmationOutcome } from '../tools/tools.js'; import { MockTool } from '../test-utils/mock-tool.js'; @@ -403,6 +403,7 @@ describe('UiTelemetryService', () => { ToolConfirmationOutcome.ProceedOnce, ); service.addEvent({ + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...structuredClone(new ToolCallEvent(toolCall)), 'event.name': 
EVENT_TOOL_CALL, } as ToolCallEvent & { 'event.name': typeof EVENT_TOOL_CALL }); @@ -437,6 +438,7 @@ describe('UiTelemetryService', () => { ToolConfirmationOutcome.Cancel, ); service.addEvent({ + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...structuredClone(new ToolCallEvent(toolCall)), 'event.name': EVENT_TOOL_CALL, } as ToolCallEvent & { 'event.name': typeof EVENT_TOOL_CALL }); @@ -471,6 +473,7 @@ describe('UiTelemetryService', () => { ToolConfirmationOutcome.ModifyWithEditor, ); service.addEvent({ + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...structuredClone(new ToolCallEvent(toolCall)), 'event.name': EVENT_TOOL_CALL, } as ToolCallEvent & { 'event.name': typeof EVENT_TOOL_CALL }); @@ -487,6 +490,7 @@ describe('UiTelemetryService', () => { it('should process a ToolCallEvent without a decision', () => { const toolCall = createFakeCompletedToolCall('test_tool', true, 100); service.addEvent({ + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...structuredClone(new ToolCallEvent(toolCall)), 'event.name': EVENT_TOOL_CALL, } as ToolCallEvent & { 'event.name': typeof EVENT_TOOL_CALL }); @@ -523,10 +527,12 @@ describe('UiTelemetryService', () => { ); service.addEvent({ + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...structuredClone(new ToolCallEvent(toolCall1)), 'event.name': EVENT_TOOL_CALL, } as ToolCallEvent & { 'event.name': typeof EVENT_TOOL_CALL }); service.addEvent({ + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...structuredClone(new ToolCallEvent(toolCall2)), 'event.name': EVENT_TOOL_CALL, } as ToolCallEvent & { 'event.name': typeof EVENT_TOOL_CALL }); @@ -558,10 +564,12 @@ describe('UiTelemetryService', () => { const toolCall1 = createFakeCompletedToolCall('tool_A', true, 100); const toolCall2 = createFakeCompletedToolCall('tool_B', false, 200); service.addEvent({ + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...structuredClone(new 
ToolCallEvent(toolCall1)), 'event.name': EVENT_TOOL_CALL, } as ToolCallEvent & { 'event.name': typeof EVENT_TOOL_CALL }); service.addEvent({ + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...structuredClone(new ToolCallEvent(toolCall2)), 'event.name': EVENT_TOOL_CALL, } as ToolCallEvent & { 'event.name': typeof EVENT_TOOL_CALL }); @@ -818,6 +826,7 @@ describe('UiTelemetryService', () => { it('should aggregate valid line count metadata', () => { const toolCall = createFakeCompletedToolCall('test_tool', true, 100); const event = { + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...structuredClone(new ToolCallEvent(toolCall)), 'event.name': EVENT_TOOL_CALL, metadata: { @@ -836,6 +845,7 @@ describe('UiTelemetryService', () => { it('should ignore null/undefined values in line count metadata', () => { const toolCall = createFakeCompletedToolCall('test_tool', true, 100); const event = { + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...structuredClone(new ToolCallEvent(toolCall)), 'event.name': EVENT_TOOL_CALL, metadata: { diff --git a/packages/core/src/test-utils/mock-message-bus.ts b/packages/core/src/test-utils/mock-message-bus.ts index 05ed8cb32d..c28f077bf2 100644 --- a/packages/core/src/test-utils/mock-message-bus.ts +++ b/packages/core/src/test-utils/mock-message-bus.ts @@ -62,7 +62,6 @@ export class MockMessageBus { if (!this.subscriptions.has(type)) { this.subscriptions.set(type, new Set()); } - // eslint-disable-next-line @typescript-eslint/no-unsafe-type-assertion this.subscriptions.get(type)!.add(listener as (message: Message) => void); }, ); @@ -74,7 +73,6 @@ export class MockMessageBus { (type: T['type'], listener: (message: T) => void) => { const listeners = this.subscriptions.get(type); if (listeners) { - // eslint-disable-next-line @typescript-eslint/no-unsafe-type-assertion listeners.delete(listener as (message: Message) => void); } }, @@ -103,7 +101,6 @@ export class MockMessageBus { * Create a 
mock MessageBus for testing */ export function createMockMessageBus(): MessageBus { - // eslint-disable-next-line @typescript-eslint/no-unsafe-type-assertion return new MockMessageBus() as unknown as MessageBus; } @@ -113,6 +110,5 @@ export function createMockMessageBus(): MessageBus { export function getMockMessageBusInstance( messageBus: MessageBus, ): MockMessageBus { - // eslint-disable-next-line @typescript-eslint/no-unsafe-type-assertion return messageBus as unknown as MockMessageBus; } diff --git a/packages/core/src/test-utils/mock-tool.ts b/packages/core/src/test-utils/mock-tool.ts index 5f89a506cd..a16f42093b 100644 --- a/packages/core/src/test-utils/mock-tool.ts +++ b/packages/core/src/test-utils/mock-tool.ts @@ -14,7 +14,9 @@ import { Kind, type ToolCallConfirmationDetails, type ToolInvocation, + type ToolLiveOutput, type ToolResult, + type ExecuteOptions, } from '../tools/tools.js'; import { createMockMessageBus } from './mock-message-bus.js'; import type { MessageBus } from '../confirmation-bus/message-bus.js'; @@ -33,6 +35,7 @@ interface MockToolOptions { params: { [key: string]: unknown }, signal?: AbortSignal, updateOutput?: (output: string) => void, + options?: ExecuteOptions, ) => Promise; params?: object; messageBus?: MessageBus; @@ -52,13 +55,15 @@ class MockToolInvocation extends BaseToolInvocation< execute( signal: AbortSignal, - updateOutput?: (output: string) => void, + updateOutput?: (output: ToolLiveOutput) => void, + options?: ExecuteOptions, ): Promise { - if (updateOutput) { - return this.tool.execute(this.params, signal, updateOutput); - } else { - return this.tool.execute(this.params); - } + return this.tool.execute( + this.params, + signal, + updateOutput as ((output: string) => void) | undefined, + options, + ); } override shouldConfirmExecute( @@ -79,14 +84,16 @@ export class MockTool extends BaseDeclarativeTool< { [key: string]: unknown }, ToolResult > { - shouldConfirmExecute: ( + readonly shouldConfirmExecute: ( params: { [key: 
string]: unknown }, signal: AbortSignal, ) => Promise; - execute: ( + + readonly execute: ( params: { [key: string]: unknown }, signal?: AbortSignal, updateOutput?: (output: string) => void, + options?: ExecuteOptions, ) => Promise; constructor(options: MockToolOptions) { @@ -150,7 +157,11 @@ export class MockModifiableToolInvocation extends BaseToolInvocation< super(params, messageBus, tool.name, tool.displayName); } - async execute(_abortSignal: AbortSignal): Promise { + async execute( + _signal: AbortSignal, + _updateOutput?: (output: ToolLiveOutput) => void, + _options?: ExecuteOptions, + ): Promise { const result = this.tool.executeFn(this.params); return ( result ?? { diff --git a/packages/core/src/test-utils/mockWorkspaceContext.ts b/packages/core/src/test-utils/mockWorkspaceContext.ts index 640b51f616..67c614e9f5 100644 --- a/packages/core/src/test-utils/mockWorkspaceContext.ts +++ b/packages/core/src/test-utils/mockWorkspaceContext.ts @@ -19,7 +19,6 @@ export function createMockWorkspaceContext( ): WorkspaceContext { const allDirs = [rootDir, ...additionalDirs]; - // eslint-disable-next-line @typescript-eslint/no-unsafe-type-assertion const mockWorkspaceContext = { addDirectory: vi.fn(), getDirectories: vi.fn().mockReturnValue(allDirs), diff --git a/packages/core/src/tools/confirmation-policy.test.ts b/packages/core/src/tools/confirmation-policy.test.ts index b18b1dd77e..af9f178b8b 100644 --- a/packages/core/src/tools/confirmation-policy.test.ts +++ b/packages/core/src/tools/confirmation-policy.test.ts @@ -166,7 +166,7 @@ describe('Tool Confirmation Policy Updates', () => { // Mock getMessageBusDecision to trigger ASK_USER flow vi.spyOn(invocation as any, 'getMessageBusDecision').mockResolvedValue( - 'ASK_USER', + 'ask_user', ); const confirmation = await invocation.shouldConfirmExecute( @@ -194,5 +194,39 @@ describe('Tool Confirmation Policy Updates', () => { } }, ); + + it('should skip confirmation in AUTO_EDIT mode', async () => { + vi.spyOn(mockConfig, 
'getApprovalMode').mockReturnValue( + ApprovalMode.AUTO_EDIT, + ); + const tool = create(mockConfig, mockMessageBus); + const invocation = tool.build(params as any); + + const confirmation = await invocation.shouldConfirmExecute( + new AbortController().signal, + ); + + expect(confirmation).toBe(false); + }); + + it('should NOT skip confirmation in AUTO_EDIT mode if forcedDecision is ask_user', async () => { + vi.spyOn(mockConfig, 'getApprovalMode').mockReturnValue( + ApprovalMode.AUTO_EDIT, + ); + const tool = create(mockConfig, mockMessageBus); + const invocation = tool.build(params as any); + + // Mock getMessageBusDecision to return ask_user + vi.spyOn(invocation as any, 'getMessageBusDecision').mockResolvedValue( + 'ask_user', + ); + + const confirmation = await invocation.shouldConfirmExecute( + new AbortController().signal, + 'ask_user', + ); + + expect(confirmation).not.toBe(false); + }); }); }); diff --git a/packages/core/src/tools/diffOptions.ts b/packages/core/src/tools/diffOptions.ts index b026b14f7c..0a0e0fa49e 100644 --- a/packages/core/src/tools/diffOptions.ts +++ b/packages/core/src/tools/diffOptions.ts @@ -76,3 +76,39 @@ export function getDiffStat( user_removed_chars: userStats.removedChars, }; } + +/** + * Extracts line and character stats from a unified diff patch string. + * This is useful for reconstructing stats for rejected or errored operations + * where the full strings may no longer be easily accessible. 
+ */ +export function getDiffStatFromPatch(patch: string): DiffStat { + let addedLines = 0; + let removedLines = 0; + let addedChars = 0; + let removedChars = 0; + + const lines = patch.split('\n'); + for (const line of lines) { + // Only count lines that are additions or removals, + // excluding the diff headers (--- and +++) and metadata (\) + if (line.startsWith('+') && !line.startsWith('+++')) { + addedLines++; + addedChars += line.length - 1; + } else if (line.startsWith('-') && !line.startsWith('---')) { + removedLines++; + removedChars += line.length - 1; + } + } + + return { + model_added_lines: addedLines, + model_removed_lines: removedLines, + model_added_chars: addedChars, + model_removed_chars: removedChars, + user_added_lines: 0, + user_removed_lines: 0, + user_added_chars: 0, + user_removed_chars: 0, + }; +} diff --git a/packages/core/src/tools/edit.ts b/packages/core/src/tools/edit.ts index bfa70565be..434f4b2518 100644 --- a/packages/core/src/tools/edit.ts +++ b/packages/core/src/tools/edit.ts @@ -29,7 +29,6 @@ import { makeRelative, shortenPath } from '../utils/paths.js'; import { isNodeError } from '../utils/errors.js'; import { correctPath } from '../utils/pathCorrector.js'; import type { Config } from '../config/config.js'; -import { ApprovalMode } from '../policy/types.js'; import { CoreToolCallStatus } from '../scheduler/types.js'; import { DEFAULT_DIFF_OPTIONS, getDiffStat } from './diffOptions.js'; @@ -454,7 +453,16 @@ class EditToolInvocation toolName?: string, displayName?: string, ) { - super(params, messageBus, toolName, displayName); + super( + params, + messageBus, + toolName, + displayName, + undefined, + undefined, + true, + () => this.config.getApprovalMode(), + ); if (!path.isAbsolute(this.params.file_path)) { const result = correctPath(this.params.file_path, this.config); if (result.success) { @@ -732,10 +740,6 @@ class EditToolInvocation protected override async getConfirmationDetails( abortSignal: AbortSignal, ): Promise { - if 
(this.config.getApprovalMode() === ApprovalMode.AUTO_EDIT) { - return false; - } - let editData: CalculatedEdit; try { editData = await this.calculateEdit(this.params, abortSignal); @@ -896,11 +900,36 @@ class EditToolInvocation DEFAULT_DIFF_OPTIONS, ); + // Determine the full content as originally proposed by the AI to ensure accurate diff stats. + let fullAiProposedContent = editData.newContent; + if ( + this.params.modified_by_user && + this.params.ai_proposed_content !== undefined + ) { + try { + const aiReplacement = await calculateReplacement(this.config, { + params: { + ...this.params, + new_string: this.params.ai_proposed_content, + }, + currentContent: editData.currentContent ?? '', + abortSignal: signal, + }); + fullAiProposedContent = aiReplacement.newContent; + } catch (error) { + const errorMsg = + error instanceof Error ? error.message : String(error); + debugLogger.log(`AI replacement fallback: ${errorMsg}`); + // Fallback to newContent if speculative calculation fails + fullAiProposedContent = editData.newContent; + } + } + const diffStat = getDiffStat( fileName, editData.currentContent ?? 
'', + fullAiProposedContent, editData.newContent, - this.params.new_string, ); displayResult = { fileDiff, diff --git a/packages/core/src/tools/enter-plan-mode.test.ts b/packages/core/src/tools/enter-plan-mode.test.ts index 48bc5b494e..d14e1bfcdc 100644 --- a/packages/core/src/tools/enter-plan-mode.test.ts +++ b/packages/core/src/tools/enter-plan-mode.test.ts @@ -47,7 +47,7 @@ describe('EnterPlanModeTool', () => { getMessageBusDecision: () => Promise; }, 'getMessageBusDecision', - ).mockResolvedValue('ASK_USER'); + ).mockResolvedValue('ask_user'); const result = await invocation.shouldConfirmExecute( new AbortController().signal, @@ -74,7 +74,7 @@ describe('EnterPlanModeTool', () => { getMessageBusDecision: () => Promise; }, 'getMessageBusDecision', - ).mockResolvedValue('ALLOW'); + ).mockResolvedValue('allow'); const result = await invocation.shouldConfirmExecute( new AbortController().signal, @@ -92,7 +92,7 @@ describe('EnterPlanModeTool', () => { getMessageBusDecision: () => Promise; }, 'getMessageBusDecision', - ).mockResolvedValue('DENY'); + ).mockResolvedValue('deny'); await expect( invocation.shouldConfirmExecute(new AbortController().signal), @@ -136,7 +136,7 @@ describe('EnterPlanModeTool', () => { getMessageBusDecision: () => Promise; }, 'getMessageBusDecision', - ).mockResolvedValue('ASK_USER'); + ).mockResolvedValue('ask_user'); const details = await invocation.shouldConfirmExecute( new AbortController().signal, diff --git a/packages/core/src/tools/enter-plan-mode.ts b/packages/core/src/tools/enter-plan-mode.ts index d52c721aae..dee8569669 100644 --- a/packages/core/src/tools/enter-plan-mode.ts +++ b/packages/core/src/tools/enter-plan-mode.ts @@ -87,11 +87,11 @@ export class EnterPlanModeInvocation extends BaseToolInvocation< abortSignal: AbortSignal, ): Promise { const decision = await this.getMessageBusDecision(abortSignal); - if (decision === 'ALLOW') { + if (decision === 'allow') { return false; } - if (decision === 'DENY') { + if (decision === 
'deny') { throw new Error( `Tool execution for "${ this._toolDisplayName || this._toolName @@ -99,7 +99,7 @@ export class EnterPlanModeInvocation extends BaseToolInvocation< ); } - // ASK_USER + // ask_user return { type: 'info', title: 'Enter Plan Mode', diff --git a/packages/core/src/tools/exit-plan-mode.test.ts b/packages/core/src/tools/exit-plan-mode.test.ts index 88e327ab34..855c5d2aba 100644 --- a/packages/core/src/tools/exit-plan-mode.test.ts +++ b/packages/core/src/tools/exit-plan-mode.test.ts @@ -59,7 +59,7 @@ describe('ExitPlanModeTool', () => { getMessageBusDecision: () => Promise; }, 'getMessageBusDecision', - ).mockResolvedValue('ASK_USER'); + ).mockResolvedValue('ask_user'); }); afterEach(() => { @@ -127,7 +127,7 @@ describe('ExitPlanModeTool', () => { getMessageBusDecision: () => Promise; }, 'getMessageBusDecision', - ).mockResolvedValue('ALLOW'); + ).mockResolvedValue('allow'); const result = await invocation.shouldConfirmExecute( new AbortController().signal, @@ -150,7 +150,7 @@ describe('ExitPlanModeTool', () => { getMessageBusDecision: () => Promise; }, 'getMessageBusDecision', - ).mockResolvedValue('DENY'); + ).mockResolvedValue('deny'); await expect( invocation.shouldConfirmExecute(new AbortController().signal), diff --git a/packages/core/src/tools/exit-plan-mode.ts b/packages/core/src/tools/exit-plan-mode.ts index aad95492c2..892e8926e0 100644 --- a/packages/core/src/tools/exit-plan-mode.ts +++ b/packages/core/src/tools/exit-plan-mode.ts @@ -138,7 +138,7 @@ export class ExitPlanModeInvocation extends BaseToolInvocation< } const decision = await this.getMessageBusDecision(abortSignal); - if (decision === 'DENY') { + if (decision === 'deny') { throw new Error( `Tool execution for "${ this._toolDisplayName || this._toolName @@ -146,7 +146,7 @@ export class ExitPlanModeInvocation extends BaseToolInvocation< ); } - if (decision === 'ALLOW') { + if (decision === 'allow') { // If policy is allow, auto-approve with default settings and execute. 
this.confirmationOutcome = ToolConfirmationOutcome.ProceedOnce; this.approvalPayload = { @@ -156,7 +156,7 @@ export class ExitPlanModeInvocation extends BaseToolInvocation< return false; } - // decision is 'ASK_USER' + // decision is 'ask_user' return { type: 'exit_plan_mode', title: 'Plan Approval', diff --git a/packages/core/src/tools/mcp-client-manager.test.ts b/packages/core/src/tools/mcp-client-manager.test.ts index dce8708628..84d3e138ce 100644 --- a/packages/core/src/tools/mcp-client-manager.test.ts +++ b/packages/core/src/tools/mcp-client-manager.test.ts @@ -511,6 +511,7 @@ describe('McpClientManager', () => { await manager.startExtension(extension); mockedMcpClient.getServerConfig.mockReturnValue({ + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...extension.mcpServers!['test-server'], extension, }); diff --git a/packages/core/src/tools/mcp-client-manager.ts b/packages/core/src/tools/mcp-client-manager.ts index a607b19508..666b6d5321 100644 --- a/packages/core/src/tools/mcp-client-manager.ts +++ b/packages/core/src/tools/mcp-client-manager.ts @@ -215,6 +215,7 @@ export class McpClientManager { await Promise.all( Object.entries(extension.mcpServers ?? {}).map(([name, config]) => this.maybeDiscoverMcpServer(name, { + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...config, extension, }), @@ -331,7 +332,9 @@ export class McpClientManager { const env = { ...(base.env ?? {}), ...(override.env ?? {}) }; return { + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...base, + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...override, includeTools, excludeTools: excludeTools.length > 0 ? 
excludeTools : undefined, diff --git a/packages/core/src/tools/mcp-client.ts b/packages/core/src/tools/mcp-client.ts index 58b7b6c8e2..fdd8bb7008 100644 --- a/packages/core/src/tools/mcp-client.ts +++ b/packages/core/src/tools/mcp-client.ts @@ -1755,7 +1755,11 @@ export interface McpContext { setUserInteractedWithMcp?(): void; isTrustedFolder(): boolean; getPolicyEngine?(): { - getRules(): ReadonlyArray<{ toolName?: string; source?: string }>; + getRules(): ReadonlyArray<{ + toolName: string; + mcpName?: string; + source?: string; + }>; }; } diff --git a/packages/core/src/tools/mcp-tool.test.ts b/packages/core/src/tools/mcp-tool.test.ts index 4bb76e2e98..ee97771369 100644 --- a/packages/core/src/tools/mcp-tool.test.ts +++ b/packages/core/src/tools/mcp-tool.test.ts @@ -99,6 +99,10 @@ describe('formatMcpToolName', () => { expect(formatMcpToolName('github', '*')).toBe('mcp_github_*'); }); + it('should handle both server and tool wildcards', () => { + expect(formatMcpToolName('*', '*')).toBe('mcp_*'); + }); + it('should handle undefined toolName as a tool-level wildcard', () => { expect(formatMcpToolName('github')).toBe('mcp_github_*'); }); @@ -165,6 +169,53 @@ describe('DiscoveredMCPTool', () => { }); }); + describe('getDisplayTitle and getExplanation', () => { + const commandTool = new DiscoveredMCPTool( + mockCallableToolInstance, + serverName, + serverToolName, + baseDescription, + { + type: 'object', + properties: { command: { type: 'string' }, path: { type: 'string' } }, + required: ['command'], + }, + createMockMessageBus(), + undefined, + undefined, + undefined, + undefined, + undefined, + undefined, + ); + + it('should return command as title if it exists', () => { + const invocation = commandTool.build({ command: 'ls -la' }); + expect(invocation.getDisplayTitle?.()).toBe('ls -la'); + }); + + it('should return displayName if command does not exist', () => { + const invocation = tool.build({ param: 'testValue' }); + 
expect(invocation.getDisplayTitle?.()).toBe(tool.displayName); + }); + + it('should return stringified json for getExplanation', () => { + const params = { command: 'ls -la', path: '/' }; + const invocation = commandTool.build(params); + expect(invocation.getExplanation?.()).toBe(safeJsonStringify(params)); + }); + + it('should truncate and summarize long json payloads for getExplanation', () => { + const longString = 'a'.repeat(600); + const params = { command: 'echo', text: longString, other: 'value' }; + const invocation = commandTool.build(params); + const explanation = invocation.getExplanation?.() ?? ''; + expect(explanation).toMatch( + /^\[Payload omitted due to length with parameters: command, text, other\]$/, + ); + }); + }); + describe('execute', () => { it('should call mcpTool.callTool with correct parameters and format display output', async () => { const params = { param: 'testValue' }; diff --git a/packages/core/src/tools/mcp-tool.ts b/packages/core/src/tools/mcp-tool.ts index 195a78ec61..fe4038b6e8 100644 --- a/packages/core/src/tools/mcp-tool.ts +++ b/packages/core/src/tools/mcp-tool.ts @@ -80,11 +80,11 @@ export function formatMcpToolName( serverName: string, toolName?: string, ): string { - if (serverName === '*' && !toolName) { + if (serverName === '*' && (toolName === undefined || toolName === '*')) { return `${MCP_TOOL_PREFIX}*`; } else if (serverName === '*') { return `${MCP_TOOL_PREFIX}*_${toolName}`; - } else if (!toolName) { + } else if (toolName === undefined || toolName === '*') { return `${MCP_TOOL_PREFIX}${serverName}_*`; } else { return `${MCP_TOOL_PREFIX}${serverName}_${toolName}`; @@ -105,12 +105,13 @@ export interface McpToolAnnotation extends Record { export function isMcpToolAnnotation( annotation: unknown, ): annotation is McpToolAnnotation { - return ( - typeof annotation === 'object' && - annotation !== null && - // eslint-disable-next-line @typescript-eslint/no-unsafe-type-assertion, no-restricted-syntax - typeof (annotation 
as Record)['_serverName'] === 'string' - ); + if (typeof annotation !== 'object' || annotation === null) { + return false; + } + // eslint-disable-next-line @typescript-eslint/no-unsafe-type-assertion + const record = annotation as Record; + const serverName = record['_serverName']; + return typeof serverName === 'string'; } type ToolParams = Record; @@ -331,6 +332,35 @@ export class DiscoveredMCPToolInvocation extends BaseToolInvocation< getDescription(): string { return safeJsonStringify(this.params); } + + override getDisplayTitle(): string { + // If it's a known terminal execute tool provided by JetBrains or similar, + // and a command argument is present, return just the command. + const command = this.params['command']; + if (typeof command === 'string') { + return command; + } + + // Otherwise fallback to the display name or server tool name + return this.displayName || this.serverToolName; + } + + override getExplanation(): string { + const MAX_EXPLANATION_LENGTH = 500; + const stringified = safeJsonStringify(this.params); + if (stringified.length > MAX_EXPLANATION_LENGTH) { + const keys = Object.keys(this.params); + const displayedKeys = keys.slice(0, 5); + const keysDesc = + displayedKeys.length > 0 + ? ` with parameters: ${displayedKeys.join(', ')}${ + keys.length > 5 ? ', ...' 
: '' + }` + : ''; + return `[Payload omitted due to length${keysDesc}]`; + } + return stringified; + } } export class DiscoveredMCPTool extends BaseDeclarativeTool< diff --git a/packages/core/src/tools/message-bus-integration.test.ts b/packages/core/src/tools/message-bus-integration.test.ts index bfc369b58b..91a2e30d94 100644 --- a/packages/core/src/tools/message-bus-integration.test.ts +++ b/packages/core/src/tools/message-bus-integration.test.ts @@ -57,10 +57,10 @@ class TestToolInvocation extends BaseToolInvocation { abortSignal: AbortSignal, ): Promise { const decision = await this.getMessageBusDecision(abortSignal); - if (decision === 'ALLOW') { + if (decision === 'allow') { return false; } - if (decision === 'DENY') { + if (decision === 'deny') { throw new Error('Tool execution denied by policy'); } return false; diff --git a/packages/core/src/tools/read-file.test.ts b/packages/core/src/tools/read-file.test.ts index fa7a0669d6..584155ce29 100644 --- a/packages/core/src/tools/read-file.test.ts +++ b/packages/core/src/tools/read-file.test.ts @@ -674,6 +674,7 @@ describe('ReadFileTool', () => { const parts = result.llmContent as Array>; const jitTextPart = parts.find( (p) => + // eslint-disable-next-line no-restricted-syntax typeof p['text'] === 'string' && p['text'].includes('Auth rules'), ); expect(jitTextPart).toBeDefined(); diff --git a/packages/core/src/tools/shell.test.ts b/packages/core/src/tools/shell.test.ts index ace59cd7cf..9320b4f3f8 100644 --- a/packages/core/src/tools/shell.test.ts +++ b/packages/core/src/tools/shell.test.ts @@ -668,6 +668,39 @@ describe('ShellTool', () => { }); }); + describe('getDisplayTitle and getExplanation', () => { + it('should return only the command for getDisplayTitle', () => { + const invocation = shellTool.build({ + command: 'echo hello', + description: 'prints hello', + dir_path: 'foo/bar', + is_background: true, + }); + expect(invocation.getDisplayTitle?.()).toBe('echo hello'); + }); + + it('should return the context 
for getExplanation', () => { + const invocation = shellTool.build({ + command: 'echo hello', + description: 'prints hello', + dir_path: 'foo/bar', + is_background: true, + }); + expect(invocation.getExplanation?.()).toBe( + '[in foo/bar] (prints hello) [background]', + ); + }); + + it('should construct explanation without optional parameters', () => { + const invocation = shellTool.build({ + command: 'echo hello', + }); + expect(invocation.getExplanation?.()).toBe( + `[current working directory ${process.cwd()}]`, + ); + }); + }); + describe('llmContent output format', () => { const mockAbortSignal = new AbortController().signal; diff --git a/packages/core/src/tools/shell.ts b/packages/core/src/tools/shell.ts index 8917d281bd..86e3a68bc5 100644 --- a/packages/core/src/tools/shell.ts +++ b/packages/core/src/tools/shell.ts @@ -72,23 +72,35 @@ export class ShellToolInvocation extends BaseToolInvocation< super(params, messageBus, _toolName, _toolDisplayName); } - getDescription(): string { - let description = `${this.params.command}`; + private getContextualDetails(): string { + let details = ''; // append optional [in directory] - // note description is needed even if validation fails due to absolute path + // note explanation is needed even if validation fails due to absolute path if (this.params.dir_path) { - description += ` [in ${this.params.dir_path}]`; + details += `[in ${this.params.dir_path}]`; } else { - description += ` [current working directory ${process.cwd()}]`; + details += `[current working directory ${process.cwd()}]`; } // append optional (description), replacing any line breaks with spaces if (this.params.description) { - description += ` (${this.params.description.replace(/\n/g, ' ')})`; + details += ` (${this.params.description.replace(/\n/g, ' ')})`; } if (this.params.is_background) { - description += ' [background]'; + details += ' [background]'; } - return description; + return details; + } + + getDescription(): string { + return 
`${this.params.command} ${this.getContextualDetails()}`; + } + + override getDisplayTitle(): string { + return this.params.command; + } + + override getExplanation(): string { + return this.getContextualDetails().trim(); } override getPolicyUpdateOptions( @@ -100,10 +112,12 @@ export class ShellToolInvocation extends BaseToolInvocation< ) { const command = stripShellWrapper(this.params.command); const rootCommands = [...new Set(getCommandRoots(command))]; + const allowRedirection = hasRedirection(command) ? true : undefined; + if (rootCommands.length > 0) { - return { commandPrefix: rootCommands }; + return { commandPrefix: rootCommands, allowRedirection }; } - return { commandPrefix: this.params.command }; + return { commandPrefix: this.params.command, allowRedirection }; } return undefined; } @@ -367,6 +381,10 @@ export class ShellToolInvocation extends BaseToolInvocation< if (result.exitCode !== null && result.exitCode !== 0) { llmContentParts.push(`Exit Code: ${result.exitCode}`); + data = { + exitCode: result.exitCode, + isError: true, + }; } if (result.signal) { diff --git a/packages/core/src/tools/tool-names.ts b/packages/core/src/tools/tool-names.ts index e818881662..154a9de58f 100644 --- a/packages/core/src/tools/tool-names.ts +++ b/packages/core/src/tools/tool-names.ts @@ -150,19 +150,16 @@ export { SKILL_PARAM_NAME, }; -export const LS_TOOL_NAME_LEGACY = 'list_directory'; // Just to be safe if anything used the old exported name directly - export const EDIT_TOOL_NAMES = new Set([EDIT_TOOL_NAME, WRITE_FILE_TOOL_NAME]); /** - * Tools that can access local files or remote resources and should be - * treated with extra caution when updating policies. + * Tools that require mandatory argument narrowing (e.g., file paths, command prefixes) + * when granting persistent or session-wide approval. 
*/ -export const SENSITIVE_TOOLS = new Set([ +export const TOOLS_REQUIRING_NARROWING = new Set([ GLOB_TOOL_NAME, GREP_TOOL_NAME, READ_MANY_FILES_TOOL_NAME, - WEB_FETCH_TOOL_NAME, READ_FILE_TOOL_NAME, LS_TOOL_NAME, WRITE_FILE_TOOL_NAME, @@ -183,6 +180,11 @@ export const EDIT_DISPLAY_NAME = 'Edit'; export const ASK_USER_DISPLAY_NAME = 'Ask User'; export const READ_FILE_DISPLAY_NAME = 'ReadFile'; export const GLOB_DISPLAY_NAME = 'FindFiles'; +export const LS_DISPLAY_NAME = 'ReadFolder'; +export const GREP_DISPLAY_NAME = 'SearchText'; +export const WEB_SEARCH_DISPLAY_NAME = 'GoogleSearch'; +export const WEB_FETCH_DISPLAY_NAME = 'WebFetch'; +export const READ_MANY_FILES_DISPLAY_NAME = 'ReadManyFiles'; /** * Mapping of legacy tool names to their current names. diff --git a/packages/core/src/tools/tools.ts b/packages/core/src/tools/tools.ts index 3865aaf357..a9f3b57f4e 100644 --- a/packages/core/src/tools/tools.ts +++ b/packages/core/src/tools/tools.ts @@ -6,6 +6,7 @@ import type { FunctionDeclaration, PartListUnion } from '@google/genai'; import { ToolErrorType } from './tool-error.js'; +import type { GrepMatch } from './grep-utils.js'; import type { DiffUpdateResult } from '../ide/ide-client.js'; import type { ShellExecutionConfig } from '../services/shellExecutionService.js'; import { SchemaValidator } from '../utils/schemaValidator.js'; @@ -19,9 +20,15 @@ import { type ToolConfirmationResponse, type Question, } from '../confirmation-bus/types.js'; -import { type ApprovalMode } from '../policy/types.js'; +import { ApprovalMode } from '../policy/types.js'; import type { SubagentProgress } from '../agents/types.js'; +/** +/** + * Supported decisions for forcing tool execution behavior. + */ +export type ForcedToolDecision = 'allow' | 'deny' | 'ask_user'; + /** * Options bag for tool execution, replacing positional parameters that are * only relevant to specific tool types. 
@@ -51,6 +58,19 @@ export interface ToolInvocation< */ getDescription(): string; + /** + * Gets a clean title for display in the UI (e.g. the raw command without metadata). + * If not implemented, the UI may fall back to getDescription(). + * @returns A string representing the tool call title. + */ + getDisplayTitle?(): string; + + /** + * Gets conversational explanation or secondary metadata. + * @returns A string representing the explanation, or undefined. + */ + getExplanation?(): string; + /** * Determines what file system paths the tool will affect. * @returns A list of such paths. @@ -65,6 +85,7 @@ export interface ToolInvocation< */ shouldConfirmExecute( abortSignal: AbortSignal, + forcedDecision?: ForcedToolDecision, ): Promise; /** @@ -131,6 +152,7 @@ export interface PolicyUpdateOptions { commandPrefix?: string | string[]; mcpName?: string; toolName?: string; + allowRedirection?: boolean; } /** @@ -148,23 +170,43 @@ export abstract class BaseToolInvocation< readonly _toolDisplayName?: string, readonly _serverName?: string, readonly _toolAnnotations?: Record, + readonly respectsAutoEdit: boolean = false, + readonly getApprovalMode: () => ApprovalMode = () => ApprovalMode.DEFAULT, ) {} abstract getDescription(): string; + getDisplayTitle(): string { + return this.getDescription(); + } + + getExplanation(): string { + return ''; + } + toolLocations(): ToolLocation[] { return []; } async shouldConfirmExecute( abortSignal: AbortSignal, + forcedDecision?: ForcedToolDecision, ): Promise { - const decision = await this.getMessageBusDecision(abortSignal); - if (decision === 'ALLOW') { + if ( + this.respectsAutoEdit && + this.getApprovalMode() === ApprovalMode.AUTO_EDIT && + forcedDecision !== 'ask_user' + ) { return false; } - if (decision === 'DENY') { + const decision = + forcedDecision ?? 
(await this.getMessageBusDecision(abortSignal)); + if (decision === 'allow') { + return false; + } + + if (decision === 'deny') { throw new Error( `Tool execution for "${ this._toolDisplayName || this._toolName @@ -172,7 +214,7 @@ export abstract class BaseToolInvocation< ); } - if (decision === 'ASK_USER') { + if (decision === 'ask_user') { return this.getConfirmationDetails(abortSignal); } @@ -216,7 +258,7 @@ export abstract class BaseToolInvocation< /** * Subclasses should override this method to provide custom confirmation UI - * when the policy engine's decision is 'ASK_USER'. + * when the policy engine's decision is 'ask_user'. * The base implementation provides a generic confirmation prompt. */ protected async getConfirmationDetails( @@ -239,11 +281,12 @@ export abstract class BaseToolInvocation< protected getMessageBusDecision( abortSignal: AbortSignal, - ): Promise<'ALLOW' | 'DENY' | 'ASK_USER'> { + forcedDecision?: ForcedToolDecision, + ): Promise { if (!this.messageBus || !this._toolName) { // If there's no message bus, we can't make a decision, so we allow. // The legacy confirmation flow will still apply if the tool needs it. 
- return Promise.resolve('ALLOW'); + return Promise.resolve('allow'); } const correlationId = randomUUID(); @@ -257,11 +300,12 @@ export abstract class BaseToolInvocation< }, serverName: this._serverName, toolAnnotations: this._toolAnnotations, + forcedDecision, }; - return new Promise<'ALLOW' | 'DENY' | 'ASK_USER'>((resolve) => { + return new Promise((resolve) => { if (!this.messageBus) { - resolve('ALLOW'); + resolve('allow'); return; } @@ -282,11 +326,11 @@ export abstract class BaseToolInvocation< const abortHandler = () => { cleanup(); - resolve('DENY'); + resolve('deny'); }; if (abortSignal.aborted) { - resolve('DENY'); + resolve('deny'); return; } @@ -294,11 +338,11 @@ export abstract class BaseToolInvocation< if (response.correlationId === correlationId) { cleanup(); if (response.requiresUserConfirmation) { - resolve('ASK_USER'); + resolve('ask_user'); } else if (response.confirmed) { - resolve('ALLOW'); + resolve('allow'); } else { - resolve('DENY'); + resolve('deny'); } } }; @@ -307,7 +351,7 @@ export abstract class BaseToolInvocation< timeoutId = setTimeout(() => { cleanup(); - resolve('ASK_USER'); // Default to ASK_USER on timeout + resolve('ask_user'); // Default to ask_user on timeout }, 30000); this.messageBus.subscribe( @@ -325,7 +369,7 @@ export abstract class BaseToolInvocation< void this.messageBus.publish(request); } catch (_error) { cleanup(); - resolve('ALLOW'); + resolve('allow'); } }); } @@ -816,6 +860,51 @@ export interface TodoList { export type ToolLiveOutput = string | AnsiOutput | SubagentProgress; +export interface StructuredToolResult { + summary: string; +} + +export function isStructuredToolResult( + obj: unknown, +): obj is StructuredToolResult { + return ( + typeof obj === 'object' && + obj !== null && + 'summary' in obj && + typeof obj.summary === 'string' + ); +} + +export const hasSummary = (res: unknown): res is { summary: string } => + isStructuredToolResult(res); + +export interface GrepResult extends StructuredToolResult { 
+ matches: GrepMatch[]; + payload?: string; +} + +export interface ListDirectoryResult extends StructuredToolResult { + files: string[]; + payload?: string; +} + +export interface ReadManyFilesResult extends StructuredToolResult { + files: string[]; + skipped?: Array<{ path: string; reason: string }>; + include?: string[]; + excludes?: string[]; + targetDir?: string; + payload?: string; +} + +export const isGrepResult = (res: unknown): res is GrepResult => + isStructuredToolResult(res) && 'matches' in res && Array.isArray(res.matches); + +export const isListResult = ( + res: unknown, +): res is ListDirectoryResult | ReadManyFilesResult => + isStructuredToolResult(res) && 'files' in res && Array.isArray(res.files); + export type ToolResultDisplay = | string | FileDiff @@ -845,6 +934,13 @@ export interface FileDiff { isNewFile?: boolean; } +export const isFileDiff = (res: unknown): res is FileDiff => + typeof res === 'object' && + res !== null && + 'fileDiff' in res && + 'fileName' in res && + 'filePath' in res; + export interface DiffStat { model_added_lines: number; model_removed_lines: number; @@ -859,6 +955,7 @@ export interface DiffStat { export interface ToolEditConfirmationDetails { type: 'edit'; title: string; + systemMessage?: string; onConfirm: ( outcome: ToolConfirmationOutcome, payload?: ToolConfirmationPayload, @@ -869,6 +966,7 @@ export interface ToolEditConfirmationDetails { originalContent: string | null; newContent: string; isModifying?: boolean; + diffStat?: DiffStat; ideConfirmation?: Promise; } @@ -897,6 +995,7 @@ export type ToolConfirmationPayload = export interface ToolExecuteConfirmationDetails { type: 'exec'; title: string; + systemMessage?: string; onConfirm: (outcome: ToolConfirmationOutcome) => Promise; command: string; rootCommand: string; @@ -907,6 +1006,7 @@ export interface ToolExecuteConfirmationDetails { export interface ToolMcpConfirmationDetails { type: 'mcp'; title: string; + systemMessage?: string; serverName: string; toolName: 
string; toolDisplayName: string; @@ -919,6 +1019,7 @@ export interface ToolMcpConfirmationDetails { export interface ToolInfoConfirmationDetails { type: 'info'; title: string; + systemMessage?: string; onConfirm: (outcome: ToolConfirmationOutcome) => Promise; prompt: string; urls?: string[]; @@ -927,6 +1028,7 @@ export interface ToolInfoConfirmationDetails { export interface ToolAskUserConfirmationDetails { type: 'ask_user'; title: string; + systemMessage?: string; questions: Question[]; onConfirm: ( outcome: ToolConfirmationOutcome, @@ -937,6 +1039,7 @@ export interface ToolAskUserConfirmationDetails { export interface ToolExitPlanModeConfirmationDetails { type: 'exit_plan_mode'; title: string; + systemMessage?: string; planPath: string; onConfirm: ( outcome: ToolConfirmationOutcome, diff --git a/packages/core/src/tools/web-fetch.test.ts b/packages/core/src/tools/web-fetch.test.ts index 2b65a24930..f52ff214f4 100644 --- a/packages/core/src/tools/web-fetch.test.ts +++ b/packages/core/src/tools/web-fetch.test.ts @@ -752,6 +752,24 @@ describe('WebFetchTool', () => { }); }); + describe('getPolicyUpdateOptions', () => { + it('should return empty object for any outcome to allow global approval', () => { + const tool = new WebFetchTool(mockConfig, bus); + const invocation = tool.build({ prompt: 'fetch https://example.com' }); + + expect( + invocation.getPolicyUpdateOptions!( + ToolConfirmationOutcome.ProceedAlways, + ), + ).toEqual({}); + expect( + invocation.getPolicyUpdateOptions!( + ToolConfirmationOutcome.ProceedAlwaysAndSave, + ), + ).toEqual({}); + }); + }); + describe('Message Bus Integration', () => { let policyEngine: PolicyEngine; let messageBus: MessageBus; diff --git a/packages/core/src/tools/web-fetch.ts b/packages/core/src/tools/web-fetch.ts index 27a60c4259..dc90d892ef 100644 --- a/packages/core/src/tools/web-fetch.ts +++ b/packages/core/src/tools/web-fetch.ts @@ -5,20 +5,18 @@ */ import { + type ToolConfirmationOutcome, BaseDeclarativeTool, 
BaseToolInvocation, Kind, type ToolCallConfirmationDetails, type ToolInvocation, type ToolResult, - type ToolConfirmationOutcome, type PolicyUpdateOptions, } from './tools.js'; -import { buildParamArgsPattern } from '../policy/utils.js'; import type { MessageBus } from '../confirmation-bus/message-bus.js'; import { ToolErrorType } from './tool-error.js'; import { getErrorMessage } from '../utils/errors.js'; -import { ApprovalMode } from '../policy/types.js'; import { getResponseText } from '../utils/partUtils.js'; import { fetchWithTimeout, isPrivateIp } from '../utils/fetch.js'; import { truncateString } from '../utils/textUtils.js'; @@ -30,7 +28,7 @@ import { NetworkRetryAttemptEvent, } from '../telemetry/index.js'; import { LlmRole } from '../telemetry/llmRole.js'; -import { WEB_FETCH_TOOL_NAME } from './tool-names.js'; +import { WEB_FETCH_TOOL_NAME, WEB_FETCH_DISPLAY_NAME } from './tool-names.js'; import { debugLogger } from '../utils/debugLogger.js'; import { coreEvents } from '../utils/events.js'; import { retryWithBackoff, getRetryErrorType } from '../utils/retry.js'; @@ -231,7 +229,16 @@ class WebFetchToolInvocation extends BaseToolInvocation< _toolName?: string, _toolDisplayName?: string, ) { - super(params, messageBus, _toolName, _toolDisplayName); + super( + params, + messageBus, + _toolName, + _toolDisplayName, + undefined, + undefined, + true, + () => this.context.config.getApprovalMode(), + ); } private handleRetry(attempt: number, error: unknown, delayMs: number): void { @@ -501,27 +508,12 @@ ${aggregatedContent} override getPolicyUpdateOptions( _outcome: ToolConfirmationOutcome, ): PolicyUpdateOptions | undefined { - if (this.params.url) { - return { - argsPattern: buildParamArgsPattern('url', this.params.url), - }; - } else if (this.params.prompt) { - return { - argsPattern: buildParamArgsPattern('prompt', this.params.prompt), - }; - } - return undefined; + return {}; } protected override async getConfirmationDetails( _abortSignal: AbortSignal, ): 
Promise { - // Check for AUTO_EDIT approval mode. This tool has a specific behavior - // where ProceedAlways switches the entire session to AUTO_EDIT. - if (this.context.config.getApprovalMode() === ApprovalMode.AUTO_EDIT) { - return false; - } - let urls: string[] = []; let prompt = this.params.prompt || ''; @@ -891,7 +883,7 @@ export class WebFetchTool extends BaseDeclarativeTool< ) { super( WebFetchTool.Name, - 'WebFetch', + WEB_FETCH_DISPLAY_NAME, WEB_FETCH_DEFINITION.base.description!, Kind.Fetch, WEB_FETCH_DEFINITION.base.parametersJsonSchema, diff --git a/packages/core/src/tools/web-search.ts b/packages/core/src/tools/web-search.ts index 18132d2c35..2a29291437 100644 --- a/packages/core/src/tools/web-search.ts +++ b/packages/core/src/tools/web-search.ts @@ -5,7 +5,7 @@ */ import type { MessageBus } from '../confirmation-bus/message-bus.js'; -import { WEB_SEARCH_TOOL_NAME } from './tool-names.js'; +import { WEB_SEARCH_TOOL_NAME, WEB_SEARCH_DISPLAY_NAME } from './tool-names.js'; import type { GroundingMetadata } from '@google/genai'; import { BaseDeclarativeTool, @@ -212,7 +212,7 @@ export class WebSearchTool extends BaseDeclarativeTool< ) { super( WebSearchTool.Name, - 'GoogleSearch', + WEB_SEARCH_DISPLAY_NAME, WEB_SEARCH_DEFINITION.base.description!, Kind.Search, WEB_SEARCH_DEFINITION.base.parametersJsonSchema, diff --git a/packages/core/src/tools/write-file.test.ts b/packages/core/src/tools/write-file.test.ts index a014ec354c..b3d762554a 100644 --- a/packages/core/src/tools/write-file.test.ts +++ b/packages/core/src/tools/write-file.test.ts @@ -367,6 +367,7 @@ describe('WriteFileTool', () => { const abortSignal = new AbortController().signal; const mockGemini3Config = { + // eslint-disable-next-line @typescript-eslint/no-misused-spread ...mockConfig, getActiveModel: () => 'gemini-3.0-pro', } as unknown as Config; diff --git a/packages/core/src/tools/write-file.ts b/packages/core/src/tools/write-file.ts index f725a21c43..8ba967114c 100644 --- 
a/packages/core/src/tools/write-file.ts +++ b/packages/core/src/tools/write-file.ts @@ -11,7 +11,6 @@ import os from 'node:os'; import * as Diff from 'diff'; import { WRITE_FILE_TOOL_NAME, WRITE_FILE_DISPLAY_NAME } from './tool-names.js'; import type { Config } from '../config/config.js'; -import { ApprovalMode } from '../policy/types.js'; import { BaseDeclarativeTool, @@ -156,7 +155,16 @@ class WriteFileToolInvocation extends BaseToolInvocation< toolName?: string, displayName?: string, ) { - super(params, messageBus, toolName, displayName); + super( + params, + messageBus, + toolName, + displayName, + undefined, + undefined, + true, + () => this.config.getApprovalMode(), + ); this.resolvedPath = path.resolve( this.config.getTargetDir(), this.params.file_path, @@ -186,10 +194,6 @@ class WriteFileToolInvocation extends BaseToolInvocation< protected override async getConfirmationDetails( abortSignal: AbortSignal, ): Promise { - if (this.config.getApprovalMode() === ApprovalMode.AUTO_EDIT) { - return false; - } - const correctedContentResult = await getCorrectedFileContent( this.config, this.resolvedPath, diff --git a/packages/core/src/utils/events.ts b/packages/core/src/utils/events.ts index 47c42c93ba..bf3d997da1 100644 --- a/packages/core/src/utils/events.ts +++ b/packages/core/src/utils/events.ts @@ -88,9 +88,12 @@ export interface HookPayload { * Payload for the 'hook-start' event. */ export interface HookStartPayload extends HookPayload { + /** + * The source of the hook configuration. + */ + source?: string; /** * The 1-based index of the current hook in the execution sequence. - * Used for progress indication (e.g. "Hook 1/3"). 
*/ hookIndex?: number; /** diff --git a/packages/core/src/utils/shell-utils.test.ts b/packages/core/src/utils/shell-utils.test.ts index 81b43abf50..2370aa25c4 100644 --- a/packages/core/src/utils/shell-utils.test.ts +++ b/packages/core/src/utils/shell-utils.test.ts @@ -19,6 +19,7 @@ import { getShellConfiguration, initializeShellParsers, parseCommandDetails, + splitCommands, stripShellWrapper, hasRedirection, resolveExecutable, @@ -119,8 +120,10 @@ describe('getCommandRoots', () => { expect(getCommandRoots('ls -l')).toEqual(['ls']); }); - it('should handle paths and return the binary name', () => { - expect(getCommandRoots('/usr/local/bin/node script.js')).toEqual(['node']); + it('should handle paths and return the full path', () => { + expect(getCommandRoots('/usr/local/bin/node script.js')).toEqual([ + '/usr/local/bin/node', + ]); }); it('should return an empty array for an empty string', () => { @@ -302,6 +305,40 @@ describeWindowsOnly('PowerShell integration', () => { }); }); +describe('splitCommands', () => { + it('should split chained commands', () => { + expect(splitCommands('ls -l && git status')).toEqual([ + 'ls -l', + 'git status', + ]); + }); + + it('should filter out redirection tokens but keep command parts', () => { + // Standard redirection + expect(splitCommands('echo "hello" > file.txt')).toEqual(['echo "hello"']); + expect(splitCommands('printf "test" >> log.txt')).toEqual([ + 'printf "test"', + ]); + expect(splitCommands('cat < input.txt')).toEqual(['cat']); + + // Heredoc/Herestring + expect(splitCommands('cat << EOF\nhello\nEOF')).toEqual(['cat']); + // Note: The Tree-sitter bash parser includes the herestring in the main + // command node's text, unlike standard redirections which are siblings. 
+ expect(splitCommands('grep "foo" <<< "foobar"')).toEqual([ + 'grep "foo" <<< "foobar"', + ]); + }); + + it('should extract nested commands from process substitution while filtering the redirection operator', () => { + // This is the key security test: we want cat to be checked, but not the > >(...) wrapper part + const parts = splitCommands('echo "foo" > >(cat)'); + expect(parts).toContain('echo "foo"'); + expect(parts).toContain('cat'); + expect(parts.some((p) => p.includes('>'))).toBe(false); + }); +}); + describe('stripShellWrapper', () => { it('should strip sh -c with quotes', () => { expect(stripShellWrapper('sh -c "ls -l"')).toEqual('ls -l'); diff --git a/packages/core/src/utils/shell-utils.ts b/packages/core/src/utils/shell-utils.ts index 89f50a9ce7..14fce36a34 100644 --- a/packages/core/src/utils/shell-utils.ts +++ b/packages/core/src/utils/shell-utils.ts @@ -264,11 +264,7 @@ function normalizeCommandName(raw: string): string { return raw.slice(1, -1); } } - const trimmed = raw.trim(); - if (!trimmed) { - return trimmed; - } - return trimmed.split(/[\\/]/).pop() ?? trimmed; + return raw.trim(); } function extractNameFromNode(node: Node): string | null { @@ -667,7 +663,10 @@ export function splitCommands(command: string): string[] { return []; } - return parsed.details.map((detail) => detail.text).filter(Boolean); + return parsed.details + .filter((detail) => !REDIRECTION_NAMES.has(detail.name)) + .map((detail) => detail.text) + .filter(Boolean); } /** diff --git a/packages/test-utils/GEMINI.md b/packages/test-utils/GEMINI.md index 56f64c0291..f378270fbd 100644 --- a/packages/test-utils/GEMINI.md +++ b/packages/test-utils/GEMINI.md @@ -10,6 +10,58 @@ published to npm. - `src/file-system-test-helpers.ts`: Helpers for creating temporary file system fixtures. - `src/mock-utils.ts`: Common mock utilities. +- `src/test-mcp-server.ts`: Helper for building test MCP servers for tests. 
+- `src/test-mcp-server-template.mjs`: Generic template script for running + isolated MCP processes. + +## Test MCP Servers + +The `TestRig` provides a fully isolated, compliant way to test tool triggers and +workflows using local test MCP servers. This isolates your tests from live API +endpoints and rate-limiting. + +### Usage + +1. **Programmatic Builder:** + + ```typescript + import { TestMcpServerBuilder } from '@google/gemini-cli-test-utils'; + + const builder = new TestMcpServerBuilder('weather-server').addTool( + 'get_weather', + 'Get weather', + 'It is rainy', + ); + + rig.addTestMcpServer('weather-server', builder.build()); + ``` + +2. **Predefined configurations via JSON:** Place a configuration file in + `packages/test-utils/assets/test-servers/google-workspace.json` and load it + by title: + + ```typescript + rig.addTestMcpServer('workspace-server', 'google-workspace'); + ``` + + **JSON Format Structure (`TestMcpConfig`):** + + ```json + { + "name": "string (Fallback server name)", + "tools": [ + { + "name": "string (Tool execution name)", + "description": "string (Helpful summary for router)", + "inputSchema": { + "type": "object", + "properties": { ... 
} + }, + "response": "string | object (The forced reply payload)" + } + ] + } + ``` ## Usage diff --git a/packages/test-utils/assets/test-servers/google-workspace.json b/packages/test-utils/assets/test-servers/google-workspace.json new file mode 100644 index 0000000000..ceb46c0671 --- /dev/null +++ b/packages/test-utils/assets/test-servers/google-workspace.json @@ -0,0 +1,1816 @@ +{ + "name": "google-workspace", + "tools": [ + { + "name": "auth.clear", + "description": "Clears the authentication credentials, forcing a re-login on the next request.", + "inputSchema": { + "type": "object", + "properties": {}, + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for auth.clear" + } + ] + } + }, + { + "name": "auth.refreshToken", + "description": "Manually triggers the token refresh process.", + "inputSchema": { + "type": "object", + "properties": {}, + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for auth.refreshToken" + } + ] + } + }, + { + "name": "docs.getSuggestions", + "description": "Retrieves suggested edits from a Google Doc.", + "inputSchema": { + "type": "object", + "properties": { + "documentId": { + "type": "string", + "description": "The ID of the document to retrieve suggestions from." + } + }, + "required": ["documentId"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for docs.getSuggestions" + } + ] + } + }, + { + "name": "drive.getComments", + "description": "Retrieves comments from a Google Drive file (Docs, Sheets, Slides, etc.).", + "inputSchema": { + "type": "object", + "properties": { + "fileId": { + "type": "string", + "description": "The ID of the file to retrieve comments from." 
+ } + }, + "required": ["fileId"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for drive.getComments" + } + ] + } + }, + { + "name": "docs.create", + "description": "Creates a new Google Doc. Can be blank or with initial text content.", + "inputSchema": { + "type": "object", + "properties": { + "title": { + "type": "string", + "description": "The title for the new Google Doc." + }, + "content": { + "description": "The text content to create the document with.", + "type": "string" + } + }, + "required": ["title"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for docs.create" + } + ] + } + }, + { + "name": "docs.writeText", + "description": "Writes text to a Google Doc at a specified position.", + "inputSchema": { + "type": "object", + "properties": { + "documentId": { + "type": "string", + "description": "The ID of the document to modify." + }, + "text": { + "type": "string", + "description": "The text to write to the document." + }, + "position": { + "description": "Where to insert the text. Use \"beginning\" for the start, \"end\" for the end (default), or a numeric index for a specific position.", + "type": "string" + }, + "tabId": { + "description": "The ID of the tab to modify. If not provided, modifies the first tab.", + "type": "string" + } + }, + "required": ["documentId", "text"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for docs.writeText" + } + ] + } + }, + { + "name": "drive.findFolder", + "description": "Finds a folder by name in Google Drive.", + "inputSchema": { + "type": "object", + "properties": { + "folderName": { + "type": "string", + "description": "The name of the folder to find." 
+ } + }, + "required": ["folderName"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for drive.findFolder" + } + ] + } + }, + { + "name": "drive.createFolder", + "description": "Creates a new folder in Google Drive.", + "inputSchema": { + "type": "object", + "properties": { + "name": { + "type": "string", + "minLength": 1, + "description": "The name of the new folder." + }, + "parentId": { + "description": "The ID of the parent folder. If not provided, creates in the root directory.", + "type": "string", + "minLength": 1 + } + }, + "required": ["name"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for drive.createFolder" + } + ] + } + }, + { + "name": "docs.getText", + "description": "Retrieves the text content of a Google Doc.", + "inputSchema": { + "type": "object", + "properties": { + "documentId": { + "type": "string", + "description": "The ID of the document to read." + }, + "tabId": { + "description": "The ID of the tab to read. If not provided, returns all tabs.", + "type": "string" + } + }, + "required": ["documentId"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for docs.getText" + } + ] + } + }, + { + "name": "docs.replaceText", + "description": "Replaces all occurrences of a given text with new text in a Google Doc.", + "inputSchema": { + "type": "object", + "properties": { + "documentId": { + "type": "string", + "description": "The ID of the document to modify." + }, + "findText": { + "type": "string", + "description": "The text to find in the document." + }, + "replaceText": { + "type": "string", + "description": "The text to replace the found text with." + }, + "tabId": { + "description": "The ID of the tab to modify. 
If not provided, replaces in all tabs (legacy behavior).", + "type": "string" + } + }, + "required": ["documentId", "findText", "replaceText"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for docs.replaceText" + } + ] + } + }, + { + "name": "docs.formatText", + "description": "Applies formatting (bold, italic, headings, etc.) to text ranges in a Google Doc. Use after inserting text to apply rich formatting.", + "inputSchema": { + "type": "object", + "properties": { + "documentId": { + "type": "string", + "description": "The ID of the document to format." + }, + "formats": { + "type": "array", + "items": { + "type": "object", + "properties": { + "startIndex": { + "type": "number", + "description": "The start index of the text range (1-based)." + }, + "endIndex": { + "type": "number", + "description": "The end index of the text range (exclusive, 1-based)." + }, + "style": { + "type": "string", + "description": "The formatting style to apply. Supported: bold, italic, underline, strikethrough, code, link, heading1, heading2, heading3, heading4, heading5, heading6, normalText." + }, + "url": { + "description": "The URL for link formatting. Required when style is \"link\".", + "type": "string" + } + }, + "required": ["startIndex", "endIndex", "style"] + }, + "description": "The formatting instructions to apply." + }, + "tabId": { + "description": "The ID of the tab to format. 
If not provided, formats the first tab.", + "type": "string" + } + }, + "required": ["documentId", "formats"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for docs.formatText" + } + ] + } + }, + { + "name": "slides.getText", + "description": "Retrieves the text content of a Google Slides presentation.", + "inputSchema": { + "type": "object", + "properties": { + "presentationId": { + "type": "string", + "description": "The ID or URL of the presentation to read." + } + }, + "required": ["presentationId"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for slides.getText" + } + ] + } + }, + { + "name": "slides.getMetadata", + "description": "Gets metadata about a Google Slides presentation.", + "inputSchema": { + "type": "object", + "properties": { + "presentationId": { + "type": "string", + "description": "The ID or URL of the presentation." + } + }, + "required": ["presentationId"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for slides.getMetadata" + } + ] + } + }, + { + "name": "slides.getImages", + "description": "Downloads all images embedded in a Google Slides presentation to a local directory.", + "inputSchema": { + "type": "object", + "properties": { + "presentationId": { + "type": "string", + "description": "The ID or URL of the presentation to extract images from." + }, + "localPath": { + "type": "string", + "description": "The absolute local directory path to download the images to (e.g., \"/Users/name/downloads/images\")." 
+ } + }, + "required": ["presentationId", "localPath"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for slides.getImages" + } + ] + } + }, + { + "name": "slides.getSlideThumbnail", + "description": "Downloads a thumbnail image for a specific slide in a Google Slides presentation to a local path.", + "inputSchema": { + "type": "object", + "properties": { + "presentationId": { + "type": "string", + "description": "The ID or URL of the presentation." + }, + "slideObjectId": { + "type": "string", + "description": "The object ID of the slide (can be found via slides.getMetadata or slides.getText)." + }, + "localPath": { + "type": "string", + "description": "The absolute local file path to download the thumbnail to (e.g., \"/Users/name/downloads/slide1.png\")." + } + }, + "required": ["presentationId", "slideObjectId", "localPath"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for slides.getSlideThumbnail" + } + ] + } + }, + { + "name": "sheets.getText", + "description": "Retrieves the content of a Google Sheets spreadsheet.", + "inputSchema": { + "type": "object", + "properties": { + "spreadsheetId": { + "type": "string", + "description": "The ID or URL of the spreadsheet to read." + }, + "format": { + "description": "Output format (default: text).", + "type": "string", + "enum": ["text", "csv", "json"] + } + }, + "required": ["spreadsheetId"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for sheets.getText" + } + ] + } + }, + { + "name": "sheets.getRange", + "description": "Gets values from a specific range in a Google Sheets spreadsheet.", + "inputSchema": { + "type": "object", + "properties": { + "spreadsheetId": { + "type": "string", + "description": "The ID or URL of the spreadsheet." 
+ }, + "range": { + "type": "string", + "description": "The A1 notation range to get (e.g., \"Sheet1!A1:B10\")." + } + }, + "required": ["spreadsheetId", "range"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for sheets.getRange" + } + ] + } + }, + { + "name": "sheets.getMetadata", + "description": "Gets metadata about a Google Sheets spreadsheet.", + "inputSchema": { + "type": "object", + "properties": { + "spreadsheetId": { + "type": "string", + "description": "The ID or URL of the spreadsheet." + } + }, + "required": ["spreadsheetId"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for sheets.getMetadata" + } + ] + } + }, + { + "name": "drive.search", + "description": "Searches for files and folders in Google Drive. The query can be a simple search term, a Google Drive URL, or a full query string. For more information on query strings see: https://developers.google.com/drive/api/guides/search-files", + "inputSchema": { + "type": "object", + "properties": { + "query": { + "description": "A simple search term (e.g., \"Budget Q3\"), a Google Drive URL, or a full query string (e.g., \"name contains 'Budget' and owners in 'user@example.com'\").", + "type": "string" + }, + "pageSize": { + "description": "The maximum number of results to return.", + "type": "number" + }, + "pageToken": { + "description": "The token for the next page of results.", + "type": "string" + }, + "corpus": { + "description": "The corpus of files to search (e.g., \"user\", \"domain\").", + "type": "string" + }, + "unreadOnly": { + "description": "Whether to filter for unread files only.", + "type": "boolean" + }, + "sharedWithMe": { + "description": "Whether to search for files shared with the user.", + "type": "boolean" + } + }, + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + 
"content": [ + { + "type": "text", + "text": "Stub response for drive.search" + } + ] + } + }, + { + "name": "drive.downloadFile", + "description": "Downloads the content of a file from Google Drive to a local path. Note: Google Docs, Sheets, and Slides require specialized handling.", + "inputSchema": { + "type": "object", + "properties": { + "fileId": { + "type": "string", + "description": "The ID of the file to download." + }, + "localPath": { + "type": "string", + "description": "The local file path where the content should be saved (e.g., \"downloads/report.pdf\")." + } + }, + "required": ["fileId", "localPath"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for drive.downloadFile" + } + ] + } + }, + { + "name": "drive.moveFile", + "description": "Moves a file or folder to a different folder in Google Drive.", + "inputSchema": { + "type": "object", + "properties": { + "fileId": { + "type": "string", + "description": "The ID or URL of the file to move." + }, + "folderId": { + "description": "The ID of the destination folder. Either folderId or folderName must be provided.", + "type": "string" + }, + "folderName": { + "description": "The name of the destination folder. Either folderId or folderName must be provided.", + "type": "string" + } + }, + "required": ["fileId"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for drive.moveFile" + } + ] + } + }, + { + "name": "drive.trashFile", + "description": "Moves a file or folder to the trash in Google Drive. This is a safe, reversible operation.", + "inputSchema": { + "type": "object", + "properties": { + "fileId": { + "type": "string", + "description": "The ID or URL of the file to trash." 
+ } + }, + "required": ["fileId"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for drive.trashFile" + } + ] + } + }, + { + "name": "drive.renameFile", + "description": "Renames a file or folder in Google Drive.", + "inputSchema": { + "type": "object", + "properties": { + "fileId": { + "type": "string", + "description": "The ID or URL of the file to rename." + }, + "newName": { + "type": "string", + "minLength": 1, + "description": "The new name for the file." + } + }, + "required": ["fileId", "newName"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for drive.renameFile" + } + ] + } + }, + { + "name": "calendar.list", + "description": "Lists all of the user's calendars.", + "inputSchema": { + "type": "object", + "properties": {}, + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for calendar.list" + } + ] + } + }, + { + "name": "calendar.createEvent", + "description": "Creates a new event in a calendar. Supports optional Google Meet link generation and Google Drive file attachments. When addGoogleMeet is true, the Meet URL will be in the response's hangoutLink field. Attachments fully replace any existing attachments.", + "inputSchema": { + "type": "object", + "properties": { + "calendarId": { + "type": "string", + "description": "The ID of the calendar to create the event in." + }, + "summary": { + "type": "string", + "description": "The summary or title of the event." + }, + "description": { + "description": "The description of the event.", + "type": "string" + }, + "start": { + "type": "object", + "properties": { + "dateTime": { + "type": "string", + "description": "The start time in strict ISO 8601 format with seconds and timezone (e.g., 2024-01-15T10:30:00Z or 2024-01-15T10:30:00-05:00)." 
+ } + }, + "required": ["dateTime"] + }, + "end": { + "type": "object", + "properties": { + "dateTime": { + "type": "string", + "description": "The end time in strict ISO 8601 format with seconds and timezone (e.g., 2024-01-15T11:30:00Z or 2024-01-15T11:30:00-05:00)." + } + }, + "required": ["dateTime"] + }, + "attendees": { + "description": "The email addresses of the attendees.", + "type": "array", + "items": { + "type": "string" + } + }, + "sendUpdates": { + "description": "Whether to send notifications to attendees. Defaults to \"all\" if attendees are provided, otherwise \"none\".", + "type": "string", + "enum": ["all", "externalOnly", "none"] + }, + "addGoogleMeet": { + "description": "Whether to create a Google Meet link for the event. The Meet URL will be available in the response's hangoutLink field.", + "type": "boolean" + }, + "attachments": { + "description": "Google Drive file attachments. IMPORTANT: Providing attachments fully REPLACES any existing attachments on the event (not appended).", + "type": "array", + "items": { + "type": "object", + "properties": { + "fileUrl": { + "type": "string", + "format": "uri", + "description": "Google Drive file URL (e.g., https://drive.google.com/file/d/...)" + }, + "title": { + "description": "Display title for the attachment.", + "type": "string" + }, + "mimeType": { + "description": "MIME type of the attachment.", + "type": "string" + } + }, + "required": ["fileUrl"] + } + } + }, + "required": ["calendarId", "summary", "start", "end"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for calendar.createEvent" + } + ] + } + }, + { + "name": "calendar.listEvents", + "description": "Lists events from a calendar. Defaults to upcoming events.", + "inputSchema": { + "type": "object", + "properties": { + "calendarId": { + "type": "string", + "description": "The ID of the calendar to list events from." 
+ }, + "timeMin": { + "description": "The start time for the event search. Defaults to the current time.", + "type": "string" + }, + "timeMax": { + "description": "The end time for the event search.", + "type": "string" + }, + "attendeeResponseStatus": { + "description": "The response status of the attendee.", + "type": "array", + "items": { + "type": "string" + } + } + }, + "required": ["calendarId"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for calendar.listEvents" + } + ] + } + }, + { + "name": "calendar.getEvent", + "description": "Gets the details of a specific calendar event.", + "inputSchema": { + "type": "object", + "properties": { + "eventId": { + "type": "string", + "description": "The ID of the event to retrieve." + }, + "calendarId": { + "description": "The ID of the calendar the event belongs to. Defaults to the primary calendar.", + "type": "string" + } + }, + "required": ["eventId"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for calendar.getEvent" + } + ] + } + }, + { + "name": "calendar.findFreeTime", + "description": "Finds a free time slot for multiple people to meet.", + "inputSchema": { + "type": "object", + "properties": { + "attendees": { + "type": "array", + "items": { + "type": "string" + }, + "description": "The email addresses of the attendees." + }, + "timeMin": { + "type": "string", + "description": "The start time for the search in strict ISO 8601 format with seconds and timezone (e.g., 2024-01-15T09:00:00Z or 2024-01-15T09:00:00-05:00)." + }, + "timeMax": { + "type": "string", + "description": "The end time for the search in strict ISO 8601 format with seconds and timezone (e.g., 2024-01-15T18:00:00Z or 2024-01-15T18:00:00-05:00)." + }, + "duration": { + "type": "number", + "description": "The duration of the meeting in minutes." 
+ } + }, + "required": ["attendees", "timeMin", "timeMax", "duration"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for calendar.findFreeTime" + } + ] + } + }, + { + "name": "calendar.updateEvent", + "description": "Updates an existing event in a calendar. Supports adding Google Meet links and Google Drive file attachments. When addGoogleMeet is true, the Meet URL will be in the response's hangoutLink field. Attachments fully replace any existing attachments (not appended).", + "inputSchema": { + "type": "object", + "properties": { + "eventId": { + "type": "string", + "description": "The ID of the event to update." + }, + "calendarId": { + "description": "The ID of the calendar to update the event in.", + "type": "string" + }, + "summary": { + "description": "The new summary or title of the event.", + "type": "string" + }, + "description": { + "description": "The new description of the event.", + "type": "string" + }, + "start": { + "type": "object", + "properties": { + "dateTime": { + "type": "string", + "description": "The new start time in strict ISO 8601 format with seconds and timezone (e.g., 2024-01-15T10:30:00Z or 2024-01-15T10:30:00-05:00)." + } + }, + "required": ["dateTime"] + }, + "end": { + "type": "object", + "properties": { + "dateTime": { + "type": "string", + "description": "The new end time in strict ISO 8601 format with seconds and timezone (e.g., 2024-01-15T11:30:00Z or 2024-01-15T11:30:00-05:00)." + } + }, + "required": ["dateTime"] + }, + "attendees": { + "description": "The new list of attendees for the event.", + "type": "array", + "items": { + "type": "string" + } + }, + "addGoogleMeet": { + "description": "Whether to create a Google Meet link for the event. The Meet URL will be available in the response's hangoutLink field.", + "type": "boolean" + }, + "attachments": { + "description": "Google Drive file attachments. 
IMPORTANT: Providing attachments fully REPLACES any existing attachments on the event (not appended).", + "type": "array", + "items": { + "type": "object", + "properties": { + "fileUrl": { + "type": "string", + "format": "uri", + "description": "Google Drive file URL (e.g., https://drive.google.com/file/d/...)" + }, + "title": { + "description": "Display title for the attachment.", + "type": "string" + }, + "mimeType": { + "description": "MIME type of the attachment.", + "type": "string" + } + }, + "required": ["fileUrl"] + } + } + }, + "required": ["eventId"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for calendar.updateEvent" + } + ] + } + }, + { + "name": "calendar.respondToEvent", + "description": "Responds to a meeting invitation (accept, decline, or tentative).", + "inputSchema": { + "type": "object", + "properties": { + "eventId": { + "type": "string", + "description": "The ID of the event to respond to." + }, + "calendarId": { + "description": "The ID of the calendar containing the event.", + "type": "string" + }, + "responseStatus": { + "type": "string", + "enum": ["accepted", "declined", "tentative"], + "description": "Your response to the invitation." + }, + "sendNotification": { + "description": "Whether to send a notification to the organizer (default: true).", + "type": "boolean" + }, + "responseMessage": { + "description": "Optional message to include with your response.", + "type": "string" + } + }, + "required": ["eventId", "responseStatus"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for calendar.respondToEvent" + } + ] + } + }, + { + "name": "calendar.deleteEvent", + "description": "Deletes an event from a calendar.", + "inputSchema": { + "type": "object", + "properties": { + "eventId": { + "type": "string", + "description": "The ID of the event to delete." 
+ }, + "calendarId": { + "description": "The ID of the calendar to delete the event from. Defaults to the primary calendar.", + "type": "string" + } + }, + "required": ["eventId"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for calendar.deleteEvent" + } + ] + } + }, + { + "name": "chat.listSpaces", + "description": "Lists the spaces the user is a member of.", + "inputSchema": { + "type": "object", + "properties": {}, + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for chat.listSpaces" + } + ] + } + }, + { + "name": "chat.findSpaceByName", + "description": "Finds a Google Chat space by its display name.", + "inputSchema": { + "type": "object", + "properties": { + "displayName": { + "type": "string", + "description": "The display name of the space to find." + } + }, + "required": ["displayName"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for chat.findSpaceByName" + } + ] + } + }, + { + "name": "chat.sendMessage", + "description": "Sends a message to a Google Chat space.", + "inputSchema": { + "type": "object", + "properties": { + "spaceName": { + "type": "string", + "description": "The name of the space to send the message to (e.g., spaces/AAAAN2J52O8)." + }, + "message": { + "type": "string", + "description": "The message to send." + }, + "threadName": { + "description": "The resource name of the thread to reply to. 
Example: \"spaces/AAAAVJcnwPE/threads/IAf4cnLqYfg\"", + "type": "string" + } + }, + "required": ["spaceName", "message"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for chat.sendMessage" + } + ] + } + }, + { + "name": "chat.getMessages", + "description": "Gets messages from a Google Chat space.", + "inputSchema": { + "type": "object", + "properties": { + "spaceName": { + "type": "string", + "description": "The name of the space to get messages from (e.g., spaces/AAAAN2J52O8)." + }, + "threadName": { + "description": "The resource name of the thread to filter messages by. Example: \"spaces/AAAAVJcnwPE/threads/IAf4cnLqYfg\"", + "type": "string" + }, + "unreadOnly": { + "description": "Whether to return only unread messages.", + "type": "boolean" + }, + "pageSize": { + "description": "The maximum number of messages to return.", + "type": "number" + }, + "pageToken": { + "description": "The token for the next page of results.", + "type": "string" + }, + "orderBy": { + "description": "The order to list messages in (e.g., \"createTime desc\").", + "type": "string" + } + }, + "required": ["spaceName"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for chat.getMessages" + } + ] + } + }, + { + "name": "chat.sendDm", + "description": "Sends a direct message to a user.", + "inputSchema": { + "type": "object", + "properties": { + "email": { + "type": "string", + "format": "email", + "pattern": "^(?!\\.)(?!.*\\.\\.)([A-Za-z0-9_'+\\-\\.]*)[A-Za-z0-9_+-]@([A-Za-z0-9][A-Za-z0-9\\-]*\\.)+[A-Za-z]{2,}$", + "description": "The email address of the user to send the message to." + }, + "message": { + "type": "string", + "description": "The message to send." + }, + "threadName": { + "description": "The resource name of the thread to reply to. 
Example: \"spaces/AAAAVJcnwPE/threads/IAf4cnLqYfg\"", + "type": "string" + } + }, + "required": ["email", "message"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for chat.sendDm" + } + ] + } + }, + { + "name": "chat.findDmByEmail", + "description": "Finds a Google Chat DM space by a user's email address.", + "inputSchema": { + "type": "object", + "properties": { + "email": { + "type": "string", + "format": "email", + "pattern": "^(?!\\.)(?!.*\\.\\.)([A-Za-z0-9_'+\\-\\.]*)[A-Za-z0-9_+-]@([A-Za-z0-9][A-Za-z0-9\\-]*\\.)+[A-Za-z]{2,}$", + "description": "The email address of the user to find the DM space with." + } + }, + "required": ["email"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for chat.findDmByEmail" + } + ] + } + }, + { + "name": "chat.listThreads", + "description": "Lists threads from a Google Chat space in reverse chronological order.", + "inputSchema": { + "type": "object", + "properties": { + "spaceName": { + "type": "string", + "description": "The name of the space to get threads from (e.g., spaces/AAAAN2J52O8)." + }, + "pageSize": { + "description": "The maximum number of threads to return.", + "type": "number" + }, + "pageToken": { + "description": "The token for the next page of results.", + "type": "string" + } + }, + "required": ["spaceName"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for chat.listThreads" + } + ] + } + }, + { + "name": "chat.setUpSpace", + "description": "Sets up a new Google Chat space with a display name and a list of members.", + "inputSchema": { + "type": "object", + "properties": { + "displayName": { + "type": "string", + "description": "The display name of the space." 
+ }, + "userNames": { + "type": "array", + "items": { + "type": "string" + }, + "description": "The user names of the members to add to the space (e.g. users/12345678)" + } + }, + "required": ["displayName", "userNames"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for chat.setUpSpace" + } + ] + } + }, + { + "name": "gmail.search", + "description": "Search for emails in Gmail using query parameters.", + "inputSchema": { + "type": "object", + "properties": { + "query": { + "description": "Search query (same syntax as Gmail search box, e.g., \"from:someone@example.com is:unread\").", + "type": "string" + }, + "maxResults": { + "description": "Maximum number of results to return (default: 100).", + "type": "number" + }, + "pageToken": { + "description": "Token for the next page of results.", + "type": "string" + }, + "labelIds": { + "description": "Filter by label IDs (e.g., [\"INBOX\", \"UNREAD\"]).", + "type": "array", + "items": { + "type": "string" + } + }, + "includeSpamTrash": { + "description": "Include messages from SPAM and TRASH (default: false).", + "type": "boolean" + } + }, + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for gmail.search" + } + ] + } + }, + { + "name": "gmail.get", + "description": "Get the full content of a specific email message.", + "inputSchema": { + "type": "object", + "properties": { + "messageId": { + "type": "string", + "description": "The ID of the message to retrieve." 
+ }, + "format": { + "description": "Format of the message (default: full).", + "type": "string", + "enum": ["minimal", "full", "raw", "metadata"] + } + }, + "required": ["messageId"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for gmail.get" + } + ] + } + }, + { + "name": "gmail.downloadAttachment", + "description": "Downloads an attachment from a Gmail message to a local file.", + "inputSchema": { + "type": "object", + "properties": { + "messageId": { + "type": "string", + "description": "The ID of the message containing the attachment." + }, + "attachmentId": { + "type": "string", + "description": "The ID of the attachment to download." + }, + "localPath": { + "type": "string", + "description": "The absolute local path where the attachment should be saved (e.g., \"/Users/name/downloads/report.pdf\")." + } + }, + "required": ["messageId", "attachmentId", "localPath"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for gmail.downloadAttachment" + } + ] + } + }, + { + "name": "gmail.modify", + "description": "Modify a Gmail message. Supported modifications include:\n - Add labels to a message.\n - Remove labels from a message.\nThere are a list of system labels that can be modified on a message:\n - INBOX: removing INBOX label removes the message from inbox and archives the message.\n - SPAM: adding SPAM label marks a message as spam.\n - TRASH: adding TRASH label moves a message to trash.\n - UNREAD: removing UNREAD label marks a message as read.\n - STARRED: adding STARRED label marks a message as starred.\n - IMPORTANT: adding IMPORTANT label marks a message as important.", + "inputSchema": { + "type": "object", + "properties": { + "messageId": { + "type": "string", + "description": "The ID of the message to add labels to and/or remove labels from." 
+ }, + "addLabelIds": { + "description": "A list of label IDs to add to the message. Limit to 100 labels.", + "maxItems": 100, + "type": "array", + "items": { + "type": "string" + } + }, + "removeLabelIds": { + "description": "A list of label IDs to remove from the message. Limit to 100 labels.", + "maxItems": 100, + "type": "array", + "items": { + "type": "string" + } + } + }, + "required": ["messageId"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for gmail.modify" + } + ] + } + }, + { + "name": "gmail.batchModify", + "description": "Bulk modify up to 1,000 Gmail messages at once. Applies the same label changes to all specified messages in a single API call. This is much more efficient than modifying messages individually.\n - Add labels to messages.\n - Remove labels from messages.\nSystem labels that can be modified:\n - INBOX: removing INBOX label archives messages.\n - SPAM: adding SPAM label marks messages as spam.\n - TRASH: adding TRASH label moves messages to trash.\n - UNREAD: removing UNREAD label marks messages as read.\n - STARRED: adding STARRED label marks messages as starred.\n - IMPORTANT: adding IMPORTANT label marks messages as important.", + "inputSchema": { + "type": "object", + "properties": { + "messageIds": { + "minItems": 1, + "maxItems": 1000, + "type": "array", + "items": { + "type": "string" + }, + "description": "The IDs of the messages to modify. Maximum 1,000 per call." + }, + "addLabelIds": { + "description": "A list of label IDs to add to the messages. Limit to 100 labels.", + "maxItems": 100, + "type": "array", + "items": { + "type": "string" + } + }, + "removeLabelIds": { + "description": "A list of label IDs to remove from the messages. 
Limit to 100 labels.", + "maxItems": 100, + "type": "array", + "items": { + "type": "string" + } + } + }, + "required": ["messageIds"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for gmail.batchModify" + } + ] + } + }, + { + "name": "gmail.modifyThread", + "description": "Modify labels on all messages in a Gmail thread. This applies label changes to every message in the thread at once, which is useful for operations like marking an entire conversation as read.\nSystem labels that can be modified:\n - INBOX: removing INBOX label archives the thread.\n - SPAM: adding SPAM label marks the thread as spam.\n - TRASH: adding TRASH label moves the thread to trash.\n - UNREAD: removing UNREAD label marks all messages in the thread as read.\n - STARRED: adding STARRED label marks the thread as starred.\n - IMPORTANT: adding IMPORTANT label marks the thread as important.", + "inputSchema": { + "type": "object", + "properties": { + "threadId": { + "type": "string", + "description": "The ID of the thread to modify." + }, + "addLabelIds": { + "description": "A list of label IDs to add to the thread. Limit to 100 labels.", + "maxItems": 100, + "type": "array", + "items": { + "type": "string" + } + }, + "removeLabelIds": { + "description": "A list of label IDs to remove from the thread. Limit to 100 labels.", + "maxItems": 100, + "type": "array", + "items": { + "type": "string" + } + } + }, + "required": ["threadId"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for gmail.modifyThread" + } + ] + } + }, + { + "name": "gmail.send", + "description": "Send an email message.", + "inputSchema": { + "type": "object", + "properties": { + "to": { + "anyOf": [ + { + "type": "string" + }, + { + "type": "array", + "items": { + "type": "string" + } + } + ], + "description": "Recipient email address(es)." 
+ }, + "subject": { + "type": "string", + "description": "Email subject." + }, + "body": { + "type": "string", + "description": "Email body content." + }, + "cc": { + "description": "CC recipient email address(es).", + "anyOf": [ + { + "type": "string" + }, + { + "type": "array", + "items": { + "type": "string" + } + } + ] + }, + "bcc": { + "description": "BCC recipient email address(es).", + "anyOf": [ + { + "type": "string" + }, + { + "type": "array", + "items": { + "type": "string" + } + } + ] + }, + "isHtml": { + "description": "Whether the body is HTML (default: false).", + "type": "boolean" + } + }, + "required": ["to", "subject", "body"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for gmail.send" + } + ] + } + }, + { + "name": "gmail.createDraft", + "description": "Create a draft email message.", + "inputSchema": { + "type": "object", + "properties": { + "to": { + "anyOf": [ + { + "type": "string" + }, + { + "type": "array", + "items": { + "type": "string" + } + } + ], + "description": "Recipient email address(es)." + }, + "subject": { + "type": "string", + "description": "Email subject." + }, + "body": { + "type": "string", + "description": "Email body content." + }, + "cc": { + "description": "CC recipient email address(es).", + "anyOf": [ + { + "type": "string" + }, + { + "type": "array", + "items": { + "type": "string" + } + } + ] + }, + "bcc": { + "description": "BCC recipient email address(es).", + "anyOf": [ + { + "type": "string" + }, + { + "type": "array", + "items": { + "type": "string" + } + } + ] + }, + "isHtml": { + "description": "Whether the body is HTML (default: false).", + "type": "boolean" + }, + "threadId": { + "description": "The thread ID to create the draft as a reply to. 
When provided, the draft will be linked to the existing thread with appropriate reply headers.", + "type": "string" + } + }, + "required": ["to", "subject", "body"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for gmail.createDraft" + } + ] + } + }, + { + "name": "gmail.sendDraft", + "description": "Send a previously created draft email.", + "inputSchema": { + "type": "object", + "properties": { + "draftId": { + "type": "string", + "description": "The ID of the draft to send." + } + }, + "required": ["draftId"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for gmail.sendDraft" + } + ] + } + }, + { + "name": "gmail.listLabels", + "description": "List all Gmail labels in the user's mailbox.", + "inputSchema": { + "type": "object", + "properties": {}, + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for gmail.listLabels" + } + ] + } + }, + { + "name": "gmail.createLabel", + "description": "Create a new Gmail label. Labels help organize emails into categories.", + "inputSchema": { + "type": "object", + "properties": { + "name": { + "type": "string", + "minLength": 1, + "description": "The display name of the label." + }, + "labelListVisibility": { + "description": "Visibility of the label in the label list. Defaults to \"labelShow\".", + "type": "string", + "enum": ["labelShow", "labelHide", "labelShowIfUnread"] + }, + "messageListVisibility": { + "description": "Visibility of messages with this label in the message list. 
Defaults to \"show\".", + "type": "string", + "enum": ["show", "hide"] + } + }, + "required": ["name"], + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for gmail.createLabel" + } + ] + } + }, + { + "name": "time.getCurrentDate", + "description": "Gets the current date. Returns both UTC (for calendar/API use) and local time (for display to the user), along with the timezone.", + "inputSchema": { + "type": "object", + "properties": {}, + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for time.getCurrentDate" + } + ] + } + }, + { + "name": "time.getCurrentTime", + "description": "Gets the current time. Returns both UTC (for calendar/API use) and local time (for display to the user), along with the timezone.", + "inputSchema": { + "type": "object", + "properties": {}, + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for time.getCurrentTime" + } + ] + } + }, + { + "name": "time.getTimeZone", + "description": "Gets the local timezone. 
Note: timezone is also included in getCurrentDate and getCurrentTime responses.", + "inputSchema": { + "type": "object", + "properties": {}, + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for time.getTimeZone" + } + ] + } + }, + { + "name": "people.getUserProfile", + "description": "Gets a user's profile information.", + "inputSchema": { + "type": "object", + "properties": { + "userId": { + "description": "The ID of the user to get profile information for.", + "type": "string" + }, + "email": { + "description": "The email address of the user to get profile information for.", + "type": "string" + }, + "name": { + "description": "The name of the user to get profile information for.", + "type": "string" + } + }, + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for people.getUserProfile" + } + ] + } + }, + { + "name": "people.getMe", + "description": "Gets the profile information of the authenticated user.", + "inputSchema": { + "type": "object", + "properties": {}, + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for people.getMe" + } + ] + } + }, + { + "name": "people.getUserRelations", + "description": "Gets a user's relations (e.g., manager, spouse, assistant, etc.). Common relation types include: manager, assistant, spouse, partner, relative, mother, father, parent, sibling, child, friend, domesticPartner, referredBy. Defaults to the authenticated user if no userId is provided.", + "inputSchema": { + "type": "object", + "properties": { + "userId": { + "description": "The ID of the user to get relations for (e.g., \"110001608645105799644\" or \"people/110001608645105799644\"). 
Defaults to the authenticated user if not provided.", + "type": "string" + }, + "relationType": { + "description": "The type of relation to filter by (e.g., \"manager\", \"spouse\", \"assistant\"). If not provided, returns all relations.", + "type": "string" + } + }, + "$schema": "http://json-schema.org/draft-07/schema#" + }, + "response": { + "content": [ + { + "type": "text", + "text": "Stub response for people.getUserRelations" + } + ] + } + } + ] +} diff --git a/packages/test-utils/src/index.ts b/packages/test-utils/src/index.ts index 583cbc8a8b..42dd12bb43 100644 --- a/packages/test-utils/src/index.ts +++ b/packages/test-utils/src/index.ts @@ -7,3 +7,4 @@ export * from './file-system-test-helpers.js'; export * from './test-rig.js'; export * from './mock-utils.js'; +export * from './test-mcp-server.js'; diff --git a/packages/test-utils/src/test-mcp-server-template.mjs b/packages/test-utils/src/test-mcp-server-template.mjs new file mode 100644 index 0000000000..8eff0c81d0 --- /dev/null +++ b/packages/test-utils/src/test-mcp-server-template.mjs @@ -0,0 +1,69 @@ +/** + * @license + * Copyright 2026 Google LLC + * SPDX-License-Identifier: Apache-2.0 + */ + +import { Server } from '@modelcontextprotocol/sdk/server/index.js'; +import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js'; +import { + ListToolsRequestSchema, + CallToolRequestSchema, +} from '@modelcontextprotocol/sdk/types.js'; +import fs from 'fs'; + +const configPath = process.argv[2]; +if (!configPath) { + console.error('Usage: node template.mjs '); + process.exit(1); +} + +const config = JSON.parse(fs.readFileSync(configPath, 'utf-8')); + +const server = new Server( + { + name: config.name, + version: config.version || '1.0.0', + }, + { + capabilities: { + tools: {}, + }, + }, +); + +// Add tools handler +server.setRequestHandler(ListToolsRequestSchema, async () => { + return { + tools: (config.tools || []).map((tool) => ({ + name: tool.name, + description: tool.description, + 
inputSchema: tool.inputSchema || { type: 'object', properties: {} }, + })), + }; +}); + +// Add call handler +server.setRequestHandler(CallToolRequestSchema, async (request) => { + const toolName = request.params.name; + const tool = (config.tools || []).find((t) => t.name === toolName); + + if (!tool) { + return { + content: [ + { + type: 'text', + text: `Error: Tool ${toolName} not found`, + }, + ], + isError: true, + }; + } + + return tool.response; +}); + +const transport = new StdioServerTransport(); +await server.connect(transport); +// server.connect resolves when transport connects, but listening continues +console.error(`Test MCP Server '${config.name}' connected and listening.`); diff --git a/packages/test-utils/src/test-mcp-server.ts b/packages/test-utils/src/test-mcp-server.ts new file mode 100644 index 0000000000..0fb25dd21a --- /dev/null +++ b/packages/test-utils/src/test-mcp-server.ts @@ -0,0 +1,75 @@ +/** + * @license + * Copyright 2026 Google LLC + * SPDX-License-Identifier: Apache-2.0 + */ + +/** + * Response structure for a test tool call. + */ +export interface TestToolResponse { + content: { type: 'text'; text: string }[]; + isError?: boolean; +} + +/** + * Definition of a test tool. + */ +export interface TestTool { + name: string; + description: string; + /** JSON Schema for input arguments */ + inputSchema?: Record; + response: TestToolResponse; +} + +/** + * Configuration structure for the generic test MCP server template. + */ +export interface TestMcpConfig { + name: string; + version?: string; + tools: TestTool[]; +} + +/** + * Builder to easily configure a Test MCP Server in tests. + */ +export class TestMcpServerBuilder { + private config: TestMcpConfig; + + constructor(name: string) { + this.config = { name, tools: [] }; + } + + /** + * Adds a tool to the test server configuration. + * @param name Tool name + * @param description Tool description + * @param response The response to return. Can be a string for simple text responses. 
+ * @param inputSchema Optional JSON Schema for validation/documentation + */ + addTool( + name: string, + description: string, + response: TestToolResponse | string, + inputSchema?: Record, + ): this { + const responseObj = + typeof response === 'string' + ? { content: [{ type: 'text' as const, text: response }] } + : response; + + this.config.tools.push({ + name, + description, + inputSchema, + response: responseObj, + }); + return this; + } + + build(): TestMcpConfig { + return this.config; + } +} diff --git a/packages/test-utils/src/test-rig.ts b/packages/test-utils/src/test-rig.ts index ee091bee92..bf85697a5c 100644 --- a/packages/test-utils/src/test-rig.ts +++ b/packages/test-utils/src/test-rig.ts @@ -16,6 +16,7 @@ export { GEMINI_DIR }; import * as pty from '@lydell/node-pty'; import stripAnsi from 'strip-ansi'; import * as os from 'node:os'; +import type { TestMcpConfig } from './test-mcp-server.js'; const __dirname = dirname(fileURLToPath(import.meta.url)); const BUNDLE_PATH = join(__dirname, '..', '..', '..', 'bundle/gemini.js'); @@ -551,7 +552,95 @@ export class TestRig { } const scriptPath = join(this.testDir, fileName); writeFileSync(scriptPath, content); - return normalizePath(scriptPath); + return normalizePath(scriptPath)!; + } + + /** + * Adds a test MCP server to the test workspace. + * @param name The name of the server + * @param config Configuration object or name of predefined config (e.g. 
'github') + */ + addTestMcpServer(name: string, config: TestMcpConfig | string) { + if (!this.testDir) { + throw new Error( + 'TestRig.setup must be called before adding test servers', + ); + } + + let testConfig: TestMcpConfig; + if (typeof config === 'string') { + const assetsDir = join(__dirname, '..', 'assets', 'test-servers'); + const configPath = join(assetsDir, `${config}.json`); + if (!fs.existsSync(configPath)) { + throw new Error( + `Predefined test server config not found: ${configPath}`, + ); + } + testConfig = JSON.parse(fs.readFileSync(configPath, 'utf-8')); + testConfig.name = name; // Override name + } else { + testConfig = config; + } + + const configFileName = `test-mcp-${name}.json`; + const scriptFileName = `test-mcp-${name}.mjs`; + + const configFilePath = join(this.testDir, configFileName); + const scriptFilePath = join(this.testDir, scriptFileName); + + // Write config + fs.writeFileSync(configFilePath, JSON.stringify(testConfig, null, 2)); + + // Copy template script + const templatePath = join(__dirname, 'test-mcp-server-template.mjs'); + if (!fs.existsSync(templatePath)) { + throw new Error(`Test template not found at ${templatePath}`); + } + + fs.copyFileSync(templatePath, scriptFilePath); + + // Calculate path to monorepo node_modules + const monorepoNodeModules = join( + __dirname, + '..', + '..', + '..', + 'node_modules', + ); + + // Create symlink to node_modules in testDir for ESM resolution + const testNodeModules = join(this.testDir, 'node_modules'); + if (!fs.existsSync(testNodeModules)) { + fs.symlinkSync(monorepoNodeModules, testNodeModules, 'dir'); + } + + // Update settings in workspace and home + const updateSettings = (dir: string) => { + const settingsPath = join(dir, GEMINI_DIR, 'settings.json'); + let settings: any = {}; + if (fs.existsSync(settingsPath)) { + settings = JSON.parse(fs.readFileSync(settingsPath, 'utf-8')); + } else { + fs.mkdirSync(join(dir, GEMINI_DIR), { recursive: true }); + } + + if 
(!settings.mcpServers) { + settings.mcpServers = {}; + } + + settings.mcpServers[name] = { + command: 'node', + args: [scriptFilePath, configFilePath], + // Removed env.NODE_PATH as it is ignored in ESM + }; + + fs.writeFileSync(settingsPath, JSON.stringify(settings, null, 2)); + }; + + updateSettings(this.testDir); + if (this.homeDir) { + updateSettings(this.homeDir); + } } private _getCleanEnv( diff --git a/schemas/settings.schema.json b/schemas/settings.schema.json index f836d5985e..3789b64d52 100644 --- a/schemas/settings.schema.json +++ b/schemas/settings.schema.json @@ -392,6 +392,13 @@ "default": false, "type": "boolean" }, + "collapseDrawerDuringApproval": { + "title": "Collapse Drawer During Approval", + "description": "Whether to collapse the UI drawer when a tool is awaiting confirmation.", + "markdownDescription": "Whether to collapse the UI drawer when a tool is awaiting confirmation.\n\n- Category: `UI`\n- Requires restart: `no`\n- Default: `true`", + "default": true, + "type": "boolean" + }, "showMemoryUsage": { "title": "Show Memory Usage", "description": "Display memory usage information in the UI", @@ -2673,8 +2680,8 @@ "enableAgents": { "title": "Enable Agents", "description": "Enable local and remote subagents.", - "markdownDescription": "Enable local and remote subagents.\n\n- Category: `Experimental`\n- Requires restart: `yes`\n- Default: `true`", - "default": true, + "markdownDescription": "Enable local and remote subagents.\n\n- Category: `Experimental`\n- Requires restart: `yes`\n- Default: `false`", + "default": false, "type": "boolean" }, "worktrees": { diff --git a/scripts/changed_prompt.js b/scripts/changed_prompt.js index 0ad0e365f7..22563810e4 100644 --- a/scripts/changed_prompt.js +++ b/scripts/changed_prompt.js @@ -5,14 +5,26 @@ */ import { execSync } from 'node:child_process'; -const EVALS_FILE_PREFIXES = [ +const CORE_STEERING_PATHS = [ 'packages/core/src/prompts/', 'packages/core/src/tools/', - 'evals/', +]; + +const TEST_PATHS = 
['evals/']; + +const STEERING_SIGNATURES = [ + 'LocalAgentDefinition', + 'LocalInvocation', + 'ToolDefinition', + 'inputSchema', + "kind: 'local'", ]; function main() { const targetBranch = process.env.GITHUB_BASE_REF || 'main'; + const verbose = process.argv.includes('--verbose'); + const steeringOnly = process.argv.includes('--steering-only'); + try { const remoteUrl = process.env.GITHUB_REPOSITORY ? `https://github.com/${process.env.GITHUB_REPOSITORY}.git` @@ -30,18 +42,60 @@ function main() { .split('\n') .filter(Boolean); - const shouldRun = changedFiles.some((file) => - EVALS_FILE_PREFIXES.some((prefix) => file.startsWith(prefix)), - ); + let detected = false; + const reasons = []; - console.log(shouldRun ? 'true' : 'false'); + // 1. Path-based detection + for (const file of changedFiles) { + if (CORE_STEERING_PATHS.some((prefix) => file.startsWith(prefix))) { + detected = true; + reasons.push(`Matched core steering path: ${file}`); + if (!verbose) break; + } + if ( + !steeringOnly && + TEST_PATHS.some((prefix) => file.startsWith(prefix)) + ) { + detected = true; + reasons.push(`Matched test path: ${file}`); + if (!verbose) break; + } + } + + // 2. Signature-based detection (only in packages/core/src/ and only if not already detected or if verbose) + if (!detected || verbose) { + const coreChanges = changedFiles.filter((f) => + f.startsWith('packages/core/src/'), + ); + if (coreChanges.length > 0) { + // Get the actual diff content for core files + const diff = execSync( + `git diff -U0 FETCH_HEAD...HEAD -- packages/core/src/`, + { encoding: 'utf-8' }, + ); + for (const sig of STEERING_SIGNATURES) { + if (diff.includes(sig)) { + detected = true; + reasons.push(`Matched steering signature in core: ${sig}`); + if (!verbose) break; + } + } + } + } + + if (verbose && reasons.length > 0) { + process.stderr.write('Detection reasons:\n'); + reasons.forEach((r) => process.stderr.write(` - ${r}\n`)); + } + + process.stdout.write(detected ? 
'true' : 'false'); } catch (error) { - // If anything fails (e.g., no git history), run evals to be safe - console.warn( - 'Warning: Failed to determine if evals should run. Defaulting to true.', + // If anything fails (e.g., no git history), run evals/guidance to be safe + process.stderr.write( + 'Warning: Failed to determine if changes occurred. Defaulting to true.\n', ); - console.error(error); - console.log('true'); + process.stderr.write(String(error) + '\n'); + process.stdout.write('true'); } }
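
Review note: the detection in `scripts/changed_prompt.js` is easier to reason about when separated from the `git` calls. The sketch below restates the two passes (path prefixes, then signatures in the core diff) as a pure function. The name `detectSteeringChange`, its parameters, and its return shape are hypothetical and do not appear in the diff; the constants are copied from it. Unlike the script, the sketch always collects every reason instead of breaking early, which should not change the boolean result.

```javascript
// Hypothetical pure-function restatement of the detection logic.
// detectSteeringChange and its signature are assumptions for illustration.
const CORE_STEERING_PATHS = [
  'packages/core/src/prompts/',
  'packages/core/src/tools/',
];
const TEST_PATHS = ['evals/'];
const STEERING_SIGNATURES = [
  'LocalAgentDefinition',
  'LocalInvocation',
  'ToolDefinition',
  'inputSchema',
  "kind: 'local'",
];

/**
 * @param {string[]} changedFiles repo-relative paths of changed files
 * @param {string} coreDiff diff text limited to packages/core/src/
 * @param {{ steeringOnly?: boolean }} options
 */
function detectSteeringChange(changedFiles, coreDiff = '', options = {}) {
  const { steeringOnly = false } = options;
  const reasons = [];

  // 1. Path-based detection
  for (const file of changedFiles) {
    if (CORE_STEERING_PATHS.some((prefix) => file.startsWith(prefix))) {
      reasons.push(`Matched core steering path: ${file}`);
    }
    if (!steeringOnly && TEST_PATHS.some((prefix) => file.startsWith(prefix))) {
      reasons.push(`Matched test path: ${file}`);
    }
  }

  // 2. Signature-based detection, only when something under core changed
  const coreChanged = changedFiles.some((f) =>
    f.startsWith('packages/core/src/'),
  );
  if (coreChanged) {
    for (const sig of STEERING_SIGNATURES) {
      if (coreDiff.includes(sig)) {
        reasons.push(`Matched steering signature in core: ${sig}`);
      }
    }
  }

  return { detected: reasons.length > 0, reasons };
}
```

With a helper like this, `main()` would only need to gather `changedFiles` and the `git diff -U0` output and pass them in, and the `--steering-only` behaviour could be unit-tested without a git checkout. Note that the signature pass is a plain substring match on the diff text, so a removed line or a comment mentioning `inputSchema` also counts as a match.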