diff --git a/conductor/tracks.md b/conductor/tracks.md new file mode 100644 index 0000000000..329d8b2f90 --- /dev/null +++ b/conductor/tracks.md @@ -0,0 +1,3 @@ +# Tracks + +- [Dynamic Thinking Budget](tracks/dynamic-thinking-budget/plan.md) diff --git a/conductor/tracks/dynamic-thinking-budget/plan.md b/conductor/tracks/dynamic-thinking-budget/plan.md new file mode 100644 index 0000000000..6248a8b90f --- /dev/null +++ b/conductor/tracks/dynamic-thinking-budget/plan.md @@ -0,0 +1,101 @@ +# Dynamic Thinking Budget Plan + +## Context + +The current Gemini CLI implementation uses static thinking configurations +defined in `settings.json` (or defaults). + +- **Gemini 2.x**: Uses a static `thinkingBudget` (e.g., 8192 tokens). +- **Gemini 3**: Uses a static `thinkingLevel` (e.g., "HIGH"). + +This "one-size-fits-all" approach is inefficient. Simple queries waste compute, +while complex queries might not get enough reasoning depth. The goal is to +implement an "Adaptive Budget Manager" that dynamically adjusts the +`thinkingBudget` (for v2) or `thinkingLevel` (for v3) based on the complexity of +the user's request. + +## Goals + +- Implement a **Complexity Classifier** using a lightweight model (e.g., Gemini + Flash) to analyze the user's prompt and history. +- **Map complexity levels** to: + - `thinkingBudget` token counts for Gemini 2.x models. + - `thinkingLevel` enums for Gemini 3 models. +- **Dynamically update** the `GenerateContentConfig` in `GeminiClient` before + the main model call. +- Ensure **fallback mechanisms** if the classification fails. +- (Optional) **Visual feedback** to the user regarding the determined + complexity. + +## Strategy + +### 1. Adaptive Budget Manager Service + +Create a new service `AdaptiveBudgetService` in +`packages/core/src/services/adaptiveBudgetService.ts`. + +- **Functionality**: + - Takes `userPrompt` and `recentHistory` as input. + - Calls Gemini Flash (using `config.getBaseLlmClient()`) with a specialized + system prompt. 
+ - Returns a `ComplexityLevel` (1-4). + +### 2. Budget/Level Mapping + +| Complexity Level | Gemini 2.x (`thinkingBudget`) | Gemini 3 (`thinkingLevel`) | Description | +| :--------------- | :---------------------------- | :------------------------- | :----------------------------- | +| **1 (Simple)** | 1,024 tokens | `LOW` | Quick fixes, syntax questions. | +| **2 (Moderate)** | 4,096 tokens | `MEDIUM` (or `LOW`) | Function-level logic. | +| **3 (High)** | 16,384 tokens | `HIGH` | Module-level refactoring. | +| **4 (Extreme)** | 32,768+ tokens | `HIGH` | Architecture, deep debugging. | + +### 3. Integration Point + +Modify `packages/core/src/core/client.ts` to invoke the `AdaptiveBudgetService` +before `sendMessageStream`. + +- **Flow**: + 1. User sends message. + 2. `GeminiClient` identifies the target model family (v2 or v3). + 3. Call `AdaptiveBudgetService.determineAdaptiveConfig()`. + 4. If **v2**: Calculate `thinkingBudget` based on complexity. Update config. + 5. If **v3**: Calculate `thinkingLevel` based on complexity. Update config. + 6. Proceed with `sendMessageStream`. + +### 4. Configuration + +Add settings to `packages/core/src/config/config.ts` and `settings.schema.json`: + +- `adaptiveThinking.enabled`: boolean (default false) +- `adaptiveThinking.classifierModel`: string (default "classifier") + +## Insights from "J1: Exploring Simple Test-Time Scaling (STTS)" + +The paper (arXiv:2505.xxxx / 2512.19585) highlights that models trained with +Reinforcement Learning (like Gemini 3) exhibit strong scaling trends when +allocated more inference-time compute. + +- **Budget Forcing**: The "Adaptive Budget Manager" implements this by forcing + higher `thinkingLevel` or `thinkingBudget` for harder tasks, maximizing the + "verifiable reward" (correct code) for complex problems while saving latency + on simple ones. +- **Best-of-N**: The paper suggests that generating N solutions and selecting + the best one is a powerful STTS method.
While out of scope for _this_ specific + track, the "Complexity Classifier" we build here is the _prerequisite_ for + that future feature. We should only trigger expensive "Best-of-N" flows when + the Complexity Level is 3 or 4. + +## Files to Modify + +- `packages/core/src/services/adaptiveBudgetService.ts` (New) +- `packages/core/src/core/client.ts` +- `packages/core/src/config/config.ts` + +## Verification Plan + +1. **Unit Tests**: Verify `AdaptiveBudgetService` returns correct mappings for + both model families. +2. **Integration Tests**: Mock API calls to ensure `thinkingLevel` is sent for + v3 and `thinkingBudget` for v2. +3. **Manual Verification**: Use debug logs to verify the correct parameters are + being sent to the API. diff --git a/packages/cli/src/config/config.ts b/packages/cli/src/config/config.ts index 440f6e7a90..709de1f9a6 100755 --- a/packages/cli/src/config/config.ts +++ b/packages/cli/src/config/config.ts @@ -716,6 +716,7 @@ export async function loadCliConfig( settings.experimental?.codebaseInvestigatorSettings, introspectionAgentSettings: settings.experimental?.introspectionAgentSettings, + adaptiveThinking: settings.experimental?.adaptiveThinking, fakeResponses: argv.fakeResponses, recordResponses: argv.recordResponses, retryFetchErrors: settings.general?.retryFetchErrors, diff --git a/packages/cli/src/config/settingsSchema.ts b/packages/cli/src/config/settingsSchema.ts index a2d7b2a008..0677f3d3d8 100644 --- a/packages/cli/src/config/settingsSchema.ts +++ b/packages/cli/src/config/settingsSchema.ts @@ -1473,6 +1473,37 @@ const SETTINGS_SCHEMA = { }, }, }, + adaptiveThinking: { + type: 'object', + label: 'Adaptive Thinking Settings', + category: 'Experimental', + requiresRestart: false, + default: {}, + description: 'Configuration for Adaptive Thinking Budget.', + showInDialog: false, + properties: { + enabled: { + type: 'boolean', + label: 'Enable Adaptive Thinking', + category: 'Experimental', + requiresRestart: false, + default: false, 
+ description: + 'Enable adaptive thinking budget based on task complexity.', + showInDialog: true, + }, + classifierModel: { + type: 'string', + label: 'Classifier Model', + category: 'Experimental', + requiresRestart: false, + default: 'classifier', + description: + 'The model (or alias) to use for complexity classification.', + showInDialog: false, + }, + }, + }, }, }, diff --git a/packages/core/src/config/config.ts b/packages/core/src/config/config.ts index bf16680f44..fe677b98e0 100644 --- a/packages/core/src/config/config.ts +++ b/packages/core/src/config/config.ts @@ -73,6 +73,7 @@ import type { ModelConfigServiceConfig } from '../services/modelConfigService.js import { ModelConfigService } from '../services/modelConfigService.js'; import { DEFAULT_MODEL_CONFIGS } from './defaultModelConfigs.js'; import { ContextManager } from '../services/contextManager.js'; +import { AdaptiveBudgetService } from '../services/adaptiveBudgetService.js'; // Re-export OAuth config type export type { MCPOAuthConfig, AnyToolInvocation }; @@ -335,6 +336,10 @@ export interface ConfigParameters { disableModelRouterForAuth?: AuthType[]; codebaseInvestigatorSettings?: CodebaseInvestigatorSettings; introspectionAgentSettings?: IntrospectionAgentSettings; + adaptiveThinking?: { + enabled?: boolean; + classifierModel?: string; + }; continueOnFailedApiCall?: boolean; retryFetchErrors?: boolean; enableShellOutputEfficiency?: boolean; @@ -460,6 +465,10 @@ export class Config { private readonly outputSettings: OutputSettings; private readonly codebaseInvestigatorSettings: CodebaseInvestigatorSettings; private readonly introspectionAgentSettings: IntrospectionAgentSettings; + private readonly adaptiveThinking: { + enabled: boolean; + classifierModel: string; + }; private readonly continueOnFailedApiCall: boolean; private readonly retryFetchErrors: boolean; private readonly enableShellOutputEfficiency: boolean; @@ -491,6 +500,7 @@ export class Config { private readonly experimentalJitContext: 
boolean; private contextManager?: ContextManager; private terminalBackground: string | undefined = undefined; + private adaptiveBudgetService!: AdaptiveBudgetService; constructor(params: ConfigParameters) { this.sessionId = params.sessionId; @@ -618,6 +628,10 @@ export class Config { this.introspectionAgentSettings = { enabled: params.introspectionAgentSettings?.enabled ?? false, }; + this.adaptiveThinking = { + enabled: params.adaptiveThinking?.enabled ?? false, + classifierModel: params.adaptiveThinking?.classifierModel ?? 'classifier', + }; this.continueOnFailedApiCall = params.continueOnFailedApiCall ?? true; this.enableShellOutputEfficiency = params.enableShellOutputEfficiency ?? true; @@ -763,6 +777,13 @@ export class Config { await this.contextManager.refresh(); } + this.adaptiveBudgetService = new AdaptiveBudgetService(this); + if (this.adaptiveThinking.enabled) { + debugLogger.debug( + `Adaptive Thinking Budget enabled (classifier: ${this.adaptiveThinking.classifierModel})`, + ); + } + await this.geminiClient.initialize(); } @@ -770,6 +791,10 @@ export class Config { return this.contentGenerator; } + getAdaptiveBudgetService(): AdaptiveBudgetService { + return this.adaptiveBudgetService; + } + async refreshAuth(authMethod: AuthType) { // Reset availability service when switching auth this.modelAvailabilityService.reset(); @@ -1664,6 +1689,10 @@ export class Config { return this.introspectionAgentSettings; } + getAdaptiveThinkingConfig(): { enabled: boolean; classifierModel: string } { + return this.adaptiveThinking; + } + async createToolRegistry(): Promise<ToolRegistry> { const registry = new ToolRegistry(this, this.messageBus); diff --git a/packages/core/src/core/client.ts b/packages/core/src/core/client.ts index ecd1eff471..4f9d7eb21f 100644 --- a/packages/core/src/core/client.ts +++ b/packages/core/src/core/client.ts @@ -28,6 +28,7 @@ import { GeminiChat } from './geminiChat.js'; import { retryWithBackoff } from '../utils/retry.js'; import { getErrorMessage } from
'../utils/errors.js'; import { tokenLimit } from './tokenLimits.js'; +import { partListUnionToString } from './geminiRequest.js'; import type { ChatRecordingService, ResumedSessionData, @@ -620,6 +621,25 @@ export class GeminiClient { // availability logic const modelConfigKey: ModelConfigKey = { model: modelToUse }; + + // Adaptive Thinking Budget Integration + if ( + !isInvalidStreamRetry && + this.config.getAdaptiveThinkingConfig().enabled + ) { + const userMessage = partListUnionToString(request); + if (userMessage) { + const adaptiveConfig = await this.config + .getAdaptiveBudgetService() + .determineAdaptiveConfig(userMessage, modelToUse); + + if (adaptiveConfig) { + modelConfigKey.thinkingBudget = adaptiveConfig.thinkingBudget; + modelConfigKey.thinkingLevel = adaptiveConfig.thinkingLevel; + } + } + } + const { model: finalModel } = applyModelSelection( this.config, modelConfigKey, diff --git a/packages/core/src/services/adaptiveBudgetService.test.ts b/packages/core/src/services/adaptiveBudgetService.test.ts new file mode 100644 index 0000000000..f8c549e6de --- /dev/null +++ b/packages/core/src/services/adaptiveBudgetService.test.ts @@ -0,0 +1,88 @@ +/** + * @license + * Copyright 2026 Google LLC + * SPDX-License-Identifier: Apache-2.0 + */ +import { describe, it, expect, vi } from 'vitest'; +import { + AdaptiveBudgetService, + ComplexityLevel, +} from './adaptiveBudgetService.js'; +import type { Config } from '../config/config.js'; +import { ThinkingLevel } from '@google/genai'; + +describe('AdaptiveBudgetService', () => { + it('should map complexity levels to correct V2 budgets', () => { + const service = new AdaptiveBudgetService({} as Config); + expect(service.getThinkingBudgetV2(ComplexityLevel.SIMPLE)).toBe(1024); + expect(service.getThinkingBudgetV2(ComplexityLevel.MODERATE)).toBe(4096); + expect(service.getThinkingBudgetV2(ComplexityLevel.HIGH)).toBe(16384); + expect(service.getThinkingBudgetV2(ComplexityLevel.EXTREME)).toBe(32768); + }); + + 
it('should map complexity levels to correct V3 levels', () => { + const service = new AdaptiveBudgetService({} as Config); + expect(service.getThinkingLevelV3(ComplexityLevel.SIMPLE)).toBe( + ThinkingLevel.LOW, + ); + expect(service.getThinkingLevelV3(ComplexityLevel.MODERATE)).toBe( + ThinkingLevel.LOW, + ); + expect(service.getThinkingLevelV3(ComplexityLevel.HIGH)).toBe( + ThinkingLevel.HIGH, + ); + expect(service.getThinkingLevelV3(ComplexityLevel.EXTREME)).toBe( + ThinkingLevel.HIGH, + ); + }); + + it('should determine adaptive config based on LLM response', async () => { + const mockGenerateContent = vi.fn().mockResolvedValue({ + candidates: [{ content: { parts: [{ text: '3' }] } }], + }); + + const mockConfig = { + getBaseLlmClient: () => ({ + generateContent: mockGenerateContent, + }), + getAdaptiveThinkingConfig: () => ({ + enabled: true, + classifierModel: 'gemini-2.0-flash', + }), + } as unknown as Config; + + const service = new AdaptiveBudgetService(mockConfig); + const result = await service.determineAdaptiveConfig( + 'Complex task', + 'gemini-2.5-pro', + ); + + expect(result?.complexity).toBe(ComplexityLevel.HIGH); + expect(result?.thinkingBudget).toBe(16384); + expect(mockGenerateContent).toHaveBeenCalled(); + }); + + it('should handle Gemini 3 models with thinkingLevel', async () => { + const mockConfig = { + getBaseLlmClient: () => ({ + generateContent: vi.fn().mockResolvedValue({ + candidates: [{ content: { parts: [{ text: '1' }] } }], + }), + }), + getAdaptiveThinkingConfig: () => ({ + enabled: true, + classifierModel: 'gemini-2.0-flash', + }), + } as unknown as Config; + + const service = new AdaptiveBudgetService(mockConfig); + const result = await service.determineAdaptiveConfig( + 'Hi', + 'gemini-3-pro-preview', + ); + + expect(result?.complexity).toBe(ComplexityLevel.SIMPLE); + expect(result?.thinkingLevel).toBe(ThinkingLevel.LOW); + expect(result?.thinkingBudget).toBeUndefined(); + }); +}); diff --git 
a/packages/core/src/services/adaptiveBudgetService.ts b/packages/core/src/services/adaptiveBudgetService.ts new file mode 100644 index 0000000000..f92697e9a3 --- /dev/null +++ b/packages/core/src/services/adaptiveBudgetService.ts @@ -0,0 +1,132 @@ +/** + * @license + * Copyright 2026 Google LLC + * SPDX-License-Identifier: Apache-2.0 + */ +import type { Config } from '../config/config.js'; +import { debugLogger } from '../utils/debugLogger.js'; +import { isGemini2Model, isPreviewModel } from '../config/models.js'; +import { ThinkingLevel } from '@google/genai'; + +export enum ComplexityLevel { + SIMPLE = 1, + MODERATE = 2, + HIGH = 3, + EXTREME = 4, +} + +export const BUDGET_MAPPING_V2: Record<ComplexityLevel, number> = { + [ComplexityLevel.SIMPLE]: 1024, + [ComplexityLevel.MODERATE]: 4096, + [ComplexityLevel.HIGH]: 16384, + [ComplexityLevel.EXTREME]: 32768, +}; + +export const LEVEL_MAPPING_V3: Record<ComplexityLevel, ThinkingLevel> = { + [ComplexityLevel.SIMPLE]: ThinkingLevel.LOW, + [ComplexityLevel.MODERATE]: ThinkingLevel.LOW, + [ComplexityLevel.HIGH]: ThinkingLevel.HIGH, + [ComplexityLevel.EXTREME]: ThinkingLevel.HIGH, +}; + +export interface AdaptiveBudgetResult { + complexity: ComplexityLevel; + thinkingBudget?: number; + thinkingLevel?: ThinkingLevel; + strategyNote?: string; +} + +export class AdaptiveBudgetService { + constructor(private config: Config) {} + + /** + * Analyzes the user prompt and determines the optimal thinking configuration. + * + * Note on future scaling (per arXiv:2512.19585): + * At Complexity 4 (Extreme), we should consider: + * 1. Best-of-N: Generate multiple solutions. + * 2. LLM-as-a-Judge: Use a strong model to evaluate candidates. + * 3. Compiler Verification: Check code correctness via environment tools.
+ */ + async determineAdaptiveConfig( + userPrompt: string, + model: string, + ): Promise<AdaptiveBudgetResult | undefined> { + const { classifierModel } = this.config.getAdaptiveThinkingConfig(); + + try { + const llm = this.config.getBaseLlmClient(); + debugLogger.debug( + `AdaptiveBudgetService: Classifying prompt complexity using ${classifierModel}...`, + ); + const systemPrompt = `You are a complexity classifier for a coding assistant. +Analyze the user's request and determine the complexity of the task. +Output ONLY a single integer from 1 to 4 based on the following scale: + +1 (Simple): Quick fixes, syntax questions, simple explanations, greetings. +2 (Moderate): Function-level logic, writing small scripts, standard debugging. +3 (High): Module-level refactoring, complex feature implementation, multi-file changes. +4 (Extreme): Architecture design, deep root-cause analysis of obscure bugs, large-scale migrations. + +Request: ${userPrompt} +Complexity Level:`; + + const response = await llm.generateContent({ + modelConfigKey: { model: classifierModel }, + contents: [{ role: 'user', parts: [{ text: systemPrompt }] }], + promptId: 'adaptive-budget-classifier', + abortSignal: new AbortController().signal, + }); + + const text = response.candidates?.[0]?.content?.parts?.[0]?.text?.trim(); + if (!text) { + debugLogger.debug( + 'AdaptiveBudgetService: No response from classifier.', + ); + return undefined; + } + + const level = parseInt(text, 10) as ComplexityLevel; + if (isNaN(level) || level < 1 || level > 4) { + debugLogger.debug( + `AdaptiveBudgetService: Invalid complexity level returned: ${text}`, + ); + return undefined; + } + + const result: AdaptiveBudgetResult = { complexity: level }; + + // Determine mapping based on model version + // Gemini 3 uses ThinkingLevel, Gemini 2.x uses thinkingBudget + if (isPreviewModel(model)) { + result.thinkingLevel = LEVEL_MAPPING_V3[level] ??
ThinkingLevel.HIGH; + } else if (isGemini2Model(model)) { + result.thinkingBudget = BUDGET_MAPPING_V2[level]; + } + + if (level === ComplexityLevel.EXTREME) { + result.strategyNote = + 'EXTREME complexity detected. Future implementations should use Best-of-N + Verification.'; + } + + debugLogger.debug( + `AdaptiveBudgetService: Complexity ${level} -> Thinking Param: ${result.thinkingLevel || result.thinkingBudget}`, + ); + return result; + } catch (error) { + debugLogger.error( + 'AdaptiveBudgetService: Error classifying complexity', + error, + ); + return undefined; + } + } + + getThinkingBudgetV2(level: ComplexityLevel): number { + return BUDGET_MAPPING_V2[level]; + } + + getThinkingLevelV3(level: ComplexityLevel): ThinkingLevel { + return LEVEL_MAPPING_V3[level] ?? ThinkingLevel.HIGH; + } +} diff --git a/packages/core/src/services/modelConfigService.ts b/packages/core/src/services/modelConfigService.ts index 0b86baa4ad..aa5ad3966b 100644 --- a/packages/core/src/services/modelConfigService.ts +++ b/packages/core/src/services/modelConfigService.ts @@ -4,7 +4,7 @@ * SPDX-License-Identifier: Apache-2.0 */ -import type { GenerateContentConfig } from '@google/genai'; +import type { GenerateContentConfig, ThinkingLevel } from '@google/genai'; // The primary key for the ModelConfig is the model string. However, we also // support a secondary key to limit the override scope, typically an agent name. @@ -26,6 +26,10 @@ export interface ModelConfigKey { // This allows overrides to specify different settings (e.g., higher temperature) // specifically for retry scenarios. isRetry?: boolean; + + // Dynamic thinking configuration determined at runtime (e.g. 
via complexity classification) + thinkingBudget?: number; + thinkingLevel?: ThinkingLevel; } export interface ModelConfig { @@ -205,6 +209,22 @@ export class ModelConfigService { } } + // Apply dynamic thinking parameters from context if present + if ( + context.thinkingBudget !== undefined || + context.thinkingLevel !== undefined + ) { + resolvedConfig.thinkingConfig = { + ...(resolvedConfig.thinkingConfig as object), + ...(context.thinkingBudget !== undefined + ? { thinkingBudget: context.thinkingBudget } + : {}), + ...(context.thinkingLevel !== undefined + ? { thinkingLevel: context.thinkingLevel } + : {}), + }; + } + return { model: baseModel, generateContentConfig: resolvedConfig, diff --git a/schemas/settings.schema.json b/schemas/settings.schema.json index 4900fa25d6..8b9a2e3686 100644 --- a/schemas/settings.schema.json +++ b/schemas/settings.schema.json @@ -1441,6 +1441,30 @@ } }, "additionalProperties": false + }, + "adaptiveThinking": { + "title": "Adaptive Thinking Settings", + "description": "Configuration for Adaptive Thinking Budget.", + "markdownDescription": "Configuration for Adaptive Thinking Budget.\n\n- Category: `Experimental`\n- Requires restart: `no`\n- Default: `{}`", + "default": {}, + "type": "object", + "properties": { + "enabled": { + "title": "Enable Adaptive Thinking", + "description": "Enable adaptive thinking budget based on task complexity.", + "markdownDescription": "Enable adaptive thinking budget based on task complexity.\n\n- Category: `Experimental`\n- Requires restart: `no`\n- Default: `false`", + "default": false, + "type": "boolean" + }, + "classifierModel": { + "title": "Classifier Model", + "description": "The model (or alias) to use for complexity classification.", + "markdownDescription": "The model (or alias) to use for complexity classification.\n\n- Category: `Experimental`\n- Requires restart: `no`\n- Default: `classifier`", + "default": "classifier", + "type": "string" + } + }, + "additionalProperties": false } }, 
"additionalProperties": false