feat: implement adaptive thinking budget

Adam Weidman
2026-01-06 15:54:07 -08:00
parent 6f4b2ad0b9
commit 2404e4fae8
10 changed files with 450 additions and 1 deletions

conductor/tracks.md

@@ -0,0 +1,3 @@
# Tracks
- [Dynamic Thinking Budget](tracks/dynamic-thinking-budget/plan.md)


@@ -0,0 +1,101 @@
# Dynamic Thinking Budget Plan
## Context
The current Gemini CLI implementation uses static thinking configurations
defined in `settings.json` (or defaults).
- **Gemini 2.x**: Uses a static `thinkingBudget` (e.g., 8192 tokens).
- **Gemini 3**: Uses a static `thinkingLevel` (e.g., "HIGH").
This "one-size-fits-all" approach is inefficient: simple queries spend thinking
tokens they do not need, while complex queries may not get enough reasoning
depth. The goal is to implement an "Adaptive Budget Manager" that dynamically
adjusts the `thinkingBudget` (for v2) or `thinkingLevel` (for v3) based on the
complexity of the user's request.
## Goals
- Implement a **Complexity Classifier** using a lightweight model (e.g., Gemini
  Flash) to analyze the user's prompt and history.
- **Map complexity levels** to:
  - `thinkingBudget` token counts for Gemini 2.x models.
  - `thinkingLevel` enums for Gemini 3 models.
- **Dynamically update** the `GenerateContentConfig` in `GeminiClient` before
  the main model call.
- Ensure **fallback mechanisms** if the classification fails.
- (Optional) **Visual feedback** to the user regarding the determined
  complexity.
## Strategy
### 1. Adaptive Budget Manager Service
Create a new service `AdaptiveBudgetService` in
`packages/core/src/services/adaptiveBudgetService.ts`.
- **Functionality**:
  - Takes `userPrompt` and `recentHistory` as input.
  - Calls Gemini Flash (using `config.getBaseLlmClient()`) with a specialized
    system prompt.
  - Returns a `ComplexityLevel` (1-4).
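
The last step, turning the classifier's free-text reply into a level, can be
sketched as below. This is a minimal illustration, not the service's actual
code: `parseClassifierReply` and `ParsedLevel` are hypothetical names, and the
sketch assumes a strict policy that accepts only a bare digit from 1 to 4 (the
service in this commit uses `parseInt`, which is more lenient).

```typescript
// Hypothetical level type for this sketch; the commit uses a ComplexityLevel enum.
type ParsedLevel = 1 | 2 | 3 | 4;

// Converts the classifier's raw reply into a level, or undefined when the
// reply is missing or is anything other than a single digit from 1 to 4.
function parseClassifierReply(reply: string | undefined): ParsedLevel | undefined {
  if (!reply) return undefined;
  // Ignore surrounding whitespace, then require exactly one digit in range.
  const match = /^[1-4]$/.exec(reply.trim());
  if (!match) return undefined;
  return Number(match[0]) as ParsedLevel;
}
```

Returning `undefined` rather than throwing lets the caller fall back to the
static thinking configuration when classification fails.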
### 2. Budget/Level Mapping
| Complexity Level | Gemini 2.x (`thinkingBudget`) | Gemini 3 (`thinkingLevel`) | Description |
| :--------------- | :---------------------------- | :------------------------- | :----------------------------- |
| **1 (Simple)** | 1,024 tokens | `LOW` | Quick fixes, syntax questions. |
| **2 (Moderate)** | 4,096 tokens | `MEDIUM` (or `LOW`) | Function-level logic. |
| **3 (High)** | 16,384 tokens | `HIGH` | Module-level refactoring. |
| **4 (Extreme)** | 32,768+ tokens | `HIGH` | Architecture, deep debugging. |
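
The table can be expressed as two lookup tables, sketched here under a few
assumptions: `Complexity`, `Level`, and `thinkingParamFor` are illustrative
names, `Level` is a local stand-in for the SDK's `ThinkingLevel` enum, level 2
is mapped to `LOW` (the choice the service in this commit makes), and level 4
uses the 32,768 lower bound.

```typescript
type Complexity = 1 | 2 | 3 | 4;
// Local stand-in for the ThinkingLevel enum so the sketch has no dependencies.
type Level = 'LOW' | 'HIGH';

// Gemini 2.x column: thinking budget in tokens per complexity level.
const BUDGET_BY_COMPLEXITY: Record<Complexity, number> = {
  1: 1024,
  2: 4096,
  3: 16384,
  4: 32768,
};

// Gemini 3 column: thinking level per complexity level.
const LEVEL_BY_COMPLEXITY: Record<Complexity, Level> = {
  1: 'LOW',
  2: 'LOW',
  3: 'HIGH',
  4: 'HIGH',
};

// Returns the single thinking parameter that applies to the model family.
function thinkingParamFor(
  family: 'v2' | 'v3',
  complexity: Complexity,
): { thinkingBudget?: number; thinkingLevel?: Level } {
  return family === 'v2'
    ? { thinkingBudget: BUDGET_BY_COMPLEXITY[complexity] }
    : { thinkingLevel: LEVEL_BY_COMPLEXITY[complexity] };
}
```

Typing both tables as `Record<Complexity, …>` makes the compiler reject a table
that omits one of the four levels.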
### 3. Integration Point
Modify `packages/core/src/core/client.ts` to invoke the `AdaptiveBudgetService`
before `sendMessageStream`.
- **Flow**:
  1. User sends message.
  2. `GeminiClient` identifies the target model family (v2 or v3).
  3. Call `AdaptiveBudgetService.determineAdaptiveConfig()`.
  4. If **v2**: Calculate `thinkingBudget` based on complexity. Update config.
  5. If **v3**: Calculate `thinkingLevel` based on complexity. Update config.
  6. Proceed with `sendMessageStream`.
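
Steps 2 to 5 of the flow, including the fallback when classification fails,
might look like the sketch below. All names here (`detectFamily`,
`applyComplexity`, `ThinkingOverrides`) are hypothetical, and the prefix-based
family check is an assumption for illustration; the commit itself relies on
`isPreviewModel` and `isGemini2Model`.

```typescript
type ModelFamily = 'v2' | 'v3';
type FlowLevel = 1 | 2 | 3 | 4;

interface ThinkingOverrides {
  thinkingBudget?: number;
  thinkingLevel?: 'LOW' | 'HIGH';
}

const FLOW_BUDGETS: Record<FlowLevel, number> = {
  1: 1024,
  2: 4096,
  3: 16384,
  4: 32768,
};

// Assumed family check based on the model-name prefix.
function detectFamily(model: string): ModelFamily | undefined {
  if (model.startsWith('gemini-3')) return 'v3';
  if (model.startsWith('gemini-2')) return 'v2';
  return undefined;
}

// Applies the classified level to the base config. When the level is missing
// (classification failed) or the family is unknown, the base is returned as-is.
function applyComplexity(
  model: string,
  level: FlowLevel | undefined,
  base: ThinkingOverrides,
): ThinkingOverrides {
  const family = detectFamily(model);
  if (level === undefined || family === undefined) return base;
  if (family === 'v2') return { ...base, thinkingBudget: FLOW_BUDGETS[level] };
  return { ...base, thinkingLevel: level >= 3 ? 'HIGH' : 'LOW' };
}
```

Keeping this step a pure function of `(model, level, base)` makes the fallback
path easy to unit test without mocking the classifier call.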
### 4. Configuration
Add settings to `packages/core/src/config/config.ts` and `settings.schema.json`:
- `adaptiveThinking.enabled`: boolean (default `false`)
- `adaptiveThinking.classifierModel`: string (default `"classifier"`)
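
A minimal sketch of how these two settings resolve to concrete values when the
user leaves them unset. The function and interface names are illustrative; the
fallback values (`false` and `'classifier'`) mirror the ones used in the
`Config` constructor in this commit.

```typescript
// Shape of the optional user-provided settings.
interface AdaptiveThinkingSettings {
  enabled?: boolean;
  classifierModel?: string;
}

// Fully resolved settings with no optional fields.
interface ResolvedAdaptiveThinking {
  enabled: boolean;
  classifierModel: string;
}

// Fills in defaults for any field the user did not set.
function resolveAdaptiveThinking(
  settings?: AdaptiveThinkingSettings,
): ResolvedAdaptiveThinking {
  return {
    enabled: settings?.enabled ?? false,
    classifierModel: settings?.classifierModel ?? 'classifier',
  };
}
```

Using `??` rather than `||` means an explicit `enabled: false` is kept as the
user's choice instead of being treated as "unset".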
## Insights from "J1: Exploring Simple Test-Time Scaling (STTS)"
The paper (arXiv:2505.xxxx / 2512.19585) highlights that models trained with
Reinforcement Learning (like Gemini 3) exhibit strong scaling trends when
allocated more inference-time compute.
- **Budget Forcing**: The "Adaptive Budget Manager" implements this by forcing
higher `thinkingLevel` or `thinkingBudget` for harder tasks, maximizing the
"verifiable reward" (correct code) for complex problems while saving latency
on simple ones.
- **Best-of-N**: The paper suggests that generating N solutions and selecting
the best one is a powerful STTS method. While out of scope for _this_ specific
track, the "Complexity Classifier" we build here is the _prerequisite_ for
that future feature. We should only trigger expensive "Best-of-N" flows when
the Complexity Level is 3 or 4.
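
The gating rule in the last bullet could be sketched as follows. Best-of-N is
out of scope for this track, so everything here is hypothetical: the names
`shouldUseBestOfN` and `candidateCount`, and the default of 4 candidates, are
placeholders rather than planned values.

```typescript
type GateLevel = 1 | 2 | 3 | 4;

// Only levels 3 (High) and 4 (Extreme) justify an expensive Best-of-N flow.
function shouldUseBestOfN(level: GateLevel): boolean {
  return level >= 3;
}

// Number of candidate solutions to generate: one for cheap tasks, otherwise
// the configured N (the default of 4 is an assumed placeholder).
function candidateCount(level: GateLevel, n: number = 4): number {
  return shouldUseBestOfN(level) ? n : 1;
}
```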
## Files to Modify
- `packages/core/src/services/adaptiveBudgetService.ts` (New)
- `packages/core/src/core/client.ts`
- `packages/core/src/config/config.ts`
## Verification Plan
1. **Unit Tests**: Verify `AdaptiveBudgetService` returns correct mappings for
both model families.
2. **Integration Tests**: Mock API calls to ensure `thinkingLevel` is sent for
v3 and `thinkingBudget` for v2.
3. **Manual Verification**: Use debug logs to verify the correct parameters are
being sent to the API.


@@ -716,6 +716,7 @@ export async function loadCliConfig(
settings.experimental?.codebaseInvestigatorSettings,
introspectionAgentSettings:
settings.experimental?.introspectionAgentSettings,
adaptiveThinking: settings.experimental?.adaptiveThinking,
fakeResponses: argv.fakeResponses,
recordResponses: argv.recordResponses,
retryFetchErrors: settings.general?.retryFetchErrors,


@@ -1473,6 +1473,37 @@ const SETTINGS_SCHEMA = {
},
},
},
adaptiveThinking: {
type: 'object',
label: 'Adaptive Thinking Settings',
category: 'Experimental',
requiresRestart: false,
default: {},
description: 'Configuration for Adaptive Thinking Budget.',
showInDialog: false,
properties: {
enabled: {
type: 'boolean',
label: 'Enable Adaptive Thinking',
category: 'Experimental',
requiresRestart: false,
default: false,
description:
'Enable adaptive thinking budget based on task complexity.',
showInDialog: true,
},
classifierModel: {
type: 'string',
label: 'Classifier Model',
category: 'Experimental',
requiresRestart: false,
default: 'classifier',
description:
'The model (or alias) to use for complexity classification.',
showInDialog: false,
},
},
},
},
},


@@ -73,6 +73,7 @@ import type { ModelConfigServiceConfig } from '../services/modelConfigService.js
import { ModelConfigService } from '../services/modelConfigService.js';
import { DEFAULT_MODEL_CONFIGS } from './defaultModelConfigs.js';
import { ContextManager } from '../services/contextManager.js';
import { AdaptiveBudgetService } from '../services/adaptiveBudgetService.js';
// Re-export OAuth config type
export type { MCPOAuthConfig, AnyToolInvocation };
@@ -335,6 +336,10 @@ export interface ConfigParameters {
disableModelRouterForAuth?: AuthType[];
codebaseInvestigatorSettings?: CodebaseInvestigatorSettings;
introspectionAgentSettings?: IntrospectionAgentSettings;
adaptiveThinking?: {
enabled?: boolean;
classifierModel?: string;
};
continueOnFailedApiCall?: boolean;
retryFetchErrors?: boolean;
enableShellOutputEfficiency?: boolean;
@@ -460,6 +465,10 @@ export class Config {
private readonly outputSettings: OutputSettings;
private readonly codebaseInvestigatorSettings: CodebaseInvestigatorSettings;
private readonly introspectionAgentSettings: IntrospectionAgentSettings;
private readonly adaptiveThinking: {
enabled: boolean;
classifierModel: string;
};
private readonly continueOnFailedApiCall: boolean;
private readonly retryFetchErrors: boolean;
private readonly enableShellOutputEfficiency: boolean;
@@ -491,6 +500,7 @@ export class Config {
private readonly experimentalJitContext: boolean;
private contextManager?: ContextManager;
private terminalBackground: string | undefined = undefined;
private adaptiveBudgetService!: AdaptiveBudgetService;
constructor(params: ConfigParameters) {
this.sessionId = params.sessionId;
@@ -618,6 +628,10 @@ export class Config {
this.introspectionAgentSettings = {
enabled: params.introspectionAgentSettings?.enabled ?? false,
};
this.adaptiveThinking = {
enabled: params.adaptiveThinking?.enabled ?? false,
classifierModel: params.adaptiveThinking?.classifierModel ?? 'classifier',
};
this.continueOnFailedApiCall = params.continueOnFailedApiCall ?? true;
this.enableShellOutputEfficiency =
params.enableShellOutputEfficiency ?? true;
@@ -763,6 +777,13 @@ export class Config {
await this.contextManager.refresh();
}
this.adaptiveBudgetService = new AdaptiveBudgetService(this);
if (this.adaptiveThinking.enabled) {
debugLogger.debug(
`Adaptive Thinking Budget enabled (classifier: ${this.adaptiveThinking.classifierModel})`,
);
}
await this.geminiClient.initialize();
}
@@ -770,6 +791,10 @@ export class Config {
return this.contentGenerator;
}
getAdaptiveBudgetService(): AdaptiveBudgetService {
return this.adaptiveBudgetService;
}
async refreshAuth(authMethod: AuthType) {
// Reset availability service when switching auth
this.modelAvailabilityService.reset();
@@ -1664,6 +1689,10 @@ export class Config {
return this.introspectionAgentSettings;
}
getAdaptiveThinkingConfig(): { enabled: boolean; classifierModel: string } {
return this.adaptiveThinking;
}
async createToolRegistry(): Promise<ToolRegistry> {
const registry = new ToolRegistry(this, this.messageBus);


@@ -28,6 +28,7 @@ import { GeminiChat } from './geminiChat.js';
import { retryWithBackoff } from '../utils/retry.js';
import { getErrorMessage } from '../utils/errors.js';
import { tokenLimit } from './tokenLimits.js';
import { partListUnionToString } from './geminiRequest.js';
import type {
ChatRecordingService,
ResumedSessionData,
@@ -620,6 +621,25 @@ export class GeminiClient {
// availability logic
const modelConfigKey: ModelConfigKey = { model: modelToUse };
// Adaptive Thinking Budget Integration
if (
!isInvalidStreamRetry &&
this.config.getAdaptiveThinkingConfig().enabled
) {
const userMessage = partListUnionToString(request);
if (userMessage) {
const adaptiveConfig = await this.config
.getAdaptiveBudgetService()
.determineAdaptiveConfig(userMessage, modelToUse);
if (adaptiveConfig) {
modelConfigKey.thinkingBudget = adaptiveConfig.thinkingBudget;
modelConfigKey.thinkingLevel = adaptiveConfig.thinkingLevel;
}
}
}
const { model: finalModel } = applyModelSelection(
this.config,
modelConfigKey,


@@ -0,0 +1,88 @@
/**
* @license
* Copyright 2026 Google LLC
* SPDX-License-Identifier: Apache-2.0
*/
import { describe, it, expect, vi } from 'vitest';
import {
AdaptiveBudgetService,
ComplexityLevel,
} from './adaptiveBudgetService.js';
import type { Config } from '../config/config.js';
import { ThinkingLevel } from '@google/genai';
describe('AdaptiveBudgetService', () => {
it('should map complexity levels to correct V2 budgets', () => {
const service = new AdaptiveBudgetService({} as Config);
expect(service.getThinkingBudgetV2(ComplexityLevel.SIMPLE)).toBe(1024);
expect(service.getThinkingBudgetV2(ComplexityLevel.MODERATE)).toBe(4096);
expect(service.getThinkingBudgetV2(ComplexityLevel.HIGH)).toBe(16384);
expect(service.getThinkingBudgetV2(ComplexityLevel.EXTREME)).toBe(32768);
});
it('should map complexity levels to correct V3 levels', () => {
const service = new AdaptiveBudgetService({} as Config);
expect(service.getThinkingLevelV3(ComplexityLevel.SIMPLE)).toBe(
ThinkingLevel.LOW,
);
expect(service.getThinkingLevelV3(ComplexityLevel.MODERATE)).toBe(
ThinkingLevel.LOW,
);
expect(service.getThinkingLevelV3(ComplexityLevel.HIGH)).toBe(
ThinkingLevel.HIGH,
);
expect(service.getThinkingLevelV3(ComplexityLevel.EXTREME)).toBe(
ThinkingLevel.HIGH,
);
});
it('should determine adaptive config based on LLM response', async () => {
const mockGenerateContent = vi.fn().mockResolvedValue({
candidates: [{ content: { parts: [{ text: '3' }] } }],
});
const mockConfig = {
getBaseLlmClient: () => ({
generateContent: mockGenerateContent,
}),
getAdaptiveThinkingConfig: () => ({
enabled: true,
classifierModel: 'gemini-2.0-flash',
}),
} as unknown as Config;
const service = new AdaptiveBudgetService(mockConfig);
const result = await service.determineAdaptiveConfig(
'Complex task',
'gemini-2.5-pro',
);
expect(result?.complexity).toBe(ComplexityLevel.HIGH);
expect(result?.thinkingBudget).toBe(16384);
expect(mockGenerateContent).toHaveBeenCalled();
});
it('should handle Gemini 3 models with thinkingLevel', async () => {
const mockConfig = {
getBaseLlmClient: () => ({
generateContent: vi.fn().mockResolvedValue({
candidates: [{ content: { parts: [{ text: '1' }] } }],
}),
}),
getAdaptiveThinkingConfig: () => ({
enabled: true,
classifierModel: 'gemini-2.0-flash',
}),
} as unknown as Config;
const service = new AdaptiveBudgetService(mockConfig);
const result = await service.determineAdaptiveConfig(
'Hi',
'gemini-3-pro-preview',
);
expect(result?.complexity).toBe(ComplexityLevel.SIMPLE);
expect(result?.thinkingLevel).toBe(ThinkingLevel.LOW);
expect(result?.thinkingBudget).toBeUndefined();
});
});


@@ -0,0 +1,132 @@
/**
* @license
* Copyright 2026 Google LLC
* SPDX-License-Identifier: Apache-2.0
*/
import type { Config } from '../config/config.js';
import { debugLogger } from '../utils/debugLogger.js';
import { isGemini2Model, isPreviewModel } from '../config/models.js';
import { ThinkingLevel } from '@google/genai';
export enum ComplexityLevel {
SIMPLE = 1,
MODERATE = 2,
HIGH = 3,
EXTREME = 4,
}
export const BUDGET_MAPPING_V2: Record<ComplexityLevel, number> = {
[ComplexityLevel.SIMPLE]: 1024,
[ComplexityLevel.MODERATE]: 4096,
[ComplexityLevel.HIGH]: 16384,
[ComplexityLevel.EXTREME]: 32768,
};
export const LEVEL_MAPPING_V3: Record<ComplexityLevel, ThinkingLevel> = {
[ComplexityLevel.SIMPLE]: ThinkingLevel.LOW,
[ComplexityLevel.MODERATE]: ThinkingLevel.LOW,
[ComplexityLevel.HIGH]: ThinkingLevel.HIGH,
[ComplexityLevel.EXTREME]: ThinkingLevel.HIGH,
};
export interface AdaptiveBudgetResult {
complexity: ComplexityLevel;
thinkingBudget?: number;
thinkingLevel?: ThinkingLevel;
strategyNote?: string;
}
export class AdaptiveBudgetService {
constructor(private config: Config) {}
/**
* Analyzes the user prompt and determines the optimal thinking configuration.
*
* Note on future scaling (per arXiv:2512.19585):
* At Complexity 4 (Extreme), we should consider:
* 1. Best-of-N: Generate multiple solutions.
* 2. LLM-as-a-Judge: Use a strong model to evaluate candidates.
* 3. Compiler Verification: Check code correctness via environment tools.
*/
async determineAdaptiveConfig(
userPrompt: string,
model: string,
): Promise<AdaptiveBudgetResult | undefined> {
const { classifierModel } = this.config.getAdaptiveThinkingConfig();
try {
const llm = this.config.getBaseLlmClient();
debugLogger.debug(
`AdaptiveBudgetService: Classifying prompt complexity using ${classifierModel}...`,
);
const systemPrompt = `You are a complexity classifier for a coding assistant.
Analyze the user's request and determine the complexity of the task.
Output ONLY a single integer from 1 to 4 based on the following scale:
1 (Simple): Quick fixes, syntax questions, simple explanations, greetings.
2 (Moderate): Function-level logic, writing small scripts, standard debugging.
3 (High): Module-level refactoring, complex feature implementation, multi-file changes.
4 (Extreme): Architecture design, deep root-cause analysis of obscure bugs, large-scale migrations.
Request: ${userPrompt}
Complexity Level:`;
const response = await llm.generateContent({
modelConfigKey: { model: classifierModel },
contents: [{ role: 'user', parts: [{ text: systemPrompt }] }],
promptId: 'adaptive-budget-classifier',
abortSignal: new AbortController().signal,
});
const text = response.candidates?.[0]?.content?.parts?.[0]?.text?.trim();
if (!text) {
debugLogger.debug(
'AdaptiveBudgetService: No response from classifier.',
);
return undefined;
}
const parsed = Number.parseInt(text, 10);
if (Number.isNaN(parsed) || parsed < 1 || parsed > 4) {
debugLogger.debug(
`AdaptiveBudgetService: Invalid complexity level returned: ${text}`,
);
return undefined;
}
// Safe to narrow now that the value is known to be within 1-4.
const level = parsed as ComplexityLevel;
const result: AdaptiveBudgetResult = { complexity: level };
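// Determine the mapping from the model family: Gemini 3 uses thinkingLevel,
// Gemini 2.x uses thinkingBudget. isPreviewModel() serves as the Gemini 3
// check here; a model matching neither branch gets no thinking override.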
if (isPreviewModel(model)) {
result.thinkingLevel = LEVEL_MAPPING_V3[level] ?? ThinkingLevel.HIGH;
} else if (isGemini2Model(model)) {
result.thinkingBudget = BUDGET_MAPPING_V2[level];
}
if (level === ComplexityLevel.EXTREME) {
result.strategyNote =
'EXTREME complexity detected. Future implementations should use Best-of-N + Verification.';
}
debugLogger.debug(
`AdaptiveBudgetService: Complexity ${level} -> Thinking Param: ${result.thinkingLevel ?? result.thinkingBudget}`,
);
return result;
} catch (error) {
debugLogger.error(
'AdaptiveBudgetService: Error classifying complexity',
error,
);
return undefined;
}
}
getThinkingBudgetV2(level: ComplexityLevel): number {
return BUDGET_MAPPING_V2[level];
}
getThinkingLevelV3(level: ComplexityLevel): ThinkingLevel {
return LEVEL_MAPPING_V3[level] ?? ThinkingLevel.HIGH;
}
}


@@ -4,7 +4,7 @@
* SPDX-License-Identifier: Apache-2.0
*/
-import type { GenerateContentConfig } from '@google/genai';
+import type { GenerateContentConfig, ThinkingLevel } from '@google/genai';
// The primary key for the ModelConfig is the model string. However, we also
// support a secondary key to limit the override scope, typically an agent name.
@@ -26,6 +26,10 @@ export interface ModelConfigKey {
// This allows overrides to specify different settings (e.g., higher temperature)
// specifically for retry scenarios.
isRetry?: boolean;
// Dynamic thinking configuration determined at runtime (e.g. via complexity classification)
thinkingBudget?: number;
thinkingLevel?: ThinkingLevel;
}
export interface ModelConfig {
@@ -205,6 +209,22 @@ export class ModelConfigService {
}
}
// Apply dynamic thinking parameters from context if present
if (
context.thinkingBudget !== undefined ||
context.thinkingLevel !== undefined
) {
resolvedConfig.thinkingConfig = {
...(resolvedConfig.thinkingConfig as object),
...(context.thinkingBudget !== undefined
? { thinkingBudget: context.thinkingBudget }
: {}),
...(context.thinkingLevel !== undefined
? { thinkingLevel: context.thinkingLevel }
: {}),
};
}
return {
model: baseModel,
generateContentConfig: resolvedConfig,


@@ -1441,6 +1441,30 @@
}
},
"additionalProperties": false
},
"adaptiveThinking": {
"title": "Adaptive Thinking Settings",
"description": "Configuration for Adaptive Thinking Budget.",
"markdownDescription": "Configuration for Adaptive Thinking Budget.\n\n- Category: `Experimental`\n- Requires restart: `no`\n- Default: `{}`",
"default": {},
"type": "object",
"properties": {
"enabled": {
"title": "Enable Adaptive Thinking",
"description": "Enable adaptive thinking budget based on task complexity.",
"markdownDescription": "Enable adaptive thinking budget based on task complexity.\n\n- Category: `Experimental`\n- Requires restart: `no`\n- Default: `false`",
"default": false,
"type": "boolean"
},
"classifierModel": {
"title": "Classifier Model",
"description": "The model (or alias) to use for complexity classification.",
"markdownDescription": "The model (or alias) to use for complexity classification.\n\n- Category: `Experimental`\n- Requires restart: `no`\n- Default: `classifier`",
"default": "classifier",
"type": "string"
}
},
"additionalProperties": false
}
},
"additionalProperties": false