perf: skip pre-compression history on session resume

On resume (-r), the CLI was loading and replaying the entire session
recording, including messages that had already been compressed away.
For long-running Forever Mode sessions this made resume extremely slow.

Add lastCompressionIndex to ConversationRecord, stamped when
compression succeeds. On resume, only messages from that index
onward are loaded into the client history and UI. Fully backward
compatible — old sessions without the field load all messages as before.
Author: Sandy Tao
Date:   2026-03-05 16:44:25 -08:00
Parent: 79ea865790
Commit: e062f0d09a

15 changed files with 303 additions and 59 deletions
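The resume-side selection described above can be sketched in a few lines. This is an illustrative stand-in, not the repo's real API: `messagesToLoad` and the minimal `ConversationRecord` shape here are hypothetical, modeled on the interface field this commit adds.

```typescript
// Illustrative sketch of the resume-side logic. The types and the
// messagesToLoad helper are hypothetical; only lastCompressionIndex
// mirrors the field introduced in this commit.
interface RecordedMessage {
  type: string;
  content: string;
}

interface ConversationRecord {
  messages: RecordedMessage[];
  lastCompressionIndex?: number;
}

function messagesToLoad(record: ConversationRecord): RecordedMessage[] {
  // Old sessions have no lastCompressionIndex, so `?? 0` loads the
  // full history, which is what keeps the change backward compatible.
  const start = record.lastCompressionIndex ?? 0;
  return record.messages.slice(start);
}
```

Stamping the index at compression time (rather than recomputing on resume) means the resume path never has to re-parse or replay the compressed-away prefix.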
@@ -408,6 +408,8 @@ Operate as a **strategic orchestrator**. Your own context window is your most pr
When you delegate, the sub-agent's entire execution is consolidated into a single summary in your history, keeping your main loop lean.
**Concurrency Safety and Mandate:** You should NEVER run multiple subagents in a single turn if their abilities mutate the same files or resources. This is to prevent race conditions and ensure that the workspace is in a consistent state. Only run multiple subagents in parallel when their tasks are independent (e.g., multiple concurrent research or read-only tasks) or if parallel execution is explicitly requested by the user.
**High-Impact Delegation Candidates:**
- **Repetitive Batch Tasks:** Tasks involving more than 3 files or repeated steps (e.g., "Add license headers to all files in src/", "Fix all lint errors in the project").
- **High-Volume Output:** Commands or tools expected to return large amounts of data (e.g., verbose builds, exhaustive file searches).
@@ -1143,6 +1145,8 @@ Operate as a **strategic orchestrator**. Your own context window is your most pr
When you delegate, the sub-agent's entire execution is consolidated into a single summary in your history, keeping your main loop lean.
**Concurrency Safety and Mandate:** You should NEVER run multiple subagents in a single turn if their abilities mutate the same files or resources. This is to prevent race conditions and ensure that the workspace is in a consistent state. Only run multiple subagents in parallel when their tasks are independent (e.g., multiple concurrent research or read-only tasks) or if parallel execution is explicitly requested by the user.
**High-Impact Delegation Candidates:**
- **Repetitive Batch Tasks:** Tasks involving more than 3 files or repeated steps (e.g., "Add license headers to all files in src/", "Fix all lint errors in the project").
- **High-Volume Output:** Commands or tools expected to return large amounts of data (e.g., verbose builds, exhaustive file searches).
@@ -1294,6 +1298,8 @@ Operate as a **strategic orchestrator**. Your own context window is your most pr
When you delegate, the sub-agent's entire execution is consolidated into a single summary in your history, keeping your main loop lean.
**Concurrency Safety and Mandate:** You should NEVER run multiple subagents in a single turn if their abilities mutate the same files or resources. This is to prevent race conditions and ensure that the workspace is in a consistent state. Only run multiple subagents in parallel when their tasks are independent (e.g., multiple concurrent research or read-only tasks) or if parallel execution is explicitly requested by the user.
**High-Impact Delegation Candidates:**
- **Repetitive Batch Tasks:** Tasks involving more than 3 files or repeated steps (e.g., "Add license headers to all files in src/", "Fix all lint errors in the project").
- **High-Volume Output:** Commands or tools expected to return large amounts of data (e.g., verbose builds, exhaustive file searches).
@@ -1464,6 +1470,8 @@ Operate as a **strategic orchestrator**. Your own context window is your most pr
When you delegate, the sub-agent's entire execution is consolidated into a single summary in your history, keeping your main loop lean.
**Concurrency Safety and Mandate:** You should NEVER run multiple subagents in a single turn if their abilities mutate the same files or resources. This is to prevent race conditions and ensure that the workspace is in a consistent state. Only run multiple subagents in parallel when their tasks are independent (e.g., multiple concurrent research or read-only tasks) or if parallel execution is explicitly requested by the user.
**High-Impact Delegation Candidates:**
- **Repetitive Batch Tasks:** Tasks involving more than 3 files or repeated steps (e.g., "Add license headers to all files in src/", "Fix all lint errors in the project").
- **High-Volume Output:** Commands or tools expected to return large amounts of data (e.g., verbose builds, exhaustive file searches).
@@ -1608,6 +1616,8 @@ Operate as a **strategic orchestrator**. Your own context window is your most pr
When you delegate, the sub-agent's entire execution is consolidated into a single summary in your history, keeping your main loop lean.
**Concurrency Safety and Mandate:** You should NEVER run multiple subagents in a single turn if their abilities mutate the same files or resources. This is to prevent race conditions and ensure that the workspace is in a consistent state. Only run multiple subagents in parallel when their tasks are independent (e.g., multiple concurrent research or read-only tasks) or if parallel execution is explicitly requested by the user.
**High-Impact Delegation Candidates:**
- **Repetitive Batch Tasks:** Tasks involving more than 3 files or repeated steps (e.g., "Add license headers to all files in src/", "Fix all lint errors in the project").
- **High-Volume Output:** Commands or tools expected to return large amounts of data (e.g., verbose builds, exhaustive file searches).
@@ -3171,6 +3181,8 @@ Operate as a **strategic orchestrator**. Your own context window is your most pr
When you delegate, the sub-agent's entire execution is consolidated into a single summary in your history, keeping your main loop lean.
**Concurrency Safety and Mandate:** You should NEVER run multiple subagents in a single turn if their abilities mutate the same files or resources. This is to prevent race conditions and ensure that the workspace is in a consistent state. Only run multiple subagents in parallel when their tasks are independent (e.g., multiple concurrent research or read-only tasks) or if parallel execution is explicitly requested by the user.
**High-Impact Delegation Candidates:**
- **Repetitive Batch Tasks:** Tasks involving more than 3 files or repeated steps (e.g., "Add license headers to all files in src/", "Fix all lint errors in the project").
- **High-Volume Output:** Commands or tools expected to return large amounts of data (e.g., verbose builds, exhaustive file searches).
@@ -3323,6 +3335,8 @@ Operate as a **strategic orchestrator**. Your own context window is your most pr
When you delegate, the sub-agent's entire execution is consolidated into a single summary in your history, keeping your main loop lean.
**Concurrency Safety and Mandate:** You should NEVER run multiple subagents in a single turn if their abilities mutate the same files or resources. This is to prevent race conditions and ensure that the workspace is in a consistent state. Only run multiple subagents in parallel when their tasks are independent (e.g., multiple concurrent research or read-only tasks) or if parallel execution is explicitly requested by the user.
**High-Impact Delegation Candidates:**
- **Repetitive Batch Tasks:** Tasks involving more than 3 files or repeated steps (e.g., "Add license headers to all files in src/", "Fix all lint errors in the project").
- **High-Volume Output:** Commands or tools expected to return large amounts of data (e.g., verbose builds, exhaustive file searches).
@@ -3795,6 +3809,8 @@ Operate as a **strategic orchestrator**. Your own context window is your most pr
When you delegate, the sub-agent's entire execution is consolidated into a single summary in your history, keeping your main loop lean.
**Concurrency Safety and Mandate:** You should NEVER run multiple subagents in a single turn if their abilities mutate the same files or resources. This is to prevent race conditions and ensure that the workspace is in a consistent state. Only run multiple subagents in parallel when their tasks are independent (e.g., multiple concurrent research or read-only tasks) or if parallel execution is explicitly requested by the user.
**High-Impact Delegation Candidates:**
- **Repetitive Batch Tasks:** Tasks involving more than 3 files or repeated steps (e.g., "Add license headers to all files in src/", "Fix all lint errors in the project").
- **High-Volume Output:** Commands or tools expected to return large amounts of data (e.g., verbose builds, exhaustive file searches).
@@ -415,6 +415,7 @@ describe('Gemini Client (client.ts)', () => {
getChatRecordingService: vi.fn().mockReturnValue({
getConversation: vi.fn().mockReturnValue(null),
getConversationFilePath: vi.fn().mockReturnValue(null),
recordCompressionPoint: vi.fn(),
}),
};
client['chat'] = mockOriginalChat as GeminiChat;
@@ -649,6 +650,7 @@ describe('Gemini Client (client.ts)', () => {
const mockRecordingService = {
getConversation: vi.fn().mockReturnValue(mockConversation),
getConversationFilePath: vi.fn().mockReturnValue(mockFilePath),
recordCompressionPoint: vi.fn(),
};
vi.mocked(mockOriginalChat.getChatRecordingService!).mockReturnValue(
mockRecordingService as unknown as ChatRecordingService,
@@ -1552,6 +1554,7 @@ ${JSON.stringify(
getChatRecordingService: vi.fn().mockReturnValue({
getConversation: vi.fn(),
getConversationFilePath: vi.fn(),
recordCompressionPoint: vi.fn(),
}),
} as unknown as GeminiChat;
@@ -1565,6 +1568,7 @@ ${JSON.stringify(
getChatRecordingService: vi.fn().mockReturnValue({
getConversation: vi.fn(),
getConversationFilePath: vi.fn(),
recordCompressionPoint: vi.fn(),
}),
} as unknown as GeminiChat;
@@ -1222,6 +1222,11 @@ Do not wait for a reflection cycle if the information is critical for future tur
// capture current session data before resetting
const currentRecordingService =
this.getChat().getChatRecordingService();
// Mark this point in the recording so resume only loads
// messages from here onward (everything before was compressed).
currentRecordingService.recordCompressionPoint();
const conversation = currentRecordingService.getConversation();
const filePath = currentRecordingService.getConversationFilePath();
@@ -178,6 +178,7 @@ export class PromptProvider {
approvedPlan: approvedPlanPath
? { path: approvedPlanPath }
: undefined,
taskTracker: config.isTrackerEnabled(),
}),
!isPlanMode,
),
@@ -189,6 +190,7 @@ export class PromptProvider {
planModeToolsList,
plansDir: config.storage.getPlansDir(),
approvedPlanPath: config.getApprovedPlanPath(),
taskTracker: config.isTrackerEnabled(),
}),
isPlanMode,
)
@@ -200,48 +202,7 @@ export class PromptProvider {
enableShellEfficiency: config.getEnableShellOutputEfficiency(),
interactiveShellEnabled: config.isInteractiveShellEnabled(),
})),
skills.length > 0,
),
hookContext: isSectionEnabled('hookContext') || undefined,
primaryWorkflows: this.withSection(
'primaryWorkflows',
() => ({
interactive: interactiveMode,
enableCodebaseInvestigator: enabledToolNames.has(
CodebaseInvestigatorAgent.name,
),
enableWriteTodosTool: enabledToolNames.has(WRITE_TODOS_TOOL_NAME),
enableEnterPlanModeTool: enabledToolNames.has(
ENTER_PLAN_MODE_TOOL_NAME,
),
enableGrep: enabledToolNames.has(GREP_TOOL_NAME),
enableGlob: enabledToolNames.has(GLOB_TOOL_NAME),
approvedPlan: approvedPlanPath
? { path: approvedPlanPath }
: undefined,
taskTracker: config.isTrackerEnabled(),
}),
!isPlanMode,
),
planningWorkflow: this.withSection(
'planningWorkflow',
() => ({
planModeToolsList,
plansDir: config.storage.getPlansDir(),
approvedPlanPath: config.getApprovedPlanPath(),
taskTracker: config.isTrackerEnabled(),
}),
isPlanMode,
),
taskTracker: config.isTrackerEnabled(),
operationalGuidelines: this.withSection(
'operationalGuidelines',
() => ({
interactive: interactiveMode,
enableShellEfficiency: config.getEnableShellOutputEfficiency(),
interactiveShellEnabled: config.isInteractiveShellEnabled(),
}),
),
sandbox: this.withSection('sandbox', () => getSandboxMode()),
interactiveYoloMode: this.withSection(
'interactiveYoloMode',
@@ -138,6 +138,7 @@ export function getCoreSystemPrompt(options: SystemPromptOptions): string {
options.planningWorkflow
? renderPlanningWorkflow(options.planningWorkflow)
: renderPrimaryWorkflows(options.primaryWorkflows),
options.taskTracker ? renderTaskTracker() : '',
renderOperationalGuidelines(options.operationalGuidelines),
renderInteractiveYoloMode(options.interactiveYoloMode),
renderSandbox(options.sandbox),
@@ -151,18 +152,6 @@ export function getCoreSystemPrompt(options: SystemPromptOptions): string {
.trim();
}
${options.taskTracker ? renderTaskTracker() : ''}
${renderOperationalGuidelines(options.operationalGuidelines)}
${renderInteractiveYoloMode(options.interactiveYoloMode)}
${renderSandbox(options.sandbox)}
${renderGitRepo(options.gitRepo)}
`.trim();
}
/**
* Wraps the base prompt with user memory and approval mode plans.
*/
@@ -532,6 +532,60 @@ describe('ChatRecordingService', () => {
});
});
describe('recordCompressionPoint', () => {
it('should set lastCompressionIndex to the current message count and update on subsequent calls', () => {
chatRecordingService.initialize();
// Record a few messages
chatRecordingService.recordMessage({
type: 'user',
content: 'msg1',
model: 'm',
});
chatRecordingService.recordMessage({
type: 'gemini',
content: 'response1',
model: 'm',
});
chatRecordingService.recordMessage({
type: 'user',
content: 'msg2',
model: 'm',
});
// Record compression point
chatRecordingService.recordCompressionPoint();
const sessionFile = chatRecordingService.getConversationFilePath()!;
let conversation = JSON.parse(
fs.readFileSync(sessionFile, 'utf8'),
) as ConversationRecord;
expect(conversation.lastCompressionIndex).toBe(3);
// Record more messages
chatRecordingService.recordMessage({
type: 'gemini',
content: 'response2',
model: 'm',
});
chatRecordingService.recordMessage({
type: 'user',
content: 'msg3',
model: 'm',
});
// Record compression point again
chatRecordingService.recordCompressionPoint();
conversation = JSON.parse(
fs.readFileSync(sessionFile, 'utf8'),
) as ConversationRecord;
expect(conversation.lastCompressionIndex).toBe(5);
});
});
describe('ENOSPC (disk full) graceful degradation - issue #16266', () => {
it('should disable recording and not throw when ENOSPC occurs during initialize', () => {
const enospcError = new Error('ENOSPC: no space left on device');
@@ -104,6 +104,13 @@ export interface ConversationRecord {
directories?: string[];
/** The kind of conversation (main agent or subagent) */
kind?: 'main' | 'subagent';
/**
* The index into `messages` at which the last compression occurred.
* On resume, only messages from this index onward need to be loaded
* into the client history and UI; earlier messages were already
* summarised and folded into the compressed context.
*/
lastCompressionIndex?: number;
}
/**
@@ -532,6 +539,25 @@ export class ChatRecordingService {
this.writeConversation(conversation);
}
/**
* Marks the current end of the messages array as the compression point.
* On resume, only messages from this index onward will be loaded.
*/
recordCompressionPoint(): void {
if (!this.conversationFile) return;
try {
this.updateConversation((conversation) => {
conversation.lastCompressionIndex = conversation.messages.length;
});
} catch (error) {
debugLogger.error(
'Error recording compression point in chat history.',
error,
);
}
}
/**
* Saves a summary for the current session.
*/
@@ -182,4 +182,70 @@ describe('convertSessionToClientHistory', () => {
},
]);
});
describe('startIndex parameter', () => {
const messages: ConversationRecord['messages'] = [
{
id: '1',
type: 'user',
timestamp: '2024-01-01T10:00:00Z',
content: 'First message',
},
{
id: '2',
type: 'gemini',
timestamp: '2024-01-01T10:01:00Z',
content: 'First response',
},
{
id: '3',
type: 'user',
timestamp: '2024-01-01T10:02:00Z',
content: 'Second message',
},
{
id: '4',
type: 'gemini',
timestamp: '2024-01-01T10:03:00Z',
content: 'Second response',
},
];
it('should only convert messages from startIndex onward', () => {
const history = convertSessionToClientHistory(messages, 2);
expect(history).toEqual([
{ role: 'user', parts: [{ text: 'Second message' }] },
{ role: 'model', parts: [{ text: 'Second response' }] },
]);
});
it('should convert all messages when startIndex is 0', () => {
const history = convertSessionToClientHistory(messages, 0);
expect(history).toEqual([
{ role: 'user', parts: [{ text: 'First message' }] },
{ role: 'model', parts: [{ text: 'First response' }] },
{ role: 'user', parts: [{ text: 'Second message' }] },
{ role: 'model', parts: [{ text: 'Second response' }] },
]);
});
it('should convert all messages when startIndex is undefined', () => {
const history = convertSessionToClientHistory(messages, undefined);
expect(history).toEqual([
{ role: 'user', parts: [{ text: 'First message' }] },
{ role: 'model', parts: [{ text: 'First response' }] },
{ role: 'user', parts: [{ text: 'Second message' }] },
{ role: 'model', parts: [{ text: 'Second response' }] },
]);
});
it('should return empty array when startIndex exceeds messages length', () => {
const history = convertSessionToClientHistory(messages, 100);
expect(history).toEqual([]);
});
});
});
@@ -26,13 +26,22 @@ function ensurePartArray(content: PartListUnion): Part[] {
/**
* Converts session/conversation data into Gemini client history formats.
*
* @param messages - The full array of recorded messages.
* @param startIndex - If provided, only messages from this index onward are
* converted. Used on resume to skip pre-compression history.
*/
export function convertSessionToClientHistory(
messages: ConversationRecord['messages'],
startIndex?: number,
): Array<{ role: 'user' | 'model'; parts: Part[] }> {
const clientHistory: Array<{ role: 'user' | 'model'; parts: Part[] }> = [];
const slice =
startIndex != null && startIndex > 0
? messages.slice(startIndex)
: messages;
for (const msg of messages) {
for (const msg of slice) {
if (msg.type === 'info' || msg.type === 'error' || msg.type === 'warning') {
continue;
}
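The slice guard in `convertSessionToClientHistory` can be exercised standalone. The function below is a minimal re-implementation for illustration (not the repo's export), mirroring the `startIndex != null && startIndex > 0` check shown in the diff:

```typescript
// Minimal stand-in for the slice guard; the Recorded type is
// simplified for illustration and is not the repo's message type.
type Recorded = { type: 'user' | 'gemini' | 'info'; content: string };

function sliceFromIndex(messages: Recorded[], startIndex?: number): Recorded[] {
  // undefined, null, or 0 keeps the full array; an index past the
  // end yields [] because Array.prototype.slice clamps its start.
  return startIndex != null && startIndex > 0
    ? messages.slice(startIndex)
    : messages;
}
```

The clamping behavior of `slice` is what makes the "startIndex exceeds messages length" test above pass without any extra bounds check in the converter.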