delete old design docs

Your Name
2026-04-09 19:36:23 +00:00
parent 17c9b4341a
commit 2a5b169179
7 changed files with 3 additions and 544 deletions
-175
@@ -1,175 +0,0 @@
# Context Manager V0: High-Level Design
## 1. Introduction & Motivation
This document provides a high-level orientation to the Context Management system
within `@google/gemini-cli-core`.
Previously, context management in the CLI was decentralized, synchronous, and
relied on fixed-function, destructive mutations of the raw Gemini `Content[]`
history. Because all context management was local, this approach made it nearly
impossible to reason about the global impact of any specific change. For
example, should we distill tool outputs, or mask them? Or maybe it's contextual?
What about other processors like the snapshotter, should they see masked
results? Distilled results? What about new approaches to context management, how
do they fit into the solution we've already built? The old approach to context
management made it challenging to even attempt to answer any one of these
questions, let alone all of them.
To address these issues, we went back to the drawing board to create an explicit
Context Manager. As opposed to our old approach, the new Context Manager V0 is a
robust, event-driven, pluggable system. It introduces a non-destructive Episodic
Intermediate Representation (IR) and an asynchronous processing pipeline,
allowing the CLI to run expensive LLM summarization tasks in the background and
opportunistically project an optimized view of the history only when budget
constraints require it.
---
## 2. Chief Innovations & Salient Features
The architecture is built upon eight core principles that distinguish it from
the legacy system:
1. **Centralized Budgeting:** The `ContextManager` is the sole source of truth
for the token budget. It makes the final, just-in-time decision about what
gets projected to the LLM.
2. **Statelessness via IR:** Raw history is never mutated or deleted. Instead,
it is translated into an Intermediate Representation (IR). Context reduction
is achieved by attaching compressed `Variant`s to the IR graph. The original
text is always recoverable.
3. **Asynchronicity:** Designed around a `ContextEventBus`. Heavy context
operations (like LLM-powered summarization) run as detached background tasks
without blocking the main agent loop.
4. **Configurability:** Driven by a typed JSON "Sidecar" configuration. Token
ceilings, fallback strategies, and processing pipelines are entirely
data-driven.
5. **Pluggability:** `ContextProcessor`s are isolated plugins with typed
schemas. They are registered via Dependency Injection and can be arranged
into arbitrary pipelines.
6. **Debuggability:** A built-in `ContextTracer` tracks every step of the
pipeline, providing full audit trails of exactly when, why, and how a
message was altered.
7. **Testability:** Global state has been eliminated. The system uses strict
Dependency Injection (`SidecarRegistry`, `ContextEnvironment`,
`ContextEventBus`), making every layer easily unit-testable.
8. **Orthogonality via Targets:** Processors do not implicitly scan the entire
history graph. The `ContextManager` computes exact Deltas (e.g., new nodes
just added, or specific nodes that just aged out of the retained buffer).
Processors are sandboxed by the `EpisodeEditor` to only iterate over and
mutate these specific `targetNodes`, ensuring surgical and highly efficient
reductions.
---
## 3. The Major Pieces: Roles & Responsibilities
### The Brain: `ContextManager`
The central coordinator. It owns the "Pristine History" (the ground-truth
Episodic IR graph). Its primary responsibility is exposing
`projectCompressedHistory()`, which flattens the IR graph into a standard
`Content[]` array strictly adhering to the configured token budget.
### The Data Model: Episodic Intermediate Representation (IR)
Instead of a flat array of messages, interactions are grouped into `Episode`s.
An Episode represents a single turn: a User Prompt, followed by the Agent's
Thoughts and Tool Executions (Steps), concluding with a Yield.
- **`IrNode`:** The base unit (e.g., `ToolExecution`, `AgentThought`).
- **`Variant`:** Compressed alternatives to the raw node (e.g.,
`SummaryVariant`, `MaskedVariant`, `SnapshotVariant`).
- **`IrMetadata`:** An audit trail attached to every node, tracking token counts
and the chronological list of `transformations` applied by processors.
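As a rough illustration of the data model above, the sketch below shows one way a node could carry compressed variants next to its raw text. The field names (`rawText`, `variants`, `tokens`), the `*Sketch` type names, and the `attachVariant` helper are assumptions made for this example, not the actual IR definitions.

```typescript
// Hypothetical shapes for illustration; not the actual IR type definitions.
type VariantKind = 'summary' | 'masked' | 'snapshot';

interface VariantSketch {
  kind: VariantKind;
  text: string;
  tokens: number;
}

interface IrMetadataSketch {
  tokens: number;
  // Chronological record of what each processor did to the node.
  transformations: Array<{ processor: string; action: string }>;
}

interface IrNodeSketch {
  id: string;
  rawText: string; // never overwritten
  metadata: IrMetadataSketch;
  variants: Partial<Record<VariantKind, VariantSketch>>;
}

// Attach a compressed alternative and audit it, leaving rawText intact.
function attachVariant(
  node: IrNodeSketch,
  variant: VariantSketch,
  processor: string,
): void {
  node.variants[variant.kind] = variant;
  node.metadata.transformations.push({
    processor,
    action: variant.kind.toUpperCase(),
  });
}

const toolNode: IrNodeSketch = {
  id: 'tool-1',
  rawText: '{"rows": ["...large JSON payload..."]}',
  metadata: { tokens: 4000, transformations: [] },
  variants: {},
};

attachVariant(
  toolNode,
  { kind: 'masked', text: '[tool output masked: 1 JSON object]', tokens: 12 },
  'ToolMaskingProcessor',
);
```

The raw payload stays on the node, so a later processor or projection can still choose between the original and any attached variant.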
### The Engine: `PipelineOrchestrator` & Sidecar
The orchestrator reads the `SidecarConfig`. It manages the lifecycle of the
pipelines, registering triggers and executing processors in order. It dictates
whether a pipeline blocks the main thread or runs in the background.
### The Workers: `ContextProcessor`s
Small, highly-focused classes that implement context reduction strategies. They
do not mutate the graph directly; instead, they are given an `EpisodeEditor`
which provides a safe, scoped API to attach `Variant`s and append metadata.
- _Examples:_ `ToolMaskingProcessor`, `NodeDistillationProcessor`,
`BlobDegradationProcessor`.
### The Glue: `ContextEventBus`
A Pub/Sub bus that decouples the components. It enables the `HistoryObserver` to
notify the system of new messages, and allows background processors to notify
the `ContextManager` when a new compressed variant is ready to be used.
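The decoupling described above can be pictured with a minimal typed Pub/Sub bus. This is only a sketch: the event names, the payload shapes, and the `SimpleEventBus` class are hypothetical and much simpler than the real `ContextEventBus`.

```typescript
// Hypothetical event map; the real bus defines its own events and payloads.
interface SketchEvents {
  chunkReceived: { targetNodeIds: string[] };
  variantReady: { targetId: string; variantId: string };
}

class SimpleEventBus {
  private readonly handlers = new Map<
    keyof SketchEvents,
    Array<(payload: unknown) => void>
  >();

  // Register a handler for one event type.
  on<K extends keyof SketchEvents>(
    event: K,
    handler: (payload: SketchEvents[K]) => void,
  ): void {
    const list = this.handlers.get(event) ?? [];
    list.push(handler as (payload: unknown) => void);
    this.handlers.set(event, list);
  }

  // Deliver a payload to every handler registered for that event type.
  emit<K extends keyof SketchEvents>(event: K, payload: SketchEvents[K]): void {
    for (const handler of this.handlers.get(event) ?? []) {
      handler(payload);
    }
  }
}

const bus = new SimpleEventBus();
const readyVariants: string[] = [];

// A manager-like subscriber records variants as background work completes.
bus.on('variantReady', (event) => {
  readyVariants.push(`${event.targetId}:${event.variantId}`);
});

// A background processor announces its result without knowing who listens.
bus.emit('variantReady', { targetId: 'episode-1', variantId: 'masked' });
```

Neither side holds a reference to the other; they share only the bus and the event types.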
---
## 4. How They Interact: The Life of a Message
To understand how these pieces fit together, let's walk through the lifecycle of
a single interaction as it moves through the context system.
### Phase 1: Ingestion & Translation
1. **Action:** The user sends a prompt, and the agent responds with a tool
call. These raw messages are appended to the standard `AgentChatHistory`.
2. **Observation:** The `HistoryObserver` detects the new messages.
3. **Translation:** The observer passes the raw `Content[]` to the `IrMapper`.
The mapper groups the prompt and the tool execution into a single,
structured `Episode`.
4. **Registration:** The new `Episode` is added to the `ContextManager`'s
pristine graph.
### Phase 2: Triggering the Pipelines
1. **Delta Generation:** The `ContextManager` receives the updated pristine
graph. It diffs it against the previous state and extracts a Delta—the exact
Set of new `IrNode` IDs.
2. **Event Emission:** The `ContextManager` fires a `ChunkReceivedEvent` (with
the Delta targets) over the `ContextEventBus`.
3. **Orchestration:** The `PipelineOrchestrator` hears the event and evaluates
its configured `PipelineDef`s. It finds a pipeline with the trigger
`on_turn`.
4. **Execution:** The Orchestrator creates an `EpisodeEditor` heavily sandboxed
to _only_ allow access to the targeted Delta nodes, and begins running the
processors in that pipeline sequentially.
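The Delta Generation step can be sketched as a set difference over node IDs. The `computeDelta` helper and its signature are assumptions for illustration; the actual diffing logic may track more than IDs.

```typescript
// Hypothetical helper: returns the IDs present now that were not present before.
function computeDelta(
  previousIds: ReadonlySet<string>,
  currentIds: ReadonlyArray<string>,
): Set<string> {
  const delta = new Set<string>();
  for (const id of currentIds) {
    if (!previousIds.has(id)) {
      delta.add(id);
    }
  }
  return delta;
}

const before = new Set<string>(['prompt-1', 'tool-1']);
const after = ['prompt-1', 'tool-1', 'prompt-2', 'tool-2'];

// Only the two nodes from the new turn become pipeline targets.
const newTargets = computeDelta(before, after);
```

Processors that receive `newTargets` never need to scan the older part of the history.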
### Phase 3: Processing & Safe Editing
1. **Processing:** A processor (e.g., `ToolMaskingProcessor`) receives the
`EpisodeEditor`. It iterates over `editor.targets` (ignoring the rest of the
historical graph). It identifies a massive JSON payload in one of the new
tool executions.
2. **Editing:** Instead of deleting the JSON, it calls `editor.editEpisode()`.
It creates a `MaskedVariant` containing a string summary of the JSON. If it
had attempted to edit a node outside its target Delta, the editor would have
thrown an error.
3. **Auditing:** The editor automatically appends a record to the node's
`IrMetadata.transformations` indicating that the `ToolMaskingProcessor`
applied a `MASKED` action.
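The sandboxing and auditing behavior in this phase can be sketched as follows. `ScopedEditorSketch`, `attachMask`, and the node fields are hypothetical names chosen for this example; the real `EpisodeEditor` API may differ.

```typescript
// Hypothetical node shape for illustration only.
interface EditableNode {
  id: string;
  rawText: string;
  maskedText?: string;
  transformations: string[];
}

class ScopedEditorSketch {
  private readonly nodes: Map<string, EditableNode>;
  private readonly targetIds: ReadonlySet<string>;

  constructor(nodes: EditableNode[], targetIds: ReadonlySet<string>) {
    this.nodes = new Map(
      nodes.map((n) => [n.id, n] as [string, EditableNode]),
    );
    this.targetIds = targetIds;
  }

  // Processors iterate over this instead of the whole graph.
  get targets(): EditableNode[] {
    return Array.from(this.nodes.values()).filter((n) =>
      this.targetIds.has(n.id),
    );
  }

  // Attaches a masked alternative and records the audit entry.
  attachMask(nodeId: string, summary: string, processor: string): void {
    const node = this.nodes.get(nodeId);
    if (!node || !this.targetIds.has(nodeId)) {
      throw new Error(`Node ${nodeId} is outside the target delta`);
    }
    node.maskedText = summary;
    node.transformations.push(`${processor}:MASKED`);
  }
}

const oldNode: EditableNode = { id: 'tool-1', rawText: 'old output', transformations: [] };
const newNode: EditableNode = { id: 'tool-2', rawText: '{"big": "json"}', transformations: [] };
const editor = new ScopedEditorSketch([oldNode, newNode], new Set(['tool-2']));

for (const target of editor.targets) {
  editor.attachMask(target.id, '[masked JSON payload]', 'ToolMaskingProcessor');
}
```

Calling `editor.attachMask('tool-1', ...)` in this sketch throws, because `tool-1` is not part of the target delta.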
### Phase 4: Async Resolution
1. **Completion:** The background pipeline finishes. The orchestrator fires a
`VariantReadyEvent` over the bus.
2. **Integration:** The `ContextManager` receives the event and securely
attaches the `MaskedVariant` to the correct `Episode` in the pristine graph.
(If the pipeline was synchronous/blocking, this happens immediately).
### Phase 5: Just-In-Time Projection
1. **Request:** The agent is ready to send the next prompt to Gemini. The core
routing logic calls `contextManager.projectCompressedHistory()`.
2. **Budget Evaluation:** The `IrProjector` calculates the current total tokens
of the pristine graph and compares it to the `SidecarConfig` budget.
3. **Variant Selection:** If the graph exceeds the budget, the projector looks
for available `Variant`s. It sees the newly attached `MaskedVariant` and
calculates how many tokens it would save toward closing the deficit.
4. **Flattening:** The `graphUtils` safely swap the raw node for the
`MaskedVariant` in a temporary view, and flatten the Episodic IR back into a
raw Gemini `Content[]` array.
5. **Delivery:** The optimized, budget-compliant array is sent to the LLM. The
underlying pristine graph remains completely untouched and available for
future reference or alternative projections.
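A simplified sketch of the projection idea: sweep from newest to oldest, keep raw text while inside the retained budget, and fall back to a variant once the rolling total exceeds it. The `projectWithinBudget` function, the single-variant node shape, and the choice to count raw tokens in the rolling total are all assumptions made for this example.

```typescript
// Hypothetical node shape with at most one ready variant.
interface ProjectableNode {
  id: string;
  text: string;
  tokens: number;
  variant?: { text: string; tokens: number };
}

// Returns a temporary view; the input nodes are never modified.
function projectWithinBudget(
  nodes: ReadonlyArray<ProjectableNode>,
  retainedTokens: number,
): string[] {
  const newestFirst = [...nodes].reverse();
  const projected: string[] = [];
  let rollingTokens = 0;

  for (const node of newestFirst) {
    rollingTokens += node.tokens;
    const overBudget = rollingTokens > retainedTokens;
    const variant = node.variant;
    // Older nodes past the budget use their variant when one is ready.
    projected.push(overBudget && variant ? variant.text : node.text);
  }

  // Restore chronological order for the outgoing request.
  return projected.reverse();
}

const history: ProjectableNode[] = [
  { id: 'a', text: 'raw A', tokens: 100, variant: { text: 'masked A', tokens: 10 } },
  { id: 'b', text: 'raw B', tokens: 50 },
  { id: 'c', text: 'raw C', tokens: 30 },
];

const view = projectWithinBudget(history, 100);
// view is ['masked A', 'raw B', 'raw C']; history still holds 'raw A'.
```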
@@ -1,37 +0,0 @@
# The Nodes of Theseus Migration Checklist
- [x] **Phase 1: Core Types (`ir/types.ts`)**
- [x] Add `ConcreteNode` and `LogicalNode` types.
- [x] Add `episodeId` (or generic `parentId`) to all `ConcreteNode`
interfaces.
- [x] Add `replacesId` and `abstractsIds` pointers.
- [x] Remove `variants` dictionary from `IrNode`.
- [x] **Phase 2: Processor Pipeline (`pipeline.ts`)**
- [x] Delete `EpisodeEditor`.
- [x] Define `ContextPatch`.
- [x] Update `ContextProcessor` signature to accept `ProcessArgs` and return
`Promise<ContextPatch[]>`.
- [x] **Phase 3: The Reducer (`sidecar/orchestrator.ts`)**
- [x] Update `executePipeline` and `executeTriggerSync` to act as a reducer.
- [x] Map `ContextPatch` results onto the flat Nodes array.
- [x] **Phase 4: Pristine Graph & Mapping (`contextManager.ts` & `ir/toIr.ts`)**
- [x] Update `toIr` to produce a flat list of `ConcreteNode`s and a tree of
`LogicalNode`s.
- [x] Make `ContextManager` track the Pristine Graph and instantiate the flat
Nodes.
- [x] Commit patches to the Pristine Graph history.
- [x] **Phase 5: The Walker (`ir/projector.ts`)**
- [x] Update projection to simply walk the flat `ReadonlyArray<ConcreteNode>`.
- [x] Skip nodes whose IDs are in a "skipped" set (based on `abstractsIds`).
- [ ] **Phase 6: Refactoring Processors**
- [ ] `ToolMaskingProcessor`
- [ ] `NodeDistillationProcessor`
- [ ] `BlobDegradationProcessor`
- [ ] `HistoryTruncationProcessor`
- [ ] `NodeTruncationProcessor`
- [ ] `StateSnapshotProcessor`
@@ -1,183 +0,0 @@
export class ContextManager {
// The stateful, pristine Episodic Intermediate Representation graph.
// This allows the agent to remember and summarize continuously without losing data across turns.
private pristineEpisodes: Episode[] = [];
private readonly eventBus: ContextEventBus;
// Internal sub-components
// Synchronous processors are instantiated but effectively used as singletons within this class
private orchestrator: PipelineOrchestrator;
private historyObserver?: HistoryObserver;
static create(
sidecar: SidecarConfig,
env: ContextEnvironment,
tracer: ContextTracer,
orchestrator?: PipelineOrchestrator,
registry?: SidecarRegistry,
): ContextManager {
if (!registry) {
registry = new SidecarRegistry();
registerBuiltInProcessors(registry);
}
const orch =
orchestrator ||
new PipelineOrchestrator(sidecar, env, env.eventBus, tracer, registry);
return new ContextManager(sidecar, env, tracer, orch);
}
// Use ContextManager.create() instead
private constructor(
private sidecar: SidecarConfig,
private env: ContextEnvironment,
private readonly tracer: ContextTracer,
orchestrator: PipelineOrchestrator,
) {
this.eventBus = env.eventBus;
this.orchestrator = orchestrator;
this.eventBus.onPristineHistoryUpdated((event) => {
this.pristineEpisodes = event.episodes;
this.evaluateTriggers(event.newNodes);
});
this.eventBus.onVariantReady((event) => {
// Find the target episode in the pristine graph
const targetEp = this.pristineEpisodes.find(
(ep) => ep.id === event.targetId,
);
if (targetEp) {
if (!targetEp.variants) {
targetEp.variants = {};
}
targetEp.variants[event.variantId] = event.variant;
this.tracer.logEvent(
'ContextManager',
`Received async variant [${event.variantId}] for Episode ${event.targetId}`,
);
debugLogger.log(
`ContextManager: Received async variant [${event.variantId}] for Episode ${event.targetId}.`,
);
}
});
}
/**
* Safely stops background workers and clears event listeners.
*/
shutdown() {
this.orchestrator.shutdown();
if (this.historyObserver) {
this.historyObserver.stop();
}
}
/**
* Evaluates if the current working buffer exceeds configured budget thresholds,
* firing consolidation events if necessary.
*/
private evaluateTriggers(newNodes: Set<string>) {
if (!this.sidecar.budget) return;
const workingBuffer = this.getWorkingBufferView();
const currentTokens =
this.env.tokenCalculator.calculateEpisodeListTokens(workingBuffer);
this.tracer.logEvent('ContextManager', 'Evaluated triggers', {
currentTokens,
retainedTokens: this.sidecar.budget.retainedTokens,
});
// 1. Eager Compute Trigger (on_turn)
if (newNodes.size > 0) {
this.eventBus.emitChunkReceived({ episodes: this.pristineEpisodes, targetNodeIds: newNodes });
}
// 2. Budget Crossed Trigger
if (currentTokens > this.sidecar.budget.retainedTokens) {
const deficit = currentTokens - this.sidecar.budget.retainedTokens;
// Calculate exactly which nodes aged out of the retainedTokens budget to form our target delta
const agedOutNodes = new Set<string>();
let rollingTokens = 0;
// Start from newest and count backwards
for (let i = workingBuffer.length - 1; i >= 0; i--) {
const ep = workingBuffer[i];
const epTokens = this.env.tokenCalculator.calculateEpisodeListTokens([ep]);
rollingTokens += epTokens;
if (rollingTokens > this.sidecar.budget.retainedTokens) {
agedOutNodes.add(ep.id);
agedOutNodes.add(ep.trigger.id);
for (const step of ep.steps) agedOutNodes.add(step.id);
if (ep.yield) agedOutNodes.add(ep.yield.id);
}
}
this.tracer.logEvent(
'ContextManager',
'Budget crossed. Emitting ConsolidationNeeded',
{ deficit, agedOutCount: agedOutNodes.size },
);
this.eventBus.emitConsolidationNeeded({
episodes: workingBuffer,
targetDeficit: deficit,
targetNodeIds: agedOutNodes,
});
}
}
/**
* Subscribes to the core AgentChatHistory to natively track all message events,
* converting them seamlessly into pristine Episodes.
*/
subscribeToHistory(chatHistory: AgentChatHistory) {
if (this.historyObserver) {
this.historyObserver.stop();
}
this.historyObserver = new HistoryObserver(
chatHistory,
this.eventBus,
this.tracer,
this.env.tokenCalculator,
);
this.historyObserver.start();
}
/**
* Generates a computed view of the pristine log.
* Sweeps backwards (newest to oldest), tracking rolling tokens.
* When rollingTokens > retainedTokens, it injects the "best" available ready variant
* (snapshot > summary > masked) instead of the raw text.
* Handles N-to-1 variant skipping automatically.
*/
getWorkingBufferView(): Episode[] {
return generateWorkingBufferView(
this.pristineEpisodes,
this.sidecar.budget.retainedTokens,
this.tracer,
this.env,
);
}
/**
* Returns a temporary, compressed Content[] array to be used exclusively for the LLM request.
* This does NOT mutate the pristine episodic graph.
*/
async projectCompressedHistory(): Promise<Content[]> {
this.tracer.logEvent('ContextManager', 'Projection requested.');
const protectedIds = new Set<string>();
if (this.pristineEpisodes.length > 0) {
protectedIds.add(this.pristineEpisodes[0].id); // Structural invariant
}
return IrProjector.project(
this.getWorkingBufferView(),
this.orchestrator,
this.sidecar,
this.tracer,
this.env,
protectedIds,
);
}
}
@@ -55,7 +55,7 @@ export class NodeDistillationProcessor implements ContextProcessor {
try {
const response = await this.env.llmClient.generateContent({
role: LlmRole.UTILITY_COMPRESSOR,
- modelConfigKey: { model: 'default' },
+ modelConfigKey: { model: 'gemini-3-flash-base' },
promptId: this.env.promptId,
abortSignal: new AbortController().signal,
contents: [
@@ -1,146 +0,0 @@
# Context Manager: The Pure Functional "Nodes of Theseus" IR
This document outlines the architectural transition from the V0 Mutating Editor
pattern to the V1 Pure Functional, Immutable Episodic IR, designed to scale into
a multi-agent, async state transformation system.
## 1. Core Philosophy: The Nodes of Theseus
The primary constraint of deep immutable trees is the cascading cost of cloning
parent nodes when a leaf node changes. To solve this, we decouple the structural
hierarchy of the context from the actual data sent to the LLM.
The IR is divided into two distinct domains:
1. **Logical Nodes:** Structural boundaries that define the hierarchy (e.g.,
`Task`, `Episode`). These nodes **do not render** to the LLM. They exist to
group related interactions and provide semantic meaning.
2. **Concrete Nodes:** The atomic, renderable pieces of data (e.g.,
`UserPrompt`, `ToolExecution`, `Snapshot`, `RollingSummary`). These are the
actual "planks" in the Ship of Theseus metaphor.
Because Concrete Nodes carry a reference to their Logical Parent (e.g.,
`episodeId`), they can be stored and processed as a **Flat List**.
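To make the cloning argument concrete, here is a minimal sketch of a flat list whose nodes point at their logical parent. Replacing one node produces a new array but reuses every untouched node object, and no parent needs to be rebuilt. The `FlatNode` shape and both helpers are hypothetical.

```typescript
// Hypothetical flat node: the parent link is just an ID.
interface FlatNode {
  readonly id: string;
  readonly episodeId: string;
  readonly text: string;
}

// Swap a single node by ID; all other node objects are shared, not copied.
function replaceNode(
  nodes: ReadonlyArray<FlatNode>,
  replacement: FlatNode,
): FlatNode[] {
  return nodes.map((n) => (n.id === replacement.id ? replacement : n));
}

// The hierarchy can be rebuilt on demand from the parent pointers.
function groupByEpisode(nodes: ReadonlyArray<FlatNode>): Map<string, FlatNode[]> {
  const groups = new Map<string, FlatNode[]>();
  for (const node of nodes) {
    const group = groups.get(node.episodeId) ?? [];
    group.push(node);
    groups.set(node.episodeId, group);
  }
  return groups;
}

const flat: FlatNode[] = [
  { id: 'p1', episodeId: 'ep1', text: 'user prompt' },
  { id: 't1', episodeId: 'ep1', text: 'large tool output' },
  { id: 'p2', episodeId: 'ep2', text: 'follow-up prompt' },
];

const updated = replaceNode(flat, { id: 't1', episodeId: 'ep1', text: '[masked]' });
const byEpisode = groupByEpisode(updated);
```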
## 2. The Autonomous `ContextWorkingBuffer`
The flat node list is no longer a dumb array; it is encapsulated in a rich
`ContextWorkingBuffer` entity.
### Encapsulation of History
The Buffer manages its own audit trail and lineage. If a processor needs the
pristine, unaltered data of a deeply compressed node (e.g., a Snapshotter
summarizing masked tools), it queries the Buffer directly:
`buffer.getPristineNode(id)`
### Linear Temporal Progression (The Conveyor Belt)
Processors do not vote or compete. Context degradation is a linear temporal
progression defined by triggers:
1. **Frontbuffer Trim:** E.g., Tool Masking replaces raw tools immediately.
2. **Backbuffer Normalize:** E.g., Summarization replaces aging nodes in the
background.
3. **GC Backstop:** E.g., Truncation brutally destroys nodes only when the
absolute budget is breached.
When a pipeline triggers, the Orchestrator runs its processors, gathers their
`ContextPatch`es, and applies them to the Buffer immediately. The state simply
advances.
## 3. Type-Safe Async Coordination (The `ContextInbox`)
To solve the async/sync barrier (where a slow background worker generates a
summary that a fast synchronous emergency backstop needs instantly), we
introduce the `ContextInbox`.
This is a strictly-typed messaging system. A worker dispatches a
`SNAPSHOT_READY` message to the Inbox. The backstop peeks at the Inbox,
instantly retrieving the pre-computed summary and applying it.
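A minimal in-memory sketch of this hand-off is shown below. The message payloads are simplified to plain strings, and `InMemoryInbox` (including its choice to return the most recent matching message without removing it) is an assumption for illustration, not the real implementation.

```typescript
// Simplified, hypothetical message shapes.
type SketchMessage =
  | { type: 'SNAPSHOT_READY'; snapshotText: string; abstractsIds: string[] }
  | { type: 'BACKGROUND_SUMMARY'; summaryText: string; targetId: string };

class InMemoryInbox {
  private readonly messages: SketchMessage[] = [];

  // Called by a background worker when its result is ready.
  dispatch(message: SketchMessage): void {
    this.messages.push(message);
  }

  // Returns the most recent message of the given type, or undefined.
  peek<T extends SketchMessage['type']>(
    type: T,
  ): Extract<SketchMessage, { type: T }> | undefined {
    const found = [...this.messages].reverse().find((m) => m.type === type);
    return found as Extract<SketchMessage, { type: T }> | undefined;
  }
}

const inbox = new InMemoryInbox();

// Slow worker: finished a snapshot covering two older nodes.
inbox.dispatch({
  type: 'SNAPSHOT_READY',
  snapshotText: 'User is refactoring the parser; tests pass.',
  abstractsIds: ['tool-1', 'tool-2'],
});

// Fast backstop: checks synchronously for a precomputed snapshot.
const ready = inbox.peek('SNAPSHOT_READY');
const coveredIds = ready ? ready.abstractsIds : [];
```

Because `peek` narrows on the `type` field, the backstop can read `abstractsIds` without a cast.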
## 4. The Processor Contract
Processors are purely functional map/filter operations. They evaluate a list of
unprotected targets and return the exact list of nodes they intend to
substitute. They do **not** generate manual `ContextPatch` objects or manage
`IrMetadata`.
```typescript
export type InboxMessage =
| { type: 'SNAPSHOT_READY'; snapshot: Snapshot; abstractsIds: string[] }
| { type: 'BACKGROUND_SUMMARY'; summary: RollingSummary; targetId: string };
export interface ContextInbox {
dispatch(message: InboxMessage): void;
peek<T extends InboxMessage['type']>(
type: T,
): Extract<InboxMessage, { type: T }> | undefined;
}
export interface ContextWorkingBuffer {
/** The current active (projected) flat list of ConcreteNodes. */
readonly nodes: ReadonlyArray<ConcreteNode>;
/** Retrieves the historical, pristine version of a node (before any masks/summaries). */
getPristineNode(id: string): ConcreteNode | undefined;
/** Retrieves the full audit lineage of a specific node ID. */
getLineage(id: string): ReadonlyArray<ConcreteNode>;
}
export interface ProcessArgs {
/** The rich buffer containing current nodes and their history. */
readonly buffer: ContextWorkingBuffer;
/**
* The specific unprotected, mutable nodes the pipeline is allowed to operate on.
* The Orchestrator strictly filters out ANY protected nodes (like active tasks) before calling.
* Processors can assume all targets passed here are legally theirs to mutate or drop.
*/
readonly targets: ReadonlyArray<ConcreteNode>;
/** The token budget and accounting state. */
readonly state: ContextAccountingState;
/** Type-safe messaging system for async/sync coordination. */
readonly inbox: ContextInbox;
}
export interface ContextProcessor {
readonly id: string;
readonly name: string;
/**
* A pure function. Returns the new state of the `targets`.
* If an ID from `targets` is missing in the return array, the Orchestrator deletes it.
* If a new synthetic node is in the return array, the Orchestrator inserts it.
* The Orchestrator automatically appends audit `IrMetadata` to any changes.
*/
process(args: ProcessArgs): Promise<ReadonlyArray<ConcreteNode>>;
}
```
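The reducer behavior implied by the `process` contract (a missing target is deleted, a new node is inserted) could look roughly like the sketch below. The `applyProcessorResult` helper is hypothetical, and placing the returned nodes at the position of the first target is an assumption, since the contract above does not say where synthetic nodes land.

```typescript
// Hypothetical minimal node shape.
interface SketchNode {
  id: string;
  text: string;
}

// Replaces the targets in `current` with whatever the processor returned.
function applyProcessorResult(
  current: ReadonlyArray<SketchNode>,
  targets: ReadonlyArray<SketchNode>,
  result: ReadonlyArray<SketchNode>,
): SketchNode[] {
  const targetIds = new Set(targets.map((t) => t.id));
  const next: SketchNode[] = [];
  let emitted = false;

  for (const node of current) {
    if (!targetIds.has(node.id)) {
      next.push(node); // non-targets pass through untouched
      continue;
    }
    if (!emitted) {
      next.push(...result); // assumption: result lands at the first target
      emitted = true;
    }
    // Remaining targets are dropped unless the processor returned them.
  }

  if (!emitted) {
    next.push(...result); // no target was present; append as a fallback
  }
  return next;
}

const currentNodes: SketchNode[] = [
  { id: 'a', text: 'prompt' },
  { id: 'b', text: 'tool output 1' },
  { id: 'c', text: 'tool output 2' },
  { id: 'd', text: 'latest prompt' },
];
const targetNodes = currentNodes.slice(1, 3);

// A snapshot-style processor collapses both targets into one synthetic node.
const collapsed = applyProcessorResult(currentNodes, targetNodes, [
  { id: 's1', text: 'snapshot of b and c' },
]);
```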
## 5. The Node Taxonomy (`IrNodeType`)
The `IrNodeType` union explicitly defines all valid nodes. Synthetic nodes (like
`Snapshot`) are first-class citizens.
```typescript
export type IrNodeType =
// Logical Nodes
| 'TASK'
| 'EPISODE'
// Organic Concrete Nodes
| 'USER_PROMPT'
| 'SYSTEM_EVENT'
| 'AGENT_THOUGHT'
| 'TOOL_EXECUTION'
| 'AGENT_YIELD'
// Synthetic Concrete Nodes
| 'SNAPSHOT'
| 'ROLLING_SUMMARY'
| 'MASKED_TOOL';
```
@@ -40,7 +40,7 @@ Output ONLY the raw factual snapshot, formatted compactly. Do not include markdo
const response = await this.env.llmClient.generateContent({
role: LlmRole.UTILITY_STATE_SNAPSHOT_PROCESSOR,
- modelConfigKey: { model: 'default' },
+ modelConfigKey: { model: 'gemini-3-flash-base' },
contents: [{ role: 'user', parts: [{ text: userPromptText }] }],
systemInstruction: { role: 'system', parts: [{ text: systemPrompt }] },
promptId: this.env.promptId,
+1 -1
@@ -16,5 +16,5 @@ export enum LlmRole {
UTILITY_EDIT_CORRECTOR = 'utility_edit_corrector',
UTILITY_AUTOCOMPLETE = 'utility_autocomplete',
UTILITY_FAST_ACK_HELPER = 'utility_fast_ack_helper',
- UTILITY_STATE_SNAPSHOT_PROCESSOR = 'utility_state_snapshot_processr',
+ UTILITY_STATE_SNAPSHOT_PROCESSOR = 'utility_state_snapshot_processor',
}