mirror of
https://github.com/google-gemini/gemini-cli.git
synced 2026-05-20 00:32:31 -07:00
cleanup docs for now
This commit is contained in:
@@ -1,52 +0,0 @@
|
||||
# Async Context Mutations (V1 Architecture)
|
||||
|
||||
## The Problem
|
||||
|
||||
In V0, the \`ContextManager\` processes LLM inputs sequentially and
|
||||
synchronously. Processors like \`NodeTruncation\` can safely mutate the graph
|
||||
because they hold an exclusive lock on the context state.
|
||||
|
||||
However, operations like \`StateSnapshotAsyncProcessor\` take a long time to run
|
||||
(distilling thousands of tokens). If they run synchronously, they block the user
|
||||
from interacting with the agent. If they run asynchronously in the background,
|
||||
by the time they finish, the active context graph has likely moved on (new
|
||||
messages, tool calls, or other truncations have occurred).
|
||||
|
||||
## The V1 Solution: Ancestral Replacement (Optimistic Concurrency)
|
||||
|
||||
To allow async background pipelines to mutate the live context graph safely, we
|
||||
use an Optimistic Concurrency Control mechanism called **Ancestral
|
||||
Replacement**.
|
||||
|
||||
### 1. Proof of Claim
|
||||
|
||||
When an \`AsyncContextProcessor\` triggers, it is handed a \`ProcessArgs\`
|
||||
containing a snapshot of the graph (the targets it was asked to process). The
|
||||
processor records the specific IDs of the \`ConcreteNode\`s it is reading. This
|
||||
is its "Proof of Claim".
|
||||
|
||||
### 2. Background Execution
|
||||
|
||||
The processor runs in the background, completely detached from the live graph.
|
||||
It synthesizes a new state (e.g., a summarized snapshot node).
|
||||
|
||||
### 3. The Commit Phase
|
||||
|
||||
When the processor finishes, it returns its proposed mutations (an array of new
|
||||
\`ConcreteNode\`s that specify which old nodes they replace via the
|
||||
\`replacesId\` property).
|
||||
|
||||
The Orchestrator then attempts to "rebase" this mutation into the live graph:
|
||||
|
||||
1. It looks at the live graph.
|
||||
2. It checks: _Do all the original nodes (the Proof of Claim) still exist
|
||||
unmodified in the live graph?_
|
||||
3. **If Yes (Clean Fast-Forward):** The orchestrator deletes the old nodes and
|
||||
inserts the new synthesized nodes.
|
||||
4. **If No (Conflict):** If _any_ of the original nodes were deleted or modified
|
||||
by another processor while the async task was running, the orchestrator
|
||||
rejects the async mutation entirely (or handles it via a conflict resolution
|
||||
strategy).
|
||||
|
||||
This guarantees that async pipelines can never corrupt the context state by
|
||||
overwriting newer information with stale data.
|
||||
@@ -1,243 +0,0 @@
|
||||
# Context Management Architecture: A Foundation for Scalable Context Exploration and Experimentation
|
||||
|
||||
## 1. Executive Summary & Motivation
|
||||
|
||||
As our agentic capabilities grow, the active context window becomes our most
|
||||
critical and constrained resource. Mismanaged context leads to broken caching,
|
||||
hallucination, and exorbitant token costs. Historically, context management has
|
||||
been decentralized, stateful, and highly coupled—making it dangerous to mutate
|
||||
and nearly impossible to safely experiment with.
|
||||
|
||||
Our primary goal with this architecture is to establish a rigorous, structured
|
||||
model for context computation that guarantees:
|
||||
|
||||
1. **Safety & Auditability:** Mutations happen in a predictable, auditable, and
|
||||
recoverable way.
|
||||
2. **Asynchronous Safety:** Long-running LLM-driven graph analysis can execute
|
||||
safely without blocking the user or creating race conditions.
|
||||
3. **Trivial Extensibility:** The system can be effortlessly augmented with new
|
||||
compression strategies to scale our experiments.
|
||||
4. **A Universal Data-Plane:** Beyond simple text compression, the architecture
|
||||
generalizes to safely structure _any_ computation in and around the context
|
||||
(e.g., continuous reflection, semantic routing, long-term memory extraction).
|
||||
|
||||
Because the ultimate "right answer" for context compression is unknown and
|
||||
constantly shifting, we have designed this architecture strictly around the
|
||||
**Open-Closed Principle**. We have "closed" the state-mutation engine to
|
||||
guarantee structural integrity, while leaving the behavioral logic entirely
|
||||
"open" for extension via decoupled primitives.
|
||||
|
||||
To be clear: this is not an exercise in over-engineering. Rather, we are
|
||||
applying proven, boring, industry-standard software paradigms—specifically
|
||||
Functional Reactive Programming (FRP), the Actor Model, and the Open-Closed
|
||||
Principle—to tame the inherent complexity of managing an agent's context and
|
||||
prevent the system from collapsing under its own state.
|
||||
|
||||
## 2. Embracing the Unknowns
|
||||
|
||||
Before defining the architecture, we must acknowledge the fundamental realities
|
||||
of context management:
|
||||
|
||||
- **The Trilemma (Quality vs. Cost vs. Caching):** There will never be one
|
||||
single correct way to manage history. Aggressive summarization saves tokens
|
||||
but breaks exact-string caching; retaining raw history maximizes caching but
|
||||
blows up token budgets.
|
||||
- **The Precision / Recall Tension:** Context compression inherently damages
|
||||
_recall_ (the ability to perfectly retrieve specific past details). However,
|
||||
it protects _precision_ (preventing the LLM from being distracted by
|
||||
irrelevant noise). We hypothesize that over time, active context management
|
||||
will focus primarily on protecting precision, while recall will be better
|
||||
protected in a targeted way via external memory, tasks, and planning systems.
|
||||
- **The Need for Rapid Experimentation:** Because these tradeoffs depend heavily
|
||||
on the specific model, task, and user budget, we need a system that lets us
|
||||
explore and express a wide range of options. We must be able to run a large
|
||||
number of experiments—including in production—without risking catastrophic
|
||||
state corruption.
|
||||
|
||||
## 3. The Core Architectural Primitives
|
||||
|
||||
To facilitate safe experimentation, we have separated the _execution_ of context
|
||||
mutations from the _logic_ of context mutations. This division reflects
|
||||
established, robust patterns.
|
||||
|
||||
### The "Closed" Foundation: Synchronous Pipelines (Functional Reactive Programming)
|
||||
|
||||
Drawing from the principles of **Functional Reactive Programming (FRP)**, the
|
||||
Context Working Buffer is treated as an immutable, ahead-of-time tracked graph.
|
||||
It can **only** be mutated synchronously, via an event-triggered **Pipeline**. A
|
||||
Pipeline is simply a linear list of functionally composed processors. By forcing
|
||||
all mutations through a synchronous, blocking pipeline of pure functions, we
|
||||
guarantee that the context is always modified in a sane, predictable, and
|
||||
mathematically sound sequence.
|
||||
|
||||
### The "Open" Extensions: Processors, AsyncProcessors, and Inboxes (The Actor Model)
|
||||
|
||||
To extend the system, developers author two types of plugins:
|
||||
|
||||
1. **Context Processors:** Pure, fast, synchronous functions that take an input
|
||||
graph and return an immutable mutated graph. They run inside Pipelines.
|
||||
2. **Context AsyncProcessors:** Inspired by the **Actor Model**, these are
|
||||
event-triggered background jobs designed for isolated, long-running async
|
||||
computations (e.g., asking an LLM to distill 50 turns of history).
|
||||
3. **Inboxes:** Because the graph can only be mutated synchronously,
|
||||
AsyncProcessors cannot touch the graph directly (preventing race conditions).
|
||||
Instead, they drop their results via message-passing into point-in-time
|
||||
snapshots called _Inboxes_. Processors later read from these Inboxes during a
|
||||
synchronous pipeline run to safely apply the async processor's findings.
|
||||
|
||||
## 4. Proofs of Construction
|
||||
|
||||
To demonstrate why these primitives are perfectly suited to the problem,
|
||||
consider the following structural pseudocode.
|
||||
|
||||
### Example A: Fast, Synchronous Sanitization (The Processor)
|
||||
|
||||
_Scenario: We need to immediately truncate massive tool outputs before they blow
|
||||
out the context window._
|
||||
|
||||
Because this requires no LLM calls, it is expressed as a simple **Processor**
|
||||
running in an ingestion pipeline.
|
||||
|
||||
```typescript
|
||||
// A pure function that guarantees safe mutation
|
||||
class ToolMaskingProcessor implements ContextProcessor {
|
||||
apply(graph: ContextGraph): ContextGraph {
|
||||
const mutatedGraph = graph.clone();
|
||||
|
||||
for (const node of mutatedGraph.getNodes('TOOL_OUTPUT')) {
|
||||
if (node.length > MAX_CHARS) {
|
||||
// Safely replace the node, retaining structural lineage
|
||||
mutatedGraph.replace(node, {
|
||||
type: 'MASKED_TOOL',
|
||||
text: `[Output truncated. Original size: ${node.length}]`,
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
return mutatedGraph;
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Example B: Long-Running Summarization (async pipeline + Inbox + Processor)
|
||||
|
||||
_Scenario: The user has exceeded their token budget. We need to use an LLM to
|
||||
summarize the oldest 20 turns of conversation, but we cannot block the user from
|
||||
continuing to chat while the LLM generates the summary._
|
||||
|
||||
This requires our async-to-sync bridge.
|
||||
|
||||
**Step 1: The async pipeline (Async Analysis)**
|
||||
|
||||
```typescript
|
||||
class StateSnapshotasync pipeline implements Contextasync pipeline {
|
||||
// Triggers automatically in the background when the budget is exceeded
|
||||
async onBudgetExceeded(event: BudgetEvent, inbox: Inbox) {
|
||||
const agedOutNodes = event.getAgedOutNodes();
|
||||
|
||||
// Slow, async LLM call
|
||||
const summaryText = await llm.summarize(agedOutNodes);
|
||||
|
||||
// The async pipeline CANNOT mutate the graph. It leaves a message in the Inbox.
|
||||
inbox.deliver('SUMMARY_READY', {
|
||||
targetNodes: agedOutNodes,
|
||||
summary: summaryText,
|
||||
});
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Step 2: The Processor (Sync Application)**
|
||||
|
||||
```typescript
|
||||
class StateSnapshotProcessor implements ContextProcessor {
|
||||
// Runs fast and synchronously during the next Pipeline execution
|
||||
apply(graph: ContextGraph, inbox: InboxSnapshot): ContextGraph {
|
||||
// Check if the async background pipeline finished its job
|
||||
const messages = inbox.read('SUMMARY_READY');
|
||||
if (messages.isEmpty()) return graph;
|
||||
|
||||
const mutatedGraph = graph.clone();
|
||||
|
||||
for (const msg of messages) {
|
||||
// Safely swap the old nodes for the new summary
|
||||
mutatedGraph.collapseNodes(msg.targetNodes, {
|
||||
type: 'ROLLING_SUMMARY',
|
||||
text: msg.summary,
|
||||
});
|
||||
inbox.markConsumed(msg);
|
||||
}
|
||||
|
||||
return mutatedGraph;
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Example C: Downstream Observation & Memory Extraction (The Ledger)
|
||||
|
||||
_Scenario: We want to extract long-term memories from conversation turns
|
||||
immediately before they are permanently deleted from the active context window
|
||||
by a garbage collector (GC)._
|
||||
|
||||
Because the Context Working Buffer immutably tracks its own mathematical deltas
|
||||
(the Audit Log) and computes structural lineage Ahead-Of-Time (AOT), downstream
|
||||
processors don't have to guess what happened; they can simply read the math.
|
||||
|
||||
```typescript
|
||||
class MemoryExtractionProcessor implements ContextProcessor {
|
||||
// Runs sequentially AFTER a Garbage Collection (GC) Processor
|
||||
apply(graph: ContextGraph): ContextGraph {
|
||||
// 1. Look at the immutable Audit Log to see what the previous step did
|
||||
const latestMutation = graph.getAuditLog().latest();
|
||||
if (latestMutation.processorId !== 'HistoryTruncationProcessor') {
|
||||
return graph; // Nothing was GC'd, do nothing
|
||||
}
|
||||
|
||||
// 2. Identify the exact pristine nodes that were permanently lost
|
||||
const lostPristineNodes = new Set<ContextNode>();
|
||||
|
||||
for (const removedId of latestMutation.removedIds) {
|
||||
// Because the graph tracks provenance Ahead-Of-Time (AOT),
|
||||
// we perfectly resolve what original thoughts this synthetic node represented.
|
||||
// (e.g. Deleting a ROLLING_SUMMARY implies losing the 3 original USER_PROMPTS it summarized).
|
||||
const roots = graph.getPristineNodes(removedId);
|
||||
roots.forEach((r) => lostPristineNodes.add(r));
|
||||
}
|
||||
|
||||
// 3. Compare against the currently surviving graph to find the TRUE delta
|
||||
// (Ensure the roots aren't still surviving inside some OTHER summary node)
|
||||
for (const survivingNode of graph.getNodes()) {
|
||||
const survivingRoots = graph.getPristineNodes(survivingNode.id);
|
||||
survivingRoots.forEach((r) => lostPristineNodes.delete(r));
|
||||
}
|
||||
|
||||
// 4. Dispatch the permanently lost nodes to the long-term memory subsystem
|
||||
if (lostPristineNodes.size > 0) {
|
||||
// Fire-and-forget async dispatch (Actor Model) to external DB
|
||||
LongTermMemorySystem.dispatchForEmbedding(Array.from(lostPristineNodes));
|
||||
}
|
||||
|
||||
// This processor is purely observational; it returns the graph unmutated
|
||||
return graph;
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## 5. Conclusion & Future Evolution
|
||||
|
||||
By treating the Context Graph as an immutable ledger updated only via functional
|
||||
Pipelines, we have eliminated race conditions and untraceable graph corruption.
|
||||
By utilizing AsyncProcessors and Inboxes, we have safely bridged the gap between
|
||||
slow LLM analysis and fast, synchronous terminal UI updates.
|
||||
|
||||
We recognize this is not the final form—future iterations may require strict
|
||||
simple priority to updates, or more advanced generational garbage collection.
|
||||
However, this architecture provides a rock-solid, extensible foundation.
|
||||
|
||||
More importantly, while this system was born from the need to manage tokens, its
|
||||
immutable ledger and reactive pipelines generalize to something far more
|
||||
profound. We have built a safe, predictable computational engine capable of
|
||||
driving the _entire_ agentic loop—from background reflection, to semantic
|
||||
routing, to long-term memory extraction. It empowers us to safely deploy,
|
||||
observe, and scale a wide array of strategies in pursuit of the optimal balance
|
||||
between token cost, caching efficiency, and agentic quality.
|
||||
Reference in New Issue
Block a user