From 10009760b60b1bd02734d6d74f6bd63f3ea9e117 Mon Sep 17 00:00:00 2001 From: Your Name Date: Fri, 10 Apr 2026 16:54:19 +0000 Subject: [PATCH] cleanup docs for now --- docs/context-manager-async-mutations.md | 52 ---- .../src/context/context-manager-v0-design.md | 243 ------------------ 2 files changed, 295 deletions(-) delete mode 100644 docs/context-manager-async-mutations.md delete mode 100644 packages/core/src/context/context-manager-v0-design.md diff --git a/docs/context-manager-async-mutations.md b/docs/context-manager-async-mutations.md deleted file mode 100644 index db4545b4b5..0000000000 --- a/docs/context-manager-async-mutations.md +++ /dev/null @@ -1,52 +0,0 @@ -# Async Context Mutations (V1 Architecture) - -## The Problem - -In V0, the \`ContextManager\` processes LLM inputs sequentially and -synchronously. Processors like \`NodeTruncation\` can safely mutate the graph -because they hold an exclusive lock on the context state. - -However, operations like \`StateSnapshotAsyncProcessor\` take a long time to run -(distilling thousands of tokens). If they run synchronously, they block the user -from interacting with the agent. If they run asynchronously in the background, -by the time they finish, the active context graph has likely moved on (new -messages, tool calls, or other truncations have occurred). - -## The V1 Solution: Ancestral Replacement (Optimistic Concurrency) - -To allow async background pipelines to mutate the live context graph safely, we -use an Optimistic Concurrency Control mechanism called **Ancestral -Replacement**. - -### 1. Proof of Claim - -When an \`AsyncContextProcessor\` triggers, it is handed a \`ProcessArgs\` -containing a snapshot of the graph (the targets it was asked to process). The -processor records the specific IDs of the \`ConcreteNode\`s it is reading. This -is its "Proof of Claim". - -### 2. Background Execution - -The processor runs in the background, completely detached from the live graph. -It synthesizes a new state (e.g., a summarized snapshot node). - -### 3. The Commit Phase - -When the processor finishes, it returns its proposed mutations (an array of new -\`ConcreteNode\`s that specify which old nodes they replace via the -\`replacesId\` property). - -The Orchestrator then attempts to "rebase" this mutation into the live graph: - -1. It looks at the live graph. -2. It checks: _Do all the original nodes (the Proof of Claim) still exist - unmodified in the live graph?_ -3. **If Yes (Clean Fast-Forward):** The orchestrator deletes the old nodes and - inserts the new synthesized nodes. -4. **If No (Conflict):** If _any_ of the original nodes were deleted or modified - by another processor while the async task was running, the orchestrator - rejects the async mutation entirely (or handles it via a conflict resolution - strategy). - -This guarantees that async pipelines can never corrupt the context state by -overwriting newer information with stale data. diff --git a/packages/core/src/context/context-manager-v0-design.md b/packages/core/src/context/context-manager-v0-design.md deleted file mode 100644 index 9490d3b248..0000000000 --- a/packages/core/src/context/context-manager-v0-design.md +++ /dev/null @@ -1,243 +0,0 @@ -# Context Management Architecture: A Foundation for Scalable Context Exploration and Experimentation - -## 1. Executive Summary & Motivation - -As our agentic capabilities grow, the active context window becomes our most -critical and constrained resource. Mismanaged context leads to broken caching, -hallucination, and exorbitant token costs. Historically, context management has -been decentralized, stateful, and highly coupled—making it dangerous to mutate -and nearly impossible to safely experiment with. - -Our primary goal with this architecture is to establish a rigorous, structured -model for context computation that guarantees: - -1. **Safety & Auditability:** Mutations happen in a predictable, auditable, and - recoverable way. -2. **Asynchronous Safety:** Long-running LLM-driven graph analysis can execute - safely without blocking the user or creating race conditions. -3. **Trivial Extensibility:** The system can be effortlessly augmented with new - compression strategies to scale our experiments. -4. **A Universal Data-Plane:** Beyond simple text compression, the architecture - generalizes to safely structure _any_ computation in and around the context - (e.g., continuous reflection, semantic routing, long-term memory extraction). - -Because the ultimate "right answer" for context compression is unknown and -constantly shifting, we have designed this architecture strictly around the -**Open-Closed Principle**. We have "closed" the state-mutation engine to -guarantee structural integrity, while leaving the behavioral logic entirely -"open" for extension via decoupled primitives. - -To be clear: this is not an exercise in over-engineering. Rather, we are -applying proven, boring, industry-standard software paradigms—specifically -Functional Reactive Programming (FRP), the Actor Model, and the Open-Closed -Principle—to tame the inherent complexity of managing an agent's context and -prevent the system from collapsing under its own state. - -## 2. Embracing the Unknowns - -Before defining the architecture, we must acknowledge the fundamental realities -of context management: - -- **The Trilemma (Quality vs. Cost vs. Caching):** There will never be one - single correct way to manage history. Aggressive summarization saves tokens - but breaks exact-string caching; retaining raw history maximizes caching but - blows up token budgets. -- **The Precision / Recall Tension:** Context compression inherently damages - _recall_ (the ability to perfectly retrieve specific past details). However, - it protects _precision_ (preventing the LLM from being distracted by - irrelevant noise). We hypothesize that over time, active context management - will focus primarily on protecting precision, while recall will be better - protected in a targeted way via external memory, tasks, and planning systems. -- **The Need for Rapid Experimentation:** Because these tradeoffs depend heavily - on the specific model, task, and user budget, we need a system that lets us - explore and express a wide range of options. We must be able to run a large - number of experiments—including in production—without risking catastrophic - state corruption. - -## 3. The Core Architectural Primitives - -To facilitate safe experimentation, we have separated the _execution_ of context -mutations from the _logic_ of context mutations. This division reflects -established, robust patterns. - -### The "Closed" Foundation: Synchronous Pipelines (Functional Reactive Programming) - -Drawing from the principles of **Functional Reactive Programming (FRP)**, the -Context Working Buffer is treated as an immutable, ahead-of-time tracked graph. -It can **only** be mutated synchronously, via an event-triggered **Pipeline**. A -Pipeline is simply a linear list of functionally composed processors. By forcing -all mutations through a synchronous, blocking pipeline of pure functions, we -guarantee that the context is always modified in a sane, predictable, and -mathematically sound sequence. - -### The "Open" Extensions: Processors, AsyncProcessors, and Inboxes (The Actor Model) - -To extend the system, developers author two types of plugins: - -1. **Context Processors:** Pure, fast, synchronous functions that take an input - graph and return an immutable mutated graph. They run inside Pipelines. -2. **Context AsyncProcessors:** Inspired by the **Actor Model**, these are - event-triggered background jobs designed for isolated, long-running async - computations (e.g., asking an LLM to distill 50 turns of history). -3. **Inboxes:** Because the graph can only be mutated synchronously, - AsyncProcessors cannot touch the graph directly (preventing race conditions). - Instead, they drop their results via message-passing into point-in-time - snapshots called _Inboxes_. Processors later read from these Inboxes during a - synchronous pipeline run to safely apply the async processor's findings. - -## 4. Proofs of Construction - -To demonstrate why these primitives are perfectly suited to the problem, -consider the following structural pseudocode. - -### Example A: Fast, Synchronous Sanitization (The Processor) - -_Scenario: We need to immediately truncate massive tool outputs before they blow -out the context window._ - -Because this requires no LLM calls, it is expressed as a simple **Processor** -running in an ingestion pipeline. - -```typescript -// A pure function that guarantees safe mutation -class ToolMaskingProcessor implements ContextProcessor { - apply(graph: ContextGraph): ContextGraph { - const mutatedGraph = graph.clone(); - - for (const node of mutatedGraph.getNodes('TOOL_OUTPUT')) { - if (node.length > MAX_CHARS) { - // Safely replace the node, retaining structural lineage - mutatedGraph.replace(node, { - type: 'MASKED_TOOL', - text: `[Output truncated. Original size: ${node.length}]`, - }); - } - } - - return mutatedGraph; - } -} -``` - -### Example B: Long-Running Summarization (async pipeline + Inbox + Processor) - -_Scenario: The user has exceeded their token budget. We need to use an LLM to -summarize the oldest 20 turns of conversation, but we cannot block the user from -continuing to chat while the LLM generates the summary._ - -This requires our async-to-sync bridge. - -**Step 1: The async pipeline (Async Analysis)** - -```typescript -class StateSnapshotasync pipeline implements Contextasync pipeline { - // Triggers automatically in the background when the budget is exceeded - async onBudgetExceeded(event: BudgetEvent, inbox: Inbox) { - const agedOutNodes = event.getAgedOutNodes(); - - // Slow, async LLM call - const summaryText = await llm.summarize(agedOutNodes); - - // The async pipeline CANNOT mutate the graph. It leaves a message in the Inbox. - inbox.deliver('SUMMARY_READY', { - targetNodes: agedOutNodes, - summary: summaryText, - }); - } -} -``` - -**Step 2: The Processor (Sync Application)** - -```typescript -class StateSnapshotProcessor implements ContextProcessor { - // Runs fast and synchronously during the next Pipeline execution - apply(graph: ContextGraph, inbox: InboxSnapshot): ContextGraph { - // Check if the async background pipeline finished its job - const messages = inbox.read('SUMMARY_READY'); - if (messages.isEmpty()) return graph; - - const mutatedGraph = graph.clone(); - - for (const msg of messages) { - // Safely swap the old nodes for the new summary - mutatedGraph.collapseNodes(msg.targetNodes, { - type: 'ROLLING_SUMMARY', - text: msg.summary, - }); - inbox.markConsumed(msg); - } - - return mutatedGraph; - } -} -``` - -### Example C: Downstream Observation & Memory Extraction (The Ledger) - -_Scenario: We want to extract long-term memories from conversation turns -immediately before they are permanently deleted from the active context window -by a garbage collector (GC)._ - -Because the Context Working Buffer immutably tracks its own mathematical deltas -(the Audit Log) and computes structural lineage Ahead-Of-Time (AOT), downstream -processors don't have to guess what happened; they can simply read the math. - -```typescript -class MemoryExtractionProcessor implements ContextProcessor { - // Runs sequentially AFTER a Garbage Collection (GC) Processor - apply(graph: ContextGraph): ContextGraph { - // 1. Look at the immutable Audit Log to see what the previous step did - const latestMutation = graph.getAuditLog().latest(); - if (latestMutation.processorId !== 'HistoryTruncationProcessor') { - return graph; // Nothing was GC'd, do nothing - } - - // 2. Identify the exact pristine nodes that were permanently lost - const lostPristineNodes = new Set(); - - for (const removedId of latestMutation.removedIds) { - // Because the graph tracks provenance Ahead-Of-Time (AOT), - // we perfectly resolve what original thoughts this synthetic node represented. - // (e.g. Deleting a ROLLING_SUMMARY implies losing the 3 original USER_PROMPTS it summarized). - const roots = graph.getPristineNodes(removedId); - roots.forEach((r) => lostPristineNodes.add(r)); - } - - // 3. Compare against the currently surviving graph to find the TRUE delta - // (Ensure the roots aren't still surviving inside some OTHER summary node) - for (const survivingNode of graph.getNodes()) { - const survivingRoots = graph.getPristineNodes(survivingNode.id); - survivingRoots.forEach((r) => lostPristineNodes.delete(r)); - } - - // 4. Dispatch the permanently lost nodes to the long-term memory subsystem - if (lostPristineNodes.size > 0) { - // Fire-and-forget async dispatch (Actor Model) to external DB - LongTermMemorySystem.dispatchForEmbedding(Array.from(lostPristineNodes)); - } - - // This processor is purely observational; it returns the graph unmutated - return graph; - } -} -``` - -## 5. Conclusion & Future Evolution - -By treating the Context Graph as an immutable ledger updated only via functional -Pipelines, we have eliminated race conditions and untraceable graph corruption. -By utilizing AsyncProcessors and Inboxes, we have safely bridged the gap between -slow LLM analysis and fast, synchronous terminal UI updates. - -We recognize this is not the final form—future iterations may require strict -simple priority to updates, or more advanced generational garbage collection. -However, this architecture provides a rock-solid, extensible foundation. - -More importantly, while this system was born from the need to manage tokens, its -immutable ledger and reactive pipelines generalize to something far more -profound. We have built a safe, predictable computational engine capable of -driving the _entire_ agentic loop—from background reflection, to semantic -routing, to long-term memory extraction. It empowers us to safely deploy, -observe, and scale a wide array of strategies in pursuit of the optimal balance -between token cost, caching efficiency, and agentic quality.