cleanup docs for now

This commit is contained in:
Your Name
2026-04-10 16:54:19 +00:00
parent 60e9bc8ec5
commit 10009760b6
2 changed files with 0 additions and 295 deletions
-52
View File
@@ -1,52 +0,0 @@
# Async Context Mutations (V1 Architecture)
## The Problem
In V0, the \`ContextManager\` processes LLM inputs sequentially and
synchronously. Processors like \`NodeTruncation\` can safely mutate the graph
because they hold an exclusive lock on the context state.
However, operations like \`StateSnapshotAsyncProcessor\` take a long time to run
(distilling thousands of tokens). If they run synchronously, they block the user
from interacting with the agent. If they run asynchronously in the background,
by the time they finish, the active context graph has likely moved on (new
messages, tool calls, or other truncations have occurred).
## The V1 Solution: Ancestral Replacement (Optimistic Concurrency)
To allow async background pipelines to mutate the live context graph safely, we
use an Optimistic Concurrency Control mechanism called **Ancestral
Replacement**.
### 1. Proof of Claim
When an \`AsyncContextProcessor\` triggers, it is handed a \`ProcessArgs\`
containing a snapshot of the graph (the targets it was asked to process). The
processor records the specific IDs of the \`ConcreteNode\`s it is reading. This
is its "Proof of Claim".
### 2. Background Execution
The processor runs in the background, completely detached from the live graph.
It synthesizes a new state (e.g., a summarized snapshot node).
### 3. The Commit Phase
When the processor finishes, it returns its proposed mutations (an array of new
\`ConcreteNode\`s that specify which old nodes they replace via the
\`replacesId\` property).
The Orchestrator then attempts to "rebase" this mutation into the live graph:
1. It looks at the live graph.
2. It checks: _Do all the original nodes (the Proof of Claim) still exist
unmodified in the live graph?_
3. **If Yes (Clean Fast-Forward):** The orchestrator deletes the old nodes and
inserts the new synthesized nodes.
4. **If No (Conflict):** If _any_ of the original nodes were deleted or modified
by another processor while the async task was running, the orchestrator
rejects the async mutation entirely (or handles it via a conflict resolution
strategy).
This guarantees that async pipelines can never corrupt the context state by
overwriting newer information with stale data.
@@ -1,243 +0,0 @@
# Context Management Architecture: A Foundation for Scalable Context Exploration and Experimentation
## 1. Executive Summary & Motivation
As our agentic capabilities grow, the active context window becomes our most
critical and constrained resource. Mismanaged context leads to broken caching,
hallucination, and exorbitant token costs. Historically, context management has
been decentralized, stateful, and highly coupled—making it dangerous to mutate
and nearly impossible to safely experiment with.
Our primary goal with this architecture is to establish a rigorous, structured
model for context computation that guarantees:
1. **Safety & Auditability:** Mutations happen in a predictable, auditable, and
recoverable way.
2. **Asynchronous Safety:** Long-running LLM-driven graph analysis can execute
safely without blocking the user or creating race conditions.
3. **Trivial Extensibility:** The system can be effortlessly augmented with new
compression strategies to scale our experiments.
4. **A Universal Data-Plane:** Beyond simple text compression, the architecture
generalizes to safely structure _any_ computation in and around the context
(e.g., continuous reflection, semantic routing, long-term memory extraction).
Because the ultimate "right answer" for context compression is unknown and
constantly shifting, we have designed this architecture strictly around the
**Open-Closed Principle**. We have "closed" the state-mutation engine to
guarantee structural integrity, while leaving the behavioral logic entirely
"open" for extension via decoupled primitives.
To be clear: this is not an exercise in over-engineering. Rather, we are
applying proven, boring, industry-standard software paradigms—specifically
Functional Reactive Programming (FRP), the Actor Model, and the Open-Closed
Principle—to tame the inherent complexity of managing an agent's context and
prevent the system from collapsing under its own state.
## 2. Embracing the Unknowns
Before defining the architecture, we must acknowledge the fundamental realities
of context management:
- **The Trilemma (Quality vs. Cost vs. Caching):** There will never be one
single correct way to manage history. Aggressive summarization saves tokens
but breaks exact-string caching; retaining raw history maximizes caching but
blows up token budgets.
- **The Precision / Recall Tension:** Context compression inherently damages
_recall_ (the ability to perfectly retrieve specific past details). However,
it protects _precision_ (preventing the LLM from being distracted by
irrelevant noise). We hypothesize that over time, active context management
will focus primarily on protecting precision, while recall will be better
protected in a targeted way via external memory, tasks, and planning systems.
- **The Need for Rapid Experimentation:** Because these tradeoffs depend heavily
on the specific model, task, and user budget, we need a system that lets us
explore and express a wide range of options. We must be able to run a large
number of experiments—including in production—without risking catastrophic
state corruption.
## 3. The Core Architectural Primitives
To facilitate safe experimentation, we have separated the _execution_ of context
mutations from the _logic_ of context mutations. This division reflects
established, robust patterns.
### The "Closed" Foundation: Synchronous Pipelines (Functional Reactive Programming)
Drawing from the principles of **Functional Reactive Programming (FRP)**, the
Context Working Buffer is treated as an immutable, ahead-of-time tracked graph.
It can **only** be mutated synchronously, via an event-triggered **Pipeline**. A
Pipeline is simply a linear list of functionally composed processors. By forcing
all mutations through a synchronous, blocking pipeline of pure functions, we
guarantee that the context is always modified in a sane, predictable, and
mathematically sound sequence.
### The "Open" Extensions: Processors, AsyncProcessors, and Inboxes (The Actor Model)
To extend the system, developers author two types of plugins:
1. **Context Processors:** Pure, fast, synchronous functions that take an input
graph and return an immutable mutated graph. They run inside Pipelines.
2. **Context AsyncProcessors:** Inspired by the **Actor Model**, these are
event-triggered background jobs designed for isolated, long-running async
computations (e.g., asking an LLM to distill 50 turns of history).
3. **Inboxes:** Because the graph can only be mutated synchronously,
AsyncProcessors cannot touch the graph directly (preventing race conditions).
Instead, they drop their results via message-passing into point-in-time
snapshots called _Inboxes_. Processors later read from these Inboxes during a
synchronous pipeline run to safely apply the async processor's findings.
## 4. Proofs of Construction
To demonstrate why these primitives are perfectly suited to the problem,
consider the following structural pseudocode.
### Example A: Fast, Synchronous Sanitization (The Processor)
_Scenario: We need to immediately truncate massive tool outputs before they blow
out the context window._
Because this requires no LLM calls, it is expressed as a simple **Processor**
running in an ingestion pipeline.
```typescript
// A pure function that guarantees safe mutation
class ToolMaskingProcessor implements ContextProcessor {
apply(graph: ContextGraph): ContextGraph {
const mutatedGraph = graph.clone();
for (const node of mutatedGraph.getNodes('TOOL_OUTPUT')) {
if (node.length > MAX_CHARS) {
// Safely replace the node, retaining structural lineage
mutatedGraph.replace(node, {
type: 'MASKED_TOOL',
text: `[Output truncated. Original size: ${node.length}]`,
});
}
}
return mutatedGraph;
}
}
```
### Example B: Long-Running Summarization (async pipeline + Inbox + Processor)
_Scenario: The user has exceeded their token budget. We need to use an LLM to
summarize the oldest 20 turns of conversation, but we cannot block the user from
continuing to chat while the LLM generates the summary._
This requires our async-to-sync bridge.
**Step 1: The async pipeline (Async Analysis)**
```typescript
class StateSnapshotasync pipeline implements Contextasync pipeline {
// Triggers automatically in the background when the budget is exceeded
async onBudgetExceeded(event: BudgetEvent, inbox: Inbox) {
const agedOutNodes = event.getAgedOutNodes();
// Slow, async LLM call
const summaryText = await llm.summarize(agedOutNodes);
// The async pipeline CANNOT mutate the graph. It leaves a message in the Inbox.
inbox.deliver('SUMMARY_READY', {
targetNodes: agedOutNodes,
summary: summaryText,
});
}
}
```
**Step 2: The Processor (Sync Application)**
```typescript
class StateSnapshotProcessor implements ContextProcessor {
// Runs fast and synchronously during the next Pipeline execution
apply(graph: ContextGraph, inbox: InboxSnapshot): ContextGraph {
// Check if the async background pipeline finished its job
const messages = inbox.read('SUMMARY_READY');
if (messages.isEmpty()) return graph;
const mutatedGraph = graph.clone();
for (const msg of messages) {
// Safely swap the old nodes for the new summary
mutatedGraph.collapseNodes(msg.targetNodes, {
type: 'ROLLING_SUMMARY',
text: msg.summary,
});
inbox.markConsumed(msg);
}
return mutatedGraph;
}
}
```
### Example C: Downstream Observation & Memory Extraction (The Ledger)
_Scenario: We want to extract long-term memories from conversation turns
immediately before they are permanently deleted from the active context window
by a garbage collector (GC)._
Because the Context Working Buffer immutably tracks its own mathematical deltas
(the Audit Log) and computes structural lineage Ahead-Of-Time (AOT), downstream
processors don't have to guess what happened; they can simply read the math.
```typescript
class MemoryExtractionProcessor implements ContextProcessor {
// Runs sequentially AFTER a Garbage Collection (GC) Processor
apply(graph: ContextGraph): ContextGraph {
// 1. Look at the immutable Audit Log to see what the previous step did
const latestMutation = graph.getAuditLog().latest();
if (latestMutation.processorId !== 'HistoryTruncationProcessor') {
return graph; // Nothing was GC'd, do nothing
}
// 2. Identify the exact pristine nodes that were permanently lost
const lostPristineNodes = new Set<ContextNode>();
for (const removedId of latestMutation.removedIds) {
// Because the graph tracks provenance Ahead-Of-Time (AOT),
// we perfectly resolve what original thoughts this synthetic node represented.
// (e.g. Deleting a ROLLING_SUMMARY implies losing the 3 original USER_PROMPTS it summarized).
const roots = graph.getPristineNodes(removedId);
roots.forEach((r) => lostPristineNodes.add(r));
}
// 3. Compare against the currently surviving graph to find the TRUE delta
// (Ensure the roots aren't still surviving inside some OTHER summary node)
for (const survivingNode of graph.getNodes()) {
const survivingRoots = graph.getPristineNodes(survivingNode.id);
survivingRoots.forEach((r) => lostPristineNodes.delete(r));
}
// 4. Dispatch the permanently lost nodes to the long-term memory subsystem
if (lostPristineNodes.size > 0) {
// Fire-and-forget async dispatch (Actor Model) to external DB
LongTermMemorySystem.dispatchForEmbedding(Array.from(lostPristineNodes));
}
// This processor is purely observational; it returns the graph unmutated
return graph;
}
}
```
## 5. Conclusion & Future Evolution
By treating the Context Graph as an immutable ledger updated only via functional
Pipelines, we have eliminated race conditions and untraceable graph corruption.
By utilizing AsyncProcessors and Inboxes, we have safely bridged the gap between
slow LLM analysis and fast, synchronous terminal UI updates.
We recognize this is not the final form—future iterations may require strict
simple priority to updates, or more advanced generational garbage collection.
However, this architecture provides a rock-solid, extensible foundation.
More importantly, while this system was born from the need to manage tokens, its
immutable ledger and reactive pipelines generalize to something far more
profound. We have built a safe, predictable computational engine capable of
driving the _entire_ agentic loop—from background reflection, to semantic
routing, to long-term memory extraction. It empowers us to safely deploy,
observe, and scale a wide array of strategies in pursuit of the optimal balance
between token cost, caching efficiency, and agentic quality.