Agent & Protocol · Toolkit

Context & Memory Engineering

SIMULATED · Verified Jul 2, 2026

The same multi-turn task, four ways to carry the context. Drag the turns up and watch full-dump's cost and overflow risk climb while the others trade a little fidelity for a lot of headroom.

Same instrument · three industries — pick a use-case to reconfigure the run

Conversation turns: 6

Full dump

Send the whole history every call.

Context / call: 22k tok

$66/1k calls

Fidelity: 100%

Risk · overflow: 72%

Summarize

Rolling summary + last two turns.

Context / call: 12k tok

$36/1k calls

Fidelity: 76%

Risk · info loss: 24%

Compress

Semantic compression of history.

Context / call: 13k tok

$39/1k calls

Fidelity: 86%

Risk · detail loss: 12%

Sub-agent handoff

Only the current sub-task's brief.

Context / call: 7k tok

$21/1k calls

Fidelity: 70%

Risk · coordination: 30%

It's a dial, not a default

Full dump is right for a short, high-stakes exchange; summarize or compress for a long assistant thread; handoff when specialized sub-agents each need only their slice. The mistake is picking one and calling it the platform standard.

Steering-committee takeaway: Context strategy is a cost-fidelity dial. I set it per use case, not per platform.

How this is built

Context per call grows differently with turn count under each strategy: full dump grows linearly, summarize stays roughly flat, compress grows logarithmically, and handoff stays flat and small. Cost ≈ context tokens × input price; fidelity and risk are modeled per strategy; overflow triggers when context exceeds a 24k-token working window. Memory retention applies each policy's eviction rule to a fixed fact set.
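The growth shapes and the cost rule above can be sketched in a few lines. This is a minimal illustration, not the demo's actual source: the function names and every constant except the 24k window are assumptions. The $3 per million input tokens is inferred from the figures shown (22k tok ↔ $66/1k calls), and the base, per-turn, summary, and brief sizes are hypothetical values picked so the outputs land near the cards at 6 turns.

```typescript
type Strategy = "full" | "summarize" | "compress" | "handoff";

const WINDOW_TOKENS = 24_000;   // working window stated in the text
const PRICE_PER_MTOK = 3;       // assumed USD per million input tokens
const BASE_TOKENS = 4_000;      // assumed fixed prompt overhead
const TOKENS_PER_TURN = 3_000;  // assumed average size of one turn
const SUMMARY_TOKENS = 2_000;   // assumed size of the rolling summary
const BRIEF_TOKENS = 7_000;     // assumed size of a sub-task brief

// Context sent per call, as a function of how many turns have happened.
function contextTokens(strategy: Strategy, turns: number): number {
  const full = BASE_TOKENS + TOKENS_PER_TURN * turns;
  switch (strategy) {
    case "full":
      // Linear: every turn is resent on every call.
      return full;
    case "summarize":
      // Roughly flat: summary plus the last two turns, never more than full.
      return Math.min(full, BASE_TOKENS + SUMMARY_TOKENS + TOKENS_PER_TURN * 2);
    case "compress":
      // Logarithmic: compressed history grows slowly with turn count.
      return BASE_TOKENS + Math.round(TOKENS_PER_TURN * Math.log2(1 + turns));
    case "handoff":
      // Flat and small: only the current sub-task's brief.
      return BRIEF_TOKENS;
  }
}

// Cost ≈ context × input price, expressed in USD per 1,000 calls.
function costPer1kCalls(tokens: number): number {
  return (tokens * 1_000 * PRICE_PER_MTOK) / 1_000_000;
}

// Overflow triggers once the context exceeds the working window.
function overflows(tokens: number): boolean {
  return tokens > WINDOW_TOKENS;
}
```

Under these assumed constants, `contextTokens("full", 6)` is 22,000 tokens and `costPer1kCalls(22_000)` is 66, while a seventh turn pushes full dump to 25,000 tokens and past the window. The other three strategies stay well under it at the same turn count. Fidelity and risk are left out because the text says only that they are modeled per strategy, without giving a formula.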

Stack: Next.js (static) + shared design system; deterministic client-side.

Limitations: token sizes and fidelity are illustrative; real numbers depend on the model, the summarizer, and the task. The demo shows the shape of the trade-offs, not a benchmarked comparison.