Agent Loop & Failure Inspector

SIMULATED · Verified Jul 2, 2026

Task: resolve a card dispute end-to-end. Step through the loop, switch architectures to restructure it, then inject a failure. The failure, its detection signal, and its recovery policy are the part you actually budget for.

The harness you budget for

  • Tool error

    non-2xx / timeout → retry + fallback

  • Loop

    repeated action ≥3 → cap + escalate

  • Hallucinated args

    schema validation → reject + re-ask

  • Context overflow

    token budget → summarize + evict

Every one needs a detection signal and a recovery policy. That's the observability line item.
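A minimal sketch of how the four detection signals listed above could be checked after each loop step. It assumes a hypothetical `StepSnapshot` record per step. The type names, the `lookup_dispute` tool in the usage note, and the required-argument check (a small stand-in for real schema validation) are assumptions for illustration, not part of the demo.

```typescript
// Illustrative sketch only: every name here is hypothetical, not taken from the demo.
type FailureKind = "tool_error" | "loop" | "hallucinated_args" | "context_overflow";

interface StepSnapshot {
  action: string;                 // tool the agent chose
  args: Record<string, unknown>;  // arguments the agent produced
  httpStatus?: number;            // absent when the call never returned
  timedOut: boolean;
  tokensUsed: number;             // running context size after this step
}

interface HarnessConfig {
  requiredArgs: Record<string, string[]>; // tool -> argument names that must be non-empty strings
  tokenBudget: number;
  loopThreshold: number;                  // the list above uses 3
}

// Recovery policy per failure, copied from the list above.
const RECOVERY: Record<FailureKind, string> = {
  tool_error: "retry + fallback",
  loop: "cap + escalate",
  hallucinated_args: "reject + re-ask",
  context_overflow: "summarize + evict",
};

// Returns the first failure signal raised by the latest step, or null if none fires.
function detectFailure(history: readonly StepSnapshot[], cfg: HarnessConfig): FailureKind | null {
  const current = history[history.length - 1];
  if (current === undefined) return null;

  // Hallucinated args: a tiny stand-in for schema validation.
  const required = cfg.requiredArgs[current.action] ?? [];
  const argsValid = required.every((name) => {
    const value = current.args[name];
    return typeof value === "string" && value.length > 0;
  });
  if (!argsValid) return "hallucinated_args";

  // Tool error: timeout, or any status outside 200-299.
  const code = current.httpStatus;
  if (current.timedOut || (code !== undefined && (code < 200 || code >= 300))) {
    return "tool_error";
  }

  // Loop: count how many trailing steps repeat the same action with the same args.
  const key = (s: StepSnapshot): string => `${s.action}:${JSON.stringify(s.args)}`;
  let repeats = 0;
  for (const step of [...history].reverse()) {
    if (key(step) !== key(current)) break;
    repeats++;
  }
  if (repeats >= cfg.loopThreshold) return "loop";

  // Context overflow: running token count exceeds the budget.
  if (current.tokensUsed > cfg.tokenBudget) return "context_overflow";

  return null;
}
```

With `loopThreshold: 3`, three identical trailing `lookup_dispute` steps make `detectFailure` return `"loop"`, which `RECOVERY` maps to `cap + escalate`. The check order (arguments, then call result, then repetition, then budget) is one plausible choice; a real harness might report every signal that fires rather than only the first.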

Failure injection is the whole point

A happy-path demo tells you nothing about production. The four failures above are what actually happens in production, and each is cheap to catch with the right signal and expensive to miss. That gap is what the observability budget pays for.

Steering-committee takeaway: You don't budget for agents; you budget for agents plus the harness that catches these four failures.

How this is built

Each architecture defines a base Thought→Action→Observation trace; a selected failure splices a failure→detection→recovery triad into the loop. The stepper reveals steps in order.
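The splice-and-step mechanism described above can be sketched as two pure functions over an array of steps. This is an illustrative reconstruction under assumptions, not the demo's source: `TraceStep`, `spliceFailure`, `revealSteps`, and the sample dispute steps are all hypothetical names and contents.

```typescript
// Illustrative sketch only: names and sample steps are hypothetical.
type StepKind = "thought" | "action" | "observation" | "failure" | "detection" | "recovery";

interface TraceStep {
  kind: StepKind;
  text: string;
}

type FailureTriad = readonly [TraceStep, TraceStep, TraceStep];

// Returns a new trace with the failure -> detection -> recovery triad
// inserted right after position `afterIndex`; the base trace is not mutated.
function spliceFailure(base: readonly TraceStep[], triad: FailureTriad, afterIndex: number): TraceStep[] {
  const at = Math.min(Math.max(afterIndex + 1, 0), base.length);
  return [...base.slice(0, at), ...triad, ...base.slice(at)];
}

// Stepper: reveal the first `cursor` steps, in order.
function revealSteps(trace: readonly TraceStep[], cursor: number): TraceStep[] {
  const n = Math.max(0, Math.min(cursor, trace.length));
  return trace.slice(0, n);
}

// A made-up base trace: two Thought -> Action -> Observation rounds.
const disputeBase: TraceStep[] = [
  { kind: "thought", text: "Find the transaction behind the dispute." },
  { kind: "action", text: "Call the transaction lookup tool." },
  { kind: "observation", text: "Transaction found." },
  { kind: "thought", text: "Check whether the dispute is eligible." },
  { kind: "action", text: "Call the eligibility tool." },
  { kind: "observation", text: "Dispute is eligible." },
];

// A made-up tool-error triad, using the signal and policy named in the list above.
const toolErrorTriad: FailureTriad = [
  { kind: "failure", text: "Lookup tool times out." },
  { kind: "detection", text: "Timeout signal fires." },
  { kind: "recovery", text: "Retry, then fall back." },
];

// Splice the triad in after the first action (index 1), then reveal five steps.
const disputeWithFailure = spliceFailure(disputeBase, toolErrorTriad, 1);
const visibleSoFar = revealSteps(disputeWithFailure, 5);
console.log(visibleSoFar.map((s) => s.kind).join(" > "));
```

Because both functions return new arrays and depend only on their inputs, the same architecture and failure selection always produce the same trace, which fits the deterministic, client-side design noted in the stack.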

Stack: Next.js (static) + shared design system; deterministic client-side.

Limitations: traces are illustrative constructions, not captured from a live agent; real detection needs instrumentation (traces, token meters, action fingerprints). The demo shows the failure taxonomy and response pattern, not a monitoring stack.
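For a sense of what two of the instrumentation pieces named above might look like, here is a hedged sketch of an action fingerprint and a token meter. All names are hypothetical, and the characters-divided-by-four token estimate is a crude placeholder assumption; a real meter would use the model's own tokenizer.

```typescript
// Illustrative sketch only: names are hypothetical; the token estimate is a placeholder.

// Serialize with sorted object keys so argument order does not change the fingerprint.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const parts = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`);
    return `{${parts.join(",")}}`;
  }
  // JSON.stringify yields undefined for values like undefined or functions.
  const primitive: string | undefined = JSON.stringify(value);
  return primitive ?? "null";
}

// Fingerprint = tool name plus canonicalized arguments.
function actionFingerprint(tool: string, args: Record<string, unknown>): string {
  return `${tool}(${stableStringify(args)})`;
}

// Counts how often each fingerprint has been seen in a run.
class FingerprintCounter {
  private counts = new Map<string, number>();

  // Records one occurrence and returns the updated count.
  record(fingerprint: string): number {
    const next = (this.counts.get(fingerprint) ?? 0) + 1;
    this.counts.set(fingerprint, next);
    return next;
  }
}

// Tracks an estimated running token total against a budget.
class TokenMeter {
  private used = 0;
  private budget: number;

  constructor(budget: number) {
    this.budget = budget;
  }

  // Placeholder estimate: roughly one token per four characters.
  add(text: string): number {
    this.used += Math.ceil(text.length / 4);
    return this.used;
  }

  overBudget(): boolean {
    return this.used > this.budget;
  }
}
```

A harness could compare the count returned by `FingerprintCounter.record` against a repeat threshold to raise the loop signal, and poll `TokenMeter.overBudget()` before each model call to decide when to summarize and evict.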