
Inference Cost Forecaster

SIMULATED · Verified Jul 2, 2026

Portfolio-level, 24 months out. API scales linearly with volume; self-host is fixed capacity that steps up. Where they cross is the cliff — and utilization decides where it lands. (Per-call economics live in GAP-06.)


Starting volume: 0.5M calls/mo
Monthly growth: 6%
Tokens per call: 3,000
Share on frontier model: 40%
Self-host utilization: 60%
Ops headcount: 1.5 FTE

[Chart: monthly run-rate over 24 months, API vs. self-host]

The cliff: beyond 24 months

API, 24-month cumulative total: $686k

Self-host, 24-month cumulative total: $3.22M

No cliff inside 24 months

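At these assumptions API stays cheaper for all 24 months. Growth alone cannot change that: at 60% utilization a fully used cluster works out to about $25 per 1M tokens ($38k ÷ 1.5B effective tokens), more than even the $18 all-frontier API price. A cliff needs higher utilization together with a larger frontier-model share, so that self-host undercuts the blended API price per token; volume and growth then decide how soon it arrives. Otherwise, accept that self-host doesn't pay yet.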

Steering-committee takeaway: The cliff is real but further out than vendors say — utilization assumptions decide it, not sticker price.
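To see why utilization dominates, compare a fully used cluster's cost per 1M tokens with the blended API price. This is a minimal sketch under the page's illustrative defaults ($38k per cluster per month, 2.5B tokens/month capacity) plus one loud assumption: the blended price is taken to interpolate linearly from $3 (0% frontier) to $18 (100% frontier). It ignores ops headcount and the whole-cluster step, so a real crossover needs somewhat more than the break-even figure it reports. The function names are hypothetical, not the tool's code.

```typescript
// Illustrative defaults from the page (not real hardware or contract data).
const CLUSTER_COST = 38000;      // $/month amortized per cluster
const CLUSTER_CAPACITY_M = 2500; // cluster capacity in millions of tokens/month (2.5B)

// Self-host cost per 1M tokens for a fully used cluster at a given utilization.
// Ignores ops FTE cost and the ceiling (step) effect.
function selfHostPerM(utilization: number): number {
  return CLUSTER_COST / (CLUSTER_CAPACITY_M * utilization);
}

// ASSUMPTION: blended API price is linear from $3 (0% frontier) to $18 (100%).
function blendedPricePerM(frontierShare: number): number {
  return 3 + frontierShare * 15;
}

// Utilization at which selfHostPerM(u) equals the blended API price.
// A value above 1 means no achievable utilization closes the gap.
function breakEvenUtilization(frontierShare: number): number {
  return CLUSTER_COST / (CLUSTER_CAPACITY_M * blendedPricePerM(frontierShare));
}

console.log(selfHostPerM(0.6).toFixed(2));         // 25.33
console.log(breakEvenUtilization(0.4).toFixed(2)); // 1.69
console.log(breakEvenUtilization(1.0).toFixed(2)); // 0.84
```

At the default 40% frontier share the break-even utilization exceeds 100%, which is why no amount of growth produces a cliff in the default run; even with all traffic on the frontier model, clusters would need to run above roughly 84% utilization before ops costs are counted.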

How this is built & assumptions

API/mo = volume × tokens/call × blended price (from $3/1M tokens at 0% frontier share to $18/1M at 100%). Volume compounds at the monthly growth rate.
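A minimal TypeScript sketch of the API side, assuming the blended price interpolates linearly between $3 and $18 per 1M tokens as the frontier share goes from 0% to 100%. That interpolation is an assumption, though with the default inputs it reproduces the $686k 24-month total. Names such as `blendedPricePerM` and `apiMonthly` are hypothetical, not the forecaster's source.

```typescript
// ASSUMPTION: linear blend between the cheapest and frontier prices.
const PRICE_BASE_PER_M = 3;      // $/1M tokens with no frontier traffic
const PRICE_FRONTIER_PER_M = 18; // $/1M tokens if all traffic is frontier

// Blended $/1M tokens for a frontier share in [0, 1].
function blendedPricePerM(frontierShare: number): number {
  return PRICE_BASE_PER_M + frontierShare * (PRICE_FRONTIER_PER_M - PRICE_BASE_PER_M);
}

// Calls in month m (0-based), compounding at the monthly growth rate.
function callsInMonth(startCalls: number, growth: number, m: number): number {
  return startCalls * Math.pow(1 + growth, m);
}

// API cost for month m: volume × tokens/call × blended price.
function apiMonthly(
  startCalls: number,
  growth: number,
  tokensPerCall: number,
  frontierShare: number,
  m: number,
): number {
  const tokens = callsInMonth(startCalls, growth, m) * tokensPerCall;
  return (tokens / 1e6) * blendedPricePerM(frontierShare);
}

// Default run: 0.5M calls/mo, 6% growth, 3,000 tokens/call, 40% frontier.
let apiTotal = 0;
for (let m = 0; m < 24; m++) {
  apiTotal += apiMonthly(0.5e6, 0.06, 3000, 0.4, m);
}
console.log(`API 24-mo total: $${Math.round(apiTotal / 1000)}k`); // API 24-mo total: $686k
```

Month 1 costs $13.5k (1.5B tokens at $9/1M); because cost is proportional to volume, the API line is a smooth compounding curve with no steps.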

Self-host/mo = ⌈monthly tokens ÷ (cluster capacity of 2.5B tokens/mo × utilization)⌉ × $38k amortized per cluster + ops FTE × $22k. The cliff is the first month in which self-host costs less than API.
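The self-host formula and the cliff test can be sketched the same way. This is a hypothetical reconstruction, not the forecaster's code: `Inputs`, `selfHostMonthly`, and `findCliff` are illustrative names, months are counted from 1 when reporting the cliff, and the API price again assumes a linear blend from $3 to $18 per 1M tokens.

```typescript
// Illustrative defaults from the page; real values come from your hardware and contracts.
const CLUSTER_CAPACITY = 2.5e9; // tokens/month per cluster at 100% utilization
const CLUSTER_COST = 38000;     // $/month amortized per cluster
const FTE_COST = 22000;         // $/month per ops FTE

interface Inputs {
  startCalls: number;    // calls in month 1
  growth: number;        // monthly growth rate, e.g. 0.06
  tokensPerCall: number;
  frontierShare: number; // 0..1
  utilization: number;   // 0..1
  opsFte: number;
}

function tokensInMonth(x: Inputs, m: number): number {
  return x.startCalls * Math.pow(1 + x.growth, m) * x.tokensPerCall;
}

// ASSUMPTION: blended API price is linear from $3 to $18 per 1M tokens.
function apiMonthly(x: Inputs, m: number): number {
  return (tokensInMonth(x, m) / 1e6) * (3 + x.frontierShare * 15);
}

// Whole clusters (ceiling) sized to effective capacity, plus fixed ops cost.
function selfHostMonthly(x: Inputs, m: number): number {
  const clusters = Math.ceil(tokensInMonth(x, m) / (CLUSTER_CAPACITY * x.utilization));
  return clusters * CLUSTER_COST + x.opsFte * FTE_COST;
}

// The cliff: first month (1-based) where self-host < API, or null if none in the horizon.
function findCliff(x: Inputs, horizon = 24): number | null {
  for (let m = 0; m < horizon; m++) {
    if (selfHostMonthly(x, m) < apiMonthly(x, m)) return m + 1;
  }
  return null;
}

const defaults: Inputs = {
  startCalls: 0.5e6, growth: 0.06, tokensPerCall: 3000,
  frontierShare: 0.4, utilization: 0.6, opsFte: 1.5,
};
let selfHostTotal = 0;
for (let m = 0; m < 24; m++) selfHostTotal += selfHostMonthly(defaults, m);
console.log(selfHostTotal);       // 3224000
console.log(findCliff(defaults)); // null
```

With the defaults this comes to 64 cluster-months plus 24 months of ops, $3,224,000 in total, with no crossover month. The `Math.ceil` is what makes self-host a step function: each cluster is paid for in full even when only partly needed.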

Stack: Next.js (static) + shared design system; deterministic client-side.

Limitations: cluster capacity, amortization, and ops load are illustrative defaults; real forecasts need your hardware, contracts, and utilization telemetry. It finds the crossover's shape, not the exact date.