Business of AI · Gallery
Inference Cost Forecaster
SIMULATEDVerified Jul 2, 2026Portfolio-level, 24 months out. API scales linearly with volume; self-host is fixed capacity that steps up. Where they cross is the cliff — and utilization decides where it lands. (Per-call economics live in GAP-06.)
Same instrument · three industries — pick a use-case to reconfigure the run
Monthly run-rate · 24 months
Beyond 24 mo
Cumulative
Cumulative
No cliff inside 24 months
Steering-committee takeaway: The cliff is real but further out than vendors say — utilization assumptions decide it, not sticker price.
How this is built & assumptions
API/mo = volume × tokens/call × blended price ($3–$18/1M tokens by frontier share). Volume compounds at the monthly growth rate.
Self-host/mo = ⌈tokens ÷ (cluster capacity 2.5B × utilization)⌉ × $38k amortized + ops FTE × $22k. The cliff is the first month self-host < API.
Stack: Next.js (static) + shared design system; deterministic client-side.
Limitations: cluster capacity, amortization, and ops load are illustrative defaults; real forecasts need your hardware, contracts, and utilization telemetry. It finds the crossover's shape, not the exact date.