Document-heavy claims processing
Bursty claims volume means idle GPUs: utilization is the whole story.
Context
An insurer runs document-heavy claims through AI: long inputs and moderate volume, but bursty demand that spikes after weather events. Self-hosted GPUs, sized for those surges, sit idle between them.
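A minimal sketch of why bursty demand leaves GPUs idle, assuming the self-hosted fleet is sized to cover the busiest hour. The function `fleet_utilization` and the demand trace are hypothetical illustrations, not the insurer's figures; the numbers were chosen so the result lands at the 40% used in this scenario.

```python
import math

def fleet_utilization(hourly_demand, gpu_capacity_per_hour):
    """Return (utilization, gpus) for a fleet sized to cover the busiest hour.

    hourly_demand: work requested in each hour (e.g. documents).
    gpu_capacity_per_hour: work one GPU can finish in an hour, same unit.
    Both are hypothetical inputs for illustration.
    """
    # Self-hosted capacity must be provisioned for the burst, not the average.
    gpus = math.ceil(max(hourly_demand) / gpu_capacity_per_hour)
    # Total work the fleet could have done over the whole trace.
    provisioned = gpus * gpu_capacity_per_hour * len(hourly_demand)
    # Utilization = work actually done / work the fleet was paid to be able to do.
    return sum(hourly_demand) / provisioned, gpus

# Hypothetical day: 18 quiet hours, then a 6-hour post-storm surge.
demand = [20] * 18 + [100] * 6
util, gpus = fleet_utilization(demand, gpu_capacity_per_hour=25)
print(f"{gpus} GPUs, {util:.0%} utilization")  # 4 GPUs, 40% utilization
```

The average hour needs only 40 documents' worth of work (1.6 GPUs), but the surge forces four, and all four are paid for around the clock.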
The decision
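Here utilization, not volume, decides. At 40% utilization, the cost of idle capacity keeps the API ahead; drive utilization up (for example, by batching the claims backlog into the quiet periods) and the cliff, the point where self-hosting becomes cheaper than the API, appears.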
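One way to see why utilization decides is a simple break-even model, sketched here under stated assumptions: self-hosting carries a fixed monthly overhead (operations, engineering) plus GPU rental, the API is billed per token, every GPU hour is paid for whether busy or idle, and GPU capacity is treated as continuous. The function name, parameters, and all prices are hypothetical placeholders, not vendor quotes.

```python
def breakeven_m_tokens_per_month(api_price_per_m, gpu_month_cost,
                                 gpu_m_tokens_per_month, utilization,
                                 fixed_month_cost):
    """Monthly volume (millions of tokens) above which self-hosting is cheaper.

    Returns None if self-hosting never wins at this utilization.
    Hypothetical cost model; all amounts in dollars.
    """
    # Idle hours are still billed, so effective capacity shrinks with utilization.
    self_host_per_m = gpu_month_cost / (gpu_m_tokens_per_month * utilization)
    saving_per_m = api_price_per_m - self_host_per_m
    if saving_per_m <= 0:
        return None  # API is cheaper per token: no cliff at any volume
    # Volume at which per-token savings cover the fixed self-host overhead.
    return fixed_month_cost / saving_per_m

# Hypothetical inputs: $4 per million API tokens, $2,000 per GPU-month,
# 1,000M tokens per GPU-month at full load, $15,000/month fixed overhead.
args = dict(api_price_per_m=4.0, gpu_month_cost=2000.0,
            gpu_m_tokens_per_month=1000.0, fixed_month_cost=15000.0)
print(breakeven_m_tokens_per_month(utilization=0.40, **args))  # None
print(breakeven_m_tokens_per_month(utilization=0.80, **args))  # 10000.0
```

With these made-up numbers, at 40% utilization each million tokens costs $5 to self-host against $4 from the API, so no volume ever crosses over; at 80% the effective cost drops to $2.50 and a crossover appears at 10,000M tokens a month.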
What most miss
Vendors quote self-hosting costs at 90% utilization. Bursty workloads like this one run at 40%. The idle GPUs are the cost the pitch omits, and they push the cliff out by years.
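The gap between the vendor's 90% and a bursty 40% comes down to one line of arithmetic: when GPU hours are billed regardless of use, effective cost per token scales as 1/utilization. The helper `effective_cost_per_m_tokens` and its $2/GPU-hour price are hypothetical; the ratio does not depend on them.

```python
def effective_cost_per_m_tokens(gpu_hour_cost, m_tokens_per_gpu_hour, utilization):
    # Every provisioned hour is billed, but only `utilization` of it does work.
    return gpu_hour_cost / (m_tokens_per_gpu_hour * utilization)

# Hypothetical $2/GPU-hour and 1M tokens per GPU-hour at full load.
quoted = effective_cost_per_m_tokens(2.0, 1.0, 0.90)  # vendor pitch
bursty = effective_cost_per_m_tokens(2.0, 1.0, 0.40)  # claims workload
print(f"bursty costs {bursty / quoted:.2f}x the quoted rate")  # 2.25x
```

The same hardware costs 0.90 / 0.40 = 2.25 times more per token than the pitch implies, which is why the crossover volume moves so far out.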
Stakes
Committing to self-host for a bursty workload can lock in idle-capacity cost that dwarfs the API bill it replaced.
Studied · Business of AI · verified 2026-07-03
Sources: Insurance claims AI (document-heavy, bursty volume); GPU utilization / idle-capacity economics