AI cost spike: why your LLM bill increased (and how to fix it)
Most spikes come from a small set of patterns: token growth, retries, abuse traffic, or deploy drift. You need one repeatable response workflow.
Full guide: Bot attacks and LLM cost spikes: prevention playbook
Nine reasons LLM bills spike overnight
- Prompt or context growth after deploy
- Retry storms after timeout or rate-limit errors
- Traffic burst from one endpoint or tenant
- Bot abuse or leaked API key
- Model routing drift to higher-cost tiers
- Missing token caps on non-critical flows
- Batch job replay or duplicate jobs
- Unrecognized models whose pricing is applied late, so spend surfaces after the fact
- No budget thresholds with assigned owner
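Several of these causes share one detectable symptom: spend that jumps well above its recent baseline. A minimal sketch of that check, assuming you can export daily spend as a list of numbers (the `is_spike` helper, its threshold multiplier, and the sample figures are illustrative, not from any particular tool):

```python
from statistics import mean

def is_spike(daily_spend, multiplier=2.0, window=7):
    """Flag the latest day as a spike if it exceeds `multiplier`
    times the trailing `window`-day average."""
    if len(daily_spend) <= window:
        return False  # not enough history to judge
    baseline = mean(daily_spend[-window - 1:-1])
    return daily_spend[-1] > multiplier * baseline

# Hypothetical history: steady ~$40/day, then a $120 day.
history = [40, 41, 39, 40, 42, 40, 38, 120]
```

Running `is_spike(history)` returns `True`; a flat history returns `False`. Tuning `multiplier` trades alert noise against detection speed.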
15-minute triage flow
- Confirm spike window and trend direction in Overview.
- Find endpoint concentration in Top Endpoints.
- Find tenant/user concentration in Top Users.
- Check Prompt Versions for deploy-linked cost/request drift.
- Contain retries and abuse before tuning prompts.
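The two concentration checks above (by endpoint, by tenant) are the same aggregation over raw usage records. A sketch, assuming records with `endpoint`, `tenant`, and `cost` fields (the field names and `top_concentration` helper are assumptions for illustration):

```python
from collections import Counter

def top_concentration(records, key):
    """Return (name, share_of_total_cost) for the heaviest spender
    along one dimension, e.g. key='endpoint' or key='tenant'."""
    totals = Counter()
    for r in records:
        totals[r[key]] += r["cost"]
    name, spend = totals.most_common(1)[0]
    return name, spend / sum(totals.values())

# Hypothetical usage export.
records = [
    {"endpoint": "/chat", "tenant": "acme", "cost": 9.0},
    {"endpoint": "/chat", "tenant": "beta", "cost": 1.0},
    {"endpoint": "/embed", "tenant": "acme", "cost": 2.0},
]
```

Here `top_concentration(records, "endpoint")` attributes about 83% of spend to `/chat`, which is the kind of concentration that points triage at one code path instead of the whole system.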
What to fix immediately
- Apply retry backoff with a maximum-attempt cap, plus per-client rate limits.
- Set temporary model tiering for non-critical paths.
- Cap max tokens where quality impact is acceptable.
- Set warning/exceeded thresholds with one owner.
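The first fix above is the one that stops a retry storm directly: bounded attempts with capped, jittered exponential backoff. A minimal sketch, assuming `fn` wraps your LLM call and raises standard transient errors (the wrapper name and defaults are illustrative):

```python
import random
import time

def call_with_backoff(fn, max_attempts=4, base_delay=0.5, max_delay=8.0):
    """Retry `fn` on transient failures with capped exponential backoff
    and full jitter, so timeouts cannot compound into a retry storm."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except (TimeoutError, ConnectionError):
            if attempt == max_attempts - 1:
                raise  # bounded retries means bounded spend
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(random.uniform(0, delay))  # full jitter
```

The attempt cap is the cost control; the jitter spreads retries out so clients recovering from the same outage do not synchronize into a traffic burst.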
What to institutionalize
Convert this incident flow into a standard runbook for every workspace.
Treat every cost spike as a policy gap and close it with one permanent control.
Who this is for
- Security and platform teams responding to bot abuse, leaked keys, and spend fraud.
- Teams running public endpoints that need rate-limits and budget containment.
- Operators who need a repeatable playbook for cost spikes and traffic anomalies.
Evaluation resources
For security and procurement reviews, use our trust summary before final tool selection.