15-minute LLM cost spike checklist for on-call teams

Use this page during incidents. For deeper diagnosis patterns and attribution logic, use the root-cause analysis guide.

Published: 2026-02-24 | Updated: 2026-02-26

Cost spikes, Checklist, Operations

Full guide: Bot attacks and LLM cost spikes: prevention playbook

What this guide answers

What changed in cost, cost per request, or budget posture.
Which endpoint, prompt, model, or tenant likely drove the delta.
Which validation step or control to apply next in Opsmeter.io.

What to alert on

request burst with low identity diversity
token-per-request surge without feature traffic growth
retry ratio increase without an upstream outage explanation
new high-cost endpointTag suddenly dominating spend
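
The four signals above can be expressed as simple comparisons between the current window and a baseline window. The following is a minimal sketch, not an Opsmeter.io API: the `WindowStats` fields, function names, and every threshold value are assumptions chosen for illustration and should be tuned against your own traffic.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Aggregates for one time window (hypothetical shape)."""
    requests: int             # total requests in the window
    distinct_identities: int  # distinct users, keys, or IPs seen
    total_tokens: int         # input + output tokens
    retries: int              # requests flagged as retries


def alert_signals(current, baseline, burst_factor=3.0, diversity_floor=0.05,
                  token_factor=1.5, flat_traffic_factor=1.2, retry_factor=2.0):
    """Return the names of the signals that fire. All thresholds are assumptions."""
    signals = []

    # Request burst with low identity diversity.
    diversity = current.distinct_identities / max(current.requests, 1)
    if current.requests >= burst_factor * baseline.requests and diversity < diversity_floor:
        signals.append("request_burst_low_identity_diversity")

    # Token-per-request surge while traffic stays roughly flat.
    cur_tpr = current.total_tokens / max(current.requests, 1)
    base_tpr = baseline.total_tokens / max(baseline.requests, 1)
    traffic_flat = current.requests < flat_traffic_factor * baseline.requests
    if base_tpr > 0 and cur_tpr >= token_factor * base_tpr and traffic_flat:
        signals.append("token_per_request_surge")

    # Retry ratio increase (the 1% floor avoids firing on a near-zero baseline).
    cur_rr = current.retries / max(current.requests, 1)
    base_rr = baseline.retries / max(baseline.requests, 1)
    if cur_rr >= retry_factor * max(base_rr, 0.01):
        signals.append("retry_ratio_increase")

    return signals


def new_dominant_tags(current_spend, baseline_spend, share_threshold=0.5):
    """endpointTags that now hold a dominant spend share but did not in the baseline."""
    total = sum(current_spend.values())
    base_total = sum(baseline_spend.values())
    if total <= 0:
        return []
    dominant = []
    for tag, cost in current_spend.items():
        base_share = baseline_spend.get(tag, 0.0) / base_total if base_total > 0 else 0.0
        if cost / total >= share_threshold and base_share < share_threshold:
            dominant.append(tag)
    return dominant
```

Whether an upstream outage explains a retry increase still needs a human check; the sketch only flags the ratio change.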

Execution checklist

Confirm abuse signal: burst, key leak, prompt injection, or scraping.
Rotate compromised keys and block abusive sources immediately.
Apply per-endpoint rate limits and output caps to contain spend.
Document dominant endpointTag, tenant/user concentration, and time window.
Convert the incident into one permanent guardrail update.

Minute 0-5: classify the spike

Is the change volume-driven, token-driven, or both?
Did any deploy happen in the same window?
Is the spike isolated to one endpoint or tenant?
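
One way to answer the first question is to compare how much request volume grew against how much cost per request grew. The sketch below assumes you can pull total requests and total cost for a baseline window and the spike window; the function name and the 1.3x threshold are illustrative assumptions. Cost per request is used as a proxy for token growth, so a model or pricing change would also show up as "token-driven" here.

```python
def classify_spike(base_requests, base_cost, cur_requests, cur_cost, threshold=1.3):
    """Label a spike as volume-driven, token-driven, or both.

    A dimension counts as a driver when it grew by at least `threshold`
    (an assumed cut-off) relative to the baseline window.
    """
    if base_requests <= 0 or base_cost <= 0 or cur_requests <= 0:
        raise ValueError("request counts and baseline cost must be positive")

    volume_ratio = cur_requests / base_requests
    cost_per_request_ratio = (cur_cost / cur_requests) / (base_cost / base_requests)

    volume_up = volume_ratio >= threshold
    tokens_up = cost_per_request_ratio >= threshold

    if volume_up and tokens_up:
        return "both"
    if volume_up:
        return "volume-driven"
    if tokens_up:
        return "token-driven"
    return "no clear driver"
```

Run it once on the whole workspace and once per endpoint or tenant to see whether the spike is isolated.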

Minute 5-10: identify dominant driver

Open Top Endpoints and rank by spend.
Open Top Users and rank concentration.
Compare promptVersion cost/request before and after the spike.
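
If you are working from exported request records rather than the dashboard views, the same three rankings can be sketched as below. The record shape is an assumption: each record is a dict with `endpointTag`, `promptVersion`, and `cost` keys, plus a hypothetical `userId` key standing in for whatever tenant or user identifier your telemetry carries.

```python
from collections import defaultdict


def rank_by_spend(records, key):
    """Return (value, total_cost, share_of_spend) tuples, highest spend first."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["cost"]
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [
        (name, cost, cost / grand_total if grand_total else 0.0)
        for name, cost in ranked
    ]


def cost_per_request_by_version(records):
    """Average cost per request for each promptVersion."""
    cost = defaultdict(float)
    count = defaultdict(int)
    for record in records:
        cost[record["promptVersion"]] += record["cost"]
        count[record["promptVersion"]] += 1
    return {version: cost[version] / count[version] for version in cost}
```

Call `rank_by_spend(records, "endpointTag")` first, then `rank_by_spend(records, "userId")`, and run `cost_per_request_by_version` separately on the pre-spike and spike windows to compare.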

Minute 10-15: apply temporary controls

Contain retries and suspicious traffic.
Route non-critical paths to a lower-cost model tier.
Set temporary token limits where acceptable.
Notify the owner with the exact endpoint/tenant/promptVersion driver.

Containment options by spike type

Volume spike: rate-limit the dominant endpointTag and throttle unknown identities.
Token spike: cap output tokens and reduce context (summarize history, shrink retrieval top-k).
Deploy spike: roll back the last promptVersion or gate traffic to a canary cohort.
Abuse spike: rotate keys, block sources, and isolate public endpoints.
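
For the volume and token cases, the two most common temporary controls are a per-endpointTag rate limit and an output-token cap. The sketch below uses a standard token-bucket limiter as one possible approach; it is an in-process illustration, not a feature of any particular gateway, and the class name, function name, and parameters are assumptions. The "permits" in the bucket are rate-limit permits, not LLM tokens.

```python
import time


class RequestBucket:
    """Token-bucket rate limiter: refills at a steady rate, allows short bursts."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = rate_per_sec      # permits added per second
        self.capacity = burst         # maximum permits held at once
        self.permits = float(burst)   # start full
        self.clock = clock            # injectable for testing
        self.last = clock()

    def allow(self):
        """Return True and consume one permit if the request may proceed."""
        now = self.clock()
        elapsed = now - self.last
        self.permits = min(self.capacity, self.permits + elapsed * self.rate)
        self.last = now
        if self.permits >= 1.0:
            self.permits -= 1.0
            return True
        return False


def clamp_output_tokens(requested, cap):
    """Apply a temporary output-token cap without raising the caller's own limit."""
    return min(requested, cap)
```

In practice you would keep one bucket per dominant endpointTag (for example in a dict keyed by tag) and remove or relax it once spend stabilizes.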

Incident note template (write this while it is fresh)

Time window + baseline comparison window.
Dominant endpointTag driver + cost/request delta.
Dominant tenant/user driver + concentration percentage.
promptVersion correlation (what changed and when).
Action taken (cap, throttle, route, rollback) + owner.
Follow-up: one permanent guardrail to add.
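
If your team keeps incident notes in a tracker or chat tool, the template above can be captured as a small structure so every note carries the same fields. This is a sketch only; the class and field names are assumptions that mirror the six template lines.

```python
from dataclasses import dataclass


@dataclass
class IncidentNote:
    window: str                  # spike time window
    baseline_window: str         # comparison window
    endpoint_tag: str            # dominant endpointTag
    cost_per_request_delta: str  # e.g. a before/after figure
    tenant_or_user: str          # dominant tenant or user
    concentration_pct: float     # share of spend, in percent
    prompt_version_change: str   # what changed and when
    action_taken: str            # cap, throttle, route, or rollback
    owner: str
    follow_up_guardrail: str     # one permanent guardrail to add

    def render(self) -> str:
        """Render the note as six plain-text lines, one per template item."""
        return "\n".join([
            f"Time window: {self.window} (baseline: {self.baseline_window})",
            f"Dominant endpointTag: {self.endpoint_tag} "
            f"(cost/request delta: {self.cost_per_request_delta})",
            f"Dominant tenant/user: {self.tenant_or_user} "
            f"({self.concentration_pct:.0f}% of spend)",
            f"promptVersion correlation: {self.prompt_version_change}",
            f"Action taken: {self.action_taken} (owner: {self.owner})",
            f"Follow-up guardrail: {self.follow_up_guardrail}",
        ])
```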

After-action

Convert this checklist run into a permanent guardrail policy so the next spike is detected earlier.

FAQ

What is the fastest way to find the cost spike driver?

Start with Top Endpoints (feature concentration), then Top Users (tenant concentration), then Prompt Versions (deploy correlation). This order finds the dominant driver quickly without guessing.

Should we optimize prompts immediately during an incident?

No. Contain first (caps, throttles, routing, rollback). Optimization comes after spend stabilizes so you do not chase moving targets while the bill keeps growing.

How do we avoid false alarms from demo/test traffic?

Separate demo/test traffic using dataMode and environment. Alerts and burn-rate checks should be scoped to real production traffic so thresholds remain trustworthy.
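
As a rough illustration of that scoping, the sketch below filters records before computing a burn rate. The `dataMode` and `environment` field names come from this guide, but the values `"real"` and `"production"` are assumptions; substitute whatever values your telemetry actually sends.

```python
def production_only(records, real_mode="real", prod_env="production"):
    """Keep only records tagged as real production traffic (values are assumed)."""
    return [
        record for record in records
        if record.get("dataMode") == real_mode
        and record.get("environment") == prod_env
    ]


def burn_rate_per_hour(records, window_hours):
    """Spend per hour over the window, computed on the records passed in."""
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    return sum(record["cost"] for record in records) / window_hours
```

Compute alert thresholds on `production_only(records)` so a burst of demo traffic does not shift the baseline.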
