Opsmeter.io
AI Cost & Inference Control

Root cause workflow

Ops guide

Root cause an LLM cost spike: endpoint, tenant, deploy

Use this guide for diagnosis logic and evidence order. For on-call execution, use the 15-minute checklist article.

Tags: Attribution · Root cause · Prompt versions

Full guide: Bot attacks and LLM cost spikes: prevention playbook

What this guide answers

  • What changed in cost, cost per request, or budget posture.
  • Which endpoint, prompt, model, or tenant likely drove the delta.
  • Which validation step or control to apply next in Opsmeter.io.

What to send (payload example)

{
  "externalRequestId": "req_01HZXB6MQZ2WQ9D2KCF9M4V2QY",
  "provider": "provider_id",
  "model": "model_id",
  "endpointTag": "public.chat",
  "promptVersion": "public_v1",
  "userId": "anon_ip_hash",
  "inputTokens": 260,
  "outputTokens": 190,
  "latencyMs": 892,
  "status": "success",
  "dataMode": "real",
  "environment": "prod"
}
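Before sending, it can help to confirm an event carries every tag attribution depends on. A minimal sketch, assuming the field names from the payload example above; the `validate_event` helper is illustrative, not an Opsmeter.io SDK function:

```python
# Illustrative check: flag usage events that are missing the fields
# needed for reliable attribution. Field names follow the payload
# example above; the helper itself is a hypothetical sketch.
REQUIRED = {"externalRequestId", "provider", "model", "endpointTag",
            "promptVersion", "inputTokens", "outputTokens", "status"}

def validate_event(event: dict) -> list:
    """Return the sorted list of missing fields (empty if complete)."""
    return sorted(REQUIRED - event.keys())

event = {
    "externalRequestId": "req_01HZXB6MQZ2WQ9D2KCF9M4V2QY",
    "provider": "provider_id",
    "model": "model_id",
    "endpointTag": "public.chat",
    "promptVersion": "public_v1",
    "inputTokens": 260,
    "outputTokens": 190,
    "status": "success",
}
assert validate_event(event) == []            # complete event passes
assert "endpointTag" in validate_event({})    # missing tags are flagged
```

Rejecting or quarantining incomplete events at ingestion time is cheaper than discovering mid-incident that `promptVersion` was never tagged.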

Common mistakes

  • Looking only at monthly totals instead of isolating the spike time window (UTC).
  • Changing models globally instead of scoping fixes to one endpointTag.
  • Ignoring retry storms (the cost multiplier is calls per action, not token price).
  • Missing promptVersion tagging, so deploy correlation becomes guesswork.
  • Mixing demo/test traffic with production, then “fixing” a false spike.
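The last mistake is the easiest to automate away. A minimal sketch that filters on the `dataMode` and `environment` fields from the payload example before any spike analysis runs:

```python
# Sketch: drop demo/test traffic before analyzing a spike, so a burst of
# synthetic load is never mistaken for a production cost regression.
# Field values ("real", "prod") follow the payload example above.
def production_only(events):
    return [e for e in events
            if e.get("dataMode") == "real" and e.get("environment") == "prod"]

events = [
    {"dataMode": "real", "environment": "prod",    "cost": 0.02},
    {"dataMode": "demo", "environment": "prod",    "cost": 5.00},
    {"dataMode": "real", "environment": "staging", "cost": 0.01},
]
assert len(production_only(events)) == 1  # only the real prod event survives
```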

How to verify in the Opsmeter.io dashboard

  1. In Overview, set the exact spike window and compare it to the prior baseline window.
  2. In Top Endpoints, rank spend and cost/request to find the dominant feature driver.
  3. In Top Users, check tenant/user concentration and unknown-user bursts.
  4. In Prompt Versions, compare before/after cost/request for versions shipped in the spike window.
  5. Validate with raw samples: retries, model mix, and token deltas match the hypothesized driver.
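Step 5's retry check can be done directly on raw samples. A sketch, assuming retries reuse the same `externalRequestId` (as the payload example specifies); a multiplier well above 1.0 suggests a retry storm rather than a pricing or prompt problem:

```python
from collections import Counter

# Sketch: estimate calls per user action from raw samples by counting
# repeats of externalRequestId, which is stable across retries.
def retry_multiplier(events) -> float:
    unique_actions = Counter(e["externalRequestId"] for e in events)
    return len(events) / len(unique_actions)

events = [{"externalRequestId": r}
          for r in ["req_a", "req_a", "req_a", "req_b"]]
assert retry_multiplier(events) == 2.0  # 4 calls / 2 unique actions
```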

Use totals as an entry point, not the final answer

Totals tell you when spend changed, but they do not identify which endpoint or deploy created the change.

Root cause requires request-level tags and a deterministic investigation order.


Turn diagnosis into action

Identify the cost driver, validate it with attribution, then apply one durable control before the next billing cycle.

Apply in your workspace

Re-run this workflow on your own spend data

Follow the same path from article insight to telemetry verification, then validate with your own cost signals.

  • Quickstart path: send a first payload, confirm attribution, then return here for operations context. (Open quickstart)
  • Evaluation path: pair this guide with trust proof, status, and compare surfaces during review. (Open trust proof pack)

Root-cause sequence

  1. Validate the spike window in Overview.
  2. Rank spend by endpointTag.
  3. Rank spend by userId/tenant mapping.
  4. Compare promptVersion cost/request before and after the deploy.
  5. Isolate one dominant driver and assign an owner.

Classify the spike type first (so you do not chase the wrong thing)

Spikes feel mysterious when you treat them as one category. Start by classifying: volume, efficiency, routing, retries, or abuse.

Each category has a different fix. Prompt tuning does not solve retry storms, and model swaps do not solve context bloat.

  • Volume spike: requests/hour up (same cost/request).
  • Efficiency regression: cost/request up (volume flat) after deploy.
  • Routing drift: a higher-cost model share increases on one endpoint.
  • Retry storm: errors + retries increase calls per user action.
  • Abuse/bot: unknown-user concentration + burst traffic on public endpoints.
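The five categories above can be turned into a first-pass triage function. A sketch under loudly stated assumptions: the metric names and the 1.5x / +0.3 thresholds are illustrative choices for this example, not product defaults, and check order matters (retries and abuse are tested first because they inflate the other metrics):

```python
# Sketch: classify a spike window against its baseline window using the
# five categories above. Thresholds and metric names are illustrative
# assumptions, not Opsmeter.io defaults.
def classify_spike(base: dict, spike: dict) -> str:
    if spike["retry_ratio"] > 1.5 * base["retry_ratio"]:
        return "retry storm"
    if spike["top_user_share"] > base["top_user_share"] + 0.3:
        return "abuse/bot"
    if spike["costly_model_share"] > 1.5 * base["costly_model_share"]:
        return "routing drift"
    if spike["cost_per_request"] > 1.5 * base["cost_per_request"]:
        return "efficiency regression"
    if spike["requests_per_hour"] > 1.5 * base["requests_per_hour"]:
        return "volume spike"
    return "unclassified"

base = {"retry_ratio": 1.05, "top_user_share": 0.1,
        "costly_model_share": 0.2, "cost_per_request": 0.01,
        "requests_per_hour": 1000}
assert classify_spike(base, dict(base, requests_per_hour=5000)) == "volume spike"
assert classify_spike(base, dict(base, retry_ratio=2.5)) == "retry storm"
```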

Fields required for reliable attribution

  • externalRequestId (stable across retries)
  • endpointTag, promptVersion, userId (optional but recommended)
  • provider/model and token counts
  • latency and status for retry/noise correlation

Evidence checklist (what to compare before vs after)

  1. Requests/hour and tokens/request (input + output) per endpointTag.
  2. Cost/request and model mix per endpointTag.
  3. Error rate (429/5xx) and retry multiplier (same externalRequestId repeating).
  4. Tenant/user concentration change (top 1 tenant share and top 5 share).
  5. PromptVersion changes in the same window (deploy correlation).
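Item 4's concentration metrics are straightforward to compute from tagged events. A sketch, assuming per-event `cost` aggregated by `userId`; any sharp jump in top-1 or top-5 share between the baseline and spike windows points at a single-tenant or bot-driven spike:

```python
# Sketch for evidence item 4: top-1 and top-5 tenant share of spend,
# computed per window and compared before vs after. Costs are placeholders.
def top_shares(events):
    totals = {}
    for e in events:
        totals[e["userId"]] = totals.get(e["userId"], 0.0) + e["cost"]
    ranked = sorted(totals.values(), reverse=True)
    total = sum(ranked)
    return ranked[0] / total, sum(ranked[:5]) / total

events = [{"userId": "t1", "cost": 8.0},
          {"userId": "t2", "cost": 1.0},
          {"userId": "t3", "cost": 1.0}]
top1, top5 = top_shares(events)
assert top1 == 0.8  # one tenant holds 80% of window spend
assert top5 == 1.0
```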

Validation checklist before closing incident

  1. Cost/request returned to baseline.
  2. Dominant endpoint share normalized.
  3. No abnormal retry ratio in the same window.
  4. Alert threshold tuned to catch recurrence earlier.

Related guides

  • Use the 15-minute checklist
  • Open compare hub
  • Compare alternatives

Evaluation resources

For security and procurement reviews, use our trust summary before final tool selection.

Open trust proof pack