Retry control
Retry storms: how retries can multiply your LLM bill
A retry storm is one of the fastest ways to inflate spend while still seeing mostly valid responses.
Full guide: Prompt deploy cost regressions: catch silent cost spikes
What this guide answers
- What changed in cost, cost per request, or budget posture.
- Which endpoint, prompt, model, or tenant likely drove the delta.
- Which validation step or control to apply next in Opsmeter.io.
What to send (payload example)
{
  "externalRequestId": "req_01HZXB6MQZ2WQ9D2KCF9M4V2QY",
  "provider": "provider_id",
  "model": "model_id",
  "endpointTag": "checkout.ai_summary",
  "promptVersion": "summary_v3",
  "userId": "tenant_acme_hash",
  "inputTokens": 540,
  "outputTokens": 180,
  "latencyMs": 892,
  "status": "success",
  "dataMode": "real",
  "environment": "prod"
}
Common mistakes
- Comparing totals only instead of cost/request and token deltas by promptVersion.
- Skipping long-tail outlier review (p95/p99) where regressions hide.
- Letting retrieval config drift (top-k/chunk overlap) without a token budget.
- Not capping output tokens on low-risk endpoints after a deploy.
How to verify in the Opsmeter.io dashboard
- Use Overview to confirm spike window and budget posture.
- Use Top Endpoints to find feature-level concentration.
- Use Top Users to find tenant-level concentration.
- Use Prompt Versions to validate deploy-linked cost drift.
How retry storms start
- Aggressive client retries on timeout
- Shared retry policy across user-facing and batch paths
- Missing jitter and max-attempt caps
- No idempotency key on retried requests
Use this workflow
Turn diagnosis into action
Identify the cost driver, validate it with attribution, then apply one durable control before the next billing cycle.
Detection signals
- Request count rises faster than successful user actions.
- Latency and timeout ratio increase together with spend.
- Same endpointTag dominates both errors and spend.
- Duplicate externalRequestId patterns appear in telemetry.
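These signals can be checked against raw telemetry. The following is a minimal sketch, assuming rows shaped like the payload example above (`externalRequestId`, `endpointTag`, `status`). The function name `retry_signals` and the report keys are hypothetical, and any status other than "success" is treated as a failed attempt.

```python
from collections import Counter, defaultdict


def retry_signals(events):
    """Summarize per-endpoint retry signals from telemetry rows.

    Assumes each row is a dict with externalRequestId, endpointTag, and status.
    """
    attempts = Counter()
    successes = Counter()
    ids_seen = defaultdict(Counter)

    for event in events:
        tag = event["endpointTag"]
        attempts[tag] += 1
        if event["status"] == "success":
            successes[tag] += 1
        ids_seen[tag][event["externalRequestId"]] += 1

    report = {}
    for tag, attempt_count in attempts.items():
        # A request ID seen more than once suggests a retried logical action.
        duplicated = sum(1 for count in ids_seen[tag].values() if count > 1)
        ok = successes[tag]
        report[tag] = {
            "attempts": attempt_count,
            "successes": ok,
            "duplicatedRequestIds": duplicated,
            # None when nothing succeeded, to avoid dividing by zero.
            "attemptsPerSuccess": attempt_count / ok if ok else None,
        }
    return report
```

An endpoint whose `attemptsPerSuccess` climbs while `duplicatedRequestIds` grows is a reasonable first candidate for containment.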
Why retries multiply cost (simple math)
Retries are a cost multiplier because the provider bills per attempt, not per successful outcome. Effective cost scales linearly with attempts-per-success: moving from 1.0 to 1.5 attempts per success raises cost per success by 50%, and reaching 2.0 doubles it.
Track attempts-per-success per endpointTag so you can contain the right feature path instead of guessing.
- attemptCost = average cost of one attempt
- attemptsPerSuccess = attempts / successful requests
- effectiveCostPerSuccess = attemptCost * attemptsPerSuccess
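As a worked example of the three definitions above, here is a small sketch. The attempt cost and request counts are illustrative numbers chosen for the arithmetic, not measurements.

```python
def effective_cost_per_success(attempt_cost, attempts, successes):
    """effectiveCostPerSuccess = attemptCost * (attempts / successes)."""
    if successes <= 0:
        raise ValueError("no successful requests in this window")
    attempts_per_success = attempts / successes
    return attempt_cost * attempts_per_success


# Illustrative inputs: the same 1,000 successful requests in both windows.
ATTEMPT_COST = 0.004  # assumed average cost of one attempt, in dollars

baseline = effective_cost_per_success(ATTEMPT_COST, attempts=1050, successes=1000)
storm = effective_cost_per_success(ATTEMPT_COST, attempts=2400, successes=1000)

# 2400 / 1050 attempts for the same successes: roughly 2.3x the baseline cost.
multiplier = storm / baseline
```

The user-visible outcome is identical in both windows; only the attempt count changed, which is why totals alone can hide the cause.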
Containment
- Cap max retries by endpoint criticality.
- Use exponential backoff with jitter.
- Introduce circuit-breaker behavior for known provider failures.
- Separate batch retry policy from interactive traffic.
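The retry-cap, backoff, and policy-separation items above could be sketched as follows. This is one possible shape, assuming a "full jitter" strategy (a uniformly random delay up to an exponentially growing ceiling). The `RETRY_POLICIES` table, its values, and the function names are hypothetical, and circuit breaking is left out for brevity.

```python
import random

# Hypothetical per-path policies; the numbers are illustrative, not recommendations.
RETRY_POLICIES = {
    "interactive": {"max_attempts": 2, "base_delay_s": 0.2, "max_delay_s": 1.0},
    "batch": {"max_attempts": 5, "base_delay_s": 1.0, "max_delay_s": 30.0},
}


def backoff_delay(retry_number, base_delay_s, max_delay_s, rng=random):
    """Full-jitter delay before retry number `retry_number` (1-based)."""
    # The ceiling doubles with each retry but never exceeds max_delay_s.
    ceiling = min(max_delay_s, base_delay_s * (2 ** (retry_number - 1)))
    return rng.uniform(0, ceiling)


def retry_delays(path, rng=random):
    """Delays that would be slept between attempts for the given traffic path."""
    policy = RETRY_POLICIES[path]
    # max_attempts includes the first call, so there are max_attempts - 1 retries.
    return [
        backoff_delay(n, policy["base_delay_s"], policy["max_delay_s"], rng)
        for n in range(1, policy["max_attempts"])
    ]
```

Keeping the interactive cap low bounds the worst-case cost of a user action, while the batch path can afford longer waits because no user is blocked on it.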
Idempotency and request IDs (avoid double-billing patterns)
- Reuse externalRequestId for one logical user action across retries.
- Track attempt number and final status so effective cost is explainable.
- Avoid layered retries (proxy + app) without one owner and one policy.
- Disable automatic retries on non-idempotent endpoints unless you can reconcile duplicates.
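A minimal sketch of the first two items: one `externalRequestId` shared by every attempt of a logical action, with an attempt counter recorded alongside it. The `send` callable, the `attempt` field, and the `call_with_retries` name are assumptions for illustration and are not part of the payload shown earlier. Backoff between attempts is omitted to keep the example short.

```python
import uuid


def call_with_retries(send, payload, max_attempts=3):
    """Retry one logical action under a single externalRequestId.

    `send` is a hypothetical callable that performs the provider call and
    returns a status string such as "success" or "error".
    """
    # Reuse the caller's ID if present; otherwise mint one for the whole action.
    request_id = payload.get("externalRequestId") or f"req_{uuid.uuid4().hex}"
    records = []

    for attempt in range(1, max_attempts + 1):
        attempt_payload = {**payload, "externalRequestId": request_id}
        status = send(attempt_payload)
        # One telemetry record per attempt keeps effective cost explainable.
        records.append(
            {"externalRequestId": request_id, "attempt": attempt, "status": status}
        )
        if status == "success":
            break

    return records
```

Grouping the resulting records by `externalRequestId` gives attempts per logical action, and the last record in each group carries the final status.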
Long-term guardrail
Track retry ratio as a cost-control metric, not only as a reliability metric.