Alert design
AI cost anomaly detection: practical thresholds that actually work
Most alert systems fail from noisy thresholds. Better thresholds use trend context and clear owner workflows.
Full guide: LLM budget alert policy: thresholds and escalation
What this guide answers
- What changed in cost, cost per request, or budget posture.
- Which endpoint, prompt, model, or tenant likely drove the delta.
- Which validation step or control to apply next in Opsmeter.io.
What to send (payload example)
{
"externalRequestId": "req_01HZXB6MQZ2WQ9D2KCF9M4V2QY",
"provider": "provider_id",
"model": "model_id",
"endpointTag": "billing.guardrail_check",
"promptVersion": "budget_v1",
"userId": "tenant_acme_hash",
"inputTokens": 240,
"outputTokens": 80,
"latencyMs": 892,
"status": "success",
"dataMode": "real",
"environment": "prod"
}Common mistakes
- Setting static thresholds without burn-rate checks.
- No single owner or escalation path for warning/exceeded states.
- Alerting on totals only (missing endpoint and tenant concentration context).
- Including demo/staging traffic in production spend policy decisions.
How to verify in the Opsmeter.io dashboard
- Use Overview to confirm spike window and budget posture.
- Use Top Endpoints to find feature-level concentration.
- Use Top Users to find tenant-level concentration.
- Use Prompt Versions to validate deploy-linked cost drift.
Threshold model that scales
- Budget warning threshold (example: 80 percent)
- Budget exceeded threshold (100 percent)
- Burn-rate threshold versus trailing baseline
- Endpoint concentration threshold for dominant drivers
Use this workflow
Turn diagnosis into action
Identify the cost driver, validate it with attribution, then apply one durable control before the next billing cycle.
Apply in your workspace
Re-run this workflow on your own spend data
Follow the same path from article insight to telemetry verification, then validate with your own cost signals.
Threshold templates
- Low-volume workspace: prioritize budget warning + endpoint concentration.
- Growing workspace: add burn-rate > 2-3x baseline checks.
- High-volume workspace: add promptVersion drift checks after deploy.
- Critical workspace: require owner acknowledgement on exceeded state.
Reduce false positives
- Separate demo/test traffic with dataMode and environment.
- Correlate spend jump with request-volume jump before paging.
- Mute known migration windows with short maintenance policy.
- Keep one action owner per alert channel.
Developer-friendly signals (tokens/hour and cost/request)
Budget thresholds are necessary, but they are sometimes slow to react for fast incidents. Engineers often prefer rate-based signals they can reason about quickly.
Add one tokens/hour or requests/hour check, plus cost/request drift, so a regression is visible even when absolute spend is still small.
- tokens/hour or requests/hour vs trailing baseline (detect volume bursts)
- cost/request vs baseline (detect efficiency regressions)
- endpointTag concentration change (detect one feature going wild)
- promptVersion correlation (detect deploy-linked drift)
Alerts vs spending caps (set expectations)
Some teams expect a budget system to behave like a hard spending cap on API keys. In most stacks, alerts are an operations workflow, while hard caps require runtime enforcement.
Design your thresholds around the control you actually have: alerts and playbooks first, then enforcement where it is safe for user experience.
- Define what happens at warning (human workflow) vs exceeded (incident decision).
- Attach top endpointTag + tenant/user drivers to every alert.
- Decide how to degrade safely (smaller context, shorter outputs, fewer tools).
- Only hard-block non-critical endpoints when you have clear messaging.
What to do when anomaly fires
- Classify anomaly: traffic, token, deploy, or abuse.
- Open Top Endpoints and Top Users immediately.
- Apply temporary containment and log decision.
- Convert repeated anomalies into permanent guardrails.
Related guides
Evaluation resources
For security and procurement reviews, use our trust summary before final tool selection.