LLM budget alert cooldown and dedupe: stop notification noise
Alert quality determines response quality. Use cooldown and dedupe to reduce noise and keep escalation reliable.
Full guide: LLM budget alert policy: thresholds and escalation
Why alert noise is expensive
Repeated notifications for the same incident reduce trust and delay containment.
Cooldown and dedupe preserve signal quality so teams act quickly on real spend risk.
Default policy baseline
- Cooldown default: 300 seconds (5 minutes)
- Dedupe by incident key (scope + metric + threshold window)
- Severity transitions allowed: warning -> critical
- Every alert links to its investigation window and dominant cost driver
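As a rough illustration of how the baseline above could fit together, the following Python sketch gates delivery on an incident key and a cooldown. The `AlertGate` class, its method names, and the exact meaning given to the `cooldownApplied` and `dedupeApplied` flags are assumptions made for this example, not a description of any actual implementation.

```python
from dataclasses import dataclass, field

COOLDOWN_SECONDS = 300  # 5-minute default from the baseline
SEVERITY_RANK = {"warning": 1, "critical": 2}  # only warning -> critical escalates


@dataclass
class AlertGate:
    """Hypothetical helper: remembers the last delivered alert per incident key."""
    cooldown_seconds: int = COOLDOWN_SECONDS
    last_sent: dict = field(default_factory=dict)  # key -> (sent_at, severity)

    def evaluate(self, scope, metric, window, severity, now):
        # Incident key: scope + metric + threshold window.
        key = (scope, metric, window)
        previous = self.last_sent.get(key)
        if previous is None:
            dedupe, cooldown = False, False
        else:
            sent_at, previous_severity = previous
            escalated = SEVERITY_RANK[severity] > SEVERITY_RANK[previous_severity]
            in_cooldown = now - sent_at < self.cooldown_seconds
            dedupe = True  # assumption: flag means "folded into an existing incident"
            cooldown = in_cooldown and not escalated  # escalations bypass cooldown
        deliver = not cooldown
        if deliver:
            self.last_sent[key] = (now, severity)
        return {
            "deliver": deliver,
            "cooldownApplied": cooldown,
            "dedupeApplied": dedupe,
        }
```

In this sketch a repeated warning inside the cooldown is suppressed, while an escalation to critical on the same incident key is delivered immediately. Timestamps are passed in as `now` (seconds) so the behavior is easy to test.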
Tuning strategy by workspace maturity
- Start with default cooldown and review false-positive rate weekly.
- Increase cooldown for noisy low-value endpoints.
- Lower cooldown only when incident ownership is mature.
- Keep warning and exceeded state transitions visible even when dedupe is active.
What to validate in the alerts inbox
- the cooldownApplied and dedupeApplied flags are shown
- the investigation range opens with the correct baseline
- alert type and threshold match the policy configuration
- delivery mode aligns with team cadence (immediate/daily/weekly)
What to alert on
- burn-rate acceleration vs baseline
- endpointTag concentration changes in short windows
- unexpected tenant concentration in Top Users
- budget warning, spend-alert, and exceeded state transitions
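The first two signals in this list can be expressed as simple ratios. The sketch below is one possible way to compute them; the function names, the hourly normalization, and the sample numbers are assumptions for illustration only.

```python
def burn_rate_ratio(recent_spend, recent_hours, baseline_spend, baseline_hours):
    """Recent hourly spend divided by baseline hourly spend (1.0 means no change)."""
    baseline_rate = baseline_spend / baseline_hours
    if baseline_rate == 0:
        # No baseline spend: any recent spend counts as unbounded acceleration.
        return float("inf") if recent_spend > 0 else 0.0
    return (recent_spend / recent_hours) / baseline_rate


def top_share(spend_by_key):
    """Return (key, share) for the largest contributor, e.g. keyed by endpointTag."""
    total = sum(spend_by_key.values())
    if total == 0:
        return None, 0.0
    key = max(spend_by_key, key=spend_by_key.get)
    return key, spend_by_key[key] / total


# Example: 12.0 spent in the last hour against 96.0 over the previous 24 hours.
ratio = burn_rate_ratio(12.0, 1, 96.0, 24)
tag, share = top_share({"chat": 6.0, "summarize": 3.0, "search": 1.0})
```

Comparing `share` across two adjacent short windows gives a concentration change; the same helper applies to per-tenant spend.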
Execution checklist
- Confirm the alert is real: check dataMode, environment, and time window.
- Identify dominant endpointTag and tenant/user contributors.
- Contain: cap output, lower max tokens, or throttle non-critical paths.
- Assign one incident owner and one communication channel.
- Update policy thresholds or ownership to prevent repeat incidents.
FAQ
Is userId required?
No. userId is optional but recommended for tenant-level attribution. If raw user identifiers are sensitive, send a hashed identifier instead.
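One common way to produce a hashed identifier is a keyed hash, so the same user always maps to the same value without exposing the raw ID. This is a minimal sketch using Python's standard library; the function name and the idea of a per-workspace secret are assumptions, not a requirement stated in this guide.

```python
import hashlib
import hmac


def hashed_user_id(raw_user_id: str, secret: bytes) -> str:
    """Return a stable HMAC-SHA256 hex digest to send in place of the raw userId."""
    return hmac.new(secret, raw_user_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical usage: keep the secret out of source control.
user_id = hashed_user_id("customer-1842", secret=b"replace-with-a-real-secret")
```

A keyed hash is used here rather than a bare SHA-256 because plain hashes of guessable identifiers (emails, sequential IDs) can be reversed by enumeration.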
Where should token usage values come from?
Prefer the usage fields returned by the provider. If they are unavailable, fall back to tokenizer estimates and flag those values as estimates in your workflow.
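The fallback order could look like the sketch below. The field names `inputTokens`, `outputTokens`, and `estimated` are hypothetical, and the four-characters-per-token heuristic is only a crude stand-in for a real tokenizer.

```python
import math

CHARS_PER_TOKEN = 4  # crude assumption; replace with a real tokenizer where possible


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def resolve_usage(provider_usage, prompt_text, completion_text):
    """Prefer provider-reported counts; otherwise estimate and mark the result."""
    if provider_usage and {"inputTokens", "outputTokens"} <= provider_usage.keys():
        return {
            "inputTokens": provider_usage["inputTokens"],
            "outputTokens": provider_usage["outputTokens"],
            "estimated": False,
        }
    return {
        "inputTokens": estimate_tokens(prompt_text),
        "outputTokens": estimate_tokens(completion_text),
        "estimated": True,  # downstream reports can treat these values as uncertain
    }
```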
How should retries be handled?
Keep the same externalRequestId for the same logical request so idempotency remains stable across retries.
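The sketch below shows the idea: the ID is generated once when the event is created, and every retry resends the same event. The helper names, the `inputTokens`/`outputTokens` fields, and the in-memory receiver are hypothetical stand-ins used to make the example runnable.

```python
import uuid


def new_usage_event(input_tokens, output_tokens):
    """Create the event once per logical request; the ID is never regenerated."""
    return {
        "externalRequestId": str(uuid.uuid4()),
        "inputTokens": input_tokens,
        "outputTokens": output_tokens,
    }


def send_with_retries(send, event, max_attempts=3):
    """Resend the same event object, so every attempt carries the same ID."""
    for attempt in range(1, max_attempts + 1):
        try:
            send(event)
            return attempt  # number of attempts it took
        except ConnectionError:
            if attempt == max_attempts:
                raise


class InMemoryReceiver:
    """Stand-in for a backend that treats externalRequestId as an idempotency key."""

    def __init__(self):
        self.events = {}

    def ingest(self, event):
        # A repeated ID overwrites the earlier record instead of adding a new one.
        self.events[event["externalRequestId"]] = event
```

If the ID were generated inside the retry loop instead, a lost acknowledgement would cause the same request to be counted twice.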
Can telemetry break production flow?
It should not. Use short timeouts, catch errors, and keep telemetry asynchronous so that a telemetry failure never blocks or fails a provider call.
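A minimal sketch of that pattern in Python follows, using a small thread pool so the caller never waits on telemetry. The URL is a placeholder, and `emit_async`, the 2-second timeout, and the pool size are assumptions chosen for illustration.

```python
import json
import logging
import urllib.request
from concurrent.futures import ThreadPoolExecutor

logger = logging.getLogger("telemetry")
_executor = ThreadPoolExecutor(max_workers=2)  # small pool, separate from request handling

TELEMETRY_URL = "https://telemetry.example.invalid/events"  # placeholder endpoint


def post_event(event, url=TELEMETRY_URL, timeout=2.0):
    """Blocking HTTP POST with a short timeout; only ever run on the worker pool."""
    request = urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout):
        pass


def _safe_send(send, event):
    try:
        send(event)
    except Exception:
        # Deliberately broad: a telemetry failure is logged and never re-raised.
        logger.warning("telemetry send failed", exc_info=True)


def emit_async(event, send=post_event):
    """Queue the event and return immediately; the provider call is not blocked."""
    return _executor.submit(_safe_send, send, event)
```

The caller invokes `emit_async(event)` after the provider call returns and ignores the resulting future. Passing a different `send` function makes the path easy to test without network access.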