Abuse protection
Bot abuse on LLM endpoints: stop fraudulent spend fast
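Bot abuse usually shows up as a concentration anomaly: a small set of actors or a single endpoint suddenly accounts for most of the traffic or spend. The key control is identifying quickly who is generating the traffic and where it lands.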
Full guide: Bot attacks and LLM cost spikes: prevention playbook
Abuse indicators
- Traffic burst with low identity diversity
- High cost concentration on one endpointTag
- Abnormal request cadence from a narrow actor set
- Rapid spend increase without related product events
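The first two indicators can be computed directly from request logs. The following is a minimal sketch, assuming each logged event is a dict with the hypothetical keys `identity`, `endpointTag`, and `cost`; the field names and the function itself are illustrative and not part of any specific product API.

```python
from collections import Counter


def concentration_metrics(events):
    """Summarize identity diversity and endpoint cost concentration.

    `events` is a list of dicts with the assumed keys
    'identity', 'endpointTag', and 'cost' (all hypothetical names).
    """
    if not events:
        return {
            "identity_diversity": 0.0,
            "top_endpoint": None,
            "top_endpoint_cost_share": 0.0,
        }

    # Distinct identities per request: values near 0 mean few actors
    # are responsible for many requests.
    identities = {e["identity"] for e in events}

    # Total cost per endpointTag.
    cost_by_endpoint = Counter()
    for e in events:
        cost_by_endpoint[e["endpointTag"]] += e["cost"]

    total_cost = sum(cost_by_endpoint.values())
    top_endpoint, top_cost = cost_by_endpoint.most_common(1)[0]

    return {
        "identity_diversity": len(identities) / len(events),
        "top_endpoint": top_endpoint,
        "top_endpoint_cost_share": top_cost / total_cost if total_cost else 0.0,
    }
```

Run over a short window (for example, the last few minutes), a low `identity_diversity` together with a high `top_endpoint_cost_share` matches the pattern described above. What counts as "low" or "high" depends on your normal traffic, so compare against a baseline window rather than a fixed number.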
Common bot abuse patterns (what it looks like)
- Scraping and prompt injection attempts on public chat endpoints.
- Credential stuffing / token theft leading to authenticated abuse.
- Key leaks from client-side code or logs (sudden spend from unknown sources).
- Enumeration attacks that probe expensive endpoints repeatedly.
- Retry amplification (bots trigger timeouts and multiply attempts).
Response workflow
- Isolate suspicious endpoints and actor patterns.
- Apply temporary limits and stricter auth checks.
- Route non-critical calls through a lower-cost model tier.
- Watch the spend trend after containment to confirm that the measures worked.
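The tier-routing step can be as simple as a lookup performed before each model call. The sketch below assumes you maintain two sets yourself: endpoints currently under containment and endpoints you consider critical. The tier names `standard` and `economy` are placeholders, not real model identifiers.

```python
def choose_model_tier(
    endpoint_tag,
    contained_endpoints,
    critical_endpoints,
    default_tier="standard",
    low_cost_tier="economy",
):
    """Pick a model tier for a request during an incident.

    Non-critical endpoints that are under containment are downgraded;
    everything else keeps the default tier. All names are hypothetical.
    """
    under_containment = endpoint_tag in contained_endpoints
    is_critical = endpoint_tag in critical_endpoints
    if under_containment and not is_critical:
        return low_cost_tier
    return default_tier
```

Keeping the containment set as data (rather than code) makes it possible to add or remove an endpointTag during an incident without a deploy.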
Rate limits and guardrails that work in practice
Rate limiting is most effective when it is scoped by endpointTag and identity, not only by IP. Expensive endpoints need stricter limits than cheap ones.
Add token-based limits (tokens/minute) for endpoints where a single request can be very expensive.
- Per-endpointTag limits (protect high-cost routes).
- Per-identity limits (userId/tenantId or anon hash) to stop single-actor drain.
- Tokens/minute caps for public endpoints (prevents long-output abuse).
- Burst protection: short-window spike detection + temporary throttle.
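A tokens/minute cap scoped by endpointTag and identity can be sketched as a token bucket whose capacity is measured in LLM tokens rather than requests. This is an in-memory illustration under stated assumptions: the class name, the per-endpoint limits, and the injectable `clock` are all hypothetical, and a production setup would normally keep this state in a shared store.

```python
import time


class TokenBudgetLimiter:
    """Tokens-per-minute limiter keyed by (endpointTag, identity)."""

    def __init__(self, tokens_per_minute_by_endpoint, default_tokens_per_minute,
                 clock=time.monotonic):
        self.limits = tokens_per_minute_by_endpoint
        self.default = default_tokens_per_minute
        self.clock = clock
        # (endpointTag, identity) -> (remaining tokens, time of last update)
        self._buckets = {}

    def allow(self, endpoint_tag, identity, requested_tokens):
        """Return True and deduct the budget if the request fits, else False."""
        capacity = self.limits.get(endpoint_tag, self.default)
        now = self.clock()
        key = (endpoint_tag, identity)
        remaining, last = self._buckets.get(key, (capacity, now))

        # Refill in proportion to elapsed time, never above capacity.
        remaining = min(capacity, remaining + (now - last) * capacity / 60.0)

        if requested_tokens > remaining:
            self._buckets[key] = (remaining, now)
            return False

        self._buckets[key] = (remaining - requested_tokens, now)
        return True
```

A possible usage: call `allow(endpointTag, identity, estimated_tokens)` before dispatching a request, where `identity` is the userId, tenantId, or anonymous hash, and the estimate includes the maximum output length. Because the bucket starts full, short bursts are tolerated up to one minute's budget; a separate short-window detector is still needed for the burst-protection item above.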
Prevention
- Use per-endpoint thresholds and alerts.
- Segment test/demo traffic from real traffic.
- Review top user/tenant concentration regularly.
Key security basics (prevents the worst incidents)
- Never ship provider keys to the client (server-side only).
- Add secret scanning in CI and rotate keys on a schedule.
- Separate keys by environment (prod vs staging) to contain blast radius.
- Use least-privilege credentials and monitor for unusual key usage.
Important note
No single rule catches all abuse. Combine concentration metrics, threshold alerts, and incident runbooks.
What to alert on
- Request burst with low identity diversity
- Tokens-per-request surge without feature traffic growth
- Retry ratio increase without an upstream outage explanation
- New high-cost endpointTag suddenly dominating spend
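These four alerts can be expressed as comparisons between a current window and a baseline window. The sketch below assumes each window is a dict with the hypothetical keys `requests`, `unique_identities`, `tokens`, `retries`, and `cost_by_endpoint`; the threshold values are placeholders to tune against your own traffic, not recommendations.

```python
# Placeholder thresholds (assumptions, to be tuned per workload).
DEFAULT_THRESHOLDS = {
    "burst_factor": 3.0,               # current requests vs baseline
    "max_identity_diversity": 0.05,    # unique identities per request
    "tokens_per_request_factor": 2.0,  # current vs baseline tokens/request
    "retry_ratio_factor": 2.0,         # current vs baseline retries/request
    "dominant_cost_share": 0.5,        # share of current spend
    "baseline_share_ceiling": 0.1,     # share of baseline spend
}


def _ratio(numerator, denominator):
    return numerator / denominator if denominator else 0.0


def evaluate_alerts(current, baseline, thresholds=DEFAULT_THRESHOLDS):
    """Return the names of the alert rules that fire for `current`."""
    t = thresholds
    alerts = []

    # 1. Request burst with low identity diversity.
    diversity = _ratio(current["unique_identities"], current["requests"])
    if (current["requests"] > 0
            and current["requests"] >= t["burst_factor"] * baseline["requests"]
            and diversity <= t["max_identity_diversity"]):
        alerts.append("burst_low_identity_diversity")

    # 2. Tokens-per-request surge.
    cur_tpr = _ratio(current["tokens"], current["requests"])
    base_tpr = _ratio(baseline["tokens"], baseline["requests"])
    if cur_tpr > 0 and cur_tpr >= t["tokens_per_request_factor"] * base_tpr:
        alerts.append("tokens_per_request_surge")

    # 3. Retry ratio increase.
    cur_retry = _ratio(current["retries"], current["requests"])
    base_retry = _ratio(baseline["retries"], baseline["requests"])
    if cur_retry > 0 and cur_retry >= t["retry_ratio_factor"] * base_retry:
        alerts.append("retry_ratio_increase")

    # 4. An endpointTag that was minor or absent now dominates spend.
    cur_total = sum(current["cost_by_endpoint"].values())
    base_total = sum(baseline["cost_by_endpoint"].values())
    for tag, cost in current["cost_by_endpoint"].items():
        cur_share = _ratio(cost, cur_total)
        base_share = _ratio(baseline["cost_by_endpoint"].get(tag, 0.0), base_total)
        if (cur_share >= t["dominant_cost_share"]
                and base_share <= t["baseline_share_ceiling"]):
            alerts.append(f"new_dominant_endpoint:{tag}")

    return alerts
```

The code only detects the numeric change. Whether a surge lacks "feature traffic growth" or an "upstream outage explanation" still has to be checked against release notes and provider status by whoever handles the alert.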
Execution checklist
- Confirm the abuse signal: burst, key leak, prompt injection, or scraping.
- Rotate compromised keys and block abusive sources immediately.
- Apply per-endpoint rate limits and output caps to contain spend.
- Document dominant endpointTag, tenant/user concentration, and time window.
- Convert the incident into one permanent guardrail update.
FAQ
What is the fastest signal that bot abuse is causing spend?
Low identity diversity combined with a sudden endpointTag concentration change is the fastest indicator. If one endpointTag and a narrow actor set dominate spend in a short window, treat it as an abuse incident until proven otherwise.
Should we block traffic immediately when we detect abuse?
Containment should be immediate for public or non-critical endpoints. For critical paths, start with throttles and degraded mode, then escalate to blocks once the dominant driver is confirmed.
How do we prevent key-leak incidents from repeating?
Move keys server-side, enable secret scanning, rotate keys regularly, and monitor key-level usage anomalies. Treat leaked keys as a security incident with a written postmortem and permanent controls.
Evaluation resources
For security and procurement reviews, use our trust summary before final tool selection.