Opsmeter
AI Cost & Inference Control

Abuse protection

Bot abuse on LLM endpoints: stop fraudulent spend fast

Bot abuse usually shows up as a concentration anomaly: a small set of actors or a single endpoint accounts for most of the spend. Quickly identifying who is driving the traffic and where it lands is the key control.

Security, Abuse, Cost spikes

Full guide: Bot attacks and LLM cost spikes: prevention playbook

Abuse indicators

  • Traffic burst with low identity diversity
  • High cost concentration on one endpointTag
  • Abnormal request cadence from a narrow actor set
  • Rapid spend increase without related product events
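
The first two indicators can be turned into numbers. The following is a minimal sketch, assuming request logs are available as records with an identity, an endpointTag, and a cost; the `RequestRecord` type, its field names, and the sample tags are hypothetical and not part of any Opsmeter API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestRecord:
    # Hypothetical log shape: adapt the field names to your own telemetry.
    identity: str       # userId, tenantId, or anon hash
    endpoint_tag: str
    cost_usd: float


def identity_diversity(records):
    """Distinct identities divided by request count (1.0 = every request from a different actor)."""
    if not records:
        return 1.0
    return len({r.identity for r in records}) / len(records)


def top_cost_share(records, key):
    """Return (label, share) for the label that carries the largest share of total cost."""
    totals = Counter()
    for r in records:
        totals[key(r)] += r.cost_usd
    total = sum(totals.values())
    if total == 0:
        return None, 0.0
    label, cost = totals.most_common(1)[0]
    return label, cost / total


# Example window: one actor hammering a public chat route, plus two normal users.
window = [RequestRecord("bot-1", "chat-public", 0.5) for _ in range(8)]
window += [RequestRecord("u1", "search", 0.1), RequestRecord("u2", "search", 0.1)]

diversity = identity_diversity(window)
top_tag, tag_share = top_cost_share(window, key=lambda r: r.endpoint_tag)
top_actor, actor_share = top_cost_share(window, key=lambda r: r.identity)
```

A low diversity value together with a high cost share for one endpointTag or one identity is the pattern the list above describes. The thresholds that count as "low" and "high" depend on normal traffic and are left to the reader.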

Common bot abuse patterns (what it looks like)

  • Scraping and prompt injection attempts on public chat endpoints.
  • Credential stuffing / token theft leading to authenticated abuse.
  • Key leaks from client-side code or logs (sudden spend from unknown sources).
  • Enumeration attacks that probe expensive endpoints repeatedly.
  • Retry amplification (bots trigger timeouts and multiply attempts).

Response workflow

  1. Isolate suspicious endpoints and actor patterns.
  2. Apply temporary limits and stricter auth checks.
  3. Route non-critical calls through a lower-cost model tier.
  4. Watch the post-containment spend trend to confirm that containment worked.
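
Steps 3 and 4 can be sketched as two small helpers. This is an illustration under stated assumptions, not a prescribed implementation: the model names, the `critical_tags` set, and the per-minute spend samples are all hypothetical.

```python
def choose_model(endpoint_tag, containment_active, critical_tags,
                 default_model="large-model", fallback_model="small-model"):
    """During containment, only endpointTags marked critical keep the default tier."""
    if containment_active and endpoint_tag not in critical_tags:
        return fallback_model
    return default_model


def spend_reduction(before_usd_per_min, after_usd_per_min):
    """Fractional drop in mean spend per minute after containment (0.8 = 80% lower)."""
    if not before_usd_per_min or not after_usd_per_min:
        raise ValueError("need at least one sample on each side of containment")
    before = sum(before_usd_per_min) / len(before_usd_per_min)
    after = sum(after_usd_per_min) / len(after_usd_per_min)
    if before == 0:
        return 0.0
    return (before - after) / before


# Hypothetical tags: one critical path, one batch job that can tolerate a cheaper tier.
critical = {"checkout-assistant"}
batch_model = choose_model("summarize-batch", True, critical)
critical_model = choose_model("checkout-assistant", True, critical)
drop = spend_reduction([10.0, 12.0, 14.0], [2.0, 3.0, 1.0])
```

If the reduction stays near zero after limits are applied, the dominant driver has probably not been isolated yet and step 1 should be revisited.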

Rate limits and guardrails that work in practice

Rate limiting is most effective when it is scoped by endpointTag and identity, not only by IP. Expensive endpoints need stricter limits than cheap ones.

Add token-based limits (tokens/minute) for endpoints where a single request can be very expensive.

  • Per-endpointTag limits (protect high-cost routes).
  • Per-identity limits (userId/tenantId or anon hash) to stop single-actor drain.
  • Tokens/minute caps for public endpoints (prevents long-output abuse).
  • Burst protection: short-window spike detection + temporary throttle.

Prevention

  • Use per-endpoint thresholds and alerts.
  • Segment test/demo traffic from real traffic.
  • Review top user/tenant concentration regularly.

Key security basics (prevents the worst incidents)

  • Never ship provider keys to the client (server-side only).
  • Add secret scanning in CI and rotate keys on a schedule.
  • Separate keys by environment (prod vs staging) to contain blast radius.
  • Use least-privilege credentials and monitor for unusual key usage.
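
Separating keys by environment can be enforced at load time on the server. The following is a small sketch, assuming keys are provided through environment variables; the variable naming scheme `LLM_PROVIDER_KEY_<ENV>` and the sample key value are hypothetical.

```python
import os

ALLOWED_ENVIRONMENTS = ("prod", "staging")


def load_provider_key(environment, environ=None):
    """Read the provider key for exactly one environment; never fall back to another one."""
    env = os.environ if environ is None else environ
    if environment not in ALLOWED_ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    var_name = f"LLM_PROVIDER_KEY_{environment.upper()}"  # hypothetical naming scheme
    key = env.get(var_name)
    if not key:
        raise RuntimeError(f"{var_name} is not set")
    return key


# Example with an explicit mapping instead of the real process environment.
staging_only = {"LLM_PROVIDER_KEY_STAGING": "sk-staging-example"}
staging_key = load_provider_key("staging", staging_only)
```

Failing loudly when the key for the requested environment is missing avoids a silent fallback to the production key, which would defeat the blast-radius separation.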

Important note

No single rule catches all abuse. Combine concentration metrics, threshold alerts, and incident runbooks.

What to alert on

  • Request burst with low identity diversity
  • Token-per-request surge without feature traffic growth
  • Retry ratio increase that an upstream outage does not explain
  • New high-cost endpointTag suddenly dominating spend
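
The four alert conditions can be expressed as comparisons between a baseline window and the current window. This is a sketch only: the `WindowStats` shape, the alert names, and every default threshold are assumptions chosen for illustration, and the check against product events or upstream outages is left to whoever reviews the alert.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WindowStats:
    # Hypothetical aggregate for one time window.
    requests: int
    distinct_identities: int
    total_tokens: int
    retries: int
    cost_by_endpoint_tag: dict = field(default_factory=dict)


def _ratio(numerator, denominator):
    return numerator / denominator if denominator else 0.0


def _top_share(cost_by_tag):
    total = sum(cost_by_tag.values())
    if not cost_by_tag or total == 0:
        return None, 0.0
    tag = max(cost_by_tag, key=cost_by_tag.get)
    return tag, cost_by_tag[tag] / total


def evaluate_alerts(baseline, current, *, burst_factor=3.0, max_diversity=0.2,
                    token_surge_factor=2.0, retry_ratio_jump=0.1,
                    dominance_share=0.5, new_tag_max_share=0.1):
    """Return the names of the alert conditions that the current window triggers."""
    alerts = []

    # 1. Request burst with low identity diversity.
    diversity = _ratio(current.distinct_identities, current.requests)
    if (current.requests > 0
            and current.requests >= burst_factor * baseline.requests
            and diversity < max_diversity):
        alerts.append("burst_low_diversity")

    # 2. Tokens per request well above the baseline.
    base_tpr = _ratio(baseline.total_tokens, baseline.requests)
    cur_tpr = _ratio(current.total_tokens, current.requests)
    if base_tpr > 0 and cur_tpr >= token_surge_factor * base_tpr:
        alerts.append("token_per_request_surge")

    # 3. Retry ratio jumped by at least retry_ratio_jump (absolute).
    retry_delta = (_ratio(current.retries, current.requests)
                   - _ratio(baseline.retries, baseline.requests))
    if retry_delta >= retry_ratio_jump:
        alerts.append("retry_ratio_increase")

    # 4. An endpointTag that was minor in the baseline now dominates spend.
    tag, share = _top_share(current.cost_by_endpoint_tag)
    base_total = sum(baseline.cost_by_endpoint_tag.values())
    base_share = _ratio(baseline.cost_by_endpoint_tag.get(tag, 0.0), base_total)
    if tag is not None and share >= dominance_share and base_share < new_tag_max_share:
        alerts.append("new_dominant_endpoint_tag")

    return alerts


baseline = WindowStats(100, 80, 50_000, 2, {"search": 5.0, "chat-public": 0.2})
current = WindowStats(500, 10, 1_000_000, 100, {"search": 5.0, "chat-public": 40.0})
fired = evaluate_alerts(baseline, current)
```

Keeping each condition as a separate named alert makes it easier to record in the incident notes which signal fired first.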

Execution checklist

  1. Confirm abuse signal: burst, key leak, prompt injection, or scraping.
  2. Rotate compromised keys and block abusive sources immediately.
  3. Apply per-endpoint rate limits and output caps to contain spend.
  4. Document dominant endpointTag, tenant/user concentration, and time window.
  5. Convert the incident into one permanent guardrail update.

FAQ

What is the fastest signal that bot abuse is causing spend?

Low identity diversity combined with a sudden endpointTag concentration change is the fastest indicator. If one endpointTag and a narrow actor set dominate spend in a short window, treat it as an abuse incident until proven otherwise.

Should we block traffic immediately when we detect abuse?

Containment should be immediate for public or non-critical endpoints. For critical paths, start with throttles and degraded mode, then escalate to blocks once the dominant driver is confirmed.

How do we prevent key-leak incidents from repeating?

Move keys server-side, enable secret scanning, rotate keys regularly, and monitor key-level usage anomalies. Treat leaked keys as a security incident with a written postmortem and permanent controls.

Related guides

  • Open AI cost spike page
  • Start free
  • Compare alternatives

Evaluation resources

For security and procurement reviews, use our trust summary before final tool selection.

Open trust proof pack