Security operations
Abuse monitoring: prompt-injection traffic and cost-risk signals
Not every injection attempt is a security breach, but many create unnecessary token burn. Treat abuse as both a security risk and a margin risk.
Full guide: Bot attacks and LLM cost spikes: prevention playbook
Signals that usually appear together
- Unknown-user request ratio spikes.
- Single endpoint bursts with long output tokens.
- Retry volume increases after provider errors.
- Rapid promptVersion churn around exploit attempts.
Why prompt-injection traffic becomes a cost incident
Many injection attempts fail from a security perspective but still burn tokens: attackers prompt for long outputs, repeated retries, or tool-call loops.
If the endpoint is public, a small actor set can create disproportionate spend without ever authenticating as a real customer.
- Long prompts and long completions inflate token spend directly.
- Retries amplify cost (calls per action), especially during provider errors.
- Tool calls and agent loops multiply tokens across multiple requests.
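The amplification above compounds multiplicatively: retries re-send the prompt, and each agent-loop tool call is another billed request. A minimal sketch of that arithmetic, using illustrative per-token prices (not real provider pricing):

```python
# Illustrative prices only -- substitute your provider's actual rates.
PRICE_PER_1K_INPUT = 0.003   # hypothetical $/1K input tokens
PRICE_PER_1K_OUTPUT = 0.015  # hypothetical $/1K output tokens

def cost_per_action(input_tokens, output_tokens, retries=0, tool_calls=1):
    """Effective cost of one user-visible action.

    retries: extra provider calls caused by errors (each re-sends the prompt).
    tool_calls: LLM calls made in an agent loop for this one action.
    """
    calls = (1 + retries) * tool_calls
    per_call = (input_tokens / 1000) * PRICE_PER_1K_INPUT \
             + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT
    return calls * per_call

baseline = cost_per_action(260, 190)  # one clean call, payload-sized tokens
abusive = cost_per_action(2600, 1900, retries=3, tool_calls=5)
print(f"amplification: {abusive / baseline:.0f}x")  # prints "amplification: 200x"
```

Ten times the tokens, four calls per action, and a five-step tool loop turn one request's worth of spend into 200x, which is why a small actor set can dominate an endpoint's bill.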
Response pattern
- Throttle suspicious API keys or routes.
- Tighten max-token settings on vulnerable features.
- Review ingress logs with security owner and product owner.
- Document policy updates in incident notes.
Detection heuristics (add these to your dashboards)
- Unknown-user ratio and unknown-user spend share per endpointTag.
- tokens/request (input + output) drift by endpointTag and promptVersion.
- Error rate + retry multiplier (same externalRequestId repeating).
- Tenant/user concentration shift (top 1 and top 5 share).
- Outlier sampling: top 20 highest-token requests per day.
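The first two heuristics can be computed directly from request logs. A minimal sketch, assuming records shaped like the payload example in this guide and an `anon_` prefix convention for unknown users (the prefix is an assumption, not a fixed schema rule):

```python
from collections import defaultdict

def endpoint_stats(requests):
    """Per-endpointTag tokens/request and unknown-user spend share."""
    stats = defaultdict(lambda: {"tokens": 0, "count": 0, "unknown_tokens": 0})
    for r in requests:
        s = stats[r["endpointTag"]]
        tokens = r["inputTokens"] + r["outputTokens"]
        s["tokens"] += tokens
        s["count"] += 1
        # Assumed convention: unauthenticated identities carry an "anon_" prefix.
        if r["userId"].startswith("anon_"):
            s["unknown_tokens"] += tokens
    return {
        tag: {
            "tokens_per_request": s["tokens"] / s["count"],
            "unknown_spend_share": s["unknown_tokens"] / s["tokens"],
        }
        for tag, s in stats.items()
    }
```

Comparing these two numbers in the spike window against a rolling baseline surfaces the drift before totals move.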
Controls that reduce prompt-injection cost impact
- Cap output tokens for public-facing endpoints.
- Block repeated high-token prompts from low-trust identities.
- Rate-limit by endpointTag (not just globally) to protect expensive routes.
- Separate demo/test traffic from real so anomaly baselines stay clean.
- Alert on unknown-user concentration and sudden token-per-request drift.
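EndpointTag-scoped rate limiting can be as simple as a token bucket keyed by route. A sketch, with illustrative per-route limits you would tune to each endpoint's cost profile:

```python
import time
from collections import defaultdict

class EndpointRateLimiter:
    """Token-bucket limiter keyed by endpointTag, not global."""

    def __init__(self, limits):
        # endpointTag -> (bucket capacity, refill tokens per second)
        self.limits = limits
        self.state = defaultdict(dict)

    def allow(self, endpoint_tag, now=None):
        now = time.monotonic() if now is None else now
        capacity, rate = self.limits.get(endpoint_tag, (float("inf"), 0))
        s = self.state[endpoint_tag]
        tokens = s.get("tokens", capacity)
        last = s.get("last", now)
        # Refill proportionally to elapsed time, capped at capacity.
        tokens = min(capacity, tokens + (now - last) * rate)
        if tokens < 1:
            s.update(tokens=tokens, last=now)
            return False
        s.update(tokens=tokens - 1, last=now)
        return True

# Expensive public route gets a tight bucket; internal routes stay generous.
limiter = EndpointRateLimiter({"public.chat": (2, 0.5)})
```

Scoping the bucket by route means a burst against `public.chat` cannot consume headroom reserved for authenticated, higher-trust endpoints.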
Preventative design (reduce blast radius)
- Require authentication for expensive endpoints; keep public endpoints low-cost by design.
- Gate tool calls behind allowlists and strict schemas (avoid unconstrained tool loops).
- Use degraded mode on public routes: shorter outputs, fewer tools, smaller context.
- Version prompt policies (promptVersion) so changes are attributable during incidents.
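One way to combine the last two points is to attach a request policy to each promptVersion and pin unauthenticated traffic to the degraded public profile. A sketch with hypothetical version names and limits:

```python
# Hypothetical promptVersion-keyed policies; all values are illustrative.
POLICIES = {
    "internal_v3": {"max_output_tokens": 2048, "tools": ["search", "db"], "max_context": 32_000},
    "public_v1":   {"max_output_tokens": 256,  "tools": [],               "max_context": 4_000},
}

def policy_for(prompt_version, authenticated):
    """Resolve the effective policy; public traffic is always degraded."""
    # Unauthenticated callers cannot opt into richer policies,
    # regardless of what the client requests.
    if not authenticated:
        return "public_v1", POLICIES["public_v1"]
    return prompt_version, POLICIES[prompt_version]
```

Because the resolved promptVersion is logged with every request, any policy change shows up in the telemetry and stays attributable during an incident.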
What to send (payload example)
{
"externalRequestId": "req_01HZXB6MQZ2WQ9D2KCF9M4V2QY",
"provider": "provider_id",
"model": "model_id",
"endpointTag": "public.chat",
"promptVersion": "public_v1",
"userId": "anon_ip_hash",
"inputTokens": 260,
"outputTokens": 190,
"latencyMs": 892,
"status": "success",
"dataMode": "real",
"environment": "prod"
}

Common mistakes
- Treating abuse as purely a security issue and ignoring cost-risk signals.
- Relying on global rate limits instead of endpointTag-scoped protection.
- Not separating demo/test traffic from real traffic, then tuning thresholds on noisy baselines.
- Only monitoring totals instead of concentration and outlier patterns.
- Missing externalRequestId, so retries and loops cannot be measured.
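The last point is measurable only if externalRequestId is present: repeats of the same id are retries of one action. A minimal sketch of the retry multiplier (calls per action) over logged requests:

```python
from collections import Counter

def retry_multiplier(requests):
    """Average provider calls per user-visible action.

    Assumes retries of one action reuse the same externalRequestId,
    as in the payload schema above.
    """
    counts = Counter(r["externalRequestId"] for r in requests)
    actions = len(counts)
    calls = sum(counts.values())
    return calls / actions if actions else 1.0
```

A multiplier drifting above ~1.1 during a provider incident is an early signal that retries, not organic traffic, are driving spend.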
How to verify in Opsmeter Dashboard
- In Top Endpoints, find the public endpointTag with the largest spend delta.
- In Top Users, check unknown-user concentration and top tenant share changes.
- Compare tokens/request and error rate in the spike window vs baseline.
- Sample the highest-token requests and confirm injection/retry/tool-loop signatures.
- Apply endpoint-scoped caps and throttles, then verify cost/request returns to baseline.
Evaluation resources
For security and procurement reviews, use our trust summary before final tool selection.