Unit economics
LLM cost per user: a practical guide to tracking and allocation
Cost per tenant is strategic, but cost per user is often the fastest way to identify skew, abuse, and pricing mismatches.
Full guide: LLM cost attribution: endpoint, prompt version, tenant, and user
When to track per user versus per tenant
- Use per user to catch heavy spenders and abuse patterns early.
- Use per tenant for pricing and contract-level margin decisions.
- Use both when one tenant includes many usage personas.
Implementation model
- Map userId to tenantId in your internal analytics layer.
- Tag each request with endpointTag and promptVersion.
- Compute spend per user and per tenant on fixed intervals.
- Review top-spender concentration before the monthly billing cycle closes.
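The steps above can be sketched as a simple aggregation. This is a minimal illustration, assuming request records carry `userId` and a per-request `costUsd` field (illustrative names, not a fixed schema), with the userId-to-tenantId mapping held in your analytics layer:

```python
from collections import defaultdict

# Sketch: aggregate spend per user and per tenant on a fixed interval.
# Field names (userId, costUsd) are illustrative, not a required schema.
def aggregate_spend(requests, user_to_tenant):
    per_user = defaultdict(float)
    per_tenant = defaultdict(float)
    for r in requests:
        user = r["userId"]
        per_user[user] += r["costUsd"]
        # Unmapped users go to an explicit bucket instead of vanishing.
        per_tenant[user_to_tenant.get(user, "unknown")] += r["costUsd"]
    return dict(per_user), dict(per_tenant)

requests = [
    {"userId": "u1", "costUsd": 0.12},
    {"userId": "u2", "costUsd": 0.03},
    {"userId": "u1", "costUsd": 0.05},
]
per_user, per_tenant = aggregate_spend(requests, {"u1": "acme", "u2": "acme"})
```

Running this on a fixed interval (hourly or daily) gives you the series that concentration reviews are built on.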
Per-user metrics engineers actually use
Per-user attribution becomes actionable when it is expressed as a rate and a unit metric, not only as a monthly total.
Add one “speed” metric (tokens/hour) and one “unit economics” metric (cost per active user or cost per seat) so spikes and skew are obvious.
- tokens/hour and requests/hour per user (burst + abuse detection)
- cost per active user (DAU/WAU cohort economics)
- cost per seat (internal tools and enterprise allocations)
- cost per outcome (tickets resolved, docs summarized, proposals generated)
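As a sketch of the "one speed metric, one unit metric" idea, the two headline numbers reduce to simple ratios over a reporting window (the input aggregates here are illustrative):

```python
# Sketch: rate and unit-economics metrics from raw aggregates.
def tokens_per_hour(total_tokens: int, window_hours: float) -> float:
    # "Speed" metric: makes bursts and abuse visible as a rate.
    return total_tokens / window_hours

def cost_per_active_user(total_cost_usd: float, active_users: int) -> float:
    # "Unit economics" metric: makes skew across cohorts obvious.
    return total_cost_usd / active_users if active_users else 0.0

rate = tokens_per_hour(120_000, 24)        # tokens/hour over one day
unit = cost_per_active_user(480.0, 1_600)  # daily spend divided by DAU
```

Cost per seat and cost per outcome follow the same shape: total spend divided by the unit you sell or support.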
Identity normalization (avoid misleading concentration)
- Use stable hashed userId (avoid PII in telemetry).
- Handle anonymous traffic separately (anon_id or ip_hash) so it does not pollute user cohorts.
- Detect shared service accounts and allocate them explicitly.
- Backfill tenantId mapping so finance reports match contracts.
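One way to get stable, PII-free identifiers is a keyed hash. This is a sketch, not a prescribed scheme; `SECRET_SALT` is a hypothetical per-deployment secret that must live outside source control:

```python
import hashlib
import hmac

SECRET_SALT = b"example-only-salt"  # hypothetical; load from a secret store

def hashed_user_id(raw_user_id: str) -> str:
    # Stable and non-reversible without the salt: the same user always
    # maps to the same telemetry ID, but the raw ID never leaves the app.
    mac = hmac.new(SECRET_SALT, raw_user_id.encode("utf-8"), hashlib.sha256)
    return "usr_" + mac.hexdigest()[:16]

def anon_id(ip: str) -> str:
    # Separate prefix keeps anonymous traffic out of user cohorts.
    mac = hmac.new(SECRET_SALT, ip.encode("utf-8"), hashlib.sha256)
    return "anon_" + mac.hexdigest()[:16]
```

Keeping the salt stable across auth providers is what makes the same person resolve to one ID in concentration reports.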
Guardrails: per-user quotas, alerts, and rate limits
- Set soft thresholds first (alerts) for high-variance users.
- Add per-endpoint rate limits for expensive flows (endpointTag).
- Escalate to “degraded mode” when budgets warn (shorter outputs, fewer tools).
- Use hard blocks only for non-critical or abuse-prone endpoints with clear UX messaging.
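The escalation ladder above can be expressed as a single policy function. Thresholds and return values here are illustrative, not prescriptive:

```python
# Sketch: soft alert -> degraded mode -> hard block, driven by how much
# of a per-user budget has been consumed. Thresholds are examples only.
def guardrail_for(spend_usd: float, budget_usd: float, abuse_prone: bool) -> str:
    ratio = spend_usd / budget_usd
    if ratio < 0.8:
        return "allow"
    if ratio < 1.0:
        return "alert"      # soft threshold: notify, never block
    if abuse_prone:
        return "block"      # hard block only on abuse-prone endpoints
    return "degraded"       # shorter outputs, fewer tools
```

Keeping the decision in one place makes it easy to audit why a given user was throttled.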
Allocation pitfalls
- Missing identity normalization across auth providers.
- Shared service users distorting real concentration.
- Ignoring free-tier or internal test usage in cost reports.
- Treating unknown users as permanent instead of a cleanup queue.
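Several of these pitfalls reduce to filtering before you aggregate. A minimal sketch, assuming records carry the `dataMode`, `environment`, and `userId` fields from the payload example below, with a hypothetical allowlist of shared service accounts:

```python
SERVICE_ACCOUNTS = {"svc_ci_bot"}  # hypothetical shared service accounts

def billable(records):
    # Drop demo/test traffic and explicitly-allocated service accounts
    # before computing per-user concentration for cost reports.
    return [
        r for r in records
        if r.get("dataMode") == "real"
        and r.get("environment") == "prod"
        and r.get("userId") not in SERVICE_ACCOUNTS
    ]

records = [
    {"userId": "u1", "dataMode": "real", "environment": "prod"},
    {"userId": "u1", "dataMode": "demo", "environment": "prod"},
    {"userId": "svc_ci_bot", "dataMode": "real", "environment": "prod"},
]
```

Service-account spend should still be reported, just in its own bucket rather than inflating one "user".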
Showback/chargeback (what finance expects)
Per-user reporting can support internal showback (visibility) or chargeback (cost allocation). The key is consistency and an audit trail.
Keep the mapping rules stable and document exceptions (service accounts, demos, staging).
- Showback: transparent reporting per team/user without invoicing.
- Chargeback: allocate cost to cost centers using stable identity mapping.
- Exceptions: document service accounts and internal tooling separately.
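Chargeback, in particular, is just per-user spend folded through a stable identity mapping. This sketch sends unmapped users to an explicit review bucket instead of silently dropping them (names are illustrative):

```python
from collections import defaultdict

# Sketch: allocate per-user spend to cost centers via a stable mapping.
def allocate(per_user_spend, user_to_cost_center):
    by_center = defaultdict(float)
    for user, spend in per_user_spend.items():
        # Unmapped users become an auditable exception, not lost spend.
        by_center[user_to_cost_center.get(user, "unmapped_review")] += spend
    return dict(by_center)

alloc = allocate(
    {"u1": 40.0, "u2": 10.0, "u9": 2.5},
    {"u1": "support", "u2": "sales"},
)
```

Keeping the mapping in version control gives finance the audit trail the section above asks for.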
Operational output
Use cost-per-user reports for pricing experiments, feature-tiering decisions, and support policy updates.
What to send (payload example)
{
"externalRequestId": "req_01HZXB6MQZ2WQ9D2KCF9M4V2QY",
"provider": "provider_id",
"model": "model_id",
"endpointTag": "checkout.ai_summary",
"promptVersion": "summary_v3",
"userId": "tenant_acme_hash",
"inputTokens": 540,
"outputTokens": 180,
"latencyMs": 892,
"status": "success",
"dataMode": "real",
"environment": "prod"
}Common mistakes
- Missing endpointTag or using inconsistent naming across teams.
- Not tagging promptVersion, so deploys cannot be linked to spend changes.
- Sending raw user identifiers instead of hashed mapping for privacy.
- Mixing demo/test dataMode into production operational reviews.
How to verify in Opsmeter Dashboard
- Use Overview to confirm spike window and budget posture.
- Use Top Endpoints to find feature-level concentration.
- Use Top Users to find user-level concentration.
- Use Prompt Versions to validate deploy-linked cost drift.
Evaluation resources
For security and procurement reviews, use our trust summary before final tool selection.