Unit economics
LLM cost per user: a practical guide to tracking and allocation
Cost per tenant is strategic, but cost per user is often the fastest way to identify skew, abuse, and pricing mismatches.
Full guide: LLM cost attribution: endpoint, prompt version, tenant, and user
When to track per user versus per tenant
- Use per user to catch heavy spenders and abuse patterns early.
- Use per tenant for pricing and contract-level margin decisions.
- Use both when one tenant includes many usage personas.
Implementation model
- Map userId to tenantId in your internal analytics layer.
- Tag each request with endpointTag and promptVersion.
- Compute spend per user and per tenant on fixed intervals.
- Review top-spender concentration before the monthly billing cycle closes.
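The steps above can be sketched as a simple aggregation. This is a minimal illustration, assuming request records carry `userId` and a per-request `costUsd` field (illustrative names, not a fixed schema), with the userId-to-tenantId mapping held in your analytics layer:

```python
from collections import defaultdict

# Sketch: aggregate spend per user and per tenant on a fixed interval.
# Field names (userId, costUsd) are illustrative, not a required schema.
def aggregate_spend(requests, user_to_tenant):
    per_user = defaultdict(float)
    per_tenant = defaultdict(float)
    for r in requests:
        user = r["userId"]
        per_user[user] += r["costUsd"]
        # Unmapped users go to an explicit bucket instead of vanishing.
        per_tenant[user_to_tenant.get(user, "unknown")] += r["costUsd"]
    return dict(per_user), dict(per_tenant)

requests = [
    {"userId": "u1", "costUsd": 0.12},
    {"userId": "u2", "costUsd": 0.03},
    {"userId": "u1", "costUsd": 0.05},
]
per_user, per_tenant = aggregate_spend(requests, {"u1": "acme", "u2": "acme"})
```

Running this on a fixed interval (hourly or daily) gives you the series that concentration reviews are built on.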
Per-user metrics engineers actually use
Per-user attribution becomes actionable when it is expressed as a rate and a unit metric, not only as a monthly total.
Add one “speed” metric (tokens/hour) and one “unit economics” metric (cost per active user or cost per seat) so spikes and skew are obvious.
- tokens/hour and requests/hour per user (burst + abuse detection)
- cost per active user (DAU/WAU cohort economics)
- cost per seat (internal tools and enterprise allocations)
- cost per outcome (tickets resolved, docs summarized, proposals generated)
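As a sketch of the "one speed metric, one unit metric" idea, the two headline numbers reduce to simple ratios over a reporting window (the input aggregates here are illustrative):

```python
# Sketch: rate and unit-economics metrics from raw aggregates.
def tokens_per_hour(total_tokens: int, window_hours: float) -> float:
    # "Speed" metric: makes bursts and abuse visible as a rate.
    return total_tokens / window_hours

def cost_per_active_user(total_cost_usd: float, active_users: int) -> float:
    # "Unit economics" metric: makes skew across cohorts obvious.
    return total_cost_usd / active_users if active_users else 0.0

rate = tokens_per_hour(120_000, 24)        # tokens/hour over one day
unit = cost_per_active_user(480.0, 1_600)  # daily spend divided by DAU
```

Cost per seat and cost per outcome follow the same shape: total spend divided by the unit you sell or support.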
Identity normalization (avoid misleading concentration)
- Use stable hashed userId (avoid PII in telemetry).
- Handle anonymous traffic separately (anon_id or ip_hash) so it does not pollute user cohorts.
- Detect shared service accounts and allocate them explicitly.
- Backfill tenantId mapping so finance reports match contracts.
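One way to get stable, PII-free identifiers is a keyed hash. This is a sketch, not a prescribed scheme; `SECRET_SALT` is a hypothetical per-deployment secret that must live outside source control:

```python
import hashlib
import hmac

SECRET_SALT = b"example-only-salt"  # hypothetical; load from a secret store

def hashed_user_id(raw_user_id: str) -> str:
    # Stable and non-reversible without the salt: the same user always
    # maps to the same telemetry ID, but the raw ID never leaves the app.
    mac = hmac.new(SECRET_SALT, raw_user_id.encode("utf-8"), hashlib.sha256)
    return "usr_" + mac.hexdigest()[:16]

def anon_id(ip: str) -> str:
    # Separate prefix keeps anonymous traffic out of user cohorts.
    mac = hmac.new(SECRET_SALT, ip.encode("utf-8"), hashlib.sha256)
    return "anon_" + mac.hexdigest()[:16]
```

Keeping the salt stable across auth providers is what makes the same person resolve to one ID in concentration reports.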
Guardrails: per-user quotas, alerts, and rate limits
- Set soft thresholds first (alerts) for high-variance users.
- Add per-endpoint rate limits for expensive flows (endpointTag).
- Escalate to “degraded mode” when budgets warn (shorter outputs, fewer tools).
- Use hard blocks only for non-critical or abuse-prone endpoints with clear UX messaging.
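The escalation ladder above can be expressed as a single policy function. Thresholds and return values here are illustrative, not prescriptive:

```python
# Sketch: soft alert -> degraded mode -> hard block, driven by how much
# of a per-user budget has been consumed. Thresholds are examples only.
def guardrail_for(spend_usd: float, budget_usd: float, abuse_prone: bool) -> str:
    ratio = spend_usd / budget_usd
    if ratio < 0.8:
        return "allow"
    if ratio < 1.0:
        return "alert"      # soft threshold: notify, never block
    if abuse_prone:
        return "block"      # hard block only on abuse-prone endpoints
    return "degraded"       # shorter outputs, fewer tools
```

Keeping the decision in one place makes it easy to audit why a given user was throttled.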
Allocation pitfalls
- Missing identity normalization across auth providers.
- Shared service users distorting real concentration.
- Ignoring free-tier or internal test usage in cost reports.
- Treating unknown users as permanent instead of a cleanup queue.
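Several of these pitfalls reduce to filtering before you aggregate. A minimal sketch, assuming records carry the `dataMode`, `environment`, and `userId` fields from the payload example below, with a hypothetical allowlist of shared service accounts:

```python
SERVICE_ACCOUNTS = {"svc_ci_bot"}  # hypothetical shared service accounts

def billable(records):
    # Drop demo/test traffic and explicitly-allocated service accounts
    # before computing per-user concentration for cost reports.
    return [
        r for r in records
        if r.get("dataMode") == "real"
        and r.get("environment") == "prod"
        and r.get("userId") not in SERVICE_ACCOUNTS
    ]

records = [
    {"userId": "u1", "dataMode": "real", "environment": "prod"},
    {"userId": "u1", "dataMode": "demo", "environment": "prod"},
    {"userId": "svc_ci_bot", "dataMode": "real", "environment": "prod"},
]
```

Service-account spend should still be reported, just in its own bucket rather than inflating one "user".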
Showback/chargeback (what finance expects)
Per-user reporting can support internal showback (visibility) or chargeback (cost allocation). The key is consistency and an audit trail.
Keep the mapping rules stable and document exceptions (service accounts, demos, staging).
- Showback: transparent reporting per team/user without invoicing.
- Chargeback: allocate cost to cost centers using stable identity mapping.
- Exceptions: document service accounts and internal tooling separately.
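Chargeback, in particular, is just per-user spend folded through a stable identity mapping. This sketch sends unmapped users to an explicit review bucket instead of silently dropping them (names are illustrative):

```python
from collections import defaultdict

# Sketch: allocate per-user spend to cost centers via a stable mapping.
def allocate(per_user_spend, user_to_cost_center):
    by_center = defaultdict(float)
    for user, spend in per_user_spend.items():
        # Unmapped users become an auditable exception, not lost spend.
        by_center[user_to_cost_center.get(user, "unmapped_review")] += spend
    return dict(by_center)

alloc = allocate(
    {"u1": 40.0, "u2": 10.0, "u9": 2.5},
    {"u1": "support", "u2": "sales"},
)
```

Keeping the mapping in version control gives finance the audit trail the section above asks for.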
Operational output
Use cost-per-user reports for pricing experiments, feature-tiering decisions, and support policy updates.
What to send (payload example)
{
"externalRequestId": "req_01HZXB6MQZ2WQ9D2KCF9M4V2QY",
"provider": "provider_id",
"model": "model_id",
"endpointTag": "checkout.ai_summary",
"promptVersion": "summary_v3",
"userId": "tenant_acme_hash",
"inputTokens": 540,
"outputTokens": 180,
"latencyMs": 892,
"status": "success",
"dataMode": "real",
"environment": "prod"
}Common mistakes
- Missing endpointTag or using inconsistent naming across teams.
- Not tagging promptVersion, so deploys cannot be linked to spend changes.
- Sending raw user identifiers instead of hashed mapping for privacy.
- Mixing demo/test dataMode into production operational reviews.
How to verify in Opsmeter Dashboard
- Use Overview to confirm spike window and budget posture.
- Use Top Endpoints to find feature-level concentration.
- Use Top Users to find user-level concentration.
- Use Prompt Versions to validate deploy-linked cost drift.
Evaluation resources
For security and procurement reviews, use our trust summary before final tool selection.