Opsmeter
AI Cost & Inference Control


LLM cost attribution: endpoint, prompt version, tenant, and user

This is the core authority page for cost attribution workflows. Use it as the entry point for endpoint, tenant, and prompt-level cost analysis.


What LLM cost attribution means in production

Attribution connects spend changes to the exact feature path, customer segment, and prompt deploy that caused the bill change.

Totals alone are accounting views. Attribution is an operations workflow.

Required dimensions

  • endpointTag for feature-level cost concentration
  • promptVersion for deploy-linked regressions
  • userId and tenant mapping for customer-level unit economics
  • externalRequestId for retry-safe correlation

Minimal request-level schema (fields and why they matter)

Attribution gets dramatically easier when every request event carries a small, stable set of fields.

Keep identifiers stable and version anything that can change spend. That is what makes incidents explainable in minutes.

  • endpointTag (e.g., checkout.ai_summary): feature-level ownership and concentration analysis.
  • promptVersion (e.g., summary_v3): correlate cost/request drift to deploy changes.
  • externalRequestId (e.g., req_2026_02_26_001): retry-safe correlation, so retries show up as cost multipliers on one logical request instead of as new requests.
  • tenantId (hashed): customer-level unit economics and margin protection.
  • userId (hashed): per-user spend and abuse/unknown-user concentration.
  • provider/model (e.g., openai / gpt-4o-mini): model mix and pricing accuracy.
  • input/output tokens: cost math and regression signals (tokens/request).
  • latency/status: reliability-driven cost multipliers (timeouts/retries).
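The field list above maps naturally onto a single event record. A minimal sketch in Python (only the field names come from the schema; the types and example values are assumptions):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RequestEvent:
    """One request-level attribution event, mirroring the field list above."""
    externalRequestId: str  # retry-safe correlation key
    endpointTag: str        # feature ownership, e.g. "checkout.ai_summary"
    promptVersion: str      # deploy label, e.g. "summary_v3"
    tenantId: str           # hashed tenant identifier
    userId: str             # hashed user identifier
    provider: str           # e.g. "openai"
    model: str              # e.g. "gpt-4o-mini"
    inputTokens: int
    outputTokens: int
    latencyMs: int
    status: str             # "success", "timeout", ...

# Example event with illustrative values
event = RequestEvent(
    externalRequestId="req_2026_02_26_001",
    endpointTag="checkout.ai_summary",
    promptVersion="summary_v3",
    tenantId="tenant_hash_abc",
    userId="user_hash_def",
    provider="openai",
    model="gpt-4o-mini",
    inputTokens=540,
    outputTokens=180,
    latencyMs=892,
    status="success",
)
```

A frozen dataclass keeps the identifiers immutable once emitted, which is what makes downstream correlation trustworthy.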

Where teams use this pillar

  1. Incident response for sudden cost spikes
  2. Weekly spend ownership reviews by feature team
  3. Pricing and margin decisions per tenant
  4. Post-deploy prompt regression checks

Attribution vs totals: why bills still feel mysterious

Most provider dashboards answer “how much did we spend?”, not “what changed?” or “who owns it?”. Totals-only views make cost management reactive.

Attribution turns LLM cost tracking into an operational loop: detect drift, isolate the driver, and ship one guardrail so the same incident does not repeat.

A practical tagging taxonomy (that stays stable as you scale)

The goal is consistency, not perfection. A small, stable schema beats a complex one that teams cannot maintain.

Use endpointTag as the product/feature label. Use promptVersion as the deployment label. Use tenant/user mapping as the commercial label.

  • endpointTag: feature ownership (e.g., checkout.ai_summary, support.reply)
  • promptVersion: deploy accountability (e.g., summary_v3, support_v5)
  • tenantId/userId: unit economics and concentration analysis (hash if needed)
  • environment and dataMode: keep staging/demo out of production decisioning
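A taxonomy only stays stable if it is enforced at the point of emission. A hedged sketch of a lint check, assuming the area.feature and name_vN naming patterns shown in the examples above:

```python
import re

# Naming conventions inferred from the examples above (assumptions, adjust to taste)
ENDPOINT_TAG_RE = re.compile(r"^[a-z][a-z0-9_]*\.[a-z][a-z0-9_]*$")  # area.feature
PROMPT_VERSION_RE = re.compile(r"^[a-z][a-z0-9_]*_v\d+$")            # name_vN

def validate_tags(endpoint_tag: str, prompt_version: str) -> list:
    """Return a list of taxonomy violations; an empty list means the tags conform."""
    problems = []
    if not ENDPOINT_TAG_RE.match(endpoint_tag):
        problems.append(f"endpointTag {endpoint_tag!r} does not match 'area.feature'")
    if not PROMPT_VERSION_RE.match(prompt_version):
        problems.append(f"promptVersion {prompt_version!r} does not match 'name_vN'")
    return problems
```

Running this in CI or at event-emission time keeps untagged “other” traffic from accumulating.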

Investigation order for cost spikes (15 minutes, repeatable)

  1. Confirm the spike window in Overview (trend + delta).
  2. Rank spend by endpointTag to find the dominant feature driver.
  3. Rank spend by tenant/user to identify concentration or abuse patterns.
  4. Compare cost/request and token deltas by promptVersion in the same window.
  5. Contain first (caps, throttles, routing) and then optimize prompts.
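Steps 2 and 3 are the same operation over different dimensions: rank total cost by one attribution key. A minimal sketch (the event dicts and the costUsd field are illustrative assumptions):

```python
from collections import defaultdict

def rank_spend(events, key):
    """Rank total cost by one attribution dimension, highest spend first."""
    totals = defaultdict(float)
    for e in events:
        totals[e[key]] += e["costUsd"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Illustrative events for one spike window
events = [
    {"endpointTag": "checkout.ai_summary", "tenantId": "t1", "costUsd": 4.20},
    {"endpointTag": "support.reply",       "tenantId": "t2", "costUsd": 1.10},
    {"endpointTag": "checkout.ai_summary", "tenantId": "t1", "costUsd": 3.30},
]

# rank_spend(events, "endpointTag") surfaces the dominant feature driver first;
# rank_spend(events, "tenantId") surfaces tenant concentration the same way.
```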

Unit economics outputs that become decision-ready

Attribution is only useful when it produces an output that teams can act on. Tie every view to a decision: pricing, limits, routing, or rollback.

A good default is to track one “per X” metric for each stakeholder: engineering (per endpoint), finance (per tenant), and product (per workflow).

  • cost per API call by endpointTag (feature ownership)
  • cost per tenant / account (margin protection)
  • cost per workflow outcome (ticket resolved, email sent, proposal generated)
  • promptVersion regressions (deploy-time cost drift)
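Each “per X” metric is just total attributed cost divided by a unit count. A trivial helper makes the division explicit and guards the degenerate case (a sketch, not an Opsmeter API):

```python
def unit_cost(total_cost_usd: float, units: int) -> float:
    """Cost 'per X': per API call, per tenant, per resolved ticket, etc."""
    if units <= 0:
        raise ValueError("need at least one unit to compute a per-unit cost")
    return total_cost_usd / units

# e.g. $7.50 of attributed spend across 150 resolved tickets (illustrative numbers)
per_ticket = unit_cost(7.50, 150)
```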

Privacy-safe identity mapping

You do not need to send PII to get per-user visibility. Use stable hashed identifiers and keep the lookup in your own system.

What matters is stability: the same logical user/tenant must map to the same identifier so concentration and lifecycle trends are real.

  • Hash user identifiers (or map to internal numeric IDs).
  • Send tenantId when you have multi-tenant economics.
  • Keep the same externalRequestId across retries for one logical request.
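One way to get stable, non-reversible identifiers is a keyed hash held in your own system. A sketch using HMAC-SHA256 (the secret, the prefix scheme, and the truncation length are all assumptions):

```python
import hashlib
import hmac

SECRET = b"rotate-me"  # keep this key in your own system, never in the events

def stable_id(raw_identifier: str, prefix: str) -> str:
    """Deterministic, non-reversible identifier: same input -> same output."""
    digest = hmac.new(SECRET, raw_identifier.encode("utf-8"), hashlib.sha256)
    return f"{prefix}_{digest.hexdigest()[:16]}"
```

Because the mapping is deterministic, per-user concentration and lifecycle trends stay real; because it is keyed, the identifiers cannot be reversed without the secret.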

Data quality gates before you trust cost numbers

  1. Unknown-model ratio is below your internal threshold.
  2. Demo/test traffic is separated from real traffic (dataMode).
  3. endpointTag coverage is high (few untagged “other” requests).
  4. promptVersion is present on deploy-controlled endpoints.
  5. Daily aggregates reconcile with spot-checked request rows.
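Gates 1 and 3 can be checked mechanically from the request events themselves. A sketch, with the thresholds as assumed defaults:

```python
def quality_gates(events, unknown_model_max=0.02, min_tag_coverage=0.95):
    """Mechanical checks for gates 1 and 3; thresholds are illustrative defaults."""
    if not events:
        raise ValueError("no events to check")
    n = len(events)
    unknown_ratio = sum(1 for e in events if e.get("model") in (None, "", "unknown")) / n
    tag_coverage = sum(1 for e in events if e.get("endpointTag")) / n
    return {
        "unknown_model_ratio_ok": unknown_ratio <= unknown_model_max,
        "endpoint_tag_coverage_ok": tag_coverage >= min_tag_coverage,
    }

# Illustrative clean sample: every event has a known model and an endpointTag
clean = [{"model": "gpt-4o-mini", "endpointTag": "checkout.ai_summary"}] * 50
```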

What to send (payload example)

{
  "externalRequestId": "req_01HZXB6MQZ2WQ9D2KCF9M4V2QY",
  "provider": "provider_id",
  "model": "model_id",
  "endpointTag": "checkout.ai_summary",
  "promptVersion": "summary_v3",
  "tenantId": "tenant_acme_hash",
  "userId": "user_acme_hash",
  "inputTokens": 540,
  "outputTokens": 180,
  "latencyMs": 892,
  "status": "success",
  "dataMode": "real",
  "environment": "prod"
}
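Before sending, it is cheap to verify that a payload carries every field from the example above. A sketch, not the official client; the required-field set is taken directly from the example payload:

```python
# Field names copied from the example payload above
REQUIRED_FIELDS = {
    "externalRequestId", "provider", "model", "endpointTag",
    "promptVersion", "tenantId", "userId", "inputTokens", "outputTokens",
    "latencyMs", "status", "dataMode", "environment",
}

def missing_fields(payload: dict) -> set:
    """Return the required payload fields that are absent."""
    return REQUIRED_FIELDS - payload.keys()
```

Rejecting (or flagging) incomplete payloads at the client keeps the untagged “other” bucket small, which is what the data quality gates above depend on.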

Common mistakes

  • Missing endpointTag or using inconsistent naming across teams.
  • Not tagging promptVersion, so deploys cannot be linked to spend changes.
  • Sending raw user identifiers instead of hashed mapping for privacy.
  • Mixing demo/test dataMode into production operational reviews.

How to verify in Opsmeter Dashboard

  1. Use Overview to confirm spike window and budget posture.
  2. Use Top Endpoints to find feature-level concentration.
  3. Use Top Users to find tenant-level concentration.
  4. Use Prompt Versions to validate deploy-linked cost drift.

Templates

Weekly AI spend ownership review (template)

# Weekly AI spend review

Window (UTC):
Workspace:

Top drivers (endpointTag):
- 

Top drivers (tenant/user):
- 

Deploy correlation:
- promptVersion changes in window:
- cost/request deltas:

Decisions:
- containment (caps/throttles/routing):
- follow-up owner + ETA:

One durable control to add this week:
- 

Cost spike triage note (template)

# Cost spike triage

Start time (UTC):
Baseline window:

Hypothesis (pick one): volume / efficiency / routing / retries / abuse

Evidence:
- requests/hour:
- tokens/request (in/out):
- cost/request:
- top endpointTag:
- top tenant/user:
- promptVersion correlation:

Containment taken:
- 

Next verification step:
- 

Related guides

Start free · Open quickstart · Compare alternatives

Evaluation resources

For security and procurement reviews, use our trust summary before final tool selection.

Open trust proof pack