Opsmeter
AI Cost & Inference Control


LLM cost attribution: endpoint, prompt version, tenant, and user

This is the core authority page for cost attribution workflows. Use it as the entry point for endpoint, tenant, and prompt-level cost analysis.


What LLM cost attribution means in production

Attribution connects spend changes to the exact feature path, customer segment, and prompt deploy that caused the bill change.

Totals alone are accounting views. Attribution is an operations workflow.

Required dimensions

  • endpointTag for feature-level cost concentration
  • promptVersion for deploy-linked regressions
  • userId and tenant mapping for customer-level unit economics
  • externalRequestId for retry-safe correlation

Minimal request-level schema (fields and why they matter)

Attribution gets dramatically easier when every request event carries a small, stable set of fields.

Keep identifiers stable and version anything that can change spend. That is what makes incidents explainable in minutes.

  • endpointTag (e.g., checkout.ai_summary): feature-level ownership and concentration analysis.
  • promptVersion (e.g., summary_v3): correlate cost/request drift to deploy changes.
  • externalRequestId (e.g., req_2026_02_26_001): retry-safe correlation, so retries show up as cost multipliers on one logical request instead of as new requests.
  • tenantId (hashed): customer-level unit economics and margin protection.
  • userId (hashed): per-user spend and abuse/unknown-user concentration.
  • provider/model (e.g., openai / gpt-4o-mini): model mix and pricing accuracy.
  • input/output tokens: cost math and regression signals (tokens/request).
  • latency/status: reliability-driven cost multipliers (timeouts/retries).
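The field list above maps naturally onto a single event record. A minimal sketch in Python (only the field names come from the schema; the types and example values are assumptions):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RequestEvent:
    """One request-level attribution event, mirroring the field list above."""
    externalRequestId: str  # retry-safe correlation key
    endpointTag: str        # feature ownership, e.g. "checkout.ai_summary"
    promptVersion: str      # deploy label, e.g. "summary_v3"
    tenantId: str           # hashed tenant identifier
    userId: str             # hashed user identifier
    provider: str           # e.g. "openai"
    model: str              # e.g. "gpt-4o-mini"
    inputTokens: int
    outputTokens: int
    latencyMs: int
    status: str             # "success", "timeout", ...

# Example event with illustrative values
event = RequestEvent(
    externalRequestId="req_2026_02_26_001",
    endpointTag="checkout.ai_summary",
    promptVersion="summary_v3",
    tenantId="tenant_hash_abc",
    userId="user_hash_def",
    provider="openai",
    model="gpt-4o-mini",
    inputTokens=540,
    outputTokens=180,
    latencyMs=892,
    status="success",
)
```

A frozen dataclass keeps the identifiers immutable once emitted, which is what makes downstream correlation trustworthy.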

Where teams use this pillar

  1. Incident response for sudden cost spikes
  2. Weekly spend ownership reviews by feature team
  3. Pricing and margin decisions per tenant
  4. Post-deploy prompt regression checks

Attribution vs totals: why bills still feel mysterious

Most provider dashboards answer “how much did we spend?”, not “what changed?” or “who owns it?”. Totals-only views make cost management reactive.

Attribution turns LLM cost tracking into an operational loop: detect drift, isolate the driver, and ship one guardrail so the same incident does not repeat.

A practical tagging taxonomy (that stays stable as you scale)

The goal is consistency, not perfection. A small, stable schema beats a complex one that teams cannot maintain.

Use endpointTag as the product/feature label. Use promptVersion as the deployment label. Use tenant/user mapping as the commercial label.

  • endpointTag: feature ownership (e.g., checkout.ai_summary, support.reply)
  • promptVersion: deploy accountability (e.g., summary_v3, support_v5)
  • tenantId/userId: unit economics and concentration analysis (hash if needed)
  • environment and dataMode: keep staging/demo out of production decisioning
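A taxonomy only stays stable if it is enforced at the point of emission. A hedged sketch of a lint check, assuming the area.feature and name_vN naming patterns shown in the examples above:

```python
import re

# Naming conventions inferred from the examples above (assumptions, adjust to taste)
ENDPOINT_TAG_RE = re.compile(r"^[a-z][a-z0-9_]*\.[a-z][a-z0-9_]*$")  # area.feature
PROMPT_VERSION_RE = re.compile(r"^[a-z][a-z0-9_]*_v\d+$")            # name_vN

def validate_tags(endpoint_tag: str, prompt_version: str) -> list:
    """Return a list of taxonomy violations; an empty list means the tags conform."""
    problems = []
    if not ENDPOINT_TAG_RE.match(endpoint_tag):
        problems.append(f"endpointTag {endpoint_tag!r} does not match 'area.feature'")
    if not PROMPT_VERSION_RE.match(prompt_version):
        problems.append(f"promptVersion {prompt_version!r} does not match 'name_vN'")
    return problems
```

Running this in CI or at event-emission time keeps untagged “other” traffic from accumulating.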

Investigation order for cost spikes (15 minutes, repeatable)

  1. Confirm the spike window in Overview (trend + delta).
  2. Rank spend by endpointTag to find the dominant feature driver.
  3. Rank spend by tenant/user to identify concentration or abuse patterns.
  4. Compare cost/request and token deltas by promptVersion in the same window.
  5. Contain first (caps, throttles, routing) and then optimize prompts.
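Steps 2 and 3 are the same operation over different dimensions: rank total cost by one attribution key. A minimal sketch (the event dicts and the costUsd field are illustrative assumptions):

```python
from collections import defaultdict

def rank_spend(events, key):
    """Rank total cost by one attribution dimension, highest spend first."""
    totals = defaultdict(float)
    for e in events:
        totals[e[key]] += e["costUsd"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Illustrative events for one spike window
events = [
    {"endpointTag": "checkout.ai_summary", "tenantId": "t1", "costUsd": 4.20},
    {"endpointTag": "support.reply",       "tenantId": "t2", "costUsd": 1.10},
    {"endpointTag": "checkout.ai_summary", "tenantId": "t1", "costUsd": 3.30},
]

# rank_spend(events, "endpointTag") surfaces the dominant feature driver first;
# rank_spend(events, "tenantId") surfaces tenant concentration the same way.
```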

Unit economics outputs that become decision-ready

Attribution is only useful when it produces an output that teams can act on. Tie every view to a decision: pricing, limits, routing, or rollback.

A good default is to track one “per X” metric for each stakeholder: engineering (per endpoint), finance (per tenant), and product (per workflow).

  • cost per API call by endpointTag (feature ownership)
  • cost per tenant / account (margin protection)
  • cost per workflow outcome (ticket resolved, email sent, proposal generated)
  • promptVersion regressions (deploy-time cost drift)
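Each “per X” metric is just total attributed cost divided by a unit count. A trivial helper makes the division explicit and guards the degenerate case (a sketch, not an Opsmeter API):

```python
def unit_cost(total_cost_usd: float, units: int) -> float:
    """Cost 'per X': per API call, per tenant, per resolved ticket, etc."""
    if units <= 0:
        raise ValueError("need at least one unit to compute a per-unit cost")
    return total_cost_usd / units

# e.g. $7.50 of attributed spend across 150 resolved tickets (illustrative numbers)
per_ticket = unit_cost(7.50, 150)
```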

Privacy-safe identity mapping

You do not need to send PII to get per-user visibility. Use stable hashed identifiers and keep the lookup in your own system.

What matters is stability: the same logical user/tenant must map to the same identifier so concentration and lifecycle trends are real.

  • Hash user identifiers (or map to internal numeric IDs).
  • Send tenantId when you have multi-tenant economics.
  • Keep the same externalRequestId across retries for one logical request.
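One way to get stable, non-reversible identifiers is a keyed hash held in your own system. A sketch using HMAC-SHA256 (the secret, the prefix scheme, and the truncation length are all assumptions):

```python
import hashlib
import hmac

SECRET = b"rotate-me"  # keep this key in your own system, never in the events

def stable_id(raw_identifier: str, prefix: str) -> str:
    """Deterministic, non-reversible identifier: same input -> same output."""
    digest = hmac.new(SECRET, raw_identifier.encode("utf-8"), hashlib.sha256)
    return f"{prefix}_{digest.hexdigest()[:16]}"
```

Because the mapping is deterministic, per-user concentration and lifecycle trends stay real; because it is keyed, the identifiers cannot be reversed without the secret.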

Data quality gates before you trust cost numbers

  1. Unknown-model ratio is below your internal threshold.
  2. Demo/test traffic is separated from real traffic (dataMode).
  3. endpointTag coverage is high (few untagged “other” requests).
  4. promptVersion is present on deploy-controlled endpoints.
  5. Daily aggregates reconcile with spot-checked request rows.
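Gates 1 and 3 can be checked mechanically from the request events themselves. A sketch, with the thresholds as assumed defaults:

```python
def quality_gates(events, unknown_model_max=0.02, min_tag_coverage=0.95):
    """Mechanical checks for gates 1 and 3; thresholds are illustrative defaults."""
    if not events:
        raise ValueError("no events to check")
    n = len(events)
    unknown_ratio = sum(1 for e in events if e.get("model") in (None, "", "unknown")) / n
    tag_coverage = sum(1 for e in events if e.get("endpointTag")) / n
    return {
        "unknown_model_ratio_ok": unknown_ratio <= unknown_model_max,
        "endpoint_tag_coverage_ok": tag_coverage >= min_tag_coverage,
    }

# Illustrative clean sample: every event has a known model and an endpointTag
clean = [{"model": "gpt-4o-mini", "endpointTag": "checkout.ai_summary"}] * 50
```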

What to send (payload example)

{
  "externalRequestId": "req_01HZXB6MQZ2WQ9D2KCF9M4V2QY",
  "provider": "provider_id",
  "model": "model_id",
  "endpointTag": "checkout.ai_summary",
  "promptVersion": "summary_v3",
  "tenantId": "tenant_acme_hash",
  "userId": "user_acme_hash",
  "inputTokens": 540,
  "outputTokens": 180,
  "latencyMs": 892,
  "status": "success",
  "dataMode": "real",
  "environment": "prod"
}
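Before sending, it is cheap to verify that a payload carries every field from the example above. A sketch, not the official client; the required-field set is taken directly from the example payload:

```python
# Field names copied from the example payload above
REQUIRED_FIELDS = {
    "externalRequestId", "provider", "model", "endpointTag",
    "promptVersion", "tenantId", "userId", "inputTokens", "outputTokens",
    "latencyMs", "status", "dataMode", "environment",
}

def missing_fields(payload: dict) -> set:
    """Return the required payload fields that are absent."""
    return REQUIRED_FIELDS - payload.keys()
```

Rejecting (or flagging) incomplete payloads at the client keeps the untagged “other” bucket small, which is what the data quality gates above depend on.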

Common mistakes

  • Missing endpointTag or using inconsistent naming across teams.
  • Not tagging promptVersion, so deploys cannot be linked to spend changes.
  • Sending raw user identifiers instead of hashed mapping for privacy.
  • Mixing demo/test dataMode into production operational reviews.

How to verify in Opsmeter Dashboard

  1. Use Overview to confirm spike window and budget posture.
  2. Use Top Endpoints to find feature-level concentration.
  3. Use Top Users to find tenant-level concentration.
  4. Use Prompt Versions to validate deploy-linked cost drift.

Templates

Weekly AI spend ownership review (template)

# Weekly AI spend review

Window (UTC):
Workspace:

Top drivers (endpointTag):
- 

Top drivers (tenant/user):
- 

Deploy correlation:
- promptVersion changes in window:
- cost/request deltas:

Decisions:
- containment (caps/throttles/routing):
- follow-up owner + ETA:

One durable control to add this week:
- 

Cost spike triage note (template)

# Cost spike triage

Start time (UTC):
Baseline window:

Hypothesis (pick one): volume / efficiency / routing / retries / abuse

Evidence:
- requests/hour:
- tokens/request (in/out):
- cost/request:
- top endpointTag:
- top tenant/user:
- promptVersion correlation:

Containment taken:
- 

Next verification step:
- 

Related guides

Start free · Open quickstart · Compare alternatives

Evaluation resources

For security and procurement reviews, use our trust summary before final tool selection.

Open trust proof pack