Cost attribution
OpenAI cost per endpoint: how to compute cost per request correctly
Endpoint-level cost is where product decisions happen. Use a normalized request model and include retry overhead to avoid false conclusions.
Full guide: LLM cost attribution: endpoint, prompt version, tenant, and user
Computation model
- Map provider usage to input/output token fields.
- Attach endpointTag and promptVersion per request.
- Include retry attempts in effective cost per successful request.
- Aggregate by endpoint and tenant for ownership reviews.
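The computation model above can be sketched as a normalized request record plus an ownership rollup. This is a minimal illustration, not a prescribed schema: the field names (endpointTag, promptVersion, externalRequestId) come from the text, but the dataclass shape and the cost function are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class LlmRequest:
    external_request_id: str  # reused across retries of one logical request
    endpoint_tag: str         # e.g. "support.reply"
    prompt_version: str
    tenant_id: str
    model: str
    input_tokens: int         # mapped from provider usage fields
    output_tokens: int
    success: bool

def rollup_by_endpoint_tenant(requests, cost_fn):
    """Aggregate total cost per (endpoint_tag, tenant_id) for ownership reviews."""
    totals = defaultdict(float)
    for r in requests:
        totals[(r.endpoint_tag, r.tenant_id)] += cost_fn(r)
    return dict(totals)
```

Any per-request cost function can be plugged in; the point is that every request already carries the attribution keys when it reaches the aggregation step.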
Per-request cost formula (pricing table + token usage)
Do not hardcode prices into application logic. Use a versioned pricing table (with effective dates) and compute each request's cost from measured usage.
The baseline formula is: cost = inputTokens * inputPrice + outputTokens * outputPrice (plus any additional token classes your provider bills).
- Keep price snapshots immutable by effective date (audit-safe).
- Separate input vs output token rates (they often differ).
- Treat unknown-model requests as a queue to resolve, not as "other".
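The baseline formula and the bullets above can be combined into one lookup-and-compute step. A minimal sketch, assuming a hypothetical in-memory pricing table ("example-model" and its prices are illustrative, not real provider rates); each snapshot is immutable and selected by effective date:

```python
from bisect import bisect_right
from datetime import date

# Hypothetical pricing table: model -> sorted (effective_date, input_price, output_price).
# Prices here are per token; providers typically quote per 1M tokens.
PRICING = {
    "example-model": [
        (date(2024, 1, 1), 10.0 / 1_000_000, 30.0 / 1_000_000),
        (date(2024, 7, 1), 5.0 / 1_000_000, 15.0 / 1_000_000),
    ],
}

def request_cost(model, input_tokens, output_tokens, when):
    snapshots = PRICING.get(model)
    if snapshots is None:
        # Unknown model: surface it for resolution instead of bucketing as "other".
        raise LookupError(f"unknown model {model!r}: route to resolution queue")
    # Latest snapshot whose effective date is <= the request date (audit-safe).
    idx = bisect_right([d for d, _, _ in snapshots], when) - 1
    if idx < 0:
        raise LookupError(f"no pricing effective for {model!r} on {when}")
    _, in_price, out_price = snapshots[idx]
    return input_tokens * in_price + output_tokens * out_price
```

Because old snapshots are never mutated, recomputing last quarter's costs after a price change reproduces the original numbers.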
Retries, fallbacks, and success-adjusted endpoint cost
A cheaper model can still cost more overall if it needs more attempts to succeed. For endpoint ownership, track effective cost per successful request.
Roll retries and fallback calls into the same externalRequestId so one logical user action has one traceable cost.
- Reuse externalRequestId across retries for the same logical request.
- Record attempt number and final status (success/failure).
- Compute attempts-per-success by endpointTag and model.
- Alert when retries rise even if token price stays constant.
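The success-adjusted metric described above is straightforward once retries share an externalRequestId: charge every attempt (including failures and fallbacks) to the numerator, and count only distinct successful logical requests in the denominator. A minimal sketch, with assumed dict keys:

```python
def effective_cost_per_success(attempts, cost_fn):
    """attempts: records sharing external_request_id across retries of one
    logical request. All attempts count toward cost; only distinct successful
    logical requests count toward the denominator."""
    total_cost = sum(cost_fn(a) for a in attempts)
    succeeded = {a["external_request_id"] for a in attempts if a["success"]}
    return total_cost / len(succeeded) if succeeded else float("inf")
```

Computed per endpointTag and model, this is the number that exposes a "cheap" model whose retry rate quietly doubles its real cost.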
Common implementation errors
- Mixing test/demo rows with real traffic.
- Missing externalRequestId reuse across retries.
- Ignoring unknown-model rows in endpoint totals.
Example endpoint rollup (what the report should show)
- EndpointTag: support.reply
- Volume: request count + successful request count
- Cost: total cost + effective cost per successful request
- Tokens: avgInputTokens vs avgOutputTokens + p95 outliers
- Drivers: top tenants/users + promptVersion deltas in the same window
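The volume, cost, and token fields of the rollup above can be computed from per-request records in one pass. A sketch under assumed field names; the p95 uses a simple nearest-rank percentile, which is an implementation choice, not a requirement from the text:

```python
import math

def endpoint_rollup(rows):
    """rows: per-request dicts with cost, input_tokens, output_tokens, success.
    Returns the volume/cost/token fields of an endpoint report row."""
    def p95(xs):
        s = sorted(xs)
        return s[max(0, math.ceil(0.95 * len(s)) - 1)]  # nearest-rank percentile
    n = len(rows)
    n_success = sum(1 for r in rows if r["success"])
    total_cost = sum(r["cost"] for r in rows)
    return {
        "requests": n,
        "successful_requests": n_success,
        "total_cost": total_cost,
        "effective_cost_per_success": total_cost / n_success if n_success else None,
        "avg_input_tokens": sum(r["input_tokens"] for r in rows) / n,
        "avg_output_tokens": sum(r["output_tokens"] for r in rows) / n,
        "p95_input_tokens": p95([r["input_tokens"] for r in rows]),
    }
```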
What to report for decision-ready endpoint ownership
- cost/request plus token/request (input vs output split)
- top tenants and users driving the endpoint spend
- promptVersion changes in the same window
- retry ratio and fallback behavior that inflates "effective" cost
- a baseline comparison period so changes are explainable
What to alert on
- cost/request drift by endpointTag or promptVersion
- unexpected tenant concentration in Top Users
- request burst with falling success ratio
- budget warning, spend-alert, and exceeded state transitions
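The first alert above, cost/request drift by endpointTag or promptVersion, reduces to comparing the current window against the baseline period. A minimal sketch; the 25% threshold is an assumed default, not a recommendation from the text:

```python
def cost_per_request_drift(baseline, current, threshold=0.25):
    """Fire when cost/request moves more than `threshold` (relative) vs baseline.
    `baseline` and `current` are cost-per-request values for comparable windows."""
    if baseline == 0:
        return current > 0
    return abs(current - baseline) / baseline > threshold
```

The same comparison, keyed by promptVersion instead of endpointTag, catches a deploy that quietly changed token usage even when the per-token price stayed constant.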
Execution checklist
- Confirm spike type: volume, token, deploy, or abuse signal.
- Assign one incident owner and one communication channel.
- Apply immediate containment before deep optimization.
- Document the dominant endpoint, tenant, and promptVersion driver.
- Convert findings into one permanent guardrail update.
FAQ
Do we need per-user tracking for OpenAI cost monitoring?
It depends. If you are B2B, tenant-level tracking is usually the fastest path to margin control. Per-user tracking helps when you need granular abuse detection or internal chargeback.
Should we ignore unknown models until later?
No. Unknown-model rows break endpoint ownership reports. Treat unknown-model pricing as an operational queue and resolve it quickly so dashboards stay trustworthy.
What if token usage is missing for some requests?
Use provider usage when available. If it is missing, fall back to tokenizer estimates and flag those rows so audits and pricing decisions are not built on uncertain data.
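The fallback described in this answer can be sketched as a small helper: prefer provider-reported usage, and when it is absent, estimate and flag the row. The ~4-characters-per-token heuristic below is a loud assumption for illustration; a real tokenizer (e.g. tiktoken for OpenAI models) gives much better estimates:

```python
def usage_or_estimate(provider_usage, prompt_text, completion_text):
    """Prefer provider-reported usage; otherwise fall back to a rough
    ~4-chars-per-token heuristic (assumption) and flag the row as estimated
    so audits and pricing decisions can exclude or discount it."""
    if provider_usage is not None:
        return {**provider_usage, "estimated": False}
    return {
        "input_tokens": max(1, len(prompt_text) // 4),
        "output_tokens": max(1, len(completion_text) // 4),
        "estimated": True,  # surfaced in dashboards; never silently mixed in
    }
```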
Related guides
Evaluation resources
For security and procurement reviews, use our trust summary before final tool selection.