Opsmeter
AI Cost & Inference Control

Cost attribution

OpenAI cost per endpoint: how to compute cost per request correctly

Endpoint-level cost is where product decisions happen. Use a normalized request model and include retry overhead to avoid false conclusions.

OpenAI, Operations

Full guide: LLM cost attribution: endpoint, prompt version, tenant, and user

Computation model

  • Map provider usage to input/output token fields.
  • Attach endpointTag and promptVersion per request.
  • Include retry attempts in effective cost per successful request.
  • Aggregate by endpoint and tenant for ownership reviews.
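As a minimal sketch of this computation model (field names mirror the article; the tenants, token counts, and costs below are invented), normalized request records can be aggregated by endpoint and tenant like this:

```python
from collections import defaultdict

# Hypothetical normalized request records. Field names follow the article;
# tenants, token counts, and costs are illustrative only.
requests = [
    {"endpointTag": "support.reply", "promptVersion": "v3", "tenant": "acme",
     "inputTokens": 1200, "outputTokens": 300, "costUsd": 0.0045},
    {"endpointTag": "support.reply", "promptVersion": "v3", "tenant": "globex",
     "inputTokens": 800, "outputTokens": 250, "costUsd": 0.0031},
    {"endpointTag": "search.summarize", "promptVersion": "v1", "tenant": "acme",
     "inputTokens": 2000, "outputTokens": 120, "costUsd": 0.0052},
]

# Aggregate spend by (endpointTag, tenant) for ownership reviews.
rollup = defaultdict(lambda: {"requests": 0, "costUsd": 0.0})
for r in requests:
    agg = rollup[(r["endpointTag"], r["tenant"])]
    agg["requests"] += 1
    agg["costUsd"] += r["costUsd"]
```

Because each row already carries endpointTag and promptVersion, the same loop can group by prompt version when reviewing a deploy.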

Per-request cost formula (pricing table + token usage)

Do not hardcode prices into application logic. Use a versioned pricing table (with effective dates) and compute each request's cost from measured token usage.

The baseline formula is: cost = inputTokens * inputPrice + outputTokens * outputPrice (plus any additional token classes your provider bills).

  • Keep price snapshots immutable by effective date (audit-safe).
  • Separate input vs output token rates (they often differ).
  • Treat unknown-model requests as a queue to resolve, not as "other".
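One way to keep prices out of app logic is a snapshot table keyed by effective date, as a sketch. The model name and per-million-token rates below are placeholders, not real OpenAI prices:

```python
from datetime import date

# Hypothetical versioned pricing table: immutable snapshots keyed by effective
# date. Rates are USD per 1M tokens; models and prices are illustrative only.
PRICING = [
    {"model": "gpt-4o-mini", "effectiveFrom": date(2024, 7, 18),
     "inputPerM": 0.15, "outputPerM": 0.60},
    {"model": "gpt-4o-mini", "effectiveFrom": date(2024, 1, 1),
     "inputPerM": 0.25, "outputPerM": 1.00},
]

def price_for(model: str, on: date) -> dict:
    """Latest snapshot whose effective date is on or before the request date."""
    candidates = [p for p in PRICING
                  if p["model"] == model and p["effectiveFrom"] <= on]
    if not candidates:
        # Unknown-model rows go to a resolution queue, not an "other" bucket.
        raise LookupError(f"unknown model pricing: {model}")
    return max(candidates, key=lambda p: p["effectiveFrom"])

def request_cost(model: str, on: date,
                 input_tokens: int, output_tokens: int) -> float:
    """cost = inputTokens * inputPrice + outputTokens * outputPrice."""
    p = price_for(model, on)
    return (input_tokens * p["inputPerM"]
            + output_tokens * p["outputPerM"]) / 1_000_000
```

Because snapshots are never mutated, recomputing an old request's cost against its original effective-date price stays audit-safe.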

Retries, fallbacks, and success-adjusted endpoint cost

A cheaper model can still be more expensive if it needs more attempts to succeed. For endpoint ownership, track effective cost per successful request.

Roll retries and fallback calls into the same externalRequestId so one logical user action has one traceable cost.

  1. Reuse externalRequestId across retries for the same logical request.
  2. Record attempt number and final status (success/failure).
  3. Compute attempts-per-success by endpointTag and model.
  4. Alert when retries rise even if token price stays constant.
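The four steps above can be sketched as follows (the attempt log and costs are invented; in practice each row would come from your request store):

```python
from collections import defaultdict

# Hypothetical attempt log: retries share one externalRequestId, and each
# attempt records its number and final status.
attempts = [
    {"externalRequestId": "req-1", "endpointTag": "support.reply",
     "attempt": 1, "status": "failure", "costUsd": 0.0010},
    {"externalRequestId": "req-1", "endpointTag": "support.reply",
     "attempt": 2, "status": "success", "costUsd": 0.0012},
    {"externalRequestId": "req-2", "endpointTag": "support.reply",
     "attempt": 1, "status": "success", "costUsd": 0.0011},
]

stats = defaultdict(lambda: {"costUsd": 0.0, "attempts": 0, "successes": 0})
for a in attempts:
    s = stats[a["endpointTag"]]
    s["costUsd"] += a["costUsd"]        # failed attempts still count as spend
    s["attempts"] += 1
    s["successes"] += a["status"] == "success"

# Effective cost: total spend (including failures) per successful request.
report = {
    tag: {
        "attemptsPerSuccess": s["attempts"] / s["successes"],
        "effectiveCostUsd": s["costUsd"] / s["successes"],
    }
    for tag, s in stats.items() if s["successes"]
}
```

Tracking attemptsPerSuccess over time is what makes the retry alert in step 4 possible even when per-token prices have not changed.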

Common implementation errors

  1. Mixing test/demo rows with real traffic.
  2. Missing externalRequestId reuse across retries.
  3. Ignoring unknown-model rows in endpoint totals.

Example endpoint rollup (what the report should show)

  • EndpointTag: support.reply
  • Volume: request count + successful request count
  • Cost: total cost + effective cost per successful request
  • Tokens: avgInputTokens vs avgOutputTokens + p95 outliers
  • Drivers: top tenants/users + promptVersion deltas in the same window
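For the token columns of such a rollup, a nearest-rank p95 alongside the averages is enough to surface outliers. The token counts below are illustrative:

```python
# Hypothetical per-request token usage for one endpointTag in the report window.
input_tokens = [900, 1100, 1000, 5200]   # one outlier request
output_tokens = [250, 300, 280, 260]

def p95(values: list) -> int:
    """Nearest-rank 95th percentile over the sorted values."""
    s = sorted(values)
    return s[max(0, round(0.95 * (len(s) - 1)))]

avg_input = sum(input_tokens) / len(input_tokens)    # dragged up by the outlier
avg_output = sum(output_tokens) / len(output_tokens)
```

Comparing avg_input against p95(input_tokens) shows whether an endpoint's average is representative or inflated by a handful of oversized prompts.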

What to report for decision-ready endpoint ownership

  • cost/request plus token/request (input vs output split)
  • top tenants and users driving the endpoint spend
  • promptVersion changes in the same window
  • retry ratio and fallback behavior that inflate "effective" cost
  • a baseline comparison period so changes are explainable

What to alert on

  • cost/request drift by endpointTag or promptVersion
  • unexpected tenant concentration in Top Users
  • request burst with falling success ratio
  • budget warning, spend-alert, and exceeded state transitions
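A simple relative-drift check against a baseline window covers the first alert above. The 25% threshold is an arbitrary starting point, not a recommendation:

```python
def drift_alert(baseline_cpr: float, current_cpr: float,
                threshold: float = 0.25) -> bool:
    """True when cost/request moved more than `threshold` relative to baseline.

    Run per endpointTag (or per promptVersion) over matching time windows.
    """
    if baseline_cpr <= 0:
        return current_cpr > 0  # no baseline: any spend is worth a look
    return abs(current_cpr - baseline_cpr) / baseline_cpr > threshold
```

For example, drift_alert(0.0010, 0.0014) fires (40% drift) while drift_alert(0.0010, 0.0011) does not; the absolute value also catches sudden drops, which often mean broken tracking rather than savings.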

Execution checklist

  1. Confirm spike type: volume, token, deploy, or abuse signal.
  2. Assign one incident owner and one communication channel.
  3. Apply immediate containment before deep optimization.
  4. Document the dominant endpoint, tenant, and promptVersion driver.
  5. Convert findings into one permanent guardrail update.

FAQ

Do we need per-user tracking for OpenAI cost monitoring?

It depends. If you are B2B, tenant-level tracking is usually the fastest path to margin control. Per-user tracking helps when you need granular abuse detection or internal chargeback.

Should we ignore unknown models until later?

No. Unknown-model rows break endpoint ownership reports. Treat unknown-model pricing as an operational queue and resolve it quickly so dashboards stay trustworthy.

What if token usage is missing for some requests?

Use provider usage when available. If it is missing, fall back to tokenizer estimates and flag those rows so audits and pricing decisions are not built on uncertain data.
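A sketch of that fallback, with the row flagged for audits. Here rough_token_estimate is a crude stand-in for a real tokenizer (e.g. tiktoken), not a production estimator:

```python
def rough_token_estimate(text: str) -> int:
    # Crude heuristic (~4 characters per token); a real tokenizer replaces this.
    return max(1, len(text) // 4)

def usage_with_provenance(provider_usage, prompt_text, completion_text,
                          estimate=rough_token_estimate):
    """Prefer provider-reported usage; otherwise estimate and flag the row."""
    if provider_usage is not None:
        return {**provider_usage, "estimated": False}
    return {
        "inputTokens": estimate(prompt_text),
        "outputTokens": estimate(completion_text),
        "estimated": True,  # flagged so audits can exclude or discount the row
    }
```

The "estimated" flag is the important part: dashboards can include those rows for trend-watching while pricing decisions filter them out.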


Evaluation resources

For security and procurement reviews, use our trust summary before final tool selection.
