Implementation guide
Track OpenAI usage per user, endpoint, and prompt version
Totals are not enough. Effective OpenAI cost monitoring needs request-level fields that explain what caused the bill and which deploy changed it.
Full guide: LLM cost attribution: endpoint, prompt version, tenant, and user
Why totals fail in production
Monthly totals answer how much you spent, not why you spent it.
Production teams usually need three dimensions first: endpoint, tenant/user, and promptVersion.
Minimal telemetry fields for OpenAI cost monitoring
- externalRequestId (retry-safe idempotency key)
- provider, model, endpointTag, promptVersion
- userId (recommended for per-user tracking; hash it when privacy requires)
- inputTokens, outputTokens, totalTokens, latencyMs, status
- dataMode and environment for real/test/demo separation
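The fields above can be sketched as a single event record. This is a minimal illustration, not a required schema; the class name and example values are hypothetical, and the field names follow the list above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LLMUsageEvent:
    externalRequestId: str   # retry-safe idempotency key
    provider: str            # e.g. "openai"
    model: str
    endpointTag: str         # stable feature-path name
    promptVersion: str
    userId: Optional[str]    # hashed when privacy requires it
    inputTokens: int
    outputTokens: int
    latencyMs: int
    status: str              # e.g. "ok" or "error"
    dataMode: str = "real"   # real | test | demo
    environment: str = "production"

    @property
    def totalTokens(self) -> int:
        # Derived, so ingest cannot drift from the component counts.
        return self.inputTokens + self.outputTokens

# Hypothetical example event
event = LLMUsageEvent(
    externalRequestId="req_8f2c",
    provider="openai",
    model="gpt-4o-mini",
    endpointTag="chat.support_reply",
    promptVersion="v12",
    userId="u_hash_a1b2",
    inputTokens=820,
    outputTokens=310,
    latencyMs=1450,
    status="ok",
)
```

Keeping totalTokens derived rather than stored separately avoids one common source of inconsistent dashboards.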
What OpenAI dashboards and APIs do (and do not) solve
Provider dashboards are great for reconciliation: they tell you what you owe for a time window.
They are usually not enough for root cause: your bill rarely maps cleanly to your internal features, endpoints, and deploys without app-level tags.
- Use provider totals for accounting and invoice reconciliation.
- Use endpointTag + promptVersion + user/tenant mapping for operational ownership.
- Use externalRequestId to connect retries, failures, and multi-call workflows.
Dashboard workflow to find the cost driver
- Start in Overview to confirm spend spike window.
- Check Top Endpoints for feature-level concentration.
- Check Top Users to find tenant concentration.
- Check Prompt Versions for post-deploy cost/request drift.
Implementation notes that prevent noise
- Keep externalRequestId stable on retries.
- Use consistent endpointTag naming per feature path.
- Treat userId as optional and privacy-safe metadata.
- Use dataMode=test for synthetic traffic to protect real dashboards.
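The first note above (a retry-stable externalRequestId) is easy to get wrong: the key must be generated once per logical request, outside the retry loop. A minimal sketch, with a hypothetical retry wrapper:

```python
import uuid

def call_with_retries(make_call, max_attempts=3):
    # Generate the idempotency key ONCE per logical request,
    # so every retry attempt reports the same externalRequestId.
    external_request_id = str(uuid.uuid4())
    for attempt in range(1, max_attempts + 1):
        try:
            return make_call(external_request_id)
        except Exception:
            if attempt == max_attempts:
                raise
```

Generating a fresh ID inside the loop is the pitfall called out later in this guide: each retry would look like a new request and inflate volume.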
Identity mapping for per-user cost tracking (without privacy mistakes)
Per-user reporting only works when identity rules are stable. Treat userId and tenantId as routing metadata, not personal data.
Decide how you will represent service accounts, internal tooling, and anonymous traffic so your dashboards do not mix categories.
- Hash userId/tenantId (or use internal IDs) when privacy requires it.
- Bucket anonymous traffic (for example anon_ip_hash or session cohort).
- Tag service accounts explicitly so they do not hide inside “Top Users”.
- Normalize merges/renames so historical cost stays attributable.
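The identity rules above can be centralized in one mapping function so dashboards never mix categories. A sketch under stated assumptions: the salt, prefixes, and function names are all hypothetical, and SHA-256 stands in for whatever hashing your privacy policy allows.

```python
import hashlib

SALT = b"app-level-salt"  # assumption: managed in config, not hardcoded

def hash_identity(raw_id: str) -> str:
    # Stable, salted, truncated hash: consistent attribution without storing PII.
    return hashlib.sha256(SALT + raw_id.encode()).hexdigest()[:16]

def attribution_user_id(user_id=None, service_account=None, client_ip=None):
    if service_account:
        # Tagged explicitly so service accounts never hide inside "Top Users".
        return f"svc:{service_account}"
    if user_id:
        return f"u:{hash_identity(user_id)}"
    if client_ip:
        # Bucket anonymous traffic by hashed IP (or session cohort).
        return f"anon:{hash_identity(client_ip)}"
    return "anon:unknown"
```

Because the hash is salted and stable, the same user always maps to the same bucket, which is what per-user cost history needs.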
PromptVersion strategy that makes cost regressions explainable
PromptVersion is your deploy signal. Without it, cost drift looks like random variance.
Bump the version whenever anything that can change spend changes: system prompt, tool policy, routing logic, retrieval config, or output formatting rules.
- Add a promptVersion constant per endpointTag and bump on deploy.
- Record short change notes (what changed and why).
- Alert on cost/request and tokens/request deltas after a version change.
- Keep rollback easy: one version change should be reversible.
Common pitfalls (the ones that break attribution)
- Generating a new externalRequestId per retry attempt (inflates volume).
- Changing endpointTag naming and losing historical continuity.
- Mixing staging/demo traffic with production baselines.
- Letting fallback routing change silently without promptVersion bumps.
What to alert on
- cost/request drift after a promptVersion change
- tokens/request increase (input or output) on a single endpointTag
- endpointTag concentration shift (one feature dominates spend)
- tenant/user concentration shift (one customer dominates spend)
- retry multiplier increase (errors + repeated externalRequestId)
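The first two alerts above reduce to one pattern: compare a per-request ratio before and after a window boundary (such as a promptVersion change). A minimal sketch; the 20% threshold is an arbitrary example:

```python
def drift_alert(baseline: dict, current: dict, threshold: float = 0.20) -> bool:
    """True when cost/request (or tokens/request) rises beyond threshold.

    Each dict needs a numerator ("value": total cost or total tokens)
    and a "requests" count for its window.
    """
    base = baseline["value"] / max(baseline["requests"], 1)
    cur = current["value"] / max(current["requests"], 1)
    if base == 0:
        return cur > 0
    return (cur - base) / base > threshold
```

The same function covers cost/request after a version change and tokens/request on a single endpointTag; concentration-shift alerts need share-of-total math instead.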
Execution checklist
- Define a stable endpointTag taxonomy and document naming rules.
- Implement promptVersion tagging and bump versions on deploy changes.
- Adopt externalRequestId and keep it stable across retries and multi-call flows.
- Add privacy-safe userId/tenantId mapping rules (hashing, service accounts, anonymous buckets).
- Validate dashboards weekly and tune alerts on the top 3 cost drivers.
FAQ
Can I track OpenAI usage per user without storing personal data?
Yes. Use stable hashed identifiers or internal IDs for userId/tenantId, and document how you represent service accounts and anonymous traffic. The goal is consistent attribution, not collecting PII.
What is the minimum I need for reliable OpenAI cost monitoring?
At minimum: endpointTag, promptVersion, token counts, and a retry-safe externalRequestId. Add user/tenant mapping when you need unit economics and customer-level concentration analysis.
Why does the same incident keep happening after we “fix the prompt”?
Because the driver is often volume, retries, abuse, or routing drift rather than the prompt itself. Fixes stick when you add a permanent guardrail: caps, endpoint-level limits, budgets with an owner, or promptVersion-based regression checks.
Do I need a proxy/gateway to do per-endpoint tracking?
No. You can start with no-proxy telemetry: attach endpointTag/promptVersion/user context in your app and send direct ingest events. Add a gateway later only if you need centralized runtime enforcement or provider routing.
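The no-proxy approach in this answer can be sketched as a thin wrapper: call the provider directly, then emit one tagged ingest event. Everything here is illustrative; EVENTS stands in for your real ingest call, and the provider is passed as a callable so the sketch stays provider-agnostic.

```python
import time

EVENTS = []  # stand-in sink; replace with your real ingest client

def send_ingest_event(event: dict) -> None:
    EVENTS.append(event)

def tracked_completion(call_provider, *, endpoint_tag, prompt_version,
                       user_id, external_request_id):
    # No gateway needed: call the provider directly, then emit one event
    # carrying the app-level tags the dashboards need.
    start = time.monotonic()
    resp = call_provider()  # e.g. a direct OpenAI SDK call
    send_ingest_event({
        "externalRequestId": external_request_id,
        "endpointTag": endpoint_tag,
        "promptVersion": prompt_version,
        "userId": user_id,
        "inputTokens": resp["usage"]["prompt_tokens"],
        "outputTokens": resp["usage"]["completion_tokens"],
        "latencyMs": int((time.monotonic() - start) * 1000),
        "status": "ok",
    })
    return resp
```

A gateway can replace this wrapper later without changing the event shape, which keeps dashboards continuous across the migration.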
Evaluation resources
For security and procurement reviews, use our trust summary before final tool selection.