Implementation guide
Track OpenAI usage per user, endpoint, and prompt version
Totals are not enough. Effective OpenAI cost monitoring needs request-level fields that explain what caused the bill and which deploy changed it.
Full guide: LLM cost attribution: endpoint, prompt version, tenant, and user
Why totals fail in production
Monthly totals answer how much you spent, not why you spent it.
Production teams usually need three dimensions first: endpoint, tenant/user, and promptVersion.
Minimal telemetry fields for OpenAI cost monitoring
- externalRequestId (retry-safe idempotency key)
- provider, model, endpointTag, promptVersion
- userId (recommended for per-user tracking; hash it when privacy requires)
- inputTokens, outputTokens, totalTokens, latencyMs, status
- dataMode and environment for real/test/demo separation
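The fields above can be sketched as a single event record. This is a minimal illustration, not a required schema; the class name and example values are hypothetical, and the field names follow the list above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LLMUsageEvent:
    externalRequestId: str   # retry-safe idempotency key
    provider: str            # e.g. "openai"
    model: str
    endpointTag: str         # stable feature-path name
    promptVersion: str
    userId: Optional[str]    # hashed when privacy requires it
    inputTokens: int
    outputTokens: int
    latencyMs: int
    status: str              # e.g. "ok" or "error"
    dataMode: str = "real"   # real | test | demo
    environment: str = "production"

    @property
    def totalTokens(self) -> int:
        # Derived, so ingest cannot drift from the component counts.
        return self.inputTokens + self.outputTokens

# Hypothetical example event
event = LLMUsageEvent(
    externalRequestId="req_8f2c",
    provider="openai",
    model="gpt-4o-mini",
    endpointTag="chat.support_reply",
    promptVersion="v12",
    userId="u_hash_a1b2",
    inputTokens=820,
    outputTokens=310,
    latencyMs=1450,
    status="ok",
)
```

Keeping totalTokens derived rather than stored separately avoids one common source of inconsistent dashboards.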
What OpenAI dashboards and APIs do (and do not) solve
Provider dashboards are great for reconciliation: they tell you what you owe for a time window.
They are usually not enough for root cause: your bill rarely maps cleanly to your internal features, endpoints, and deploys without app-level tags.
- Use provider totals for accounting and invoice reconciliation.
- Use endpointTag + promptVersion + user/tenant mapping for operational ownership.
- Use externalRequestId to connect retries, failures, and multi-call workflows.
Dashboard workflow to find the cost driver
- Start in Overview to confirm spend spike window.
- Check Top Endpoints for feature-level concentration.
- Check Top Users to find tenant concentration.
- Check Prompt Versions for post-deploy cost/request drift.
Implementation notes that prevent noise
- Keep externalRequestId stable on retries.
- Use consistent endpointTag naming per feature path.
- Treat userId as optional and privacy-safe metadata.
- Use dataMode=test for synthetic traffic to protect real dashboards.
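The first note above (a retry-stable externalRequestId) is easy to get wrong: the key must be generated once per logical request, outside the retry loop. A minimal sketch, with a hypothetical retry wrapper:

```python
import uuid

def call_with_retries(make_call, max_attempts=3):
    # Generate the idempotency key ONCE per logical request,
    # so every retry attempt reports the same externalRequestId.
    external_request_id = str(uuid.uuid4())
    for attempt in range(1, max_attempts + 1):
        try:
            return make_call(external_request_id)
        except Exception:
            if attempt == max_attempts:
                raise
```

Generating a fresh ID inside the loop is the pitfall called out later in this guide: each retry would look like a new request and inflate volume.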
Identity mapping for per-user cost tracking (without privacy mistakes)
Per-user reporting only works when identity rules are stable. Treat userId and tenantId as routing metadata, not personal data.
Decide how you will represent service accounts, internal tooling, and anonymous traffic so your dashboards do not mix categories.
- Hash userId/tenantId (or use internal IDs) when privacy requires it.
- Bucket anonymous traffic (for example anon_ip_hash or session cohort).
- Tag service accounts explicitly so they do not hide inside “Top Users”.
- Normalize merges/renames so historical cost stays attributable.
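The identity rules above can be centralized in one mapping function so dashboards never mix categories. A sketch under stated assumptions: the salt, prefixes, and function names are all hypothetical, and SHA-256 stands in for whatever hashing your privacy policy allows.

```python
import hashlib

SALT = b"app-level-salt"  # assumption: managed in config, not hardcoded

def hash_identity(raw_id: str) -> str:
    # Stable, salted, truncated hash: consistent attribution without storing PII.
    return hashlib.sha256(SALT + raw_id.encode()).hexdigest()[:16]

def attribution_user_id(user_id=None, service_account=None, client_ip=None):
    if service_account:
        # Tagged explicitly so service accounts never hide inside "Top Users".
        return f"svc:{service_account}"
    if user_id:
        return f"u:{hash_identity(user_id)}"
    if client_ip:
        # Bucket anonymous traffic by hashed IP (or session cohort).
        return f"anon:{hash_identity(client_ip)}"
    return "anon:unknown"
```

Because the hash is salted and stable, the same user always maps to the same bucket, which is what per-user cost history needs.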
PromptVersion strategy that makes cost regressions explainable
PromptVersion is your deploy signal. Without it, cost drift looks like random variance.
Bump the version whenever anything that can change spend changes: system prompt, tool policy, routing logic, retrieval config, or output formatting rules.
- Add a promptVersion constant per endpointTag and bump on deploy.
- Record short change notes (what changed and why).
- Alert on cost/request and tokens/request deltas after a version change.
- Keep rollback easy: one version change should be reversible.
Common pitfalls (the ones that break attribution)
- Generating a new externalRequestId per retry attempt (inflates volume).
- Changing endpointTag naming and losing historical continuity.
- Mixing staging/demo traffic with production baselines.
- Letting fallback routing change silently without promptVersion bumps.
What to alert on
- cost/request drift after a promptVersion change
- tokens/request increase (input or output) on a single endpointTag
- endpointTag concentration shift (one feature dominates spend)
- tenant/user concentration shift (one customer dominates spend)
- retry multiplier increase (errors + repeated externalRequestId)
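The first two alerts above reduce to one pattern: compare a per-request ratio before and after a window boundary (such as a promptVersion change). A minimal sketch; the 20% threshold is an arbitrary example:

```python
def drift_alert(baseline: dict, current: dict, threshold: float = 0.20) -> bool:
    """True when cost/request (or tokens/request) rises beyond threshold.

    Each dict needs a numerator ("value": total cost or total tokens)
    and a "requests" count for its window.
    """
    base = baseline["value"] / max(baseline["requests"], 1)
    cur = current["value"] / max(current["requests"], 1)
    if base == 0:
        return cur > 0
    return (cur - base) / base > threshold
```

The same function covers cost/request after a version change and tokens/request on a single endpointTag; concentration-shift alerts need share-of-total math instead.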
Execution checklist
- Define a stable endpointTag taxonomy and document naming rules.
- Implement promptVersion tagging and bump versions on deploy changes.
- Adopt externalRequestId and keep it stable across retries and multi-call flows.
- Add privacy-safe userId/tenantId mapping rules (hashing, service accounts, anonymous buckets).
- Validate dashboards weekly and tune alerts on the top 3 cost drivers.
FAQ
Can I track OpenAI usage per user without storing personal data?
Yes. Use stable hashed identifiers or internal IDs for userId/tenantId, and document how you represent service accounts and anonymous traffic. The goal is consistent attribution, not collecting PII.
What is the minimum I need for reliable OpenAI cost monitoring?
At minimum: endpointTag, promptVersion, token counts, and a retry-safe externalRequestId. Add user/tenant mapping when you need unit economics and customer-level concentration analysis.
Why does the same incident keep happening after we “fix the prompt”?
Because the driver is often volume, retries, abuse, or routing drift rather than the prompt itself. Fixes stick when you add a permanent guardrail: caps, endpoint-level limits, budgets with an owner, or promptVersion-based regression checks.
Do I need a proxy/gateway to do per-endpoint tracking?
No. You can start with no-proxy telemetry: attach endpointTag/promptVersion/user context in your app and send direct ingest events. Add a gateway later only if you need centralized runtime enforcement or provider routing.
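The no-proxy approach in this answer can be sketched as a thin wrapper: call the provider directly, then emit one tagged ingest event. Everything here is illustrative; EVENTS stands in for your real ingest call, and the provider is passed as a callable so the sketch stays provider-agnostic.

```python
import time

EVENTS = []  # stand-in sink; replace with your real ingest client

def send_ingest_event(event: dict) -> None:
    EVENTS.append(event)

def tracked_completion(call_provider, *, endpoint_tag, prompt_version,
                       user_id, external_request_id):
    # No gateway needed: call the provider directly, then emit one event
    # carrying the app-level tags the dashboards need.
    start = time.monotonic()
    resp = call_provider()  # e.g. a direct OpenAI SDK call
    send_ingest_event({
        "externalRequestId": external_request_id,
        "endpointTag": endpoint_tag,
        "promptVersion": prompt_version,
        "userId": user_id,
        "inputTokens": resp["usage"]["prompt_tokens"],
        "outputTokens": resp["usage"]["completion_tokens"],
        "latencyMs": int((time.monotonic() - start) * 1000),
        "status": "ok",
    })
    return resp
```

A gateway can replace this wrapper later without changing the event shape, which keeps dashboards continuous across the migration.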
Evaluation resources
For security and procurement reviews, use our trust summary before final tool selection.