Opsmeter
AI Cost & Inference Control


OpenAI cost per API call: a production-ready method

Cost per call must include endpoint, prompt, and retry context. This pillar defines a production-grade method.

Pillar · OpenAI · Unit economics

Calculation model

  • Map provider usage into normalized token fields
  • Attach endpointTag and promptVersion to each request
  • Include retry multiplier in effective request cost
  • Aggregate by period and ownership unit
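The steps above can be sketched as a normalized record type plus a mapping function. This is a minimal sketch, assuming the field names used throughout this guide (endpointTag, promptVersion, retryMultiplier); the provider usage keys mirror OpenAI-style `prompt_tokens`/`completion_tokens` fields.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class NormalizedCall:
    """One request in normalized form; None token fields mean usage was missing."""
    external_request_id: str
    endpoint_tag: str
    prompt_version: str
    input_tokens: Optional[int]
    output_tokens: Optional[int]
    retry_multiplier: float = 1.0

def from_provider_usage(raw: dict) -> NormalizedCall:
    """Map a raw provider payload into normalized token fields."""
    usage = raw.get("usage") or {}
    return NormalizedCall(
        external_request_id=raw["externalRequestId"],
        endpoint_tag=raw["endpointTag"],
        prompt_version=raw["promptVersion"],
        input_tokens=usage.get("prompt_tokens"),       # missing stays None, not 0
        output_tokens=usage.get("completion_tokens"),  # missing stays None, not 0
        retry_multiplier=raw.get("retryMultiplier", 1.0),
    )
```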

Why cost per API call is harder than it sounds

Token totals alone do not explain margin. The same model can produce very different costs depending on context size, output verbosity, and retry behavior.

If you want reliable OpenAI cost monitoring, compute cost per call at the request level, then roll up by endpointTag, promptVersion, tenant, and user.

Formula and normalization rules

Cost per call should combine input tokens, output tokens, and the model-specific rates that were effective at the request timestamp.

Normalize usage fields before aggregation so endpoint and tenant slices remain comparable.

  • Treat missing provider usage as explicit uncertainty.
  • Keep retry attempts mapped to one logical request when possible.
  • Use consistent endpointTag and promptVersion taxonomy.
  • Separate test/demo data from production reporting views.
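The first and last rules above can be enforced in one filtering pass. A minimal sketch, assuming list-of-dict records shaped like the payload example later on this page; `partition_for_reporting` is a hypothetical helper name.

```python
def partition_for_reporting(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split records into (reportable, uncertain) instead of coercing
    missing usage to zero; drop test/demo data from production views."""
    reportable, uncertain = [], []
    for r in records:
        if r.get("dataMode") != "real" or r.get("environment") != "prod":
            continue  # keep test/demo traffic out of production reporting
        if r.get("inputTokens") is None or r.get("outputTokens") is None:
            uncertain.append(r)  # explicit uncertainty, not a zero-cost call
        else:
            reportable.append(r)
    return reportable, uncertain
```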

A practical formula (provider pricing changes over time)

Avoid hardcoding prices into application logic. Keep prices in a versioned catalog and apply the effective rate for the request timestamp.
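One way to sketch such a versioned catalog: per model, keep a sorted list of (effective_from, input_rate, output_rate) and pick the latest entry at or before the request timestamp. The rates below are placeholders, not real OpenAI prices.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical versioned catalog; rates are USD per 1M tokens (placeholders).
CATALOG = {
    "gpt-4o-mini": [
        (datetime(2024, 1, 1), 0.60, 2.40),
        (datetime(2024, 7, 1), 0.15, 0.60),
    ],
}

def effective_rates(model: str, at: datetime) -> tuple[float, float]:
    """Return (input_rate, output_rate) effective at the request timestamp."""
    versions = CATALOG[model]
    idx = bisect_right([v[0] for v in versions], at) - 1
    if idx < 0:
        raise ValueError(f"no rate for {model} before {at}")
    _, in_rate, out_rate = versions[idx]
    return in_rate, out_rate
```

Because lookups are by effective date, historical requests keep the price that was in force when they ran, so aggregates stay stable after a price change.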

Use a variable-based formula so your workflow stays valid even when OpenAI changes pricing.

  • requestCost = (inputTokens * inputRate) + (outputTokens * outputRate)
  • effectiveCost = requestCost * retryMultiplier (when retries represent one logical request)
  • costPerCall(window, endpointTag) = sum(effectiveCost) / successfulCallCount

Include retries and fallbacks (hidden multipliers)

Retries can multiply spend even when user-visible outcomes look fine. Treat the retry ratio as a cost multiplier, and keep request correlation stable so every attempt rolls up to the same logical request.

If you run fallbacks (model swaps or secondary calls), record them explicitly so cost per call reflects the real workflow.

  • Reuse externalRequestId across retries for the same logical request.
  • Record status and latency so you can separate reliability incidents from prompt regressions.
  • Tag fallback routes so you can see when cheaper models increase total cost via retries.
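The first rule above enables a simple collapse step: group raw attempts by externalRequestId, sum their costs into one effective cost, and keep the attempt count as the retry multiplier. A sketch under the assumption that each attempt record carries `cost` and `status` fields.

```python
from collections import defaultdict

def collapse_retries(attempts: list[dict]) -> list[dict]:
    """Collapse raw attempts into logical requests keyed by externalRequestId."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for a in attempts:
        grouped[a["externalRequestId"]].append(a)
    logical = []
    for req_id, group in grouped.items():
        logical.append({
            "externalRequestId": req_id,
            "attempts": len(group),  # retry multiplier for this logical request
            "effectiveCost": sum(a["cost"] for a in group),
            "succeeded": any(a["status"] == "success" for a in group),
        })
    return logical
```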

Report the metric in the views that drive decisions

  • Cost per endpointTag (feature ownership and routing decisions)
  • Cost per user and per tenant (unit economics and pricing decisions)
  • Cost per promptVersion (deploy accountability)
  • Top outliers (tail requests) to catch hidden regressions
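All four views above are the same aggregation with a different grouping key. A minimal roll-up sketch, assuming logical-request dicts with an `effectiveCost` field; `rollup` is a hypothetical helper name.

```python
from collections import defaultdict

def rollup(calls: list[dict], key: str) -> dict:
    """Aggregate effective cost and call count by a reporting dimension,
    e.g. key='endpointTag', 'promptVersion', or 'userId'."""
    agg: dict[str, dict] = defaultdict(lambda: {"cost": 0.0, "calls": 0})
    for c in calls:
        bucket = agg[c[key]]
        bucket["cost"] += c["effectiveCost"]
        bucket["calls"] += 1
    return {k: {**v, "costPerCall": v["cost"] / v["calls"]} for k, v in agg.items()}
```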

Quality gates before reporting

  1. Top endpoint spend share aligns with expected traffic profile.
  2. Unknown model ratio stays below your policy threshold.
  3. Retry-driven cost inflation is visible and explained.
  4. Daily aggregates reconcile with request-level spot checks.
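Gate 4 can be automated as a sampling check: scale a spot-checked sample's cost up by its traffic share and compare against the daily aggregate. A sketch with a hypothetical `reconcile` helper and an assumed 5% default tolerance.

```python
def reconcile(daily_total: float, sampled_costs: list[float],
              sample_share: float, tolerance: float = 0.05) -> bool:
    """True if the scaled-up sample agrees with the daily aggregate
    within the given relative tolerance."""
    estimate = sum(sampled_costs) / sample_share  # scale sample to full traffic
    drift = abs(estimate - daily_total) / daily_total
    return drift <= tolerance
```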

Common mistakes that break cost per call

  1. Mixing demo/test dataMode into production views.
  2. Ignoring output token growth (verbosity drift) after prompt changes.
  3. Treating missing usage fields as zero.
  4. Not versioning model rates by effective date.
  5. Comparing endpoints without normalizing retries and fallbacks.

What to send (payload example)

{
  "externalRequestId": "req_01HZXB6MQZ2WQ9D2KCF9M4V2QY",
  "provider": "openai",
  "model": "gpt-4o-mini",
  "endpointTag": "checkout.ai_summary",
  "promptVersion": "summary_v3",
  "userId": "tenant_acme_hash",
  "inputTokens": 540,
  "outputTokens": 180,
  "latencyMs": 892,
  "status": "success",
  "dataMode": "real",
  "environment": "prod"
}

Common instrumentation mistakes

  • Generating a new externalRequestId on every retry.
  • Using inconsistent endpointTag or promptVersion naming conventions.
  • Mixing real traffic with test/demo dataMode in the same operational view.
  • Sending telemetry synchronously and risking user-path latency impact.

How to verify in Opsmeter Dashboard

  1. Use Overview to confirm spike window and budget posture.
  2. Use Top Endpoints to find feature-level concentration.
  3. Use Top Users to find tenant-level concentration.
  4. Use Prompt Versions to validate deploy-linked cost drift.


Evaluation resources

For security and procurement reviews, use our trust summary before final tool selection.

Open trust proof pack