Pricing accuracy

Ops guideMOFU profile

Token cost calculation pitfalls: cached, audio, reasoning tokens

Cost models fail when teams assume all tokens are billed identically. Handle usage classes explicitly and verify mappings.

Published: 2026-02-24Updated: 2026-02-26

PricingOperations

Full guide: LLM pricing tables: keep costs accurate and handle unknown models

What this guide answers

What changed in cost, cost per request, or budget posture.
Which endpoint, prompt, model, or tenant likely drove the delta.
Which validation step or control to apply next in Opsmeter.io.

What to send (payload example)

{
  "externalRequestId": "req_01HZXB6MQZ2WQ9D2KCF9M4V2QY",
  "provider": "provider_id",
  "model": "model_id",
  "endpointTag": "catalog.pricing_reconcile",
  "promptVersion": "pricing_v2",
  "userId": "tenant_acme_hash",
  "inputTokens": 320,
  "outputTokens": 120,
  "latencyMs": 892,
  "status": "success",
  "dataMode": "real",
  "environment": "prod"
}

Common mistakes

Overwriting historical price snapshots instead of versioning by effective date.
Ignoring unknown-model rows until dashboards become untrustworthy.
Missing cached/reasoning token nuances and mispricing requests.
Mixing test/demo traffic into production pricing reconciliation.

How to verify in the Opsmeter.io dashboard

Open Catalog to confirm model mapping and pricing effective dates.
Check unknown-model visibility and resolve pending pricing rows.
Spot-check cost snapshots on recent requests to validate ingestion accuracy.
Reconcile aggregates against provider usage exports for the same window.

Where teams get it wrong

Merging cached and uncached tokens into one rate.
Ignoring provider-specific reasoning token fields.
Applying stale model pricing after provider updates.

Use this workflow

Turn diagnosis into action

Identify the cost driver, validate it with attribution, then apply one durable control before the next billing cycle.

Apply in your workspace

Re-run this workflow on your own spend data

Follow the same path from article insight to telemetry verification, then validate with your own cost signals.

Quickstart pathSend a first payload, confirm attribution, then return here for operations context.Open quickstart

Evaluation pathPair this guide with trust proof, status, and compare surfaces during review.Open trust proof pack

Counted tokens vs billed tokens (why estimates differ)

Developers often estimate tokens before the call (for example with a tokenizer like tiktoken) and then wonder why invoices do not match the estimate.

The billing source of truth is the provider usage fields for that request. Tokenizers are helpful for guardrails, but cost should reconcile to measured usage.

Use token counting for pre-call safety: max tokens, context budgets, and routing.
Use provider usage fields for post-call accounting: cost per request, audits, and invoice reconciliation.
Store both when possible so you can debug differences.

Cached input tokens and context caching

Prompt caching / context caching can reduce effective input cost, but only if you track cached token classes explicitly.

If you merge cached and uncached input tokens, your “cost per token” charts will drift and your pricing table will look wrong.

Keep cached vs uncached usage fields separate in storage.
Apply the correct cached rate by effective date (pricing can change).
Alert when cache hit-rate drops while inputTokens rise (common regression signal).

Reasoning tokens: treat them as a distinct cost driver

Reasoning tokens (or similar provider-specific classes) change the cost profile of the same prompt without changing “input/output” totals.

If you ignore reasoning token fields, you will misattribute cost regressions to prompts or traffic volume.

Track reasoning usage separately from input/output totals.
Compare reasoning share before vs after promptVersion deploys.
Watch tail outliers: reasoning-heavy requests often live in p95/p99.

Audio/image tokens and multi-modal usage

Multi-modal requests introduce token classes that do not behave like plain text. Some providers also support caching for image inputs.

Operationally, treat multi-modal usage as its own endpointTag class so budgets and alerts remain predictable.

Split text vs audio vs image usage fields if provided.
Avoid mixing multi-modal traffic into text-only baselines.
Cap and budget multi-modal endpoints separately (they are higher variance).

Verification checklist

Map raw provider usage fields to normalized schema.
Track unknown-model rows and resolve quickly.
Run weekly reconciliation against provider usage reports.
Snapshot effective rates at ingest time for history consistency.

Cost-per-request math (make it deterministic)

Store a request cost snapshot at ingest time (inputCostUsd, outputCostUsd, totalCostUsd).
Compute cost using a versioned pricing table (effective dates), not hardcoded constants.
Include all token classes your provider bills (cached, reasoning, audio/image when applicable).

Normalization rules that keep cost math consistent

Store raw usage fields (source of truth) alongside normalized token totals.
Version pricing by effective date; never rewrite historical snapshots.
Treat missing usage as explicit uncertainty, not zero.
Document which usage fields map to input/output/reasoning/cached classes.
Reconcile on a schedule so pricing drift is caught before finance reviews.

Evaluation resources

For security and procurement reviews, use our trust summary before final tool selection.

Open trust proof pack