Pricing accuracy
Token cost calculation pitfalls: cached, audio, reasoning tokens
Cost models fail when teams assume all tokens are billed identically. Handle usage classes explicitly and verify mappings.
Full guide: LLM pricing tables: keep costs accurate and handle unknown models
Where teams get it wrong
- Merging cached and uncached tokens into one rate.
- Ignoring provider-specific reasoning token fields.
- Applying stale model pricing after provider updates.
Counted tokens vs billed tokens (why estimates differ)
Developers often estimate tokens before the call (for example with a tokenizer like tiktoken) and then wonder why invoices do not match the estimate.
The billing source of truth is the provider usage fields for that request. Tokenizers are helpful for guardrails, but cost should reconcile to measured usage.
- Use token counting for pre-call safety: max tokens, context budgets, and routing.
- Use provider usage fields for post-call accounting: cost per request, audits, and invoice reconciliation.
- Store both when possible so you can debug differences.
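As a minimal sketch of storing both numbers side by side, the record below keeps the pre-call estimate and the provider-reported count and exposes the difference. The class and field names (`TokenRecord`, `estimated_input_tokens`, `billed_input_tokens`) are hypothetical and not part of any provider or Opsmeter API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenRecord:
    """Hypothetical record that keeps the estimate and the billed count together."""
    request_id: str
    estimated_input_tokens: Optional[int]  # pre-call tokenizer estimate
    billed_input_tokens: Optional[int]     # provider usage field, post-call

    def drift(self) -> Optional[int]:
        """Billed minus estimated; None when either side is missing."""
        if self.estimated_input_tokens is None or self.billed_input_tokens is None:
            return None
        return self.billed_input_tokens - self.estimated_input_tokens

    def drift_ratio(self) -> Optional[float]:
        """Drift relative to the estimate; None when it cannot be computed."""
        d = self.drift()
        if d is None or not self.estimated_input_tokens:
            return None
        return d / self.estimated_input_tokens


record = TokenRecord("req_example", estimated_input_tokens=300, billed_input_tokens=320)
print(record.drift(), record.drift_ratio())
```

A persistent non-zero drift for one model or endpoint usually points at content the estimate did not include, which is easier to track down when both values are stored per request.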
Cached input tokens and context caching
Prompt caching (also called context caching) can reduce effective input cost, but only if you track cached token classes explicitly.
If you merge cached and uncached input tokens, your blended “cost per token” moves with the cache hit rate, so charts drift and the pricing table looks wrong even when the rates themselves are correct.
- Keep cached vs uncached usage fields separate in storage.
- Apply the correct cached rate by effective date (pricing can change).
- Alert when cache hit rate drops while inputTokens rises (common regression signal).
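The effect of keeping the two classes separate can be sketched with a small cost function. The rates used here (3.00 and 0.30 USD per million tokens) are placeholders, not real prices, and the function assumes the cached and uncached counts have already been split. Some providers report cached tokens as a subset of the input total, so check how yours reports them before subtracting.

```python
from decimal import Decimal
from typing import Optional

MILLION = Decimal(1_000_000)


def input_cost_usd(uncached_tokens: int, cached_tokens: int,
                   uncached_rate_per_mtok: Decimal,
                   cached_rate_per_mtok: Decimal) -> Decimal:
    """Price each input class at its own rate (rates are USD per million tokens)."""
    return (Decimal(uncached_tokens) * uncached_rate_per_mtok
            + Decimal(cached_tokens) * cached_rate_per_mtok) / MILLION


def cache_hit_rate(uncached_tokens: int, cached_tokens: int) -> Optional[float]:
    """Share of input tokens served from cache; None when there is no input."""
    total = uncached_tokens + cached_tokens
    return cached_tokens / total if total else None


# Placeholder rates for illustration only.
UNCACHED = Decimal("3.00")
CACHED = Decimal("0.30")

split = input_cost_usd(200, 800, UNCACHED, CACHED)    # classes priced separately
merged = input_cost_usd(1000, 0, UNCACHED, CACHED)    # everything at the uncached rate
print(split, merged, cache_hit_rate(200, 800))
```

With these placeholder numbers the merged calculation overstates input cost by more than three times, which is the kind of gap that later shows up as an unexplained reconciliation difference.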
Reasoning tokens: treat them as a distinct cost driver
Reasoning tokens (or similar provider-specific classes) can change the cost of the same prompt even when the prompt and the visible response look unchanged.
If you ignore reasoning token fields, you will misattribute cost regressions to prompts or traffic volume.
- Track reasoning usage separately from input/output totals.
- Compare reasoning share before vs after promptVersion deploys.
- Watch tail outliers: reasoning-heavy requests often live in p95/p99.
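One way to compare reasoning share across deploys is to aggregate it per promptVersion. This is a sketch under assumptions: `reasoningTokens` is a hypothetical normalized field (it does not appear in the payload example in this guide), and it is assumed to be counted within `outputTokens`. Adjust the denominator if your provider reports it separately.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional


def reasoning_share_by_version(rows: Iterable[dict]) -> Dict[str, Optional[float]]:
    """Return reasoning tokens / output tokens per promptVersion."""
    totals = defaultdict(lambda: [0, 0])  # version -> [reasoning, output]
    for row in rows:
        reasoning = row.get("reasoningTokens")
        if reasoning is None:
            continue  # missing usage is unknown, so skip it rather than count zero
        bucket = totals[row["promptVersion"]]
        bucket[0] += reasoning
        bucket[1] += row["outputTokens"]
    return {version: (r / o if o else None) for version, (r, o) in totals.items()}


rows = [
    {"promptVersion": "pricing_v1", "outputTokens": 100, "reasoningTokens": 20},
    {"promptVersion": "pricing_v2", "outputTokens": 200, "reasoningTokens": 100},
    {"promptVersion": "pricing_v2", "outputTokens": 100, "reasoningTokens": 50},
    {"promptVersion": "pricing_v2", "outputTokens": 80},  # no reasoning field reported
]
print(reasoning_share_by_version(rows))
```

A jump in share between two versions with similar traffic suggests the prompt change, not volume, is driving the cost difference.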
Audio/image tokens and multi-modal usage
Multi-modal requests introduce token classes that do not behave like plain text. Some providers also support caching for image inputs.
Operationally, treat multi-modal usage as its own endpointTag class so budgets and alerts remain predictable.
- Split text vs audio vs image usage fields if provided.
- Avoid mixing multi-modal traffic into text-only baselines.
- Cap and budget multi-modal endpoints separately (they are higher variance).
Verification checklist
- Map raw provider usage fields to normalized schema.
- Track unknown-model rows and resolve quickly.
- Run weekly reconciliation against provider usage reports.
- Snapshot effective rates at ingest time for history consistency.
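The weekly reconciliation step can be reduced to a tolerance check between your stored totals and the provider's reported total for the same window. The function name and the 1% default tolerance are assumptions for illustration; choose a threshold that fits your own rounding and reporting lag.

```python
from decimal import Decimal
from typing import Optional, Tuple


def reconcile(internal_usd: Decimal, provider_usd: Decimal,
              tolerance: Decimal = Decimal("0.01")) -> Tuple[bool, Optional[Decimal]]:
    """Return (within_tolerance, relative_difference) for one time window."""
    if provider_usd == 0:
        # No provider spend: only an internal total of zero matches.
        return internal_usd == 0, None
    relative = abs(internal_usd - provider_usd) / provider_usd
    return relative <= tolerance, relative


ok, diff = reconcile(Decimal("100.50"), Decimal("100.00"))
print(ok, diff)
```

Running the same check per model as well as in aggregate helps, because offsetting errors on two models can hide inside a total that looks fine.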
Cost-per-request math (make it deterministic)
- Store a request cost snapshot at ingest time (inputCostUsd, outputCostUsd, totalCostUsd).
- Compute cost using a versioned pricing table (effective dates), not hardcoded constants.
- Include all token classes your provider bills (cached, reasoning, audio/image when applicable).
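The points above can be sketched as a lookup against a pricing table keyed by effective date, with the result stored as a snapshot. Everything in `PRICING` is a placeholder (the rates and dates are invented), and `rate_for` and `cost_snapshot` are hypothetical helper names. Only the input and output classes are priced here to keep the sketch short; other billed classes would be added the same way.

```python
import bisect
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Rate:
    effective_from: date
    input_per_mtok: Decimal   # USD per million input tokens
    output_per_mtok: Decimal  # USD per million output tokens


# Placeholder table; each list must stay sorted by effective_from.
PRICING = {
    ("provider_id", "model_id"): [
        Rate(date(2024, 1, 1), Decimal("3.00"), Decimal("15.00")),
        Rate(date(2024, 7, 1), Decimal("2.50"), Decimal("10.00")),
    ],
}


def rate_for(provider: str, model: str, on: date) -> Optional[Rate]:
    """Latest rate effective on or before `on`; None for unknown models or dates."""
    rates = PRICING.get((provider, model))
    if not rates:
        return None
    idx = bisect.bisect_right([r.effective_from for r in rates], on) - 1
    return rates[idx] if idx >= 0 else None


def cost_snapshot(provider: str, model: str, on: date,
                  input_tokens: int, output_tokens: int) -> Optional[dict]:
    """Compute the cost once at ingest time; None marks an unknown-model row."""
    rate = rate_for(provider, model, on)
    if rate is None:
        return None
    million = Decimal(1_000_000)
    input_cost = Decimal(input_tokens) * rate.input_per_mtok / million
    output_cost = Decimal(output_tokens) * rate.output_per_mtok / million
    return {
        "inputCostUsd": input_cost,
        "outputCostUsd": output_cost,
        "totalCostUsd": input_cost + output_cost,
        "rateEffectiveFrom": rate.effective_from,
    }


print(cost_snapshot("provider_id", "model_id", date(2024, 8, 1), 320, 120))
```

Because the snapshot records which rate was applied, a later price change only needs a new row in the table and never touches historical costs.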
Normalization rules that keep cost math consistent
- Store raw usage fields (source of truth) alongside normalized token totals.
- Version pricing by effective date; never rewrite historical snapshots.
- Treat missing usage as explicit uncertainty, not zero.
- Document which usage fields map to input/output/reasoning/cached classes.
- Reconcile on a schedule so pricing drift is caught before finance reviews.
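A normalization step along these lines might look like the sketch below. The raw field names (`prompt_tokens`, `completion_tokens`, `prompt_tokens_details.cached_tokens`, `completion_tokens_details.reasoning_tokens`) follow the shape of one common provider's usage object and are an assumption here; verify them against your provider's documentation. The normalized names `cachedInputTokens`, `uncachedInputTokens`, and `reasoningTokens` are hypothetical. The sketch also assumes cached tokens are reported as a subset of the prompt total.

```python
from typing import Any, Optional


def _get(obj: Any, *path: str) -> Optional[Any]:
    """Walk nested dicts; return None as soon as a key is missing."""
    for key in path:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def normalize_usage(raw: dict) -> dict:
    """Map raw provider usage to normalized classes, keeping unknowns as None."""
    prompt = _get(raw, "prompt_tokens")
    completion = _get(raw, "completion_tokens")
    cached = _get(raw, "prompt_tokens_details", "cached_tokens")
    reasoning = _get(raw, "completion_tokens_details", "reasoning_tokens")
    # Only derive the uncached count when both inputs are actually known.
    uncached = None if prompt is None or cached is None else prompt - cached
    return {
        "raw": raw,  # source of truth, stored unchanged
        "inputTokens": prompt,
        "outputTokens": completion,
        "cachedInputTokens": cached,
        "uncachedInputTokens": uncached,
        "reasoningTokens": reasoning,
    }


raw_usage = {
    "prompt_tokens": 320,
    "completion_tokens": 120,
    "prompt_tokens_details": {"cached_tokens": 256},
}
print(normalize_usage(raw_usage))
```

Returning None instead of 0 for absent fields keeps "the provider did not report this" distinguishable from "the provider reported zero" in later cost math.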
What to send (payload example)
{
  "externalRequestId": "req_01HZXB6MQZ2WQ9D2KCF9M4V2QY",
  "provider": "provider_id",
  "model": "model_id",
  "endpointTag": "catalog.pricing_reconcile",
  "promptVersion": "pricing_v2",
  "userId": "tenant_acme_hash",
  "inputTokens": 320,
  "outputTokens": 120,
  "latencyMs": 892,
  "status": "success",
  "dataMode": "real",
  "environment": "prod"
}
Common mistakes
- Overwriting historical price snapshots instead of versioning by effective date.
- Ignoring unknown-model rows until dashboards become untrustworthy.
- Overlooking cached and reasoning token classes, which leads to mispriced requests.
- Mixing test/demo traffic into production pricing reconciliation.
How to verify in Opsmeter Dashboard
- Open Catalog to confirm model mapping and pricing effective dates.
- Check unknown-model visibility and resolve pending pricing rows.
- Spot-check cost snapshots on recent requests to validate ingestion accuracy.
- Reconcile aggregates against provider usage exports for the same window.