Opsmeter
AI Cost & Inference Control

Finance-ready ops

Unit economics for AI features: from tokens to margin

Unit economics converts telemetry into product decisions. Track cost per feature and per tenant to protect growth margin.


Full guide: "OpenAI cost per API call: a production-ready method"

Minimum model

  • Revenue per feature cohort
  • Direct model cost by endpointTag
  • Retry/fallback overhead estimate
  • Net margin trend by tenant segment

Weekly review format

  1. Top 3 negative-margin feature paths
  2. PromptVersion changes with margin impact
  3. Budget threshold adjustments by feature risk
  4. Pricing or quota actions for outlier tenants

Telemetry tags that make unit economics possible

  • endpointTag to map cost to features and teams
  • promptVersion to connect deploys to margin changes
  • tenant/user mapping to explain concentration and outliers
  • dataMode/environment to keep finance reporting clean
  • externalRequestId to correlate retries and workflow chains
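Taken together, these tags are the dimensions of every cost query. A hypothetical event payload, sketched in Python; field names beyond the tags listed above (model token counts, tenant id format) are illustrative assumptions, not a documented schema:

```python
# A hypothetical telemetry event carrying the tags above.
# Values and extra fields are illustrative, not a fixed Opsmeter schema.
event = {
    "endpointTag": "chat.summarize",   # maps cost to a feature and owning team
    "promptVersion": "v14",            # ties deploys to margin changes
    "tenantId": "acct_4821",           # tenant-level attribution
    "dataMode": "production",          # keeps finance reporting clean
    "externalRequestId": "req-7f3a",   # correlates retries and workflow chains
    "inputTokens": 1840,
    "outputTokens": 312,
}

def finance_reportable(e: dict) -> bool:
    """Only production traffic should enter finance reporting."""
    return e.get("dataMode") == "production"
```

Filtering on dataMode at query time is what keeps demo and test traffic out of the finance numbers (pitfall 3 below).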

From tokens to margin (the practical bridge)

Unit economics is not a spreadsheet exercise; it is a weekly decision loop. The bridge is request-level cost mapped to features and tenants.

Once cost is attributed to endpointTag and tenant, you can compare it to revenue (plan tier, overages, contract) and make a pricing or product decision.

  • cost per feature (endpointTag) = sum(requestCost) grouped by endpointTag
  • cost per tenant segment = sum(requestCost) grouped by segment label
  • margin trend = revenue trend - cost trend (by the same slice)
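The three formulas above reduce to group-by aggregations over request-level records. A minimal sketch in Python, assuming each record carries a requestCost plus the tags described earlier (field names are illustrative):

```python
from collections import defaultdict

def cost_by(requests: list[dict], key: str) -> dict:
    """sum(requestCost) grouped by an attribute such as endpointTag or tenant."""
    totals: dict = defaultdict(float)
    for r in requests:
        totals[r[key]] += r["requestCost"]
    return dict(totals)

def margin_by(revenue: dict, cost: dict) -> dict:
    """revenue - cost on the same slice; a missing side counts as zero."""
    keys = set(revenue) | set(cost)
    return {k: revenue.get(k, 0.0) - cost.get(k, 0.0) for k in keys}

# Illustrative request-level records.
requests = [
    {"endpointTag": "chat", "tenant": "t1", "requestCost": 0.004},
    {"endpointTag": "chat", "tenant": "t2", "requestCost": 0.006},
    {"endpointTag": "search", "tenant": "t1", "requestCost": 0.002},
]
cost_per_feature = cost_by(requests, "endpointTag")  # ≈ {"chat": 0.01, "search": 0.002}
margin_per_feature = margin_by({"chat": 0.05}, cost_per_feature)
```

The same `cost_by` call with `key="tenant"` yields the per-tenant slice, so one aggregation function covers both formulas.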

Decision levers when a feature is negative-margin

  • Reduce cost: cap output tokens, compress context, reduce tool calls.
  • Route: cheaper models for low-risk paths, better models for high-stakes paths.
  • Limit: quotas by endpointTag or by tenant segment.
  • Price: overages or tiered pricing aligned to cost drivers.
  • Ship: fix promptVersion regressions and retry multipliers.
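The routing lever can start as a static table keyed by path risk. A minimal sketch, with model names and risk labels as placeholder assumptions rather than recommendations:

```python
# Illustrative routing table; model names and risk labels are assumptions.
ROUTES = {
    "low": "small-model",    # cheaper model for low-risk paths
    "high": "large-model",   # better model for high-stakes paths
}

def pick_model(path_risk: str) -> str:
    """Route by risk label; unknown labels default to the safer (costlier) model."""
    return ROUTES.get(path_risk, ROUTES["high"])
```

Defaulting unknown paths to the stronger model trades a little cost for safety until the path is classified.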

Common pitfalls

  1. Reporting totals without mapping to endpoint ownership.
  2. Ignoring retries and fallbacks (effective cost per successful request is higher than the per-call price).
  3. Mixing demo/test traffic into production finance reporting.
  4. Not tracking promptVersion, so deploy impact is untraceable.
  5. Optimizing averages while tail outliers drive the variance.

What to alert on

  • cost/request drift by endpointTag or promptVersion
  • unexpected tenant concentration in Top Users
  • request burst with falling success ratio
  • budget warning, spend-alert, and exceeded state transitions
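The drift alert in the first bullet needs only a baseline and a relative threshold. A minimal sketch of a cost/request drift check, with the 25% threshold as an illustrative default:

```python
def drift_alert(baseline: float, current: float, threshold: float = 0.25) -> bool:
    """Flag when cost/request moves more than `threshold` (default 25%) off baseline.

    `baseline` is the trailing cost/request for an endpointTag or promptVersion;
    `current` is the latest window. A zero baseline alerts on any nonzero cost.
    """
    if baseline <= 0:
        return current > 0
    return abs(current - baseline) / baseline > threshold
```

Run the same check per endpointTag and per promptVersion so a deploy-driven regression surfaces on the version slice even when the feature-level average looks flat.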

Execution checklist

  1. Confirm spike type: volume, token, deploy, or abuse signal.
  2. Assign one incident owner and one communication channel.
  3. Apply immediate containment before deep optimization.
  4. Document the dominant endpoint, tenant, and promptVersion driver.
  5. Convert findings into one permanent guardrail update.

FAQ

Is userId required?

No. userId is optional but recommended for tenant-level attribution. If you need user-level mapping without exposing PII, send a hashed identifier instead.

Where should token usage values come from?

Prefer provider usage fields first. If unavailable, use tokenizer estimates and mark uncertainty in your workflow.
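A small sketch of that preference order, assuming an OpenAI-style `usage` block with `total_tokens`; the fallback estimator is whatever tokenizer your stack provides:

```python
from typing import Callable

def token_usage(response: dict, estimate_tokens: Callable[[str], int]) -> dict:
    """Prefer the provider's reported usage; fall back to an estimate and mark it.

    The `estimated` flag carries the uncertainty downstream so finance views
    can separate exact from approximated token counts.
    """
    usage = response.get("usage")
    if usage and "total_tokens" in usage:
        return {"totalTokens": usage["total_tokens"], "estimated": False}
    text = response.get("text", "")
    return {"totalTokens": estimate_tokens(text), "estimated": True}
```

The ~4 characters/token heuristic used below is a rough English-text approximation, not a tokenizer.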

How should retries be handled?

Keep the same externalRequestId for the same logical request so idempotency remains stable across retries.
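A sketch of that retry discipline: mint one id per logical request and reuse it on every attempt. The backoff schedule and attempt count are illustrative, and `send` stands in for your provider call plus telemetry emit:

```python
import time
import uuid

def call_with_retries(send, payload: dict, max_attempts: int = 3):
    """Reuse one externalRequestId across every retry of the same logical request."""
    external_request_id = str(uuid.uuid4())  # minted once, before the first attempt
    for attempt in range(1, max_attempts + 1):
        try:
            return send({**payload, "externalRequestId": external_request_id})
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(0.05 * attempt)  # simple linear backoff between attempts
```

Because every attempt carries the same id, downstream cost analysis can collapse retries into one logical request instead of double-counting them.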

Can telemetry break production flow?

It should not. Use short timeouts, catch errors, and keep telemetry asynchronous so provider calls keep running.
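A minimal fire-and-forget sketch of that pattern: the send runs on a background thread, errors are swallowed, and a short timeout bounds the worst case. The `send` callable and timeout value are illustrative:

```python
import threading

def emit_telemetry(send, event: dict, timeout: float = 0.5) -> None:
    """Emit telemetry off the request path; a telemetry failure never raises."""
    def _send():
        try:
            send(event, timeout=timeout)  # short timeout bounds the worst case
        except Exception:
            pass  # telemetry must not break the provider call path
    threading.Thread(target=_send, daemon=True).start()
```

In practice you would batch events in a background queue rather than spawn a thread per request; the invariant is the same, telemetry stays off the critical path.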

Related guides

  • Open tenant profitability guide
  • Read CFO reporting guide
  • Compare alternatives

Evaluation resources

For security and procurement reviews, use our trust summary before final tool selection.

Open trust proof pack