Opsmeter
AI Cost & Inference Control

Unit economics

Cost per workflow step: where agent spend concentrates

Agent workflows hide spend in intermediate steps. Per-step attribution makes budget controls actionable.

Operations · Architecture

Full guide: OpenAI cost per API call: a production-ready method

Model the workflow as cost stages

  • Classify each stage as retrieval, reasoning, tool call, or summarization.
  • Attach endpointTag and promptVersion to every stage event.
  • Aggregate by tenant and feature to surface margin pressure.
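Here is a minimal Python sketch of a stage event and the tenant/feature rollup. Only `endpointTag` and `promptVersion` come from this guide. `StageEvent`, `stageKind`, `feature`, and `costUsd` are hypothetical names used for illustration, not Opsmeter's ingestion schema.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical event shape. endpointTag and promptVersion come from the guide;
# the other field names are illustrative assumptions, not an Opsmeter schema.
@dataclass(frozen=True)
class StageEvent:
    endpointTag: str      # e.g. "agent.plan"
    promptVersion: str    # e.g. "plan-v3"
    stageKind: str        # retrieval | reasoning | tool_call | summarization
    tenant: str
    feature: str
    costUsd: float

STAGE_KINDS = {"retrieval", "reasoning", "tool_call", "summarization"}

def validate(event: StageEvent) -> StageEvent:
    """Reject events missing attribution fields or using an unknown stage kind."""
    if not event.endpointTag or not event.promptVersion:
        raise ValueError("every stage event needs endpointTag and promptVersion")
    if event.stageKind not in STAGE_KINDS:
        raise ValueError(f"unknown stage kind: {event.stageKind}")
    return event

def cost_by_tenant_feature(events):
    """Sum cost per (tenant, feature) to surface where margin is under pressure."""
    totals = defaultdict(float)
    for e in map(validate, events):
        totals[(e.tenant, e.feature)] += e.costUsd
    return dict(totals)

events = [
    StageEvent("agent.plan", "plan-v3", "reasoning", "acme", "support_bot", 0.012),
    StageEvent("agent.retrieve", "rag-v1", "retrieval", "acme", "support_bot", 0.003),
    StageEvent("agent.final", "final-v2", "summarization", "globex", "report_gen", 0.020),
]
print(cost_by_tenant_feature(events))
```

Validating at write time keeps untagged events out of the rollup. Otherwise they would silently land in an "unknown" bucket.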

Stage taxonomy (endpointTag examples for agents)

The simplest way to get per-step cost is to treat each agent stage as its own endpointTag. That makes cost ownership and alerting deterministic without inventing a new schema.

Keep stage names stable across deploys so that cost regressions can be attributed to promptVersion changes rather than to renamed tags.

  • agent.plan (planner step)
  • agent.tool_search (search tool call wrapper)
  • agent.tool_sql (SQL/analytics tool call wrapper)
  • agent.retrieve (RAG retrieval wrapper)
  • agent.final (final answer synthesis)
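One way to give each agent stage its own endpointTag is to put a small wrapper around every step. This sketch uses a hypothetical `stage()` context manager and an in-memory `LEDGER` that stands in for your telemetry sink. The token counts in the usage lines are made-up values; in practice they come from the model response.

```python
import time
from contextlib import contextmanager

# Stage names from the taxonomy above. Registering them in one set stops
# ad-hoc tags from fragmenting the per-step rollup.
AGENT_STAGES = frozenset({
    "agent.plan", "agent.tool_search", "agent.tool_sql",
    "agent.retrieve", "agent.final",
})

LEDGER: list[dict] = []  # hypothetical stand-in for the real event sink

@contextmanager
def stage(endpoint_tag: str, prompt_version: str):
    """Wrap one agent step so its usage is recorded under its own endpointTag."""
    if endpoint_tag not in AGENT_STAGES:
        raise ValueError(f"unregistered stage: {endpoint_tag}")
    usage = {"inputTokens": 0, "outputTokens": 0}
    start = time.perf_counter()
    try:
        yield usage  # the caller fills in token counts from the model response
    finally:
        LEDGER.append({
            "endpointTag": endpoint_tag,
            "promptVersion": prompt_version,
            "latencyMs": (time.perf_counter() - start) * 1000,
            **usage,
        })

# Usage: each step reports under its own tag.
with stage("agent.plan", "plan-v3") as u:
    u["inputTokens"], u["outputTokens"] = 1200, 150
with stage("agent.tool_sql", "sql-v1") as u:
    u["inputTokens"], u["outputTokens"] = 3400, 90
```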

Decision outputs to produce every week

  1. Top 3 expensive workflow stages
  2. Retry multiplier by stage
  3. Stage-level latency and cost per successful outcome
  4. Owner and fix plan for each outlier stage
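The first three weekly outputs can be computed from call-level records. This sketch assumes hypothetical dicts with `endpointTag`, `externalRequestId`, `costUsd`, and `ok` fields. The retry multiplier is billed calls divided by distinct logical requests. Cost per successful outcome divides all stage spend, failed attempts included, by the number of requests that eventually succeeded. Latency per stage would follow the same grouping. The owner and fix plan remain a human decision.

```python
from collections import defaultdict

def weekly_stage_report(calls, top_n=3):
    """Return (top stages by cost, retry multiplier by stage, cost per successful outcome by stage)."""
    cost = defaultdict(float)        # total spend per endpointTag, failed attempts included
    n_calls = defaultdict(int)       # every billed call, retries included
    requests = defaultdict(set)      # distinct logical requests per stage
    succeeded = defaultdict(set)     # logical requests that ended in success
    for c in calls:
        tag = c["endpointTag"]
        cost[tag] += c["costUsd"]
        n_calls[tag] += 1
        requests[tag].add(c["externalRequestId"])
        if c["ok"]:
            succeeded[tag].add(c["externalRequestId"])
    top = sorted(cost, key=cost.get, reverse=True)[:top_n]
    retry_multiplier = {t: n_calls[t] / len(requests[t]) for t in n_calls}
    cost_per_success = {
        t: (cost[t] / len(succeeded[t])) if succeeded[t] else float("inf") for t in cost
    }
    return top, retry_multiplier, cost_per_success

calls = [
    {"endpointTag": "agent.tool_sql", "externalRequestId": "r1", "costUsd": 0.04, "ok": False},
    {"endpointTag": "agent.tool_sql", "externalRequestId": "r1", "costUsd": 0.04, "ok": True},
    {"endpointTag": "agent.plan", "externalRequestId": "r1", "costUsd": 0.01, "ok": True},
    {"endpointTag": "agent.final", "externalRequestId": "r1", "costUsd": 0.02, "ok": True},
    {"endpointTag": "agent.retrieve", "externalRequestId": "r1", "costUsd": 0.005, "ok": True},
]
top, retries, per_success = weekly_stage_report(calls)
print(top, retries, per_success)
```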

Metrics that catch agent loops early

  • step count per user action (how many calls happen per outcome)
  • cost per successful outcome (include retries and fallbacks)
  • p95/p99 inputTokens for tool-heavy stages (payload bloat)
  • latency per stage (timeouts often trigger duplicate calls)
  • tenant concentration (one tenant can dominate agent spend)
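The sketch below computes three of these metrics over hypothetical call records. The field names `userActionId`, `endpointTag`, `inputTokens`, `tenant`, and `costUsd` are assumptions. The percentile uses the nearest-rank method with integer arithmetic. If your metrics backend interpolates percentiles, use its values instead.

```python
from collections import Counter, defaultdict

def p95(values):
    """Nearest-rank 95th percentile: the value at 1-based rank ceil(0.95 * n)."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100   # integer ceiling, avoids float rounding
    return ordered[rank - 1]

def loop_metrics(calls):
    steps = Counter(c["userActionId"] for c in calls)   # model/tool calls per user action
    tokens = defaultdict(list)
    tenant_cost = defaultdict(float)
    for c in calls:
        tokens[c["endpointTag"]].append(c["inputTokens"])
        tenant_cost[c["tenant"]] += c["costUsd"]
    total = sum(tenant_cost.values())
    top_tenant, top_cost = max(tenant_cost.items(), key=lambda kv: kv[1])
    return {
        "maxStepsPerAction": max(steps.values()),
        "p95InputTokens": {tag: p95(v) for tag, v in tokens.items()},
        "topTenant": (top_tenant, top_cost / total if total else 0.0),
    }

calls = [
    {"userActionId": "a1", "endpointTag": "agent.plan", "inputTokens": 800, "tenant": "acme", "costUsd": 0.03},
    {"userActionId": "a1", "endpointTag": "agent.tool_search", "inputTokens": 100, "tenant": "acme", "costUsd": 0.03},
    {"userActionId": "a1", "endpointTag": "agent.tool_search", "inputTokens": 5000, "tenant": "acme", "costUsd": 0.03},
    {"userActionId": "a2", "endpointTag": "agent.tool_search", "inputTokens": 200, "tenant": "globex", "costUsd": 0.01},
]
metrics = loop_metrics(calls)
print(metrics)
```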

Optimization playbook (reduce spend without breaking outcomes)

  1. Cap tool calls per workflow and add stop conditions (prevent loops).
  2. Summarize tool outputs before reinjection (fixed-size digest).
  3. Cache retrieval results for repeated queries (where allowed).
  4. Route low-risk stages to cheaper tiers (keep critical stages high quality).
  5. Alert on stage-level cost/request drift after promptVersion deploys.
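Here is a sketch that combines playbook items 1 and 2 in one loop. `next_action` and `call_tool` are hypothetical callables that stand in for your model and tool clients, and the cap values are assumptions. The digest is plain head-and-tail truncation, a cheap substitute for a model-written summary. Either way, the goal is a fixed upper bound on reinjected context.

```python
MAX_TOOL_CALLS = 6        # hypothetical per-workflow cap
DIGEST_CHARS = 400        # hypothetical fixed digest size

def digest(tool_output: str, limit: int = DIGEST_CHARS) -> str:
    """Fixed-size digest: keep the head and tail, mark the elided middle."""
    if len(tool_output) <= limit:
        return tool_output
    marker = "\n...[truncated]...\n"
    keep = limit - len(marker)
    head = keep // 2
    return tool_output[:head] + marker + tool_output[len(tool_output) - (keep - head):]

def run_agent(next_action, call_tool, max_tool_calls=MAX_TOOL_CALLS):
    """Drive a tool loop with a hard cap and an explicit stop condition.

    next_action(context) returns ("tool", args) or ("final", answer).
    Returns (answer, number_of_tool_calls).
    """
    context = []
    for _ in range(max_tool_calls):
        kind, payload = next_action(context)
        if kind == "final":
            return payload, len(context)
        context.append(digest(call_tool(payload)))   # reinject a bounded digest, not raw output
    return "stopped: tool-call cap reached", len(context)
```

When the cap is reached, the loop returns an explicit stop result instead of raising. Callers can then count capped runs as a loop signal for alerting.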

Guardrails for runaway agent spend

  • Set max tool calls per workflow and alert on loop patterns.
  • Cap tokens per stage (retrieval, reasoning, summarization) separately.
  • Treat retries as cost multipliers and keep one externalRequestId per logical request.
  • Throttle non-critical agent paths when budgets approach warning thresholds.
  • Review stage-level outliers after every promptVersion deploy.
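This sketch shows two of these guardrails together. The first is a set of per-stage input-token caps. The second is a retry helper that generates one `externalRequestId` per logical request and reuses it on every attempt. Retries then appear as a multiplier on one request rather than as new traffic. The cap values and the `send` callable are hypothetical.

```python
import uuid

# Hypothetical per-stage input-token caps; tune them from your own p95 data.
STAGE_TOKEN_CAPS = {"retrieval": 4000, "reasoning": 8000, "summarization": 2000}

def enforce_cap(stage_kind: str, input_tokens: int) -> None:
    """Raise before sending if a stage's prompt exceeds its own cap."""
    cap = STAGE_TOKEN_CAPS[stage_kind]
    if input_tokens > cap:
        raise ValueError(f"{stage_kind}: {input_tokens} input tokens exceeds cap {cap}")

def call_with_retries(send, max_attempts=3):
    """Retry while reusing ONE externalRequestId for the logical request.

    send(external_request_id, attempt) returns a result or raises TimeoutError.
    Every attempt is billed, so the attempt count is returned as the retry multiplier.
    """
    external_request_id = str(uuid.uuid4())   # generated once, not per attempt
    for attempt in range(1, max_attempts + 1):
        try:
            return send(external_request_id, attempt), attempt
        except TimeoutError:
            if attempt == max_attempts:
                raise

# Usage with a simulated flaky dependency.
ids_seen = []
def flaky_send(request_id, attempt):
    ids_seen.append(request_id)
    if attempt < 3:
        raise TimeoutError("simulated timeout")
    return "ok"

enforce_cap("retrieval", 3500)   # within the cap, no error
result, attempts = call_with_retries(flaky_send)
```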

What to alert on

  • cost/request drift by endpointTag or promptVersion
  • unexpected tenant concentration in Top Users
  • request burst with falling success ratio
  • budget warning, spend-alert, and exceeded state transitions
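Budget state transitions are easiest to alert on when the alert fires on the change itself, not on every sample above a threshold. This sketch assumes cut-offs of 80%, 90%, and 100% of budget. The state names follow the list above, but the numbers are assumptions and do not come from this guide. It also includes a simple tenant-share check with a hypothetical 50% limit.

```python
# Hypothetical cut-offs as fractions of budget, checked from highest to lowest.
THRESHOLDS = [(1.0, "exceeded"), (0.9, "spend-alert"), (0.8, "warning")]

def budget_state(spend: float, budget: float) -> str:
    ratio = spend / budget
    for cutoff, state in THRESHOLDS:
        if ratio >= cutoff:
            return state
    return "ok"

def transitions(spend_series, budget):
    """Yield (index, old_state, new_state) only when the state changes."""
    prev = "ok"
    for i, spend in enumerate(spend_series):
        state = budget_state(spend, budget)
        if state != prev:
            yield i, prev, state
            prev = state

def tenant_concentration_alert(cost_by_tenant, max_share=0.5):
    """Return tenants whose share of total spend exceeds max_share."""
    total = sum(cost_by_tenant.values())
    return [t for t, c in cost_by_tenant.items() if total and c / total > max_share]

print(list(transitions([100, 700, 850, 860, 950, 1010], budget=1000)))
print(tenant_concentration_alert({"acme": 70.0, "globex": 20.0, "initech": 10.0}))
```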

Execution checklist

  1. Confirm spike type: volume, token, deploy, or abuse signal.
  2. Assign one incident owner and one communication channel.
  3. Apply immediate containment before deep optimization.
  4. Document the dominant endpoint, tenant, and promptVersion driver.
  5. Convert findings into one permanent guardrail update.
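For step 1, you can tell a volume spike apart from a token or deploy spike by splitting the spend change into a volume effect and a cost-per-request effect. The sketch below is a standard two-factor decomposition, and its two terms sum exactly to the total change. The function name and labels are illustrative. Deploy and abuse signals still need the same comparison sliced by promptVersion and by tenant.

```python
def decompose_spike(base_requests, base_cost, cur_requests, cur_cost):
    """Split a spend change into a volume effect and a cost-per-request effect.

    cost = requests * cost_per_request (cpr), so
      volume effect = (cur_requests - base_requests) * base_cpr
      unit effect   = (cur_cpr - base_cpr) * cur_requests
    and volume effect + unit effect == cur_cost - base_cost.
    """
    base_cpr = base_cost / base_requests
    cur_cpr = cur_cost / cur_requests
    volume_effect = (cur_requests - base_requests) * base_cpr
    unit_effect = (cur_cpr - base_cpr) * cur_requests
    # "token/deploy" means cost per request moved: check promptVersion and token counts next.
    label = "volume" if abs(volume_effect) >= abs(unit_effect) else "token/deploy"
    return volume_effect, unit_effect, label

print(decompose_spike(base_requests=1000, base_cost=10.0, cur_requests=1200, cur_cost=30.0))
```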

FAQ

How do we measure cost per step if an agent uses many tool calls?

Treat each stage/tool wrapper as its own endpointTag (agent.tool_search, agent.tool_sql, etc.). Then roll up cost by endpointTag and compare before/after windows by promptVersion to find the stage that regressed.
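Here is a sketch of that before/after comparison. It assumes hypothetical call dicts with `endpointTag`, `promptVersion`, and `costUsd` fields, plus an assumed 20% regression threshold. It also reports which promptVersions appeared only in the after window, since those are usually the first suspects.

```python
from collections import defaultdict

def window_stats(calls):
    """Per endpointTag: (cost per request, set of promptVersions seen)."""
    cost, n, versions = defaultdict(float), defaultdict(int), defaultdict(set)
    for c in calls:
        tag = c["endpointTag"]
        cost[tag] += c["costUsd"]
        n[tag] += 1
        versions[tag].add(c["promptVersion"])
    return {tag: (cost[tag] / n[tag], versions[tag]) for tag in cost}

def regressed_stages(before, after, threshold=0.2):
    """Flag endpointTags whose cost/request rose by more than `threshold`, largest rise first."""
    b, a = window_stats(before), window_stats(after)
    flagged = []
    for tag, (new_cpr, new_versions) in a.items():
        if tag not in b:
            continue  # a brand-new stage has no baseline to compare against
        old_cpr, old_versions = b[tag]
        if new_cpr > old_cpr * (1 + threshold):
            flagged.append((tag, old_cpr, new_cpr, sorted(new_versions - old_versions)))
    return sorted(flagged, key=lambda f: f[2] - f[1], reverse=True)

before = [
    {"endpointTag": "agent.plan", "promptVersion": "plan-v1", "costUsd": 0.01},
    {"endpointTag": "agent.plan", "promptVersion": "plan-v1", "costUsd": 0.01},
    {"endpointTag": "agent.tool_sql", "promptVersion": "sql-v1", "costUsd": 0.020},
]
after = [
    {"endpointTag": "agent.plan", "promptVersion": "plan-v2", "costUsd": 0.03},
    {"endpointTag": "agent.tool_sql", "promptVersion": "sql-v1", "costUsd": 0.021},
]
print(regressed_stages(before, after))
```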

What usually makes agent workflows suddenly more expensive?

Step count increases (loops), tool outputs get larger (payload bloat), or retries increase due to timeouts. Those multipliers can raise total cost even if token price stays constant.
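The following worked example shows how these multipliers compound. It uses a hypothetical flat input price that is identical in both scenarios, and it leaves out output tokens to keep the arithmetic short. Suppose steps go from 4 to 6, input tokens per step go from 2,000 to 3,000, and the retry factor goes from 1.05 to 1.3. Cost per outcome then rises by 1.5 × 1.5 × (1.3 / 1.05), roughly 2.79x, with no price change at all.

```python
def cost_per_outcome(steps, input_tokens_per_step, retry_factor, usd_per_1k_input):
    """Input-token cost of one user outcome; output tokens omitted for brevity."""
    return steps * input_tokens_per_step * retry_factor * usd_per_1k_input / 1000

PRICE = 0.002  # hypothetical $ per 1K input tokens, held constant in both scenarios
baseline = cost_per_outcome(steps=4, input_tokens_per_step=2000, retry_factor=1.05, usd_per_1k_input=PRICE)
regressed = cost_per_outcome(steps=6, input_tokens_per_step=3000, retry_factor=1.3, usd_per_1k_input=PRICE)
print(f"baseline ${baseline:.4f}, regressed ${regressed:.4f}, x{regressed / baseline:.2f}")
```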

Should we route the entire agent to a cheaper model tier?

Usually no. Route by stage or endpointTag: keep critical reasoning/final steps on a stronger tier and move low-risk steps (summaries, formatting) to a cheaper tier with caps.
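Stage-level routing can be as simple as a static table keyed by endpointTag. In this sketch the tier labels and output-token caps are hypothetical, and `agent.summarize` is an illustrative tag that is not part of the taxonomy above. Unknown tags fall back to the stronger tier, so a newly added stage never loses quality silently.

```python
from typing import NamedTuple

class Route(NamedTuple):
    tier: str               # hypothetical tier label, mapped to a concrete model elsewhere
    max_output_tokens: int  # per-stage cap that travels with the route

# Critical reasoning and the final answer stay on the stronger tier;
# low-risk steps move to the cheaper tier with tighter caps.
ROUTES = {
    "agent.plan": Route("strong", 1024),
    "agent.final": Route("strong", 2048),
    "agent.retrieve": Route("cheap", 256),
    "agent.tool_search": Route("cheap", 256),
    "agent.summarize": Route("cheap", 512),   # hypothetical tag
}
DEFAULT_ROUTE = Route("strong", 1024)  # unknown stages fail safe to quality, not cost

def route_for(endpoint_tag: str) -> Route:
    return ROUTES.get(endpoint_tag, DEFAULT_ROUTE)

print(route_for("agent.final"), route_for("agent.summarize"), route_for("agent.new_stage"))
```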

