Unit economics

Ops guideBOFU profile

Cost per workflow step: where agent spend concentrates

Agent workflows hide spend in intermediate steps. Per-step attribution makes budget controls actionable.

Published: 2026-02-24Updated: 2026-02-26

OperationsArchitecture

Full guide: OpenAI cost per API call: a production-ready method

What this guide answers

What changed in cost, cost per request, or budget posture.
Which endpoint, prompt, model, or tenant likely drove the delta.
Which validation step or control to apply next in Opsmeter.io.

What to alert on

cost/request drift by endpointTag or promptVersion
unexpected tenant concentration in Top Users
request burst with falling success ratio
budget warning, spend-alert, and exceeded state transitions

Execution checklist

Confirm spike type: volume, token, deploy, or abuse signal.
Assign one incident owner and one communication channel.
Apply immediate containment before deep optimization.
Document the dominant endpoint, tenant, and promptVersion driver.
Convert findings into one permanent guardrail update.

Model the workflow as cost stages

Classify each stage as retrieval, reasoning, tool call, or summarization.
Attach endpointTag and promptVersion to every stage event.
Aggregate by tenant and feature to surface margin pressure.

Use this workflow

Turn diagnosis into action

Identify the cost driver, validate it with attribution, then apply one durable control before the next billing cycle.

Apply in your workspace

Re-run this workflow on your own spend data

Follow the same path from article insight to telemetry verification, then validate with your own cost signals.

Quickstart pathSend a first payload, confirm attribution, then return here for operations context.Open quickstart

Evaluation pathPair this guide with trust proof, status, and compare surfaces during review.Open trust proof pack

Stage taxonomy (endpointTag examples for agents)

The simplest way to get per-step cost is to treat each agent stage as its own endpointTag. That makes cost ownership and alerting deterministic without inventing a new schema.

Keep stage names stable so regressions are attributable to promptVersion changes.

agent.plan (planner step)
agent.tool_search (search tool call wrapper)
agent.tool_sql (SQL/analytics tool call wrapper)
agent.retrieve (RAG retrieval wrapper)
agent.final (final answer synthesis)

Decision outputs to require weekly

Top 3 expensive workflow stages
Retry multiplier by stage
Stage-level latency and cost per successful outcome
Owner and fix plan for each outlier stage

Metrics that catch agent loops early

step count per user action (how many calls happen per outcome)
cost per successful outcome (include retries and fallbacks)
p95/p99 inputTokens for tool-heavy stages (payload bloat)
latency per stage (timeouts often trigger duplicate calls)
tenant concentration (one tenant can dominate agent spend)

Optimization playbook (reduce spend without breaking outcomes)

Cap tool calls per workflow and add stop conditions (prevent loops).
Summarize tool outputs before reinjection (fixed-size digest).
Cache retrieval results for repeated queries (where allowed).
Route low-risk stages to cheaper tiers (keep critical stages high quality).
Alert on stage-level cost/request drift after promptVersion deploys.

Guardrails for runaway agent spend

Set max tool calls per workflow and alert on loop patterns.
Cap tokens per stage (retrieval, reasoning, summarization) separately.
Treat retries as cost multipliers and keep one externalRequestId per logical request.
Throttle non-critical agent paths when budgets approach warning thresholds.
Review stage-level outliers after every promptVersion deploy.

FAQ

How do we measure cost per step if an agent uses many tool calls?

Treat each stage/tool wrapper as its own endpointTag (agent.tool_search, agent.tool_sql, etc.). Then roll up cost by endpointTag and compare before/after windows by promptVersion to find the stage that regressed.

What usually makes agent workflows suddenly more expensive?

Step count increases (loops), tool outputs get larger (payload bloat), or retries increase due to timeouts. Those multipliers can raise total cost even if token price stays constant.

Should we route the entire agent to a cheaper model tier?

Usually no. Route by stage or endpointTag: keep critical reasoning/final steps on a stronger tier and move low-risk steps (summaries, formatting) to a cheaper tier with caps.

Evaluation resources

For security and procurement reviews, use our trust summary before final tool selection.

Open trust proof pack