Agent workflows
Tool output ballooning: when agent tools quietly double token costs
Tool outputs can dominate token spend in multi-step flows. Treat tool payload size as a first-class cost metric.
Full guide: LLM cost attribution: endpoint, prompt version, tenant, and user
What this guide answers
- What changed in cost, cost per request, or budget posture.
- Which endpoint, prompt, model, or tenant likely drove the delta.
- Which validation step or control to apply next in Opsmeter.io.
What to send (payload example)
{
"externalRequestId": "req_01HZXB6MQZ2WQ9D2KCF9M4V2QY",
"provider": "provider_id",
"model": "model_id",
"endpointTag": "agent.workflow",
"promptVersion": "agent_v2",
"userId": "tenant_acme_hash",
"inputTokens": 980,
"outputTokens": 420,
"latencyMs": 892,
"status": "success",
"dataMode": "real",
"environment": "prod"
}
Common mistakes
- Missing endpointTag or using inconsistent naming across teams.
- Not tagging promptVersion, so deploys cannot be linked to spend changes.
- Sending raw user identifiers instead of hashed mapping for privacy.
- Mixing demo/test dataMode into production operational reviews.
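One way to assemble the payload above while avoiding these mistakes, sketched in Python (the helper name and the hashing scheme are illustrative, not part of an Opsmeter.io SDK):

```python
import hashlib
import json

def build_usage_event(raw_user_id: str, input_tokens: int, output_tokens: int,
                      latency_ms: int, endpoint_tag: str, prompt_version: str) -> dict:
    """Build a cost-attribution event; hash the user id before sending."""
    return {
        "externalRequestId": "req_01HZXB6MQZ2WQ9D2KCF9M4V2QY",  # your own request id
        "provider": "provider_id",
        "model": "model_id",
        "endpointTag": endpoint_tag,      # keep naming consistent across teams
        "promptVersion": prompt_version,  # links deploys to spend changes
        "userId": hashlib.sha256(raw_user_id.encode()).hexdigest()[:16],  # never send raw ids
        "inputTokens": input_tokens,
        "outputTokens": output_tokens,
        "latencyMs": latency_ms,
        "status": "success",
        "dataMode": "real",               # keep demo/test out of prod reviews
        "environment": "prod",
    }

print(json.dumps(build_usage_event("tenant_acme", 980, 420, 892,
                                   "agent.workflow", "agent_v2"), indent=2))
```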
How to verify in the Opsmeter.io dashboard
- Use Overview to confirm spike window and budget posture.
- Use Top Endpoints to find feature-level concentration.
- Use Top Users to find tenant-level concentration.
- Use Prompt Versions to validate deploy-linked cost drift.
Why tool outputs inflate cost
- Verbose JSON payloads are re-injected into downstream prompts.
- Multiple tools emit overlapping context.
- Retry loops duplicate large tool outputs across attempts.
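A rough model of why this compounds: if every step re-injects all earlier tool outputs, input tokens grow quadratically with step count. A simplified sketch (numbers are illustrative):

```python
def cumulative_input_tokens(tool_output_tokens: int, steps: int) -> int:
    """Total re-injected input tokens when each step carries all earlier tool outputs."""
    total = 0
    carried = 0
    for _ in range(steps):
        total += carried          # prior tool outputs re-enter the prompt
        carried += tool_output_tokens
    return total

# 2,000-token tool outputs over 6 steps: 30,000 re-injected tokens
print(cumulative_input_tokens(2000, 6))
```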
Turn diagnosis into action
Identify the cost driver, validate it with attribution, then apply one durable control before the next billing cycle.
Re-run this workflow on your own spend data
Follow the same path from article insight to telemetry verification, then confirm the findings against your own cost signals.
Where ballooning shows up (common tool types)
- Search and browsing tools that return full pages instead of extracted answers.
- SQL / analytics tools that return wide tables with unused columns.
- CRM / ticketing tools that dump the entire record instead of relevant fields.
- Code and diff tools that return large files instead of minimal patches.
- Tracing/log tools that inject raw logs back into the model.
Mitigation pattern
- Summarize tool payloads before reinjection.
- Set max-size policy per tool output class.
- Track cost per workflow step for tool-heavy endpoints.
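A max-size policy can be as simple as a per-class byte cap applied before reinjection. A minimal sketch, assuming hypothetical class names and caps you would tune per workload:

```python
# Hypothetical per-class size caps in bytes; tune these per workload.
MAX_BYTES = {"search": 4_000, "sql": 8_000, "logs": 2_000}

def enforce_size_policy(tool_class: str, payload: str) -> str:
    """Truncate a tool payload to its class cap before it re-enters the prompt."""
    cap = MAX_BYTES.get(tool_class, 4_000)  # default cap for unknown classes
    data = payload.encode("utf-8")
    if len(data) <= cap:
        return payload
    return data[:cap].decode("utf-8", errors="ignore") + "\n[truncated]"
```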
Compression strategies that preserve quality
Do not rely on prompt tweaks alone. Put the constraint at the source: the tool output, the schema, or the reinjection step.
If the agent needs full data for debugging, store it out-of-band and pass a short pointer (ID + summary) to the model.
- Use schema-minimal outputs (only required fields).
- Chunk and paginate large results; fetch more only when needed.
- Summarize long tool outputs into a fixed-size digest.
- Deduplicate overlapping context across tools before reinjection.
- Cap tool call count and set stop conditions to prevent loops.
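The out-of-band pointer pattern from above can be sketched as follows (the store, helper name, and pointer format are assumptions for illustration; in practice the blob would live in a database or object store):

```python
import hashlib

BLOB_STORE: dict[str, str] = {}  # stand-in for real out-of-band storage

def to_pointer(payload: str, summary: str) -> str:
    """Store the full tool payload out-of-band; hand the model an ID plus a short summary."""
    blob_id = hashlib.sha256(payload.encode()).hexdigest()[:12]
    BLOB_STORE[blob_id] = payload  # retrievable later for debugging
    return f"[tool_result id={blob_id}] {summary}"

pointer = to_pointer("full 50 KB tool payload here", "3 matching rows; top result summarized")
print(pointer)
```

The model sees only the short pointer line; a debugging tool (or a follow-up tool call) can fetch the full payload by ID when it is actually needed.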
Telemetry fields that expose tool output bloat
- tool name and tool call count per request
- approx payload size (bytes) or token estimate per tool output
- retry ratio for tool-heavy workflows
- promptVersion correlation (bloat often starts after deploys)
- top endpoints where tool output dominates inputTokens
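These fields are cheap to compute at the call site. A minimal sketch of per-request aggregation, using the common ~4 characters-per-token heuristic (an approximation, not a tokenizer):

```python
def tool_telemetry(tool_outputs: list[tuple[str, str]]) -> dict:
    """Aggregate per-tool call count, payload bytes, and a rough token estimate."""
    stats: dict = {}
    for name, payload in tool_outputs:
        s = stats.setdefault(name, {"calls": 0, "bytes": 0, "approxTokens": 0})
        s["calls"] += 1
        s["bytes"] += len(payload.encode("utf-8"))
        s["approxTokens"] += len(payload) // 4  # ~4 chars per token heuristic
    return stats
```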
Evaluation resources
For security and procurement reviews, use our trust summary before final tool selection.