Agent workflows
Tool output ballooning: when agent tools quietly double token costs
Tool outputs can dominate token spend in multi-step flows. Treat tool payload size as a first-class cost metric.
Full guide: LLM cost attribution: endpoint, prompt version, tenant, and user
Why tool outputs inflate cost
- Verbose JSON payloads are re-injected into downstream prompts.
- Multiple tools emit overlapping context.
- Retry loops duplicate large tool outputs across attempts.
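The re-injection effect compounds: a payload emitted early in the run is billed again at every later step. A minimal sketch of that arithmetic, with hypothetical step counts and payload sizes:

```python
# Rough cost model: each agent step re-injects all prior tool outputs
# into its prompt, so a payload of T tokens emitted at step k is billed
# again at every subsequent step.
def cumulative_input_tokens(payload_tokens: list[int]) -> int:
    """Total input tokens across a run where step i's prompt contains
    every tool output produced before it."""
    total = 0
    seen = 0
    for t in payload_tokens:
        total += seen      # prior outputs re-billed this step
        seen += t          # this step's output joins the context
    total += seen          # final answer step sees everything
    return total

# Five steps with 2,000-token tool outputs each: re-injection bills
# 30,000 input tokens, versus 10,000 for the payloads alone.
print(cumulative_input_tokens([2000] * 5))
```

The same five payloads triple in billed cost purely through re-injection, before any retries are counted.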
Where ballooning shows up (common tool types)
- Search and browsing tools that return full pages instead of extracted answers.
- SQL / analytics tools that return wide tables with unused columns.
- CRM / ticketing tools that dump the entire record instead of relevant fields.
- Code and diff tools that return large files instead of minimal patches.
- Tracing/log tools that inject raw logs back into the model.
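For record-dump tools, the fix is usually a field allowlist at the tool boundary. A sketch with a hypothetical CRM record (field names are illustrative, not a real API):

```python
# Hypothetical CRM record: a full dump carries bloat the agent never uses.
record = {
    "id": "cust_123",
    "name": "Acme",
    "plan": "enterprise",
    "open_ticket_count": 3,
    "notes": "call transcript " * 500,        # bloat
    "raw_events": list(range(1000)),          # bloat
}

REQUIRED = ("id", "name", "plan", "open_ticket_count")

# Keep only the fields the downstream prompt actually needs.
trimmed = {k: record[k] for k in REQUIRED}
```

The same allowlist idea applies to SQL tools (select only needed columns) and code tools (return a patch, not the file).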
Mitigation pattern
- Summarize tool payloads before reinjection.
- Set a max-size policy per tool-output class.
- Track cost per workflow step for tool-heavy endpoints.
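The first two steps above can be combined into a single gate at the reinjection boundary. A minimal sketch, assuming per-class character limits and any summarizer callable (e.g. a cheap model call); the class names and limits are illustrative:

```python
# Illustrative per-tool-class size limits, in characters.
MAX_CHARS = {"search": 2_000, "sql": 4_000, "logs": 1_000}

def enforce_policy(tool_class: str, payload: str, summarize) -> str:
    """Gate a tool payload before it re-enters the prompt.

    Payloads under the class limit pass through unchanged; oversized
    payloads are summarized, then hard-capped as a safety net.
    """
    limit = MAX_CHARS.get(tool_class, 2_000)
    if len(payload) <= limit:
        return payload
    return summarize(payload)[:limit]
```

Routing every tool output through one gate also gives you a single place to log payload sizes for the telemetry fields below.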
Compression strategies that preserve quality
Do not rely on prompt tweaks alone. Put the constraint at the source: the tool output, the schema, or the reinjection step.
If the agent needs full data for debugging, store it out-of-band and pass a short pointer (ID + summary) to the model.
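A sketch of that pointer pattern, using an in-memory dict as a stand-in for real out-of-band storage (blob store, database, trace system):

```python
import hashlib
import json

BLOB_STORE: dict[str, str] = {}  # stand-in for real out-of-band storage

def stash(payload: str, summary: str) -> str:
    """Store the full payload out-of-band; return the short pointer
    (ID + summary) that the model sees instead of the raw data."""
    blob_id = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
    BLOB_STORE[blob_id] = payload
    return json.dumps({"blobId": blob_id, "summary": summary})
```

The model reasons over the summary; a debugging tool (or a human) can dereference `blobId` to recover the full payload when needed.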
- Use schema-minimal outputs (only required fields).
- Chunk and paginate large results; fetch more only when needed.
- Summarize long tool outputs into a fixed-size digest.
- Deduplicate overlapping context across tools before reinjection.
- Cap tool call count and set stop conditions to prevent loops.
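Deduplication across tools can be as simple as dropping chunks the context already contains. A crude sketch using fixed-size chunks (a real implementation would split on paragraphs or semantic boundaries):

```python
def dedupe_chunks(outputs: list[str], chunk_size: int = 200) -> str:
    """Drop repeated chunks across tool outputs before reinjection.

    Fixed-size chunking is a deliberate simplification; it catches
    verbatim overlap, e.g. two search tools returning the same page.
    """
    seen: set[str] = set()
    kept: list[str] = []
    for out in outputs:
        for i in range(0, len(out), chunk_size):
            chunk = out[i:i + chunk_size]
            if chunk not in seen:
                seen.add(chunk)
                kept.append(chunk)
    return "".join(kept)
```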
Telemetry fields that expose tool output bloat
- tool name and tool call count per request
- approx payload size (bytes) or token estimate per tool output
- retry ratio for tool-heavy workflows
- promptVersion correlation (bloat often starts after deploys)
- top endpoints where tool output dominates inputTokens
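Exact token counts per tool output are often unavailable without a tokenizer in the hot path; a bytes count plus a rough estimate is usually enough to surface bloat. A sketch (field names chosen to mirror the payload below; the 4-chars-per-token ratio is a common English-text approximation, not a billing-grade figure):

```python
def approx_tokens(payload: str) -> int:
    """Cheap token estimate (~4 chars per token for English text);
    good enough for bloat dashboards, not for billing."""
    return max(1, len(payload) // 4)

def tool_telemetry(tool_name: str, payload: str) -> dict:
    # Per-tool-output record to emit alongside the request-level payload.
    return {
        "toolName": tool_name,
        "payloadBytes": len(payload.encode("utf-8")),
        "approxTokens": approx_tokens(payload),
    }
```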
What to send (payload example)
{
"externalRequestId": "req_01HZXB6MQZ2WQ9D2KCF9M4V2QY",
"provider": "provider_id",
"model": "model_id",
"endpointTag": "agent.workflow",
"promptVersion": "agent_v2",
"userId": "tenant_acme_hash",
"inputTokens": 980,
"outputTokens": 420,
"latencyMs": 892,
"status": "success",
"dataMode": "real",
"environment": "prod"
}
Common mistakes
- Missing endpointTag or using inconsistent naming across teams.
- Not tagging promptVersion, so deploys cannot be linked to spend changes.
- Sending raw user identifiers instead of hashed values, undermining privacy.
- Mixing demo/test dataMode into production operational reviews.
How to verify in Opsmeter Dashboard
- Use Overview to confirm spike window and budget posture.
- Use Top Endpoints to find feature-level concentration.
- Use Top Users to find tenant-level concentration.
- Use Prompt Versions to validate deploy-linked cost drift.
Evaluation resources
For security and procurement reviews, use our trust summary before final tool selection.