Opsmeter
AI Cost & Inference Control

Agent workflows

Tool output ballooning: when agent tools quietly double token costs

Tool outputs can dominate token spend in multi-step flows. Treat tool payload size as a first-class cost metric.

Architecture · Operations

Full guide: LLM cost attribution: endpoint, prompt version, tenant, and user

Why tool outputs inflate cost

  • Verbose JSON payloads are re-injected into downstream prompts.
  • Multiple tools emit overlapping context.
  • Retry loops duplicate large tool outputs across attempts.

Where ballooning shows up (common tool types)

  • Search and browsing tools that return full pages instead of extracted answers.
  • SQL / analytics tools that return wide tables with unused columns.
  • CRM / ticketing tools that dump the entire record instead of relevant fields.
  • Code and diff tools that return large files instead of minimal patches.
  • Tracing/log tools that inject raw logs back into the model.
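A minimal sketch of the fix for the record-dump case: project the tool payload down to only the fields the agent needs before reinjection. The field names and record shape here are hypothetical examples, not a real CRM schema.

```python
def project_fields(record: dict, keep: set[str]) -> dict:
    """Keep only whitelisted top-level fields from a tool output."""
    return {k: v for k, v in record.items() if k in keep}

# Hypothetical verbose CRM payload with large irrelevant fields.
crm_record = {
    "id": "cust_123",
    "name": "Acme Corp",
    "plan": "enterprise",
    "notes": "A" * 5000,          # large free-text blob
    "audit_log": ["..."] * 200,   # history the model does not need
}

slim = project_fields(crm_record, keep={"id", "name", "plan"})
```

The same projection idea applies to SQL results (select only needed columns) and search tools (return the extracted answer, not the page).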

Mitigation pattern

  1. Summarize tool payloads before reinjection.
  2. Set a max-size policy per tool-output class.
  3. Track cost per workflow step for tool-heavy endpoints.
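Steps 1 and 2 can be sketched as a per-class size cap applied before any tool output reaches the prompt. The class names and character limits below are illustrative assumptions; a production version would summarize rather than truncate.

```python
# Hypothetical per-class output caps (characters, not tokens).
MAX_CHARS = {"search": 2_000, "sql": 4_000, "default": 1_000}

def cap_tool_output(tool_class: str, payload: str) -> str:
    """Enforce a max-size policy on a tool output before reinjection."""
    limit = MAX_CHARS.get(tool_class, MAX_CHARS["default"])
    if len(payload) <= limit:
        return payload
    # Simplest possible cap; swap in a summarizer for quality-sensitive tools.
    return payload[:limit] + f"\n[truncated {len(payload) - limit} chars]"

capped = cap_tool_output("search", "x" * 5_000)
```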

Compression strategies that preserve quality

Do not rely on prompt tweaks alone. Put the constraint at the source: the tool output, the schema, or the reinjection step.

If the agent needs full data for debugging, store it out-of-band and pass a short pointer (ID + summary) to the model.

  1. Use schema-minimal outputs (only required fields).
  2. Chunk and paginate large results; fetch more only when needed.
  3. Summarize long tool outputs into a fixed-size digest.
  4. Deduplicate overlapping context across tools before reinjection.
  5. Cap tool call count and set stop conditions to prevent loops.
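The out-of-band pointer pattern mentioned above can be sketched as follows. The in-memory dict stands in for a real blob store, and the 200-character digest size is an arbitrary assumption.

```python
import hashlib

BLOB_STORE: dict[str, str] = {}  # stand-in for a real blob store

def stash_and_summarize(full_output: str, digest_chars: int = 200) -> dict:
    """Store the full tool result out-of-band; return ID + short digest."""
    blob_id = hashlib.sha256(full_output.encode()).hexdigest()[:12]
    BLOB_STORE[blob_id] = full_output            # retrievable for debugging
    return {
        "blob_id": blob_id,                      # pointer the agent can fetch
        "digest": full_output[:digest_chars],    # fixed-size stand-in for the model
    }

pointer = stash_and_summarize("row1,row2,..." * 1_000)
```

Only `pointer` is reinjected into the prompt; the full payload stays retrievable by ID when a human or a later step needs it.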

Telemetry fields that expose tool output bloat

  • tool name and tool call count per request
  • approx payload size (bytes) or token estimate per tool output
  • retry ratio for tool-heavy workflows
  • promptVersion correlation (bloat often starts after deploys)
  • top endpoints where tool output dominates inputTokens
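A cheap way to populate the payload-size fields above without a tokenizer in the hot path is a bytes-based estimate. The 4-bytes-per-token ratio is a rough heuristic for English text, not a provider guarantee, and the field names mirror the telemetry list above.

```python
def approx_tokens(payload: str) -> int:
    """Rough token estimate: ~4 UTF-8 bytes per token for English text."""
    return max(1, len(payload.encode("utf-8")) // 4)

def tool_telemetry(tool_name: str, payload: str, call_count: int) -> dict:
    """Build a per-tool-call telemetry record exposing payload bloat."""
    return {
        "toolName": tool_name,
        "toolCallCount": call_count,
        "payloadBytes": len(payload.encode("utf-8")),
        "approxTokens": approx_tokens(payload),
    }

event = tool_telemetry("search", "result " * 500, call_count=3)
```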

What to send (payload example)

{
  "externalRequestId": "req_01HZXB6MQZ2WQ9D2KCF9M4V2QY",
  "provider": "provider_id",
  "model": "model_id",
  "endpointTag": "agent.workflow",
  "promptVersion": "agent_v2",
  "userId": "tenant_acme_hash",
  "inputTokens": 980,
  "outputTokens": 420,
  "latencyMs": 892,
  "status": "success",
  "dataMode": "real",
  "environment": "prod"
}

Common mistakes

  • Missing endpointTag or using inconsistent naming across teams.
  • Not tagging promptVersion, so deploys cannot be linked to spend changes.
  • Sending raw user identifiers instead of a hashed mapping, leaking user data into telemetry.
  • Mixing demo/test dataMode into production operational reviews.

How to verify in Opsmeter Dashboard

  1. Use Overview to confirm spike window and budget posture.
  2. Use Top Endpoints to find feature-level concentration.
  3. Use Top Users to find tenant-level concentration.
  4. Use Prompt Versions to validate deploy-linked cost drift.


Evaluation resources

For security and procurement reviews, use our trust summary before final tool selection.
