Opsmeter
AI Cost & Inference Control


No-proxy implementation guide: send LLM cost telemetry without a gateway

Implementation-first guide for shipping no-proxy telemetry safely: contract design, async delivery, retries, and verification.

Tags: Architecture · No-proxy · Telemetry

Full guide: Proxy vs no-proxy LLM observability: tradeoffs for production teams

Why teams choose no-proxy first

  • You avoid inserting a network gateway in production traffic paths.
  • Integration can happen incrementally in application code without changing provider routing.

Telemetry contract (minimum fields)

  • externalRequestId (stable across retries)
  • provider + model identifiers (normalized)
  • endpointTag + promptVersion (ownership and deploy correlation)
  • userId and/or tenantId (hashed if needed)
  • inputTokens + outputTokens + latencyMs + status
  • dataMode + environment (keep test vs prod separate)
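The contract above can be captured as a small record type so every call site emits the same shape. A minimal sketch in Python (field names follow the contract; the types and the `TelemetryEvent` name are assumptions, not an Opsmeter SDK):

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class TelemetryEvent:
    """Minimum-field telemetry contract for one provider call."""
    externalRequestId: str      # stable across retries of the same logical request
    provider: str               # normalized provider identifier
    model: str                  # normalized model identifier
    endpointTag: str            # feature ownership, e.g. "checkout.ai_summary"
    promptVersion: str          # deploy correlation, e.g. "summary_v3"
    userId: Optional[str]       # hashed user/tenant identifier
    inputTokens: int
    outputTokens: int
    latencyMs: int
    status: str                 # e.g. "success" or "error"
    dataMode: str               # keep "test" vs "real" separate
    environment: str            # keep "prod" vs non-prod separate

event = TelemetryEvent(
    externalRequestId="req_01HZXB6MQZ2WQ9D2KCF9M4V2QY",
    provider="provider_id", model="model_id",
    endpointTag="checkout.ai_summary", promptVersion="summary_v3",
    userId="tenant_acme_hash",
    inputTokens=540, outputTokens=180, latencyMs=892,
    status="success", dataMode="real", environment="prod",
)
payload = asdict(event)  # plain dict, ready to serialize as JSON
```

Using a single record type keeps the contract enforceable in code review: a missing field fails at construction time, not in the dashboard.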

Reference architecture

  • Layer A: Provider call and usage extraction
  • Layer B: Telemetry client with timeout, swallow-on-error, and retry policy
  • Layer C: Dashboard attribution by endpointTag, userId, and promptVersion

Reliability pattern (do not break the user path)

No-proxy telemetry should never be a production dependency. Keep ingest async, time-bounded, and safe to fail.

If telemetry fails, provider calls should still succeed. Treat observability as best-effort and monitor ingestion health separately.

  1. Short timeouts and swallow-on-error behavior.
  2. Async or background ingest (avoid blocking user requests).
  3. Batching for high-volume endpoints when needed.
  4. Sampling for extremely high volume (keep attribution coverage on top drivers).
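The four reliability rules above can be combined in one small client: a bounded in-memory queue, a background sender thread, a short network timeout, and swallow-on-error everywhere. This is a sketch under assumptions (the `ingest_url` endpoint and payload shape are hypothetical), not a definitive implementation:

```python
import json
import queue
import threading
import urllib.request

class TelemetryClient:
    """Best-effort telemetry: never blocks and never raises into the caller's path."""

    def __init__(self, ingest_url: str, timeout_s: float = 0.5, maxsize: int = 1000):
        self._url = ingest_url
        self._timeout = timeout_s
        self._q: queue.Queue = queue.Queue(maxsize=maxsize)  # bounded buffer
        self.dropped = 0  # export this counter to monitor ingestion health separately
        threading.Thread(target=self._drain, daemon=True).start()

    def record(self, event: dict) -> None:
        """Enqueue without blocking; if the buffer is full, drop and count."""
        try:
            self._q.put_nowait(event)
        except queue.Full:
            self.dropped += 1  # shedding load is acceptable; failing the user is not

    def _drain(self) -> None:
        while True:
            event = self._q.get()
            try:
                req = urllib.request.Request(
                    self._url,
                    data=json.dumps(event).encode(),
                    headers={"Content-Type": "application/json"},
                )
                urllib.request.urlopen(req, timeout=self._timeout)
            except Exception:
                self.dropped += 1  # swallow-on-error: ingest failure is not a user failure
```

Because `record` only enqueues, the user-facing request path pays nothing for telemetry beyond a queue insert; delivery failures surface through the `dropped` counter rather than through exceptions.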

Implementation checklist

  1. Keep externalRequestId stable on retries.
  2. Map provider usage fields into a normalized token model.
  3. Tag telemetry with dataMode and environment.
  4. Set short timeout and non-blocking telemetry behavior.

Tradeoff to communicate clearly

Because no-proxy instrumentation sits outside the request path, it cannot block provider calls directly. Guardrail actions must instead run in application logic and operational workflows.
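An app-level guardrail typically wraps the provider call with a budget check before it is made. A hypothetical sketch (the `spent_usd` figure is assumed to come from your own spend tracking; the function and field names are illustrative):

```python
def call_with_guardrail(tenant_id: str, spent_usd: float, budget_usd: float, do_call):
    """App-level guardrail: refuse or degrade before invoking the provider."""
    if spent_usd >= budget_usd:
        # Operational workflow takes over: alert the budget owner,
        # serve a degraded (non-LLM) response to the user.
        return {"status": "budget_exceeded", "tenant": tenant_id}
    return do_call()

blocked = call_with_guardrail("tenant_acme", spent_usd=120.0, budget_usd=100.0,
                              do_call=lambda: {"status": "success"})
# blocked["status"] == "budget_exceeded"
```

This is the tradeoff made concrete: enforcement happens in your code, on your spend data, rather than at a network chokepoint.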

What to send (payload example)

{
  "externalRequestId": "req_01HZXB6MQZ2WQ9D2KCF9M4V2QY",
  "provider": "provider_id",
  "model": "model_id",
  "endpointTag": "checkout.ai_summary",
  "promptVersion": "summary_v3",
  "userId": "tenant_acme_hash",
  "inputTokens": 540,
  "outputTokens": 180,
  "latencyMs": 892,
  "status": "success",
  "dataMode": "real",
  "environment": "prod"
}

Common mistakes

  • Choosing a proxy for visibility, then inheriting new failure modes.
  • Instrumenting too late (no endpointTag/promptVersion in production).
  • Treating cost control as a billing problem, not an operations workflow.
  • No owner for budgets and escalation after integration.

How to verify in Opsmeter Dashboard

  1. Use Overview to confirm spike window and budget posture.
  2. Use Top Endpoints to find feature-level concentration.
  3. Use Top Users to find tenant-level concentration.
  4. Use Prompt Versions to validate deploy-linked cost drift.

Related guides

  • Open integration docs
  • View quickstart
  • Compare alternatives

Evaluation resources

For security and procurement reviews, use our trust summary before final tool selection.

Open trust proof pack