Opsmeter.io
AI Cost & Inference Control


No-proxy implementation guide: send LLM cost telemetry without a gateway

Implementation-first guide for shipping no-proxy telemetry safely: contract design, async delivery, retries, and verification.

Tags: Architecture · No-proxy · Telemetry

Full guide: Proxy vs no-proxy LLM observability: tradeoffs for production teams

What this guide answers

  • What changed in cost, cost per request, or budget posture.
  • Which endpoint, prompt, model, or tenant likely drove the delta.
  • Which validation step or control to apply next in Opsmeter.io.

What to send (payload example)

{
  "externalRequestId": "req_01HZXB6MQZ2WQ9D2KCF9M4V2QY",
  "provider": "provider_id",
  "model": "model_id",
  "endpointTag": "checkout.ai_summary",
  "promptVersion": "summary_v3",
  "userId": "tenant_acme_hash",
  "inputTokens": 540,
  "outputTokens": 180,
  "latencyMs": 892,
  "status": "success",
  "dataMode": "real",
  "environment": "prod"
}
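The payload above can be assembled in application code right after the provider call returns. The sketch below is illustrative, not an official client: the function name, the hashing scheme for the tenant id, and the `input_tokens`/`output_tokens` keys on the raw usage object are assumptions to adapt to your provider's actual response shape.

```python
import hashlib


def build_telemetry(external_request_id, provider, model, endpoint_tag,
                    prompt_version, tenant_id, usage, latency_ms,
                    status="success", environment="prod"):
    """Assemble one telemetry event shaped like the payload example above.

    Pass the same external_request_id on retries so the event stays
    deduplicable on the ingest side. The tenant id is hashed here so no
    raw identifier leaves the application.
    """
    return {
        "externalRequestId": external_request_id,
        "provider": provider,
        "model": model,
        "endpointTag": endpoint_tag,
        "promptVersion": prompt_version,
        "userId": "tenant_" + hashlib.sha256(tenant_id.encode()).hexdigest()[:12],
        "inputTokens": usage.get("input_tokens", 0),
        "outputTokens": usage.get("output_tokens", 0),
        "latencyMs": latency_ms,
        "status": status,
        "dataMode": "real",
        "environment": environment,
    }
```

Because the request id is supplied by the caller rather than generated inside the builder, a retried provider call can reuse the same id and the ingest side sees one logical request.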

Common mistakes

  • Choosing a proxy for visibility, then inheriting new failure modes.
  • Instrumenting too late (no endpointTag/promptVersion in production).
  • Treating cost control as a billing problem, not an operations workflow.
  • No owner for budgets and escalation after integration.

How to verify in the Opsmeter.io dashboard

  1. Use Overview to confirm spike window and budget posture.
  2. Use Top Endpoints to find feature-level concentration.
  3. Use Top Users to find tenant-level concentration.
  4. Use Prompt Versions to validate deploy-linked cost drift.

Why teams choose no-proxy first

You avoid inserting a network gateway in production traffic paths.

Integration can happen incrementally in application code without changing provider routing.

Use this workflow

Identify the cost driver, validate it with attribution, then apply one durable control before the next billing cycle.

Apply in your workspace

Re-run this workflow on your own spend data: follow the same path from article insight to telemetry verification, then validate with your own cost signals.

  • Quickstart path: send a first payload, confirm attribution, then return here for operations context. (Open quickstart)
  • Evaluation path: pair this guide with trust proof, status, and compare surfaces during review. (Open trust proof pack)

Telemetry contract (minimum fields)

  • externalRequestId (stable across retries)
  • provider + model identifiers (normalized)
  • endpointTag + promptVersion (ownership and deploy correlation)
  • userId and/or tenantId (hashed if needed)
  • inputTokens + outputTokens + latencyMs + status
  • dataMode + environment (keep test vs prod separate)
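A cheap way to enforce the contract before events leave the app is a field check at build time. This is a minimal sketch, assuming the field names from the payload example above; `missing_fields` and `REQUIRED_FIELDS` are hypothetical names, not part of any Opsmeter.io SDK.

```python
# Minimum telemetry contract fields, mirroring the list above.
REQUIRED_FIELDS = {
    "externalRequestId", "provider", "model", "endpointTag",
    "promptVersion", "userId", "inputTokens", "outputTokens",
    "latencyMs", "status", "dataMode", "environment",
}


def missing_fields(event: dict) -> set:
    """Return the contract fields absent from an event (empty set = valid)."""
    return REQUIRED_FIELDS - event.keys()
```

Running this check in CI or at startup catches instrumentation gaps (a missing `endpointTag`, say) before they become unattributable production spend.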

Reference architecture

  • Layer A: Provider call and usage extraction
  • Layer B: Telemetry client with timeout, swallow-on-error, and retry policy
  • Layer C: Dashboard attribution by endpointTag, userId, and promptVersion

Reliability pattern (do not break the user path)

No-proxy telemetry should never be a production dependency. Keep ingest async, time-bounded, and safe to fail.

If telemetry fails, provider calls should still succeed. Treat observability as best-effort and monitor ingestion health separately.

  1. Short timeouts and swallow-on-error behavior.
  2. Async or background ingest (avoid blocking user requests).
  3. Batching for high-volume endpoints when needed.
  4. Sampling for extremely high volume (keep attribution coverage on top drivers).
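Points 1 and 2 can be combined in a small background-queue client. The sketch below is one way to do it under the stated constraints (never block the user path, swallow every telemetry error); the class name and the `sender` callable are assumptions, and a real deployment would add batching per point 3.

```python
import queue
import threading


class TelemetryClient:
    """Fire-and-forget telemetry: a bounded queue drained by a daemon thread."""

    def __init__(self, sender, timeout_s=1.0, maxsize=1000):
        self._sender = sender          # e.g. an HTTP POST to the ingest API
        self._timeout_s = timeout_s    # short timeout per delivery attempt
        self._q = queue.Queue(maxsize=maxsize)
        self.dropped = 0
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def record(self, event: dict) -> None:
        # Never block the user path: drop on a full queue rather than wait.
        try:
            self._q.put_nowait(event)
        except queue.Full:
            self.dropped += 1

    def _drain(self) -> None:
        while True:
            event = self._q.get()
            try:
                self._sender(event, timeout=self._timeout_s)
            except Exception:
                # Swallow-on-error: a telemetry failure must never surface
                # to the request that produced the event.
                pass
            finally:
                self._q.task_done()
```

The `dropped` counter is the hook for monitoring ingestion health separately, as recommended above: alert on it rather than on individual delivery failures.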

Implementation checklist

  1. Keep externalRequestId stable on retries.
  2. Map provider usage fields into a normalized token model.
  3. Tag telemetry with dataMode and environment.
  4. Set short timeout and non-blocking telemetry behavior.
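Step 2 of the checklist, normalizing provider usage fields, might look like the sketch below. The provider labels and raw field names are placeholders modeled on common response shapes; verify each provider's actual usage schema before mapping.

```python
def normalize_usage(provider_shape: str, raw: dict) -> dict:
    """Map provider-specific usage fields onto the normalized token model."""
    if provider_shape == "prompt_completion":
        # Shape with prompt/completion token names (placeholder keys).
        return {"inputTokens": raw["prompt_tokens"],
                "outputTokens": raw["completion_tokens"]}
    if provider_shape == "input_output":
        # Shape with input/output token names (placeholder keys).
        return {"inputTokens": raw["input_tokens"],
                "outputTokens": raw["output_tokens"]}
    raise ValueError(f"unknown usage shape: {provider_shape}")
```

Centralizing this mapping in one function keeps dashboards comparable across providers, since every event carries the same `inputTokens`/`outputTokens` keys regardless of source.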

Tradeoff to communicate clearly

No-proxy does not block provider calls directly. Guardrail actions run in app logic and operational workflows.
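One concrete form of an in-app guardrail is model selection by budget posture. This is a sketch under assumed thresholds (downgrade at 80% of budget, refuse at 100%); the function and model names are placeholders, not an Opsmeter.io API.

```python
def pick_model(spent_usd: float, budget_usd: float,
               primary: str = "model_large",
               fallback: str = "model_small"):
    """In-app guardrail: since no-proxy cannot block calls at the network
    layer, the application downgrades to a cheaper model as spend
    approaches budget, and returns None once budget is exhausted."""
    if spent_usd >= budget_usd:
        return None  # caller should skip, queue, or defer the provider call
    if spent_usd >= 0.8 * budget_usd:
        return fallback
    return primary
```

Wiring this to the spend figures verified in the dashboard turns cost attribution into a durable control rather than a reporting exercise.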

Related guides

Open integration docs · View quickstart · Compare alternatives

Evaluation resources

For security and procurement reviews, use our trust summary before final tool selection.

Open trust proof pack