Opsmeter
AI Cost & Inference Control


No-proxy implementation guide: send LLM cost telemetry without a gateway

Implementation-first guide for shipping no-proxy telemetry safely: contract design, async delivery, retries, and verification.

Tags: Architecture · No-proxy · Telemetry

Full guide: Proxy vs no-proxy LLM observability: tradeoffs for production teams

Why teams choose no-proxy first

  • You avoid inserting a network gateway in production traffic paths.
  • Integration can happen incrementally in application code without changing provider routing.

Telemetry contract (minimum fields)

  • externalRequestId (stable across retries)
  • provider + model identifiers (normalized)
  • endpointTag + promptVersion (ownership and deploy correlation)
  • userId and/or tenantId (hashed if needed)
  • inputTokens + outputTokens + latencyMs + status
  • dataMode + environment (keep test vs prod separate)
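The contract above can be captured as a small record type so every call site emits the same shape. A minimal sketch in Python (field names follow the contract; the types and the `TelemetryEvent` name are assumptions, not an Opsmeter SDK):

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class TelemetryEvent:
    """Minimum-field telemetry contract for one provider call."""
    externalRequestId: str      # stable across retries of the same logical request
    provider: str               # normalized provider identifier
    model: str                  # normalized model identifier
    endpointTag: str            # feature ownership, e.g. "checkout.ai_summary"
    promptVersion: str          # deploy correlation, e.g. "summary_v3"
    userId: Optional[str]       # hashed user/tenant identifier
    inputTokens: int
    outputTokens: int
    latencyMs: int
    status: str                 # e.g. "success" or "error"
    dataMode: str               # keep "test" vs "real" separate
    environment: str            # keep "prod" vs non-prod separate

event = TelemetryEvent(
    externalRequestId="req_01HZXB6MQZ2WQ9D2KCF9M4V2QY",
    provider="provider_id", model="model_id",
    endpointTag="checkout.ai_summary", promptVersion="summary_v3",
    userId="tenant_acme_hash",
    inputTokens=540, outputTokens=180, latencyMs=892,
    status="success", dataMode="real", environment="prod",
)
payload = asdict(event)  # plain dict, ready to serialize as JSON
```

Using a single record type keeps the contract enforceable in code review: a missing field fails at construction time, not in the dashboard.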

Reference architecture

  • Layer A: Provider call and usage extraction
  • Layer B: Telemetry client with timeout, swallow-on-error, and retry policy
  • Layer C: Dashboard attribution by endpointTag, userId, and promptVersion

Reliability pattern (do not break the user path)

No-proxy telemetry should never be a production dependency. Keep ingest async, time-bounded, and safe to fail.

If telemetry fails, provider calls should still succeed. Treat observability as best-effort and monitor ingestion health separately.

  1. Short timeouts and swallow-on-error behavior.
  2. Async or background ingest (avoid blocking user requests).
  3. Batching for high-volume endpoints when needed.
  4. Sampling for extremely high volume (keep attribution coverage on top drivers).
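The four reliability rules above can be combined in one small client: a bounded in-memory queue, a background sender thread, a short network timeout, and swallow-on-error everywhere. This is a sketch under assumptions (the `ingest_url` endpoint and payload shape are hypothetical), not a definitive implementation:

```python
import json
import queue
import threading
import urllib.request

class TelemetryClient:
    """Best-effort telemetry: never blocks and never raises into the caller's path."""

    def __init__(self, ingest_url: str, timeout_s: float = 0.5, maxsize: int = 1000):
        self._url = ingest_url
        self._timeout = timeout_s
        self._q: queue.Queue = queue.Queue(maxsize=maxsize)  # bounded buffer
        self.dropped = 0  # export this counter to monitor ingestion health separately
        threading.Thread(target=self._drain, daemon=True).start()

    def record(self, event: dict) -> None:
        """Enqueue without blocking; if the buffer is full, drop and count."""
        try:
            self._q.put_nowait(event)
        except queue.Full:
            self.dropped += 1  # shedding load is acceptable; failing the user is not

    def _drain(self) -> None:
        while True:
            event = self._q.get()
            try:
                req = urllib.request.Request(
                    self._url,
                    data=json.dumps(event).encode(),
                    headers={"Content-Type": "application/json"},
                )
                urllib.request.urlopen(req, timeout=self._timeout)
            except Exception:
                self.dropped += 1  # swallow-on-error: ingest failure is not a user failure
```

Because `record` only enqueues, the user-facing request path pays nothing for telemetry beyond a queue insert; delivery failures surface through the `dropped` counter rather than through exceptions.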

Implementation checklist

  1. Keep externalRequestId stable on retries.
  2. Map provider usage fields into a normalized token model.
  3. Tag telemetry with dataMode and environment.
  4. Set short timeout and non-blocking telemetry behavior.

Tradeoff to communicate clearly

Because no-proxy instrumentation sits outside the request path, it cannot block provider calls directly. Guardrail actions must instead run in application logic and operational workflows.
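An app-level guardrail typically wraps the provider call with a budget check before it is made. A hypothetical sketch (the `spent_usd` figure is assumed to come from your own spend tracking; the function and field names are illustrative):

```python
def call_with_guardrail(tenant_id: str, spent_usd: float, budget_usd: float, do_call):
    """App-level guardrail: refuse or degrade before invoking the provider."""
    if spent_usd >= budget_usd:
        # Operational workflow takes over: alert the budget owner,
        # serve a degraded (non-LLM) response to the user.
        return {"status": "budget_exceeded", "tenant": tenant_id}
    return do_call()

blocked = call_with_guardrail("tenant_acme", spent_usd=120.0, budget_usd=100.0,
                              do_call=lambda: {"status": "success"})
# blocked["status"] == "budget_exceeded"
```

This is the tradeoff made concrete: enforcement happens in your code, on your spend data, rather than at a network chokepoint.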

What to send (payload example)

{
  "externalRequestId": "req_01HZXB6MQZ2WQ9D2KCF9M4V2QY",
  "provider": "provider_id",
  "model": "model_id",
  "endpointTag": "checkout.ai_summary",
  "promptVersion": "summary_v3",
  "userId": "tenant_acme_hash",
  "inputTokens": 540,
  "outputTokens": 180,
  "latencyMs": 892,
  "status": "success",
  "dataMode": "real",
  "environment": "prod"
}

Common mistakes

  • Choosing a proxy for visibility, then inheriting new failure modes.
  • Instrumenting too late (no endpointTag/promptVersion in production).
  • Treating cost control as a billing problem, not an operations workflow.
  • No owner for budgets and escalation after integration.

How to verify in Opsmeter Dashboard

  1. Use Overview to confirm spike window and budget posture.
  2. Use Top Endpoints to find feature-level concentration.
  3. Use Top Users to find tenant-level concentration.
  4. Use Prompt Versions to validate deploy-linked cost drift.

Related guides

  • Open integration docs
  • View quickstart
  • Compare alternatives

Evaluation resources

For security and procurement reviews, use our trust summary before final tool selection.

Open trust proof pack