No-proxy implementation guide: send LLM cost telemetry without a gateway
Implementation-first guide for shipping no-proxy telemetry safely: contract design, async delivery, retries, and verification.
Why teams choose no-proxy first
You avoid inserting a network gateway in production traffic paths.
Integration can happen incrementally in application code without changing provider routing.
Telemetry contract (minimum fields)
- externalRequestId (stable across retries)
- provider + model identifiers (normalized)
- endpointTag + promptVersion (ownership and deploy correlation)
- userId and/or tenantId (hashed if needed)
- inputTokens + outputTokens + latencyMs + status
- dataMode + environment (keep test vs prod separate)
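The contract above can be sketched as a typed record. This is a minimal illustration, not a required schema; the field names follow the payload example later in this guide, and the hashing helper is a hypothetical way to satisfy the "hashed if needed" note on userId/tenantId.

```python
import hashlib
from dataclasses import dataclass, asdict

@dataclass
class LlmTelemetryEvent:
    externalRequestId: str   # stable across retries of the same logical call
    provider: str            # normalized provider identifier
    model: str               # normalized model identifier
    endpointTag: str         # ownership / feature attribution
    promptVersion: str       # deploy correlation
    userId: str              # hashed user or tenant id
    inputTokens: int
    outputTokens: int
    latencyMs: int
    status: str              # e.g. "success" or "error"
    dataMode: str            # keep "test" vs "real" separate
    environment: str         # keep "staging" vs "prod" separate

def hash_tenant_id(raw_id: str) -> str:
    """Hash a tenant id before it leaves the application boundary."""
    return "tenant_" + hashlib.sha256(raw_id.encode()).hexdigest()[:12]

event = LlmTelemetryEvent(
    externalRequestId="req_example",
    provider="provider_id",
    model="model_id",
    endpointTag="checkout.ai_summary",
    promptVersion="summary_v3",
    userId=hash_tenant_id("acme"),
    inputTokens=540,
    outputTokens=180,
    latencyMs=892,
    status="success",
    dataMode="real",
    environment="prod",
)
```

Keeping the contract in one typed definition makes it harder for individual endpoints to drift from the shared field set.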
Reference architecture
- Layer A: Provider call and usage extraction
- Layer B: Telemetry client with timeout, swallow-on-error, and retry policy
- Layer C: Dashboard attribution by endpointTag, userId, and promptVersion
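One hypothetical way to wire the three layers together in application code (function names and the response shape are assumptions for illustration): Layer A returns the completion plus usage, Layer B is a best-effort emitter, and Layer C attribution tags travel inside the payload.

```python
import time
import uuid
from typing import Callable

def call_with_telemetry(
    provider_call: Callable[[str], dict],   # Layer A: returns {"text", "usage"}
    emit: Callable[[dict], None],           # Layer B: non-blocking emitter
    prompt: str,
    endpoint_tag: str,
    prompt_version: str,
) -> str:
    started = time.monotonic()
    response = provider_call(prompt)
    latency_ms = int((time.monotonic() - started) * 1000)
    try:
        emit({
            "externalRequestId": str(uuid.uuid4()),
            "endpointTag": endpoint_tag,        # Layer C attribution
            "promptVersion": prompt_version,    # Layer C attribution
            "inputTokens": response["usage"]["input"],
            "outputTokens": response["usage"]["output"],
            "latencyMs": latency_ms,
            "status": "success",
        })
    except Exception:
        pass  # telemetry must never break the user path
    return response["text"]
```

The provider result is returned before anything depends on telemetry succeeding, which is the core no-proxy property.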
Reliability pattern (do not break the user path)
No-proxy telemetry should never be a production dependency. Keep ingest async, time-bounded, and safe to fail.
If telemetry fails, provider calls should still succeed. Treat observability as best-effort and monitor ingestion health separately.
- Short timeouts and swallow-on-error behavior.
- Async or background ingest (avoid blocking user requests).
- Batching for high-volume endpoints when needed.
- Sampling for extremely high volume (keep attribution coverage on top drivers).
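The reliability pattern above can be sketched as a bounded queue drained by a background thread. This is a minimal single-process sketch, not a production library: the batch-sending function is stubbed and would in practice be a short-timeout HTTP POST to your ingest endpoint.

```python
import queue
import threading

class BestEffortEmitter:
    """Non-blocking, swallow-on-error telemetry ingest with batching."""

    def __init__(self, send_batch, batch_size=20, flush_secs=1.0, max_queued=1000):
        self._q = queue.Queue(maxsize=max_queued)
        self._send_batch = send_batch        # e.g. an HTTP POST with timeout=1s
        self._batch_size = batch_size
        self._flush_secs = flush_secs
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def emit(self, event: dict) -> None:
        try:
            self._q.put_nowait(event)        # never block the request path
        except queue.Full:
            pass                             # drop under backpressure

    def _run(self) -> None:
        batch = []
        while True:
            try:
                batch.append(self._q.get(timeout=self._flush_secs))
            except queue.Empty:
                pass                         # idle: fall through and flush
            if batch and (len(batch) >= self._batch_size or self._q.empty()):
                try:
                    self._send_batch(batch)  # short-timeout network call here
                except Exception:
                    pass                     # swallow-on-error: best effort only
                batch = []
```

Dropping events under backpressure is deliberate: losing telemetry is acceptable, blocking user requests is not. Monitor the drop rate as part of ingestion health.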
Implementation checklist
- Keep externalRequestId stable on retries.
- Map provider usage fields into a normalized token model.
- Tag telemetry with dataMode and environment.
- Set short timeout and non-blocking telemetry behavior.
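Two of the checklist items can be sketched directly: deriving externalRequestId from the logical request (so retries reuse it) and mapping provider-specific usage fields into the normalized token model. The provider names and usage field names below are illustrative assumptions, not any real provider's API.

```python
import hashlib

def stable_request_id(tenant_id: str, endpoint_tag: str, idempotency_key: str) -> str:
    """Same logical request -> same id, even across client retries."""
    raw = f"{tenant_id}:{endpoint_tag}:{idempotency_key}"
    return "req_" + hashlib.sha256(raw.encode()).hexdigest()[:26]

def normalize_usage(provider: str, usage: dict) -> dict:
    """Map heterogeneous provider usage fields to inputTokens/outputTokens."""
    field_map = {
        "provider_a": ("prompt_tokens", "completion_tokens"),  # hypothetical
        "provider_b": ("input_tokens", "output_tokens"),       # hypothetical
    }
    in_key, out_key = field_map[provider]
    return {"inputTokens": usage[in_key], "outputTokens": usage[out_key]}
```

Deriving the id from an idempotency key rather than generating a fresh UUID per attempt is what keeps retried calls from double-counting in cost dashboards.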
Tradeoff to communicate clearly
A no-proxy setup cannot block or throttle provider calls at the network layer. Guardrail actions (budget caps, throttles, kill switches) must run in application logic and operational workflows instead.
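A minimal sketch of an app-side guardrail under this tradeoff, assuming the caller already has current spend and a budget limit available (how those are fetched is outside this sketch):

```python
from typing import Callable

def guarded_call(
    spent_usd: float,
    budget_usd: float,
    provider_call: Callable[[str], str],
    prompt: str,
) -> dict:
    """Enforce a budget in application code, since no proxy sits in the path."""
    if spent_usd >= budget_usd:
        # degrade gracefully instead of calling the provider
        return {"status": "budget_exceeded", "text": None}
    return {"status": "success", "text": provider_call(prompt)}
```

The key point is placement: the check runs before the provider call in your own code, so enforcement works without any change to network routing.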
What to send (payload example)
{
"externalRequestId": "req_01HZXB6MQZ2WQ9D2KCF9M4V2QY",
"provider": "provider_id",
"model": "model_id",
"endpointTag": "checkout.ai_summary",
"promptVersion": "summary_v3",
"userId": "tenant_acme_hash",
"inputTokens": 540,
"outputTokens": 180,
"latencyMs": 892,
"status": "success",
"dataMode": "real",
"environment": "prod"
}
Common mistakes
- Choosing a proxy for visibility, then inheriting new failure modes.
- Instrumenting too late (no endpointTag/promptVersion in production).
- Treating cost control as a billing problem, not an operations workflow.
- No owner for budgets and escalation after integration.
How to verify in Opsmeter Dashboard
- Use Overview to confirm spike window and budget posture.
- Use Top Endpoints to find feature-level concentration.
- Use Top Users to find tenant-level concentration.
- Use Prompt Versions to validate deploy-linked cost drift.