Production setup

No-SDK LLM cost tracking: production setup with direct ingest API

You can run Opsmeter.io in production today without SDK wrappers. Use a stable payload contract and a non-blocking ingest flow.

Published: 2026-02-22
Updated: 2026-02-26

Architecture, No-SDK, Operations

Full guide: Proxy vs No-Proxy LLM Observability Guide

What this guide answers

What changed in cost, cost per request, or budget posture.
Which endpoint, prompt, model, or tenant likely drove the delta.
Which validation step or control to apply next in Opsmeter.io.

What to alert on

cost/request drift by endpointTag or promptVersion
unexpected tenant concentration in Top Users
request burst with falling success ratio
budget warning, spend-alert, and exceeded state transitions
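
The first alert in this list, cost/request drift, can be sketched as a comparison between a baseline window and the current window. This is a minimal illustration, not Opsmeter.io's alerting logic: the function name, the input shape (total cost and request count per endpointTag), and the 25% threshold are all assumptions.

```python
from typing import Dict, Tuple

# Input shape (an assumption): {endpointTag: (total_cost, request_count)}
Window = Dict[str, Tuple[float, int]]


def cost_per_request_drift(baseline: Window, current: Window,
                           threshold: float = 0.25) -> Dict[str, float]:
    """Return {endpointTag: relative drift} for tags whose cost/request moved
    by more than `threshold` relative to the baseline window."""
    flagged = {}
    for tag, (cost, count) in current.items():
        if tag not in baseline or count == 0:
            continue  # nothing to compare against
        base_cost, base_count = baseline[tag]
        if base_count == 0 or base_cost == 0:
            continue  # avoid dividing by zero on an empty baseline
        base_cpr = base_cost / base_count
        drift = (cost / count - base_cpr) / base_cpr
        if abs(drift) > threshold:
            flagged[tag] = drift
    return flagged
```

The same comparison works when the key is promptVersion instead of endpointTag.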

Execution checklist

Decide your endpointTag taxonomy (feature ownership) and promptVersion rules (deploy accountability).
Implement externalRequestId and keep it stable across retries.
Send telemetry asynchronously with timeout + swallow-on-error behavior.
Separate demo/test from production with dataMode + environment.
Validate dashboards (Top Endpoints, Top Users, Prompt Versions) on a canary window before scaling.

When no-SDK setup is the right first move

Teams often start with the direct ingest API when they want a fast rollout without waiting for package adoption.

The no-SDK path keeps provider traffic unchanged and adds telemetry in app logic.

Use this workflow

Turn diagnosis into action

Identify the cost driver, validate it with attribution, then apply one durable control before the next billing cycle.

Minimum production contract

externalRequestId stable across retries
provider, model, endpointTag, promptVersion
inputTokens, outputTokens, latencyMs, status
dataMode and environment for clean operational segmentation
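
One way to keep this contract stable in application code is to define it once as a typed record and check it before sending. The sketch below uses the field names listed above; the Python types, the example values, and the `validate` helper are assumptions and not part of a documented Opsmeter.io schema.

```python
from dataclasses import dataclass, asdict
from typing import List


@dataclass(frozen=True)
class TelemetryEvent:
    # Field names come from the contract above; the Python types are assumptions.
    externalRequestId: str
    provider: str
    model: str
    endpointTag: str
    promptVersion: str
    inputTokens: int
    outputTokens: int
    latencyMs: int
    status: str       # e.g. "success"; allowed values are an assumption
    dataMode: str     # e.g. "real"; allowed values are an assumption
    environment: str  # e.g. "production"


def validate(event: TelemetryEvent) -> List[str]:
    """Return a list of problems; an empty list means the event looks complete."""
    problems = []
    for name, value in asdict(event).items():
        if isinstance(value, str) and not value.strip():
            problems.append(f"{name} is empty")
        elif isinstance(value, int) and value < 0:
            problems.append(f"{name} is negative")
    return problems
```

Calling `asdict(event)` yields a plain dictionary that can be serialized to JSON for the ingest request.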

Architecture overview (direct ingest flow)

The simplest no-SDK pattern is: call your provider as usual, extract usage fields from the response, and send one telemetry event to Opsmeter.io.

Keep ingest off the critical path. Telemetry should never block user requests.

Provider call completes.
You attach endpointTag + promptVersion + externalRequestId to the event.
You send the event asynchronously with a short timeout and swallow-on-error behavior.
Dashboards aggregate by endpoint, tenant/user, and deploy so incidents are explainable.
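
The flow above can be sketched with the Python standard library alone. The ingest URL below is a placeholder, authentication is omitted because this guide does not describe it, and the 0.8 second timeout is an assumed value; treat this as an illustration of the pattern rather than a reference client.

```python
import json
import threading
import urllib.request

INGEST_URL = "https://ingest.example.invalid/v1/events"  # placeholder, not a real endpoint
TIMEOUT_SECONDS = 0.8  # assumed value; keep it short


def post_event(event: dict) -> None:
    """Blocking HTTP POST of one telemetry event. Raises on network errors."""
    request = urllib.request.Request(
        INGEST_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=TIMEOUT_SECONDS) as response:
        response.read()


def send_async(event: dict, transport=post_event) -> threading.Thread:
    """Send the event on a daemon thread and swallow every failure,
    so the user request never waits on (or fails because of) telemetry."""
    def worker() -> None:
        try:
            transport(event)
        except Exception:
            pass  # swallow-on-error: telemetry must not break the product flow

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

The `transport` parameter exists so the send path can be exercised in tests without network access. A production variant might count swallowed errors in a local metric so telemetry outages stay visible.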

Data quality rules (prevent noisy attribution)

Normalize model identifiers into a stable catalog key (provider + model).
Keep endpointTag taxonomy stable; never embed user-specific values.
Hash userId/tenantId when needed and document the mapping rules.
Always include dataMode + environment so synthetic traffic does not pollute baselines.
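
Two of these rules are easy to centralize in small helpers. The sketch below assumes a `provider:model` key format with lower-casing, and uses HMAC-SHA256 with an application-held secret for hashing identifiers; both choices are illustrative assumptions, not requirements stated by Opsmeter.io.

```python
import hashlib
import hmac


def model_catalog_key(provider: str, model: str) -> str:
    """Build one stable key per provider + model pair.
    The "provider:model" format and lower-casing are assumptions."""
    return f"{provider.strip().lower()}:{model.strip().lower()}"


def pseudonymize(identifier: str, secret: bytes) -> str:
    """Keyed hash of a userId or tenantId: the same input always maps to the
    same output, so attribution stays consistent without sending the raw ID."""
    return hmac.new(secret, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
```

A keyed hash is chosen over a plain hash so that someone who sees the telemetry cannot confirm a guessed identifier without also knowing the secret. Rotating the secret changes every hashed ID, which is one of the mapping rules worth documenting.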

Retry safety and idempotency (why externalRequestId matters)

Without a stable externalRequestId, retries look like new work and your dashboards over-count both volume and spend.

If your app retries upstream calls, keep the same externalRequestId so you can measure retry multipliers and isolate reliability-driven cost spikes.

Generate externalRequestId once per user action (not per attempt).
Pass it through logs and telemetry so incidents are traceable end-to-end.
On 429, respect Retry-After and avoid tight loops that multiply cost.
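
A minimal sketch of the first rule: the ID is generated once before the retry loop, so every attempt carries the same externalRequestId. The `attempt` field, the backoff schedule, and the function name are hypothetical and only serve the illustration. Retry-After handling is left out to keep the example short.

```python
import time
import uuid
from typing import Callable, List, Optional, Tuple


def run_user_action(call_provider: Callable[[], str],
                    max_attempts: int = 3,
                    sleep: Callable[[float], None] = time.sleep
                    ) -> Tuple[Optional[str], List[dict]]:
    """Call the provider with retries and return (result, telemetry events)."""
    external_request_id = str(uuid.uuid4())  # once per user action, not per attempt
    events: List[dict] = []
    for attempt in range(1, max_attempts + 1):
        try:
            result = call_provider()
        except Exception:
            events.append({"externalRequestId": external_request_id,
                           "attempt": attempt,  # hypothetical, local-only field
                           "status": "error"})
            if attempt < max_attempts:
                sleep(0.1 * 2 ** (attempt - 1))  # simple exponential backoff
            continue
        events.append({"externalRequestId": external_request_id,
                       "attempt": attempt,
                       "status": "success"})
        return result, events
    return None, events
```

Because all events share one ID, the number of events per ID gives the retry multiplier for that user action.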

Production-safe send pattern

Send telemetry asynchronously (fire-and-forget).
Use a short timeout and swallow errors to protect the user request path.
On 429, read Retry-After and back off.
On 402, pause telemetry and keep provider calls running.
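
The 429 and 402 rules can be captured in a small gate that the sender consults before each telemetry attempt. This sketch assumes the status codes come back from the ingest endpoint, that Retry-After arrives as a number of seconds (the HTTP-date form falls back to a default), and that a 402 pause lasts one hour; the class name and all durations are assumptions.

```python
import time
from typing import Callable, Optional


class IngestGate:
    """Tracks whether telemetry sends are currently allowed.
    Provider calls never consult this gate, so they keep running."""

    def __init__(self, clock: Callable[[], float] = time.monotonic,
                 default_backoff: float = 30.0,
                 pause_on_402: float = 3600.0) -> None:
        self._clock = clock
        self._default_backoff = default_backoff
        self._pause_on_402 = pause_on_402
        self._blocked_until = 0.0

    def allow(self) -> bool:
        return self._clock() >= self._blocked_until

    def record(self, status_code: int, retry_after: Optional[str] = None) -> None:
        """Update the gate from an ingest response."""
        if status_code == 429:
            delay = self._default_backoff
            if retry_after is not None and retry_after.strip().isdigit():
                delay = float(retry_after.strip())  # delta-seconds form only
        elif status_code == 402:
            delay = self._pause_on_402
        else:
            return
        self._blocked_until = max(self._blocked_until, self._clock() + delay)
```

The sender would check `gate.allow()` before each event and drop the event when it returns False, which keeps the request path free of waiting.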

Rollout plan (reduce integration risk)

Start with one high-volume endpointTag to validate attribution.
Verify tokens, cost/request, and model mapping on a small canary window.
Expand to the next 5 endpoints that dominate spend (80/20).
Add budgets/alerts only after traffic classification (dataMode + env) is clean.
Document naming conventions so teams add tags consistently.
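
For the 80/20 step, the next endpoints to instrument can be picked from whatever spend breakdown is already available, for example provider invoices split by feature. The function below is a hypothetical helper; the 80% coverage target and the limit of five mirror the rollout plan above.

```python
from typing import Dict, List


def dominant_endpoints(spend_by_tag: Dict[str, float],
                       coverage: float = 0.8,
                       limit: int = 5) -> List[str]:
    """Return endpointTags in descending spend order until they cover
    `coverage` of total spend or `limit` tags have been picked."""
    total = sum(spend_by_tag.values())
    if total <= 0:
        return []
    picked: List[str] = []
    running = 0.0
    ranked = sorted(spend_by_tag.items(), key=lambda item: item[1], reverse=True)
    for tag, spend in ranked:
        if running / total >= coverage or len(picked) >= limit:
            break
        picked.append(tag)
        running += spend
    return picked
```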

What SDK wrappers will add later

Automatic capture wrappers in common frameworks.
Runtime enforcement patterns (clamp/fallback/queue).
Standard policy contracts for machine-readable actions.

FAQ

Will no-SDK telemetry slow down user requests?

It should not. Keep telemetry off the critical path: async send, short timeouts, and swallow-on-error behavior. Your product flow should succeed even when telemetry is temporarily unavailable.

Can we track per-user cost without storing PII?

Yes. Use stable hashed identifiers (or internal IDs) for userId/tenantId, and document the mapping rules. The goal is consistent attribution, not collecting personal data.

Do we need a proxy or gateway to get reliable cost tracking?

No. You can start with direct ingest (no-proxy/no-SDK) for fast adoption. Add gateway routing later only if you truly need centralized runtime enforcement or multi-provider routing in the request path.

What breaks most no-SDK setups in production?

Missing externalRequestId (retries inflate totals), inconsistent endpointTag naming, and mixed demo/test traffic that corrupts alert baselines. Fix those three and the rest becomes much easier.

Evaluation resources

For security and procurement reviews, use our trust summary before final tool selection.
