No-SDK LLM cost tracking: production setup with direct ingest API
You can run Opsmeter in production today without SDK wrappers. Use a stable payload contract and a non-blocking ingest flow.
Full guide: Proxy vs no-proxy LLM observability: tradeoffs for production teams
When no-SDK setup is the right first move
Teams often start with the direct ingest API when they want a fast rollout without waiting for an SDK package to be adopted.
The no-SDK path leaves provider traffic unchanged and adds telemetry in application code.
Minimum production contract
- externalRequestId stable across retries
- provider, model, endpointTag, promptVersion
- inputTokens, outputTokens, latencyMs, status
- dataMode and environment for clean operational segmentation
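As a rough illustration, the contract above can be written down as a field list with a small completeness check. The field names come from the list above; the helper name `missing_fields` and every example value (IDs, tags, the `"success"`, `"real"`, and `"prod"` strings) are assumptions for this sketch, not documented Opsmeter values.

```python
# Field names follow the minimum contract; the values below are illustrative only.
REQUIRED_FIELDS = (
    "externalRequestId", "provider", "model", "endpointTag", "promptVersion",
    "inputTokens", "outputTokens", "latencyMs", "status",
    "dataMode", "environment",
)


def missing_fields(event: dict) -> list:
    """Return the contract fields that are absent or empty in the event."""
    return [name for name in REQUIRED_FIELDS if event.get(name) in (None, "")]


example_event = {
    "externalRequestId": "req-7f3a",   # hypothetical ID, reused across retries
    "provider": "openai",
    "model": "gpt-4o-mini",
    "endpointTag": "support.reply",    # hypothetical tag
    "promptVersion": "v12",            # hypothetical version label
    "inputTokens": 812,
    "outputTokens": 164,
    "latencyMs": 930,
    "status": "success",               # assumed status value
    "dataMode": "real",                # assumed dataMode value
    "environment": "prod",             # assumed environment value
}
```

Running a check like this before sending makes contract gaps visible in your own logs instead of surfacing later as misattributed spend.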
Architecture overview (direct ingest flow)
The simplest no-SDK pattern is: call your provider as usual, extract usage fields from the response, and send one telemetry event to Opsmeter.
Keep ingest off the critical path. Telemetry should never block user requests.
- Provider call completes.
- You attach endpointTag + promptVersion + externalRequestId to the event.
- You send the event asynchronously with a short timeout and swallow-on-error behavior.
- Dashboards aggregate by endpoint, tenant/user, and deploy so incidents are explainable.
Data quality rules (prevent noisy attribution)
- Normalize model identifiers into a stable catalog key (provider + model).
- Keep endpointTag taxonomy stable; never embed user-specific values.
- Hash userId/tenantId when needed and document the mapping rules.
- Always include dataMode + environment so synthetic traffic does not pollute baselines.
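Two of these rules are easy to sketch in code. The example below assumes a `provider:model` key format and uses a keyed HMAC-SHA256 as the hashing scheme. Both are illustrative choices rather than an Opsmeter requirement, and `catalog_key` and `pseudonymize` are hypothetical helper names.

```python
import hashlib
import hmac


def catalog_key(provider: str, model: str) -> str:
    """Normalize provider and model into one stable key (assumed format)."""
    return f"{provider.strip().lower()}:{model.strip().lower()}"


def pseudonymize(raw_id: str, secret: bytes) -> str:
    """Map a userId or tenantId to a stable, non-reversible identifier.

    A keyed HMAC gives the same output for the same input, so attribution
    stays consistent, while the raw ID never leaves your system.
    """
    return hmac.new(secret, raw_id.encode("utf-8"), hashlib.sha256).hexdigest()
```

Keep the secret stable. Rotating it changes every hashed identifier and breaks continuity in per-user and per-tenant views.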
Retry safety and idempotency (why externalRequestId matters)
Without a stable externalRequestId, retries look like new work and your dashboards over-count both volume and spend.
If your app retries upstream calls, keep the same externalRequestId so you can measure retry multipliers and isolate reliability-driven cost spikes.
- Generate externalRequestId once per user action (not per attempt).
- Pass it through logs and telemetry so incidents are traceable end-to-end.
- On 429, respect Retry-After and avoid tight loops that multiply cost.
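A minimal sketch of the retry rules above: the ID is generated once, outside the attempt loop, and every attempt is recorded under it. The `attempt_fn` result shape (a dict with `status` and an optional `retry_after` in seconds) and the `attempt` counter are assumptions for illustration.

```python
import time
import uuid


def call_with_retries(attempt_fn, record_attempt, max_attempts=3, sleep=time.sleep):
    """Retry an upstream call while keeping one externalRequestId."""
    external_request_id = str(uuid.uuid4())  # once per user action, not per attempt
    result = None
    for attempt in range(1, max_attempts + 1):
        result = attempt_fn()
        # Every attempt shares the same ID, so retries can be grouped later.
        record_attempt({
            "externalRequestId": external_request_id,
            "attempt": attempt,  # hypothetical field for local logs
            "status": result["status"],
        })
        if result["status"] != 429:
            break
        if attempt < max_attempts:
            # Respect Retry-After instead of retrying in a tight loop.
            sleep(float(result.get("retry_after", 1.0)))
    return external_request_id, result
```

Grouping the recorded attempts by externalRequestId gives the retry multiplier directly: attempts per user action.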
Production-safe send pattern
- Send telemetry asynchronously (fire-and-forget).
- Use a short timeout and swallow errors to protect the user request path.
- On 429, read Retry-After and back off.
- On 402, pause telemetry and keep provider calls running.
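One way to combine these four rules, sketched with the Python standard library only. The ingest URL is a placeholder, authentication headers are omitted, and the pause-until-reset handling for 402 is an assumption; check the Opsmeter API reference for the real endpoint and semantics. `SendState`, `handle_status`, and `should_send` are hypothetical names.

```python
import json
import threading
import time
import urllib.error
import urllib.request

INGEST_URL = "https://ingest.example.invalid/v1/events"  # placeholder, not a real endpoint


class SendState:
    """Shared backoff state for the telemetry sender."""
    def __init__(self):
        self.paused = False    # set on 402; cleared manually in this sketch
        self.resume_at = 0.0   # monotonic time before which sends are skipped


def handle_status(state, status, retry_after, now):
    if status == 402:
        state.paused = True    # stop telemetry; provider calls are unaffected
    elif status == 429:
        try:
            delay = float(retry_after)
        except (TypeError, ValueError):
            delay = 5.0        # assumed default when Retry-After is absent or a date
        state.resume_at = now + delay


def should_send(state, now):
    return not state.paused and now >= state.resume_at


def send_event(state, event, timeout=1.0):
    if not should_send(state, time.monotonic()):
        return
    request = urllib.request.Request(
        INGEST_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            pass
    except urllib.error.HTTPError as err:
        handle_status(state, err.code, err.headers.get("Retry-After"), time.monotonic())
    except Exception:
        pass  # timeouts and network errors are swallowed


def send_in_background(state, event):
    """Fire-and-forget: the caller never waits on the ingest request."""
    threading.Thread(target=send_event, args=(state, event), daemon=True).start()
```

Events skipped during a backoff window are dropped here. Whether to buffer them instead is a design choice that depends on how much telemetry loss you can tolerate.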
Rollout plan (reduce integration risk)
- Start with one high-volume endpointTag to validate attribution.
- Verify tokens, cost/request, and model mapping on a small canary window.
- Expand to the next 5 endpoints that dominate spend (80/20).
- Add budgets/alerts only after traffic classification (dataMode + env) is clean.
- Document naming conventions so teams add tags consistently.
What SDK wrappers will add later
- Automatic capture wrappers in common frameworks.
- Runtime enforcement patterns (clamp/fallback/queue).
- Standard policy contracts for machine-readable actions.
What to alert on
- cost/request drift by endpointTag or promptVersion
- unexpected tenant concentration in Top Users
- request burst with falling success ratio
- budget warning, spend-alert, and exceeded state transitions
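To make the first signal concrete, here is a rough sketch of cost/request drift per endpointTag. It assumes you can export total cost and request count per tag for a baseline window and a current window. The input shape, the function name, and the 25% threshold in the usage note are illustrative assumptions.

```python
def cost_per_request_drift(baseline, current):
    """Return the relative change in cost/request per tag.

    Both arguments map a tag to a (total_cost, request_count) pair.
    A value of 0.5 means cost/request rose 50% against the baseline.
    """
    drift = {}
    for tag, (cost, requests) in current.items():
        base = baseline.get(tag)
        if not base or base[0] == 0 or base[1] == 0 or requests == 0:
            continue  # no usable baseline or no traffic; skip rather than guess
        base_cpr = base[0] / base[1]
        drift[tag] = (cost / requests - base_cpr) / base_cpr
    return drift
```

For example, you might flag any tag whose drift exceeds 0.25 and then check whether a promptVersion change landed in the same window.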
Execution checklist
- Decide your endpointTag taxonomy (feature ownership) and promptVersion rules (deploy accountability).
- Implement externalRequestId and keep it stable across retries.
- Send telemetry asynchronously with timeout + swallow-on-error behavior.
- Separate demo/test from production with dataMode + environment.
- Validate dashboards (Top Endpoints, Top Users, Prompt Versions) on a canary window before scaling.
FAQ
Will no-SDK telemetry slow down user requests?
It should not. Keep telemetry off the critical path: async send, short timeouts, and swallow-on-error behavior. Your product flow should succeed even when telemetry is temporarily unavailable.
Can we track per-user cost without storing PII?
Yes. Use stable hashed identifiers (or internal IDs) for userId/tenantId, and document the mapping rules. The goal is consistent attribution, not collecting personal data.
Do we need a proxy or gateway to get reliable cost tracking?
No. You can start with direct ingest (no-proxy/no-SDK) for fast adoption. Add gateway routing later only if you truly need centralized runtime enforcement or multi-provider routing in the request path.
What breaks most no-SDK setups in production?
Missing externalRequestId (retries inflate totals), inconsistent endpointTag naming, and mixed demo/test traffic that corrupts alert baselines. Fix those three and the rest becomes much easier.