
Implementation guide for AI cost control

Ship telemetry fast, keep attribution stable, and move from ingest to governance without changing your app network path.

Recommended path

Start with direct ingest, validate one stable payload shape, then harden budgets, retries, and incident workflows.

What this page answers
  • How to send the first telemetry payload with a stable schema.
  • Which fields unlock attribution, prompt impact, and budget workflows.
  • What to validate before moving from demo setup to production.
  • Quickstart: Ship a first payload and validate required fields in minutes.
  • No-proxy telemetry: Keep your app path unchanged while preserving request-level attribution.
  • Operations: Set budget posture, retry rules, and escalation checks for production.
Updated for 2026. API v1.

Implementation rhythm

  • 5 min: first ingest path
  • No proxy: app path unchanged
  • One schema: across providers

Use quickstart for initial payloads, then operations docs to harden production workflows.

Quickstart

No SDK required today.

Direct ingest API is production-ready and supports all core workflows. SDK wrappers are optional convenience layers. Current package links and provider support live in the package section below.

Goal: identify what is driving your AI bill within 60-120 seconds of signup.

01

Generate a workspace API key

After sign-in, generate a new key in Settings → API keys. Keys are shown only once, so copy yours immediately. Then continue with a ready-made payload and send your first ingest call.

02

Send telemetry

Post LLM request metadata to the ingest endpoint after each model call. Include externalRequestId on every request for idempotency. Treat it like a unique, per-request key you generate on your side.

03

Track budgets

Use the Dashboard to monitor spend, latency, and budget posture. Basic budget alerts are available on Starter plans and above; advanced alerts require Pro and above. Ingest responses include explicit status fields such as reason, telemetryPaused, providerCallsContinue, and isPlanLimitReached, plus legacy budget flags (budgetWarning, budgetExceeded) for backward compatibility. See Limits & budgets and the n8n integration for branching examples.
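A minimal sketch of branching on those response fields. The field names are the ones listed above; the action labels ("pause", "warn", "ok") are our own, not part of the API.

```javascript
// Map an ingest response body to a local action. Checks the explicit status
// fields first, then falls back to the legacy budget flags.
function classifyIngestResponse(body) {
  if (body.telemetryPaused || body.isPlanLimitReached) {
    return { action: "pause", reason: body.reason ?? "plan_limit_reached" };
  }
  if (body.budgetExceeded) return { action: "warn", reason: "budget_exceeded" };
  if (body.budgetWarning) return { action: "warn", reason: "budget_warning" };
  return { action: "ok", reason: null };
}
```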

First-ingest checklist: API key ready, payload sent, response ok=true, row visible in Dashboard.

After first traffic, open Settings → First-ingest checklist to validate endpointTag, promptVersion, identity, unknown models, usage fields, and data-mode alignment.

Must-have fields: externalRequestId, provider/model, token usage source. If plan limit is reached, ingest returns 402 with reason=plan_limit_reached and telemetryPaused=true (telemetry pause only).

Pro and above plans include a Tracking quality score for ongoing telemetry health.

Need copy-paste snippets? Open Integration examples (docs) or use the GitHub examples repo and send the sample payload first.

Plans & limits

Plan limits apply to telemetry ingest only. If the request limit is reached, telemetry is paused and provider calls continue.

Plan        Requests / mo   Alerts               Export        Filters / KPI
Free        10k             No alert delivery    None          Basic only
Starter     100k            Basic email alerts   CSV           No advanced filters / no prompt KPI
Pro         500k            Email + webhook      CSV + JSON    Advanced filters + prompt KPI
Team        2M              Email + webhook      CSV + JSON    Advanced + multi-workspace + RBAC
Enterprise  Custom          Custom policy        Custom        Custom governance controls

Feature highlights by plan: Starter adds Investigate Spike, Alerts Inbox, and savings opportunities. Pro adds Prompt Impact compare, webhook delivery, and JSON export. Team adds feature analysis, tenant profitability, custom date-range board pack export, and weekly executive report.

Prod-ready quickstart

Move from hello-world telemetry to production-grade instrumentation with clear guardrails.

01

externalRequestId playbook

Generate one ID per LLM call and reuse it on retries. Store it in request context (middleware/local variable) so the same ID flows through every retry.

// Reuse an upstream ID when one exists; otherwise generate exactly once per LLM call.
const ctx = { externalRequestId: existingId ?? crypto.randomUUID() };

async function callWithRetry() {
  for (let attempt = 0; attempt < 3; attempt++) {
    try {
      return await llmCall(); // every attempt shares ctx.externalRequestId
    } catch (err) {
      if (attempt === 2) throw err; // retries exhausted
    }
  }
}
// telemetry uses the same externalRequestId from ctx
Do
  • Generate once per LLM call, reuse on retry.
  • Pass the same ID through all telemetry fields.
  • Store it in request context for downstream access.
Don't
  • Create a new ID on every retry.
  • Use timestamps alone as the ID.
  • Recompute the ID in each layer.
02

endpointTag examples

Tag = product feature, not endpoint path. Example tags: checkout.ai_summary, support.reply, invoice.extract.

03

promptVersion strategy

Treat prompt versions like deploy labels: summarizer_v3, chat-v5.2. Rule: new deploy = new version.

04

userId optional + PII warning

userId is optional. If omitted, requests group into unknown. Never send PII; hash identifiers if you need stable user grouping.

05

Latency measurement

Capture a timestamp before the LLM call and compute latency after it finishes.

const start = Date.now();
const result = await llmCall(); // the model call being measured
const latencyMs = Date.now() - start;
06

Token sources

  • Provider response usage fields.
  • Approximate with a tokenizer library.
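When the provider response has no usage fields and a tokenizer library is not worth the dependency, a rough character-count heuristic can stand in. The ~4 characters per token figure is a common approximation for English text, not an exact count.

```javascript
// Fallback token estimate: ~4 characters per token for English-like text.
// Use real tokenizer output or provider usage fields whenever available.
function approximateTokens(text) {
  return Math.ceil(text.length / 4);
}
```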
07

Telemetry should never break prod (Golden Rule)

  • Non-blocking calls only.
  • Timeouts: 300-800ms.
  • Try/catch and swallow telemetry errors.
  • Fire-and-forget ingestion.
  • On 429, show/log: Telemetry throttled, retry after Xs.
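The rules above can be sketched in one helper. The ingest URL is a placeholder, and reading the throttle delay from a Retry-After header is an assumption to verify against the API docs.

```javascript
// Format the throttle log line described above.
function throttleMessage(retryAfterSeconds) {
  return `Telemetry throttled, retry after ${retryAfterSeconds}s`;
}

// Non-blocking, short timeout, all errors swallowed: telemetry never breaks prod.
function fireAndForgetTelemetry(payload, apiKey) {
  fetch("https://api.opsmeter.io/v1/ingest", { // hypothetical URL
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
    body: JSON.stringify(payload),
    signal: AbortSignal.timeout(500), // inside the 300-800ms window
  })
    .then((res) => {
      if (res.status === 429) {
        // Retry-After header is an assumed source for the delay value
        console.warn(throttleMessage(res.headers.get("Retry-After") ?? "?"));
      }
    })
    .catch(() => {}); // swallow telemetry errors
}
```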
08

402 Telemetry Paused handling

402 = telemetry pause. LLM calls continue. Do not retry telemetry immediately. Pause ingestion for X minutes (for example 10-15), show a UI banner, and surface an upgrade CTA.
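A minimal sketch of that pause state, assuming a 10-minute window (anywhere in the 10-15 minute range above works) and an in-process flag; multi-instance deployments would need shared state.

```javascript
const PAUSE_MS = 10 * 60 * 1000; // 10 minutes, per the guidance above
let pausedUntil = 0;

// Call with the ingest response status; 402 pauses telemetry only.
// LLM calls themselves keep running.
function handleIngestStatus(status, now = Date.now()) {
  if (status === 402) pausedUntil = now + PAUSE_MS;
}

// Gate each telemetry send on this check instead of retrying immediately.
function telemetryAllowed(now = Date.now()) {
  return now >= pausedUntil;
}
```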

09

Batching / queue (high volume)

For high throughput, queue and batch telemetry. Opsmeter.io ingestion can be async and should not sit on the request path.
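A minimal in-memory batcher along those lines. The flush size is illustrative, and whether the ingest API accepts array payloads in one request is an assumption to confirm; a production version would also flush on a timer and on shutdown.

```javascript
// Buffer telemetry events and hand off full batches to an async sender.
// The sender is fire-and-forget: its errors are swallowed.
class TelemetryQueue {
  constructor(flushSize = 50, sender = async (batch) => {}) {
    this.buffer = [];
    this.flushSize = flushSize;
    this.sender = sender;
  }

  push(event) {
    this.buffer.push(event);
    if (this.buffer.length >= this.flushSize) this.flush();
  }

  flush() {
    if (this.buffer.length === 0) return;
    const batch = this.buffer.splice(0); // drain the buffer atomically
    this.sender(batch).catch(() => {}); // never surface telemetry errors
  }
}
```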

SDK packages

Official packages

SDK wrappers are optional. Node and Python packages are available today. The .NET package is not published yet.

npm

Node.js SDK with no-proxy telemetry capture.

Python

FastAPI middleware and typed client for ingestion.

NuGet

Opsmeter.io .NET SDK with automatic telemetry capture.

Coming soon