Provider routing for cost: when gateway mode makes sense

No-proxy adoption is faster, but some teams need routing control. Choose the architecture based on operational ownership and failure tolerance.

Published: 2026-02-24
Updated: 2026-02-26

Full guide: LLM cost attribution: endpoint, prompt version, tenant, and user

What this guide answers

What changed in cost, cost per request, or budget posture.
Which endpoint, prompt, model, or tenant likely drove the delta.
Which validation step or control to apply next in Opsmeter.io.

What to send (payload example)

{
  "externalRequestId": "req_01HZXB6MQZ2WQ9D2KCF9M4V2QY",
  "provider": "provider_id",
  "model": "model_id",
  "endpointTag": "checkout.ai_summary",
  "promptVersion": "summary_v3",
  "userId": "tenant_acme_hash",
  "inputTokens": 540,
  "outputTokens": 180,
  "latencyMs": 892,
  "status": "success",
  "dataMode": "real",
  "environment": "prod"
}
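A minimal sketch of assembling this payload in application code, assuming Python. The helper name `build_payload` and its default values are hypothetical. How the payload is transmitted (endpoint, authentication) is deliberately left out because this guide does not specify it.

```python
import json
import uuid


def build_payload(endpoint_tag, prompt_version, user_id,
                  input_tokens, output_tokens, latency_ms,
                  status="success", external_request_id=None,
                  provider="provider_id", model="model_id",
                  data_mode="real", environment="prod"):
    """Assemble a telemetry payload with the fields shown above (hypothetical helper)."""
    return {
        # Reuse the caller's ID when one exists so retries share one identifier.
        "externalRequestId": external_request_id or f"req_{uuid.uuid4().hex}",
        "provider": provider,
        "model": model,
        "endpointTag": endpoint_tag,
        "promptVersion": prompt_version,
        "userId": user_id,
        "inputTokens": input_tokens,
        "outputTokens": output_tokens,
        "latencyMs": latency_ms,
        "status": status,
        "dataMode": data_mode,
        "environment": environment,
    }


payload = build_payload(
    endpoint_tag="checkout.ai_summary",
    prompt_version="summary_v3",
    user_id="tenant_acme_hash",
    input_tokens=540,
    output_tokens=180,
    latency_ms=892,
    external_request_id="req_01HZXB6MQZ2WQ9D2KCF9M4V2QY",
)
print(json.dumps(payload, indent=2))
```

Keeping payload construction in one helper makes it harder for individual call sites to omit endpointTag or promptVersion.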

Common mistakes

Choosing gateway mode for “future flexibility” before governance identifiers exist.
Adding routing without promptVersion tagging (config drift becomes invisible).
Letting both the gateway and the app retry aggressively (retry storms multiply cost).
Optimizing for token price while worse latency or retry behavior raises total cost.
Routing experiments in production without separating dataMode/environment.

How to verify in the Opsmeter.io dashboard

Validate attribution first: confirm that Top Endpoints and Prompt Versions show the same dominant drivers week to week.
If routing is introduced, confirm model mix changes are visible and tagged by promptVersion.
Compare cost/request and retry ratios before vs after routing changes on the same endpointTag.
Ensure externalRequestId stays stable so retries are not double-counted.
Separate routing experiments from production baselines with dataMode/environment.
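The same checks can be sketched offline against exported records. This is an illustrative Python example, not the dashboard's own calculation. It assumes one record per attempt, that retries of the same request share an externalRequestId, and that each record carries a hypothetical `costUsd` field (the payload above does not include cost).

```python
def summarize(records, endpoint_tag, environment="prod", data_mode="real"):
    """Cost per request and retry ratio for one endpointTag.

    Requests are counted by distinct externalRequestId, so retries add
    cost and attempts but do not inflate the request count.
    """
    attempts = 0
    total_cost = 0.0
    request_ids = set()
    for r in records:
        # Skip other endpoints and anything outside the production baseline.
        if r["endpointTag"] != endpoint_tag:
            continue
        if r["environment"] != environment or r["dataMode"] != data_mode:
            continue
        attempts += 1
        total_cost += r["costUsd"]  # hypothetical field
        request_ids.add(r["externalRequestId"])
    requests = len(request_ids)
    if requests == 0:
        return {"requests": 0, "costPerRequest": 0.0, "retryRatio": 0.0}
    return {
        "requests": requests,
        "costPerRequest": total_cost / requests,
        "retryRatio": (attempts - requests) / requests,
    }


def rec(request_id, cost, data_mode="real"):
    """Build a sample record (made-up values for illustration)."""
    return {"externalRequestId": request_id, "endpointTag": "checkout.ai_summary",
            "environment": "prod", "dataMode": data_mode, "costUsd": cost}


# Before routing: two requests, no retries, higher per-attempt cost.
before = [rec("req_a", 0.010), rec("req_b", 0.010)]
# After routing: cheaper attempts, but req_c was retried once.
# The "test" record is an experiment and is excluded from the baseline.
after = [rec("req_c", 0.008), rec("req_c", 0.008), rec("req_d", 0.008),
         rec("req_e", 0.500, data_mode="test")]

before_stats = summarize(before, "checkout.ai_summary")
after_stats = summarize(after, "checkout.ai_summary")
print(before_stats)
print(after_stats)
```

In this made-up sample the per-attempt price drops after routing, yet cost per request rises because of the retry.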

No-proxy first path

Fast onboarding and low integration risk.
Strong attribution and budget governance workflows.
No request-path dependency on a new proxy tier.

Use this workflow

Turn diagnosis into action

Identify the cost driver, validate it with attribution, then apply one durable control before the next billing cycle.

What gateway mode gives you (when you truly need it)

Centralized routing between providers/models at runtime.
Hard enforcement (block or degrade) outside app code paths.
Central key management and request policy controls.
Uniform logging and request shaping for multiple apps.

When gateway mode is justified

Runtime routing or hard enforcement is mandatory.
Multi-provider failover must be centralized.
You can support added infra and operational complexity.

Decision matrix (quick questions that usually decide it)

Do you need hard enforcement outside app code paths (hard caps, blocklists)?
Do you need centralized multi-provider failover across multiple apps?
Can you tolerate a new request-path dependency and operate it 24/7?
Do you already have stable endpointTag/promptVersion/externalRequestId tagging?
Is your top pain attribution + governance (start no-proxy) or runtime routing (gateway)?
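One way to read these questions as a rule of thumb is sketched below in Python. The function name, parameters, and the exact ordering of the checks are assumptions made for illustration. They follow the guide's position that gateway mode needs a real runtime requirement, the capacity to operate it, and stable tagging already in place.

```python
def recommend_mode(needs_hard_enforcement, needs_central_failover,
                   can_operate_proxy, has_stable_tagging):
    """Return "gateway" or "no-proxy" from the quick questions (hypothetical rule)."""
    runtime_need = needs_hard_enforcement or needs_central_failover
    if not runtime_need:
        # The top pain is attribution and governance: start no-proxy.
        return "no-proxy"
    if not has_stable_tagging:
        # Routing without identifiers makes config drift invisible.
        return "no-proxy"
    if not can_operate_proxy:
        # A new request-path dependency needs an owner who can run it 24/7.
        return "no-proxy"
    return "gateway"


print(recommend_mode(needs_hard_enforcement=True, needs_central_failover=False,
                     can_operate_proxy=True, has_stable_tagging=False))
```

A "no-proxy" result here means "not yet": the team can revisit the decision once tagging and operational ownership are in place.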

Gateway failure modes (the hidden cost)

New latency and availability dependency in the request path.
Config drift: routing changes without promptVersion tagging become untraceable.
Retry storms can amplify cost if both proxy and app retry.
Complex incident response (is it the provider, proxy, or app?).
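The retry amplification is simple arithmetic, sketched here in Python. It assumes the worst case, in which every attempt fails until the last one, and it assumes each layer retries independently. The function name is hypothetical.

```python
def max_provider_attempts(app_retries, gateway_retries):
    """Worst-case provider calls for one logical request.

    The app makes up to (1 + app_retries) calls to the gateway, and each
    of those can fan out into (1 + gateway_retries) provider calls.
    """
    return (1 + app_retries) * (1 + gateway_retries)


print(max_provider_attempts(3, 0))  # app retries only: 4
print(max_provider_attempts(3, 3))  # both layers retry: 16
```

If failed attempts still consume tokens, the worst-case cost of a single request scales with the same multiplier, which is why retry policy usually belongs in only one layer.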

Migration path (reduce risk while you add routing)

You can adopt governance first and routing later. Start with no-proxy attribution so you can see where cost concentrates and which endpoints need controls.

If you later introduce a gateway, keep identifiers consistent so dashboards remain comparable.

Start no-proxy: instrument endpointTag + promptVersion and add budgets/alerts.
Prove guardrails: implement caps and degraded modes in app logic for the top cost drivers.
Add gateway only for endpoints that truly need centralized routing/enforcement.
Keep externalRequestId stable across layers to avoid double-counting retries.
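The last step can be sketched as follows, assuming Python. The identifier is generated once, outside the retry loop, and reused for every attempt. The `send` callable, the `attempt` field, and the `"error"` status value are hypothetical stand-ins for illustration.

```python
import uuid


def call_with_retries(send, max_attempts=3):
    """Run send() with retries while keeping one externalRequestId.

    Returns one telemetry record per attempt; all records share the same ID.
    """
    request_id = f"req_{uuid.uuid4().hex}"  # generated once, not per attempt
    records = []
    for attempt in range(1, max_attempts + 1):
        ok = send(request_id, attempt)
        records.append({
            "externalRequestId": request_id,
            "attempt": attempt,  # hypothetical field
            "status": "success" if ok else "error",  # "error" is an assumed value
        })
        if ok:
            break
    return records


def flaky_send(request_id, attempt):
    """Stand-in for a provider or gateway call: fails once, then succeeds."""
    return attempt >= 2


records = call_with_retries(flaky_send)
for record in records:
    print(record)
```

If a gateway sits between the app and the provider, forwarding the same identifier through it keeps both layers' attempts attributable to a single request.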

Architecture choice by operational ownership

If you need attribution + budgets fast, start no-proxy and add routing later.
If you need runtime enforcement, accept gateway failure modes upfront.
Keep externalRequestId stable so retries remain attributable across layers.
Separate dataMode and environment so routing experiments do not pollute production reporting.
Define one owner for routing policy, budgets, and incident response.

Evaluation resources

For security and procurement reviews, use our trust summary before final tool selection.