Opsmeter
AI Cost & Inference Control

Architecture

Provider routing for cost: when gateway mode makes sense

No-proxy adoption is faster, but some teams need routing control. Choose the architecture based on who owns operations and how much failure you can tolerate in the request path.

Architecture, Comparisons

Full guide: LLM cost attribution: endpoint, prompt version, tenant, and user

No-proxy first path

  • Fast onboarding and low integration risk.
  • Strong attribution and budget governance workflows.
  • No request-path dependency on a new proxy tier.

What gateway mode gives you (when you truly need it)

  • Centralized routing between providers/models at runtime.
  • Hard enforcement (block or degrade) outside app code paths.
  • Central key management and request policy controls.
  • Uniform logging and request shaping for multiple apps.

When gateway mode is justified

  • Runtime routing or hard enforcement is mandatory.
  • Multi-provider failover must be centralized.
  • You can support added infra and operational complexity.

Decision matrix (quick questions that usually decide it)

  1. Do you need hard enforcement outside app code paths (hard caps, blocklists)?
  2. Do you need centralized multi-provider failover across multiple apps?
  3. Can you tolerate a new request-path dependency and operate it 24/7?
  4. Do you already have stable endpointTag/promptVersion/externalRequestId tagging?
  5. Is your top pain attribution + governance (start no-proxy) or runtime routing (gateway)?
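
The five questions above can be sketched as a small decision helper. This is a minimal sketch, not an Opsmeter feature: the `Answers` type, the `recommend` function, and the order in which the answers are weighed are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Answers:
    # Hypothetical field names, one per question in the matrix.
    needs_hard_enforcement: bool    # question 1
    needs_central_failover: bool    # question 2
    can_operate_proxy_24x7: bool    # question 3
    has_stable_tagging: bool        # question 4
    top_pain_is_routing: bool       # question 5 (False = attribution + governance)


def recommend(a: Answers) -> str:
    """Return a starting architecture; the precedence here is an assumption."""
    wants_gateway = (
        a.needs_hard_enforcement
        or a.needs_central_failover
        or a.top_pain_is_routing
    )
    if not wants_gateway:
        return "no-proxy"
    if not a.has_stable_tagging:
        # Routing without identifiers is hard to trace, so tag first.
        return "no-proxy first, then gateway"
    if not a.can_operate_proxy_24x7:
        return "no-proxy (gateway not yet operable)"
    return "gateway"
```

For example, a team that needs centralized failover but has no stable tagging yet would get "no-proxy first, then gateway", which matches the migration path described later.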

Gateway failure modes (the hidden cost)

  • New latency and availability dependency in the request path.
  • Config drift: routing changes without promptVersion tagging become untraceable.
  • Retry storms can amplify cost if both proxy and app retry.
  • Complex incident response (is it the provider, proxy, or app?).
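
To see why stacked retries matter, here is a rough worst-case calculation. It assumes each app-level attempt triggers a full, independent gateway retry sequence and that every attempt fails until the last one; the function names are hypothetical. Whether failed attempts are billed depends on the provider and the failure type, so the figures are tokens sent, not an invoice.

```python
def worst_case_attempts(app_retries: int, gateway_retries: int) -> int:
    """Upper bound on provider calls for one logical request.

    Each layer makes 1 initial attempt plus its retries, and the
    layers multiply because every app attempt restarts the gateway's
    retry sequence (an assumption about how the layers are stacked).
    """
    return (1 + app_retries) * (1 + gateway_retries)


def worst_case_input_tokens(input_tokens: int, app_retries: int,
                            gateway_retries: int) -> int:
    """Input tokens sent to the provider in the worst case."""
    return input_tokens * worst_case_attempts(app_retries, gateway_retries)


# Two retries at each layer: 3 * 3 = 9 provider calls for one request.
print(worst_case_attempts(2, 2))           # 9
print(worst_case_input_tokens(540, 2, 2))  # 4860
```

Retrying at only one layer keeps the bound linear: with two app retries and no gateway retries, the worst case is 3 calls instead of 9.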

Migration path (reduce risk while you add routing)

You can adopt governance first and routing later. Start with no-proxy attribution so you can see where cost concentrates and which endpoints need controls.

If you later introduce a gateway, keep identifiers consistent so dashboards remain comparable.

  • Start no-proxy: instrument endpointTag + promptVersion and add budgets/alerts.
  • Prove guardrails: implement caps and degraded modes in app logic for the top cost drivers.
  • Add gateway only for endpoints that truly need centralized routing/enforcement.
  • Keep externalRequestId stable across layers to avoid double-counting retries.
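
A minimal sketch of the last point, assuming the app owns retries: the externalRequestId is generated once per logical request and reused on every attempt. The `call_with_retries` helper, the `send` callback, and the `attempt` counter are hypothetical; `attempt` is local bookkeeping, not a documented payload field.

```python
import uuid
from typing import Callable


def call_with_retries(send: Callable[[str, int], bool],
                      max_attempts: int = 3) -> list[dict]:
    """Call `send` until it succeeds, reusing one externalRequestId.

    `send(external_request_id, attempt)` is a hypothetical callback that
    performs the provider call and returns True on success.
    """
    # Generated once, outside the retry loop, so all attempts share it.
    external_request_id = f"req_{uuid.uuid4().hex}"
    events = []
    for attempt in range(1, max_attempts + 1):
        ok = send(external_request_id, attempt)
        events.append({
            "externalRequestId": external_request_id,
            "attempt": attempt,  # illustrative only
            "status": "success" if ok else "error",
        })
        if ok:
            break
    return events
```

If a gateway is added later, forwarding the same identifier through it (rather than minting a new one per hop) is what keeps the attempts grouped under one request.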

Architecture choice by operational ownership

  • If you need attribution + budgets fast, start no-proxy and add routing later.
  • If you need runtime enforcement, accept gateway failure modes upfront.
  • Keep externalRequestId stable so retries remain attributable across layers.
  • Separate dataMode and environment so routing experiments do not pollute production reporting.
  • Define one owner for routing policy, budgets, and incident response.

What to send (payload example)

{
  "externalRequestId": "req_01HZXB6MQZ2WQ9D2KCF9M4V2QY",
  "provider": "provider_id",
  "model": "model_id",
  "endpointTag": "checkout.ai_summary",
  "promptVersion": "summary_v3",
  "userId": "tenant_acme_hash",
  "inputTokens": 540,
  "outputTokens": 180,
  "latencyMs": 892,
  "status": "success",
  "dataMode": "real",
  "environment": "prod"
}
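
One way to keep this payload consistent across services is to build it through a single helper that rejects incomplete events. This is a sketch under assumptions: `build_payload` is a hypothetical function, and treating every field in the example as required is a choice made here, not a documented rule. How the payload is transmitted is left out on purpose.

```python
import json

# Field names taken from the payload example; "required" is an assumption.
REQUIRED_FIELDS = (
    "externalRequestId", "provider", "model", "endpointTag",
    "promptVersion", "userId", "inputTokens", "outputTokens",
    "latencyMs", "status", "dataMode", "environment",
)
COUNTER_FIELDS = ("inputTokens", "outputTokens", "latencyMs")


def build_payload(**fields) -> str:
    """Validate the event and return it as a JSON string."""
    missing = [name for name in REQUIRED_FIELDS if name not in fields]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    for name in COUNTER_FIELDS:
        value = fields[name]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError(f"{name} must be a non-negative integer")
    return json.dumps(fields)
```

Failing fast on a missing endpointTag or promptVersion is cheaper than discovering untagged spend in a dashboard later.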

Common mistakes

  • Choosing gateway mode for “future flexibility” before governance identifiers exist.
  • Adding routing without promptVersion tagging (config drift becomes invisible).
  • Letting both the gateway and the app retry aggressively (retry storms multiply cost).
  • Optimizing for token price while worse latency or retry behavior raises total cost.
  • Running routing experiments in production without separating dataMode/environment.

How to verify in Opsmeter Dashboard

  1. Validate attribution first: Top Endpoints and Prompt Versions show the same dominant drivers week to week.
  2. If routing is introduced, confirm model mix changes are visible and tagged by promptVersion.
  3. Compare cost/request and retry ratios before vs after routing changes on the same endpointTag.
  4. Ensure externalRequestId stays stable so retries are not double-counted.
  5. Separate routing experiments from production baselines with dataMode/environment.
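
If you export raw events, steps 3 to 5 can be cross-checked offline with a short script. This is a sketch, not how the dashboard computes its numbers: `summarize` is a hypothetical helper, the per-token prices are placeholders you supply, and it assumes every attempt (including retries) is reported as its own event with a shared externalRequestId.

```python
def summarize(events, endpoint_tag, price_in, price_out):
    """Cost per request and retry ratio for one endpointTag.

    Only production baselines are counted (dataMode "real", environment
    "prod", as in the payload example). Prices are per token, in whatever
    unit you choose; they are inputs, not real provider prices.
    """
    rows = [
        e for e in events
        if e["endpointTag"] == endpoint_tag
        and e["dataMode"] == "real"
        and e["environment"] == "prod"
    ]
    # Attempts that share an externalRequestId count as one request.
    request_ids = {e["externalRequestId"] for e in rows}
    if not request_ids:
        return None
    cost = sum(
        e["inputTokens"] * price_in + e["outputTokens"] * price_out
        for e in rows
    )
    requests = len(request_ids)
    return {
        "requests": requests,
        "attempts": len(rows),
        "retry_ratio": (len(rows) - requests) / requests,
        "cost_per_request": cost / requests,
    }
```

Run it once on events from before the routing change and once on events from after, with the same endpointTag and prices. A lower token price combined with a higher retry ratio can still produce a higher cost per request.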

Related guides

  • Open proxy tradeoff pillar
  • Compare alternatives

Evaluation resources

For security and procurement reviews, use our trust summary before final tool selection.

Open trust proof pack