Architecture
Provider routing for cost: when gateway mode makes sense
No-proxy adoption is faster, but some teams need routing control. Choose the architecture based on who will own it and how much request-path failure you can tolerate.
Full guide: LLM cost attribution: endpoint, prompt version, tenant, and user
No-proxy first path
- Fast onboarding and low integration risk.
- Strong attribution and budget governance workflows.
- No request-path dependency on a new proxy tier.
What gateway mode gives you (when you truly need it)
- Centralized routing between providers/models at runtime.
- Hard enforcement (block or degrade) outside app code paths.
- Central key management and request policy controls.
- Uniform logging and request shaping for multiple apps.
When gateway mode is justified
- Runtime routing or hard enforcement is mandatory.
- Multi-provider failover must be centralized.
- You can support added infra and operational complexity.
Decision matrix (quick questions that usually decide it)
- Do you need hard enforcement outside app code paths (hard caps, blocklists)?
- Do you need centralized multi-provider failover across multiple apps?
- Can you tolerate a new request-path dependency and operate it 24/7?
- Do you already have stable endpointTag/promptVersion/externalRequestId tagging?
- Is your top pain attribution + governance (start no-proxy) or runtime routing (gateway)?
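The questions above can be read as a small decision rule. The sketch below is one possible encoding, not an official rubric: the `Answers` fields and the `recommend` function are hypothetical names, and the ordering of the checks is an assumption drawn from the lists in this guide.

```python
from dataclasses import dataclass


@dataclass
class Answers:
    """Hypothetical answers to the quick questions above."""
    needs_hard_enforcement: bool   # hard caps or blocklists outside app code
    needs_central_failover: bool   # multi-provider failover shared by several apps
    can_operate_proxy_24x7: bool   # team can own a new request-path dependency
    has_stable_tagging: bool       # endpointTag/promptVersion/externalRequestId in place


def recommend(a: Answers) -> str:
    """Return "gateway" or "no-proxy" under the simplified rules below."""
    needs_runtime_control = a.needs_hard_enforcement or a.needs_central_failover
    if not needs_runtime_control:
        # Attribution and governance are the main pain: no proxy required.
        return "no-proxy"
    if not a.has_stable_tagging:
        # Routing without identifiers makes config drift untraceable.
        return "no-proxy"
    if not a.can_operate_proxy_24x7:
        # Nobody to own the new dependency yet.
        return "no-proxy"
    return "gateway"
```

In this reading, "no-proxy" also covers the "not yet" cases: a team that wants runtime control but lacks tagging or on-call capacity starts no-proxy and revisits the decision later.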
Gateway failure modes (the hidden cost)
- New latency and availability dependency in the request path.
- Config drift: routing changes without promptVersion tagging become untraceable.
- Retry storms can amplify cost if both proxy and app retry.
- Complex incident response (is it the provider, proxy, or app?).
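To make the retry-storm point concrete: when the app and the proxy both retry, the attempts nest, so the worst case multiplies rather than adds. A minimal sketch of that arithmetic follows; the function name `worst_case_attempts` is illustrative, and it assumes every attempt fails until the last one.

```python
def worst_case_attempts(app_retries: int, proxy_retries: int) -> int:
    """Upper bound on provider calls for one logical request when retries nest.

    Each of the (1 + app_retries) app attempts can trigger a full set of
    (1 + proxy_retries) proxy attempts against the provider.
    """
    if app_retries < 0 or proxy_retries < 0:
        raise ValueError("retry counts must be non-negative")
    return (1 + app_retries) * (1 + proxy_retries)


# Two retries at each layer: up to 9 billed provider calls per logical request.
print(worst_case_attempts(app_retries=2, proxy_retries=2))
```

Retrying at only one layer keeps the bound linear: three app retries with no proxy retries gives at most 4 calls.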
Migration path (reduce risk while you add routing)
You can adopt governance first and routing later. Start with no-proxy attribution so you can see where cost concentrates and which endpoints need controls.
If you later introduce a gateway, keep identifiers consistent so dashboards remain comparable.
- Start no-proxy: instrument endpointTag + promptVersion and add budgets/alerts.
- Prove guardrails: caps and degraded modes in app logic on the top drivers.
- Add gateway only for endpoints that truly need centralized routing/enforcement.
- Keep externalRequestId stable across layers to avoid double-counting retries.
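The "prove guardrails" step can be prototyped in app logic before any gateway exists. The sketch below is a hypothetical illustration: the `Budget` class, the `choose_mode` function, and the 80% degrade threshold are all assumptions, and a real implementation would read spend from your own budget store.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """Hypothetical per-endpoint daily budget tracked by the app."""
    daily_cap_usd: float
    spent_usd: float = 0.0


def choose_mode(budget: Budget, estimated_cost_usd: float,
                degrade_at: float = 0.8) -> str:
    """Return "normal", "degraded", or "blocked" for the next request."""
    projected = budget.spent_usd + estimated_cost_usd
    if projected > budget.daily_cap_usd:
        # Hard cap: the request would push spend past the budget.
        return "blocked"
    if projected >= degrade_at * budget.daily_cap_usd:
        # Soft threshold: e.g. use a cheaper model or a shorter prompt.
        return "degraded"
    return "normal"
```

What "degraded" means is up to the endpoint owner. Once these rules hold up on the top cost drivers, the same thresholds can move into a gateway if central enforcement becomes necessary.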
Architecture choice by operational ownership
- If you need attribution + budgets fast, start no-proxy and add routing later.
- If you need runtime enforcement, accept gateway failure modes upfront.
- Keep externalRequestId stable so retries remain attributable across layers.
- Separate dataMode and environment so routing experiments do not pollute production reporting.
- Define one owner for routing policy, budgets, and incident response.
What to send (payload example)
{
"externalRequestId": "req_01HZXB6MQZ2WQ9D2KCF9M4V2QY",
"provider": "provider_id",
"model": "model_id",
"endpointTag": "checkout.ai_summary",
"promptVersion": "summary_v3",
"userId": "tenant_acme_hash",
"inputTokens": 540,
"outputTokens": 180,
"latencyMs": 892,
"status": "success",
"dataMode": "real",
"environment": "prod"
}
Common mistakes
- Choosing gateway mode for “future flexibility” before governance identifiers exist.
- Adding routing without promptVersion tagging (config drift becomes invisible).
- Letting both the gateway and the app retry aggressively (retry storms multiply cost).
- Optimizing for token price while worse latency or retry behavior raises total cost.
- Routing experiments in production without separating dataMode/environment.
How to verify in Opsmeter Dashboard
- Validate attribution first: Top Endpoints and Prompt Versions show the same dominant drivers week to week.
- If routing is introduced, confirm model mix changes are visible and tagged by promptVersion.
- Compare cost/request and retry ratios before vs after routing changes on the same endpointTag.
- Ensure externalRequestId stays stable so retries are not double-counted.
- Separate routing experiments from production baselines with dataMode/environment.
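The same checks can be reproduced offline on exported events as a sanity test. The sketch below assumes events shaped like the payload example in this guide; the `summarize` function and its retry-ratio definition (extra attempts per logical request) are illustrative assumptions, not a description of how the dashboard computes its figures.

```python
from collections import defaultdict


def summarize(events, endpoint_tag, data_mode="real", environment="prod"):
    """Group attempts by externalRequestId for one endpointTag.

    Attempts that share an externalRequestId count as one logical request,
    so retries are not double-counted as separate requests.
    """
    attempts = defaultdict(list)
    for e in events:
        if (e["endpointTag"] == endpoint_tag
                and e["dataMode"] == data_mode
                and e["environment"] == environment):
            attempts[e["externalRequestId"]].append(e)

    requests = len(attempts)
    if requests == 0:
        return {"requests": 0, "attempts": 0,
                "retry_ratio": 0.0, "tokens_per_request": 0.0}

    total_attempts = sum(len(group) for group in attempts.values())
    total_tokens = sum(e["inputTokens"] + e["outputTokens"]
                       for group in attempts.values() for e in group)
    return {
        "requests": requests,
        "attempts": total_attempts,
        # Extra attempts per logical request (assumed definition).
        "retry_ratio": (total_attempts - requests) / requests,
        # Tokens include retried attempts, since those are billed too.
        "tokens_per_request": total_tokens / requests,
    }
```

Running it on the same endpointTag for a window before and a window after a routing change gives a like-for-like comparison, and the dataMode/environment filter keeps experiment traffic out of the production baseline.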
Evaluation resources
For security and procurement reviews, use our trust summary before final tool selection.