Multi-provider strategy: cost, latency, and reliability tradeoffs
Multiple providers can improve resilience, but only when telemetry, attribution, and policy ownership stay consistent.
Full guide: LLM cost attribution: endpoint, prompt version, tenant, and user
Where multi-provider helps
- Regional reliability needs across customer segments.
- Price-performance variance by workload type.
- Vendor concentration risk mitigation.
What breaks if governance is weak
- Inconsistent model naming across providers.
- Missing endpointTag or promptVersion mapping.
- No shared ownership model for budgets and alerts.
Minimum governance to keep costs comparable
- Normalize provider/model identifiers into one catalog view.
- Keep the same endpointTag taxonomy across providers.
- Tag promptVersion at deploy time so changes are attributable.
- Define one budget owner and escalation path per workspace.
- Reconcile monthly against provider exports to catch drift.
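A minimal sketch of the first rule above, normalizing provider/model identifiers into one catalog view. The provider names, model strings, and prices are hypothetical placeholders, not real pricing; the point is only that every raw identifier resolves to one canonical entry or is explicitly reported as unknown.

```python
from typing import NamedTuple, Optional


class CatalogEntry(NamedTuple):
    canonical_model: str
    input_price_per_1k: float   # illustrative USD price per 1,000 input tokens
    output_price_per_1k: float  # illustrative USD price per 1,000 output tokens


# Hypothetical catalog: (provider, raw model string) -> one canonical entry.
CATALOG = {
    ("provider_a", "small-2024-01"): CatalogEntry("small", 0.0005, 0.0015),
    ("provider_a", "small-latest"): CatalogEntry("small", 0.0005, 0.0015),
    ("provider_b", "b-large"): CatalogEntry("large", 0.003, 0.015),
}


def normalize(provider: str, model: str) -> Optional[CatalogEntry]:
    """Return the catalog entry, or None so the caller can count an unknown-model row."""
    return CATALOG.get((provider.strip().lower(), model.strip().lower()))


def cost_usd(entry: CatalogEntry, input_tokens: int, output_tokens: int) -> float:
    """Price a request from the canonical entry, so costs are comparable across providers."""
    return (
        (input_tokens / 1000) * entry.input_price_per_1k
        + (output_tokens / 1000) * entry.output_price_per_1k
    )
```

Returning None instead of guessing a price keeps unknown models visible, which is what a pricing-coverage metric needs.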
Routing policy patterns (simple beats clever)
Multi-provider routing becomes expensive when policy logic is unclear. Keep routing rules explainable so incidents can be debugged quickly.
Use endpointTag as the policy unit. Route different feature paths differently based on risk and business value.
- Primary/secondary: one default provider, one fallback provider for reliability incidents.
- Tiering: cheaper providers/models for low-risk endpoints, higher quality for high-stakes endpoints.
- Geo routing: keep data residency and latency constraints explicit.
- Tenant routing: premium tenants can receive higher quality routes (if pricing supports it).
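The primary/secondary pattern above, keyed by endpointTag, might look like the following sketch. The route table, the `support.autotag` tag, and the `provider/model` target strings are assumptions for illustration. The function returns a reason string alongside the target so each routing decision can be logged and explained during an incident.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Route:
    primary: str
    fallback: str


# Hypothetical policy table: one explicit route per endpointTag.
ROUTES = {
    "checkout.ai_summary": Route(primary="provider_a/large", fallback="provider_b/large"),
    "support.autotag": Route(primary="provider_b/small", fallback="provider_a/small"),
}

# Untagged or unknown endpoints get the cheapest route rather than failing.
DEFAULT_ROUTE = Route(primary="provider_a/small", fallback="provider_b/small")


def choose_target(endpoint_tag: str, primary_healthy: bool) -> Tuple[str, str]:
    """Return (target, reason); the reason is meant to be logged with the request."""
    route = ROUTES.get(endpoint_tag, DEFAULT_ROUTE)
    if primary_healthy:
        return route.primary, "primary"
    return route.fallback, "fallback:primary_unhealthy"
```

A flat lookup table is deliberately simple: anyone debugging a cost spike can read the policy without tracing conditional logic.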
Cost and reliability tradeoffs by workload
- Chatbots: optimize for latency and predictability; cap outputs and control retries.
- RAG: optimize for inputTokens; retrieval configuration often drives cost more than model choice does.
- Agent workflows: optimize for step count and tool output size; loops are the main multiplier.
- Batch jobs: optimize for cost per outcome; hard caps are safer than soft caps.
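To illustrate the batch-job point above, here is a sketch of a hard cap, assuming per-item costs are known or estimated before each call. A soft cap would only raise an alert and keep going; the hard cap below refuses the charge and stops the job. The class and function names are hypothetical.

```python
from typing import Iterable


class BudgetExceeded(Exception):
    """Raised when a charge would push spend past the hard limit."""


class HardCap:
    def __init__(self, limit_usd: float) -> None:
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        # Check before spending, so the limit is never exceeded.
        if self.spent_usd + cost_usd > self.limit_usd:
            raise BudgetExceeded(
                f"charge of {cost_usd} would exceed limit {self.limit_usd}"
            )
        self.spent_usd += cost_usd


def run_batch(item_costs: Iterable[float], cap: HardCap) -> int:
    """Process items until the cap refuses a charge; return how many were processed."""
    processed = 0
    for cost in item_costs:
        try:
            cap.charge(cost)
        except BudgetExceeded:
            break  # stop the whole job instead of continuing over budget
        processed += 1
    return processed
```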
Failure modes to plan for (before you ship routing)
- Provider outage triggers fallbacks and retry storms (cost multiplier).
- Model identifier drift breaks pricing and creates unknown-model rows.
- Different providers return different usage fields (normalization required).
- Routing changes without promptVersion tagging become untraceable.
- Cost spikes can be caused by policy bugs, not only prompt changes.
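The usage-field mismatch listed above can be handled with a small mapping layer. In this sketch the provider names and their raw field names are assumptions; check each provider's response schema before relying on them. The output uses the inputTokens and outputTokens names from the payload in this guide.

```python
from typing import Dict, Mapping, Tuple

# Hypothetical mapping: provider -> (raw input-token field, raw output-token field).
USAGE_FIELDS: Dict[str, Tuple[str, str]] = {
    "provider_a": ("prompt_tokens", "completion_tokens"),
    "provider_b": ("input_tokens", "output_tokens"),
}


def normalize_usage(provider: str, raw_usage: Mapping[str, int]) -> Dict[str, int]:
    """Translate a provider-specific usage object into one shared shape."""
    if provider not in USAGE_FIELDS:
        # Fail loudly: silently dropping usage would hide spend.
        raise ValueError(f"no usage mapping for provider {provider!r}")
    input_key, output_key = USAGE_FIELDS[provider]
    return {
        "inputTokens": int(raw_usage[input_key]),
        "outputTokens": int(raw_usage[output_key]),
    }
```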
Dashboards to keep multi-provider cost under control
- Spend and cost/request by provider and by endpointTag.
- Fallback frequency and retry ratio by provider.
- Unknown-model ratio (pricing coverage).
- promptVersion regressions after routing policy deploys.
- Top tenants affected by routing changes (concentration).
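Three of the dashboard metrics above can be derived from raw request events, as in the following sketch. The `costUsd` and `isFallback` fields are assumptions added for illustration and are not part of the payload shown in this guide; `costUsd` is None when the model could not be priced.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping


def summarize(events: Iterable[Mapping[str, Any]]) -> Dict[str, Any]:
    """Aggregate cost/request, fallback ratio per provider, and unknown-model ratio."""
    spend = defaultdict(float)         # (provider, endpointTag) -> total priced spend
    priced = defaultdict(int)          # (provider, endpointTag) -> priced request count
    per_provider = defaultdict(int)    # provider -> all requests
    fallbacks = defaultdict(int)       # provider -> requests served as a fallback
    unknown = 0
    total = 0
    for event in events:
        total += 1
        provider = event["provider"]
        per_provider[provider] += 1
        if event["isFallback"]:
            fallbacks[provider] += 1
        if event["costUsd"] is None:
            unknown += 1               # unpriced rows count toward coverage only
            continue
        key = (provider, event["endpointTag"])
        spend[key] += event["costUsd"]
        priced[key] += 1
    return {
        "costPerRequest": {key: spend[key] / count for key, count in priced.items()},
        "fallbackRatioByProvider": {
            name: fallbacks[name] / count for name, count in per_provider.items()
        },
        "unknownModelRatio": unknown / total if total else 0.0,
    }
```

Unpriced requests are excluded from cost/request so that missing pricing does not make a provider look cheaper than it is.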
What to send (payload example)
{
  "externalRequestId": "req_01HZXB6MQZ2WQ9D2KCF9M4V2QY",
  "provider": "provider_id",
  "model": "model_id",
  "endpointTag": "checkout.ai_summary",
  "promptVersion": "summary_v3",
  "userId": "tenant_acme_hash",
  "inputTokens": 540,
  "outputTokens": 180,
  "latencyMs": 892,
  "status": "success",
  "dataMode": "real",
  "environment": "prod"
}
Common mistakes
- Missing endpointTag or using inconsistent naming across teams.
- Not tagging promptVersion, so deploys cannot be linked to spend changes.
- Sending raw user identifiers instead of a hashed mapping, which exposes personal data.
- Mixing demo/test dataMode into production operational reviews.
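Most of the mistakes above can be caught before a payload is sent. The sketch below assumes a keyed hash (HMAC-SHA256) for user identifiers and a simple pre-send check; the required-field list and the rule that prod traffic should carry dataMode "real" are assumptions for illustration, not documented Opsmeter behavior.

```python
import hashlib
import hmac
from typing import Any, List, Mapping

# Assumed set of fields that attribution depends on.
REQUIRED_FIELDS = ("endpointTag", "promptVersion", "userId", "dataMode", "environment")


def hash_user_id(raw_user_id: str, secret: bytes) -> str:
    """Keyed hash so the raw identifier never leaves the service."""
    return hmac.new(secret, raw_user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def validate_payload(payload: Mapping[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the payload passed."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if not payload.get(name)]
    # Assumed rule: demo/test data should not be reported from production.
    if payload.get("environment") == "prod" and payload.get("dataMode") not in (None, "real"):
        problems.append("non-real dataMode in prod")
    return problems
```

A keyed hash is used rather than a plain digest because unkeyed hashes of guessable identifiers, such as email addresses, can be reversed by brute force.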
How to verify in Opsmeter Dashboard
- Use Overview to confirm the spike window and budget posture.
- Use Top Endpoints to find feature-level concentration.
- Use Top Users to find tenant-level concentration.
- Use Prompt Versions to validate deploy-linked cost drift.