Per-tenant LLM margin operating model for AI SaaS

Tenant profitability analysis connects request telemetry to finance decisions. This guide standardizes the per-tenant profitability workflow, from alerting through margin review to pricing response.

Published: 2026-02-24 | Updated: 2026-02-26

What this guide answers

What category of cost or governance problem this topic solves.
Which request-level signals matter most when diagnosing it.
Which follow-up guide or control workflow to apply next.

What to alert on

Cost-per-request drift by endpointTag or promptVersion
Unexpected tenant concentration in Top Users
Request bursts with a falling success ratio
Budget warning, spend-alert, and exceeded state transitions
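
The first alert can be approximated offline before you wire it into a dashboard. The following is a minimal sketch, assuming each telemetry event is a dict carrying endpointTag, promptVersion, and requestCost. The function names and the 25% threshold are illustrative assumptions, not part of any product API.

```python
from collections import defaultdict


def cost_per_request(events, field):
    """Average requestCost per distinct value of `field`."""
    totals = defaultdict(lambda: [0.0, 0])  # value -> [cost sum, request count]
    for event in events:
        bucket = totals[event[field]]
        bucket[0] += event["requestCost"]
        bucket[1] += 1
    return {key: cost / count for key, (cost, count) in totals.items()}


def drift_alerts(baseline, current, field="endpointTag", threshold=0.25):
    """Return {value: relative increase} where cost/request rose above threshold."""
    base = cost_per_request(baseline, field)
    now = cost_per_request(current, field)
    alerts = {}
    for key, avg in now.items():
        base_avg = base.get(key)
        if not base_avg:
            continue  # no baseline (or a zero baseline): nothing to compare against
        change = (avg - base_avg) / base_avg
        if change > threshold:
            alerts[key] = round(change, 4)
    return alerts
```

Pass field="promptVersion" to compare versions that exist in both windows. A brand-new promptVersion has no baseline here, so it is skipped and is better compared against the version it replaced.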

Execution checklist

Confirm spike type: volume, token, deploy, or abuse signal.
Assign one incident owner and one communication channel.
Apply immediate containment before deep optimization.
Document the dominant endpoint, tenant, and promptVersion driver.
Convert findings into one permanent guardrail update.

What to monitor weekly

Tenant spend concentration and trend
Tenant cost per workflow or feature
Negative-margin tenant early signals
Budget drift caused by one customer segment
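
Spend concentration and its trend reduce to a small calculation. This is a rough sketch, assuming you already have per-tenant spend totals for each week as plain dicts; both function names are hypothetical.

```python
def top_tenant_share(spend_by_tenant, top_n=1):
    """Fraction of total spend attributable to the top_n tenants."""
    total = sum(spend_by_tenant.values())
    if total <= 0:
        return 0.0
    top = sorted(spend_by_tenant.values(), reverse=True)[:top_n]
    return sum(top) / total


def concentration_trend(weekly_spend, top_n=1):
    """Week-over-week change in top-N share; weekly_spend is oldest week first."""
    shares = [top_tenant_share(week, top_n) for week in weekly_spend]
    return [round(later - earlier, 4) for earlier, later in zip(shares, shares[1:])]
```

A positive value in the trend means spend is concentrating in fewer tenants, which is worth a closer look even when total spend is flat.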

Use this workflow

Turn diagnosis into action

Identify the cost driver, validate it with attribution, then apply one durable control before the next billing cycle.

Margin review cadence

Review top 10 tenants by spend and by margin delta.
Track cost per workflow for high-volume tenants.
Compare planned versus actual margin after major deploys.
Flag tenants with sustained negative unit economics.
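
Two of these review steps are easy to script. The sketch below assumes "sustained" means a negative margin in each of the last few review periods; that definition, the default of three periods, and the function names are assumptions to adjust to your own policy.

```python
def top_tenants_by_spend(spend_by_tenant, n=10):
    """Return the n highest-spend tenants as (tenantId, spend) pairs."""
    ranked = sorted(spend_by_tenant.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


def sustained_negative(margin_history, min_periods=3):
    """Flag tenants whose margin was negative in each of the last min_periods.

    margin_history maps tenantId -> list of margins, oldest period first.
    """
    flagged = []
    for tenant, margins in margin_history.items():
        recent = margins[-min_periods:]
        if len(recent) == min_periods and all(m < 0 for m in recent):
            flagged.append(tenant)
    return sorted(flagged)
```

Requiring several consecutive negative periods keeps a single bad week (for example, a one-off backfill) from triggering a pricing conversation.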

Actions when one tenant drives risk

Apply tenant-specific budget thresholds.
Tune model tiering for non-critical flows.
Introduce pricing or quota adjustments with customer success.
Document exception policy for strategic accounts.

Why per-tenant visibility is mandatory in AI SaaS

Workspace totals can look healthy while one customer quietly destroys margin. Tenant-level attribution turns “we spent more” into “this account drove the change”.

Per-tenant LLM cost data also reduces internal debate: instead of guessing, you can show the feature mix, the promptVersion changes, and the endpoints responsible.

Minimum telemetry to make tenant profitability real

tenantId (or stable tenant mapping) for commercial ownership
endpointTag for feature-level margin breakdown
promptVersion for deploy-linked cost drift
dataMode/environment to keep demo and test traffic out of billable reporting
plan tier or segment label (so finance can interpret outcomes)
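
One way to keep these fields from going missing is to validate them where the event is built. The dataclass below is an illustrative sketch, not a product schema: tenantId, endpointTag, promptVersion, dataMode, and requestCost come from this guide, while planTier and the example dataMode values are assumed names.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class TelemetryEvent:
    tenantId: str
    endpointTag: str
    promptVersion: str
    dataMode: str                   # assumed values, e.g. "live", "test", "demo"
    requestCost: float
    planTier: Optional[str] = None  # hypothetical field name for the plan/segment label

    def __post_init__(self):
        # Reject events that cannot be attributed to a tenant and feature.
        for name in ("tenantId", "endpointTag", "promptVersion", "dataMode"):
            if not getattr(self, name):
                raise ValueError(f"{name} is required for tenant attribution")
        if self.requestCost < 0:
            raise ValueError("requestCost must be non-negative")


event = TelemetryEvent("acme", "summarize", "v3", "live", 0.02, planTier="pro")
payload = asdict(event)  # plain dict, ready to serialize
```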

A simple margin model you can run weekly

You do not need perfect accounting to make good decisions. Start with a simple, repeatable model and improve it over time.

The goal is to identify negative-margin tenants early, understand the drivers, and choose a policy response: price, quota, routing, or product change.

Gross margin (tenant) = revenue - (LLM cost + variable infra cost estimate)
LLM cost (tenant) = sum(requestCost) grouped by tenantId
Driver view = endpointTag + promptVersion + retry ratio over the same window
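
The first two formulas translate almost directly into code. This is a minimal sketch, assuming events are dicts with tenantId and requestCost, and that revenue and the variable infra estimate arrive as per-tenant dicts from your billing data; the function names are hypothetical.

```python
from collections import defaultdict


def llm_cost_by_tenant(events):
    """LLM cost (tenant) = sum(requestCost) grouped by tenantId."""
    costs = defaultdict(float)
    for event in events:
        costs[event["tenantId"]] += event["requestCost"]
    return dict(costs)


def gross_margin_by_tenant(events, revenue, infra_estimate):
    """Gross margin (tenant) = revenue - (LLM cost + variable infra cost estimate)."""
    llm = llm_cost_by_tenant(events)
    margins = {}
    for tenant, tenant_revenue in revenue.items():
        variable_cost = llm.get(tenant, 0.0) + infra_estimate.get(tenant, 0.0)
        margins[tenant] = tenant_revenue - variable_cost
    return margins
```

Tenants with revenue but no events in the window are treated as having zero LLM cost, which keeps the weekly report complete. Tenants with cost but no revenue entry are left out here, so you may want to list them separately.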

Segment tenants by cost drivers (not just ARR)

High-volume / low-risk: route to cheaper models, strict output caps.
Low-volume / high-stakes: allow flagship models with tighter QA gates.
Tool-heavy workflows: monitor tool output bloat and step counts.
RAG-heavy workflows: track avgInputTokens and retrieval parameters.

Per-tenant budgets and recovery actions

Tenant budgets are an escalation mechanism. They protect shared margin and create a clear owner path for exceptions.

When a tenant reaches the warning or exceeded state, you need a pre-decided set of actions that preserve user experience while containing spend.

Warning: notify account owner with top endpointTag and promptVersion drivers.
Contain: cap output tokens and throttle non-critical endpoints for the tenant.
Degrade: route low-risk paths to cheaper models for the exceeded tenant.
Decide: approve overrun, enforce quota, or upsell to a higher tier.
Document: one permanent policy update for the next cycle.
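
The warning and containment steps can be pre-decided as a small state table. The sketch below is one possible shape, under stated assumptions: the 80% warning ratio, the state names "ok", "warning", and "exceeded", and the action labels are all illustrative and should mirror your own policy.

```python
# Action labels are placeholders for whatever your platform actually triggers.
RECOVERY_ACTIONS = {
    "ok": [],
    "warning": ["notify_account_owner"],
    "exceeded": [
        "cap_output_tokens",
        "throttle_non_critical_endpoints",
        "route_low_risk_to_cheaper_model",
    ],
}


def budget_state(spend, budget, warning_ratio=0.8):
    """Classify tenant spend against its budget."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    if spend >= budget:
        return "exceeded"
    if spend >= warning_ratio * budget:
        return "warning"
    return "ok"


def on_spend_update(previous_state, spend, budget):
    """Return (new_state, actions); actions fire only when the state changes."""
    state = budget_state(spend, budget)
    actions = list(RECOVERY_ACTIONS[state]) if state != previous_state else []
    return state, actions
```

Firing actions only on transitions avoids re-notifying the account owner on every request once a tenant is already in the warning state. The Decide and Document steps stay with people, not code.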

Pricing levers that protect margin without harming retention

The best pricing policy matches cost drivers. If cost is driven by a small set of endpoints, attach quotas and overages to those workflows.

Avoid surprise enforcement. Use clear warning thresholds, transparent quotas, and a documented path to upgrades.

Quota by feature (endpointTag) rather than only by total requests.
Overage pricing for heavy usage instead of silently absorbing cost.
Separate demo/test usage from billable reporting (dataMode).
Use plan-tier routing rules to keep costs predictable.
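
Quota by feature with overage pricing comes down to a per-endpointTag comparison. This is a simple sketch, assuming usage is counted in requests per billing period and that quotas and overage prices are keyed by endpointTag; the function name and data shapes are hypothetical.

```python
def overage_charges(usage_by_endpoint, quota_by_endpoint, overage_price):
    """Return {endpointTag: charge} for usage above each feature's quota.

    Endpoints without a quota entry are treated as unmetered.
    """
    charges = {}
    for tag, used in usage_by_endpoint.items():
        quota = quota_by_endpoint.get(tag)
        if quota is None:
            continue
        extra = max(0, used - quota)
        if extra:
            charges[tag] = extra * overage_price[tag]
    return charges


usage = {"summarize": 1200, "chat": 300, "health": 50}
quotas = {"summarize": 1000, "chat": 500}
prices = {"summarize": 0.5, "chat": 0.25}
charges = overage_charges(usage, quotas, prices)
```

In this example only the summarize workflow is over quota, so the overage attaches to the feature that drives cost rather than to the tenant's total request count.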

FAQ

Is userId required?

No. userId is optional, but recommended for tenant-level attribution. If raw identifiers are sensitive, send a hashed identifier instead.

Where should token usage values come from?

Prefer provider usage fields first. If unavailable, use tokenizer estimates and mark uncertainty in your workflow.
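
A fallback like this can be made explicit in code so that estimated counts never pass as measured ones. The sketch below uses assumed key names (inputTokens, outputTokens, estimated) and a crude four-characters-per-token heuristic as a stand-in for a real tokenizer; provider usage field names differ, so map them before calling it.

```python
def rough_token_estimate(text):
    """Crude stand-in for a tokenizer: roughly four characters per token."""
    if not text:
        return 0
    return max(1, len(text) // 4)


def resolve_token_usage(provider_usage, prompt_text, completion_text):
    """Prefer provider-reported counts; otherwise return a flagged estimate."""
    if (
        provider_usage
        and "inputTokens" in provider_usage
        and "outputTokens" in provider_usage
    ):
        return {
            "inputTokens": provider_usage["inputTokens"],
            "outputTokens": provider_usage["outputTokens"],
            "estimated": False,
        }
    return {
        "inputTokens": rough_token_estimate(prompt_text),
        "outputTokens": rough_token_estimate(completion_text),
        "estimated": True,  # marks the uncertainty for downstream reports
    }
```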

How should retries be handled?

Keep the same externalRequestId for the same logical request so idempotency remains stable across retries.
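
In practice this means generating the identifier once, outside the retry loop. The following sketch assumes a caller-supplied send function and shows a toy in-memory ledger to illustrate why the stable ID matters; send_with_retries and DedupingLedger are hypothetical names, not part of any SDK.

```python
import uuid


def send_with_retries(send, payload, max_attempts=3):
    """Call send(event) up to max_attempts times, reusing one externalRequestId."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # Generate the ID once, before the loop, so every retry carries the same value.
    request_id = payload.get("externalRequestId") or str(uuid.uuid4())
    event = dict(payload, externalRequestId=request_id)
    last_error = None
    for _ in range(max_attempts):
        try:
            return send(event)
        except Exception as error:  # sketch only: narrow this in real code
            last_error = error
    raise last_error


class DedupingLedger:
    """Toy receiver: counts each externalRequestId at most once."""

    def __init__(self):
        self.cost_by_request = {}

    def record(self, event):
        self.cost_by_request.setdefault(event["externalRequestId"], event["requestCost"])

    def total(self):
        return sum(self.cost_by_request.values())
```

If the first attempt was recorded but its acknowledgement was lost, the retry arrives with the same externalRequestId and the cost is not counted twice.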

Can telemetry break production flow?

It should not. Use short timeouts, catch errors, and keep telemetry asynchronous so provider calls keep running.
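
One way to get all three properties is a fire-and-forget wrapper around the sender. This is a minimal sketch using only the Python standard library; the function names are hypothetical and the ingestion URL is whatever endpoint you already send telemetry to.

```python
import json
import threading
import urllib.request


def http_sender(url, timeout_seconds=1.0):
    """Build a send(event) function that POSTs JSON with a short timeout."""
    def send(event):
        request = urllib.request.Request(
            url,
            data=json.dumps(event).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=timeout_seconds):
            pass  # response body is not needed
    return send


def safe_send(send, event):
    """Run send(event) and swallow any failure; returns True on success."""
    try:
        send(event)
        return True
    except Exception:
        return False


def emit_async(send, event):
    """Send telemetry on a daemon thread so the provider call is never blocked."""
    worker = threading.Thread(target=safe_send, args=(send, event), daemon=True)
    worker.start()
    return worker
```

A thread per event is fine for a sketch; at higher volume a bounded queue with a single background worker is the more common choice.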

Evaluation resources

For security and procurement reviews, use our trust summary before final tool selection.
