Use case
LLM cost per support ticket: pricing and margin guide
Support teams need cost per resolved ticket, not only request totals. This guide maps spend to resolution outcomes.
Full guide: Cost attribution by use-case: templates for real apps
Metrics that matter
- cost per resolved ticket
- cost per escalation avoided
- cost per tenant for high-volume support accounts
- promptVersion impact on token efficiency
Pick one outcome definition (so numbers are comparable)
Support workflows have multiple “wins”: first response, deflection, resolution, or agent time saved. If you do not define the outcome, cost-per-ticket metrics will drift with workflow changes.
Choose one primary outcome per support flow and keep it stable across releases so weekly trends stay meaningful.
- Resolved ticket: closed without human escalation (primary).
- Deflected ticket: user solved issue via self-serve answer.
- Escalation avoided: bot routed correctly and reduced agent workload.
- Time saved: minutes saved per handled conversation (optional).
Support endpoint taxonomy (examples that scale)
- support.reply (user-facing answer)
- support.thread_summary (long-thread summarization)
- support.ticket_draft (handoff draft for agents)
- support.intent_route (routing and triage)
- support.rag_answer (knowledge-base RAG answer)
Execution model
- Tag support endpoints separately from internal tooling.
- Track unknown user traffic and map to tenant when possible.
- Set budget thresholds before peak support windows.
- Review weekly export with support and finance leads.
How to compute cost per resolved ticket (simple, reliable)
- ticketCost = sum(cost of all requests linked to the ticket workflow)
- resolvedTicketCount = number of tickets resolved in the same window
- costPerResolvedTicket = ticketCost / resolvedTicketCount
- Include retries/fallback attempts in ticketCost (effective cost per outcome).
Pricing levers that protect support margins
- Route low-value tickets to cheaper models or templated answers first.
- Cap output tokens for auto-replies and long-thread summaries.
- Use per-tenant budgets for high-volume customers.
- Track cost per resolved ticket by promptVersion after releases.
- Create an upgrade path for heavy usage instead of silently absorbing cost.
Alerts and guardrails (prevent support from becoming a cost sink)
- Alert on cost/request drift after promptVersion deploys (support.reply, support.thread_summary).
- Alert on tokens/hour bursts for public support channels (abuse and scraping).
- Cap thread history size and summarize older turns before reinjection.
- Add per-tenant budgets for high-volume accounts and document the escalation owner.
- Review p95/p99 token outliers weekly (long threads often dominate spend).
What to alert on
- cost/request drift by endpointTag or promptVersion
- unexpected tenant concentration in Top Users
- request burst with falling success ratio
- budget warning, spend-alert, and exceeded state transitions
Execution checklist
- Confirm spike type: volume, token, deploy, or abuse signal.
- Assign one incident owner and one communication channel.
- Apply immediate containment before deep optimization.
- Document the dominant endpoint, tenant, and promptVersion driver.
- Convert findings into one permanent guardrail update.
FAQ
Should support pricing be per ticket or per token?
Per ticket is easier for customers to understand and aligns with outcomes. Internally, track both: cost per resolved ticket for packaging and cost/request + tokens/request for engineering optimization.
Why do long support threads become expensive?
Conversation history gets re-sent every turn, so inputTokens grow linearly with thread length. Summarize older turns and cap context to prevent long-tail tickets from dominating spend.
Do we need per-tenant budgets for support?
If one tenant can generate a large share of tickets, yes. Per-tenant budgets prevent a single high-volume customer from draining shared margin and make escalation ownership clear.
Related guides
Evaluation resources
For security and procurement reviews, use our trust summary before final tool selection.