System prompt growth: how hidden context quietly inflates LLM spend
System prompts change less visibly than user prompts, but they often drive sustained token inflation across release cycles.
Full guide: Prompt deploy cost regressions: catch silent cost spikes
Where hidden growth appears
- Policy and style instructions appended over time.
- Safety and routing directives duplicated across layers.
- Embedded examples that never get pruned.
Containment checks
- Diff system prompt payloads per release.
- Track baseline input token deltas by promptVersion.
- Set max token guardrails for low-risk endpoints.
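The first two checks above can be sketched in a few lines. This is a minimal illustration, not an Opsmeter API: `count_tokens` is a crude whitespace proxy you would swap for your model's real tokenizer, and the 50-token growth budget is an arbitrary example.

```python
import difflib

def count_tokens(text: str) -> int:
    # Crude whitespace proxy; swap in your model's actual tokenizer.
    return len(text.split())

def check_system_prompt(prev: str, curr: str, max_growth: int = 50) -> list[str]:
    """Diff the system prompt between releases and flag token growth."""
    findings = []
    delta = count_tokens(curr) - count_tokens(prev)
    if delta > max_growth:
        findings.append(f"system prompt grew by {delta} tokens (budget allows {max_growth})")
    # Surface the lines that were added, so the diff is reviewable, not just a number.
    for line in difflib.unified_diff(prev.splitlines(), curr.splitlines(), lineterm=""):
        if line.startswith("+") and not line.startswith("+++"):
            findings.append("added: " + line[1:].strip())
    return findings
```

Running this in CI against the previous release's prompt payload makes growth a reviewable diff rather than a surprise on the invoice.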
Measure a system prompt budget (so drift is visible)
Hidden context is expensive because it is paid on every request. Treat system prompt size as a budgeted dependency like latency or error rate.
Track the baseline inputTokens for each endpointTag and alert when it grows after a promptVersion or routing change.
- Baseline: avgInputTokens and p95 inputTokens per endpointTag.
- Change detection: compare before/after windows per promptVersion.
- Ownership: assign a maintainer for shared instruction layers.
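The baseline and change-detection steps above can be sketched with the standard library. Field names mirror the payload example later in this guide; the 10% growth threshold is illustrative.

```python
from statistics import mean, quantiles

def summarize(events):
    """Baseline avg and p95 inputTokens per endpointTag."""
    by_tag = {}
    for e in events:
        by_tag.setdefault(e["endpointTag"], []).append(e["inputTokens"])
    return {
        tag: {
            "avg": mean(toks),
            # quantiles needs at least 2 points; n=20 makes the last cut the p95.
            "p95": quantiles(toks, n=20)[-1] if len(toks) > 1 else toks[0],
        }
        for tag, toks in by_tag.items()
    }

def flag_growth(before, after, pct=0.10):
    """endpointTags whose avg inputTokens grew more than pct between windows."""
    return [
        tag for tag, stats in after.items()
        if tag in before and stats["avg"] > before[tag]["avg"] * (1 + pct)
    ]
```

Compute `summarize` over the window before and after a promptVersion change, then alert on whatever `flag_growth` returns.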
Pruning strategies that reduce hidden context
- Remove duplicated policy blocks across layers (system + tool + router).
- Replace long examples with short templates and references.
- Move rarely used instructions into on-demand retrieval.
- Keep a strict "prompt budget" per endpointTag and enforce it.
- Review promptVersion diffs with token deltas, not only output quality.
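The first pruning step, spotting policy blocks duplicated across layers, can be mechanized. A rough sketch, assuming prompts are assembled from named layers and that a "block" is a blank-line-separated paragraph (adapt the granularity to how your prompts are actually built):

```python
def duplicated_blocks(layers):
    """Map each repeated paragraph to the instruction layers containing it.

    layers: dict of layer name -> prompt text, e.g. system/tool/router.
    """
    seen = {}
    for name, text in layers.items():
        for block in (p.strip() for p in text.split("\n\n")):
            if block:
                seen.setdefault(block, []).append(name)
    # Keep only paragraphs that appear in more than one layer.
    return {block: names for block, names in seen.items() if len(names) > 1}
```

Anything this returns is paid for on every request in every layer that includes it, so it is usually the cheapest pruning win.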
Deploy guardrails (keep the prompt from growing back)
- Require a promptVersion bump when shared instruction layers change.
- Gate releases on inputTokens deltas, not only output quality.
- Cap output tokens on endpoints where verbosity drift is likely.
- Move long policies into retrieval so they load only when needed (avoid always-on context).
- Review tail outliers (p95/p99) where hidden context hurts most.
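Gating releases on inputTokens deltas reduces to a small predicate in CI. A sketch, with an illustrative 5% allowance (not an Opsmeter API):

```python
def release_gate(baseline_avg: float, candidate_avg: float,
                 max_delta_pct: float = 0.05) -> bool:
    """True when the candidate promptVersion's avg inputTokens stays in budget."""
    return candidate_avg <= baseline_avg * (1 + max_delta_pct)
```

For example, with a baseline of 540 avg input tokens, a candidate averaging 560 passes (within 5%) while one averaging 600 blocks the deploy.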
What to send (payload example)
{
  "externalRequestId": "req_01HZXB6MQZ2WQ9D2KCF9M4V2QY",
  "provider": "provider_id",
  "model": "model_id",
  "endpointTag": "checkout.ai_summary",
  "promptVersion": "summary_v3",
  "userId": "tenant_acme_hash",
  "inputTokens": 540,
  "outputTokens": 180,
  "latencyMs": 892,
  "status": "success",
  "dataMode": "real",
  "environment": "prod"
}
Common mistakes
- Comparing totals only instead of cost/request and token deltas by promptVersion.
- Skipping long-tail outlier review (p95/p99) where regressions hide.
- Letting retrieval config drift (top-k/chunk overlap) without a token budget.
- Not capping output tokens on low-risk endpoints after a deploy.
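The first mistake above, comparing totals instead of per-request figures, is worth making concrete. A sketch that groups usage events (shaped like the payload example) by promptVersion and reports per-request averages:

```python
def per_request_stats(events):
    """Avg input/output tokens per request, grouped by promptVersion."""
    by_version = {}
    for e in events:
        by_version.setdefault(e["promptVersion"], []).append(e)
    stats = {}
    for version, evs in by_version.items():
        n = len(evs)
        stats[version] = {
            "requests": n,
            "avgInputTokens": sum(e["inputTokens"] for e in evs) / n,
            "avgOutputTokens": sum(e["outputTokens"] for e in evs) / n,
        }
    return stats
```

Totals can fall while cost per request rises (e.g. traffic dropped after a deploy); per-version averages make the regression visible either way.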
How to verify in Opsmeter Dashboard
- Use Overview to confirm spike window and budget posture.
- Use Top Endpoints to find feature-level concentration.
- Use Top Users to find tenant-level concentration.
- Use Prompt Versions to validate deploy-linked cost drift.
Evaluation resources
For security and procurement reviews, use our trust summary before final tool selection.