---
id: "framework-token-budget-enforcement"
type: "framework"
source_timestamps: ["00:12:56"]
tags: ["cost-management", "execution-flow"]
related: ["concept-predictive-token-budgeting", "action-implement-predictive-budgets"]
sources: ["s46-anthropic-25b-leak"]
sourceVaultSlug: "s46-anthropic-25b-leak"
originDay: 46
---
# Predictive Token Budget Enforcement

## Purpose
A step-by-step process used by [[entity-claude-code-d46|Claude Code]] to ensure agents do not exceed predefined token limits, preventing runaway costs.

## Steps

1. **Define hard limits** for max turns, max tokens, and compaction thresholds in configuration.
2. **Before every API call**, calculate the projected token usage for the upcoming turn.
3. **Compare** the projected usage against the configured budget limits.
4. **If the projection exceeds the budget, halt execution immediately.**
5. **Emit a structured *stop reason*** indicating budget exhaustion in place of the API call, which is never dispatched for that turn.
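
The five steps can be sketched roughly as follows. This is a minimal illustration, not Claude Code's actual implementation: `BudgetConfig`, `StopReason`, `estimate_tokens`, `check_budget`, and `run_turn` are hypothetical names, the 4-characters-per-token estimate is a crude stand-in for a real tokenizer, and the projection assumes the worst case where the model uses its full output cap.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union


@dataclass(frozen=True)
class BudgetConfig:
    """Step 1: hard limits. Field names and values are hypothetical."""
    max_turns: int
    max_tokens: int
    compaction_threshold: int  # stored with the limits; compaction is out of scope here


@dataclass(frozen=True)
class StopReason:
    """Step 5: structured record of why the run halted."""
    code: str
    projected: int
    limit: int


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: about 4 characters per token."""
    return max(1, len(text) // 4)


def check_budget(config: BudgetConfig, turns_used: int, tokens_used: int,
                 prompt: str, max_output_tokens: int) -> Optional[StopReason]:
    """Steps 2-4: project the next turn and compare it with the limits."""
    next_turn = turns_used + 1
    if next_turn > config.max_turns:
        return StopReason("max_turns_exceeded", next_turn, config.max_turns)
    # Worst-case projection: tokens so far + prompt estimate + output cap.
    projected = tokens_used + estimate_tokens(prompt) + max_output_tokens
    if projected > config.max_tokens:
        return StopReason("token_budget_exhausted", projected, config.max_tokens)
    return None


def run_turn(config: BudgetConfig, turns_used: int, tokens_used: int,
             prompt: str, max_output_tokens: int,
             call_api: Callable[[str], str]) -> Union[str, StopReason]:
    """Dispatch the call only if the projection fits the budget."""
    stop = check_budget(config, turns_used, tokens_used, prompt, max_output_tokens)
    if stop is not None:
        return stop  # halted: call_api is never invoked for this turn
    return call_api(prompt)
```

With a 100-token budget, 70 tokens already used, a 40-character prompt (about 10 tokens under this estimate), and a 30-token output cap, the projection is 110, so `run_turn` returns a `token_budget_exhausted` stop reason without invoking `call_api`.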

## Underlying Concept
[[concept-predictive-token-budgeting]].

## Practitioner Action
[[action-implement-predictive-budgets]].

## Why The Order Matters
The key architectural decision is that the check is **predictive, not reactive**: the projection and comparison (Steps 2 and 3) run *before* the API call would be dispatched, so a turn that would exceed the budget is never sent. A reactive check runs after the call has already been billed, so it can only report the overrun, not prevent it.
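
To make the difference concrete, here is a small, hedged comparison of the two orderings against a fake metered API. `MeteredAPI`, `reactive_run`, and `predictive_run` are hypothetical names, and the sketch assumes each turn's cost is known exactly in advance, whereas a real projection is only an estimate.

```python
from typing import List


class MeteredAPI:
    """Fake API that simply records how many tokens it has billed."""

    def __init__(self) -> None:
        self.billed_tokens = 0

    def call(self, tokens: int) -> None:
        self.billed_tokens += tokens


def reactive_run(api: MeteredAPI, budget: int, turn_costs: List[int]) -> None:
    """Check AFTER each call: the overrunning turn has already been billed."""
    used = 0
    for cost in turn_costs:
        api.call(cost)
        used += cost
        if used > budget:
            break


def predictive_run(api: MeteredAPI, budget: int, turn_costs: List[int]) -> None:
    """Check BEFORE each call: the overrunning turn is never dispatched."""
    used = 0
    for cost in turn_costs:
        if used + cost > budget:
            break
        api.call(cost)
        used += cost
```

With three 40-token turns and a budget of 100, the reactive loop bills 120 tokens before it notices the overrun, while the predictive loop stops at 80.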

## Validation (Enrichment)
Vellum, Redis-backed agent harnesses, and several open-source agent libraries implement this same pre-call projection pattern.
