---
id: "claim-caching-discount"
type: "claim"
source_timestamps: ["00:18:00"]
tags: ["pricing", "api-features"]
related: ["concept-prompt-caching", "action-implement-caching"]
speakers: ["Nate B. Jones"]
confidence: "high"
testable: true
sources: ["s45-claude-limit-chatgpt-habit"]
sourceVaultSlug: "s45-claude-limit-chatgpt-habit"
originDay: 45
---
# Prompt Caching Provides a 90% Discount on Stable Context

## Claim
Using API-level prompt caching for stable, repeated context (system prompts, tool schemas, reference docs) reduces the input-token cost of that cached content by **90%** on cache reads (e.g., $5.00/M → $0.50/M).
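
To make the arithmetic concrete, here is a back-of-the-envelope sketch in Python. The prices and token counts are illustrative assumptions (roughly Sonnet-class rates), not quotes from any provider's current price sheet, and the one-time cache-write premium is included so the net savings are visible:

```python
# Back-of-the-envelope input-cost comparison for a stable cached prefix.
# All prices are illustrative, in $/million input tokens.
BASE_INPUT = 3.00    # base input rate
CACHE_WRITE = 3.75   # first call writes the cache (25% premium over base)
CACHE_READ = 0.30    # later calls read the cache (10% of base: the 90% discount)

def input_cost(prefix_tokens: int, calls: int, cached: bool) -> float:
    """Dollar cost of sending one stable prefix across `calls` requests."""
    millions = prefix_tokens / 1_000_000
    if not cached:
        return calls * millions * BASE_INPUT
    # First call pays the write premium; the remaining calls pay the read rate.
    return millions * CACHE_WRITE + (calls - 1) * millions * CACHE_READ

# A 50k-token system prompt + tool schemas reused across 1,000 calls:
print(input_cost(50_000, 1_000, cached=False))  # 150.0
print(input_cost(50_000, 1_000, cached=True))   # ~15.17, i.e. ~90% cheaper
```

The write premium is a one-time cost per cache entry, so at any realistic call volume the effective rate converges on the discounted read price.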

## What's Cached
- System prompts / persona instructions
- Tool definitions and schemas
- Static reference documents (API docs, codebases, manuals)

Detailed mechanics in [[concept-prompt-caching]].
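
For the Anthropic case specifically, here is a minimal sketch of marking a stable prefix as cacheable with the Python SDK. The model ID, document path, and prompt strings are placeholder assumptions; the `cache_control` marker and the `usage` fields are Anthropic's documented API surface:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Hypothetical stable reference doc; the prefix must exceed the model's
# minimum cacheable length (1,024 tokens for Sonnet-class models).
reference_doc = open("docs/api_reference.md").read()

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # placeholder model ID
    max_tokens=1024,
    system=[
        {"type": "text", "text": "You are a support agent for Acme's API."},
        {
            "type": "text",
            "text": reference_doc,
            # Everything up to and including this block becomes the cached
            # prefix; identical prefixes within the TTL bill at the read rate.
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[{"role": "user", "content": "How do I paginate list endpoints?"}],
)

# usage reports cache_creation_input_tokens on the first call and
# cache_read_input_tokens (the 90%-discounted portion) on repeat calls.
print(response.usage)
```

Checking `usage.cache_read_input_tokens` on repeat calls is the direct way to verify the discount is actually being applied.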

## Validation Status (from enrichment overlay)
**Fully validated for the Anthropic API.** Claude's prompt caching (launched 2024) prices cache reads at exactly 10% of the base input rate: Sonnet's $3.00/M input drops to $0.30/M for cache reads, with cache writes carrying a one-time 25% premium ($3.75/M). OpenAI's automatic prompt caching discounts cached input tokens by 50%, and its separate Batch API offers a further 50% for asynchronous workloads, so the savings exist but are shallower. **Caveats**: the 90% figure is Anthropic-specific; Gemini's context caching uses a different pricing model (discounted cached tokens plus hourly storage fees), Mistral lacked a native equivalent as of 2026, and Helicone.ai analyses put real-world savings in a 75–90% range.

## Confidence
**High**, directly verifiable via published API pricing.

## Why It Matters
For any production agent with stable context, *not* implementing caching is described as a severe architectural error, and it is one of the audited items in [[framework-stupid-button-audit]] and [[framework-kiss-commands]].

## Linked Action
[[action-implement-caching]]
