---
id: "prereq-inference-costs"
type: "prerequisite"
source_timestamps: ["00:04:20", "00:07:30"]
tags: ["economics", "compute"]
related: ["concept-cloud-ai-economics", "concept-local-ai-economics"]
sources: ["s19-apple-trillion"]
sourceVaultSlug: "s19-apple-trillion"
originDay: 19
---
# Understanding AI Inference Costs

## What You Need to Know

The speaker assumes the audience understands the difference between:

- **Fixed cost (CapEx):** the up-front cost of buying a processor (chip, GPU, neural engine)
- **Variable cost (OpEx):** the per-token marginal cost of running inference in the cloud
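The CapEx/OpEx trade-off reduces to simple break-even arithmetic. A minimal sketch, with hypothetical prices chosen only for illustration (neither figure comes from the source):

```python
# Hypothetical numbers for illustration only -- not real prices.
HARDWARE_COST = 2_000.00        # fixed cost (CapEx): one-time chip/GPU purchase, in dollars
CLOUD_PRICE_PER_MTOK = 10.00    # variable cost (OpEx): dollars per million cloud tokens

def breakeven_tokens(hardware_cost: float, cloud_price_per_mtok: float) -> float:
    """Token count at which cumulative cloud spend equals the one-time hardware cost."""
    return hardware_cost / cloud_price_per_mtok * 1_000_000

tokens = breakeven_tokens(HARDWARE_COST, CLOUD_PRICE_PER_MTOK)
print(f"Break-even at {tokens:,.0f} tokens")  # prints "Break-even at 200,000,000 tokens"
```

Past the break-even point, local inference is effectively free at the margin, while cloud spend keeps scaling with usage — which is the asymmetry the rest of the vault builds on.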

And, crucially:

- **Output tokens cost more than input tokens** (often ~4×) because each output token must be generated sequentially, one forward pass at a time
- **Long context windows are expensive** — every additional input token adds compute in the attention mechanism
- **Reasoning models** (chain-of-thought, multi-step agents) burn far more tokens per user-visible answer than instant-response models
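These three bullets combine into one per-query cost formula. A sketch with hypothetical per-million-token rates (picked only to show the ~4× output/input ratio), assuming hidden reasoning tokens are billed at the output rate, as is common for cloud reasoning models:

```python
# Hypothetical rates for illustration -- not any provider's actual pricing.
INPUT_RATE = 3.00    # $ per million input tokens
OUTPUT_RATE = 12.00  # $ per million output tokens (~4x input: sequential generation)

def query_cost(input_tokens: int, output_tokens: int, reasoning_tokens: int = 0) -> float:
    """Dollar cost of one query; hidden chain-of-thought is billed as output (assumption)."""
    billed_output = output_tokens + reasoning_tokens
    return (input_tokens * INPUT_RATE + billed_output * OUTPUT_RATE) / 1_000_000

# Same user-visible answer length, very different cost:
instant = query_cost(input_tokens=2_000, output_tokens=500)
reasoning = query_cost(input_tokens=100_000, output_tokens=500, reasoning_tokens=8_000)
print(f"instant: ${instant:.3f}, long-context reasoning: ${reasoning:.3f}")
```

The long-context reasoning query costs tens of times more than the instant one despite producing the same 500 visible tokens — exactly the per-query cost pressure the throttling argument rests on.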

## Why It's Required

Without the fixed-versus-variable cost distinction, the source's core argument collapses:

- [[concept-cloud-ai-economics]] (variable-cost, structurally unprofitable for heavy users)
- [[concept-local-ai-economics]] (fixed-cost, near-zero marginal cost after hardware purchase)
- [[claim-cloud-ai-unprofitable]] (frontier labs losing money on consumer subscriptions)
- [[concept-two-class-ai]] (throttling driven by per-query costs)

## Quick Reference

If you are skipping ahead in this vault and your background isn't in cloud-cost or AI-platform economics, read this prerequisite *first*.
