---
id: "prereq-training-vs-inference"
type: "prereq"
source_timestamps: ["00:03:10"]
tags: ["machine-learning", "hardware"]
related: ["concept-inference-wall", "concept-training-inference-chip-divergence"]
reason: "Understanding why Sora failed requires knowing that building a model (training) and running a model for users (inference) require vastly different compute resources and architectures."
sources: ["s17-3-model-drops"]
sourceVaultSlug: "s17-3-model-drops"
originDay: 17
---
# AI Training vs. Inference

## Why You Need This

The core argument about the [[concept-inference-wall]] depends on the technical and economic distinction between **training** (building a model by fitting its weights to data, done once by the lab) and **inference** (running the finished model to serve each user request). The two require vastly different compute resources and architectures.

## The Distinction

| Aspect | Training | Inference |
|---|---|---|
| Frequency | One-time (or periodic) | Per-query, continuous |
| Cost shape | Massive upfront capex | Ongoing opex per request |
| Hardware emphasis | Raw matmul throughput, large clusters | Low latency, memory bandwidth, model compression, per-query efficiency |
| Optimization target | Model quality | Cost-per-output and latency |
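
To make the table concrete, here is a minimal PyTorch sketch (illustrative only; the toy model, shapes, and hyperparameters are placeholders, not anything from the source) contrasting one training step with one inference call. Training runs forward *and* backward passes over a fixed dataset, updating every weight; inference is a single forward pass per request.

```python
import torch
import torch.nn as nn

# Toy stand-in for a real model; the 512-wide Linear layer is arbitrary.
model = nn.Linear(512, 512)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

# Training step: run millions of times before launch (the upfront capex).
def training_step(batch_x, batch_y):
    optimizer.zero_grad()
    pred = model(batch_x)          # forward pass
    loss = loss_fn(pred, batch_y)  # compare against targets
    loss.backward()                # backward pass: gradient for every weight
    optimizer.step()               # weight update
    return loss.item()

# Inference call: runs once per user query, forever (the ongoing opex).
@torch.no_grad()                   # forward pass only, no gradients kept
def inference(x):
    model.eval()
    return model(x)

# Toy usage.
training_step(torch.randn(32, 512), torch.randn(32, 512))
inference(torch.randn(1, 512))
```

The backward pass and optimizer state are why training wants raw throughput across a large cluster, while the gradient-free forward pass is why inference optimizes for latency and per-query cost instead.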

## Why It Matters For This Vault

Without this distinction, [[claim-sora-economics]] looks like a generic "product was unprofitable" story. With this distinction, it becomes the canonical case for a structural mismatch in the AI hardware stack — see [[concept-training-inference-chip-divergence]].
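
A hedged back-of-the-envelope sketch of the economics (the dollar figures below are invented for illustration and are not from the episode): amortized training capex falls toward zero per query as volume grows, while inference opex stays flat per query, so at scale inference sets the floor on unit economics.

```python
# Hypothetical numbers for illustration only; none come from the source.
TRAINING_COST = 100_000_000       # one-time capex, dollars
INFERENCE_COST_PER_QUERY = 0.05   # ongoing opex, dollars per request

def cost_per_query(total_queries: int) -> tuple[float, float]:
    """Amortized training cost vs. flat inference cost, per query."""
    return TRAINING_COST / total_queries, INFERENCE_COST_PER_QUERY

for n in (1_000_000, 100_000_000, 10_000_000_000):
    train, infer = cost_per_query(n)
    print(f"{n:>14,} queries: training ${train:.4f}/q, inference ${infer:.4f}/q")

# Training cost per query vanishes as volume grows; inference cost per
# query never does. Past the crossover, opex dominates the unit economics.
```

This is why a model that trains successfully can still fail as a product: the inference column never amortizes.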

## Related
- [[concept-inference-wall]]
- [[concept-training-inference-chip-divergence]]
- [[quote-inference-chips]]
