---
type: "synthesis"
spans: ["s24", "s25", "s26", "s28"]
tags: ["arc", "failure-modes", "contrarian"]
id: "arc-success-at-wrong-metric-generalized"
sources: ["cross-day"]
---
# Success at the Wrong Metric — Generalized Across the Series

[[contrarian-success-is-failure]] from S24 ("AI succeeding at the wrong metric is *worse* than AI failing") is treated as a Klarna-specific insight in its home vault. In fact, it is **the unifying failure-mode pattern across all four days**. Each video diagnoses a different version of the same disease.

## Four canonical instances

| Domain | What "succeeded" | Wrong metric optimized | True cost |
|---|---|---|---|
| Enterprise AI (S24) | Klarna AI customer service | Cost, speed | Customer LTV, brand. See [[claim-klarna-intent-failure]], [[quote-klarna-ceo-quality]]. |
| Individual builders (S25) | [[concept-vibe-coding|Vibe coding]] velocity | Lines of code shipped | [[concept-experiential-debt]], [[concept-archaeological-programming]]. See [[claim-vibe-coding-debt]]. |
| Model evaluation (S26) | Public benchmark scores | Easy, clean tasks | Real-world *carrying* capacity. See [[claim-public-benchmarks-flatten]], [[concept-can-it-carry]]. |
| Startup strategy (S28) | Thin-wrapper UI velocity | App-builder vanity metrics (DAU, ARR growth) | No moat, structural decay. See [[claim-thin-wrappers-dead]], [[claim-curation-scarcest-resource]]. |

## The shared mechanism

In every case the same three-step trap appears:

1. A **proxy metric** is chosen because it is measurable, public, or easy.
2. AI optimizes the proxy efficiently.
3. Because the proxy is *not* the true objective, efficient optimization *scales* the misalignment instead of correcting it.

What makes the trap dangerous in all four cases is *speed*: AI optimization is fast enough that the misaligned system gets *expanded* before the misalignment is detected. Klarna scaled the AI before realizing customer satisfaction had decayed. Vibe coders ship features before the experiential debt surfaces. Benchmark-leading models get adopted before "carry" failures appear in production. Thin-wrapper startups raise rounds before the no-moat reality bites.
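The three-step trap can be sketched as a toy simulation — all function names and numbers below are invented for illustration, not drawn from any of the four cases. The proxy rewards raw optimization pressure indefinitely, while the true objective tracks it only up to a point and then decays:

```python
def proxy_score(effort: float) -> float:
    """Cost/speed-style proxy: improves monotonically with optimization effort."""
    return effort

def true_objective(effort: float, alignment: float = 0.6) -> float:
    """True objective: tracks the proxy at first, then decays once
    optimization overshoots the region where proxy and goal agree.
    The quadratic term is the misalignment, which grows with scale."""
    return alignment * effort - 0.1 * effort ** 2

best_effort, best_true = 0.0, float("-inf")
for step in range(1, 11):
    effort = float(step)
    if true_objective(effort) > best_true:
        best_effort, best_true = effort, true_objective(effort)
    # proxy_score(effort) keeps rising long after true_objective has peaked —
    # a dashboard showing only the proxy would report continuous "success".
```

With these toy parameters the true objective peaks at `effort = 3.0` while the proxy never stops improving, which is exactly the window in which the misaligned system gets expanded.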

## The shared antidote

The antidote across all four is also identical: **encode the true objective in the system itself, not just the human's head**.

- S24: [[concept-machine-readable-okrs]] + [[action-translate-okrs]] — make the real objective machine-readable.
- S25: [[concept-temporal-separation]] + [[action-reflect-mode]] — schedule judgment time outside the optimization loop.
- S26: [[framework-private-bench-suite]] — build adversarial private evals that measure the real objective.
- S28: [[framework-strategic-litmus-test]] — apply the 10× test against the *real* moat, not the proxy.

This is why [[arc-private-judgment-thread]] matters: only judgment that is private, deliberate, and non-commoditized can resist proxy-metric capture.
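One minimal way to picture "encode the true objective in the system itself" is a rollout guard that refuses to expand on proxy gains alone. This is a hypothetical sketch, not any of the frameworks above; the metric names and tolerance are invented:

```python
from dataclasses import dataclass

@dataclass
class MetricSnapshot:
    proxy: float        # e.g. cost per ticket (lower is better)
    true_metric: float  # e.g. resolution quality (higher is better)

def safe_to_scale(before: MetricSnapshot, after: MetricSnapshot,
                  max_true_drop: float = 0.02) -> bool:
    """Allow expansion only if the proxy improved AND the true metric
    held within tolerance — the true objective lives in the check,
    not in someone's head."""
    proxy_improved = after.proxy < before.proxy
    true_held = after.true_metric >= before.true_metric * (1 - max_true_drop)
    return proxy_improved and true_held

# Klarna-shaped failure: proxy halved, quality collapsed -> blocked.
# safe_to_scale(MetricSnapshot(1.0, 0.90), MetricSnapshot(0.5, 0.70)) -> False
```

The design point is that the guard fires *before* expansion, which is where the speed of AI optimization otherwise outruns detection.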

See also [[arc-debt-and-decay-patterns]] for the specific debt forms this failure mode produces.