---
id: "concept-vector-quantization"
type: "concept"
source_timestamps: ["03:30:00"]
tags: ["compression", "legacy-methods"]
related: ["concept-turboquant", "concept-polar-quantization"]
definition: "A traditional data compression method that shrinks data but requires adding 'quantization constants' for retrieval, creating overhead that limits its efficiency for LLM memory."
sources: ["s49-killed-ram-limits"]
sourceVaultSlug: "s49-killed-ram-limits"
originDay: 49
---
# Vector Quantization

Vector Quantization is a traditional method for compressing AI memory. While it successfully shrinks the data footprint, it introduces significant operational overhead.

To ensure the compressed data remains retrievable, the system must store **quantization constants** (for example, per-block scale factors) alongside the compressed values. In the speaker's analogy, it is like packing a suitcase very tightly but then having to carry a separate bag just to hold the folding instructions. This overhead — adding 1 to 2 extra bits per compressed number — partially defeats the purpose of the compression and introduces latency during retrieval.
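A minimal sketch of where that overhead comes from, assuming a generic blockwise scalar quantizer (the specific block size, bit width, and scale format here are illustrative choices, not details from the talk): each block of values is rounded to small integers, but a float scale constant must be stored per block so the values can be recovered, and that constant is the "extra bag".

```python
import numpy as np

def quantize_blockwise(x, block_size=32, bits=4):
    """Round each block of floats to signed `bits`-bit integers.
    A per-block scale constant is kept so the data stays retrievable --
    this constant is the overhead the note describes."""
    levels = 2 ** (bits - 1) - 1  # symmetric signed range, e.g. [-7, 7]
    blocks = x.reshape(-1, block_size)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / levels
    scales[scales == 0] = 1.0  # avoid dividing by zero on all-zero blocks
    codes = np.round(blocks / scales).astype(np.int8)
    return codes, scales

def dequantize(codes, scales):
    # Retrieval is impossible without the stored constants.
    return codes * scales

def overhead_bits_per_value(block_size=32, scale_bits=32):
    # Cost of the constants, amortized across the block.
    return scale_bits / block_size

x = np.random.randn(1024).astype(np.float32)
codes, scales = quantize_blockwise(x)
x_hat = dequantize(codes, scales).reshape(-1)
print(overhead_bits_per_value())  # 1.0 extra bit per value at this block size
```

With a 32-bit scale amortized over blocks of 16 to 32 values, the overhead lands in the 1-to-2-bits-per-number range the note cites; shrinking the blocks improves accuracy but inflates exactly this overhead.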

[[concept-turboquant]] was designed specifically to bypass these inefficiencies. By using [[concept-polar-quantization]] as its first step, Turboquant makes the data structure so predictable that the 'extra bag of instructions' is no longer needed. The residual rounding errors are then cleaned up by the [[concept-qjl]] error-correction step.

Vector Quantization is one of the original quantization techniques in the broader landscape mapped by [[framework-memory-optimization-landscape]].
