www.lesswrong.com/posts/GcbkprYPCjXdysLq4/text-compression-can-help-secure-model...
1 correction found
At 2 bytes per token, that's roughly 500 GB.
This is a math/unit-conversion error: 1 trillion tokens/day × 2 bytes/token ≈ 2 TB/day (≈ 2000 GB), not ~500 GB.
Full reasoning
The post first states ~1 trillion tokens per day. Multiplying by the stated 2 bytes per token gives:
- 1,000,000,000,000 tokens/day × 2 bytes/token = 2,000,000,000,000 bytes/day.
Using standard SI prefixes:
- 1 GB = 10^9 bytes and 1 TB = 10^12 bytes, so
- 2,000,000,000,000 bytes/day = 2×10^12 bytes/day = 2 TB/day = 2000 GB/day.
So the claim that this is “roughly 500 GB” understates the daily volume by about a factor of 4 (2000 GB / 500 GB).
Even if one instead interprets storage using binary units (GiB/TiB), the result is still nowhere near 500 GB: 2×10^12 bytes ≈ 1.82 TiB ≈ 1863 GiB, which is still about 3.7× larger than 500 GiB.
Because this sentence is used as the basis for the follow-on estimate (“250 GB of output tokens per day”), that downstream estimate would inherit the same ~4× undercount under the stated assumptions.
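The arithmetic above can be checked with a short Python sketch. The inputs are the two figures quoted from the post (about 1 trillion tokens per day, 2 bytes per token); the function name `daily_volume` and the constant names are hypothetical, chosen only for this illustration.

```python
# Inputs as quoted from the post (both are the post's own stated assumptions).
TOKENS_PER_DAY = 10**12   # ~1 trillion tokens per day
BYTES_PER_TOKEN = 2       # 2 bytes per token
CLAIMED_GB = 500          # the figure the post gives


def daily_volume(tokens_per_day: int, bytes_per_token: int) -> dict:
    """Return the daily data volume in SI and binary units."""
    total_bytes = tokens_per_day * bytes_per_token
    return {
        "bytes": total_bytes,
        "GB": total_bytes / 10**9,    # SI gigabyte = 10^9 bytes
        "TB": total_bytes / 10**12,   # SI terabyte = 10^12 bytes
        "GiB": total_bytes / 2**30,   # binary gibibyte = 2^30 bytes
        "TiB": total_bytes / 2**40,   # binary tebibyte = 2^40 bytes
    }


volume = daily_volume(TOKENS_PER_DAY, BYTES_PER_TOKEN)
for unit, value in volume.items():
    print(f"{unit}: {value:,.2f}")

# How far off the claimed figure is, using SI gigabytes.
print(f"ratio to claimed {CLAIMED_GB} GB: {volume['GB'] / CLAIMED_GB:.1f}x")
```

Changing `TOKENS_PER_DAY` or `BYTES_PER_TOKEN` shows how sensitive the estimate is to those two assumptions; the 500 GB figure would only follow from roughly a quarter of the stated token volume or bytes per token.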
3 sources
- Terabyte (QUDT vocabulary)
Defines 1 terabyte as 10^12 bytes (1 trillion bytes) and 1000 gigabytes.
- Metric (SI) Prefixes | NIST
NIST’s SI prefix table defines tera (T) as 10^12 and giga (G) as 10^9.
- Definitions of the SI units: The binary prefixes (NIST)
NIST notes 1 GB = 10^9 B and provides examples contrasting SI vs binary multiples (e.g., 1 GiB = 2^30 B).