www.planned-obsolescence.org/p/i-underestimated-ai-capabilities
1 correction found
the measured time horizon for Opus 4.5 on TH 1.1 is ~5h20min.
This uses METR’s older TH1.1 figure. After METR’s March 3, 2026 correction, the public TH1.1 estimate for Claude Opus 4.5 was about 293 minutes (~4h53m), not ~5h20m.
Full reasoning
METR’s initial TH1.1 announcement on January 29, 2026 listed Claude Opus 4.5 at 320 minutes on TH1.1, which corresponds to about 5h20m. However, METR’s live time-horizons page states that it was updated on March 3, 2026, and that the update "Corrected a regularization mistake that affected our measurements."
In METR’s updated public TH1.1 raw data, the Opus 4.5 entry is:
claude_opus_4_5_inspect: ... p50_horizon_length: ... estimate: 292.994594
That is about 293 minutes, or roughly 4h53m, not 5h20m.
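The minutes-to-hours conversion behind both figures can be checked with a minimal Python sketch. `format_minutes` is a hypothetical helper, and rounding to the nearest whole minute is an assumption about how the "~XhYYm" figures were derived.

```python
def format_minutes(minutes: float) -> str:
    """Render a duration in minutes as 'XhYYm', rounded to the nearest minute (assumed convention)."""
    total = round(minutes)              # 292.994594 -> 293
    hours, mins = divmod(total, 60)     # split into whole hours and leftover minutes
    return f"{hours}h{mins:02d}m"

# TH1.1 p50 horizon figures for Claude Opus 4.5 quoted in this report, in minutes.
pre_correction = 320            # January 29, 2026 announcement
post_correction = 292.994594    # updated raw data after the March 3, 2026 correction

print(format_minutes(pre_correction))   # 5h20m
print(format_minutes(post_correction))  # 4h53m
```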
So the article cites a superseded, pre-correction TH1.1 number. Because the post was published on March 5, 2026, two days after METR’s March 3 correction, the accurate TH1.1 figure at publication time was ~4h53m.
3 sources
- Task-Completion Time Horizons of Frontier AI Models - METR
LAST UPDATED March 3, 2026 ... Updates March 3rd, 2026: Corrected a regularization mistake that affected our measurements.
- benchmark_results_1_1.yaml
claude_opus_4_5_inspect: ... p50_horizon_length: ... estimate: 292.994594 ... release_date: 2025-11-24
- Time Horizon 1.1 - METR
Changes to Model Horizon Estimates ... Claude Opus 4.5: 289 [110, 1268] → 320 [170, 729] (+11%)
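As a quick arithmetic check on the METR snippet above, the +11% in the Claude Opus 4.5 row matches the relative change between its two point estimates. This is a minimal Python sketch. `percent_change` is a hypothetical helper, and reading 289 as the earlier value and 320 as the later one is an assumption based on the "+11%" sign.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100

# Point estimates from the Claude Opus 4.5 row (interval bounds ignored).
change = percent_change(289, 320)
print(f"{change:+.0f}%")  # +11%
```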