Substack March 18, 2026 at 11:35 PM

i-underestimated-ai-capabilities

1 correction found

Claim

the measured time horizon for Opus 4.5 on TH 1.1 is ~5h20min.

Correction

This uses METR’s older TH1.1 figure. After METR’s March 3, 2026 correction, the public TH1.1 estimate for Claude Opus 4.5 was about 293 minutes (~4h53m), not ~5h20m.

Full reasoning

METR’s initial TH1.1 announcement on January 29, 2026 listed Claude Opus 4.5 at 320 minutes on TH1.1, which corresponds to about 5h20m. However, METR’s live time-horizons page states that it was updated on March 3, 2026 and that the update "Corrected a regularization mistake that affected our measurements."

In METR’s updated public TH1.1 raw data, the Opus 4.5 entry is:

claude_opus_4_5_inspect
p50_horizon_length
estimate: 292.994594

That is about 293 minutes, or roughly 4h53m, not 5h20m.
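
The minute-to-hour conversions above can be checked with a short sketch. The helper name minutes_to_hm is hypothetical and is not part of METR's tooling; the two inputs are the corrected and original TH1.1 estimates quoted in this report, and rounding to the nearest whole minute is an assumption about how the figures were summarized.

```python
def minutes_to_hm(minutes: float) -> str:
    """Format a duration in minutes as 'XhYYm', rounded to the nearest minute."""
    total = round(minutes)            # assumption: round to whole minutes
    hours, mins = divmod(total, 60)   # split into whole hours and leftover minutes
    return f"{hours}h{mins:02d}m"


# Values quoted in this report (minutes)
corrected_estimate = 292.994594  # post-correction TH1.1 entry for Opus 4.5
original_estimate = 320          # figure from the initial TH1.1 announcement

print(minutes_to_hm(corrected_estimate))
print(minutes_to_hm(original_estimate))
```

The first call gives the corrected figure and the second gives the figure the article relied on.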

So the article uses a superseded, pre-correction TH1.1 number. Because the post was published on March 5, 2026, two days after METR’s March 3 correction, the accurate TH1.1 figure at publication time was ~4h53m.

3 sources

Task-Completion Time Horizons of Frontier AI Models - METR
LAST UPDATED March 3, 2026 ... Updates March 3rd, 2026: Corrected a regularization mistake that affected our measurements.
benchmark_results_1_1.yaml
claude_opus_4_5_inspect: ... p50_horizon_length: ... estimate: 292.994594 ... release_date: 2025-11-24
Time Horizon 1.1 - METR
Changes to Model Horizon Estimates ... Claude Opus 4.5 289 [110,1268] 320 [170,729] +11%

Model: OPENAI_GPT_5 Prompt: v1.16.0