www.planned-obsolescence.org/p/i-underestimated-ai-capabilities
1 correction found
1
Claim
the measured time horizon for Opus 4.5 on TH 1.1 is ~5h20min.
Correction
METR’s published TH 1.1 raw results put Claude Opus 4.5’s 50% time horizon at about 4h53m, not ~5h20m.
Full reasoning
METR publishes the raw results for Time Horizon 1.1 as a YAML file linked from its time-horizons page.
- In METR’s TH 1.1 raw results file, the entry claude_opus_4_5_inspect lists p50_horizon_length: estimate: 292.994594.
- The same file lists claude_opus_4_6_inspect with p50_horizon_length: estimate: 718.80683, which corresponds to ≈12 hours (718.8 minutes ÷ 60 ≈ 11.98 hours). This cross-check strongly indicates the raw p50_horizon_length values are in minutes.
So for Opus 4.5:
- 292.994594 minutes ÷ 60 ≈ 4.88 hours, i.e. about 4h53m.
That contradicts the post’s statement that the measured TH 1.1 time horizon for Opus 4.5 is ~5h20m (≈5.33 hours).
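The unit conversion above can be reproduced with a short sketch. It assumes, as the cross-check suggests, that the raw p50_horizon_length estimates are in minutes; the helper name minutes_to_hm is hypothetical and is not part of METR's tooling.

```python
def minutes_to_hm(minutes):
    """Convert minutes to (hours, minutes), rounded to the nearest minute."""
    return divmod(round(minutes), 60)

# Estimates quoted from the TH 1.1 raw results file (assumed to be in minutes).
OPUS_4_5_MIN = 292.994594
OPUS_4_6_MIN = 718.80683

# The post's claimed figure of ~5h20m, expressed in minutes.
CLAIMED_MIN = 5 * 60 + 20

h45, m45 = minutes_to_hm(OPUS_4_5_MIN)
h46, m46 = minutes_to_hm(OPUS_4_6_MIN)
gap_min = CLAIMED_MIN - OPUS_4_5_MIN

print(f"Opus 4.5: {h45}h{m45:02d}m")
print(f"Opus 4.6: {h46}h{m46:02d}m")
print(f"Claim exceeds the raw estimate by about {gap_min:.0f} minutes")
```

Under that assumption, the Opus 4.5 estimate comes out at 4h53m, roughly 27 minutes short of the claimed ~5h20m.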
2 sources
- METR — TH 1.1 raw results (benchmark_results_1_1.yaml)
... claude_opus_4_5_inspect: ... p50_horizon_length: ... estimate: 292.994594 ...
- METR — Task-Completion Time Horizons of Frontier AI Models
Time Horizon 1.1 (Current) ... Raw data available here ... (page last updated March 3, 2026).