Substack March 6, 2026 at 01:53 AM

i-underestimated-ai-capabilities

1 correction found

Claim

the measured time horizon for Opus 4.5 on TH 1.1 is ~5h20min.

Correction

METR’s published TH 1.1 raw results put Claude Opus 4.5’s 50% time horizon at about 4h53m, not ~5h20m.

Full reasoning

METR publishes the raw results for Time Horizon 1.1 as a YAML file linked from its time-horizons page.

In METR’s TH 1.1 raw results file, the entry claude_opus_4_5_inspect lists p50_horizon_length: estimate: 292.994594.
The same file lists claude_opus_4_6_inspect with p50_horizon_length: estimate: 718.80683, which corresponds to ≈12 hours (718.8 minutes ÷ 60 ≈ 11.98 hours). This cross-check strongly indicates the raw p50_horizon_length values are in minutes.

So for Opus 4.5: 292.994594 minutes ÷ 60 ≈ 4.88 hours, or about 4 hours 53 minutes.

That contradicts the post’s statement that the measured TH 1.1 time horizon for Opus 4.5 is ~5h20m (≈5.33 hours).
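
The unit conversion behind this comparison can be sketched in a few lines of Python. This is only an illustration: the helper name minutes_to_h_m and the variable names are hypothetical, the two estimates are copied from the raw results quoted above, and the sketch assumes (as the reasoning argues) that p50_horizon_length is expressed in minutes.

```python
# Illustrative helper (not part of METR's tooling): convert a duration in
# minutes into whole hours and minutes.
def minutes_to_h_m(minutes: float) -> tuple[int, int]:
    # Round to the nearest whole minute, then split into (hours, minutes).
    return divmod(round(minutes), 60)

# Estimates quoted from the TH 1.1 raw results. Treating them as minutes
# is an assumption based on the cross-check described in the reasoning.
opus_4_5_p50 = 292.994594
opus_4_6_p50 = 718.80683

# The post's claimed figure of ~5h20m, expressed in minutes.
claimed_minutes = 5 * 60 + 20

h, m = minutes_to_h_m(opus_4_5_p50)
print(f"Opus 4.5: {h}h{m:02d}m")  # prints "Opus 4.5: 4h53m"

h, m = minutes_to_h_m(opus_4_6_p50)
print(f"Opus 4.6: {h}h{m:02d}m")  # prints "Opus 4.6: 11h59m"

# Gap between the claimed and the published Opus 4.5 value, in minutes.
gap = claimed_minutes - opus_4_5_p50
print(f"Claim exceeds published estimate by about {gap:.0f} minutes")
```

Under that assumption, the post's figure overstates the published Opus 4.5 estimate by roughly 27 minutes.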

2 sources

METR — TH 1.1 raw results (benchmark_results_1_1.yaml)
... claude_opus_4_5_inspect: ... p50_horizon_length: ... estimate: 292.994594 ...
METR — Task-Completion Time Horizons of Frontier AI Models
Time Horizon 1.1 (Current) ... Raw data available here ... (page last updated March 3, 2026).

Model: OPENAI_GPT_5 Prompt: v1.6.0