Substack March 6, 2026 at 01:53 AM

www.planned-obsolescence.org/p/i-underestimated-ai-capabilities

1 correction found

1
Claim
the measured time horizon for Opus 4.5 on TH 1.1 is ~5h20min.
Correction

METR’s published TH 1.1 raw results put Claude Opus 4.5’s 50% time horizon at about 4h53m, not ~5h20m.

Full reasoning

METR publishes the raw results for Time Horizon 1.1 as a YAML file linked from its time-horizons page.

  • In METR’s TH 1.1 raw results file, the entry claude_opus_4_5_inspect lists p50_horizon_length: estimate: 292.994594.
  • The same file lists claude_opus_4_6_inspect with p50_horizon_length: estimate: 718.80683, which corresponds to ≈12 hours if the value is read as minutes (718.8 ÷ 60 ≈ 11.98 hours). This cross-check strongly indicates that the raw p50_horizon_length values are in minutes.

So for Opus 4.5:

  • 292.994594 minutes ÷ 60 ≈ 4.88 hours, i.e. 4 hours plus 0.88 × 60 ≈ 53 minutes, or about 4h53m.

That contradicts the post’s statement that the measured TH 1.1 time horizon for Opus 4.5 is ~5h20m (≈5.33 hours, or 320 minutes), which is about 27 minutes higher than the published estimate.
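
The unit conversion above can be re-checked with a few lines of Python. This is only a sketch: the two estimates are copied from the figures quoted in this correction, the function and constant names are hypothetical, and the values are assumed to be in minutes, as the cross-check suggests.

```python
# Values copied from the correction text; all names here are hypothetical
# and chosen only for illustration. Units are ASSUMED to be minutes.
OPUS_4_5_P50_MIN = 292.994594   # claude_opus_4_5_inspect p50 estimate
OPUS_4_6_P50_MIN = 718.80683    # claude_opus_4_6_inspect p50 estimate
CLAIMED_MIN = 5 * 60 + 20       # the post's ~5h20m, i.e. 320 minutes


def to_hours_minutes(minutes: float) -> tuple[int, int]:
    """Round to the nearest whole minute, then split into (hours, minutes)."""
    return divmod(round(minutes), 60)


def fmt(minutes: float) -> str:
    """Format a duration in minutes as e.g. '4h53m'."""
    hours, mins = to_hours_minutes(minutes)
    return f"{hours}h{mins:02d}m"


print(fmt(OPUS_4_5_P50_MIN))                    # 4h53m
print(fmt(OPUS_4_6_P50_MIN))                    # 11h59m
print(round(CLAIMED_MIN - OPUS_4_5_P50_MIN))    # 27 (minutes of discrepancy)
```

Rounding to the nearest minute before splitting avoids outputs such as "4h60m" that can appear when hours and minutes are rounded separately.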

Model: OPENAI_GPT_5 Prompt: v1.6.0