X post · April 8, 2026 at 04:26 PM

x.com/TheZvi/status/2041841249364140293

1 correction found

1
Claim
They accidentally trained against the CoT for Opus 4.6, Sonnet 4.6 and Mythos for 8% of RL.
Correction

The 8% figure does not apply to Opus 4.6. Anthropic’s official Opus 4.6 system card says a technical error caused the reward signal to consider scratchpad (CoT) content on less than 0.01% of Opus 4.6 training episodes, not 8%.

Full reasoning

Anthropic’s own documentation contradicts the claim as written, at least for Opus 4.6.

According to Anthropic’s official Claude Opus 4.6 system card, the relevant training bug for Opus 4.6 was: “A technical error led to the reward signal considering scratchpad content on <0.01% of training episodes.” That is far below 8%.

So the post is incorrect to say that Opus 4.6 was trained against chain-of-thought for “8% of RL.” At minimum, it applies the 8% figure too broadly: whatever that number refers to, it does not describe Opus 4.6.

Anthropic’s Transparency Hub also identifies the system card quoted above as the official system card for Claude Opus 4.6. Taken together, these sources show that the 8% figure does not describe how often Opus 4.6’s scratchpad content was exposed to the reward signal during training.

2 sources
  • System Card (Claude Opus 4.6)

    As with the Claude 4.5 models, we attempted to avoid any direct influences on scratchpad content during reinforcement learning training. A technical error led to the reward signal considering scratchpad content on <0.01% of training episodes...

  • Anthropic’s Transparency Hub

    Claude Opus 4.6 Summary Table ... See our Claude Opus 4.6 system card ... The following are summaries of key safety evaluations from our Claude Opus 4.6 system card.

Model: OPENAI_GPT_5 Prompt: v1.16.0