x.com/TheZvi/status/2041841249364140293
1 correction found
They accidentally trained against the CoT for Opus 4.6, Sonnet 4.6 and Mythos for 8% of RL.
This applies the 8% figure too broadly. Anthropic’s official Opus 4.6 system card says the reward signal considered scratchpad (CoT) content on less than 0.01% of Opus 4.6 training episodes, not 8%.
Full reasoning
Anthropic’s own documentation contradicts the claim as written.
According to Anthropic’s official Claude Opus 4.6 system card, the relevant training bug for Opus 4.6 was: “A technical error led to the reward signal considering scratchpad content on <0.01% of training episodes”, not 8%.
So the post is incorrect to say that Opus 4.6 was trained against chain-of-thought for “8% of RL.” At minimum, it applies the 8% number too broadly.
Anthropic’s transparency hub also identifies that document as the official system card for Claude Opus 4.6. Taken together, these sources show that the 8% figure does not describe Opus 4.6’s training exposure.
2 sources
- System Card (Claude Opus 4.6)
As with the Claude 4.5 models, we attempted to avoid any direct influences on scratchpad content during reinforcement learning training. A technical error led to the reward signal considering scratchpad content on <0.01% of training episodes...
- Anthropic’s Transparency Hub
Claude Opus 4.6 Summary Table ... See our Claude Opus 4.6 system card ... The following are summaries of key safety evaluations from our Claude Opus 4.6 system card.