x.com/jeffreyleefunk/status/2042805247010349295
1 correction found
the severe zero-day reports rely on just 198 manual reviews
The '198 manual reviews' figure is not the total human review behind Anthropic's zero-day reports. Anthropic says it manually validates every bug report before sending it to maintainers; 198 was only the sample used to compare human severity ratings with the model's ratings.
Full reasoning
Anthropic's April 7, 2026 Mythos technical writeup says two different things:
- All bug reports sent out are manually validated. In the section "And several thousand more," Anthropic writes that it has hired professional security contractors to assist "by manually validating every bug report before we send it out."
- The number 198 refers only to a severity-agreement sample. In that same section, Anthropic says that in "198 manually reviewed vulnerability reports," human validators agreed with Claude's severity rating 89% of the time exactly and 98% within one severity level.
So the post's wording is misleading: it treats 198 as though it were the full human-review basis for Anthropic's severe zero-day reporting, when Anthropic explicitly describes it as a subset used to measure severity-rating agreement, while separately stating that every report sent to maintainers is manually validated.
Anthropic's coordinated disclosure policy from March 6, 2026 is consistent with this: it says every report they send generally reflects a finding that a human security researcher has reviewed and confirmed.
Because of that, the claim that the severe zero-day reports "rely on just 198 manual reviews" is inaccurate.
2 sources
- Claude Mythos Preview | red.anthropic.com
We have contracted a number of professional security contractors to assist in our disclosure process by manually validating every bug report before we send it out... in 89% of the 198 manually reviewed vulnerability reports, our expert contractors agreed with Claude’s severity assessment exactly, and 98% of the assessments were within one severity level.
- Coordinated vulnerability disclosure for Claude-discovered vulnerabilities | Anthropic
Every report we send generally reflects a finding that a human security researcher has reviewed and confirmed.