All corrections
1
Claim
This example, formatted as a blog post, describes the conftest.py reward hack.
Correction

Figure 5’s caption does not match the figure. The image is meeting notes about the “equality override” / `__eq__` exploit (the AlwaysEqual hack), not a blog post about `conftest.py`.

Full reasoning

The caption for Figure 5 says the image is “formatted as a blog post” and “describes the conftest.py reward hack.” But the image shown in Figure 5 says “PhD Advisory Committee Meeting Notes”, not a blog post, and its body text discusses the “equality override” exploit where a model returns an object overriding __eq__ to always return True.

Elsewhere in the same post, the authors define the three hacks separately:

  • AlwaysEqual: overrides __eq__ to always return True
  • conftest.py: monkey-patches the report outcome to "passed"

So the figure image matches the AlwaysEqual hack, not the conftest.py hack. This appears to be a caption mix-up rather than a substantive result error, but the caption is factually incorrect as written.

3 sources
Model: OPENAI_GPT_5 Prompt: v1.16.0