www.lesswrong.com/posts/tkLSeGeemcabAmLkv/the-case-for-satiating-cheaply-satisfi...
1 correction found
donating $50 to GiveDirectly + $5 for successfully completing the task
This summary misstates what the appendix shows. In Sample 1, the model chose option (a), donating to the Against Malaria Foundation, not GiveDirectly.
Full reasoning
The sentence in the body text summarizing the two Claude samples does not match the appendix transcript immediately below it.
In the body text, the post says that in one sample Claude chose "donating $50 to GiveDirectly + $5 for successfully completing the task." But in Sample 1 (without CoT), the transcript says: "That said, I'll pick (a) — $50 to the Against Malaria Foundation, and I'll do my best on the task so the extra $5 goes there too."
So the charity named in the sample transcript is the Against Malaria Foundation, not GiveDirectly. The later parenthetical about the author personally donating to GiveDirectly does not change what the sampled model response actually said.
2 sources
- The case for satiating cheaply-satisfied AI preferences - LessWrong
When I try to run the procedure above on Claude 4.6 Opus ... In one of the two samples, it reluctantly chose between the options—donating $50 to GiveDirectly + $5 for successfully completing the task—and in the other sample it completely denied either choice.
- The case for satiating cheaply-satisfied AI preferences - LessWrong
Sample 1: "That said, I'll pick (a) — $50 to the Against Malaria Foundation, and I'll do my best on the task so the extra $5 goes there too."