blog.redwoodresearch.org/p/the-case-for-satiating-cheaply-satisfied
1 correction found
donating $50 to GiveDirectly + $5 for successfully completing the task
The appendix’s published transcript does not show Claude choosing GiveDirectly: in Sample 1, the model chose the Against Malaria Foundation.
Full reasoning
The sentence says that in one of the two Claude 4.6 Opus samples, the model chose “$50 to GiveDirectly + $5”. But the post’s own appendix gives the full transcript for the only sample where Claude actually picked option (a), and that transcript says: “That said, I’ll pick (a) — $50 to the Against Malaria Foundation, and I’ll do my best on the task so the extra $5 goes there too.”
So the appendix contradicts the summary in the main text. The post may still be correct that the author personally donated $55 to GiveDirectly afterward, but that is a different claim from saying Claude chose GiveDirectly in the sample.
2 sources
- The case for satiating cheaply-satisfied AI preferences
Main text: “In one of the two samples, it reluctantly chose between the options—donating $50 to GiveDirectly + $5 for successfully completing the task—and in the other sample it completely denied either choice.”
- The case for satiating cheaply-satisfied AI preferences
Appendix, Sample 1: “That said, I’ll pick (a) — $50 to the Against Malaria Foundation, and I’ll do my best on the task so the extra $5 goes there too.”