LessWrong June 2, 2026 at 07:19 PM

dissolving-the-deep-learning-sample-ef...

1 correction found

Claim

prior VPT and behavioral-cloning baselines which used 100x more data.

Correction

The 100×-more-data comparison applies to OpenAI’s VPT baseline, not to the behavioral-cloning (BC) baselines in the Dreamer 4 paper. The paper says the BC baselines use the same contractor dataset as Dreamer 4, while only VPT used the much larger annotated-YouTube dataset.

Full reasoning

In the Dreamer 4 paper, the “100× less data” claim is made specifically against OpenAI’s VPT offline agent, not against the paper’s behavioral-cloning baselines.

The paper’s Figure 3 caption says: “All methods have access to the same contractor dataset” and then separately says “Dreamer 4 substantially outperforms OpenAI’s VPT offline agent while using 100× less data.”

The methods section then distinguishes the baselines:

- VPT (finetuned) is described as trained on 270K hours of synthetically annotated YouTube gameplay videos.
- BC (notask) is described as training directly and only on the relevant subset of the contractor actions.
- BC is described as trained on the same filtered contractor dataset as BC (notask).

So the article overgeneralizes the 100×-more-data comparison from VPT to the behavioral-cloning baselines. The Dreamer 4 paper supports saying that Dreamer 4 beat VPT while using 100× less data, but it does not support saying that the behavioral-cloning baselines used 100× more data.

1 source

Training Agents Inside of Scalable World Models
Figure 3: “All methods have access to the same contractor dataset”… “Dreamer 4 substantially outperforms OpenAI’s VPT offline agent while using 100× less data.” Later, the paper defines BC baselines: “BC (notask)… trains directly and only on the relevant subset of the contractor actions” and “BC… is trained on the same filtered contractor dataset as BC (notask).”

Model: OPENAI_GPT_5 Prompt: v1.16.0