www.lesswrong.com/posts/Ge55vxEmKXunFFwoe/reward-hacking-behavior-can-generalize...
1 correction found
We fine-tune gpt-3.5-0613-turbo through the OpenAI API using default hyperparameters on approximately 2000 examples of prompt/scratchpad completions.
The model name here is reversed. OpenAI’s model ID is `gpt-3.5-turbo-0613`, not `gpt-3.5-0613-turbo`.
Full reasoning
This sentence uses an incorrect OpenAI model identifier.
OpenAI’s official naming for the June 2023 GPT‑3.5 Turbo snapshot is `gpt-3.5-turbo-0613`. The post itself uses that spelling elsewhere (for example in the experiment settings sections), but this sentence says `gpt-3.5-0613-turbo`, which puts the components in the wrong order.
That matters because model IDs are exact strings in the OpenAI API; `gpt-3.5-0613-turbo` is not a documented model name, so a request using it would not match any available model.
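To illustrate the constraint (this is a minimal sketch, not code from the post, assuming the `openai` Python SDK v1.x and a placeholder training-file ID), a fine-tuning request must pass the exact documented model ID:

```python
# Minimal sketch of a fine-tuning request with the openai Python SDK (v1.x).
# The training file ID below is a placeholder; the point is that `model`
# must be the exact documented string, here `gpt-3.5-turbo-0613`.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

job = client.fine_tuning.jobs.create(
    training_file="file-abc123",   # placeholder ID of an uploaded JSONL training file
    model="gpt-3.5-turbo-0613",    # exact model ID; "gpt-3.5-0613-turbo" would be rejected
)
print(job.id, job.status)
```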
2 sources
- Function calling and other API updates | OpenAI
OpenAI’s June 2023 update lists the model as `gpt-3.5-turbo-0613`: “`gpt-3.5-turbo-0613` includes the same function calling as GPT‑4...”
- Reward hacking behavior can generalize across tasks - LessWrong
Elsewhere in the same post, the authors refer to the model as `gpt-3.5-turbo-0613`, e.g. “All expert iteration experiments in this report are done on gpt-3.5-turbo-0613...” and “All fine-tuning in this report is done on gpt-3.5-turbo-0613...”