www.lesswrong.com/posts/Ge55vxEmKXunFFwoe/reward-hacking-behavior-can-generalize...
1 correction found
We fine-tune gpt-3.5-0613-turbo through the OpenAI API using default hyperparameters on approximately 2000 examples of prompt/scratchpad completions.
The model name here is reversed. OpenAI’s model ID is `gpt-3.5-turbo-0613`, not `gpt-3.5-0613-turbo`.
Full reasoning
This sentence uses an incorrect OpenAI model identifier.
OpenAI’s official naming for the June 2023 GPT‑3.5 Turbo snapshot is `gpt-3.5-turbo-0613`. The post itself uses that spelling elsewhere (for example in the experiment settings sections), but this sentence says `gpt-3.5-0613-turbo`, which puts the components in the wrong order.
That matters because model IDs are exact strings in the OpenAI API; `gpt-3.5-0613-turbo` is not a documented model name, so a request using it would not match any available model.
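To illustrate the constraint (this is a minimal sketch, not code from the post, assuming the `openai` Python SDK v1.x and a placeholder training-file ID), a fine-tuning request must pass the exact documented model ID:

```python
# Minimal sketch of a fine-tuning request with the openai Python SDK (v1.x).
# The training file ID below is a placeholder; the point is that `model`
# must be the exact documented string, here `gpt-3.5-turbo-0613`.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

job = client.fine_tuning.jobs.create(
    training_file="file-abc123",   # placeholder ID of an uploaded JSONL training file
    model="gpt-3.5-turbo-0613",    # exact model ID; "gpt-3.5-0613-turbo" would be rejected
)
print(job.id, job.status)
```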
2 sources
- Function calling and other API updates | OpenAI
OpenAI’s June 2023 update lists the model as `gpt-3.5-turbo-0613`: “`gpt-3.5-turbo-0613` includes the same function calling as GPT‑4...”
- Reward hacking behavior can generalize across tasks - LessWrong
Elsewhere in the same post, the authors refer to the model as `gpt-3.5-turbo-0613`, e.g. “All expert iteration experiments in this report are done on gpt-3.5-turbo-0613...” and “All fine-tuning in this report is done on gpt-3.5-turbo-0613...”