www.latent.space/p/ainews-the-high-return-activity-of
1 correction found
The Qwen3.5-9B model is noted for its impressive performance, benchmarking around the level of GPT-3’s 120B model.
OpenAI introduced GPT-3 as a family of models topping out at a 175B-parameter flagship; the official paper describes GPT-3 itself as a 175B model and defines no 'GPT-3 120B' variant.
Full reasoning
The comparison is misstated: GPT-3 is not a 120B model in OpenAI’s official paper. The paper explicitly describes GPT-3 as a 175-billion-parameter language model, and the eight model sizes it reports run from 125M up to the 175B flagship, with no 120B variant among them.
So even if the intended comparison was to some other 120B-class model, the specific wording "GPT-3’s 120B model" is inaccurate: it attributes to GPT-3 a parameter count that appears nowhere in OpenAI’s published model description.
2 sources
- [AINews] The high-return activity of raising your aspirations for LLMs
  "The Qwen3.5-9B model is noted for its impressive performance, benchmarking around the level of GPT-3’s 120B model, which is surprising given its smaller size."
- Language Models are Few-Shot Learners
  "We train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model."