x.com/Dorialexander/status/2041817479488389324
1 correction found
Claim: “there is in all likelihood no model deployed with more active parameters than 2020 GPT-3.”
This is contradicted by Meta’s own documentation: Meta’s announcement states that Meta AI on WhatsApp and meta.ai can use Llama 405B, and Meta’s Llama 3 paper describes that model as a dense Transformer with 405B parameters. Since GPT-3 had 175B parameters, a deployed dense 405B-parameter model has more active parameters than 2020 GPT-3.
Full reasoning
OpenAI’s original GPT-3 paper states that GPT-3 was trained as a 175 billion parameter autoregressive language model.
Meta then announced that Meta AI on WhatsApp and meta.ai can use Llama 405B: “You now have the option to use our largest and most advanced open-source model inside of Meta AI on WhatsApp and meta.ai. Llama 405B…” That means the model is not merely released for download; it is deployed inside a live consumer product.
Meta’s Llama 3 technical paper further states: “Our largest model is a dense Transformer with 405B parameters.” For a dense model, all parameters are active during inference, so its active parameter count is 405B.
That gives a direct counterexample to the post’s claim: a deployed model (Llama 3.1 405B, serving Meta AI) has 405B active parameters, well above GPT-3’s 175B. The statement that there is “in all likelihood no model deployed with more active parameters than 2020 GPT-3” is therefore incorrect.
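The arithmetic behind the comparison can be sketched as follows. This is an illustrative toy, not from the original post: the `active_params` helper and its simplified mixture-of-experts handling (assuming all parameters sit in expert layers) are assumptions introduced here to make the dense-vs-sparse distinction concrete.

```python
def active_params(total_params: float, num_experts: int = 1,
                  experts_per_token: int = 1) -> float:
    """Active parameters per token (simplified sketch).

    Dense model (num_experts=1): every parameter participates in each
    forward pass, so active == total. Sparse MoE (simplified, assuming
    all parameters live in expert layers): only the routed experts'
    share of the parameters is active per token.
    """
    return total_params * experts_per_token / num_experts

GPT3 = 175e9          # dense, 175B parameters (OpenAI, 2020)
LLAMA3_405B = 405e9   # dense, 405B parameters (Meta, 2024)

# A deployed dense 405B model activates more parameters per token
# than 2020 GPT-3 did.
assert active_params(LLAMA3_405B) > active_params(GPT3)
```

Because both models are dense, their active counts equal their total counts, so the 405B > 175B comparison carries over directly; the `num_experts` parameter only matters for sparse models, where active counts can be far below total counts.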
3 sources
- Language Models are Few-Shot Learners
Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters...
- Meta AI is Now Multilingual, More Creative and Smarter
You now have the option to use our largest and most advanced open-source model inside of Meta AI on WhatsApp and meta.ai. Llama 405B's improved reasoning capabilities...
- The Llama 3 Herd of Models
Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens.