All corrections
X April 9, 2026 at 06:07 PM

x.com/chatgpt21/status/2042135575390572654

1 correction found

1. Claim
17.5 PFLOPS per GPU (at FP4 peak), 100k GPUs still means 4 days for 1T tokens and 40 days for 10T tokens at perfect utilization
Correction

NVIDIA’s own Rubin specs are about 35 PFLOPS per GPU for dense NVFP4 training, not 17.5. Using the same 100T-parameter/6ND training math, that cuts the estimate roughly in half to about 2 days for 1T tokens and 20 days for 10T tokens at perfect utilization.

Full reasoning

This sentence understates Rubin's published FP4 training performance by about 2×.

NVIDIA's January 2026 technical blog lists Rubin GPU NVFP4 training = 35 PFLOPS per GPU for dense compute, and NVFP4 inference = 50 PFLOPS per GPU. Separately, NVIDIA's HGX Rubin NVL8 specs list 280 PFLOPS NVFP4 training for an 8-GPU system, which is again 35 PFLOPS per GPU. That directly contradicts the post's claim of 17.5 PFLOPS per GPU.

Because the post's training-time estimate is derived from that per-GPU throughput, the quoted 4-day / 40-day figures are also off by about 2× under the same assumptions. Using the standard dense-transformer compute estimate the post implicitly relies on (total FLOPs ≈ 6 × parameters × tokens):

  • 6 × 100T parameters × 1T tokens = 6 × 10^14 × 10^12 = 6 × 10^26 FLOPs
  • 100,000 Rubin GPUs at 35 PFLOPS/GPU provide 100,000 × 35 × 10^15 = 3.5 × 10^21 FLOPs/s
  • 6 × 10^26 / 3.5 × 10^21 ≈ 171,429 seconds ≈ 1.98 days

For 10T tokens, that scales linearly to about 19.8 days, not 40 days.
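The arithmetic above can be sketched as a short script (a minimal illustration; the `training_days` helper and its figures, 100T parameters, 100,000 GPUs, 35 PFLOPS/GPU, are taken from the correction, not from NVIDIA's materials):

```python
def training_days(params, tokens, gpus, pflops_per_gpu):
    """Days to train at perfect utilization under the 6ND rule
    (total FLOPs ~ 6 * parameters * tokens)."""
    total_flops = 6 * params * tokens
    cluster_flops_per_s = gpus * pflops_per_gpu * 1e15  # PFLOPS -> FLOPs/s
    return total_flops / cluster_flops_per_s / 86_400   # seconds per day

# Figures from the correction: 100T params, 100k Rubin GPUs, 35 PFLOPS each.
print(f"1T tokens:  {training_days(100e12, 1e12, 100_000, 35):.2f} days")
print(f"10T tokens: {training_days(100e12, 10e12, 100_000, 35):.2f} days")
```

Running it reproduces roughly 1.98 and 19.84 days; plugging in the post's 17.5 PFLOPS instead recovers its 4-day / 40-day figures, confirming the 2× discrepancy comes entirely from the per-GPU number.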

So the numerical claim is incorrect on two levels: the Rubin per-GPU FP4 training figure is wrong, and the derived time estimates are therefore about double what NVIDIA's published specs imply.

Model: OPENAI_GPT_5 Prompt: v1.16.0