X April 8, 2026 at 10:53 PM

x.com/bindureddy/status/2042001592027877708

1 correction found

1
Claim
On SWE-BENCH pro it score 99.99
Correction

That exact SWE-Bench Pro score is not possible. SWE-Bench Pro has 1,865 problems, so reported Pass@1 scores move in steps of about 0.0536 percentage points, and no whole number of solved problems yields 99.99%.

Full reasoning

The exact figure 99.99 is incompatible with how SWE-Bench Pro is defined and scored.

  • The official SWE-Bench Pro paper says the benchmark contains 1,865 problems.
  • Scale's official page describes model performance on SWE-Bench Pro as Pass@1, i.e. the percentage of benchmark problems solved.

Because the benchmark has 1,865 discrete problems, the score can only change in increments of 100 / 1865 ≈ 0.0536 percentage points per problem. An exact 99.99% would require solving 0.9999 × 1865 = 1864.8135 problems, which is not a whole number, so an exact 99.99% score is not attainable.
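The increment argument can be checked with exact rational arithmetic. This is a minimal sketch, assuming Pass@1 is simply 100 × solved / 1865 with no partial credit; the helper names `step_size` and `solved_count_for` are hypothetical and only illustrate the arithmetic.

```python
from fractions import Fraction

N_PROBLEMS = 1865  # SWE-Bench Pro size, as stated in the paper cited above

def step_size(n: int) -> Fraction:
    """Exact score increment, in percentage points, for one problem out of n."""
    return Fraction(100, n)

def solved_count_for(score_pct: Fraction, n: int) -> Fraction:
    """Number of solved problems implied by an exact percentage score."""
    return score_pct * n / 100

step = step_size(N_PROBLEMS)
implied = solved_count_for(Fraction("99.99"), N_PROBLEMS)

print(f"step ≈ {float(step):.4f} percentage points")
# A valid score must imply a whole number of solved problems (denominator 1).
print(f"99.99% implies {float(implied)} solved problems; whole number: {implied.denominator == 1}")
```

Using `Fraction` instead of floats keeps the "is it a whole number?" test exact, so the result does not depend on floating-point rounding.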

At the top of the range, the two highest attainable scores are:

  • 1864 / 1865 = 99.946...%, which rounds to 99.95%
  • 1865 / 1865 = 100.00%
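To confirm that no attainable score even rounds to 99.99%, one can enumerate every possible solved count and test it against the two-decimal rounding interval. This is a sketch under stated assumptions: it assumes conventional round-half-up reporting to two decimal places (round-half-to-even would only differ at exact interval boundaries, which no score here hits), and `scores_rounding_to` is a hypothetical helper.

```python
from fractions import Fraction

N_PROBLEMS = 1865  # SWE-Bench Pro size

def scores_rounding_to(target: str, n: int, places: int = 2) -> list[int]:
    """Return every solved count k whose exact score 100*k/n lies in the
    rounding interval [target - h, target + h), where h is half a unit
    in the last reported decimal place (round-half-up assumption)."""
    t = Fraction(target)
    half = Fraction(1, 2 * 10**places)
    lo, hi = t - half, t + half
    return [k for k in range(n + 1) if lo <= Fraction(100 * k, n) < hi]

# The two highest attainable scores, matching the list above.
for k in (N_PROBLEMS - 1, N_PROBLEMS):
    print(k, f"{float(Fraction(100 * k, N_PROBLEMS)):.3f}%")

print(scores_rounding_to("99.99", N_PROBLEMS))  # no solved count qualifies
print(scores_rounding_to("99.95", N_PROBLEMS))  # only 1864 qualifies
```

The interval [99.985, 99.995) sits entirely inside the gap between 99.946% and 100%, which is why the search comes back empty for 99.99.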

Since 99.99% falls strictly between these two values, the post's quoted SWE-Bench Pro number is not just unlikely; it is mathematically inconsistent with the benchmark's published size and scoring format.

As extra context, Scale's published evaluation page reports a best score of 23.3% in its unified evaluation, far below 99.99%. The key correction, however, is the stronger one: 99.99 is not an attainable SWE-Bench Pro Pass@1 score at all.

2 sources
Model: OPENAI_GPT_5 Prompt: v1.16.0