The Stockfish ELO problem

Rebel wrote: ↑Sun Aug 07, 2022 8:21 pm
http://www.computerchess.org.uk/ccrl/40 ... ons_only=1
Meaning that at longer time controls and with more threads Komodo can catch up and overtake you? Oh wait, it already happened!

Sopel wrote: ↑Sun Aug 07, 2022 9:43 pm
You can come up with any result with a flawed enough methodology. This has the same issues as CCRL.

RubiChess wrote: ↑Sun Aug 07, 2022 10:02 pm
The main issue in this rating list seems to be that SF15/4threads wasn't tested, only 14.1. At least, SF15/4CPU is not mentioned in http://www.cegt.net/40_40%20Rating%20Li ... liste.html

But as this list also uses a moves/time control, I want to mention this https://github.com/official-stockfish/S ... ssues/4000 again.

Regards, Andreas
-
Graham Banks
- Posts: 45075
- Joined: Sun Feb 26, 2006 10:52 am
- Location: Auckland, NZ
Re: The Stockfish ELO problem
gbanksnz at gmail.com
-
jkominek
- Posts: 98
- Joined: Tue Sep 04, 2018 5:33 am
- Full name: John Kominek
Re: The Stockfish ELO problem
A question for Ed. I downloaded the games played from your Gambit Rating List competition (mainbase-40-2.pgn, dated Dec 15 2021) and processed it to extract the book moves. I find 92 unique lines. Since you typically play 200 games per encounter, either your book is under-specified and these matches play 8 duplicate pairs, or I've done something wrong and am missing 8 lines. To cross-check my analysis I looked around for the book on your Rebel web site but could not spot it. Do you have a pgn of your Gambit book posted online?

Rebel wrote:
CCRL and CEGT rely on normal openings, TCEC does not. Other examples using unusual openings are the lists of Stefan Pohl and my own GRL. All 3 don't show the CCRL / CEGT pattern and have no problem showing significant Elo progress (example). Unusual openings favor Stockfish search.

By my counting grl-20-cores.pgn contains 91 unique opening lines, none novel to the main file.
-
Rebel
- Posts: 7430
- Joined: Thu Aug 18, 2011 12:04 pm
- Full name: Ed Schröder
Re: The Stockfish ELO problem
jkominek wrote: ↑Wed Aug 24, 2022 8:09 am
A question for Ed. I downloaded the games played from your Gambit Rating List competition (mainbase-40-2.pgn, dated Dec 15 2021) and processed it to extract the book moves. I find 92 unique lines. Since you typically play 200 games per encounter, either your book is under-specified and these matches play 8 duplicate pairs, or I've done something wrong and am missing 8 lines. To cross-check my analysis I looked around for the book on your Rebel web site but could not spot it. Do you have a pgn of your Gambit book posted online?
By my counting grl-20-cores.pgn contains 91 unique opening lines, none novel to the main file.

http://rebel13.nl/gambits-100.pgn

My double-check PGN util says: 0 doubles.
90% of coding is debugging, the other 10% is writing bugs.
-
jkominek
- Posts: 98
- Joined: Tue Sep 04, 2018 5:33 am
- Full name: John Kominek
Re: The Stockfish ELO problem
Thank you very much Ed!
It could be that your PGN utility is configured to look for exact doubles, including Event, Site and Date fields. This would be a problem if you pulled the lines from multiple sources. One example of a duplicate pair is Round 50 and Round 68:
Code: Select all
[Event "YAT"]
[Site "Deventer"]
[Date "2021.04.21"]
[Round "50"]
[White ""]
[Black ""]
[Result "*"]
[BlackElo ""]
[WhiteElo ""]
1. d4 d5 2. c4 c6 3. Nf3 Nf6 4. Nc3 e6 5. Bg5 dxc4 6. e4 b5 7. e5 h6 8.
Bh4 g5 9. Nxg5 hxg5 10. Bxg5 Nbd7 11. exf6 Bb7 12. g3 Qb6 13. Bg2 O-O-O
14. O-O c5 15. d5 b4 16. Rb1 *
[Event "Noomen Twenty Gambits 2016"]
[Site "Netherlands"]
[Date "2021.00.49"]
[Round "68"]
[White "Semi-Slav"]
[Black "Botwinnik variation"]
[Result "*"]
[BlackElo ""]
[WhiteElo ""]
1. d4 d5 2. c4 c6 3. Nf3 Nf6 4. Nc3 e6 5. Bg5 dxc4 6. e4 b5 7. e5 h6 8.
Bh4 g5 9. Nxg5 hxg5 10. Bxg5 Nbd7 11. exf6 Bb7 12. g3 Qb6 13. Bg2 O-O-O
14. O-O c5 15. d5 b4 16. Rb1 *

The duplicated lines, condensed to moves only:

Code: Select all
d4 d5 c4 c6 Nf3 Nf6 Nc3 e6 Bg5 dxc4 e4 b5 e5 h6 Bh4 g5 Nxg5 hxg5 Bxg5 Nbd7 exf6 Bb7 g3 Qb6 Bg2 O-O-O O-O c5 d5 b4 Rb1
d4 d5 c4 e6 Nc3 c6 e4 dxe4 Nxe4 Bb4+ Bd2 Qxd4 Bxb4 Qxe4+
d4 Nf6 Bg5 Ne4 Bf4 c5 f3 Qa5+ c3 Nf6 d5 Qb6 e4 Qxb2 Nd2 Qxc3 Bc7
d4 Nf6 c4 c5 d5 b5 cxb5 a6 f3 e6 e4 exd5 e5 Qe7 Qe2 Ng8 Nc3 Bb7 Nh3 c4
d4 Nf6 c4 c5 d5 e6 Nf3 b5 dxe6 fxe6 cxb5 a6 e3 Be7
e4 c6 d4 d5 f3 dxe4 fxe4 e5 Nf3 exd4 Bc4
e4 d5 exd5 Nf6 d4 Bg4 f3 Bf5 c4 e6 dxe6 Nc6 Be3 Qe7
e4 e5 Nf3 Nc6 Bb5 f5 Nc3 fxe4 Nxe4 d5 Nxe5 dxe4 Nxc6 Qg5 Qe2 Nf6 f4 Qxf4 Nxa7+ Bd7 Bxd7+ Kxd7 Qb5+ Ke6
e4 e5 Nf3 Nc6 Bc4 Nf6 Ng5 d5 exd5 Na5 Bb5+ c6 dxc6 bxc6 Be2 h6 Nh3
e4 e6 d4 d5 Nc3 Bb4 Qg4 Nf6 Qxg7 Rg8 Qh6 c5
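Not from the original post, but a movetext-only duplicate check of this sort is easy to script with standard tools. A sketch with awk, assuming the games in the file are separated by blank lines as in standard PGN:

Code: Select all
# Print each game's movetext as one line, ignoring the [Event]/[Site]/
# [Date] header tags, then report lines that occur more than once.
# RS="" (paragraph mode) makes every blank-line-separated block one
# record; movetext blocks are the ones not starting with '['.
awk 'BEGIN { RS = "" } !/^\[/ { gsub(/\n/, " "); print }' gambits-100.pgn |
  sort | uniq -c | awk '$1 > 1'
# Should list the 10 duplicated lines, assuming each occurs exactly twice
# (101 games = 81 singletons + 10 pairs = 91 unique lines).

The grep count below shows the book file indeed holds 101 games, consistent with 91 unique lines plus 10 duplicates.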
Code: Select all
data/chess/games/Rebel$ grep Event gambits-100.pgn | wc -l
101

-
Rebel
- Posts: 7430
- Joined: Thu Aug 18, 2011 12:04 pm
- Full name: Ed Schröder
Re: The Stockfish ELO problem
You are right, I will look into it, thanks.
90% of coding is debugging, the other 10% is writing bugs.
-
Jouni
- Posts: 3758
- Joined: Wed Mar 08, 2006 8:15 pm
- Full name: Jouni Uski
Re: The Stockfish ELO problem
There is an interesting experiment at https://www.melonimarco.it/en/2021/03/0 ... -of-nodes/ (no date). According to the test, SF reaches practically its max Elo of 3500 after only 10M nodes. 1 second on a modern CPU! After that, only marginal gain.
Jouni
-
jkominek
- Posts: 98
- Joined: Tue Sep 04, 2018 5:33 am
- Full name: John Kominek
Re: The Stockfish ELO problem
Jouni wrote: ↑Fri Sep 02, 2022 3:34 pm
There is an interesting experiment at https://www.melonimarco.it/en/2021/03/0 ... -of-nodes/ (no date). According to the test, SF reaches practically its max Elo of 3500 after only 10M nodes. 1 second on a modern CPU! After that, only marginal gain.

I like his blog post, and have conducted a very similar experiment myself using fixed node counts (not including LC0), updated to Stockfish 15. My measurements reveal the same relationship between the NNUE and HCE curves, with HCE almost reaching NNUE at 256M nodes. I plan to push the node count over a billion to see how much the gap closes under convergence. The notable conclusion is that the evaluation net does not gift Stockfish an appreciably higher asymptote. What it does dramatically accomplish is to assist in finding good moves at a much, much lower node count, hence time, even at a 2:1 nps ratio. The whole nodes (or time) vs. Elo curve is pushed leftward.
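For anyone wanting to run this kind of fixed-node measurement themselves, here is a cutechess-cli sketch (engine path, book file, and node budget are placeholders; Stockfish 15 still exposes the "Use NNUE" UCI option to switch between the net and the classical evaluation):

Code: Select all
# Fixed-node match, NNUE vs. HCE, both sides the same Stockfish 15 binary.
# tc=inf plus nodes=N makes each engine search exactly N nodes per move.
cutechess-cli \
  -engine name=SF15-NNUE cmd=./stockfish15 "option.Use NNUE=true" \
  -engine name=SF15-HCE  cmd=./stockfish15 "option.Use NNUE=false" \
  -each proto=uci tc=inf nodes=10000000 \
  -openings file=book.pgn order=random -repeat -games 2 -rounds 500 \
  -pgnout sf15-nnue-vs-hce-10m.pgn

Repeating the run at several node budgets (1M, 10M, 100M, ...) traces out the two curves discussed above.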
Also worthwhile is his companion blog post: https://www.melonimarco.it/en/2021/10/0 ... ntium-90/
His rating list is notable for using a classical time control of 40 moves/120 min (not seconds), and for anchoring the scale to human-computer matches. Included in his calibration pool are Rebel matches. Ed S. had the rare privilege of playing Anand back in the 90s.
https://www.rebel.nl/anand.htm.
Those were quite the distinctive cartoons, btw. I've long wondered what the background story is behind the Rebel cartoons.
-
Modern Times
- Posts: 3780
- Joined: Thu Jun 07, 2012 11:02 pm
Re: The Stockfish ELO problem
That is based on a Pentium 90. His actual time control is 40 moves/125” or 40/130”, so 40 moves in just over two minutes:
Time for each match has been fixed to 40 moves/120 minutes repeated, calibrated on a Pentium 90 processing power. The processing power has been emulated, after estimation by using benchmarks with real P90 results. Accordingly, on modern PC the effective match time was fixed to 40 moves/125” or 40/130” depending on the PC
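To put a number on it (my arithmetic, not from the blog): 40 moves in 120 minutes is 7,200 seconds per repetition, so an effective 125-130 seconds implies an assumed speed ratio of roughly 7200/127 ≈ 57, i.e. a modern PC is treated as about 55-58 times a Pentium 90.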
-
jkominek
- Posts: 98
- Joined: Tue Sep 04, 2018 5:33 am
- Full name: John Kominek
Re: The Stockfish ELO problem
AndrewGrant wrote: ↑Sun Aug 07, 2022 12:43 am
If you want to compare scaling against a pool of opponents, do exactly that. Get the same opponents. Run the same games, same openings, same machines.

Andrew made the fair suggestion of drawing a comparison from an identical pool of opponents under identical test conditions. Short of conducting an extensive new experiment, we can approximate it by filtering the CCRL opponent pool. Stockfish 15 and Komodo Dragon 3.1 have an intersection set of 14 engines. Restricting to this subset eliminates one source of variability and may help a clearer picture emerge. Earlier versions of Stockfish and Komodo cannot be included in this comparison, as they were paired with different (versions of) engines and so have little overlap.

Here are the ratings I calculate using ordo. To set the scale, each list is single-point anchored to Houdini 6.
Code: Select all
# PLAYER : RATING ERROR POINTS PLAYED (%)
1 Stockfish 15 1cpu : 3621.2 11.6 630.5 932 68
2 SugaR AI 2.50 1cpu : 3619.4 17.7 52.0 104 50
3 KomodoDragon 3.1 1cpu : 3617.5 11.8 584.0 870 67
4 Fat Fritz 2 1cpu : 3595.2 19.3 54.0 116 47
5 Ethereal 13.75 1cpu : 3543.1 28.3 44.0 112 39
6 Revenge 3.0 1cpu : 3533.2 28.7 42.5 112 38
7 SlowChess 2.9 1cpu : 3502.4 32.0 41.0 121 34
8 Koivisto 8.0 1cpu : 3486.0 32.2 38.5 121 32
9 Berserk 9 1cpu : 3476.0 36.0 35.5 116 31
10 RubiChess 20220223 1cpu : 3469.0 34.4 36.0 121 30
11 RofChade 3.0 1cpu : 3449.5 31.1 48.0 175 27
12 Seer 2.5.0 1cpu : 3445.7 35.7 33.0 122 27
13 Arasan 23.4 1cpu : 3432.4 36.2 38.5 150 26
14 Minic 3.24 1cpu : 3422.6 42.2 26.5 108 25
15 Rebel 15.1 1cpu : 3413.7 42.3 25.5 108 24
16 Houdini 6 1cpu : 3327.0 56.6 16.5 104 16
White advantage = 3.46 +/- 4.54
Draw rate (equal opponents) = 94.09 % +/- 1.42

# PLAYER : RATING ERROR POINTS PLAYED (%)
1 Stockfish 15 4cpu : 3598.2 13.6 357.0 550 65
2 KomodoDragon 3.1 4cpu : 3588.9 14.0 325.0 512 63
3 SugaR AI 2.50 4cpu : 3588.1 18.3 31.5 64 49
4 Fat Fritz 2 4cpu : 3577.1 20.1 30.5 64 48
5 Revenge 3.0 4cpu : 3549.5 30.3 28.0 64 44
6 Berserk 9 4cpu : 3532.7 33.4 26.5 64 41
7 Ethereal 13.75 4cpu : 3521.3 34.7 25.5 64 40
8 SlowChess 2.9 4cpu : 3515.6 36.2 25.0 64 39
9 Koivisto 8.0 4cpu : 3488.8 36.6 29.0 82 35
10 RubiChess 20220223 4cpu : 3455.6 35.3 36.0 116 31
11 RofChade 3.0 4cpu : 3455.3 47.1 20.0 64 31
12 Igel 3.1.0 4cpu : 3455.3 45.7 20.0 64 31
13 Arasan 23.4 4cpu : 3453.7 38.2 30.0 96 31
14 Seer 2.5.0 4cpu : 3435.8 49.9 18.5 64 29
15 Houdini 6 4cpu : 3386.0 58.2 15.0 64 23
16 Tucano 10.00 4cpu : 3345.3 64.0 12.5 64 20
White advantage = 6.15 +/- 5.31
Draw rate (equal opponents) = 97.16 % +/- 1.60

As well, the head-to-head listings are included below.
There was a recent TCEC bonus event in which the Top 3 played the starting position against a gauntlet of 40 weaker engines. To keep the tension high, the encounters were played more or less in ascending order. One observation is that it takes a rating of about 3450 (Igel 3.1.4) to lay claim to having "conquered" the standard opening position.
A second observation was that Komodo showed itself more effective at beating up on weaker opponents than either Stockfish or LCZero. But as the going got tougher, Stockfish reasserted its dominance. This pattern is also expressed in the CCRL match-ups, and would seem to be part of the explanation for why the overall separation is small: the sum total is a balance between long-distance performance, somewhat favoring Komodo, and up-close results, somewhat favoring Stockfish.
Estimating ratings of the top floors of the skyscraper is a hazardous endeavor. Not until an engine is surrounded by close cohorts above and below do the ratings lock in.
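An aside for readers checking the numbers (not part of the original post): the Perf column in these listings is the standard logistic Elo-for-score, Perf = -400 * log10(1/p - 1), where p is the score fraction with draws counted as half points. Checking the SlowChess line from the first listing below:

Code: Select all
# Performance Elo from a W/D/L record: 23 wins, 42 draws, 0 losses in
# 65 games (SlowChess 2.9 vs. Stockfish 15, from the listing below).
awk 'BEGIN { p = (23 + 42/2) / 65;
             printf "p = %.3f  perf = %+.1f\n", p, -400*log(1/p-1)/log(10) }'
# -> p = 0.677  perf = +128.5, matching the Perc and Perf columns.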
Code: Select all
1) Stockfish 15 1cpu 3621.2 : 932 (+331,=599,-2) 67.7%
vs. : games ( +, =, -) Draw Perc Perf : Diff SD LOS
SugaR AI 2.50 1cpu : 52 ( 1, 51, 0) 98.1 51.0 +6.7 : +1.8 9.3 57.9
KomodoDragon 3.1 1cpu : 56 ( 5, 51, 0) 91.1 54.5 +31.1 : +3.7 8.3 67.3
Fat Fritz 2 1cpu : 60 ( 5, 55, 0) 91.7 54.2 +29.0 : +26.0 10.4 99.4
Ethereal 13.75 1cpu : 56 ( 14, 42, 0) 75.0 62.5 +88.7 : +78.1 15.0 100.0
Revenge 3.0 1cpu : 56 ( 13, 43, 0) 76.8 61.6 +82.2 : +88.0 15.6 100.0
SlowChess 2.9 1cpu : 65 ( 23, 42, 0) 64.6 67.7 +128.5 : +118.8 17.3 100.0
Koivisto 8.0 1cpu : 65 ( 22, 42, 1) 64.6 66.2 +116.4 : +135.3 17.7 100.0
Berserk 9 1cpu : 60 ( 28, 32, 0) 53.3 73.3 +175.7 : +145.3 19.3 100.0
RubiChess 20220223 1cpu : 65 ( 27, 37, 1) 56.9 70.0 +147.2 : +152.3 18.8 100.0
RofChade 3.0 1cpu : 119 ( 54, 65, 0) 54.6 72.7 +170.1 : +171.7 16.4 100.0
Seer 2.5.0 1cpu : 66 ( 29, 37, 0) 56.1 72.0 +163.8 : +175.6 19.4 100.0
Arasan 23.4 1cpu : 48 ( 21, 27, 0) 56.2 71.9 +163.0 : +188.9 20.1 100.0
Minic 3.24 1cpu : 56 ( 26, 30, 0) 53.6 73.2 +174.7 : +198.7 22.8 100.0
Rebel 15.1 1cpu : 56 ( 28, 28, 0) 50.0 75.0 +190.8 : +207.6 23.0 100.0
Houdini 6 1cpu : 52 ( 35, 17, 0) 32.7 83.7 +283.6 : +294.2 30.6 100.0
3) KomodoDragon 3.1 1cpu 3617.5 : 870 (+306,=556,-8) 67.1%
vs. : games ( +, =, -) Draw Perc Perf : Diff SD LOS
Stockfish 15 1cpu : 56 ( 0, 51, 5) 91.1 45.5 -31.1 : -3.7 8.3 32.7
SugaR AI 2.50 1cpu : 52 ( 0, 51, 1) 98.1 49.0 -6.7 : -1.8 9.4 42.2
Fat Fritz 2 1cpu : 56 ( 5, 49, 2) 87.5 52.7 +18.6 : +22.3 9.9 98.8
Ethereal 13.75 1cpu : 56 ( 10, 46, 0) 82.1 58.9 +62.7 : +74.4 15.4 100.0
Revenge 3.0 1cpu : 56 ( 14, 42, 0) 75.0 62.5 +88.7 : +84.3 15.5 100.0
SlowChess 2.9 1cpu : 56 ( 16, 40, 0) 71.4 64.3 +102.1 : +115.1 17.6 100.0
Koivisto 8.0 1cpu : 56 ( 23, 33, 0) 58.9 70.5 +151.6 : +131.6 17.5 100.0
Berserk 9 1cpu : 56 ( 17, 39, 0) 69.6 65.2 +108.9 : +141.6 19.7 100.0
RubiChess 20220223 1cpu : 56 ( 23, 33, 0) 58.9 70.5 +151.6 : +148.6 18.7 100.0
RofChade 3.0 1cpu : 56 ( 25, 31, 0) 55.4 72.3 +166.8 : +168.0 16.9 100.0
Seer 2.5.0 1cpu : 56 ( 27, 29, 0) 51.8 74.1 +182.7 : +171.9 19.6 100.0
Arasan 23.4 1cpu : 102 ( 52, 50, 0) 49.0 75.5 +195.4 : +185.2 19.2 100.0
Minic 3.24 1cpu : 52 ( 29, 23, 0) 44.2 77.9 +218.7 : +195.0 22.7 100.0
Rebel 15.1 1cpu : 52 ( 29, 23, 0) 44.2 77.9 +218.7 : +203.9 23.0 100.0
Houdini 6 1cpu : 52 ( 36, 16, 0) 30.8 84.6 +296.1 : +290.5 30.9 100.0
Code: Select all
1) Stockfish 15 4cpu 3598.2 : 550 (+165,=384,-1) 64.9%
vs. : games ( +, =, -) Draw Perc Perf : Diff SD LOS
KomodoDragon 3.1 4cpu : 32 ( 2, 30, 0) 93.8 53.1 +21.7 : +9.3 9.7 83.2
SugaR AI 2.50 4cpu : 32 ( 1, 31, 0) 96.9 51.6 +10.9 : +10.2 9.9 84.9
Fat Fritz 2 4cpu : 32 ( 2, 30, 0) 93.8 53.1 +21.7 : +21.1 10.9 97.3
Revenge 3.0 4cpu : 32 ( 4, 28, 0) 87.5 56.2 +43.7 : +48.7 16.2 99.9
Berserk 9 4cpu : 32 ( 5, 26, 1) 81.2 56.2 +43.7 : +65.5 18.1 100.0
Ethereal 13.75 4cpu : 32 ( 8, 24, 0) 75.0 62.5 +88.7 : +76.9 18.8 100.0
SlowChess 2.9 4cpu : 32 ( 7, 25, 0) 78.1 60.9 +77.2 : +82.6 19.6 100.0
Koivisto 8.0 4cpu : 50 ( 16, 34, 0) 68.0 66.0 +115.2 : +109.4 19.8 100.0
RubiChess 20220223 4cpu : 84 ( 31, 53, 0) 63.1 68.5 +134.6 : +142.6 18.7 100.0
Igel 3.1.0 4cpu : 32 ( 11, 21, 0) 65.6 67.2 +124.5 : +142.9 24.5 100.0
RofChade 3.0 4cpu : 32 ( 16, 16, 0) 50.0 75.0 +190.8 : +142.9 25.5 100.0
Arasan 23.4 4cpu : 32 ( 14, 18, 0) 56.2 71.9 +163.0 : +144.5 20.9 100.0
Seer 2.5.0 4cpu : 32 ( 13, 19, 0) 59.4 70.3 +149.8 : +162.5 26.8 100.0
Houdini 6 4cpu : 32 ( 17, 15, 0) 46.9 76.6 +205.6 : +212.2 32.0 100.0
Tucano 10.00 4cpu : 32 ( 18, 14, 0) 43.8 78.1 +221.1 : +252.9 34.8 100.0
2) KomodoDragon 3.1 4cpu 3588.9 : 512 (+140,=370,-2) 63.5%
vs. : games ( +, =, -) Draw Perc Perf : Diff SD LOS
Stockfish 15 4cpu : 32 ( 0, 30, 2) 93.8 46.9 -21.7 : -9.3 9.7 16.8
SugaR AI 2.50 4cpu : 32 ( 0, 32, 0) 100.0 50.0 +0.0 : +0.8 9.6 53.4
Fat Fritz 2 4cpu : 32 ( 1, 31, 0) 96.9 51.6 +10.9 : +11.8 10.8 86.1
Revenge 3.0 4cpu : 32 ( 4, 28, 0) 87.5 56.2 +43.7 : +39.4 16.1 99.3
Berserk 9 4cpu : 32 ( 7, 25, 0) 78.1 60.9 +77.2 : +56.2 18.1 99.9
Ethereal 13.75 4cpu : 32 ( 5, 27, 0) 84.4 57.8 +54.7 : +67.6 19.1 100.0
SlowChess 2.9 4cpu : 32 ( 7, 25, 0) 78.1 60.9 +77.2 : +73.3 19.7 100.0
Koivisto 8.0 4cpu : 32 ( 8, 24, 0) 75.0 62.5 +88.7 : +100.0 20.2 100.0
RubiChess 20220223 4cpu : 32 ( 13, 19, 0) 59.4 70.3 +149.8 : +133.2 19.3 100.0
Igel 3.1.0 4cpu : 32 ( 13, 19, 0) 59.4 70.3 +149.8 : +133.6 24.7 100.0
RofChade 3.0 4cpu : 32 ( 8, 24, 0) 75.0 62.5 +88.7 : +133.6 25.5 100.0
Arasan 23.4 4cpu : 64 ( 22, 42, 0) 65.6 67.2 +124.5 : +135.1 20.1 100.0
Seer 2.5.0 4cpu : 32 ( 14, 18, 0) 56.2 71.9 +163.0 : +153.1 27.1 100.0
Houdini 6 4cpu : 32 ( 17, 15, 0) 46.9 76.6 +205.6 : +202.9 31.9 100.0
Tucano 10.00 4cpu : 32 ( 21, 11, 0) 34.4 82.8 +273.2 : +243.6 35.0 100.0
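For reproducibility: the single-point anchoring described above maps onto ordo's -a/-A options. A sketch of the sort of invocation involved (the PGN file name is hypothetical; the anchor value matches the 1cpu list above):

Code: Select all
# Rate the filtered 1cpu pool, pinning Houdini 6 to 3327 as the single
# anchor; -W estimates the white advantage, -s runs error simulations.
ordo -p ccrl-subset-1cpu.pgn \
     -a 3327 -A "Houdini 6 1cpu" \
     -W -s 1000 \
     -o ratings-1cpu.txt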
-
jkominek
- Posts: 98
- Joined: Tue Sep 04, 2018 5:33 am
- Full name: John Kominek
Re: The Stockfish ELO problem
Modern Times wrote: ↑Sat Sep 03, 2022 1:25 am
That is based on a Pentium 90. His actual time control is 40 moves/125” or 40/130”, so 40 moves in just over two minutes:
Time for each match has been fixed to 40 moves/120 minutes repeated, calibrated on a Pentium 90 processing power. The processing power has been emulated, after estimation by using benchmarks with real P90 results. Accordingly, on modern PC the effective match time was fixed to 40 moves/125" or 40/130" depending on the PC

That is true. I read that part but failed to mention it here. Thank you for pointing it out.
There is not an abundance of openly available data that I am aware of for calibrating against human performance, or at least data gathered under well-controlled circumstances, with players not tempted to hit a quick "I resign" button to start a new game. I recall that the SSDF in the early years put effort into calibrating their list. I recently went searching for their notes on human calibration experiments but could not find them. My memory says it was based on Swedish club players in the 1500-2200 range going up against, for the most part, dedicated boards.