Here are the test results:
Celo gap Stockfish 16 / Torch 2:
5 strongest opponents: 32 Celo
Full list of 15 opponents: 20 Celo
5 weakest opponents: 9 Celo
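Each gap above equals Stockfish 16's rating minus Torch 2's rating in the matching table below. For the weakest-opponent case, the two ratings come from two separate runs. A minimal Python check, with the ratings copied from those tables:

```python
# Elo of Stockfish 16 and Torch 2 in each run, copied from the tables in this post.
# For the weakest-opponent case the two values come from two separate runs.
runs = {
    "5 strongest opponents": (3813, 3781),
    "full 15 opponents": (3821, 3801),
    "5 weakest opponents": (3783, 3774),
}

# Gap = Stockfish 16 rating minus Torch 2 rating in the same setting
gaps = {name: sf16 - torch2 for name, (sf16, torch2) in runs.items()}
for name, gap in gaps.items():
    print(f"{name}: {gap} Celo")
```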
The effect I mentioned, that Stockfish scores relatively worse than Torch against weaker opponents, can be seen very clearly here. This means I underestimated the effect when looking at my full UHO ratinglist, where Torch 2 is 3 Elo ahead of Stockfish 16. In that full list, Stockfish has played 40000 games and Torch 2 only 24000, so Stockfish played against far more weaker engines than Torch 2 did. As a result, SF 16's rating in my full ratinglist is lower than in the experiments below.
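One way to see the effect directly in the tables is the classic logistic performance approximation, R ≈ Av.Op. + 400 · log10(s / (1 − s)), where s is the score fraction. This is only a sketch and not the rating tool's actual fit. The tool solves all games jointly, so its ratings differ from this formula by roughly 1 to 3 Elo. Scores and Av.Op. values are copied from the strongest- and weakest-opponent tables below:

```python
import math


def performance(score: float, avg_opp: float) -> float:
    """Logistic Elo performance: average opponent + 400 * log10(s / (1 - s))."""
    return avg_opp + 400 * math.log10(score / (1 - score))


# (score fraction, Av.Op.) copied from the strongest- and weakest-opponent tables
results = {
    ("Stockfish 16", "strongest"): (0.605, 3736),
    ("Stockfish 16", "weakest"): (0.750, 3590),
    ("Torch 2", "strongest"): (0.554, 3742),
    ("Torch 2", "weakest"): (0.741, 3590),
}

perf = {key: performance(score, avg_opp) for key, (score, avg_opp) in results.items()}

# Stockfish 16 minus Torch 2, per opponent pool
gaps = {
    pool: perf[("Stockfish 16", pool)] - perf[("Torch 2", pool)]
    for pool in ("strongest", "weakest")
}
for pool, gap in gaps.items():
    print(f"{pool} opponents: SF 16 - Torch 2 = {gap:.1f} Elo")
```

Even with this rough formula, the gap is much larger against the strong pool than against the weak pool, in line with the 32 vs. 9 Celo above.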
We also learn that a ratinglist which is not a round robin (where all engines have the same opponents) is very susceptible to distortions (more bad news for CCRL/CEGT), especially when engines with a high EAS score participate. Luckily, my UHO-Top15 ratinglist is a round robin, but my full UHO ratinglist, which collects all played games and engines, can be affected by this effect too (see above).
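A toy model can make the distortion concrete. It pools each engine's two sub-results (strongest and weakest opponents, from the tables below) using a chosen fraction w of games against the weak pool, then recomputes a performance rating. The fractions 0.5 and 0.8 are hypothetical and chosen only for illustration. They are not the real game mix of the full UHO ratinglist. Averaging score and Av.Op. linearly is also a simplification:

```python
import math


def performance(score, avg_opp):
    """Logistic Elo performance: average opponent + 400 * log10(s / (1 - s))."""
    return avg_opp + 400 * math.log10(score / (1 - score))


# (score fraction, Av.Op.) vs. the strong and the weak pool, from this post's tables
SF16 = {"strong": (0.605, 3736), "weak": (0.750, 3590)}
TORCH2 = {"strong": (0.554, 3742), "weak": (0.741, 3590)}


def mixed_performance(engine, w_weak):
    """Performance over a schedule with a fraction w_weak of games vs. the weak pool.

    Toy assumption: total score and average opponent are game-weighted means
    of the two sub-results.
    """
    (s_str, op_str), (s_wk, op_wk) = engine["strong"], engine["weak"]
    score = w_weak * s_wk + (1 - w_weak) * s_str
    avg_opp = w_weak * op_wk + (1 - w_weak) * op_str
    return performance(score, avg_opp)


# Same opponent mix for both engines (round-robin-like) vs. a hypothetical
# schedule in which Stockfish plays a larger share of weak opponents.
equal_gap = mixed_performance(SF16, 0.5) - mixed_performance(TORCH2, 0.5)
skewed_gap = mixed_performance(SF16, 0.8) - mixed_performance(TORCH2, 0.5)
print(f"equal mix:  SF 16 - Torch 2 = {equal_gap:.1f} Elo")
print(f"skewed mix: SF 16 - Torch 2 = {skewed_gap:.1f} Elo")
```

Only the opponent mix changes between the two lines. The engines' results against each pool stay the same, yet the skewed schedule shrinks Stockfish's lead.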
Program                     Elo   +   -  Games  Score Av.Op.  Draws
1  Stockfish 16.1 240224 : 3833   4   4  15000  71.0%   3670  47.9%
2  Stockfish 16 230630   : 3821   4   4  15000  69.5%   3670  48.0%
3  Torch 2 popavx2       : 3801   4   4  15000  67.0%   3672  48.1%
4  Berserk 13 avx2       : 3747   4   4  15000  59.7%   3675  48.9%
5  KomodoDragon 3.3 avx2 : 3735   4   4  15000  58.0%   3676  49.7%
6  Ethereal 14.38 avx2   : 3699   4   4  15000  52.9%   3678  49.1%
7  Obsidian 12.0 avx2    : 3693   4   4  15000  52.0%   3679  50.7%
8  Caissa 1.18 avx2      : 3675   4   4  15000  49.4%   3680  49.0%
9  RubiChess 240112 avx2 : 3652   4   4  15000  46.2%   3682  48.7%
10 PlentyChess 1.0 avx2  : 3630   4   4  15000  43.0%   3683  50.1%
11 Alexandria 6.1.0 avx2 : 3609   4   4  15000  40.0%   3684  49.5%
12 Seer 2.8.0 avx2       : 3605   4   4  15000  39.4%   3685  48.9%
13 CSTal 2.0 avx2        : 3597   4   4  15000  38.4%   3685  51.1%
14 Uralochka 3.41a avx2  : 3596   4   4  15000  38.2%   3685  48.3%
15 Rebel 16.3 avx2       : 3595   4   4  15000  38.0%   3685  49.9%
16 Titan 1.0 avx2        : 3590   4   4  15000  37.3%   3686  49.7%
Games : 120000 (finished)
White Wins : 58277 (48.6 %)
Black Wins : 2651 (2.2 %)
Draws : 59072 (49.2 %)
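The summary lines can be sanity-checked. With N engines that each played G games, every game is counted for two engines, so the total must be N × G / 2. White wins, black wins and draws must also add up to that total. A minimal Python check, using the numbers from the four tables in this post:

```python
# (engines, games per engine, total games, white wins, black wins, draws)
summaries = {
    "full list": (16, 15000, 120000, 58277, 2651, 59072),
    "5 strongest": (6, 5000, 15000, 7322, 148, 7530),
    "5 weakest, SF 16": (6, 5000, 15000, 7216, 290, 7494),
    "5 weakest, Torch 2": (6, 5000, 15000, 7254, 279, 7467),
}


def consistent(n, per_engine, total, white, black, draws):
    # Each game is counted once for each of its two players.
    games_ok = n * per_engine == 2 * total
    results_ok = white + black + draws == total
    # Average games per pairing (equal for every pairing in a balanced round robin).
    games_per_pairing = per_engine / (n - 1)
    return games_ok and results_ok, games_per_pairing


for name, row in summaries.items():
    ok, per_pair = consistent(*row)
    print(f"{name}: consistent={ok}, {per_pair:.0f} games per pairing on average")
```

All four summaries pass. The totals are consistent with every engine meeting every other engine 1000 times on average.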
------------------------------------------------------------------------------
5 strongest engines (opponents):
Program                    Elo   +   -  Games  Score Av.Op.  Draws
1 Stockfish 16.1 240224 : 3833   6   6   5000  63.7%   3732  49.9%
2 Stockfish 16 230630   : 3813   6   6   5000  60.5%   3736  49.9%
3 Torch 2 popavx2       : 3781   6   6   5000  55.4%   3742  49.9%
4 Berserk 13 avx2       : 3707   6   6   5000  43.2%   3757  50.7%
5 KomodoDragon 3.3 avx2 : 3698   6   6   5000  41.7%   3759  50.8%
6 Ethereal 14.38 avx2   : 3659   6   6   5000  35.4%   3767  49.9%
Games : 15000 (finished)
White Wins : 7322 (48.8 %)
Black Wins : 148 (1.0 %)
Draws : 7530 (50.2 %)
------------------------------------------------------------------------------
5 weakest engines (opponents for SF / Torch 2):
Stockfish 16:
Program                    Elo   +   -  Games  Score Av.Op.  Draws
1 Stockfish 16 230630   : 3783   6   6   5000  75.0%   3590  47.4%
2 Seer 2.8.0 avx2       : 3600   6   6   5000  46.6%   3627  49.0%
3 CSTal 2.0 avx2        : 3594   6   6   5000  45.5%   3628  52.6%
4 Uralochka 3.41a avx2  : 3589   6   6   5000  44.8%   3629  49.4%
5 Rebel 16.3 avx2       : 3587   6   6   5000  44.5%   3629  50.6%
6 Titan 1.0 avx2        : 3582   6   6   5000  43.6%   3630  50.7%
Games : 15000 (finished)
White Wins : 7216 (48.1 %)
Black Wins : 290 (1.9 %)
Draws : 7494 (50.0 %)
Torch 2:
Program                    Elo   +   -  Games  Score Av.Op.  Draws
1 Torch 2 popavx2       : 3774   7   7   5000  74.1%   3590  46.9%
2 Seer 2.8.0 avx2       : 3600   6   6   5000  46.8%   3625  48.6%
3 CSTal 2.0 avx2        : 3594   6   6   5000  45.8%   3626  52.5%
4 Uralochka 3.41a avx2  : 3589   6   6   5000  44.9%   3628  49.5%
5 Rebel 16.3 avx2       : 3587   6   6   5000  44.6%   3628  50.4%
6 Titan 1.0 avx2        : 3582   6   6   5000  43.9%   3629  50.8%
Games : 15000 (finished)
White Wins : 7254 (48.4 %)
Black Wins : 279 (1.9 %)
Draws : 7467 (49.8 %)