Komodo-Dragon-2 vs Stockfish 14 at knight odds
Moderator: Ras
-
connor_mcmonigle
- Posts: 544
- Joined: Sun Sep 06, 2020 4:40 am
- Full name: Connor McMonigle
Also relevant is that CCRL's Blitz list uses a 2m+1s TC while Ed is using a 40/2 TC.
-
lkaufman
- Posts: 6284
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
- Full name: Larry Kaufman
connor_mcmonigle wrote: ↑Sun Sep 26, 2021 11:46 pm
Also relevant is that CCRL's Blitz list uses a 2m+1s TC while Ed is using a 40/2 TC.
Well, CCRL switched from 40/2, so their data is a mixture. This might affect individual ratings, but not the range significantly.
Komodo rules!
-
lkaufman
Rebel wrote: ↑Sun Sep 26, 2021 2:52 pm
Odds match minus pawn f2 - Dragon 2.5 vs an elo pool of 3364 engines
Code:
Odds match minus pawn f2 - Dragon 2.5 vs an elo pool of 3364 engines
Time Control : Time control : 40/120
Games : 700
Results from file all.pgn:
No. Name               Win   Draw  Loss  Unf.  Score  Games  %
--------------------------------------------------------------
1   Komodo-Dragon 2.5  +225  =306  -169  *0    378.0  700    54.0%
2   Ethereal 12.75     +31   =50   -19   *0    56.0   100    56.0%
3   Pedone 3.1         +21   =61   -18   *0    51.5   100    51.5%
4   Komodo 12          +32   =34   -34   *0    49.0   100    49.0%
5   Komodo 11          +31   =33   -36   *0    47.5   100    47.5%
6   Stockfish 8        +25   =34   -41   *0    42.0   100    42.0%
7   Igel 3.0.5         +13   =53   -34   *0    39.5   100    39.5%
8   Igel 3.0.0         +16   =41   -43   *0    36.5   100    36.5%
lkaufman wrote: ↑Sun Sep 26, 2021 4:35 pm
These engines are all rated over 3400 on the CCRL blitz list (or nearly identical versions, like Komodo 11.01), quite a bit higher than your own list average of 3364. I wondered why this was so. On your main page for the gambit rating list you have a comparison with CCRL, but I think you are comparing your blitz ratings to their Rapid ratings; you should be comparing blitz for both. Since CCRL uses BayesElo, which contracts rating differences, I would expect ratings of engines near the top to be lower on their blitz list than on yours, but they are clearly higher! I'm trying to think of an explanation for this; do you have any idea? I wouldn't expect your choice of gambit openings to shrink rating differences; that is bizarre.
Rebel wrote: ↑Sun Sep 26, 2021 8:31 pm
The level of the elo values in a rating list is defined by anchor engines. For the GRL I use 4 anchor engines, to be more or less compatible with the CCRL values. Anchor engines are rock-solid engines that have played thousands of games and thus have a reliable elo. For instance, I use Critter 1.6a as an anchor engine with a fixed elo of 3150, which I borrowed from CCRL 40/15 (it currently has 3157). I use Houdini 6 (derivatives come in handy) as an anchor engine at 3400 elo (it currently has 3394), Fruit 2.1 at 2700 and Nemo at 2850, also borrowed from CCRL 40/15.
Now suppose I change the value of Houdini to 3500: the ratings of the 3400+ engines will go up unrealistically, big time, and lowering it to 3300 will have the opposite effect. Meaning, with anchor engines I could create a rating list with SF14 on top at 2000 elo; however, the order remains the same.
lkaufman wrote: ↑Sun Sep 26, 2021 9:23 pm
I think you missed my point, which is that your CCRL vs GRL comparison seems to be the CCRL Rapid list vs. the GRL blitz list, which is not the proper comparison; it should be CCRL Blitz list vs. GRL Blitz list. I quite like and agree with your use of gambit openings for your list; that's why I'm so puzzled that your ratings show a smaller range than the CCRL BLITZ list does. I can't explain it. I must be missing something. One other question: are you now using AVX2 versions of Dragon and Stockfish, or is the hardware too old? If not, that might explain why you got a much smaller elo gain than CEGT, which reports +73 elo in blitz over Dragon 2.
Rebel wrote: ↑Sun Sep 26, 2021 11:42 pm
1. Well, rating lists are not an exact science. I have chosen the 40/15 list because its ratings are more reliable. An example would be the comparison between SF12 and SF13: on the 40/2 list SF12 is rated higher than SF13, while on the 40/15 list SF13 is rated higher, as it normally should be.
Secondly, responding to the part I bolded: have a look at my research "CCRL vs GRL - a comparison"; gambit openings do make sense. If they did not, I would have stopped the GRL a long time ago.
2. As for AVX2, I have AVX but not AVX2. But I don't think it matters much, because all the opponents Dragon played had the same hardware.
3. Elo pools are important. Perhaps (emphasis added) it could be that the performance of 2.5 would be (somewhat) better if 2.5 had played the exact same opponents version 2.0 played.
4. I tested the NPS:
Dragon 2.0 - 700,000
Dragon 2.5 - 570,000
1. The CCRL rapid list may be better, but the blitz list is more comparable to yours in game quality. Using the 4 reference engines you list, the CCRL blitz list is about 48 elo above the rapid list. If I subtract 48 from the 3400-range engines in question, I get ratings similar to yours. So there is no real disparity, just the absence of the expected gambit effect.
2. AVX2 is huge for all NNUE engines, but not for others. So your list is fair for comparing engines of the same category, but it does understate the gain from NNUE.
4. Your NPS ratio is somewhat larger than on AVX2, so I estimate this might account for about 8 elo of the lower gain for Dragon 2.5.
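A minimal Python sketch of the single-anchor idea Ed describes: ratings are first computed relative to one another, then the whole list is shifted so that the anchor engine lands on its fixed value. It assumes one anchor and a pure constant offset. Ed actually uses four anchors, and a tool fitting several anchors at once would not necessarily move every engine by the same amount, so this only illustrates why the order stays the same. The function `anchor_ratings` and the relative ratings are hypothetical, chosen for illustration; they are not GRL data.

```python
def anchor_ratings(relative, anchor, anchor_elo):
    """Shift relative ratings so that `anchor` lands exactly on `anchor_elo`.

    Single-anchor simplification (an assumption): every engine moves by the
    same offset, so rating differences and the ranking are unchanged.
    """
    offset = anchor_elo - relative[anchor]
    return {name: r + offset for name, r in relative.items()}


# Hypothetical relative ratings with an arbitrary zero point, illustration only.
relative = {
    "Engine A": 310.0,
    "Engine B": 120.0,
    "Houdini 6": 0.0,
    "Engine C": -250.0,
}

# Re-anchor Houdini 6 at 3300, 3400 and 3500 and compare the resulting lists.
for houdini_elo in (3300, 3400, 3500):
    rated = anchor_ratings(relative, "Houdini 6", houdini_elo)
    order = sorted(rated, key=rated.get, reverse=True)
    print(houdini_elo, order, rated["Engine A"])
```

Moving the anchor by 100 moves every engine by 100 in this model, while the printed order is identical for all three anchor values.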
-
Rebel
- Posts: 7475
- Joined: Thu Aug 18, 2011 12:04 pm
- Full name: Ed Schröder
Larry, how can I convince you that the Gambit Rating List is a different animal? Positions are tactical from the get-go. That's why the top engines (Stockfish and Komodo in particular) profit massively in comparison with other rating lists. From the CCRL-GRL comparison:
Some profit more than others; Nemorino even loses elo.
Code:
#  PLAYER           :  RATING  PLAYED  CCRL  Gambit
1  Stockfish 13     :  3667.5    3500  3506     161
2  Komodo-Dragon 1  :  3581.3    3000  3469     112
3  Lc0 v27          :  3529.8     800  ----
4  SlowChess 2.6    :  3421.9    2400  3379      42
5  RubiChess 2.1    :  3380.5    2900  3338      42
6  Pedone 3.1       :  3361.7    2900  3334      27
7  Igel 3.0.5       :  3355.0    2900  3342      13
8  Ethereal 12.75   :  3353.8    2700  3320      33
9  Nemorino 6.00    :  3309.5    2900  3344     -35
SF14 (4300 games) gained only 15 elo over SF13 (3500 games), while other rating lists reported a much higher elo gain.
What to think of Benjamin? Its rating is totally unrealistic when it has to play normal openings.
Speaking of normal openings, I ran Dragon 2.5 vs Dragon 2: 1000 games, TC 40/120, 8-moves.pgn, thus normal openings.
And it produces your estimated elo gain of +63.
Code:
Dragon 2.5 vs Dragon 2.0 [8-moves.pgn] [normal openings]
Time Control : Time control : 40/120
Games : 1000
Results from file all.pgn:
No. Name Win Draw Loss Unf. Score Games %
--------------------------------------------------------------
1 Komodo-Dragon 2.5 +216 =744 -40 *0 588.0 1000 58.8%
2 Komodo-Dragon 2 +40 =744 -216 *0 412.0 1000 41.2%
Total Games: 1000
White Wins: 154 (15.4%)
Black Wins: 102 (10.2%)
Draws: 744 (74.4%)
Unfinished: 0 (0.0%)
Estimated ratings for this elo 3590 pool
# PLAYER : RATING POINTS PLAYED (%)
1 Komodo-Dragon 2.5 : 3621.2 588.0 1000 59
2 Komodo-Dragon 2 : 3558.8 412.0 1000 41
90% of coding is debugging, the other 10% is writing bugs.
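For reference, the usual logistic conversion from a match score s to an Elo difference, 400 * log10(s / (1 - s)), can be sketched in Python as below. Applied to the +216 =744 -40 result above (58.8%), it gives about +61.8, close to the 62.4 gap in the rating output (3621.2 vs 3558.8); the small difference presumably comes from details of the rating tool's fit that this sketch does not model. The helper names are hypothetical.

```python
import math


def score_from_record(wins, draws, losses):
    """Fraction of points scored; a draw counts as half a point."""
    return (wins + 0.5 * draws) / (wins + draws + losses)


def elo_diff_from_score(score):
    """Logistic Elo difference implied by a score strictly between 0 and 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


# Dragon 2.5 vs Dragon 2 with normal openings: +216 =744 -40
s = score_from_record(216, 744, 40)
print(f"score {s:.3f} -> {elo_diff_from_score(s):+.1f} Elo")
```

With the same function, elo_diff_from_score(0.54) is about +27.9, roughly what Dragon 2.5's 54.0% in the pawn-odds gauntlet means relative to its opponents' average, assuming that average is the right reference.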
-
lkaufman
Rebel wrote: ↑Mon Sep 27, 2021 9:58 am
What to think of Benjamin? Its rating is totally unrealistic when it has to play normal openings.
Speaking of normal openings, I ran Dragon 2.5 vs Dragon 2: 1000 games, TC 40/120, 8-moves.pgn, thus normal openings.
And it produces your estimated elo gain of +63.
It would be very interesting to do the same run with your gambit openings, to see whether they are the reason for the smaller elo gain you report on the gambit list, or whether it is something else. I know you don't have enough positions for 1000 games; you can set "Variety" to a small number like 3 on both engines (the effect on elo will be trivial and balanced) to get as many games as you want, or else you can vary the time control by a tiny amount to avoid repeated games. I know that engines will perform differently with gambit openings than with normal ones; for example, the good results for "Benjamin" make sense to me because it excels when getting knight odds from Dragon (compared to other similarly rated engines). I am interested in whether the "spread" of ratings (say, the standard deviation for a given group of engines) increases going from CCRL to GRL, using the BLITZ lists for both to be fair. Logically it should increase, but it doesn't look like this is the case. Of course the spread is greater in blitz than in rapid, because the draw percentage is lower in blitz.
By the way, my estimate for single-thread blitz was +72 elo (CEGT reports +73), and 72 minus 63 is 9. I estimated from your NPS that the lack of AVX2 should cost us 9 elo just between Dragon 2 and 2.5, so I was off by 1!
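Larry's "spread" question can be made concrete with a few lines of Python: take the engines that appear in both rating columns of Ed's CCRL-GRL comparison table and compute the population standard deviation of each column. Two caveats. First, Larry argues that the table's CCRL column comes from the CCRL Rapid list, not the Blitz list, so this is not the blitz-vs-blitz comparison he asks for; it only shows the calculation. Second, Lc0 v27 is dropped because it has no CCRL value. The sketch also checks that the table's "Gambit" column equals the GRL rating minus the CCRL rating, rounded down, which is what the numbers suggest.

```python
from statistics import mean, pstdev

# (GRL rating, CCRL rating, "Gambit" column) copied from the CCRL-GRL table.
rows = {
    "Stockfish 13": (3667.5, 3506, 161),
    "Komodo-Dragon 1": (3581.3, 3469, 112),
    "SlowChess 2.6": (3421.9, 3379, 42),
    "RubiChess 2.1": (3380.5, 3338, 42),
    "Pedone 3.1": (3361.7, 3334, 27),
    "Igel 3.0.5": (3355.0, 3342, 13),
    "Ethereal 12.75": (3353.8, 3320, 33),
    "Nemorino 6.00": (3309.5, 3344, -35),
}

grl = [r[0] for r in rows.values()]
ccrl = [r[1] for r in rows.values()]

# Population standard deviation as the "spread" of each list for this group.
print(f"GRL  mean {mean(grl):.1f}  spread {pstdev(grl):.1f}")
print(f"CCRL mean {mean(ccrl):.1f}  spread {pstdev(ccrl):.1f}")

# Compare GRL minus CCRL (floored) with the table's "Gambit" column.
for name, (g, c, gambit) in rows.items():
    diff = g - c
    print(f"{name:16s} {diff:+7.1f} -> floor {diff // 1:+.0f}, table {gambit:+d}")
```

The same function calls would answer Larry's actual question once the CCRL Blitz ratings for the same engines are substituted into the second column.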
-
Rebel
I currently have no computer free; I have the 20-core edition running, which will take at least 5 days.
After 5 games: +254 elo
Code:
Gambit Rating List
Running : Gauntlet Dragon 2.5 for the GRL 20 cores rating list
Time Control : Time control : 40/120
Games : 500
Results from file top5.pgn:
No. Name Win Draw Loss Unf. Score Games %
----------------------------------------------------------------
1 Stockfish 14 +214 =280 -7 *0 354.0 501 70.7%
2 Komodo-Dragon 2 +137 =325 -38 *0 299.5 500 59.9%
3 Ethereal 13.25-NNUE +58 =326 -117 *0 221.0 501 44.1%
4 Koivisto 6.16 +65 =302 -134 *0 216.0 501 43.1%
5 SlowChess 2.7 +51 =315 -135 *0 208.5 501 41.6%
6 RubiChess 2.2 +45 =313 -143 *0 201.5 501 40.2%
7 Komodo-Dragon 2.5 +4 =1 -0 *0 4.5 5 90.0%
Total Games: 1505
White Wins: 281 (18.7%)
Black Wins: 293 (19.5%)
Draws: 931 (61.9%)
Unfinished: 0 (0.0%)
Estimated elo gain for Komodo-Dragon_2.5
Elo pool : 3491
Komodo-Dragon 2 : 3567.0
Komodo-Dragon_2.5 : 3821.9
Difference : 254.9
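Separate from whatever bug Ed found in his live estimate, a 5-game sample cannot pin down a rating. The rough sketch below assumes a normal approximation to the per-game score (crude at n = 5) and the logistic score-to-Elo conversion. With 4.5/5, the point estimate is near +382 Elo relative to the opponents' average, and the upper end of a 95% interval runs past a 100% score, i.e. it is unbounded. The function names are hypothetical, and this is not the formula Ed's gauntlet tool uses.

```python
import math


def elo_from_score(score):
    """Logistic Elo difference for a score fraction; +/-inf at the extremes."""
    if score <= 0.0:
        return -math.inf
    if score >= 1.0:
        return math.inf
    return 400.0 * math.log10(score / (1.0 - score))


def elo_interval(wins, draws, losses, z=1.96):
    """Rough normal-approximation interval on performance Elo."""
    n = wins + draws + losses
    mu = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0 points) around the mean.
    var = (wins * (1.0 - mu) ** 2 + draws * (0.5 - mu) ** 2 + losses * mu ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    return elo_from_score(mu - z * se), elo_from_score(mu), elo_from_score(mu + z * se)


low, point, high = elo_interval(4, 1, 0)  # Dragon 2.5 after 5 games: +4 =1 -0
print(f"point {point:+.0f}, 95% range {low:+.0f} .. {high:+.0f}")
```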
-
Rebel
Good news
I ran a provisional rating list.
Dragon 2.5 : 3649
Dragon 2.0 : 3590
Progress : 59 elo
That means something is wrong with the live elo estimate calculated during gauntlet matches. I have to look into that!
Code:
# PLAYER : RATING ERROR POINTS PLAYED (%) CFS(%) W D L D(%)
1 Stockfish 14 : 3693.5 19.7 3527.0 4500 78 93 2657 1740 103 39
2 Stockfish 13 : 3677.1 9.4 2660.0 3500 76 99 1915 1490 95 43
3 Stockfish 21-05-18 : 3659.4 19.4 841.0 1100 76 81 617 448 35 41
4 Komodo-Dragon 2.5 : 3649.0 6.3 1441.0 2000 72 100 986 910 104 46
5 Stockfish 12 : 3625.8 12.4 1903.0 2800 68 100 1222 1362 216 49
6 Komodo-Dragon 2 : 3592.9 10.9 3388.5 4700 72 63 2385 2007 308 43
7 Komodo-Dragon : 3590.3 19.3 1998.5 3000 67 51 1317 1363 320 45
8 Lc0 v28 : 3589.9 25.4 591.5 1000 59 100 335 513 152 51
9 Lc0-v27 : 3537.2 16.8 501.0 800 63 98 307 388 105 49
10 Stockfish 11 : 3509.8 20.5 800.0 1300 62 95 518 564 218 43