Komodo-Dragon-2 vs Stockfish 14 at knight odds

connor_mcmonigle
Posts: 544
Joined: Sun Sep 06, 2020 4:40 am
Full name: Connor McMonigle

Re: Komodo-Dragon-2 vs Stockfish 14 at knight odds

Post by connor_mcmonigle »

Also relevant is that CCRL's Blitz list uses a 2m+1s TC while Ed is using a 40/2 TC.
lkaufman
Posts: 6284
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Komodo-Dragon-2 vs Stockfish 14 at knight odds

Post by lkaufman »

connor_mcmonigle wrote: Sun Sep 26, 2021 11:46 pm Also relevant is that CCRL's Blitz list uses a 2m+1s TC while Ed is using a 40/2 TC.
Well, CCRL switched from 40/2, so their data is a mixture. This might affect individual ratings, but not the range significantly.
Komodo rules!
lkaufman
Posts: 6284
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Komodo-Dragon-2 vs Stockfish 14 at knight odds

Post by lkaufman »

Rebel wrote: Sun Sep 26, 2021 11:42 pm
lkaufman wrote: Sun Sep 26, 2021 9:23 pm
Rebel wrote: Sun Sep 26, 2021 8:31 pm
lkaufman wrote: Sun Sep 26, 2021 4:35 pm
Rebel wrote: Sun Sep 26, 2021 2:52 pm Odds match minus pawn f2 - Dragon 2.5 vs an elo pool of 3364 engines

Code: Select all

Odds match minus pawn f2 - Dragon 2.5 vs an elo pool of 3364 engines 
Time Control : Time control : 40/120
Games        : 700

Results from file all.pgn:

No. Name               Win Draw Loss Unf.  Score Games       %
--------------------------------------------------------------
  1 Komodo-Dragon 2.5 +225 =306 -169   *0  378.0   700   54.0%
  2 Ethereal 12.75     +31  =50  -19   *0   56.0   100   56.0%
  3 Pedone 3.1         +21  =61  -18   *0   51.5   100   51.5%
  4 Komodo 12          +32  =34  -34   *0   49.0   100   49.0%
  5 Komodo 11          +31  =33  -36   *0   47.5   100   47.5%
  6 Stockfish 8        +25  =34  -41   *0   42.0   100   42.0%
  7 Igel 3.0.5         +13  =53  -34   *0   39.5   100   39.5%
  8 Igel 3.0.0         +16  =41  -43   *0   36.5   100   36.5%
These engines are all rated over 3400 on the CCRL blitz list (or nearly identical versions are, like Komodo 11.01), quite a bit higher than your own list average of 3364. I wondered why this was so. On your main page for the gambit rating list you have a comparison with CCRL, but I think you are comparing your blitz ratings to their Rapid ratings; you should be comparing blitz for both. Since CCRL uses BayesElo, which contracts rating differences, I would expect ratings of engines near the top to be lower on their blitz list than on yours, but they are clearly higher! I'm trying to think of an explanation for this; do you have any idea? I wouldn't expect your choice of gambit openings to shrink rating differences; that would be bizarre.
The level of the elo values in a rating list is defined by anchor engines; for the GRL I use 4 anchor engines to be more or less compatible with the CCRL values. Anchor engines are rock-solid engines that have played thousands of games and thus have a reliable elo. For instance, I use Critter 1.6a as an anchor engine with a fixed elo of 3150, which I borrowed from CCRL 40/15; it currently has 3157. I use Houdini 6 (derivatives come in handy) as an anchor engine at 3400 elo; it currently has 3394. Fruit 2.1 is anchored at 2700 and Nemo at 2850, also borrowed from CCRL 40/15. Now suppose I change the value of Houdini to 3500: the rating list values of the 3400+ engines will go up unrealistically, big time, while lowering it to 3300 will have the opposite effect. Meaning, with anchor engines I can create a rating list with SF14 on top at 2000 elo, however... the order remains the same.
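To illustrate the anchoring point above, here is a minimal sketch. It assumes a single anchor and a pure offset shift, which is a simplification of a list that uses 4 anchors. The engine names "Engine A" and "Engine B" and all relative strengths are made up for illustration; only the Houdini 6 anchor values of 3400 and 3500 come from the post.

```python
# Hypothetical relative strengths on an arbitrary scale (illustrative numbers only).
relative = {"Engine A": 250.0, "Houdini 6": 0.0, "Engine B": -120.0}

def anchored(relative, anchor, anchor_elo):
    """Shift every rating by one offset so that `anchor` lands exactly on `anchor_elo`."""
    offset = anchor_elo - relative[anchor]
    return {name: rating + offset for name, rating in relative.items()}

def ranking(ratings):
    """Engine names ordered from strongest to weakest."""
    return sorted(ratings, key=ratings.get, reverse=True)

list_3400 = anchored(relative, "Houdini 6", 3400)
list_3500 = anchored(relative, "Houdini 6", 3500)

# Every rating moves by the same 100 points; the order of the engines does not change.
print(list_3400)
print(list_3500)
print(ranking(list_3400) == ranking(list_3500))
```

Under this simplification the anchor value only sets where the scale sits, not how the engines compare with each other.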

Secondly, responding to the part I put in bold: have a look at my research, CCRL vs GRL - a comparison. Gambit openings do make sense. If they did not, I would have stopped the GRL a long time ago.
I think you missed my point, which is that your CCRL vs GRL comparison seems to be the CCRL Rapid list vs. the GRL blitz list, which is not the proper comparison; it should be CCRL Blitz list vs. GRL Blitz list. I quite like and agree with your use of gambit openings for your list; that's why I'm so puzzled that your ratings show a smaller range than the CCRL BLITZ list does. I can't explain it. I must be missing something. One other question: are you now using AVX2 versions of Dragon and Stockfish, or is the hardware too old? If not, that might explain why you got a much smaller elo gain than CEGT, which reports +73 elo in blitz over Dragon 2.
1. Well, rating lists are not an exact science. I chose the 40/15 list because its ratings are more reliable. An example would be the comparison between SF12 and SF13: on the 40/2 list SF12 is rated higher than SF13, while on the 40/15 list SF13 is rated higher, as it normally should be.

2. As for AVX2, I have AVX but not AVX2. But I don't think it matters much because all the opponents Dragon played had the same hardware.

3. Elo pools are important. Perhaps (emphasis added) it could be that the performance of 2.5 would be (somewhat) better if 2.5 had played the exact same opponents version 2.0 played.

4. I tested the NPS -
Dragon 2.0 - 700,000
Dragon 2.5 - 570,000
1. The CCRL rapid list may be better, but the blitz list is more comparable to yours in game quality. Using the 4 reference engines you list, the CCRL blitz list is about 48 elo above the rapid list. If I subtract 48 from the 3400-range engines in question, I get ratings similar to yours. So no real disparity, just the absence of the expected gambit effect.
2. AVX2 is huge for all NNUE engines, not for others. So your list is fair for comparing engines of the same category, but it does understate the gain from NNUE.
4. Your NPS ratio is somewhat larger than on AVX2, so I estimate this might account for about 8 elo of the lower gain for Dragon 2.5.
Komodo rules!
Rebel
Posts: 7475
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: Komodo-Dragon-2 vs Stockfish 14 at knight odds

Post by Rebel »

Larry, how can I convince you that the Gambit Rating List is a different animal? Positions are tactical from the get-go. It's why the top engines (Stockfish and Komodo in particular) profit massively in comparison with other rating lists. From the CCRL-GRL comparison:

Code: Select all

   # PLAYER                :  RATING PLAYED  CCRL Gambit
   1 Stockfish 13          :  3667.5  3500   3506  161
   2 Komodo-Dragon 1       :  3581.3  3000   3469  112
   3 Lc0 v27               :  3529.8   800   ----
   4 SlowChess 2.6         :  3421.9  2400   3379   42
   5 RubiChess 2.1         :  3380.5  2900   3338   42
   6 Pedone 3.1            :  3361.7  2900   3334   27
   7 Igel 3.0.5            :  3355.0  2900   3342   13
   8 Ethereal 12.75        :  3353.8  2700   3320   33
   9 Nemorino 6.00         :  3309.5  2900   3344  -35
Some profit more than others; Nemorino even loses elo.

SF14 (4300 games) gained only 15 elo over SF13 (3500 games); other rating lists reported a much higher elo gain.

And what to think about Benjamin? Its rating is totally unrealistic when it has to play normal openings.

Speaking of normal openings, I ran Dragon 2.5 vs Dragon 2: 1000 games, tc 40/120, 8-moves.pgn, thus normal openings.

Code: Select all

Dragon 2.5 vs Dragon 2.0 [8-moves.pgn] [normal openings]
Time Control : Time control : 40/120
Games        : 1000

Results from file all.pgn:

No. Name               Win Draw Loss Unf.  Score Games       %
--------------------------------------------------------------
  1 Komodo-Dragon 2.5 +216 =744  -40   *0  588.0  1000   58.8%
  2 Komodo-Dragon 2    +40 =744 -216   *0  412.0  1000   41.2%

Total Games:    1000
White Wins:      154 (15.4%)
Black Wins:      102 (10.2%)
Draws:           744 (74.4%)
Unfinished:        0 (0.0%)

Estimated ratings for this elo 3590 pool

   # PLAYER               :  RATING  POINTS  PLAYED   (%)
   1 Komodo-Dragon 2.5    :  3621.2   588.0    1000    59
   2 Komodo-Dragon 2      :  3558.8   412.0    1000    41
And it produces your estimated elo gain of +63
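
As a rough cross-check of the figure above, the match score can be converted to an elo difference with the standard logistic formula. This is only a sketch: the rating tool used for the table may apply a slightly different model, and the function name `elo_diff` is just an illustrative choice. The win/draw/loss counts are taken from the match table.

```python
import math

def elo_diff(score_fraction):
    """Elo difference implied by a score fraction under the logistic model."""
    return 400.0 * math.log10(score_fraction / (1.0 - score_fraction))

# Dragon 2.5 vs Dragon 2, from the table: +216 =744 -40
wins, draws, losses = 216, 744, 40
score = (wins + 0.5 * draws) / (wins + draws + losses)
diff = elo_diff(score)

print(f"score = {score:.3f}, implied elo difference = {diff:.1f}")
```

A 58.8% score works out to roughly 62 elo with this formula, in line with the gap between the two estimated ratings in the table.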
90% of coding is debugging, the other 10% is writing bugs.
lkaufman
Posts: 6284
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Komodo-Dragon-2 vs Stockfish 14 at knight odds

Post by lkaufman »

Rebel wrote: Mon Sep 27, 2021 9:58 am Larry, how can I convince you that the Gambit Rating List is a different animal? Positions are tactical from the get-go. It's why the top engines (Stockfish and Komodo in particular) profit massively in comparison with other rating lists. From the CCRL-GRL comparison:
[...]
And it produces your estimated elo gain of +63
It would be very interesting to do the same run with your gambit openings, to see whether they are the reason for the smaller elo gain you report on the gambit list, or whether it is something else. I know you don't have enough positions for 1000 games; you can set "Variety" to a small number like 3 on both engines (the effect on elo will be trivial and balanced) to get as many games as you want, or else you can vary the time control by a tiny amount to avoid repeat games. I know that engines will perform differently with gambit openings than with normal ones; for example, the good results for "Benjamin" make sense to me because it excels when getting knight odds from Dragon (compared to other similarly rated engines). I am interested in the question of whether the "spread" of ratings (say, the standard deviation for a given group of engines) increases going from CCRL to GRL, using BLITZ lists for both to be fair. Logically, it should increase, but it doesn't look like this is the case. Of course the spread is greater with blitz compared to rapid, because the draw percentage is lower in blitz.
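
The "spread" measure mentioned above can be sketched as a sample standard deviation over the same group of engines on both lists. The numbers below are the eight engines with both a GRL and a CCRL rating in the comparison table posted earlier in the thread. As discussed in the thread, that CCRL column appears to come from the 40/15 list, so this only shows the calculation; it is not the blitz-to-blitz comparison being asked for.

```python
import statistics

# GRL and CCRL ratings for the eight engines that appear on both lists
# in the comparison table (Lc0 v27 has no CCRL value there and is left out).
grl = [3667.5, 3581.3, 3421.9, 3380.5, 3361.7, 3355.0, 3353.8, 3309.5]
ccrl = [3506, 3469, 3379, 3338, 3334, 3342, 3320, 3344]

grl_spread = statistics.stdev(grl)    # sample standard deviation
ccrl_spread = statistics.stdev(ccrl)

print(f"GRL spread:  {grl_spread:.1f}")
print(f"CCRL spread: {ccrl_spread:.1f}")
```

Swapping in CCRL Blitz ratings for the same engines would give the like-for-like comparison.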
By the way, my estimate for single-thread blitz was +72 elo (CEGT reports +73), and 72 minus 63 is 9. I estimated from your NPS that no AVX2 should cost us about 8 elo just between Dragon 2 and 2.5, so I was off by 1!
Komodo rules!
Rebel
Posts: 7475
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: Komodo-Dragon-2 vs Stockfish 14 at knight odds

Post by Rebel »

I currently have no computer free; I have the 20-core edition running, which will take at least 5 days.

After 5 games

Code: Select all

Gambit Rating List
Running      : Gauntlet Dragon 2.5 for the GRL 20 cores rating list
Time Control : Time control : 40/120
Games        : 500

Results from file top5.pgn:

No. Name                 Win Draw Loss Unf.  Score Games       %
----------------------------------------------------------------
  1 Stockfish 14        +214 =280   -7   *0  354.0   501   70.7%
  2 Komodo-Dragon 2     +137 =325  -38   *0  299.5   500   59.9%
  3 Ethereal 13.25-NNUE  +58 =326 -117   *0  221.0   501   44.1%
  4 Koivisto 6.16        +65 =302 -134   *0  216.0   501   43.1%
  5 SlowChess 2.7        +51 =315 -135   *0  208.5   501   41.6%
  6 RubiChess 2.2        +45 =313 -143   *0  201.5   501   40.2%
  7 Komodo-Dragon 2.5     +4   =1   -0   *0    4.5     5   90.0%

Total Games:    1505
White Wins:      281 (18.7%)
Black Wins:      293 (19.5%)
Draws:           931 (61.9%)
Unfinished:        0 (0.0%)

Estimated elo gain for Komodo-Dragon_2.5
Elo pool : 3491
Komodo-Dragon 2 : 3567.0
Komodo-Dragon_2.5 : 3821.9
Difference : 254.9
+254 elo :D
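
For a sense of how little 5 games can say, here is a rough sketch of the uncertainty on a +4 =1 -0 result. It uses the plain logistic formula against a single notional opponent, so it does not reproduce the gauntlet tool's 254.9 figure (which accounts for each opponent's rating); the helper name `elo_diff` is illustrative.

```python
import math

def elo_diff(score_fraction):
    """Elo difference implied by a score fraction under the logistic model."""
    return 400.0 * math.log10(score_fraction / (1.0 - score_fraction))

# Dragon 2.5 after 5 games: four wins and one draw.
results = [1.0, 1.0, 1.0, 1.0, 0.5]
n = len(results)
mean = sum(results) / n

# Sample variance of the per-game results, then the standard error of the mean.
variance = sum((r - mean) ** 2 for r in results) / (n - 1)
se = math.sqrt(variance / n)

point = elo_diff(mean)       # elo implied by the observed 90% score
low = elo_diff(mean - se)    # elo implied one standard error lower

print(f"score = {mean:.2f} +/- {se:.2f}")
print(f"implied elo: {point:.0f}, one standard error lower: {low:.0f}")
```

One standard error already moves the implied difference by well over 100 elo, so a number taken after 5 games is mostly noise.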
90% of coding is debugging, the other 10% is writing bugs.
Rebel
Posts: 7475
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: Komodo-Dragon-2 vs Stockfish 14 at knight odds

Post by Rebel »

Good news :!:

I ran a provisional rating list.

Code: Select all

   # PLAYER                 :  RATING  ERROR  POINTS  PLAYED   (%)  CFS(%)     W     D     L  D(%)
   1 Stockfish 14           :  3693.5   19.7  3527.0    4500    78      93  2657  1740   103    39
   2 Stockfish 13           :  3677.1    9.4  2660.0    3500    76      99  1915  1490    95    43
   3 Stockfish 21-05-18     :  3659.4   19.4   841.0    1100    76      81   617   448    35    41
   4 Komodo-Dragon 2.5      :  3649.0    6.3  1441.0    2000    72     100   986   910   104    46
   5 Stockfish 12           :  3625.8   12.4  1903.0    2800    68     100  1222  1362   216    49
   6 Komodo-Dragon 2        :  3592.9   10.9  3388.5    4700    72      63  2385  2007   308    43
   7 Komodo-Dragon          :  3590.3   19.3  1998.5    3000    67      51  1317  1363   320    45
   8 Lc0 v28                :  3589.9   25.4   591.5    1000    59     100   335   513   152    51
   9 Lc0-v27                :  3537.2   16.8   501.0     800    63      98   307   388   105    49
  10 Stockfish 11           :  3509.8   20.5   800.0    1300    62      95   518   564   218    43
Dragon 2.5 : 3649
Dragon 2.0 : 3593
Progress : 56 elo

Meaning, something is wrong with the estimated live elo calculation during gauntlet matches. I have to look into that!
90% of coding is debugging, the other 10% is writing bugs.