Komodo-Dragon-2 vs Stockfish 14 at knight odds

Rebel · Post by **Rebel** » Sun Sep 26, 2021 2:52 pm

Odds match minus pawn f2 - Dragon 2.5 vs an elo pool of 3364 engines 
Time Control : 40/120
Games        : 700

Results from file all.pgn:

No. Name               Win Draw Loss Unf.  Score Games       %
--------------------------------------------------------------
  1 Komodo-Dragon 2.5 +225 =306 -169   *0  378.0   700   54.0%
  2 Ethereal 12.75     +31  =50  -19   *0   56.0   100   56.0%
  3 Pedone 3.1         +21  =61  -18   *0   51.5   100   51.5%
  4 Komodo 12          +32  =34  -34   *0   49.0   100   49.0%
  5 Komodo 11          +31  =33  -36   *0   47.5   100   47.5%
  6 Stockfish 8        +25  =34  -41   *0   42.0   100   42.0%
  7 Igel 3.0.5         +13  =53  -34   *0   39.5   100   39.5%
  8 Igel 3.0.0         +16  =41  -43   *0   36.5   100   36.5%

lkaufman · Post by **lkaufman** » Sun Sep 26, 2021 4:35 pm

Rebel wrote: ↑Sun Sep 26, 2021 2:52 pm Odds match minus pawn f2 - Dragon 2.5 vs an elo pool of 3364 engines

These engines are all rated over 3400 on the CCRL blitz list (or nearly identical versions are, like Komodo 11.01), quite a bit higher than your own list average of 3364. I wondered why this was so. On the main page of your Gambit Rating List you have a comparison with CCRL, but I think you are comparing your blitz ratings to their rapid ratings; you should be comparing blitz for both. Since CCRL uses BayesElo, which contracts rating differences, I would expect the ratings of engines near the top to be lower on their blitz list than on yours, but they are clearly higher! I'm trying to think of an explanation for this; do you have any idea? I wouldn't expect your choice of gambit openings to shrink rating differences; that is bizarre.

Rebel · Post by **Rebel** » Sun Sep 26, 2021 4:43 pm

Trying other engines with knight odds...

First Lc0-v28 CPU

Knight odds match - Lc0 v28 CPU vs an elo pool of 2715 engines 
Time Control : 40/120
Games        : 700

Results from file all.pgn:

No. Name          Win Draw Loss Unf.  Score Games       %
---------------------------------------------------------
  1 ProDeo 2.2    +20   =0   -0   *0   20.0    20  100.0%
  2 Velvet 1.2.0  +18   =2   -0   *0   19.0    20   95.0%
  3 Benjamin 1.0  +18   =0   -2   *0   18.0    20   90.0%
  4 Fruit 2.1     +18   =0   -3   *0   18.0    21   85.7%
  5 Zahak 5.0     +18   =0   -3   *0   18.0    21   85.7%
  6 k2 099        +17   =1   -2   *0   17.5    20   87.5%
  7 Dumb 1.8      +17   =0   -3   *0   17.0    20   85.0%
  8 Lzero v28     +13   =3 -126   *0   14.5   142   10.2%

Aborted after 100+ games, no match for Dragon.

Chessqueen · Post by **Chessqueen** » Sun Sep 26, 2021 4:58 pm

You should switch to TC 40/300 and see what happens with this pool of engines, with 500 games.

Trying other engines with knight odds...

First Lc0-v28 CPU

Knight odds match - Dragon 2.5 vs an elo pool of 2500 engines 
Time Control : 40/300
Games        : 500

Uri Blass · Post by **Uri Blass** » Sun Sep 26, 2021 6:51 pm

Chessqueen wrote: ↑Sun Sep 26, 2021 4:58 pm You should switch to TC 40/300 and see what happen with this Pool of Engines, with 500 games

I would prefer to see an odds match with pondering off, a constant time control for Dragon's opponents and more time for Dragon, to see whether at some point more time for Dragon is simply counterproductive (not because the opponent plays better, but because Dragon becomes afraid of things the opponent has no chance of seeing).

It may also be interesting to try engines other than Stockfish and Dragon.
I tried Wasp (7 cores, contempt 0) giving b1-knight odds plus every legal first move against Fruit 2.1 at a time control of 40 moves/120 seconds.

Fruit won 11-7 with 1 draw.

Note that forcing Wasp to play every legal first move is an extra handicap on top of the knight odds, because in a real knight-odds match you usually get to choose a better first move.

I may repeat the test later with positive contempt, and also with one core, to find the difference in score.

lkaufman · Post by **lkaufman** » Sun Sep 26, 2021 7:02 pm

Uri Blass wrote: ↑Sun Sep 26, 2021 6:51 pm
I would prefer to see an odds match with pondering off, a constant time control for Dragon's opponents and more time for Dragon, to see whether at some point more time for Dragon is simply counterproductive (not because the opponent plays better, but because Dragon becomes afraid of things the opponent has no chance of seeing).

I've done tests like that with Dragon 2.5, but using fixed depth on one thread as the variable for Dragon, with Critter 1.6a at 11 ply as the engine receiving knight odds. Critter 1.6a is of course too strong for knight odds in timed games, but at 11 ply it is at about the blitz level of the engines that are a good match for Dragon at knight odds. Peak performance when giving knight odds seems to be at 23 ply for Dragon; beyond that it goes downhill (although this is not completely proven beyond the error margins). With more threads the peak comes a bit earlier; 22 ply on four threads looks about optimal. I think that with eight or more threads, just using Skill 34 (rather than 35) is near-optimal for knight odds.
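The caveat about error margins can be made concrete. The sketch below is plain Python written for illustration, not the method used in these tests: it estimates an Elo difference and an approximate 95% interval from a win/draw/loss count, assuming independent games and a normal approximation to the mean per-game score. No per-depth results are given in the thread, so the example uses the Dragon 2.5 (minus f2 pawn) totals from the first table; for a depth sweep one would compute this for each depth and check whether the intervals around the apparent peak overlap.

```python
import math


def elo_from_score(p: float) -> float:
    """Elo difference implied by an expected score p under the logistic model."""
    return 400.0 * math.log10(p / (1.0 - p))


def elo_with_margin(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (low, centre, high) Elo difference for a W/D/L result.

    Assumes independent games and a normal approximation to the mean
    per-game score (1 for a win, 0.5 for a draw, 0 for a loss).
    """
    n = wins + draws + losses
    mu = (wins + 0.5 * draws) / n
    # Sample variance of the per-game score around its mean
    var = (wins * (1.0 - mu) ** 2 + draws * (0.5 - mu) ** 2 + losses * mu ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    lo, hi = mu - z * se, mu + z * se
    if lo <= 0.0 or hi >= 1.0:
        raise ValueError("interval reaches 0% or 100%; the approximation breaks down")
    return elo_from_score(lo), elo_from_score(mu), elo_from_score(hi)


# Dragon 2.5 minus f2-pawn totals from the first table: +225 =306 -169
low, centre, high = elo_with_margin(225, 306, 169)
print(f"{centre:+.1f} Elo, approx. 95% interval [{low:+.1f}, {high:+.1f}]")
```

For that table the whole interval lies above zero, so the plus score is outside the noise; two settings whose intervals overlap heavily could not be separated this way.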

Rebel · Post by **Rebel** » Sun Sep 26, 2021 8:31 pm

lkaufman wrote: ↑Sun Sep 26, 2021 4:35 pm
These engines are all rated over 3400 on the CCRL blitz list (or nearly identical versions are, like Komodo 11.01), quite a bit higher than your own list average of 3364. I wondered why this was so. On the main page of your Gambit Rating List you have a comparison with CCRL, but I think you are comparing your blitz ratings to their rapid ratings; you should be comparing blitz for both. Since CCRL uses BayesElo, which contracts rating differences, I would expect the ratings of engines near the top to be lower on their blitz list than on yours, but they are clearly higher! I'm trying to think of an explanation for this; do you have any idea? I wouldn't expect your choice of gambit openings to shrink rating differences; that is bizarre.

The absolute level of Elo values in a rating list is defined by anchor engines. For the GRL I use 4 anchor engines to be more or less compatible with the CCRL values. Anchor engines are rock-solid engines that have played thousands of games and thus have a reliable Elo. For instance, I use Critter 1.6a as an anchor engine with a fixed Elo of 3150, which I borrowed from CCRL 40/15 (it currently has 3157). I use Houdini 6 (derivatives come in handy) as an anchor engine of 3400 Elo (it currently has 3394). Fruit 2.1 is anchored at 2700 and Nemo at 2850, also borrowed from CCRL 40/15. Now suppose I changed the value of Houdini to 3500: the ratings of the 3400+ engines would go up unrealistically, big time, and lowering it to 3300 would have the opposite effect. In other words, with anchor engines I could create a rating list with SF14 on top at 2000 Elo; however, the order would remain the same.
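The point that anchors set the level of a list but not its order can be seen in a deliberately simplified model: a fitting tool produces ratings relative to each other, and one anchor engine fixes the offset. This is a minimal Python sketch, assuming a single anchor; the engine names and relative values are hypothetical placeholders, not GRL data, and the GRL's four-anchor setup is not modelled.

```python
def anchor_list(relative: dict[str, float], anchor: str, anchor_elo: float) -> dict[str, float]:
    """Shift relative ratings so that `anchor` sits exactly at `anchor_elo`."""
    offset = anchor_elo - relative[anchor]
    return {name: r + offset for name, r in relative.items()}


def ranking(ratings: dict[str, float]) -> list[str]:
    """Engine names from highest to lowest rating."""
    return sorted(ratings, key=ratings.get, reverse=True)


# Hypothetical relative ratings from some fit (placeholder names, not GRL data)
relative = {"engine_top": 180.0, "anchor_engine": 0.0, "engine_mid": -60.0, "engine_low": -250.0}

# Moving the anchor value moves every rating by the same amount
for anchor_elo in (3300.0, 3400.0, 3500.0, 2000.0):
    rated = anchor_list(relative, "anchor_engine", anchor_elo)
    print(anchor_elo, ranking(rated), rated["engine_top"] - rated["engine_low"])
```

With several anchors at different levels, changing one anchor's value may also stretch or compress the scale between them, which this single-anchor sketch does not capture.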

Secondly, responding to the part I bolded: have a look at my research "CCRL vs GRL - a comparison"; gambit openings do make sense. If they did not, I would have stopped the GRL a long time ago.

Rebel · Post by **Rebel** » Sun Sep 26, 2021 8:35 pm

Koivisto does a lot better than SF14

Knight odds match - Koivisto 6.16 vs an elo pool of 2715 engines 
Time Control : 40/120
Games        : 700

Results from file all.pgn:

No. Name           Win Draw Loss Unf.  Score Games       %
----------------------------------------------------------
  1 Koivisto 6.16 +299  =67 -334   *0  332.5   700   47.5%
  2 Benjamin 1.0   +67   =6  -27   *0   70.0   100   70.0%
  3 Dumb 1.8       +53  =12  -35   *0   59.0   100   59.0%
  4 ProDeo 2.2     +55   =4  -41   *0   57.0   100   57.0%
  5 k2 099         +46  =11  -43   *0   51.5   100   51.5%
  6 Zahak 5.0      +46   =9  -45   *0   50.5   100   50.5%
  7 Velvet 1.2.0   +33  =17  -50   *0   41.5   100   41.5%
  8 Fruit 2.1      +34   =8  -58   *0   38.0   100   38.0%
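To put the three result tables in this thread on a common scale, each score can be converted into a performance rating with the standard logistic Elo formula. This is a minimal Python sketch, not a feature of the tools used here, and it assumes the quoted "elo pool" figure is the average rating of the opponents faced (the posts do not say exactly how that figure is computed).

```python
import math


def performance_elo(score: float, games: int, pool_avg: float) -> float:
    """Performance rating under the logistic Elo model.

    Assumes pool_avg is the average rating of the opponents faced.
    """
    p = score / games
    if not 0.0 < p < 1.0:
        # A 0% or 100% score implies an infinite rating gap under this model
        raise ValueError("score fraction must be strictly between 0 and 1")
    # Elo gap implied by expected score p: 400 * log10(p / (1 - p))
    return pool_avg + 400.0 * math.log10(p / (1.0 - p))


# Scores and pool averages as reported in this thread
results = {
    "Dragon 2.5, minus f2 pawn": (378.0, 700, 3364),
    "Koivisto 6.16, knight odds": (332.5, 700, 2715),
    "Lc0 v28 CPU, knight odds": (14.5, 142, 2715),
}
for label, (score, games, pool) in results.items():
    print(f"{label}: {performance_elo(score, games, pool):.0f}")
```

In an odds match the number describes how the engine performed while carrying the handicap against that pool, not its normal strength.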

lkaufman · Post by **lkaufman** » Sun Sep 26, 2021 9:23 pm

Rebel wrote: ↑Sun Sep 26, 2021 8:31 pm
Secondly, responding to the part I bolded: have a look at my research "CCRL vs GRL - a comparison"; gambit openings do make sense. If they did not, I would have stopped the GRL a long time ago.

I think you missed my point, which is that your CCRL vs GRL comparison seems to be the CCRL rapid list vs. the GRL blitz list. That is not the proper comparison; it should be the CCRL blitz list vs. the GRL blitz list. I quite like and agree with your use of gambit openings for your list; that's why I'm so puzzled that your ratings show a smaller range than the CCRL BLITZ list does. I can't explain it; I must be missing something. One other question: are you now using AVX2 versions of Dragon and Stockfish, or is the hardware too old? If not, that might explain why you got a much smaller Elo gain than CEGT, which reports +73 Elo in blitz over Dragon 2.

Rebel · Post by **Rebel** » Sun Sep 26, 2021 11:42 pm

lkaufman wrote: ↑Sun Sep 26, 2021 9:23 pm
I think you missed my point, which is that your CCRL vs GRL comparison seems to be the CCRL rapid list vs. the GRL blitz list. That is not the proper comparison; it should be the CCRL blitz list vs. the GRL blitz list. I quite like and agree with your use of gambit openings for your list; that's why I'm so puzzled that your ratings show a smaller range than the CCRL BLITZ list does. I can't explain it; I must be missing something. One other question: are you now using AVX2 versions of Dragon and Stockfish, or is the hardware too old? If not, that might explain why you got a much smaller Elo gain than CEGT, which reports +73 Elo in blitz over Dragon 2.

1. Well, rating lists are not an exact science. I have chosen the 40/15 list because its ratings are more reliable. An example is the comparison between SF12 and SF13: on the 40/2 list SF12 is rated higher than SF13, while on the 40/15 list SF13 is rated higher, as it normally should be.

2. As for AVX2, I have AVX but not AVX2. But I don't think it matters much, because all of Dragon's opponents ran on the same hardware.

3. Elo pools are important. Perhaps (emphasis added) the performance of 2.5 would be (somewhat) better if 2.5 had played exactly the same opponents that version 2.0 played.

4. I tested the NPS -
Dragon 2.0 - 700,000
Dragon 2.5 - 570,000
