Stockfish and Dragon time odds matches

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

Rebel
Posts: 7468
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Post by Rebel »

A couple of years ago I did some fun research on how much time odds lower-rated engines needed to beat Stockfish 11, see: http://rebel13.nl/rebel13/stockfish-han ... tches.html

This time we do it differently and pit SF14 against SF14 in 3 steps:

Match-1 : SF14 (40/40) vs SF14 (40/80) - one-vs-two-seconds
Match-2 : SF14 (40/40) vs SF14 (40/160) - one-vs-four-seconds
Match-3 : SF14 (40/40) vs SF14 (40/320) - one-vs-eight-seconds

Then we measure the Elo gain.

We do the same with Komodo Dragon 2.5.
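For readers less used to the notation, here is a minimal sketch, assuming "40/N" means a repeating control of 40 moves in N seconds (so 40/40 is about one second per move, as the match labels say). It converts each control to average seconds per move and reports the time ratio and the number of doublings. The helper name `seconds_per_move` is hypothetical, for illustration only.

```python
import math
from fractions import Fraction


def seconds_per_move(tc: str) -> Fraction:
    """Average seconds per move for a repeating 'moves/seconds' control, e.g. '40/80'."""
    moves, seconds = tc.split("/")
    return Fraction(int(seconds), int(moves))


base = "40/40"
for odds in ("40/80", "40/160", "40/320"):
    # Exact rational ratio of thinking time: 2, 4 and 8 for the three matches.
    ratio = seconds_per_move(odds) / seconds_per_move(base)
    doublings = math.log2(float(ratio))  # 2 -> 1, 4 -> 2, 8 -> 3
    print(f"{base} vs {odds}: {float(seconds_per_move(base)):g}s vs "
          f"{float(seconds_per_move(odds)):g}s per move, "
          f"ratio 1:{ratio}, {doublings:g} doubling(s)")
```

Using `Fraction` keeps the ratios exact, so 40/320 against 40/40 comes out as exactly 8, i.e. three doublings.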

First match at - http://rebel13.nl/b/grl.htm
90% of coding is debugging, the other 10% is writing bugs.
lkaufman
Posts: 6281
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Stockfish and Dragon time odds matches

Post by lkaufman »

Rebel wrote: Tue Oct 05, 2021 5:08 pm A couple of years ago I did some fun research on how much time odds lower-rated engines needed to beat Stockfish 11, see: http://rebel13.nl/rebel13/stockfish-han ... tches.html

This time we do it differently and pit SF14 against SF14 in 3 steps:

Match-1 : SF14 (40/40) vs SF14 (40/80) - one-vs-two-seconds
Match-2 : SF14 (40/40) vs SF14 (40/160) - one-vs-four-seconds
Match-3 : SF14 (40/40) vs SF14 (40/320) - one-vs-eight-seconds

Then we measure the Elo gain.

We do the same with Komodo Dragon 2.5.

First match at - http://rebel13.nl/b/grl.htm
This is very interesting, and I look forward to the results, but the extremely high draw percentage (85% even at the fastest level as I write) will of course lead to small Elo gains per doubling even if the win-to-loss ratios are huge.

I notice that so far the games all seem to start with 1.a3 or 1.b3; what is the reasoning for this? It could increase the draw percentage. In top-level chess, it is primarily the fact that White has a noticeable edge after any of the top four first moves that makes for a reasonable number of victories; if all games opened with 1.a3 (after which chances are virtually equal), the draw percentage would shoot way up.

It is primarily the size of the opening advantage given by preset openings that determines the draw percentage between the top engines. As the edge gets larger, the draw percentage drops, reaching about 50% at the win/draw line (about 70 centipawns in traditional eval, more like twice that in SF14 eval). Beyond that point it drops further, but then the frequency of 1-1 results (each engine winning one game of a color-reversed pair) goes up. So doubling from 1 to 2 seconds might be worth just 36 Elo with the book you are using, but with a seriously unbalanced book, such as one from S. Pohl, doubling might be worth a hundred Elo or more.

Anyway, regardless of this, your test is fair to both engines and will show that Elo gains decline with more time due to the increasing draw percentage. But the win-to-loss ratios may actually increase with more time!
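Larry's point that a high draw rate compresses the Elo difference, even when the win-to-loss ratio is large, can be checked with the standard logistic Elo formula. This is only a sketch: the fixed 5:1 win-to-loss ratio is an arbitrary illustrative assumption, not a number from the thread, and the function names are hypothetical.

```python
import math


def elo_from_score(score: float) -> float:
    """Logistic Elo difference implied by an expected score strictly between 0 and 1."""
    if not 0 < score < 1:
        raise ValueError("score must be strictly between 0 and 1")
    return 400 * math.log10(score / (1 - score))


def elo_for(draw_rate: float, win_loss_ratio: float) -> float:
    """Elo edge when the decisive games split win_loss_ratio : 1 for the stronger side."""
    decisive = 1 - draw_rate
    wins = decisive * win_loss_ratio / (win_loss_ratio + 1)
    score = wins + draw_rate / 2  # each draw is worth half a point
    return elo_from_score(score)


# Same (assumed) 5:1 win-to-loss ratio, different draw rates.
for draws in (0.50, 0.70, 0.85):
    print(f"draws {draws:.0%}, wins:losses 5:1 -> {elo_for(draws, 5):+.0f} Elo")
```

With the same 5:1 ratio of wins to losses, 50% draws works out to about +120 Elo, 70% to about +70, and 85% to only about +35, close to the 36 Elo Larry estimated for one doubling.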
Komodo rules!
amanjpro
Posts: 883
Joined: Sat Mar 13, 2021 1:47 am
Full name: Amanj Sherwany

Re: Stockfish and Dragon time odds matches

Post by amanjpro »

I did something similar for Zahak, but gave the side with less time double the threads. The single-threaded side still won convincingly.
Uri Blass
Posts: 11148
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Stockfish and Dragon time odds matches

Post by Uri Blass »

lkaufman wrote: Tue Oct 05, 2021 6:20 pm
This is very interesting, and I look forward to the results, but the extremely high draw percentage (85% even at the fastest level as I write) will of course lead to small Elo gains per doubling even if the win-to-loss ratios are huge. I notice that so far the games all seem to start with 1.a3 or 1.b3; what is the reasoning for this? It could increase the draw percentage. In top-level chess, it is primarily the fact that White has a noticeable edge after any of the top four first moves that makes for a reasonable number of victories; if all games opened with 1.a3 (after which chances are virtually equal), the draw percentage would shoot way up. It is primarily the size of the opening advantage given by preset openings that determines the draw percentage between the top engines. As the edge gets larger, the draw percentage drops, reaching about 50% at the win/draw line (about 70 centipawns in traditional eval, more like twice that in SF14 eval). Beyond that point it drops further, but then the frequency of 1-1 results (each engine winning one game of a color-reversed pair) goes up. So doubling from 1 to 2 seconds might be worth just 36 Elo with the book you are using, but with a seriously unbalanced book, such as one from S. Pohl, doubling might be worth a hundred Elo or more. Anyway, regardless of this, your test is fair to both engines and will show that Elo gains decline with more time due to the increasing draw percentage. But the win-to-loss ratios may actually increase with more time!
I used the Stockfish development version against itself.
7 cores, 30 seconds + 0.3 seconds per move against 300+3 (Fischer time control, meaning time odds of 1:10), and got 3-0 for the stronger side with 37 draws.
Note that I had no adjudications.

I believe that in practice 30+0.3 already gives Stockfish higher-quality games than 40/40, even with a single core, and I used 7 cores.

The opening book has all possible first moves for White.
White as the stronger side drew all games.
White as the weaker side lost 3 games (1.f3, 1.g4, 1.c4).

I believe that 1.g4 is losing, and there are some other positions where Black's advantage is bigger than White's normal initial advantage (for example 1.f3 or 1.Na3).
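The 1:10 time odds in Uri's test hold for any game length, because both the base time and the increment are scaled by ten. A small sketch, assuming the usual reading of "30+0.3" as a 30-second base plus 0.3 seconds added per move; `game_budget` is a hypothetical helper, not part of any tool mentioned in the thread.

```python
def game_budget(base_s: float, inc_s: float, moves: int) -> float:
    """Total clock time one side can spend over `moves` moves with a Fischer base+increment."""
    return base_s + inc_s * moves


for moves in (40, 60, 100):
    fast = game_budget(30, 0.3, moves)  # the 30+0.3 side
    slow = game_budget(300, 3, moves)   # the 300+3 side
    print(f"{moves} moves: {fast:.1f}s vs {slow:.1f}s, ratio 1:{slow / fast:.0f}")
```

Since the ratio stays the same at 40, 60 or 100 moves, the odds do not depend on how long the games last.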
Rebel

Re: Stockfish and Dragon time odds matches

Post by Rebel »

These are all balanced positions despite the weird first moves, so it's fair. We can do Stefan's positions later if time allows.
lkaufman

Re: Stockfish and Dragon time odds matches

Post by lkaufman »

Rebel wrote: Tue Oct 05, 2021 7:10 pm These are all balanced positions despite the weird first moves, so it's fair. We can do Stefan's positions later if time allows.
Yes, it is certainly fair; comparisons of different engines and different time controls will be valid. But if the positions are truly balanced, this is not typical of real chess, where White almost always gets an opening advantage in top GM play, so the draw percentages will be higher and the Elo gains smaller with balanced positions than with normal openings that favor White noticeably.
Rebel

Re: Stockfish and Dragon time odds matches

Post by Rebel »

First results for SF14:

No. Name           Win Draw Loss Unf.  Score Games       %
----------------------------------------------------------
  1 SF14-two-secs +133 =841  -26   *0  553.5  1000   55.4%  +37 elo
  2 SF14-one-sec   +26 =841 -133   *0  446.5  1000   44.6%

No. Name            Win Draw Loss Unf.  Score Games       %
-----------------------------------------------------------
  1 SF14-four-secs +212 =774  -14   *0  599.0  1000   59.9%  +70 elo
  2 SF14-one-sec    +14 =774 -212   *0  401.0  1000   40.1%

No. Name             Win Draw Loss Unf.  Score Games       %
------------------------------------------------------------
  1 SF14-eight-secs +132 =361   -7   *0  312.5   500   62.5%  +87 elo 
  2 SF14-one-sec      +7 =361 -132   *0  187.5   500   37.5%

Because of the sudden drop in Elo gain for the third doubling (87 - 70 = 17, compared with 70 - 37 = 33 for the second), I am extending the experiment with one more match: 1 second vs 16 seconds.
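The Elo figures in these tables can be approximately reproduced from the win/draw/loss counts with the plain logistic formula. This is a sketch only: the software that produced the tables may model draws or round differently, so its numbers can differ from this formula by a point or two. `match_elo` is a hypothetical name.

```python
import math


def match_elo(wins: int, draws: int, losses: int) -> float:
    """Elo difference implied by a match score, using the logistic Elo model."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    if not 0 < score < 1:
        raise ValueError("a 100% or 0% score has no finite Elo estimate")
    return 400 * math.log10(score / (1 - score))


# Win/draw/loss counts for the longer-thinking SF14 side, copied from the tables.
sf14 = {
    "1 vs 2 seconds": (133, 841, 26),
    "1 vs 4 seconds": (212, 774, 14),
    "1 vs 8 seconds": (132, 361, 7),
}

for label, (w, d, l) in sf14.items():
    print(f"{label}: {match_elo(w, d, l):+.1f} Elo")
```

For the first two matches this gives about +37 and +70, in line with the tables.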

Overview at - https://prodeo.actieforum.com/t570-stoc ... tches#4865
Rebel

Re: Stockfish and Dragon time odds matches

Post by Rebel »

I made a start on a web page: http://rebel13.nl/rebel13/time-odds-matches.html

Meanwhile it's the turn of Komodo Dragon 2.5.

http://rebel13.nl/b/grl.htm
Rebel

Re: Stockfish and Dragon time odds matches

Post by Rebel »

Komodo is amazing: it still gains about 50 Elo with each doubling of the time control.

Komodo Dragon 2.5 Results

No. Name                 Win Draw Loss Unf.  Score Games       %
----------------------------------------------------------------
  1 Dragon-2.5-two-secs +195 =758  -47   *0  574.0  1000   57.4%  +51 elo
  2 Dragon-2.5-one-sec   +47 =758 -195   *0  426.0  1000   42.6%

No. Name                  Win Draw Loss Unf.  Score Games       %
-----------------------------------------------------------------
  1 Dragon-2.5-four-secs +304 =674  -22   *0  641.0  1000   64.1%  +98.7 elo
  2 Dragon-2.5-one-sec    +22 =674 -304   *0  359.0  1000   35.9%

The factor-8 match is currently at +160 (after 150 games).
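One way to put a single number on "about 50 Elo per doubling" is to fit Elo = k * log2(time factor) through the origin by least squares. This sketch uses the Elo values reported in the finished matches of this thread (SF14: +37, +70, +87; Dragon 2.5: +51, +98.7) and leaves out the unfinished factor-8 Dragon match. The function name is hypothetical, and a single slope glosses over the shrinking per-doubling gains seen with SF14.

```python
import math


def elo_per_doubling(results: dict[int, float]) -> float:
    """Least-squares slope k of Elo = k * log2(time_factor), fitted through the origin."""
    xs = [math.log2(factor) for factor in results]  # number of doublings
    ys = list(results.values())                     # reported Elo gain
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


dragon = {2: 51.0, 4: 98.7}
sf14 = {2: 37.0, 4: 70.0, 8: 87.0}
print(f"Dragon 2.5: {elo_per_doubling(dragon):.1f} Elo per doubling")
print(f"SF14:       {elo_per_doubling(sf14):.1f} Elo per doubling")
```

On these numbers the fit gives roughly 50 Elo per doubling for Dragon 2.5 and roughly 31 for SF14.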

http://rebel13.nl/b/grl.htm
Rebel

Re: Stockfish and Dragon time odds matches

Post by Rebel »

The first part of the time-odds experiment is finished.

See page - http://rebel13.nl/rebel13/time-odds-matches.html

Next, measuring the diminishing returns.

http://rebel13.nl/b/grl.htm