Stockfish Handicap Matches

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: Stockfish Handicap Matches

Post by Rebel »

Trying the elo 2900 pool.

Stockfish gauntlet, knight-odds, tc=40/10

   # ENGINE          : RATING    POINTS  PLAYED    (%)
   1 cheng4_4.39     : 3273.3     128.0     200   64.0%
   2 Bobcat_8        : 3250.9     122.0     200   61.0%
   3 Stockfish_11    : 3172.5     269.5     600   44.9%
   4 Crafty_25.6     : 3103.3      80.5     200   40.3%
tc=40/20

   # ENGINE          : RATING    POINTS  PLAYED    (%)
   1 cheng4_4.39     : 3315.9     137.5     200   68.8%
   2 Bobcat_8        : 3294.0     132.0     200   66.0%
   3 Stockfish_11    : 3177.8     274.5     600   45.8%
   4 Crafty_25.6     : 3012.3      56.0     200   28.0%
Next, same engines at tc=40/40

Thereafter 2800 pool.
90% of coding is debugging, the other 10% is writing bugs.
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: Stockfish Handicap Matches

Post by Rebel »

Finished the elo 2900 pool.

Stockfish gauntlet, knight-odds, tc=40/10

   # ENGINE          : RATING    POINTS  PLAYED    (%)
   1 cheng4_4.39     : 3273.3     128.0     200   64.0%
   2 Bobcat_8        : 3250.9     122.0     200   61.0%
   3 Stockfish_11    : 3172.5     269.5     600   44.9%
   4 Crafty_25.6     : 3103.3      80.5     200   40.3%
tc=40/20

   # ENGINE          : RATING    POINTS  PLAYED    (%)
   1 cheng4_4.39     : 3315.9     137.5     200   68.8%
   2 Bobcat_8        : 3294.0     132.0     200   66.0%
   3 Stockfish_11    : 3177.8     274.5     600   45.8%
   4 Crafty_25.6     : 3012.3      56.0     200   28.0%
tc=40/40

   # ENGINE          : RATING    POINTS  PLAYED    (%)
   1 cheng4_4.39     : 3318.8     146.0     200   73.0%
   2 Bobcat_8        : 3295.0     140.5     200   70.3%
   3 Stockfish_11    : 3144.5     242.0     600   40.3%
   4 Crafty_25.6     : 3041.7      71.5     200   35.8%
Next, 2800 pool.
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: Stockfish Handicap Matches

Post by Rebel »

Stockfish gauntlet, knight-odds, ccrl pool 2800 elo.

tc=40/40 only.

Rank Name                          Elo     +/-   Games   Score    Draw
   0 Stockfish_11                  -35      26     600   44.9%   14.2%
   1 Weiss_1.0                      63      46     200   59.0%   12.0%
   2 Laser_1.2                      31      44     200   54.5%   16.0%
   3 RuyDos_1.1.11                  12      45     200   51.7%   14.5%
Next and last, ccrl pool < 2800 elo.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Stockfish Handicap Matches

Post by lkaufman »

Rebel wrote: Wed Jun 24, 2020 12:18 pm Finished the elo 2900 pool.

Stockfish gauntlet, knight-odds, tc=40/10

   # ENGINE          : RATING    POINTS  PLAYED    (%)
   1 cheng4_4.39     : 3273.3     128.0     200   64.0%
   2 Bobcat_8        : 3250.9     122.0     200   61.0%
   3 Stockfish_11    : 3172.5     269.5     600   44.9%
   4 Crafty_25.6     : 3103.3      80.5     200   40.3%
tc=40/20

   # ENGINE          : RATING    POINTS  PLAYED    (%)
   1 cheng4_4.39     : 3315.9     137.5     200   68.8%
   2 Bobcat_8        : 3294.0     132.0     200   66.0%
   3 Stockfish_11    : 3177.8     274.5     600   45.8%
   4 Crafty_25.6     : 3012.3      56.0     200   28.0%
tc=40/40

   # ENGINE          : RATING    POINTS  PLAYED    (%)
   1 cheng4_4.39     : 3318.8     146.0     200   73.0%
   2 Bobcat_8        : 3295.0     140.5     200   70.3%
   3 Stockfish_11    : 3144.5     242.0     600   40.3%
   4 Crafty_25.6     : 3041.7      71.5     200   35.8%
Next, 2800 pool.
So results improved steadily with more time, as expected, for cheng and Bobcat, but not for Crafty, which regressed between 40/10 and 40/20; I wonder why? Two questions: How were the positions chosen from the ChrisW set? I'm finding that taking them from the middle (pruning an equal number from each end) is the fairest and the closest simulation of real knight odds. Also, did Stockfish use the default Contempt, 0, or the max (100)? It would do best with 100, I'm sure.
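
A minimal sketch of the "take them from the middle" selection described above, assuming each position comes with some evaluation to sort by. How the ChrisW set is actually scored is not stated in this thread, so the function name, the (position, evaluation) tuple layout and the sample data are all hypothetical.

```python
from typing import List, Tuple


def trim_to_middle(scored: List[Tuple[str, int]], keep: int) -> List[Tuple[str, int]]:
    """Sort positions by evaluation and keep the `keep` middle entries.

    An equal number is pruned from each end; if the number to drop is odd,
    the extra one is pruned from the high end.
    """
    if not 0 <= keep <= len(scored):
        raise ValueError("keep must be between 0 and the number of positions")
    ordered = sorted(scored, key=lambda item: item[1])
    drop = len(ordered) - keep
    low = drop // 2       # pruned from the low-evaluation end
    high = drop - low     # pruned from the high-evaluation end
    return ordered[low:len(ordered) - high]


# Hypothetical placeholder data: (position label, evaluation in arbitrary units).
sample = [("a", 300), ("b", 450), ("c", 380), ("d", 410), ("e", 520), ("f", 395)]
middle = trim_to_middle(sample, keep=4)
print([name for name, _ in middle])
```

Sorting first and slicing afterwards means the outliers at both ends go, whatever order the input file was in.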
Komodo rules!
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Stockfish Handicap Matches

Post by lkaufman »

chrisw wrote: Wed Jun 24, 2020 10:21 am
Rebel wrote: Wed Jun 24, 2020 10:15 am Trying the elo 2900 pool.

Stockfish gauntlet, knight-odds, tc=40/10

   # ENGINE          : RATING    POINTS  PLAYED    (%)
   1 cheng4_4.39     : 3273.3     128.0     200   64.0%
   2 Bobcat_8        : 3250.9     122.0     200   61.0%
   3 Stockfish_11    : 3172.5     269.5     600   44.9%
   4 Crafty_25.6     : 3103.3      80.5     200   40.3%
Next, same engines at tc=40/20
Stockfish 11 can give knight odds to the latest Crafty and still win?!
Incredible.
Yes, I've noticed before that Stockfish, Komodo, and even some versions of Lc0 can give knight odds to very strong programs, ones that should be at least on Carlsen's level in Rapid chess, and still come out ahead at Rapid time controls, whereas a human needs only around 2300 FIDE to come out ahead under the same conditions. It's partly due to some engines not knowing the simple principle of trading when ahead, but that's not the whole story. If we could figure out just why this is happening, we could probably improve all engines. My tests showed that if the reduced skill levels of Stockfish are used in place of these weaker engines, the weakened SF level does MUCH better when up a knight than other engines of equal strength. That could be a clue.
Chessqueen
Posts: 5576
Joined: Wed Sep 05, 2018 2:16 am
Location: Moving
Full name: Jorge Picado

Re: Stockfish Handicap Matches

Post by Chessqueen »

Rebel wrote: Wed Jun 24, 2020 10:15 am Trying the elo 2900 pool.

Stockfish gauntlet, knight-odds, tc=40/10

   # ENGINE          : RATING    POINTS  PLAYED    (%)
   1 cheng4_4.39     : 3273.3     128.0     200   64.0%
   2 Bobcat_8        : 3250.9     122.0     200   61.0%
   3 Stockfish_11    : 3172.5     269.5     600   44.9%
   4 Crafty_25.6     : 3103.3      80.5     200   40.3%
Next, same engines at tc=40/20
You should give these two a try :roll:
63 SmarThink 1.98 64-bit 3043 +8 −8 49.0% +5.8 33.5% 6042
64 Spike 1.4 Leiden 4CPU 3040 +8 −8 41.3% +61.0 38.2% 5289
Do NOT worry and be happy, we all live a short life :roll:
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Stockfish Handicap Matches

Post by lkaufman »

Chessqueen wrote: Wed Jun 24, 2020 5:43 pm
Rebel wrote: Wed Jun 24, 2020 10:15 am Trying the elo 2900 pool.

Stockfish gauntlet, knight-odds, tc=40/10

   # ENGINE          : RATING    POINTS  PLAYED    (%)
   1 cheng4_4.39     : 3273.3     128.0     200   64.0%
   2 Bobcat_8        : 3250.9     122.0     200   61.0%
   3 Stockfish_11    : 3172.5     269.5     600   44.9%
   4 Crafty_25.6     : 3103.3      80.5     200   40.3%
Next, same engines at tc=40/20
You should give SmarThink a try :roll:
63 SmarThink 1.98 64-bit 3043 +8 −8 49.0% +5.8 33.5% 6042 69.4%
Since the engines rated around 2900 were mostly too strong for knight odds, why would he want one rated over 3000? Now, if you know of an engine rated well below 2900 that would do well at knight odds, that would be worth testing.
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: Stockfish Handicap Matches

Post by Rebel »

lkaufman wrote: Wed Jun 24, 2020 5:24 pm
Rebel wrote: Wed Jun 24, 2020 12:18 pm Finished the elo 2900 pool.

Stockfish gauntlet, knight-odds, tc=40/10

   # ENGINE          : RATING    POINTS  PLAYED    (%)
   1 cheng4_4.39     : 3273.3     128.0     200   64.0%
   2 Bobcat_8        : 3250.9     122.0     200   61.0%
   3 Stockfish_11    : 3172.5     269.5     600   44.9%
   4 Crafty_25.6     : 3103.3      80.5     200   40.3%
tc=40/20

   # ENGINE          : RATING    POINTS  PLAYED    (%)
   1 cheng4_4.39     : 3315.9     137.5     200   68.8%
   2 Bobcat_8        : 3294.0     132.0     200   66.0%
   3 Stockfish_11    : 3177.8     274.5     600   45.8%
   4 Crafty_25.6     : 3012.3      56.0     200   28.0%
tc=40/40

   # ENGINE          : RATING    POINTS  PLAYED    (%)
   1 cheng4_4.39     : 3318.8     146.0     200   73.0%
   2 Bobcat_8        : 3295.0     140.5     200   70.3%
   3 Stockfish_11    : 3144.5     242.0     600   40.3%
   4 Crafty_25.6     : 3041.7      71.5     200   35.8%
Next, 2800 pool.
So results improved steadily with more time, as expected, for cheng and Bobcat, but not for Crafty, which regressed between 40/10 and 40/20; I wonder why? Two questions: How were the positions chosen from the ChrisW set? I'm finding that taking them from the middle (pruning an equal number from each end) is the fairest and the closest simulation of real knight odds. Also, did Stockfish use the default Contempt, 0, or the max (100)? It would do best with 100, I'm sure.
I prefer out of the box testing.
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: Stockfish Handicap Matches

Post by chrisw »

lkaufman wrote: Wed Jun 24, 2020 5:24 pm
Rebel wrote: Wed Jun 24, 2020 12:18 pm Finished the elo 2900 pool.

Stockfish gauntlet, knight-odds, tc=40/10

   # ENGINE          : RATING    POINTS  PLAYED    (%)
   1 cheng4_4.39     : 3273.3     128.0     200   64.0%
   2 Bobcat_8        : 3250.9     122.0     200   61.0%
   3 Stockfish_11    : 3172.5     269.5     600   44.9%
   4 Crafty_25.6     : 3103.3      80.5     200   40.3%
tc=40/20

   # ENGINE          : RATING    POINTS  PLAYED    (%)
   1 cheng4_4.39     : 3315.9     137.5     200   68.8%
   2 Bobcat_8        : 3294.0     132.0     200   66.0%
   3 Stockfish_11    : 3177.8     274.5     600   45.8%
   4 Crafty_25.6     : 3012.3      56.0     200   28.0%
tc=40/40

   # ENGINE          : RATING    POINTS  PLAYED    (%)
   1 cheng4_4.39     : 3318.8     146.0     200   73.0%
   2 Bobcat_8        : 3295.0     140.5     200   70.3%
   3 Stockfish_11    : 3144.5     242.0     600   40.3%
   4 Crafty_25.6     : 3041.7      71.5     200   35.8%
Next, 2800 pool.
So results improved steadily with more time, as expected, for cheng and Bobcat, but not for Crafty, which regressed between 40/10 and 40/20; I wonder why? Two questions: How were the positions chosen from the ChrisW set? I'm finding that taking them from the middle (pruning an equal number from each end) is the fairest and the closest simulation of real knight odds.
We can’t go cherry-picking positions according to subjective criteria. And this concept of “real knight odds” is about as subjective as it gets; it isn’t reached by asking an engine to evaluate at the root and using that as the definition. Imagine defining “real chess odds” by asking an engine to search from the root and give the answer. 42?
There are no “real knight odds”; there are only positions without the knight, and we see how the results work out from *many* tests. We can try to use “natural” positions without either side having an apparent head start, e.g. by removing the outliers.
Nor are we trying to determine what knight odds are in some numerical sense; we are trying to determine how modern engines do against strong oldies with various handicaps, the first handicap being minus a knight.

Also, did Stockfish use default Contempt, or 0, or max (100)? It would do best with 100 I'm sure.
It’s better to just use defaults; too much parameter fiddling just confuses everything.

Anyway, I prepared suites of 25, 100, 250 and 1000 EPDs. Each is a randomly selected subset of about 1200 EPDs taken from, I forget (it says in the GitHub readme), roughly 370 to 420, I think. That selection is probably in line with your desires, actually.
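
For illustration only, random suites of those sizes could be drawn from a pool like this. It is a sketch, not the script actually used: the function name, the fixed seed and the placeholder pool are assumptions, and each suite is sampled independently from the full pool, as the description suggests.

```python
import random
from typing import Dict, List, Sequence


def make_suites(pool: Sequence[str], sizes: Sequence[int], seed: int = 0) -> Dict[int, List[str]]:
    """Draw one random subset of the pool per requested size, without repeats."""
    rng = random.Random(seed)  # fixed seed (an assumption) makes the suites reproducible
    items = list(pool)
    return {size: rng.sample(items, size) for size in sizes}


# Placeholder pool standing in for the roughly 1200 EPD lines.
pool = [f"epd_{i}" for i in range(1200)]
suites = make_suites(pool, sizes=(25, 100, 250, 1000))
for size, suite in suites.items():
    print(size, len(suite))
```

Seeding the generator is a design choice: anyone rerunning the script gets the same 25, 100, 250 and 1000 position files.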

Posit from me: the most sensible course would be to use only those sets for a while. We’ll soon see whether the 25 suite gives very different results from the 1000 suite, and then we can start worrying about whether small subsets, and the positions in general, are too noisy. For example, we don’t know right now if the anomalous(?) results of Crafty are down to unlucky position selection.
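
As a rough check on how much of the Crafty swing could be noise, here is a sketch of a two-sample z-score on the posted points (80.5/200 at 40/10 against 56.0/200 at 40/20). It assumes independent games, which is only an approximation when the same positions are reused, and it uses s(1 - s) as the per-game variance, an upper bound since draw counts are not given in those tables. The function name is hypothetical.

```python
import math


def score_gap_z(points_a: float, games_a: int, points_b: float, games_b: int) -> float:
    """z-score for the difference between two match scores.

    Uses s * (1 - s) as the per-game variance, which is the no-draws upper
    bound; with draws the true variance is lower, so this z is conservative.
    """
    s_a = points_a / games_a
    s_b = points_b / games_b
    std_err = math.sqrt(s_a * (1 - s_a) / games_a + s_b * (1 - s_b) / games_b)
    return (s_a - s_b) / std_err


# Crafty_25.6 at tc=40/10 versus tc=40/20, points taken from the tables in this thread.
z = score_gap_z(80.5, 200, 56.0, 200)
print(f"z = {z:.2f}")
```

With these inputs z comes out at about 2.6, which would be unusual for pure noise under the stated assumptions. Correlation between games played from the same positions weakens that reading, so it does not rule out unlucky position selection.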
Last edited by chrisw on Wed Jun 24, 2020 7:31 pm, edited 1 time in total.
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: Stockfish Handicap Matches

Post by Rebel »

Stockfish gauntlet, knight-odds, ccrl elo pool <2800

tc=40/40 only.

Rank Name                          Elo     +/-   Games   Score    Draw
   0 Stockfish_11                   52      20    1000   57.4%   16.2%
   1 Benjamin                       26      43     200   53.8%   19.5%
   2 ProDeo                         17      45     200   52.5%   14.0%
   3 Fruit_2.3                     -54      45     200   42.3%   16.5%
   4 Fruit_2.1                    -123      47     200   33.0%   15.0%
   5 Ruffian_2                    -135      47     200   31.5%   16.0%   
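
The Elo and +/- columns above can be approximately reproduced from the Score, Draw and Games columns. The sketch below uses the standard logistic Elo formula and a normal-approximation 95% interval. The tool that produced the table may compute its margins differently, so treat the helper names and the method as assumptions.

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_margin(score: float, draw_ratio: float, games: int, z: float = 1.96) -> float:
    """Approximate +/- Elo margin from score, draw ratio and game count.

    Per-game variance for outcomes 0, 0.5 and 1 is s * (1 - s) - d / 4.
    Assumes independent games and that score +/- half_width stays inside (0, 1).
    """
    variance = score * (1.0 - score) - draw_ratio / 4.0
    half_width = z * math.sqrt(variance / games)
    upper = elo_from_score(score + half_width)
    lower = elo_from_score(score - half_width)
    return (upper - lower) / 2.0


# Two rows from the table above: (name, score, draw ratio, games).
rows = [("Stockfish_11", 0.574, 0.162, 1000), ("Fruit_2.1", 0.330, 0.150, 200)]
for name, score, draws, games in rows:
    print(f"{name:14s} {elo_from_score(score):7.1f} +/- {elo_margin(score, draws, games):.1f}")
```

Both rows land within a point of the posted values (52 and 20 for Stockfish_11, -123 and 47 for Fruit_2.1), so the table appears consistent with this kind of calculation.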