Stockfish Handicap Matches

Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: Stockfish Handicap Matches

Post by Rebel »

Now that the Crafty results are OK, here is the knight-odds engine list at 40m/40s so far.

SF11 gauntlet at knight-odds created with ORDO -a 3100

Code: Select all

   # ENGINE             : RATING    POINTS  PLAYED    (%)
   1 Stockfish_11     > : 3764.7     200.0     200  100.0%
   2 Komodo_14          : 3569.0     198.5     200   99.3%
   3 Houdini_6.03       : 3337.7     194.5     200   97.3%
   4 Ethereal_12.25     : 3307.5     193.5     200   96.8%
   5 rofChade_2.3       : 3238.3     190.5     200   95.3%
   6 Fire_7.1           : 3228.8     190.0     200   95.0%
   7 Xiphos_0.6         : 3219.8     189.5     200   94.8%
   8 Andscacs_0.95      : 3153.1     185.0     200   92.5%
   9 Booot_6.4          : 3135.0     183.5     200   91.8%
  10 RubiChess_1.7.3    : 3129.3     183.0     200   91.5%
  11 Laser_1.7          : 3118.3     182.0     200   91.0%
  12 Schooner_2.2       : 3093.1     179.5     200   89.8%
  13 Demolito           : 3057.9     175.5     200   87.8%
  14 Wasp_4.00          : 3049.9     174.5     200   87.3%
  15 Senpai_2           : 3013.4     169.5     200   84.8%
  16 Defenchess_2.2     : 3010.0     169.0     200   84.5%
  17 ice_4.0            : 3006.7     168.5     200   84.3%
  18 Texel_1.7          : 2997.0     167.0     200   83.5%
  19 Arasan_22          : 2987.6     165.5     200   82.8%
  20 Vajolet_2.8        : 2958.5     160.5     200   80.3%
  21 Shredder_13        : 2947.7     158.5     200   79.3%
  22 cheng4_4.39        : 2861.3     140.0     200   70.0%
  23 Weiss_1.0          : 2851.0     137.5     200   68.8%
  24 Bobcat_8           : 2840.9     135.0     200   67.5%
  25 Crafty_25.6        : 2782.1     119.5     200   59.8%
  26 Benjamin           : 2739.2     107.5     200   53.8%
  27 ProDeo             : 2730.4     105.0     200   52.5%
  28 SF11               : 2712.8    1264.0    6000   21.1%
  29 Fruit_2.3          : 2658.1      84.5     200   42.3%
  30 Fruit_2.1          : 2588.7      66.0     200   33.0%
  31 Ruffian_2          : 2576.7      63.0     200   31.5%
90% of coding is debugging, the other 10% is writing bugs.
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: Stockfish Handicap Matches

Post by Rebel »

lkaufman wrote: Fri Jun 26, 2020 2:22 am Very good, it looks like Benjamin is much closer to what I'm looking for than most (if not all) other engines. Based on CCRL ratings and my estimates for converting to FIDE rapid, it should be roughly a fair match at standard chess, Rapid (15' + 10") time control, with Magnus Carlsen on one core on a modern I7. By extrapolating to about 25" per move it should score somewhere in the 80 to 90% range at knight odds vs Komodo. That's probably still less than Carlsen would score, but it's not ridiculous. So, some questions about Benjamin.
1. Is it currently available? If so, how?
http://rebel13.nl/home/benjamin.html
lkaufman wrote: Fri Jun 26, 2020 2:22 am 2. Is there a Linux version? I use Windows myself, but our Komodo tester uses Linux. I can test either way, but much faster on our tester.
Windows only.
lkaufman wrote: Fri Jun 26, 2020 2:22 am 3. Does it have a way to reduce the level of play (moderately), other than just giving it less time? If not, shortening the time should work fine.
Various options. As the web page already states, set [Gambit = 120] to 110 or 100. As for time control, there are two parameters:

Code: Select all

[Time Control = 100]         // for regular time controls; lower values result in faster play.
[Blitz Time Control = 120]   // for game in x minutes + Fischer bonus; not very well tested and plays too fast, a better value is probably 150.
lkaufman wrote: Fri Jun 26, 2020 2:22 am 4. Any insights into why it is so much better at this than Fruit? You mentioned gambit-style play, but that's not normally the sensible way to exploit an extra piece. More like the way to play when down a piece!
Well, I respectfully disagree :D When playing an engine rated 700 Elo higher, don't give it the chance to outsearch you through passive play, which lets it create counter-chances, set up a king attack, or create a dangerous passer.
lkaufman wrote: Fri Jun 26, 2020 2:22 am 5. Final question: When you say "1 sec" (for example) above, is that movetime = 1 second, or 40 moves in 40 seconds so average one second? It's not a huge difference, but obviously quality of play is higher in the second case.
Average one second, 40m/40s.
lkaufman wrote: Fri Jun 26, 2020 2:22 am The idea here is that when and if we find a way to make Komodo play much better down a piece than it does now, we need a way to prove this without going to the trouble and expense of a GM match without some idea that we might do well. While it is unrealistic to expect to beat an active GM at knight odds in a Rapid match, if we specify Armageddon knight odds, meaning that draws count as wins for White (which is logical, he's down a piece), then it becomes a realistic goal.
Here are some hints from playing GMs during the 1997-2003 period, and the things I learned:
1. Complicate the position by leaving your pieces en prise: apply a small bonus to root moves, about 0.05 for a hanging pawn and 0.10 for a hanging knight/bishop.
2. Play on the clock of your opponent by moving much faster when the time control (usually move 40) is near.
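
One possible reading of the two hints above, as a minimal Python sketch. Only the bonus values (0.05 and 0.10) and the "move 40" time control come from the post; the function names, the candidate-move format, the `near` threshold and the `speedup` factor are hypothetical and are not taken from ProDeo or Benjamin.

```python
# Hypothetical illustration of the two hints; not code from any real engine.

# Bonus in pawn units for each own piece a root move leaves en prise (values from the post).
EN_PRISE_BONUS = {"pawn": 0.05, "knight": 0.10, "bishop": 0.10}


def adjusted_root_score(search_score, pieces_left_en_prise):
    """Add a small bonus to a root move's search score for each piece it leaves hanging."""
    bonus = sum(EN_PRISE_BONUS.get(piece, 0.0) for piece in pieces_left_en_prise)
    return search_score + bonus


def pick_root_move(candidates):
    """candidates: list of (move, search_score, pieces_left_en_prise) tuples."""
    best = max(candidates, key=lambda c: adjusted_root_score(c[1], c[2]))
    return best[0]


def time_for_move(remaining_seconds, moves_to_control, near=10, speedup=2.0):
    """Even time split, played `speedup` times faster once the control is within `near` moves.

    `near` and `speedup` are assumed values chosen only for illustration.
    """
    base = remaining_seconds / moves_to_control
    if moves_to_control <= near:
        return base / speedup
    return base


if __name__ == "__main__":
    moves = [("Nf3", 0.30, []), ("Bxh7", 0.25, ["bishop"])]
    print(pick_root_move(moves))
    print(time_for_move(20.0, 5))
```

Because the bonus is small, it only tips the choice between moves that the search scores almost equally; a move that simply loses material would score far lower than the bonus can compensate.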
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Stockfish Handicap Matches

Post by Laskos »

lkaufman wrote: Thu Jun 25, 2020 5:56 pm
Laskos wrote: Thu Jun 25, 2020 7:47 am
Laskos wrote: Thu Jun 25, 2020 7:08 am
lkaufman wrote: Thu Jun 25, 2020 4:52 am
Rebel wrote: Wed Jun 24, 2020 8:30 pm
lkaufman wrote: Wed Jun 24, 2020 7:57 pm
Rebel wrote: Wed Jun 24, 2020 7:31 pm Stockfish gauntlet, knight-odds, ccrl elo pool <2800

tc=40/40 only.

Code: Select all

Rank Name                          Elo     +/-   Games   Score    Draw 
   0 Stockfish_11                   52      20    1000   57.4%   16.2%  
   1 Benjamin                       26      43     200   53.8%   19.5%
   2 ProDeo                         17      45     200   52.5%   14.0%
   3 Fruit_2.3                     -54      45     200   42.3%   16.5%
   4 Fruit_2.1                    -123      47     200   33.0%   15.0%  
   5 Ruffian_2                    -135      47     200   31.5%   16.0%   
Most of these engines don't have an exactly named copy in CCRL 40/15 (Benjamin and ProDeo have version numbers, two others aren't identical), but it looks like roughly 2750 on that list is the break-even point for SF 11 at that TC. I'll be curious to see if Komodo 14 can score as well against the same opponents. It would score much better with high Contempt, but so would Stockfish, so I guess it's fair enough to compare.
Have it running, takes about 2 hours.

Regarding elo's:

Code: Select all

Rank Name                          Elo     +/-   Games   Score    Draw  CCRL
   0 Stockfish_11                   52      20    1000   57.4%   16.2%  3537
   1 Benjamin                       26      43     200   53.8%   19.5%  2646
   2 ProDeo 2.2                     17      45     200   52.5%   14.0%  2770
   3 Fruit_2.3                     -54      45     200   42.3%   16.5%  2783
   4 Fruit_2.1                    -123      47     200   33.0%   15.0%  2684
   5 Ruffian_2                    -135      47     200   31.5%   16.0%  2608
Interestingly, Benjamin, while listed 124 Elo below ProDeo, scores better. I think that much of this kind of testing has to do with the killer instinct of an engine. And Benjamin is the gambit version of ProDeo. Style decides?
I think I've figured out a big part of the mystery of why both Stockfish and Komodo can give knight odds to such strong engines in your tests. When I first read that you were testing at 40/10, I assumed that the 10 meant ten minutes, since we are constantly referring to CCRL tests at 40/15 meaning 15 minutes. But when you started giving results at 40/40 and even 40/80, I realized that you must mean 40/x SECONDS, not minutes, since you are unlikely to have enough hardware to play so many games so quickly (correct me if I'm wrong here)! So even these 40/40 results are (if I'm right now) bullet games, not rapid games. At bullet chess even the top human GMs would have trouble beating Stockfish or Komodo at knight odds. So the question is, what level engine can Komodo and Stockfish give knight odds to in rapid chess (15' + 10" being the standard now)? Obviously, it will be a weaker engine than these 2700+ engines, but how much weaker? I do have one data point: at 3' + 2" (blitz, roughly midway between bullet and rapid) I got a +94 elo result for Komodo 14 vs. Arasan 14 64 bit, about 2640 on CCRL 40/15 list (est. based on versions just before and after), which is a 2734 performance. But I used Contempt 150, which helps a lot at knight odds; I'll have to redo the test without Contempt (or with the default of just 4). My best guess is that with default Contempt, Komodo 14 and SF need opponents in the mid 2600s in blitz, and in the mid 2500s in Rapid (15' + 10"). I can run Komodo vs. Arasan 14 overnight at Rapid with default Contempt; if I'm correct Komodo will lose but not too badly. I run 63 games at once, so the thousand games won't finish in 8 or 9 hours, but maybe in 15 hours or so.
We hyper-analyzed these issues several years ago. IIRC, at bullet engine-engine Knight odds are some 600 (logistic) engine Elo points, at rapid 45min + 15s some 1200 logistic engine ELO points, and by continuation at tournament TC maybe 1400 engine Elo points (the last one was never really tested). The handicap is heavily TC dependent, and with humans it could be even more dependent (humans are weak at bullet and blitz). I don't think that even a perfect engine can give Knight odds to Carlsen at tournament TC. Currently SF and Komodo in whatever configuration are no stronger than 2100 FIDE Elo points Knight odds at tournament TC against a human.
My estimate for good "human" sparring at tournament TC is Lc0:

11248 at 1000 nodes --- 2750 FIDE
11248 at 100 nodes --- 2450 FIDE
11248 at 10 nodes --- 2100 FIDE.

Again, tournament TC. At Blitz 5min + 3s add some 300 Elo points to those to get FIDE Blitz rating. All this is rough estimation, but it can be improved by testing.


Yes, we reached those conclusions a few years ago based on self-testing of the same engine at different time controls, IIRC. Whether it applies to games involving humans or to games with NN or NNUE engines that didn't exist then is unknown. Your estimate of 2100 FIDE Elo for Knight odds at TC vs human is pretty consistent with my estimate of 2300 for same at 15' + 10". In this new world of online-only chess it seems that 5 hour games may be mostly a thing of the past, and that Rapid has become the main form of chess, like it or not. Lc0 is probably a better substitute for a human than a conventional program for these tests; note that the networks starting with 70xxx are now the best for such purposes as they reverted to the no-resign training that made 11248 so good at playing lost positions, and they now play quite well down a piece or more, and they are stronger and play more sensibly than the old network. For testing against CPU engines there is the problem that you can't make proper use of a machine with many CPUs and just one GPU; you could use the Lc0 cpu version I suppose. Perhaps the best "human" for such tests might be an NNUE like Stockfish NNUE, but it's a bit early to say yet.
My overnight test of Komodo 14 giving knight odds to Arasan 14 at 15' + 10" on one thread is down 8.6 elo after 686 games, implying an elo of about 2570 CCRL Rapid at knight odds. This would probably mean something like 2670 on 32 Threads (each doubling is worth much less giving knight odds than in normal chess, so maybe 20 elo per doubling), which in turn means something like 2900 FIDE Rapid based on some estimates I made of the likely human equivalence in Rapid of CCRL rapid ratings, and this was without even setting Contempt. So it appears that using conventional engines to predict the rating needed to beat Komodo at knight odds in Rapid overstates the reality by something like 600 elo! Exactly why this is so is a bit of a mystery, even allowing for the fact that the engines don't fully appreciate the circumstances and aren't optimized for winning when up a piece. Perhaps this won't be the case if we substitute an NN (or NNUE) engine for the human.
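
The arithmetic behind the figures quoted above can be sketched in a few lines of Python. The ratings, the +94 result, and the "maybe 20 elo per doubling" guess come from the posts; the function names are hypothetical, and the per-doubling value is the poster's rough estimate rather than a measured constant.

```python
import math


def performance_rating(opponent_rating, elo_diff):
    """Performance = opponent's rating plus the Elo difference scored against it."""
    return opponent_rating + elo_diff


def scale_for_threads(rating_one_thread, threads, elo_per_doubling=20.0):
    """Add a fixed Elo gain per doubling of thread count (20 is the post's rough guess)."""
    return rating_one_thread + elo_per_doubling * math.log2(threads)


if __name__ == "__main__":
    # Blitz data point: +94 Elo against an opponent estimated at 2640.
    print(performance_rating(2640, 94))   # 2734
    # Rapid estimate: 2570 on one thread, extrapolated to 32 threads (5 doublings).
    print(scale_for_threads(2570, 32))    # 2670.0
```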
Thanks for the tip on T70, it really is a top dog now in these handicap matches. I adjusted the following:

SF nodes=20000 is roughly equal to T70 nodes=60, and both can be approximated as ~2300 FIDE Elo at tournament TC and 2600 FIDE Elo at 5min + 3sec. But in a handicap match against full-strength Komodo (4 strong i7 threads, 60s + 0.6s, Contempt=150) giving knight odds, T70 performs incomparably better than SF:

Code: Select all

Rank Name                         Elo     +/-   Games   Score    Draw
   0 K_14 60s+0.6s no Knight       53      70      80   57.5%   20.0%
   
   1 T70 nodes=60                  53     103      40   57.5%   15.0%
   2 SF_dev nodes=20000          -168     105      40   27.5%   25.0%
The difference compared to SF is more than 200 Elo points, which is a big deal at knight odds. The strength of Komodo on 4 strong i7 threads at 60s + 0.6s is about that of one regular core at 5min + 3sec. So, on one thread, if T70 nodes=60 mimics humans well and converts half of the wins, Komodo (Contempt 150) can give knight odds at 5min + 3sec to 2600 FIDE Elo humans. On 32 cores, the difference at knight odds is not that large, maybe 2650 FIDE Elo points at 5min + 3sec. So, playing games at 5min + 3sec at knight odds with Komodo against a strong GM seems reasonable. At 15min + 10sec, the odds can probably be given to a 2450 FIDE-rated player. At tournament time control, to no more than a 2100 FIDE-rated player.

These are rough estimates which can be off by 100 Elo points. It is interesting to observe that regular engines like SF perform pretty miserably compared to T70.
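
The Elo columns in the result tables of this thread appear consistent with the usual logistic conversion from score percentage to Elo difference. A minimal sketch, assuming that formula (the function name is hypothetical, and the tool that produced the tables may round or handle edge cases differently):

```python
import math


def elo_from_score(score):
    """Logistic Elo difference for a score fraction strictly between 0 and 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


if __name__ == "__main__":
    # Score percentages taken from the table above.
    results = [("T70 nodes=60", 57.5), ("SF_dev nodes=20000", 27.5)]
    for name, pct in results:
        print(f"{name:20s} {elo_from_score(pct / 100):+7.1f}")
    gap = elo_from_score(0.575) - elo_from_score(0.275)
    print(f"gap: {gap:.0f} Elo")
```

With only 40 games per opponent the error bars are around 100 Elo, so the gap of roughly 220 Elo is suggestive rather than precise.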
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Stockfish Handicap Matches

Post by lkaufman »

Rebel wrote: Fri Jun 26, 2020 12:33 pm Windows only.

Since it is not available in Linux, does anyone know of any readily available engines of similar or somewhat lower rating than Benjamin on the CCRL list that are available in Linux and UCI?
Rebel wrote: Fri Jun 26, 2020 12:33 pm Here are some hints from playing GMs during the 1997-2003 period, and the things I learned:
1. Complicate the position by leaving your pieces en prise: apply a small bonus to root moves, about 0.05 for a hanging pawn and 0.10 for a hanging knight/bishop.
2. Play on the clock of your opponent by moving much faster when the time control (usually move 40) is near.
Thanks, Ed! I downloaded Benjamin and it is playing a match vs. Komodo 14 now at knight odds, 2' + 1", with Komodo using 7 cores/threads, Benjamin using just one (apparently it's not MP, but that's fine for my purpose). After eight games, Benjamin has a one game lead, which is pretty good considering the 7 to 1 thread advantage for Komodo. I tried some positions on it and watched some of the games vs. Komodo. My impression is that it does well in this not because of gambit style but simply because it seems to play more like a strong human player than is usually the case with engines around that level. Anyway it seems better than other engines of similar level at converting the extra knight to a win, which is what I was looking for.
On the two points you mention about playing GMs, I don't quite get the point of rewarding blunders; if the pawn or piece is really hanging, a small bonus won't cause the engine to play it. Maybe I misunderstand your meaning of "hanging". I do quite agree with the second point; in fact I believe that in any game with Ponder on at knight odds, regardless of your opponent (assuming he is suitably below your rating), it pays to play 2 or 3 times faster than the normal rate of play. The slight weakening doesn't matter much when you are losing anyway, but depriving the opponent of ponder time does help. We've already used this in test games.
Komodo rules!
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Stockfish Handicap Matches

Post by lkaufman »

Laskos wrote: Fri Jun 26, 2020 7:55 pm These are rough estimates which can be off by 100 Elo points. It is interesting to observe that regular engines like SF perform pretty miserably compared to T70.
Based on actual human results, the T70 net, while doing much better than SF, is still well below human performance at knight odds if your Elo estimates for T70 are accurate. Actual performance by Komodo (regular, not MCTS) at knight odds vs humans is about 2300 at 15' +10" Rapid and about 2500 at 3' + 2" blitz (probably would be more like 2450 at 5' + 3"). This seems more consistent with your 2100 estimate for standard time control; I think that the difference in playing level for humans going from standard to 15' + 10" rapid is in the ballpark of 200 elo, certainly not 350. Kasparov once scored 7 out of 8 giving a standard clock simul to the Israeli Olympic team, all rated about 2600 FIDE, which is roughly like rapid to standard time odds with no ponder.
One test you might try is having the T70 give the knight odds to SF dev; I think it might score similarly to Komodo. You could also have it give the knight to the 60 node version of itself, but that might not be comparable to Komodo playing T70, since self-play is very different than playing unrelated opponents.
Chessqueen
Posts: 5583
Joined: Wed Sep 05, 2018 2:16 am
Location: Moving
Full name: Jorge Picado

Re: Stockfish Handicap Matches

Post by Chessqueen »

lkaufman wrote: Fri Jun 26, 2020 9:00 pm
Laskos wrote: Fri Jun 26, 2020 7:55 pm
lkaufman wrote: Thu Jun 25, 2020 5:56 pm
Laskos wrote: Thu Jun 25, 2020 7:47 am
Laskos wrote: Thu Jun 25, 2020 7:08 am
lkaufman wrote: Thu Jun 25, 2020 4:52 am
Rebel wrote: Wed Jun 24, 2020 8:30 pm
lkaufman wrote: Wed Jun 24, 2020 7:57 pm
Rebel wrote: Wed Jun 24, 2020 7:31 pm Stockfish gauntlet, knight-odds, ccrl elo pool <2800

tc=40/40 only.

Code: Select all

Rank Name                          Elo     +/-   Games   Score    Draw 
   0 Stockfish_11                   52      20    1000   57.4%   16.2%  
   1 Benjamin                       26      43     200   53.8%   19.5%
   2 ProDeo                         17      45     200   52.5%   14.0%
   3 Fruit_2.3                     -54      45     200   42.3%   16.5%
   4 Fruit_2.1                    -123      47     200   33.0%   15.0%  
   5 Ruffian_2                    -135      47     200   31.5%   16.0%   
Most of these engines don't have an exactly named copy in CCRL 40/15 (Benjamin and ProDeo have version numbers, two others aren't identical), but it looks like roughly 2750 on that list is the break-even point for SF 11 at that TC. I'll be curious to see if Komodo 14 can score as well against the same opponents. It would score much better with high Contempt, but so would Stockfish, so I guess it's fair enough to compare.
Have it running, takes about 2 hours.

Regarding elo's:

Code: Select all

Rank Name                          Elo     +/-   Games   Score    Draw  CCRL
   0 Stockfish_11                   52      20    1000   57.4%   16.2%  3537
   1 Benjamin                       26      43     200   53.8%   19.5%  2646
   2 ProDeo 2.2                     17      45     200   52.5%   14.0%  2770
   3 Fruit_2.3                     -54      45     200   42.3%   16.5%  2783
   4 Fruit_2.1                    -123      47     200   33.0%   15.0%  2684
   5 Ruffian_2                    -135      47     200   31.5%   16.0%  2608
Interesting is that Benjamin while listed 124 elo less than ProDeo scores better. I think that much of this kind of testing has to do with the killer instinct of an engine. And Benjamin is the gambit version of ProDeo. Style decides?
I think I've figured out a big part of the mystery of why both Stockfish and Komodo can give knight odds to such strong engines in your tests. When I first read that you were testing at 40/10, I assumed that the 10 meant ten minutes, since we are constantly referring to CCRL tests at 40/15 meaning 15 minutes. But when you starting giving results at 40/40 and even 40/80, I realized that you must mean 40/x SECONDS, not minutes, since you are unlikely to have enough hardware to play so many games so quickly (correct me if I'm wrong here)! So even these 40/40 results are (if I'm right now) bullet games, not rapid games. At bullet chess even the top human GMs would have trouble beating Stockfish or Komodo at knight odds. So the question is, what level engine can Komodo and Stockfish give knight odds to in rapid chess (15' + 10" being the standard now)? Obviously, it will be a weaker engine than these 2700+ engines, but how much weaker? I do have one data point: at 3' + 2" (blitz, roughly midway between bullet and rapid) I got a +94 elo result for Komodo 14 vs. Arasan 14 64 bit, about 2640 on CCRL 40/15 list (est. based on versions just before and after), which is a 2734 performance. But I used Contempt 150, which helps a lot at knight odds; I'll have to redo the test without Contempt (or with the default of just 4). My best guess is that with default Contempt, Komodo 14 and SF need opponents in the mid 2600s in blitz, and in the mid 2500s in Rapid (15' + 10"). I can run Komodo vs. Arasan 14 overnight at Rapid with default Contempt; if I'm correct Komodo will lose but not too badly. I run 63 games at once, so the thousand games won't finish in 8 or 9 hours, but maybe in 15 hours or so.
We hyper-analyzed these issues several years ago. IIRC, in engine-vs-engine play Knight odds are worth some 600 (logistic) engine Elo points at bullet, some 1200 at rapid (45min + 15s), and by extrapolation maybe 1400 at tournament TC (the last one was never really tested). The handicap is heavily TC dependent, and with humans it could be even more dependent (humans are weak at bullet and blitz). I don't think that even a perfect engine can give Knight odds to Carlsen at tournament TC. Currently SF and Komodo, in whatever configuration, can give Knight odds at tournament TC only to humans rated no higher than 2100 FIDE Elo.
My estimates for a good "human" sparring partner at tournament TC, using Lc0:

11248 at 1000 nodes --- 2750 FIDE
11248 at 100 nodes --- 2450 FIDE
11248 at 10 nodes --- 2100 FIDE.

Again, these are for tournament TC. At blitz (5min + 3s), add some 300 Elo points to get the FIDE blitz rating. All of this is rough estimation, but it can be improved by testing.


Yes, we reached those conclusions a few years ago based on self-testing of the same engine at different time controls, IIRC. Whether they apply to games involving humans, or to games with NN or NNUE engines that didn't exist then, is unknown. Your estimate of 2100 FIDE Elo for Knight odds at tournament TC vs. a human is pretty consistent with my estimate of 2300 for the same at 15' + 10". In this new world of online-only chess it seems that 5 hour games may be mostly a thing of the past, and that Rapid has become the main form of chess, like it or not. Lc0 is probably a better substitute for a human than a conventional program for these tests. Note that the networks starting with 70xxx are now the best for such purposes: they reverted to the no-resign training that made 11248 so good at playing lost positions, they now play quite well down a piece or more, and they are stronger and play more sensibly than the old network. For testing against CPU engines there is the problem that you can't make proper use of a machine with many CPUs and just one GPU; you could use the Lc0 cpu version I suppose. Perhaps the best "human" for such tests might be an NNUE like Stockfish NNUE, but it's a bit early to say yet.
My overnight test of Komodo 14 giving knight odds to Arasan 14 at 15' + 10" on one thread is down 8.6 elo after 686 games, implying an elo of about 2570 CCRL Rapid at knight odds. This would probably mean something like 2670 on 32 Threads (each doubling is worth much less giving knight odds than in normal chess, so maybe 20 elo per doubling), which in turn means something like 2900 FIDE Rapid based on some estimates I made of the likely human equivalence in Rapid of CCRL rapid ratings, and this was without even setting Contempt. So it appears that using conventional engines to predict the rating needed to beat Komodo at knight odds in Rapid overstates the reality by something like 600 elo! Exactly why this is so is a bit of a mystery, even allowing for the fact that the engines don't fully appreciate the circumstances and aren't optimized for winning when up a piece. Perhaps this won't be the case if we substitute an NN (or NNUE) engine for the human.
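The step from about 2570 on one thread to about 2670 on 32 threads is five doublings at the guessed 20 elo each. A small sketch of that extrapolation, assuming a constant gain per doubling (the post itself calls the 20 a "maybe"); the function name is hypothetical:

```python
import math


def rating_on_threads(one_thread_rating, threads, elo_per_doubling=20.0):
    """Extrapolate a one-thread rating, assuming a fixed Elo gain per thread doubling."""
    doublings = math.log2(threads)  # 32 threads -> 5 doublings
    return one_thread_rating + elo_per_doubling * doublings


# Figures from the post: about 2570 at knight odds on one thread, 20 elo per doubling.
print(rating_on_threads(2570, 32))  # 2670.0
```

The per-doubling value is the sensitive input: at the larger gains seen in normal chess the same five doublings would add far more than 100 elo.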
Thanks for the tip on T70; it really is a top dog now in these handicap matches. I adjusted the following:

SF nodes=20000 is roughly equal to T70 nodes=60, and both can be approximated as ~2300 FIDE Elo at tournament TC and 2600 FIDE Elo at 5min + 3sec. But in a Knight Odds handicap match against full-strength Komodo (4 strong i7 threads, 60s + 0.6s, Contempt=150), T70 performs incomparably better than SF:

Code: Select all

Rank Name                         Elo     +/-   Games   Score    Draw
   0 K_14 60s+0.6s no Knight       53      70      80   57.5%   20.0%
   
   1 T70 nodes=60                  53     103      40   57.5%   15.0%
   2 SF_dev nodes=20000          -168     105      40   27.5%   25.0%
The difference compared to SF is more than 200 Elo points, which is a big deal at Knight Odds. The strength of Komodo on 4 strong i7 threads at 60s + 0.6s is about that of one regular core at 5min + 3sec. So, on one thread, if T70 nodes=60 mimics humans well and converts half of the wins, Komodo (Contempt 150) can give Knight Odds at 5min + 3sec to 2600 FIDE Elo humans. On 32 cores the difference at Knight Odds is not that large, maybe 2650 FIDE Elo points at 5min + 3sec. So playing games at 5min + 3sec at Knight Odds with Komodo against a strong GM seems reasonable. At 15min + 10sec, the odds can probably be given to a 2450 FIDE rated player. At tournament time control, to a player rated no more than 2100 FIDE.
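The Elo column in the table above is consistent with the standard logistic conversion from score percentage to Elo difference. A minimal sketch of that conversion follows; that the match tool used exactly this formula is an assumption, and the function name is hypothetical:

```python
import math


def elo_from_score(score):
    """Elo difference implied by a score fraction (logistic model, assumed here)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


# Score fractions copied from the table above.
for name, score in [("T70 nodes=60", 0.575), ("SF_dev nodes=20000", 0.275)]:
    print(f"{name}: {elo_from_score(score):+.0f}")
# T70 nodes=60: +53
# SF_dev nodes=20000: -168
```

With only 40 games per opponent the error bars in the table (about +/-100) are large, so these figures are indicative rather than precise.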
Komodo should play KingsCrusher at 90 minutes + 10 seconds increment at knight odds. He would be the perfect match, and he would promote Komodo on his YouTube channel.


You can challenge Tryfon Gavriel here (Komodo vs. Tryfon Gavriel, FIDE 2145+, knight odds):
https://www.chessworld.net/chessclubs/a ... ?from=1053

OR https://www.linkedin.com/in/tryfon-gavriel-23b9a74a

Here is an interesting game by Tryfon Gavriel, known as KingsCrusher ==>
Do NOT worry and be happy, we all live a short life :roll:
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Stockfish Handicap Matches

Post by lkaufman »

Chessqueen wrote: Sat Jun 27, 2020 5:18 am
Komodo should play KingsCrusher at 90 minutes + 10 seconds increment at knight odds. He would be the perfect match, and he would promote Komodo on his YouTube channel.


You can challenge Tryfon Gavriel here (Komodo vs. Tryfon Gavriel, FIDE 2145+, knight odds):
https://www.chessworld.net/chessclubs/a ... ?from=1053

OR https://www.linkedin.com/in/tryfon-gavriel-23b9a74a

Here is an interesting game by Tryfon Gavriel, known as KingsCrusher ==>
I would be willing to do this, though I imagine 30 min + 10 sec would be more practical for his show, but I'm not set up for automatic play by Komodo except on chess.com. I don't know if that's allowed and easy to do on chessworld; to be honest I know nothing about chessworld.
Komodo rules!
Chessqueen
Posts: 5583
Joined: Wed Sep 05, 2018 2:16 am
Location: Moving
Full name: Jorge Picado

Re: Stockfish Handicap Matches

Post by Chessqueen »

lkaufman wrote: Sat Jun 27, 2020 6:04 am
I would be willing to do this, though I imagine 30 min + 10 sec would be more practical for his show, but I'm not set up for automatic play by Komodo except on chess.com. I don't know if that's allowed and easy to do on chessworld; to be honest I know nothing about chessworld.
I was wondering whether Komodo could be operated from Chess.com by a human while Kingscrusher plays from Chessworld.net, both LIVE. The Komodo operator would input Tryfon Gavriel's moves one by one; each side would use two computers, one to watch the other site LIVE and one to run their own site and entertain their own audience. Tryfon Gavriel could be allowed an extra 5 minutes, say 35 + 10 seconds increment, for the time lost inputting moves, with Komodo using just 30 + 10 seconds increment?
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Stockfish Handicap Matches

Post by lkaufman »

Chessqueen wrote: Sat Jun 27, 2020 11:23 pm
I was wondering whether Komodo could be operated from Chess.com by a human while Kingscrusher plays from Chessworld.net, both LIVE. The Komodo operator would input Tryfon Gavriel's moves one by one; each side would use two computers, one to watch the other site LIVE and one to run their own site and entertain their own audience. Tryfon Gavriel could be allowed an extra 5 minutes, say 35 + 10 seconds increment, for the time lost inputting moves, with Komodo using just 30 + 10 seconds increment?
Are you just saying that if I do it, I would play on some arbitrary handle as if I were the player, but we would tell everyone it's really Komodo making the moves on another computer and I'm just inputting them? That's basically the way we played all of our GM matches until Komodo partnered with chess.com in 2018. No problem, except I'll need operator time, which is probably enough to offset commentator time, so any reasonably long rapid time control should be roughly fair. I suppose we would just agree on a date and time for the game or games, and I'd have to learn how to play on chessworld. Is that what you are proposing?
Chessqueen
Posts: 5583
Joined: Wed Sep 05, 2018 2:16 am
Location: Moving
Full name: Jorge Picado

Re: Stockfish Handicap Matches

Post by Chessqueen »

lkaufman wrote: Sun Jun 28, 2020 12:17 am
Are you just saying that if I do it, I would play on some arbitrary handle as if I were the player, but we would tell everyone it's really Komodo making the moves on another computer and I'm just inputting them? That's basically the way we played all of our GM matches until Komodo partnered with chess.com in 2018. No problem, except I'll need operator time, which is probably enough to offset commentator time, so any reasonably long rapid time control should be roughly fair. I suppose we would just agree on a date and time for the game or games, and I'd have to learn how to play on chessworld. Is that what you are proposing?
All that you (or your operator) have to do is watch chessworld.net LIVE and input each move that Kingscrusher makes into Komodo. Gavriel (Kingscrusher) will be watching chess.com LIVE, and as soon as he sees Komodo's reply he will input it on his own site, chessworld.net, and then make his reply. He will therefore make his own commentary and your people on chess.com will make theirs, but if you have a GM commenting, he should NOT say anything about, or recommend any possible move on behalf of, Kingscrusher. :roll: