Stockfish Handicap Matches

Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: Stockfish Handicap Matches

Post by Rebel »

lkaufman wrote: Wed Jun 24, 2020 3:04 am Wow, a day and night difference from the old knight-odds set. With the old set Arasan and Fruit scored 39% (averaged), here they averaged 78%, an improvement of 39%, roughly 273 elo (using 7 elo per percent)!! It seems that the new (500, actually 230 position) set really is something like a pawn harder for White than the old set, if a pawn is worth something like 273 elo, which is at least in the right ballpark. I looked over many of the positions in the new set, and except for a few near either extreme most are pretty reasonable for knight odds. A few of the outliers had moves played that didn't seem plausible for a grandmaster to have played in normal chess, perhaps the dataset included some amateur openings, and a few others were reasonable for normal chess but weren't sensible with the specific knight missing. But there were just a few such positions, and roughly balanced between White and Black errors, so the full set is a pretty good simulation of knight odds play, though culling ten or so from each end of the list would make it a bit better. So it seems we need somewhat weaker engines than Fruit 2.1 for knight odds at this time limit; perhaps it would be close to balanced with Fruit 2.1 at 40/10.
Apologies for the Fruit figure: I forgot to clear the PGN, which also contained 100 queen-odds games, all of which Fruit won. This is the correct table:

   # ENGINE          : RATING    POINTS  PLAYED    (%)
   1 Stockfish_11    : 3268.1      68.5     100   68.5%
   2 Fruit_2.1       : 3131.9      31.5     100   31.5%
90% of coding is debugging, the other 10% is writing bugs.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Stockfish Handicap Matches

Post by lkaufman »

Rebel wrote: Wed Jun 24, 2020 6:47 am
lkaufman wrote: Wed Jun 24, 2020 3:04 am Wow, a day and night difference from the old knight-odds set. With the old set Arasan and Fruit scored 39% (averaged), here they averaged 78%, an improvement of 39%, roughly 273 elo (using 7 elo per percent)!! It seems that the new (500, actually 230 position) set really is something like a pawn harder for White than the old set, if a pawn is worth something like 273 elo, which is at least in the right ballpark. I looked over many of the positions in the new set, and except for a few near either extreme most are pretty reasonable for knight odds. A few of the outliers had moves played that didn't seem plausible for a grandmaster to have played in normal chess, perhaps the dataset included some amateur openings, and a few others were reasonable for normal chess but weren't sensible with the specific knight missing. But there were just a few such positions, and roughly balanced between White and Black errors, so the full set is a pretty good simulation of knight odds play, though culling ten or so from each end of the list would make it a bit better. So it seems we need somewhat weaker engines than Fruit 2.1 for knight odds at this time limit; perhaps it would be close to balanced with Fruit 2.1 at 40/10.
Apologies for the Fruit figure: I forgot to clear the PGN, which also contained 100 queen-odds games, all of which Fruit won. This is the correct table:

   # ENGINE          : RATING    POINTS  PLAYED    (%)
   1 Stockfish_11    : 3268.1      68.5     100   68.5%
   2 Fruit_2.1       : 3131.9      31.5     100   31.5%
OK, so the improvement was from 39% to 59% overall, about 140 elo between the two tests, which is more like half a pawn difference and much more believable. If you run some other Fruit-level engines vs SF this way, perhaps you can include Stockfish 11 Skill Level 10, Contempt -100, to confirm my finding that it is much better at winning with an extra knight than other supposedly stronger engines.

I ran two more relevant tests to add to the ones mentioned in my previous post: I ran both this weakened SF and Arasan 14 against Komodo 14 Skill Level 23 (all at the same 3' + 2", normal chess, 4 threads). Although Arasan 14 and SF Skill 10 were even head to head, Arasan 14 beat Komodo Skill 23 by 108 elo (252 games), while Komodo Skill 23 beat SF Skill 10 by 103 elo (354 games). So against weakened Komodo, weakened SF is 211 elo weaker than Arasan 14, yet against knight-odds Komodo, SF Skill 10 (with appropriate Contempt) was 184 elo stronger! That is a swing of nearly 400 elo against the same opponent, just from being handicapped by material instead of by playing strength.

So the SF Skill levels with Contempt -100 can probably play knight odds more or less as well as a human who would be an even match for that Skill level (we're talking about GM or near-GM humans here). The mystery is why normal engines like Arasan and Fruit are so bad at converting an extra piece, even if they don't actively seek exchanges. Just avoiding blunders should suffice, one would think!
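
For anyone who wants to check the 39% to 59% arithmetic, here is a minimal Python sketch comparing the 7-elo-per-percent rule of thumb with the usual logistic Elo formula. The function names are made up for illustration, and the logistic model is the textbook one, which is an assumption here: it is not necessarily what the rating tool behind the tables uses.

```python
import math

def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction in (0, 1), logistic model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_linear(score_gain_percent: float, elo_per_percent: float = 7.0) -> float:
    """Rule-of-thumb conversion: a fixed number of elo per percentage point."""
    return score_gain_percent * elo_per_percent

# Scores quoted in the post: 39% on the old set, 59% on the new one.
old_score, new_score = 0.39, 0.59

logistic_gap = elo_from_score(new_score) - elo_from_score(old_score)
linear_gap = elo_linear((new_score - old_score) * 100.0)

print(f"logistic model: {logistic_gap:.1f} elo")
print(f"7 elo per percent: {linear_gap:.1f} elo")
```

Near 50% the two conversions stay close, which is why the rule of thumb works for this comparison; for scores far from 50% the logistic curve gives noticeably larger gaps than the linear rule.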
Komodo rules!
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: Stockfish Handicap Matches

Post by Rebel »

Komodo gauntlet vs CEGT 3000 elo rated engines, knight-odds, tc=40/40

   # ENGINE         : RATING    POINTS  PLAYED    (%)
   1 Demolito       : 3324.4     353.0     400   88.3%
   2 Vajolet_2.8    : 3261.7     168.0     200   84.0%
   3 Texel_1.7      : 3242.8     165.0     200   82.5%
   4 Komodo_14      : 2971.1     114.0     800   14.3%
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: Stockfish Handicap Matches

Post by chrisw »

Rebel wrote: Wed Jun 24, 2020 9:09 am Komodo gauntlet vs CEGT 3000 elo rated engines, knight-odds, tc=40/40

   # ENGINE         : RATING    POINTS  PLAYED    (%)
   1 Demolito       : 3324.4     353.0     400   88.3%
   2 Vajolet_2.8    : 3261.7     168.0     200   84.0%
   3 Texel_1.7      : 3242.8     165.0     200   82.5%
   4 Komodo_14      : 2971.1     114.0     800   14.3%
I'm kind of dyslexically struggling with these charts, trying to work out what means what and who is playing whom.

I guess, after much mental gymnastics, it means that Komodo 14 can't give knight odds to 3250-rated opponents; if it does, it has a win rate of only 14 per 100. Is that the correct way to see it?
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: Stockfish Handicap Matches

Post by chrisw »

Updated and rationalised all the odds suites; they're now in two folders. The smaller 'new' suites of 'normal-type' positions have been deleted and replaced with larger ones covering a wider range of positions (generated by using many more PGNs for the base selection).

link:
https://github.com/ChrisWhittington/Chess-EPDs
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: Stockfish Handicap Matches

Post by Rebel »

chrisw wrote: Wed Jun 24, 2020 9:48 am
Rebel wrote: Wed Jun 24, 2020 9:09 am Komodo gauntlet vs CEGT 3000 elo rated engines, knight-odds, tc=40/40

   # ENGINE         : RATING    POINTS  PLAYED    (%)
   1 Demolito       : 3324.4     353.0     400   88.3%
   2 Vajolet_2.8    : 3261.7     168.0     200   84.0%
   3 Texel_1.7      : 3242.8     165.0     200   82.5%
   4 Komodo_14      : 2971.1     114.0     800   14.3%
I'm kind of dyslexically struggling with these charts, trying to work out what means what and who is playing whom.

I guess, after much mental gymnastics, it means that Komodo 14 can't give knight odds to 3250-rated opponents; if it does, it has a win rate of only 14 per 100. Is that the correct way to see it?
The rating pool Komodo is facing is around CCRL 3000 elo, so one can conclude these engines are too strong for Komodo. Larry will have to pick an elo 2900 or 2800 pool to get an even result (~50%). I guess that's what he is after for the Komodo - GM challenges at knight odds.
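
To make "who is playing whom" explicit, the gauntlet table above can be unpacked per opponent with a few lines of Python. This is only a sketch: it assumes that each of the three engines played Komodo and nobody else (which is what a gauntlet means) and that every game hands out exactly one point in total (win 1, draw 0.5 each). The variable and function names are illustrative.

```python
# (points scored, games played) per opponent, copied from the table above.
opponents = {
    "Demolito": (353.0, 400),
    "Vajolet_2.8": (168.0, 200),
    "Texel_1.7": (165.0, 200),
}

def gauntlet_breakdown(opponent_results):
    """Return {opponent: (gauntlet engine's points, games, score %)}.

    Assumes one point is shared per game, so the gauntlet engine's
    points against an opponent are games minus the opponent's points.
    """
    rows = {}
    for name, (opp_points, games) in opponent_results.items():
        own_points = games - opp_points
        rows[name] = (own_points, games, 100.0 * own_points / games)
    return rows

rows = gauntlet_breakdown(opponents)
total_points = sum(points for points, _, _ in rows.values())
total_games = sum(games for _, games, _ in rows.values())

for name, (points, games, pct) in rows.items():
    print(f"Komodo_14 vs {name}: {points}/{games} = {pct:.1f}%")
print(f"Komodo_14 overall: {total_points}/{total_games}")
```

The per-opponent points add up to the Komodo_14 row of the table (114 points from 800 games), so the 14.3% is Komodo's combined score while giving knight odds to all three opponents.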
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: Stockfish Handicap Matches

Post by Rebel »

Release of pgn_to_epd_v2, a tool to create EPD files for odds matches.

http://rebel13.nl/download/data.html
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: Stockfish Handicap Matches

Post by Rebel »

Trying the elo 2900 pool.

Stockfish gauntlet, knight-odds, tc=40/10

   # ENGINE          : RATING    POINTS  PLAYED    (%)
   1 cheng4_4.39     : 3273.3     128.0     200   64.0%
   2 Bobcat_8        : 3250.9     122.0     200   61.0%
   3 Stockfish_11    : 3172.5     269.5     600   44.9%
   4 Crafty_25.6     : 3103.3      80.5     200   40.3%
Next, same engines at tc=40/20
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: Stockfish Handicap Matches

Post by chrisw »

Rebel wrote: Wed Jun 24, 2020 10:15 am Trying the elo 2900 pool.

Stockfish gauntlet, knight-odds, tc=40/10

   # ENGINE          : RATING    POINTS  PLAYED    (%)
   1 cheng4_4.39     : 3273.3     128.0     200   64.0%
   2 Bobcat_8        : 3250.9     122.0     200   61.0%
   3 Stockfish_11    : 3172.5     269.5     600   44.9%
   4 Crafty_25.6     : 3103.3      80.5     200   40.3%
Next, same engines at tc=40/20
Stockfish 11 can give knight odds to the latest Crafty and still win?!
Incredible.
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: Stockfish Handicap Matches

Post by chrisw »

Rebel wrote: Wed Jun 24, 2020 10:07 am Release of pgn_to_epd_v2, a tool to create EPD files for odds matches.

http://rebel13.nl/download/data.html
I selected some 'definitive' knight odds test suites (so that testers are all testing the same thing) and uploaded them to GitHub; they're in the "knight odds definitive tests" folder.

https://github.com/ChrisWhittington/Chess-EPDs

nite-odds-masterlist-10.epd
nite-odds-masterlist-25.epd
nite-odds-masterlist-100.epd
nite-odds-masterlist-1000.epd

These are sized for whatever size of gauntlet is desired. The positions are basically just randomly selected from the -370 to -410 centipawn range. If you need a size between 100 and 1000, just cut and paste the first N positions from the -1000.epd list.
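
If cutting and pasting by hand is tedious, taking the first N positions can also be scripted. Below is a minimal Python sketch, assuming the EPD file holds one position per line; the helper names and the output filename in the usage note are hypothetical and not part of the repository.

```python
from itertools import islice
from pathlib import Path
from typing import Iterable, List

def first_n_positions(lines: Iterable[str], n: int) -> List[str]:
    """Return the first n non-blank EPD records, stripped of surrounding whitespace."""
    records = (line.strip() for line in lines)
    return list(islice((record for record in records if record), n))

def write_subset(src: Path, dst: Path, n: int) -> int:
    """Copy the first n positions of src into dst; return how many were written."""
    with src.open() as infile:
        subset = first_n_positions(infile, n)
    dst.write_text("".join(record + "\n" for record in subset))
    return len(subset)
```

For example, `write_subset(Path("nite-odds-masterlist-1000.epd"), Path("nite-odds-250.epd"), 250)` would produce a 250-position suite, provided the master list has been downloaded to the current directory.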