Stockfish Handicap Matches

Discussion of anything and everything relating to chess playing software and machines.

Moderators: bob, hgm, Harvey Williamson

xr_a_y
Posts: 1257
Joined: Sat Nov 25, 2017 1:28 pm
Location: France

Re: Stockfish Handicap Matches

Post by xr_a_y » Mon Jun 22, 2020 5:37 pm

lkaufman wrote:
Mon Jun 22, 2020 4:25 pm
xr_a_y wrote:
Mon Jun 22, 2020 8:57 am
I tried something this morning at short TC (10+0.1), Minic versus Stockfish with odds. Each time 100 games.
The funny thing is that with knight and color odds Minic won easily, and it also won quite easily with knight odds alone.
But with pawn odds (with or without color), castling odds, or even g pawn + castling + color odds, Minic lost badly.

What is a good starting configuration between knight odds and g pawn + castling + color odds?
b and g pawns + castling + color odds?
The best handicap progression in my opinion is: c2; f2; c7; f7; c2 + f2; c7 + f7; Knight (b1 or g1); Knight and move (b8 or g8); Rook (a1); Rook and move (a8). Beyond that, just repeat the cycle with the queen's rook removed as well; when you get to what would be two rooks, substitute queen odds. The bishop's pawns are the most suitable for handicaps: their removal has the least consequences for development and doesn't create isolated pawns. Historically only the "f" pawn was used, but for a proper progression we need to use the "c" pawn as well. The gaps between c7 + f7 and knight, and between knight + move and rook, are a bit wide; it is possible to insert an extra handicap in between (i.e. knight or rook with an extra move for White).
I'll try that, starting with c2 + f2 and c7 + f7, thanks.
You are not considering castling; I guess that's another degree of freedom between 2P and N?
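The progression above, plus the castling variants, can be turned into start positions mechanically. Below is a minimal Python sketch, not taken from anyone in this thread: `handicap_fen`, `LADDER` and the other names are hypothetical, and it assumes the standard start FEN with White to move. Removing a Black piece (such as b8, a8, c7 or f7) gives the "and move" variants, since the weaker side then has White and moves first; removing a rook also drops the matching castling right, and `no_castling` strips further rights for castling odds.

```python
# Hypothetical helper (not from the thread): build handicap start positions
# by emptying squares of the standard start position.
START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"

# Castling right that disappears when the rook on this square is removed.
ROOK_RIGHTS = {"a1": "Q", "h1": "K", "a8": "q", "h8": "k"}

# Larry's suggested ladder; Black-side squares give the "and move" variants.
LADDER = [["c2"], ["f2"], ["c7"], ["f7"], ["c2", "f2"], ["c7", "f7"],
          ["b1"], ["b8"], ["a1"], ["a8"]]


def expand_rank(rank):
    """'PPPPP1PP' -> list of 8 cells, '.' marking an empty square."""
    cells = []
    for ch in rank:
        cells.extend("." * int(ch) if ch.isdigit() else ch)
    return cells


def compress_rank(cells):
    """Inverse of expand_rank: runs of '.' become a digit."""
    out, empty = "", 0
    for c in cells:
        if c == ".":
            empty += 1
            continue
        if empty:
            out += str(empty)
            empty = 0
        out += c
    return out + (str(empty) if empty else "")


def handicap_fen(squares, no_castling="", fen=START_FEN):
    """Remove the pieces on `squares`; `no_castling` lists extra rights to strip, e.g. "kq"."""
    board, side, rights, ep, halfmove, fullmove = fen.split()
    rows = [expand_rank(r) for r in board.split("/")]  # rows[0] is rank 8
    for sq in squares:
        file_idx, rank = ord(sq[0]) - ord("a"), int(sq[1])
        rows[8 - rank][file_idx] = "."
        if sq in ROOK_RIGHTS:
            rights = rights.replace(ROOK_RIGHTS[sq], "")
    for right in no_castling:
        rights = rights.replace(right, "")
    board = "/".join(compress_rank(row) for row in rows)
    return " ".join([board, side, rights or "-", ep, halfmove, fullmove])


if __name__ == "__main__":
    for squares in LADDER:
        print("+".join(squares).ljust(6), handicap_fen(squares))
    # xr_a_y's "g pawn + castling + color odds": the stronger side plays Black.
    print("g7+cas", handicap_fen(["g7"], no_castling="kq"))
```

Working on the FEN string directly keeps the sketch dependency-free; a chess library would do the same job with less code.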

Rebel
Posts: 5518
Joined: Thu Aug 18, 2011 10:04 am

Re: Stockfish Handicap Matches

Post by Rebel » Mon Jun 22, 2020 6:12 pm

lkaufman wrote:
Mon Jun 22, 2020 4:48 pm
Rebel wrote:
Mon Jun 22, 2020 4:35 pm
lkaufman wrote:
Mon Jun 22, 2020 3:30 pm
Rebel wrote:
Mon Jun 22, 2020 6:05 am
@Larry - OK, for the moment we'll do it the classic way, playing each position twice with colors reversed. I took the first 200 positions from Chris's set, so 400 games, tc=40/10.

Code: Select all

Komodo 10 - Fruit 2.1 83.9% WLD 315 44 41
Komodo 14 - Fruit 2.1 90.4% WLD 347 24 29
This is how you want to measure progress, isn't it?
Yes, but it looks like you are having each engine play the White side once, which is a waste of resources when the engines are far apart in strength. Presumably the stronger engine wins every game with Black a piece up, so it's easy enough to subtract 200 from the stronger engine's wins to see the results when giving knight odds. I think the reason Komodo still beats Fruit even with the 200 removed is that there was a flaw in the reference value used (see prior post), so these positions weren't even close to knight odds. With everything done right, I expect Fruit to come out ahead against both Stockfish and Komodo at knight odds.
1. I don't see the purpose of giving Komodo the advantage of a knight up against Fruit. So only 200 games (Fruit always knight up) now that Ferdy fixed the cute-chess obstacle. But of course you can do it yourself.

2. My guess would be that Fruit would do a lot better at longer time control, 40/60 instead of 40/10, that's not a good time control for the oldies. I will run it now, stay tuned.
Yes, 1. was my point, it was wasting resources. Yes, the weaker engine always does better with handicap with more time. But the main problem is that the positions are a full pawn off from knight odds. With a corrected set, even at 40/10 I'll bet on Fruit. The difference of 100 centipawns in the initial position is huge.
tc=40/60 from Chris knight-odds set.

Code: Select all

Rank Name                          Elo     +/-   Games   Score    Draw
   0 Komodo_14                     228      50     200   78.8%   19.5%
   1 Benjamin                     -186      67     100   25.5%   23.0%
   2 Fruit_2.1                    -275      80     100   17.0%   16.0%
90% of coding is debugging, the other 10% is writing bugs.
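The arithmetic behind Larry's "subtract 200" suggestion can be sketched as follows. These are hypothetical Python helpers, and the calculation rests on his stated presumption that the stronger engine won every one of the 200 reversed-colour games in which it was a piece up; the WLD order (wins, losses, draws) is consistent with the percentages shown (e.g. (315 + 41/2)/400 = 83.9%). Under that assumption Komodo 10's knight-odds score would be (115 + 41/2)/200 = 67.75% and Komodo 14's (147 + 29/2)/200 = 80.75%. The Elo column of the 40/60 table is consistent with the usual logistic conversion of each player's own score (78.8% gives about +228, 25.5% about -186, 17.0% about -275).

```python
import math


def score(wins, losses, draws):
    """Match score as a fraction; a draw counts half a point."""
    return (wins + 0.5 * draws) / (wins + losses + draws)


def drop_reversed_games(wins, losses, draws, reversed_games):
    """ASSUMPTION: the stronger engine won all `reversed_games` (it was a piece up)."""
    return wins - reversed_games, losses, draws


def elo_diff(p):
    """Logistic Elo difference implied by a score fraction 0 < p < 1."""
    return -400 * math.log10(1 / p - 1)


if __name__ == "__main__":
    for name, wld in [("Komodo 10", (315, 44, 41)), ("Komodo 14", (347, 24, 29))]:
        odds_only = drop_reversed_games(*wld, reversed_games=200)
        print(f"{name}: all 400 games {score(*wld):.2%}, "
              f"knight odds only {score(*odds_only):.2%}")
    for p in (0.788, 0.255, 0.17):  # scores from the 40/60 table
        print(f"{p:.1%} -> {elo_diff(p):+.0f} Elo")
```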

lkaufman
Posts: 4324
Joined: Sun Jan 10, 2010 5:15 am
Location: Maryland USA

Re: Stockfish Handicap Matches

Post by lkaufman » Mon Jun 22, 2020 6:19 pm

xr_a_y wrote:
Mon Jun 22, 2020 5:37 pm
You are not considering castling; I guess that's another degree of freedom between 2P and N?
Yes, castling is worth maybe .6 or .7 pawn, so you could use knight for castling (as we indeed did in Komodo vs GM Lenderman) as an intermediate handicap between 2p+move and knight. But it's a bit different than a normal material handicap, and subject to the criticism that its value is not nearly so clear as a pawn or even a tempo. You could also go the other way and do two pawns plus move plus no black castling, but that might be nearly as large as knight odds.
Komodo rules!

chrisw
Posts: 3674
Joined: Tue Apr 03, 2012 2:28 pm

Re: Stockfish Handicap Matches

Post by chrisw » Mon Jun 22, 2020 6:42 pm

lkaufman wrote:
Mon Jun 22, 2020 3:23 pm
chrisw wrote:
Mon Jun 22, 2020 7:20 am
lkaufman wrote:
Mon Jun 22, 2020 3:13 am
chrisw wrote:
Sun Jun 21, 2020 10:42 pm
lkaufman wrote:
Sun Jun 21, 2020 10:30 pm
Rebel wrote:
Sun Jun 21, 2020 10:09 pm
chrisw wrote:
Sun Jun 21, 2020 9:20 pm
Done: starting from the start position minus the b1 knight, I played out all four-ply combinations, culled all duplicates, and culled all positions where SF11 evaluated more than +/-10 centipawns away from -300 centipawns (the SF11 average score over all EPDs). I am now left with 5600 EPDs.

Link: https://github.com/ChrisWhittington/Che ... t-odds.epd

Will upload for no knight at g1 tomorrow am.


I checked several positions and they showed scores of 4 pawns or more down, as does the initial knight odds position, not -300.
You checked the listed epds for variance from -300? Which ones? What was the variance? What SF11 conditions are you using?

If you just want under 20 positions, take off the b1 knight, choose the 3 best White moves by SF11 multiPV, choose the 3 best replies to each of those, and repeat with the g1 knight off. 18 positions, totally fair, no silly moves, real knight odds chess! If you want more, just choose the best 4 or best 5 for each side.
Edit: Whoops, rechecked, the mean is indeed -293, but I was selecting EPDs based on -340, so you should find the listed EPDs evaluate at around -340 centipawns. Will correct the EPD dump tomorrow am.
OK, -340 is at least within range of what I was seeing. With fixed depth searches in the range of what you were using I get evals like -390 or so, but I think fixed depth omits Contempt while movetime does not, so we're not so far apart if you add in Contempt. You were right to switch from 10 ms to 25; SF is quite weak at 10ms but strong enough at 25, the difference is huge. But I think it would be more useful to have fewer positions but no positions with ridiculous moves played, maybe only including moves in the top ten by multipv at each point for example. It doesn't seem like a simulation of knight odds if you force the players to play moves that no one over 800 rating would even consider.
Uploaded just now the knight-odds epds for positions without b1/g1 knights.

https://github.com/ChrisWhittington/Chess-EPDs and download knight-odds.epd

They're better balanced around the correct mean (-293 centipawns), and restricted to all 4-ply positions where SF11 at 25 ms search returns a score of -293 +/- 10 centipawns. About 5000 unique positions in total.

Random subset:

Code: Select all

rnbqkb1r/p1pppppp/7n/1p6/3P4/1P6/P1P1PPPP/RNBQKB1R w KQkq - 0 3
r1bqkbnr/pppp1ppp/n7/4p3/3P4/5N2/PPP1PPPP/R1BQKB1R w KQkq - 0 3
rnbqk1nr/ppppbppp/4p3/8/3P4/5N2/PPP1PPPP/R1BQKB1R w KQkq - 2 3
r1bqkb1r/pppppppp/2n4n/1B6/4P3/8/PPPP1PPP/RNBQK2R w KQkq - 3 3
rnbqkbnr/pppppppp/8/8/8/5N1P/PPPPPPP1/R1BQKB1R w KQkq - 1 3
rnbqkb1r/p1pppppp/7n/1p6/4P3/5Q2/PPPP1PPP/R1B1KBNR w KQkq - 0 3
r1bqkbnr/ppppp1pp/2n5/5p2/4P3/2P5/PP1P1PPP/RNBQKB1R w KQkq - 0 3
rnbqkb1r/p1pppppp/1p5n/7Q/4P3/8/PPPP1PPP/RNB1KB1R w KQkq - 0 3
r1bqkbnr/ppppp1pp/n4p2/8/1P6/B7/P1PPPPPP/RN1QKB1R w KQkq - 2 3
r1bqkbnr/ppppp1pp/n4p2/8/2P5/P7/1P1PPPPP/R1BQKBNR w KQkq - 1 3
r1bqkbnr/p1pppppp/np6/P7/8/8/1PPPPPPP/R1BQKBNR w KQkq - 0 3
rnbqkbnr/pppp1p1p/4p3/6p1/7P/3P4/PPP1PPP1/RNBQKB1R w KQkq - 0 3
r1bqkbnr/ppppp1pp/n4p2/8/8/3P4/PPPBPPPP/RN1QKB1R w KQkq - 2 3
rnbqkb1r/1ppppppp/7n/p7/4P3/5P2/PPPP2PP/RNBQKB1R w KQkq - 0 3
r1bqkbnr/ppppp1pp/n4p2/1B6/4P3/8/PPPP1PPP/RNBQK2R w KQkq - 2 3
r1bqkbnr/ppppp1pp/2n2p2/8/8/P1N5/1PPPPPPP/R1BQKB1R w KQkq - 0 3
1nbqkbnr/rppppppp/8/p7/8/3BP3/PPPP1PPP/RNBQK2R w KQk - 2 3
rnbqkbn1/pppppppr/8/7p/8/2N3P1/PPPPPP1P/R1BQKB1R w KQq - 2 3
r1bqkbnr/ppppp1pp/n4p2/8/4P3/6P1/PPPP1P1P/R1BQKBNR w KQkq - 0 3
rnbq1bnr/pppkpppp/8/3p4/8/4PP2/PPPP2PP/R1BQKBNR w KQ - 1 3
rnbq1bnr/pppkpppp/8/3p4/8/4P1P1/PPPP1P1P/R1BQKBNR w KQ - 1 3
rn1qkbnr/p1pppppp/b7/1p6/1P6/4P3/P1PP1PPP/RNBQKB1R w KQkq - 1 3
rnbqkbnr/1ppppp1p/8/p5p1/Q7/2P5/PP1PPPPP/RNB1KB1R w KQkq - 0 3
r1bqkbnr/ppppppp1/n7/4P2p/8/8/PPPP1PPP/R1BQKBNR w KQkq - 0 3
rnbqkbnr/p1ppppp1/7p/1P6/8/8/PP1PPPPP/R1BQKBNR w KQkq - 0 3
rnbqkbnr/ppppp2p/5p2/6p1/3P4/4B3/PPP1PPPP/RN1QKB1R w KQkq - 0 3
r1bqkbnr/pppppp1p/n7/6p1/8/1PN5/P1PPPPPP/R1BQKB1R w KQkq - 1 3
r1bqkbnr/p1pppppp/2n5/1p6/1P2P3/8/P1PP1PPP/R1BQKBNR w KQkq - 1 3
rnbqkb1r/ppppp1pp/7n/5p2/7P/3P4/PPP1PPP1/RNBQKB1R w KQkq - 0 3
rnbqkb1r/p1pppppp/7n/1p6/2P5/7P/PP1PPPP1/R1BQKBNR w KQkq - 0 3
rnbqk1nr/pppp1ppp/8/2b1p3/8/P4N2/1PPPPPPP/R1BQKB1R w KQkq - 2 3
It's swings and roundabouts on position selection. If we used your method (which I'm not going to, because there's a limit to how much time I'm prepared to put into this), we'd get, at four ply, 10x10x10x10 = 10000 positions, as opposed to the actual number from brute force, which was 170000 unique positions, with 5000 within +/- 10 centipawns of the mean. So your 10000 would be guaranteed to contain positions whose scores have already diverged beyond the +/- 10 centipawn window.

If we recap the objective, it was to have a good number of test positions (which, for me, includes the possibility of mass testing 1000s of games at bullet/blitz) with good variance, no duplicates, no great difference from a one-knight handicap, and as close as feasible to the start position. The selection method is entirely without bias: it doesn't matter HOW the positions were arrived at, just as long as they are close to the initial position and neither engine gets a head start in any position. That's done. Arguably perfectly. The engines are then left to fight it out from unbiased, wide, close-to-root start positions.
Your argument for using 10-wide selections over 4 plies is going to produce fewer positions, many of which will already be knight odds plus something way more than 10 centipawns because of the way your method chooses, and therefore, I would argue, it satisfies the original objective less well.
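The two selection schemes being argued over can be compared with a small sketch. `top_k_positions` is a hypothetical helper illustrating Larry's multiPV-style idea: at each ply keep only the k best moves for the side to move, so one start position yields at most k^plies positions (3 x 3 = 9 per missing knight, hence 18 for b1 and g1 together; 10^4 = 10000 for ten moves over four plies). It assumes the caller supplies `children` and `score`, with `score` from White's point of view and White to move at the root; in practice the score would come from a multiPV engine search rather than a static function. Transpositions are not merged here, unlike the brute-force approach.

```python
def top_k_positions(root, children, score, k, plies):
    """Keep only the k best children per node at every ply (multiPV-style)."""
    frontier = [root]
    for ply in range(plies):
        white_to_move = ply % 2 == 0  # assumes White moves first from root
        next_frontier = []
        for node in frontier:
            # White picks the highest White-POV scores, Black the lowest.
            ranked = sorted(children(node), key=score, reverse=white_to_move)
            next_frontier.extend(ranked[:k])
        frontier = next_frontier
    return frontier


if __name__ == "__main__":
    def toy_children(n):
        # Toy tree: node n has children 10n..10n+4, scored by their own value.
        return [10 * n + i for i in range(5)]

    leaves = top_k_positions(1, toy_children, score=lambda n: n, k=3, plies=2)
    print(len(leaves), leaves)
    print(3 ** 2 * 2, 10 ** 4)  # Larry's 18 positions; chrisw's 10x10x10x10
```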

Anyway, since everything now appears to work, I'll leave this thing running and produce a few thousand of each: bishop odds, rook odds and queen odds. Can SF11 beat Fruit at queen odds? That'd be fun. Personally I doubt it, but we'll see.
I'm afraid it's clear that you made a typo or misread a number. You give the eval for the position with g1 off as -2.93 after 25 ms, which you use for your sample. I ran the same position and got -3.94. Of course computers aren't all the same speed, so some variation is to be expected, but the score only fluctuates by a few centipawns. It's pretty obvious that when you ran it you got -3.93 but either misread or mistyped the 3 as a 2. So the result is that you have a bunch of positions where Black has played blunders that lose a pawn or a similar positional score, dropping from a 3.93 edge to a 2.93 edge. So nowhere near knight odds! For the b1-off position I got -3.73. Not sure what you got for that position.
It’s a bit cheap throwing out insults about typos or misreading or whatever, when actually you are arguing method (albeit by other means).

The figures back from SF are accurate, I just rechecked them. No typos, no misreading.
I guess you decided knight-odds games are 3.73 on the basis that you put the start position into SF and asked for a score? Sure, SF will find the supposed best line and evaluate it.

I’m doing something different: I am asking SF to evaluate every single position that arises from the start position after four plies. Several tens of thousands of positions where each side has had the same move opportunities (two moves each) to make boobies or brilliancies. The net effect of all these thousands of moves is to generate thousands of positions, each then evaluated by SF, with a mean eval of about -3.00 pawns. To be fair to both Black and White, I then took everything centred on that -300 centipawns, about 10% or so of the total.

You’re saying that’s wrong because according to SF at the root position, knight odds = -3.87, and because nobody would play the 4-ply move sequences. Well, so what? You lost sight of the objective: generate a large unbiased set of positions to evaluate how different engines get on with “knight odds”; generate positions close to the root; generate positions where material and a further SF search show that neither side's chances changed much from their chances at root zero.
Well, that’s done. Thanks me very much. It’s a pleasure. No problem at all. Have fun with them.
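The brute-force procedure described above (expand every sequence to four plies, merge duplicates, evaluate each position, keep those within +/- 10 centipawns of the mean) can be sketched generically. The helpers below are hypothetical; move generation and evaluation are passed in as `children` and `evaluate`. With python-chess one could, as an assumption about tooling rather than a description of what was actually used, generate children by pushing each move in `board.legal_moves` and keying positions by `board.epd()` (which omits the move counters, so transpositions merge), and evaluate with `engine.analyse(board, chess.engine.Limit(time=0.025))`. The optional `centre` argument makes the point of contention explicit: the window can be centred on the mean of all evaluations, as chrisw did, or on the root evaluation, as Larry argues.

```python
from statistics import mean


def enumerate_unique(root, children, plies):
    """Every distinct position exactly `plies` half-moves from root.

    Building sets at each ply merges transpositions reached by different move orders.
    """
    frontier = {root}
    for _ in range(plies):
        frontier = {child for node in frontier for child in children(node)}
    return frontier


def select_balanced(positions, evaluate, window=10, centre=None):
    """Keep positions whose eval lies within +/- window of `centre`.

    If centre is None, the mean eval of all positions is used.
    """
    evals = {pos: evaluate(pos) for pos in positions}
    if centre is None:
        centre = mean(evals.values())
    kept = sorted(p for p, e in evals.items() if abs(e - centre) <= window)
    return centre, kept


if __name__ == "__main__":
    def step(n):
        # Toy move generator: n goes to n+1 or n+2, so depth-2 paths transpose.
        return [n + 1, n + 2]

    leaves = enumerate_unique(0, step, plies=2)
    print(sorted(leaves), select_balanced(leaves, lambda p: -10 * p, window=5))
```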


Just for fun, and not really wasting my time re-proving things, below are the SF11 500 ms evaluations of the first 100 positions in the knight-odds suite.

For that subset, Mean = -320.2, st deviation = 29.5. Looks like a fine result from here.

Code: Select all

Loading epds knight-odds.epd
SF11, default conditions, movetime 500
rnbqkb1r/p1pppppp/7n/1p6/3P4/1P6/P1P1PPPP/RNBQKB1R w KQkq - 0 3 eval=-314 nodes=835263
r1bqkbnr/pppp1ppp/n7/4p3/3P4/5N2/PPP1PPPP/R1BQKB1R w KQkq - 0 3 eval=-284 nodes=843868
rnbqk1nr/ppppbppp/4p3/8/3P4/5N2/PPP1PPPP/R1BQKB1R w KQkq - 2 3 eval=-376 nodes=948424
r1bqkb1r/pppppppp/2n4n/1B6/4P3/8/PPPP1PPP/RNBQK2R w KQkq - 3 3 eval=-364 nodes=911584
rnbqkbnr/pppppppp/8/8/8/5N1P/PPPPPPP1/R1BQKB1R w KQkq - 1 3 eval=-356 nodes=924862
rnbqkb1r/p1pppppp/7n/1p6/4P3/5Q2/PPPP1PPP/R1B1KBNR w KQkq - 0 3 eval=-248 nodes=877730
r1bqkbnr/ppppp1pp/2n5/5p2/4P3/2P5/PP1P1PPP/RNBQKB1R w KQkq - 0 3 eval=-311 nodes=924917
rnbqkb1r/p1pppppp/1p5n/7Q/4P3/8/PPPP1PPP/RNB1KB1R w KQkq - 0 3 eval=-354 nodes=888675
r1bqkbnr/ppppp1pp/n4p2/8/1P6/B7/P1PPPPPP/RN1QKB1R w KQkq - 2 3 eval=-344 nodes=892030
r1bqkbnr/ppppp1pp/n4p2/8/2P5/P7/1P1PPPPP/R1BQKBNR w KQkq - 1 3 eval=-336 nodes=861229
r1bqkbnr/p1pppppp/np6/P7/8/8/1PPPPPPP/R1BQKBNR w KQkq - 0 3 eval=-352 nodes=862023
rnbqkbnr/pppp1p1p/4p3/6p1/7P/3P4/PPP1PPP1/RNBQKB1R w KQkq - 0 3 eval=-253 nodes=904501
r1bqkbnr/ppppp1pp/n4p2/8/8/3P4/PPPBPPPP/RN1QKB1R w KQkq - 2 3 eval=-297 nodes=880860
rnbqkb1r/1ppppppp/7n/p7/4P3/5P2/PPPP2PP/RNBQKB1R w KQkq - 0 3 eval=-321 nodes=909321
r1bqkbnr/ppppp1pp/n4p2/1B6/4P3/8/PPPP1PPP/RNBQK2R w KQkq - 2 3 eval=-284 nodes=954091
r1bqkbnr/ppppp1pp/2n2p2/8/8/P1N5/1PPPPPPP/R1BQKB1R w KQkq - 0 3 eval=-328 nodes=881852
1nbqkbnr/rppppppp/8/p7/8/3BP3/PPPP1PPP/RNBQK2R w KQk - 2 3 eval=-384 nodes=950046
rnbqkbn1/pppppppr/8/7p/8/2N3P1/PPPPPP1P/R1BQKB1R w KQq - 2 3 eval=-325 nodes=882350
r1bqkbnr/ppppp1pp/n4p2/8/4P3/6P1/PPPP1P1P/R1BQKBNR w KQkq - 0 3 eval=-307 nodes=906494
rnbq1bnr/pppkpppp/8/3p4/8/4PP2/PPPP2PP/R1BQKBNR w KQ - 1 3 eval=-300 nodes=954439
rnbq1bnr/pppkpppp/8/3p4/8/4P1P1/PPPP1P1P/R1BQKBNR w KQ - 1 3 eval=-311 nodes=932958
rn1qkbnr/p1pppppp/b7/1p6/1P6/4P3/P1PP1PPP/RNBQKB1R w KQkq - 1 3 eval=-317 nodes=961294
rnbqkbnr/1ppppp1p/8/p5p1/Q7/2P5/PP1PPPPP/RNB1KB1R w KQkq - 0 3 eval=-272 nodes=917542
r1bqkbnr/ppppppp1/n7/4P2p/8/8/PPPP1PPP/R1BQKBNR w KQkq - 0 3 eval=-318 nodes=928039
rnbqkbnr/p1ppppp1/7p/1P6/8/8/PP1PPPPP/R1BQKBNR w KQkq - 0 3 eval=-350 nodes=920456
rnbqkbnr/ppppp2p/5p2/6p1/3P4/4B3/PPP1PPPP/RN1QKB1R w KQkq - 0 3 eval=-300 nodes=894388
r1bqkbnr/pppppp1p/n7/6p1/8/1PN5/P1PPPPPP/R1BQKB1R w KQkq - 1 3 eval=-302 nodes=915277
r1bqkbnr/p1pppppp/2n5/1p6/1P2P3/8/P1PP1PPP/R1BQKBNR w KQkq - 1 3 eval=-357 nodes=907541
rnbqkb1r/ppppp1pp/7n/5p2/7P/3P4/PPP1PPP1/RNBQKB1R w KQkq - 0 3 eval=-304 nodes=996995
rnbqkb1r/p1pppppp/7n/1p6/2P5/7P/PP1PPPP1/R1BQKBNR w KQkq - 0 3 eval=-354 nodes=895685
rnbqk1nr/pppp1ppp/8/2b1p3/8/P4N2/1PPPPPPP/R1BQKB1R w KQkq - 2 3 eval=-376 nodes=904702
r1bqkbnr/p1pppppp/n7/1p6/1P2P3/8/P1PP1PPP/R1BQKBNR w KQkq - 0 3 eval=-308 nodes=907345
r1bqkbnr/ppppppp1/n6B/8/3P4/8/PPP1PPPP/R2QKBNR w KQkq - 1 3 eval=-299 nodes=903610
r1bqkbnr/ppppp1pp/2n2p2/8/3P4/P7/1PP1PPPP/RNBQKB1R w KQkq - 0 3 eval=-330 nodes=885982
rnbqkbnr/p1ppp1pp/5p2/1p6/1P3P2/8/P1PPP1PP/R1BQKBNR w KQkq - 0 3 eval=-331 nodes=899112
rnbqkbnr/2pppppp/8/pp6/3P4/5P2/PPP1P1PP/RNBQKB1R w KQkq - 0 3 eval=-334 nodes=923006
r1bqkbnr/pppp1ppp/n7/4p3/3P3P/8/PPP1PPP1/R1BQKBNR w KQkq - 1 3 eval=-333 nodes=915197
rnbqkbnr/p1ppppp1/8/1p5p/8/P1N5/1PPPPPPP/R1BQKB1R w KQkq - 0 3 eval=-328 nodes=894117
r1bqkbnr/1ppppppp/n7/p7/4P3/3P4/PPP2PPP/R1BQKBNR w KQkq - 1 3 eval=-338 nodes=941629
rnbqkbnr/p1pppp1p/6p1/1p6/4P3/P7/1PPP1PPP/R1BQKBNR w KQkq - 0 3 eval=-312 nodes=899001
r1bqkbnr/pppppp1p/n7/6p1/2P5/5P2/PP1PP1PP/RNBQKB1R w KQkq - 1 3 eval=-349 nodes=911472
rnbqkb1r/pppppp1p/5n2/6p1/8/3P3P/PPP1PPP1/R1BQKBNR w KQkq - 0 3 eval=-315 nodes=919954
1nbqkbnr/rppppppp/8/p7/4P3/8/PPPPQPPP/RNB1KB1R w KQk - 2 3 eval=-328 nodes=929187
rnbqkbn1/pppppppr/8/7p/8/3P2P1/PPP1PP1P/R1BQKBNR w KQq - 1 3 eval=-335 nodes=927437
r1bqkbnr/pppppp1p/2n5/6p1/3P4/8/PPP1PPPP/RNBQKBR1 w Qkq - 1 3 eval=-276 nodes=894955
rnbqkb1r/p1pppppp/7n/1p6/2P5/P7/1P1PPPPP/R1BQKBNR w KQkq - 1 3 eval=-293 nodes=895381
rnbqkb1r/pppppp1p/7n/6p1/4P3/5Q2/PPPP1PPP/R1B1KBNR w KQkq - 2 3 eval=-274 nodes=910466
rnbqkbr1/pppppppp/7n/8/8/3P3P/PPP1PPP1/RNBQKB1R w KQq - 1 3 eval=-321 nodes=907502
rnbqkbnr/ppp1pp1p/8/3p2p1/7P/8/PPPPPPPR/R1BQKBN1 w Qkq - 0 3 eval=-300 nodes=917633
r1bqkbnr/p1pppppp/1pn5/8/4P2P/8/PPPP1PP1/R1BQKBNR w KQkq - 1 3 eval=-326 nodes=899245
rnbq1bnr/pppkpppp/3p4/8/8/P7/1PPPPPPP/RNBQKBR1 w Q - 2 3 eval=-318 nodes=979196
rnbq1bnr/pppppkpp/5p2/8/8/4P2P/PPPP1PP1/R1BQKBNR w KQ - 1 3 eval=-303 nodes=985615
rnbqkbnr/p1ppp1pp/5p2/1p6/3P4/5P2/PPP1P1PP/R1BQKBNR w KQkq - 0 3 eval=-290 nodes=901600
rnbq1bnr/pppkpppp/3p4/8/8/3P4/PPPQPPPP/RNB1KB1R w KQ - 2 3 eval=-319 nodes=935623
r1bqkbnr/p1pppppp/2n5/1p6/4P3/3B4/PPPP1PPP/R1BQK1NR w KQkq - 2 3 eval=-285 nodes=910467
rnbqkbnr/pp1ppp1p/2p5/6p1/P7/3P4/1PP1PPPP/R1BQKBNR w KQkq - 0 3 eval=-295 nodes=929138
r1bqkbnr/p1pppppp/np6/8/4P2P/8/PPPP1PP1/RNBQKB1R w KQkq - 1 3 eval=-308 nodes=917646
rnbqkbnr/ppppp1p1/5p1p/8/8/2P2P2/PP1PP1PP/R1BQKBNR w KQkq - 0 3 eval=-314 nodes=936621
rnbqkb1r/pppppppp/8/8/3P2n1/2N5/PPP1PPPP/R1BQKB1R w KQkq - 3 3 eval=-309 nodes=906583
r1bqkbnr/pppppp1p/n7/6p1/8/P6P/1PPPPPP1/RNBQKB1R w KQkq - 1 3 eval=-306 nodes=890932
1nbqkbnr/rppppppp/p7/8/4P3/7P/PPPP1PP1/RNBQKB1R w KQk - 1 3 eval=-341 nodes=931251
1nbqkbnr/rppppppp/8/p7/8/P4N2/1PPPPPPP/R1BQKB1R w KQk - 1 3 eval=-357 nodes=929376
rn1qkbnr/p1pppppp/b7/1p2P3/8/8/PPPP1PPP/RNBQKB1R w KQkq - 1 3 eval=-331 nodes=922525
rnbqkbnr/ppppp1p1/5p1p/8/7P/8/PPPPPPP1/1RBQKBNR w Kkq - 0 3 eval=-295 nodes=950421
rnbqkbnr/ppp1pp1p/3p4/6p1/8/P6N/1PPPPPPP/R1BQKB1R w KQkq - 0 3 eval=-316 nodes=914793
rnbqkb1r/pppp1ppp/5n2/4p3/5P2/2P5/PP1PP1PP/RNBQKB1R w KQkq - 0 3 eval=-412 nodes=919575
rnbqkb1r/pppp1ppp/4p2n/8/3P4/2P5/PP2PPPP/R1BQKBNR w KQkq - 1 3 eval=-349 nodes=903832
rnbqkb1r/pppp1ppp/7n/4p2Q/8/4P3/PPPP1PPP/RNB1KB1R w KQkq - 2 3 eval=-268 nodes=933657
rnbqkb1r/1ppppppp/p6n/8/3P4/2P5/PP2PPPP/RNBQKB1R w KQkq - 0 3 eval=-330 nodes=954821
r1bqkbnr/pppp1ppp/n7/4p3/3P4/7P/PPP1PPP1/R1BQKBNR w KQkq - 0 3 eval=-307 nodes=925412
rnbqkbnr/p1pppppp/8/1p6/1P6/4P3/P1PP1PPP/RNBQKB1R w KQkq - 0 3 eval=-348 nodes=922373
rnbqkb1r/ppppppp1/7n/7p/2P5/7P/PP1PPPP1/RNBQKB1R w KQkq - 1 3 eval=-331 nodes=893674
rnbqkb1r/pp1ppppp/2p4n/8/2P5/2N5/PP1PPPPP/R1BQKB1R w KQkq - 2 3 eval=-348 nodes=936309
rnbqk1nr/pppppp1p/7b/6p1/3P4/3Q4/PPP1PPPP/R1B1KBNR w KQkq - 2 3 eval=-314 nodes=895006
r1bqkbnr/pppppppp/8/8/1n2P3/5P2/PPPP2PP/RNBQKB1R w KQkq - 1 3 eval=-351 nodes=911614
rnbq1bnr/pppkpppp/3p4/8/8/3P1P2/PPP1P1PP/R1BQKBNR w KQ - 1 3 eval=-295 nodes=977138
rnbqkbnr/1ppppp1p/8/p5p1/5P2/7N/PPPPP1PP/R1BQKB1R w KQkq - 0 3 eval=-303 nodes=933937
r1bqkb1r/pppppppp/n6n/8/2P5/5N2/PP1PPPPP/R1BQKB1R w KQkq - 3 3 eval=-325 nodes=911263
rnbqkb1r/ppppp1pp/7n/5p2/3P4/8/PPPNPPPP/R1BQKB1R w KQkq - 0 3 eval=-321 nodes=930623
rnbqkb1r/p1pppppp/7n/1p6/P7/8/1PPPPPPP/R1BQKBNR w KQkq - 1 3 eval=-281 nodes=895926
rnbqkb1r/p1pppppp/7n/1p6/8/1QP5/PP1PPPPP/RNB1KB1R w KQkq - 2 3 eval=-335 nodes=904891
1nbqkbnr/1ppppppp/r7/p7/1PP5/8/P2PPPPP/RNBQKB1R w KQk - 1 3 eval=-356 nodes=881990
rnbqkbnr/ppp2ppp/8/3pP3/8/8/PPP1PPPP/RNBQKB1R w KQkq d6 0 3 eval=-342 nodes=945740
rnbqkbnr/1pppp1pp/p4p2/8/3P4/8/PPP1PPPP/RNBQKB1R w KQkq - 0 3 eval=-347 nodes=921882
rnbqkb1r/pppppp1p/5n2/6p1/3P1P2/8/PPP1P1PP/RNBQKB1R w KQkq - 1 3 eval=-292 nodes=891075
rnb1kbnr/pppqpppp/8/3p4/4P3/2N5/PPPP1PPP/R1BQKB1R w KQkq - 2 3 eval=-292 nodes=900883
rnbq1bnr/pppppkpp/5p2/8/8/3P3N/PPP1PPPP/R1BQKB1R w KQ - 2 3 eval=-336 nodes=966960
r1bqkbnr/pppppp1p/2n5/6p1/7P/4P3/PPPP1PP1/RNBQKB1R w KQkq - 1 3 eval=-302 nodes=899886
rnbqkbnr/pppppppp/8/8/4P3/7P/PPPP1PP1/RNBQKB1R w KQkq - 1 3 eval=-324 nodes=924363
r1bqkbnr/pppppp1p/n7/6p1/8/N2P4/PPP1PPPP/R1BQKB1R w KQkq - 0 3 eval=-279 nodes=929880
rnbqkbnr/ppppp1p1/5p1p/8/P7/1P6/2PPPPPP/RNBQKB1R w KQkq - 0 3 eval=-314 nodes=932199
rnbqkbnr/ppp1pppp/8/3p4/3PP3/8/PPP2PPP/RNBQKB1R w KQkq - 0 3 eval=-351 nodes=931440
rnbqkbr1/pppppppp/7n/8/1P6/6P1/P1PPPP1P/RNBQKB1R w KQq - 1 3 eval=-327 nodes=922162
rnbq1bnr/pppkpppp/3p4/8/8/1PP5/P2PPPPP/R1BQKBNR w KQ - 1 3 eval=-257 nodes=952343
rnbqkbnr/p1ppp1pp/5p2/1p6/8/2PP4/PP2PPPP/R1BQKBNR w KQkq - 0 3 eval=-273 nodes=929893
rnbqkbnr/pppppppp/8/8/3P4/7P/PPP1PPP1/RNBQKB1R w KQkq - 1 3 eval=-343 nodes=924330
rnbqkbr1/pppppppp/5n2/8/1P2P3/8/P1PP1PPP/R1BQKBNR w KQq - 1 3 eval=-376 nodes=936465
rnbqkbnr/pppppp2/8/6pp/8/2P5/PPQPPPPP/R1B1KBNR w KQkq - 0 3 eval=-304 nodes=921110
rnbq1bnr/pppppkpp/5p2/8/8/6PN/PPPPPP1P/R1BQKB1R w KQ - 2 3 eval=-300 nodes=945856
r1bqkbnr/ppppp1pp/n4p2/8/8/2P2P2/PP1PP1PP/R1BQKBNR w KQkq - 0 3 eval=-348 nodes=949560
r1bqkbnr/ppppppp1/n7/7p/2B1P3/8/PPPP1PPP/RNBQK2R w KQkq - 2 3 eval=-341 nodes=917983

SF mean eval -320.2
epds=101 numpy.mean=-320.1782178217822 st dev=29.494888310142848
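The summary line can be reproduced from such a log with a short sketch (the helper `eval_stats` is hypothetical). `numpy.std` defaults to `ddof=0`, the population standard deviation, which is what `statistics.pstdev` computes; `statistics.stdev` would give the slightly larger sample value. Lines without an `eval=` field, such as the header and the summary line, are skipped.

```python
import re
import statistics

EVAL_RE = re.compile(r"\beval=(-?\d+)")


def eval_stats(log_lines):
    """Count, mean and population st. dev. of the eval=... fields in a log."""
    evals = [int(m.group(1)) for line in log_lines
             if (m := EVAL_RE.search(line))]
    return len(evals), statistics.mean(evals), statistics.pstdev(evals)


if __name__ == "__main__":
    sample = [
        "SF11, default conditions, movetime 500",
        "rnbqkb1r/p1pppppp/7n/1p6/3P4/1P6/P1P1PPPP/RNBQKB1R w KQkq - 0 3 eval=-314 nodes=835263",
        "r1bqkbnr/pppp1ppp/n7/4p3/3P4/5N2/PPP1PPPP/R1BQKB1R w KQkq - 0 3 eval=-284 nodes=843868",
        "SF mean eval -320.2",
    ]
    print(eval_stats(sample))  # (2, -299, 15.0)
```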

chrisw
Posts: 3674
Joined: Tue Apr 03, 2012 2:28 pm

Re: Stockfish Handicap Matches

Post by chrisw » Mon Jun 22, 2020 6:54 pm

Rebel wrote:
Mon Jun 22, 2020 6:12 pm
tc=40/60 from Chris knight-odds set.

Code: Select all

Rank Name                          Elo     +/-   Games   Score    Draw
   0 Komodo_14                     228      50     200   78.8%   19.5%
   1 Benjamin                     -186      67     100   25.5%   23.0%
   2 Fruit_2.1                    -275      80     100   17.0%   16.0%
Looks like a pretty good result.

Uploaded a bunch more, including pawn odds; pawn-odds games can get quite wild, which might be good for detecting wild attacking engines.

pawn-f2-odds.epd
queen-for-nite-odds.epd
queen-for-rook-odds.epd
queen-odds.epd
rook-odds.epd
knight-odds.epd
no-castling-odds.epd

https://github.com/ChrisWhittington/Chess-EPDs

lkaufman
Posts: 4324
Joined: Sun Jan 10, 2010 5:15 am
Location: Maryland USA

Re: Stockfish Handicap Matches

Post by lkaufman » Mon Jun 22, 2020 7:15 pm

chrisw wrote:
Mon Jun 22, 2020 6:42 pm
lkaufman wrote:
Mon Jun 22, 2020 3:23 pm
chrisw wrote:
Mon Jun 22, 2020 7:20 am
lkaufman wrote:
Mon Jun 22, 2020 3:13 am
chrisw wrote:
Sun Jun 21, 2020 10:42 pm
lkaufman wrote:
Sun Jun 21, 2020 10:30 pm
Rebel wrote:
Sun Jun 21, 2020 10:09 pm
chrisw wrote:
Sun Jun 21, 2020 9:20 pm
Done 5600 EPDs off the start position minus b1 knight, played out all four ply combinations, culled all duplicates, culled all positions where SF11 evaluated more than +/-10 centipawns away from 300 centipawns (SF11 average score for all epds), and am now left with 5600 EPDs.

Link: https://github.com/ChrisWhittington/Che ... t-odds.epd

Will upload for no knight at g1 tomorrow am.


I checked several positions and they showed scores of 4 pawns or more down, as does the initial knight odds position, not -300.
You checked the listed epds for variance from -300? Which ones? What was the variance? What SF11 conditions are you using?

If you just want under 20 positions, take off the b1 knight, choose the 3 best White moves by SF11 multiPV, chose the 3 best replies to each of those, and repeat with g1 knight off. 18 positions, totally fair, no silly moves, real knight odds chess! If you want more just choose best 4 or best five for each side.
Edit: Whoops, rechecked, the mean is indeed -293, but I was selecting EPDs based on -340, so you should find the listed EPDs evaluate at around -340 centipawns. Will correct the EPD dump tomorrow am.
OK, -340 is at least within range of what I was seeing. With fixed depth searches in the range of what you were using I get evals like -390 or so, but I think fixed depth omits Contempt while movetime does not, so we're not so far apart if you add in Contempt. You were right to switch from 10 ms to 25; SF is quite weak at 10ms but strong enough at 25, the difference is huge. But I think it would be more useful to have fewer positions but no positions with ridiculous moves played, maybe only including moves in the top ten by multipv at each point for example. It doesn't seem like a simulation of knight odds if you force the players to play moves that no one over 800 rating would even consider.
Uploaded just now the knight-odds epds for positions without b1/g1 knights.

https://github.com/ChrisWhittington/Chess-EPDs and download knight-odds.epd

They're better balanced around the correct mean (-293 centipawns), and restricted to all 4-ply positions where SF11 at 25ms search returns a score of -293 =/- 10 centipawns. About 5000 unique positions in total.

Random subset:

Code: Select all

rnbqkb1r/p1pppppp/7n/1p6/3P4/1P6/P1P1PPPP/RNBQKB1R w KQkq - 0 3
r1bqkbnr/pppp1ppp/n7/4p3/3P4/5N2/PPP1PPPP/R1BQKB1R w KQkq - 0 3
rnbqk1nr/ppppbppp/4p3/8/3P4/5N2/PPP1PPPP/R1BQKB1R w KQkq - 2 3
r1bqkb1r/pppppppp/2n4n/1B6/4P3/8/PPPP1PPP/RNBQK2R w KQkq - 3 3
rnbqkbnr/pppppppp/8/8/8/5N1P/PPPPPPP1/R1BQKB1R w KQkq - 1 3
rnbqkb1r/p1pppppp/7n/1p6/4P3/5Q2/PPPP1PPP/R1B1KBNR w KQkq - 0 3
r1bqkbnr/ppppp1pp/2n5/5p2/4P3/2P5/PP1P1PPP/RNBQKB1R w KQkq - 0 3
rnbqkb1r/p1pppppp/1p5n/7Q/4P3/8/PPPP1PPP/RNB1KB1R w KQkq - 0 3
r1bqkbnr/ppppp1pp/n4p2/8/1P6/B7/P1PPPPPP/RN1QKB1R w KQkq - 2 3
r1bqkbnr/ppppp1pp/n4p2/8/2P5/P7/1P1PPPPP/R1BQKBNR w KQkq - 1 3
r1bqkbnr/p1pppppp/np6/P7/8/8/1PPPPPPP/R1BQKBNR w KQkq - 0 3
rnbqkbnr/pppp1p1p/4p3/6p1/7P/3P4/PPP1PPP1/RNBQKB1R w KQkq - 0 3
r1bqkbnr/ppppp1pp/n4p2/8/8/3P4/PPPBPPPP/RN1QKB1R w KQkq - 2 3
rnbqkb1r/1ppppppp/7n/p7/4P3/5P2/PPPP2PP/RNBQKB1R w KQkq - 0 3
r1bqkbnr/ppppp1pp/n4p2/1B6/4P3/8/PPPP1PPP/RNBQK2R w KQkq - 2 3
r1bqkbnr/ppppp1pp/2n2p2/8/8/P1N5/1PPPPPPP/R1BQKB1R w KQkq - 0 3
1nbqkbnr/rppppppp/8/p7/8/3BP3/PPPP1PPP/RNBQK2R w KQk - 2 3
rnbqkbn1/pppppppr/8/7p/8/2N3P1/PPPPPP1P/R1BQKB1R w KQq - 2 3
r1bqkbnr/ppppp1pp/n4p2/8/4P3/6P1/PPPP1P1P/R1BQKBNR w KQkq - 0 3
rnbq1bnr/pppkpppp/8/3p4/8/4PP2/PPPP2PP/R1BQKBNR w KQ - 1 3
rnbq1bnr/pppkpppp/8/3p4/8/4P1P1/PPPP1P1P/R1BQKBNR w KQ - 1 3
rn1qkbnr/p1pppppp/b7/1p6/1P6/4P3/P1PP1PPP/RNBQKB1R w KQkq - 1 3
rnbqkbnr/1ppppp1p/8/p5p1/Q7/2P5/PP1PPPPP/RNB1KB1R w KQkq - 0 3
r1bqkbnr/ppppppp1/n7/4P2p/8/8/PPPP1PPP/R1BQKBNR w KQkq - 0 3
rnbqkbnr/p1ppppp1/7p/1P6/8/8/PP1PPPPP/R1BQKBNR w KQkq - 0 3
rnbqkbnr/ppppp2p/5p2/6p1/3P4/4B3/PPP1PPPP/RN1QKB1R w KQkq - 0 3
r1bqkbnr/pppppp1p/n7/6p1/8/1PN5/P1PPPPPP/R1BQKB1R w KQkq - 1 3
r1bqkbnr/p1pppppp/2n5/1p6/1P2P3/8/P1PP1PPP/R1BQKBNR w KQkq - 1 3
rnbqkb1r/ppppp1pp/7n/5p2/7P/3P4/PPP1PPP1/RNBQKB1R w KQkq - 0 3
rnbqkb1r/p1pppppp/7n/1p6/2P5/7P/PP1PPPP1/R1BQKBNR w KQkq - 0 3
rnbqk1nr/pppp1ppp/8/2b1p3/8/P4N2/1PPPPPPP/R1BQKB1R w KQkq - 2 3
It's swings and roundabouts on position selection. If we did your method (which I'm not going to, because there's a limit to how much time I'm prepared to put in on this), we'd get, at four ply, 10x10x10x10 = 10000 positions, as opposed to the actual number from brute force, which was 170000 unique, with 5000 within +/- 10 centipawns of the mean. So your 10000 would be guaranteed to contain positions whose scores had already diverged beyond the +/- 10 centipawn window. To recap the objective: it was to have a good number of test positions (which, for me, includes the possibility of mass testing with 1000s of games at bullet/blitz) with good variance, no duplicates, no great difference from a one-knight handicap, and as close as feasible to the start position. The selection method is entirely without bias; it doesn't matter HOW the positions were arrived at, just as long as they are close to the initial position and neither engine gets a head start in any position. That's done. Arguably perfectly. The engines are then left to fight it out from unbiased, wide, close-to-root start positions.
Your argument for using 10-wide selections for 4 ply is going to produce fewer positions, many of which will already be knight odds plus something way more than 10 centipawns because of the way your method chooses, and would therefore, I would argue, satisfy the original objective less well.

Anyway, since everything now appears to work, I'll leave this thing running and produce a few thousand of each: bishop odds, rook odds and queen odds. Can SF11 beat Fruit at queen odds? That'd be fun. Personally I doubt it, but we'll see.
I'm afraid it's clear that you made a typo or misread a number. You give the eval for the position with g1 off as -2.93 after 25 ms, which you use for your sample. I ran the same position and got -3.94. Of course computers aren't all the same speed, so some variation is to be expected, but the score only fluctuates by a few centipawns. It's pretty obvious that when you ran it you got -3.93 but either misread or mistyped the 3 as a 2. The result is that you have a bunch of positions where Black has played blunders that lose a pawn, or a similar positional score, to drop from a 3.93 edge to a 2.93 edge. So nowhere near knight odds! For the b1-off position I got -3.73. Not sure what you got for that position.
It’s a bit cheap throwing out insults about typos or misreading or whatever, when actually you are arguing method (albeit by other means).

The figures back from SF are accurate, I just rechecked them. No typos, no misreading.
I guess you decided knight-odds games are 3.73 on the basis that you put the start position into SF and asked for a score? Sure, SF will find the supposed best line and evaluate it.

I’m doing something different: I am asking SF to evaluate every single position that arises from the start position after four plies. Several tens of thousands of positions where each side has had the same move opportunities (two moves each) to make boobies or brilliancies. The net effect of all these thousands of moves is to generate thousands of positions, each then evaluated by SF, with a mean eval of about -3.00 pawns. To be fair to both Black and White, I then took everything centred on that -300 centipawns, about 10% or so of the total.
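The culling step described above (gather the evaluated 4-ply positions, drop duplicates, keep only those near the mean) can be sketched in Python. This is a minimal illustration under stated assumptions, not Chris's actual script: it assumes the SF evaluations have already been collected into (FEN, centipawns) pairs from White's point of view, and the name `select_balanced`, the 10-centipawn default window and the sample scores are all made up for the example. Duplicates are keyed on the first four FEN fields, so transpositions that differ only in the move counters collapse into one position.

```python
from statistics import fmean


def select_balanced(scored, window=10):
    """Keep unique positions whose eval lies within +/-window cp of the mean.

    scored: iterable of (fen, centipawns) pairs, scores from White's view.
    Returns (mean, kept) where kept is a list of (fen, centipawns).
    """
    unique = {}
    for fen, cp in scored:
        # Placement, side to move, castling rights and en passant square only;
        # the halfmove/fullmove counters are ignored so transpositions merge.
        key = " ".join(fen.split()[:4])
        unique.setdefault(key, (fen, cp))  # first occurrence wins

    mean = fmean(cp for _, cp in unique.values())
    kept = [(fen, cp) for fen, cp in unique.values() if abs(cp - mean) <= window]
    return mean, kept


# Hypothetical data: FENs adapted from the suite, scores invented for the demo.
SAMPLE = [
    ("r1bqkbnr/pppp1ppp/n7/4p3/3P4/5N2/PPP1PPPP/R1BQKB1R w KQkq - 0 3", -295),
    ("r1bqkbnr/pppp1ppp/n7/4p3/3P4/5N2/PPP1PPPP/R1BQKB1R w KQkq - 2 3", -280),
    ("rnbqkbnr/pppppppp/8/8/8/5N1P/PPPPPPP1/R1BQKB1R w KQkq - 1 3", -300),
    ("rnbq1bnr/pppkpppp/8/3p4/8/4PP2/PPPP2PP/R1BQKBNR w KQ - 1 3", -305),
    ("rnbqkb1r/p1pppppp/7n/1p6/4P3/5Q2/PPPP1PPP/R1B1KBNR w KQkq - 0 3", -250),
    ("rnbqkb1r/pppp1ppp/5n2/4p3/5P2/2P5/PP1PP1PP/RNBQKB1R w KQkq - 0 3", -350),
]

if __name__ == "__main__":
    mean, kept = select_balanced(SAMPLE)
    print(f"mean={mean:.1f} kept={len(kept)}")
    for fen, cp in kept:
        print(cp, fen)
```

With these invented numbers the second line is treated as a duplicate of the first, the five unique positions average -300, and the three within 10 cp survive. The mean is taken over the unique positions before filtering, so a handful of large outliers can shift the centre of the window; centring on a fixed target value instead would be a one-line change.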

You’re saying that’s wrong because, according to SF at the root position, knight odds = -3.87, and because nobody would play the 4-ply move sequences. Well, so what? You lost sight of the objective: generate a large unbiased set of positions to evaluate how different engines get on with “knight odds”; generate positions close to the root; generate positions where material and a further SF search show that neither side’s chances changed much from their chances at the root (ply zero).
Well, that’s done. Thanks me very much. It’s a pleasure. No problem at all. Have fun with them.


Just for fun (not that I'm really going to waste my time re-proving things), here are the SF11 500 ms evaluations of the first 101 positions in the knight-odds suite.

For that subset, mean = -320.2, standard deviation = 29.5. Looks like a fine result from here.

Code:

Loading epds knight-odds.epd
SF11, default conditions, movetime 500
rnbqkb1r/p1pppppp/7n/1p6/3P4/1P6/P1P1PPPP/RNBQKB1R w KQkq - 0 3 eval=-314 nodes=835263
r1bqkbnr/pppp1ppp/n7/4p3/3P4/5N2/PPP1PPPP/R1BQKB1R w KQkq - 0 3 eval=-284 nodes=843868
rnbqk1nr/ppppbppp/4p3/8/3P4/5N2/PPP1PPPP/R1BQKB1R w KQkq - 2 3 eval=-376 nodes=948424
r1bqkb1r/pppppppp/2n4n/1B6/4P3/8/PPPP1PPP/RNBQK2R w KQkq - 3 3 eval=-364 nodes=911584
rnbqkbnr/pppppppp/8/8/8/5N1P/PPPPPPP1/R1BQKB1R w KQkq - 1 3 eval=-356 nodes=924862
rnbqkb1r/p1pppppp/7n/1p6/4P3/5Q2/PPPP1PPP/R1B1KBNR w KQkq - 0 3 eval=-248 nodes=877730
r1bqkbnr/ppppp1pp/2n5/5p2/4P3/2P5/PP1P1PPP/RNBQKB1R w KQkq - 0 3 eval=-311 nodes=924917
rnbqkb1r/p1pppppp/1p5n/7Q/4P3/8/PPPP1PPP/RNB1KB1R w KQkq - 0 3 eval=-354 nodes=888675
r1bqkbnr/ppppp1pp/n4p2/8/1P6/B7/P1PPPPPP/RN1QKB1R w KQkq - 2 3 eval=-344 nodes=892030
r1bqkbnr/ppppp1pp/n4p2/8/2P5/P7/1P1PPPPP/R1BQKBNR w KQkq - 1 3 eval=-336 nodes=861229
r1bqkbnr/p1pppppp/np6/P7/8/8/1PPPPPPP/R1BQKBNR w KQkq - 0 3 eval=-352 nodes=862023
rnbqkbnr/pppp1p1p/4p3/6p1/7P/3P4/PPP1PPP1/RNBQKB1R w KQkq - 0 3 eval=-253 nodes=904501
r1bqkbnr/ppppp1pp/n4p2/8/8/3P4/PPPBPPPP/RN1QKB1R w KQkq - 2 3 eval=-297 nodes=880860
rnbqkb1r/1ppppppp/7n/p7/4P3/5P2/PPPP2PP/RNBQKB1R w KQkq - 0 3 eval=-321 nodes=909321
r1bqkbnr/ppppp1pp/n4p2/1B6/4P3/8/PPPP1PPP/RNBQK2R w KQkq - 2 3 eval=-284 nodes=954091
r1bqkbnr/ppppp1pp/2n2p2/8/8/P1N5/1PPPPPPP/R1BQKB1R w KQkq - 0 3 eval=-328 nodes=881852
1nbqkbnr/rppppppp/8/p7/8/3BP3/PPPP1PPP/RNBQK2R w KQk - 2 3 eval=-384 nodes=950046
rnbqkbn1/pppppppr/8/7p/8/2N3P1/PPPPPP1P/R1BQKB1R w KQq - 2 3 eval=-325 nodes=882350
r1bqkbnr/ppppp1pp/n4p2/8/4P3/6P1/PPPP1P1P/R1BQKBNR w KQkq - 0 3 eval=-307 nodes=906494
rnbq1bnr/pppkpppp/8/3p4/8/4PP2/PPPP2PP/R1BQKBNR w KQ - 1 3 eval=-300 nodes=954439
rnbq1bnr/pppkpppp/8/3p4/8/4P1P1/PPPP1P1P/R1BQKBNR w KQ - 1 3 eval=-311 nodes=932958
rn1qkbnr/p1pppppp/b7/1p6/1P6/4P3/P1PP1PPP/RNBQKB1R w KQkq - 1 3 eval=-317 nodes=961294
rnbqkbnr/1ppppp1p/8/p5p1/Q7/2P5/PP1PPPPP/RNB1KB1R w KQkq - 0 3 eval=-272 nodes=917542
r1bqkbnr/ppppppp1/n7/4P2p/8/8/PPPP1PPP/R1BQKBNR w KQkq - 0 3 eval=-318 nodes=928039
rnbqkbnr/p1ppppp1/7p/1P6/8/8/PP1PPPPP/R1BQKBNR w KQkq - 0 3 eval=-350 nodes=920456
rnbqkbnr/ppppp2p/5p2/6p1/3P4/4B3/PPP1PPPP/RN1QKB1R w KQkq - 0 3 eval=-300 nodes=894388
r1bqkbnr/pppppp1p/n7/6p1/8/1PN5/P1PPPPPP/R1BQKB1R w KQkq - 1 3 eval=-302 nodes=915277
r1bqkbnr/p1pppppp/2n5/1p6/1P2P3/8/P1PP1PPP/R1BQKBNR w KQkq - 1 3 eval=-357 nodes=907541
rnbqkb1r/ppppp1pp/7n/5p2/7P/3P4/PPP1PPP1/RNBQKB1R w KQkq - 0 3 eval=-304 nodes=996995
rnbqkb1r/p1pppppp/7n/1p6/2P5/7P/PP1PPPP1/R1BQKBNR w KQkq - 0 3 eval=-354 nodes=895685
rnbqk1nr/pppp1ppp/8/2b1p3/8/P4N2/1PPPPPPP/R1BQKB1R w KQkq - 2 3 eval=-376 nodes=904702
r1bqkbnr/p1pppppp/n7/1p6/1P2P3/8/P1PP1PPP/R1BQKBNR w KQkq - 0 3 eval=-308 nodes=907345
r1bqkbnr/ppppppp1/n6B/8/3P4/8/PPP1PPPP/R2QKBNR w KQkq - 1 3 eval=-299 nodes=903610
r1bqkbnr/ppppp1pp/2n2p2/8/3P4/P7/1PP1PPPP/RNBQKB1R w KQkq - 0 3 eval=-330 nodes=885982
rnbqkbnr/p1ppp1pp/5p2/1p6/1P3P2/8/P1PPP1PP/R1BQKBNR w KQkq - 0 3 eval=-331 nodes=899112
rnbqkbnr/2pppppp/8/pp6/3P4/5P2/PPP1P1PP/RNBQKB1R w KQkq - 0 3 eval=-334 nodes=923006
r1bqkbnr/pppp1ppp/n7/4p3/3P3P/8/PPP1PPP1/R1BQKBNR w KQkq - 1 3 eval=-333 nodes=915197
rnbqkbnr/p1ppppp1/8/1p5p/8/P1N5/1PPPPPPP/R1BQKB1R w KQkq - 0 3 eval=-328 nodes=894117
r1bqkbnr/1ppppppp/n7/p7/4P3/3P4/PPP2PPP/R1BQKBNR w KQkq - 1 3 eval=-338 nodes=941629
rnbqkbnr/p1pppp1p/6p1/1p6/4P3/P7/1PPP1PPP/R1BQKBNR w KQkq - 0 3 eval=-312 nodes=899001
r1bqkbnr/pppppp1p/n7/6p1/2P5/5P2/PP1PP1PP/RNBQKB1R w KQkq - 1 3 eval=-349 nodes=911472
rnbqkb1r/pppppp1p/5n2/6p1/8/3P3P/PPP1PPP1/R1BQKBNR w KQkq - 0 3 eval=-315 nodes=919954
1nbqkbnr/rppppppp/8/p7/4P3/8/PPPPQPPP/RNB1KB1R w KQk - 2 3 eval=-328 nodes=929187
rnbqkbn1/pppppppr/8/7p/8/3P2P1/PPP1PP1P/R1BQKBNR w KQq - 1 3 eval=-335 nodes=927437
r1bqkbnr/pppppp1p/2n5/6p1/3P4/8/PPP1PPPP/RNBQKBR1 w Qkq - 1 3 eval=-276 nodes=894955
rnbqkb1r/p1pppppp/7n/1p6/2P5/P7/1P1PPPPP/R1BQKBNR w KQkq - 1 3 eval=-293 nodes=895381
rnbqkb1r/pppppp1p/7n/6p1/4P3/5Q2/PPPP1PPP/R1B1KBNR w KQkq - 2 3 eval=-274 nodes=910466
rnbqkbr1/pppppppp/7n/8/8/3P3P/PPP1PPP1/RNBQKB1R w KQq - 1 3 eval=-321 nodes=907502
rnbqkbnr/ppp1pp1p/8/3p2p1/7P/8/PPPPPPPR/R1BQKBN1 w Qkq - 0 3 eval=-300 nodes=917633
r1bqkbnr/p1pppppp/1pn5/8/4P2P/8/PPPP1PP1/R1BQKBNR w KQkq - 1 3 eval=-326 nodes=899245
rnbq1bnr/pppkpppp/3p4/8/8/P7/1PPPPPPP/RNBQKBR1 w Q - 2 3 eval=-318 nodes=979196
rnbq1bnr/pppppkpp/5p2/8/8/4P2P/PPPP1PP1/R1BQKBNR w KQ - 1 3 eval=-303 nodes=985615
rnbqkbnr/p1ppp1pp/5p2/1p6/3P4/5P2/PPP1P1PP/R1BQKBNR w KQkq - 0 3 eval=-290 nodes=901600
rnbq1bnr/pppkpppp/3p4/8/8/3P4/PPPQPPPP/RNB1KB1R w KQ - 2 3 eval=-319 nodes=935623
r1bqkbnr/p1pppppp/2n5/1p6/4P3/3B4/PPPP1PPP/R1BQK1NR w KQkq - 2 3 eval=-285 nodes=910467
rnbqkbnr/pp1ppp1p/2p5/6p1/P7/3P4/1PP1PPPP/R1BQKBNR w KQkq - 0 3 eval=-295 nodes=929138
r1bqkbnr/p1pppppp/np6/8/4P2P/8/PPPP1PP1/RNBQKB1R w KQkq - 1 3 eval=-308 nodes=917646
rnbqkbnr/ppppp1p1/5p1p/8/8/2P2P2/PP1PP1PP/R1BQKBNR w KQkq - 0 3 eval=-314 nodes=936621
rnbqkb1r/pppppppp/8/8/3P2n1/2N5/PPP1PPPP/R1BQKB1R w KQkq - 3 3 eval=-309 nodes=906583
r1bqkbnr/pppppp1p/n7/6p1/8/P6P/1PPPPPP1/RNBQKB1R w KQkq - 1 3 eval=-306 nodes=890932
1nbqkbnr/rppppppp/p7/8/4P3/7P/PPPP1PP1/RNBQKB1R w KQk - 1 3 eval=-341 nodes=931251
1nbqkbnr/rppppppp/8/p7/8/P4N2/1PPPPPPP/R1BQKB1R w KQk - 1 3 eval=-357 nodes=929376
rn1qkbnr/p1pppppp/b7/1p2P3/8/8/PPPP1PPP/RNBQKB1R w KQkq - 1 3 eval=-331 nodes=922525
rnbqkbnr/ppppp1p1/5p1p/8/7P/8/PPPPPPP1/1RBQKBNR w Kkq - 0 3 eval=-295 nodes=950421
rnbqkbnr/ppp1pp1p/3p4/6p1/8/P6N/1PPPPPPP/R1BQKB1R w KQkq - 0 3 eval=-316 nodes=914793
rnbqkb1r/pppp1ppp/5n2/4p3/5P2/2P5/PP1PP1PP/RNBQKB1R w KQkq - 0 3 eval=-412 nodes=919575
rnbqkb1r/pppp1ppp/4p2n/8/3P4/2P5/PP2PPPP/R1BQKBNR w KQkq - 1 3 eval=-349 nodes=903832
rnbqkb1r/pppp1ppp/7n/4p2Q/8/4P3/PPPP1PPP/RNB1KB1R w KQkq - 2 3 eval=-268 nodes=933657
rnbqkb1r/1ppppppp/p6n/8/3P4/2P5/PP2PPPP/RNBQKB1R w KQkq - 0 3 eval=-330 nodes=954821
r1bqkbnr/pppp1ppp/n7/4p3/3P4/7P/PPP1PPP1/R1BQKBNR w KQkq - 0 3 eval=-307 nodes=925412
rnbqkbnr/p1pppppp/8/1p6/1P6/4P3/P1PP1PPP/RNBQKB1R w KQkq - 0 3 eval=-348 nodes=922373
rnbqkb1r/ppppppp1/7n/7p/2P5/7P/PP1PPPP1/RNBQKB1R w KQkq - 1 3 eval=-331 nodes=893674
rnbqkb1r/pp1ppppp/2p4n/8/2P5/2N5/PP1PPPPP/R1BQKB1R w KQkq - 2 3 eval=-348 nodes=936309
rnbqk1nr/pppppp1p/7b/6p1/3P4/3Q4/PPP1PPPP/R1B1KBNR w KQkq - 2 3 eval=-314 nodes=895006
r1bqkbnr/pppppppp/8/8/1n2P3/5P2/PPPP2PP/RNBQKB1R w KQkq - 1 3 eval=-351 nodes=911614
rnbq1bnr/pppkpppp/3p4/8/8/3P1P2/PPP1P1PP/R1BQKBNR w KQ - 1 3 eval=-295 nodes=977138
rnbqkbnr/1ppppp1p/8/p5p1/5P2/7N/PPPPP1PP/R1BQKB1R w KQkq - 0 3 eval=-303 nodes=933937
r1bqkb1r/pppppppp/n6n/8/2P5/5N2/PP1PPPPP/R1BQKB1R w KQkq - 3 3 eval=-325 nodes=911263
rnbqkb1r/ppppp1pp/7n/5p2/3P4/8/PPPNPPPP/R1BQKB1R w KQkq - 0 3 eval=-321 nodes=930623
rnbqkb1r/p1pppppp/7n/1p6/P7/8/1PPPPPPP/R1BQKBNR w KQkq - 1 3 eval=-281 nodes=895926
rnbqkb1r/p1pppppp/7n/1p6/8/1QP5/PP1PPPPP/RNB1KB1R w KQkq - 2 3 eval=-335 nodes=904891
1nbqkbnr/1ppppppp/r7/p7/1PP5/8/P2PPPPP/RNBQKB1R w KQk - 1 3 eval=-356 nodes=881990
rnbqkbnr/ppp2ppp/8/3pP3/8/8/PPP1PPPP/RNBQKB1R w KQkq d6 0 3 eval=-342 nodes=945740
rnbqkbnr/1pppp1pp/p4p2/8/3P4/8/PPP1PPPP/RNBQKB1R w KQkq - 0 3 eval=-347 nodes=921882
rnbqkb1r/pppppp1p/5n2/6p1/3P1P2/8/PPP1P1PP/RNBQKB1R w KQkq - 1 3 eval=-292 nodes=891075
rnb1kbnr/pppqpppp/8/3p4/4P3/2N5/PPPP1PPP/R1BQKB1R w KQkq - 2 3 eval=-292 nodes=900883
rnbq1bnr/pppppkpp/5p2/8/8/3P3N/PPP1PPPP/R1BQKB1R w KQ - 2 3 eval=-336 nodes=966960
r1bqkbnr/pppppp1p/2n5/6p1/7P/4P3/PPPP1PP1/RNBQKB1R w KQkq - 1 3 eval=-302 nodes=899886
rnbqkbnr/pppppppp/8/8/4P3/7P/PPPP1PP1/RNBQKB1R w KQkq - 1 3 eval=-324 nodes=924363
r1bqkbnr/pppppp1p/n7/6p1/8/N2P4/PPP1PPPP/R1BQKB1R w KQkq - 0 3 eval=-279 nodes=929880
rnbqkbnr/ppppp1p1/5p1p/8/P7/1P6/2PPPPPP/RNBQKB1R w KQkq - 0 3 eval=-314 nodes=932199
rnbqkbnr/ppp1pppp/8/3p4/3PP3/8/PPP2PPP/RNBQKB1R w KQkq - 0 3 eval=-351 nodes=931440
rnbqkbr1/pppppppp/7n/8/1P6/6P1/P1PPPP1P/RNBQKB1R w KQq - 1 3 eval=-327 nodes=922162
rnbq1bnr/pppkpppp/3p4/8/8/1PP5/P2PPPPP/R1BQKBNR w KQ - 1 3 eval=-257 nodes=952343
rnbqkbnr/p1ppp1pp/5p2/1p6/8/2PP4/PP2PPPP/R1BQKBNR w KQkq - 0 3 eval=-273 nodes=929893
rnbqkbnr/pppppppp/8/8/3P4/7P/PPP1PPP1/RNBQKB1R w KQkq - 1 3 eval=-343 nodes=924330
rnbqkbr1/pppppppp/5n2/8/1P2P3/8/P1PP1PPP/R1BQKBNR w KQq - 1 3 eval=-376 nodes=936465
rnbqkbnr/pppppp2/8/6pp/8/2P5/PPQPPPPP/R1B1KBNR w KQkq - 0 3 eval=-304 nodes=921110
rnbq1bnr/pppppkpp/5p2/8/8/6PN/PPPPPP1P/R1BQKB1R w KQ - 2 3 eval=-300 nodes=945856
r1bqkbnr/ppppp1pp/n4p2/8/8/2P2P2/PP1PP1PP/R1BQKBNR w KQkq - 0 3 eval=-348 nodes=949560
r1bqkbnr/ppppppp1/n7/7p/2B1P3/8/PPPP1PPP/RNBQK2R w KQkq - 2 3 eval=-341 nodes=917983

SF mean eval -320.2
epds=101 numpy.mean=-320.1782178217822 st dev=29.494888310142848
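The summary line above appears to come from a short Python script (it prints `numpy.mean`). A stdlib-only equivalent is sketched below. It assumes the log format shown above (one `eval=<centipawns>` field per position line) and that "st dev" is the population standard deviation, which is what `numpy.std` returns with its default `ddof=0`; the function name `eval_stats` is hypothetical.

```python
import re
from statistics import fmean, pstdev

EVAL_RE = re.compile(r"eval=(-?\d+)")


def eval_stats(log_lines):
    """Return (count, mean, population std dev) of the eval= fields in a log."""
    evals = []
    for line in log_lines:
        match = EVAL_RE.search(line)
        if match:  # header lines such as "Loading epds ..." have no eval field
            evals.append(int(match.group(1)))
    return len(evals), fmean(evals), pstdev(evals)


LOG = """\
Loading epds knight-odds.epd
SF11, default conditions, movetime 500
rnbqkb1r/p1pppppp/7n/1p6/3P4/1P6/P1P1PPPP/RNBQKB1R w KQkq - 0 3 eval=-314 nodes=835263
r1bqkbnr/pppp1ppp/n7/4p3/3P4/5N2/PPP1PPPP/R1BQKB1R w KQkq - 0 3 eval=-284 nodes=843868
rnbqk1nr/ppppbppp/4p3/8/3P4/5N2/PPP1PPPP/R1BQKB1R w KQkq - 2 3 eval=-376 nodes=948424
"""

if __name__ == "__main__":
    n, mean, sd = eval_stats(LOG.splitlines())
    print(f"epds={n} mean={mean:.1f} st dev={sd:.1f}")
```

Fed the full 101-line listing, this should reproduce the figures above if the `ddof=0` assumption holds; `statistics.stdev` (the sample estimate) would come out slightly larger.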
I wasn't trying to insult, I thought you had just run the initial position at knight odds to determine the score to center on, and since it was off by a full pawn, it looked like a typo. I think what is happening is that Black is far more likely to blunder on move 2 than White is, since White will have an extra unit developed, so the average position includes many where Black has blundered a pawn on his second move, thus reducing White's score deficit by a pawn. Since White can't blunder anything on ply 1, it would have been about fair if you did this for three plies rather than four, since each side would have one chance to blunder after the opponent had made a move. So as it is the positions are a valid set of handicap positions, but they are not on average close to knight odds, maybe something like knight minus half a pawn or so.
Komodo rules!
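Larry's counting argument can be made concrete with a toy model (an illustrative assumption, not something either poster ran): count a move as a "chance to blunder" only if it is played in reply to an opponent move, so White's very first ply never counts. The hypothetical helper `reply_moves` tallies those chances per side for a given number of plies.

```python
def reply_moves(plies):
    """Per-side count of moves made after the opponent has already moved.

    Ply 1 is White's opening move, which replies to nothing, so counting
    starts at ply 2. Odd plies belong to White, even plies to Black.
    Returns (white_chances, black_chances).
    """
    white = sum(1 for ply in range(2, plies + 1) if ply % 2 == 1)
    black = sum(1 for ply in range(2, plies + 1) if ply % 2 == 0)
    return white, black


if __name__ == "__main__":
    for plies in (3, 4):
        print(plies, "plies -> (White, Black) =", reply_moves(plies))
```

Under this model four plies give Black two such chances to White's one, which is the asymmetry described above; three plies even them out, at the cost of producing positions with Black to move.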

Rebel
Posts: 5518
Joined: Thu Aug 18, 2011 10:04 am

Re: Stockfish Handicap Matches

Post by Rebel » Mon Jun 22, 2020 7:36 pm

chrisw wrote:
Mon Jun 22, 2020 6:54 pm
Rebel wrote:
Mon Jun 22, 2020 6:12 pm
lkaufman wrote:
Mon Jun 22, 2020 4:48 pm
Rebel wrote:
Mon Jun 22, 2020 4:35 pm
lkaufman wrote:
Mon Jun 22, 2020 3:30 pm
Rebel wrote:
Mon Jun 22, 2020 6:05 am
@Larry - Ok, for the moment we do it the classic way with repeating colors. I took the first 200 positions from Chris's set, so 400 games, tc=40/10.

Code:

Komodo 10 - Fruit 2.1 83.9% WLD 315 44 41
Komodo 14 - Fruit 2.1 90.4% WLD 347 24 29
This is how you want to measure progress, isn't it?
Yes, but it looks like you are having each engine play the White side once, which is a waste of resources when the engines are far apart in strength. Presumably the stronger engine wins every game with Black a piece up, so it's easy enough to subtract 200 from the wins for the stronger engine to see the results when giving knight odds. I think that the reason that Komodo still beats fruit even with the 200 removed is that there was a flaw in the reference value used (see prior post), so these positions weren't even close to knight odds. With everything done right, I expect Fruit to come out ahead against both Stockfish and Komodo at knight odds.
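The adjustment suggested here is simple arithmetic on the W/L/D records above. The sketch assumes, as the post does, that the stronger engine won all 200 games in which it had the extra knight, so those games are removed as wins; the helper names are made up for illustration.

```python
def score_pct(wins, losses, draws):
    """Percentage score from a win/loss/draw record, draws counting half."""
    games = wins + losses + draws
    return 100.0 * (wins + 0.5 * draws) / games


def handicap_only_pct(wins, losses, draws, free_wins=200):
    """Score in the knight-odds games only, assuming every game played
    with the extra knight was a win for the stronger engine."""
    return score_pct(wins - free_wins, losses, draws)


if __name__ == "__main__":
    for name, wld in (("Komodo 10", (315, 44, 41)), ("Komodo 14", (347, 24, 29))):
        print(f"{name}: all games {score_pct(*wld):.1f}%, "
              f"knight odds only {handicap_only_pct(*wld):.2f}%")
```

On these records Komodo 10 drops from 83.9% to 67.75% and Komodo 14 from 90.4% to 80.75% once the 200 assumed wins are removed, so both would still be scoring well above 50% in the knight-odds games.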
1. I don't see the purpose of giving Komodo an extra knight against Fruit. So only 200 games (Fruit always a knight up), now that Ferdy has fixed the cutechess obstacle. But of course you can do it yourself.

2. My guess would be that Fruit would do a lot better at a longer time control, 40/60 instead of 40/10; 40/10 is not a good time control for the oldies. I will run it now, stay tuned.
Yes, 1. was my point, it was wasting resources. Yes, the weaker engine always does better with handicap with more time. But the main problem is that the positions are a full pawn off from knight odds. With a corrected set, even at 40/10 I'll bet on Fruit. The difference of 100 centipawns in the initial position is huge.
tc=40/60 from Chris's knight-odds set.

Code:

Rank Name                          Elo     +/-   Games   Score    Draw
   0 Komodo_14                     228      50     200   78.8%   19.5%
   1 Benjamin                     -186      67     100   25.5%   23.0%
   2 Fruit_2.1                    -275      80     100   17.0%   16.0%
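The Elo column in this table is consistent with the usual logistic conversion from a score fraction s, Elo = -400 * log10(1/s - 1), applied to each engine's Score column. A minimal sketch (the function name is illustrative, and the +/- error margins are not reproduced here):

```python
import math


def elo_from_score(score):
    """Elo difference implied by a score fraction, 0 < score < 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


if __name__ == "__main__":
    for name, score in (("Komodo_14", 0.788), ("Benjamin", 0.255), ("Fruit_2.1", 0.170)):
        print(f"{name:10s} {elo_from_score(score):7.0f}")
```

This gives about 228, -186 and -275, matching the table to the nearest point. Komodo's 78.8% is pooled over both opponents, so its figure is relative to that pool rather than to either engine individually.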
Looks like a pretty good result.

Uploaded a bunch more, including pawn odds. Pawn-odds games can get quite wild, so they might be good for detecting wild attacking engines.

pawn-f2-odds.epd
queen-for-nite-odds.epd
queen-for-rook-odds.epd
queen-odds.epd
rook-odds.epd
knight-odds.epd
no-castling-odds.epd

https://github.com/ChrisWhittington/Chess-EPDs
Quick Stockfish gauntlet test tc=40/10

Code:

No. Engine             1     2     3     4     5  Score  Games   Perc   Moves
-----------------------------------------------------------------------------
 1 Komodo_14       xxxxx  94.5   0.0   0.0   0.0   94.5 /  100 (94.50%)  55.5  
 2 Stockfish_11      5.5 xxxxx  15.0  17.5  50.0   88.0 /  400 (22.00%)  62.0  
 3 Houdini_6.03      0.0  85.0 xxxxx   0.0   0.0   85.0 /  100 (85.00%)  63.0  
 4 Laser_1.7         0.0  82.5   0.0 xxxxx   0.0   82.5 /  100 (82.50%)  62.2  
 5 Arasan_22         0.0  50.0   0.0   0.0 xxxxx   50.0 /  100 (50.00%)  67.3  
90% of coding is debugging, the other 10% is writing bugs.
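Because this is a gauntlet, only the pairings involving Stockfish_11 were played; the 0.0 cells are presumably unplayed pairings rather than 0-100 results. Under that assumption the crosstable can be checked for consistency: each pairing's points should sum to the 100 games played, and each row total should equal the sum of its played pairings. The data layout and the `crosstable_totals` name are illustrative.

```python
# Points each engine scored against each opponent it actually played,
# read off the crosstable above (unplayed 0.0 cells are omitted).
CROSS = {
    "Komodo_14": {"Stockfish_11": 94.5},
    "Stockfish_11": {"Komodo_14": 5.5, "Houdini_6.03": 15.0,
                     "Laser_1.7": 17.5, "Arasan_22": 50.0},
    "Houdini_6.03": {"Stockfish_11": 85.0},
    "Laser_1.7": {"Stockfish_11": 82.5},
    "Arasan_22": {"Stockfish_11": 50.0},
}


def crosstable_totals(cross, games_per_pair=100):
    """Return {engine: (points, games, percent)}, checking each pairing."""
    totals = {}
    for engine, row in cross.items():
        for opponent, points in row.items():
            # Both sides of one pairing must share all games_per_pair points.
            if points + cross[opponent][engine] != games_per_pair:
                raise ValueError(f"{engine} vs {opponent} does not add up")
        points = sum(row.values())
        games = games_per_pair * len(row)
        totals[engine] = (points, games, 100.0 * points / games)
    return totals


if __name__ == "__main__":
    for engine, (points, games, pct) in crosstable_totals(CROSS).items():
        print(f"{engine:13s} {points:5.1f} / {games:3d} ({pct:.2f}%)")
```

The Stockfish row comes out at 88.0 / 400 (22.00%) and every pairing sums to 100, so the table is internally consistent.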

chrisw
Posts: 3674
Joined: Tue Apr 03, 2012 2:28 pm

Re: Stockfish Handicap Matches

Post by chrisw » Mon Jun 22, 2020 7:44 pm

lkaufman wrote:
Mon Jun 22, 2020 7:15 pm
chrisw wrote:
Mon Jun 22, 2020 6:42 pm
lkaufman wrote:
Mon Jun 22, 2020 3:23 pm
chrisw wrote:
Mon Jun 22, 2020 7:20 am
lkaufman wrote:
Mon Jun 22, 2020 3:13 am
chrisw wrote:
Sun Jun 21, 2020 10:42 pm
lkaufman wrote:
Sun Jun 21, 2020 10:30 pm
Rebel wrote:
Sun Jun 21, 2020 10:09 pm
chrisw wrote:
Sun Jun 21, 2020 9:20 pm
Done 5600 EPDs off the start position minus b1 knight, played out all four ply combinations, culled all duplicates, culled all positions where SF11 evaluated more than +/-10 centipawns away from 300 centipawns (SF11 average score for all epds), and am now left with 5600 EPDs.

Link: https://github.com/ChrisWhittington/Che ... t-odds.epd

Will upload for no knight at g1 tomorrow am.


I checked several positions and they showed scores of 4 pawns or more down, as does the initial knight odds position, not -300.
You checked the listed epds for variance from -300? Which ones? What was the variance? What SF11 conditions are you using?

If you just want under 20 positions, take off the b1 knight, choose the 3 best White moves by SF11 multiPV, chose the 3 best replies to each of those, and repeat with g1 knight off. 18 positions, totally fair, no silly moves, real knight odds chess! If you want more just choose best 4 or best five for each side.
Edit: Whoops, rechecked, the mean is indeed -293, but I was selecting EPDs based on -340, so you should find the listed EPDs evaluate at around -340 centipawns. Will correct the EPD dump tomorrow am.
OK, -340 is at least within range of what I was seeing. With fixed depth searches in the range of what you were using I get evals like -390 or so, but I think fixed depth omits Contempt while movetime does not, so we're not so far apart if you add in Contempt. You were right to switch from 10 ms to 25; SF is quite weak at 10ms but strong enough at 25, the difference is huge. But I think it would be more useful to have fewer positions but no positions with ridiculous moves played, maybe only including moves in the top ten by multipv at each point for example. It doesn't seem like a simulation of knight odds if you force the players to play moves that no one over 800 rating would even consider.
Uploaded just now the knight-odds epds for positions without b1/g1 knights.

https://github.com/ChrisWhittington/Chess-EPDs and download knight-odds.epd

They're better balanced around the correct mean (-293 centipawns), and restricted to all 4-ply positions where SF11 at 25ms search returns a score of -293 =/- 10 centipawns. About 5000 unique positions in total.

Random subset:

Code: Select all

rnbqkb1r/p1pppppp/7n/1p6/3P4/1P6/P1P1PPPP/RNBQKB1R w KQkq - 0 3
r1bqkbnr/pppp1ppp/n7/4p3/3P4/5N2/PPP1PPPP/R1BQKB1R w KQkq - 0 3
rnbqk1nr/ppppbppp/4p3/8/3P4/5N2/PPP1PPPP/R1BQKB1R w KQkq - 2 3
r1bqkb1r/pppppppp/2n4n/1B6/4P3/8/PPPP1PPP/RNBQK2R w KQkq - 3 3
rnbqkbnr/pppppppp/8/8/8/5N1P/PPPPPPP1/R1BQKB1R w KQkq - 1 3
rnbqkb1r/p1pppppp/7n/1p6/4P3/5Q2/PPPP1PPP/R1B1KBNR w KQkq - 0 3
r1bqkbnr/ppppp1pp/2n5/5p2/4P3/2P5/PP1P1PPP/RNBQKB1R w KQkq - 0 3
rnbqkb1r/p1pppppp/1p5n/7Q/4P3/8/PPPP1PPP/RNB1KB1R w KQkq - 0 3
r1bqkbnr/ppppp1pp/n4p2/8/1P6/B7/P1PPPPPP/RN1QKB1R w KQkq - 2 3
r1bqkbnr/ppppp1pp/n4p2/8/2P5/P7/1P1PPPPP/R1BQKBNR w KQkq - 1 3
r1bqkbnr/p1pppppp/np6/P7/8/8/1PPPPPPP/R1BQKBNR w KQkq - 0 3
rnbqkbnr/pppp1p1p/4p3/6p1/7P/3P4/PPP1PPP1/RNBQKB1R w KQkq - 0 3
r1bqkbnr/ppppp1pp/n4p2/8/8/3P4/PPPBPPPP/RN1QKB1R w KQkq - 2 3
rnbqkb1r/1ppppppp/7n/p7/4P3/5P2/PPPP2PP/RNBQKB1R w KQkq - 0 3
r1bqkbnr/ppppp1pp/n4p2/1B6/4P3/8/PPPP1PPP/RNBQK2R w KQkq - 2 3
r1bqkbnr/ppppp1pp/2n2p2/8/8/P1N5/1PPPPPPP/R1BQKB1R w KQkq - 0 3
1nbqkbnr/rppppppp/8/p7/8/3BP3/PPPP1PPP/RNBQK2R w KQk - 2 3
rnbqkbn1/pppppppr/8/7p/8/2N3P1/PPPPPP1P/R1BQKB1R w KQq - 2 3
r1bqkbnr/ppppp1pp/n4p2/8/4P3/6P1/PPPP1P1P/R1BQKBNR w KQkq - 0 3
rnbq1bnr/pppkpppp/8/3p4/8/4PP2/PPPP2PP/R1BQKBNR w KQ - 1 3
rnbq1bnr/pppkpppp/8/3p4/8/4P1P1/PPPP1P1P/R1BQKBNR w KQ - 1 3
rn1qkbnr/p1pppppp/b7/1p6/1P6/4P3/P1PP1PPP/RNBQKB1R w KQkq - 1 3
rnbqkbnr/1ppppp1p/8/p5p1/Q7/2P5/PP1PPPPP/RNB1KB1R w KQkq - 0 3
r1bqkbnr/ppppppp1/n7/4P2p/8/8/PPPP1PPP/R1BQKBNR w KQkq - 0 3
rnbqkbnr/p1ppppp1/7p/1P6/8/8/PP1PPPPP/R1BQKBNR w KQkq - 0 3
rnbqkbnr/ppppp2p/5p2/6p1/3P4/4B3/PPP1PPPP/RN1QKB1R w KQkq - 0 3
r1bqkbnr/pppppp1p/n7/6p1/8/1PN5/P1PPPPPP/R1BQKB1R w KQkq - 1 3
r1bqkbnr/p1pppppp/2n5/1p6/1P2P3/8/P1PP1PPP/R1BQKBNR w KQkq - 1 3
rnbqkb1r/ppppp1pp/7n/5p2/7P/3P4/PPP1PPP1/RNBQKB1R w KQkq - 0 3
rnbqkb1r/p1pppppp/7n/1p6/2P5/7P/PP1PPPP1/R1BQKBNR w KQkq - 0 3
rnbqk1nr/pppp1ppp/8/2b1p3/8/P4N2/1PPPPPPP/R1BQKB1R w KQkq - 2 3
It's swings and roundabout on position selection. If we did your method (which I'm not going to because there's a limit as to how much time I'm prepared to put in on this), we'ld get, at four ply 10x10x10x10 = 10000 positions, as opposed to the actual number from brute force which was 170000 unique, with 5000 within +/- 10 centipawns of the mean. So, your 10000 would be guaranteed to contain positions already divergent in score beyond the +/- 10 centipawn window. If we recap the objective, it was to have a good number of test positions (which, for me, includes the possibility to do mass testing of 1000's of games at bullet/blitz) with good variance, no duplicates, no great difference from one knight handicap and as close as feasible to the start position. The selection method is entirely without bias, it doesn't matter HOW the positions were arrived at, just as long as they are close to initial and neither engine gets a head-start in any position. That's done. Arguably perfectly. Engines are then left to fight it out from unbiased, wide, close to root start positions.
Your argument to use 10 wide selections for 4 ply is going to produce fewer positions, many of which are going to be already knight odds plus something way more than 10 centipawns because of the way your method chooses, and therefore, I would argue, less satisfying the original objective.

Anyway, since everything now appears to work, I'll leave this thing running, and produce a few thousand of each - bishop odds, rook odds and queen odds. Can SF11 beat Fruit at queen odds? that's be fun. Personally I doubt it, but we'll see.
I'm afraid it's clear that you made a typo or misread a number. You give the eval for the position with g1 off as -2.93 after 25 ms, which you use for your sample. I ran the same position, and got -3.94. Of course computers aren't all the same speed so some variation is to be expected, but the score only fluctuates a few centiply. It's pretty obvious that when you ran it you got -3.93 but either misread or mistyped the 3 as a 2. So the result is that you have a bunch of positions where Black is playing blunders that lose a pawn or similar positional score to drop from a 3.93 edge to a 2.93 edge. So nowhere near knight odds!. For the b1 off position I got -3.73. Not sure what you got for that position.
It’s a bit cheap throwing out insults about typos or misreading or whatever, when actually you are arguing method (albeit by other means).

The figures back from SF are accurate, I just rechecked them. No typos, no misreading.
I guess you decide knights odds games are 3.73 on basis you put the start position into SF and asked for a score? Sure, SF will find the supposed best line and evaluate it.

I’m doing something different, I am asking SF to evaluate every single position that arises from the start position after four moves. Several tens of thousands of positions where each side has had the same move opportunities (two moves each) to made boobies or brilliancies. The net effect of all these thousands of moves is to generate thousands of positions, each then evaluated by SF, with a mean eval of about -3.00 pawns. To be fair to both black and white, I then took everything that centred on that -300 centipawns, about 10% or so of the total.

You’re saying that’s wrong because according to SF at the root position, knights odds = -3.87. And because nobody would play the 4-ply move sequences. Well, so what? You lost sight of the objective. Generate a large unbiased set of positions to evaluate how different engines get on with “knights odds”. Generate positions close to the root. Generate positions where material and a further SF search show that neither sides chances changed much from their chances at root zero.
Well, that’s done. Thanks me very much. It’s a pleasure. No problem at all. Have fun with them.


Just for fun, not really wasting my time reproofing things, the SF11 500ms evaluations of the first 100 positions in the knights odds suites, below.

For that subset, Mean = -320.2, st deviation = 29.5. Looks like a fine result from here.

Code: Select all

Loading epds knight-odds.epd
SF11, default conditions, movetime 500
rnbqkb1r/p1pppppp/7n/1p6/3P4/1P6/P1P1PPPP/RNBQKB1R w KQkq - 0 3 eval=-314 nodes=835263
r1bqkbnr/pppp1ppp/n7/4p3/3P4/5N2/PPP1PPPP/R1BQKB1R w KQkq - 0 3 eval=-284 nodes=843868
rnbqk1nr/ppppbppp/4p3/8/3P4/5N2/PPP1PPPP/R1BQKB1R w KQkq - 2 3 eval=-376 nodes=948424
r1bqkb1r/pppppppp/2n4n/1B6/4P3/8/PPPP1PPP/RNBQK2R w KQkq - 3 3 eval=-364 nodes=911584
rnbqkbnr/pppppppp/8/8/8/5N1P/PPPPPPP1/R1BQKB1R w KQkq - 1 3 eval=-356 nodes=924862
rnbqkb1r/p1pppppp/7n/1p6/4P3/5Q2/PPPP1PPP/R1B1KBNR w KQkq - 0 3 eval=-248 nodes=877730
r1bqkbnr/ppppp1pp/2n5/5p2/4P3/2P5/PP1P1PPP/RNBQKB1R w KQkq - 0 3 eval=-311 nodes=924917
rnbqkb1r/p1pppppp/1p5n/7Q/4P3/8/PPPP1PPP/RNB1KB1R w KQkq - 0 3 eval=-354 nodes=888675
r1bqkbnr/ppppp1pp/n4p2/8/1P6/B7/P1PPPPPP/RN1QKB1R w KQkq - 2 3 eval=-344 nodes=892030
r1bqkbnr/ppppp1pp/n4p2/8/2P5/P7/1P1PPPPP/R1BQKBNR w KQkq - 1 3 eval=-336 nodes=861229
r1bqkbnr/p1pppppp/np6/P7/8/8/1PPPPPPP/R1BQKBNR w KQkq - 0 3 eval=-352 nodes=862023
rnbqkbnr/pppp1p1p/4p3/6p1/7P/3P4/PPP1PPP1/RNBQKB1R w KQkq - 0 3 eval=-253 nodes=904501
r1bqkbnr/ppppp1pp/n4p2/8/8/3P4/PPPBPPPP/RN1QKB1R w KQkq - 2 3 eval=-297 nodes=880860
rnbqkb1r/1ppppppp/7n/p7/4P3/5P2/PPPP2PP/RNBQKB1R w KQkq - 0 3 eval=-321 nodes=909321
r1bqkbnr/ppppp1pp/n4p2/1B6/4P3/8/PPPP1PPP/RNBQK2R w KQkq - 2 3 eval=-284 nodes=954091
r1bqkbnr/ppppp1pp/2n2p2/8/8/P1N5/1PPPPPPP/R1BQKB1R w KQkq - 0 3 eval=-328 nodes=881852
1nbqkbnr/rppppppp/8/p7/8/3BP3/PPPP1PPP/RNBQK2R w KQk - 2 3 eval=-384 nodes=950046
rnbqkbn1/pppppppr/8/7p/8/2N3P1/PPPPPP1P/R1BQKB1R w KQq - 2 3 eval=-325 nodes=882350
r1bqkbnr/ppppp1pp/n4p2/8/4P3/6P1/PPPP1P1P/R1BQKBNR w KQkq - 0 3 eval=-307 nodes=906494
rnbq1bnr/pppkpppp/8/3p4/8/4PP2/PPPP2PP/R1BQKBNR w KQ - 1 3 eval=-300 nodes=954439
rnbq1bnr/pppkpppp/8/3p4/8/4P1P1/PPPP1P1P/R1BQKBNR w KQ - 1 3 eval=-311 nodes=932958
rn1qkbnr/p1pppppp/b7/1p6/1P6/4P3/P1PP1PPP/RNBQKB1R w KQkq - 1 3 eval=-317 nodes=961294
rnbqkbnr/1ppppp1p/8/p5p1/Q7/2P5/PP1PPPPP/RNB1KB1R w KQkq - 0 3 eval=-272 nodes=917542
r1bqkbnr/ppppppp1/n7/4P2p/8/8/PPPP1PPP/R1BQKBNR w KQkq - 0 3 eval=-318 nodes=928039
rnbqkbnr/p1ppppp1/7p/1P6/8/8/PP1PPPPP/R1BQKBNR w KQkq - 0 3 eval=-350 nodes=920456
rnbqkbnr/ppppp2p/5p2/6p1/3P4/4B3/PPP1PPPP/RN1QKB1R w KQkq - 0 3 eval=-300 nodes=894388
r1bqkbnr/pppppp1p/n7/6p1/8/1PN5/P1PPPPPP/R1BQKB1R w KQkq - 1 3 eval=-302 nodes=915277
r1bqkbnr/p1pppppp/2n5/1p6/1P2P3/8/P1PP1PPP/R1BQKBNR w KQkq - 1 3 eval=-357 nodes=907541
rnbqkb1r/ppppp1pp/7n/5p2/7P/3P4/PPP1PPP1/RNBQKB1R w KQkq - 0 3 eval=-304 nodes=996995
rnbqkb1r/p1pppppp/7n/1p6/2P5/7P/PP1PPPP1/R1BQKBNR w KQkq - 0 3 eval=-354 nodes=895685
rnbqk1nr/pppp1ppp/8/2b1p3/8/P4N2/1PPPPPPP/R1BQKB1R w KQkq - 2 3 eval=-376 nodes=904702
r1bqkbnr/p1pppppp/n7/1p6/1P2P3/8/P1PP1PPP/R1BQKBNR w KQkq - 0 3 eval=-308 nodes=907345
r1bqkbnr/ppppppp1/n6B/8/3P4/8/PPP1PPPP/R2QKBNR w KQkq - 1 3 eval=-299 nodes=903610
r1bqkbnr/ppppp1pp/2n2p2/8/3P4/P7/1PP1PPPP/RNBQKB1R w KQkq - 0 3 eval=-330 nodes=885982
rnbqkbnr/p1ppp1pp/5p2/1p6/1P3P2/8/P1PPP1PP/R1BQKBNR w KQkq - 0 3 eval=-331 nodes=899112
rnbqkbnr/2pppppp/8/pp6/3P4/5P2/PPP1P1PP/RNBQKB1R w KQkq - 0 3 eval=-334 nodes=923006
r1bqkbnr/pppp1ppp/n7/4p3/3P3P/8/PPP1PPP1/R1BQKBNR w KQkq - 1 3 eval=-333 nodes=915197
rnbqkbnr/p1ppppp1/8/1p5p/8/P1N5/1PPPPPPP/R1BQKB1R w KQkq - 0 3 eval=-328 nodes=894117
r1bqkbnr/1ppppppp/n7/p7/4P3/3P4/PPP2PPP/R1BQKBNR w KQkq - 1 3 eval=-338 nodes=941629
rnbqkbnr/p1pppp1p/6p1/1p6/4P3/P7/1PPP1PPP/R1BQKBNR w KQkq - 0 3 eval=-312 nodes=899001
r1bqkbnr/pppppp1p/n7/6p1/2P5/5P2/PP1PP1PP/RNBQKB1R w KQkq - 1 3 eval=-349 nodes=911472
rnbqkb1r/pppppp1p/5n2/6p1/8/3P3P/PPP1PPP1/R1BQKBNR w KQkq - 0 3 eval=-315 nodes=919954
1nbqkbnr/rppppppp/8/p7/4P3/8/PPPPQPPP/RNB1KB1R w KQk - 2 3 eval=-328 nodes=929187
rnbqkbn1/pppppppr/8/7p/8/3P2P1/PPP1PP1P/R1BQKBNR w KQq - 1 3 eval=-335 nodes=927437
r1bqkbnr/pppppp1p/2n5/6p1/3P4/8/PPP1PPPP/RNBQKBR1 w Qkq - 1 3 eval=-276 nodes=894955
rnbqkb1r/p1pppppp/7n/1p6/2P5/P7/1P1PPPPP/R1BQKBNR w KQkq - 1 3 eval=-293 nodes=895381
rnbqkb1r/pppppp1p/7n/6p1/4P3/5Q2/PPPP1PPP/R1B1KBNR w KQkq - 2 3 eval=-274 nodes=910466
rnbqkbr1/pppppppp/7n/8/8/3P3P/PPP1PPP1/RNBQKB1R w KQq - 1 3 eval=-321 nodes=907502
rnbqkbnr/ppp1pp1p/8/3p2p1/7P/8/PPPPPPPR/R1BQKBN1 w Qkq - 0 3 eval=-300 nodes=917633
r1bqkbnr/p1pppppp/1pn5/8/4P2P/8/PPPP1PP1/R1BQKBNR w KQkq - 1 3 eval=-326 nodes=899245
rnbq1bnr/pppkpppp/3p4/8/8/P7/1PPPPPPP/RNBQKBR1 w Q - 2 3 eval=-318 nodes=979196
rnbq1bnr/pppppkpp/5p2/8/8/4P2P/PPPP1PP1/R1BQKBNR w KQ - 1 3 eval=-303 nodes=985615
rnbqkbnr/p1ppp1pp/5p2/1p6/3P4/5P2/PPP1P1PP/R1BQKBNR w KQkq - 0 3 eval=-290 nodes=901600
rnbq1bnr/pppkpppp/3p4/8/8/3P4/PPPQPPPP/RNB1KB1R w KQ - 2 3 eval=-319 nodes=935623
r1bqkbnr/p1pppppp/2n5/1p6/4P3/3B4/PPPP1PPP/R1BQK1NR w KQkq - 2 3 eval=-285 nodes=910467
rnbqkbnr/pp1ppp1p/2p5/6p1/P7/3P4/1PP1PPPP/R1BQKBNR w KQkq - 0 3 eval=-295 nodes=929138
r1bqkbnr/p1pppppp/np6/8/4P2P/8/PPPP1PP1/RNBQKB1R w KQkq - 1 3 eval=-308 nodes=917646
rnbqkbnr/ppppp1p1/5p1p/8/8/2P2P2/PP1PP1PP/R1BQKBNR w KQkq - 0 3 eval=-314 nodes=936621
rnbqkb1r/pppppppp/8/8/3P2n1/2N5/PPP1PPPP/R1BQKB1R w KQkq - 3 3 eval=-309 nodes=906583
r1bqkbnr/pppppp1p/n7/6p1/8/P6P/1PPPPPP1/RNBQKB1R w KQkq - 1 3 eval=-306 nodes=890932
1nbqkbnr/rppppppp/p7/8/4P3/7P/PPPP1PP1/RNBQKB1R w KQk - 1 3 eval=-341 nodes=931251
1nbqkbnr/rppppppp/8/p7/8/P4N2/1PPPPPPP/R1BQKB1R w KQk - 1 3 eval=-357 nodes=929376
rn1qkbnr/p1pppppp/b7/1p2P3/8/8/PPPP1PPP/RNBQKB1R w KQkq - 1 3 eval=-331 nodes=922525
rnbqkbnr/ppppp1p1/5p1p/8/7P/8/PPPPPPP1/1RBQKBNR w Kkq - 0 3 eval=-295 nodes=950421
rnbqkbnr/ppp1pp1p/3p4/6p1/8/P6N/1PPPPPPP/R1BQKB1R w KQkq - 0 3 eval=-316 nodes=914793
rnbqkb1r/pppp1ppp/5n2/4p3/5P2/2P5/PP1PP1PP/RNBQKB1R w KQkq - 0 3 eval=-412 nodes=919575
rnbqkb1r/pppp1ppp/4p2n/8/3P4/2P5/PP2PPPP/R1BQKBNR w KQkq - 1 3 eval=-349 nodes=903832
rnbqkb1r/pppp1ppp/7n/4p2Q/8/4P3/PPPP1PPP/RNB1KB1R w KQkq - 2 3 eval=-268 nodes=933657
rnbqkb1r/1ppppppp/p6n/8/3P4/2P5/PP2PPPP/RNBQKB1R w KQkq - 0 3 eval=-330 nodes=954821
r1bqkbnr/pppp1ppp/n7/4p3/3P4/7P/PPP1PPP1/R1BQKBNR w KQkq - 0 3 eval=-307 nodes=925412
rnbqkbnr/p1pppppp/8/1p6/1P6/4P3/P1PP1PPP/RNBQKB1R w KQkq - 0 3 eval=-348 nodes=922373
rnbqkb1r/ppppppp1/7n/7p/2P5/7P/PP1PPPP1/RNBQKB1R w KQkq - 1 3 eval=-331 nodes=893674
rnbqkb1r/pp1ppppp/2p4n/8/2P5/2N5/PP1PPPPP/R1BQKB1R w KQkq - 2 3 eval=-348 nodes=936309
rnbqk1nr/pppppp1p/7b/6p1/3P4/3Q4/PPP1PPPP/R1B1KBNR w KQkq - 2 3 eval=-314 nodes=895006
r1bqkbnr/pppppppp/8/8/1n2P3/5P2/PPPP2PP/RNBQKB1R w KQkq - 1 3 eval=-351 nodes=911614
rnbq1bnr/pppkpppp/3p4/8/8/3P1P2/PPP1P1PP/R1BQKBNR w KQ - 1 3 eval=-295 nodes=977138
rnbqkbnr/1ppppp1p/8/p5p1/5P2/7N/PPPPP1PP/R1BQKB1R w KQkq - 0 3 eval=-303 nodes=933937
r1bqkb1r/pppppppp/n6n/8/2P5/5N2/PP1PPPPP/R1BQKB1R w KQkq - 3 3 eval=-325 nodes=911263
rnbqkb1r/ppppp1pp/7n/5p2/3P4/8/PPPNPPPP/R1BQKB1R w KQkq - 0 3 eval=-321 nodes=930623
rnbqkb1r/p1pppppp/7n/1p6/P7/8/1PPPPPPP/R1BQKBNR w KQkq - 1 3 eval=-281 nodes=895926
rnbqkb1r/p1pppppp/7n/1p6/8/1QP5/PP1PPPPP/RNB1KB1R w KQkq - 2 3 eval=-335 nodes=904891
1nbqkbnr/1ppppppp/r7/p7/1PP5/8/P2PPPPP/RNBQKB1R w KQk - 1 3 eval=-356 nodes=881990
rnbqkbnr/ppp2ppp/8/3pP3/8/8/PPP1PPPP/RNBQKB1R w KQkq d6 0 3 eval=-342 nodes=945740
rnbqkbnr/1pppp1pp/p4p2/8/3P4/8/PPP1PPPP/RNBQKB1R w KQkq - 0 3 eval=-347 nodes=921882
rnbqkb1r/pppppp1p/5n2/6p1/3P1P2/8/PPP1P1PP/RNBQKB1R w KQkq - 1 3 eval=-292 nodes=891075
rnb1kbnr/pppqpppp/8/3p4/4P3/2N5/PPPP1PPP/R1BQKB1R w KQkq - 2 3 eval=-292 nodes=900883
rnbq1bnr/pppppkpp/5p2/8/8/3P3N/PPP1PPPP/R1BQKB1R w KQ - 2 3 eval=-336 nodes=966960
r1bqkbnr/pppppp1p/2n5/6p1/7P/4P3/PPPP1PP1/RNBQKB1R w KQkq - 1 3 eval=-302 nodes=899886
rnbqkbnr/pppppppp/8/8/4P3/7P/PPPP1PP1/RNBQKB1R w KQkq - 1 3 eval=-324 nodes=924363
r1bqkbnr/pppppp1p/n7/6p1/8/N2P4/PPP1PPPP/R1BQKB1R w KQkq - 0 3 eval=-279 nodes=929880
rnbqkbnr/ppppp1p1/5p1p/8/P7/1P6/2PPPPPP/RNBQKB1R w KQkq - 0 3 eval=-314 nodes=932199
rnbqkbnr/ppp1pppp/8/3p4/3PP3/8/PPP2PPP/RNBQKB1R w KQkq - 0 3 eval=-351 nodes=931440
rnbqkbr1/pppppppp/7n/8/1P6/6P1/P1PPPP1P/RNBQKB1R w KQq - 1 3 eval=-327 nodes=922162
rnbq1bnr/pppkpppp/3p4/8/8/1PP5/P2PPPPP/R1BQKBNR w KQ - 1 3 eval=-257 nodes=952343
rnbqkbnr/p1ppp1pp/5p2/1p6/8/2PP4/PP2PPPP/R1BQKBNR w KQkq - 0 3 eval=-273 nodes=929893
rnbqkbnr/pppppppp/8/8/3P4/7P/PPP1PPP1/RNBQKB1R w KQkq - 1 3 eval=-343 nodes=924330
rnbqkbr1/pppppppp/5n2/8/1P2P3/8/P1PP1PPP/R1BQKBNR w KQq - 1 3 eval=-376 nodes=936465
rnbqkbnr/pppppp2/8/6pp/8/2P5/PPQPPPPP/R1B1KBNR w KQkq - 0 3 eval=-304 nodes=921110
rnbq1bnr/pppppkpp/5p2/8/8/6PN/PPPPPP1P/R1BQKB1R w KQ - 2 3 eval=-300 nodes=945856
r1bqkbnr/ppppp1pp/n4p2/8/8/2P2P2/PP1PP1PP/R1BQKBNR w KQkq - 0 3 eval=-348 nodes=949560
r1bqkbnr/ppppppp1/n7/7p/2B1P3/8/PPPP1PPP/RNBQK2R w KQkq - 2 3 eval=-341 nodes=917983

SF mean eval -320.2
epds=101 numpy.mean=-320.1782178217822 st dev=29.494888310142848
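For anyone who wants to recompute the summary line from a log in this format, here is a minimal Python sketch. It assumes each position line carries an `eval=<centipawns>` field as above. `eval_stats` is a hypothetical helper, not chrisw's actual script. It reports the population standard deviation (the default for `numpy.std`, ddof=0), and it is an assumption that "st dev" above was computed that way.

```python
import re
import statistics

# Matches the "eval=<int>" field that follows each EPD in the log above.
EVAL_RE = re.compile(r"\beval=(-?\d+)\b")

def eval_stats(lines):
    """Return (count, mean, population stdev) of the eval= fields in lines."""
    evals = []
    for line in lines:
        m = EVAL_RE.search(line)
        if m:  # lines without an eval= field (headers, summaries) are skipped
            evals.append(int(m.group(1)))
    if not evals:
        raise ValueError("no eval= fields found")
    # pstdev divides by n, matching numpy.std's default (ddof=0)
    return len(evals), statistics.mean(evals), statistics.pstdev(evals)
```

Over about 100 positions, the sample (n-1) and population versions differ by roughly half a percent, so the choice hardly matters here.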
I wasn't trying to insult you. I thought you had just run the initial position at knight odds to determine the score to center on, and since it was off by a full pawn, it looked like a typo.
Well, you thought wrong. Helpful advice when trying to decode other people: I am not you.

I think what is happening is that Black is far more likely to blunder on move 2 than White is, since White will have an extra unit developed, so the average position includes many where Black has blundered a pawn on his second move, thus reducing White's score deficit by a pawn. Since White can't blunder anything on ply 1, it would have been about fair if you did this for three plies rather than four, since each side would have one chance to blunder after the opponent had made a move.
This is getting silly. f4 e5 g4 etc. Each side has two chances to make good or bad positional move choices. It should all average out. a3 b6 Nh3 a5 etc. The important feature is that all the positions give the side a knight down roughly the same chances; we did that by culling the eval outliers and centering on the mean. Then they can form a coherent test suite.
So as it is the positions are a valid set of handicap positions,
Oh, thank you very belatedly much.
but they are not on average close to knight odds,
they are a mass of positions where White has one knight less, very close to the start position, and without either side being able (according to SF11 search proof) to press an immediate advantage. As such, they form a fine, unbiased set of thousands of positions, as balanced as the SF-proofed algorithm can make them, for testing knight advantage when given to a 'lesser' engine. That was the original idea: make a large, unbiased test suite with as much play in it as possible to test how engines have developed over the years (for fun, btw; this is not a university research department investigating something important).
maybe something like knight minus half a pawn or so.
Shrugs. So what? It's a test suite of varied, close-to-root positions with one knight down, proofed by SF11 not to give either side any immediate advantage beyond the one knight. They don't need any other quality than that.

lkaufman
Posts: 4324
Joined: Sun Jan 10, 2010 5:15 am
Location: Maryland USA

Re: Stockfish Handicap Matches

Post by lkaufman » Mon Jun 22, 2020 9:36 pm

chrisw wrote:
Mon Jun 22, 2020 7:44 pm
lkaufman wrote:
Mon Jun 22, 2020 7:15 pm
chrisw wrote:
Mon Jun 22, 2020 6:42 pm
lkaufman wrote:
Mon Jun 22, 2020 3:23 pm
chrisw wrote:
Mon Jun 22, 2020 7:20 am
lkaufman wrote:
Mon Jun 22, 2020 3:13 am
chrisw wrote:
Sun Jun 21, 2020 10:42 pm
lkaufman wrote:
Sun Jun 21, 2020 10:30 pm
Rebel wrote:
Sun Jun 21, 2020 10:09 pm
chrisw wrote:
Sun Jun 21, 2020 9:20 pm
Done 5600 EPDs from the start position minus the b1 knight: played out all four-ply combinations, culled all duplicates, culled all positions where SF11 evaluated more than +/-10 centipawns away from 300 centipawns (SF11 average score for all EPDs), and am now left with 5600 EPDs.

Link: https://github.com/ChrisWhittington/Che ... t-odds.epd

Will upload for no knight at g1 tomorrow am.


I checked several positions and they showed scores of 4 pawns or more down, as does the initial knight odds position, not -300.
You checked the listed epds for variance from -300? Which ones? What was the variance? What SF11 conditions are you using?

If you just want under 20 positions, take off the b1 knight, choose the 3 best White moves by SF11 MultiPV, choose the 3 best replies to each of those, and repeat with the g1 knight off. 18 positions, totally fair, no silly moves, real knight odds chess! If you want more, just choose the best 4 or best 5 for each side.
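The top-k selection proposed above can be sketched as a breadth-first expansion that follows only the k best moves at each ply. `top_moves` and `play` are hypothetical callbacks standing in for a MultiPV engine query and a board library; nothing here is a real engine API.

```python
def expand_topk(root, plies, k, top_moves, play):
    """Return every position reached by following only the k best moves per ply.

    top_moves(pos, k) -> list of up to k moves, best first (e.g. from a MultiPV search)
    play(pos, move)   -> the position after move
    Both callbacks are hypothetical stand-ins for an engine and a board library.
    """
    frontier = [root]
    for _ in range(plies):
        # Each position in the frontier branches into at most k children.
        frontier = [play(pos, m) for pos in frontier for m in top_moves(pos, k)]
    return frontier
```

With two roots (b1 or g1 removed), `plies=2` and `k=3` give 2 x 9 = 18 positions, matching the count above. With `k=10` and `plies=4`, the function yields 10**4 = 10000 leaves before transpositions are merged, the figure used later in the thread.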
Edit: Whoops, rechecked, the mean is indeed -293, but I was selecting EPDs based on -340, so you should find the listed EPDs evaluate at around -340 centipawns. Will correct the EPD dump tomorrow am.
OK, -340 is at least within range of what I was seeing. With fixed depth searches in the range of what you were using I get evals like -390 or so, but I think fixed depth omits Contempt while movetime does not, so we're not so far apart if you add in Contempt. You were right to switch from 10 ms to 25; SF is quite weak at 10ms but strong enough at 25, the difference is huge. But I think it would be more useful to have fewer positions but no positions with ridiculous moves played, maybe only including moves in the top ten by multipv at each point for example. It doesn't seem like a simulation of knight odds if you force the players to play moves that no one over 800 rating would even consider.
Uploaded just now the knight-odds epds for positions without b1/g1 knights.

https://github.com/ChrisWhittington/Chess-EPDs and download knight-odds.epd

They're better balanced around the correct mean (-293 centipawns), and restricted to all 4-ply positions where SF11 at 25 ms search returns a score of -293 +/- 10 centipawns. About 5000 unique positions in total.

Random subset:

Code:

rnbqkb1r/p1pppppp/7n/1p6/3P4/1P6/P1P1PPPP/RNBQKB1R w KQkq - 0 3
r1bqkbnr/pppp1ppp/n7/4p3/3P4/5N2/PPP1PPPP/R1BQKB1R w KQkq - 0 3
rnbqk1nr/ppppbppp/4p3/8/3P4/5N2/PPP1PPPP/R1BQKB1R w KQkq - 2 3
r1bqkb1r/pppppppp/2n4n/1B6/4P3/8/PPPP1PPP/RNBQK2R w KQkq - 3 3
rnbqkbnr/pppppppp/8/8/8/5N1P/PPPPPPP1/R1BQKB1R w KQkq - 1 3
rnbqkb1r/p1pppppp/7n/1p6/4P3/5Q2/PPPP1PPP/R1B1KBNR w KQkq - 0 3
r1bqkbnr/ppppp1pp/2n5/5p2/4P3/2P5/PP1P1PPP/RNBQKB1R w KQkq - 0 3
rnbqkb1r/p1pppppp/1p5n/7Q/4P3/8/PPPP1PPP/RNB1KB1R w KQkq - 0 3
r1bqkbnr/ppppp1pp/n4p2/8/1P6/B7/P1PPPPPP/RN1QKB1R w KQkq - 2 3
r1bqkbnr/ppppp1pp/n4p2/8/2P5/P7/1P1PPPPP/R1BQKBNR w KQkq - 1 3
r1bqkbnr/p1pppppp/np6/P7/8/8/1PPPPPPP/R1BQKBNR w KQkq - 0 3
rnbqkbnr/pppp1p1p/4p3/6p1/7P/3P4/PPP1PPP1/RNBQKB1R w KQkq - 0 3
r1bqkbnr/ppppp1pp/n4p2/8/8/3P4/PPPBPPPP/RN1QKB1R w KQkq - 2 3
rnbqkb1r/1ppppppp/7n/p7/4P3/5P2/PPPP2PP/RNBQKB1R w KQkq - 0 3
r1bqkbnr/ppppp1pp/n4p2/1B6/4P3/8/PPPP1PPP/RNBQK2R w KQkq - 2 3
r1bqkbnr/ppppp1pp/2n2p2/8/8/P1N5/1PPPPPPP/R1BQKB1R w KQkq - 0 3
1nbqkbnr/rppppppp/8/p7/8/3BP3/PPPP1PPP/RNBQK2R w KQk - 2 3
rnbqkbn1/pppppppr/8/7p/8/2N3P1/PPPPPP1P/R1BQKB1R w KQq - 2 3
r1bqkbnr/ppppp1pp/n4p2/8/4P3/6P1/PPPP1P1P/R1BQKBNR w KQkq - 0 3
rnbq1bnr/pppkpppp/8/3p4/8/4PP2/PPPP2PP/R1BQKBNR w KQ - 1 3
rnbq1bnr/pppkpppp/8/3p4/8/4P1P1/PPPP1P1P/R1BQKBNR w KQ - 1 3
rn1qkbnr/p1pppppp/b7/1p6/1P6/4P3/P1PP1PPP/RNBQKB1R w KQkq - 1 3
rnbqkbnr/1ppppp1p/8/p5p1/Q7/2P5/PP1PPPPP/RNB1KB1R w KQkq - 0 3
r1bqkbnr/ppppppp1/n7/4P2p/8/8/PPPP1PPP/R1BQKBNR w KQkq - 0 3
rnbqkbnr/p1ppppp1/7p/1P6/8/8/PP1PPPPP/R1BQKBNR w KQkq - 0 3
rnbqkbnr/ppppp2p/5p2/6p1/3P4/4B3/PPP1PPPP/RN1QKB1R w KQkq - 0 3
r1bqkbnr/pppppp1p/n7/6p1/8/1PN5/P1PPPPPP/R1BQKB1R w KQkq - 1 3
r1bqkbnr/p1pppppp/2n5/1p6/1P2P3/8/P1PP1PPP/R1BQKBNR w KQkq - 1 3
rnbqkb1r/ppppp1pp/7n/5p2/7P/3P4/PPP1PPP1/RNBQKB1R w KQkq - 0 3
rnbqkb1r/p1pppppp/7n/1p6/2P5/7P/PP1PPPP1/R1BQKBNR w KQkq - 0 3
rnbqk1nr/pppp1ppp/8/2b1p3/8/P4N2/1PPPPPPP/R1BQKB1R w KQkq - 2 3
It's swings and roundabouts on position selection. If we did your method (which I'm not going to, because there's a limit to how much time I'm prepared to put in on this), we'd get, at four ply, 10x10x10x10 = 10000 positions, as opposed to the actual number from brute force, which was 170000 unique, with 5000 within +/- 10 centipawns of the mean. So your 10000 would be guaranteed to contain positions already divergent in score beyond the +/- 10 centipawn window. If we recap the objective, it was to have a good number of test positions (which, for me, includes the possibility of mass testing 1000s of games at bullet/blitz) with good variance, no duplicates, no great difference from a one-knight handicap, and as close as feasible to the start position. The selection method is entirely without bias; it doesn't matter HOW the positions were arrived at, as long as they are close to the initial position and neither engine gets a head start in any position. That's done. Arguably perfectly. Engines are then left to fight it out from unbiased, wide, close-to-root start positions.
Your argument to use 10-wide selections for 4 ply is going to produce fewer positions, many of which are going to be already knight odds plus something way more than 10 centipawns because of the way your method chooses, and therefore, I would argue, satisfy the original objective less well.

Anyway, since everything now appears to work, I'll leave this thing running and produce a few thousand of each: bishop odds, rook odds and queen odds. Can SF11 beat Fruit at queen odds? That'd be fun. Personally I doubt it, but we'll see.
I'm afraid it's clear that you made a typo or misread a number. You give the eval for the position with g1 off as -2.93 after 25 ms, which you use for your sample. I ran the same position and got -3.94. Of course computers aren't all the same speed, so some variation is to be expected, but the score only fluctuates by a few centipawns. It's pretty obvious that when you ran it you got -3.93 but either misread or mistyped the 3 as a 2. So the result is that you have a bunch of positions where Black is playing blunders that lose a pawn, or a similar positional score, to drop from a 3.93 edge to a 2.93 edge. So nowhere near knight odds! For the b1 off position I got -3.73. Not sure what you got for that position.
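Larry's objection comes down to one subtraction. The figures below are the ones quoted in this exchange, in centipawns from White's side, so more negative means worse for the knight-down White. The suite was centered on -293, and Larry's 25 ms root evals were -394 (g1 off) and -373 (b1 off):

```latex
\Delta_{g1} = \bar{e}_{\text{suite}} - e_{\text{root}} = (-293) - (-394) = +101\ \text{cp}
\qquad
\Delta_{b1} = (-293) - (-373) = +80\ \text{cp}
```

On these numbers, the suite positions are on average roughly a pawn less bad for White than the bare knight-odds start. That matches the "knight minus about a pawn" reading Larry gives. The root values are single-machine readings, so treat the gaps as approximate.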
It’s a bit cheap throwing out insults about typos or misreading or whatever, when actually you are arguing method (albeit by other means).

The figures back from SF are accurate, I just rechecked them. No typos, no misreading.
I guess you decided knight-odds games are 3.73 on the basis that you put the start position into SF and asked for a score? Sure, SF will find the supposed best line and evaluate it.

I’m doing something different: I am asking SF to evaluate every single position that arises from the start position after four plies. Several tens of thousands of positions where each side has had the same move opportunities (two moves each) to make boobies or brilliancies. The net effect of all these thousands of moves is to generate thousands of positions, each then evaluated by SF, with a mean eval of about -3.00 pawns. To be fair to both Black and White, I then took everything that centred on that -300 centipawns, about 10% or so of the total.

You’re saying that’s wrong because, according to SF at the root position, knight odds = -3.87, and because nobody would play the 4-ply move sequences. Well, so what? You lost sight of the objective. Generate a large unbiased set of positions to evaluate how different engines get on with “knight odds”. Generate positions close to the root. Generate positions where material and a further SF search show that neither side's chances changed much from their chances at the root.
Well, that’s done. Thanks me very much. It’s a pleasure. No problem at all. Have fun with them.


Just for fun, and not really wasting my time re-proofing things: the SF11 500 ms evaluations of the first 100 positions in the knight-odds suite, below.

For that subset, Mean = -320.2, st deviation = 29.5. Looks like a fine result from here.

Code:

Loading epds knight-odds.epd
SF11, default conditions, movetime 500
rnbqkb1r/p1pppppp/7n/1p6/3P4/1P6/P1P1PPPP/RNBQKB1R w KQkq - 0 3 eval=-314 nodes=835263
r1bqkbnr/pppp1ppp/n7/4p3/3P4/5N2/PPP1PPPP/R1BQKB1R w KQkq - 0 3 eval=-284 nodes=843868
rnbqk1nr/ppppbppp/4p3/8/3P4/5N2/PPP1PPPP/R1BQKB1R w KQkq - 2 3 eval=-376 nodes=948424
r1bqkb1r/pppppppp/2n4n/1B6/4P3/8/PPPP1PPP/RNBQK2R w KQkq - 3 3 eval=-364 nodes=911584
rnbqkbnr/pppppppp/8/8/8/5N1P/PPPPPPP1/R1BQKB1R w KQkq - 1 3 eval=-356 nodes=924862
rnbqkb1r/p1pppppp/7n/1p6/4P3/5Q2/PPPP1PPP/R1B1KBNR w KQkq - 0 3 eval=-248 nodes=877730
r1bqkbnr/ppppp1pp/2n5/5p2/4P3/2P5/PP1P1PPP/RNBQKB1R w KQkq - 0 3 eval=-311 nodes=924917
rnbqkb1r/p1pppppp/1p5n/7Q/4P3/8/PPPP1PPP/RNB1KB1R w KQkq - 0 3 eval=-354 nodes=888675
r1bqkbnr/ppppp1pp/n4p2/8/1P6/B7/P1PPPPPP/RN1QKB1R w KQkq - 2 3 eval=-344 nodes=892030
r1bqkbnr/ppppp1pp/n4p2/8/2P5/P7/1P1PPPPP/R1BQKBNR w KQkq - 1 3 eval=-336 nodes=861229
r1bqkbnr/p1pppppp/np6/P7/8/8/1PPPPPPP/R1BQKBNR w KQkq - 0 3 eval=-352 nodes=862023
rnbqkbnr/pppp1p1p/4p3/6p1/7P/3P4/PPP1PPP1/RNBQKB1R w KQkq - 0 3 eval=-253 nodes=904501
r1bqkbnr/ppppp1pp/n4p2/8/8/3P4/PPPBPPPP/RN1QKB1R w KQkq - 2 3 eval=-297 nodes=880860
rnbqkb1r/1ppppppp/7n/p7/4P3/5P2/PPPP2PP/RNBQKB1R w KQkq - 0 3 eval=-321 nodes=909321
r1bqkbnr/ppppp1pp/n4p2/1B6/4P3/8/PPPP1PPP/RNBQK2R w KQkq - 2 3 eval=-284 nodes=954091
r1bqkbnr/ppppp1pp/2n2p2/8/8/P1N5/1PPPPPPP/R1BQKB1R w KQkq - 0 3 eval=-328 nodes=881852
1nbqkbnr/rppppppp/8/p7/8/3BP3/PPPP1PPP/RNBQK2R w KQk - 2 3 eval=-384 nodes=950046
rnbqkbn1/pppppppr/8/7p/8/2N3P1/PPPPPP1P/R1BQKB1R w KQq - 2 3 eval=-325 nodes=882350
r1bqkbnr/ppppp1pp/n4p2/8/4P3/6P1/PPPP1P1P/R1BQKBNR w KQkq - 0 3 eval=-307 nodes=906494
rnbq1bnr/pppkpppp/8/3p4/8/4PP2/PPPP2PP/R1BQKBNR w KQ - 1 3 eval=-300 nodes=954439
rnbq1bnr/pppkpppp/8/3p4/8/4P1P1/PPPP1P1P/R1BQKBNR w KQ - 1 3 eval=-311 nodes=932958
rn1qkbnr/p1pppppp/b7/1p6/1P6/4P3/P1PP1PPP/RNBQKB1R w KQkq - 1 3 eval=-317 nodes=961294
rnbqkbnr/1ppppp1p/8/p5p1/Q7/2P5/PP1PPPPP/RNB1KB1R w KQkq - 0 3 eval=-272 nodes=917542
r1bqkbnr/ppppppp1/n7/4P2p/8/8/PPPP1PPP/R1BQKBNR w KQkq - 0 3 eval=-318 nodes=928039
rnbqkbnr/p1ppppp1/7p/1P6/8/8/PP1PPPPP/R1BQKBNR w KQkq - 0 3 eval=-350 nodes=920456
rnbqkbnr/ppppp2p/5p2/6p1/3P4/4B3/PPP1PPPP/RN1QKB1R w KQkq - 0 3 eval=-300 nodes=894388
r1bqkbnr/pppppp1p/n7/6p1/8/1PN5/P1PPPPPP/R1BQKB1R w KQkq - 1 3 eval=-302 nodes=915277
r1bqkbnr/p1pppppp/2n5/1p6/1P2P3/8/P1PP1PPP/R1BQKBNR w KQkq - 1 3 eval=-357 nodes=907541
rnbqkb1r/ppppp1pp/7n/5p2/7P/3P4/PPP1PPP1/RNBQKB1R w KQkq - 0 3 eval=-304 nodes=996995
rnbqkb1r/p1pppppp/7n/1p6/2P5/7P/PP1PPPP1/R1BQKBNR w KQkq - 0 3 eval=-354 nodes=895685
rnbqk1nr/pppp1ppp/8/2b1p3/8/P4N2/1PPPPPPP/R1BQKB1R w KQkq - 2 3 eval=-376 nodes=904702
r1bqkbnr/p1pppppp/n7/1p6/1P2P3/8/P1PP1PPP/R1BQKBNR w KQkq - 0 3 eval=-308 nodes=907345
r1bqkbnr/ppppppp1/n6B/8/3P4/8/PPP1PPPP/R2QKBNR w KQkq - 1 3 eval=-299 nodes=903610
r1bqkbnr/ppppp1pp/2n2p2/8/3P4/P7/1PP1PPPP/RNBQKB1R w KQkq - 0 3 eval=-330 nodes=885982
rnbqkbnr/p1ppp1pp/5p2/1p6/1P3P2/8/P1PPP1PP/R1BQKBNR w KQkq - 0 3 eval=-331 nodes=899112
rnbqkbnr/2pppppp/8/pp6/3P4/5P2/PPP1P1PP/RNBQKB1R w KQkq - 0 3 eval=-334 nodes=923006
r1bqkbnr/pppp1ppp/n7/4p3/3P3P/8/PPP1PPP1/R1BQKBNR w KQkq - 1 3 eval=-333 nodes=915197
rnbqkbnr/p1ppppp1/8/1p5p/8/P1N5/1PPPPPPP/R1BQKB1R w KQkq - 0 3 eval=-328 nodes=894117
r1bqkbnr/1ppppppp/n7/p7/4P3/3P4/PPP2PPP/R1BQKBNR w KQkq - 1 3 eval=-338 nodes=941629
rnbqkbnr/p1pppp1p/6p1/1p6/4P3/P7/1PPP1PPP/R1BQKBNR w KQkq - 0 3 eval=-312 nodes=899001
r1bqkbnr/pppppp1p/n7/6p1/2P5/5P2/PP1PP1PP/RNBQKB1R w KQkq - 1 3 eval=-349 nodes=911472
rnbqkb1r/pppppp1p/5n2/6p1/8/3P3P/PPP1PPP1/R1BQKBNR w KQkq - 0 3 eval=-315 nodes=919954
1nbqkbnr/rppppppp/8/p7/4P3/8/PPPPQPPP/RNB1KB1R w KQk - 2 3 eval=-328 nodes=929187
rnbqkbn1/pppppppr/8/7p/8/3P2P1/PPP1PP1P/R1BQKBNR w KQq - 1 3 eval=-335 nodes=927437
r1bqkbnr/pppppp1p/2n5/6p1/3P4/8/PPP1PPPP/RNBQKBR1 w Qkq - 1 3 eval=-276 nodes=894955
rnbqkb1r/p1pppppp/7n/1p6/2P5/P7/1P1PPPPP/R1BQKBNR w KQkq - 1 3 eval=-293 nodes=895381
rnbqkb1r/pppppp1p/7n/6p1/4P3/5Q2/PPPP1PPP/R1B1KBNR w KQkq - 2 3 eval=-274 nodes=910466
rnbqkbr1/pppppppp/7n/8/8/3P3P/PPP1PPP1/RNBQKB1R w KQq - 1 3 eval=-321 nodes=907502
rnbqkbnr/ppp1pp1p/8/3p2p1/7P/8/PPPPPPPR/R1BQKBN1 w Qkq - 0 3 eval=-300 nodes=917633
r1bqkbnr/p1pppppp/1pn5/8/4P2P/8/PPPP1PP1/R1BQKBNR w KQkq - 1 3 eval=-326 nodes=899245
rnbq1bnr/pppkpppp/3p4/8/8/P7/1PPPPPPP/RNBQKBR1 w Q - 2 3 eval=-318 nodes=979196
rnbq1bnr/pppppkpp/5p2/8/8/4P2P/PPPP1PP1/R1BQKBNR w KQ - 1 3 eval=-303 nodes=985615
rnbqkbnr/p1ppp1pp/5p2/1p6/3P4/5P2/PPP1P1PP/R1BQKBNR w KQkq - 0 3 eval=-290 nodes=901600
rnbq1bnr/pppkpppp/3p4/8/8/3P4/PPPQPPPP/RNB1KB1R w KQ - 2 3 eval=-319 nodes=935623
r1bqkbnr/p1pppppp/2n5/1p6/4P3/3B4/PPPP1PPP/R1BQK1NR w KQkq - 2 3 eval=-285 nodes=910467
rnbqkbnr/pp1ppp1p/2p5/6p1/P7/3P4/1PP1PPPP/R1BQKBNR w KQkq - 0 3 eval=-295 nodes=929138
r1bqkbnr/p1pppppp/np6/8/4P2P/8/PPPP1PP1/RNBQKB1R w KQkq - 1 3 eval=-308 nodes=917646
rnbqkbnr/ppppp1p1/5p1p/8/8/2P2P2/PP1PP1PP/R1BQKBNR w KQkq - 0 3 eval=-314 nodes=936621
rnbqkb1r/pppppppp/8/8/3P2n1/2N5/PPP1PPPP/R1BQKB1R w KQkq - 3 3 eval=-309 nodes=906583
r1bqkbnr/pppppp1p/n7/6p1/8/P6P/1PPPPPP1/RNBQKB1R w KQkq - 1 3 eval=-306 nodes=890932
1nbqkbnr/rppppppp/p7/8/4P3/7P/PPPP1PP1/RNBQKB1R w KQk - 1 3 eval=-341 nodes=931251
1nbqkbnr/rppppppp/8/p7/8/P4N2/1PPPPPPP/R1BQKB1R w KQk - 1 3 eval=-357 nodes=929376
rn1qkbnr/p1pppppp/b7/1p2P3/8/8/PPPP1PPP/RNBQKB1R w KQkq - 1 3 eval=-331 nodes=922525
rnbqkbnr/ppppp1p1/5p1p/8/7P/8/PPPPPPP1/1RBQKBNR w Kkq - 0 3 eval=-295 nodes=950421
rnbqkbnr/ppp1pp1p/3p4/6p1/8/P6N/1PPPPPPP/R1BQKB1R w KQkq - 0 3 eval=-316 nodes=914793
rnbqkb1r/pppp1ppp/5n2/4p3/5P2/2P5/PP1PP1PP/RNBQKB1R w KQkq - 0 3 eval=-412 nodes=919575
rnbqkb1r/pppp1ppp/4p2n/8/3P4/2P5/PP2PPPP/R1BQKBNR w KQkq - 1 3 eval=-349 nodes=903832
rnbqkb1r/pppp1ppp/7n/4p2Q/8/4P3/PPPP1PPP/RNB1KB1R w KQkq - 2 3 eval=-268 nodes=933657
rnbqkb1r/1ppppppp/p6n/8/3P4/2P5/PP2PPPP/RNBQKB1R w KQkq - 0 3 eval=-330 nodes=954821
r1bqkbnr/pppp1ppp/n7/4p3/3P4/7P/PPP1PPP1/R1BQKBNR w KQkq - 0 3 eval=-307 nodes=925412
rnbqkbnr/p1pppppp/8/1p6/1P6/4P3/P1PP1PPP/RNBQKB1R w KQkq - 0 3 eval=-348 nodes=922373
rnbqkb1r/ppppppp1/7n/7p/2P5/7P/PP1PPPP1/RNBQKB1R w KQkq - 1 3 eval=-331 nodes=893674
rnbqkb1r/pp1ppppp/2p4n/8/2P5/2N5/PP1PPPPP/R1BQKB1R w KQkq - 2 3 eval=-348 nodes=936309
rnbqk1nr/pppppp1p/7b/6p1/3P4/3Q4/PPP1PPPP/R1B1KBNR w KQkq - 2 3 eval=-314 nodes=895006
r1bqkbnr/pppppppp/8/8/1n2P3/5P2/PPPP2PP/RNBQKB1R w KQkq - 1 3 eval=-351 nodes=911614
rnbq1bnr/pppkpppp/3p4/8/8/3P1P2/PPP1P1PP/R1BQKBNR w KQ - 1 3 eval=-295 nodes=977138
rnbqkbnr/1ppppp1p/8/p5p1/5P2/7N/PPPPP1PP/R1BQKB1R w KQkq - 0 3 eval=-303 nodes=933937
r1bqkb1r/pppppppp/n6n/8/2P5/5N2/PP1PPPPP/R1BQKB1R w KQkq - 3 3 eval=-325 nodes=911263
rnbqkb1r/ppppp1pp/7n/5p2/3P4/8/PPPNPPPP/R1BQKB1R w KQkq - 0 3 eval=-321 nodes=930623
rnbqkb1r/p1pppppp/7n/1p6/P7/8/1PPPPPPP/R1BQKBNR w KQkq - 1 3 eval=-281 nodes=895926
rnbqkb1r/p1pppppp/7n/1p6/8/1QP5/PP1PPPPP/RNB1KB1R w KQkq - 2 3 eval=-335 nodes=904891
1nbqkbnr/1ppppppp/r7/p7/1PP5/8/P2PPPPP/RNBQKB1R w KQk - 1 3 eval=-356 nodes=881990
rnbqkbnr/ppp2ppp/8/3pP3/8/8/PPP1PPPP/RNBQKB1R w KQkq d6 0 3 eval=-342 nodes=945740
rnbqkbnr/1pppp1pp/p4p2/8/3P4/8/PPP1PPPP/RNBQKB1R w KQkq - 0 3 eval=-347 nodes=921882
rnbqkb1r/pppppp1p/5n2/6p1/3P1P2/8/PPP1P1PP/RNBQKB1R w KQkq - 1 3 eval=-292 nodes=891075
rnb1kbnr/pppqpppp/8/3p4/4P3/2N5/PPPP1PPP/R1BQKB1R w KQkq - 2 3 eval=-292 nodes=900883
rnbq1bnr/pppppkpp/5p2/8/8/3P3N/PPP1PPPP/R1BQKB1R w KQ - 2 3 eval=-336 nodes=966960
r1bqkbnr/pppppp1p/2n5/6p1/7P/4P3/PPPP1PP1/RNBQKB1R w KQkq - 1 3 eval=-302 nodes=899886
rnbqkbnr/pppppppp/8/8/4P3/7P/PPPP1PP1/RNBQKB1R w KQkq - 1 3 eval=-324 nodes=924363
r1bqkbnr/pppppp1p/n7/6p1/8/N2P4/PPP1PPPP/R1BQKB1R w KQkq - 0 3 eval=-279 nodes=929880
rnbqkbnr/ppppp1p1/5p1p/8/P7/1P6/2PPPPPP/RNBQKB1R w KQkq - 0 3 eval=-314 nodes=932199
rnbqkbnr/ppp1pppp/8/3p4/3PP3/8/PPP2PPP/RNBQKB1R w KQkq - 0 3 eval=-351 nodes=931440
rnbqkbr1/pppppppp/7n/8/1P6/6P1/P1PPPP1P/RNBQKB1R w KQq - 1 3 eval=-327 nodes=922162
rnbq1bnr/pppkpppp/3p4/8/8/1PP5/P2PPPPP/R1BQKBNR w KQ - 1 3 eval=-257 nodes=952343
rnbqkbnr/p1ppp1pp/5p2/1p6/8/2PP4/PP2PPPP/R1BQKBNR w KQkq - 0 3 eval=-273 nodes=929893
rnbqkbnr/pppppppp/8/8/3P4/7P/PPP1PPP1/RNBQKB1R w KQkq - 1 3 eval=-343 nodes=924330
rnbqkbr1/pppppppp/5n2/8/1P2P3/8/P1PP1PPP/R1BQKBNR w KQq - 1 3 eval=-376 nodes=936465
rnbqkbnr/pppppp2/8/6pp/8/2P5/PPQPPPPP/R1B1KBNR w KQkq - 0 3 eval=-304 nodes=921110
rnbq1bnr/pppppkpp/5p2/8/8/6PN/PPPPPP1P/R1BQKB1R w KQ - 2 3 eval=-300 nodes=945856
r1bqkbnr/ppppp1pp/n4p2/8/8/2P2P2/PP1PP1PP/R1BQKBNR w KQkq - 0 3 eval=-348 nodes=949560
r1bqkbnr/ppppppp1/n7/7p/2B1P3/8/PPPP1PPP/RNBQK2R w KQkq - 2 3 eval=-341 nodes=917983

SF mean eval -320.2
epds=101 numpy.mean=-320.1782178217822 st dev=29.494888310142848
I wasn't trying to insult you. I thought you had just run the initial position at knight odds to determine the score to center on, and since it was off by a full pawn, it looked like a typo.
Well, you thought wrong. Helpful advice when trying to decode other people: I am not you.

I think what is happening is that Black is far more likely to blunder on move 2 than White is, since White will have an extra unit developed, so the average position includes many where Black has blundered a pawn on his second move, thus reducing White's score deficit by a pawn. Since White can't blunder anything on ply 1, it would have been about fair if you did this for three plies rather than four, since each side would have one chance to blunder after the opponent had made a move.
This is getting silly. f4 e5 g4 etc. Each side has two chances to make good or bad positional move choices. It should all average out. a3 b6 Nh3 a5 etc. The important feature is that all the positions give the side a knight down roughly the same chances; we did that by culling the eval outliers and centering on the mean. Then they can form a coherent test suite.
So as it is the positions are a valid set of handicap positions,
Oh, thank you very belatedly much.
but they are not on average close to knight odds,
they are a mass of positions where White has one knight less, very close to the start position, and without either side being able (according to SF11 search proof) to press an immediate advantage. As such, they form a fine, unbiased set of thousands of positions, as balanced as the SF-proofed algorithm can make them, for testing knight advantage when given to a 'lesser' engine. That was the original idea: make a large, unbiased test suite with as much play in it as possible to test how engines have developed over the years (for fun, btw; this is not a university research department investigating something important).
maybe something like knight minus half a pawn or so.
Shrugs. So what? It's a test suite of varied, close-to-root positions with one knight down, proofed by SF11 not to give either side any immediate advantage beyond the one knight. They don't need any other quality than that.
Okay, I guess my only objection now is that when pruning the final positions, it would be better to prune based on the Stockfish eval (at some reasonable depth, maybe ten seconds or so) of the initial handicap position, rather than on the average of a bunch of positions that include many blunders, even queen blunders, mostly by Black, because White is twice as likely to have developed something to take the blundered piece. A piece is worth about four pawns in the opening, according to theory, to Stockfish, to Komodo, etc. So if the average eval is around 3 pawns, it means that Black is blundering about a full pawn on average. So they don't have the quality of having been shown to retain the full knight advantage, only about a knight for a pawn on average. This is quite surprising; I wouldn't have anticipated this large a bias from blunders in only four ply.
Komodo rules!

lkaufman
Posts: 4324
Joined: Sun Jan 10, 2010 5:15 am
Location: Maryland USA

Re: Stockfish Handicap Matches

Post by lkaufman » Mon Jun 22, 2020 9:49 pm

Rebel wrote:
Mon Jun 22, 2020 7:36 pm
chrisw wrote:
Mon Jun 22, 2020 6:54 pm
Rebel wrote:
Mon Jun 22, 2020 6:12 pm
lkaufman wrote:
Mon Jun 22, 2020 4:48 pm
Rebel wrote:
Mon Jun 22, 2020 4:35 pm
lkaufman wrote:
Mon Jun 22, 2020 3:30 pm
Rebel wrote:
Mon Jun 22, 2020 6:05 am
@Larry - Ok, for the moment we do it the classic way with repeating colors. I took the first 200 positions from Chris's set, so 400 games, tc=40/10.

Code:

Komodo 10 - Fruit 2.1 83.9% WLD 315 44 41
Komodo 14 - Fruit 2.1 90.4% WLD 347 24 29
This is how you want to measure progress, isn't it?
Yes, but it looks like you are having each engine play the White side once, which is a waste of resources when the engines are far apart in strength. Presumably the stronger engine wins every game with Black a piece up, so it's easy enough to subtract 200 from the wins for the stronger engine to see the results when giving knight odds. I think that the reason that Komodo still beats fruit even with the 200 removed is that there was a flaw in the reference value used (see prior post), so these positions weren't even close to knight odds. With everything done right, I expect Fruit to come out ahead against both Stockfish and Komodo at knight odds.
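The percentages in Rebel's table follow from counting a draw as half a point. Larry's adjustment then amounts to removing the games assumed won with the extra piece. Below is a small Python sketch with hypothetical helper names. The assumption that the stronger engine won all 200 of its knight-up games is Larry's presumption, not something the posted results show.

```python
def score_pct(wins, losses, draws):
    """Match score in percent: a win is 1 point, a draw half a point."""
    games = wins + losses + draws
    return 100.0 * (wins + 0.5 * draws) / games

def handicap_only(wins, losses, draws, free_wins):
    """Drop games the stronger engine is assumed to have won with the extra piece.

    Assumption: every one of those free_wins games was a win, so all recorded
    losses and draws come from the games where it gave knight odds.
    """
    if free_wins > wins:
        raise ValueError("cannot remove more wins than were recorded")
    return wins - free_wins, losses, draws

# Rebel's 400-game results, W/L/D from Komodo's side
k10 = (315, 44, 41)
k14 = (347, 24, 29)
print(score_pct(*k10), score_pct(*handicap_only(*k10, 200)))
print(score_pct(*k14), score_pct(*handicap_only(*k14, 200)))
```

Under that assumption, Komodo 10 would score 67.75% and Komodo 14 80.75% in the 200 knight-odds games. These are estimates, not measured results.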
1. I don't see the purpose of giving Komodo the advantage of a knight up against Fruit. So only 200 games (Fruit always knight up) now that Ferdy fixed the cute-chess obstacle. But of course you can do it yourself.

2. My guess would be that Fruit would do a lot better at longer time control, 40/60 instead of 40/10, that's not a good time control for the oldies. I will run it now, stay tuned.
Yes, 1. was my point, it was wasting resources. Yes, the weaker engine always does better with handicap with more time. But the main problem is that the positions are a full pawn off from knight odds. With a corrected set, even at 40/10 I'll bet on Fruit. The difference of 100 centipawns in the initial position is huge.
tc=40/60 from Chris's knight-odds set.

Code:

Rank Name                          Elo     +/-   Games   Score    Draw
   0 Komodo_14                     228      50     200   78.8%   19.5%
   1 Benjamin                     -186      67     100   25.5%   23.0%
   2 Fruit_2.1                    -275      80     100   17.0%   16.0%
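The Elo column in the cutechess-style table above is consistent with the usual logistic conversion from a score fraction p, Elo = -400 * log10(1/p - 1). Recomputing from the rounded scores reproduces 228, -186 and -275. That the tool used exactly this formula is an assumption. The +/- error margins depend on the per-game variance and are not reproduced by this minimal sketch:

```python
import math

def elo_from_score(p):
    """Logistic Elo difference implied by a score fraction 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / p - 1.0)

for name, pct in [("Komodo_14", 78.8), ("Benjamin", 25.5), ("Fruit_2.1", 17.0)]:
    print(name, round(elo_from_score(pct / 100.0)))
```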
Looks like a pretty good result.

Uploaded a bunch more, including pawn odds. Pawn-odds games can get quite wild, which might be good for detecting wild attacking engines.

pawn-f2-odds.epd
queen-for-nite-odds.epd
queen-for-rook-odds.epd
queen-odds.epd
rook-odds.epd
knight-odds.epd
no-castling-odds.epd

https://github.com/ChrisWhittington/Chess-EPDs
Quick Stockfish gauntlet test tc=40/10

Code:

No. Engine             1     2     3     4     5  Score  Games   Perc   Moves
-----------------------------------------------------------------------------
 1 Komodo_14       xxxxx  94.5   0.0   0.0   0.0   94.5 /  100 (94.50%)  55.5  
 2 Stockfish_11      5.5 xxxxx  15.0  17.5  50.0   88.0 /  400 (22.00%)  62.0  
 3 Houdini_6.03      0.0  85.0 xxxxx   0.0   0.0   85.0 /  100 (85.00%)  63.0  
 4 Laser_1.7         0.0  82.5   0.0 xxxxx   0.0   82.5 /  100 (82.50%)  62.2  
 5 Arasan_22         0.0  50.0   0.0   0.0 xxxxx   50.0 /  100 (50.00%)  67.3  
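The gauntlet totals follow from the cross-table arithmetic. Each opponent played Stockfish 100 games, so an opponent's score is 100 minus Stockfish's points against it. Below is a small Python sketch with a hypothetical `gauntlet_summary` helper. It assumes equal game counts per opponent, as in this table:

```python
def gauntlet_summary(points_vs, games_per_opponent):
    """Summarise one engine's gauntlet from its row of the cross-table.

    points_vs: {opponent: points scored by the gauntlet engine against it}
    Returns (total points, total games, percent, {opponent: opponent's points}).
    """
    total = sum(points_vs.values())
    games = games_per_opponent * len(points_vs)
    # Each game hands out exactly one point, so the opponent gets the rest.
    opp = {name: games_per_opponent - pts for name, pts in points_vs.items()}
    return total, games, 100.0 * total / games, opp

sf11_row = {"Komodo_14": 5.5, "Houdini_6.03": 15.0, "Laser_1.7": 17.5, "Arasan_22": 50.0}
print(gauntlet_summary(sf11_row, 100))
```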
I don't see which handicap set this data is based on. Seems like the knight-odds set; if it were the f2 set, Komodo couldn't score 94.5% vs SF, since the f2 handicap is just a little over the winning margin, leaving lots of chances for White to save a draw now and then.
Komodo rules!
