Stockfish Handicap Matches

Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: Stockfish Handicap Matches

Post by Rebel »

@Larry - OK, for the moment we do it the classic way with repeating colors. I took the first 200 positions from Chris's set, so 400 games, tc=40/10.

Code:

Komodo 10 - Fruit 2.1 83.9% WLD 315 44 41
Komodo 14 - Fruit 2.1 90.4% WLD 347 24 29
This is how you want to measure progress, isn't it?
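The percentages above count a draw as half a point. A minimal Python sketch that recomputes them from the W/L/D counts and, as an extra step not used in the post, converts each score to an Elo gap with the standard logistic formula. For handicap games that Elo figure is only a rough yardstick for comparing the two Komodo versions on one scale.

```python
import math


def match_score(wins: int, losses: int, draws: int) -> float:
    """Fraction of points scored, counting each draw as half a point."""
    games = wins + losses + draws
    return (wins + 0.5 * draws) / games


def elo_difference(score: float) -> float:
    """Elo gap implied by a score under the logistic model
    score = 1 / (1 + 10 ** (-diff / 400)); undefined at 0 or 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


if __name__ == "__main__":
    results = {
        "Komodo 10 - Fruit 2.1": (315, 44, 41),
        "Komodo 14 - Fruit 2.1": (347, 24, 29),
    }
    for name, (w, l, d) in results.items():
        s = match_score(w, l, d)
        print(f"{name}: {100 * s:.2f}% ({elo_difference(s):+.0f} Elo)")
```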
90% of coding is debugging, the other 10% is writing bugs.
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: Stockfish Handicap Matches

Post by chrisw »

Rebel wrote: Mon Jun 22, 2020 7:18 am
chrisw wrote: Mon Jun 22, 2020 12:47 am Updated. The test set is now a mix of positions with the g1 or b1 knight removed.

https://github.com/ChrisWhittington/Che ... t-odds.epd

Random sample:

Code:

rnb1kbnr/ppqppppp/8/2P5/8/8/P1PPPPPP/R1BQKBNR w KQkq - 1 3
rnbqkbnr/p1pppppp/8/1p6/8/1P4P1/P1PPPP1P/RNBQKB1R w KQkq - 0 3
rnbqkbnr/2pppppp/1p6/p7/8/2N1P3/PPPP1PPP/R1BQKB1R w KQkq - 0 3
rnbqkbnr/2pppppp/8/pp6/8/4P3/PPPP1PPP/1RBQKBNR w Kkq - 0 3
rnbqkb1r/pppppppp/8/8/3P2n1/8/PPPBPPPP/R2QKBNR w KQkq - 3 3
rnbqkbnr/ppp1p1pp/3p4/5p2/8/2P4N/PP1PPPPP/R1BQKB1R w KQkq - 0 3
rnbqkbnr/pp1ppppp/8/7Q/2p1P3/8/PPPP1PPP/RNB1KB1R w KQkq - 0 3
rnbqkbnr/p1p1pppp/8/1p1p4/3P4/2N5/PPP1PPPP/R1BQKB1R w KQkq - 0 3
rnbqkbnr/pppppp2/6p1/7p/8/6P1/PPPPPPBP/RNBQK2R w KQkq - 0 3
r1bqkbnr/p1pppppp/np6/8/7P/3P4/PPP1PPP1/RNBQKB1R w KQkq - 1 3
rnbqkbnr/p1p1pppp/3p4/1p6/7P/4P3/PPPP1PP1/R1BQKBNR w KQkq - 0 3
rnbqkbnr/1ppppp1p/p7/6p1/8/4P3/PPPPBPPP/R1BQK1NR w KQkq - 0 3
rnbqkbnr/p1pppp1p/1p6/6p1/8/PP6/2PPPPPP/RNBQKB1R w KQkq - 0 3
rnbqkbr1/pppppppp/7n/8/8/5PP1/PPPPP2P/R1BQKBNR w KQq - 1 3
r1bqkbnr/pppppppp/8/8/1n2PP2/8/PPPP2PP/R1BQKBNR w KQkq - 1 3
r1bqkbnr/ppppppp1/2n5/7p/7P/2N5/PPPPPPP1/R1BQKB1R w KQkq - 0 3
rnbqkbnr/pppppppp/8/8/5P2/3P4/PPP1P1PP/R1BQKBNR w KQkq - 1 3
rnbqkbnr/p1p1pppp/3p4/1p6/4P3/8/PPPPQPPP/RNB1KB1R w KQkq - 0 3
rnbqkb1r/p1pppppp/5n2/1p6/2P5/P7/1P1PPPPP/RNBQKB1R w KQkq - 0 3
rnbqkb1r/pp1ppppp/7n/2p5/1P6/8/PBPPPPPP/R2QKBNR w KQkq - 2 3
rnbqkb1r/1ppppppp/7n/p7/P7/N7/1PPPPPPP/R1BQKB1R w KQkq - 0 3
rnbqkbnr/pppppp2/7p/6p1/8/P5P1/1PPPPP1P/RNBQKB1R w KQkq - 0 3
rn1qkbnr/ppp1pppp/3pb3/8/2P5/5N2/PP1PPPPP/R1BQKB1R w KQkq - 1 3
rnbqkbnr/1ppppp1p/8/p5p1/P7/8/1PPPPPPP/RNBQKBR1 w Qkq - 0 3
rnbqkb1r/pppppp1p/5n2/6p1/2P5/7P/PP1PPPP1/R1BQKBNR w KQkq - 0 3
r1bqkbnr/ppppp1pp/n4p2/8/8/1P5P/P1PPPPP1/R1BQKBNR w KQkq - 0 3
rnb1kbnr/ppppqppp/4p3/8/8/2N4P/PPPPPPP1/R1BQKB1R w KQkq - 1 3
rnb1kbnr/pppp1ppp/4p3/8/3P3q/8/PPPBPPPP/RN1QKB1R w KQkq - 2 3
rnbqkbnr/ppp1pp1p/3p2p1/8/2PP4/8/PP2PPPP/RNBQKB1R w KQkq - 0 3
rnbqkbnr/ppp1p1pp/3p1p2/8/8/P2P4/1PP1PPPP/R1BQKBNR w KQkq - 0 3
rnbqkb1r/ppppp1pp/7n/5p2/P6P/8/1PPPPPP1/RNBQKB1R w KQkq - 0 3
rnbqkbnr/1pppppp1/8/p6p/4P3/5Q2/PPPP1PPP/R1B1KBNR w KQkq - 0 3
rnbqkbnr/p2ppppp/2p5/1p6/2P5/4P3/PP1P1PPP/RNBQKB1R w KQkq - 0 3
rnbqkbnr/pp1pp1pp/5p2/2p5/4P3/6P1/PPPP1P1P/R1BQKBNR w KQkq - 0 3
rnbqkbnr/1pppppp1/B6p/8/8/4P3/PPPP1PPP/R1BQK1NR w KQkq - 0 3
rnbqkb1r/1ppppppp/p6n/8/6P1/2P5/PP1PPP1P/RNBQKB1R w KQkq - 0 3
rnb1kbnr/pppp1ppp/8/4p3/3P3q/7N/PPP1PPPP/R1BQKB1R w KQkq - 1 3
rnbqkbnr/pp1ppp1p/2p5/6p1/3P2P1/8/PPP1PP1P/RNBQKB1R w KQkq - 0 3
r1bqkbnr/ppppppp1/n7/1B5p/4P3/8/PPPP1PPP/RNBQK2R w KQkq - 2 3
rnbqkbnr/pp1ppp1p/2p5/3P2p1/8/8/PPP1PPPP/RNBQKB1R w KQkq - 0 3
rnbqkb1r/ppp1pppp/7n/3p4/2P5/1Q6/PP1PPPPP/R1B1KBNR w KQkq - 0 3
r1bqkbnr/1ppppppp/2n5/p7/3P3P/8/PPP1PPP1/R1BQKBNR w KQkq - 1 3
rnbqk1nr/ppppppbp/8/6p1/P7/2P5/1P1PPPPP/RNBQKB1R w KQkq - 1 3
r1bqkbnr/pp1ppppp/n1p5/8/3P4/5P2/PPP1P1PP/R1BQKBNR w KQkq - 0 3
rnbqkbnr/pp1ppp1p/2p5/6p1/8/3P1P2/PPP1P1PP/R1BQKBNR w KQkq - 0 3
rnbqkbnr/pppp1p1p/4p3/6p1/2P2P2/8/PP1PP1PP/RNBQKB1R w KQkq - 0 3
r1bqkb1r/pppppppp/n4n2/8/4P3/6P1/PPPP1P1P/R1BQKBNR w KQkq - 1 3
rn1qkbnr/ppp1pppp/3pb3/8/3P4/4P3/PPP2PPP/R1BQKBNR w KQkq - 1 3
rnbqkbnr/p2ppppp/2p5/1p6/4P3/8/PPPP1PPP/R1BQKBNR w KQkq - 0 3
rnbqkbnr/1ppppp1p/p7/6pP/8/8/PPPPPPP1/R1BQKBNR w KQkq - 0 3
rnbqkbnr/ppppp2p/5pp1/8/3P4/5P2/PPP1P1PP/R1BQKBNR w KQkq - 0 3
rnbqkb1r/ppppp1pp/7n/5p2/4P3/6P1/PPPP1P1P/R1BQKBNR w KQkq - 0 3
rnbqkbnr/1ppp1ppp/4p3/p7/8/2P2N2/PP1PPPPP/R1BQKB1R w KQkq - 0 3
rnbqkbnr/pp1pp1pp/2p5/5p2/8/3P4/PPPBPPPP/RN1QKB1R w KQkq - 0 3
There is no download button for the file; can you mail it?
If you just click the file in GitHub, it will open, and then you can grab the data via cut and paste.
Otherwise, I think the clone button will grab everything.
chrisw

Re: Stockfish Handicap Matches

Post by chrisw »

lkaufman wrote: Mon Jun 22, 2020 5:13 am
chrisw wrote: Mon Jun 22, 2020 12:42 am
lkaufman wrote: Mon Jun 22, 2020 12:30 am
Rebel wrote: Mon Jun 22, 2020 12:09 am
chrisw wrote: Sun Jun 21, 2020 11:20 pm Done: starting from the start position minus the b1 knight, I played out all four-ply combinations, culled all duplicates, culled all positions where SF11 evaluated more than +/-10 centipawns away from 300 centipawns (the SF11 average score over all EPDs), and am now left with 5600 EPDs.

Link: https://github.com/ChrisWhittington/Che ... t-odds.epd

Will upload for no knight at g1 tomorrow am.

Small randomised sample below

Code:

rnbqkbnr/pppp2pp/5p2/4p3/4P2P/8/PPPP1PP1/RNBQKB1R w KQkq - 0 3
rnbqk1nr/pppp1ppp/3bp3/8/3P4/8/PPP1PPPP/RNBQKBR1 w Qkq - 2 3
rnbqkb1r/pppppp1p/6pn/8/3P2P1/8/PPP1PP1P/RNBQKB1R w KQkq - 0 3
rnbqkbnr/p1p1pppp/1p1p4/8/8/P1N5/1PPPPPPP/R1BQKB1R w KQkq - 0 3
rnbqkb1r/ppppnppp/8/4p3/8/2P3P1/PP1PPP1P/RNBQKB1R w KQkq - 1 3
rnbqkbnr/p1ppppp1/8/1p5p/8/1QP5/PP1PPPPP/RNB1KB1R w KQkq - 0 3
rnbqkbnr/pp1ppp1p/2p5/6p1/8/N4P2/PPPPP1PP/R1BQKB1R w KQkq - 0 3
rnbqkbnr/p1ppppp1/7p/1p6/1P6/P7/2PPPPPP/RNBQKB1R w KQkq - 0 3
rnbqkbnr/1p1ppppp/p7/2p5/2P5/4P3/PP1P1PPP/RNBQKB1R w KQkq - 0 3
rnbqkbnr/pppp1p1p/4p3/6p1/P7/3P4/1PP1PPPP/RNBQKB1R w KQkq - 0 3
rnbqkb1r/1ppppppp/p6n/8/1P6/2N5/P1PPPPPP/R1BQKB1R w KQkq - 1 3
r1bqkbnr/ppppp1pp/2n5/5p2/4P3/3P4/PPP2PPP/RNBQKB1R w KQkq - 0 3
rnbqkbnr/pppppp2/6p1/7p/3P4/2N5/PPP1PPPP/R1BQKB1R w KQkq - 0 3
r1bqkbnr/pppppp1p/n5p1/8/8/P6P/1PPPPPP1/RNBQKB1R w KQkq - 1 3
r1bqkbnr/ppppp1pp/2n2p2/8/8/P1N5/1PPPPPPP/R1BQKB1R w KQkq - 1 3
rnbqkb1r/pp1ppppp/7n/2p5/3P4/8/PPPBPPPP/RN1QKB1R w KQkq - 0 3
r1bqkbnr/pppppppp/8/n3P3/8/8/PPPP1PPP/RNBQKB1R w KQkq - 1 3
rnbqkb1r/pppppppp/8/6P1/4n3/8/PPPPPP1P/RNBQKB1R w KQkq - 1 3
rnbqkbr1/pppppppp/5n2/8/1P2P3/8/P1PP1PPP/RNBQKB1R w KQq - 1 3
rnbq1bnr/pppkpppp/8/3p4/8/NP6/P1PPPPPP/R1BQKB1R w KQ - 1 3
r1bqkbnr/ppppp1pp/n4p2/8/8/1P1P4/P1P1PPPP/RNBQKB1R w KQkq - 0 3
rnbqkbnr/1pppppp1/p6p/8/3P1B2/8/PPP1PPPP/RN1QKB1R w KQkq - 0 3
rnbqkb1r/pppppp1p/7n/6p1/5P2/8/PPPPP1PP/RNBQKBR1 w Qkq - 1 3
r1bqkbnr/pp1ppppp/n1p5/8/3P4/P7/1PP1PPPP/RNBQKB1R w KQkq - 1 3
r1bqkbnr/ppppp1pp/2n2p2/8/4P3/6P1/PPPP1P1P/RNBQKB1R w KQkq - 1 3
r1bqkbnr/ppppppp1/n6p/2P5/8/8/PP1PPPPP/RNBQKB1R w KQkq - 1 3
rnbqkbn1/pppppppr/7p/8/7P/1P6/P1PPPPP1/RNBQKB1R w KQq - 1 3
rnbqkbnr/ppp1p1pp/3p4/5p2/6P1/2P5/PP1PPP1P/RNBQKB1R w KQkq - 0 3
rnbqkb1r/pppp1ppp/4p2n/8/8/4P3/PPPPBPPP/RNBQK2R w KQkq - 2 3
rnbqkbnr/pppppp1p/8/8/6p1/P5P1/1PPPPP1P/RNBQKB1R w KQkq - 0 3
1nbqkbnr/1ppppppp/r7/p7/8/NP6/P1PPPPPP/R1BQKB1R w KQk - 2 3
r1bqkb1r/pppppppp/2n4n/8/8/2NP4/PPP1PPPP/R1BQKB1R w KQkq - 3 3
rnbqkbr1/pppppppp/5n2/8/8/P6P/1PPPPPP1/RNBQKB1R w KQq - 1 3
Cool... hopefully there is a solution for the cute-chess obstacle.
Something was really wrong,
Well. Results are results. The average SF eval for all the 4-ply positions, duplicates culled, is within a whisker of 300 centipawns at 25ms search, default SF11.

Check the EPDs listed: under similar search parameters they ought to give SF scores of -300 +/- 10.

I checked several positions and they showed scores of 4 pawns or more down, as does the initial knight odds position, not -300.
You checked the listed epds for variance from -300? Which ones? What was the variance? What SF11 conditions are you using?

If you just want under 20 positions, take off the b1 knight, choose the 3 best White moves by SF11 multiPV, choose the 3 best replies to each of those, and repeat with the g1 knight off. 18 positions, totally fair, no silly moves, real knight odds chess! If you want more, just choose the best four or five for each side.
Edit: Whoops, rechecked, the mean is indeed -293, but I was selecting EPDs based on -340, so you should find the listed EPDs evaluate at around -340 centipawns. Will correct the EPD dump tomorrow am.
OK, -340 is at least within range of what I was seeing. With fixed-depth searches in the range of what you were using I get evals like -390 or so, but I think fixed depth omits Contempt while movetime does not, so we're not so far apart if you add in Contempt. You were right to switch from 10 ms to 25 ms; SF is quite weak at 10 ms but strong enough at 25 ms, and the difference is huge. But I think it would be more useful to have fewer positions but no positions with ridiculous moves played, for example only including moves in the top ten by multiPV at each point. It doesn't seem like a simulation of knight odds if you force the players to play moves that no one rated over 800 would even consider.
Just uploaded the knight-odds EPDs for positions without the b1 or g1 knight.

Go to https://github.com/ChrisWhittington/Chess-EPDs and download knight-odds.epd.

They're better balanced around the correct mean (-293 centipawns), and restricted to all 4-ply positions where SF11 at 25 ms search returns a score of -293 +/- 10 centipawns. About 5000 unique positions in total.
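The selection described here (dedupe the 4-ply positions, then keep those whose SF11 score lies within +/- 10 centipawns of a reference) can be sketched in Python as below. It assumes the engine scores have already been computed (the 25 ms SF11 searches are not shown), and treating two EPDs as duplicates when their first four FEN fields match is an assumption, not necessarily how the posted set was culled.

```python
from statistics import mean
from typing import Iterable, List, Optional, Tuple


def position_key(epd: str) -> str:
    """Identify a position by its first four FEN fields (placement, side to
    move, castling, en passant), so transpositions that differ only in the
    move counters count as duplicates (an assumption)."""
    return " ".join(epd.split()[:4])


def select_balanced(
    scored: Iterable[Tuple[str, int]],
    window: int = 10,
    reference: Optional[float] = None,
) -> List[str]:
    """Keep unique positions whose score (centipawns, White's view) lies
    within +/- window of the reference. With no reference given, use the
    mean score of the unique positions, as in the post."""
    unique = {}
    for epd, cp in scored:
        unique.setdefault(position_key(epd), (epd, cp))  # first occurrence wins
    if not unique:
        return []
    if reference is None:
        reference = mean(cp for _, cp in unique.values())
    return [epd for epd, cp in unique.values() if abs(cp - reference) <= window]
```

The choice of `reference` is what decides which positions survive: centring the window on the mean, on a fixed value such as -293, or on the root position's own eval gives different sets.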

Random subset:

Code:

rnbqkb1r/p1pppppp/7n/1p6/3P4/1P6/P1P1PPPP/RNBQKB1R w KQkq - 0 3
r1bqkbnr/pppp1ppp/n7/4p3/3P4/5N2/PPP1PPPP/R1BQKB1R w KQkq - 0 3
rnbqk1nr/ppppbppp/4p3/8/3P4/5N2/PPP1PPPP/R1BQKB1R w KQkq - 2 3
r1bqkb1r/pppppppp/2n4n/1B6/4P3/8/PPPP1PPP/RNBQK2R w KQkq - 3 3
rnbqkbnr/pppppppp/8/8/8/5N1P/PPPPPPP1/R1BQKB1R w KQkq - 1 3
rnbqkb1r/p1pppppp/7n/1p6/4P3/5Q2/PPPP1PPP/R1B1KBNR w KQkq - 0 3
r1bqkbnr/ppppp1pp/2n5/5p2/4P3/2P5/PP1P1PPP/RNBQKB1R w KQkq - 0 3
rnbqkb1r/p1pppppp/1p5n/7Q/4P3/8/PPPP1PPP/RNB1KB1R w KQkq - 0 3
r1bqkbnr/ppppp1pp/n4p2/8/1P6/B7/P1PPPPPP/RN1QKB1R w KQkq - 2 3
r1bqkbnr/ppppp1pp/n4p2/8/2P5/P7/1P1PPPPP/R1BQKBNR w KQkq - 1 3
r1bqkbnr/p1pppppp/np6/P7/8/8/1PPPPPPP/R1BQKBNR w KQkq - 0 3
rnbqkbnr/pppp1p1p/4p3/6p1/7P/3P4/PPP1PPP1/RNBQKB1R w KQkq - 0 3
r1bqkbnr/ppppp1pp/n4p2/8/8/3P4/PPPBPPPP/RN1QKB1R w KQkq - 2 3
rnbqkb1r/1ppppppp/7n/p7/4P3/5P2/PPPP2PP/RNBQKB1R w KQkq - 0 3
r1bqkbnr/ppppp1pp/n4p2/1B6/4P3/8/PPPP1PPP/RNBQK2R w KQkq - 2 3
r1bqkbnr/ppppp1pp/2n2p2/8/8/P1N5/1PPPPPPP/R1BQKB1R w KQkq - 0 3
1nbqkbnr/rppppppp/8/p7/8/3BP3/PPPP1PPP/RNBQK2R w KQk - 2 3
rnbqkbn1/pppppppr/8/7p/8/2N3P1/PPPPPP1P/R1BQKB1R w KQq - 2 3
r1bqkbnr/ppppp1pp/n4p2/8/4P3/6P1/PPPP1P1P/R1BQKBNR w KQkq - 0 3
rnbq1bnr/pppkpppp/8/3p4/8/4PP2/PPPP2PP/R1BQKBNR w KQ - 1 3
rnbq1bnr/pppkpppp/8/3p4/8/4P1P1/PPPP1P1P/R1BQKBNR w KQ - 1 3
rn1qkbnr/p1pppppp/b7/1p6/1P6/4P3/P1PP1PPP/RNBQKB1R w KQkq - 1 3
rnbqkbnr/1ppppp1p/8/p5p1/Q7/2P5/PP1PPPPP/RNB1KB1R w KQkq - 0 3
r1bqkbnr/ppppppp1/n7/4P2p/8/8/PPPP1PPP/R1BQKBNR w KQkq - 0 3
rnbqkbnr/p1ppppp1/7p/1P6/8/8/PP1PPPPP/R1BQKBNR w KQkq - 0 3
rnbqkbnr/ppppp2p/5p2/6p1/3P4/4B3/PPP1PPPP/RN1QKB1R w KQkq - 0 3
r1bqkbnr/pppppp1p/n7/6p1/8/1PN5/P1PPPPPP/R1BQKB1R w KQkq - 1 3
r1bqkbnr/p1pppppp/2n5/1p6/1P2P3/8/P1PP1PPP/R1BQKBNR w KQkq - 1 3
rnbqkb1r/ppppp1pp/7n/5p2/7P/3P4/PPP1PPP1/RNBQKB1R w KQkq - 0 3
rnbqkb1r/p1pppppp/7n/1p6/2P5/7P/PP1PPPP1/R1BQKBNR w KQkq - 0 3
rnbqk1nr/pppp1ppp/8/2b1p3/8/P4N2/1PPPPPPP/R1BQKB1R w KQkq - 2 3
It's swings and roundabouts on position selection. If we used your method (which I'm not going to, because there's a limit to how much time I'm prepared to put into this), we'd get, at four ply, 10x10x10x10 = 10000 positions, as opposed to the actual number from brute force, which was 170000 unique, with 5000 within +/- 10 centipawns of the mean. So your 10000 would be guaranteed to contain positions whose scores have already diverged beyond the +/- 10 centipawn window.

If we recap the objective, it was to have a good number of test positions (which, for me, includes the possibility of mass testing thousands of games at bullet/blitz) with good variance, no duplicates, no great difference from a one-knight handicap, and as close as feasible to the start position. The selection method is entirely without bias: it doesn't matter HOW the positions were arrived at, just as long as they are close to the initial position and neither engine gets a head start in any position. That's done. Arguably perfectly. The engines are then left to fight it out from unbiased, wide, close-to-root start positions.
Your argument for 10-wide selection at 4 ply is going to produce fewer positions, many of which will already be knight odds plus something way more than 10 centipawns because of the way your method chooses, and therefore, I would argue, it satisfies the original objective less well.
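For comparison, the top-k (multiPV) expansion proposed in the thread can be sketched generically in Python. The `ranked_moves` hook stands in for an engine multiPV search and is hypothetical; the toy game below only checks the counting. Following k moves per ply for n plies gives at most k^n leaves (fewer in real chess once transpositions are merged), which is where 10x10x10x10 = 10000 comes from, and 3 x 3 for each of the two missing knights gives the 18 positions suggested earlier.

```python
from typing import Callable, List, Sequence, TypeVar

P = TypeVar("P")  # position type
M = TypeVar("M")  # move type


def expand_top_k(
    root: P,
    plies: int,
    k: int,
    ranked_moves: Callable[[P], Sequence[M]],
    play: Callable[[P, M], P],
) -> List[P]:
    """Return every position reached by following only the k best-ranked
    moves at each ply. ranked_moves(pos) must return moves best-first,
    e.g. from a multiPV search (hypothetical hook, not shown here)."""
    frontier = [root]
    for _ in range(plies):
        frontier = [play(pos, m) for pos in frontier for m in ranked_moves(pos)[:k]]
    return frontier


if __name__ == "__main__":
    # Toy game: 20 "moves" everywhere, a position is just the move sequence.
    def toy_moves(pos):
        return list(range(20))

    def toy_play(pos, move):
        return pos + (move,)

    print(len(expand_top_k((), 4, 10, toy_moves, toy_play)))     # 10000
    print(2 * len(expand_top_k((), 2, 3, toy_moves, toy_play)))  # 18
```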

Anyway, since everything now appears to work, I'll leave this thing running and produce a few thousand of each: bishop odds, rook odds and queen odds. Can SF11 beat Fruit at queen odds? That'd be fun. Personally I doubt it, but we'll see.
xr_a_y
Posts: 1871
Joined: Sat Nov 25, 2017 2:28 pm
Location: France

Re: Stockfish Handicap Matches

Post by xr_a_y »

I tried something this morning at a short TC (10+0.1): Minic versus Stockfish with odds, 100 games each time.
The funny thing is that with knight and color odds Minic won easily, and it also won quite easily with knight odds alone.
But with pawn odds (with or without color), castling odds, or even g-pawn + castling + color odds, Minic is losing badly.

What is a good start configuration between knight odds and g-pawn + castling + color odds?
b- and g-pawns + castling + color odds?
chrisw

Re: Stockfish Handicap Matches

Post by chrisw »

added:
queen-odds
rook-odds

working on:
pawn-f2
queen-for-rook
queen-for-nite
no castling
bishop-odds

https://github.com/ChrisWhittington/Chess-EPDs
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Stockfish Handicap Matches

Post by lkaufman »

chrisw wrote: Mon Jun 22, 2020 9:20 am
They're better balanced around the correct mean (-293 centipawns), and restricted to all 4-ply positions where SF11 at 25ms search returns a score of -293 +/- 10 centipawns. About 5000 unique positions in total.
I'm afraid it's clear that you made a typo or misread a number. You give the eval for the position with g1 off as -2.93 after 25 ms, which you use for your sample. I ran the same position and got -3.94. Of course computers aren't all the same speed, so some variation is to be expected, but the score only fluctuates by a few centipawns. It's pretty obvious that when you ran it you got -3.93 but either misread or mistyped the 3 as a 2. The result is that you have a bunch of positions where Black has played blunders that lose a pawn (or a similar positional score) to drop from a 3.93 edge to a 2.93 edge. So nowhere near knight odds! For the b1-off position I got -3.73; not sure what you got for that position.
Komodo rules!
lkaufman

Re: Stockfish Handicap Matches

Post by lkaufman »

Rebel wrote: Mon Jun 22, 2020 8:05 am @Larry - Ok, for the moment we do it the classic way with repeating colors. I took the first 200 positions from Chris set, so 400 games, tc=40/10.

Code:

Komodo 10 - Fruit 2.1 83.9% WLD 315 44 41
Komodo 14 - Fruit 2.1 90.4% WLD 347 24 29
This is how you want to measure progress, isn't it?
Yes, but it looks like you are having each engine play the White side once, which is a waste of resources when the engines are far apart in strength. Presumably the stronger engine wins every game in which it plays Black a piece up, so it's easy enough to subtract 200 from the stronger engine's wins to see the results when it gives knight odds. I think the reason Komodo still beats Fruit even with those 200 removed is that there was a flaw in the reference value used (see prior post), so these positions weren't even close to knight odds. With everything done right, I expect Fruit to come out ahead of both Stockfish and Komodo at knight odds.
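The subtraction suggested here can be written out explicitly. A minimal Python sketch, assuming (as presumed in the post) that the stronger engine won all 200 games in which it had the extra knight, so every loss and draw came from the 200 games in which it gave the knight:

```python
def knight_odds_half(wins: int, losses: int, draws: int, knight_up_games: int):
    """Strip out the games where the stronger engine was the side a knight up,
    assuming it won every one of them. Returns (wins, losses, draws, score)
    for the knight-odds games only, with draws counted as half a point."""
    w = wins - knight_up_games
    if w < 0:
        raise ValueError("fewer wins than knight-up games: presumption fails")
    games = w + losses + draws
    return w, losses, draws, (w + 0.5 * draws) / games


for name, wld in {"Komodo 10": (315, 44, 41), "Komodo 14": (347, 24, 29)}.items():
    w, l, d, s = knight_odds_half(*wld, knight_up_games=200)
    print(f"{name} giving knight odds vs Fruit 2.1: +{w} -{l} ={d}, {100 * s:.2f}%")
```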
lkaufman

Re: Stockfish Handicap Matches

Post by lkaufman »

xr_a_y wrote: Mon Jun 22, 2020 10:57 am I tried something this morning at short TC (10+0.1), Minic versus Stockfish with odds. Each time 100 games.
The funny thing is that with knight and color odds Minic won easily, also won quite easily with only knight odds.
But with pawn odd (with or without color) or castling odds, or even g pawn + castling + color odds Minic is badly losing.

What is a good start configuration between knight odds and g pawn + castling + color odds ?
b and g pawn + castling + color odds ?
The best handicap progression in my opinion is: c2; f2; c7; f7; c2 + f2; c7 + f7; knight (b1 or g1); knight and move (b8 or g8); rook a1; rook and move (a8). Beyond that, just repeat the cycle with the queen's rook removed as well; when you get to what would be two rooks, substitute queen odds. The bishop's pawns are the most suitable for handicaps: their removal has the least impact on development and doesn't create isolated pawns. Historically only the "f" pawn was used, but for a proper progression we need to use the "c" pawn as well. The gaps between c7 + f7 and knight, and between knight and move and rook, are a bit wide; it is possible to insert an extra handicap in between (i.e. knight or rook odds with an extra move for White).
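The progression above is easy to turn into start positions. A minimal Python sketch (not from the thread) that removes pieces from the standard starting FEN and drops any castling right whose rook is gone. The knight entries use b1/b8 purely as an example, and White stays to move in every case, so for the "and move" handicaps the stronger engine presumably just takes Black.

```python
START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"

# Removing a rook from its home square forfeits that castling right.
ROOK_HOME_CASTLING = {"a1": "Q", "h1": "K", "a8": "q", "h8": "k"}


def _expand(rank: str) -> list:
    """'PP1PPPPP' -> ['P', 'P', '.', 'P', ...], one cell per file."""
    cells = []
    for ch in rank:
        cells.extend("." * int(ch) if ch.isdigit() else ch)
    return cells


def _compress(cells: list) -> str:
    """Inverse of _expand: runs of empty squares become digits."""
    out, run = "", 0
    for c in cells:
        if c == ".":
            run += 1
            continue
        if run:
            out += str(run)
            run = 0
        out += c
    return out + (str(run) if run else "")


def handicap_fen(squares, fen: str = START_FEN) -> str:
    """Return fen with the pieces on the given squares (e.g. 'c2', 'b1')
    removed and castling rights adjusted."""
    placement, side, castling, ep, halfmove, fullmove = fen.split()
    ranks = [_expand(r) for r in placement.split("/")]  # ranks[0] is rank 8
    for sq in squares:
        file_idx, rank_idx = ord(sq[0]) - ord("a"), 8 - int(sq[1])
        if ranks[rank_idx][file_idx] == ".":
            raise ValueError(f"no piece on {sq}")
        ranks[rank_idx][file_idx] = "."
        if sq in ROOK_HOME_CASTLING:
            castling = castling.replace(ROOK_HOME_CASTLING[sq], "")
    placement = "/".join(_compress(r) for r in ranks)
    return " ".join([placement, side, castling or "-", ep, halfmove, fullmove])


PROGRESSION = [["c2"], ["f2"], ["c7"], ["f7"], ["c2", "f2"], ["c7", "f7"],
               ["b1"], ["b8"], ["a1"], ["a8"]]

for removed in PROGRESSION:
    print(" + ".join(removed), handicap_fen(removed))
```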
Rebel

Re: Stockfish Handicap Matches

Post by Rebel »

lkaufman wrote: Mon Jun 22, 2020 5:30 pm
Yes, but it looks like you are having each engine play the White side once, which is a waste of resources when the engines are far apart in strength. Presumably the stronger engine wins every game with Black a piece up, so it's easy enough to subtract 200 from the wins for the stronger engine to see the results when giving knight odds. I think that the reason that Komodo still beats fruit even with the 200 removed is that there was a flaw in the reference value used (see prior post), so these positions weren't even close to knight odds. With everything done right, I expect Fruit to come out ahead against both Stockfish and Komodo at knight odds.
1. I don't see the purpose of giving Komodo the advantage of being a knight up against Fruit. So only 200 games (Fruit always a knight up), now that Ferdy has fixed the cute-chess obstacle. But of course you can do it yourself.

2. My guess would be that Fruit would do a lot better at a longer time control, 40/60 instead of 40/10; 40/10 is not a good time control for the oldies. I will run it now, stay tuned.
lkaufman

Re: Stockfish Handicap Matches

Post by lkaufman »

Rebel wrote: Mon Jun 22, 2020 6:35 pm
1. I don't see the purpose of giving Komodo the advantage of a knight up against Fruit. So only 200 games (Fruit always knight up) now that Ferdy fixed the cute-chess obstacle. But of course you can do it yourself.

2. My guess would be that Fruit would do a lot better at longer time control, 40/60 instead of 40/10, that's not a good time control for the oldies. I will run it now, stay tuned.
Yes, 1. was my point: it was wasting resources. And yes, the weaker engine always does better in handicap games with more time. But the main problem is that the positions are a full pawn away from knight odds. With a corrected set, I'd bet on Fruit even at 40/10. A 100-centipawn difference in the initial position is huge.