Well, you should have been able to determine from the data line

lkaufman wrote: Mon Jun 22, 2020 11:36 pm
Okay, I guess my only objection now is that when pruning the final positions, it would be better to prune based on the Stockfish eval (at some reasonable depth, maybe ten seconds or so) of the initial handicap position, rather than on the average of a bunch of positions that include many blunders, even queen blunders, mostly by Black, because White is twice as likely to have developed something to take the blundered piece. A piece is worth about four pawns in the opening according to theory, to Stockfish, to Komodo, etc. So if the average eval is around three pawns, it means that Black is blundering about a full pawn on average. So the positions don't have the quality of having been shown to retain the full knight advantage, only about a knight for a pawn on average. This is quite surprising; I wouldn't have anticipated this large a bias from the blunders in only four plies.

chrisw wrote: Mon Jun 22, 2020 9:44 pm
Well, you thought wrong. Helpful advice when trying to decode other people: I am not you.

lkaufman wrote: Mon Jun 22, 2020 9:15 pm
I wasn't trying to insult; I thought you had just run the initial position at knight odds to determine the score to center on, and since it was off by a full pawn, it looked like a typo.

chrisw wrote: Mon Jun 22, 2020 8:42 pm
It's a bit cheap throwing out insults about typos or misreading or whatever, when actually you are arguing method (albeit by other means).

lkaufman wrote: Mon Jun 22, 2020 5:23 pm
chrisw wrote: Mon Jun 22, 2020 9:20 am
lkaufman wrote: Mon Jun 22, 2020 5:13 am
I'm afraid it's clear that you made a typo or misread a number. You give the eval for the position with g1 off as -2.93 after 25 ms, which you use for your sample. I ran the same position and got -3.94. Of course computers aren't all the same speed, so some variation is to be expected, but the score only fluctuates by a few centipawns.
It's pretty obvious that when you ran it you got -3.93 but either misread or mistyped the 3 as a 2. So the result is that you have a bunch of positions where Black is playing blunders that lose a pawn or a similar positional score, dropping Black's edge from 3.93 to 2.93. So nowhere near knight odds! For the b1-off position I got -3.73. Not sure what you got for that position.

chrisw wrote: Mon Jun 22, 2020 12:42 am
Uploaded just now the knight-odds EPDs for positions without b1/g1 knights.

lkaufman wrote: Mon Jun 22, 2020 12:30 am
OK, -340 is at least within range of what I was seeing. With fixed-depth searches in the range of what you were using I get evals like -390 or so, but I think fixed depth omits Contempt while movetime does not, so we're not so far apart if you add in Contempt. You were right to switch from 10 ms to 25 ms; SF is quite weak at 10 ms but strong enough at 25 ms, and the difference is huge. But I think it would be more useful to have fewer positions but no positions with ridiculous moves played, for example by only including moves in the top ten by MultiPV at each point. It doesn't seem like a simulation of knight odds if you force the players to play moves that no one rated over 800 would even consider.

Rebel wrote: Mon Jun 22, 2020 12:09 am
You checked the listed EPDs for variance from -300? Which ones? What was the variance? What SF11 conditions are you using?

chrisw wrote: Sun Jun 21, 2020 11:20 pm
Done 5600 EPDs off the start position minus the b1 knight: played out all four-ply combinations, culled all duplicates, culled all positions where SF11 evaluated more than +/-10 centipawns away from 300 centipawns (the SF11 average score for all EPDs), and am now left with 5600 EPDs.
Link: https://github.com/ChrisWhittington/Che ... t-odds.epd
Will upload for no knight at g1 tomorrow am.
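The culling step described above (deduplicate the 4-ply positions, then keep only those whose SF11 score lies within +/-10 centipawns of the mean of all scores) can be sketched in Python as below. This is a minimal illustration, not the script actually used. It assumes the engine scores have already been computed, and the dedup key (the first four FEN fields, ignoring the move counters) is also an assumption.

```python
from statistics import fmean
from typing import Dict, Iterable, List, Tuple


def epd_key(fen: str) -> str:
    """Dedup key (an assumption): board, side to move, castling and en passant,
    i.e. the FEN without the halfmove/fullmove counters."""
    return " ".join(fen.split()[:4])


def select_centered(
    scored: Iterable[Tuple[str, int]], window_cp: int = 10
) -> Tuple[float, List[str]]:
    """Deduplicate (fen, centipawn) pairs, centre on the mean of all unique
    scores, and keep the positions within +/- window_cp of that mean."""
    unique: Dict[str, int] = {}
    for fen, cp in scored:
        # First score seen wins when the same position is reached twice.
        unique.setdefault(epd_key(fen), cp)
    center = fmean(unique.values())
    kept = [key for key, cp in unique.items() if abs(cp - center) <= window_cp]
    return center, kept
```

With the real data, `scored` would be every 4-ply position from the knight-less start, scored by SF11 at 25 ms. That engine step is not shown here.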
I checked several positions and they showed scores of 4 pawns or more down, as does the initial knight odds position, not -300.
Edit: Whoops, rechecked, the mean is indeed -293, but I was selecting EPDs based on -340, so you should find the listed EPDs evaluate at around -340 centipawns. Will correct the EPD dump tomorrow am.
If you just want under 20 positions, take off the b1 knight, choose the 3 best White moves by SF11 MultiPV, choose the 3 best replies to each of those, and repeat with the g1 knight off. 18 positions, totally fair, no silly moves, real knight-odds chess! If you want more, just choose the best four or best five for each side.
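A "best k moves per ply" selection grows as roots × k^plies before transpositions are merged. Here that is 2 knight-less roots × 3 × 3 = 18, and a 10-wide, 4-ply version from one root gives 10 × 10 × 10 × 10 = 10000. The sketch below is minimal. `ranked_children` is a hypothetical stand-in for an SF11 MultiPV ranking (no engine is called), and `toy_children` is an invented tree with no transpositions, used only to show the counting.

```python
from typing import Callable, Hashable, Iterable, List, Set


def expand_top_k(
    roots: Iterable[Hashable],
    ranked_children: Callable[[Hashable], List[Hashable]],
    k: int,
    plies: int,
) -> Set[Hashable]:
    """Follow only the k best-ranked children at every ply; the set merges
    transpositions, so the result has at most len(roots) * k**plies items."""
    frontier: Set[Hashable] = set(roots)
    for _ in range(plies):
        frontier = {child for pos in frontier for child in ranked_children(pos)[:k]}
    return frontier


def toy_children(pos: tuple) -> List[tuple]:
    # Hypothetical stand-in for a MultiPV ranking: 20 "moves" per position,
    # already sorted best-first, and no transpositions.
    return [pos + (i,) for i in range(20)]


if __name__ == "__main__":
    knightless = [("b1-off",), ("g1-off",)]
    print(len(expand_top_k(knightless, toy_children, k=3, plies=2)))  # 18
    print(len(expand_top_k([("b1-off",)], toy_children, k=10, plies=4)))  # 10000
```

In real chess, move-order transpositions would make the distinct count smaller than these upper bounds.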
https://github.com/ChrisWhittington/Chess-EPDs and download knight-odds.epd
They're better balanced around the correct mean (-293 centipawns), and restricted to all 4-ply positions where SF11 at a 25 ms search returns a score of -293 +/- 10 centipawns. About 5000 unique positions in total.
Random subset:

Code:
rnbqkb1r/p1pppppp/7n/1p6/3P4/1P6/P1P1PPPP/RNBQKB1R w KQkq - 0 3
r1bqkbnr/pppp1ppp/n7/4p3/3P4/5N2/PPP1PPPP/R1BQKB1R w KQkq - 0 3
rnbqk1nr/ppppbppp/4p3/8/3P4/5N2/PPP1PPPP/R1BQKB1R w KQkq - 2 3
r1bqkb1r/pppppppp/2n4n/1B6/4P3/8/PPPP1PPP/RNBQK2R w KQkq - 3 3
rnbqkbnr/pppppppp/8/8/8/5N1P/PPPPPPP1/R1BQKB1R w KQkq - 1 3
rnbqkb1r/p1pppppp/7n/1p6/4P3/5Q2/PPPP1PPP/R1B1KBNR w KQkq - 0 3
r1bqkbnr/ppppp1pp/2n5/5p2/4P3/2P5/PP1P1PPP/RNBQKB1R w KQkq - 0 3
rnbqkb1r/p1pppppp/1p5n/7Q/4P3/8/PPPP1PPP/RNB1KB1R w KQkq - 0 3
r1bqkbnr/ppppp1pp/n4p2/8/1P6/B7/P1PPPPPP/RN1QKB1R w KQkq - 2 3
r1bqkbnr/ppppp1pp/n4p2/8/2P5/P7/1P1PPPPP/R1BQKBNR w KQkq - 1 3
r1bqkbnr/p1pppppp/np6/P7/8/8/1PPPPPPP/R1BQKBNR w KQkq - 0 3
rnbqkbnr/pppp1p1p/4p3/6p1/7P/3P4/PPP1PPP1/RNBQKB1R w KQkq - 0 3
r1bqkbnr/ppppp1pp/n4p2/8/8/3P4/PPPBPPPP/RN1QKB1R w KQkq - 2 3
rnbqkb1r/1ppppppp/7n/p7/4P3/5P2/PPPP2PP/RNBQKB1R w KQkq - 0 3
r1bqkbnr/ppppp1pp/n4p2/1B6/4P3/8/PPPP1PPP/RNBQK2R w KQkq - 2 3
r1bqkbnr/ppppp1pp/2n2p2/8/8/P1N5/1PPPPPPP/R1BQKB1R w KQkq - 0 3
1nbqkbnr/rppppppp/8/p7/8/3BP3/PPPP1PPP/RNBQK2R w KQk - 2 3
rnbqkbn1/pppppppr/8/7p/8/2N3P1/PPPPPP1P/R1BQKB1R w KQq - 2 3
r1bqkbnr/ppppp1pp/n4p2/8/4P3/6P1/PPPP1P1P/R1BQKBNR w KQkq - 0 3
rnbq1bnr/pppkpppp/8/3p4/8/4PP2/PPPP2PP/R1BQKBNR w KQ - 1 3
rnbq1bnr/pppkpppp/8/3p4/8/4P1P1/PPPP1P1P/R1BQKBNR w KQ - 1 3
rn1qkbnr/p1pppppp/b7/1p6/1P6/4P3/P1PP1PPP/RNBQKB1R w KQkq - 1 3
rnbqkbnr/1ppppp1p/8/p5p1/Q7/2P5/PP1PPPPP/RNB1KB1R w KQkq - 0 3
r1bqkbnr/ppppppp1/n7/4P2p/8/8/PPPP1PPP/R1BQKBNR w KQkq - 0 3
rnbqkbnr/p1ppppp1/7p/1P6/8/8/PP1PPPPP/R1BQKBNR w KQkq - 0 3
rnbqkbnr/ppppp2p/5p2/6p1/3P4/4B3/PPP1PPPP/RN1QKB1R w KQkq - 0 3
r1bqkbnr/pppppp1p/n7/6p1/8/1PN5/P1PPPPPP/R1BQKB1R w KQkq - 1 3
r1bqkbnr/p1pppppp/2n5/1p6/1P2P3/8/P1PP1PPP/R1BQKBNR w KQkq - 1 3
rnbqkb1r/ppppp1pp/7n/5p2/7P/3P4/PPP1PPP1/RNBQKB1R w KQkq - 0 3
rnbqkb1r/p1pppppp/7n/1p6/2P5/7P/PP1PPPP1/R1BQKBNR w KQkq - 0 3
rnbqk1nr/pppp1ppp/8/2b1p3/8/P4N2/1PPPPPPP/R1BQKB1R w KQkq - 2 3

It's swings and roundabouts on position selection. If we did your method (which I'm not going to, because there's a limit to how much time I'm prepared to put in on this), we'd get, at four ply, 10x10x10x10 = 10000 positions, as opposed to the actual number from brute force, which was 170000 unique, with 5000 within +/- 10 centipawns of the mean. So your 10000 would be guaranteed to contain positions already divergent in score beyond the +/- 10 centipawn window. To recap the objective: it was to have a good number of test positions (which, for me, includes the possibility to do mass testing of 1000s of games at bullet/blitz) with good variance, no duplicates, no great difference from a one-knight handicap, and as close as feasible to the start position. The selection method is entirely without bias; it doesn't matter HOW the positions were arrived at, just as long as they are close to the initial position and neither engine gets a head start in any position. That's done. Arguably perfectly. Engines are then left to fight it out from unbiased, wide, close-to-root start positions.
Your argument to use 10-wide selections for 4 plies is going to produce fewer positions, many of which are going to be already knight odds plus something way more than 10 centipawns because of the way your method chooses, and it will therefore, I would argue, satisfy the original objective less well.
Anyway, since everything now appears to work, I'll leave this thing running and produce a few thousand of each: bishop odds, rook odds and queen odds. Can SF11 beat Fruit at queen odds? That'd be fun. Personally I doubt it, but we'll see.
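For the other handicap suites, only the White piece removed from the standard start position changes before the 4-ply expansion. The sketch below is minimal and rests on two assumptions. It uses the conventional squares (c1 or f1 for bishop odds, a1 for rook odds, d1 for queen odds), since the post does not say which were used. It also drops the matching castling right when a rook is removed. `remove_white_piece` is a hypothetical helper, not part of any library.

```python
START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"


def remove_white_piece(fen: str, square: str) -> str:
    """Return `fen` with the White piece on `square` (e.g. "b1") removed.

    Hypothetical helper for building odds start positions; it only adjusts
    castling rights for White's rooks on a1/h1."""
    board, side, castling, ep, halfmove, fullmove = fen.split()
    ranks = board.split("/")
    file_idx = "abcdefgh".index(square[0])
    rank_idx = 8 - int(square[1])  # FEN lists rank 8 first

    # Expand digit runs into '.' so the rank is exactly 8 cells.
    cells = []
    for ch in ranks[rank_idx]:
        cells.extend("." * int(ch) if ch.isdigit() else ch)
    if not cells[file_idx].isupper():
        raise ValueError(f"no White piece on {square}")
    cells[file_idx] = "."

    # Re-compress runs of empty cells back into digits.
    packed, run = "", 0
    for ch in cells:
        if ch == ".":
            run += 1
            continue
        if run:
            packed += str(run)
            run = 0
        packed += ch
    if run:
        packed += str(run)
    ranks[rank_idx] = packed

    # A removed rook can no longer castle on that side.
    if square == "a1":
        castling = castling.replace("Q", "")
    elif square == "h1":
        castling = castling.replace("K", "")
    castling = castling or "-"

    return " ".join(["/".join(ranks), side, castling, ep, halfmove, fullmove])


ODDS_SQUARES = {"knight": ["b1", "g1"], "bishop": ["c1", "f1"], "rook": ["a1"], "queen": ["d1"]}

for name, squares in ODDS_SQUARES.items():
    for sq in squares:
        print(name, sq, remove_white_piece(START_FEN, sq))
```

The b1-off and g1-off results have the same back ranks (R1BQKBNR and RNBQKB1R) seen in the knight-odds EPD lists.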
The figures back from SF are accurate, I just rechecked them. No typos, no misreading.
I guess you decided knight-odds games are 3.73 on the basis that you put the start position into SF and asked for a score? Sure, SF will find the supposed best line and evaluate it.
I’m doing something different: I am asking SF to evaluate every single position that arises from the start position after four plies. Several tens of thousands of positions where each side has had the same move opportunities (two moves each) to make boobies or brilliancies. The net effect of all these thousands of moves is to generate thousands of positions, each then evaluated by SF, with a mean eval of about -3.00 pawns. To be fair to both Black and White, I then took everything that centred on that -300 centipawns, about 10% or so of the total.
You’re saying that’s wrong because according to SF at the root position, knight odds = -3.87, and because nobody would play the 4-ply move sequences. Well, so what? You lost sight of the objective: generate a large unbiased set of positions to evaluate how different engines get on with “knight odds”; generate positions close to the root; generate positions where material and a further SF search show that neither side’s chances changed much from their chances at root zero.
Well, that’s done. Thanks me very much. It’s a pleasure. No problem at all. Have fun with them.
Just for fun, and not really wasting my time re-proofing things, here are the SF11 500 ms evaluations of the first 100 positions in the knight-odds suites, below.
For that subset, Mean = -320.2, st deviation = 29.5. Looks like a fine result from here.
Code:
Loading epds knight-odds.epd SF11, default conditions, movetime 500
rnbqkb1r/p1pppppp/7n/1p6/3P4/1P6/P1P1PPPP/RNBQKB1R w KQkq - 0 3 eval=-314 nodes=835263
r1bqkbnr/pppp1ppp/n7/4p3/3P4/5N2/PPP1PPPP/R1BQKB1R w KQkq - 0 3 eval=-284 nodes=843868
rnbqk1nr/ppppbppp/4p3/8/3P4/5N2/PPP1PPPP/R1BQKB1R w KQkq - 2 3 eval=-376 nodes=948424
r1bqkb1r/pppppppp/2n4n/1B6/4P3/8/PPPP1PPP/RNBQK2R w KQkq - 3 3 eval=-364 nodes=911584
rnbqkbnr/pppppppp/8/8/8/5N1P/PPPPPPP1/R1BQKB1R w KQkq - 1 3 eval=-356 nodes=924862
rnbqkb1r/p1pppppp/7n/1p6/4P3/5Q2/PPPP1PPP/R1B1KBNR w KQkq - 0 3 eval=-248 nodes=877730
r1bqkbnr/ppppp1pp/2n5/5p2/4P3/2P5/PP1P1PPP/RNBQKB1R w KQkq - 0 3 eval=-311 nodes=924917
rnbqkb1r/p1pppppp/1p5n/7Q/4P3/8/PPPP1PPP/RNB1KB1R w KQkq - 0 3 eval=-354 nodes=888675
r1bqkbnr/ppppp1pp/n4p2/8/1P6/B7/P1PPPPPP/RN1QKB1R w KQkq - 2 3 eval=-344 nodes=892030
r1bqkbnr/ppppp1pp/n4p2/8/2P5/P7/1P1PPPPP/R1BQKBNR w KQkq - 1 3 eval=-336 nodes=861229
r1bqkbnr/p1pppppp/np6/P7/8/8/1PPPPPPP/R1BQKBNR w KQkq - 0 3 eval=-352 nodes=862023
rnbqkbnr/pppp1p1p/4p3/6p1/7P/3P4/PPP1PPP1/RNBQKB1R w KQkq - 0 3 eval=-253 nodes=904501
r1bqkbnr/ppppp1pp/n4p2/8/8/3P4/PPPBPPPP/RN1QKB1R w KQkq - 2 3 eval=-297 nodes=880860
rnbqkb1r/1ppppppp/7n/p7/4P3/5P2/PPPP2PP/RNBQKB1R w KQkq - 0 3 eval=-321 nodes=909321
r1bqkbnr/ppppp1pp/n4p2/1B6/4P3/8/PPPP1PPP/RNBQK2R w KQkq - 2 3 eval=-284 nodes=954091
r1bqkbnr/ppppp1pp/2n2p2/8/8/P1N5/1PPPPPPP/R1BQKB1R w KQkq - 0 3 eval=-328 nodes=881852
1nbqkbnr/rppppppp/8/p7/8/3BP3/PPPP1PPP/RNBQK2R w KQk - 2 3 eval=-384 nodes=950046
rnbqkbn1/pppppppr/8/7p/8/2N3P1/PPPPPP1P/R1BQKB1R w KQq - 2 3 eval=-325 nodes=882350
r1bqkbnr/ppppp1pp/n4p2/8/4P3/6P1/PPPP1P1P/R1BQKBNR w KQkq - 0 3 eval=-307 nodes=906494
rnbq1bnr/pppkpppp/8/3p4/8/4PP2/PPPP2PP/R1BQKBNR w KQ - 1 3 eval=-300 nodes=954439
rnbq1bnr/pppkpppp/8/3p4/8/4P1P1/PPPP1P1P/R1BQKBNR w KQ - 1 3 eval=-311 nodes=932958
rn1qkbnr/p1pppppp/b7/1p6/1P6/4P3/P1PP1PPP/RNBQKB1R w KQkq - 1 3 eval=-317 nodes=961294
rnbqkbnr/1ppppp1p/8/p5p1/Q7/2P5/PP1PPPPP/RNB1KB1R w KQkq - 0 3 eval=-272 nodes=917542
r1bqkbnr/ppppppp1/n7/4P2p/8/8/PPPP1PPP/R1BQKBNR w KQkq - 0 3 eval=-318 nodes=928039
rnbqkbnr/p1ppppp1/7p/1P6/8/8/PP1PPPPP/R1BQKBNR w KQkq - 0 3 eval=-350 nodes=920456
rnbqkbnr/ppppp2p/5p2/6p1/3P4/4B3/PPP1PPPP/RN1QKB1R w KQkq - 0 3 eval=-300 nodes=894388
r1bqkbnr/pppppp1p/n7/6p1/8/1PN5/P1PPPPPP/R1BQKB1R w KQkq - 1 3 eval=-302 nodes=915277
r1bqkbnr/p1pppppp/2n5/1p6/1P2P3/8/P1PP1PPP/R1BQKBNR w KQkq - 1 3 eval=-357 nodes=907541
rnbqkb1r/ppppp1pp/7n/5p2/7P/3P4/PPP1PPP1/RNBQKB1R w KQkq - 0 3 eval=-304 nodes=996995
rnbqkb1r/p1pppppp/7n/1p6/2P5/7P/PP1PPPP1/R1BQKBNR w KQkq - 0 3 eval=-354 nodes=895685
rnbqk1nr/pppp1ppp/8/2b1p3/8/P4N2/1PPPPPPP/R1BQKB1R w KQkq - 2 3 eval=-376 nodes=904702
r1bqkbnr/p1pppppp/n7/1p6/1P2P3/8/P1PP1PPP/R1BQKBNR w KQkq - 0 3 eval=-308 nodes=907345
r1bqkbnr/ppppppp1/n6B/8/3P4/8/PPP1PPPP/R2QKBNR w KQkq - 1 3 eval=-299 nodes=903610
r1bqkbnr/ppppp1pp/2n2p2/8/3P4/P7/1PP1PPPP/RNBQKB1R w KQkq - 0 3 eval=-330 nodes=885982
rnbqkbnr/p1ppp1pp/5p2/1p6/1P3P2/8/P1PPP1PP/R1BQKBNR w KQkq - 0 3 eval=-331 nodes=899112
rnbqkbnr/2pppppp/8/pp6/3P4/5P2/PPP1P1PP/RNBQKB1R w KQkq - 0 3 eval=-334 nodes=923006
r1bqkbnr/pppp1ppp/n7/4p3/3P3P/8/PPP1PPP1/R1BQKBNR w KQkq - 1 3 eval=-333 nodes=915197
rnbqkbnr/p1ppppp1/8/1p5p/8/P1N5/1PPPPPPP/R1BQKB1R w KQkq - 0 3 eval=-328 nodes=894117
r1bqkbnr/1ppppppp/n7/p7/4P3/3P4/PPP2PPP/R1BQKBNR w KQkq - 1 3 eval=-338 nodes=941629
rnbqkbnr/p1pppp1p/6p1/1p6/4P3/P7/1PPP1PPP/R1BQKBNR w KQkq - 0 3 eval=-312 nodes=899001
r1bqkbnr/pppppp1p/n7/6p1/2P5/5P2/PP1PP1PP/RNBQKB1R w KQkq - 1 3 eval=-349 nodes=911472
rnbqkb1r/pppppp1p/5n2/6p1/8/3P3P/PPP1PPP1/R1BQKBNR w KQkq - 0 3 eval=-315 nodes=919954
1nbqkbnr/rppppppp/8/p7/4P3/8/PPPPQPPP/RNB1KB1R w KQk - 2 3 eval=-328 nodes=929187
rnbqkbn1/pppppppr/8/7p/8/3P2P1/PPP1PP1P/R1BQKBNR w KQq - 1 3 eval=-335 nodes=927437
r1bqkbnr/pppppp1p/2n5/6p1/3P4/8/PPP1PPPP/RNBQKBR1 w Qkq - 1 3 eval=-276 nodes=894955
rnbqkb1r/p1pppppp/7n/1p6/2P5/P7/1P1PPPPP/R1BQKBNR w KQkq - 1 3 eval=-293 nodes=895381
rnbqkb1r/pppppp1p/7n/6p1/4P3/5Q2/PPPP1PPP/R1B1KBNR w KQkq - 2 3 eval=-274 nodes=910466
rnbqkbr1/pppppppp/7n/8/8/3P3P/PPP1PPP1/RNBQKB1R w KQq - 1 3 eval=-321 nodes=907502
rnbqkbnr/ppp1pp1p/8/3p2p1/7P/8/PPPPPPPR/R1BQKBN1 w Qkq - 0 3 eval=-300 nodes=917633
r1bqkbnr/p1pppppp/1pn5/8/4P2P/8/PPPP1PP1/R1BQKBNR w KQkq - 1 3 eval=-326 nodes=899245
rnbq1bnr/pppkpppp/3p4/8/8/P7/1PPPPPPP/RNBQKBR1 w Q - 2 3 eval=-318 nodes=979196
rnbq1bnr/pppppkpp/5p2/8/8/4P2P/PPPP1PP1/R1BQKBNR w KQ - 1 3 eval=-303 nodes=985615
rnbqkbnr/p1ppp1pp/5p2/1p6/3P4/5P2/PPP1P1PP/R1BQKBNR w KQkq - 0 3 eval=-290 nodes=901600
rnbq1bnr/pppkpppp/3p4/8/8/3P4/PPPQPPPP/RNB1KB1R w KQ - 2 3 eval=-319 nodes=935623
r1bqkbnr/p1pppppp/2n5/1p6/4P3/3B4/PPPP1PPP/R1BQK1NR w KQkq - 2 3 eval=-285 nodes=910467
rnbqkbnr/pp1ppp1p/2p5/6p1/P7/3P4/1PP1PPPP/R1BQKBNR w KQkq - 0 3 eval=-295 nodes=929138
r1bqkbnr/p1pppppp/np6/8/4P2P/8/PPPP1PP1/RNBQKB1R w KQkq - 1 3 eval=-308 nodes=917646
rnbqkbnr/ppppp1p1/5p1p/8/8/2P2P2/PP1PP1PP/R1BQKBNR w KQkq - 0 3 eval=-314 nodes=936621
rnbqkb1r/pppppppp/8/8/3P2n1/2N5/PPP1PPPP/R1BQKB1R w KQkq - 3 3 eval=-309 nodes=906583
r1bqkbnr/pppppp1p/n7/6p1/8/P6P/1PPPPPP1/RNBQKB1R w KQkq - 1 3 eval=-306 nodes=890932
1nbqkbnr/rppppppp/p7/8/4P3/7P/PPPP1PP1/RNBQKB1R w KQk - 1 3 eval=-341 nodes=931251
1nbqkbnr/rppppppp/8/p7/8/P4N2/1PPPPPPP/R1BQKB1R w KQk - 1 3 eval=-357 nodes=929376
rn1qkbnr/p1pppppp/b7/1p2P3/8/8/PPPP1PPP/RNBQKB1R w KQkq - 1 3 eval=-331 nodes=922525
rnbqkbnr/ppppp1p1/5p1p/8/7P/8/PPPPPPP1/1RBQKBNR w Kkq - 0 3 eval=-295 nodes=950421
rnbqkbnr/ppp1pp1p/3p4/6p1/8/P6N/1PPPPPPP/R1BQKB1R w KQkq - 0 3 eval=-316 nodes=914793
rnbqkb1r/pppp1ppp/5n2/4p3/5P2/2P5/PP1PP1PP/RNBQKB1R w KQkq - 0 3 eval=-412 nodes=919575
rnbqkb1r/pppp1ppp/4p2n/8/3P4/2P5/PP2PPPP/R1BQKBNR w KQkq - 1 3 eval=-349 nodes=903832
rnbqkb1r/pppp1ppp/7n/4p2Q/8/4P3/PPPP1PPP/RNB1KB1R w KQkq - 2 3 eval=-268 nodes=933657
rnbqkb1r/1ppppppp/p6n/8/3P4/2P5/PP2PPPP/RNBQKB1R w KQkq - 0 3 eval=-330 nodes=954821
r1bqkbnr/pppp1ppp/n7/4p3/3P4/7P/PPP1PPP1/R1BQKBNR w KQkq - 0 3 eval=-307 nodes=925412
rnbqkbnr/p1pppppp/8/1p6/1P6/4P3/P1PP1PPP/RNBQKB1R w KQkq - 0 3 eval=-348 nodes=922373
rnbqkb1r/ppppppp1/7n/7p/2P5/7P/PP1PPPP1/RNBQKB1R w KQkq - 1 3 eval=-331 nodes=893674
rnbqkb1r/pp1ppppp/2p4n/8/2P5/2N5/PP1PPPPP/R1BQKB1R w KQkq - 2 3 eval=-348 nodes=936309
rnbqk1nr/pppppp1p/7b/6p1/3P4/3Q4/PPP1PPPP/R1B1KBNR w KQkq - 2 3 eval=-314 nodes=895006
r1bqkbnr/pppppppp/8/8/1n2P3/5P2/PPPP2PP/RNBQKB1R w KQkq - 1 3 eval=-351 nodes=911614
rnbq1bnr/pppkpppp/3p4/8/8/3P1P2/PPP1P1PP/R1BQKBNR w KQ - 1 3 eval=-295 nodes=977138
rnbqkbnr/1ppppp1p/8/p5p1/5P2/7N/PPPPP1PP/R1BQKB1R w KQkq - 0 3 eval=-303 nodes=933937
r1bqkb1r/pppppppp/n6n/8/2P5/5N2/PP1PPPPP/R1BQKB1R w KQkq - 3 3 eval=-325 nodes=911263
rnbqkb1r/ppppp1pp/7n/5p2/3P4/8/PPPNPPPP/R1BQKB1R w KQkq - 0 3 eval=-321 nodes=930623
rnbqkb1r/p1pppppp/7n/1p6/P7/8/1PPPPPPP/R1BQKBNR w KQkq - 1 3 eval=-281 nodes=895926
rnbqkb1r/p1pppppp/7n/1p6/8/1QP5/PP1PPPPP/RNB1KB1R w KQkq - 2 3 eval=-335 nodes=904891
1nbqkbnr/1ppppppp/r7/p7/1PP5/8/P2PPPPP/RNBQKB1R w KQk - 1 3 eval=-356 nodes=881990
rnbqkbnr/ppp2ppp/8/3pP3/8/8/PPP1PPPP/RNBQKB1R w KQkq d6 0 3 eval=-342 nodes=945740
rnbqkbnr/1pppp1pp/p4p2/8/3P4/8/PPP1PPPP/RNBQKB1R w KQkq - 0 3 eval=-347 nodes=921882
rnbqkb1r/pppppp1p/5n2/6p1/3P1P2/8/PPP1P1PP/RNBQKB1R w KQkq - 1 3 eval=-292 nodes=891075
rnb1kbnr/pppqpppp/8/3p4/4P3/2N5/PPPP1PPP/R1BQKB1R w KQkq - 2 3 eval=-292 nodes=900883
rnbq1bnr/pppppkpp/5p2/8/8/3P3N/PPP1PPPP/R1BQKB1R w KQ - 2 3 eval=-336 nodes=966960
r1bqkbnr/pppppp1p/2n5/6p1/7P/4P3/PPPP1PP1/RNBQKB1R w KQkq - 1 3 eval=-302 nodes=899886
rnbqkbnr/pppppppp/8/8/4P3/7P/PPPP1PP1/RNBQKB1R w KQkq - 1 3 eval=-324 nodes=924363
r1bqkbnr/pppppp1p/n7/6p1/8/N2P4/PPP1PPPP/R1BQKB1R w KQkq - 0 3 eval=-279 nodes=929880
rnbqkbnr/ppppp1p1/5p1p/8/P7/1P6/2PPPPPP/RNBQKB1R w KQkq - 0 3 eval=-314 nodes=932199
rnbqkbnr/ppp1pppp/8/3p4/3PP3/8/PPP2PPP/RNBQKB1R w KQkq - 0 3 eval=-351 nodes=931440
rnbqkbr1/pppppppp/7n/8/1P6/6P1/P1PPPP1P/RNBQKB1R w KQq - 1 3 eval=-327 nodes=922162
rnbq1bnr/pppkpppp/3p4/8/8/1PP5/P2PPPPP/R1BQKBNR w KQ - 1 3 eval=-257 nodes=952343
rnbqkbnr/p1ppp1pp/5p2/1p6/8/2PP4/PP2PPPP/R1BQKBNR w KQkq - 0 3 eval=-273 nodes=929893
rnbqkbnr/pppppppp/8/8/3P4/7P/PPP1PPP1/RNBQKB1R w KQkq - 1 3 eval=-343 nodes=924330
rnbqkbr1/pppppppp/5n2/8/1P2P3/8/P1PP1PPP/R1BQKBNR w KQq - 1 3 eval=-376 nodes=936465
rnbqkbnr/pppppp2/8/6pp/8/2P5/PPQPPPPP/R1B1KBNR w KQkq - 0 3 eval=-304 nodes=921110
rnbq1bnr/pppppkpp/5p2/8/8/6PN/PPPPPP1P/R1BQKB1R w KQ - 2 3 eval=-300 nodes=945856
r1bqkbnr/ppppp1pp/n4p2/8/8/2P2P2/PP1PP1PP/R1BQKBNR w KQkq - 0 3 eval=-348 nodes=949560
r1bqkbnr/ppppppp1/n7/7p/2B1P3/8/PPPP1PPP/RNBQK2R w KQkq - 2 3 eval=-341 nodes=917983
SF mean eval -320.2 epds=101 numpy.mean=-320.1782178217822 st dev=29.494888310142848
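The summary line at the end of the log can be reproduced from the 101 eval= values above. This is a minimal sketch using Python's standard statistics module. Only the "numpy.mean" label in the log suggests the original script used numpy.

```python
import statistics

# eval= values (centipawns, as logged) for the 101 positions in the
# SF11 movetime-500 run above, in listing order.
EVALS = [
    -314, -284, -376, -364, -356, -248, -311, -354, -344, -336,
    -352, -253, -297, -321, -284, -328, -384, -325, -307, -300,
    -311, -317, -272, -318, -350, -300, -302, -357, -304, -354,
    -376, -308, -299, -330, -331, -334, -333, -328, -338, -312,
    -349, -315, -328, -335, -276, -293, -274, -321, -300, -326,
    -318, -303, -290, -319, -285, -295, -308, -314, -309, -306,
    -341, -357, -331, -295, -316, -412, -349, -268, -330, -307,
    -348, -331, -348, -314, -351, -295, -303, -325, -321, -281,
    -335, -356, -342, -347, -292, -292, -336, -302, -324, -279,
    -314, -351, -327, -257, -273, -343, -376, -304, -300, -348,
    -341,
]

mean_cp = sum(EVALS) / len(EVALS)
pop_sd = statistics.pstdev(EVALS)    # divides by n, like numpy.std's default ddof=0
sample_sd = statistics.stdev(EVALS)  # divides by n - 1

print(f"epds={len(EVALS)} mean={mean_cp:.1f} pstdev={pop_sd:.2f} stdev={sample_sd:.2f}")
```

The population figure matches the logged st dev of 29.49. The sample (n - 1) version comes out slightly larger.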
This is getting silly. f4 e5 g4 etc. Each side has two possibilities to make good or bad positional move choices. It should all average out. a3 b6 Nh3 a5 etc. The important feature is that all the positions offer roughly equal chances, given that one side is a knight down; we did that by culling the eval outliers and centering on the mean. Then they can form a coherent testing suite.
I think what is happening is that Black is far more likely to blunder on move 2 than White is, since White will have an extra unit developed, so the average position includes many where Black has blundered a pawn on his second move, thus reducing White's score deficit by a pawn. Since White can't blunder anything on ply 1, it would have been about fair if you did this for three plies rather than four, since each side would have one chance to blunder after the opponent had made a move.
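The ply-counting part of this argument can be made concrete. Ply 1 is never a reply, so for each side we count only the moves made after the opponent has already moved. The sketch below is small and makes a simplifying assumption taken from the post: only replies give a chance to drop material to a piece the opponent has just developed. It counts opportunities only and does not model how likely any reply is to be a blunder.

```python
def reply_moves(plies: int) -> tuple:
    """For the first `plies` half-moves, count how many moves each side makes
    after the opponent has moved (White's ply 1 never counts)."""
    white = sum(1 for p in range(2, plies + 1) if p % 2 == 1)
    black = sum(1 for p in range(2, plies + 1) if p % 2 == 0)
    return white, black


for n in range(1, 7):
    w, b = reply_moves(n)
    print(f"{n} plies: White {w}, Black {b}, balanced={w == b}")
```

Under this model, 3 plies gives (1, 1) and 4 plies gives (1, 2), and every odd ply count is balanced.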
Oh, thank you very belatedly much.

"So as it is the positions are a valid set of handicap positions,"

They are a mass of positions where White has one knight less, very close to the start position, and without either side being able (according to SF11 search proof) to press an immediate advantage. As such, they form a fine, unbiased set of thousands of positions, as balanced as the algorithm (using SF proof) can make them, to test knight advantage when given to a 'lesser' engine. That was the original idea: make a large unbiased test suite with as much play in it as possible to test how engines have developed over the years (for fun, btw; this is not a university research department into something important).

"but they are not on average close to knight odds,"

Shrugs. So what? It's a test suite of one-knight-down, varied, close-to-root positions, proofed by SF11 not to give any immediate advantage, save one knight, to either side. They don't have to have any other quality than that.

"maybe something like knight minus half a pawn or so."
"For that subset, Mean = -320.2, st deviation = 29.5"

that only a very small fraction of all the positions would be less than -380, and those are mostly going to be where White got lucky and happened to play sensibly while Black did the opposite. Then you'd be complaining of another form of bias, namely positions chosen because they were way better for White. I guess there's no pleasing some people. But at least I have absolutely zero reason to be biasing or complaining about the data and every reason to try to make the test suite data as fair as possible for everybody. That's why it contains only positions that are around the average evaluation found from creating the full dataset of, I forget, 80,000 full-width positions at 4-ply, and then culling them down to a few thousand. You don't like it? Tough. Go make your own dataset. Mine is just fine and dandy and fit for purpose; sorry if that isn't your purpose.
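This gives a rough check of how rare evals below -380 should be. It treats the 500 ms sample as approximately normal, with the logged mean and standard deviation. Normality is an assumption here; the data was not tested for it.

```python
from statistics import NormalDist

MEAN_CP = -320.2   # logged mean of the 500 ms sample
SD_CP = 29.5       # logged standard deviation
THRESHOLD_CP = -380

z = (THRESHOLD_CP - MEAN_CP) / SD_CP
# Back-of-envelope estimate assuming roughly normal evals.
tail = NormalDist(mu=MEAN_CP, sigma=SD_CP).cdf(THRESHOLD_CP)
print(f"z = {z:.2f}, estimated share below {THRESHOLD_CP} cp: {tail:.1%}")
```

That puts -380 about two standard deviations below the mean, with a tail of roughly 2%. In the 101 listed evals, two (-384 and -412) are below -380.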
There are no blunder positions in the data set, they've all been proofed for that by SF11, and your endless repetition of the lie word 'blunder' doesn't make it true, it just makes it offensive. Have a nice day.