Laskos wrote:
AlphaGo was not trained on random openings. Stockfish is literally trained on random 2-movers, which distorts its opening play for some 10 moves.

Dariusz Orzechowski wrote:
AlphaGo was just an example that we don't really know what "reasonable" means. AG plays inhuman moves while being extremely strong at the same time. If we could prove that a "reasonable" book is better and provide some definition of "reasonability", we could create a better book. The problem is that I have no idea how to do this. A "reasonable" book is obviously better for tournament play but not necessarily for engine development.

Laskos wrote:
My goal was to create a suite containing many openings, with sensitivity on par with (or better than) 2moves_v1.epd, and containing moves played by humans rated over Elo 2200. Humans at that level are not so crazy as to often play random or very weak moves.

Dariusz Orzechowski wrote:
You certainly achieved this goal. But the question now is whether using "crazy" positions in a development book has any harmful effect on playing strength. I don't know how to measure it.

I tested the sensitivity (SNR) of 2moves_v1.epd and 3moves_Elo2200.epd overnight (these tests take a lot of time) at a more respectable time control, 15''+0.15'' versus 7.5''+0.075'' (close to the Stockfish testing STC). 2000 games are not enough, but I wanted a picture at a longer time control. I got results very similar to yours:
Code:
2moves_v1.epd:
Score of SF2 vs SF1: 818 - 143 - 1039 [0.669] 2000
ELO difference: 122.04 +/- 10.39
Finished match
3moves_Elo2200.epd:
Score of SF2 vs SF1: 753 - 96 - 1151 [0.664] 2000
ELO difference: 118.53 +/- 9.59
Finished match
Code:
2moves_v1.epd: 0.5574 +/- 0.0438
3moves_Elo2200.epd: 0.5838 +/- 0.0438
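The figures in both blocks can be reproduced from the win/loss/draw counts alone. Below is a minimal Python sketch under two assumptions: that the sensitivity is (score - 0.5)/sigma, with sigma the per-game standard deviation of the win/draw/loss result, and that the +/- is a 95% interval of 1.96/sqrt(N). Both assumptions happen to match the numbers above; the function names are made up for illustration.

```python
import math

def normalized_score(wins, losses, draws, z=1.96):
    """Return (score, snr, half_width) for a W/L/D match result.

    Assumption: snr = (score - 0.5) / sigma, where sigma is the per-game
    standard deviation of the result (win=1, draw=0.5, loss=0).
    Assumption: the error bar is z / sqrt(n), i.e. it ignores the
    uncertainty in sigma itself.
    """
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    second_moment = (wins + 0.25 * draws) / n   # E[x^2] with x in {1, 0.5, 0}
    sigma = math.sqrt(second_moment - score ** 2)
    snr = (score - 0.5) / sigma
    half_width = z / math.sqrt(n)
    return score, snr, half_width

def elo_from_score(score):
    """Logistic Elo difference implied by an expected score."""
    return -400.0 * math.log10(1.0 / score - 1.0)

# W, L, D counts copied from the match output above.
results = {
    "2moves_v1.epd": (818, 143, 1039),
    "3moves_Elo2200.epd": (753, 96, 1151),
}

for name, (w, l, d) in results.items():
    score, snr, err = normalized_score(w, l, d)
    print(f"{name}: score {score:.3f}, Elo {elo_from_score(score):.2f}, "
          f"SNR {snr:.4f} +/- {err:.4f}")
```

With only 2000 games per suite, the two SNR values differ by less than their error bars, which is why the comparison stays inconclusive.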
Your 5ply_v1.epd openings are excellent for building unbalanced early opening positions and for testing with pentanomial variance for LLR, which is the future of engine testing (see http://www.talkchess.com/forum/viewtopic.php?t=61245 ). The requirement that openings be "reasonable" then becomes irrelevant: computer chess itself will become "unreasonable" because of very high draw rates with normal openings. When testers (like the Stockfish framework) reach draw rates above 80-85% with balanced openings, or, in Bayeselo terminology, eloDraw above 500 or so, the optimum openings are defined by eloBias = eloDraw, with a resulting draw rate of 50%. Compared to balanced positions, the number of games needed to reach the same LOS or SPRT stop will improve by a factor of ~4 at a draw rate of 85%, and by an order of magnitude at draw rates above 90%. This kind of testing with unbalanced positions was already adopted in checkers more than a decade ago.
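To see why eloBias = eloDraw gives a draw rate of about 50%, here is a small Python sketch of the Bayeselo win/draw/loss model for two engines. The function and parameter names are chosen for illustration, and elo_bias stands for the advantage that the opening gives to the side counted as "winning".

```python
def logistic(x):
    """Bayeselo-style logistic: expected score for an Elo advantage x."""
    return 1.0 / (1.0 + 10.0 ** (-x / 400.0))

def bayeselo_probs(elo_diff, elo_bias, elo_draw):
    """Return (win, draw, loss) for the side favoured by the opening.

    elo_diff : strength difference between the two engines
    elo_bias : advantage given by the opening (hypothetical name)
    elo_draw : Bayeselo draw parameter
    """
    win = logistic(elo_diff + elo_bias - elo_draw)
    loss = logistic(-elo_diff - elo_bias - elo_draw)
    draw = 1.0 - win - loss
    return win, draw, loss

# Equal engines, eloDraw = 500: balanced opening vs. eloBias = eloDraw.
for bias in (0, 500):
    w, d, l = bayeselo_probs(0, bias, 500)
    print(f"eloBias={bias}: win {w:.3f}, draw {d:.3f}, loss {l:.3f}")
```

For equal engines with eloDraw = 500, this gives a draw rate of roughly 89% from a balanced opening and just under 50% when eloBias = eloDraw, with almost all decisive games going to the favoured side.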
I analyzed your 5ply_v1.epd file with Stockfish and uploaded the openings with a 70cp-110cp imbalance, suited to a wide range of draw rates above 80% (or eloBias above 500 or so).
http://s000.tinyupload.com/?file_id=079 ... 7904268641
This 5ply_v1_unbalanced.epd contains 15083 unique unbalanced positions (70cp-110cp) for future testing, when Cutechess-Cli or a similar tool uses pentanomial variance and draw rates with balanced openings rise above 80% or so. That is not so far in the future, at least for Stockfish. The openings are skewed towards the lower values of the [70,110] cp interval.
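The selection step itself is simple once each position has an engine score. A minimal Python sketch, assuming the evaluations are already available as (EPD, centipawn) pairs and that "unbalance" means the absolute value of the score, whichever side is favoured; the function name and the sample scores are placeholders, not actual Stockfish output.

```python
def filter_unbalanced(evaluated, low_cp=70, high_cp=110):
    """Keep unique positions whose |score| lies in [low_cp, high_cp].

    evaluated: iterable of (epd_string, score_in_centipawns) pairs.
    Assumption: the imbalance is measured as the absolute score.
    """
    kept = (epd for epd, cp in evaluated if low_cp <= abs(cp) <= high_cp)
    return list(dict.fromkeys(kept))  # drop duplicates, keep first-seen order

# Placeholder scores for illustration only.
sample = [
    ("rnbqkbnr/pppppppp/8/8/4P3/8/PPPPPPPP/RNBQKBNR b KQkq -", 35),
    ("rnbqkbnr/pppppppp/8/8/6P1/8/PPPPPP1P/RNBQKBNR b KQkq -", -85),
    ("rnbqkbnr/pppppppp/8/8/6P1/8/PPPPPP1P/RNBQKBNR b KQkq -", -85),
]

for epd in filter_unbalanced(sample):
    print(epd)
```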