Laskos wrote:
Do you have any idea why the shape of the eval is so curious in the 5ply_v1.epd case? For building an unbalanced suite from it, I cut the eval to the intervals [-1.1,-0.7] and [0.7,1.1]. When people start using unbalanced positions, your dataset will be one of the best to optimize for them.

Dariusz Orzechowski wrote:
This is by design. I filtered out dead-even positions around 0.00. My goal was to have mostly around 0.3-0.8 out of the book. In the unbalanced set I mentioned above there should be mostly 0.8-1.3 (I'm not sure I remember the upper limit correctly now; it may be a bit higher, like 1.5).

5ply_v1.epd seems very amenable to improvement. Although by itself it doesn't come out above 2moves_v1.epd, after some experiments I managed to improve on it significantly, and the resulting file shows an almost significantly better Normalized ELO than the other large suites. I used your idea of taking the absolute value of the eval (the openings are screwed up anyway), and it looks like this:
I tried to find a sweet spot in eval with regard to Normalized ELO, but it's not that easy: two effects compete in Normalized ELO, a higher ELO difference and a higher draw rate. Interestingly, some sort of sweet spot appeared in the abs(eval) range [0.50,0.60], or roughly the 40%-60% percentile of your 5ply_v1.epd file, around the median. You seem to have chosen the median well in this suite; it shows almost significantly higher sensitivity in this range.
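The band selection described above can be sketched as follows. This is a minimal illustration assuming each position comes paired with an engine eval in pawns; the sample data and layout are made up for the example, not taken from the actual 5ply_v1.epd:

```python
# Sketch: keep only positions whose absolute eval falls in the
# [0.50, 0.60] "sweet spot" band discussed above. The (fen, eval)
# pairs here are placeholders, not real suite entries.

def in_band(eval_pawns, lo=0.50, hi=0.60):
    """True if abs(eval) lies within the [lo, hi] band (in pawns)."""
    return lo <= abs(eval_pawns) <= hi

positions = [
    ("fen1", 0.12), ("fen2", -0.55), ("fen3", 0.58),
    ("fen4", 0.85), ("fen5", -0.51),
]
selected = [fen for fen, ev in positions if in_band(ev)]
# keeps fen2, fen3 and fen5; the others fall outside the band
```

The same function with lo=0.7, hi=1.1 would reproduce the unbalanced-suite cut from the quoted exchange.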
Stockfish self-play games, 6s vs 3s:
5ply_v1_40_50.epd (12664)
Score of SF2 vs SF1: 2140 - 247 - 1613 [0.737] 4000
ELO difference: 178.67 +/- 8.46
Finished match
Normalized ELO: 0.775 +/- 0.031
3moves_Elo2200.epd (6533)
Score of SF2 vs SF1: 1966 - 211 - 1823 [0.719] 4000
ELO difference: 163.53 +/- 7.90
Finished match
Normalized ELO: 0.740 +/- 0.031
2moves_v1.epd (40455)
Score of SF2 vs SF1: 2094 - 267 - 1639 [0.728] 4000
ELO difference: 171.35 +/- 8.40
Finished match
Normalized ELO: 0.739 +/- 0.031
3moves_Elo2200_experimental.epd (5848)
Score of SF2 vs SF1: 1974 - 219 - 1807 [0.719] 4000
ELO difference: 163.53 +/- 7.94
Finished match
Normalized ELO: 0.736 +/- 0.031
2moves_v1_experimental.epd (9564)
Score of SF2 vs SF1: 2066 - 263 - 1671 [0.725] 4000
ELO difference: 168.73 +/- 8.31
Finished match
Normalized ELO: 0.732 +/- 0.031
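For reference, the Normalized ELO figures above are consistent with the per-game z-score (score - 0.5)/sigma, with the error bar about 1.96/sqrt(games). A small sketch (my reconstruction of the arithmetic, not necessarily the exact tool used) that reproduces the first match's numbers:

```python
import math

def elo_and_normalized(w, l, d):
    """Logistic Elo difference and Normalized ELO from a W/L/D record.
    Normalized ELO here is the per-game z-score (mean score - 0.5)/sigma."""
    n = w + l + d
    s = (w + 0.5 * d) / n                      # mean score per game
    elo = -400.0 * math.log10(1.0 / s - 1.0)   # logistic Elo difference
    var = (w + 0.25 * d) / n - s * s           # per-game score variance
    nelo = (s - 0.5) / math.sqrt(var)          # Normalized ELO
    err95 = 1.96 / math.sqrt(n)                # ~95% error bar on nelo
    return elo, nelo, err95

# First match above: 2140 - 247 - 1613 gives roughly 178.7 Elo
# and Normalized ELO 0.775 +/- 0.031.
```

The error bar on Normalized ELO depends only on the number of games, which is why every 4000-game match above reports the same +/- 0.031.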
Also, by playing games, I managed to build a test suite tuned for Stockfish, which has by far the highest sensitivity, ELO, and Normalized ELO with Stockfish, but it has only 1560 positions and is not necessarily good for other engines. It plays normal openings.
3moves_Elo2200_Stockfish.epd (1560)
Score of SF2 vs SF1: 2199 - 173 - 1628 [0.753] 4000
ELO difference: 193.87 +/- 8.39
Finished match
Normalized ELO: 0.873 +/- 0.031
I uploaded 5ply_v1_40_50.epd and 3moves_Elo2200_Stockfish.epd here:
http://s000.tinyupload.com/?file_id=664 ... 3458846938