For your huge suite I used 100 ms of Stockfish dev time per position, which is probably more than ply 12. I don't agree that we should use abs(eval): White and Black are asymmetric in normal play. I also used asymmetric bounds when optimizing. Again, I want my suites to emphasize normal play in the opening. Anyway, it's very hard to be representative of normal play and still have enough diversity for tens of thousands of games with very sensitive (very short) books.
Opening testing suites efficiency
Re: Opening testing suites efficiency

 Posts: 42
 Joined: Thu May 02, 2013 3:23 pm
Re: Opening testing suites efficiency
Michel wrote:I posted a small document about "normalized elo"
http://hardy.uhasselt.be/Toga/normalized_elo.pdf
explaining why it is the correct quantity for comparing different books.
Suggestions of some minor corrections, hope you don't mind:
1. "Φ is a normal distribution with unit variance and zero mean"  I would write "(...) cumulative distribution function of a normal distribution (...)".
2. In remark 2.3 typo "sensity" instead of "sensitivity".
3. Under the formula with RMS I would spell out, for completeness, what that abbreviation means.
Re: Opening testing suites efficiency
It looks like the first formula on the second page (unfortunately it's not numbered) is wrong. Sigma_0 should be
sqrt((w + 1/4*d) - s^2)
instead of
sqrt((w + 1/4*d)^2 - s^2)
Thanks! I noticed it too but I could not fix it as I was away from my computer. Quite frustrating!
I have now posted a new version. Thanks for the other comments as well.
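For reference, the corrected expression is what one gets for the standard deviation of a single game's score. This is only a sketch, assuming w and d are the win and draw probabilities and s is the expected score, as the symbols suggest; X (the score of one game) and l (the loss probability) are names introduced here for the derivation.

```latex
\begin{aligned}
s &= w + \tfrac{1}{2} d, \\
\mathrm{E}[X^2] &= 1^2 \cdot w + \left(\tfrac{1}{2}\right)^2 d + 0^2 \cdot l = w + \tfrac{1}{4} d, \\
\sigma_0 &= \sqrt{\mathrm{E}[X^2] - s^2} = \sqrt{\left(w + \tfrac{1}{4} d\right) - s^2}.
\end{aligned}
```

Squaring the bracket, as in the uncorrected version, would mix up E[X^2] with its square.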
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
Re: Opening testing suites efficiency
A very useful file, thanks!
Dariusz Orzechowski wrote:I'm uploading my unbalanced set of 5ply openings that I got as a "byproduct" from my work. Maybe it could be useful. It contains over 127k positions, most of them should be in interesting range 70-110 although it is very crude so I would expect that completely lopsided positions could also have slipped in there. I tried to work on 5ply book last year but stopped due to lack of time (and ideas).
Laskos wrote:Your 5ply_v1.epd openings are excellent in building unbalanced early opening positions and using pentanomial variance for LLR, which is the future of testing engines (see here http://www.talkchess.com/forum/viewtopic.php?t=61245 ). And the requirement that openings are "reasonable" becomes irrelevant, computer Chess itself will become "unreasonable" because of very high draw rates with normal openings. When the testers (like Stockfish Framework) will get to draw rates above 80-85% with balanced openings, or in Bayeselo terminology, eloDraw above 500 or so, the optimum openings are defined by eloBias = eloDraw, with resulting draw rate of 50%. The improvement in number of necessary games to the same LOS or SPRT stop will be from ~4 for draw rate of 85% to an order of magnitude for above 90% draw rate compared to balanced positions. This sort of new testing with unbalanced positions already happened in Checkers more than a decade ago.
I analyzed with Stockfish your 5ply_v1.epd file and uploaded the openings with 70cp-110cp unbalance, suited for wide range of draw rates above 80% (or eloBias above 500 or so).
http://s000.tinyupload.com/?file_id=079 ... 7904268641
This 5ply_v1_unbalanced.epd contains 15083 unique unbalanced positions (70cp-110cp) for future testing, when Cutechess-Cli or similar tool will use pentanomial variance, and draw rates with balanced openings will become above 80% or so. It is not so far in the future, at least for Stockfish. The openings are skewed towards lower values in the [70,110] cp unbalance interval.
5ply_unbalanced_127k.epd book (link expires in 7 days): http://dropmefiles.com/VDUNG
Very nice! If you could also add the formula for the pentanomial case it would be great to have it in one place.
Michel wrote:I posted a small document about "normalized elo"
http://hardy.uhasselt.be/Toga/normalized_elo.pdf
explaining why it is the correct quantity for comparing different books.
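The eloBias = eloDraw rule in the quoted post can be checked numerically. What follows is a minimal sketch, assuming the usual BayesElo-style draw model for two equal engines, in which the side favoured by the opening wins with probability f(bias - draw) and loses with probability f(-bias - draw), f being the logistic Elo curve. The function name `bayeselo_probs` and the sample eloDraw values are made up for illustration.

```python
def bayeselo_probs(elo_diff, elo_bias, elo_draw):
    """Win/draw/loss probabilities for the side favoured by the opening.

    Assumed model (BayesElo-style): a win needs the advantage to exceed
    elo_draw, a loss needs the disadvantage to exceed elo_draw.
    """
    def f(x):
        # Logistic Elo curve.
        return 1.0 / (1.0 + 10.0 ** (-x / 400.0))

    win = f(elo_diff + elo_bias - elo_draw)
    loss = f(-elo_diff - elo_bias - elo_draw)
    return win, 1.0 - win - loss, loss


for elo_draw in (300, 500, 700):
    _, draw_balanced, _ = bayeselo_probs(0.0, 0.0, elo_draw)
    _, draw_biased, _ = bayeselo_probs(0.0, elo_draw, elo_draw)
    print(f"eloDraw={elo_draw}: balanced draw rate {draw_balanced:.3f}, "
          f"eloBias=eloDraw draw rate {draw_biased:.3f}")
```

Under this assumed model, setting the bias equal to eloDraw gives the favoured side a win probability of exactly 0.5 and an almost negligible loss probability, so the draw rate sits just under 50% however drawish the balanced case is.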
I analyzed this file of unbalanced 5pliers a bit; the Stockfish eval on them looks like this:
Although it looks a bit strange (what would an engine have to do in a 5ply game to produce a significant number of positions with a 1000cp disadvantage?), it is the most useful file for the practically needed unbalances in the 70cp-180cp range.
To illustrate the power of this kind of unbalanced opening in the future, I will show you the much more drawish game of Endgame Chess, which is probably closely related to Chess. Even early endgames that are balanced in eval are very drawish. Engines play them well, as in the future they will probably play the whole game. To anticipate a bit (maybe by 10 years), I used both balanced and unbalanced "opening" suites from early endgames. To roughly achieve a 50% draw rate (optimal for sensitivity using pentanomial) with unbalanced endgames, the unbalance has to be set at 100cp-140cp, higher than the 70cp-110cp unbalance for normal openings. Here are the match results of SF dev at 10''+ 0.1'' versus the same SF dev at 5''+ 0.05'' (again doubling in time) in 1000 games each:
Balanced:
Endgames unbalance 0cp-40cp:
Score of SF2 vs SF1: 28 - 8 - 964 [0.510] 1000
ELO difference: 6.95 +/- 4.06
Normalized ELO (trinomial): 0.106
A very high 96% draw rate, which is worrying.
Unbalanced:
Endgame unbalance 100cp-140cp:
Score of SF2 vs SF1: 299 - 187 - 514 [0.556] 1000
ELO difference: 39.08 +/- 14.99
Normalized ELO (trinomial): 0.162
The unbalance of 100cp-140cp was chosen well, the draw rate is close to 50% optimum now. Even using only trinomial computation, sensitivity increased with unbalanced openings, but not dramatically.
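The two trinomial figures can be recomputed from the win/loss/draw counts alone. A small sketch, assuming trinomial normalized Elo is (s - 1/2) / sigma, where s is the score fraction and sigma = sqrt(w + d/4 - s^2) is the per-game standard deviation; the function name is hypothetical.

```python
import math


def normalized_elo_trinomial(wins, losses, draws):
    """(score - 0.5) / per-game standard deviation, from W/L/D counts."""
    games = wins + losses + draws
    w = wins / games
    d = draws / games
    s = w + 0.5 * d                       # score fraction
    sigma = math.sqrt(w + 0.25 * d - s * s)
    return (s - 0.5) / sigma


# Counts taken from the two match results above.
print("balanced:  ", normalized_elo_trinomial(28, 8, 964))
print("unbalanced:", normalized_elo_trinomial(299, 187, 514))
```

Both values should land close to the 0.106 and 0.162 quoted in the results, up to rounding in the last digit.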
But let's analyze unbalanced results from resulting PGN file. We have 1000 games in 500 pairs of color and color reversed games (this is essential). As these have unbalanced openings, they are highly correlated (balanced openings give only weak correlation, and trinomial can be pretty safely used). We are interested in pentanomial distribution of these pairs in scores out of 2 points as (0,1/2,1,3/2,2). SF2 scores in these pairs of color reversed games (out of 2 points):
Code: Select all
Score Counts

0: 0
0.5: 15
1.0: 354
1.5: 126
2.0: 2

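As you can see, the Cutechess-Cli output in trinomial SF2 vs SF1: +299 -187 =514 seems fairly balanced, but the pentanomial distribution of scores is very skewed. Computing the pentanomial variance we get for unbalanced set of openings:
Unbalanced:
Normalized ELO (pentanomial): 0.315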
We see that using pentanomial and unbalanced openings we get a factor of 3 higher Normalized ELO than with balanced openings. Since the number of games needed scales as the inverse square of Normalized ELO, this means about 9 times fewer games to the same LOS and SPRT stop. This will be the future of chess when draw rates are routinely above 80%.
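The step from "factor of 3" to "9 times" follows from the t-value growing as Normalized ELO times the square root of the number of games, so that the games needed for a fixed t-value go as 1 / (Normalized ELO)^2. A rough sketch of that arithmetic, with a made-up helper name:

```python
def games_saving_factor(ne_reference, ne_improved):
    """How many times fewer games the improved setup needs for the same t-value.

    Uses t = ne * sqrt(N), hence N = (t / ne)**2 for a fixed target t.
    """
    return (ne_improved / ne_reference) ** 2


# 0.106: balanced, trinomial; 0.315: unbalanced, pentanomial (values from the post).
print(games_saving_factor(0.106, 0.315))
```

With the posted values the ratio comes out a little under 9, consistent with the rounded "factor of 3".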

 Posts: 42
 Joined: Thu May 02, 2013 3:23 pm
Re: Opening testing suites efficiency
Laskos wrote:A very useful file, thanks!
I analyzed a bit this file of unbalanced 5pliers, the eval of Stockfish on them looks like that:
Although it looks a bit strange (what an engine should do in 5ply games to have significant number of positions with 1000cp disadvantage?), it is the most useful file to use for practically needed unbalances in 70cp-180cp range.
It's enough to lose a queen. But no engine actually played those 5 plies. The book is created "backwards" by taking all possible 5ply positions (perft 5) and then filtering out duplicates, mates on board, completely lopsided positions and so on. My tools are not perfect, as you can see, but I was not concerned because it was not supposed to be an end product (at least not yet). I just keep things that I might use in the future, so here we have this set of openings. I'm glad you like it, maybe something interesting will come out of this.

 Posts: 42
 Joined: Thu May 02, 2013 3:23 pm
Re: Opening testing suites efficiency
Laskos wrote:But let's analyze unbalanced results from resulting PGN file. We have 1000 games in 500 pairs of color and color reversed games (this is essential). As these have unbalanced openings, they are highly correlated (balanced openings give only weak correlation, and trinomial can be pretty safely used). We are interested in pentanomial distribution of these pairs in scores out of 2 points as (0,1/2,1,3/2,2). SF2 scores in these pairs of color reversed games (out of 2 points):
Code: Select all
Score Counts

0: 0
0.5: 15
1.0: 354
1.5: 126
2.0: 2

As you can see, the Cutechess-Cli output in trinomial SF2 vs SF1: +299 -187 =514 seems fairly balanced, but the pentanomial distribution of scores is very skewed. Computing the pentanomial variance we get for unbalanced set of openings:
Unbalanced:
Normalized ELO (pentanomial): 0.315
I have trouble reproducing this result. First, your numbers don't sum up to 500. But that's a minor issue. When I calculate normalized elo with those numbers I get result 0.466, not 0.315 and I cannot find any error in my calculations. Strange.
Re: Opening testing suites efficiency
Dariusz Orzechowski wrote:I have trouble reproducing this result. First, your numbers don't sum up to 500. But that's a minor issue. When I calculate normalized elo with those numbers I get result 0.466, not 0.315 and I cannot find any error in my calculations. Strange.
Laskos wrote:But let's analyze unbalanced results from resulting PGN file. We have 1000 games in 500 pairs of color and color reversed games (this is essential). As these have unbalanced openings, they are highly correlated (balanced openings give only weak correlation, and trinomial can be pretty safely used). We are interested in pentanomial distribution of these pairs in scores out of 2 points as (0,1/2,1,3/2,2). SF2 scores in these pairs of color reversed games (out of 2 points):
Code: Select all
Score Counts

0: 0
0.5: 15
1.0: 354
1.5: 126
2.0: 2

As you can see, the Cutechess-Cli output in trinomial SF2 vs SF1: +299 -187 =514 seems fairly balanced, but the pentanomial distribution of scores is very skewed. Computing the pentanomial variance we get for unbalanced set of openings:
Unbalanced:
Normalized ELO (pentanomial): 0.315
It's a typo of what I wrote here, but I used correct number in computation:
0.5: is 18 instead of 15
Is the square root of your variance 0.00563276 ?
Re: Opening testing suites efficiency
Laskos wrote:It's a typo of what I wrote here, but I used correct number in computation:
Dariusz Orzechowski wrote:I have trouble reproducing this result. First, your numbers don't sum up to 500. But that's a minor issue. When I calculate normalized elo with those numbers I get result 0.466, not 0.315 and I cannot find any error in my calculations. Strange.
Laskos wrote:But let's analyze unbalanced results from resulting PGN file. We have 1000 games in 500 pairs of color and color reversed games (this is essential). As these have unbalanced openings, they are highly correlated (balanced openings give only weak correlation, and trinomial can be pretty safely used). We are interested in pentanomial distribution of these pairs in scores out of 2 points as (0,1/2,1,3/2,2). SF2 scores in these pairs of color reversed games (out of 2 points):
Code: Select all
Score Counts

0: 0
0.5: 15
1.0: 354
1.5: 126
2.0: 2

As you can see, the Cutechess-Cli output in trinomial SF2 vs SF1: +299 -187 =514 seems fairly balanced, but the pentanomial distribution of scores is very skewed. Computing the pentanomial variance we get for unbalanced set of openings:
Unbalanced:
Normalized ELO (pentanomial): 0.315
0.5: is 18 instead of 15
Is the square root of your variance 0.00563276 ?
Then, the score is 0.556 and the t-value is 0.056/0.00563276 ~ 9.94. You agree?

 Posts: 42
 Joined: Thu May 02, 2013 3:23 pm
Re: Opening testing suites efficiency
Laskos wrote:Is the square root of your variance 0.00563276 ? Then, the score is 0.556 and the t-value is 0.056/0.00563276 ~ 9.94. You agree?
At the end I have the same t-value but the rest is for some reason different. I think the reason may be that I'm counting 500 games and you probably 1000 (I'm guessing)? My normalized elo is 0.4446 and multiplied by sqrt(500) gives 9.94.
Re: Opening testing suites efficiency
Dariusz Orzechowski wrote:At the end I have the same t-value but the rest is for some reason different. I think the reason may be that I'm counting 500 games and you probably 1000 (I'm guessing)? My normalized elo is 0.4446 and multiplied by sqrt(500) gives 9.94.
Laskos wrote:Is the square root of your variance 0.00563276 ? Then, the score is 0.556 and the t-value is 0.056/0.00563276 ~ 9.94. You agree?
Normalized Elo is t/sqrt(2*500), no?
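The numbers traded in the last few posts can be reproduced from the pair counts. This is a sketch rather than either poster's actual calculation: it assumes the per-pair normalized Elo is (mean pair score - 1) / (standard deviation of the pair score), that the t-value is this times sqrt(number of pairs), and that the per-game figure is t / sqrt(number of games), as the last post suggests. The function and variable names are invented for the example.

```python
import math


def pentanomial_stats(counts):
    """counts maps a pair score (0, 0.5, 1, 1.5, 2) to the number of game pairs."""
    pairs = sum(counts.values())
    mean = sum(score * n for score, n in counts.items()) / pairs
    second_moment = sum(score * score * n for score, n in counts.items()) / pairs
    sd = math.sqrt(second_moment - mean * mean)   # std. dev. of one pair's score
    ne_per_pair = (mean - 1.0) / sd               # 1.0 is the break-even pair score
    t_value = ne_per_pair * math.sqrt(pairs)
    ne_per_game = t_value / math.sqrt(2 * pairs)  # two games per pair
    return ne_per_pair, ne_per_game, t_value


corrected = {0.0: 0, 0.5: 18, 1.0: 354, 1.5: 126, 2.0: 2}   # 500 pairs
as_posted = {0.0: 0, 0.5: 15, 1.0: 354, 1.5: 126, 2.0: 2}   # typo version, 497 pairs

print("corrected:", pentanomial_stats(corrected))
print("as posted:", pentanomial_stats(as_posted))
```

With the corrected counts this gives about 0.4446 per pair, a t-value of about 9.94, and about 0.314 per game, close to the posted 0.315. The typo counts give about 0.466 per pair. The per-pair and per-game values differ only by the factor sqrt(2) from counting 500 pairs versus 1000 games.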