Opening testing suites efficiency

Discussion of chess software programming and technical issues.

Moderators: hgm, Harvey Williamson, bob

Laskos
Posts: 9045
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: Opening testing suites efficiency

Post by Laskos » Wed Jun 28, 2017 4:57 pm

Dariusz Orzechowski wrote:
Laskos wrote: [Image not preserved]
Does it show the static eval or the eval after some search? I don't know what 'V6' means here. In my book I filtered positions based on the eval at depth 12. By the way, it would probably be better to use abs(eval) instead of eval.
For your huge suite I used Stockfish dev at 100 ms per position, which probably reaches deeper than depth 12. I don't agree that we should use abs(eval): White and Black are asymmetric in normal play, and I also used asymmetric bounds when optimizing. Again, in my suites I want to emphasize normal play in the opening. Anyway, it's very hard to achieve both representativeness of normal play and enough diversity for tens of thousands of games with very sensitive (very short) books.
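To make the abs(eval) versus asymmetric-bounds distinction concrete, here is a minimal Python sketch of the selection step. The helper name `keep_position` and the window values are hypothetical (not the bounds actually used for either book), and evals are assumed to be in centipawns from White's point of view.

```python
def keep_position(eval_cp: int, lo: int, hi: int, symmetric: bool = False) -> bool:
    """Return True if eval_cp falls inside the selection window.

    symmetric=False: window [lo, hi] on the signed eval, so White-better and
        Black-better positions are treated differently (asymmetric bounds).
    symmetric=True: window applied to abs(eval_cp), so +35 and -35 are
        treated the same way.
    """
    value = abs(eval_cp) if symmetric else eval_cp
    return lo <= value <= hi


evals = [-35, -20, 15, 35, 80]
print([e for e in evals if keep_position(e, -20, 40)])                  # asymmetric
print([e for e in evals if keep_position(e, 0, 40, symmetric=True)])    # abs(eval)
```

The asymmetric window keeps [-20, 15, 35], while the abs(eval) version also admits -35, i.e. a position where Black is as much better as White is in the +35 one.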

Dariusz Orzechowski
Posts: 42
Joined: Thu May 02, 2013 3:23 pm

Re: Opening testing suites efficiency

Post by Dariusz Orzechowski » Wed Jun 28, 2017 7:40 pm

Michel wrote:I posted a small document about "normalized elo"

http://hardy.uhasselt.be/Toga/normalized_elo.pdf

explaining why it is the correct quantity for comparing different books.
Here are some suggested minor corrections; I hope you don't mind:

1. "Φ is a normal distribution with unit variance and zero mean" - I would write "(...) cumulative distribution function of a normal distribution (...)".
2. In remark 2.3 typo "sensity" instead of "sensitivity".
3. Under the formula with RMS, I would spell out what the abbreviation means, for completeness.

Michel
Posts: 2005
Joined: Sun Sep 28, 2008 11:50 pm

Re: Opening testing suites efficiency

Post by Michel » Wed Jun 28, 2017 9:04 pm

It looks like the first formula on the second page (unfortunately it's not numbered) is wrong. Sigma_0 should be

sqrt((w + 1/4*d) - s^2)
instead of

sqrt((w + 1/4*d)^2 - s^2)
Thanks! I noticed it too but I could not fix it as I was away from my computer. Quite frustrating!

I have now posted a new version. Thanks for the other comments as well.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
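The corrected formula is easy to check numerically. A minimal Python sketch, assuming normalized Elo is (s - 1/2)/sigma_0, with w and d the win and draw fractions and s = w + d/2 the mean score per game; the function name is hypothetical:

```python
import math


def trinomial_normalized_elo(wins: int, losses: int, draws: int) -> float:
    """(s - 1/2) / sigma_0 with sigma_0 = sqrt(w + d/4 - s^2)."""
    n = wins + losses + draws
    w = wins / n                              # win fraction
    d = draws / n                             # draw fraction
    s = w + d / 2                             # mean score per game
    sigma0 = math.sqrt(w + d / 4 - s * s)     # per-game standard deviation
    return (s - 0.5) / sigma0


print(round(trinomial_normalized_elo(28, 8, 964), 3))
print(round(trinomial_normalized_elo(299, 187, 514), 3))
```

For the two trinomial matches reported later in this thread (+28 -8 =964 and +299 -187 =514) this gives about 0.106 and 0.163; the second is posted below as 0.162.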

Laskos
Posts: 9045
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: Opening testing suites efficiency

Post by Laskos » Thu Jun 29, 2017 11:44 am

Dariusz Orzechowski wrote:
Laskos wrote:Your 5ply_v1.epd openings are excellent for building unbalanced early opening positions and for using pentanomial variance for LLR, which is the future of engine testing (see here http://www.talkchess.com/forum/viewtopic.php?t=61245 ). The requirement that openings be "reasonable" becomes irrelevant; computer Chess itself will become "unreasonable" because of very high draw rates with normal openings. When testers (like the Stockfish Framework) reach draw rates above 80-85% with balanced openings, or in Bayeselo terms eloDraw above 500 or so, the optimum openings are defined by eloBias = eloDraw, with a resulting draw rate of 50%. Compared to balanced positions, the reduction in the number of games needed to reach the same LOS or SPRT stop will range from a factor of ~4 at an 85% draw rate to an order of magnitude above a 90% draw rate. This sort of testing with unbalanced positions already happened in Checkers more than a decade ago.

I analyzed your 5ply_v1.epd file with Stockfish and uploaded the openings with a 70cp-110cp unbalance, suited for a wide range of draw rates above 80% (or eloBias above 500 or so).
http://s000.tinyupload.com/?file_id=079 ... 7904268641
This 5ply_v1_unbalanced.epd contains 15083 unique unbalanced positions (70cp-110cp) for future testing, once Cutechess-Cli or a similar tool uses pentanomial variance and draw rates with balanced openings rise above 80% or so. That is not so far in the future, at least for Stockfish. The openings are skewed towards the lower end of the [70,110] cp unbalance interval.
I'm uploading my unbalanced set of 5ply openings that I got as a "by-product" of my work. Maybe it will be useful. It contains over 127k positions, most of which should be in the interesting 70-110 range, although the filtering is very crude, so I would expect some completely lopsided positions to have slipped in as well. I tried to work on a 5ply book last year but stopped due to lack of time (and ideas).


5ply_unbalanced_127k.epd book (link expires in 7 days): http://dropmefiles.com/VDUNG
Michel wrote:I posted a small document about "normalized elo"

http://hardy.uhasselt.be/Toga/normalized_elo.pdf

explaining why it is the correct quantity for comparing different books.
Very nice! If you could also add the formula for the pentanomial case, it would be great to have it all in one place.
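The eloBias = eloDraw rule in the quoted post can be illustrated with the BayesElo outcome model, where f(x) = 1/(1 + 10^(-x/400)). This is a sketch under the assumption that the opening unbalance plays the role of the BayesElo bias term for the side it favors; the function names are hypothetical:

```python
def logistic(x: float) -> float:
    """BayesElo-style curve f(x) = 1 / (1 + 10^(-x/400))."""
    return 1.0 / (1.0 + 10.0 ** (-x / 400.0))


def outcome_probs(delta: float, elo_bias: float, elo_draw: float):
    """Return (P(favored side wins), P(draw), P(favored side loses)).

    delta:    Elo difference between the engines (favored side minus other)
    elo_bias: advantage the opening gives the favored side (assumption:
              treated like the BayesElo white-advantage term)
    elo_draw: BayesElo draw parameter
    """
    p_win = logistic(delta + elo_bias - elo_draw)
    p_loss = logistic(-delta - elo_bias - elo_draw)
    return p_win, 1.0 - p_win - p_loss, p_loss


for bias in (0.0, 500.0):
    win, draw, loss = outcome_probs(0.0, bias, 500.0)
    print(f"eloBias={bias:.0f}: win={win:.3f} draw={draw:.3f} loss={loss:.3f}")
```

With eloDraw = 500 and equal engines, balanced openings (eloBias = 0) give a draw rate of about 89%, while eloBias = eloDraw puts the favored side's win probability at exactly 50% and the draw rate just under 50%.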
A very useful file, thanks!

I briefly analyzed this file of unbalanced 5-plies; Stockfish's evals of them look like this:

[Image not preserved: Stockfish evals of the positions in this file]


Although it looks a bit strange (what would an engine have to do in a 5-ply game to reach a significant number of positions with a 1000cp disadvantage?), it is the most useful file for the practically needed unbalances in the 70cp-180cp range.

To illustrate the power of these kinds of unbalanced openings in the future, I will use a much more drawish game, Endgame Chess, which is probably closely related to Chess. Early endgames that are balanced in eval are very drawish. Engines play them well, as in the future they will probably play the whole game. To anticipate a bit (maybe by 10 years), I used both balanced and unbalanced "opening" suites made of early endgames. To roughly achieve a 50% draw rate (optimal for sensitivity when using pentanomial statistics) with unbalanced endgames, the unbalance has to be set at 100cp-140cp, higher than the 70cp-110cp unbalance for normal openings. Here are the match results of SF dev at 10''+0.1'' versus the same SF dev at 5''+0.05'' (again a doubling in time), 1000 games each:

Balanced:
Endgame unbalance 0cp-40cp:
Score of SF2 vs SF1: 28 - 8 - 964 [0.510] 1000
ELO difference: 6.95 +/- 4.06
Normalized ELO (trinomial): 0.106

A very high draw rate of 96%, which is troubling.


Unbalanced:
Endgame unbalance 100cp-140cp:
Score of SF2 vs SF1: 299 - 187 - 514 [0.556] 1000
ELO difference: 39.08 +/- 14.99
Normalized ELO (trinomial): 0.162

The unbalance of 100cp-140cp was well chosen: the draw rate is now close to the 50% optimum. Even using only the trinomial computation, sensitivity increased with unbalanced openings (Normalized ELO 0.162 vs 0.106), but not dramatically.
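The "ELO difference" lines follow from the match score through the logistic Elo formula. A small Python sketch (hypothetical helper names; the ± error margins printed by Cutechess-Cli are not reproduced here):

```python
import math


def score(wins: int, losses: int, draws: int) -> float:
    """Mean score per game for the first engine."""
    return (wins + 0.5 * draws) / (wins + losses + draws)


def elo_difference(s: float) -> float:
    """Logistic Elo difference corresponding to a mean score s."""
    return -400.0 * math.log10(1.0 / s - 1.0)


print(round(elo_difference(score(28, 8, 964)), 2))
print(round(elo_difference(score(299, 187, 514)), 2))
```

These reproduce the 6.95 and 39.08 Elo reported above for scores of 0.510 and 0.556.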

But let's analyze the unbalanced results from the resulting PGN file. The 1000 games form 500 pairs, each opening played once with each color (this is essential). Because the openings are unbalanced, the two games of a pair are highly correlated (balanced openings give only weak correlation, so the trinomial model can be used pretty safely there). We are interested in the pentanomial distribution of these pairs, with scores out of 2 points in (0, 1/2, 1, 3/2, 2). SF2 scores in these pairs of color-reversed games (out of 2 points):


Score  Counts
-------------
0:        0
0.5:     15
1.0:    354
1.5:    126 
2.0:      2
-------------
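Turning the PGN into this table only requires pairing each game with its color-reversed twin and adding the two scores. A minimal sketch, assuming the pairs have already been matched up (PGN parsing is omitted) and scores are given from SF2's point of view; `pentanomial_counts` is a hypothetical name:

```python
from collections import Counter


def pentanomial_counts(pair_scores):
    """Count game pairs by SF2's total score out of 2 points.

    pair_scores: iterable of (game1, game2) scores for SF2, one tuple per
    opening played with both colors; each score is 0, 0.5 or 1.
    Returns counts for totals 0, 0.5, 1, 1.5 and 2, in that order.
    """
    totals = Counter(g1 + g2 for g1, g2 in pair_scores)
    return [totals[k] for k in (0.0, 0.5, 1.0, 1.5, 2.0)]


# Tiny made-up example (not the match data): four pairs.
print(pentanomial_counts([(1, 0.5), (0.5, 0.5), (0, 1), (1, 1)]))
```

The example prints [0, 0, 2, 1, 1]. Note that a "win plus loss" pair and a "draw plus draw" pair both land in the 1.0 bin, which is exactly the information the trinomial +W -L =D summary throws away.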
As you can see, the Cutechess-Cli trinomial output SF2 vs SF1: +299 -187 =514 seems fairly balanced, but the pentanomial distribution of scores is very skewed. Computing the pentanomial variance, we get for the unbalanced set of openings:

Unbalanced:
Normalized ELO (pentanomial): 0.315
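A sketch of the pentanomial computation, assuming each pair score is divided by 2 to put it on a per-game scale and normalized Elo is reported as t/sqrt(number of games), the convention Laskos states at the end of this thread. It uses the corrected count of 18 for the 0.5 bin (the table shows 15, a typo acknowledged later in the thread); the helper name is hypothetical:

```python
import math


def pentanomial_stats(counts):
    """Mean score, standard error, t-value and per-game normalized Elo.

    counts: number of game pairs in which SF2 scored 0, 0.5, 1, 1.5, 2 points.
    """
    per_game = [0.0, 0.25, 0.5, 0.75, 1.0]          # pair score / 2
    n_pairs = sum(counts)
    mean = sum(c * x for c, x in zip(counts, per_game)) / n_pairs
    var = sum(c * (x - mean) ** 2 for c, x in zip(counts, per_game)) / n_pairs
    std_err = math.sqrt(var / n_pairs)              # std error of the mean score
    t_value = (mean - 0.5) / std_err
    nelo = t_value / math.sqrt(2 * n_pairs)         # t / sqrt(number of games)
    return mean, std_err, t_value, nelo


mean, std_err, t_value, nelo = pentanomial_stats([0, 18, 354, 126, 2])
print(f"score={mean:.3f} se={std_err:.8f} t={t_value:.2f} nelo={nelo:.3f}")
```

With these counts it gives a mean score of 0.556, a standard error of about 0.00563, t ≈ 9.94 and a normalized Elo of about 0.314, close to the posted 0.315.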

Using pentanomial statistics with unbalanced openings, we get a Normalized ELO about 3 times higher than with balanced openings (0.315 vs 0.106). Since the number of games needed for a given confidence scales as the inverse square of Normalized ELO, this means about 9 times fewer games to reach the same LOS or SPRT stop. This will be the future of chess engine testing once draw rates are routinely above 80%.
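The "9 times fewer games" step rests on t = NormalizedElo * sqrt(N), with N the number of games, so reaching a fixed t needs N proportional to 1/NormalizedElo^2. A back-of-the-envelope Python check under that fixed-sample assumption (SPRT stopping times scale only roughly the same way); the function name is hypothetical:

```python
def relative_games(nelo_old: float, nelo_new: float) -> float:
    """Factor by which the number of games shrinks for the same t-value."""
    return (nelo_new / nelo_old) ** 2


print(round(relative_games(0.106, 0.315), 1))  # balanced trinomial vs unbalanced pentanomial
print(round(relative_games(0.106, 0.162), 1))  # balanced vs unbalanced, both trinomial
```

The first ratio is about 8.8, the "9 times" above; the trinomial-only gain from unbalanced endgames is only about 2.3, which matches the "not dramatically" remark.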

Dariusz Orzechowski
Posts: 42
Joined: Thu May 02, 2013 3:23 pm

Re: Opening testing suites efficiency

Post by Dariusz Orzechowski » Thu Jun 29, 2017 2:08 pm

Laskos wrote:A very useful file, thanks!

I briefly analyzed this file of unbalanced 5-plies; Stockfish's evals of them look like this:

[Image not preserved: Stockfish evals of the positions in this file]


Although it looks a bit strange (what would an engine have to do in a 5-ply game to reach a significant number of positions with a 1000cp disadvantage?), it is the most useful file for the practically needed unbalances in the 70cp-180cp range.
It's enough to lose a queen. But no engine actually played those 5 plies. The book is created "backwards" by taking all possible 5ply positions (perft 5) and then filtering out duplicates, mates on the board, completely lopsided positions and so on. My tools are not perfect, as you can see, but I was not concerned because it was not supposed to be an end product (at least not yet). I just keep things that I might use in the future, so here we have this set of openings. I'm glad you like it; maybe something interesting will come out of it.
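The post-processing described here (dropping duplicates and completely lopsided positions) could look roughly like this Python sketch. It is not Dariusz's tool: the input format (FEN string plus a centipawn eval from some engine), the names and the 800cp cutoff are assumptions, and detecting mates on the board is omitted because it needs a move generator.

```python
def position_key(fen: str) -> str:
    """Keep only the first four FEN fields (placement, side to move,
    castling, en passant) so transpositions map to the same key."""
    return " ".join(fen.split()[:4])


def filter_book(records, max_abs_cp=800):
    """records: iterable of (fen, eval_cp); returns unique, non-lopsided keys.

    max_abs_cp is a hypothetical cutoff for "completely lopsided" positions.
    """
    seen = set()
    kept = []
    for fen, eval_cp in records:
        key = position_key(fen)
        if key in seen or abs(eval_cp) > max_abs_cp:
            continue
        seen.add(key)
        kept.append((key, eval_cp))
    return kept


book = filter_book([
    ("rnbqkbnr/pppppppp/8/8/8/5N2/PPPPPPPP/RNBQKB1R b KQkq - 1 1", 20),
    ("rnbqkbnr/pppppppp/8/8/8/5N2/PPPPPPPP/RNBQKB1R b KQkq - 5 3", 20),  # same position, other counters
    ("4k3/8/8/8/8/8/8/4K2Q b - - 0 1", 1200),                            # lopsided
])
print(len(book))
```

Only the first entry survives: the second is the same position reached with different move counters, and the third exceeds the cutoff.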

Dariusz Orzechowski
Posts: 42
Joined: Thu May 02, 2013 3:23 pm

Re: Opening testing suites efficiency

Post by Dariusz Orzechowski » Thu Jun 29, 2017 9:00 pm

Laskos wrote:But let's analyze unbalanced results from resulting PGN file. We have 1000 games in 500 pairs of color and color reversed games (this is essential). As these have unbalanced openings, they are highly correlated (balanced openings give only weak correlation, and trinomial can be pretty safely used). We are interested in pentanomial distribution of these pairs in scores out of 2 points as (0,1/2,1,3/2,2). SF2 scores in these pairs of color reversed games (out of 2 points):


Score  Counts
-------------
0:        0
0.5:     15
1.0:    354
1.5:    126 
2.0:      2
-------------
As you can see, the Cutechess-Cli trinomial output SF2 vs SF1: +299 -187 =514 seems fairly balanced, but the pentanomial distribution of scores is very skewed. Computing the pentanomial variance, we get for the unbalanced set of openings:

Unbalanced:
Normalized ELO (pentanomial): 0.315
I have trouble reproducing this result. First, your numbers don't sum up to 500, but that's a minor issue. When I calculate normalized elo with those numbers I get 0.466, not 0.315, and I cannot find any error in my calculations. Strange.
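The discrepancy can be traced with a short Python sketch that computes normalized Elo per game pair, i.e. (mean per-game score - 1/2) divided by the standard deviation of the pair score rescaled to one game. This appears to be the convention used here, but that is an assumption, and the helper name is hypothetical:

```python
import math


def per_pair_normalized_elo(counts):
    """(mean per-game score - 0.5) / std dev of the pair score rescaled to one game."""
    per_game = [0.0, 0.25, 0.5, 0.75, 1.0]          # pair score / 2
    n_pairs = sum(counts)
    mean = sum(c * x for c, x in zip(counts, per_game)) / n_pairs
    var = sum(c * (x - mean) ** 2 for c, x in zip(counts, per_game)) / n_pairs
    return (mean - 0.5) / math.sqrt(var), n_pairs


print(per_pair_normalized_elo([0, 15, 354, 126, 2]))   # table as posted
print(per_pair_normalized_elo([0, 18, 354, 126, 2]))   # corrected 0.5 bin
```

With the table as posted (15 in the 0.5 bin, 497 pairs) it gives about 0.466; with 18 (500 pairs) it gives about 0.4446, the value Dariusz arrives at later in the thread.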

Laskos
Posts: 9045
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: Opening testing suites efficiency

Post by Laskos » Thu Jun 29, 2017 11:10 pm

Dariusz Orzechowski wrote:
I have trouble reproducing this result. First, your numbers don't sum up to 500, but that's a minor issue. When I calculate normalized elo with those numbers I get 0.466, not 0.315, and I cannot find any error in my calculations. Strange.
It's a typo in what I wrote here; I used the correct number in the computation:
0.5: is 18 instead of 15
Is the square root of your variance 0.00563276?

Laskos
Posts: 9045
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: Opening testing suites efficiency

Post by Laskos » Thu Jun 29, 2017 11:41 pm

Laskos wrote:
Is the square root of your variance 0.00563276 ?
Then the score is 0.556 and the t-value is 0.056/0.00563276 ~ 9.94. Do you agree?

Dariusz Orzechowski
Posts: 42
Joined: Thu May 02, 2013 3:23 pm

Re: Opening testing suites efficiency

Post by Dariusz Orzechowski » Fri Jun 30, 2017 12:02 am

Laskos wrote:Is the square root of your variance 0.00563276? Then the score is 0.556 and the t-value is 0.056/0.00563276 ~ 9.94. Do you agree?
In the end I get the same t-value, but the rest is for some reason different. I think the reason may be that I'm counting 500 games and you probably 1000 (I'm guessing)? My normalized elo is 0.4446, and multiplied by sqrt(500) it gives 9.94.

Laskos
Posts: 9045
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: Opening testing suites efficiency

Post by Laskos » Fri Jun 30, 2017 12:04 am

Dariusz Orzechowski wrote:
Laskos wrote:Is the square root of your variance 0.00563276? Then the score is 0.556 and the t-value is 0.056/0.00563276 ~ 9.94. Do you agree?
In the end I get the same t-value, but the rest is for some reason different. I think the reason may be that I'm counting 500 games and you probably 1000 (I'm guessing)? My normalized elo is 0.4446, and multiplied by sqrt(500) it gives 9.94.
Normalized Elo is t/sqrt(2*500), no?
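The two numbers differ only in normalization, since both imply the same t-value. A minimal check, assuming 500 game pairs: normalizing per pair gives t = nelo_pair * sqrt(500), normalizing per game gives t = nelo_game * sqrt(1000), so nelo_game = nelo_pair / sqrt(2). The per-game value is the one comparable with the trinomial figures (0.106, 0.162) earlier in the thread, which are also per game.

```python
import math

N_PAIRS = 500
nelo_pair = 0.4446                                 # per-pair normalization (Dariusz)
t_value = nelo_pair * math.sqrt(N_PAIRS)           # same t either way
nelo_game = t_value / math.sqrt(2 * N_PAIRS)       # per-game normalization (Laskos)

print(round(t_value, 2), round(nelo_game, 4), round(nelo_pair / math.sqrt(2), 4))
```

Both routes give t ≈ 9.94 and a per-game normalized Elo of about 0.314.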
