Ovyron wrote: ↑Tue Jan 05, 2021 6:23 am
Humans aren't forced to play the same openings with both colors; they play the openings they know well. The same should happen with engines: build for them a book that takes them into positions they understand better, to show their true strength.
When testing engines, I also think this would be the fair way to do it. Short self-generated books with a lot of positions. They don't need to be close to equal, or repeated, just varied and randomly selected from within that book.
There are also people who test books with an engine, BTW.
Michel wrote: ↑Wed Jan 06, 2021 6:56 am
Replaying games with reversed colors reduces the variance of the test outcome (one should use the pentanomial model to correctly estimate this variance). So you need fewer games to reach a decision. This effect is quite substantial. Fishtest (which is the gold standard in engine testing) uses a very balanced book and there is still a 5% saving. With their previous slightly less balanced book it was 10%. With very unbalanced books it is much more.
This document is actually about comparing the trinomial and the pentanomial model, but this is the same problem.
How can that be? The width of the Elo curve, which can be seen as the standard deviation of the actual performance difference, is 280 Elo. Pawn odds gives an advantage that corresponds to about 100 Elo; let's say 140 to be generous. If I randomly assign an opening advantage with a standard deviation of half a Pawn (which I would consider quite unbalanced), the variances should add, and the resulting standard deviation should increase by only a factor sqrt(1 + 0.25^2) = sqrt(1.0625). It would require 6.25% more games to compensate for that. With a standard deviation of the opening advantage of 1/4 Pawn it would already have dropped to 1.6%.
The white advantage is about 1/6 of a Pawn. Even if I don't alternate colors, but randomly decide each game which player will have black or white, it should only require 1.7% more games to get the same accuracy.
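To make the variance argument concrete, here is a small sketch (mine, not from the thread) that redoes the arithmetic above. The 280 Elo width and the generous 140 Elo per pawn are the numbers quoted in the post; the function itself is an illustration.

```python
# Sketch of the "variances add" argument above.
PERF_SD = 280.0   # width (SD) of the per-game performance difference, in Elo
PAWN_ELO = 140.0  # generous value for one pawn of advantage, as in the post

def extra_games_factor(opening_sd_elo: float) -> float:
    """Fraction of extra games needed when an independent opening advantage
    with the given standard deviation (in Elo) is added to the per-game
    noise: variances add, and the number of games required scales with
    the total variance."""
    return (opening_sd_elo / PERF_SD) ** 2

print(extra_games_factor(PAWN_ELO / 2))  # half-pawn SD    -> 0.0625 (6.25%)
print(extra_games_factor(PAWN_ELO / 4))  # quarter-pawn SD -> 0.015625 (~1.6%)
```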
The book currently in use at Fishtest has an RMS bias of 30 Elo, which in score units corresponds to 0.043 (1% score ≈ 7 Elo).
The trinomial formula for the variance normalized per game (in score^2 units) is (1-d)/4 where d is the draw ratio. The pentanomial variance is (1-d)/4-b^2 where b is the RMS bias of the book (in score units).
Assume a draw ratio of 0.8 (currently STC at Fishtest). Then we get a saving of 0.043**2/0.05 = 3.7%.
At LTC the draw ratio is 0.92, but the RMS bias is starting to suffer from Elo compression; it seems to be more like 20 Elo now. So the saving is maybe 5-6%.
The previous 2moves book in use at Fishtest had an RMS bias of 90 Elo (it had high selectivity but not enough positions for the long tests they currently run). In that case b**2 would be multiplied by 9 but of course the draw ratio was also lower (I seem to recall it was 0.6 at STC). So the savings were around 15%.
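The savings quoted above can be reproduced from the two variance formulas; here is a minimal sketch (the 1% score ≈ 7 Elo conversion, the draw ratios, and the RMS biases are the figures from these posts; the function is my own illustration):

```python
def pentanomial_saving(draw_ratio: float, rms_bias_elo: float) -> float:
    """Relative saving in games from using the pentanomial model:
    trinomial per-game variance is (1 - d)/4, pentanomial is
    (1 - d)/4 - b^2, with b the RMS book bias in score units
    (converted via 1% score ~ 7 Elo)."""
    b = rms_bias_elo / 7.0 / 100.0          # Elo -> score units
    trinomial = (1.0 - draw_ratio) / 4.0
    return b ** 2 / trinomial

print(pentanomial_saving(0.80, 30))  # current book, STC    -> ~3.7%
print(pentanomial_saving(0.92, 20))  # LTC, compressed bias -> ~4.1%
print(pentanomial_saving(0.60, 90))  # previous 2moves book -> ~16.5%
```

With these round numbers the LTC case comes out nearer 4% than the quoted 5-6%, which mainly shows how sensitive the saving is to the assumed bias at high draw ratios.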
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
abgursu wrote: ↑Sun Jan 10, 2021 10:34 am
Well, nowadays I am working on a rating with Kings & Pawns games.
[d]4k3/pppppppp/8/8/8/8/PPPPPPPP/4K3 w - - 0 1
I get students playing this as part of my coaching.
I played this when I learned chess in my childhood.
Funny, but there are a lot of losses happening. Komodo is the best among non-NNUE engines, but with NNUE it must be Eman or Dragon. They both won against SF, but I never tested them against each other.
I doubt strong humans are going to lose, assuming they play for a draw.
It may be interesting to see a human-computer match from this position.
For comp-comp games you can also use unbalanced positions, like the following:
[d]4k3/pppppppp/8/8/8/8/P6P/4KBN1 w - - 0 1
White has much more mobility with the Knight and Bishop, so I looked at this position and gave it a shot to practice my endgame play.
If anybody really wants a challenge and practice against an engine of equal Elo strength, please pick up this position, 2 Knights versus 2 Bishops, and post your result. The only and best way to learn is by practicing, NOT by watching engine vs engine.
Michel wrote: ↑Mon Jan 11, 2021 4:57 pm
The book currently in use at Fishtest has an RMS bias of 30 Elo, which in score units corresponds to 0.043 (1% score ≈ 7 Elo).
Well, 30 Elo is already larger than the white advantage. I would not call that anywhere near balanced. Of course at these high draw ratios it might be better to use a very unbalanced book with reversed colors, to keep the draw ratio at 50%. But that is another issue.
My 2 cents on any chess opening when testing engines: as long as both engines play both sides, first with White and then reversed, it really does NOT matter for testing purposes. But most programmers prepare opening books specifically for their engines, to get them into positions suited to their playing style, just like human GMs prepare openings to play against other GMs.
Michel wrote: ↑Mon Jan 11, 2021 4:57 pm
The book currently in use at Fishtest has an RMS bias of 30 Elo, which in score units corresponds to 0.043 (1% score ≈ 7 Elo).
Well, 30 Elo is already larger than the white advantage. I would not call that anywhere near balanced. Of course at these high draw ratios it might be better to use a very unbalanced book with reversed colors, to keep the draw ratio at 50%. But that is another issue.
The Elo model does not predict anything about the suitability of a book for testing.
At one point a number of books were tested on Fishtest. The current very balanced book and the previous, much more unbalanced book have about equal sensitivity, but the current book has many more positions. Admittedly, they should redo that test, since the draw ratio has gone through the roof since NNUE was introduced, and it is conceivable this might affect the relative sensitivity of the different books.
BTW: I consider a 30 Elo RMS bias to be very balanced (since this roughly corresponds to the white advantage of 1/3 of a pawn). Note also that this is the RMS bias, so it is heavily affected by outliers.
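As an aside on the outlier remark: the RMS weights large biases quadratically, so a few very unbalanced positions can dominate it. A quick illustration (the numbers are invented, not from any real book):

```python
import math

def rms(values):
    """Root mean square of a list of per-position biases (in Elo)."""
    return math.sqrt(sum(v * v for v in values) / len(values))

# Hypothetical book: 99 near-balanced positions plus one 300 Elo outlier.
biases = [10.0] * 99 + [300.0]
print(rms(biases))                # ~31.6 Elo: the single outlier triples the RMS
print(sum(biases) / len(biases))  # while the mean bias is only 12.9 Elo
```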
For comp-comp games you can also use unbalanced positions, like the following:
[d]4k3/pppppppp/8/8/8/8/P6P/4KBN1 w - - 0 1
Once again my online trainer told me to try harder, but this time with the Black pieces. Against BiKJump, rated 2121, no matter how hard I tried with Black it was impossible for me to make any progress, but at least I tried. With so few pieces I consider this position like an endgame, since it is only Knight plus Bishop with two pawns versus my 8 pawns.