Ovyron wrote: ↑Tue Jan 05, 2021 6:23 am
Humans aren't forced to play the same openings with both colors; they play the openings they know well. The same should happen with engines: build for them a book that takes them into positions they understand better, to show their true strength.
When testing engines, I also think this would be the fair way to do it. Short self-generated books with a lot of positions. They don't need to be close to equal, or repeated, just varied and randomly selected from within that book.
There are also people who test books with an engine, BTW.
Michel wrote: ↑Wed Jan 06, 2021 6:56 am
Replaying games with reversed colors reduces the variance of the test outcome (one should use the pentanomial model to correctly estimate this variance). So you need fewer games to reach a decision. This effect is quite substantial. Fishtest (which is the gold standard in engine testing) uses a very balanced book and there is still a 5% saving. With their previous slightly less balanced book it was 10%. With very unbalanced books it is much more.
This document is actually about comparing the trinomial and the pentanomial model, but this is the same problem.
How can that be? The width of the Elo curve, which can be seen as the standard deviation of the actual performance difference, is 280 Elo. Pawn odds gives an advantage that corresponds to about 100 Elo; let's say 140 to be generous. If I randomly assign an opening advantage with a standard deviation of half a Pawn (which I would consider quite unbalanced), the variances should add, and the resulting standard deviation should increase by only a factor sqrt(1 + 0.25^2) = sqrt(1.0625). It would require 6.25% more games to compensate for that. With a standard deviation of the opening advantage of 1/4 Pawn it would already have dropped to 1.6%.
The white advantage is about 1/6 of a Pawn. Even if I don't alternate colors, but randomly decide each game which player will have black or white, it should only require 1.7% more games to get the same accuracy.
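To make the variance argument concrete, here is a small sketch (mine, not from the thread) that redoes the arithmetic above. The 280 Elo width and the generous 140 Elo per pawn are the numbers quoted in the post; the function itself is an illustration.

```python
# Sketch of the "variances add" argument above.
PERF_SD = 280.0   # width (SD) of the per-game performance difference, in Elo
PAWN_ELO = 140.0  # generous value for one pawn of advantage, as in the post

def extra_games_factor(opening_sd_elo: float) -> float:
    """Fraction of extra games needed when an independent opening advantage
    with the given standard deviation (in Elo) is added to the per-game
    noise: variances add, and the number of games required scales with
    the total variance."""
    return (opening_sd_elo / PERF_SD) ** 2

print(extra_games_factor(PAWN_ELO / 2))  # half-pawn SD    -> 0.0625 (6.25%)
print(extra_games_factor(PAWN_ELO / 4))  # quarter-pawn SD -> 0.015625 (~1.6%)
```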
The book currently in use at Fishtest has an RMS bias of 30 Elo, which in score units corresponds to 0.043 (1% score ≈ 7 Elo).
The trinomial formula for the variance normalized per game (in score^2 units) is (1-d)/4 where d is the draw ratio. The pentanomial variance is (1-d)/4-b^2 where b is the RMS bias of the book (in score units).
Assume a draw ratio of 0.8 (currently STC at Fishtest). Then we get a saving of 0.043**2/0.05 = 3.7%.
At LTC the draw ratio is 0.92, but the RMS bias is starting to suffer from Elo compression; it seems to be more like 20 Elo now. So the saving is maybe 5-6%.
The previous 2moves book in use at Fishtest had an RMS bias of 90 Elo (it had high selectivity but not enough positions for the long tests they currently run). In that case b**2 would be multiplied by 9 but of course the draw ratio was also lower (I seem to recall it was 0.6 at STC). So the savings were around 15%.
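The savings quoted above can be reproduced from the two variance formulas; here is a minimal sketch (the 1% score ≈ 7 Elo conversion, the draw ratios, and the RMS biases are the figures from these posts; the function is my own illustration):

```python
def pentanomial_saving(draw_ratio: float, rms_bias_elo: float) -> float:
    """Relative saving in games from using the pentanomial model:
    trinomial per-game variance is (1 - d)/4, pentanomial is
    (1 - d)/4 - b^2, with b the RMS book bias in score units
    (converted via 1% score ~ 7 Elo)."""
    b = rms_bias_elo / 7.0 / 100.0          # Elo -> score units
    trinomial = (1.0 - draw_ratio) / 4.0
    return b ** 2 / trinomial

print(pentanomial_saving(0.80, 30))  # current book, STC    -> ~3.7%
print(pentanomial_saving(0.92, 20))  # LTC, compressed bias -> ~4.1%
print(pentanomial_saving(0.60, 90))  # previous 2moves book -> ~16.5%
```

With these round numbers the LTC case comes out nearer 4% than the quoted 5-6%, which mainly shows how sensitive the saving is to the assumed bias at high draw ratios.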
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
abgursu wrote: ↑Sun Jan 10, 2021 10:34 am
Well, nowadays I am working on a rating with Kings & Pawns games.
[d]4k3/pppppppp/8/8/8/8/PPPPPPPP/4K3 w - - 0 1
I get students playing this as part of my coaching.
I played this when I learned chess in my childhood.
Funny, but there are a lot of losses happening. Komodo is the best among non-NNUE engines, but with NNUE it must be Eman or Dragon. They both won against SF, but I never tested them against each other.
I doubt strong humans are going to lose, assuming they play for a draw.
It may be interesting to see a human-computer match from this position.
For comp-comp games you can also use unbalanced positions, like the following:
[d]4k3/pppppppp/8/8/8/8/P6P/4KBN1 w - - 0 1
White has much more mobility with the Knight and Bishop, so I looked at this position and gave it a shot to practice my endgame play.
If anybody really wants a challenge and practice against an engine of equal Elo strength, please pick up this position, 2 Knights versus 2 Bishops, and post your result. The only and best way to learn is by practicing, NOT by watching engine vs engine.
Michel wrote: ↑Mon Jan 11, 2021 4:57 pm
The book currently in use at Fishtest has an RMS bias of 30 Elo, which in score units corresponds to 0.043 (1% score ≈ 7 Elo).
Well, 30 Elo is already larger than the white advantage. I would not call that anywhere near balanced. Of course at these high draw ratios it might be better to use a very unbalanced book with reversed colors, to keep the draw ratio at 50%. But that is another issue.
My 2 cents on any chess opening when testing engines: as long as both engines play both sides, first with White and then reversed, it really does NOT matter for testing purposes. But most programmers prepare opening books specifically for their engines, to get them into positions suited to their playing style, just like human GMs prepare openings to play against other GMs.
Michel wrote: ↑Mon Jan 11, 2021 4:57 pm
The book currently in use at Fishtest has an RMS bias of 30 Elo, which in score units corresponds to 0.043 (1% score ≈ 7 Elo).
Well, 30 Elo is already larger than the white advantage. I would not call that anywhere near balanced. Of course at these high draw ratios it might be better to use a very unbalanced book with reversed colors, to keep the draw ratio at 50%. But that is another issue.
The Elo model does not predict anything about the suitability of a book for testing.
At one point a number of books were tested on Fishtest. The current very balanced book and the previous, much more unbalanced book have about equal sensitivity, but the current book has many more positions. Admittedly, they should redo that test, since the draw ratio has gone through the roof since NNUE was introduced, and it is conceivable this might affect the relative sensitivity of the different books.
BTW: I consider a 30 Elo RMS bias to be very balanced (since this roughly corresponds to the white advantage of 1/3 of a pawn). Note also that this is the RMS bias, so it is heavily affected by outliers.
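As an aside on the outlier remark: the RMS weights large biases quadratically, so a few very unbalanced positions can dominate it. A quick illustration (the numbers are invented, not from any real book):

```python
import math

def rms(values):
    """Root mean square of a list of per-position biases (in Elo)."""
    return math.sqrt(sum(v * v for v in values) / len(values))

# Hypothetical book: 99 near-balanced positions plus one 300 Elo outlier.
biases = [10.0] * 99 + [300.0]
print(rms(biases))                # ~31.6 Elo: the single outlier triples the RMS
print(sum(biases) / len(biases))  # while the mean bias is only 12.9 Elo
```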
For comp-comp games you can also use unbalanced positions, like the following:
[d]4k3/pppppppp/8/8/8/8/P6P/4KBN1 w - - 0 1
Once again my online trainer told me to try harder, but this time with the Black pieces. Against BiKJump, rated 2121, no matter how hard I tried with Black it was impossible for me to make any progress, but at least I tried. With so few pieces I consider this position like an endgame, since it is only Knight plus Bishop with two pawns versus my 8 pawns.