The influence of books on test results.

Discussion of computer chess matches and engine tournaments.

Moderator: Ras

User avatar
geots
Posts: 4790
Joined: Sat Mar 11, 2006 12:42 am

Re: The influence of books on test results.

Post by geots »

Sedat Canbaz wrote:
lkaufman wrote:
velmarin wrote:Which system do you use: a ChessBase book, or a Shredder GUI book?
What system?

How many moves do you want? 4, 5, 6, 8?

How many different openings do you need?

It can be prepared if you give me the data.

I am preparing one of 8 moves in the Fritz GUI; it always ends with a Black move,
the idea being that the White side always starts thinking.
We have always made our own test books, because we need at least 10,000 positions (to run 20,000 games). I don't know any publicly available test books like this, please tell me if there are any. I think CCRL and CEGT seem to use books averaging about 8 moves per side. All positions should be ones that have occurred a reasonable number of times in master play, so we can be pretty sure that White has just a fairly normal advantage. Which publicly available books come closest to meeting this description?
Dear Larry,

Just my two cents over this issue,

I don't suggest using a large opening book where the engines play from 10,000 positions,

because I am afraid that the engines' Elo performance will suffer from such a variety of openings (there are many holes in such a huge database).

Best,
Sedat


Sedat, I am not quite sure what this thread is really all about. All I know is I AM STICKING WITH YOUR GENERIC BOOKS when I test Komodo or any other engine. There is not a one-in-a-million chance of me changing to anything else.

Secondly, for the most part, all the tablebases can take a quick airline flight into the depths of hell as far as I am concerned. (3rd class seating).


george
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: The influence of books on test results.

Post by Adam Hair »

lkaufman wrote:Of all the factors that can influence test results, such as time limit, increment vs. repeating controls, ponder, hardware, etc., the one we are currently most interested in is the effect of opening books/testsuites. Our own distributed tester uses a five move book, rather shorter than that used by most testers. Since it shows a sixteen elo lead for Komodo 5 over Houdini 1.5 (after over 11k games) which is not shown by the testing agencies, and since the only result on this forum showing Komodo 5 beating Houdini 2 in a long match used a four move book, we decided to make a new testbook that is more typical of books normally used in tests - it averages six moves, but some popular lines are much longer than this. Based on hyper-fast testing, our performance drops by 12 Elo playing against Critter (the closest opponent at hyperspeed levels) after 6700 games. So assuming this would also be true at the normal blitz levels used in the distributed test, this would appear to account for most of the discrepancy between our own test results and the others.
Has anyone else run long tests to compare the effect of different opening books on test results? The tests would have to be several thousand games long, but can be at very fast levels.
Probably we will modify our tester to use this or a similar new book, so that future results will be better predicted by it. My conclusion is that Komodo is better than other top programs at playing the early opening, but the longer the book line supplied, the less valuable this asset becomes. Perhaps switching to a more normal book for testing will gradually help Komodo as different features are tuned using this new book.
I never considered the opening book to be much of a factor in test results (assuming colors are switched for each book position tested), but I am gradually becoming a believer.
My testing used a set of 18,000 positions that were all 4 moves deep. These positions were derived from the databases of the CCRL, CEGT, SWCR, UEL, and my own games. Though I am certain that there are some unbalanced positions in this set, for the most part they are neither too unbalanced nor too drawish. The White score for my games has been just under 53%.

I do not use reversed colors. Doing so automatically reduces the independence of the positions used, which increases the actual error of the measurements. I depend on randomness to keep White (or Black) bias low. I think that shows in the White score of my games, which includes many more games than just those played by the Also-Ran engines.

I have evidence that using a large set of positions without reversed colors is much better than using a small set of positions with reversed colors. There is some variance that comes into play by not using reversed colors, especially if the pool of opponents is wide. But, it is more than offset (in my experience) by the large number of positions used, covering more situations that would be found in general.
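The tradeoff described above can be sketched with a toy Monte Carlo. Everything here, model and numbers alike, is my own assumption, not Adam's actual test setup: each opening position gets a random bias favouring one side, and we compare the measurement error of a large unpaired position set against a half-sized set replayed with reversed colours, at the same total game budget.

```python
import random
import statistics

# Toy model (my assumptions, not the actual test data): each opening position
# carries a bias b that shifts the expected score of whichever engine sits on
# the favoured side; on top sit the true strength edge D and per-game noise.
D = 0.02              # true score edge of engine A (score units, 0..1 scale)
POS_BIAS_SD = 0.25    # spread of the per-position opening bias
GAME_NOISE_SD = 0.30  # per-game randomness
TRIALS = 1000         # repeated experiments, to estimate the measurement error

def one_experiment(n_positions, reversed_colors):
    """Measured score edge of engine A from one test run."""
    total, games = 0.0, 0
    for _ in range(n_positions):
        b = random.gauss(0.0, POS_BIAS_SD)
        total += D + b + random.gauss(0.0, GAME_NOISE_SD)  # A on the biased side
        games += 1
        if reversed_colors:
            # Replay with colours swapped: the position bias now works against A.
            total += D - b + random.gauss(0.0, GAME_NOISE_SD)
            games += 1
    return total / games

def measurement_error(n_positions, reversed_colors):
    """Spread of the measured edge across many repeated experiments."""
    runs = [one_experiment(n_positions, reversed_colors) for _ in range(TRIALS)]
    return statistics.pstdev(runs)

random.seed(1)
# Same budget of 1000 games either way:
print("1000 positions, no reversal:", round(measurement_error(1000, False), 4))
print(" 500 positions, reversed   :", round(measurement_error(500, True), 4))
```

In this toy model the reversed set comes out with the smaller spread, because the bias term cancels exactly within each colour-reversed pair; the case for a large unpaired set rests on effects this sketch does not model, such as a small position pool forcing the same openings to repeat.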

I realize that you would like to adjust Komodo's testing in such a way that it would better predict the results of the rating list testers. And possibly you could achieve this. But it is not certain that it would make Komodo better (stronger). It could even make it worse.
User avatar
velmarin
Posts: 1600
Joined: Mon Feb 21, 2011 9:48 am

Re: The influence of books on test results.

Post by velmarin »

Honestly, I do not understand.

Perhaps you do not know that in the ChessBase GUIs the Nalimov tables work with all engines:
if you have Nalimov tables installed and start Fritz with the default settings, all engines use the tables;
Fritz takes care of it.

It does not have to be set in the engine itself;
they just have not seen that.
:D
User avatar
velmarin
Posts: 1600
Joined: Mon Feb 21, 2011 9:48 am

Re: The influence of books on test results.

Post by velmarin »

Adam Hair wrote:
lkaufman wrote:Of all the factors that can influence test results, such as time limit, increment vs. repeating controls, ponder, hardware, etc., the one we are currently most interested in is the effect of opening books/testsuites. Our own distributed tester uses a five move book, rather shorter than that used by most testers. Since it shows a sixteen elo lead for Komodo 5 over Houdini 1.5 (after over 11k games) which is not shown by the testing agencies, and since the only result on this forum showing Komodo 5 beating Houdini 2 in a long match used a four move book, we decided to make a new testbook that is more typical of books normally used in tests - it averages six moves, but some popular lines are much longer than this. Based on hyper-fast testing, our performance drops by 12 Elo playing against Critter (the closest opponent at hyperspeed levels) after 6700 games. So assuming this would also be true at the normal blitz levels used in the distributed test, this would appear to account for most of the discrepancy between our own test results and the others.
Has anyone else run long tests to compare the effect of different opening books on test results? The tests would have to be several thousand games long, but can be at very fast levels.
Probably we will modify our tester to use this or a similar new book, so that future results will be better predicted by it. My conclusion is that Komodo is better than other top programs at playing the early opening, but the longer the book line supplied, the less valuable this asset becomes. Perhaps switching to a more normal book for testing will gradually help Komodo as different features are tuned using this new book.
I never considered the opening book to be much of a factor in test results (assuming colors are switched for each book position tested), but I am gradually becoming a believer.
My testing used a set of 18,000 positions that were all 4 moves deep. These positions were derived from the databases of the CCRL, CEGT, SWCR, UEL, and my own games. Though I am certain that there are some unbalanced positions in this set, for the most part they are neither too unbalanced nor too drawish. The White score for my games has been just under 53%.

I do not use reversed colors. Doing so automatically reduces the independence of the positions used, which increases the actual error of the measurements. I depend on randomness to keep White (or Black) bias low. I think that shows in the White score of my games, which includes many more games than just those played by the Also-Ran engines.

I have evidence that using a large set of positions without reversed colors is much better than using a small set of positions with reversed colors. There is some variance that comes into play by not using reversed colors, especially if the pool of opponents is wide. But, it is more than offset (in my experience) by the large number of positions used, covering more situations that would be found in general.

I realize that you would like to adjust Komodo's testing in such a way that it would better predict the results of the rating list testers. And possibly you could achieve this. But it is not certain that it would make Komodo better (stronger). It could even make it worse.
Maybe 4 moves are too many;
why not 3,
or 2,
or no book at all?

Are their tests really with 4 moves?

In the opening, engines are more lost than an octopus in a garage.
That's not the way.
User avatar
Houdini
Posts: 1471
Joined: Tue Mar 16, 2010 12:00 am

Re: The influence of books on test results.

Post by Houdini »

Adam Hair wrote:My testing used a set of 18,000 positions that were all 4 moves deep.
I don't understand how that works.

For the King's Indian Defense, for example, I see only a couple of possibilities within 4 moves, mostly:
1.d4 Nf6 2.c4 g6 3.Nc3 Bg7 4.e4 d6.
1.d4 Nf6 2.c4 g6 3.g3 Bg7 4.Bg2 O-O.

Do I understand correctly that in your engine matches the whole KID is summarized by just a handful of positions out of 18,000?
What about the Classical Variation with Nd7 or Nc6, the Qe8 and Na6 variations, the Sämisch Variation, the Four Pawns Attack, or the Averbakh Variation?
Do you rely on the engines to find all these lines, or does it simply not matter that you don't really cover the KID?

Robert
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: The influence of books on test results.

Post by Laskos »

Adam Hair wrote:
I have evidence that using a large set of positions without reversed colors is much better than using a small set of positions with reversed colors.
Even better is to use a large set of positions with reversed colours. Randomness reduces the errors as 1/sqrt(N); reversed colours leave only N/2 distinct positions, yet they are still better, as the errors due to "wrong" opening positions cancel much better with reversed colours. Your argument is probably valid for matches of millions of games or so.
There is some variance that comes into play by not using reversed colors, especially if the pool of opponents is wide. But, it is more than offset (in my experience) by the large number of positions used, covering more situations that would be found in general.

I realize that you would like to adjust Komodo's testing in such a way that it would better predict the results of the rating list testers. And possibly you could achieve this. But it is not certain that it would make Komodo better (stronger). It could even make it worse.
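Kai's scaling claim can be written out under a simple model (my framing and symbols, not his). Suppose each game scores s = d + c·ε_b + ε_g, where d is the true strength difference, ε_b the opening bias of the position (variance σ_b²), c = ±1 the colour assignment, and ε_g independent per-game noise (variance σ_g²). Averaging N games then gives:

```latex
% No reversal: N independent positions, the bias enters every game once
\[
\operatorname{Var}(\hat d)_{\text{no reversal}} \;=\; \frac{\sigma_b^2 + \sigma_g^2}{N}
\]
% Reversed colours: N/2 positions played twice; within each pair the
% c\,\epsilon_b terms are +\epsilon_b and -\epsilon_b and cancel exactly
\[
\operatorname{Var}(\hat d)_{\text{reversed}} \;=\; \frac{\sigma_g^2}{N}
\]
```

So for the same number of games, reversed colours are never worse in this simple model; an unpaired set only catches up once N is large enough that the bias term averages itself out.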
Sedat Canbaz
Posts: 3018
Joined: Thu Mar 09, 2006 11:58 am
Location: Antalya/Turkey

Re: The influence of books on test results.

Post by Sedat Canbaz »

geots wrote:
Sedat Canbaz wrote:
lkaufman wrote:
velmarin wrote:Which system do you use: a ChessBase book, or a Shredder GUI book?
What system?

How many moves do you want? 4, 5, 6, 8?

How many different openings do you need?

It can be prepared if you give me the data.

I am preparing one of 8 moves in the Fritz GUI; it always ends with a Black move,
the idea being that the White side always starts thinking.
We have always made our own test books, because we need at least 10,000 positions (to run 20,000 games). I don't know any publicly available test books like this, please tell me if there are any. I think CCRL and CEGT seem to use books averaging about 8 moves per side. All positions should be ones that have occurred a reasonable number of times in master play, so we can be pretty sure that White has just a fairly normal advantage. Which publicly available books come closest to meeting this description?
Dear Larry,

Just my two cents over this issue,

I don't suggest using a large opening book where the engines play from 10,000 positions,

because I am afraid that the engines' Elo performance will suffer from such a variety of openings (there are many holes in such a huge database).

Best,
Sedat


Sedat, I am not quite sure what this thread is really all about. All I know is I AM STICKING WITH YOUR GENERIC BOOKS when I test Komodo or any other engine. There is not a one-in-a-million chance of me changing to anything else.

Secondly, for the most part, all the tablebases can take a quick airline flight into the depths of hell as far as I am concerned. (3rd class seating).


george

Dear George,

Many thanks for your kind words...

You know what you are testing; you prefer quality over quantity!

However, I am still not 100% satisfied with the performance of the Perfect book series.

There are still a few critical opening lines which I am not sure whether to allow or not.

By the way, I am glad to announce Perfect 2012b's performance:
- After many efforts and improvements, I managed to reach a 2% better performance with Black.


Perfect 2012b book:

Code: Select all

Games        :   9200 (finished)

White Wins   :   2910 (31.6 %)
Black Wins   :   2273 (24.7 %)
Draws        :   4017 (43.7 %)
Unfinished   :      0

White Perf.  : 53.5 %
Black Perf.  : 46.5 %

ECO A =    327 Games ( 3.6 %)
ECO B =   4226 Games (45.9 %)
ECO C =   1862 Games (20.2 %)
ECO D =   2273 Games (24.7 %)
ECO E =    512 Games ( 5.6 %)
************************************

Perfect 2012 book:

Code: Select all

Games        :  35252 (finished)

White Wins   :  11627 (33.0 %)
Black Wins   :   7851 (22.3 %)
Draws        :  15774 (44.7 %)
Unfinished   :      1

White Perf.  : 55.4 %
Black Perf.  : 44.6 %

ECO A =   1869 Games ( 5.3 %)
ECO B =  13776 Games (39.1 %)
ECO C =   6416 Games (18.2 %)
ECO D =   9613 Games (27.3 %)
ECO E =   3113 Games ( 8.8 %)
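The "Perf." lines in the two tables follow the usual convention that a draw counts as half a point. A quick standalone check of the figures above (my own sketch, not Sedat's actual tooling):

```python
# Performance score: wins plus half the draws, over all finished games.
def white_perf(white_wins, black_wins, draws):
    games = white_wins + black_wins + draws
    return 100.0 * (white_wins + draws / 2.0) / games

# Figures taken from the two tables above:
print(round(white_perf(2910, 2273, 4017), 1))    # Perfect 2012b -> 53.5
print(round(white_perf(11627, 7851, 15774), 1))  # Perfect 2012  -> 55.4
```

Both results match the reported "White Perf." percentages.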
By the way, I have a very good friend from Bulgaria whose name is also George.


Happy Testing,
Sedat
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: The influence of books on test results.

Post by Adam Hair »

Houdini wrote:
Adam Hair wrote:My testing used a set of 18,000 positions that were all 4 moves deep.
I don't understand how that works.

For the King's Indian Defense, for example, I see only a couple of possibilities within 4 moves, mostly:
1.d4 Nf6 2.c4 g6 3.Nc3 Bg7 4.e4 d6.
1.d4 Nf6 2.c4 g6 3.g3 Bg7 4.Bg2 O-O.

Do I understand correctly that in your engine matches the whole KID is summarized by just a handful of positions out of 18,000?
What about the Classical Variation with Nd7 or Nc6, the Qe8 and Na6 variations, the Sämisch Variation, the Four Pawns Attack, or the Averbakh Variation?
Do you rely on the engines to find all these lines, or does it simply not matter that you don't really cover the KID?

Robert
It does not matter to me. For me, the goal is to measure general strength. I am assuming that the greater an engine's general strength is, the better it will play the major openings and their variations.

If I tested at longer time controls, my focus would very likely be different.
Ferdy
Posts: 4851
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: The influence of books on test results.

Post by Ferdy »

lkaufman wrote:Of all the factors that can influence test results, such as time limit, increment vs. repeating controls, ponder, hardware, etc., the one we are currently most interested in is the effect of opening books/testsuites. Our own distributed tester uses a five move book, rather shorter than that used by most testers. Since it shows a sixteen elo lead for Komodo 5 over Houdini 1.5 (after over 11k games) which is not shown by the testing agencies, and since the only result on this forum showing Komodo 5 beating Houdini 2 in a long match used a four move book, we decided to make a new testbook that is more typical of books normally used in tests - it averages six moves, but some popular lines are much longer than this. Based on hyper-fast testing, our performance drops by 12 Elo playing against Critter (the closest opponent at hyperspeed levels) after 6700 games. So assuming this would also be true at the normal blitz levels used in the distributed test, this would appear to account for most of the discrepancy between our own test results and the others.
Has anyone else run long tests to compare the effect of different opening books on test results? The tests would have to be several thousand games long, but can be at very fast levels.
Probably we will modify our tester to use this or a similar new book, so that future results will be better predicted by it. My conclusion is that Komodo is better than other top programs at playing the early opening, but the longer the book line supplied, the less valuable this asset becomes. Perhaps switching to a more normal book for testing will gradually help Komodo as different features are tuned using this new book.
I never considered the opening book to be much of a factor in test results (assuming colors are switched for each book position tested), but I am gradually becoming a believer.
Our own distributed tester uses a five move book, rather shorter than that used by most testers.
This appears to have an effect on opening and middlegame play. There are openings that have more choices beyond a five-move line; perhaps those positions will never be visited by the engine.
I prepare my game test sets manually now. I choose different schemes: a two-bishop advantage on one side, with material compensation for the other side; closed positions and open positions; positions with a weak pawn structure but a lead in development; gambits; and others. None of these depend on the number of moves. It is a work in progress, and I just add positions whenever I find the time.
Has anyone else run long tests to compare the effect of different opening books on test results? The tests would have to be several thousand games long, but can be at very fast levels.
I can't give you exact figures (> 1000 games), but I have seen that different sets of positions affect the results, just as different sets of opponents affect the results. There are opponents that are very good in certain positions but not in others. I use a minimum of 20 different opponents. I understand that at the top you have limited choices. Maybe you could create different personalities of the top engines and use those as sparring partners :) .
User avatar
geots
Posts: 4790
Joined: Sat Mar 11, 2006 12:42 am

Re: The influence of books on test results.

Post by geots »

Uri Blass wrote:
gleperlier wrote:
lkaufman wrote:
gleperlier wrote:
lkaufman wrote: Yes, I think books have a big influence on ratings, not only through the number of plies but through the variety of openings.
You can also add tablebases to the factors. Run 1000 games of Komodo versus Houdini without TBs, then with 3-, 4-, and 5-man TBs, and you will see some difference. If I run a tournament with some engines without tablebases, some with Robbobases, some with Nalimov, etc., it will also change a lot.

I would even say that, for me, future "official" chess engine tournaments should use books limited in plies and limited tablebases.

Cheers,

Gab

I thought it was not at all clear that TBs of any sort help ratings, but please tell me what your findings were. Did you find that Houdini did progressively better with more TBs, or were the results pretty much random? Same question for other engines too.
I have always found this quite obvious in my games, but I have to run some tests to confirm it. For example, my engines could have lost 100 Elo points on Playchess without TBs.

Will keep you updated with my tests.

Cheers,

Gab
100 elo points?

You cannot be serious.
I always thought that tablebases have almost no influence on playing strength, based on what I have read (Stockfish does not support tablebases precisely because the authors found no way to gain Elo from using them).

Ratings on Playchess may be unstable, and it is probably possible to lose 100 Elo or gain 100 Elo with no change in the program.


Uri, you are correct, but it's not that the gains are negligible; it's that I don't consider TBs to be worth a shit for anything. Generally, by the time they kick in, an engine has long ago figured out the best course of action. Someone would have to explain to me how 4-, 5-, and even 6-man bases help when the engines have already announced mate in 24 with 14 pieces left on the board. And those are not even the engines that are thought of as having the most endgame knowledge built in. But if some people like them, I think that is great; whatever floats their boat.

gts