Opening book read speed

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
xr_a_y
Posts: 1871
Joined: Sat Nov 25, 2017 2:28 pm
Location: France

Re: Opening book read speed

Post by xr_a_y »

All very good advices ! thanks a lot.
Ras
Posts: 2487
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: Opening book read speed

Post by Ras »

I'm doing a similar approach, though with a much smaller book (22,000 moves, 12,000 positions). The source format is line based, then I put a self-written opening book compiler in-between, and I get a binary format sorted by ascending position CRC-32. On top of that, there are also 256 cache indices to avoid scanning through half the book on average.

I'm using a primary CRC-32, the ethernet CRC, and a secondary CRC-8 so that I effectively have a CRC-40. The number of positions has to be much smaller than the root of 2^40, i.e. 1 million, which is easily fulfilled. Otherwise, you get birthday paradoxon collisions.

Memory footprint: per position, 4 bytes for CRC-32, 1 byte for the number of moves. Then 1 byte for each move from and to, and 1.8 moves per position, that's 3.7 bytes. In total 8.7 bytes per position times 12,000 positions gives around a 100 kB.

I've run a benchmark where I used my opening book copied around a thousand times, which gave me a 600 MB line based text file. My old AMD Phenom 2 1100 T crunched that through in about a 100 seconds. Since there is one even super-linear step, the sorting with O(n * lg n), we can say that fictional linear scaling would be a lower bound for the factor. 100s / 600 MB * 19 MB = 3 seconds.

I'd say that you could gain at least a factor of three - or even more if you reference PC is faster than mine.

Since you have longer lines, I think you will have fewer moves per position, let's assume 1.5 moves. With my approach, that would top out at around 50 MB. However, CRC-40 will not work anymore because you'd have around 6 million positions, which would run into birthday collisions, so you'd have to go with two independent CRC-32. That will be additional 3 bytes per position (one byte, like the CRC-8, is fiddled into free bits), and you get to around 70 MB.
Last edited by Ras on Mon Aug 13, 2018 5:28 pm, edited 1 time in total.
Rasmus Althoff
https://www.ct800.net
User avatar
xr_a_y
Posts: 1871
Joined: Sat Nov 25, 2017 2:28 pm
Location: France

Re: Opening book read speed

Post by xr_a_y »

Using Martin advices and some more code clean up I get 3.1 sec for reading my binary book, which is compatible with a xboard use...

I'll need to add support for classic stuff soon for both openings and end-games.

But curiously, Weini is loosing elo using this book, apparently the engine is going out of book in position it does not like much ...
Ras
Posts: 2487
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: Opening book read speed

Post by Ras »

Another idea, do you use some kind of caching? Often, a line based book contains the same say 10 starting plies in several consecutive lines, then different 11th moves. But all the work for the first 10 has already been done in the first of these lines. If you used a small hash table, you could skip a lot of duplicate work both for validation and recording a position/move pair. You could even use the memory for the regular hash tables for that through another base pointer type, just check for proper alignment.
Rasmus Althoff
https://www.ct800.net
User avatar
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Opening book read speed

Post by hgm »

xr_a_y wrote: Mon Aug 13, 2018 5:23 pmBut curiously, Weini is loosing elo using this book, apparently the engine is going out of book in position it does not like much ...
I had the same thing with Joker. If your engine is reasonably well tuned for opening play, you might benefit from bringing the opponent out of book early. E.g. some engines around 2000 Elo would never castle if the book doesn't tell them to do it.
User avatar
xr_a_y
Posts: 1871
Joined: Sat Nov 25, 2017 2:28 pm
Location: France

Re: Opening book read speed

Post by xr_a_y »

Ras wrote: Mon Aug 13, 2018 10:17 pm Another idea, do you use some kind of caching? Often, a line based book contains the same say 10 starting plies in several consecutive lines, then different 11th moves. But all the work for the first 10 has already been done in the first of these lines. If you used a small hash table, you could skip a lot of duplicate work both for validation and recording a position/move pair. You could even use the memory for the regular hash tables for that through another base pointer type, just check for proper alignment.
I used to have a cache for move generator and threats detection but didn't test this for a while. I'll have a look.
User avatar
xr_a_y
Posts: 1871
Joined: Sat Nov 25, 2017 2:28 pm
Location: France

Re: Opening book read speed

Post by xr_a_y »

hgm wrote: Mon Aug 13, 2018 10:51 pm I had the same thing with Joker. If your engine is reasonably well tuned for opening play, you might benefit from bringing the opponent out of book early. E.g. some engines around 2000 Elo would never castle if the book doesn't tell them to do it.
Indeed, Weini without a book plays reasonably well in the opening (it will occupy center thanks to PSQT, develop knight and bishop, and castle fast enough). I was hoping that using a bigger books will bring some elo ...
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Opening book read speed

Post by Joost Buijs »

hgm wrote: Mon Aug 13, 2018 10:51 pm
xr_a_y wrote: Mon Aug 13, 2018 5:23 pmBut curiously, Weini is loosing elo using this book, apparently the engine is going out of book in position it does not like much ...
I had the same thing with Joker. If your engine is reasonably well tuned for opening play, you might benefit from bringing the opponent out of book early. E.g. some engines around 2000 Elo would never castle if the book doesn't tell them to do it.
This will only work when you play against weak opponents, most opening lines have been examined to death, if you want to bring the opponent out of book early it means that you have to play a move that has been proven to be inferior, when you play a inferior move against a strong opponent it is very likely that you will lose the game very quickly.

Most books are based on a huge number of games from strong engines (Stockfish, Komodo, Houdini to name a few and maybe some correspondence games), for instance my own book is based on ~5.5 mln. high quality games I collected over the years.
One of the problems is how to select the move you want to play in a certain position, if you want to add randomness you got to have several different moves available, if you always want to play the (best) move only one move will suffice. Usually playing the move with the highest frequency works fine, but that doesn't necessarily mean that this is the best move, you also have to look at the WLD rate and use some statistics to select (what you think) is the best move.
The drawback of using a large book like this is that the opponent often uses a book based on the very same games and that it happens often after playing 35 moves out of book that you end up in a dead drawn position that is not winnable anymore, even against (much) weaker opponents.