Color-flipped positions *can* occur in chess, but I would be surprised if they were frequent enough to make this optimization a significant win.
--Jon
Some opening book design questions
Moderators: hgm, Rebel, chrisw
-
- Posts: 4367
- Joined: Fri Mar 10, 2006 5:23 am
- Location: http://www.arasanchess.org
-
- Posts: 560
- Joined: Sun Nov 08, 2015 11:10 pm
Re: Some opening book design questions
Crazy or not, here are some stats, in bytes, on binary-encoded FENs from 8.85 billion stored positions:
min-keylen = 17, max-keylen = 45, avg-keylen = 35.19, avg-zkey = 7.34
After compression (as shown by avg-zkey) each key actually takes less than 8 bytes, yet seeking among keys is close to O(1). You can apply data-specific filters during compression, since you have quite a lot of similar rows among the opening positions.
Then you can convert it to any other format without having to iterate from a starting position or do de-duplication.
Certain positions in Xiangqi depend on whether the previous moves contained a check or chase to determine whether the upcoming repetition is a draw or a loss. GUIs usually check this from the move history, but when a move that would cause a repetition loss is only found once it is the current side to move, it may be too late to avoid it without a major sacrifice. A small forward search in the book must be performed to avoid such cases.
If you store the book as "position_after_move -> score", then for each position you need to generate all moves and probe multiple times. I suggest you store the book as "position -> [move1 -> score1, move2 -> score2, ...]".
One last point: some opening variations eventually lead to mid-game positions that can also be reached by engine search, even though the book might not cover every path towards them. This is just like how WDL tablebases benefit mid-game search.
Try out this one:
http://www.chessdb.cn/query_en/
The opening part contains 8.85 billion unique positions, with at least one move per position, in ~32GB, and covers almost every opening variation in Xiangqi. Each leaf is evaluated by an engine to no less than 20 plies and then backpropagated to the root.
-
- Posts: 27814
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Some opening book design questions
This indeed argues in favor of a list. But I could easily publish that list. Most of the probing code used in WinBoard was put in the public domain by Michel van den Bergh before I used it.
phhnguyen wrote: If we use a list, it would take a few more KB of memory than using code. However, with a key list, developers who want to use our books can get those keys straight away in any programming language. Rewriting your function may be harder and would need some tests. Furthermore, your function actually comes from a GPL-licensed project, and some people don't like that license's restrictions.
Yes, I know. I actually did this once, to decode the Elephant-Eye book: perform a perft to generate positions, and write the info from the book probe together with a FEN to a file. You might miss disconnected positions, however.
phhnguyen wrote: Actually it is doable for any size of hash key, since the keys are used for searching, not for converting back (from a key) to a chess position. When the hash-key systems are different, we cannot convert a position key from one system directly into a key in the other system; we need to rebuild the whole opening tree anyway. We can do that because we always know the root (the starting position) and can build up branches from that node. From an opening tree we can write the book out in any new format / new hash-key system.
This can be done, but on machine architectures that do not support unaligned memory access you would then have to compare the key byte by byte.
phhnguyen wrote: I don't like that field (learn info) either. In my previous implementation I separated the learning data from the opening data. That way I could save a lot of memory (since real learnt data is usually small) and the data is easier to maintain.
There are some tricks to use data whose size is not a multiple of 8. For example, if I need an entry of 9 bytes (an 8-byte key and a 1-byte weight), I simply write it as int8_t data[9] and then cast types when needed.
I am not saying the Polyglot format is optimal. OTOH, its drawbacks are hardly important at today's storage and memory capacities, and compatibility with other software can be very useful.
-
- Posts: 1437
- Joined: Wed Apr 21, 2010 4:58 am
- Location: Australia
- Full name: Nguyen Hong Pham
Re: Some opening book design questions
Thanks for the very interesting information. It is amazing that you can compress a FEN to such a small size!
noobpwnftw wrote: Crazy or not, here are some stats, in bytes, on binary-encoded FENs from 8.85 billion stored positions:
min-keylen = 17, max-keylen = 45, avg-keylen = 35.19, avg-zkey = 7.34
After compression (as shown by avg-zkey) each key actually takes less than 8 bytes, yet seeking among keys is close to O(1). You can apply data-specific filters during compression, since you have quite a lot of similar rows among the opening positions.
Then you can convert it to any other format without having to iterate from a starting position or do de-duplication.
I know the main advantage of using FEN over a hash key is that it converts both ways (FEN <--> board). However, for an opening book I see some main drawbacks compared to a hash key:
- The average FEN size is still too large. A hash key takes only 8 bytes, instead of over 35 bytes in uncompressed form. In my previous book I cut that down to only 7 bytes to save more space.
- FENs do not have fixed lengths, so you have to store their lengths and/or a pointer table somewhere to access the data. That requires more memory and computation.
- Comparing an array (a FEN) must be slower than comparing a number. Converting a board into a FEN is a bit slower than computing a hash key, and a FEN is not easy to update on the fly.
Perhaps you could take a look at piece lists? I once did a rough calculation: a piece list for Xiangqi may take under 24 bytes per position. That is significantly smaller than 35 bytes, and it has a fixed size.
I think a good book maker can check for those problems before storing a move into the book.
noobpwnftw wrote: Certain positions in Xiangqi depend on whether the previous moves contained a check or chase to determine whether the upcoming repetition is a draw or a loss. GUIs usually check this from the move history, but when a move that would cause a repetition loss is only found once it is the current side to move, it may be too late to avoid it without a major sacrifice. A small forward search in the book must be performed to avoid such cases.
For that you would need to store moves and move counts in the book, which can easily make the book a few times bigger.
noobpwnftw wrote: If you store the book as "position_after_move -> score", then for each position you need to generate all moves and probe multiple times. I suggest you store the book as "position -> [move1 -> score1, move2 -> score2, ...]".
Actually, generating the moves, making them, and probing the new positions in the book may take only a few more milliseconds; saving that computing time is not worth it.
I am not sure about this point. Are those positions still reachable in the opening? If yes, they could be searched as usual, I guess.
noobpwnftw wrote: One last point: some opening variations eventually lead to mid-game positions that can also be reached by engine search, even though the book might not cover every path towards them. This is just like how WDL tablebases benefit mid-game search.
Good effort! Keep up the excellent work!
noobpwnftw wrote: Try out this one:
http://www.chessdb.cn/query_en/
The opening part contains 8.85 billion unique positions, with at least one move per position, in ~32GB, and covers almost every opening variation in Xiangqi. Each leaf is evaluated by an engine to no less than 20 plies and then backpropagated to the root.
-
- Posts: 1437
- Joined: Wed Apr 21, 2010 4:58 am
- Location: Australia
- Full name: Nguyen Hong Pham
Re: Some opening book design questions
I think Polyglot data may be double the size of an optimal format. That means almost nothing for small books, but it can be a problem when downloading and using huge books.
hgm wrote: I am not saying the Polyglot format is optimal. OTOH, its drawbacks are hardly important at today's storage and memory capacities, and compatibility with other software can be very useful.
I am still deciding between modifying Polyglot and developing a totally new format. For Xiangqi I could reuse very little from Polyglot, so compatibility is not an issue.
Just a curious question: do you (XBoard) load all of a book's data into memory, or search it from the hard disk? If it is loaded, do you release it after going out of book?
-
- Posts: 560
- Joined: Sun Nov 08, 2015 11:10 pm
Re: Some opening book design questions
1. Probing radix trees does not require comparisons against the uncompressed form, so effectively I am comparing 7.34 bytes per position on average.
2. Fixed length or not does not really matter, since you won't be storing your book as "data[index]" anyway. Sorted skip lists would seek just as fast.
3. https://en.wikipedia.org/wiki/Judy_array
How can you represent this position with 24 bytes of piece list?
r1bakabr1/9/nc2c1n2/p1p1p3p/6p2/2P6/P3P1P1P/1CN1B1C1N/4A4/R3KAB1R b
My binary FEN would take 35 bytes:
0x60327236088548140581101018218518181985982909090cd0b0c0d83a83e82fab0e10
I don't see an advantage of using piece lists on a 90-square board.
This is something you can't rely on when designing the book format.
phhnguyen wrote: I think a good book maker can check for those problems before storing a move into the book.
Individual move data entries are of fixed length; storing one extra byte for the move count won't make the book a few times bigger. Also, storing a move instead of storing a key usually saves space.
phhnguyen wrote: For that you would need to store moves and move counts in the book, which can easily make the book a few times bigger.
-
- Posts: 1437
- Joined: Wed Apr 21, 2010 4:58 am
- Location: Australia
- Full name: Nguyen Hong Pham
Re: Some opening book design questions
The definition of a piece list in CPW:
noobpwnftw wrote: 1. Probing radix trees does not require comparisons against the uncompressed form, so effectively I'm comparing 7.34 bytes per position on average.
2. Fixed lengths or not do not really matter since you won't be storing your book as "data[index]" anyway. Sorted skip lists would seek just as fast.
3. https://en.wikipedia.org/wiki/Judy_array
How can you represent this position with 24 bytes of piece list?
r1bakabr1/9/nc2c1n2/p1p1p3p/6p2/2P6/P3P1P1P/1CN1B1C1N/4A4/R3KAB1R b
My binary FEN would take 35 bytes:
0x60327236088548140581101018218518181985982909090cd0b0c0d83a83e82fab0e10
I don't see an advantage of using piece lists on a 90-square board.
https://chessprogramming.wikispaces.com/Piece-Lists
It is simply an array of 32 items for 32 pieces. Luckily, Xiangqi has no promotion, so we don't need to store piece types in the piece list, only piece positions.
I usually write a piece list for the two sides as:
Code: Select all
byte pieceList[2][16];
With that "standard" pieceList, it takes only 32 bytes for the whole board, already smaller than the 35-byte average length of your FENs.
You may see that there are some redundant bits here. For 90 squares we don't need a whole byte, only 7 bits. The 6 pieces of Rooks, Cannons, and Horses need 6 x 7 = 42 bits. The remaining 26 pieces need roughly 6 bits each, taking 26 x 6 = 156 bits. The total is 42 + 156 = 198 bits = 24.75 bytes.
You may reduce it further, since you are an EGTB expert: say, King + Advisors may have fewer than 1200 combinations in total (about 11 bits for those 3 pieces), and each Elephant can be located on only 7 squares (needing only 3 bits). All of that may reduce the size to 21-22 bytes.
Do you mean you did not store a position key for each entry, but only moves? I know that method, but it has some big drawbacks for chess engines (though it is good for your website).
noobpwnftw wrote: Individual move data entries are of fixed length; storing one extra byte for the move count won't make the book a few times bigger. Also, storing a move instead of storing a key usually saves space.
For each entry I store only a 7-byte key and 1 byte of weight, 8 bytes in total.
-
- Posts: 560
- Joined: Sun Nov 08, 2015 11:10 pm
Re: Some opening book design questions
It would then require 12(RCN) * 7 + 6(P) * 10 + 4(B) * 4 + 3(A) * 4 + 4(K) * 2 + 1(turn) = 181 bits or even less if palace states are combined.
Keys (whether hashes or other formats) are known before probing the book. But unlike EGTBs, where you store no keys at all (data is accessed as data[index] = value), an opening book is a sparse array, so you always have to store at least partial key information for each position for its reference to hold; picking a key that compresses well is therefore important. As you may see, the majority of the book data is just the keys.
A piece list seems very viable; I would expect it to compress well, since only a few bits change when making a move.
-
- Posts: 4367
- Joined: Fri Mar 10, 2006 5:23 am
- Location: http://www.arasanchess.org
Re: Some opening book design questions
I used to use memory-mapped files and only load pages from the book as needed, but with the size of book I have and the amount of memory on modern systems, it no longer really makes sense to do that.
But if you do want to do that you should probably just borrow an existing framework like LMDB.
--Jon
-
- Posts: 560
- Joined: Sun Nov 08, 2015 11:10 pm
Re: Some opening book design questions
Actually, Chess has all the necessary parts ready (e.g. results from fishtest) to make a book by brute force. It might, in practice, ultra-weakly solve the opening.