Color-flipped positions *can* occur in chess, but I would be surprised if they were frequent enough to make this optimization a significant win.
--Jon
Some opening book design questions
Moderators: hgm, Rebel, chrisw
-
- Posts: 4367
- Joined: Fri Mar 10, 2006 5:23 am
- Location: http://www.arasanchess.org
-
- Posts: 560
- Joined: Sun Nov 08, 2015 11:10 pm
Re: Some opening book design questions
Crazy or not, here are some stats, in bytes, on binary-encoded FENs from 8.85 billion stored positions:
min-keylen = 17, max-keylen = 45, avg-keylen = 35.19, avg-zkey = 7.34
After compression (as shown by avg-zkey) each key actually takes less than 8 bytes, yet seeking among keys is close to O(1). You can apply data-specific filters during compression, since you have quite a lot of similar rows among the opening positions.
Then you can convert it to any other format without having to iterate from a starting position or do de-duplication.
Certain positions in Xiangqi depend on whether the previous moves contained a check or chase to determine whether the upcoming repetition is a draw or a loss. GUIs usually check this from the move history, but when a move that would cause a repetition loss is only found once it is the current side to move, it may be too late to avoid it without a major sacrifice. A small forward search in the book must be performed to avoid such cases.
If you store the book as "position_after_move -> score", then for each position you need to generate all moves and probe multiple times. I suggest you store the book as "position -> [move1 -> score1, move2 -> score2, ...]".
One last point: some opening variations eventually lead to mid-game positions that can also be reached by engine search, even though the book might not cover every path towards them. This is just like how WDL tablebases benefit mid-game search.
Try out this one:
http://www.chessdb.cn/query_en/
The opening part contains 8.85 billion unique positions, with at least one move per position, in ~32GB, and covers almost every opening variation in Xiangqi. Each leaf is evaluated by an engine to no less than 20 plies and then backpropagated to the root.
-
- Posts: 27814
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Some opening book design questions
This indeed argues in favor of a list. But I could easily publish that list. Most of the probing code used in WinBoard was put in the public domain by Michel van den Bergh before I used it.
phhnguyen wrote: If we use a list, it would take a few more KB of memory than using code. However, with a key list, developers who want to use our books can get those keys straight away in any programming language. Rewriting your function may be harder and would need some tests. Furthermore, your function actually comes from a GPL-licensed project, and some people don't like that license's restrictions.
Yes, I know. I actually did this once, to decode the Elephant-Eye book: perform a perft to generate positions, and write the info from the book probe together with a FEN to a file. You might miss disconnected positions, however.
phhnguyen wrote: Actually it is doable for any size of hash key, since the keys are used for searching, not for converting back (from a key) to a chess position. When the hash-key systems are different, we cannot convert a position key from one system directly into a key in the other system; we need to rebuild the whole opening tree anyway. We can do that because we always know the root (the starting position) and can build up branches from that node. From an opening tree we can write the book out in any new format / new hash-key system.
This can be done, but on machine architectures that do not support unaligned memory access you would then have to compare the key byte by byte.
phhnguyen wrote: I don't like that field (learn info) either. In my previous implementation I separated the learning data from the opening data. That way I could save a lot of memory (since real learnt data is usually small) and the data is easier to maintain.
There are some tricks to use data whose size is not a multiple of 8. For example, if I need an entry of 9 bytes (an 8-byte key and a 1-byte weight), I simply write it as int8_t data[9] and then cast types when needed.
I am not saying the Polyglot format is optimal. OTOH, its drawbacks are hardly important at today's storage and memory capacities, and compatibility with other software can be very useful.
-
- Posts: 1437
- Joined: Wed Apr 21, 2010 4:58 am
- Location: Australia
- Full name: Nguyen Hong Pham
Re: Some opening book design questions
Thanks for the very interesting information. It is amazing that you can compress a FEN to such a small size!
noobpwnftw wrote: Crazy or not, here are some stats, in bytes, on binary-encoded FENs from 8.85 billion stored positions:
min-keylen = 17, max-keylen = 45, avg-keylen = 35.19, avg-zkey = 7.34
After compression (as shown by avg-zkey) each key actually takes less than 8 bytes, yet seeking among keys is close to O(1). You can apply data-specific filters during compression, since you have quite a lot of similar rows among the opening positions.
Then you can convert it to any other format without having to iterate from a starting position or do de-duplication.
I know the main advantage of using FEN over a hash key is that it converts both ways (FEN <--> board). However, for an opening book I see some main drawbacks compared to a hash key:
- The average FEN size is still too large. A hash key takes only 8 bytes, instead of over 35 bytes in uncompressed form. In my previous book I cut that down to only 7 bytes to save more space.
- FENs do not have fixed lengths, so you have to store their lengths and/or a pointer table somewhere to access the data. That requires more memory and computation.
- Comparing an array (a FEN) must be slower than comparing a number. Converting a board into a FEN is a bit slower than computing a hash key, and a FEN is not easy to update on the fly.
Perhaps you could take a look at piece lists? I once did a rough calculation: a piece list for Xiangqi may take under 24 bytes per position. That is significantly smaller than 35 bytes, and it has a fixed size.
I think a good book maker can check for those problems before storing a move into the book.
noobpwnftw wrote: Certain positions in Xiangqi depend on whether the previous moves contained a check or chase to determine whether the upcoming repetition is a draw or a loss. GUIs usually check this from the move history, but when a move that would cause a repetition loss is only found once it is the current side to move, it may be too late to avoid it without a major sacrifice. A small forward search in the book must be performed to avoid such cases.
For that you would need to store moves and move counts in the book, which can easily make the book a few times bigger.
noobpwnftw wrote: If you store the book as "position_after_move -> score", then for each position you need to generate all moves and probe multiple times. I suggest you store the book as "position -> [move1 -> score1, move2 -> score2, ...]".
Actually, generating the moves, making them, and probing the new positions in the book may take only a few more milliseconds; saving that computing time is not worth it.
I am not sure about this point. Are those positions still reachable in the opening? If yes, they could be searched as usual, I guess.
noobpwnftw wrote: One last point: some opening variations eventually lead to mid-game positions that can also be reached by engine search, even though the book might not cover every path towards them. This is just like how WDL tablebases benefit mid-game search.
Good effort! Keep up the excellent work!
noobpwnftw wrote: Try out this one:
http://www.chessdb.cn/query_en/
The opening part contains 8.85 billion unique positions, with at least one move per position, in ~32GB, and covers almost every opening variation in Xiangqi. Each leaf is evaluated by an engine to no less than 20 plies and then backpropagated to the root.
-
- Posts: 1437
- Joined: Wed Apr 21, 2010 4:58 am
- Location: Australia
- Full name: Nguyen Hong Pham
Re: Some opening book design questions
I think Polyglot data may be double the size of an optimal format. That means almost nothing for small books, but it can be a problem when downloading and using huge books.
hgm wrote: I am not saying the Polyglot format is optimal. OTOH, its drawbacks are hardly important at today's storage and memory capacities, and compatibility with other software can be very useful.
I am still deciding between modifying Polyglot and developing a totally new format. For Xiangqi I could reuse very little from Polyglot, so compatibility is not an issue.
Just a curious question: do you (XBoard) load all of a book's data into memory, or search it from the hard disk? If it is loaded, do you release it after going out of book?
-
- Posts: 560
- Joined: Sun Nov 08, 2015 11:10 pm
Re: Some opening book design questions
1. Probing radix trees does not require comparisons against the uncompressed form, so effectively I am comparing 7.34 bytes per position on average.
2. Fixed length or not does not really matter, since you won't be storing your book as "data[index]" anyway. Sorted skip lists would seek just as fast.
3. https://en.wikipedia.org/wiki/Judy_array
How can you represent this position with 24 bytes of piece list?
r1bakabr1/9/nc2c1n2/p1p1p3p/6p2/2P6/P3P1P1P/1CN1B1C1N/4A4/R3KAB1R b
My binary FEN would take 35 bytes:
0x60327236088548140581101018218518181985982909090cd0b0c0d83a83e82fab0e10
I don't see an advantage of using piece lists on a 90-square board.
This is something you can't rely on when designing the book format.
phhnguyen wrote: I think a good book maker can check for those problems before storing a move into the book.
Individual move data entries are of fixed length; storing one extra byte for the move count won't make the book a few times bigger. Also, storing a move instead of storing a key usually saves space.
phhnguyen wrote: For that you would need to store moves and move counts in the book, which can easily make the book a few times bigger.
-
- Posts: 1437
- Joined: Wed Apr 21, 2010 4:58 am
- Location: Australia
- Full name: Nguyen Hong Pham
Re: Some opening book design questions
The definition of a piece list in CPW:
noobpwnftw wrote: 1. Probing radix trees does not require comparisons against the uncompressed form, so effectively I'm comparing 7.34 bytes per position on average.
2. Fixed lengths or not do not really matter since you won't be storing your book as "data[index]" anyway. Sorted skip lists would seek just as fast.
3. https://en.wikipedia.org/wiki/Judy_array
How can you represent this position with 24 bytes of piece list?
r1bakabr1/9/nc2c1n2/p1p1p3p/6p2/2P6/P3P1P1P/1CN1B1C1N/4A4/R3KAB1R b
My binary FEN would take 35 bytes:
0x60327236088548140581101018218518181985982909090cd0b0c0d83a83e82fab0e10
I don't see an advantage of using piece lists on a 90-square board.
https://chessprogramming.wikispaces.com/Piece-Lists
It is simply an array of 32 items for 32 pieces. Luckily, Xiangqi has no promotion, so we don't need to store piece types in the piece list, only piece positions.
I usually write a piece list for the two sides as:
Code: Select all
byte pieceList[2][16];
With that "standard" pieceList, it takes only 32 bytes for the whole board, already smaller than the 35-byte average length of your FENs.
You may see that there are some redundant bits here. For 90 squares we don't need a whole byte, only 7 bits. The 6 pieces of Rooks, Cannons, and Horses need 6 x 7 = 42 bits. The remaining 26 pieces need roughly 6 bits each, taking 26 x 6 = 156 bits. The total is 42 + 156 = 198 bits = 24.75 bytes.
You may reduce it further, since you are an EGTB expert: say, King + Advisors may have fewer than 1200 combinations in total (about 11 bits for those 3 pieces), and each Elephant can be located on only 7 squares (needing only 3 bits). All of that may reduce the size to 21-22 bytes.
Do you mean you did not store a position key for each entry, but only moves? I know that method, but it has some big drawbacks for chess engines (though it is good for your website).
noobpwnftw wrote: Individual move data entries are of fixed length; storing one extra byte for the move count won't make the book a few times bigger. Also, storing a move instead of storing a key usually saves space.
For each entry I store only a 7-byte key and 1 byte of weight, 8 bytes in total.
-
- Posts: 560
- Joined: Sun Nov 08, 2015 11:10 pm
Re: Some opening book design questions
It would then require 12(RCN) * 7 + 6(P) * 10 + 4(B) * 4 + 3(A) * 4 + 4(K) * 2 + 1(turn) = 181 bits or even less if palace states are combined.
Keys (whether hashes or other formats) are known before probing the book. But unlike EGTBs, where you store no keys at all (data is accessed as data[index] = value), an opening book is a sparse array, so you always have to store at least partial key information for each position for its reference to hold; picking a key that compresses well is therefore important. As you may see, the majority of the book data is just the keys.
A piece list seems very viable; I would expect it to compress well, since only a few bits change when making a move.
-
- Posts: 4367
- Joined: Fri Mar 10, 2006 5:23 am
- Location: http://www.arasanchess.org
Re: Some opening book design questions
I used to use memory-mapped files and only load pages from the book as needed, but with the size of book I have and the amount of memory on modern systems, it no longer really makes sense to do that.
But if you do want to do that you should probably just borrow an existing framework like LMDB.
--Jon
-
- Posts: 560
- Joined: Sun Nov 08, 2015 11:10 pm
Re: Some opening book design questions
Actually, Chess has all the necessary parts ready (e.g. results from fishtest) to make a book by brute force. It might, in practice, ultra-weakly solve the opening.