Cumulative building of a shared search tree

noobpwnftw · Post by **noobpwnftw** » Sat Dec 31, 2016 6:35 pm

You can see what it is like from my web query interface. It is a database of chess engine search tree and supports live "lazy smp" style extension.

I wonder how is it possible that you can generate moves with only an arbitrary 64-bit hashkey of the board without performing enumerating from an known root position. And they will eventually collide if your database is large enough.

I believe the most flexible way is to store the full information and make a hash index.

noobpwnftw · Post by **noobpwnftw** » Sat Dec 31, 2016 10:22 pm

I agree with using only hash is fast & compact, but let me describe my method here: move+score is stored in a LSM tree, one index is bitfen representation of the board, which would have 361 bits per board, round up to 46 bytes at most, stored by a 64-bit radix tree index it would have no more than 6 levels (8 bytes * 6 = 48 > 46). This index(implemented in Judy array) is rather small because it contains only one copy of the 8 bytes per level. Another hash key index is used to have faster read performance.
Some numbers:
For 298,875,823 unique positions, 64-bit hash sparse index size is 4,256,324,457 bytes, bitfen radix tree size is 7,385,053,062 bytes, while bitfen+move+score(record) size is 25,320,638,208 bytes. I leave bitfen stored in records in order to test different indexing methods.

I think a 368-bit O(n) index vs. a a 64-bit O(log(n)) index with the former being less than 2x more in size is quite acceptable to have both.

phhnguyen · Post by **phhnguyen** » Sun Jan 01, 2017 2:09 am

To extract the move of a given hash key you need to know his parent position first (at root of the tree, it is the starting position). Then you generate all moves for that position and find out the move which hash key matched that given hash key.

This method is so good to examine the tree from the root and then go down to expand branches. It almost cannot work for a given random hash key when we don't know its parent.

To extract opening names, I store those names in a small dictionary with their hash keys. Simply matching the given hash keys and/or his ancestors' hash keys to that dictionary for the opening names.

Look like you build you opening tree around hash keys so I think it is good.

IMO, you may need to try reducing the size of nodes as well as separating databases by purposes.

- Your node's size 46 bytes is about 10 times as large as a typical one. As I said in above post, usually we store only a hash key 4 bytes and a score 1 byte (5 bytes) for a node. Someone have tried to cut some bits of both hash key and score to fit them all into 4 bytes. Your bigger opening book should be harder for downloading and much slower to search for Xiangqi engines when they have to seek and read too much data from hard disks.

- Using a same kind of database for both opening book and normal database of game is not a good idea. For the second one, I don't store hash keys since they are almost useless even for searching when users want to search similar positions but not really exact ones (and there are very few matched positions in middle games for that search anyways). Hash keys also eat too much space and make a worse compression ratio. That is why I store simply only 2 bytes for each move. I divide data by trunks and compress them to save space. Note that I can compress normal database but not opening books since it may affect seriously to search speed of engines.

noobpwnftw · Post by **noobpwnftw** » Sun Jan 01, 2017 5:45 pm

Hash keys, or at least a hash index to partition the database is always needed unless your database is not on a large scale.

But we have differences about the definition of a node:
My 46 bytes(worst case) per node is used for alternative indexing, and this index is less than 2x the size of a hash index, of which a node contains all known moves under its key hash(board) instead of one hash(board+move) per node.

Both our methods can partition the database as long as there is a hash index, but you would need more queries for each position's related moves, and more key space for the hash index.

Per node size is really not a big issue since I use LSM storage tree, indexes point directly to storage locations, and I partition the storage tree to cache at least 50% in physical memory.

So far I can get in parallel ~20K NPS on update/writes and ~80K NPS for reads, by using read/writer locks I can still get ~8K QPS in a mixed situation, the database is partitioned on 4 physical machines.

You can use LZ4 for node data compression, it is lightening fast and performs impressively well for me: 2x as fast, 10% smaller compared to snappy. Bitfen index is radix compressed(using Judy array, it is just fast and space efficient, I don't care why), and hash index is not compressed at all.

Talking about integration to chess engines, I think if your engine uses lazy smp, you can just spawn some extra threads to probe the database and it won't affect your search speed even if they are slow.

noobpwnftw · Post by **noobpwnftw** » Sun Jan 01, 2017 7:26 pm

My implementation requires specific hardware and software stack, it isn't meant to be downloaded and ran effectively under any environment.

Since I already built everything for it, I wonder if it is worthwhile to port it to chess for whoever interested.

Typical use case for this is that you can have an online opening book which optimizes itself over time and it also provides some pre-calculated PV during search. Plus it is capable of EGTB probing, you only need local WDL tables for search and probe online for DTx when you reach the known position.

Probably this kind of stuff is provided by GUI/frontends as a service and their data may not be shared, I wish to build a free shared database so chess engines can make better use of it.

noobpwnftw · Post by **noobpwnftw** » Wed Mar 22, 2017 10:32 am

Now, this project has reached a stable status. Everything seems working fine and the API gives acceptable query performance for engine usage during search.

My dedicated compute cluster is running at almost 1 billion NPS, not including the nodes working on move sieving and retrograde scoring.
The database contains 0.5 billion positions, growing at 5m positions per day, each of them contains at least 1 move that is searched at depth no less than 20 plies.

Also I'm hosting and building both DTC and DTM endgame tablebases (currently 9TB total).

I think this is very useful for NN training experiments.

Please let me know if you are interested in porting it to western chess, I'm willing to provide compute power and experience.

Ferdy · Post by **Ferdy** » Wed Mar 22, 2017 4:03 pm

noobpwnftw wrote: Please let me know if you are interested in porting it to western chess, I'm willing to provide compute power and experience.

This is always a welcome gesture for the chess players. So go ahead.
Sooner the 8x8 board will be over run so I am looking forward to use it on bigger boards like in gothic chess variants.

You have a cool query site, I have an engine called makulit that can play xiangqi, why the rules of that game is too complicated

.

noobpwnftw · Post by **noobpwnftw** » Fri Apr 27, 2018 9:05 am

I have summarized important parts of the porting process and provided them with my Xiangqi version as a reference.

https://github.com/noobpwnftw/chessdb

noobpwnftw · Post by **noobpwnftw** » Fri Apr 27, 2018 9:23 am

Oh, I just figured, it used weighted average scores and selective MCTS roll-outs to develop the search tree way before the recent hype obviously.

noobpwnftw · Post by **noobpwnftw** » Sat Dec 08, 2018 1:50 pm

I finally got it ported to chess, now it seems working but without any GUI.

http://www.chessdb.cn/cdb.php?action=qu ... &showall=1

Available values for "actions":
queryall/querybest/query/queryscore/querypv

The "board" parameter takes a FEN without sequence of following moves or move counters.

Cumulative building of a shared search tree

Re: Cumulative building of a shared search tree

Re: Cumulative building of a shared search tree

Re: Cumulative building of a shared search tree

Re: Cumulative building of a shared search tree

Re: Cumulative building of a shared search tree

Re: Cumulative building of a shared search tree

Re: Cumulative building of a shared search tree

Re: Cumulative building of a shared search tree

Re: Cumulative building of a shared search tree

Re: Cumulative building of a shared search tree