So far we have done some attempts to use hash keys to match positions. They work so well. However, there are still some problems we want to address and solve in this attempt.
Hash key
We match a given position with records in the database by using the hash key. One of the problems is how to get the hash key for a given position. @Dann asked us to create a SQL user-defined function to create hash keys from FENs. Unfortunately, SQLite doesn’t support user-defined functions. Thus we can’t create that function to run in any SQLite GUI/browser. Instead, we need to run our code, use our provided functions to create hash keys.
One of the potential solutions for that issue is to save FENs with or replace hash keys.
Using FENs may have some issues as the below:
- Size: A FEN string with an average length of 55 bytes takes much more space than a hash key of 8 only (near 7 times larger). When we may ignore the problem with the database size in hard disks (it is large but still fine with modern hardware) it may create a big issue by eating too much memory, taking too much time when creating. We will mention more in the next part
- Standardized for searching: Two completely the same positions may still have different FENs because of having different half move clocks and full move numbers. Even we may use SQK statements with LIKE to match those FENs, the performance of LIKE is much worse than full-string marching. To solve the problem, we standardize all FENs to have half move clocks as 0 and full move numbers as 1
Memory pressure
In the last attempt, we keep all information in memory till the end. They are 200 million hash keys, 3.45 million game IDs, all take 200 x 8 + 3.45 x 4, about 1.6 GB. That size itself is not so huge for my computer with 16 GB RAM. However, the compiler has added some space to each item to manage them, thus the real memory should be significantly bigger. To make the SQL engine works fast, we use a transaction for the whole process. That makes SQLite keep the whole database in memory. In total all take a huge amount of memory. Some of my tries in some previous attempts crashed because of running out of memory and we were successful only after some tries to reduce taken memory. On the last try, my computer informed the app using over 14 GB RAM.
Even though in the end the program can run well, it is actually so close to some boundaries. The app may be crashed again if we add more data to each record and/or work with larger numbers of games.
1. Building the database
Adding FEN strings is a challenge since it may push the memory usage over the limit. If we keep all those strings in memory, with 200 million positions and an average length of 55, they will take about 11 GB. My old computer has only 16 GB of RAM. That additional amount will surely bring trouble.
1st try
This was simple, straightforward, to keep all FEN strings in memory. When running, the system quickly became so laggy and then almost stopped responding. The speed became crawling. I canceled since couldn’t wait for anymore. FAILED
2nd try
Instead of keeping FEN strings in memory, we wrote them down immediately into the database. The problem was that the UPDATE statements would take too much time due to scanning all hash keys to find updating records. In this attempt, we didn’t update by hash keys but their IDs instead, in the hope the SQL engine could find updating locations quickly without scanning all items. Unfortunately, it didn’t work as we want, the speed was too bad to accept. FAILED
3rd try
We stored FEN strings into a temporary file and then read it back to write down into the database later with hash keys and game IDs. To know which FEN belongs to which hash key we stored that hash key with that FEN. Furthermore, we stored the first game ID that comes with that FEN, saving it from memory.
The stats of this trial are as the below:
Code: Select all
Completed writing hash keys, total: 199217076, one: 188479599, blob: 10737476, FEN average length: 55 max: 79 min: 27, elapsed: 3013648 ms 50:13, speed: 66104 hash keys/s
#games: 3456399, #errors: 0, #hashKeys: 199217076, #posCnt: 548690734, hashCnt: 0, hashHit: 78579956, elapsed: 4228185ms 1:10:28, speed: 817 games/s
BTW, all work as we plan. SUCCEED.
2. Searching
FEN index
In Attempt 5th, the time to search a hash key is fast enough even the app may have to scan the whole database (without index). However, in this Attempt, the search time for a FEN may take too much time, especially for non-hit ones, over 5 minutes on my computer. Perhaps, that is caused by matching strings taking more time and the whole database is much larger to load and process, compared with the previous ones.
To speed up we create an index for the column FEN. The process took almost 1 hour and increased the database size to 31.46 GB (an increase of 13 GB just for FEN indexes). All make the current database size 4.5 times larger than the previous one.
Code: Select all
Created Index, elapsed: 3552358 ms, 59:12
Searching
In general, searching is quite similar to Attempt 5th except we can query directly with the FEN of the given position instead of firstly converting it into a hash key. We still need to standardize the FEN string first by replacing the half move clocks and full move numbers to 0 and 1, respectively.
The stats are as the below:
Code: Select all
ply: 1, #total queries: 1000, elapsed: 2592555 ms, 43:12, time per query: 2592 ms, total results: 1137427224, results/query: 1137427
ply: 2, #total queries: 1000, elapsed: 1637633 ms, 27:17, time per query: 1637 ms, total results: 359585256, results/query: 359585
ply: 3, #total queries: 1000, elapsed: 1170815 ms, 19:30, time per query: 1170 ms, total results: 199675249, results/query: 199675
ply: 4, #total queries: 1000, elapsed: 619721 ms, 10:19, time per query: 619 ms, total results: 15892107, results/query: 15892
ply: 5, #total queries: 1000, elapsed: 755671 ms, 12:35, time per query: 755 ms, total results: 5547110, results/query: 5547
ply: 6, #total queries: 1000, elapsed: 746903 ms, 12:26, time per query: 746 ms, total results: 6512038, results/query: 6512
ply: 7, #total queries: 1000, elapsed: 743398 ms, 12:23, time per query: 743 ms, total results: 2446515, results/query: 2446
ply: 8, #total queries: 1000, elapsed: 737885 ms, 12:17, time per query: 737 ms, total results: 5303250, results/query: 5303
ply: 9, #total queries: 1000, elapsed: 710761 ms, 11:50, time per query: 710 ms, total results: 2411668, results/query: 2411
ply: 10, #total queries: 1000, elapsed: 763072 ms, 12:43, time per query: 763 ms, total results: 2076649, results/query: 2076
ply: 11, #total queries: 1000, elapsed: 744002 ms, 12:24, time per query: 744 ms, total results: 420394, results/query: 420
3. Quick conclusions for this attempt
Using FENs for position matching is workable. It can help to work with the database using normal SQLite GUIs/Browsers (doesn't require any special code/app). However, it comes with a high price: requires more time to build, more space for the database. IMO, it seems to be not worth that trade-off.
We may not use FENs but this Attempt gave us some more knowledge/understanding as well as some new techniques to use later.
4. Source code
All code of this Attempt has been pushed already into a new branch “searchwithhashkeysblob2”.