bob wrote:When we tried on-the-fly decompression in the Nalimov tables, we (Eugene and I) were somewhat surprised to find that it was far faster overall: reading compressed blocks means you end up doing less file I/O, and less I/O is always good.
The optimality of the compression strategy to be used, or even if compression is to be used at all, is dependent upon the tablebase class, the end user's processor, the memory bandwidth, the disk speed, etc. Therefore, the strategy should be variable and this is the approach I'm taking with the Edwards' Version Two tablebases, possibly to appear early next year.
I suspect you will find the same thing we found. Even using the 15K U320 SCSI drives on my office machine, running with compression was significantly faster, and that was on a PIV/2.8GHz processor. The faster the processor, the wider this gap gets. And since not many will be running U320 SCSI drives with 15K RPM spindle speeds to get latency way down, I do not believe that uncompressed access is ever going to perform as well as the on-the-fly-decompression approach.
Note that the above is true for 4-5 piece files; the difference was even more significant when we tested with 6-piece files.
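To make the "less I/O" point concrete, here is a minimal sketch of block-wise on-the-fly decompression (using Python's zlib as a stand-in codec; the real Nalimov scheme uses its own compressor and block layout). A probe reads and inflates only the one compressed block containing the wanted value, and that block is smaller on disk than the raw one:

```python
import zlib

BLOCK = 8192  # uncompressed block size; block-compressed tablebases use a similar fixed size

def build(data):
    """Compress fixed-size blocks independently and record an offset index,
    so a probe only ever has to read and inflate a single block."""
    blobs, index, off = [], [], 0
    for i in range(0, len(data), BLOCK):
        c = zlib.compress(data[i:i + BLOCK])
        blobs.append(c)
        index.append((off, len(c)))
        off += len(c)
    return b"".join(blobs), index

def probe(store, index, pos):
    """Fetch one byte of the original data: locate one compressed block,
    read it (fewer bytes than the raw block), and decompress on the fly."""
    b, r = divmod(pos, BLOCK)
    off, n = index[b]
    return zlib.decompress(store[off:off + n])[r]
```

Because tablebase data is highly regular, the compressed block is usually much smaller than 8 KB, so the disk transfer shrinks even though a little CPU is spent inflating it.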
bob wrote:I still prefer to follow the "order by increasing cost" approach. Repetition check is the cheapest operation I have, and I do that first. Hash probe is next. EGTBs are significantly more expensive and are deferred until the hash probe has failed. Once I do an EGTB probe on a position, it is unlikely I will repeat it, since it promptly goes into the hash table with infinite draft, making it very difficult to overwrite.
Well, my TB probe approach is also ordered by ascending cost, where cost equals probability of success times expense of use. The big difference is that the TB values get their own space (half-sized entries store twice as many scores in the same memory). The main transposition table also becomes slightly more efficient, as it now has more free slots that would otherwise be taken by long-lived TB score entries.
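As a hedged illustration of the separate half-sized TB score table (the packing below is my own toy layout, not Symbolic's actual format): a single 64-bit slot holds 48 bits of key verification plus a 16-bit score, half the size of a typical 16-byte main-table entry, so the same memory holds twice as many TB scores:

```python
from array import array

TB_SLOTS = 1 << 16                       # toy size; a real table would be far larger
tb_table = array('Q', [0] * TB_SLOTS)    # one 8-byte slot per entry

def tb_store(key, score):
    # the low 16 bits of the 64-bit position key pick the slot, the high
    # 48 bits verify it; a 16-bit score completes the half-sized entry
    tb_table[key & (TB_SLOTS - 1)] = ((key >> 16) << 16) | (score & 0xFFFF)

def tb_probe(key):
    e = tb_table[key & (TB_SLOTS - 1)]
    if e >> 16 != key >> 16:
        return None                       # empty slot or a different position
    s = e & 0xFFFF
    return s - 0x10000 if s >= 0x8000 else s   # sign-extend the 16-bit score
```

TB scores need no draft, bound flags, or best move, which is what makes the half-sized entry sufficient.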
The only downside to Symbolic's current approach is the situation where tablebases are enabled but only a few of the actual TB files are available. This means the TB classifier would be run on TB-eligible positions that have no hope of actually being scored, so a small amount of time would be wasted. This is why the classifier is fairly fast, and it can be made even faster by using a material hash signature instead of a material piece-distribution signature. (Symbolic actually has both of these, but only the distribution signature gets used currently, and it gets used in many places.)
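A material-signature classifier along these lines might look like the following sketch (the 4-bits-per-count layout and the piece-letter keys are my own illustration, not Symbolic's actual encoding). The point is that deciding "is there a TB file for this material?" becomes a single hash-set lookup:

```python
PIECES = "PNBRQ"   # kings are always present, so only these need counting

def material_signature(counts):
    """Fold per-side piece counts into one integer: 8 bits per piece type,
    white's count in the low nibble and black's in the high nibble."""
    sig = 0
    for i, p in enumerate(PIECES):
        sig |= counts.get('w' + p, 0) << (i * 8)
        sig |= counts.get('b' + p, 0) << (i * 8 + 4)
    return sig

# signatures of the TB files actually present on disk (here: KRK and KQKR)
available = {material_signature({'wR': 1}),
             material_signature({'wQ': 1, 'bR': 1})}

def tb_classify(counts):
    """O(1) check: does some installed TB file cover this material balance?"""
    return material_signature(counts) in available
```

Populating `available` once at startup from the files found on disk avoids wasting classifier time on positions whose tablebase is not installed.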
bob wrote:Repetition check is the cheapest operation I have, and I do that first. Hash probe is next. EGTBs are significantly more expensive and are deferred until the hash probe has failed.
Symbolic:
1) User interrupt check
This looks at a volatile boolean variable that might be set by another thread; if set, the search unwinds and exits.
2) General limit check
If the search has a time check in force, then the tick count variable, which gives microseconds since program start, is tested against the limit value before trying another node. The tick count is maintained by a separate thread driven by a ten-millisecond-resolution system interval timer, so there is no per-node system call overhead. If the search has a node count limit in force, that is checked as well.
3) Maximum ply check
Set at 256.
4) Fifty move draw
Very fast.
5) Insufficient material draw
Almost as fast due to material distribution signature.
6) Repetition draw
Need to roll through a (usually) short loop.
I do 4, 5 and 6 in one piece of code. For 6 I use a normal repetition list for each side, with the loop based on the 50-move counter.
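The bounded repetition scan can be sketched as follows (a generic scheme consistent with the description above; Symbolic's per-side list layout may differ). Only positions since the last irreversible move can repeat, which is exactly what the fifty-move counter tracks:

```python
def is_repetition(history, halfmove_clock):
    """history: position hash keys, most recent last. Captures and pawn moves
    reset the fifty-move counter and make earlier positions unreachable, so
    the scan is bounded by halfmove_clock; the same side is to move every
    two plies, so we step back in twos. The loop is usually very short."""
    cur = history[-1]
    oldest = len(history) - 1 - halfmove_clock   # last irreversible move
    for i in range(len(history) - 3, max(oldest, 0) - 1, -2):
        if history[i] == cur:
            return True
    return False
```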
7) TB check
As described earlier.
At this point, if nothing has caught, the node processor figures out which of five different sub-processor routines is to be called. These are:
1) the base (ply zero) routine
2) the check evasion routine (uses the main transposition table)
3) the full width routine (uses the main transposition table)
4) the gainers and checks only routine (uses eval/pawn tables)
5) the gainers only routine (uses eval/pawn tables)
If you want to take advantage of the fact that TB probes are smaller, why not simply store two of them in a normal entry of the transposition table? Most people use 16-byte entries and fit four of them in a cache line. If one of these four gets filled by a TB hit, you could set a flag in the entry to indicate it is a TB entry, making the other half available as a second (empty) one. This would hardly add any time to the probing (which is dominated by the time required to fetch the cache line). And it would automatically adapt the space allocated for TB entries and normal entries to the needs of the position, rather than having a fixed and usually sub-optimal allocation.
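The suggested packing might look like this sketch (the flag bit position and field widths are my own illustrative choices, not a fixed format): a 16-byte slot, once flagged as TB, holds two 63-bit records, each carrying 47 bits of key verification plus a 16-bit score:

```python
TB_FLAG = 1 << 127           # top bit of a 16-byte entry marks it as a TB slot
HALF    = (1 << 63) - 1      # mask for one 63-bit half-record

def make_record(key, score):
    # 47 bits of the position key for verification + a 16-bit score = 63 bits
    return ((key >> 17) << 16) | (score & 0xFFFF)

def store_tb(entry, key, score):
    """Drop a TB score into a slot: convert a normal entry into a flagged TB
    pair, or fill the (assumed free) upper half of an already-flagged one."""
    rec = make_record(key, score)
    if entry & TB_FLAG:
        return entry | (rec << 64)       # second record goes in the upper half
    return TB_FLAG | rec                 # first record replaces the old entry

def probe_tb(entry, key):
    if not entry & TB_FLAG:
        return None                      # a normal transposition-table entry
    for rec in (entry & HALF, (entry >> 64) & HALF):
        if rec and (rec >> 16) == (key >> 17):
            s = rec & 0xFFFF
            return s - 0x10000 if s >= 0x8000 else s  # sign-extend the score
    return None
```

Both half-records live in the same 16 bytes, so probing them costs nothing extra once the cache line is fetched, and the TB/normal split adapts per slot instead of being fixed.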