hgm wrote:With a 4-byt hash, you will have a collision that would have been avoided using the full key every 2^32 probes. At 1Mnps, that means every 4 million seconds, or about once a month.
Unless a collission can crash your engine, even having many thousands of collissions in a single search does not measurbly affect performance. (Perhaps Bob could give us more exact numbers here.)
In Joker I use 9-byte entries, including a 4-byte signature. I never tried to shrink it to 8 bytes. With a 3 bytes signature, I would have a collission every 5 seconds (at 1 Mnps), as Joker probes 3 entries per access. This might still be bearable.
But there is very little gain to be expected from this. Currently Joker has 7 entries per 64-byte cache line. That could be increased to 8 entries. That is 14% more entries, which would lead to about 1% faster search. (The search speed scales as the twelfth root of hash sizes.)
First, on entry size.
I use 16 byte entries. I store the entire 64 bit Zobrist signature even though this is redundant since I use the rightmost N (N = log2(table size)) bits to address an entry. In Cray Blitz we stored the leftmost 32 bits, because we used the rightmost N (N was always > 24) which was almost using the entire 64 bits. Why store the whole thing? Simplicity and laziness.
We used 12 byte entries in Cray Blitz, 64 bits for the best move, value/bound, draft/depth, flags, age, etc, the other 32 bits from the upper half of the hash key. That gives you 33% more entries than just using the entire 16 bytes.
Personal choice.
As far as collision errors go, Cozzie and I found that one collision every 1000 nodes did not appreciably affect the final search result. Anything rarer than that could not even be measured as to its effect on the final best move and score.
The only significant issue is the one you mentioned, that is your engine has to be resistant to bogus data, so that it won't crash. If an invalid best move will make you castle after you have already castled, you can have a problem with 2 kings of the same side on the board. If you can prevent bogus data from crashing your program, you can pretty well ignore any collision effect for reasonable hashing approaches.
Our trees are so incredibly big, it is sometimes astounding what kinds of errors the search can hide. I've been playing with some aggressive pruning changes and many of them that I really think should be bad seem to have very small effects on overall performance.