Page 2 of 5

Re: Hash collision?

Posted: Fri Jun 07, 2019 4:43 pm
by chrisw
Puzzle. Suppose a search takes X nodes and the hashtable is 2^n in size. You would like to be 95% confident no two nodes have a hash collision.

What’s the formula connecting X and n?

Re: Hash collision?

Posted: Fri Jun 07, 2019 7:35 pm
by Ras
Bob Hyatt has already done the research on the required hash length many years ago. The result was that 32 bit are not sufficient, 64 bit are, and you can even get away with 48 bit.

Obviously, you also have the implicit storage via the hash table index, minus the bits for your number of slots. You can get away with storing 32 bits if you have at least 16 bits implicit storage, i.e. 64k entries. Let's assume a bucket length of 3, then you need 18 bits, i.e. 256k entries. That shouldn't be an issue today.

Re: Hash collision?

Posted: Fri Jun 07, 2019 9:04 pm
by elpapa
chrisw wrote: Fri Jun 07, 2019 4:43 pm Puzzle. Suppose a search takes X nodes and the hashtable is 2^n in size. You would like to be 95% confident no two nodes have a hash collision.

What’s the formula connecting X and n?
This will occupy my mind for days and in the end I won't be able to solve it, thank you very much.

Re: Hash collision?

Posted: Fri Jun 07, 2019 9:15 pm
by silentshark
This is all useful. Thanks.

So the big question for me, remains. 16 bits - good enough, or will screw you up sometimes?

(Of course, maybe I have another bug elsewhere, but hey.)

Re: Hash collision?

Posted: Fri Jun 07, 2019 10:00 pm
by chrisw
elpapa wrote: Fri Jun 07, 2019 9:04 pm
chrisw wrote: Fri Jun 07, 2019 4:43 pm Puzzle. Suppose a search takes X nodes and the hashtable is 2^n in size. You would like to be 95% confident no two nodes have a hash collision.

What’s the formula connecting X and n?
This will occupy my mind for days and in the end I won't be able to solve it, thank you very much.
it’s similar to the how many people would need to be in a room for 50/50 chance that two have the same birthday paradox. well, not really a paradox.

Re: Hash collision?

Posted: Sat Jun 08, 2019 12:26 am
by hgm
The number of collisions you get is trivial to calculate: it only depends on the size of the signature (in so far that doesn't overlap with the part of the key used as table index). With 32-bit signature one in every 4G probes will give you a false match, once the table is entirely filled. (Which in practice will always be the case.) If your probing scheme probes in clusters of 4, this of course also multiplies the collision rate by 4, so you get one error in a billion. At 1Mnps that means one collision every 1000 sec. This is bad/unacceptable if collisions can make your engine crash, and completely unobservable if they cannot. To really affect play you should have thousands of collisions per search.

With a 16-bit signature you would have 1 collision in 65k probes, even 1 in 16k if you probe 4 entries. That means that at 1Mnps you would get 60 collisions/sec. That should still be way below the rate where there is a realistic chance that they would affect the move choice or score, as long as they can never crash the engine.

The total size of the table has no influence on any of this.

Re: Hash collision?

Posted: Sat Jun 08, 2019 2:00 am
by Uri Blass
chrisw wrote: Fri Jun 07, 2019 4:43 pm Puzzle. Suppose a search takes X nodes and the hashtable is 2^n in size. You would like to be 95% confident no two nodes have a hash collision.

What’s the formula connecting X and n?
More interesting question for me:
What is the probability that an engine get a wrong mate score in tablebase positions because of hash collision or a different reason when there is mate in n as a function of n.
By wrong mate score I mean mate that is faster than possible or mate for the opponent.

It may be interesting to have an automatic tool that simply get an epd file of random million mate in n positions from the 7 piece tablebases
and gives an engine to analyze them for m seconds without tablebases and give statistics about the results of the searches.

This tool may be good for testing engines.

Re: Hash collision?

Posted: Sat Jun 08, 2019 3:19 am
by Ras
hgm wrote: Sat Jun 08, 2019 12:26 amThe total size of the table has no influence on any of this.
I don't agree. Imagine you had enough memory for a table with 2^64 entries, then you wouldn't need to store a signature at all because the full signature itself is only 64 bit (usually) so that the signature would be identical to the index position. That's how the implicit bits of the signature (via the index position) come into play.

Re: Hash collision?

Posted: Sat Jun 08, 2019 3:22 am
by Dann Corbit
Ras wrote: Sat Jun 08, 2019 3:19 am
hgm wrote: Sat Jun 08, 2019 12:26 amThe total size of the table has no influence on any of this.
I don't agree. Imagine you had enough memory for a table with 2^64 entries, then you wouldn't need to store a signature at all because the full signature itself is only 64 bit (usually) so that the signature would be identical to the index position. That's how the implicit bits of the signature (via the index position) come into play.
Take the other extreme: A hash table that stores a single entry.
Almost every operation overwrites it (except for real transpositions).

Re: Hash collision?

Posted: Sat Jun 08, 2019 3:35 am
by Ras
Dann Corbit wrote: Sat Jun 08, 2019 3:22 amTake the other extreme: A hash table that stores a single entry.
Yeah then you would need to store the full 64 bit signature. Basically, the information stored is the length of the signature plus the implicit index bits minus the bucket length bits. Of course, you have to store bits that are not already encoded in the index bits.

I store the upper 32 bits. Then I abuse the flag for exact/alpha/beta which needs 2 bits, but is a uint8_t, so that allows to fudge in 6 additional bits. The bucket length is 3 which loses me 2 bits again, that gives 36 bits. With 2^12 = 4096 entries, that makes 48 bits, and with 10 bytes per entry, that's 40 kB. Even my embedded STM32 platform is sufficient for that, let alone smartphones or PCs.

On top of that, I check whether the hash move is pseudo-legal, so that gives a little more redundancy.