Hash collision?

Discussion of chess software programming and technical issues.

Moderators: Harvey Williamson, Dann Corbit, hgm

Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
chrisw
Posts: 3851
Joined: Tue Apr 03, 2012 2:28 pm

Re: Hash collision?

Puzzle. Suppose a search takes X nodes and the hashtable is 2^n in size. You would like to be 95% confident no two nodes have a hash collision.

What’s the formula connecting X and n?

Ras
Posts: 1643
Joined: Tue Aug 30, 2016 6:19 pm
Full name: Rasmus Althoff
Contact:

Re: Hash collision?

Bob Hyatt has already done the research on the required hash length many years ago. The result was that 32 bit are not sufficient, 64 bit are, and you can even get away with 48 bit.

Obviously, you also have the implicit storage via the hash table index, minus the bits for your number of slots. You can get away with storing 32 bits if you have at least 16 bits implicit storage, i.e. 64k entries. Let's assume a bucket length of 3, then you need 18 bits, i.e. 256k entries. That shouldn't be an issue today.
Rasmus Althoff
https://www.ct800.net

elpapa
Posts: 204
Joined: Sun Jan 18, 2009 10:27 pm
Location: Sweden
Full name: Patrik Karlsson

Re: Hash collision?

chrisw wrote:
Fri Jun 07, 2019 2:43 pm
Puzzle. Suppose a search takes X nodes and the hashtable is 2^n in size. You would like to be 95% confident no two nodes have a hash collision.

What’s the formula connecting X and n?
This will occupy my mind for days and in the end I won't be able to solve it, thank you very much.

silentshark
Posts: 290
Joined: Sat Mar 27, 2010 6:15 pm
Contact:

Re: Hash collision?

This is all useful. Thanks.

So the big question for me, remains. 16 bits - good enough, or will screw you up sometimes?

(Of course, maybe I have another bug elsewhere, but hey.)

chrisw
Posts: 3851
Joined: Tue Apr 03, 2012 2:28 pm

Re: Hash collision?

elpapa wrote:
Fri Jun 07, 2019 7:04 pm
chrisw wrote:
Fri Jun 07, 2019 2:43 pm
Puzzle. Suppose a search takes X nodes and the hashtable is 2^n in size. You would like to be 95% confident no two nodes have a hash collision.

What’s the formula connecting X and n?
This will occupy my mind for days and in the end I won't be able to solve it, thank you very much.
it’s similar to the how many people would need to be in a room for 50/50 chance that two have the same birthday paradox. well, not really a paradox.

hgm
Posts: 25454
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller
Contact:

Re: Hash collision?

The number of collisions you get is trivial to calculate: it only depends on the size of the signature (in so far that doesn't overlap with the part of the key used as table index). With 32-bit signature one in every 4G probes will give you a false match, once the table is entirely filled. (Which in practice will always be the case.) If your probing scheme probes in clusters of 4, this of course also multiplies the collision rate by 4, so you get one error in a billion. At 1Mnps that means one collision every 1000 sec. This is bad/unacceptable if collisions can make your engine crash, and completely unobservable if they cannot. To really affect play you should have thousands of collisions per search.

With a 16-bit signature you would have 1 collision in 65k probes, even 1 in 16k if you probe 4 entries. That means that at 1Mnps you would get 60 collisions/sec. That should still be way below the rate where there is a realistic chance that they would affect the move choice or score, as long as they can never crash the engine.

The total size of the table has no influence on any of this.
Get rid of the shit: vote for SHID!

Uri Blass
Posts: 8842
Joined: Wed Mar 08, 2006 11:37 pm
Location: Tel-Aviv Israel

Re: Hash collision?

chrisw wrote:
Fri Jun 07, 2019 2:43 pm
Puzzle. Suppose a search takes X nodes and the hashtable is 2^n in size. You would like to be 95% confident no two nodes have a hash collision.

What’s the formula connecting X and n?
More interesting question for me:
What is the probability that an engine get a wrong mate score in tablebase positions because of hash collision or a different reason when there is mate in n as a function of n.
By wrong mate score I mean mate that is faster than possible or mate for the opponent.

It may be interesting to have an automatic tool that simply get an epd file of random million mate in n positions from the 7 piece tablebases
and gives an engine to analyze them for m seconds without tablebases and give statistics about the results of the searches.

This tool may be good for testing engines.

Ras
Posts: 1643
Joined: Tue Aug 30, 2016 6:19 pm
Full name: Rasmus Althoff
Contact:

Re: Hash collision?

hgm wrote:
Fri Jun 07, 2019 10:26 pm
The total size of the table has no influence on any of this.
I don't agree. Imagine you had enough memory for a table with 2^64 entries, then you wouldn't need to store a signature at all because the full signature itself is only 64 bit (usually) so that the signature would be identical to the index position. That's how the implicit bits of the signature (via the index position) come into play.
Rasmus Althoff
https://www.ct800.net

Dann Corbit
Posts: 11717
Joined: Wed Mar 08, 2006 7:57 pm
Location: Redmond, WA USA
Contact:

Re: Hash collision?

Ras wrote:
Sat Jun 08, 2019 1:19 am
hgm wrote:
Fri Jun 07, 2019 10:26 pm
The total size of the table has no influence on any of this.
I don't agree. Imagine you had enough memory for a table with 2^64 entries, then you wouldn't need to store a signature at all because the full signature itself is only 64 bit (usually) so that the signature would be identical to the index position. That's how the implicit bits of the signature (via the index position) come into play.
Take the other extreme: A hash table that stores a single entry.
Almost every operation overwrites it (except for real transpositions).
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.

Ras
Posts: 1643
Joined: Tue Aug 30, 2016 6:19 pm
Full name: Rasmus Althoff
Contact:

Re: Hash collision?

Dann Corbit wrote:
Sat Jun 08, 2019 1:22 am
Take the other extreme: A hash table that stores a single entry.
Yeah then you would need to store the full 64 bit signature. Basically, the information stored is the length of the signature plus the implicit index bits minus the bucket length bits. Of course, you have to store bits that are not already encoded in the index bits.

I store the upper 32 bits. Then I abuse the flag for exact/alpha/beta which needs 2 bits, but is a uint8_t, so that allows to fudge in 6 additional bits. The bucket length is 3 which loses me 2 bits again, that gives 36 bits. With 2^12 = 4096 entries, that makes 48 bits, and with 10 bytes per entry, that's 40 kB. Even my embedded STM32 platform is sufficient for that, let alone smartphones or PCs.

On top of that, I check whether the hash move is pseudo-legal, so that gives a little more redundancy.
Rasmus Althoff
https://www.ct800.net