Hash collision?
Moderators: hgm, Harvey Williamson, bob
Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
Re: Hash collision?
Puzzle. Suppose a search takes X nodes and the hashtable is 2^n in size. You would like to be 95% confident no two nodes have a hash collision.
What’s the formula connecting X and n?
What’s the formula connecting X and n?
Re: Hash collision?
Bob Hyatt has already done the research on the required hash length many years ago. The result was that 32 bit are not sufficient, 64 bit are, and you can even get away with 48 bit.
Obviously, you also have the implicit storage via the hash table index, minus the bits for your number of slots. You can get away with storing 32 bits if you have at least 16 bits implicit storage, i.e. 64k entries. Let's assume a bucket length of 3, then you need 18 bits, i.e. 256k entries. That shouldn't be an issue today.
Obviously, you also have the implicit storage via the hash table index, minus the bits for your number of slots. You can get away with storing 32 bits if you have at least 16 bits implicit storage, i.e. 64k entries. Let's assume a bucket length of 3, then you need 18 bits, i.e. 256k entries. That shouldn't be an issue today.
Rasmus Althoff
https://www.ct800.net
https://www.ct800.net
Re: Hash collision?
This will occupy my mind for days and in the end I won't be able to solve it, thank you very much.
 silentshark
 Posts: 220
 Joined: Sat Mar 27, 2010 6:15 pm
 Contact:
Re: Hash collision?
This is all useful. Thanks.
So the big question for me, remains. 16 bits  good enough, or will screw you up sometimes?
(Of course, maybe I have another bug elsewhere, but hey.)
So the big question for me, remains. 16 bits  good enough, or will screw you up sometimes?
(Of course, maybe I have another bug elsewhere, but hey.)
Re: Hash collision?
 hgm
 Posts: 23203
 Joined: Fri Mar 10, 2006 9:06 am
 Location: Amsterdam
 Full name: H G Muller
 Contact:
Re: Hash collision?
The number of collisions you get is trivial to calculate: it only depends on the size of the signature (in so far that doesn't overlap with the part of the key used as table index). With 32bit signature one in every 4G probes will give you a false match, once the table is entirely filled. (Which in practice will always be the case.) If your probing scheme probes in clusters of 4, this of course also multiplies the collision rate by 4, so you get one error in a billion. At 1Mnps that means one collision every 1000 sec. This is bad/unacceptable if collisions can make your engine crash, and completely unobservable if they cannot. To really affect play you should have thousands of collisions per search.
With a 16bit signature you would have 1 collision in 65k probes, even 1 in 16k if you probe 4 entries. That means that at 1Mnps you would get 60 collisions/sec. That should still be way below the rate where there is a realistic chance that they would affect the move choice or score, as long as they can never crash the engine.
The total size of the table has no influence on any of this.
With a 16bit signature you would have 1 collision in 65k probes, even 1 in 16k if you probe 4 entries. That means that at 1Mnps you would get 60 collisions/sec. That should still be way below the rate where there is a realistic chance that they would affect the move choice or score, as long as they can never crash the engine.
The total size of the table has no influence on any of this.
Re: Hash collision?
More interesting question for me:
What is the probability that an engine get a wrong mate score in tablebase positions because of hash collision or a different reason when there is mate in n as a function of n.
By wrong mate score I mean mate that is faster than possible or mate for the opponent.
It may be interesting to have an automatic tool that simply get an epd file of random million mate in n positions from the 7 piece tablebases
and gives an engine to analyze them for m seconds without tablebases and give statistics about the results of the searches.
This tool may be good for testing engines.
Re: Hash collision?
I don't agree. Imagine you had enough memory for a table with 2^64 entries, then you wouldn't need to store a signature at all because the full signature itself is only 64 bit (usually) so that the signature would be identical to the index position. That's how the implicit bits of the signature (via the index position) come into play.
Rasmus Althoff
https://www.ct800.net
https://www.ct800.net

 Posts: 9632
 Joined: Wed Mar 08, 2006 7:57 pm
 Location: Redmond, WA USA
 Contact:
Re: Hash collision?
Take the other extreme: A hash table that stores a single entry.Ras wrote: ↑Sat Jun 08, 2019 1:19 amI don't agree. Imagine you had enough memory for a table with 2^64 entries, then you wouldn't need to store a signature at all because the full signature itself is only 64 bit (usually) so that the signature would be identical to the index position. That's how the implicit bits of the signature (via the index position) come into play.
Almost every operation overwrites it (except for real transpositions).
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Re: Hash collision?
Yeah then you would need to store the full 64 bit signature. Basically, the information stored is the length of the signature plus the implicit index bits minus the bucket length bits. Of course, you have to store bits that are not already encoded in the index bits.Dann Corbit wrote: ↑Sat Jun 08, 2019 1:22 amTake the other extreme: A hash table that stores a single entry.
I store the upper 32 bits. Then I abuse the flag for exact/alpha/beta which needs 2 bits, but is a uint8_t, so that allows to fudge in 6 additional bits. The bucket length is 3 which loses me 2 bits again, that gives 36 bits. With 2^12 = 4096 entries, that makes 48 bits, and with 10 bytes per entry, that's 40 kB. Even my embedded STM32 platform is sufficient for that, let alone smartphones or PCs.
On top of that, I check whether the hash move is pseudolegal, so that gives a little more redundancy.
Rasmus Althoff
https://www.ct800.net
https://www.ct800.net