AlvaroBegue wrote:syzygy wrote:AlvaroBegue wrote:cdani wrote:Someone has an idea of how often happen the hash collisions? I did some tries and I'm not able to catch an illegal move coming from hash. Anyway I keep checking them for legality.
Here's some relevant math behind the question:
http://en.wikipedia.org/wiki/Birthday_attack
This always comes up, but it is the wrong approach. The hash table does not remember all previous birthdays, only those that are still present.
"The wrong approach" seems a bit harsh. I like the sound of "conservative estimate" better.

I don't like the sound of "wrong" either, but I've seen the birthday paradox come up so often in this context...
I may have once come up with it myself as well in the days of rgcc. At least I remember working it out on a whiteboard.
As usual, it's the wrong approach!!
Dennis Breuker wrote:We note that the problem of calculating the probability that at least one error occurs (being 1 - P(no errors)), is analogous to the problem widely known as the birthday paradox (Feller, 1950), where the probability of at least two persons having the same birthday in a group of M persons (N being 365) has to be calculated.
The example on p. 21 is a program searching 100,000 nps for 2 hours, thereby searching 7.2 x 10^8 nodes. The assumption is that 30% is stored in the transposition table. With a hash key of 32 bits, the probability of at least two stored positions having the same hash key is calculated to be very close to 1. With 64 bits the probability was found to be 10^-3 with an expected number of collisions of 0.05.
That he is looking only at stores already shows the error of his ways. If you're only storing, you can never go wrong. Problems only occur when you probe, find a hit, but the stored position although having the same hash key is not the right one.
The thesis was written in 1998. At that time 16 MB was a reasonably big hashtable, so let's assume 2^20 (about 1 million) entries. With 32 bits for the hash key, that leaves 12 bits to distinguish between positions mapped to the same hash entry. Assuming a full hash table, that means a false hit every 2^12 = 4096 probes. With 100,000 nps during 2 hours that certainly will give a serious amount of false hits. With 64 bits for the hash key, we still have 44 bits to distinguish between positions mapped to the same hash entry. Again assuming a full hash table, we now get one false hit every 2^44 probes. If we probe at EVERY node for 2 hours at 100,000 nps, this leads to a probability of 0.00004092642 and an expected number of false hits of 0.00004092726 (using Google as calculator). Quite a difference with Breuker's result.