Hash collision?

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Hash collision?

Post by syzygy »

mar wrote: Sat Jun 08, 2019 4:53 pm
syzygy wrote: Sat Jun 08, 2019 2:51 pm Collisions in the sense of two positions mapping to the same entry are unavoidable. They cannot cause strang evals and moves as long as the engine detects them as different.

So what is important is the ability of the engine to detect that two different positions mapping to the same entry/bucket are indeed different.

It seems that for this, you use just 16 bits. With 4 positions per bucket (if I am understanding you correctly), that means an effective 14 bits to distinguish between positions.

Since engines should not be clearing the hashtable between moves, you may always assume that the hashtable is completely filled (and this assumption makes the analysis very simple). So 14 bits means that one in every 16384 hashtable probes that should not detect a hit will falsely detect a hit.

In Stockfish, the ratio of false hits is one in every 2^16/3 = 21845 probes, so one in every 16384 is unlikely to do a lot of damage.
I think I'm missing something. Those 16 bits are only for those probes that map to the same bucket, so if your TT is 64k buckets (*8 slots * 8 bytes = 4 megs, which is still rather small), then only one in 64k probes will map to the same bucket and thus only one in 64k*8k should collide, or one in half a billion probes (so effectively 29-bit hash).
Unless you meant 1 in 16k probes that fall within the same bucket, what am I missing?
What you are missing is that you can always assume the TT to be full (because the engine does not clear the hashtable between moves), which reduces the analysis to the case of positions mapping to the same bucket. Whatever the bucket the current position is being mapped to, there will always be a full bucket of positions waiting to be collided with. If the bucket contains four positions, there are four 16-bit values that can be hit, so there is a false hit with a probability of 4 in 65536 = 1 in 16384 (in the case that the current position is not in the hashtable, because otherwise there will be a true hit, obviously).

So the birthday problem formula does not actually play a role here.
Ras
Posts: 2487
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: Hash collision?

Post by Ras »

hgm wrote: Sat Jun 08, 2019 3:55 pmThe total size of the table only comes in when the the table only gets partly filled.
Yeah I have to agree. In that case, my objection was wrong.
(you tacitly assume that by taking it for granted that a key length of 64 bits is enough to have no key collisions)
That's an unrelated issue for the problem of how much of the key to store. Of course we can have key collisions because a chess position is worth more than 64 bit of information. But compared to the issues when storing less than 64 bits signature, this is negligible.

What I'm also doing is a check for pseudo legality if there is a hash move. If that fails, I treat the lookup as "not matching".
Rasmus Althoff
https://www.ct800.net
elcabesa
Posts: 855
Joined: Sun May 23, 2010 1:32 pm

Re: Hash collision?

Post by elcabesa »

Ras wrote: Tue Jun 11, 2019 1:16 am
hgm wrote: Sat Jun 08, 2019 3:55 pmThe total size of the table only comes in when the the table only gets partly filled.
Yeah I have to agree. In that case, my objection was wrong.
(you tacitly assume that by taking it for granted that a key length of 64 bits is enough to have no key collisions)
That's an unrelated issue for the problem of how much of the key to store. Of course we can have key collisions because a chess position is worth more than 64 bit of information. But compared to the issues when storing less than 64 bits signature, this is negligible.

What I'm also doing is a check for pseudo legality if there is a hash move. If that fails, I treat the lookup as "not matching".
I had the same objection, but before writing a message I tested it with Vajolet and Ronald is right :)
the only difference is that I uuse 32 bit for the key, butr probably 126 are just enough