Hash collision?

silentshark · Post by **silentshark** » Sat Jun 08, 2019 11:16 am

So.. going back to my original posting, here's my data.

What do people think about the probability of hash collisions causing strange evals and moves?

1. Time control 40 moves in 25 minutes
2. Searching approx 2 million NPS
3. Size of hash table, 2097152, with 4 buckets
4. Using a 64 bit int for the hash of the current position in the search
5. Using the bottom 20 bits of this to calculate the index
6. Using the top 16 bits to store in the table

I'm not sure, does this mean I have an effective hash length of 21 + 16 - 4, so 33 bits?

brianr · Post by **brianr** » Sat Jun 08, 2019 11:42 am

Turns out that it may not matter very much:
http://www.craftychess.com/hyatt/collisions.html

Sven · Post by **Sven** » Sat Jun 08, 2019 12:17 pm

silentshark wrote: ↑Sat Jun 08, 2019 11:16 am 3. Size of hash table, 2097152, with 4 buckets
[...]
5. Using the bottom 20 bits of this to calculate the index
6. Using the top 16 bits to store in the table

I'm not sure, does this mean I have an effective hash length of 21 + 16 - 4, so 33 bits?

In my view you have 2^21 hash entries, 2^2 entries per bucket, 2^19 indices (so I would use the bottom 19 bits to calculate the index), and therefore a hash length of 21 - 2 + 16 = 35 bits.

silentshark · Post by **silentshark** » Sat Jun 08, 2019 1:52 pm

brianr wrote: ↑Sat Jun 08, 2019 11:42 am Turns out that it may not matter very much:
http://www.craftychess.com/hyatt/collisions.html

Right, although I noted the useful summary in the article - "In summary, 32 bit signatures produced wrong moves and scores, while 64 bits did not".

I may go back to using more that 16 bits, for my own sanity.

(although, it's probably a different bug that caused the fun and games.. anyway)

Thanks all for the useful postings around this topic

syzygy · Post by **syzygy** » Sat Jun 08, 2019 2:51 pm

silentshark wrote: ↑Sat Jun 08, 2019 11:16 am So.. going back to my original posting, here's my data.

What do people think about the probability of hash collisions causing strange evals and moves?

1. Time control 40 moves in 25 minutes
2. Searching approx 2 million NPS
3. Size of hash table, 2097152, with 4 buckets
4. Using a 64 bit int for the hash of the current position in the search
5. Using the bottom 20 bits of this to calculate the index
6. Using the top 16 bits to store in the table

I'm not sure, does this mean I have an effective hash length of 21 + 16 - 4, so 33 bits?

Collisions in the sense of two positions mapping to the same entry are unavoidable. They cannot cause strang evals and moves as long as the engine detects them as different.

So what is important is the ability of the engine to detect that two different positions mapping to the same entry/bucket are indeed different.

It seems that for this, you use just 16 bits. With 4 positions per bucket (if I am understanding you correctly), that means an effective 14 bits to distinguish between positions.

Since engines should not be clearing the hashtable between moves, you may always assume that the hashtable is completely filled (and this assumption makes the analysis very simple). So 14 bits means that one in every 16384 hashtable probes that should not detect a hit will falsely detect a hit.

In Stockfish, the ratio of false hits is one in every 2^16/3 = 21845 probes, so one in every 16384 is unlikely to do a lot of damage.

hgm · Post by **hgm** » Sat Jun 08, 2019 3:55 pm

Ras wrote: ↑Sat Jun 08, 2019 3:19 amI don't agree. Imagine you had enough memory for a table with 2^64 entries, then you wouldn't need to store a signature at all because the full signature itself is only 64 bit (usually) so that the signature would be identical to the index position. That's how the implicit bits of the signature (via the index position) come into play.

We discussed this before. Because you have no independent signature EVERY new entry you wanted to store would be a collision, when the table is completely filled. There would already be something in the slot to which it maps, and as this would be a different position (as otherwise the one you want to store would not be a new one). And you have no way to detect that.

So the collision rate is 1/nrOfSignatures * fillingFraction. The total size of the table only comes in when the the table only gets partly filled. Then having a larger table automatically means a lower filling fraction. (Which is just the trivial effect that you won't have collisions in empty slots, presuming that you can somehow recognize them as empty.) But this is not a situation that can be achieved in practice. (And it would be very wasteful if you could; adding one bit to the signature would have the same effect on collisions as doubling the number of entries of an already too large table.) Your hypothetical table with 2^64 entries is in that limit, though, (you tacitly assume that by taking it for granted that a key length of 64 bits is enough to have no key collisions), which is why it is a misleading example.

Joost Buijs · Post by **Joost Buijs** » Sat Jun 08, 2019 4:07 pm

syzygy wrote: ↑Sat Jun 08, 2019 2:51 pm
In Stockfish, the ratio of false hits is one in every 2^16/3 = 21845 probes, so one in every 16384 is unlikely to do a lot of damage.

Maybe Stockfish doesn't probe on every node (position) it generates, I don't know. But if it does, and you run on a fairly modern PC it can have like 1000 false hits a second, personally I can't imagine that this doesn't hurt. But I assume it has all been tested safe and sound.

mar · Post by **mar** » Sat Jun 08, 2019 4:53 pm

syzygy wrote: ↑Sat Jun 08, 2019 2:51 pm Collisions in the sense of two positions mapping to the same entry are unavoidable. They cannot cause strang evals and moves as long as the engine detects them as different.

So what is important is the ability of the engine to detect that two different positions mapping to the same entry/bucket are indeed different.

It seems that for this, you use just 16 bits. With 4 positions per bucket (if I am understanding you correctly), that means an effective 14 bits to distinguish between positions.

Since engines should not be clearing the hashtable between moves, you may always assume that the hashtable is completely filled (and this assumption makes the analysis very simple). So 14 bits means that one in every 16384 hashtable probes that should not detect a hit will falsely detect a hit.

In Stockfish, the ratio of false hits is one in every 2^16/3 = 21845 probes, so one in every 16384 is unlikely to do a lot of damage.

I think I'm missing something. Those 16 bits are only for those probes that map to the same bucket, so if your TT is 64k buckets (*8 slots * 8 bytes = 4 megs, which is still rather small), then only one in 64k probes will map to the same bucket and thus only one in 64k*8k should collide, or one in half a billion probes (so effectively 29-bit hash).
Unless you meant 1 in 16k probes that fall within the same bucket, what am I missing?

mar · Post by **mar** » Sat Jun 08, 2019 5:32 pm

I just ran a quick simulation and you were right, the number of probe collisions is independent of TT size, so for 16 bits of hash stored in the TT in 8-bucket scheme, I get a false hit for 1 in 8k probes... so only 13 bits
I must really suck at math, but where's the flaw? I thought that increasing the TT effectively increases the hash size but it doesn't seem to be the case...

chrisw · Post by **chrisw** » Sat Jun 08, 2019 6:05 pm

syzygy wrote: ↑Sat Jun 08, 2019 2:51 pm
silentshark wrote: ↑Sat Jun 08, 2019 11:16 am So.. going back to my original posting, here's my data.

What do people think about the probability of hash collisions causing strange evals and moves?

1. Time control 40 moves in 25 minutes
2. Searching approx 2 million NPS
3. Size of hash table, 2097152, with 4 buckets
4. Using a 64 bit int for the hash of the current position in the search
5. Using the bottom 20 bits of this to calculate the index
6. Using the top 16 bits to store in the table

I'm not sure, does this mean I have an effective hash length of 21 + 16 - 4, so 33 bits?
Collisions in the sense of two positions mapping to the same entry are unavoidable. They cannot cause strang evals and moves as long as the engine detects them as different.

So what is important is the ability of the engine to detect that two different positions mapping to the same entry/bucket are indeed different.

It seems that for this, you use just 16 bits. With 4 positions per bucket (if I am understanding you correctly), that means an effective 14 bits to distinguish between positions.

Since engines should not be clearing the hashtable between moves, you may always assume that the hashtable is completely filled (and this assumption makes the analysis very simple). So 14 bits means that one in every 16384 hashtable probes that should not detect a hit will falsely detect a hit.

In Stockfish, the ratio of false hits is one in every 2^16/3 = 21845 probes, so one in every 16384 is unlikely to do a lot of damage.

Depends. Although damage is mitigated by two factors I can think up. Many of the hash hits only result in using the continuation move, not so many hits are useful for using bounds. In case continuation move being “wrong”, some of those won’t get past move verification, and those that do will only slow the search, not change its result (ok, not exactly true, but pure a/b wouldn’t have its search result changed).
In second case where the bounds are such as to give a false evaluation, the amount of damage depends on how close to the root the collision occurs. Far down the tree, the occasional bad eval will get smothered out, but very close to the root and the false score really can change things. On the other side, most collisions will be deep in the tree, because that’s where most of the nodes are.

All in all, hash collisions are going to have a generally dulling effect throughout the tree. Stuff just gets done not right, but since a/b program evals deep in the tree are missing all kinds of things, and pruning deep in the tree is missing all kinds of things, it’s not surprising that a bit more brain dulling isn’t going to measure up in tests. Get everything else perfect and the dulling will show.

Ah, it just occurs to me how to test the effect of collisions in a more perfectly analysed world.
Simple a/b program. Humungous suite of varied 6-man positions. Play games using EGTB as evaluator, but don’t treat nodes as terminal, just continue searching to fixed depth. Theoretically, this simple program will (long-windedly) return DTM as per the EGTB.

Introduce a hash table, with continuation move and bounds. For a range of hashkey sizes, measure the delta off the correct DTM. Mean delta should decrease with hashkey length.

Hash collision?

Re: Hash collision?

Re: Hash collision?

Re: Hash collision?

Re: Hash collision?

Re: Hash collision?

Re: Hash collision?

Re: Hash collision?

Re: Hash collision?

Re: Hash collision?

Re: Hash collision?