I implemented Zobrist hashing by using rotated random numbers rather than arrays of random numbers, i.e.
hash = hash ^ _rotl64(zob[piecetype], from) ^ _rotl64(zob[piecetype], to)
instead of
hash = hash ^ zob[piecetype][from] ^ zob[piecetype][to]
This means my zobrist table has 16 qwords rather than over 1000 qwords.
I let it run for several days and got these statistics with 256M hash entries and a 64 bit key:
I feel that 1 false hit every few days is acceptable, and saving 8K bytes of frequently referenced data should reduce cache misses.
BTW, I would appreciate it if someone who remembers their statistics better than I do could calculate confidence intervals for false hits per hash probe.
That is about 1 false hit in 2^36 probes on a set of 2^28 entries, which is exactly what would be expected for a 64-bit key, since 36 + 28 = 64.
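Written out, the arithmetic is (taking a false hit to be a chance match of a probed 64-bit key against one of the stored entries):

```latex
P(\text{false hit per probe}) \approx \frac{N}{2^{64}}
  = \frac{2^{28}}{2^{64}} = 2^{-36}
```

i.e. with roughly $N = 2^{28}$ (256M) occupied entries, about one probe in $2^{36}$ collides by chance.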
How do you detect false hits? And how did you generate the random piece keys?
In HaChu I use pieceKey[pieceType]*squareKey[squareNumber], rather than rotations, because I have to deal with more than 64 squares (sometimes even an order of magnitude more). But the keys can be 32-bit in this case.
jwes wrote:I implemented Zobrist hashing by using rotated random numbers rather than arrays of random numbers, i.e.
hash = hash ^ _rotl64(zob[piecetype], from) ^ _rotl64(zob[piecetype], to)
instead of
hash = hash ^ zob[piecetype][from] ^ zob[piecetype][to]
This means my zobrist table has 16 qwords rather than over 1000 qwords.
I let it run for several days and got these statistics with 256M hash entries and a 64 bit key:
I feel that 1 false hit every few days is acceptable, and saving 8K bytes of frequently referenced data should reduce cache misses.
BTW, I would appreciate it if someone who remembers their statistics better than I do could calculate confidence intervals for false hits per hash probe.
8KBytes is tiny relative to the megabytes of L3 cache we now have on modern CPUs, and even an L3 hit (L1 and L2 miss) should still be much faster than doing a rotation.
Have you benchmarked this? My feeling is using a larger table would still be faster.
In a chess engine, when the hash table is much bigger than L3, my feeling is that most of the cache is wasted anyway. The engine will almost never access another entry in the same cache line before it is flushed out.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
matthewlai wrote:8KBytes is tiny relative to the megabytes of L3 cache we now have on modern CPUs, and even an L3 hit (L1 and L2 miss) should still be much faster than doing a rotation.
Rotation is a single machine instruction, right? With a latency similar to an L1 access.
matthewlai wrote:8KBytes is tiny relative to the megabytes of L3 cache we now have on modern CPUs, and even an L3 hit (L1 and L2 miss) should still be much faster than doing a rotation.
Rotation is a single machine instruction, right? With a latency similar to an L1 access.
Is it? I didn't know that.
If that's the case then this is definitely a good idea.
I assumed it would be at least a few instructions.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
jwes wrote:I implemented Zobrist hashing by using rotated random numbers rather than arrays of random numbers, i.e.
hash = hash ^ _rotl64(zob[piecetype], from) ^ _rotl64(zob[piecetype], to)
instead of
hash = hash ^ zob[piecetype][from] ^ zob[piecetype][to]
This means my zobrist table has 16 qwords rather than over 1000 qwords.
I let it run for several days and got these statistics with 256M hash entries and a 64 bit key:
I feel that 1 false hit every few days is acceptable, and saving 8K bytes of frequently referenced data should reduce cache misses.
BTW, I would appreciate it if someone who remembers their statistics better than I do could calculate confidence intervals for false hits per hash probe.
8KBytes is tiny relative to the megabytes of L3 cache we now have on modern CPUs, and even an L3 hit (L1 and L2 miss) should still be much faster than doing a rotation.
Have you benchmarked this? My feeling is using a larger table would still be faster.
In a chess engine, when the hash table is much bigger than L3, my feeling is that most of the cache is wasted anyway. The engine will almost never access another entry in the same cache line before it is flushed out.
The Intel Optimization Manual lists ROL with a latency of 1 cycle and a throughput of 1.5 cycles. It would be hard to benchmark now, since the result depends on cache pressure and my program is not stable.