Alternative to power of two size hash tables

Discussion of chess software programming and technical issues.

Moderator: Ras

MOBMAT
Posts: 414
Joined: Sat Feb 04, 2017 11:57 pm
Location: USA

Alternative to power of two size hash tables

Post by MOBMAT »

Last winter, I was checking the Stockfish site for new development builds, and came across a release that had a different scheme for indexing into the hash table.

I think you can link to the info here:

http://abrok.eu/stockfish/?page=6

The actual text follows:
Author: Joost VandeVondele
Date: Mon Dec 18 16:32:21 2017 +0100
Timestamp: 1513611141

Allow for general transposition table sizes. (#1341)

For efficiency reasons, current master only allows transposition table sizes N = 2^k; the index computation (hash % N) can then be written efficiently as (hash & (2^k - 1)). On a typical computer (with 4, 8, ... GB of RAM), this implies roughly half the RAM is left unused in analysis.

This issue was mentioned on fishcooking by Mindbreaker:

http://tests.stockfishchess.org/tests/v ... ccbb8be04

Recently a neat trick was proposed to map a hash into the range [0,N[ more efficiently than (hash % N) for general N, nearly as efficiently as (hash % 2^k):

https://lemire.me/blog/2016/06/27/a-fas ... eduction/

namely computing (hash * N / 2^32) for 32 bit hashes. This patch implements this trick and now allows for general hash sizes. Note that for N = 2^k this just amounts to using a different subset of bits from the hash. Master will use the lower k bits, this trick will use the upper k bits (of the 32 bit hash).

There is no slowdown as measured with [-3, 1] test:

http://tests.stockfishchess.org/tests/v ... 0ccbb8be04

LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 128498 W: 23332 L: 23395 D: 81771

There are two (smaller) caveats:

1) the patch is implemented for a 32-bit hash (so that a 64-bit multiply can be used); this effectively limits the number of clusters to 2^32, or 128 GB of transposition table. That's a change in the maximum allowed TT size, which could bother those regularly using 256 GB or more.

2) Already in master, an excluded move is hashed into the position key in a rather simple way, essentially only affecting the lower 16 bits of the key. This is OK in master, since bits 0-15 end up in the index, but not in the new scheme, which picks the higher bits. This is 'fixed' by shifting the excluded move a few bits up. Eventually a better hashing scheme seems wise.

Despite these two caveats, I think this is a nice improvement in usability.

Bench: 5346341
As I was in the middle of a code rewrite, I decided to try this idea. So far, it is working fine. My implementation is as follows:

Initialize the hash with:

Code: Select all

U32 m_NumHashEntries = (HashSize * MEGABYTE) / sizeof(TTMAIN);
m_Hash = (TTMAIN *)calloc(m_NumHashEntries, sizeof(TTMAIN));
Note that m_NumHashEntries will probably NOT be a power of two!

TTMAIN is the hash entry structure
MEGABYTE = 1024 * 1024

To get the correct index from the key...

Code: Select all

U32 GetHashIndex(const U64 key) {
	return ((key & BITMASK32) * m_NumHashEntries) >> 32;  // from Stockfish
}
where:
U64 BITMASK32 = 0xFFFFFFFF;

The 32-bit right shift is a very fast divide, so the extra overhead of this method is small. The mask gives us the lower 32 bits of the 64-bit key, the multiply cannot overflow a 64-bit number, and shifting right by 32 leaves the upper portion of the product as our index.
i7-6700K @ 4.00Ghz 32Gb, Win 10 Home, EGTBs on PCI SSD
Benchmark: Stockfish15.1 NNUE x64 bmi2 (nps): 1277K
Volker Annuss
Posts: 181
Joined: Mon Sep 03, 2007 9:15 am

Re: Alternative to power of two size hash tables

Post by Volker Annuss »

That trick was already discussed here.
MOBMAT
Posts: 414
Joined: Sat Feb 04, 2017 11:57 pm
Location: USA

Re: Alternative to power of two size hash tables

Post by MOBMAT »

wow, the idea has been around a long time!

I'll read that thread for more info, thanks.
syzygy
Posts: 5951
Joined: Tue Feb 28, 2012 11:56 pm

Re: Alternative to power of two size hash tables

Post by syzygy »

I have tried to modify the trick to make it work for hash tables larger than 128 GB:
https://github.com/official-stockfish/S ... ssues/1349

Unfortunately all my tests failed, even though there is no clear reason why this would be measurably slower on x86-64:

Code: Select all

return &table[(key * (unsigned __int128)clusterCount) >> 64].entry[0];
Dann Corbit
Posts: 12870
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Alternative to power of two size hash tables

Post by Dann Corbit »

If you use a union of two 64-bit integers and a 128-bit integer, then you would not even have to shift. Just grab the top half.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
syzygy
Posts: 5951
Joined: Tue Feb 28, 2012 11:56 pm

Re: Alternative to power of two size hash tables

Post by syzygy »

Dann Corbit wrote:If you use a union of two 64-bit integers and a 128-bit integer, then you would not even have to shift. Just grab the top half.
That's what the compiler already does.
Dann Corbit
Posts: 12870
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Alternative to power of two size hash tables

Post by Dann Corbit »

Pretty strange that it is slow then.
Volker Annuss
Posts: 181
Joined: Mon Sep 03, 2007 9:15 am

Re: Alternative to power of two size hash tables

Post by Volker Annuss »

I have not measured how fast it is, but you might want to try

Code: Select all

return &table[(((key * cluster_count) & 0x3ffffffffull) >> 34].entry[0];
for 512GB if your cluster size is 32.
syzygy
Posts: 5951
Joined: Tue Feb 28, 2012 11:56 pm

Re: Alternative to power of two size hash tables

Post by syzygy »

Volker Annuss wrote:I have not measured how fast it is, but you might want to try

Code: Select all

return &table[(((key * cluster_count) & 0x3ffffffffull) >> 34].entry[0];
for 512GB if your cluster size is 32.
I have tried something similar to get to 1 TB:

Code: Select all

return &table[((key >> 20) * unitCount) >> 29].entry[0];
It also tested worse:
http://tests.stockfishchess.org/tests/v ... 0ccbb8c19e
Volker Annuss
Posts: 181
Joined: Mon Sep 03, 2007 9:15 am

Re: Alternative to power of two size hash tables

Post by Volker Annuss »

Much too late to edit. This one should be correct:

Code: Select all

return &table[((key & 0x3ffffffffull) * cluster_count) >> 34].entry[0];