crafty new hash scheme question

Discussion of chess software programming and technical issues.

Moderator: Ras

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: crafty new hash scheme question - % test results

Post by bob »

In the hash code, I had a few of these:

entry = xxx & hash_mask;

I replaced each one by this:

entry = xxx % (hash_mask + 1);

just for a quick test. There was about a 5% slowdown by doing this, which shows that the cost of the divide was a bit more than I had originally guessed...
jwes
Posts: 778
Joined: Sat Jul 01, 2006 7:11 am

Re: crafty new hash scheme question

Post by jwes »

bob wrote:
jwes wrote:
bob wrote:
jwes wrote:
hgm wrote:A way to avoid division or modulo instructons, and have a hash size that is, say, 7 times a power of two would be to derive the index as:

index = 7*(hashKey & ((1<<60)-1)) >> N;

That requires only an AND, multiply and shift, which is usually faster than modulo.

I use this method to derive number < 5 from the hash key to index the number of the entry within the bucket, as I have 7 entries per bucket, where one is an overflow entry with always-replace policy, while the primary (depth-preferred) hit goes to one of the five overlapping pairs fitted within the other 6. So the number derived from the hash key decides in which pair the node can be stored, and if both these entries already contan a higher draft, the new node goes to the overflow entry.
There is an even simpler way that works for any hash size. If n is the number of entries,

index = ((unsigned int32) hashKey * n) >> 32;

the compiler should be able to reduce this to just a multiply (on x86).
How is that going to work? N can be greater than 2^32 in fact on today's boxes. This has to work for all machines to be useful...
You're right. My code only works up to 4G hash entries or 64GB hash memory, but will be faster on 32 bit OSes. It should be

index = ((unsigned int64) hashKey * n) >> 64;
This is not legal. On a Cray shifting 64 bits is really a 0 bit shift as the shift count is only 6 bits wide. Intel won't allow 64 bit shifts. You can only shift 63 bits on X86_64. There is no 128 bit shifts using the normal registers.
Right again. The 32 bit code with the appropriate casts would work. For 64 bit, you need assembly code or a library function to get the high 64 bits of the result of a 64*64 bit multiply. I believe that the Microsoft CRT has such a function, but I can't find it.
User avatar
hgm
Posts: 28387
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: crafty new hash scheme question - % test results

Post by hgm »

bob wrote:There was about a 5% slowdown by doing this, which shows that the cost of the divide was a bit more than I had originally guessed...
What makes it extra painful is that the hash probe is already the longest-latency operation in your code, and the calculation of the address is in series with it on the critical path. So there is no way for the optimizer or out-of-order execution to hide the latency. It would not be so bad if you were doing the division to decide which element in the cache line you are fetching you should access, as then the operations could be performed in parallel...
Aleks Peshkov
Posts: 922
Joined: Sun Nov 19, 2006 9:16 pm
Location: Russia
Full name: Aleks Peshkov

Re: crafty new hash scheme question - % test results

Post by Aleks Peshkov »

hgm wrote:So there is no way for the optimizer or out-of-order execution to hide the latency. It would not be so bad if you were doing the division to decide which element in the cache line you are fetching you should access, as then the operations could be performed in parallel...
It may payoff to calculate and store TT key together with Zobrist key when Zobrist is calculated.
Gerd Isenberg
Posts: 2251
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: crafty new hash scheme question

Post by Gerd Isenberg »

jwes wrote: Right again. The 32 bit code with the appropriate casts would work. For 64 bit, you need assembly code or a library function to get the high 64 bits of the result of a 64*64 bit multiply. I believe that the Microsoft CRT has such a function, but I can't find it.
__UMULH, __umulh or _umul128

Code: Select all

unsigned __int64 __umulh( 
   unsigned __int64 a, 
   unsigned __int64 b 
);
User avatar
hgm
Posts: 28387
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: crafty new hash scheme question - % test results

Post by hgm »

Aleks Peshkov wrote:It may payoff to calculate and store TT key together with Zobrist key when Zobrist is calculated.
Not sure what you mean here. Is TT key the same thing as hash-table index? Why would you ever want to store that? You only need it once, not? The critcal path is deciding upon a move to search, update the key accordingly, derive the index, fetch the cache line, compare the signature. In parallel you can peform the move on the board and in the piece list, check for repetitions (after you have the key), pre-emptively call Search() for the next level.
User avatar
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: crafty new hash scheme question - % test results

Post by Zach Wegner »

hgm wrote:
Aleks Peshkov wrote:It may payoff to calculate and store TT key together with Zobrist key when Zobrist is calculated.
Not sure what you mean here. Is TT key the same thing as hash-table index? Why would you ever want to store that? You only need it once, not? The critcal path is deciding upon a move to search, update the key accordingly, derive the index, fetch the cache line, compare the signature. In parallel you can peform the move on the board and in the piece list, check for repetitions (after you have the key), pre-emptively call Search() for the next level.
It's actually not such a bad idea on first thought. You're usually going to calculate the hash table index anyways, and doing it early can allow OOE to absorb the cost of the div. If you're already prefetching though, this is pretty much the same idea. Keeping the index around could be slightly useful for the hash store at the end of search, though.