lockless hashing

Evert · Post by **Evert** » Wed May 11, 2016 10:20 pm

bob wrote: Lockless hashing is NOT about "Elo". It is about correctness in a threaded search, where a bad move (or other information) could cause a program to crash. If your program is immune to crashes caused by a bogus hash table best move, this won't provide any measurable gain whatsoever. Same applies to lockless pawn hashing as well.

So if you can store move/depth/lock in a single atomic entry, would you still need the lockless hashing? The only other thing in the table is the score/bound type.
Of course that assumes you can write those three entries together atomically, which may not be the case in practice (I haven't checked).

Of course the program can still encounter a false hash hit with a bogus move. This happens rarely, but personally I find a program that is known to crash occasionally during normal operation inelegant. However, if you do accept this possibility, you can actually gain a bit by storing whether the position is "in check" or not, and skip that test (which can be done efficiently, but is still relatively slow).

jdart · Post by **jdart** » Wed May 11, 2016 10:25 pm

Lockless hashing via the Hyatt method is cheap - just a few machine instructions. I also in addition do some validity checks (but not full legality checking) on the hash move before use. That may seem like overkill. But my program does not tolerate illegal moves. And I regularly run matches or do online play for days on big multi core boxes (up to 24 cores). I care about reliability and this code is reliable.

--Jon

bob · Post by **bob** » Wed May 11, 2016 11:42 pm

Evert wrote:
bob wrote: Lockless hashing is NOT about "Elo". It is about correctness in a threaded search, where a bad move (or other information) could cause a program to crash. If your program is immune to crashes caused by a bogus hash table best move, this won't provide any measurable gain whatsoever. Same applies to lockless pawn hashing as well.
So if you can store move/depth/lock in a single atomic entry, would you still need the lockless hashing? The only other thing in the table is the score/bound type.
Of course that assumes you can write those three entries together atomically, which may not be the case in practice (I haven't checked).

Of course the program can still encounter a false hash hit with a bogus move. This happens rarely, but personally I find a program that is known to crash occasionally during normal operation inelegant. However, if you do accept this possibility, you can actually gain a bit by storing whether the position is "in check" or not, and skip that test (which can be done efficiently, but is still relatively slow).

Yes, if somewhat unsafely. Today 8 byte-aligned writes of size 8 are done atomically. Who knows about tomorrow? And there are other architectures with a more relaxed memory model (the Alpha was probably the most aggressive) where such an assumption might blow up.

However, 8 byte hash entries are really problematic in terms of collisions, which means to get rid of the lockless code, you still have to vet the move to be sure it is OK to make, assuming you have potential crash conditions (such as castling with no king on e1, which would produce a second one, etc... I have an 8 byte version but I was not happy with the number of collisions.

hgm · Post by **hgm** » Thu May 12, 2016 7:36 am

The point is that this SMP-induced corruption of hash entries will not cause any effects you would not have on ordinary key collisions. So 'solutions' that increase the latter just make things worse, and going to a shorter signature to squeeze the hash entry in 8 bytes is thus not the way to go. You would have more bad hash moves than ever, so you would still have to vet those.

So if you want to get rid of the need to validate the hash move, you will need a signature longer than 4 bytes, and this makes it impossible to cram the complete hash entry in an atomic 8 bytes. You could put a 6-byte signature and 2-byte move in there, however, and store score and depth (and perhaps age) non-atomically elsewhere in the bucket (the bound-type flags can usually fit with the move). You don't care so much about these getting corrupted.

An alternative would be to store 4-bytes of the signature, 2-byte score and move+flags in one 8-byte word, and another signature byte plus a depth byte in another word. Like

Code: Select all

typedef struct &#123;
  int32 signature;
  int16 score;
  int16 move;
&#125; Atomic; // 8 byte

typedef struct &#123;
  char depth;
  char signature_ext;
&#125; Extended; // 2 byte

typedef struct &#123;
  Atomic ssm&#91;6&#93;;
  Extended ex&#91;6&#93;;
  char filler&#91;2&#93;;
&#125; HashEntry;

The signature_ext would only have to be tested in case the 4-byte signature matches.

Joost Buijs · Post by **Joost Buijs** » Thu May 12, 2016 1:58 pm

jdart wrote:Lockless hashing via the Hyatt method is cheap - just a few machine instructions. I also in addition do some validity checks (but not full legality checking) on the hash move before use. That may seem like overkill. But my program does not tolerate illegal moves. And I regularly run matches or do online play for days on big multi core boxes (up to 24 cores). I care about reliability and this code is reliable.

I fully agree...

The scheme that I use is XORing the key with the data field which are both 64 bit in size, this is probably the same as the Hyatt method, it takes just one extra machine instruction during store and load.

It is bad practice to accept wrong entries from the hash-table even when it doesn't cost you any Elo.
When a bad entry occurs near the root it will certainly have an effect on move selection, maybe this is why several engines don't use TT pruning in the PV.

When I switch off lockless hashing in my SMP environment and don't check for legality of the TT move the engine crashes almost immediately, mostly because of kings getting captured which my engine doesn't like. With lockless hashing enabled (still without legality check) I never see the engine crash.

My engine only has a full legality check which checks everything, including moving into check, the extra time it takes is negligible and drowns in the SMP noise.

I guess for a non SMP engine it will be less critical because ordinary collisions occur less frequently than non atomic hash updates.

hgm · Post by **hgm** » Thu May 12, 2016 4:56 pm

Well, a 64-bit key is usually not really 64 bit, as many of the bits are implied by the index. With 64-byte buckets and 1GB hash 24 bits of the key are implied. So storing 4-byte primary signature with the move and a 1-byte signature extension with the remaining data basically stores all signature you have. And instead of having to do an XOR on every entry in the bucket before comparing its thus decoded signature with the sought one, you woud just do normal 32-bit compares on the plain signature, and only a single additional byte compare on the signature extension.

Of course you could try to store another position-identifying quantity in the place of the key bits implied by the index. In KingSlayer the index is taken from the low-order bits of the key, and I store the XOR of that key and the (incrementally updated) pstEval in a 64-bit signature field.

Patrice Duhamel · Post by **Patrice Duhamel** » Thu May 12, 2016 6:35 pm

bob wrote:Same applies to lockless pawn hashing as well.

How lockless hash works with the pawn hash table ?
Pawns evaluation scores are enough to XOR with the key ?

bob · Post by **bob** » Thu May 12, 2016 9:22 pm

hgm wrote:Well, a 64-bit key is usually not really 64 bit, as many of the bits are implied by the index. With 64-byte buckets and 1GB hash 24 bits of the key are implied. So storing 4-byte primary signature with the move and a 1-byte signature extension with the remaining data basically stores all signature you have. And instead of having to do an XOR on every entry in the bucket before comparing its thus decoded signature with the sought one, you woud just do normal 32-bit compares on the plain signature, and only a single additional byte compare on the signature extension.

Of course you could try to store another position-identifying quantity in the place of the key bits implied by the index. In KingSlayer the index is taken from the low-order bits of the key, and I store the XOR of that key and the (incrementally updated) pstEval in a 64-bit signature field.

How does that solve the problem of non-atomic stores for things > 8 bytes long (or even for things less than 8 bytes long if not properly aligned)???

bob · Post by **bob** » Thu May 12, 2016 9:25 pm

hgm wrote:The point is that this SMP-induced corruption of hash entries will not cause any effects you would not have on ordinary key collisions. So 'solutions' that increase the latter just make things worse, and going to a shorter signature to squeeze the hash entry in 8 bytes is thus not the way to go. You would have more bad hash moves than ever, so you would still have to vet those.

So if you want to get rid of the need to validate the hash move, you will need a signature longer than 4 bytes, and this makes it impossible to cram the complete hash entry in an atomic 8 bytes. You could put a 6-byte signature and 2-byte move in there, however, and store score and depth (and perhaps age) non-atomically elsewhere in the bucket (the bound-type flags can usually fit with the move). You don't care so much about these getting corrupted.

An alternative would be to store 4-bytes of the signature, 2-byte score and move+flags in one 8-byte word, and another signature byte plus a depth byte in another word. Like
Code: Select all
typedef struct &#123;
  int32 signature;
  int16 score;
  int16 move;
&#125; Atomic; // 8 byte

typedef struct &#123;
  char depth;
  char signature_ext;
&#125; Extended; // 2 byte

typedef struct &#123;
  Atomic ssm&#91;6&#93;;
  Extended ex&#91;6&#93;;
  char filler&#91;2&#93;;
&#125; HashEntry;
The signature_ext would only have to be tested in case the 4-byte signature matches.

The problem is, the corruption is FAR more common when you have 20 different threads storing things at every node. You end up with lots of positions with key from position A but the data (best move and such) from position B. A signature collision is rare, even with 64 bit signatures, but the non-atomic writes turn into a killer problem. I can turn off lockless hashing and verify every hash move and get a BUNCH of errors every single search. If I only use one thread, I can search for days with no reported illegal moves found.

bob · Post by **bob** » Thu May 12, 2016 9:28 pm

Patrice Duhamel wrote:
bob wrote:Same applies to lockless pawn hashing as well.
How lockless hash works with the pawn hash table ?
Pawns evaluation scores are enough to XOR with the key ?

I XOR everything in the pawn hash using local memory, then write it to the hash table. To probe I copy to local memory, then XOR and see if the sig matches. You could get by with just including things that would cause a crash if they are from the wrong position (i.e. for me, a mask identifying files with passed pawns was a critical one that would cause an infinite eval loop if there was no pawn on the file the mask claimed there was. I fixed this (at some cost) when I tested with the 8-byte hashing idea late last year, but the lockless XOR fixes it simply.

lockless hashing

Re: lockless hashing

Re: lockless hashing

Re: lockless hashing

Re: lockless hashing

Re: lockless hashing

Re: lockless hashing

Re: lockless hashing

Re: lockless hashing

Re: lockless hashing

Re: lockless hashing