How could a compiler break the lockless hashing method?
Moderator: Ras
-
- Posts: 5713
- Joined: Tue Feb 28, 2012 11:56 pm
Re: How could a compiler break the lockless hashing method?
bob wrote: So what is your point?

You have already shown my point, over and over and over and over again. It does not need repeating.
-
- Posts: 838
- Joined: Thu Jul 05, 2007 5:03 pm
- Location: British Columbia, Canada
Re: How could a compiler break the lockless hashing method?
bob wrote: 2. I am certainly betting that the compiler, in its "common subexpression" optimization pass, will realize that going BACK to memory is about a thousand times worse than recomputing a typical common subexpression. And it will NOT generate multiple memory accesses.

Okay, wait a second there -- if we're talking about a compiler that is compiling "single threaded" code, and it knows it just accessed the memory location recently, why would it NOT reload it from memory instead of recomputing the expression? L1 cache is extremely fast, and rematerialization might look cheaper to it than spilling, since it doesn't have to spill the value it loaded the first time if it just re-loads it. Compilers, even smart ones, don't always make optimal decisions about what to keep live in a register, and e.g. I think clang aggressively splits live ranges to try and make better use of registers around loops etc.

I mean, in this specific instance you might be right (I wouldn't bet against it), but I can at least imagine a theoretical compiler deciding, for perfectly sensible-seeming reasons, to load that value twice from memory instead of once. And if another thread mutated it in between, your XOR safety check might test the first value but the engine end up actually using the second.

Does the XOR trick work today, in practice, on the compilers it has been tried on? Yes it does. It's a clever and useful trick.

But is it guaranteed to keep working on future compilers? That answer is not so clear. My guess is 'no', but the risk seems minimal, at least for now. Until someone actually finds an example of a compiler breaking this code, it's hard to know how likely it is to happen in practice.
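A minimal sketch of the failure mode described above (hypothetical names, not Crafty's actual code; the hazard is the re-load a compiler is allowed to insert):

Code: Select all
#include <stdint.h>

typedef struct { uint64_t signature; uint64_t data; } HashEntry;

extern HashEntry tt[];                 /* shared table, written by other threads */
extern void use_entry(uint64_t data);

void probe(unsigned slot, uint64_t hashKey)
{
    uint64_t data = tt[slot].data;     /* a single load in the source */
    if ((tt[slot].signature ^ data) == hashKey) {
        /* Under register pressure, a compiler compiling this as
           single-threaded code may legally re-load tt[slot].data here
           instead of keeping 'data' live in a register. If another
           thread overwrote the entry in between, the XOR check tested
           the old value while this call receives the new one. */
        use_entry(data);
    }
}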
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: How could a compiler break the lockless hashing method?
syzygy wrote:
bob wrote: So what is your point?
You have already shown my point, over and over and over and over again. It does not need repeating.

In other words, "the empty set"?
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: How could a compiler break the lockless hashing method?
wgarvin wrote: Okay, wait a second there -- if we're talking about a compiler that is compiling "single threaded" code, and it knows it just accessed the memory location recently, why would it NOT reload it from memory instead of recomputing the expression? [...]

The expression is this:

Code: Select all
sam

which happens to be a memory reference. If a compiler sees this:

Code: Select all
a = a + sam;
b = b + sam;

it will load sam exactly one time, for obvious reasons. A memory reference is actually worse than a recomputed expression, particularly when it has to resort to L2, L3 or even main memory.
Nothing says whether the variable is or is not modified in another thread. If it is important to detect that change, one should add volatile to the declaration; then the above can no longer be optimized at all, and two memory loads are required even if there is no statement in between the two lines above.
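A minimal sketch of that difference (hypothetical declarations): with volatile, both source-level reads of sam must become separate loads, and the compiler may no longer merge them.

Code: Select all
extern volatile int sam;
extern int a, b;

void update(void)
{
    a = a + sam;   /* first load of sam: mandatory */
    b = b + sam;   /* second load of sam: also mandatory,
                      even with no statement in between */
}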
Most think of GCSE as things like

Code: Select all
extern int array[256];
extern int a, b;
int f(void)
{
  int x, y;
  x = (array[a] + array[b]) / a;
  y = (array[a] + array[b]) / b;
  return (x + y);
}

where array[a] + array[b] is an expression, and a good compiler will compute that expression once and use it twice (assuming no volatile, as you can see). But remove one of those two array references in both expressions, and the compiler still treats that as a GCSE (Global Common Subexpression Elimination) case and does the load once, which makes a lot of sense. You could make it cloudier by putting code in between those two lines, so that the compiler can no longer assume a cache hit on the second pair of loads, even...
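The single-load case just described might look like this (a sketch, reusing the hypothetical declarations from the example above):

Code: Select all
extern int array[256];
extern int a, b;

int g(void)
{
    int x, y;
    x = array[a] / a;   /* the load of array[a]...         */
    y = array[a] / b;   /* ...is itself treated as a CSE:  */
    return (x + y);     /* one load, used twice            */
}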
In lockless hashing there is ALWAYS a tiny timing window. I have to read/write two 8-byte values, and the two writes can not be combined into one with an atomic guarantee, since we are stuck with a 64-bit bus on Intel. That means I can always load one old value and one new value even on successive movq instructions, because there is a "break" between the two instructions where another CPU can write both values after I read the first but before I read the second.
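For readers following along, a sketch of the store/probe pair as described (simplified; a real entry packs more fields into the data word):

Code: Select all
#include <stdint.h>

typedef struct { uint64_t word1, word2; } Entry;

/* Store: two separate 8-byte writes; another thread can interleave
   between them, so word1 carries key ^ word2 as a consistency check. */
void tt_store(Entry *e, uint64_t key, uint64_t data)
{
    e->word1 = key ^ data;
    e->word2 = data;
}

/* Probe: two separate 8-byte reads. If a writer slipped in between
   them, the XOR no longer reproduces the key and the entry is simply
   rejected, which closes the timing window described above. */
int tt_probe(const Entry *e, uint64_t key, uint64_t *data_out)
{
    uint64_t w1 = e->word1;
    uint64_t w2 = e->word2;
    if ((w1 ^ w2) != key)
        return 0;               /* torn or mismatched entry: ignore */
    *data_out = w2;
    return 1;
}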
So what are the risks, exactly? If one does not do lockless hashing at all, I would expect to see mismatched pairs at maybe a rate of 100 out of every two billion nodes searched, using 8 GB of hash and an 8-core box. This is from actual measurements. Double the cores and that goes up quite a bit. Double the hash size and it drops by 50%, obviously.
The mismatched pairs only cause a problem if (a) the mismatched pair decodes to a hash signature that happens to match the signature of the position being probed. This is way out there in probability, around the same probability that two different positions hash to the same signature, which is already quite low. If (a) happens, then (b) we need the second part of the pair (with score and such) to have a draft that allows the entry to be used. Most don't, but there are lots of cases where this condition is met in normal hashing. If that condition is true, then (c) we need the table bound to be a value that helps us avoid work. If I have an UPPER entry and the current alpha value is +100, and this entry's bound is +200, that doesn't help at all.
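In code, that chain of filters reads roughly like this (a sketch with hypothetical names; each && is one of the low-probability conditions above):

Code: Select all
#include <stdint.h>

typedef struct { uint64_t word1, word2; } Entry;

/* Hypothetical decoders for the packed data word. */
extern int draft_of(uint64_t data);
extern int bound_helps(uint64_t data, int alpha, int beta);

/* Returns nonzero only if ALL of (a), (b) and (c) hold. */
int usable(Entry e, uint64_t hashKey, int depth, int alpha, int beta)
{
    return (e.word1 ^ e.word2) == hashKey      /* (a) 64-bit signature match */
        && draft_of(e.word2) >= depth          /* (b) usable draft           */
        && bound_helps(e.word2, alpha, beta);  /* (c) usable bound           */
}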
Bottom line is that a number of very low probability events have to happen before this entry even gets a chance to influence the final result. It is clearly lower than the probability of a 64-bit hash collision, because we require the 64-bit match + a usable draft AND a usable bound. If that happens, what is the result? Nothing worse than the original hash collision problem we all tolerate when we elect to use 64-bit signatures for a position that can not be uniquely represented using 64 bits. The paper Cozzie and I wrote shows that even one false match every 1000 nodes has a minimal effect on a program, whereas this is a false match every N billion nodes, where N is pretty big itself.
Bottom line: one can eliminate everything but the collisions by using locks. Or one can eliminate ALMOST everything but the collisions by using the lockless hash idea. Both produce the same result, but lockless has significantly lower overhead.

Nothing breaks it. You can remove the XORs in Crafty and it will continue to work. If you enable the "bad move from hash" warning, you will see more of those without lockless than with, but you might see 100 in 2 billion nodes. At 20M nodes per second, that is 1 per second at worst. If you enable lockless hashing it becomes more like maybe one every few hours, if ever. One hash error every 20M nodes won't touch the final score or move...

Which makes it a totally safe algorithm, so long as one has been careful with hash entries so that a bad move won't wreck things. Bad moves kill me, but I validate hash moves and killer moves to make sure I don't actually make an illegal move and corrupt things. Therefore, zero problems whatever stupid thing the compiler decides to do. Even removing the XORs just increases the frequency a bit, without any harmful effect.
-
- Posts: 28378
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: How could a compiler break the lockless hashing method?
So what is the bottom line here? Is declaring the hash table entry as 'volatile' sufficient guarantee against the compiler subverting the code?
Of course I would write the code as making a one-time copy of the entry, testing that one for validity, and using it when it passed:
Code: Select all
HashEntry entry;
for (i = 0; i < 4; i++) {
  entry = hashTable[index ^ i];   /* one-time copy of the probed slot */
  if ((entry.signature ^ entry.data) == hashKey) break;
}
...
draft = entry.data.draft;
... // etc.
-
- Posts: 900
- Joined: Tue Apr 27, 2010 3:48 pm
Re: How could a compiler break the lockless hashing method?
hgm wrote: So what is the bottom line here? Is declaring the hash table entry as 'volatile' sufficient guarantee against the compiler subverting the code?

Yes. With volatile, the compiler cannot assume that the memory stays unchanged, which is to say it cannot assume anything about that memory region at all.
-
- Posts: 28378
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: How could a compiler break the lockless hashing method?
So compilers are not allowed to 'reason' that the fact that the memory is volatile makes any (non-atomic) reading of it undefined, so that it would be much more efficient to replace it by a call to exit(), because that is probably what the programmer meant?
-
- Posts: 589
- Joined: Tue Jun 04, 2013 10:15 pm
Re: How could a compiler break the lockless hashing method?
hgm wrote: So compilers are not allowed to 'reason' that the fact that the memory is volatile makes any (non-atomic) reading of it undefined, so that it would be much more efficient to replace it by a call to exit(), because that is probably what the programmer meant?

I think you are confusing an unknown/undefined value with undefined behaviour. The latter is a technical term with a specific, well-defined meaning. Reading a volatile variable isn't UB.
-
- Posts: 900
- Joined: Tue Apr 27, 2010 3:48 pm
Re: How could a compiler break the lockless hashing method?
hgm wrote: So compilers are not allowed to 'reason' that the fact that the memory is volatile makes any (non-atomic) reading of it undefined, so that it would be much more efficient to replace it by a call to exit(), because that is probably what the programmer meant?

It's the other way around.
The point of volatile memory is to let the compiler know that entities external to the program can modify that memory and be affected by specific writes to that memory. So it's restricting the compiler, not giving it more freedom.
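A standard illustration of that restriction (hypothetical flag names):

Code: Select all
extern int plain_flag;
extern volatile int shared_flag;

void wait_for_other_thread(void)
{
    /* The compiler must re-load shared_flag on every iteration.  With
       plain_flag instead, it could legally hoist the load out of the
       loop and spin forever on a stale register value. */
    while (!shared_flag)
        ;
}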
-
- Posts: 28378
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: How could a compiler break the lockless hashing method?
OK, great. But then I don't understand what the issue of this thread is about. It seems to me that declaring variables that could be written by other threads as 'volatile' is quite natural, and actually the intended use of 'volatile'. So I cannot see any reason not to do it, and if you do it, there apparently cannot be any problem. Or is there a performance penalty in declaring things as 'volatile' (e.g. because it will try to make accesses to it atomic by introducing its own locks around them)?
Or is the problem the specific way it is programmed in Crafty: not making a copy, but following the pointer twice, relying on the optimizer having made the copy in a register anyway? The latter seems risky. Doing a second memory (= L1 cache) access is actually the fastest way to do this on many x86 CPUs; code that only uses architectural registers often runs very slowly there, and you seem to need a mix of memory operands and register operands to be able to reach maximum throughput. If I were to hand-optimize the code I would probably opt for a second memory access, especially if the 64-bit data part was in a register but I needed only a single byte of it that stored the draft. Just reading the required byte from memory would definitely consume fewer CPU resources than shifting and masking the value stored in the register.
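For what it's worth, the two alternatives described above might look like this (a sketch with a hypothetical field layout, little-endian x86 assumed):

Code: Select all
#include <stdint.h>

typedef struct { uint64_t signature, data; } HashEntry;

/* Keep the 64-bit word live and shift/mask the draft out of it. */
static inline int draft_from_register(uint64_t data)
{
    return (int)((data >> 48) & 0xFF);      /* hypothetical layout */
}

/* Re-read just the one byte from memory (an L1 hit in practice). */
static inline int draft_from_memory(const HashEntry *e)
{
    return ((const uint8_t *)&e->data)[6];  /* byte 6 = bits 48..55
                                               on little-endian x86 */
}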