How could a compiler break the lockless hashing method?


syzygy
Posts: 5713
Joined: Tue Feb 28, 2012 11:56 pm

Re: How could a compiler break the lockless hashing method?

Post by syzygy »

rbarreira wrote:Yeah I was totally wrong, see my other reply.
I think your other reply is not correct either, depending on what you mean by "the mere potential for a data race" and "an actual race". The C11 definition of "data race" has a considerable amount of "mere potential" in it.
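For concreteness, a minimal sketch of what the C11 definition covers, assuming POSIX threads (all names here are hypothetical, not from the thread): two threads access the same non-atomic object with no synchronisation establishing an order between them. That already constitutes a data race, and the behaviour is undefined regardless of which value the read happens to observe, which is where the "mere potential" comes in.

Code: Select all

#include <pthread.h>
#include <stdio.h>

static int shared;                /* plain int: not _Atomic, no lock */

static void *writer(void *arg) {
    shared = 42;                  /* conflicting write in the second thread */
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, writer, NULL);
    printf("%d\n", shared);       /* unsynchronised read in the first thread: data race */
    pthread_join(t, NULL);
    return 0;
}
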
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How could a compiler break the lockless hashing method?

Post by bob »

syzygy wrote:
rbarreira wrote:Yeah I was totally wrong, see my other reply.
I think your other reply is not correct either, depending on what you mean by "the mere potential for a data race" and "an actual race". The C11 definition of "data race" has a considerable amount of "mere potential" in it.
Seems to be irrelevant since the compiler can't see races anyway...
syzygy
Posts: 5713
Joined: Tue Feb 28, 2012 11:56 pm

Re: How could a compiler break the lockless hashing method?

Post by syzygy »

bob wrote:
hgm wrote:So what is the bottom line here? Is declaring the hash table entry as 'volatile' sufficient guarantee against the compiler subverting the code?

(Of course I would write the code as making a one-time copy of the entry, testing that one for validity, and using it when it passed:

Code: Select all

HashEntry entry;
for(i=0; i<4; i++) {
  entry = hashTable[index^i]; /* one-time copy of the probed slot */
  if((entry.signature ^ entry.data) == hashKey) break;
}
...
draft = entry.data.draft;
... // etc.

)
I think volatile will make it worse. The problem is the time span between fetching one of the words and the other. Each time you fetch the second again, the time span has increased, allowing another thread to store over that value. It doesn't cause any significant errors if that does happen, but lockless hashing was developed to eliminate most of those.
For the xor-trick to work correctly, word1 and word2 should only be fetched once. It seems making the entry volatile would indeed achieve that. I do not see how making the entry volatile could negatively affect performance (unless it happens to be more efficient to load twice, but clearly we don't want that to happen even if it were faster).

In general making a variable volatile can of course negatively affect efficiency, but in this specific situation, where you really want to load the entry only once and you only write it once, I don't see it.

If the entries are made volatile, I don't see anymore how a compiler could break the xor-trick. The code might still formally not be in compliance with the standard (in this case the pthread standard, because C99 is silent on it and you don't use C11), but I don't see how an optimisation could break it.
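A minimal sketch of that reading, with hypothetical names and a hypothetical two-word layout (word1 holding the data, word2 holding key ^ data; the real entry layout may differ): because the pointer is volatile-qualified, each source-level read of a word is exactly one load in the generated code, so the xor validation and the data it guards come from the same pair of loads.

Code: Select all

#include <stdint.h>

typedef struct {
    uint64_t word1;               /* data (hypothetical layout) */
    uint64_t word2;               /* hash key ^ data            */
} HashEntry;

/* Returns 1 and fills *data on a hit, 0 otherwise.  The volatile
   qualifier stops the compiler from splitting or repeating the loads:
   word1 and word2 are each fetched from the table exactly once. */
static int probe(volatile const HashEntry *table, uint64_t mask,
                 uint64_t key, uint64_t *data) {
    volatile const HashEntry *h = &table[key & mask];
    uint64_t w1 = h->word1;       /* single load of word1 */
    uint64_t w2 = h->word2;       /* single load of word2 */
    if ((w1 ^ w2) == key) {       /* validate the local copies */
        *data = w1;
        return 1;
    }
    return 0;
}
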
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How could a compiler break the lockless hashing method?

Post by bob »

Aaron Becker wrote:
bob wrote:
rbarreira wrote:
hgm wrote:
mvk wrote:I think you are confusing an unknown/undefined value with undefined behaviour. The latter is a technical term with a specific, well-defined, meaning. Reading a volatile variable isn't UB.
Well, it was said somewhere that race conditions are mentioned in the standard as a form of undefined behavior, and multiple threads doing reads and writes to the same memory is definitely able to produce race conditions. So if the compiler would see 'volatile' as evidence that some other thread or hardware might be altering the values, and it sees non-atomic accesses, it cannot exclude race conditions would occur. And it could grab that as an excuse to completely delete the accesses from the program, or make a call to fdisk, or whatever.
I don't think race conditions are undefined behavior in the C standard. They're just ignored by it, which is not the same thing.
Someone posted an excerpt from a recent version of the standard that specifically called races "undefined behavior".

Which I find amusing because no compiler on the planet can even recognize a race condition at compile time...
Well, what would you call them if not "undefined behavior"? For the behavior to be defined, the standard would have to dictate what happens, but the actions aren't ordered with respect to each other, and we can't guarantee that the outcome will be the same as if one preceded the other, because that simply isn't true.

It's not that the compiler detects undefined behavior and then goes off and does something insane on purpose. It's a warning that the results of unordered writes to memory by different threads aren't guaranteed to have any particular result and your ideas about what's correct or reasonable in those cases might not correspond to what actually happens.
The problem is that "some" seem to think that "undefined behavior" means the compiler can do anything it wants. Abort the program. Ignore the operation and optimize it out. As opposed to "just do the right thing" where the race may well be perfectly acceptable with no risk at all.

You are taking it as the latter, which I agree with. But supposedly if the compiler could actually recognize a true race, which it can't, then it could simply optimize that code out of the program entirely, which is NOT a good thing.
syzygy
Posts: 5713
Joined: Tue Feb 28, 2012 11:56 pm

Re: How could a compiler break the lockless hashing method?

Post by syzygy »

bob wrote:Which I find amusing because no compiler on the planet can even recognize a race condition at compile time...
What is amusing is that you still think (or maybe don't think, but claim, for the sake of trolling) that defining something as UB only makes sense if it can be detected by the compiler.
hgm
Posts: 28386
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: How could a compiler break the lockless hashing method?

Post by hgm »

Aaron Becker wrote:It's not that the compiler detects undefined behavior and then goes off and does something insane on purpose.
Well, this statement is a bit at odds with what was seen in the strcpy thread. There it was taken as normal / self-evident that the compiler would intentionally do something insane, like exiting with an obscure error message.

I would not be very happy if my program exited any time I coded a non-atomic write to a volatile variable. A language / compiler that did that would not be very useful to me...

It was shown in an actual example that there already exist compilers that think it quite normal to insert an empty infinite loop in a program when they think the loop counter would suffer int overflow. So it seems insanity rules.
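A hedged illustration of the kind of loop meant here (a generic example, not the one from that discussion): since signed overflow is undefined, an optimising compiler may treat the loop condition as always true and emit an empty infinite loop.

Code: Select all

/* The only way "i > 0" could become false is by signed overflow of
   "i *= 2", which is undefined behaviour.  An optimising compiler may
   therefore assume the condition always holds and compile this into
   an infinite loop. */
int count_doublings(void) {
    int n = 0;
    for (int i = 1; i > 0; i *= 2)
        n++;
    return n;
}
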
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How could a compiler break the lockless hashing method?

Post by bob »

syzygy wrote:
bob wrote:
hgm wrote:So what is the bottom line here? Is declaring the hash table entry as 'volatile' sufficient guarantee against the compiler subverting the code?

(Of course I would write the code as making a one-time copy of the entry, testing that one for validity, and using it when it passed:

Code: Select all

HashEntry entry;
for(i=0; i<4; i++) {
  entry = hashTable[index^i]; /* one-time copy of the probed slot */
  if((entry.signature ^ entry.data) == hashKey) break;
}
...
draft = entry.data.draft;
... // etc.

)
I think volatile will make it worse. The problem is the time span between fetching one of the words and the other. Each time you fetch the second again, the time span has increased, allowing another thread to store over that value. It doesn't cause any significant errors if that does happen, but lockless hashing was developed to eliminate most of those.
For the xor-trick to work correctly, word1 and word2 should only be fetched once. It seems making the entry volatile would indeed achieve that. I do not see how making the entry volatile could negatively affect performance (unless it happens to be more efficient to load twice, but clearly we don't want that to happen even if it were faster).

In general making a variable volatile can of course negatively affect efficiency, but in this specific situation, where you really want to load the entry only once and you only write it once, I don't see it.

If the entries are made volatile, I don't see anymore how a compiler could break the xor-trick. The code might still formally not be in compliance with the standard (in this case the pthread standard, because C99 is silent on it and you don't use C11), but I don't see how an optimisation could break it.
Depends on whether you use compiler knowledge or not. I explicitly used local variables. You don't have to, because without volatile, the compiler will STILL read from the tt once. If you use volatile, it will be problematic if you have multiple references to htable->word1 or word2 because those can't be optimized away using a register. So it depends on how you write the code.
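A small sketch of the two styles being contrasted, using the identifiers from the surrounding posts but a hypothetical two-word entry layout and a placeholder field extraction:

Code: Select all

#include <stdint.h>

typedef struct { uint64_t word1, word2; } HASH_ENTRY;   /* hypothetical layout */

/* Style A: every htable->word1 / htable->word2 in the source is a separate
   load (volatile forbids caching the value in a register), so the word used
   after the test may differ from the word that passed the xor check. */
int probe_repeated(volatile HASH_ENTRY *htable, uint64_t temp_hashkey) {
    if ((htable->word1 ^ htable->word2) == temp_hashkey)
        return (int)(htable->word1 & 0xff);   /* third load; placeholder extraction */
    return -1;
}

/* Style B: read each word exactly once into locals, then work from registers. */
int probe_once(volatile HASH_ENTRY *htable, uint64_t temp_hashkey) {
    uint64_t word1 = htable->word1;
    uint64_t word2 = htable->word2;
    if ((word1 ^ word2) == temp_hashkey)
        return (int)(word1 & 0xff);           /* no further memory traffic */
    return -1;
}
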
syzygy
Posts: 5713
Joined: Tue Feb 28, 2012 11:56 pm

Re: How could a compiler break the lockless hashing method?

Post by syzygy »

hgm wrote:
Aaron Becker wrote:It's not that the compiler detects undefined behavior and then goes off and does something insane on purpose.
Well, this statement is a bit at odds with what was seen in the strcpy thread. There it was taken as normal / self-evident that the compiler would intentionally do something insane, like exiting with an obscure error message.
The compiler is allowed to do that (i.e. it would not violate the standard). The compiler is not obliged to do it. If the compiler doesn't detect a particular instance of UB, it will obviously not specifically decide to format your hard drive. But the UB may still cause strange things to happen, which may include your hard drive being formatted, because the compiler only needs to produce code that works correctly in the absence of UB. The UB may cause any kind of random behavior.
I would not be very happy if my program exited any time I coded a non-atomic write to a volatile variable.
That is not UB. (If another thread also accesses the variable and you're not using synchronisation primitives that guarantee that the two accesses are in a program-defined order, then you do have UB.)
It was shown in an actual example that there already exist compilers that think it quite normal to insert an empty infinite loop in a program when they think the loop counter would suffer int overflow. So it seems insanity rules.
It also allows for useful optimisations. For example, because signed overflow "does not happen", the compiler "knows" that certain values are positive and does not have to sign extend when converting between integer types.

Anyway, maybe you don't want to use such a compiler, but maybe your code will still one day be compiled on such a compiler. So it still might be a good idea to not have overflowing signed integers in your code.
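A sketch of the kind of optimisation meant (a generic example, not from the thread): on a 64-bit target the compiler may widen the signed 32-bit index to a 64-bit register and drop the per-iteration sign extension, because a signed index "cannot" wrap.

Code: Select all

/* Because overflow of the signed index "does not happen", the compiler may
   keep i in a 64-bit register and address a[i] without re-sign-extending it
   on every iteration.  With an unsigned index, wraparound is defined
   behaviour, so the compiler cannot make the same blanket assumption. */
double sum(const double *a, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}
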
syzygy
Posts: 5713
Joined: Tue Feb 28, 2012 11:56 pm

Re: How could a compiler break the lockless hashing method?

Post by syzygy »

bob wrote:
syzygy wrote:
bob wrote:
hgm wrote:So what is the bottom line here? Is declaring the hash table entry as 'volatile' sufficient guarantee against the compiler subverting the code?

(Of course I would write the code as making a one-time copy of the entry, testing that one for validity, and using it when it passed:

Code: Select all

HashEntry entry;
for(i=0; i<4; i++) {
  entry = hashTable[index^i]; /* one-time copy of the probed slot */
  if((entry.signature ^ entry.data) == hashKey) break;
}
...
draft = entry.data.draft;
... // etc.

)
I think volatile will make it worse. The problem is the time span between fetching one of the words and the other. Each time you fetch the second again, the time span has increased, allowing another thread to store over that value. It doesn't cause any significant errors if that does happen, but lockless hashing was developed to eliminate most of those.
For the xor-trick to work correctly, word1 and word2 should only be fetched once. It seems making the entry volatile would indeed achieve that. I do not see how making the entry volatile could negatively affect performance (unless it happens to be more efficient to load twice, but clearly we don't want that to happen even if it were faster).

In general making a variable volatile can of course negatively affect efficiency, but in this specific situation, where you really want to load the entry only once and you only write it once, I don't see it.

If the entries are made volatile, I don't see anymore how a compiler could break the xor-trick. The code might still formally not be in compliance with the standard (in this case the pthread standard, because C99 is silent on it and you don't use C11), but I don't see how an optimisation could break it.
Depends on whether you use compiler knowledge or not. I explicitly used local variables. You don't have to, because without volatile, the compiler will STILL read from the tt once. If you use volatile, it will be problematic if you have multiple references to htable->word1 or word2 because those can't be optimized away using a register. So it depends on how you write the code.
I assume the code is written by someone using his brain, yes. The point that volatile fixes the issue should already have been clear, but I know you're in a particular, let's say undefined, mode.

Code: Select all

--- hash.c.old	2013-12-10 21:31:28.130379051 +0100
+++ hash.c	2013-12-10 21:25:57.153319362 +0100
@@ -58,7 +58,7 @@
  */
 int HashProbe(TREE * RESTRICT tree, int ply, int depth, int wtm, int alpha,
     int beta, int *value) {
-  HASH_ENTRY *htable;
+  volatile HASH_ENTRY *htable;
   HPATH_ENTRY *ptable;
   uint64_t word1, word2, temp_hashkey;
   int type, draft, avoid_null = 0, val, entry, i, j;
Fixed.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How could a compiler break the lockless hashing method?

Post by bob »

syzygy wrote:
bob wrote:
syzygy wrote:
bob wrote:
hgm wrote:So what is the bottom line here? Is declaring the hash table entry as 'volatile' sufficient guarantee against the compiler subverting the code?

(Of course I would write the code as making a one-time copy of the entry, testing that one for validity, and using it when it passed:

Code: Select all

HashEntry entry;
for(i=0; i<4; i++) {
  entry = hashTable[index^i]; /* one-time copy of the probed slot */
  if((entry.signature ^ entry.data) == hashKey) break;
}
...
draft = entry.data.draft;
... // etc.

)
I think volatile will make it worse. The problem is the time span between fetching one of the words and the other. Each time you fetch the second again, the time span has increased, allowing another thread to store over that value. It doesn't cause any significant errors if that does happen, but lockless hashing was developed to eliminate most of those.
For the xor-trick to work correctly, word1 and word2 should only be fetched once. It seems making the entry volatile would indeed achieve that. I do not see how making the entry volatile could negatively affect performance (unless it happens to be more efficient to load twice, but clearly we don't want that to happen even if it were faster).

In general making a variable volatile can of course negatively affect efficiency, but in this specific situation, where you really want to load the entry only once and you only write it once, I don't see it.

If the entries are made volatile, I don't see anymore how a compiler could break the xor-trick. The code might still formally not be in compliance with the standard (in this case the pthread standard, because C99 is silent on it and you don't use C11), but I don't see how an optimisation could break it.
Depends on whether you use compiler knowledge or not. I explicitly used local variables. You don't have to, because without volatile, the compiler will STILL read from the tt once. If you use volatile, it will be problematic if you have multiple references to htable->word1 or word2 because those can't be optimized away using a register. So it depends on how you write the code.
I assume the code is written by someone using his brain, yes. The point that volatile fixes the issue should already have been clear, but I know you're in a particular, let's say undefined, mode.

Code: Select all

--- hash.c.old	2013-12-10 21:31:28.130379051 +0100
+++ hash.c	2013-12-10 21:25:57.153319362 +0100
@@ -58,7 +58,7 @@
  */
 int HashProbe(TREE * RESTRICT tree, int ply, int depth, int wtm, int alpha,
     int beta, int *value) {
-  HASH_ENTRY *htable;
+  volatile HASH_ENTRY *htable;
   HPATH_ENTRY *ptable;
   uint64_t word1, word2, temp_hashkey;
   int type, draft, avoid_null = 0, val, entry, i, j;
Fixed.
It is only fixed because of the way I wrote my code.

Just replace all occurrences of word1 with htable->word1 and you have not fixed anything at all. That was my point. I had specifically written the code to force the compiler to do it my way. And I later chose to not use volatile on the hash entry because there are lots of places that use the thing and I liked the idea of reading once and then using local variables or registers beyond that point...

As I said, if you pay attention to the details, this can work. If you just use htable->word1 and htable->word2 everywhere, volatile makes it worse, not better, because then the early reference cannot be maintained in a register for later use; you get a memory access each and every time.
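For completeness, a sketch of the matching store side under the same hypothetical layout used in the probe sketch above (word1 = data, word2 = key ^ data): if a probe picks up word1 from one store and word2 from another, the xor check will almost certainly fail, so the mixed-up entry is simply ignored rather than used.

Code: Select all

#include <stdint.h>

typedef struct { uint64_t word1, word2; } HashEntry;   /* hypothetical layout */

/* Store counterpart to the probe sketched earlier: word1 holds the data,
   word2 holds key ^ data.  A "torn" entry, with word1 and word2 from two
   different stores, fails the xor validation at probe time with
   overwhelming probability. */
static void hash_store(volatile HashEntry *table, uint64_t mask,
                       uint64_t key, uint64_t data) {
    volatile HashEntry *h = &table[key & mask];
    h->word1 = data;
    h->word2 = key ^ data;
}
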