How could a compiler break the lockless hashing method?

rbarreira · Post by **rbarreira** » Tue Dec 10, 2013 7:16 pm

hgm wrote:
mvk wrote:I think you are confusing an unknown/undefined value with undefined behaviour. The latter is a technical term with a specific, well-defined, meaning. Reading a volatile variable isn't UB.
Well, it was said somewhere that race conditions are mentioned in the standard as a form of undefined behavior, and multiple threads doing reads and writes to the same memory is definitely able to produce race conditions. So if the compiler would see 'volatile' as evidence that some other thread or hardware might be altering the values, and it sees non-atomic accesses, it cannot exclude race conditions would occur. And it could grab that as an excuse to completely delete the accesses from the program, or make a call to fdisk, or whatever.

I don't think race conditions are undefined behavior in the C standard. They're just ignored by it, which is not the same thing.

rbarreira · Post by **rbarreira** » Tue Dec 10, 2013 7:39 pm

Yeah that came out wrong. What I should have said is that the mere potential for a data race is not undefined. Only an actual race is.

AlvaroBegue · Post by **AlvaroBegue** » Tue Dec 10, 2013 7:42 pm

rbarreira wrote:
hgm wrote:
mvk wrote:I think you are confusing an unknown/undefined value with undefined behaviour. The latter is a technical term with a specific, well-defined, meaning. Reading a volatile variable isn't UB.
Well, it was said somewhere that race conditions are mentioned in the standard as a form of undefined behavior, and multiple threads doing reads and writes to the same memory is definitely able to produce race conditions. So if the compiler would see 'volatile' as evidence that some other thread or hardware might be altering the values, and it sees non-atomic accesses, it cannot exclude race conditions would occur. And it could grab that as an excuse to completely delete the accesses from the program, or make a call to fdisk, or whatever.
I don't think race conditions are undefined behavior in the C standard. They're just ignored by it, which is not the same thing.

The draft of the C11 standard, under "5.1.2.4 Multi-threaded executions and data races", says

25 The execution of a program contains a data race if it contains two conflicting actions in
different threads, at least one of which is not atomic, and neither happens before the
other. Any such data race results in undefined behavior.

rbarreira · Post by **rbarreira** » Tue Dec 10, 2013 7:47 pm

AlvaroBegue wrote:
rbarreira wrote:
hgm wrote:
mvk wrote:I think you are confusing an unknown/undefined value with undefined behaviour. The latter is a technical term with a specific, well-defined, meaning. Reading a volatile variable isn't UB.
Well, it was said somewhere that race conditions are mentioned in the standard as a form of undefined behavior, and multiple threads doing reads and writes to the same memory is definitely able to produce race conditions. So if the compiler would see 'volatile' as evidence that some other thread or hardware might be altering the values, and it sees non-atomic accesses, it cannot exclude race conditions would occur. And it could grab that as an excuse to completely delete the accesses from the program, or make a call to fdisk, or whatever.
I don't think race conditions are undefined behavior in the C standard. They're just ignored by it, which is not the same thing.
The draft of the C11 standard, under "5.1.2.4 Multi-threaded executions and data races", says
25 The execution of a program contains a data race if it contains two conflicting actions in
different threads, at least one of which is not atomic, and neither happens before the
other. Any such data race results in undefined behavior.

Yeah I was totally wrong, see my other reply.

In light of that, I may have to backtrack on my claim that volatile completely fixes lockless hashing (if that's what I said). But it does prevent the xor-cancellation optimization.

bob · Post by **bob** » Tue Dec 10, 2013 8:17 pm

Daniel Shawul wrote:I wonder what the innovation is with the lockless hashing scheme. Accidental data corruption, be it due to concurrent data modification as is the case here ,or data transfer through a noisy network etc, can be checked and handled via cyclic redundancy checks (CRC). Ofcourse this will not prevent deliberate attacks. But it seems to me the lockless hashing scheme is pretty much the same as a simple CRC check that XORs hash signatures of the data and appends the key at the end. This is especially obvious when the hash entry has more than 128 bits of data, that requires a a chain of XORing. So what is the difference?

It is simple, requiring 1 cycle (xor is very fast). It prevents corruption when you have two different pieces of two different entries that appear to belong together but do not.

It is not earth-shattering. It is just a simple solution to an aggravating problem.

bob · Post by **bob** » Tue Dec 10, 2013 8:20 pm

hgm wrote:So what is the bottom line here? Is declaring the hash table entry as 'volatile' sufficient guarantee against the compiler subverting the code?

(Of course I would write the code as making a one-time copy of the entry, testing that one for validity, and using it when it passed:
Code: Select all
HashEntry entry = hashTable[index^i];
for(i=0; i<4; i++) {
  if((entry.signature ^ entry.data) == hashKey) break;
}
...
draft = entry.data.draft;
... // etc.
)

I think volatile will make it worse. The problem is the time span between fetching one of the words and the other. Each time you fetch the second again, the time span has increased, allowing another thread to store over that value. It doesn't cause any significant errors if that does happen, but lockless hashing was developed to eliminate most of those.

If the compiler is stupid, anything can happen. None I have checked in the past two days do anything other than what my code tells them to, load both words back-to-back and then not reload them from the shared hash table again, period.

bob · Post by **bob** » Tue Dec 10, 2013 8:23 pm

rbarreira wrote:
hgm wrote:
mvk wrote:I think you are confusing an unknown/undefined value with undefined behaviour. The latter is a technical term with a specific, well-defined, meaning. Reading a volatile variable isn't UB.
Well, it was said somewhere that race conditions are mentioned in the standard as a form of undefined behavior, and multiple threads doing reads and writes to the same memory is definitely able to produce race conditions. So if the compiler would see 'volatile' as evidence that some other thread or hardware might be altering the values, and it sees non-atomic accesses, it cannot exclude race conditions would occur. And it could grab that as an excuse to completely delete the accesses from the program, or make a call to fdisk, or whatever.
I don't think race conditions are undefined behavior in the C standard. They're just ignored by it, which is not the same thing.

Someone posted an excerpt from a recent version of the standard that specifically called races "undefined behavior".

Which I find amusing because no compiler on the planet can even recognize a race condition at compile time...

bob · Post by **bob** » Tue Dec 10, 2013 8:25 pm

rbarreira wrote:
AlvaroBegue wrote:
rbarreira wrote:
hgm wrote:
mvk wrote:I think you are confusing an unknown/undefined value with undefined behaviour. The latter is a technical term with a specific, well-defined, meaning. Reading a volatile variable isn't UB.
Well, it was said somewhere that race conditions are mentioned in the standard as a form of undefined behavior, and multiple threads doing reads and writes to the same memory is definitely able to produce race conditions. So if the compiler would see 'volatile' as evidence that some other thread or hardware might be altering the values, and it sees non-atomic accesses, it cannot exclude race conditions would occur. And it could grab that as an excuse to completely delete the accesses from the program, or make a call to fdisk, or whatever.
I don't think race conditions are undefined behavior in the C standard. They're just ignored by it, which is not the same thing.
The draft of the C11 standard, under "5.1.2.4 Multi-threaded executions and data races", says
25 The execution of a program contains a data race if it contains two conflicting actions in
different threads, at least one of which is not atomic, and neither happens before the
other. Any such data race results in undefined behavior.
Yeah I was totally wrong, see my other reply.

In light of that, I may have to backtrack on my claim that volatile completely fixes lockless hashing (if that's what I said). But it does prevent the xor-cancellation optimization.

The compiler can't do the xor-cancellation anyway. This is a global variable. The compiler can not see all the other places that access the variable. Removing an xor in one place would break all the others. The compiler can only do such optimizations if it can see the variable for its entire life-time, which in C it can't do due to separate files.

bob · Post by **bob** » Tue Dec 10, 2013 8:28 pm

Daniel Shawul wrote:
I used to think of it as something different until I needed it for bigger data structures in nebiyu
I also belive that the lockless hashing scheme as described in the paper was meant for 128bit hash entry, not something claimed by mouth. If indeed the latter was true, the connection with CRC is unmissable. With two 64-bits it is easy to miss the connection, otherwise it is mandatory to have a reference to an exisitng CRC literature that it is virtually identical to. 'You can't have your cake and eat it too' describes what is going on here well.

Not sure what you mean. I use it also for my pawn hash table which is far longer than 128 bytes per entry...

Aaron Becker · Post by **Aaron Becker** » Tue Dec 10, 2013 8:35 pm

bob wrote:
rbarreira wrote:
hgm wrote:
mvk wrote:I think you are confusing an unknown/undefined value with undefined behaviour. The latter is a technical term with a specific, well-defined, meaning. Reading a volatile variable isn't UB.
Well, it was said somewhere that race conditions are mentioned in the standard as a form of undefined behavior, and multiple threads doing reads and writes to the same memory is definitely able to produce race conditions. So if the compiler would see 'volatile' as evidence that some other thread or hardware might be altering the values, and it sees non-atomic accesses, it cannot exclude race conditions would occur. And it could grab that as an excuse to completely delete the accesses from the program, or make a call to fdisk, or whatever.
I don't think race conditions are undefined behavior in the C standard. They're just ignored by it, which is not the same thing.
Someone posted an excerpt from a recent version of the standard that specifically called races "undefined behavior".

Which I find amusing because no compiler on the planet can even recognize a race condition at compile time...

Well, what would you call them if not "undefined behavior"? The standard has to dictate what will happen, but the actions aren't ordered with respect to each other and we can't guarantee that the outcome will be the same as if one preceded the other, because it's not true.

It's not that the compiler detects undefined behavior and then goes off and does something insane on purpose. It's a warning that the results of unordered writes to memory by different threads aren't guaranteed to have any particular result and your ideas about what's correct or reasonable in those cases might not correspond to what actually happens.

How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?