hgm wrote: Or is the problem the specific way it is programmed in Crafty, not making a copy, but following the pointer twice, relying on the optimizer having made the copy in a register anyway? The latter seems risky. Doing a second memory (= L1 cache) access is actually the fastest way to do this on many x86 CPUs; code that only uses architectural registers often runs very slowly there, and you seem to need a mix of memory operands and register operands to reach maximum throughput. If I were to hand-optimize the code, I would probably opt for a second memory access, especially if the 64-bit data part was in a register but I needed only the single byte of it that stores the draft. Just reading the required byte from memory would definitely consume fewer CPU resources than shifting and masking the value stored in the register.
Yes, that was the point I was trying to make about L1 being so fast -- the compiler likely knows that reading from it a second time is quite cheap.
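To make that concrete, here is a minimal sketch in C. The entry layout and field names (a 64-bit key plus a 64-bit data word with the draft in bits 48..55) are assumptions for illustration, not Crafty's actual layout, and the byte-load variant additionally assumes a little-endian target.

[code]
#include <stdint.h>

/* Hypothetical packed hash entry: 64-bit key plus a 64-bit data
   word, with the draft assumed to live in bits 48..55.           */
typedef struct {
    uint64_t key;
    uint64_t data;
} HashEntry;

/* Extract the draft from a data word already held in a register:
   costs a shift and a mask.                                       */
static inline int draft_from_register(uint64_t data) {
    return (int)((data >> 48) & 0xFF);
}

/* Re-read just the one byte from memory instead: the entry was
   touched a moment ago, so this is almost certainly an L1 hit,
   and no shift/mask work is needed (little-endian assumed).       */
static inline int draft_from_memory(const HashEntry *e) {
    return ((const unsigned char *)&e->data)[6];
}
[/code]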
But without the volatile, even making a copy of the hash entry into a local variable may not protect you from compiler optimizations. By alias analysis, the compiler knows that nothing else has written to the original hash slot (it considers only the "single-threaded" code you are executing and isn't aware of other threads). It might choose to keep each member of the structure in a register, so the "copy" consists of read(s) from the hash table into registers. It might then decide to split the live range of those registers and/or spill them to the stack. At that point it will notice that it can re-create the value by reading it straight from the original memory, and if it does that it doesn't have to actually spill the register (it saves a store by rematerializing from the original location instead of loading from the local variable on the stack). Since it knows the original location was recently accessed, it knows it will hit in L1. It only has to preserve the abstract C machine behavior for a "single-threaded" machine; it doesn't know or care that you might actually have other threads. It can easily turn the one read in your source code into two or more reads in the machine code.
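A sketch of the pattern being described, with hypothetical names (hash_table, HashEntry, and the key field assumed to store signature ^ data as in the XOR trick); the comment marks where the compiler is allowed to turn the single source-level copy into multiple reads of the table:

[code]
#include <stdint.h>

typedef struct {
    uint64_t key;    /* assumed to hold signature ^ data (XOR trick) */
    uint64_t data;
} HashEntry;

extern HashEntry hash_table[];   /* shared between threads, NOT volatile */

/* Looks safe: copy the entry into a local, then work on the copy. */
int probe_copy(uint64_t signature, uint64_t idx, uint64_t *data_out) {
    HashEntry local = hash_table[idx];

    /* As far as alias analysis can tell, nothing has written to
       hash_table[idx] since the copy, so under register pressure
       the compiler may rematerialize local.key or local.data by
       reloading from the table instead of spilling to the stack.
       If another thread rewrites the entry in between, the data
       that passed the XOR check and the data stored to *data_out
       can come from two different versions of the entry.          */
    if ((local.key ^ local.data) == signature) {
        *data_out = local.data;
        return 1;
    }
    return 0;
}
[/code]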
Volatile is at least a partial solution, and since the hash really is accessed by multiple threads, marking it as volatile seems like a good idea.
Volatile means two things: (1) the compiler can't delete accesses that were in your source code or insert accesses that weren't (so it can't merge them or split them [edit: in particular, it can't hoist them out of loops!]), and (2) it can't re-order volatile accesses with respect to OTHER volatile accesses in your source code. Note that it can and will re-order them with respect to other computations, including non-volatile loads and stores! But marking the hash as volatile should prevent any of the compiler shenanigans I can think of that would break the XOR trick.
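For completeness, a hedged sketch of the same probe with the table volatile-qualified (names and layout again assumed, not Crafty's actual code). The point is just that each source-level read now maps to exactly one machine-level read, so an entry overwritten between the two loads fails the XOR check and is treated as a miss:

[code]
#include <stdint.h>

typedef struct {
    uint64_t key;    /* assumed to hold signature ^ data (XOR trick) */
    uint64_t data;
} HashEntry;

extern volatile HashEntry hash_table[];   /* shared between threads */

int probe_volatile(uint64_t signature, uint64_t idx, uint64_t *data_out) {
    /* Each of these is one volatile read: the compiler may neither
       drop it, duplicate it, nor "refresh" the locals from the
       table later on.                                              */
    uint64_t local_key  = hash_table[idx].key;
    uint64_t local_data = hash_table[idx].data;

    /* If another thread overwrote the entry between the two loads,
       the mixed key/data pair (almost certainly) fails this check,
       so the torn entry is simply treated as a hash miss.          */
    if ((local_key ^ local_data) == signature) {
        *data_out = local_data;
        return 1;
    }
    return 0;
}
[/code]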
At least, as long as your compiler implements volatile correctly...
