OK, let's do this technically.syzygy wrote:You don't see a problem, yet you need to argue that a good compiler won't do this anti-optimization.bob wrote:Don't see any problem at all. Doesn't change a thing. Next line after exiting the loop access word1, so it has to load word 1 anyway. This is a bad optimization, btw, it saves nothing whatsoever and adds the extra code to load word one after the loop breaks rather than within the loop. Don't need that kind of anti-optimization and I don't believe any good compiler would choose to do something so broken...syzygy wrote:I can only assume you did not read.bob wrote:Doesn't break a thing, however.syzygy wrote:Unless I'm mistaken, the C standard allows the compiler to load a value multiple times. Since the C99 standard does not know about threads, it may load the hash entry, do the xor-trick to verify that the entry can be used, and load the hash entry again. Between the two loads the hash entry may change its content due to other threads. There you go.
Think about it.
A compiler can legally rewrite:asCode: Select all
word1 = htable->word1; word2 = htable->word2 ^ word1; if (word2 == temp_hashkey) break;
See the problem?Code: Select all
word2 = htable->word2 ^ htable->word1; if (word2 == temp_hashkey) break; word1 = htable->word1;
I suppose this means that you DO see the problem?
So this is what I should have written:Compare to your hash.c.Code: Select all
word2 = htable->word2 ^ htable->word1; if (word2 == temp_hashkey) break; } if (entry < 4) { word1 = htable->word1; ...
This rewrite is legal. The compiler may take your code and effectively produce the above code.
This rewrite breaks lockless hashing.
On architectures with few registers, a compiler may actually do this to avoid having to spill word1 onto the stack. Why store word1 on the stack when it can just as easily be reloaded from htable->word1?
Whether we can find an existing compiler for x86 that actually does this is not so interesting. What matters is that your code is broken.
And you really do not realise that reloading defeats the purpose of the xor-trick?Again, won't change a thing. I could even FORCE it to re-read it later by referring to it as volatile. But I don't care...![]()
![]()
Here is the actual code, again:
for (entry = 0; entry < 4; entry++, htable++) {
word1 = htable->word1;
word2 = htable->word2 ^ word1;
if (word2 == temp_hashkey)
break;
}
...
...
...
use word1 to extract draft and such later, as in right here.
A compiler has three choices. I'll be the compiler and "explain" what I see and what I will do.
choice 1. Before I can do ANYTHING inside that loop, I have to read htable->word1 AND table->word2. I now have them in registers. I do the xor, and the comparison, and if it matches I break out of the loop. At this point, htable->word1, htable->word2, and htable itself are all in registers. I now notice there is no future references to either htable->word2 or htable and I free up those registers. I keep the register with htable->word1 for use when the code extracts draft and such. Two memory accesses to the table, period.
choice 2. Registers are tight. Again I have to load htable->word1 and htable->word2 into registers. but when I exit the loop, I can't keep htable->word1 in a register, so I spill it to local memory (variable named just word1 in my code) When I need that value, I load it again from memory. Three memory reads, rather than two. Slower.
choice 3. The choice you think can happen, by the way. Registers are tight, so again I load htable->word1 and htable->word2 in registers to do the comparison. But when I get the match, I can't save word1, and I don't spill it to the local data, I will just reload from scratch when it is needed. I now need it. I obviously do NOT have htable in a register, since it is NOT needed after the loop, otherwise I would have simply put word1 in that register and kept it instead. So I go back to the loop and when I finish I spill htable to a local variable, and just drop the other two. Now I need word1. I first go to memory to get the pointer (stored in htable) and then use that to go to memory to fetch htable->word1. How does that compare? two memory reads as always, a memory write, two MORE memory reads. That is slower than if it had just done EXACTLY what my code says do, as is done with -O0.
It is an anti-optimization. It makes the code slower. Choice one is what is actually done by gcc, and you can verify that easily enough with -S as I did. On a 32 bit box, choice 2 might happen since there are fewer registers (8 less) to use. Choice 3 will NEVER happen because the purpose of the optimizer is not to make the code slower.
Now how about stopping with "the compiler could do this or could do that." It could actually do most anything, but in most of such cases it is broken and will eventually be fixed. This would be one of those cases.
The rewrite never happens. Although I could force it by just using htable->word1 EVERYWHERE, AND declaring that word1 is volatile. Then it has to do exactly what you suggest, and it also runs slower. But I didn't declare it volatile. And rewriting as you suggest is simply broken because it makes the code slower than if compiled with no optimization at all...
I suppose if you want to hang your hat on a compiler bug, then yes, lockless hashing can be broken. As can a truly locked implementation, because compilers have been known to move the code that acquires or clears locks around to make the code faster. That will also break a locked hashing approach. But who cares? I can't do a thing about compiler bugs. I can only plan on what is reasonable. And what you suggest is not going to happen, for the reasons I gave.