bob wrote: So how about stopping with the hand waving and simply explain how to break a hash probe where I get an 8 byte signature that does not go with the corresponding 8 bytes of score and stuff. Then we can talk.

Rein Halbersma wrote: No, the burden of proof is on you. You claim that the compiler has no choice but to emit a sane set of instructions that give you the expected result. However, both the C11 and C++11 Standard leave no doubt that reads and writes from different threads to the same variable constitute undefined behavior.

I made no such claim. My claim is that the compiler is required to semantically do EXACTLY what I say.
I want to write w1 and w2 to memory, where w1 and w2 are 8 byte values. I simply write w1^w2 for the first value, and w2 for the second. If anything else gets written, I won't get a match, since the stored w1^w2 XORed with anything other than the original w2 (score/etc) will NOT produce the original signature w1.
It really is that simple. You are reading far more into this than there is. The only issue here occurs when two threads write to the same address: one writes {a1, a2} as the two 8 byte values, the other writes {b1, b2} as the two values. There are exactly 4 possible outcomes. After both complete the writes, memory contains one of the following pairs of words: {a1, a2}, {a1, b2}, {b1, a2}, {b1, b2}.
{a1, a2} will be recognized, as will {b1, b2}. The other two will not decode to valid signatures and no match occurs.
Now exactly how hard is that to grasp? If you have some bizarre hardware that writes byte by byte rather than double word by double word, it still works perfectly. If your hardware writes bit by bit, it STILL works perfectly.
The compiler can't eliminate ANYTHING. It won't even KNOW about the race.
Rein Halbersma wrote: You might get away with it for a long time on many different architectures, and probably no compiler will emit completely bogus instructions. The most likely failure scenario that I can think of is that the compiler will eliminate the offending code in its entirety. The code example posted by Ronald is very instructive.
    #include <stdio.h>

    int main()
    {
        int i, k = 0;
        /* i += i eventually overflows a signed int: undefined behavior */
        for (i = 1; i > 0; i += i)
            k++;
        printf("k = %d\n", k);
        return 0;
    }

Running gcc -O0 will produce 31 on architectures where an int is 32 bits. However, with gcc -O2 and higher, the compiler will recognize that "i += i" yields signed overflow UB. It will then eliminate the entire expression, and further optimize this code to an infinite loop. Such a scenario could also happen with your XOR trick on multicore machines. Compilers routinely optimize away UB code. You need extra compiler instructions or code modifications to get the compiler to do what you intended.

Don't know what planet you compile on. Here's that run on my macbook:
scrappy% cc -O -o tst tst.c
scrappy% ./tst
k = 31
scrappy% cc -O2 -o tst tst.c
scrappy% ./tst
k = 31
scrappy% cc -O3 -o tst tst.c
scrappy% ./tst
k = 31
gcc version 4.7.3 (MacPorts gcc47 4.7.3_3)
Compilers do NOT always behave as you seem to imagine they do. BTW, this does break Intel's compiler. But for the race condition, the compilers have no clue there is even a race in the first place.
Rein Halbersma wrote: You can detect such optimizations with -Wstrict-overflow=5 and gcc will warn you about it (and with -Werror it will also fail to compile). Even better is to pass the flag -fwrapv, and then you are guaranteed it will output CHAR_BIT * sizeof(int) - 1. Best of all is not to write such insane loops in the first place.