A note for C programmers

bob · Post by **bob** » Fri Dec 06, 2013 2:05 am

mvk wrote:
bob wrote:There is ABSOLUTELY no way my use of strcpy() will break.
Puhlease..... Crafty is full of buffer overflows. They are all over the place.

Ok, lets see:
Code: Select all
crafty `python -c 'print 5000*"x"'`
unable to open book file &#91;/opt/local/share/crafty/book.bin&#93; for "write".
learning is disabled
found computer opening book file &#91;/opt/local/share/crafty/bookc.bin&#93;.
Abort trap&#58; 6
Or, on Linux:
Code: Select all
./crafty-235-64-ja `python 5000*"x"'` 
EPD Kit revision date&#58; 1996.04.21
unable to open book file &#91;./books.bin&#93;.
&#40;info&#41; command line option "xxxxxxxxxxxx
&#91;... snip ...&#93;
xxxxxx"
Segmentation fault
Mind that on Debian this crap was patched downstream, in 2003...! But not upstream of course, because mr-know-it-all doesn't make bugs. It is ALWAYS somebody else's fault. Fact is, Crafty's use of strcpy(), and buffers in general, is not something to be proud of, and Apple in this case does a nice service to many of their customers to try out weed out programs written in this style from the eco system.

P.S.: This thread, and the counterpart on open-chess, reminds me of one of the writing by one of my professors:

EWD wrote:"We shall do a much better programming job, provided that we approach the task with a full appreciation of its tremendous difficulty, provided that we stick to modest and elegant programming languages, provided that we respect the intrinsic limitations of the human mind and approach the task as Very Humble Programmers."
By that standard, you have a long way to go.

You want to be arrogant, seems you are qualified. Not once have I said I didn't have bugs in Crafty, I said the strcpy() code in ReadPGN() has worked forever, and still works everywhere but Apple. This "never makes a mistake" is YOUR straw man argument, not mine. When I intentionally work around undefined behavior, I am certain of the result. Whatever I might have done unintentionally was just that, unintentional.

But carry on, I see no point in trying to discuss the point with you anyway, as in past cases...

bob · Post by **bob** » Fri Dec 06, 2013 2:11 am

Rein Halbersma wrote:
bob wrote: Sure it does. So what should they do, make the program abort? In the simple case of the above, where thread one writes to X at the same time thread two writes, there are exactly two possible outcomes, one of them writes last and that is what is stored. Which writes last is not defined. NO other behavior is reasonable, however, because these are two non-synchronized memory writes. 8 byte writes on the PC ARE atomic, so one thread can't write one bit, then another can write the second bit. But IF that happens, my lockless hashing works perfectly, still. There is no possible behavior you can cite here that will break it. Hence whatever happens is fine, it will work. Just so SOMETHING happens. Undefined does NOT imply "maybe we won't write anything at all" or "maybe we will write, but to a completely different memory address."

The undefined behavior in this case is which is written first. I don't care. Either can be written first without breaking my xor hash trick. The only thing that should NOT happen is for the vendor to do something stupid just because he recognizes there's a race. If you know what you are doing, the race doesn't hurt a thing.
The proper way to do this is to learn about the C11 atomics and relaxed memory orders as pointed out by Peter Österlund. There are a lot of exotic architectures out there with deep memory / cache hierarchies where the compiler might not map to the instructions you think it will, but instead entirely eliminate pieces of code it can prove to be undefined behavior.

I don't care what the compiler does, it can't break the lockless hashing code. It is as simple as that. If you want to even try to convince me otherwise, here's the challenge:

a hash entry is 16 bytes. I have to write it as 2 8-byte blocks, each of which is certainly guaranteed to be atomic in x86 since that is what is written across the bus in one cycle. The problem arises when the first 8 bytes (the signature) doesn't go with the second 8 bytes (the score, draft, best move and such). I simply xor the two pieces, and then do the two 8 byte memory writes. When I do a probe, I fetch both pieces, XOR them again, and only then do I do a signature match and use the results if it does match. Change either half by out-of-order writes, doesn't matter. Signature match will fail. Write 1 bit at a time, randomly from the two simultaneous 16 byte stores, no match. No failure.

Feel free to dream up some scheme that will break it. deep cache hierarchies have absolutely no effect on this. In-order-writes on the PC work. Out-of-order writes on the alpha work.

So how about stopping with the hand waving and simply explain how to break a hash probe where I get an 8 byte signature that does not go with the corresponding 8 bytes of score and stuff. Then we can talk.

rbarreira · Post by **rbarreira** » Fri Dec 06, 2013 8:59 am

Rein Halbersma wrote:
bob wrote: Sure it does. So what should they do, make the program abort? In the simple case of the above, where thread one writes to X at the same time thread two writes, there are exactly two possible outcomes, one of them writes last and that is what is stored. Which writes last is not defined. NO other behavior is reasonable, however, because these are two non-synchronized memory writes. 8 byte writes on the PC ARE atomic, so one thread can't write one bit, then another can write the second bit. But IF that happens, my lockless hashing works perfectly, still. There is no possible behavior you can cite here that will break it. Hence whatever happens is fine, it will work. Just so SOMETHING happens. Undefined does NOT imply "maybe we won't write anything at all" or "maybe we will write, but to a completely different memory address."

The undefined behavior in this case is which is written first. I don't care. Either can be written first without breaking my xor hash trick. The only thing that should NOT happen is for the vendor to do something stupid just because he recognizes there's a race. If you know what you are doing, the race doesn't hurt a thing.
The proper way to do this is to learn about the C11 atomics and relaxed memory orders as pointed out by Peter Österlund. There are a lot of exotic architectures out there with deep memory / cache hierarchies where the compiler might not map to the instructions you think it will, but instead entirely eliminate pieces of code it can prove to be undefined behavior.

To be fair... even if it is undefined behavior, the compiler is still required to generate correct code if only 1 thread is running. Which makes it quite unlikely for it to exploit the undefined behavior, unless it actively generates two code paths.

Rein Halbersma · Post by **Rein Halbersma** » Fri Dec 06, 2013 9:03 am

rbarreira wrote: To be fair... even if it is undefined behavior, the compiler is still required to generate correct code if only 1 thread is running. Which makes it quite unlikely for it to exploit the undefined behavior, unless it actively generates two code paths.

Yes with 1 thread there is no UB. The Standard quotes (both C and C++) explicitly use the phrase "different threads". Of course there is also no need for XOR tricks in the first place for single-threaded code. But you cannot draw any inference about the likelihood for behavior under multiple threads from this. It's UB and anything can happen, including that the program continues to work as intended. UB is not a guarantee of a crash. It's just the absence of any guarantee whatsoever.

rbarreira · Post by **rbarreira** » Fri Dec 06, 2013 9:06 am

Rein Halbersma wrote:
rbarreira wrote: To be fair... even if it is undefined behavior, the compiler is still required to generate correct code if only 1 thread is running. Which makes it quite unlikely for it to exploit the undefined behavior, unless it actively generates two code paths.
Yes with 1 thread there is no UB. The Standard quotes (both C and C++) explicitly use the phrase "different threads". Of course there is also no need for XOR tricks in the first place for single-threaded code. But you cannot draw any inference about the likelihood for behavior under multiple threads from this. It's UB and anything can happen, including that the program continues to work as intended. UB is not a guarantee of a crash. It's just the absence of any guarantee whatsoever.

That's both true and unrelated to what I said.

Rein Halbersma · Post by **Rein Halbersma** » Fri Dec 06, 2013 9:11 am

rbarreira wrote:
Rein Halbersma wrote:
rbarreira wrote: To be fair... even if it is undefined behavior, the compiler is still required to generate correct code if only 1 thread is running. Which makes it quite unlikely for it to exploit the undefined behavior, unless it actively generates two code paths.
Yes with 1 thread there is no UB. The Standard quotes (both C and C++) explicitly use the phrase "different threads". Of course there is also no need for XOR tricks in the first place for single-threaded code. But you cannot draw any inference about the likelihood for behavior under multiple threads from this. It's UB and anything can happen, including that the program continues to work as intended. UB is not a guarantee of a crash. It's just the absence of any guarantee whatsoever.
That's both true and unrelated to what I said.

Sorry if I missed what you were saying. Perhaps you could explain again what your point was exactly? How can you make statements about the likelihood of UB?

Rein Halbersma · Post by **Rein Halbersma** » Fri Dec 06, 2013 9:36 am

bob wrote: So how about stopping with the hand waving and simply explain how to break a hash probe where I get an 8 byte signature that does not go with the corresponding 8 bytes of score and stuff. Then we can talk.

No, the burden of proof is on you. You claim that the compiler has no choice but to emit a sane set of instructions that give you the expected result. However, both the C11 and C++11 Standard leave no doubt that reads and writes from different threads to the same variable constitute undefined behavior.

You might get away with it for a long time on many different architectures, and no compiler will probably emit completely bogus instrutions. The most likely failure scenario that I can think of is that the compiler will eliminate the offending code in its entirety. The code example posted by Ronald is very instructive

Code: Select all

#include <stdio.h> 

int main&#40;) 
&#123; 
  int i, k = 0; 

  for &#40;i = 1; i > 0; i += i&#41; 
    k++; 
  printf&#40;"k = %d\n", k&#41;; 
  return 0; 
&#125;

Running gcc -O0 will produce 31 on architectures where an int is 32-bits. However, with gcc -O2 and higher, the compiler will recognize that "i += i" yields signed overflow UB. It will then eliminate the entire expression, and further optimize this code to an infinite loop. Such a scenario is also possible to happen with your XOR trick on multicore machines. Compiler routinely optimize away UB code instructions. You need extra compiler instructions or code modifcations to get the compiler to do what you intended.

You can detect such optimizations with -Wstrict-overflow=5 and gcc will warn you about it (and with -Werrors it will also fail to compile). Even better is to pass a flag -fwrapv and then you are guaranteed it will output CHAR_BIT * sizeof(int) - 1. Best of all is not to write such insane loops in the first place.

rbarreira · Post by **rbarreira** » Fri Dec 06, 2013 11:17 am

Rein Halbersma wrote:
rbarreira wrote:
Rein Halbersma wrote:
rbarreira wrote: To be fair... even if it is undefined behavior, the compiler is still required to generate correct code if only 1 thread is running. Which makes it quite unlikely for it to exploit the undefined behavior, unless it actively generates two code paths.
Yes with 1 thread there is no UB. The Standard quotes (both C and C++) explicitly use the phrase "different threads". Of course there is also no need for XOR tricks in the first place for single-threaded code. But you cannot draw any inference about the likelihood for behavior under multiple threads from this. It's UB and anything can happen, including that the program continues to work as intended. UB is not a guarantee of a crash. It's just the absence of any guarantee whatsoever.
That's both true and unrelated to what I said.
Sorry if I missed what you were saying. Perhaps you could explain again what your point was exactly? How can you make statements about the likelihood of UB?

What I mean is: if a piece of code has defined behavior for 1 thread but undefined behavior for more than 1 thread, the only way that the compiler can "entirely eliminate pieces of code it can prove to be undefined behavior" is if it generates two different pieces of code: one which is used when only 1 thread is running, and another which is used when more than one thread is running.

While theoretically possible, this is very unlikely to happen with current (and near/medium-term) compiler technology. The reason being of course that the compilers are nowhere near smart enough to make high-level analysis of C code to determine its threading strategies, let alone generate code which makes runtime decisions based on the current thread layout.

Rein Halbersma · Post by **Rein Halbersma** » Fri Dec 06, 2013 12:13 pm

rbarreira wrote: What I mean is: if a piece of code has defined behavior for 1 thread but undefined behavior for more than 1 thread, the only way that the compiler can "entirely eliminate pieces of code it can prove to be undefined behavior" is if it generates two different pieces of code: one which is used when only 1 thread is running, and another which is used when more than one thread is running.

While theoretically possible, this is very unlikely to happen with current (and near/medium-term) compiler technology. The reason being of course that the compilers are nowhere near smart enough to make high-level analysis of C code to determine its threading strategies, let alone generate code which makes runtime decisions based on the current thread layout.

You may be right, but all I'm saying is that it's UB and therefore a time bomb in the code. Any sufficiently advanced compiler is free to exploit this for optimization purposes. The UB has been diagnosed and if the response of a programmer is "I'll believe it when I see it", then best of luck to him. The proper way to deal with it is to specify the memory order or to introduce synchronization.

I'm glad Bob is making chess programs. I shudder at the thought of car, medical or defense systems being made by people who ignore undefined behavior. Fortunately, the MISRA C/C++ rules explicitly forbid the use of undefined behavior in any life-critical software system, and current international jurisprudence is to hold firms and individuals to a strict liability standard.

mar · Post by **mar** » Fri Dec 06, 2013 2:11 pm

Rein Halbersma wrote:Such a scenario is also possible to happen with your XOR trick on multicore machines.

Not at all. Consider a shared library. How can you know in advance that the code that's exported is executed from multiple threads? The only undefined behavior by accessing unguarded variable from multiple threads is a simple race condition. There is no way the compiler can use this to optimize this kind of UB because it can't know at compile time.

A note for C programmers

Re: A note for C programmers

Re: A note for C programmers

Re: A note for C programmers

Re: A note for C programmers

Re: A note for C programmers

Re: A note for C programmers

Re: A note for C programmers

Re: A note for C programmers

Re: A note for C programmers

Re: A note for C programmers