A note for C programmers

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Henk
Posts: 7216
Joined: Mon May 27, 2013 10:31 am

Re: A note for C programmers

Post by Henk »

If you get a lot of reactions on a topic you created, you can easily get too busy. This can be harmful. Sometimes it is better not to respond.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: A note for C programmers

Post by syzygy »

Rein Halbersma wrote:Double checked locking in C++11 was heavily influenced by a 2004 paper by Alexandrescu and Meyers

http://www.drdobbs.com/cpp/c-and-the-pe ... /184405726
http://www.drdobbs.com/cpp/c-and-the-pe ... /184405772

It discusses the futility of almost every trick that has come up in the last few posts. It applies to aggressively optimizing C compilers just as it does to C++.
The first article nicely explains the relation to standards. I'll cite a passage that might be helpful to Bob:
Optimizing compilers carefully analyze and reorder your code so as to execute as many things at once as possible (within the constraints on observable behavior). Discovering and exploiting such parallelism in regular serial code is the single most important reason for rearranging code and introducing out-of-order execution. But it's not the only reason. Compilers (and linkers) might also reorder instructions to avoid spilling data from a register, to keep the instruction pipeline full, to perform common subexpression elimination, and reduce the size of the generated executable (see Bruno De Bus et al., "Post-pass Compaction Techniques").

When performing these kinds of optimizations, C/C++ compilers and linkers are constrained only by the dictates of observable behavior on the abstract machines defined by the language standards, and—this is the important bit—those abstract machines are implicitly single threaded. As languages, neither C nor C++ have threads, so compilers don't have to worry about breaking threaded programs when they are optimizing. It should, therefore, not surprise you that they sometimes do.

That being the case, how can you write C and C++ multithreaded programs that actually work? By using system-specific libraries defined for that purpose. Libraries such as POSIX threads (pthreads) (see ANSI/IEEE 1003.1c-1995) give precise specifications for the execution semantics of various synchronization primitives. These libraries impose restrictions on the code that library-conformant compilers are permitted to generate, thus forcing such compilers to emit code that respects the execution ordering constraints on which those libraries depend. That's why threading packages have parts written in assembler or issue system calls that are themselves written in assembler (or in some unportable language): You have to go outside Standard C and C++ to express the ordering constraints that multithreaded programs require. DCLP tries to get by using only language constructs. That's why DCLP isn't reliable.
And regarding Bob's "it's in another file, the compiler can't see it":
If you reach for bigger ammo and try moving temp to a larger scope (say, by making it file static), the compiler can still perform the same analysis and come to the same conclusion. Scope, schmope. Game over. You lose. So you call for backup. You declare temp extern and define it in a separate translation unit, thus preventing your compiler from seeing what you are doing. Alas, some compilers have the optimizing equivalent of night-vision goggles: They perform interprocedural analysis, discover your ruse with temp, and again optimize it out of existence. Remember, these are optimizing compilers. They're supposed to track down unnecessary code and eliminate it. Game over. You lose.
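The fix the articles anticipate arrived with C11: atomics let you state the ordering constraints inside the language itself, making a correct double-checked initialization expressible. A minimal sketch using C11 `<stdatomic.h>` (all names hypothetical; an illustration, not anyone's actual code):

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical singleton -- an illustration, not code from the thread. */
typedef struct { int value; } Singleton;

static _Atomic(Singleton *) instance;            /* starts NULL */
static atomic_flag init_lock = ATOMIC_FLAG_INIT; /* tiny spinlock */

Singleton *get_instance(void) {
    /* Acquire load: if we see the pointer, we also see the object's
       initialized contents (pairs with the release store below). */
    Singleton *p = atomic_load_explicit(&instance, memory_order_acquire);
    if (p == NULL) {
        while (atomic_flag_test_and_set_explicit(&init_lock,
                                                 memory_order_acquire))
            ;                                    /* spin until acquired */
        p = atomic_load_explicit(&instance, memory_order_relaxed);
        if (p == NULL) {
            p = malloc(sizeof *p);
            p->value = 42;
            /* Release store publishes the fully initialized object. */
            atomic_store_explicit(&instance, p, memory_order_release);
        }
        atomic_flag_clear_explicit(&init_lock, memory_order_release);
    }
    return p;
}
```

Pre-C11, none of this was expressible portably, which is exactly the articles' point: you had to step outside the language.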
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

syzygy wrote:
bob wrote:Sorry, we are talking pure C. C as is used in Crafty. Threads DO make sense, there is a native thread library for every machine I have used.
They are compiler and environment specific. They are not pure C89 or C99. There is no reasonable way to argue against this.
Your example fails pretty miserably however.
It does not. Look up "data race" in C11. The example I gave is a data race and it would be trivial to detect. (Well, at least one access must be a write access.) I don't know why you are talking about different source files; that was not part of my example. (Not that modern compilers/linkers would be bothered by optimising across source files.)

I'm also not talking about Crafty. You said something was impossible. I gave an example where it is trivial. That's it. Now you start turning and twisting, of course, as you always do in such circumstances.
I am talking about C. NOT C++. Not Java. Not Ada. Not anything else. Just C. The language where this discussion started. Note that "POSIX threads" is a very precise standard itself.

I said "detecting a race in C is impossible". It is. Just writing to a global value in two different places is NOT a race. They have to happen at the SAME time. There are several clever ways to deal with this, one not using any sort of "atomic lock" mechanism.

In C you can't even determine if two different locations write to a common value, because of pointers. That's why we have the option to tell the compiler "we don't do this, optimize as if it is not happening" because the compiler can NOT recognize such, it doesn't have the entire program in front of it when it compiles each source file...

Past time to move on. There are no objects. There are no classes. Again, this is C. Always has been C.
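For reference, standard C does give the programmer a per-pointer way to make the no-aliasing promise alluded to above: the C99 `restrict` qualifier, which asserts that the annotated pointers do not overlap, precisely the fact the compiler cannot prove across translation units. A small sketch with illustrative names:

```c
#include <stddef.h>

/* 'restrict' (C99) promises the compiler that 'a' and 'sum' never alias,
   so it may keep *sum in a register instead of reloading and re-storing
   it on every iteration.  Calling this with overlapping pointers would
   be undefined behavior: the promise cuts both ways. */
void accumulate(const int *restrict a, size_t n, int *restrict sum) {
    *sum = 0;
    for (size_t i = 0; i < n; i++)
        *sum += a[i];
}
```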
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

syzygy wrote:
rbarreira wrote:
bob wrote:
Rein Halbersma wrote:
bob wrote: Since one of the ongoing quibbles here is about the definition of "undefined behavior", here is the usual definition. One that has been used since the 1950's in fact. "Behavior that can not be predicted." This has ALWAYS been an issue of unexpected results caused, NOT by the compiler intentionally doing something off-the-wall, but by the code the compiler produces, which might not always do the expected thing.
You are the only one quibbling with the Standard's definition of UB. Everyone else is on the same page here. Don't pull in whatever colloquial speech you are using into a precise and exact definition used by every compiler writer.
UB is about the freedom of a compiler to translate your C code to machine instructions.
The standard does not adequately define "undefined behavior". To wit:

3.4.3
1 undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
EXAMPLE An example of undefined behavior is the behavior on integer overflow.

So it can ignore the situation, behave in a documented way (both of which make sense), or terminate the translation with a message. No mention of demons flying out of one's nose or anything else.

So how about we get back to planet earth.
On planet earth, you didn't even read properly what you quoted:

"ignoring the situation completely with unpredictable results" OR "behaving during translation or program execution in a documented manner"

So yes, demons flying out of one's nose are a possible case of undefined behavior.
And it's all concisely expressed in imposes no requirements. If demons start flying through your nose, that is compliant with the standard, because you can't violate "no requirements".
I would assume, therefore, that if you unintentionally introduce an "undefined behavior" bug in your program, formatting your hard drive is perfectly acceptable?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

syzygy wrote:
bob wrote:You need to pull out an intel manual.
Why don't you read before you respond:
syzygy wrote:You keep confusing things. What the standard guarantees is one thing. What one would naively expect knowing the architecture of a particular machine is another thing. What a particular compiler actually will do is yet something else.
The C99 standard guarantees nothing about how a 64-bit int is written to memory. On x86-64 it makes sense to use a single store, which will then be atomic provided it does not cross cache lines. But if the only thing you know about a compiler is that it complies with C99, you don't know that it will use single stores and even if it does, you don't know that it will properly align 64-bit ints.
If a compiler does what you suggest, I/O is impossible. How can I write to a buffer, and signal another thread or process to write that out to disk, so that only one process does I/O? Yet the operating system, the libraries, and everyone else depends on that happening. There is a BIG difference between writing to a local variable, and writing to a global (shared) variable. The compiler is NOT free to optimize away stores/writes to global data.
Duh? There is no reason that any particular C library or OS kernel must have been written in C or must have been compiled with the compiler you're using. It may have been written entirely in assembly. Certainly the Linux kernel will NOT properly run when compiled with just any C99 compliant compiler.
Which one doesn't it compile with? It runs on every platform I have seen. Or are you now saying C is not C, that each compiler can implement a "unique C"? :)

Fortunately I do not see this on any machine I happen to run on, which just about includes ALL machines, BTW..
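For what it's worth, C11 closes exactly the gap syzygy describes: an `_Atomic` 64-bit object is guaranteed an indivisible store and load, whatever code the compiler would otherwise have emitted for a plain assignment. A sketch under C11, with illustrative names:

```c
#include <stdatomic.h>
#include <stdint.h>

/* C99 guarantees nothing about how a plain int64_t store is translated;
   C11 does, if you ask for it with _Atomic.  Names are illustrative. */
static _Atomic int64_t shared_counter;

void publish(int64_t v) {
    /* Guaranteed indivisible store, regardless of alignment or of how
       many instructions a plain assignment would have compiled to. */
    atomic_store(&shared_counter, v);
}

int64_t observe(void) {
    return atomic_load(&shared_counter);
}
```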
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: A note for C programmers

Post by syzygy »

bob wrote:I said "detecting a race in C is impossible". It is.
Detecting all races is impossible, halting problem and so on. But some races can be detected trivially.
Just writing to a global value in two different places is NOT a race. They have to happen at the SAME time.
Read the C11 standard about what constitutes a data race leading to "undefined behavior". Can you find it on the internet yourself, or do you need help?
bob wrote:because the compiler can NOT recognize such, it doesn't have the entire program in front of it when it compiles each source file...
syzygy wrote:And regarding Bob's "it's in another file, the compiler can't see it":
If you reach for bigger ammo and try moving temp to a larger scope (say, by making it file static), the compiler can still perform the same analysis and come to the same conclusion. Scope, schmope. Game over. You lose. So you call for backup. You declare temp extern and define it in a separate translation unit, thus preventing your compiler from seeing what you are doing. Alas, some compilers have the optimizing equivalent of night-vision goggles: They perform interprocedural analysis, discover your ruse with temp, and again optimize it out of existence. Remember, these are optimizing compilers. They're supposed to track down unnecessary code and eliminate it. Game over. You lose.
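The moral, post-C11, is that you do not have to hide data from the optimizer; you can tell it which accesses matter. A minimal sketch (hypothetical names) of a wait flag whose loads the compiler is not allowed to optimize away:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* A plain 'bool done;' here would let the compiler hoist the load out
   of the loop (scope, schmope) and spin forever; atomic accesses may
   not be optimized away or reordered past the fences they imply. */
static atomic_bool done;

void signal_done(void) {
    atomic_store_explicit(&done, true, memory_order_release);
}

void wait_for_done(void) {
    while (!atomic_load_explicit(&done, memory_order_acquire))
        ;  /* spin; each iteration performs a real load */
}
```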
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: A note for C programmers

Post by syzygy »

bob wrote:
syzygy wrote:And it's all concisely expressed in imposes no requirements. If demons start flying through your nose, that is compliant with the standard, because you can't violate "no requirements".
I would assume, therefore, that if you unintentionally introduce an "undefined behavior" bug in your program, formatting your hard drive is perfectly acceptable?
You are finally getting it!
Yes, formatting your hard drive would comply with the standard, which imposes no requirements in the presence of UB.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

wgarvin wrote:
bob wrote:With the deranged reasoning here, if one does a BSF instruction and the source is all zero bits, the destination register is undefined. Should the program crash? Because you can always use a "jz" to determine that there were no one bits set and eax has no useful value in it (actually, it is unchanged from its contents prior to executing the bsf).
There's no C language construct that maps to BSF, but just suppose for a moment there was and that the language spec said it was "undefined behavior" to use the value from the destination register after a BSF with all zero source bits. All kinds of weird stuff could happen if the programmer didn't respect that undefinedness.

The compiler might notice that you are assigning the destination register to a variable. If you happened to pass in all zero bits, then the contents of the register are undefined, and so is the variable. So in a branch that uses that variable, it might assume that your source for the BSF was non-zero. It might then delete an if-statement comparing that source with zero.

Another example: the compiler might be able to prove that if reached via a certain control path, the BSF was executed with all zero bits as the input. It might hoist the BSF and subsequent instructions onto each path, and then delete the BSF that produces the undefined result, and any instructions on that path using the result (it might do this because of a partial redundancy elimination pass, for example).


Similar odd-sounding narratives exist for things like shifting into or out of the sign bit of signed ints, dividing INT_MIN by -1, dividing by zero, and so on. So it's best to just never do them. Even if it happens to work _today_, on some compilers and some library versions, that is probably a temporary state of affairs. Now that they have a taste for the optimization benefits, it will take a bloody revolution to convince compiler vendors to stop taking advantage of the freedoms that the language has always reserved to them.

You can't reason your way up from how the low-level machine works to how a C construct is going to work, if that C construct happens to invoke "undefined behavior" according to the C language standard. Any reasoning of that sort is fallacious and dangerous. The only ways you can know your program is correct are to (1) test it on the specific target platform, with the specific compiler, libraries, etc., and then NEVER CHANGE THEM, or (2) follow the rules of the language, even arcane ones that offend your sensibilities.
What I SHOULD be able to assume is that the compiler will do its best to implement exactly what I specify in my code. If it produces int overflow, let it produce it. That's the purpose of a compiler, to turn a source into its machine-language equivalent. I see absolutely no reason why it would behave differently when the result is undefined than if not. I make the assumption that when I write to a global value, that the compiler will produce code to do so. There is no allowable optimization that eliminates that. It has a little flexibility in WHERE/WHEN it does the write, but that is all... The intent is to do what the program says...
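As an aside, wgarvin's hypothetical construct is real in practice: GCC and Clang's `__builtin_ctzll` typically compiles to BSF/TZCNT, and its result is documented as undefined for a zero argument, mirroring BSF's undefined destination register. The standard-conforming answer is the guard wgarvin describes; a sketch (assumes a GCC/Clang-style compiler, and the helper name is made up):

```c
/* __builtin_ctzll counts trailing zero bits; its result is undefined
   when the argument is zero, just like BSF's destination register.
   Testing first keeps the call out of UB territory. */
int lowest_set_bit(unsigned long long bb) {
    if (bb == 0)
        return -1;              /* caller-visible "no bits set" answer */
    return __builtin_ctzll(bb); /* well-defined: bb != 0 here */
}
```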
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

wgarvin wrote:LLVM Project Blog: What Every Programmer Should Know About Undefined Behavior
It turns out that C is not a "high level assembler" like many experienced C programmers (particularly folks with a low-level focus) like to think, and that C++ and Objective-C have directly inherited plenty of issues from it.

...

Both LLVM IR and the C programming language have the concept of "undefined behavior". Undefined behavior is a broad topic with a lot of nuances. The best introduction I've found to it is a post on John Regehr's Blog. The short version of this excellent article is that many seemingly reasonable things in C actually have undefined behavior, and this is a common source of bugs in programs. Beyond that, any undefined behavior in C gives license to the implementation (the compiler and runtime) to produce code that formats your hard drive, does completely unexpected things, or worse. Again, I would highly recommend reading John's article.
bob wrote:Since one of the ongoing quibbles here is about the definition of "undefined behavior", here is the usual definition. One that has been used since the 1950's in fact. "Behavior that can not be predicted." This has ALWAYS been an issue of unexpected results caused, NOT by the compiler intentionally doing something off-the-wall, but by the code the compiler produces, which might not always do the expected thing.
I don't think anybody is quibbling about it except for you. Everyone debating you appears to accept what the standard says: that literally anything might happen, and you have no recourse. You keep insisting there should be some logic or reason to what happens in case X, which is silly, because there is no meaningful difference between the compiler intentionally doing something off-the-wall and the compiler unintentionally causing you an unexpected result. Undefined behavior is UNDEFINED. You have to avoid it, or the semantics of your program are UNDEFINED. Period, full stop.

C89 defined undefined behavior like this:
* Undefined behavior --- behavior, upon use of a nonportable or erroneous program construct, of erroneous data, or of indeterminately-valued objects, for which the Standard imposes no requirements. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
A working draft of C11 still defines it in the same way:
3.4.3
1 undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
2 NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
3 EXAMPLE An example of undefined behavior is the behavior on integer overflow.
So the language's stance on this has not changed at all in over 20 years. Either your understanding of it is dangerously deficient for someone who teaches programming in C to others, or else you really do understand it and just continue to argue out of stubbornness. I suspect it's the latter, and I apologize if I offend you by saying so, but do you not realize how foolish that looks?
I simply disagree with the "it can do anything" reasoning. Best choice, just do what the program says. Perhaps the user can deal with the "undefined behavior" knowing, for example, that on Intel, integer overflow is NOT "undefined". It is precisely defined.
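For readers following along: the portable way to get the precisely defined Intel wraparound described here is to spell it in unsigned arithmetic, which the standard defines to wrap modulo 2^N; signed overflow remains UB regardless of what the hardware does. A sketch (the helper name is made up):

```c
#include <limits.h>

/* Signed overflow is UB in C no matter what the hardware does, so a
   compiler may assume it never happens.  Unsigned arithmetic is defined
   to wrap modulo 2^N -- the two's-complement behavior Intel provides.
   The conversion back to int is implementation-defined, but GCC and
   Clang document it as modulo reduction. */
int wrapping_inc(int x) {
    return (int)((unsigned)x + 1u);
}
```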
rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 3:48 pm

Re: A note for C programmers

Post by rbarreira »

bob wrote: I simply disagree with the "it can do anything" reasoning. Best choice, just do what the program says. Perhaps the user can deal with the "undefined behavior" knowing, for example, that on Intel, integer overflow is NOT "undefined". It is precisely defined.
Disagree as much as you want, that's what the standard says.

Opinions are cheap. I challenge you to find a single authoritative source that says otherwise, to counteract all the sources people have presented to you.