A note for C programmers

Discussion of chess software programming and technical issues.


syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: A note for C programmers

Post by syzygy »

syzygy wrote:According to paragraph 4 of the C99 standard:
A program that is correct in all other aspects, operating on correct data, containing unspecified behavior shall be a correct program and act in accordance with 5.1.2.3.
Oops! I read this as "containing undefined behavior". But what it says is "unspecified behavior". So paragraph 5.1.2.3 (the abstract machine story) most likely does not apply to program executions exhibiting undefined behavior...
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: A note for C programmers

Post by Rein Halbersma »

syzygy wrote: This definition indeed does allow C++ compilers to consider undefined behavior as "can't happen".

This approach seems to have been thought up by a theoretical computer scientist used to thinking of programs processing predefined input. If the abstract machine leaves undefined any aspect of the execution of that program on that input, then the combination (program, input) has no meaning. But in reality programs interact with users and the undefined behavior can be triggered by unexpected user input that the user thinks up a long time after the program has started to run. It seems a C++ program could claim to be able to look into the future and refuse to run, because it "knows" that the user would cause it to execute undefined behavior at some point in the future, and therefore it does not have to run at all.
I don't think it's quite as bleak as you paint it here. It is clearly possible to write a proper guard against division by zero (and similarly against null pointer dereference). Simply returning, throwing an exception, or calling abort() or exit() eliminates the undefined behavior from that branch of execution, after which 1.9/5 no longer applies and the reordering cannot take place.
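A minimal sketch of such "strong" guards in C, assuming int operands. The names checked_div, div_or_exit and div_is_defined are hypothetical, chosen for illustration. Note that d == 0 is not the only undefined case for int division: INT_MIN / -1 overflows as well, so the helper checks both.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: true when n / d is defined for int operands.
 * Besides d == 0, INT_MIN / -1 overflows and is also undefined. */
static int div_is_defined(int n, int d)
{
    return d != 0 && !(n == INT_MIN && d == -1);
}

/* Guard by returning early: the division is reached only when defined. */
int checked_div(int n, int d, int *out)
{
    if (!div_is_defined(n, d))
        return -1;              /* error code; *out is left untouched */
    *out = n / d;
    return 0;
}

/* Guard by terminating: exit() never returns, so the division below
 * is never executed with undefined operands. */
int div_or_exit(int n, int d)
{
    if (!div_is_defined(n, d)) {
        fprintf(stderr, "div_or_exit: undefined division %d / %d\n", n, d);
        exit(EXIT_FAILURE);
    }
    return n / d;
}
```

In both functions every path that reaches n / d has already established that the division is defined, so there is no UB for the optimizer to reason backwards from.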

What C++ does is punish users for insufficiently strong guards (such as printf() or anything else that allows the execution to continue onto the undefined behavior) by allowing the compiler to eliminate the guard altogether. It's the "blow your whole foot off" philosophy, basically.
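For contrast, a sketch of the "insufficiently strong" guard described above (weak_div is a hypothetical name). Because printf() is expected to return, the d == 0 path always continues into the division; a compiler is permitted to conclude that d == 0 would lead to UB anyway, assume d != 0 on entry, and drop the test together with the message. Whether a given compiler actually does this depends on the compiler and optimization level.

```c
#include <stdio.h>

/* Weak guard (pattern to avoid): when d == 0 the message is printed,
 * but execution then falls through into n / d, which is undefined.
 * Since that path always ends in UB, the compiler may treat d == 0
 * as impossible and remove the check and the printf() call. */
int weak_div(int n, int d)
{
    if (d == 0)
        printf("weak_div: division by zero\n");
    return n / d;
}
```

Putting a return (or a call to exit() or abort()) right after the printf() closes the hole, because the division is then no longer reachable with d == 0.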
Btw, (as also pointed out on stackoverflow) either way the ereport() example was compiled wrongly, because ereport() might simply not return in which case no undefined behavior would occur. But the gcc bug report is a bit different (although I'm not sure it is valid to assume that printf() always returns).
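The ereport() point can be made concrete. If the compiler cannot see the body of a reporting function, it has to allow for the possibility that the function never returns (it might call exit() or longjmp()), so it may not assume the following division is always reached. When a function really never returns, C11's _Noreturn makes that explicit. A sketch, with fatal and div_or_fatal as hypothetical names (this is not PostgreSQL's ereport()):

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical error reporter. _Noreturn (C11) states that control
 * never comes back, so a guard that calls it really does cut off
 * the path to the division. */
_Noreturn static void fatal(const char *msg)
{
    fprintf(stderr, "fatal: %s\n", msg);
    exit(EXIT_FAILURE);
}

/* Only d == 0 is guarded here; INT_MIN / -1 would still be undefined. */
int div_or_fatal(int n, int d)
{
    if (d == 0)
        fatal("division by zero");   /* never returns */
    return n / d;
}
```

A function that only sometimes returns (for example, one whose behavior depends on a severity level) must not be marked _Noreturn; returning from a _Noreturn function is itself undefined behavior.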
I think for C it doesn't matter whether printf() returns or not, because a library call is defined as a sequence point and there can be no reordering of statements through such a point. But I'm no C expert.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: A note for C programmers

Post by syzygy »

Rein Halbersma wrote:I think for C it doesn't matter whether printf() returns or not, because a library call is defined as a sequence point and there can be no reordering of statements through such a point. But I'm no C expert.
That the standard does not allow reordering through sequence points does not matter much if the provable occurrence of UB in a particular execution means that the standard does not apply at all to that execution.

And it seems that the C++ rule also applies to C, i.e. UB allows time travel also for C. I found this very long discussion in which Douglas A. Gwyn (DAGwyn), who apparently cowrote the C standard, confirmed that UB means the program is completely meaningless.

In practice this means that an executing program will conform to the abstract machine until it is known that at some future point in time UB will occur.
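A commonly cited illustration of this "time travel", sketched in C with a hypothetical four-entry table. In the buggy version, a lookup that does not find v reads table[4], one past the end. Every path that does not return true therefore ends in UB, so a compiler is permitted to compile the whole function to "return true", even for values that are not in the table.

```c
#include <stdbool.h>

static int table[4] = {1, 2, 3, 4};   /* hypothetical data */

/* Buggy (off by one): on a miss the loop reads table[4], which is
 * undefined. A compiler may reason that any call that does not
 * return true has UB and reduce the function to "return true". */
bool exists_in_table_buggy(int v)
{
    for (int i = 0; i <= 4; i++)
        if (table[i] == v)
            return true;
    return false;
}

/* Fixed: the bound is i < 4, so every access stays inside the array. */
bool exists_in_table(int v)
{
    for (int i = 0; i < 4; i++)
        if (table[i] == v)
            return true;
    return false;
}
```

Only calls that find their value are safe to make on the buggy version; the fixed version is defined for every int.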
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: A note for C programmers

Post by Rein Halbersma »

syzygy wrote: And it seems that the C++ rule also applies to C, i.e. UB allows time travel also for C. I found this very long discussion in which Douglas A. Gwyn (DAGwyn), who apparently cowrote the C standard, confirmed that UB means the program is completely meaningless.
Awesome thread, thanks a lot! Very informative, especially about buffered output that could not appear if UB occurs.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

stevenaaus wrote:
bob wrote:
mvk wrote:
bob wrote:On Apple Mavericks, all you see is "Abort". No hint about what the problem was, no nothing. Running under the debugger shows absolutely nothing either. Nice software design.
There is a message logged to the system log, as was mentioned before. Just open the Console application and read "crafty: detected source and destination buffer overlap". And then a nice readable crash report is saved in ~/Library/Logs/DiagnosticReports/, containing, amongst others, a stack trace.
Fine. I am SURE every C programmer on the planet is aware of such. Makes a lot more sense than displaying an error message right out of the library like "Abort(source and destination strings overlap)". Let's put it in a non-obvious place and see if the programmer can find it.
OS X is a funny thing. Mavericks seems to have broken lots of stuff.
Do they really alias gcc to clang? That's bizarre.

Generally I agree with Linus. Here is a thread where he has his say
http://sourceware.org/bugzilla/show_bug.cgi?id=12518
Don't post such links HERE. All you will get is the mantra "don't use undefined behavior" repeated until you are sick of it. Obviously Linus is as big an idiot as I am.

As far as clang goes, I don't know whether they use a stock (but older) gcc version and rename it, or whether they make source code changes and then call that new compiler clang.

Don't really care either, because I use macports gcc until I have time to get linux on this box. :)
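The "detected source and destination buffer overlap" abort typically comes from a fortified string function being handed overlapping buffers, for example strcpy(s, s + n) to strip a prefix in place. That call is undefined; memmove() is specified to handle overlap. A sketch with a hypothetical helper drop_prefix (not Crafty's actual code):

```c
#include <string.h>

/* Remove the first n characters of s in place.
 * strcpy(s, s + n) would copy between overlapping buffers, which is
 * undefined and may be aborted by a checking libc.
 * memmove() is defined for overlapping source and destination. */
void drop_prefix(char *s, size_t n)
{
    size_t len = strlen(s);
    if (n >= len) {          /* dropping everything leaves an empty string */
        s[0] = '\0';
        return;
    }
    memmove(s, s + n, len - n + 1);   /* + 1 also moves the '\0' */
}
```

For example, applied with n = 5 to a buffer holding "e2e4 e7e5", it leaves "e7e5".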
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

syzygy wrote:
wgarvin wrote:(1) Dangerous Optimizations and the Loss of Causality is a classic presentation about the growing trend of compiler writers exploiting the freedom of undefined behavior. The spec lets them assume you never invoke undefined behavior in your code; as they are increasingly taking advantage of that, more and more "working" old code is put at risk of failing in subtle and unexpected ways. Code that invokes any undefined behavior is basically a potential time bomb that might someday, with the help of an eager compiler writer, decide to blow up your program, or (much worse) introduce subtle security vulnerabilities into it. Every C or C++ programmer ought to know enough about this stuff to avoid falling into the bear pit.
I may be proven wrong, but I believe that Seacord, the author of this presentation, makes a crucial mistake (apart from believing that the C standard prescribes that "if (cond)" gives undefined behavior if "cond" evaluates to something other than 0 or 1).

On page 6, bottom and page 7, top, he essentially explains what I have tried to express in my previous post. The C standard essentially defines the behavior of an abstract machine. A compiler must produce code that provides this behavior in so far as it is specified and in so far as it is visible to the outside, e.g. in terms of side effects. If at some point during its execution the abstract machine encounters a code segment with undefined behaviour, then anything goes and the compiler may do whatever it pleases, e.g. nothing. But before any undefined behavior is encountered, it seems to me that defined side effects must be respected.

I therefore believe that Seacord is wrong about compilers being allowed to implement the "total license" policy he defines on page 7, bottom:

Total license: treat any possible undefined behavior as a "can't happen" condition. This permits aggressive optimizations.
It seems to me that the correct formulation is: treat any possible undefined behavior as "anything goes". This is not the same as "can't happen". "Can't happen" implies that a program that provably encounters undefined behavior at some point during its execution could be fully optimised away, including all its side effects before the undefined behavior is encountered.
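Whatever one thinks of the wording, compilers do apply the "can't happen" reading locally. A frequently cited pattern: once a pointer has been dereferenced, the compiler may assume it is not NULL (otherwise the program already had UB) and delete a later null check as dead code. A sketch with hypothetical function names:

```c
#include <stddef.h>

/* Dereference first, check later: after *p has been evaluated the
 * compiler may assume p != NULL and remove the check below. */
int read_late_check(const int *p)
{
    int v = *p;
    if (p == NULL)          /* may be optimized away */
        return -1;
    return v;
}

/* Check first: the dereference happens only on the non-null path,
 * so the check has to stay. */
int read_early_check(const int *p)
{
    if (p == NULL)
        return -1;
    return *p;
}
```

Only read_early_check() may be called with a null pointer; read_late_check() is shown as the pattern to avoid.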

According to paragraph 4 of the C99 standard:
A program that is correct in all other aspects, operating on correct data, containing unspecified behavior shall be a correct program and act in accordance with 5.1.2.3.
Paragraph 5.1.2.3, which explains in what sense a compiled program must implement the behaviour of the abstract machine, contains no indication that side effects do not have to be produced if at some later point in time undefined behavior is encountered. It only states that an evaluation can be optimised away if its value is not used and no "needed" side effects are produced.
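The "needed side effects" rule of 5.1.2.3 can be sketched as follows. Accessing a volatile object counts as a side effect, so those increments must happen; a loop that only updates a local variable and produces no used value need not be evaluated at all. The names spins, spin_volatile and spin_plain are hypothetical.

```c
/* Hypothetical counter. Each access to a volatile object is a side
 * effect, so the increments below must actually be performed. */
static volatile unsigned long spins;

void spin_volatile(unsigned n)
{
    for (unsigned i = 0; i < n; i++)
        spins++;
}

/* No side effects and no used result: an implementation may skip
 * this loop entirely under the as-if rule. */
void spin_plain(unsigned n)
{
    unsigned long local = 0;
    for (unsigned i = 0; i < n; i++)
        local++;
    (void)local;   /* silences "set but not used" warnings */
}
```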

The way I understand "undefined behavior", it just means that the compiler writer may supplement the specification of the abstract machine with whatever he likes, for example with whatever seems to be most efficient. Different compilers and compilers for different systems may fill in the behavior left undefined by the standard in their own way. This seems to me to be the only natural interpretation of a standard leaving something "undefined". Turning that into a "can't happen" is nothing more than a distortion of the standard.
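Signed integer overflow is a concrete case of behavior the standard leaves undefined. GCC and Clang do let the user "fill it in" explicitly with the -fwrapv option, which makes signed overflow wrap; without it, a check written in terms of wrap-around may be folded away. A sketch with hypothetical function names, showing a check that is defined for all inputs:

```c
#include <limits.h>
#include <stdbool.h>

/* Relies on wrap-around. Since signed overflow is undefined, the
 * compiler may fold x + 1 < x to false, so this may never report
 * overflow (unless the code is compiled with -fwrapv). */
bool inc_overflows_naive(int x)
{
    return x + 1 < x;
}

/* Defined for every pair of ints: compare against the limits
 * before adding, so the addition itself never overflows. */
bool add_overflows(int a, int b)
{
    return (b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b);
}
```

The second form costs a comparison or two but does not depend on any particular compiler's choice for the undefined case.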
There are other issues, such as the gap between those doing "standards" and those "writing compilers." For example, FORTRAN-66 used a loop of this form:

do 100 i=1,n
...
...
...
100 continue

In those early days the standard said "using the loop index OUTSIDE the loop is undefined." Why? The standards guys decided that such loops would be incremented with some sort of BIR/BDR (branch increment register or branch decrement register) so that the loop index would stay in a register for the entire loop. And since FORTRAN did not require declarations (implicit typing), these guys decided that there would be no requirement to save the value of the loop index register back to memory.

The compiler guys said: "You guys are idiots. Do you REALLY believe we can maintain that loop index in a register across the entire loop, for very complex programs? Of course not; we are going to have to spill to memory whenever we reach a register jam. Which means the loop counter will be available after the loop terminates, or even if you jump out of the loop somewhere in the middle."

Common sense meets common stupidity.

Linus' comments on the memcpy() issue were pretty clear, and pretty widely acclaimed as rational and sensible. The glibc guys resisted and resisted, however, repeating the "using undefined behavior is bad" mantra we see so much of here...

I don't advocate it as a way of writing programs. But I also don't advocate changing such things just to break existing programs either.
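Why the order of copying matters for overlapping memcpy() calls can be shown with two hand-written byte loops (hypothetical names, not glibc's implementation). For a left shift (destination below source) a forward copy happens to give the intended result, while a backward copy smears the last byte across the buffer. Code that relied on one order breaks when an implementation switches to the other; memmove() is the function specified to handle overlap either way.

```c
#include <stddef.h>
#include <string.h>

/* Copies from low to high addresses. */
void copy_forward(unsigned char *dst, const unsigned char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Copies from high to low addresses. */
void copy_backward(unsigned char *dst, const unsigned char *src, size_t n)
{
    while (n > 0) {
        n--;
        dst[n] = src[n];
    }
}
```

Shifting "abcdef" left by one byte (dst = buf, src = buf + 1, n = 5) gives "bcdeff" with copy_forward() and with memmove(), but "ffffff" with copy_backward().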
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: A note for C programmers

Post by Rein Halbersma »

bob wrote: Linus' comments on the memcpy() issue were pretty clear, and pretty widely acclaimed as rational and sensible. The glibc guys resisted and resisted, however, repeating the "using undefined behavior is bad" mantra we see so much of here...

I don't advocate it as a way of writing programs. But I also don't advocate changing such things just to break existing programs either.
The glibc guys couldn't care less what Linus, you or anybody else are doing, no matter how clear, rational or sensible it might seem to their fanboys. The simple fact is that the C Standard is an explicit contract between programmers and their compiler. That contract offers no guarantees if and when the compiler can prove there is undefined behavior. In fact, the compiler can optimize your code better when it can assume away undefined behavior. The many places of undefined behavior are what make C so much faster than Java (consider guarding against null pointer dereference everywhere...)

The point you seem to be arguing is that there was an implicit contract between the compiler and the programmer. But even if memcpy() was implemented in a particular way on many systems for a long time, that doesn't entitle you to any heads-up warning. Such an explicit notification is required when deprecating existing and valid language features. But for implementation changes exploiting undefined behavior, no such heads-up is required and none should be expected.

Don't blame other people for your mistakes. You were warned explicitly by the Standard not to do that. You did it anyway. So you suffered the consequences. End of story.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

Rein Halbersma wrote:
bob wrote: Linus' comments on the memcpy() issue were pretty clear, and pretty widely acclaimed as rational and sensible. The glibc guys resisted and resisted, however, repeating the "using undefined behavior is bad" mantra we see so much of here...

I don't advocate it as a way of writing programs. But I also don't advocate changing such things just to break existing programs either.
The glibc guys couldn't care less what Linus, you or anybody else are doing, no matter how clear, rational or sensible it might seem to their fanboys. The simple fact is that the C Standard is an explicit contract between programmers and their compiler. That contract offers no guarantees if and when the compiler can prove there is undefined behavior. In fact, the compiler can optimize your code better when it can assume away undefined behavior. The many places of undefined behavior are what make C so much faster than Java (consider guarding against null pointer dereference everywhere...)

The point you seem to be arguing is that there was an implicit contract between the compiler and the programmer. But even if memcpy() was implemented in a particular way on many systems for a long time, that doesn't entitle you to any heads-up warning. Such an explicit notification is required when deprecating existing and valid language features. But for implementation changes exploiting undefined behavior, no such heads-up is required and none should be expected.

Don't blame other people for your mistakes. You were warned explicitly by the Standard not to do that. You did it anyway. So you suffered the consequences. End of story.
Rattle on all you want. The point is, there is NO good reason to break existing code, unless you are actually improving the code you are modifying. Absolutely none. BTW the glibc guys DID care. Did you read their comments? Have you looked at the current source? I would assume not.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

syzygy wrote:
syzygy wrote:According to paragraph 4 of the C99 standard:
A program that is correct in all other aspects, operating on correct data, containing unspecified behavior shall be a correct program and act in accordance with 5.1.2.3.
Oops! I read this as "containing undefined behavior". But what it says is "unspecified behavior". So paragraph 5.1.2.3 (the abstract machine story) most likely does not apply to program executions exhibiting undefined behavior...
If you think about that for a minute, how can those be different? It would seem to me that "undefined" == "unspecified".
mvk
Posts: 589
Joined: Tue Jun 04, 2013 10:15 pm

Re: A note for C programmers

Post by mvk »

bob wrote:If you think about that for a minute, how can those be different? It would seem to me that "undefined" == "unspecified".
The answer to your question is in the C standard on page 3.
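Paraphrasing those definitions (C99 3.4.3 and 3.4.4): for undefined behavior the standard imposes no requirements at all, while for unspecified behavior it allows two or more possibilities and merely does not say which one is chosen. The order in which the operands of + are evaluated is a standard example of the latter. A sketch with hypothetical functions that record the order in which they run:

```c
static int trace_pos = 0;
static char trace[2];

/* Each call records which function ran, making the order observable. */
static int call_a(void) { trace[trace_pos++] = 'a'; return 1; }
static int call_b(void) { trace[trace_pos++] = 'b'; return 2; }

/* Unspecified behavior: call_a() and call_b() may run in either order,
 * but one of the two orders is chosen and the sum is 3 either way,
 * so the program remains correct.
 * By contrast, i = i++ + 1; modifies i twice between sequence points,
 * which is undefined: no requirements at all. */
int sum_ab(void)
{
    trace_pos = 0;
    return call_a() + call_b();
}
```

A program whose result does not depend on the choice stays correct under 4p3; a program that executes undefined behavior has no such guarantee.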