A note for C programmers

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

syzygy wrote:
mvk wrote:This was Ubuntu 10.04. I also checked Ubuntu 12.04 and it says the same.
Mine is Fedora.
[ The later one is now testing my simple-minded Syzygy bases hack BTW. WDL only, no DTZ yet. So far scoring 56% vs. the previous version (which already had 5pc draw-bases), after 1409 games and counting: +450-295=664 ]
That's a lot...
I run Fedora also. No FORTIFY there, thankfully. Shades of Pascal and trying to prevent the user from doing anything that is dangerous. Or useful...
mvk
Posts: 589
Joined: Tue Jun 04, 2013 10:15 pm

Re: A note for C programmers

Post by mvk »

syzygy wrote:
mvk wrote:This was Ubuntu 10.04. I also checked Ubuntu 12.04 and it says the same.
Mine is Fedora.
As far as I can see, FORTIFY_SOURCE=2 comes with default gcc v4. Did they dumb it down in Fedora?
syzygy wrote:
[ The later one is now testing my simple-minded Syzygy bases hack BTW. WDL only, no DTZ yet. So far scoring 56% vs. the previous version (which already had 5pc draw-bases), after 1409 games and counting: +450-295=664 ]
That's a lot...
Yes, thank you very much!
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

mvk wrote:
syzygy wrote:
mvk wrote:This was Ubuntu 10.04. I also checked Ubuntu 12.04 and it says the same.
Mine is Fedora.
As far as I can see, FORTIFY_SOURCE=2 comes with default gcc v4. Did they dumb it down in Fedora?
syzygy wrote:
[ The later one is now testing my simple-minded Syzygy bases hack BTW. WDL only, no DTZ yet. So far scoring 56% vs. the previous version (which already had 5pc draw-bases), after 1409 games and counting: +450-295=664 ]
That's a lot...
Yes, thank you very much!
My interpretation of the comments about this stuff is that disabled is the normal default. You can increase it to 1 or 2 as you see fit. I read it as gcc 4+ is required for it to work at all, not that it was used by default.
kbhearn
Posts: 411
Joined: Thu Dec 30, 2010 4:48 am

Re: A note for C programmers

Post by kbhearn »

https://fedoraproject.org/wiki/Security ... y/Features

This seems to indicate FORTIFY_SOURCE is enabled by default on Fedora.
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: A note for C programmers

Post by wgarvin »

bob wrote:Where do you look in a program that is working on every other machine, and has for years, and suddenly just says "Abort" and returns to the shell prompt on Mavericks?
My suggestion would be to start it under the debugger and put a breakpoint on the abort function, and look at the stack trace. Of course I may be spoiled by having access to an actual easy-to-use debugger in Visual Studio.

mvk wrote:
bob wrote: Please return to the docs of the 80's or 90's and look at the man page for strcpy().

I don't re-read man pages every few weeks...

DESCRIPTION
       The  strcpy()  function	copies the string pointed to by src (including
       the terminating '\0' character) to the array pointed to by  dest.  The
       strings	may not overlap, and the destination string dest must be large
       enough to receive the copy.

[... snip ...]

GNU				  1993-04-11			     STRCPY(3)
K&R 1988 says the same:
K&R wrote:B3. String functions: <string.h>

...

Except for memmove, the behavior is undefined if copying takes place between overlapping objects.
"Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to having demons fly out of your nose."

You have to put aside your knowledge of how strcpy is implemented and consider it rationally. As MVK pointed out, this has been clearly documented as undefined behavior for at least 25 years, just like memcpy. And anybody using memcpy for overlapping copies clearly deserves what they get, because the spec says it's undefined... why should strcpy be different?

Of course it's annoying to find a latent bug like that in code you wrote so long ago, but I don't think it's fair to blame the library vendor in a case like this.

For 25 years the language spec has promised you that your program might break on some platform, if you relied on this behavior. Finally it did! :lol:
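
To make the overlap rule concrete, here is a minimal sketch of the usual fix for an in-place string shift. It is not taken from anyone's engine: the helper name drop_prefix is hypothetical, and it simply assumes the common case of deleting leading characters from a buffer, where source and destination overlap and memmove() is the function the spec allows.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: drop the first n characters of s in place.
 * Source (s + n) and destination (s) overlap, so memmove() is used;
 * strcpy(s, s + n) would be undefined behavior here. */
static char *drop_prefix(char *s, size_t n)
{
    size_t len = strlen(s);

    assert(n <= len);               /* caller must not skip past the end */
    /* Move the remaining characters plus the terminating '\0'. */
    memmove(s, s + n, len - n + 1);
    return s;
}
```

With a buffer holding "position startpos", calling drop_prefix(buf, 9) should leave "startpos" in the buffer.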


And now to play Devil's advocate:
Like almost everyone else, I end up relying on undefined behavior all the time in my own programs too, sometimes even on purpose (!); it's pretty difficult to avoid it completely. For example, as far as I know, null pointers are not guaranteed to be represented by all-zero bits, yet I memset pointer-containing structures to zero just like everybody else does. Signed-integer overflows and pointer-arithmetic overflows are rampant in production code, as are platform-specific assumptions about 2's complement representation and bit-shift operands, theoretically unsafe type puns and other strict-aliasing violations, and many other problems of that sort. Most of the time the programmers writing that code don't understand that it's unsafe; sometimes they do understand all those rules and just don't realize they are writing code that breaks them (and thus has undefined semantics); and in some rare cases they DO realize they are writing it, but it's expedient to do it anyway for some reason, perhaps because they have no other way to make the compiler do what they want. There are apparently almost 200 types of undefined behavior in C99, and I don't want to even guess how many there are in C++.
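
Two of the habits mentioned above have well-defined alternatives that cost little. The following is only a sketch under stated assumptions: struct node, node_init and float_bits are hypothetical names, and the bit pattern mentioned afterwards assumes float is a 32-bit IEEE 754 type, which the C standard does not guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical pointer-containing structure. */
struct node {
    struct node *next;
    int value;
};

/* Assign NULL explicitly instead of relying on memset(..., 0, ...)
 * producing a null pointer representation. */
static void node_init(struct node *n)
{
    assert(n != NULL);
    n->next = NULL;
    n->value = 0;
}

/* Read the object representation of a float without a pointer cast.
 * Copying the bytes with memcpy() avoids the strict-aliasing violation
 * of *(uint32_t *)&f. */
static uint32_t float_bits(float f)
{
    uint32_t u;

    assert(sizeof u == sizeof f);   /* assumption: float is 32 bits wide */
    memcpy(&u, &f, sizeof u);
    return u;
}
```

On an IEEE 754 platform float_bits(1.0f) should come out as 0x3F800000, and compilers commonly reduce the memcpy() to a single register move.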

The world could really use a better-designed (and safer) systems programming language, but it doesn't seem likely anything suitable will ever reach the popularity of C and C++, at least not anytime soon!
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: A note for C programmers

Post by wgarvin »

There are surprisingly many programmers out there who write C or C++ code every day, but have never had much exposure to the idea of undefined behavior in these languages, or its consequences. They think of optimizing compilers as something helpful, but to write sound programs it might be more helpful to think of the compiler as your enemy! :lol:

(bob is obviously not one of those newbies, so I couldn't resist poking a bit of fun at him in the previous post. Of course he's indignant that some pointless change by a library implementor broke his decades-old code, and I don't think he would try to defend undefined behavior in most situations; it's a rat's nest of potential problems, and it makes it very difficult to actually write large programs in C or C++ that are secure and future-proof.)

To anyone reading this who hasn't had much exposure to undefined behavior before, I offer the following depressing reading material.

(1) Dangerous Optimizations and the Loss of Causality is a classic presentation about the growing trend of compiler writers exploiting the freedom of undefined behavior. The spec lets them assume you never invoke undefined behavior in your code; as they are increasingly taking advantage of that, more and more "working" old code is put at risk of failing in subtle and unexpected ways. Code that invokes any undefined behavior is basically a potential time bomb that might someday, with the help of an eager compiler writer, decide to blow up your program, or (much worse) introduce subtle security vulnerabilities into it. Every C or C++ programmer ought to know enough about this stuff to avoid falling into the bear pit.

(2) Undefined Behavior: What Happened to My Code? is a nice paper with several frightening examples of the trouble undefined behavior can cause: safety checks optimized out, division-by-zero causing a signal even in code that explicitly checks for zero before the divide, and other horrors.

(3) Understanding Integer Overflow in C/C++ is another eye-opening paper; I'm sure I've linked it here before. The authors surveyed many real-world programs with their dynamic checker and found lots of examples of undefined behavior bugs.
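
As a small illustration of the integer-overflow point, here is a sketch of a signed addition that tests for overflow before it happens, so the check itself never invokes undefined behavior and cannot be optimized away on that basis. The name checked_add is hypothetical and not taken from any of the papers above.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: store a + b in *out and return true, or return
 * false (leaving *out untouched) if the sum does not fit in an int.
 * The comparisons are arranged so that no intermediate result can
 * overflow. */
static bool checked_add(int a, int b, int *out)
{
    assert(out != NULL);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}
```

The tempting alternative, computing a + b first and then testing whether the result "wrapped", is exactly the kind of after-the-fact check a compiler is allowed to delete.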
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

wgarvin wrote:
bob wrote:Where do you look in a program that is working on every other machine, and has for years, and suddenly just says "Abort" and returns to the shell prompt on Mavericks?
My suggestion would be to start it under the debugger and put a breakpoint on the abort function, and look at the stack trace. Of course I may be spoiled by having access to an actual easy-to-use debugger in Visual Studio.

WHAT "abort function?" I have no idea what it is even called in Apple-land. I could think of a few good names. POS comes to mind. :)




mvk wrote:
bob wrote: Please return to the docs of the 80's or 90's and look at the man page for strcpy().

I don't re-read man pages every few weeks...

DESCRIPTION
       The  strcpy()  function  copies the string pointed to by src (including
       the terminating '\0' character) to the array pointed to by  dest.  The
       strings  may not overlap, and the destination string dest must be large
       enough to receive the copy.

[... snip ...]

GNU                               1993-04-11                         STRCPY(3)
K&R 1988 says the same:
K&R wrote:B3. String functions: <string.h>

...

Except for memmove, the behavior is undefined if copying takes place between overlapping objects.
"Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to having demons fly out of your nose."

You have to put aside your knowledge of how strcpy is implemented and consider it rationally. As MVK pointed out, this has been clearly documented as undefined behavior for at least 25 years, just like memcpy. And anybody using memcpy for overlapping copies clearly deserves what they get, because the spec says it's undefined... why should strcpy be different?

Of course it's annoying to find a latent bug like that in code you wrote so long ago, but I don't think it's fair to blame the library vendor in a case like this.

For 25 years the language spec has promised you that your program might break on some platform, if you relied on this behavior. Finally it did! :lol:


And now to play Devil's advocate:
Like almost everyone else, I end up relying on undefined behavior all the time in my own programs too, sometimes even on purpose (!); it's pretty difficult to avoid it completely. For example, as far as I know, null pointers are not guaranteed to be represented by all-zero bits, yet I memset pointer-containing structures to zero just like everybody else does. Signed-integer overflows and pointer-arithmetic overflows are rampant in production code, as are platform-specific assumptions about 2's complement representation and bit-shift operands, theoretically unsafe type puns and other strict-aliasing violations, and many other problems of that sort. Most of the time the programmers writing that code don't understand that it's unsafe; sometimes they do understand all those rules and just don't realize they are writing code that breaks them (and thus has undefined semantics); and in some rare cases they DO realize they are writing it, but it's expedient to do it anyway for some reason, perhaps because they have no other way to make the compiler do what they want. There are apparently almost 200 types of undefined behavior in C99, and I don't want to even guess how many there are in C++.

The world could really use a better-designed (and safer) systems programming language, but it doesn't seem likely anything suitable will ever reach the popularity of C and C++, at least not anytime soon!
My complaint is not that they rewrote the function, or used some new hardware instruction that causes overlapped source/destination to break, but that they simply checked for the overlap (at a measurable cost based on the complaints I have read) and then simply exited the code quietly, printing "Abort" with no other informative text.

This "overlapping" problem has long been known. It was an issue even in PL/1 which I have used extensively. But so long as you avoided the one ugly case of strcpy(a+n, a) (different function in PL/1, but same exact idea/implementation) it worked just fine, and all the docs pointed out where the failure happened.

Breaking it just for the hell of it seems poor. Linus Torvalds raked the glibc folks over the coals when they decided to change memcpy() to copy right-to-left, which broke so many programs it was not funny. I don't see a rational way to do right-to-left on a string, since you have to go left-to-right to find the terminating null, so this seems like a silly change. If they were going to do this, why wouldn't they detect the overlap, and then rather than aborting, call memmove() instead? Now we get a real fix, not broken code.
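
The memmove() fallback suggested above can be sketched in a few lines. This is an illustration only, not what any library actually ships: the name strcpy_overlap is hypothetical, and instead of detecting the overlap first it always uses memmove(), since comparing pointers into unrelated objects with < is itself not well defined in C.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical replacement: behaves like strcpy(), but stays well
 * defined when the two strings overlap, in either direction. */
static char *strcpy_overlap(char *dest, const char *src)
{
    size_t n;

    assert(dest != NULL && src != NULL);
    /* Measure src first, while it is still intact. */
    n = strlen(src) + 1;            /* include the terminating '\0' */
    return memmove(dest, src, n);
}
```

With char a[16] = "abcdef", strcpy_overlap(a, a + 2) should leave "cdef", and the "ugly" case strcpy_overlap(a + 2, a) on a fresh copy of the buffer should leave "ababcdef". The price is one extra pass over the string for strlen().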

I don't want "safer". I suffered through Pascal and similar attempts to prevent the programmer from doing anything wrong, at least semantically. And even the Pascal guys had to eventually provide a C-like "union" facility to let us access the same address with different data types when necessary. If one can write in asm, one can do anything, anyway. Phooey on trying to make the compiler idiot-proof.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

wgarvin wrote:There are surprisingly many programmers out there who write C or C++ code every day, but have never had much exposure to the idea of undefined behavior in these languages, or its consequences. They think of optimizing compilers as something helpful, but to write sound programs it might be more helpful to think of the compiler as your enemy! :lol:

(bob is obviously not one of those newbies, so I couldn't resist poking a bit of fun at him in the previous post. Of course he's indignant that some pointless change by a library implementor broke his decades-old code, and I don't think he would try to defend undefined behavior in most situations; it's a rat's nest of potential problems, and it makes it very difficult to actually write large programs in C or C++ that are secure and future-proof.)

To anyone reading this who hasn't had much exposure to undefined behavior before, I offer the following depressing reading material.

(1) Dangerous Optimizations and the Loss of Causality is a classic presentation about the growing trend of compiler writers exploiting the freedom of undefined behavior. The spec lets them assume you never invoke undefined behavior in your code; as they are increasingly taking advantage of that, more and more "working" old code is put at risk of failing in subtle and unexpected ways. Code that invokes any undefined behavior is basically a potential time bomb that might someday, with the help of an eager compiler writer, decide to blow up your program, or (much worse) introduce subtle security vulnerabilities into it. Every C or C++ programmer ought to know enough about this stuff to avoid falling into the bear pit.

(2) Undefined Behavior: What Happened to My Code? is a nice paper with several frightening examples of the trouble undefined behavior can cause: safety checks optimized out, division-by-zero causing a signal even in code that explicitly checks for zero before the divide, and other horrors.

(3) Understanding Integer Overflow in C/C++ is another eye-opening paper; I'm sure I've linked it here before. The authors surveyed many real-world programs with their dynamic checker and found lots of examples of undefined behavior bugs.
The concept of a compiler that forces you to write bug-free code is, obviously, ridiculous. I don't expect that. But at the very least, if I do something that violates some taboo, and the compiler (or library routine) detects that violation, I would expect some sort of notification that explains the problem. Compilers cheerily inform me of deprecated optimization options. Intel seems to change 'em monthly. But it does tell me, and it doesn't refuse to compile without saying anything, it just points out that this will cease to work at some point in the future. Seems rational to do it that way, rather than this Apple approach which basically caused me to spend a week looking for a problem I assumed was introduced recently, until I realized I could not reproduce the problem anywhere except on my Mac with Mavericks.

If I ever meet the programmer responsible, he will have one bodily orifice the size of my boot. And he may well have to have said boot surgically removed.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

kbhearn wrote:https://fedoraproject.org/wiki/Security ... y/Features

This seems to indicate FORTIFY_SOURCE is enabled by default on Fedora.
All I can say is that it is not on mine. I have several others I can check when I get to the office on Monday, but my office box does not have it enabled (=0). And I did not do anything to cause that myself; I was not even aware it had crept into the library/compiler. I am not sure how the overlap is detected, but it can't be free. I want fast code; I'll take care of debugging it as needed.
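
One way to settle what a given toolchain does by default is to ask the preprocessor. The sketch below assumes only that _FORTIFY_SOURCE, when active, is a macro visible to the translation unit; fortify_level and report_fortify are hypothetical names. Compile it with the same flags as the engine, since with glibc the macro only has an effect when optimization is enabled.

```c
#include <assert.h>
#include <stdio.h>

/* Return the _FORTIFY_SOURCE level this file was compiled with,
 * or 0 if the macro is undefined or defined without a positive value. */
static int fortify_level(void)
{
#if defined(_FORTIFY_SOURCE) && (_FORTIFY_SOURCE + 0) > 0
    return _FORTIFY_SOURCE + 0;
#else
    return 0;
#endif
}

/* Print the level so it can be compared across machines. */
static void report_fortify(FILE *out)
{
    assert(out != NULL);
    fprintf(out, "_FORTIFY_SOURCE level: %d\n", fortify_level());
}
```

Calling report_fortify(stdout) from main() on each box, built once with and once without -O2, should show whether the distribution's compiler turns the checks on by itself.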
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: A note for C programmers

Post by michiguel »

wgarvin wrote:There are surprisingly many programmers out there who write C or C++ code every day, but have never had much exposure to the idea of undefined behavior in these languages, or its consequences. They think of optimizing compilers as something helpful, but to write sound programs it might be more helpful to think of the compiler as your enemy! :lol:

(bob is obviously not one of those newbies, so I couldn't resist poking a bit of fun at him in the previous post. Of course he's indignant that some pointless change by a library implementor broke his decades-old code, and I don't think he would try to defend undefined behavior in most situations; it's a rat's nest of potential problems, and it makes it very difficult to actually write large programs in C or C++ that are secure and future-proof.)

To anyone reading this who hasn't had much exposure to undefined behavior before, I offer the following depressing reading material.

(1) Dangerous Optimizations and the Loss of Causality is a classic presentation about the growing trend of compiler writers exploiting the freedom of undefined behavior. The spec lets them assume you never invoke undefined behavior in your code; as they are increasingly taking advantage of that, more and more "working" old code is put at risk of failing in subtle and unexpected ways. Code that invokes any undefined behavior is basically a potential time bomb that might someday, with the help of an eager compiler writer, decide to blow up your program, or (much worse) introduce subtle security vulnerabilities into it. Every C or C++ programmer ought to know enough about this stuff to avoid falling into the bear pit.

(2) Undefined Behavior: What Happened to My Code? is a nice paper with several frightening examples of the trouble undefined behavior can cause: safety checks optimized out, division-by-zero causing a signal even in code that explicitly checks for zero before the divide, and other horrors.

(3) Understanding Integer Overflow in C/C++ is another eye-opening paper; I'm sure I've linked it here before. The authors surveyed many real-world programs with their dynamic checker and found lots of examples of undefined behavior bugs.
Undefined behavior, when and where?
http://www.youtube.com/watch?v=qpUMYQe6uHY

Miguel