A note for C programmers


bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

Rein Halbersma wrote:
bob wrote:Food for thought. The English language is well-defined.
Think again: http://english.stackexchange.com/questi ... nt-regions

Human languages are anything but well-defined, they are extremely context-sensitive (time, place, person).
BTW, here is another "undefined" activity. A race condition in a parallel program. Shouldn't happen, correct? But EVERYBODY is allowing race conditions when they store in their hash table. I suppose it would be OK for Apple to just crash any program where that happens? Even though we KNOW what we are doing?

That's pretty malicious compiler behavior, IMHO. Just because you shouldn't do something, since it MIGHT cause a problem if you don't know what you are doing, does not mean you shouldn't do it when you do know what to expect.
Henk
Posts: 7216
Joined: Mon May 27, 2013 10:31 am

Re: A note for C programmers

Post by Henk »

bob wrote:
Rein Halbersma wrote:
bob wrote:Food for thought. The English language is well-defined.
Think again: http://english.stackexchange.com/questi ... nt-regions

Human languages are anything but well-defined, they are extremely context-sensitive (time, place, person).
I'll bet you won't find one native English-speaker that would say "undefined" and "unspecified" don't mean the same basic concept unless you want to get down to a minute semantic war based on what is meant by "is" for example.
Undefined means there exists no definition. So there is no description which states what it is. Unspecified means there is nothing specified. So there is nothing stated that makes it specific. So if something is undefined it may still be specified. But if something is unspecified it is undefined as well.

So defined implies specified.
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: A note for C programmers

Post by wgarvin »

bob wrote:
Rein Halbersma wrote:
bob wrote:Food for thought. The English language is well-defined.
Think again: http://english.stackexchange.com/questi ... nt-regions

Human languages are anything but well-defined, they are extremely context-sensitive (time, place, person).
I'll bet you won't find one native English-speaker that would say "undefined" and "unspecified" don't mean the same basic concept unless you want to get down to a minute semantic war based on what is meant by "is" for example.
Okay, we get that you're still annoyed about it, although I'm surprised that two weeks have passed and you still haven't admitted yet that you're completely wrong! :lol:

That old code was invoking undefined behavior. It has been unsafe code for decades, and the fact that it fortuitously happened to do what you expected, on some platforms and some versions of some compilers, is no reason to expect it to keep working in the future. Your insistence that you ought to be able to rely on this code because "it worked before" is actually rather alarming. Programming languages don't work that way.

If you want to write code with one version of one compiler on one platform, then by all means experiment to see what that specific compiler does and then write programs that depend on its quirks. If you want to write portable programs, or future-proof programs, then you have to follow the rules of the language spec, not just the quirks of one compiler. If the spec says "X is undefined behavior" it is basically saying "you're not allowed to do this, anything at all might happen and we accept no responsibility for the consequences".

I stand by my statement that a lot of C and C++ programmers, even ones with many years of daily experience, don't really know how undefined behavior works, and what they don't know can hurt them!


Roger Miller's basketball analogy seems like a pretty good one:
Somebody once told me that in basketball you can't hold the ball and run. I got a basketball and tried it and it worked just fine. He obviously didn't understand basketball.
Q: "But why can't you hold the ball and run in basketball?"
A: "Because it's against the rules" or "because that's not the way the game is played."

[Edit: my post count is evil 666. Maybe I should never post here again!!]
Last edited by wgarvin on Thu Dec 05, 2013 7:24 pm, edited 1 time in total.
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: A note for C programmers

Post by AlvaroBegue »

bob wrote:
AlvaroBegue wrote:I am confused by the statement "Not one single instance of strcpy() will break on the example I gave where you do strcpy(st, st+n)." Isn't this whole thread about an instance where it did break?
It broke because Apple broke it. Apple introduced a check for overlapping source and destination and aborted if it was detected. Code works perfectly.
I consider that a courtesy. Of course it would be better if they could give you a meaningful error message about it. But aborting is better than doing what you expect it to do.

`strcpy(st, st+n)' can certainly do things you don't expect, even if the compiler isn't breaking things on purpose: http://stackoverflow.com/questions/1293 ... mplemented

EDIT: Here's how other people handled the situation: https://lists.gnu.org/archive/html/bug- ... 00014.html
This will be fixed in the next release of [...]
That's how one should react to things like this happening.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

AlvaroBegue wrote:
bob wrote:
AlvaroBegue wrote:I am confused by the statement "Not one single instance of strcpy() will break on the example I gave where you do strcpy(st, st+n)." Isn't this whole thread about an instance where it did break?
It broke because Apple broke it. Apple introduced a check for overlapping source and destination and aborted if it was detected. Code works perfectly.
I consider that a courtesy. Of course it would be better if they could give you a meaningful error message about it. But aborting is better than doing what you expect it to do.

`strcpy(st, st+n)' can certainly do things you don't expect, even if the compiler isn't breaking things on purpose: http://stackoverflow.com/questions/1293 ... mplemented

EDIT: Here's how other people handled the situation: https://lists.gnu.org/archive/html/bug- ... 00014.html
This will be fixed in the next release of [...]
That's how one should react to things like this happening.
You are simply not reading what I have written. Let me try once more. There is ABSOLUTELY no way my use of strcpy() will break.

Here is the basic call:

strcpy(a, a+n);

First, a is a properly formatted string with a terminating null. Guaranteed, because I read the stuff in and add the null in the correct place.

Second, n < strlen(a), which means that a+n (where n is just an integer offset) is guaranteed to point to a character before the terminating null, or the terminating null itself. Hence no buffer overrun is possible.

Finally, since the two strings use the same character array, I know that the array is long enough to hold the original string (it already does, and it has been properly verified to be no longer than the array, as any good programmer would do), which means it is also long enough to hold the right-most end of that same string, since that is shorter than the original.

An example:

char buf[4096];

buf is initialized to "abc def ghi jkl", which is 15 characters plus the terminating null, 16 bytes in all.

strcpy(buf, buf+4);

simply converts that string to "def ghi jkl" with the terminating null still in place.

Before the copy, strlen(buf) == 15;

n=4 (n < 15)

After the copy, strlen(buf) == 11;

There is nothing that can break that so long as it is copied left-to-right. Absolutely nothing. n can never be negative, and in fact can not even be zero in my code.

Again, as I have stated MANY times, the above will never misbehave unless someone decides to copy right to left. But with a null-terminated string that is not likely to happen as it is slower.
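
The left-to-right claim is easy to check for a loop you control. Here is a minimal sketch, assuming a hypothetical helper called shift_left; it is not from the post and is not how any particular strcpy() is implemented.

```c
#include <stddef.h>

/* Hypothetical helper (the name is an assumption, not from the thread).
 * Moves the tail of a string, starting at offset n, to the front of the
 * same buffer by copying one byte at a time, left to right.
 * Precondition: n <= strlen(s).
 * Each source byte is read before its slot can be overwritten, because the
 * write position always trails the read position by n bytes. */
static char *shift_left(char *s, size_t n)
{
    char *dst = s;
    const char *src = s + n;

    /* The assignment in the condition does the copy; the loop stops right
     * after the terminating null has been copied. */
    while ((*dst++ = *src++) != '\0') {
        /* nothing else to do per byte */
    }
    return s;
}
```

Calling shift_left(buf, 4) on "abc def ghi jkl" leaves "def ghi jkl" in buf. The guarantee comes from the loop being ordinary C on a single array. It does not carry over to the library strcpy(), whose specification leaves copies between overlapping objects undefined.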

In the case of Apple, they had to do two bad things: (1) search for the null and then use that to determine whether the strings overlap, which slows things down unnecessarily; and (2) simply print "Abort" with no explanation or anything.

Undefined behavior is not always bad. I know what I am doing when I use integer overflow. I know what I am doing when I choose to allow a parallel race condition. I don't need the compiler to jump in and break the code intentionally. If Apple had simply broken race conditions rather than strcpy(), more people would be complaining, because everyone deals with races on the hash stores, as opposed to locking before writing, which is a performance killer with no significant gain.

The link you gave is a classic example of what I consider to be bad behavior. How many things did they break just to make some sort of pedantic stand that "undefined behavior" means crashing is as acceptable as doing the right thing?

Another point that neither you nor anyone else has addressed...

Once you know the strings overlap, why not just call memmove() from within strcpy()? In fact, since they are looking specifically to see if they overlap, why not just change strcpy to this:

memmove(st1, st2, strlen(st2)+1);

and be done with it? Now nothing is broken, the undefined behavior is gone, everybody is happy, no bugs get reported. How is that worse than just changing well-established behavior to become "crash"????
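
The memmove() idea can also be used from the caller's side, without waiting for any library to change. This is a minimal sketch; the wrapper name strcpy_overlap_ok is made up for illustration and is not an existing library function.

```c
#include <string.h>

/* Hypothetical wrapper (name is an assumption). Copies the string at src,
 * including its terminating null, to dst. Overlap in either direction is
 * fine because memmove() is specified to behave as if the bytes were first
 * copied into a temporary buffer. The length is measured before any byte
 * is moved. */
static char *strcpy_overlap_ok(char *dst, const char *src)
{
    return (char *)memmove(dst, src, strlen(src) + 1);
}
```

With this, strcpy_overlap_ok(buf, buf + 4) turns "abc def ghi jkl" into "def ghi jkl", and the opposite direction, strcpy_overlap_ok(buf + 2, buf), is equally well-defined as long as the array has room for the result.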
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

wgarvin wrote:
bob wrote:
Rein Halbersma wrote:
bob wrote:Food for thought. The English language is well-defined.
Think again: http://english.stackexchange.com/questi ... nt-regions

Human languages are anything but well-defined, they are extremely context-sensitive (time, place, person).
I'll bet you won't find one native English-speaker that would say "undefined" and "unspecified" don't mean the same basic concept unless you want to get down to a minute semantic war based on what is meant by "is" for example.
Okay, we get that you're still annoyed about it, although I'm surprised that two weeks have passed and you still haven't admitted yet that you're completely wrong! :lol:

That old code was invoking undefined behavior. It has been unsafe code for decades, and the fact that it fortuitously happened to do what you expected, on some platforms and some versions of some compilers, is no reason to expect it to keep working in the future. Your insistence that you ought to be able to rely on this code because "it worked before" is actually rather alarming. Programming languages don't work that way.

If you want to write code with one version of one compiler on one platform, then by all means experiment to see what that specific compiler does and then write programs that depend on its quirks. If you want to write portable programs, or future-proof programs, then you have to follow the rules of the language spec, not just the quirks of one compiler. If the spec says "X is undefined behavior" it is basically saying "you're not allowed to do this, anything at all might happen and we accept no responsibility for the consequences".

I stand by my statement that a lot of C and C++ programmers, even ones with many years of daily experience, don't really know how undefined behavior works, and what they don't know can hurt them!


Roger Miller's basketball analogy seems like a pretty good one:
Somebody once told me that in basketball you can't hold the ball and run. I got a basketball and tried it and it worked just fine. He obviously didn't understand basketball.
Q: "But why can't you hold the ball and run in basketball?"
A: "Because it's against the rules" or "because that's not the way the game is played."

[Edit: my post count is evil 666. Maybe I should never post here again!!]
Couple of points.

1. I know what undefined means. I've programmed in C long enough to even remember why it was originally classified that way. Hint: my code is not an example. It was recognized to be an issue when the overlap is in the other direction, namely strcpy(a+n, a); I get that. I got that when I wrote my own string functions for Fortran.

2. I don't advocate (in general) using undefined behavior. However most all chess programmers do, in the case of hardware races on memory stores to the hash table. You can find those old discussions that have fired up multiple times here. You can eliminate the race, at a HUGE cost in performance. Or you can ignore the race, since the actual races to store at the same memory address are not that common. And you can protect yourself from the potential "undefined results" with a little clever programming (lockless hash as just one example).

3. This has not been about whether my overlapping strcpy() was right, wrong or indifferent. It has been about Apple capriciously breaking it. And it didn't just break my code. Older versions of many pieces of software now crash until they are fixed. Even the ubiquitous bash shell started to crash under Mavericks. Apple had three choices:

(a) do nothing, which is what has been done for at least the 35 or so years I have used C.

(b) actually fix it by mapping it to memmove() which works correctly for overlapping source/destination.

(c) capriciously cause it to crash every time it is detected, breaking a ton of existing software.

I claim (c) is the worst possible scenario they could have followed. It should not have been done. If they had to do ANYTHING, why not (b) and actually fix it so that it works whether the source/destination overlap or not? How, exactly, would that be worse than just breaking all programs that use it? Fix 'em.
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: A note for C programmers

Post by Rein Halbersma »

bob wrote:
Rein Halbersma wrote:
bob wrote:Food for thought. The English language is well-defined.
Think again: http://english.stackexchange.com/questi ... nt-regions

Human languages are anything but well-defined, they are extremely context-sensitive (time, place, person).
I'll bet you won't find one native English-speaker that would say "undefined" and "unspecified" don't mean the same basic concept unless you want to get down to a minute semantic war based on what is meant by "is" for example.
Unfortunately, computer programs are not (yet) written in English. You write in C, and C happens to define those terms differently (and, as Miguel explained, more precisely according to their etymological roots) than colloquial speech does. The Standard is what is relevant here, not your dictionary.
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: A note for C programmers

Post by AlvaroBegue »

bob wrote:You are simply not reading what I have written. Let me try once more. There is ABSOLUTELY no way my use of strcpy() will break.
I think you are the one not reading. I posted this link: http://stackoverflow.com/questions/1293 ... mplemented

It's not a long thread. In it, someone actually gets his code to behave inappropriately when changing compilers, even though all the same guarantees you list are satisfied, and the reason for the code breaking seems to be an optimization the compiler did, assuming the buffers don't overlap.
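
The link above is truncated, so what follows is not the code from that thread. It is a made-up illustration of the general point: a routine that copies correctly whenever the buffers do not overlap, and so would be a legitimate way to implement strcpy(), can still mangle a call like strcpy(a, a+n). The name copy_right_to_left is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical copy routine (name is an assumption). It measures the source
 * first and then copies from the last byte back to the first. For buffers
 * that do not overlap, the result is identical to a left-to-right copy. */
static char *copy_right_to_left(char *dst, const char *src)
{
    size_t i = strlen(src) + 1;   /* bytes to copy, including the null */

    while (i-- > 0) {
        dst[i] = src[i];          /* highest index first */
    }
    return dst;
}
```

Run on a buffer holding "abc def ghi jkl" with src = buf + 4, this overwrites source bytes before they have been read, and the buffer ends up holding "jkl" instead of "def ghi jkl". Nothing in the routine is wrong for the inputs strcpy() is specified to accept; only the overlapping call exposes the difference.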
Last edited by AlvaroBegue on Thu Dec 05, 2013 8:41 pm, edited 1 time in total.
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: A note for C programmers

Post by Rein Halbersma »

bob wrote:
Rein Halbersma wrote:
bob wrote:Food for thought. The English language is well-defined.
Think again: http://english.stackexchange.com/questi ... nt-regions

Human languages are anything but well-defined, they are extremely context-sensitive (time, place, person).
BTW, here is another "undefined" activity. A race condition in a parallel program. Shouldn't happen, correct? But EVERYBODY is allowing race conditions when they store in their hash table. I suppose it would be OK for Apple to just crash any program where that happens? Even though we KNOW what we are doing?

That's pretty malicious compiler behavior, IMHO. Just because you shouldn't do something, since it MIGHT cause a problem if you don't know what you are doing, does not mean you shouldn't do it when you do know what to expect.
Glad you brought that up. Yes, your classic hash XOR paper, for all its beauty, exploits undefined behavior. And no, you DON'T know what you are doing unless you write in assembly and bypass the C compiler for that piece of code. Expect a crash report any time soon.
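
For readers who want the XOR-validated hash entry without the data race, here is a sketch assuming a C11 compiler that provides <stdatomic.h>. The names HashEntry, hash_store and hash_probe are invented for illustration, and this is not the code from the paper. Declaring both words _Atomic and using relaxed loads and stores means concurrent access is no longer a data race in the C standard's sense, while the XOR check still rejects an entry whose two halves came from different writers.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical two-word hash entry: the key is stored XORed with the data. */
typedef struct {
    _Atomic uint64_t key_xor_data;
    _Atomic uint64_t data;
} HashEntry;

/* Store without a lock. Another thread may interleave its own store between
 * these two writes; hash_probe() detects the resulting mixed entry. */
static void hash_store(HashEntry *e, uint64_t key, uint64_t data)
{
    atomic_store_explicit(&e->key_xor_data, key ^ data, memory_order_relaxed);
    atomic_store_explicit(&e->data, data, memory_order_relaxed);
}

/* Returns true and fills *data_out only if the two words are consistent
 * with each other and with the probing key. */
static bool hash_probe(HashEntry *e, uint64_t key, uint64_t *data_out)
{
    uint64_t kx = atomic_load_explicit(&e->key_xor_data, memory_order_relaxed);
    uint64_t d  = atomic_load_explicit(&e->data, memory_order_relaxed);

    if ((kx ^ d) != key) {
        return false;             /* different position, or a torn entry */
    }
    *data_out = d;
    return true;
}
```

On common 64-bit targets, relaxed loads and stores of this size usually compile to ordinary moves, though that is worth confirming for your own compiler and platform.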
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: A note for C programmers

Post by Rein Halbersma »

bob wrote: Undefined behavior is not always bad. I know what I am doing when I use integer overflow. I know what I am doing when I choose to allow a parallel race condition. I don't need the compiler to jump in and break the code intentionally.
Driving 150mph on the highway is not always bad. Michael Schumacher knows what he is doing when speeding. He doesn't need highway police to jump in and break his ride intentionally. Yet, the same traffic laws apply to him as to you and me.

So again: if you want to use a particular implementation of a library function, write it yourself; if you want a particular implementation of integer overflow, write it in assembly. Don't get mad when your compiler vendor does not provide it for you.
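
For wraparound specifically, assembly is not the only route: the C standard defines unsigned arithmetic to wrap modulo 2^N, so the wrapping can be done in unsigned types and the bits carried back. A minimal sketch, with hypothetical helper names:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 32-bit addition that wraps modulo 2^32.
 * Unsigned overflow is well-defined in C, so no undefined behavior here. */
static uint32_t add_wrap_u32(uint32_t a, uint32_t b)
{
    return a + b;
}

/* Hypothetical helper: two's-complement wrapping addition for int32_t.
 * The sum is formed in unsigned arithmetic, then the bit pattern is copied
 * into the signed result. int32_t is required to be two's complement with
 * no padding bits, so the memcpy yields the wrapped value. */
static int32_t add_wrap_i32(int32_t a, int32_t b)
{
    uint32_t sum = (uint32_t)a + (uint32_t)b;
    int32_t result;

    memcpy(&result, &sum, sizeof result);
    return result;
}
```

Here add_wrap_i32(INT32_MAX, 1) yields INT32_MIN by construction, whereas the plain expression INT32_MAX + 1 on int operands is undefined behavior.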