A note for C programmers

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

michiguel wrote:
bob wrote:Food for thought. The English language is well-defined. Here's an online thesaurus lookup for unspecified.

Synonyms for unspecified
adj not specified

* undefined
* undetermined
* general
* unmentioned
* vague

Pretty sad when a standards committee begins to redefine commonly used words to mean something other than their usual meanings...
The language of the C standard is perfect. Specify and define do not have identical meanings, and they used them correctly. Specify involves explicit enumeration; define probably involves an explanation and a description of the limits.

Miguel
You will have to run that one by me again. I'm a native English speaker, and I consider the two terms to be almost identical.

Definitions:

specify:

identify clearly and definitely.

define:

state or describe exactly the nature, scope, or meaning of.

Their choice of terms is anything BUT "reasonable and accepted use". The definition of specify says nothing at all about "explicit enumeration".
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

mvk wrote:Here is a quote:
"If you fail to follow normal programming practices, and it blows up when you run the thing, that's hardly something to whack the vendor about. After all you _can_ put your foot under a running lawnmower, but should you do so, you don't have much reason to complain about the result. This is the same kind of thing. You _can_ do some things, but the question is _should_ you do them and if you do, who is responsible?"
Apples and oranges.

This had been done for years. It has worked for as long as I have programmed in C, which is probably at LEAST as long as anyone here. Why should they SUDDENLY break it?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

Rein Halbersma wrote:
bob wrote: I didn't "think" I understood strcpy(). I DID understand strcpy(). I STILL understand strcpy(). take a look at the source. Apple made an arrogant and ill-conceived decision that has brought them a lot of flak from many different directions, just as the glibc guys did with memcpy().
The C Standard is a precise technical specification (and not a dictionary where similar but slightly different terms can be loosely used as exact synonyms) on how to translate and implement a list of programming syntax constructs for an abstract machine into concrete assembly instructions for a concrete machine. Your particular machine and your particular library implementation might do things (signed integer overflow, memcpy, strcpy) in a perfectly well-defined and well-specified way, but that still leaves the compiler leeway in translating your C code to either those or to some other concrete machine instructions.

If you really feel that strongly about it, the proper way is to submit a technical proposal to the C Standard Committee on how to limit the particular piece of undefined or unspecified behavior you find disagreeable. I bet there will be others with a different environment that benefit from the current compiler freedom, and such a proposal would be unlikely to get accepted. The other alternative is to program those offending pieces in assembly for your particular machine.
The argument has already been heard in the case of memcpy(). Torvalds suggested, reasonably, that memcpy() be equated with memmove() and the problem (and the minor hitch in the behavior) completely disappears. No code broken, everything works perfectly, overlapping or not. Same could have been done for strcpy(). Again, no code broken, and an "undefined" operation could be COMPLETELY eliminated. Fix it, or ignore it and do the best you can; don't break it for the hell of it.
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: A note for C programmers

Post by Rein Halbersma »

bob wrote:Food for thought. The English language is well-defined.
Think again: http://english.stackexchange.com/questi ... nt-regions

Human languages are anything but well-defined, they are extremely context-sensitive (time, place, person).
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: A note for C programmers

Post by syzygy »

bob wrote:
AlvaroBegue wrote:The standard defines "unspecified behavior", not "unspecified". Similarly with "undefined behavior". You may not like these names, but they are used every day by a lot of people.

"Undefined behavior" means it works when you first run it, it passes all your tests and it explodes in your face when you show it to your boss or your most important customer. :)

Although you are generally right that people shouldn't go around breaking code that has worked forever, I recommend you just fix the code and move on.
Undefined behavior does NOT mean it will explode. Not one single instance of strcpy() will break on the example I gave where you do strcpy(st, st+n). Integer overflow does NOT make checksum break.
It means it MAY explode. The standard allows it. Compiler writers are making more and more use of this freedom. It makes programs faster.

#include <stdio.h>

int main()
{
  int i, k = 0;

  for (i = 1; i > 0; i += i)
    k++;
  printf("k = %d\n", k);
  return 0;
}
What do you expect as output?
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: A note for C programmers

Post by michiguel »

bob wrote:
michiguel wrote:
bob wrote:Food for thought. The English language is well-defined. Here's an online thesaurus lookup for unspecified.

Synonyms for unspecified
adj not specified

* undefined
* undetermined
* general
* unmentioned
* vague

Pretty sad when a standards committee begins to redefine commonly used words to mean something other than their usual meanings...
The language of the C standard is perfect. Specify and define do not have identical meanings, and they used them correctly. Specify involves explicit enumeration; define probably involves an explanation and a description of the limits.

Miguel
You will have to run that one by me again. I'm a native English speaker, and I consider the two terms to be almost identical.
It is not English, it is Latin...

The standard uses the words adhering to the strict spirit of their meaning, not what an abridged thesaurus will say. Still, in the definitions you quote, it is clear that they have two distinct meanings. The key is in the words "identify" and "describe".

define (v.)
late 14c., "to specify; to end," from Old French defenir "to end, terminate, determine," and directly from Latin definire "to limit, determine, explain," from de- "completely" (see de-) + finire "to bound, limit," from finis "boundary, end" (see finish (n.)). Related: Defined; defining.
http://www.etymonline.com/index.php?all ... hmode=none

specify (v.)
early 14c., "to speak;" mid-14c. "to name explicitly," from Old French specifier, especefier (13c.) and directly from Late Latin specificare "mention particularly," from specificus (see specific). Related: Specified; specifying.
http://www.etymonline.com/index.php?all ... hmode=none

Do you really use the words specify and define in the same way?

Miguel
PS: Now this is getting really off-topic and I am out. You can do whatever you want with your code, but I hope no beginner will delve into "undefined behavior" areas of the standard.

Definitions:

specify:

identify clearly and definitely.

define:

state or describe exactly the nature, scope, or meaning of.

Their choice of terms is anything BUT "reasonable and accepted use". The definition of specify says nothing at all about "explicit enumeration".
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: A note for C programmers

Post by AlvaroBegue »

bob wrote:
AlvaroBegue wrote:The standard defines "unspecified behavior", not "unspecified". Similarly with "undefined behavior". You may not like these names, but they are used every day by a lot of people.

"Undefined behavior" means it works when you first run it, it passes all your tests and it explodes in your face when you show it to your boss or your most important customer. :)

Although you are generally right that people shouldn't go around breaking code that has worked forever, I recommend you just fix the code and move on.
Undefined behavior does NOT mean it will explode. Not one single instance of strcpy() will break on the example I gave where you do strcpy(st, st+n). Integer overflow does NOT make checksum break.
I am confused by the statement "Not one single instance of strcpy() will break on the example I gave where you do strcpy(st, st+n)." Isn't this whole thread about an instance where it did break?

It is possible that those "does NOT" in your post are correct, but you can't be certain that there isn't a programming environment out there for which they don't hold, and you can't replace them with "will NOT".

The performance of string manipulation is not critical for the types of programs I write most of the time, so I use std::string and generally do things at a pretty high level. This way my code expresses my intent better and I stay out of a lot of the trouble that comes from byte-by-byte manipulation of C-style strings (ease of introducing bugs and buffer-overflow vulnerabilities... and now this new flavor of trouble you just stumbled upon).

Signed integer overflow is allowed to throw an exception, or the compiler could perform some optimization assuming it will not happen. Of course this optimization might happen in one part of the code but not in another, and you would end up with inconsistent checksums.

Unsigned integer overflow, on the other hand, is guaranteed to do exactly what you expect it to do, so use unsigned types for your checksums and you'll be fine.

All you are doing here is showing that your knowledge of programming is getting obsolete. The constructive thing to do is to drop the "get off my lawn" attitude and learn a couple of things. It will actually make you a better programmer.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

syzygy wrote:
bob wrote:
AlvaroBegue wrote:The standard defines "unspecified behavior", not "unspecified". Similarly with "undefined behavior". You may not like these names, but they are used every day by a lot of people.

"Undefined behavior" means it works when you first run it, it passes all your tests and it explodes in your face when you show it to your boss or your most important customer. :)

Although you are generally right that people shouldn't go around breaking code that has worked forever, I recommend you just fix the code and move on.
Undefined behavior does NOT mean it will explode. Not one single instance of strcpy() will break on the example I gave where you do strcpy(st, st+n). Integer overflow does NOT make checksum break.
It means it MAY explode. The standard allows it. Compiler writers are making more and more use of this freedom. It makes programs faster.

#include <stdio.h>

int main()
{
  int i, k = 0;

  for (i = 1; i > 0; i += i)
    k++;
  printf("k = %d\n", k);
  return 0;
}
What do you expect as output?
Back to my point: Apple MADE it explode. If the source and destination overlap, the program crashes, period.

As far as the above, it would depend on word size. On my mac, where int defaults to 32 bits, I'd expect 31. On a Cray, I would expect 63. On an old 8086, I would expect 7.

If the compiler can't deal with the overflow/underflow, it is broken.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

AlvaroBegue wrote:
bob wrote:
AlvaroBegue wrote:The standard defines "unspecified behavior", not "unspecified". Similarly with "undefined behavior". You may not like these names, but they are used every day by a lot of people.

"Undefined behavior" means it works when you first run it, it passes all your tests and it explodes in your face when you show it to your boss or your most important customer. :)

Although you are generally right that people shouldn't go around breaking code that has worked forever, I recommend you just fix the code and move on.
Undefined behavior does NOT mean it will explode. Not one single instance of strcpy() will break on the example I gave where you do strcpy(st, st+n). Integer overflow does NOT make checksum break.
I am confused by the statement "Not one single instance of strcpy() will break on the example I gave where you do strcpy(st, st+n)." Isn't this whole thread about an instance where it did break?
It broke because Apple broke it. Apple introduced a check for overlapping source and destination and aborted if it was detected. Code works perfectly.


It is possible that those "does NOT" in your post are correct, but you can't be certain that there isn't a programming environment out there for which they don't hold, and you can't replace them with "will NOT".
I base that on experience. Crafty runs on every C compiler / operating system combo on the planet. And it has always been able to create an opening book with no problems, which is the code that uses the above strcpy(). That's what I base it on. When Crafty doesn't work on some platform, I hear about it quickly and fix it.

The performance of string manipulation is not critical for the types of programs I write most of the time, so I use std::string and generally do things at a pretty high level. This way my code expresses my intent better and I stay out of a lot of the trouble that comes from byte-by-byte manipulation of C-style strings (ease of introducing bugs and buffer-overflow vulnerabilities... and now this new flavor of trouble you just stumbled upon).
As the saying goes, "strings are strings". They are handled byte-by-byte regardless of what you do. In a language like (say) PL/1, where you do string assignments directly, you can do the same thing I did in PL/1 using the substr() function. Or forget strings and just copy one array to another, where the two arrays overlap. Etc.


Signed integer overflow is allowed to throw an exception, or the compiler could perform some optimization assuming it will not happen. Of course this optimization might happen in one part of the code but not in another, and you would end up with inconsistent checksums.

Unsigned integer overflow, on the other hand, is guaranteed to do exactly what you expect it to do, so use unsigned types for your checksums and you'll be fine.

All you are doing here is showing that your knowledge of programming is getting obsolete. The constructive thing to do is to drop the "get off my lawn" attitude and learn a couple of things. It will actually make you a better programmer.
I do well enough programming, thank you. My point is that Apple broke something INTENTIONALLY that works everywhere else, on every other compiler and library. That's a bit ridiculous. Or capricious.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

Rein Halbersma wrote:
bob wrote:Food for thought. The English language is well-defined.
Think again: http://english.stackexchange.com/questi ... nt-regions

Human languages are anything but well-defined, they are extremely context-sensitive (time, place, person).
I'll bet you won't find one native English-speaker that would say "undefined" and "unspecified" don't mean the same basic concept unless you want to get down to a minute semantic war based on what is meant by "is" for example.