A note for C programmers

bnemias
Posts: 373
Joined: Thu Aug 14, 2008 3:21 am
Location: Albuquerque, NM

Re: A note for C programmers

Post by bnemias »

mvk wrote:
michiguel wrote:In this particular case, rather than trusting an undefined behavior, would it not be better to write a dedicated left2right_strcpy(), which should be one or two lines of code?
That, or memmove(), which is designed to move regions within the same buffer and handles overlaps correctly.
Exactly. Most strcpy() docs explicitly mention that overlapped string behavior is undefined, and redirect you to memmove() in that case.
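For reference, a minimal sketch of the memmove() approach for the "copy the end of a string to the beginning" case. The helper name drop_prefix is made up for illustration and is not taken from any program discussed in this thread.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: drop the first n characters of the string in buf
 * by moving the remainder (and its terminating '\0') to the front.
 * Source and destination overlap, so memmove() is used, not strcpy(). */
void drop_prefix(char *buf, size_t n)
{
    assert(buf != NULL);
    size_t len = strlen(buf);
    if (n > len)
        n = len;                        /* never read past the terminator */
    memmove(buf, buf + n, len - n + 1); /* +1 copies the '\0' as well */
}
```

Called as drop_prefix(line, 3) on a buffer holding "e4 e5 Nf3", this leaves "e5 Nf3" in the same buffer without relying on the copy direction of any library routine.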
bnemias
Posts: 373
Joined: Thu Aug 14, 2008 3:21 am
Location: Albuquerque, NM

Re: A note for C programmers

Post by bnemias »

bob wrote:
bnemias wrote:
bob wrote:
bnemias wrote:
bob wrote:Changing the library to make it suddenly not work seems foolish when there is absolutely ZERO reason to do so.
Because there is reason to do it. It catches bugs. Perhaps not in your code, but any code that relies on the implementation of a subroutine is ripe for bugs. I don't have any idea if that is in fact the reason they made the change or not. But when you write code that depends on the implementation of a subroutine, you open yourself up to breakage whenever that subroutine changes-- whatever the reason. There may be important reasons why that subroutine must change, and to hold back a library just because some people make bad assumptions about the library is absurd.

It's writing solid code 101. Do not write code that depends on the implementation of the libraries you use.

Hm, so when I do a sort, is it safe or not? When I do a read() does it stop at my byte count, or stop at the number of bytes left in the file, or does it read more than I intended? Can I use this procedure inside a thread or can I not?

I depend on the behavior of functions all over the place.
Behavior != implementation. You should not rely on how the functions you use do what they do. You should rely on them to work according to the docs, but assume nothing about how they are implemented.
Please return to the docs of the 80's or 90's and look at the man page for strcpy().

I don't re-read man pages every few weeks...
I'm sorry, that's not a rational argument. You take advantage of undefined behavior, you pay the price when the library changes. That's just how it is.
mvk
Posts: 589
Joined: Tue Jun 04, 2013 10:15 pm

Re: A note for C programmers

Post by mvk »

bob wrote: Please return to the docs of the 80's or 90's and look at the man page for strcpy().

I don't re-read man pages every few weeks...

Code: Select all

DESCRIPTION
       The  strcpy()  function	copies the string pointed to by src (including
       the terminating '\0' character) to the array pointed to by  dest.  The
       strings	may not overlap, and the destination string dest must be large
       enough to receive the copy.

[... snip ...]

GNU				  1993-04-11			     STRCPY(3)
K&R 1988 says the same:
K&R wrote:B3. String functions: <string.h>

...

Except for memmove, the behavior is undefined if copying takes place between overlapping objects.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

bnemias wrote:
bob wrote:
bnemias wrote:
bob wrote:
bnemias wrote:
bob wrote:Changing the library to make it suddenly not work seems foolish when there is absolutely ZERO reason to do so.
Because there is reason to do it. It catches bugs. Perhaps not in your code, but any code that relies on the implementation of a subroutine is ripe for bugs. I don't have any idea if that is in fact the reason they made the change or not. But when you write code that depends on the implementation of a subroutine, you open yourself up to breakage whenever that subroutine changes-- whatever the reason. There may be important reasons why that subroutine must change, and to hold back a library just because some people make bad assumptions about the library is absurd.

It's writing solid code 101. Do not write code that depends on the implementation of the libraries you use.

Hm, so when I do a sort, is it safe or not? When I do a read() does it stop at my byte count, or stop at the number of bytes left in the file, or does it read more than I intended? Can I use this procedure inside a thread or can I not?

I depend on the behavior of functions all over the place.
Behavior != implementation. You should not rely on how the functions you use do what they do. You should rely on them to work according to the docs, but assume nothing about how they are implemented.
Please return to the docs of the 80's or 90's and look at the man page for strcpy().

I don't re-read man pages every few weeks...
I'm sorry, that's not a rational argument. You take advantage of undefined behavior, you pay the price when the library changes. That's just how it is.
It wasn't "undefined" when I started using C.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

mvk wrote:
bob wrote: Please return to the docs of the 80's or 90's and look at the man page for strcpy().

I don't re-read man pages every few weeks...

Code: Select all

DESCRIPTION
       The  strcpy()  function	copies the string pointed to by src (including
       the terminating '\0' character) to the array pointed to by  dest.  The
       strings	may not overlap, and the destination string dest must be large
       enough to receive the copy.

[... snip ...]

GNU				  1993-04-11			     STRCPY(3)
K&R 1988 says the same:
K&R wrote:B3. String functions: <string.h>

...

Except for memmove, the behavior is undefined if copying takes place between overlapping objects.
Here is the key point.

"Undefined" does NOT imply "the lib guys are suddenly going to start checking for overlapping addresses and simply print 'Abort' if it is detected." I consider that beyond lousy software development. I consider "undefined" to be a hack. It should work. Or it should not work. Period. Either is acceptable. Not a sudden change, when the hardware has not changed one bit.
bnemias
Posts: 373
Joined: Thu Aug 14, 2008 3:21 am
Location: Albuquerque, NM

Re: A note for C programmers

Post by bnemias »

bob wrote:
mvk wrote:
bob wrote: Please return to the docs of the 80's or 90's and look at the man page for strcpy().

I don't re-read man pages every few weeks...

Code: Select all

DESCRIPTION
       The  strcpy()  function	copies the string pointed to by src (including
       the terminating '\0' character) to the array pointed to by  dest.  The
       strings	may not overlap, and the destination string dest must be large
       enough to receive the copy.

[... snip ...]

GNU				  1993-04-11			     STRCPY(3)
K&R 1988 says the same:
K&R wrote:B3. String functions: <string.h>

...

Except for memmove, the behavior is undefined if copying takes place between overlapping objects.
Here is the key point.

"Undefined" does NOT imply "the lib guys are suddenly going to start checking for overlapping addresses and simply print 'Abort' if it is detected." I consider that beyond lousy software development.
The irony here is that this is exactly what you do in libraries to detect bugs in the calling code. And you consider that "beyond lousy."
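To make the idea concrete, here is a rough sketch of what such a check could look like. It is not the code of any real C library; checked_strcpy and ranges_overlap are hypothetical names, and the message goes to stderr only as an assumption about what a friendlier diagnostic might be.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the byte ranges [a, a+na) and [b, b+nb) overlap, else 0.
 * Addresses are compared as integers, which assumes a flat address space
 * (true on mainstream desktop platforms, not guaranteed by the standard). */
int ranges_overlap(const void *a, size_t na, const void *b, size_t nb)
{
    uintptr_t pa = (uintptr_t)a, pb = (uintptr_t)b;
    return pa < pb + nb && pb < pa + na;
}

/* Hypothetical checked copy: names the problem on stderr, then aborts. */
char *checked_strcpy(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t n = strlen(src) + 1;         /* bytes copied, including '\0' */
    if (ranges_overlap(dst, n, src, n)) {
        fprintf(stderr, "checked_strcpy: source and destination overlap\n");
        abort();
    }
    return memcpy(dst, src, n);
}
```

The check costs one strlen() plus two comparisons per call, and it turns a silent dependency on copy order into an immediate, reproducible failure at the offending call site.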
bnemias
Posts: 373
Joined: Thu Aug 14, 2008 3:21 am
Location: Albuquerque, NM

Re: A note for C programmers

Post by bnemias »

bob wrote:It wasn't "undefined" when I started using C.
Most people, when they see this, realize their code is assuming something it shouldn't. Perhaps, like you, they weren't aware of the overlapping restriction. Generally they just fix their code without making a big stink about it. Problem solved.

Perhaps this change is the precursor to a real implementation change on the horizon. And this was their way of preparing you for that change so you could easily track the problem.
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: A note for C programmers

Post by michiguel »

bob wrote:
michiguel wrote:
bob wrote:
mvk wrote:
lucasart wrote:
bob wrote: What on earth is so bad about copying the end of a string to the beginning?
It's an O(N^2) parsing algorithm, instead of O(N) for normal parsing. Plus it's ugly and relies on an undefined strcpy() behaviour, apparently (although I would never have guessed that strcpy() could be implemented right to left).
bob wrote: As far as "a complete beginner" I am quite a bit beyond that...
I do not doubt it. All you have to do is confess that you wrote a piece of newbie-like code 40 years ago :wink:
At least Crafty is safe from PGN viruses now.

I love the claim that ReadPGN is supposedly two decades older than PGN, and the implication that Fortran has the equivalent of strcpy. All distracting from the bug, which was Crafty's alone. No, shoot the messenger instead.
There is absolutely ZERO danger here to any virus or security threat. None. Nada. That discussion is pointless.

I am copying within a buffer. One that was filled with a safe read with the proper byte count. Impossible to overflow or overwrite...

1. SOME of us saved games in the 70's. Not exactly PGN, but the part that parses the list of moves is exactly the same as was always done, sorry. Blitz/Cray Blitz saved everything in such a file. Time controls, time for each move, each move itself, etc. Just because there was no PGN standard doesn't mean there was no way to save a complete game and use it later for whatever purpose... In other words, a lack of knowledge on your part doesn't constitute an exaggeration on my part. COKO IV had the ability to save a game in a machine-readable form and then go back through it on demand, as did Greenblatt's program...

2. Fortran had NO strings when my program was started. See standard Fortran-66.

I'm not shooting any messenger at all. I happen to agree with Torvalds that if something works today, it should work tomorrow unless there is a good reason to break it. On Apple Mavericks, all you see is "Abort". No hint about what the problem was, no nothing. Running under the debugger shows absolutely nothing either. Nice software design.

It is impossible to make a language idiot-proof, trying to do so is a waste of time for the compiler people / library people, an irritation to the programmers smart enough to know what they are doing, and it STILL does not prevent a stupid programmer from doing stupid things that cause buffer overruns and such.
There are two completely separate issues.
1) the decision to change the inner workings of a library, knowing that it could break old, non-maintained but still used, buggy programs.
2) the decision to deliberately write _today_ non-standard code with undefined behavior, just because it worked before.

#1 is pedantic and arrogant, #2 is an extremely bad idea.

So, thanks for warning about #1, but that does not make #2 a reasonable option, no matter how we look at it.

In this particular case, rather than trusting an undefined behavior, would it not be better to write a dedicated left2right_strcpy(), which should be one or two lines of code?

Miguel
Why would I write a left-to-right string copy when I have had one forever in the C library? :)
That is the point, you never had one in strcpy.

Miguel

I suppose one could be safe and NEVER use library calls, but it sure makes the code grow larger.
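A sketch of the dedicated left2right_strcpy() suggested above, for anyone who wants the copy direction guaranteed by their own code rather than assumed of the library. Only the function name comes from the thread; the body is one plausible way to write it.

```c
#include <assert.h>
#include <stddef.h>

/* Copies src to dst strictly left to right, one byte at a time.
 * Because the order is fixed by this code, the result is well defined
 * when dst points before src in the same buffer (moving the tail of a
 * string to the front). It is NOT safe when dst points into the source
 * string after src: the terminator would be overwritten before it is read. */
char *left2right_strcpy(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    char *ret = dst;
    while ((*dst++ = *src++) != '\0')
        ;   /* copy up to and including the terminator */
    return ret;
}
```

With a buffer holding "1. e4 e5", the call left2right_strcpy(buf, buf + 3) leaves "e4 e5" in buf. A plain byte loop like this may be slower than an optimized library copy, which is the usual trade for a guaranteed order.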
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

michiguel wrote:
bob wrote:
michiguel wrote:
bob wrote:
mvk wrote:
lucasart wrote:
bob wrote: What on earth is so bad about copying the end of a string to the beginning?
It's an O(N^2) parsing algorithm, instead of O(N) for normal parsing. Plus it's ugly and relies on an undefined strcpy() behaviour, apparently (although I would never have guessed that strcpy() could be implemented right to left).
bob wrote: As far as "a complete beginner" I am quite a bit beyond that...
I do not doubt it. All you have to do is confess that you wrote a piece of newbie-like code 40 years ago :wink:
At least Crafty is safe from PGN viruses now.

I love the claim that ReadPGN is supposedly two decades older than PGN, and the implication that Fortran has the equivalent of strcpy. All distracting from the bug, which was Crafty's alone. No, shoot the messenger instead.
There is absolutely ZERO danger here to any virus or security threat. None. Nada. That discussion is pointless.

I am copying within a buffer. One that was filled with a safe read with the proper byte count. Impossible to overflow or overwrite...

1. SOME of us saved games in the 70's. Not exactly PGN, but the part that parses the list of moves is exactly the same as was always done, sorry. Blitz/Cray Blitz saved everything in such a file. Time controls, time for each move, each move itself, etc. Just because there was no PGN standard doesn't mean there was no way to save a complete game and use it later for whatever purpose... In other words, a lack of knowledge on your part doesn't constitute an exaggeration on my part. COKO IV had the ability to save a game in a machine-readable form and then go back through it on demand, as did Greenblatt's program...

2. Fortran had NO strings when my program was started. See standard Fortran-66.

I'm not shooting any messenger at all. I happen to agree with Torvalds that if something works today, it should work tomorrow unless there is a good reason to break it. On Apple Mavericks, all you see is "Abort". No hint about what the problem was, no nothing. Running under the debugger shows absolutely nothing either. Nice software design.

It is impossible to make a language idiot-proof, trying to do so is a waste of time for the compiler people / library people, an irritation to the programmers smart enough to know what they are doing, and it STILL does not prevent a stupid programmer from doing stupid things that cause buffer overruns and such.
There are two completely separate issues.
1) the decision to change the inner workings of a library, knowing that it could break old, non-maintained but still used, buggy programs.
2) the decision to deliberately write _today_ non-standard code with undefined behavior, just because it worked before.

#1 is pedantic and arrogant, #2 is an extremely bad idea.

So, thanks for warning about #1, but that does not make #2 a reasonable option, no matter how we look at it.

In this particular case, rather than trusting an undefined behavior, would it not be better to write a dedicated left2right_strcpy(), which should be one or two lines of code?

Miguel
Why would I write a left-to-right string copy when I have had one forever in the C library? :)
That is the point, you never had one in strcpy.

Miguel
Actually I did. I LOOKED at strcpy(). My original C manual gave the C source for many of the library functions. It is not exactly "big". There's no sensible way to do strcpy OTHER than left-to-right since strings are defined left-to-right in C. What is NOT sensible is breaking it in a stupid way so that you don't even bother informing the user what went wrong.



I suppose one could be safe and NEVER use library calls, but it sure makes the code grow larger.
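On the O(N^2) versus O(N) point quoted above: shifting the rest of the buffer to the front after every token costs a copy proportional to what remains, while advancing a cursor does not copy the input at all. The following is only an illustration of the cursor style; next_token is a hypothetical name and this is not Crafty's ReadPGN.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical tokenizer: copies the next whitespace-delimited token from
 * *cursor into tok (at most toksz-1 characters, always terminated) and
 * advances *cursor past it. Returns 1 if a token was found, 0 at end of
 * input. The input buffer is never shifted, so a full pass is O(N). */
int next_token(const char **cursor, char *tok, size_t toksz)
{
    assert(cursor != NULL && *cursor != NULL && tok != NULL && toksz > 0);
    const char *p = *cursor;
    size_t n = 0;

    while (*p != '\0' && isspace((unsigned char)*p))
        p++;                            /* skip leading whitespace */
    if (*p == '\0') {
        *cursor = p;
        tok[0] = '\0';
        return 0;
    }
    while (*p != '\0' && !isspace((unsigned char)*p)) {
        if (n + 1 < toksz)
            tok[n++] = *p;              /* silently truncate long tokens */
        p++;
    }
    tok[n] = '\0';
    *cursor = p;
    return 1;
}
```

A caller keeps a const char *cur pointing into the line and loops with while (next_token(&cur, tok, sizeof tok)), so no overlapping copy is ever needed.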
mvk
Posts: 589
Joined: Tue Jun 04, 2013 10:15 pm

Re: A note for C programmers

Post by mvk »

bob wrote:
mvk wrote:
bob wrote:On Apple Mavericks, all you see is "Abort". No hint about what the problem was, no nothing. Running under the debugger shows absolutely nothing either. Nice software design.
There is a message logged to the system log, as was mentioned before. Just open the Console application and read "crafty: detected source and destination buffer overlap". A nice readable crash report is also saved in ~/Library/Logs/DiagnosticReports/, containing, among other things, a stack trace.
Fine. I am SURE every C programmer on the planet is aware of such. Makes a lot more sense than displaying an error message right out of the library like "Abort(source and destination strings overlap)". Let's put it in a non-obvious place and see if the programmer can find it.
The take-home lesson, also nice for your students, is that one can become more productive by spending some time learning the tools one uses. In this case, syslog is a good place to look when a process has died unexpectedly.