A note for C programmers

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

bnemias wrote:
syzygy wrote:This type of "catching" bugs I do not consider a good reason.
I do, at least from a library user's perspective-- I want to remove all such invalid assumptions from my code. But yeah, from a library maintainer's perspective, I would opt not to make changes solely for that reason.

Not that this is why the change was made. Bob asserts it was done for no good reason. Somehow, I'm unwilling to believe that until I see the actual changelog.
I don't know that you will see it. This issue is not in current GNU glibc releases, at least none that we have, including the most recent Fedora versions. But even worse, Mavericks gives you this when you run Crafty and do the following:

host% crafty
read pgnfile
Abort
host%

Does THAT seem like a reasonable thing to do? No hint as to what is wrong...

Here's an actual run with the bug, for fun...

scrappy% crafty
unable to open book file [./book.bin].
book is disabled
unable to open book file [./books.bin].

Crafty v23.6 (1 cpus)

White(1): read /users/hyatt/crafty/db.pgn
Abort
scrappy%


How's that for "being safe and helpful."

Reminds me of the 1970s-era Xerox UTS operating system where, if you typed any command incorrectly, it would simply say:

eh?

They eventually improved it to say

eh? (at 29)

where 29 was the character position where parsing broke down.

Giant step backward, IMHO. Which convinced several here to remove OS X and install Linux. I am now considering that myself, since this is the second issue I have had (clang a year ago would not compile Crafty correctly; every other version of gcc/icc/msvc/you-name-it compiled it perfectly). Looks like Apple is using Microsoft as a role model.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

Rein Halbersma wrote:
bob wrote:I remembered a sharp put-down by Linus Torvalds a year or two back on this subject with the glibc folks. They decided to copy right to left, for no good reason. And as he pointed out, this broke tons of programs that had used the right-to-left copy assumption. They replied, "but the man page says don't do it, so what's the problem?" He answered "because you are breaking thousands of programs that have used it for years. What's the advantage in the change? It is no faster to copy right to left than left to right (just set the direction flag). With no advantage to change, why change? just because you can doesn't cut it." Apparently they didn't listen.
http://sourceware.org/bugzilla/show_bug.cgi?id=12518

The underlying reason for the breakage is that memcpy's behavior for overlapping ranges was changed somewhere around glibc 2.13. The left-to-right vs. right-to-left behavior was also undefined for memcpy. The direction actually makes a performance difference depending on the processor architecture, and the glibc folks took advantage of that freedom, breaking a lot of code in the process. Linus argued that they should have used memmove with a check for overlap.
Note this is "strcpy()", not "memcpy()". Different issue. Screwing around with strcpy() for whatever reason seems more "iffy".
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: A note for C programmers

Post by michiguel »

bob wrote:
mvk wrote:
lucasart wrote:
bob wrote: What on earth is so bad about copying the end of a string to the beginning?
It's a O(N^2) parsing algorithm, instead of O(N) for normal parsing. Plus it's ugly and relies on an undefined strcpy() behaviour, apparently (although I would never have guessed that strcpy() could be implemented right to left).
bob wrote: As far as "a complete beginner" I am quite a bit beyond that...
I do not doubt it. All you have to do is confess that you wrote a piece of newbie-like code 40 years ago :wink:
At least Crafty is safe from PGN viruses now.

I love the claim that ReadPGN is supposedly two decades older than PGN, and the implication that Fortran has the equivalent of strcpy. All distracting from the bug, which was Crafty's alone. No, shoot the messenger instead.
There is absolutely ZERO danger here to any virus or security threat. None. Nada. That discussion is pointless.

I am copying within a buffer. One that was filled with a safe read with the proper byte count. Impossible to overflow or overwrite...

1. SOME of us saved games in the 70's. Not exactly PGN, but the part that parses the list of moves is exactly the same as was always done, sorry. Blitz/Cray Blitz saved everything in such a file. Time controls, time for each move, each move itself, etc. Just because there was no PGN standard doesn't mean there was no way to save a complete game and use it later for whatever purpose... In other words, a lack of knowledge on your part doesn't constitute an exaggeration on my part. COKO IV had the ability to save a game in a machine-readable form and then go back through it on demand, as did Greenblatt's program...

2. Fortran had NO strings when my program was started. See standard Fortran-66.

I'm not shooting any messenger at all. I happen to agree with Torvalds that if something works today, it should work tomorrow unless there is a good reason to break it. On Apple Mavericks, all you see is "Abort". No hint about what the problem was, no nothing. Running under the debugger shows absolutely nothing either. Nice software design.

It is impossible to make a language idiot-proof. Trying to do so is a waste of time for the compiler people / library people, an irritation to the programmers smart enough to know what they are doing, and it STILL does not prevent a stupid programmer from doing stupid things that cause buffer overruns and such.
There are two completely separate issues.
1) the decision to change the inner workings of a library, knowing that it could break old, non-maintained but still used, buggy programs.
2) the decision to deliberately write _today_ non-standard code with undefined behavior, just because it worked before.

#1 is pedantic and arrogant, #2 is an extremely bad idea.

So, thanks for warning about #1, but that does not make #2 a reasonable option, no matter how we look at it.

In this particular case, rather than trusting undefined behavior, would it not be better to write a dedicated left2right_strcpy(), which should be one or two lines of code?
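
For illustration, a minimal sketch of such a helper might look like the following. Only the name left2right_strcpy comes from the suggestion above; the body is an assumption, not code taken from Crafty or from any C library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy the NUL-terminated string at src to dst one
 * byte at a time, lowest address first. Because the copy order is spelled
 * out here, the result is well defined when dst points earlier in the same
 * buffer than src. It is NOT safe when dst lies inside the source string
 * at a higher address than src. */
char *left2right_strcpy(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    char *out = dst;
    while ((*out++ = *src++) != '\0') {
        /* with dst < src, every write lands on a byte already read */
    }
    return dst;
}
```

With the copy order fixed in the program's own source, shifting the unparsed tail of a buffer to its start no longer depends on how the library's strcpy() happens to be implemented.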

Miguel
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: A note for C programmers

Post by mcostalba »

bob wrote: Why would you rewrite code that parses PGN? Is that going to make the engine stronger?
Because I enjoy doing it... however odd that may seem to you.
mvk
Posts: 589
Joined: Tue Jun 04, 2013 10:15 pm

Re: A note for C programmers

Post by mvk »

bob wrote:On Apple Mavericks, all you see is "Abort". No hint about what the problem was, no nothing. Running under the debugger shows absolutely nothing either. Nice software design.
There is a message logged to the system log, as was mentioned before. Just open the Console application and read "crafty: detected source and destination buffer overlap". And then a nice readable crash report is saved in ~/Library/Logs/DiagnosticReports/, containing, among other things, a stack trace.
mvk
Posts: 589
Joined: Tue Jun 04, 2013 10:15 pm

Re: A note for C programmers

Post by mvk »

michiguel wrote: In this particular case, rather than trusting undefined behavior, would it not be better to write a dedicated left2right_strcpy(), which should be one or two lines of code?
That, or memmove(), which is designed to move regions within the same buffer and handles overlaps correctly.
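
A minimal sketch of the memmove() approach, assuming the goal is to discard an already-parsed prefix of a NUL-terminated buffer. The helper name drop_prefix is hypothetical and used only for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: discard the first `consumed` bytes of the
 * NUL-terminated string in buf by sliding the remainder to the front.
 * memmove() is specified to work for overlapping regions. */
void drop_prefix(char *buf, size_t consumed)
{
    size_t len = strlen(buf);
    assert(consumed <= len);
    /* +1 moves the terminating NUL along with the remaining text */
    memmove(buf, buf + consumed, len - consumed + 1);
}
```

The length has to be computed explicitly because memmove() works on byte counts, not on NUL-terminated strings.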
bnemias
Posts: 373
Joined: Thu Aug 14, 2008 3:21 am
Location: Albuquerque, NM

Re: A note for C programmers

Post by bnemias »

bob wrote:
bnemias wrote:
bob wrote:Changing the library to make it suddenly not work seems foolish when there is absolutely ZERO reason to do so.
Because there is reason to do it. It catches bugs. Perhaps not in your code, but any code that relies on the implementation of a subroutine is ripe for bugs. I don't have any idea if that is in fact the reason they made the change or not. But when you write code that depends on the implementation of a subroutine, you open yourself up to breakage whenever that subroutine changes-- whatever the reason. There may be important reasons why that subroutine must change, and to hold back a library just because some people make bad assumptions about the library is absurd.

It's writing solid code 101. Do not write code that depends on the implementation of the libraries you use.

Hm, so when I do a sort, is it safe or not? When I do a read() does it stop at my byte count, or stop at the number of bytes left in the file, or does it read more than I intended? Can I use this procedure inside a thread or can I not?

I depend on the behavior of functions all over the place.
Behavior != implementation. You should not rely on how the functions you use do what they do. You should rely on them to work according to the docs, but assume nothing about how they are implemented.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

michiguel wrote:
bob wrote:
mvk wrote:
lucasart wrote:
bob wrote: What on earth is so bad about copying the end of a string to the beginning?
It's a O(N^2) parsing algorithm, instead of O(N) for normal parsing. Plus it's ugly and relies on an undefined strcpy() behaviour, apparently (although I would never have guessed that strcpy() could be implemented right to left).
bob wrote: As far as "a complete beginner" I am quite a bit beyond that...
I do not doubt it. All you have to do is confess that you wrote a piece of newbie-like code 40 years ago :wink:
At least Crafty is safe from PGN viruses now.

I love the claim that ReadPGN is supposedly two decades older than PGN, and the implication that Fortran has the equivalent of strcpy. All distracting from the bug, which was Crafty's alone. No, shoot the messenger instead.
There is absolutely ZERO danger here to any virus or security threat. None. Nada. That discussion is pointless.

I am copying within a buffer. One that was filled with a safe read with the proper byte count. Impossible to overflow or overwrite...

1. SOME of us saved games in the 70's. Not exactly PGN, but the part that parses the list of moves is exactly the same as was always done, sorry. Blitz/Cray Blitz saved everything in such a file. Time controls, time for each move, each move itself, etc. Just because there was no PGN standard doesn't mean there was no way to save a complete game and use it later for whatever purpose... In other words, a lack of knowledge on your part doesn't constitute an exaggeration on my part. COKO IV had the ability to save a game in a machine-readable form and then go back through it on demand, as did Greenblatt's program...

2. Fortran had NO strings when my program was started. See standard Fortran-66.

I'm not shooting any messenger at all. I happen to agree with Torvalds that if something works today, it should work tomorrow unless there is a good reason to break it. On Apple Mavericks, all you see is "Abort". No hint about what the problem was, no nothing. Running under the debugger shows absolutely nothing either. Nice software design.

It is impossible to make a language idiot-proof. Trying to do so is a waste of time for the compiler people / library people, an irritation to the programmers smart enough to know what they are doing, and it STILL does not prevent a stupid programmer from doing stupid things that cause buffer overruns and such.
There are two completely separate issues.
1) the decision to change the inner workings of a library, knowing that it could break old, non-maintained but still used, buggy programs.
2) the decision to deliberately write _today_ non-standard code with undefined behavior, just because it worked before.

#1 is pedantic and arrogant, #2 is an extremely bad idea.

So, thanks for warning about #1, but that does not make #2 a reasonable option, no matter how we look at it.

In this particular case, rather than trusting undefined behavior, would it not be better to write a dedicated left2right_strcpy(), which should be one or two lines of code?

Miguel
Why would I write a left-to-right string copy when I have had one forever in the C library? :)

I suppose one could be safe and NEVER use library calls, but it sure makes the code grow larger.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

mvk wrote:
bob wrote:On Apple Mavericks, all you see is "Abort". No hint about what the problem was, no nothing. Running under the debugger shows absolutely nothing either. Nice software design.
There is a message logged to the system log, as was mentioned before. Just open the Console application and read "crafty: detected source and destination buffer overlap". And then a nice readable crash report is saved in ~/Library/Logs/DiagnosticReports/, containing, among other things, a stack trace.
Fine. I am SURE every C programmer on the planet is aware of such. Makes a lot more sense than displaying an error message right out of the library like "Abort(source and destination strings overlap)". Let's put it in a non-obvious place and see if the programmer can find it.
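
As a rough sketch of what such an in-band diagnostic could look like: the helpers regions_overlap() and checked_strcpy() below are hypothetical (neither is a real libc function, and this is not how Apple's library is implemented). The wrapper checks for overlap and prints the reason to stderr before aborting.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if [a, a+n) and [b, b+n) share any byte, else 0. Addresses are
 * compared as integers because relational comparison of pointers into
 * different objects is not defined in C. */
int regions_overlap(const void *a, const void *b, size_t n)
{
    uintptr_t pa = (uintptr_t)a, pb = (uintptr_t)b;
    return pa < pb ? pb - pa < n : pa - pb < n;
}

/* Hypothetical wrapper: behaves like strcpy() for disjoint strings, and
 * reports the reason on stderr before aborting when they overlap. */
char *checked_strcpy(char *dst, const char *src)
{
    size_t n = strlen(src) + 1; /* bytes strcpy() would write */
    if (regions_overlap(dst, src, n)) {
        fprintf(stderr, "Abort(source and destination strings overlap)\n");
        abort();
    }
    return strcpy(dst, src);
}
```

The check costs one extra strlen() per call, which is presumably part of why a library might log elsewhere or skip the message entirely.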
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

bnemias wrote:
bob wrote:
bnemias wrote:
bob wrote:Changing the library to make it suddenly not work seems foolish when there is absolutely ZERO reason to do so.
Because there is reason to do it. It catches bugs. Perhaps not in your code, but any code that relies on the implementation of a subroutine is ripe for bugs. I don't have any idea if that is in fact the reason they made the change or not. But when you write code that depends on the implementation of a subroutine, you open yourself up to breakage whenever that subroutine changes-- whatever the reason. There may be important reasons why that subroutine must change, and to hold back a library just because some people make bad assumptions about the library is absurd.

It's writing solid code 101. Do not write code that depends on the implementation of the libraries you use.

Hm, so when I do a sort, is it safe or not? When I do a read() does it stop at my byte count, or stop at the number of bytes left in the file, or does it read more than I intended? Can I use this procedure inside a thread or can I not?

I depend on the behavior of functions all over the place.
Behavior != implementation. You should not rely on how the functions you use do what they do. You should rely on them to work according to the docs, but assume nothing about how they are implemented.
Please return to the docs of the 80's or 90's and look at the man page for strcpy().

I don't re-read man pages every few weeks...