A note for C programmers

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: A note for C programmers

Post by Sven »

mvk wrote:I love the claim that ReadPGN is supposedly two decades older than PGN
Actually Crafty has had a ReadPGN() function since version 15.17 which appeared around 1998. The code snippet discussed above did not change since then but at least ReadPGN() is not older than approx. 15 years. Whether this kind of overlapping strcpy() stuff was already present in CrayBlitz is unknown to me but if it was then most probably it was not used for PGN parsing since the PGN standard was created in 1994 and CrayBlitz played its last tournament around that time.

Already since 15.17 ReadPGN() has also one other place where overlapping copy is used but there it is solved correctly (I am ignoring efficiency aspects here) through a temp buffer, so there must have been some awareness of the "undefined behaviour" issue back then already ... The problematic code snippet (actually there are two in ReadPGN()) even starts with a redundant "strcpy(temp, input_buffer)" statement where the temp buffer is not used afterwards, so it looks as if the correct implementation was planned but somehow did not make it into the release.

Sven
bnemias
Posts: 373
Joined: Thu Aug 14, 2008 3:21 am
Location: Albuquerque, NM

Re: A note for C programmers

Post by bnemias »

bob wrote:Changing the library to make it suddenly not work seems foolish when there is absolutely ZERO reason to do so.
Because there is reason to do it. It catches bugs. Perhaps not in your code, but any code that relies on the implementation of a subroutine is ripe for bugs. I don't have any idea if that is in fact the reason they made the change or not. But when you write code that depends on the implementation of a subroutine, you open yourself up to breakage whenever that subroutine changes-- whatever the reason. There may be important reasons why that subroutine must change, and to hold back a library just because some people make bad assumptions about the library is absurd.

It's writing solid code 101. Do not write code that depends on the implementation of the libraries you use.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: A note for C programmers

Post by syzygy »

bnemias wrote:
bob wrote:Changing the library to make it suddenly not work seems foolish when there is absolutely ZERO reason to do so.
Because there is reason to do it. It catches bugs. Perhaps not in your code, (...)
It only "catches" bugs in the sense that it breaks existing code like it broke crafty. This type of "catching" bugs I do not consider a good reason.

A good reason to break very reasonable assumptions would be if this allowed a clearly more efficient implementation. I doubt that is the case here.
bnemias
Posts: 373
Joined: Thu Aug 14, 2008 3:21 am
Location: Albuquerque, NM

Re: A note for C programmers

Post by bnemias »

syzygy wrote:This type of "catching" bugs I do not consider a good reason.
I do, at least from a library user's perspective-- I want to remove all such invalid assumptions from my code. But yeah, from a library maintainer's perspective, I would opt not to make changes solely for that reason.

Not that this is why the change was made. Bob asserts it was done for no good reason. Somehow, I'm unwilling to believe that until I see the actual changelog.
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: A note for C programmers

Post by Rein Halbersma »

bob wrote:I remembered a sharp put-down by Linus Torvalds a year or two back on this subject with the glibc folks. They decided to copy right to left, for no good reason. And as he pointed out, this broke tons of programs that had used the right-to-left copy assumption. They replied, "but the man page says don't do it, so what's the problem?" He answered "because you are breaking thousands of programs that have used it for years. What's the advantage in the change? It is no faster to copy right to left than left to right (just set the direction flag). With no advantage to change, why change? just because you can doesn't cut it." Apparently they didn't listen.
http://sourceware.org/bugzilla/show_bug.cgi?id=12518

The underlying reason for breakage is that memcpy for overlapping ranges was changed somewhere around glibc 2.13. The left-to-right vs right-to-left behavior was also undefined for memcpy. The direction actually makes a performance difference depending on the processor architecture, and the glibc folks actually took advantage of that freedom, but breaking a lot of code in the processs. Linus argued that they should have used memmove with a check for overlap.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: A note for C programmers

Post by syzygy »

bnemias wrote:
syzygy wrote:This type of "catching" bugs I do not consider a good reason.
I do, at least from a library user's perspective-- I want to remove all such invalid assumptions from my code. But yeah, from a library maintainer's perspective, I would opt not to make changes solely for that reason.
I am pretty sure it is usually the other way around. Some maintainers knowingly break reasonable assumptions essentially just because they can. They feel morally superior and consider that the inferiors should be punished and scared away. In the process, they break applications such as Adobe's flashplayer and give (for example) Linux a bad name.

The fact is that bugs are made, not only by inferior programmers but also by first class programmers. If there is a very good reason to break existing buggy programs then so be it, but imho what is at most a microoptimisation in strcpy() shouldn't be allowed to break existing binaries.

Some more information:
https://bugzilla.redhat.com/show_bug.cgi?id=638477
http://sourceware.org/bugzilla/show_bug.cgi?id=12518
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

mcostalba wrote:
lucasart wrote: I do not doubt it. All you have to do is confess that you wrote a piece of newbie-like code 40 years ago :wink:
What is amazing to me is that there is 40 years old code still never touched. The fact that 'it works perfectly' is not a reason for me (perhaps I am a bit odd, but many software developers are odd too, they like to write software), I have fully rewritten many times entire parts of code even though it was working perfectly. Software is a live thing. If you don't maintain it, it slowly dies...even if it works perfectly.
Why would you rewrite code that parses PGN? Is that going to make the engine stronger?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

mvk wrote:
lucasart wrote:
bob wrote: What on earth is so bad about copying the end of a string to the beginning?
It's a O(N^2) parsing algorithm, instead of O(N) for normal parsing. Plus it's ugly and relies on an undefined strcpy() behaviour, apparently (although I would never have guessed that strcpy() could be implemented right to left).
bob wrote: As far as "a complete beginner" I am quite a bit beyond that...
I do not doubt it. All you have to do is confess that you wrote a piece of newbie-like code 40 years ago :wink:
At least Crafty is safe from PGN viruses now.

I love the claim that ReadPGN is supposedly two decades older than PGN, and the implication that Fortran has the equivalent of strcpy. All distracting from the bug, which was Crafty's alone. No, shoot the messenger instead.
There is absolutely ZERO danger here to any virus or security threat. None. Nada. That discussion is pointless.

I am copying within a buffer. One that was filled with a safe read with the proper byte count. Impossible to overflow or overwrite...

1. SOME of us saved games in the 70's. Not exactly PGN, but the part that parses the list of moves is exactly the same as was always done, sorry. Blitz/Cray Blitz saved everything in such a file. Time controls, time for each move, each move itself, etc. Just because there was no PGN standard doesn't mean there was no way to save a complete game and use it later for whatever purpose... In other words, a lack of knowledge on your part doesn't constitute an exaggeration on my part. COKO IV had the ability to save a game in a machine-readable form and then go back through it on demand, as did Greenblatt's program...

2. Fortran had NO strings when my program was started. See standard Fortran-66.

I'm not shooting any messenger at all. I happen to agree with Torvalds that if something works today, it should work tomorrow unless there is a good reason to break it. On Apple Mavericks, all you see is "Abort". No hint about what the problem was, no nothing. Running under the debugger shows absolutely nothing either. Nice software design.

It is impossible to make a language idiot-proof, trying to do so is a waste of time for the compiler people / library people, an irritation to the programmers smart enough to know what they are doing, and it STILL does not prevent a stupid programmer from doing stupid things that cause buffer overruns and such.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

Sven Schüle wrote:
mvk wrote:I love the claim that ReadPGN is supposedly two decades older than PGN
Actually Crafty has had a ReadPGN() function since version 15.17 which appeared around 1998. The code snippet discussed above did not change since then but at least ReadPGN() is not older than approx. 15 years. Whether this kind of overlapping strcpy() stuff was already present in CrayBlitz is unknown to me but if it was then most probably it was not used for PGN parsing since the PGN standard was created in 1994 and CrayBlitz played its last tournament around that time.

Already since 15.17 ReadPGN() has also one other place where overlapping copy is used but there it is solved correctly (I am ignoring efficiency aspects here) through a temp buffer, so there must have been some awareness of the "undefined behaviour" issue back then already ... The problematic code snippet (actually there are two in ReadPGN()) even starts with a redundant "strcpy(temp, input_buffer)" statement where the temp buffer is not used afterwards, so it looks as if the correct implementation was planned but somehow did not make it into the release.

Sven
ReadPGN was present for a lot longer than that, just not in a separate function. Crafty's opening book has ALWAYS been created by reading a PGN file. I elected to parse this way because unix does not have "record boundaries" formally, particularly when using the normal read() system call. This means you can read N bytes from a file containing chess moves, and break a move in half if it falls at the end of the bytes you read. My quick-and-dirty approach was to use a large buffer, and as it neared empty, read another block and concatenate. Simple enough and it worked almost forever. And still does except for the Mac OS X Mavericks change.

I won't begin to speculate why the bug was in one place and not another. It was written so long ago I can't begin to recall.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

bnemias wrote:
bob wrote:Changing the library to make it suddenly not work seems foolish when there is absolutely ZERO reason to do so.
Because there is reason to do it. It catches bugs. Perhaps not in your code, but any code that relies on the implementation of a subroutine is ripe for bugs. I don't have any idea if that is in fact the reason they made the change or not. But when you write code that depends on the implementation of a subroutine, you open yourself up to breakage whenever that subroutine changes-- whatever the reason. There may be important reasons why that subroutine must change, and to hold back a library just because some people make bad assumptions about the library is absurd.

It's writing solid code 101. Do not write code that depends on the implementation of the libraries you use.

Hm, so when I do a sort, is it safe or not? When I do a read() does it stop at my byte count, or stop at the number of bytes left in the file, or does it read more than I intended? Can I use this procedure inside a thread or can I not?

I depend on the behavior of functions all over the place.