A note for C programmers

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

A note for C programmers

Post by bob »

Just discovered this bug in Crafty, which worked on gcc compiler versions thru 4.7.2, but when I moved to 4.7.3 on my macbook, boom.

this is the issue:

char buffer[256] = "abcdef";

strcpy (buffer, buffer+2);

Has always worked in the past, even though the man page says "two strings should not overlap." After tracking it down, I remembered a sharp put-down by Linus Torvalds a year or two back on this subject with the glibc folks. They decided to copy right to left, for no good reason. And as he pointed out, this broke tons of programs that had used the right-to-left copy assumption. They replied, "but the man page says don't do it, so what's the problem?" He answered "because you are breaking thousands of programs that have used it for years. What's the advantage in the change? It is no faster to copy right to left than left to right (just set the direction flag). With no advantage to change, why change? just because you can doesn't cut it." Apparently they didn't listen.

I only presume this is really a glibc problem on Mavericks that was not there on Mountain Lion, although on Mountain Lion I did install gcc 4.7.2, which mac ports now avoids in favor of 4.7.3.

In any case, beware.
User avatar
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: A note for C programmers

Post by lucasart »

Are you sure it's not memcpy? strcpy has to copy left to right, becuase the copy only stops once a null terminator is found.

Also, what's the practical application of doing such an overlapping copy? I'm a bit curious, because I never had to do that in DiscoCheck (all versions up to 3.7.1 were written in pure C). Seems like hacky code that you should have been designed another way.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

lucasart wrote:Are you sure it's not memcpy? strcpy has to copy left to right, becuase the copy only stops once a null terminator is found.

Also, what's the practical application of doing such an overlapping copy? I'm a bit curious, because I never had to do that in DiscoCheck (all versions up to 3.7.1 were written in pure C). Seems like hacky code that you should have been designed another way.
Absolutely certain.

The code is in ReadPGN() in utility.c...

The failure was here:

skip = strstr(input_buffer, buffer) + strlen(buffer);
strcpy(input_buffer, skip);

I had fixed this in another place a long while back, but about 20 lines below that fix, the above appeared.

input buffer looks like this: (without the quotes)

"e4 e5"

the idea is to parse the e4 and remove it from the buffer, and continue until buffer has nothing left, then read more in and continue...

Works on my linux box, with gcc or intel, works on our cluster compilers/libraries, seems to work everywhere up through Mountain Lion on my macbook, but Mavericks breaks it solidly.

Fix is to copy to temp buffer, then from temp buffer back to main buffer...
User avatar
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: A note for C programmers

Post by lucasart »

bob wrote:
lucasart wrote:Are you sure it's not memcpy? strcpy has to copy left to right, becuase the copy only stops once a null terminator is found.

Also, what's the practical application of doing such an overlapping copy? I'm a bit curious, because I never had to do that in DiscoCheck (all versions up to 3.7.1 were written in pure C). Seems like hacky code that you should have been designed another way.
Absolutely certain.

The code is in ReadPGN() in utility.c...

The failure was here:

skip = strstr(input_buffer, buffer) + strlen(buffer);
strcpy(input_buffer, skip);

I had fixed this in another place a long while back, but about 20 lines below that fix, the above appeared.

input buffer looks like this: (without the quotes)

"e4 e5"

the idea is to parse the e4 and remove it from the buffer, and continue until buffer has nothing left, then read more in and continue...
So what you want to do is to tokenize a string? Why don't you use strtok? It would be cleaner IMO, and avoid the strcpy completely.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: A note for C programmers

Post by mar »

lucasart wrote:Are you sure it's not memcpy? strcpy has to copy left to right, becuase the copy only stops once a null terminator is found.
I wonder too, how can you strcpy from right? That would require two passes (slower).
OTOH why use hacks like that? It's ugly and slow to strcpy within the buffer. A lexer should tokenize input, copying input is a very bad idea.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

lucasart wrote:
bob wrote:
lucasart wrote:Are you sure it's not memcpy? strcpy has to copy left to right, becuase the copy only stops once a null terminator is found.

Also, what's the practical application of doing such an overlapping copy? I'm a bit curious, because I never had to do that in DiscoCheck (all versions up to 3.7.1 were written in pure C). Seems like hacky code that you should have been designed another way.
Absolutely certain.

The code is in ReadPGN() in utility.c...

The failure was here:

skip = strstr(input_buffer, buffer) + strlen(buffer);
strcpy(input_buffer, skip);

I had fixed this in another place a long while back, but about 20 lines below that fix, the above appeared.

input buffer looks like this: (without the quotes)

"e4 e5"

the idea is to parse the e4 and remove it from the buffer, and continue until buffer has nothing left, then read more in and continue...
So what you want to do is to tokenize a string? Why don't you use strtok? It would be cleaner IMO, and avoid the strcpy completely.
No, I simply want to remove the first "token" from the string, leaving the rest intact (and un-parsed because I am not sure I have a complete token at the end of the string. Code is extremely old, so I was unaware of this overlap at all. This same code parses the pgn moves, the tags, the result, etc. I can not tell whether I have read everything in until I finish processing whatever I get. I've always used strtok() to parse, but when you have different types of data in the same buffer, it becomes more complex.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

mar wrote:
lucasart wrote:Are you sure it's not memcpy? strcpy has to copy left to right, becuase the copy only stops once a null terminator is found.
I wonder too, how can you strcpy from right? That would require two passes (slower).
OTOH why use hacks like that? It's ugly and slow to strcpy within the buffer. A lexer should tokenize input, copying input is a very bad idea.
I would not venture a guess without looking. All I can say for certain is that the above breaks on Mavericks but nowhere else. Removing the overlap and all is fine.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: A note for C programmers

Post by syzygy »

Some examples:
http://demin.ws/blog/english/2011/07/14 ... d-strings/

It seems many implementations first determine the length of the string, then copy part of it in chunks of 8 or 4 bytes. Once you know the length, you can also do this from right to left.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: A note for C programmers

Post by syzygy »

syzygy wrote:It seems many implementations first determine the length of the string
This might not be what is happening. It seems glibc provides an optimised implementation if the string length is known at compile time.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

syzygy wrote:
syzygy wrote:It seems many implementations first determine the length of the string
This might not be what is happening. It seems glibc provides an optimised implementation if the string length is known at compile time.
It's moot anyway. :) Already fixed. I am not sure what apple uses for their glibc. All my other systems don't have this problem, so it is either something very recent in glibc, or else it is an apple "addition".