strcpy() revisited

Discussion of chess software programming and technical issues.

Moderator: Ras

wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: strcpy() revisited

Post by wgarvin »

hgm wrote:Except that the programs that really suffered buffer overflow because of this had already been fixed long before. After all, doing a strcpy(a, b) as while(*a++ = *b++); can only end in two ways: either it works perfectly, or a lies within the string b, and it will lead to an infinite repetitive string, certainly causing a segfault.

No buffer overflow would ever be caught by this measure that would not have segfaulted by itself.

It only makes a difference for the harmless cases, that worked absolutely correctly.
This post by Dann Corbit in the openchess thread of 2 weeks ago seems to me to be evidence that overlapping calls to strcpy can malfunction without necessarily causing a segfault.


GCC gave me this:

dcorbit@dcorbit /q/cc
$ cat bozo.c
#include <string.h>
#include <stdio.h>

int main(int argc, char* argv[]) {
char b[32];
strcpy(b, "123456789012345");
strcpy(b + 1, b);
printf("[%s]\n", b);
return 0;
}

dcorbit@dcorbit /q/cc
$ gcc -Wall -ansi -pedantic bozo.c

dcorbit@dcorbit /q/cc
$ ./a
[1123456788012345]

Look at it carefully, is it what you expected?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: strcpy() revisited

Post by bob »

wgarvin wrote:
hgm wrote:
bob wrote:You drive a hybrid? I watched one crash and EXPLODE a few months back. No survivors. Shoot, crashing is "undefined behavior" right? No reason to "do the right thing"...
Well, that guy was asking for it. He had not fastened his seatbelt. The specs of the car did not mention what would happen if you turned the ignition key without first fastening your seatbelt. Driving without seatbelt is illegal, right? So of course they programmed the car to explode when someone attempts that. That is just a safety measure, to increase the safety of everyone. He might have hurt himself, driving without seatbelt, and we would not want that, would we? If you ask for undefined behavior, you should get it! :lol:
I may have a better car analogy to undefined behavior.

Relying on undefined behavior is like running back and forth across a busy highway, dodging through the cars. "That's illegal, don't do that", say the traffic police. "We have crosswalks for a reason. You're going to get hit by a car. Don't say we didn't warn you."

In that scenario, Bob's reply would be something like "I've been doing this every day since 1968, and I've never been hit by a car before. Any car that actually decides to hit me is stupid, since they can clearly see that I'm running across the road. Why do they not respect my clear intentions?" But the compiler is more like a minivan driven by a mother of three, distracted by her children. Or maybe like a Mack truck.

Programming languages have rules. The language spec spells out clearly what you can rely on, and what you can't. Programmers who want their programs to work, and keep working in the future, need to learn and follow those rules. They are not the rules of the x86 architecture, but a completely different set of abstract rules that might or might not make any sense. But we still have to follow them, or else accept that our programs will occasionally get hit by a Mack truck.
Bad analogy.

1. Is it a KNOWN bad idea to drive a hybrid-drive car? Not that I am aware of. My former department chair drives a Prius, for example. But did the designers take all possible precautions to protect the occupants in case of a wreck? No idea. All I can say is I saw what looked like a flashbulb going off, A REALLY big flashbulb, and this car was scattered across 3 lanes of interstate.

2. Walking across traffic is known to be dangerous. Now if you had qualified it to say "never more than one vehicle on that road at a time," I would agree with you. My home town was small and had NO "crosswalks". And nobody EVER got hit while I lived there, or until my parents finally died, after which I got no "intel" on local happenings.

Overflow is NOT like running across 6 busy lanes of traffic. It MIGHT be more like one car, going at an unknown speed, making detecting that car a bit harder. But you do have eyes, ears, and you ought to be able to safely cross a road under those conditions. Until someone comes through at 220 mph perhaps. But then they would be breaking all known laws where I live.

Languages have rules. Some of which are vague (what EXACTLY is undefined behavior, because I know of absolutely no X86 instruction that behaves in an unpredictable way depending on the weather or whatever). Is avoiding UB possible? How, with simple arithmetic operators? Do you test every pair of operands before doing the operation? REALLY? Has ANYONE been told that such is necessary to avoid undefined behavior? Every programming class I have been in, or that I have taught, has explained overflow exactly as it works. I do not even mention the old UNIVAC crap dealing with non-two's-complement values and such. Overflow is well-defined. It is often bad, unless you are actually expecting wrap. Signed overflow is no less defined than unsigned overflow, except in the rather archaic C standard.

I'll bet if you send me any program you wrote, I can find potential undefined behavior everywhere. Because you are too damned lazy to check every pair of operands before you actually do the math. Or are you too interested in performance and willing to accept the risk? Or do you know that for some operations there is no chance of an overflow (just like I knew there was no chance of a failure with strcpy() as I had carefully coded it)? In any case, MOST ignore it in the case of overflow. Yet the compiler does have the option, according to the small group arguing against my preference, of doing ANYTHING should an overflow occur. Which would mean they COULD actually insert a "jo error" (jump on overflow to a label named error) after every arithmetic operation, and if control ever gets to error, they could zero memory, delete a few files, and then hang in an infinite loop. ANYTHING is allowed, correct? And you walked right into it by allowing the overflow to occur.

I don't buy that argument. Arithmetic overflow is not undefined, and I do believe EVERYBODY on planet earth has potential overflow in their programs. Which produces undefined behavior that is not really undefined because the compiler stays out of the way. MOST of the time. Would be nicer if it stayed out of the way ALL the time, however.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: strcpy() revisited

Post by bob »

syzygy wrote:
hgm wrote:Not a very impressive analogy to someone living in Amsterdam, where I, as most other cyclists, ignore traffic lights by habit, and cross the road whenever I judge it safe. :wink:

But more seriously, I think the part that needs to be added to really make this an analogy to what we are talking about is:

"So starting now we decide to shoot everyone that makes it across the road alive, and we impose a hefty extra tax on car use to pay for their burial."

Now who would still be happy with the law?
I suppose you're talking specifically about Apple now. I would say it is more like "Starting from now we will arrest everybody that makes it across the road, so that they will realise their mistake before it kills them".

I might personally be in favour of "we don't care if they don't know what they're doing and end up killing themselves", but I can understand Apple's thinking. Apple does have something to lose if people release buggy insecure software.
You do realize that the Apple "fix" has a significant performance cost? Rather than just copying until you find a zero, you first find the zero, check the addresses for overlap using that length, and then copy until you find a zero.
syzygy
Posts: 5743
Joined: Tue Feb 28, 2012 11:56 pm

Re: strcpy() revisited

Post by syzygy »

bob wrote:
syzygy wrote:
hgm wrote:Not a very impressive analogy to someone living in Amsterdam, where I, as most other cyclists, ignore traffic lights by habit, and cross the road whenever I judge it safe. :wink:

But more seriously, I think the part that needs to be added to really make this an analogy to what we are talking about is:

"So starting now we decide to shoot everyone that makes it across the road alive, and we impose a hefty extra tax on car use to pay for their burial."

Now who would still be happy with the law?
I suppose you're talking specifically about Apple now. I would say it is more like "Starting from now we will arrest everybody that makes it across the road, so that they will realise their mistake before it kills them".

I might personally be in favour of "we don't care if they don't know what they're doing and end up killing themselves", but I can understand Apple's thinking. Apple does have something to lose if people release buggy insecure software.
You do realize that the Apple "fix" has a significant performance cost? Rather than just copying until you find a zero, you first find the zero, check the addresses for overlap using that length, and then copy until you find a zero.
Well, I think at this stage Apple is more concerned with getting people to fix their programs. I assume a future version will not include this check and silently corrupt memory.

Or was this an example of -D_FORTIFY_SOURCE in action? In that case, just compile the code (after fixing it) without this flag (-U_FORTIFY_SOURCE).

Btw, optimised versions of strcpy() in fact work by first doing an optimised strlen() followed by an optimised memcpy(), the latter going backwards depending on string length.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: strcpy() revisited

Post by bob »

wgarvin wrote:
hgm wrote:Except that the programs that really suffered buffer overflow because of this had already been fixed long before. After all, doing a strcpy(a, b) as while(*a++ = *b++); can only end in two ways: either it works perfectly, or a lies within the string b, and it will lead to an infinite repetitive string, certainly causing a segfault.

No buffer overflow would ever be caught by this measure that would not have segfaulted by itself.

It only makes a difference for the harmless cases, that worked absolutely correctly.
This post by Dann Corbit in the openchess thread of 2 weeks ago seems to me to be evidence that overlapping calls to strcpy can malfunction without necessarily causing a segfault.


GCC gave me this:

dcorbit@dcorbit /q/cc
$ cat bozo.c
#include <string.h>
#include <stdio.h>

int main(int argc, char* argv[]) {
char b[32];
strcpy(b, "123456789012345");
strcpy(b + 1, b);
printf("[%s]\n", b);
return 0;
}

dcorbit@dcorbit /q/cc
$ gcc -Wall -ansi -pedantic bozo.c

dcorbit@dcorbit /q/cc
$ ./a
[1123456788012345]

Look at it carefully, is it what you expected?
That's because of their hand-coded/optimized code that does different things depending on how many bytes you copy. It is quirky. And you want to know what is REALLY funny? This damned Mavericks library did NOT detect that overlap. :)

Now isn't that absolutely-frickin' amazing? Here is the code:

#include <string.h>
#include <stdio.h>

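/* Deliberately makes overlapping strcpy() calls, which are undefined
   behavior; the output is whatever this particular library happens to do. */
int main(int argc, char* argv[]) {
char b[64];
int i;

/* Pass 1: strcpy(b + i, b), destination after source
   (the buffers overlap only for i <= 15). */
printf("overlap pass 1\n");
for (i=1;i<20;i++) {
strcpy(b, "123456789012345");
strcpy(b + i, b);
/* strlen() returns size_t, so cast it to match %d */
printf("(%d) [%s] strlen=%d\n", i, b, (int)strlen(b));
}

/* Pass 2: strcpy(b, b + i), destination before source. */
printf("overlap pass 2\n");
for (i=1;i<20;i++) {
strcpy(b, "123456789012345");
strcpy(b, b + i);
printf("(%d) [%s] strlen=%d\n", i, b, (int)strlen(b));
}
return 0;
}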

Here is the output on Mavericks:

scrappy% ./tst2
overlap pass 1
(1) [1123456788012345] strlen=15
(2) [12123456787812345] strlen=15
(3) [123123456786782345] strlen=15
(4) [1234123456785678345] strlen=15
(5) [12345123456784567845] strlen=15
(6) [123456123456783456785] strlen=15
(7) [1234567123456782345678] strlen=15
(8) [123456781234567812345678] strlen=15
(9) [1234567891234567891234567] strlen=15
(10) [12345678901234567890123456] strlen=15
(11) [123456789011234567890112345] strlen=15
(12) [1234567890121234567890121234] strlen=15
(13) [12345678901231234567890123123] strlen=15
(14) [123456789012341234567890123412] strlen=15
(15) [1234567890123451234567890123451] strlen=15
(16) [123456789012345] strlen=15
(17) [123456789012345] strlen=15
(18) [123456789012345] strlen=15
(19) [123456789012345] strlen=15
overlap pass 2
Abort


So isn't that absolutely-frickin' wonderful? They abort on the overlap that is perfectly safe, they ignore the one that causes the problems. What a WONDERFUL group of library folks, wouldn't you agree? They couldn't even break the most dangerous case.

wow..
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: strcpy() revisited

Post by bob »

syzygy wrote:
bob wrote:
syzygy wrote:
hgm wrote:Not a very impressive analogy to someone living in Amsterdam, where I, as most other cyclists, ignore traffic lights by habit, and cross the road whenever I judge it safe. :wink:

But more seriously, I think the part that needs to be added to really make this an analogy to what we are talking about is:

"So starting now we decide to shoot everyone that makes it across the road alive, and we impose a hefty extra tax on car use to pay for their burial."

Now who would still be happy with the law?
I suppose you're talking specifically about Apple now. I would say it is more like "Starting from now we will arrest everybody that makes it across the road, so that they will realise their mistake before it kills them".

I might personally be in favour of "we don't care if they don't know what they're doing and end up killing themselves", but I can understand Apple's thinking. Apple does have something to lose if people release buggy insecure software.
You do realize that the Apple "fix" has a significant performance cost? Rather than just copying until you find a zero, you first find the zero, check the addresses for overlap using that length, and then copy until you find a zero.
Well, I think at this stage Apple is more concerned with getting people to fix their programs. I assume a future version will not include this check and silently corrupt memory.

Or was this an example of -D_FORTIFY_SOURCE in action? In that case, just compile the code (after fixing it) without this flag (-U_FORTIFY_SOURCE).

Btw, optimised versions of strcpy() in fact work by first doing an optimised strlen() followed by an optimised memcpy(), the latter going backwards depending on string length.
Look at my post about 3-4 up from here. Apple does NOT catch that overlap at all. So I don't know exactly WHAT they are interested in doing at this point. There are two types of overlap.

strcpy(a+1, a); is the dangerous one that overwrites what is being copied. Apple Mavericks does not detect this at all, and just screws up the copying badly.

strcpy(a,a+1); is the perfectly safe one that Mavericks chooses to abort on. What a bunch of absolute idiots...
syzygy
Posts: 5743
Joined: Tue Feb 28, 2012 11:56 pm

Re: strcpy() revisited

Post by syzygy »

bob wrote:2. Walking across traffic is known to be dangerous.
The standard is quite precise in specifying what kind of constructs cause UB under what circumstances. That overlapping strcpy() causes UB is written everywhere.
Languages have rules. Some of which are vague (what EXACTLY is undefined behavior, because I know of absolutely no X86 instruction that behaves in an unpredictable way depending on the weather or whatever).
"for which this standard imposes no requirements"
http://c-faq.com/ansi/undef.html

Anyone with a sane mind spending a bit of time should be able to grasp it, in the end. Why can't you?
Is avoiding UB possible? How with simple arithmetic operators? Do you test every pair of operands before doing the operation?
Usually the programmer knows perfectly well that certain (signed) arithmetic operations will not overflow.

In cases where the programmer has difficulty predicting whether a particular signed arithmetic operation will overflow, he should certainly consider putting in some checks, because if the overflow happens, that will quickly lead to unexpected results. Even if the overflow is guaranteed to wrap, getting a negative number out of an addition of two positive integers will be surprising to most people.
Or do you know that for some operations there is no chance of an overflow (just like I knew there was no chance of a failure with strcpy() as I had carefully coded it)?
That's completely different. strcpy() with overlapping arguments is UB. Addition with integers known not to overflow is not UB.

(I'll give you a counterargument: if the programmer expected 32-bit ints, he might run into trouble on a 16-bit platform.)
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: strcpy() revisited

Post by wgarvin »

bob wrote:So isn't that absolutely-frickin' wonderful? They abort on the overlap that is perfectly safe, they ignore the one that causes the problems. What a WONDERFUL group of library folks, wouldn't you agree? They couldn't even break the most dangerous case.

wow..
Still, you persist in blaming the library vendors for the problems exposed by their perfectly legal change to their strcpy implementation.

A change that caused incorrect programs, programs that are breaking the well-documented API restrictions for strcpy() that have not changed in at least 25 years, to cleanly abort instead of silently corrupting memory or who knows what. The people to blame for any malfunctioning programs after this change are the idiot programmers who wrote overlapping calls to strcpy(). They got hit by the Mack truck, and it's nobody's fault but their own. Apple did them a large favor by helping them find these bugs that they were apparently too lazy or incompetent to find on their own. Any users affected by the crashing programs would be wise to stop using software provided by such incompetent programmers. If those programmers are incapable or unwilling to write correct, safe programs they should find another line of work. I certainly don't want to use their software, because it might format my hard drive and that would be unpleasant for me! :lol:

I mean, why have specified APIs at all, if lazy and incompetent programmers are just going to ignore them and the rest of us are supposed to just put up with this? Library vendors should not be constrained to one implementation choice forever, just because some programmer perhaps named "bob" thought he knew the optimal way to write strcpy and thought he should be able to depend on the undefined behavior of overlapping strcpy magically just working. They are supposed to be allowed to change this whenever they want, and no user binaries are supposed to be affected by that. If they are, those programs were already broken and if I were the author of one, I'd surely be grateful to Apple for helping me find out about it and fix it.

I guess you'd better not change the interface to Crafty at all, ever. Somebody might be depending on undocumented aspects of it, and their program might break when they upgrade Crafty. By your logic, that would of course be your fault and a terrible thing. Since you are a professor of CS, surely you understand that programs that violate the preconditions of a standard library function deserve whatever random things happen to them as a result. Those are not "working" programs by any definition that I ever understood.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: strcpy() revisited

Post by bob »

wgarvin wrote:
bob wrote:So isn't that absolutely-frickin' wonderful? They abort on the overlap that is perfectly safe, they ignore the one that causes the problems. What a WONDERFUL group of library folks, wouldn't you agree? They couldn't even break the most dangerous case.

wow..
Still, you persist in blaming the library vendors for the problems exposed by their perfectly legal change to their strcpy implementation.
A moment of rational discussion and thinking here, acceptable?

Is it rational to absolutely break a strcpy() option that works correctly everywhere, but fail to detect the other overlap that is KNOWN to produce buffer overruns and even segfaults?

Does that REALLY seem like a reasonable thing to do? Shouldn't they break ALL overlapping calls to strcpy() as opposed to just the ones that are KNOWN to not cause any problems at all?

Forget about this "well, undefined could mean one is allowed to crash cleanly, the other is allowed to do whatever it does and possibly crash or corrupt." That argument REALLY makes no sense, and if that is the argument you want to stick with, I'm not interested in the debate. They should at LEAST abort on all overlapping operands, would you not agree? Not just on the harmless ones, letting the others wreak havoc. That most definitely sounds either (a) malicious or (b) idiotic. I tend to think (b) myself.



A change that caused incorrect programs, programs that are breaking the well-documented API restrictions for strcpy() that have not changed in at least 25 years, to cleanly abort instead of silently corrupting memory or who knows what.
That is simply a completely false statement. They did NOT make the programs that have a serious problem abort. They let 'em run and do their damage, overwriting the string being copied while it is being copied. They aborted the overlap that is guaranteed to not corrupt or overrun anything.

You really think that is "doing some good"? Leaving the most dangerous option free to do its worst? We REALLY don't agree about software development if you buy that..




The people to blame for any malfunctioning programs after this change are the idiot programmers who wrote overlapping calls to strcpy(). They got hit by the Mack truck, and it's nobody's fault but their own. Apple did them a large favor by helping them find these bugs that they were apparently too lazy or incompetent to find on their own. Any users affected by the crashing programs would be wise to stop using software provided by such incompetent programmers. If those programmers are incapable or unwilling to write correct, safe programs they should find another line of work. I certainly don't want to use their software, because it might format my hard drive and that would be unpleasant for me! :lol:

I mean, why have specified APIs at all, if lazy and incompetent programmers are just going to ignore them and the rest of us are supposed to just put up with this? Library vendors should not be constrained to one implementation choice forever, just because some programmer perhaps named "bob" thought he knew the optimal way to write strcpy and thought he should be able to depend on the undefined behavior of overlapping strcpy magically just working. They are supposed to be allowed to change this whenever they want, and no user binaries are supposed to be affected by that. If they are, those programs were already broken and if I were the author of one, I'd surely be grateful to Apple for helping me find out about it and fix it.

I guess you'd better not change the interface to Crafty at all, ever. Somebody might be depending on undocumented aspects of it, and their program might break when they upgrade Crafty. By your logic, that would of course be your fault and a terrible thing. Since you are a professor of CS, surely you understand that programs that violate the preconditions of a standard library function deserve whatever random things happen to them as a result. Those are not "working" programs by any definition that I ever understood.
And this is not an acceptable development action by any definition I have ever understood. Eliminate the harmless usage and allow the really dangerous overlaps to continue with no warning, error or anything. Until they crash or whatever. It IS "undefined" I guess...

I just have a hard time rationalizing "undefined behavior" as "things that will work get wrecked, things that are dangerous are allowed to do whatever damage they can do, and it's the programmer's fault no matter what." Why not leave the abort out, then? My strcpy would never have hurt anything.
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: strcpy() revisited

Post by AlvaroBegue »

bob wrote:They aborted the overlap that is guaranteed to not corrupt or overrun anything.
This is it for me: You didn't understand anything. I am going to opt out of getting email updates about these UB threads.