strcpy() revisited

Discussion of chess software programming and technical issues.

Moderator: Ras

hgm
Posts: 28387
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: strcpy() revisited

Post by hgm »

wgarvin wrote:I think maybe you missed the point... The optimization that Ronald reported, where his compiler replaced the two strcpy calls with a stpcpy and a memcpy -- THAT optimization, and probably some similar other optimizations it has, only works if the two strings don't overlap. THAT is what you're giving up if you decide to change the API specification of strcpy so that strings may now overlap. That optimization becomes unsafe and has to be disabled, or at least gated behind a run-time test and alternate codepath, with significant extra costs.
But the point seems invalid. There is no extra cost. Apple does make the test, for no other purpose than to abort. Everyone pays that price. And what do they get for it: users of obsolete code get aborts. Authors of obsolete code have to do tedious debugging for lack of a proper diagnostic message. And users of perfectly compliant programs get a slowdown.

None of that would have been needed if they had simply skipped the test, and let the UB run its course. But given that they do the test, they could just as easily have the code path that failed the test just refrain from the optimization. Users of compliant programs would not even notice that, as their control flows along the other path, which could still safely contain all these optimizations. There is no 'extra cost' for them, other than the cost that Apple charges them to pester users of obsolete code. And the other code path could have perfectly and 100% securely handled all other cases too. Then at least the users would have gotten something back for the price they paid, in terms of increased reliability and security.
Several times during the debate, you or bob have claimed that there was no possible performance benefit to forbidding overlapping copies, because of Linus's argument that memmove could be implemented as efficiently as memcpy (at least for the non-overlapping cases). This example from Ronald convincingly refutes that argument.
Not true, as demonstrated above.
It's a nice performance optimization that is only possible because the length of the string is known not to change, which is only easy to know because the two strings don't overlap. So 25 years ago, the C spec was written to forbid overlapping copies, by declaring them as undefined behavior. And here we see an actual clever compiler optimization that can actually make real-world programs faster, and is only possible because of that restriction. It seems to me to be a clear and convincing demonstration of the value that such restrictions contribute to the possible performance.

OTOH, the must-not-overlap restriction and similar other UB restrictions (signed overflow etc.) do also come with a real-world cost: they confuse programmers, or programmers forget about them, or just accidentally violate them without noticing. And then we get UB and broken programs, which is obviously bad. I'm not trying to claim the optimization benefit necessarily outweighs these bad costs. But I do think the argument that "there is zero benefit" has been convincingly refuted now.
I think you are refuting the wrong point. The issue is not whether UB in the strcpy specs could be good. It is whether the Apple implementation, to test for overlap and abort if it finds it, brings anything for anyone. All optimizations you mention here could still be done without testing for overlap (and thus without aborting) at better performance in every compliant case.
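To make the alternative concrete, here is a minimal sketch of a strcpy that performs the same kind of overlap test but takes a slower, well-defined path instead of aborting. The name strcpy_tolerant is hypothetical, and nothing here is taken from Apple's library; it only assumes the test is a comparison of the pointer distance against the string length.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name: an illustration of the idea, not Apple's code. */
char *strcpy_tolerant(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);

    size_t n = strlen(src) + 1;            /* bytes to copy, including the '\0' */
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;

    /* The regions overlap when the pointers are closer together than n bytes. */
    int overlap = (d < s) ? (s - d < n) : (d - s < n);

    if (overlap)
        memmove(dst, src, n);              /* slow path: defined for overlapping regions */
    else
        memcpy(dst, src, n);               /* fast path: compliant callers always land here */
    return dst;
}
```

With this sketch, strcpy_tolerant(buf + 1, buf) on a buffer holding "12345" leaves "112345" in the buffer, while non-overlapping calls never touch the memmove branch.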
syzygy
Posts: 5722
Joined: Tue Feb 28, 2012 11:56 pm

Re: strcpy() revisited

Post by syzygy »

hgm wrote:But the point seems invalid. There is no extra cost. Apple does make the test, for no other purpose than to abort.
The point is not about what Apple did at all.
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: strcpy() revisited

Post by wgarvin »

hgm wrote:
wgarvin wrote:I think maybe you missed the point... The optimization that Ronald reported, where his compiler replaced the two strcpy calls with a stpcpy and a memcpy -- THAT optimization, and probably some similar other optimizations it has, only works if the two strings don't overlap. THAT is what you're giving up if you decide to change the API specification of strcpy so that strings may now overlap. That optimization becomes unsafe and has to be disabled, or at least gated behind a run-time test and alternate codepath, with significant extra costs.
But the point seems invalid. There is no extra cost. Apple does make the test, for no other purpose than to abort. Everyone pays that price. And what do they get for it: users of obsolete code get aborts. Authors of obsolete code have to do tedious debugging for lack of a proper diagnostic message. And users of perfectly compliant programs get a slowdown.

None of that would have been needed if they had simply skipped the test, and let the UB run its course. But given that they do the test, they could just as easily have the code path that failed the test just refrain from the optimization. Users of compliant programs would not even notice that, as their control flows along the other path, which could still safely contain all these optimizations.
Ah, I think I understand your confusion. The Apple code is probably in the library function, and only gets invoked if that library function is actually called. If the compiler "got smart" and did something else on the front of that (like the optimizations Ronald described, and which I am talking about in the last few posts) then Apple's test never gets to happen. I think that result was shown in the test results that were posted in this thread yesterday, unless I mis-read them?

[Edit: I might be thinking of this post by MVK where a strcpy call was replaced by instructions to move 16 bytes directly, and a strlen on a constant was replaced by the integer constant 15.]

Anyway, I'm not sure why you think it's an invalid point that the compiler's optimization becomes unsafe if you want to make this specification change to the API of strcpy. You could change the code of the library function to permit overlapping copies, but then the compiler either needs to disable these optimizations entirely, or generate code to perform an overlap test before the inlined small-and-clever intrinsic code it wants to generate (which would significantly complicate things and would add some additional runtime cost). So that's my point. You guys say there is no cost to getting rid of the "no overlaps" restriction, and I'm saying that this is part of the cost, lost or degraded optimization opportunities.

The overlap test that Apple performs inside their function probably doesn't cost too much... if it's anything like the glibc memcpy (e.g. if perhaps it is implemented by "strlen then memcpy"), then they already want to dispatch to one of several different implementations based on things like length and pointer alignment. For the memcpy case, that was why Linus believed it could be converted into memmove semantics "free of charge": the necessary test-and-dispatch costs were basically already being paid by that implementation. But even if Apple is paying a small extra cost for that test, at least it is revealing bugs in programs that might have otherwise gone undetected for a long time, and caused those programs to mysteriously corrupt their data or otherwise produce incorrect results.
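As a rough illustration of why the direction test is cheap next to the dispatching a tuned memcpy already does, here is a byte-wise sketch with memmove semantics. The name copy_dispatch is hypothetical and this is not glibc's implementation; a real one would also dispatch on length and alignment and copy in wider units.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not glibc's implementation. */
void *copy_dispatch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));

    /* One unsigned subtract-and-compare picks the direction. The difference
       wraps to a huge value when dst is below src, so only a dst that lies
       inside [src, src + n) selects the backward loop. */
    if ((uintptr_t)d - (uintptr_t)s >= n) {
        for (size_t i = 0; i < n; i++)     /* forward: dst is not inside the source */
            d[i] = s[i];
    } else {
        for (size_t i = n; i > 0; i--)     /* backward: dst overlaps the tail of src */
            d[i - 1] = s[i - 1];
    }
    return dst;
}
```

The extra work compared to a forward-only copy is the single comparison at the top; whether that is truly "free" in a heavily tuned implementation is exactly what is being argued in this thread.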
Last edited by wgarvin on Thu Dec 12, 2013 7:50 pm, edited 2 times in total.
hgm
Posts: 28387
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: strcpy() revisited

Post by hgm »

syzygy wrote:Not really, because you were responding to a post by Wylie that was definitely about my example.
You really think you know better what I was talking about than I do myself? :shock:
But I have already explained twice in great detail how modern versions of gcc compile this code.
Well, you did not post the assembly code, and your suggestion that I should generate it myself was quite useless.
What known length? The compiler cannot know with what command line argument the user will invoke the program.
OK, sorry I missed the fact that it came from the command line; I thought I had seen the literal "12345", and thought I had seen it in the source. But apparently it was in Wylie's post. And before you vent any paranoid comments about that, please notice that I was invoking a.exe without any arguments.

It is not clear to me why memcpy would be any faster than strcpy, and I cannot very well see that without seeing the code, which my compiler obviously does not generate.
hgm
Posts: 28387
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: strcpy() revisited

Post by hgm »

syzygy wrote:
hgm wrote:But the point seems invalid. There is no extra cost. Apple does make the test, for no other purpose than to abort.
The point is not about what Apple did at all.
If you think that, it explains why what you have been saying so far is so little to the point! :wink:
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: strcpy() revisited

Post by wgarvin »

Okay, so before you guys were saying that you wanted strcpy to just silently handle the overlaps, and how Apple could have done that instead of what they did, and how they were all MORONS for not doing it because obviously it would have had no cost.

I take it you have completely abandoned these arguments now, and changed your position to something else? I wish you had told us that before, so that we needn't have wasted the time to demolish them..
hgm
Posts: 28387
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: strcpy() revisited

Post by hgm »

wgarvin wrote:Ah, I think I understand your confusion. The Apple code is probably in the library function, and only gets invoked if that library function is actually called. If the compiler "got smart" and did something else on the front of that (like the optimizations Ronald described, and which I am talking about in the last few posts) then Apple's test never gets to happen. I think that result was shown in the test results that were posted in this thread yesterday, unless I mis-read them?
The optimizations described by Ronald were not on an Apple, right? So I think it is those two cases that are in danger of getting confused. They are also only vaguely related, because in Bob's example the strlen() was known at compile time. It did not involve the method described by Ronald. Marcel described that. The 'trick' it used to eliminate the strcpy was by inlining quadword moves. This trick would always work, btw, irrespective of overlap. So no UB assumption is needed to make it safe. But it was only applied to the case of moving to higher addresses. Not sure why that was (alignment, perhaps?).

So either I am still just as confused as I was, or I am (still) right.
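A sketch of what such an inlined fixed-size copy could look like when written by hand, assuming a 16-byte object (a 15-character string plus its terminator) and assuming all loads are done before any store. The name copy16 is hypothetical, and the thread does not show whether the compiler's actual instruction sequence orders its loads and stores this way.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies exactly 16 bytes, e.g. a 15-character string
   plus its terminating '\0', whose length was known at compile time. */
void copy16(char *dst, const char *src)
{
    uint64_t lo, hi;

    assert(dst != NULL && src != NULL);

    /* Both quadwords are loaded before either is stored, so the result is
       the same whether or not the two 16-byte regions overlap. */
    memcpy(&lo, src, 8);
    memcpy(&hi, src + 8, 8);
    memcpy(dst, &lo, 8);
    memcpy(dst + 8, &hi, 8);
}
```

The overlap safety here comes entirely from the load-then-store ordering. If the loads and stores were interleaved, one quadword at a time, an overlapping destination could clobber source bytes that have not been read yet.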
Anyway, I'm not sure why you think it's an invalid point that the compiler's optimization becomes unsafe if you want to make this specification change to the API of strcpy.
Because I never proposed the API should be changed in a way that would make that optimization unsafe. What I want changed is the 'nasal demons clause', that would give compiler writers a completely free hand in inflicting maximum unnecessary damage to their victims, abusing a freedom that was only given to them for the purpose of making some optimization possible. The strcpy UB was established for removing the need to test for overlap. If they test for overlap anyway, it should be forbidden to do any nastiness, as they obviously do. And that applies even more to integer overflow. Common sense would of course make this self-evident, but it seems that common sense is lacking entirely in some circles (or perhaps overruled by commercial interests), so that it should be enforced.
You could change the code of the library function to permit overlapping copies, but then the compiler either needs to disable these optimizations entirely, or generate code to perform an overlap test before the inlined small-and-clever intrinsic code it wants to generate (which would significantly complicate things and would add some additional runtime cost). So that's my point. You guys say there is no cost to getting rid of the "no overlaps" restriction, and I'm saying that this is part of the cost, lost or degraded optimization opportunities.

The overlap test that Apple performs inside their function probably doesn't cost too much... if it's anything like the glibc memcpy (e.g. if perhaps it is implemented by "strlen then memcpy"), then they already want to dispatch to one of several different implementations based on things like length and pointer alignment. For the memcpy case, that was why Linus believed it could be converted into memmove semantics "free of charge": the necessary test-and-dispatch costs were basically already being paid by that implementation. But even if Apple is paying a small extra cost for that test, at least it is revealing bugs in programs that might have otherwise gone undetected for a long time, and caused those programs to mysteriously corrupt their data or otherwise produce incorrect results.
That is a lousy argument. If the purpose was to detect bugs, they could have restricted this to some debugging mode. There is no need to burden correct and debugged code with this extra overhead.
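One way a debug-only check could look, as a sketch: the overlap test lives inside an assert, so it disappears when the program is built with NDEBUG defined. The wrapper name strcpy_checked is hypothetical; this is not how any particular C library does it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wrapper: the overlap check exists only in builds without NDEBUG. */
char *strcpy_checked(char *dst, const char *src)
{
#ifndef NDEBUG
    size_t n = strlen(src) + 1;                     /* bytes the copy will touch */
    uintptr_t d = (uintptr_t)dst, s = (uintptr_t)src;
    /* The pointers must be at least n bytes apart for the regions to be disjoint. */
    assert((d < s ? s - d : d - s) >= n && "strcpy: source and destination overlap");
#endif
    return strcpy(dst, src);                        /* release builds pay nothing extra */
}
```

In a debug build a failing check reports the file, line and the message text, which also addresses the complaint about aborting without a proper diagnostic.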
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: strcpy() revisited

Post by bob »

wgarvin wrote:
hgm wrote:
wgarvin wrote:It's not a "reserved word", just a function whose semantics are intrinsically known to the compiler.
Yes, this is the point that puzzled me. When I first learned C, all these functions were just external functions, in no way different from or privileged with respect to functions you could declare and define yourself. And as system headers contain just prototypes, there was nothing against using a system header to declare strcpy, but then provide your own definition of it (conforming to the prototype). And in fact this is still exactly how it works in the gcc I use (3.4.4). In the quoted standard, however, this seems to evoke UB.
I think you can still do that if you want to, although if you're replacing a C library function then you can't realistically change its API.. but GCC does offer -fno-builtin as a way to turn off these optimizations, and then it will call your replacement strcpy / memcpy / whatever the same way it would call any regular function. I think GCC does support a bunch of weird embedded platforms and microcontrollers, etc. where these kind of 'builtin function' optimizations could easily just get in your way.
I don't like that very much, as this apparently requires an exhaustive list of new reserved identifiers, which the programmer should know. It would not be so bad if the compiler warned against redefinition of such a reserved identifier, but gcc 3.4.4 doesn't, not even with -Wall. A better design, IMO, would be to add a single new keyword 'standard' (or 'library', or perhaps '__standard__') that could be added to prototypes for which it is an error to provide a definition. There would have been no need for any UB in that case, as it would just be forbidden to do the things that now are defined to cause UB.
The programmer probably already should know them, if they are including a standard header that defines them... they can't then also define their own different function that collides with the standard one, right?
I think that is asking way too much. If I want to use strcpy() and do "man strcpy()" it gives me the prototype, tells me to "#include <string.h>" and then defines what strcpy does. Am I REALLY supposed to go look at /usr/include/string.h and study ALL of the prototypes included and remember their names? For example, go look at "math.h". Do ALL of those have to be remembered? I don't consider that practical/reasonable at all.
hgm
Posts: 28387
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: strcpy() revisited

Post by hgm »

wgarvin wrote:Okay, so before you guys were saying that you wanted strcpy to just silently handle the overlaps, and how Apple could have done that instead of what they did, and how they were all MORONS for not doing it because obviously it would have had no cost.
Well, that is still true, right? The way they handle it now, throwing in an overlap test to do an abort, has no cost benefit whatsoever compared to silently handling all overlaps. Once you start testing for this, you can handle anything, and any of the mentioned optimizations can still work.
I take it you have completely abandoned these arguments now, and changed your position to something else? I wish you had told us that before, so that we needn't have wasted the time to demolish them..
I think the confusion is that you take the fact that we think Apple should have handled the overlaps silently and usefully to imply that the API specs should be changed from UB to fully-defined behavior. But that is not a valid implication. That the specs say the behavior is undefined, doesn't mean compiler writers are forced to do something counterproductive, like aborting code that perfectly worked in virtually every other C implementation to date. Of course they qualify as morons (or malicious) for doing counter-productive things without any benefit justifying them. There must be a benefit to justify what they did, and there isn't. Everyone suffers. You might say "but just a little", but that is of course too much. Zero would be too much to justify anything.
Last edited by hgm on Thu Dec 12, 2013 9:44 pm, edited 2 times in total.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: strcpy() revisited

Post by bob »

syzygy wrote:
bob wrote:No idea what they are talking about there. There is no size argument to strcpy(). strcpy(dest, src) is all there is.
Yes, that is a strange mistake. I suppose it should have read string (or source string) argument.

This is interesting:

Code:

#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
  char *a = argv[1];
  char b[256];
  strcpy(b, a);
  strcpy(b+1, b);
  printf("strlen(\"%s\") = %d\n", b, (int)strlen(b));
  return 0;
}

Code:

$ gcc -O3 bla.c
$ ./a.out 12345
strlen("112345") = 5
How can that be?

Answer: the program learns the original length of the string b from the first strcpy() which it implements using stpcpy(). The second strcpy() is implemented using memcpy(). Since strcpy(b+1, b) cannot possibly involve overlapping regions, this memcpy() cannot possibly change the length of b.
Say what? memcpy() does NOT say it copies byte by byte. In fact, it has several versions, some of which copy 8-byte chunks. That certainly will overlap in your example.

So the program does not have to recalculate strlen(b) but can output the result it found earlier.
You consider that acceptable behavior?
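For reference, the transformation described in the quoted answer can be written out by hand roughly as follows. This is a sketch, not actual gcc output: copy_return_end is a hypothetical stand-in for stpcpy(), transformed is a hypothetical name, and memmove replaces the memcpy the compiler is said to emit so that the sketch itself stays well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for POSIX stpcpy(): copies src and returns a pointer
   to the terminating '\0' written into dst. */
static char *copy_return_end(char *dst, const char *src)
{
    size_t n = strlen(src);
    memcpy(dst, src, n + 1);
    return dst + n;
}

/* Hand-written version of the transformed program. Returns the length the
   program would print; b receives the final buffer contents. */
size_t transformed(char *b, const char *a)
{
    assert(b != NULL && a != NULL);

    char *end = copy_return_end(b, a);   /* first strcpy(b, a) */
    size_t len = (size_t)(end - b);      /* strlen(b) learned as a by-product */

    /* Second strcpy(b + 1, b), turned into a fixed-size copy of len + 1 bytes.
       The compiler assumes the regions cannot overlap; memmove is used here
       only to keep this sketch free of undefined behavior. */
    memmove(b + 1, b, len + 1);

    return len;                          /* reused instead of calling strlen(b) again */
}
```

Called with "12345", this returns 5 while the buffer ends up holding "112345", whose real length is 6. That matches the output shown above: the stale length is a consequence of the no-overlap assumption, and the buffer contents in the real program additionally depend on how the memcpy in use happens to treat overlapping regions.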