strcpy() revisited

Discussion of chess software programming and technical issues.

Moderator: Ras

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: strcpy() revisited

Post by bob »

wgarvin wrote:
syzygy wrote:strlen("112345") = 5[/code]
How can that be?

Answer: the program learns the original length of the string b from the first strcpy() which it implements using stpcpy(). The second strcpy() is implemented using memcpy(). Since strcpy(b+1, b) cannot possibly involve overlapping regions, this memcpy() cannot possibly change the length of b. So the program does not have to recalculate strlen(b) but can output the result it found earlier.
Nice! Anyone want to bet whether or not this find will silence those critics who have complained that there are no possible optimization benefits from the non-overlapping API specification of strcpy?

[more explicitly: In order to implement bob's suggestion that they redirect any overlapping strcpy calls to memmove, the implementation provider would also have to disable this optimization. That might also be complicated by the fact that different groups maintain the library and the compiler. But fortunately, they have (drumroll) STANDARDS to follow, that allow them to independently perform clever optimizations like this without breaking any legal programs.]
I am still curious, because memcpy() has SEVERAL internal definitions. One that copies byte by byte. Others that are used for longer blocks and which copy qword by qword.

As far as that "optimization", you think it is "cute" or "nice". DO you think it "reasonable"? Yes the overlap is "outside the spec". But getting strlen() WRONG is a bit outside "reasonable".
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: strcpy() revisited

Post by bob »

syzygy wrote:
hgm wrote:
wgarvin wrote:Nice! Anyone want to bet whether or not this find will silence those critics who have complained that there are no possible optimization benefits from the non-overlapping API specification of strcpy?
Well, to spoil the betting:

How is always performing a strlen() that was not requested, and then using the obtained value if by sheer luck a real call to strlen follows a 'clever optimization'?
As I alreay explained (seems people don't read anymore nowadays), the strlen() of the original string is obtained for free by doing the first strcpy(). This first strcpy() is replaced by a stpcpy() which returns the address of the trailing 0, so that a simple subtraction is all it takes to obtain the value of strlen(). This value is then used to perform the second strcpy() using memcpy(), apparently copying backwards. Since strlen() is now known anyway, the value is retained to completely optimise away the final strlen().

Note that this also shows that even if strcpy() is implemented using a naive while (*dst++ = +src++); loop instead of using a vectorised implementation (such as the one by Agner Fog), it is very well possible that strcpy() ends up being implemented as a memcpy() which copies from right to left, which would of course fail completely on Bob's "safe" abuse of strcpy.
Also note that NO compiler to date has done this "right-to-left" copy or else ReadPGN() would have failed somewhere along the way...

A "toy" program with the two copies back to back is one thing. A real program is something else again...
User avatar
hgm
Posts: 28387
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: strcpy() revisited

Post by hgm »

bob wrote:Say what? memcpy() does NOT say it copies byte by byte. In fact, it has several versions, some of which copy 8-byte-chunks. That certainly will overlap in your example.
That is why this is considered an optimization. 8 by 8 copying is probably a bit faster than byte by byte. (Not sure how alignment would affect this, however; source and destination do not need to have the same modulao-8 alignment at all.)
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: strcpy() revisited

Post by bob »

wgarvin wrote:Okay, so before you guys were saying that you wanted strcpy to just silently handle the overlaps, and how Apple could have done that instead of what they did, and how they were all MORONS for not doing it because obviously it would have had no cost.

I take it you have completely abandoned these arguments now, and changed your position to something else? I wish you had told us that before, so that we needn't have wasted the time to demolish them..
I feel exactly the same as when I started. They waste time on EVERY strcpy() to check for overlap. WRONGLY I might add because they only catch strcpy(a, a+n), but NOT strcpy(a+n, a) which is the one more likely to break. SO the burn cpu cycles to get the string length of the source. Then they only check one of the two overlapping cases where they should have checked both, and then they abort if they detect the first, ignore the second and let it crash, when they COULD have fixed both with a simple call to memmove(). The strlen() optimization is utter nonsense. Not exactly a common event and not exactly an earth-shattering speedup for real programs. Making strcpy() work for all arguments would be a development improvement, with no harmful side-effects other than the silly optimization given that has been discussed in old usenet newsgroups in the past, btw.

So we DON'T want to fix overlapping strcpy() because it would prevent a silly optimization where we can use the old string length since there should have been no overlap? Seems perfectly reasonable to me. Somewhere. Some alternate universe. Where insanity reigns supreme...
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: strcpy() revisited

Post by wgarvin »

hgm wrote:
wgarvin wrote:Ah, I think I understand your confusion. The Apple code is probably in the library function, and only gets invoked if that library function is actually called. If the compiler "got smart" and did something else on the front of that (like the optimizations Ronald described, and which I am talking about in the last few posts) then Apple's test never gets to happen. I think that result was shown in the test results that were posted in this thread yesterday, unless I mis-read them?
The optimizations described by Roland were not on an Apple, right? So I think it is those two cases that are in danger of getting confused. They are also only vaguely related, because in Bob's example the strlen() was known at compile time. It did not involve the method described by Ronald. Marcel described that. The 'trick' it used to eliminate the strcpy was by inlining quadword moves. This trick would always work, btw, irrespective of overlap. So no UB assumption is needed to make it safe. But it was only applied to the case of moving to higher addresses. Not sure why that was (alignment, perhaps?).
Ah ! Thats true that the one MVK posted would still work, that's a very good point. The ones shown after that by Ronald would not. I will try to be more careful not to mix them up.
hgm wrote: So either I am still just as confused as I was, or I am (still) right.
Anyway, I'm not sure why you think its an invalid point that the compiler's optimization becomes unsafe if you want to make this specification change to the API of strcpy.
Because I never proposed the API should be changed in a way that would make that optimization unsafe. What I want changed is the 'nasal demons clause', that would give compiler writers completely free hand in inflicting maximum unnecessary damage to their victims, abusing a freedom that was only given to them for the purpose of making some optimization possible. The strcpy UB was established for removing the need to test for overlap. If they test for overlap anyway, it should be forbidden to do any nastiness, as they obviously do.
K, hold on there. The library function apparently tests for overlap, and aborts. I argued already (or actually, Linus and then bob argued already) that the test is cheap and providing memmove instead of abort would cost "effectively" nothing, and be more user-friendly. But there's also the compiler's treatment of the intrinsic function to consider, not just what the library code does if control ever reaches it. And some compilers, as Ronald has shown, do clever optimizations there which won't be possible if you redefine that API to say the copying has to work even with overlapping strings. So thats an inevitable consequence of "forbidding them to do nastiness" (or, "changing the API specification of strcpy" as I think I put it before). Maybe its worth it--hell, it probably is. But contrary to what bob claimed, the price is not zero. Maybe currently Apple doesn't do that optimization, but if they follow the suggestion to support overlapping copies then they won't ever be able to do it the way GCC can do it today.
hgm wrote:And that applies even more to integer overflow. Common sense would of course make this self-evident, but it seems that common sense is lacking entirely in some circles (or perhaps overruled by commercial interests), so that i should be enforced.
You could change the code of the library function to permit overlapping copies, but then the compiler either needs to disable these optimizations entirely, or generate code to perform an overlap test before the inlined small-and-clever intrinsic code it wants to generate (which would significantly complicate things and would add some additional runtime cost). So that's my point. You guys say there is no cost to getting rid of the "no overlaps" restriction, and I'm saying that this is part of the cost, lost or degraded optimization opportunities.

The overlap test that Apple performs inside their function probably doesn't cost too much... if its anything like the glibc memcpy (e.g. if perhaps it is implemented by "strlen then memcpy"), then they already want to dispatch to one of several different implementations based on things like length and pointer alignment. For the memcpy case, that was why Linus believed it could be converted into memmove semantics "free of charge": the necessary test-and-dispatch costs were basically already being paid by that implementation. But even if Apple is paying a small extra cost for that test, at least it is revealing bugs in programs that might have otherwise gone undetected for a long time, and caused those programs to mysteriously corrupt their data or otherwise produce incorrect results.
That is a lousy argument. If the purpose was to detect bugs, they could have restricted this to some debugging mode. There is no need to burden correct and debugged code with this extra overhead.
Maybe they didn't think enough people would run in the debugging mode? They might have wanted programmers to always have the info, and just underestimated how many compiled binaries and source programs were out there in the wild, that relied on the UB. Maybe after the outcry they will reevaluate, as the glibc guys did (eventually) with the memcpy changes. I admit I was surprised to find out there were so many broken programs out there. I mean, strcpy and memcpy are some of the most basic standard library functions that every C and C++ programmer should know how to use correctly. Its shocking that some people think overlapping memcpy is supposed to work correctly, I don't really understand how they can program for years and years in a language like C without ever learning how it works.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: strcpy() revisited

Post by bob »

wgarvin wrote:
hgm wrote:
wgarvin wrote:I think maybe you missed the point... The optimization that Ronald reported, where his compiler replaced the two strcpy calls with a strpcpy and a memcpy -- THAT optimization, and probably some similar other optimizations it has, only works if the two strings don't overlap. THAT is what you're giving up if you decide to change the API specification of strcpy so that strings may now overlap. That optimization becomes unsafe and has to be disabled, or at least gated behind a run-time test and alternate codepath, with significant extra costs.
But the point seems invalid. There is no extra cost. Apple does make the test, for no other purpose than to abort. Everyone pays that price. And what do they get for it: users of obsolete code get aborts. Authors of obsolute code have to do tedious debugging for lack of a proper diagnostic message. And users of perfectly compliant programs get a slowdown.

None of that would have been needed if they had simply skipped the test, and let the UB run its course. But given that they do the test, they could just as eaisly have the code path that failed the test just refrain from the optimization. Users of compliant programs would not even notice that, as their control flows along the other path, which could still safely contain all these optimizations.
Ah, I think I understand your confusion. The Apple code is probably in the library function, and only gets invoked if that library function is actually called. If the compiler "got smart" and did something else on the front of that (like the optimizations Ronald described, and which I am talking about in the last few posts) then Apple's test never gets to happen. I think that result was shown in the test results that were posted in this thread yesterday, unless I mis-read them?

[Edit: I might be thinking of this post by MVK where a strcpy call was replaced by instructions to move 16 bytes directly, and a strlen on a constant was replaced by the integer constant 15.]

Anyway, I'm not sure why you think its an invalid point that the compiler's optimization becomes unsafe if you want to make this specification change to the API of strcpy. You could change the code of the library function to permit overlapping copies, but then the compiler either needs to disable these optimizations entirely, or generate code to perform an overlap test before the inlined small-and-clever intrinsic code it wants to generate (which would significantly complicate things and would add some additional runtime cost). So that's my point. You guys say there is no cost to getting rid of the "no overlaps" restriction, and I'm saying that this is part of the cost, lost or degraded optimization opportunities.

The overlap test that Apple performs inside their function probably doesn't cost too much... if its anything like the glibc memcpy (e.g. if perhaps it is implemented by "strlen then memcpy"), then they already want to dispatch to one of several different implementations based on things like length and pointer alignment. For the memcpy case, that was why Linus believed it could be converted into memmove semantics "free of charge": the necessary test-and-dispatch costs were basically already being paid by that implementation. But even if Apple is paying a small extra cost for that test, at least it is revealing bugs in programs that might have otherwise gone undetected for a long time, and caused those programs to mysteriously corrupt their data or otherwise produce incorrect results.
Can we back up a minute and return to the actual conditions of the test?

1. compiler used. gcc 4.7.3. It does NOT break the overlapping strcpy() itself, we have it running on other boxes and it works just fine. Compiler was NOT clang, although I have it on my box. clang and gcc produce the SAME "Abort" message. Because, as I explicitly mentioned early on, this is done in the Apple C library code. The developers even posted the code showing the test for the abort. In the library.

2. It doesn't happen on any other machine I have. I have gcc versions going all the way back to 3.4.2. So it clearly is not an issue in the normal Linux glibc code or in gcc itself.

So to be clear, we are talking ONLY about code in Apples's C library, not anything else, so far as I can tell.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: strcpy() revisited

Post by bob »

wgarvin wrote:
hgm wrote:
wgarvin wrote:Ah, I think I understand your confusion. The Apple code is probably in the library function, and only gets invoked if that library function is actually called. If the compiler "got smart" and did something else on the front of that (like the optimizations Ronald described, and which I am talking about in the last few posts) then Apple's test never gets to happen. I think that result was shown in the test results that were posted in this thread yesterday, unless I mis-read them?
The optimizations described by Roland were not on an Apple, right? So I think it is those two cases that are in danger of getting confused. They are also only vaguely related, because in Bob's example the strlen() was known at compile time. It did not involve the method described by Ronald. Marcel described that. The 'trick' it used to eliminate the strcpy was by inlining quadword moves. This trick would always work, btw, irrespective of overlap. So no UB assumption is needed to make it safe. But it was only applied to the case of moving to higher addresses. Not sure why that was (alignment, perhaps?).
Ah ! Thats true that the one MVK posted would still work, that's a very good point. The ones shown after that by Ronald would not. I will try to be more careful not to mix them up.
hgm wrote: So either I am still just as confused as I was, or I am (still) right.
Anyway, I'm not sure why you think its an invalid point that the compiler's optimization becomes unsafe if you want to make this specification change to the API of strcpy.
Because I never proposed the API should be changed in a way that would make that optimization unsafe. What I want changed is the 'nasal demons clause', that would give compiler writers completely free hand in inflicting maximum unnecessary damage to their victims, abusing a freedom that was only given to them for the purpose of making some optimization possible. The strcpy UB was established for removing the need to test for overlap. If they test for overlap anyway, it should be forbidden to do any nastiness, as they obviously do.
K, hold on there. The library function apparently tests for overlap, and aborts. I argued already (or actually, Linus and then bob argued already) that the test is cheap and providing memmove instead of abort would cost "effectively" nothing, and be more user-friendly. But there's also the compiler's treatment of the intrinsic function to consider, not just what the library code does if control ever reaches it. And some compilers, as Ronald has shown, do clever optimizations there which won't be possible if you redefine that API to say the copying has to work even with overlapping strings. So thats an inevitable consequence of "forbidding them to do nastiness" (or, "changing the API specification of strcpy" as I think I put it before). Maybe its worth it--hell, it probably is. But contrary to what bob claimed, the price is not zero. Maybe currently Apple doesn't do that optimization, but if they follow the suggestion to support overlapping copies then they won't ever be able to do it the way GCC can do it today.
hgm wrote:And that applies even more to integer overflow. Common sense would of course make this self-evident, but it seems that common sense is lacking entirely in some circles (or perhaps overruled by commercial interests), so that i should be enforced.
You could change the code of the library function to permit overlapping copies, but then the compiler either needs to disable these optimizations entirely, or generate code to perform an overlap test before the inlined small-and-clever intrinsic code it wants to generate (which would significantly complicate things and would add some additional runtime cost). So that's my point. You guys say there is no cost to getting rid of the "no overlaps" restriction, and I'm saying that this is part of the cost, lost or degraded optimization opportunities.

The overlap test that Apple performs inside their function probably doesn't cost too much... if its anything like the glibc memcpy (e.g. if perhaps it is implemented by "strlen then memcpy"), then they already want to dispatch to one of several different implementations based on things like length and pointer alignment. For the memcpy case, that was why Linus believed it could be converted into memmove semantics "free of charge": the necessary test-and-dispatch costs were basically already being paid by that implementation. But even if Apple is paying a small extra cost for that test, at least it is revealing bugs in programs that might have otherwise gone undetected for a long time, and caused those programs to mysteriously corrupt their data or otherwise produce incorrect results.
That is a lousy argument. If the purpose was to detect bugs, they could have restricted this to some debugging mode. There is no need to burden correct and debugged code with this extra overhead.
Maybe they didn't think enough people would run in the debugging mode? They might have wanted programmers to always have the info, and just underestimated how many compiled binaries and source programs were out there in the wild, that relied on the UB. Maybe after the outcry they will reevaluate, as the glibc guys did (eventually) with the memcpy changes. I admit I was surprised to find out there were so many broken programs out there. I mean, strcpy and memcpy are some of the most basic standard library functions that every C and C++ programmer should know how to use correctly. Its shocking that some people think overlapping memcpy is supposed to work correctly, I don't really understand how they can program for years and years in a language like C without ever learning how it works.
Some of these arguments are somewhat circular. For example, we were previously discussing a compiler change to detect and report overflow. ONLY if a compile-time switch were used. So why does a user have to turn it on there, but not for the overlap test? You can't have it both ways. That one won't be used (compiler switch) so they make it default, while the other defaults on (overlap or really PARTIAL overlap detection).

Doesn't that again seem both inconsistent AND strange???
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: strcpy() revisited

Post by bob »

syzygy wrote:
hgm wrote:
syzygy wrote:I don't know what you are talking about, but I am talking about the example I gave. You can compile it yourself with "gcc -O3 -S" and inspect the assembly code.
I was talking about Bob's example, which was more relevant
Not really, because you were responding to a post by Wylie that was definitely about my example.
I cannot know what the assembler was that the optimizer produced in your case.
With an ancient compiler you will indeed get a quite different result. But I have already explained twice in great detail how modern versions of gcc compile this code.
The most efficient, of course, would be to determine the length of the string at compile time, and make use of the known length to inline a dedicated move. Calling strlen() at run time for a string of known length already seems poor code.
What known length? The compiler cannot know with what command line argument the user will invoke the program.
While that might work for YOUR example, it does not work for the code in ReadPGN() that broke. The overlap test is in the library, not in the compiler. Both clang AND gcc 4.7.3 both fail the same way on mavericks, but NOT on my linux boxes. So it's library, not clever compiler trickery.
bnemias
Posts: 373
Joined: Thu Aug 14, 2008 3:21 am
Location: Albuquerque, NM

Re: strcpy() revisited

Post by bnemias »

bob wrote:I feel exactly the same as when I started.
So you missed the point.
bob wrote:They waste time on EVERY strcpy() to check for overlap.
That's normal in libraries to detect bugs in the calling code. If you think this is the permanent state of the library for release code, I suggest you submit a bug report.
bob wrote: WRONGLY I might add because they only catch strcpy(a, a+n), but NOT strcpy(a+n, a) which is the one more likely to break. ... SO the burn cpu cycles to get the string length of the source. Then they only check one of the two overlapping cases where they should have checked both, and then they abort if they detect the first, ignore the second and let it crash,
If the lib is doing something wrong, submit a bug report. Maybe you can improve it to detect all cases of UB.
bob wrote:Seems perfectly reasonable to me. Somewhere. Some alternate universe. Where insanity reigns supreme... ... when they COULD have fixed both with a simple call to memmove().
Yes, let's cover up bugs instead of expose them.... that is insanity.
mvk
Posts: 589
Joined: Tue Jun 04, 2013 10:15 pm

Re: strcpy() revisited

Post by mvk »

wgarvin wrote: Maybe they didn't think enough people would run in the debugging mode? They might have wanted programmers to always have the info, and just underestimated how many compiled binaries and source programs were out there in the wild, that relied on the UB.
Those existing programs are not affected, as the change is introduced during translation time, not in the link-time instance of strpcy(). One already must have rebuilt the project for the checks to enter, and then XCode would show the stack trace and the offending line right in your face. There is a reason there is no outcry about it other than here.
[Account deleted]