strcpy() revisited

mvk · Post by **mvk** » Fri Dec 13, 2013 1:11 am

bob wrote: Here's the top (just #includes) from /usr/include/string.h on my macbook mavericks box:

#include <_types.h>
#include <sys/cdefs.h>
#include <Availability.h>
#include <sys/_types/_size_t.h>
#include <sys/_types/_null.h>

All that is after that is prototypes.

Here is the important part:

#if defined (__GNUC__) && _FORTIFY_SOURCE > 0 && !defined (__cplusplus)
/* Security checking functions. */
#include <secure/_string.h>
#endif

bnemias · Post by **bnemias** » Fri Dec 13, 2013 1:14 am

bob wrote:
bnemias wrote:
bob wrote:It takes one more add and one more compare. How hard is that? I think it is doing something wrong by detecting the overlap in the first place.
You do realize that to implement the solution you've been advocating, covering up the bug by invoking memmove(), that it is necessary to actually detect the overlap?
You do realize that I said EXACTLY THAT? Since they chose to detect the overlap, why just abort rather than actually fix it? Simple enough now? I have only written that a dozen times. Seems silly to waste the CPU cycles to catch it, when it is not done in most programs, and then get NOTHING from those wasted cycles... If you'd join a thread by reading from the top, this wasted post would not be necessary.

I was just pointing out the absurdity of that particular quoted statement. It may be wasted to you, but I've read the thread and consider it rather amusing and therefore not wasted.

bob · Post by **bob** » Fri Dec 13, 2013 1:17 am

wgarvin wrote:Okay, so a few weeks ago a completely legal change made by Apple to their strcpy implementation, caused bob's program (which had a bug in it) to not work correctly on his Mac. It didn't just silently misbehave though; it aborted before any harm was done. And bob has been complaining about it ever since.

Let me just point out the considerable irony of this, because once I stopped and thought it through, it was rather funny to me:

Apple helped him find a subtle bug in Crafty, in fact it was likely THE oldest bug still remaining in Crafty, since by his own admission this code is extremely old and has never been changed. And bob is unhappy about it. He thinks they should have done something else. ANYTHING else. He thinks it was literally the meanest, nastiest thing Apple could have done to his poor little program.

He has run this code on many other platforms over the years, and not one of those C implementations helped him find his bug. All of them silently did something--bob asserts they were doing "the right thing", and many of them probably WERE doing the thing bob thought they ought to do, but since the standard doesn't say they have to do that, there's no way we can be completely sure.

In fact there's probably Crafty binaries out there somewhere, compiled for some platform where overlapping strcpy can fail in a silent and subtle way, on which ReadPGN() will mangle that input and/or not do exactly what bob expected. The poor user stuck with THAT version of Crafty, might never have tried that feature on it; but if he did, he must surely have been disappointed! Too bad he couldn't be bothered to file a bug report about it, after all he's just a user who wants programs to "work", and while he might be willing to help diagnose and fix broken programs, also he might not and that's totally fair.

So to recap: Apple helped bob find an extremely old and subtle bug, which none of his other target systems lifted a finger to help him find, but bob feels little gratitude for this. In fact, one of his many complaints seems to be that Apple didn't help him enough, since the standard library function just printed 'Abort' and didn't write something more informative to stderr, or something, and bob didn't know or care what the right system log to look in was, and for at least for the first few days, it didn't occur to him to fire up the program under a debugger and see why it was aborting. He was (and seemingly still is) indignant about his time being was "wasted" diagnosing the program, because Apple didn't help him enough. But it also would have been just fine with him if they had not helped him at all and just let his program possibly trash memory or spawn nasal demons or whatever. Then he would have been blissfully unaware of the bug, and his users would (hopefully) also have been blissfully unaware of the bug, and as we all know, ignorance is bliss!

(He has also argued repeatedly that the bug shouldn't really be a bug, that the compiler or library should just do what he thought they would do--DWIM!!--and that there would never be any performance cost to this--not true of course).

I mean, I've certainly had my share of bugs in my code before, but if my code has a bug in it I recognize that its my fault and I am grateful when other parties help me find those bugs, even if it means my code was crashing until I get the fix figured out and deployed. As the programmer who wrote that code, I'm the one responsible for making sure it is a correct, working program. Even if I wanted to fob that responsibility off on the library vendor or whoever else, no other party is capable of making my program do what I want. Its correctness is something only I can achieve, and to do that I have to follow the rules of the language, and this is not some difficult, elusive thing--its what programming is all about.

Now let me repeat what I have actually posted, rather than the rambling stuff above.

1. Apple introduced a check that catches overlaps in one direction (we now know it doesn't catch them in both directions).

2. When they detect that overlap case, they just displayed Abort on stderr. Nothing else. Had they displayed "Abort - overlapping src/destination strings." There would have been absolutely no problem. I wouldn't have liked it, but that would have been the end of it.

3. Instead of that, they produced a confusing error message that a user reported to me as an apparent new bug added in 23.8 (he reported he had been able to build books with previous versions just fine.)

4. I spent a week+ debugging, trying to reproduce what he was seeing, not making the connection to a new change in Mavericks that was not there in Mountain Lion.

5. I finally tracked it down. I reported it here as something everyone might want to know, since anyone could be using the overlapping string trick without even realizing it if the code was not written last week.

That was it. I object to crashing a program that has worked for years, and which STILL works on every other operating system / library / compiler. I object to not being specific about the problem. So what if it would show up under Xcode. Do most users run a chess engine under Xcode? I don't use it, because I want compatibility between Mac and Linux and Make does that quite nicely along with gdb.

My complaint was solely about breaking something that was working, without a useful error description, making it quite time-consuming to track the bug down until it finally became apparent that the problem was Mavericks-only.

All the other noise was just that. Noise. I fixed it instantly, ONCE it was found. But let the UB debate continue.

bob · Post by **bob** » Fri Dec 13, 2013 1:22 am

wgarvin wrote:
bob wrote:
bnemias wrote:
bob wrote:It takes one more add and one more compare. How hard is that? I think it is doing something wrong by detecting the overlap in the first place.
You do realize that to implement the solution you've been advocating, covering up the bug by invoking memmove(), that it is necessary to actually detect the overlap?
You do realize that I said EXACTLY THAT? Since they chose to detect the overlap, why just abort rather than actually fix it? Simple enough now? I have only written that a dozen times. Seems silly to waste the CPU cycles to catch it, when it is not done in most programs, and then get NOTHING from those wasted cycles... If you'd join a thread by reading from the top, this wasted post would not be necessary.
In fairness, reading this thread from the top would take hours.

We didn't get NOTHING from the wasted cycles. The bug in your program got found, and fixed. Many other bugs got found, and some of them will get fixed. Lots of users got annoyed, and maybe they will get their software from more reliable sources in the future (though I doubt it).

Anyway, don't you see the inconsistency between your two positions? On the one hand you rail against them wasting 1-2 cycles per strcpy for a comparison and highly-predictable branch (i.e. almost always there is no overlap). On the other hand, you seem to think C compilers should assign semantics to your source code, which would have a much higher performance cost than just 1-2 cycles here and there. C and C++ have both always chosen performance over safety.

My belief is still the same, the compiler should do "the right thing". I'm not sure what you mean by "assign semantics to my program." I supply the semantics, specifically, in the form of C source. I just want it to do what I write. Forget the not very useful optimizations that ignore things like signed overflow and such. Just do what I write. Not what the compiler thinks I meant.

should I choose to write (if a + 1 > a) then produce code that makes that test. Not some other test.

syzygy · Post by **syzygy** » Fri Dec 13, 2013 1:23 am

bob wrote:
syzygy wrote:
bob wrote:
bnemias wrote:
bob wrote:Seems perfectly reasonable to me. Somewhere. Some alternate universe. Where insanity reigns supreme... ... when they COULD have fixed both with a simple call to memmove().
Yes, let's cover up bugs instead of expose them.... that is insanity.
So, to recap the position you must believe, "In order to compile more efficiently and be able to use tricky optimizations, it is perfectly OK to slow down strcpy() by a factor of 2."??? I wonder if those "wonderful optimizations" will offset that factor of 2? That is, I wonder if this is one of those "useful optimizations" that slows the code down rather than speeding it up?
No he said it is insane to cover up bugs instead of exposing them.

There are two things here:
1) Apple pushing people to fix their bugs;
2) the good sides of strcpy() UB.

Ad 1)
The "ethics" of what Apple did may be debatable, but what they did does work. The extra check certainly does not slow down strcpy() by a factor of 2, that's just nonsense.
What does it cost to walk down the string once? Twice? They have to walk it once to get the length, then again to copy. Close enough to 2x to use it as an approximation of the cost.

I guess you haven't heard of caches. Plus, just reading is considerably faster than reading and writing.

Anyway, if you compile with -D_FORTIFY_SOURCE, then you choose to have these checks.

And what they did does not exactly "work". They only catch overlap in one direction, and NOT the worst one at that...
As far as I understand, it only does not work on some constant strings that are not copied using the library implementation.

You were going to install Linux.
Once I finish grading, yes. Does that help anyone but me, however?

You obviously don't care about anyone else that would like to compile and run your code with highly optimising compilers that do not respect "your intentions" for UB or that are not x86_64.

syzygy · Post by **syzygy** » Fri Dec 13, 2013 1:27 am

bob wrote:My belief is still the same, the compiler should do "the right thing". I'm not sure what you mean by "assign semantics to my program." I supply the semantics, specifically, in the form of C source.

The C you supplied does not have semantics.

If you mean it is "hyatt C" and not C, then you should use a "hyatt C" compiler.

wgarvin · Post by **wgarvin** » Fri Dec 13, 2013 1:49 am

syzygy wrote:
bob wrote:My belief is still the same, the compiler should do "the right thing". I'm not sure what you mean by "assign semantics to my program." I supply the semantics, specifically, in the form of C source.
The C you supplied does not have semantics.

If you mean it is "hyatt C" and not C, then you should use a "hyatt C" compiler.

What I was actually thinking of there (when I wrote that bob wanted C compilers to assign semantics to [currently-undefined] things in his program and that would hurt performance much more than 1-2 cycles here or there), was the arguments over whether "X * 2 / 2" is allowed to mean "X". I'm not clear exactly what bob's opinion is on that now, he seemed to think that signed overflow should always wrap 2's-complement style, but also that if X was a 32-bit signed int on an x64 platform, X*2 should be a 64-bit result and not actually overflow. At one point he seemed to want the compiler to generate an imul 2 followed by an idiv 2, or something like that. Maybe he changed his mind about that, I can't keep track anymore. But if modern compilers reduce this to "X", of course that will be faster than anything else they might do to it. In order to assign behavior to what is currently UB, we are imposing new code generation limitations and requirements on the compilers, which would hurt the quality of the generated code.

Many of the various "undefined behavior" cases in the language spec are being exploited in one way or another by modern compilers to generate more efficient code for the other cases whose semantics ARE defined by the standard. And Chris Lattner gave several examples of this, and you and Marcel and others in this thread have also turned up several neat examples. So we can see that the existence of the UB in the language is actually helping compilers to generate faster code for legal programs. Bob wants to get rid of the UB and replace it with "the right thing" on x64 or whatever other platform the compiler happens to be on; even if this could be done in a consistent fashion, it would definitely have a cost to performance. Lattner described in several of his examples what those costs would be (the optims that would be inhibited, or the extra instructions the compiler would have to insert). If the UB in strcpy were to be removed, the nice stpcpy+memcpy optimization you found would be a casualty of that.

Maybe the standard committee or one of the vendors should develop a new "safe C" dialect that assigns semantics to most or all of the UB. This would have a cost in performance, but big benefits for reliability, security, etc. My guess is such a language might be 10-20% slower than current "maximum performance" C implementations are, but I don't really know. Maybe the penalty would be smaller than that. Even if it was 50% slower, for many domains it would still be fine. I doubt bob would want to write Crafty in "safe C" if it produced 20% slower compiled programs, though.

bob · Post by **bob** » Fri Dec 13, 2013 4:27 am

syzygy wrote:
bob wrote:My belief is still the same, the compiler should do "the right thing". I'm not sure what you mean by "assign semantics to my program." I supply the semantics, specifically, in the form of C source.
The C you supplied does not have semantics.

If you mean it is "hyatt C" and not C, then you should use a "hyatt C" compiler.

Do you know the definition of "semantics"? Apparently not.

bob · Post by **bob** » Fri Dec 13, 2013 4:35 am

wgarvin wrote:
syzygy wrote:
bob wrote:My belief is still the same, the compiler should do "the right thing". I'm not sure what you mean by "assign semantics to my program." I supply the semantics, specifically, in the form of C source.
The C you supplied does not have semantics.

If you mean it is "hyatt C" and not C, then you should use a "hyatt C" compiler.

What I was actually thinking of there (when I wrote that bob wanted C compilers to assign semantics to [currently-undefined] things in his program and that would hurt performance much more than 1-2 cycles here or there), was the arguments over whether "X * 2 / 2" is allowed to mean "X". I'm not clear exactly what bob's opinion is on that now, he seemed to think that signed overflow should always wrap 2's-complement style, but also that if X was a 32-bit signed int on an x64 platform, X*2 should be a 64-bit result and not actually overflow. At one point he seemed to want the compiler to generate an imul 2 followed by an idiv 2, or something like that. Maybe he changed his mind about that, I can't keep track anymore. But if modern compilers reduce this to "X", of course that will be faster than anything else they might do to it. In order to assign behavior to what is currently UB, we are imposing new code generation limitations and requirements on the compilers, which would hurt the quality of the generated code.

Many of the various "undefined behavior" cases in the language spec are being exploited in one way or another by modern compilers to generate more efficient code for the other cases whose semantics ARE defined by the standard. And Chris Lattner gave several examples of this, and you and Marcel and others in this thread have also turned up several neat examples. So we can see that the existence of the UB in the language is actually helping compilers to generate faster code for legal programs. Bob wants to get rid of the UB and replace it with "the right thing" on x64 or whatever other platform the compiler happens to be on; even if this could be done in a consistent fashion, it would definitely have a cost to performance. Lattner described in several of his examples what those costs would be (the optims that would be inhibited, or the extra instructions the compiler would have to insert). If the UB in strcpy were to be removed, the nice stpcpy+memcpy optimization you found would be a casualty of that.

Maybe the standard committee or one of the vendors should develop a new "safe C" dialect that assigns semantics to most or all of the UB. This would have a cost in performance, but big benefits for reliability, security, etc. My guess is such a language might be 10-20% slower than current "maximum performance" C implementations are, but I don't really know. Maybe the penalty would be smaller than that. Even if it was 50% slower, for many domains it would still be fine. I doubt bob would want to write Crafty in "safe C" if it produced 20% slower compiled programs, though.

My opinion on x * 2 / 2 == X depends on the value of X. On X86, for example, I'm happy with X*2/2 being X if X is 30 bits or less. I am just as happy with it being X if X is 31 bits or less. But what I REALLY want is X * 2 / 2. Not some "this is equal to this" thing. If I wanted X there, I would have put X there. So there MUST be some reason for me to use the * 2 / 2 pair of operators. Since I used 'em, leave 'em and do 'em. I don't care about sloppy macros and such that produce that. Fix the macros. Don't use macros. But don't use sloppy macros as an excuse to do something I didn't ask for.

Is that a simple and direct enough answer. For X86, let x * 2 / 2 resolve to whatever the compiler can come up with. If it absolutely must use the shift to eliminate the multiply for powers of two, I suppose I can live with that. I would RATHER it just do the multiply, since _I_ could have done the shift if I wanted to do that operation.

I like optimizers that do the right thing. But I don't want them to go so far as to begin to change what I specified was to be done. And then hide behind the "undefined behavior" curtain when I challenge it on its misbehavior. Maybe a -Orational setting might be what would suit me best. Which says "shuffle code, save cycles, but don't take short-cuts on the operations themselves, just because you can".

It's certainly a grey area, but I am proficient enough to write code that does EXACTLY what I want, and I don't need the compiler going behind me and second-guessing. It doesn't screw up signed overflow in NORMAL calculations, so leave 'em the heck alone when you happen to notice that a constant expression might overflow. Do the computations you are given, and if it overflows, assume that was my intent. Don't assume that my intent was to give you permission to do anything you want when you notice an overflow. Just do what I asked.

That is what a good compiler SHOULD do. Consistency above all else, IMHO.

Angrim · Post by **Angrim** » Fri Dec 13, 2013 4:36 am

syzygy wrote:I don't know how helpful and complete it will be, but gcc-4.9 will have an UB detector:
UndefinedBehaviorSanitizer (ubsan), a fast undefined behavior detector, has been added and can be enabled via -fsanitize=undefined. Various computations will be instrumented to detect undefined behavior at runtime. UndefinedBehaviorSanitizer is currently available for the C and C++ languages.
http://gcc.gnu.org/gcc-4.9/changes.html

That looks useful, should find a few new bugs with it

strcpy() revisited

Re: strcpy() revisited

Re: strcpy() revisited

Re: strcpy() revisited

Re: strcpy() revisited

Re: strcpy() revisited

Re: strcpy() revisited

Re: strcpy() revisited

Re: strcpy() revisited

Re: strcpy() revisited

Re: strcpy() revisited