strcpy() revisited

Rein Halbersma · Post by **Rein Halbersma** » Thu Dec 12, 2013 11:12 am

hgm wrote:But strcpy is not a reserved keyword, is it? It is just the name of a routine. Declared in a header file, and defined in the library source.

7.1.3

Each header declares or deﬁnes all identiﬁers listed in its associated subclause, and optionally declares or deﬁnes identiﬁers listed in its associated future library directions subclause and identiﬁers which are always reserved either for any use or for use as ﬁle scope identiﬁers.

The "each header" refers to 7.1.2 where an exhaustive summary is given of all the C Standard Library headers. Each identifier in those headers is reserved.

You can read about it at leisure in the Standard yourself. It's not that hard.

hgm · Post by **hgm** » Thu Dec 12, 2013 11:36 am

So the 'context' in which strcpy would be a reserved identifier would be any file that does an #include <string.h>, and this allows the optimizer to be certain about what strcpy will do? I guess then there isn't any need for putting a defenition (and inlining it) in the header file.

I still don't understand why the Apple compiler generates external calls to inlined functions that do not exist independnetly. Are Word and Match accidentally standard library functions? Parser.c only #includes 3 standard headers:

Code: Select all

#include <stdio.h>
#include <ctype.h>
#include <string.h>

Rein Halbersma · Post by **Rein Halbersma** » Thu Dec 12, 2013 11:42 am

hgm wrote:So the 'context' in which strcpy would be a reserved identifier would be any file that does an #include <string.h>, and this allows the optimizer to be certain about what strcpy will do? I guess then there isn't any need for putting a defenition (and inlining it) in the header file.

Unlike C++, the C Standard Library does not have templates, so it can have headers that only contain declarations and the Runtime Library will contain the already compiled function definitions that you link against. Inside those definitions, the optimizations can be done (and even at link-time, additional optimizations).

I still don't understand why the Apple compiler generates external calls to inlined functions that do not exist independnetly. Are Word and Match accidentally standard library functions?

You mentioned error messages containing _Match and _Word, but your code shows Match and Word. If you in fact have _Match and _Word in your code, then it's undefined behavior:

7.1.3 first bullet:
-All identiﬁers that begin with an underscore and either an uppercase letter or another underscore are always reserved for any use.

hgm · Post by **hgm** » Thu Dec 12, 2013 11:52 am

wgarvin wrote:Nice! Anyone want to bet whether or not this find will silence those critics who have complained that there are no possible optimization benefits from the non-overlapping API specification of strcpy?

Well, to spoil the betting:

How is always performing a strlen() that was not requested, and then using the obtained value if by sheer luck a real call to strlen follows a 'clever optimization'? It seems to me it would be way faster to only do the strlen() when it was actually requested. It seems more a case of 'damage control' to me. Usually I am no longer interested in the lengths of strings after I copied them, so it seems this measure doesn't do very much to control the damage suffered by doing unrequested strlen computations.

hgm · Post by **hgm** » Thu Dec 12, 2013 12:00 pm

Rein Halbersma wrote:You mentioned error messages containing _Match and _Word, but your code shows Match and Word. If you in fact have _Match and _Word in your code, then it's undefined behavior:

7.1.3 first bullet:
-All identiﬁers that begin with an underscore and either an uppercase letter or another underscore are always reserved for any use.

Well, I am doing this from memory now, as the e-mail with the bug report is on another machine. But many C compilers prefix all user-defined identifiers with an underscore, to prevent name clashes with symbols they generate themselves (e.g. jump labels) or assembler mnemonics. The linker doesn't know about C, and its error messages for undefined symbols would use the name of the routines in the compiler-generated assembler. So the C code does use Word and Match, (that I am sure of), but in assembler this becomes _Word and _Match. From what I remember the Apple C compiler also had this habit (like my own (Cygwin) gcc). I don't think the Linux gcc does such prefixing, however.

syzygy · Post by **syzygy** » Thu Dec 12, 2013 12:12 pm

hgm wrote:
wgarvin wrote:Nice! Anyone want to bet whether or not this find will silence those critics who have complained that there are no possible optimization benefits from the non-overlapping API specification of strcpy?
Well, to spoil the betting:

How is always performing a strlen() that was not requested, and then using the obtained value if by sheer luck a real call to strlen follows a 'clever optimization'?

As I alreay explained (seems people don't read anymore nowadays), the strlen() of the original string is obtained for free by doing the first strcpy(). This first strcpy() is replaced by a stpcpy() which returns the address of the trailing 0, so that a simple subtraction is all it takes to obtain the value of strlen(). This value is then used to perform the second strcpy() using memcpy(), apparently copying backwards. Since strlen() is now known anyway, the value is retained to completely optimise away the final strlen().

Note that this also shows that even if strcpy() is implemented using a naive while (*dst++ = +src++); loop instead of using a vectorised implementation (such as the one by Agner Fog), it is very well possible that strcpy() ends up being implemented as a memcpy() which copies from right to left, which would of course fail completely on Bob's "safe" abuse of strcpy.

hgm · Post by **hgm** » Thu Dec 12, 2013 12:49 pm

Well, actually the only cases where strlen() was called and printed were those where any library calls were optimized away, and the copying was performed by inlined moving of a quadword. This would obviously have required a compile-time call to strlen(). In all cases where strcpy was actually called, there was an abort before any invocation of strlen().

So in fact we don't even have any evidence that this optimization takes place after a true call to strcpy. We could never see this from the output of the program, as only non-overlapping copy areas would survive the call, and for those the optimization would indeed always be safe, and thus give the same answer as a non-optimized-away strlen call.

But even so, the claim that the possibility to do this optimization depends on the possibility of UB seems to be false:

* The library call to strcpy must always do the strlen to determine if the copy areas overlap. When it detects overlap, it would as a free side effect know if it had to copy forward or backward, and in any case it would know the correct length.
* When strcpy is optimized away for inlined move instructions, because at compile time the length of the string is known and small, it could also have checked at compile time if the copy areas overlapped, and thus whether the copy would conserve the string length or not, and when it didn't they would actually know how much the new length would be, and could have used that info for optimizing away strlen.

The fact that they don't even test for overlap in the static case makes the excuse that they all do this because they care so much about buffer overflows fall apart. You can still have buffer overflows. Actually more dangerous ones than with the while(*p++ = *q++); implementation, because it would be able to continue after the overflow, exploiting the data corruption, while the latter case would give a guaranteed segfault.

There still doesn't seem to be any benefit from detecting the UB. Just a performance penalty. Using the length of a string that was obtained as a side effect of strcpy could always have been used to optimize away later strlen calls. Provided it had been calculated correctly.

syzygy · Post by **syzygy** » Thu Dec 12, 2013 2:37 pm

hgm wrote:Well, actually the only cases where strlen() was called and printed were those where any library calls were optimized away, and the copying was performed by inlined moving of a quadword. This would obviously have required a compile-time call to strlen(). In all cases where strcpy was actually called, there was an abort before any invocation of strlen().

I don't know what you are talking about, but I am talking about the example I gave. You can compile it yourself with "gcc -O3 -S" and inspect the assembly code.

Btw, I am not talking about Apple's implementation. I am talking about compilers making use of the freedom that UB gives them. Strcpy() with overlapping regions is UB, so the compiler knows that it cannot happen (for programs that it must compile correctly).

Bob's argument, at least at some point, was that strcpy(b, b+1) cannot fail (even though it did). My example shows that this is not true even when strcpy() is implemented with a simple forward looping while(). The compiler might not even call strcpy() but instead invoke a backward looping memcpy(). What the compiler will ultimately do is simply not predictable. But what you can rely on is that strcpy() will work correctly for non-overlapping regions, and that in this case it will not affect apparently unrelated parts of the program (such as a later invocation of strlen()).

Of course one could redefine strcpy() to not have UB in case of overlapping regions. What should be clear is that this would not only have advantages.

wgarvin · Post by **wgarvin** » Thu Dec 12, 2013 3:02 pm

hgm wrote:But strcpy is not a reserved keyword, is it? It is just the name of a routine. Declared in a header file, and defined in the library source.

As I explained already, after it has seen #include <string.h>, which it recognizes as introducing "standard library version of strcpy" into the global scope... Then invocations of "strcpy" can be treated specially by the compiler, and they are.

Its not a "reserved word", just a function whose semantics are intrinsically known to the compiler. (It knows it can use its _builtin_strcpy to get the same result, for example, and it knows that the source and dest operand strings must not overlap because the API specification of C standard library strcpy says that is undefined behaviour).

Its really not that weird for the compiler to emit something else besides a call instruction to call strcpy from the library. Consider what happens after it has seen the definition of one of your functions, and then it sees you calling that function from somewhere else and maybe it decides to inline your definition at that call site. It can do that because it "knows" the definition, at least in the dumb sense of knowing what code to inline there.

Intrinsic functions are just cases where more knowledge about the semantics of a specific vendor-provided function are built into the compiler, so that it can translate your "calls" to it into some efficient inline thing, if it wants to.

Its basically the same as what happens with other intrinsic functions the compiler offers (such as the ones that convince it to generate a single SSE vector instruction, or a rotate or bswap instruction, for example). They wouldn't bother, for most regular library functions, to build special knowledge into the compiler. They can and do take that step for extremely-common utility functions like strcpy, strlen and memcpy. Because all other things being equal, it can improve the performance of correct programs.

wgarvin · Post by **wgarvin** » Thu Dec 12, 2013 4:29 pm

Here is some more info about it. I found it with a 5-second google search (although I guess I had the advantage that I already knew exactly what I was looking for

):

http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

6.56 Other Built-in Functions Provided by GCC

GCC provides a large number of built-in functions other than the ones mentioned above. Some of these are for internal use in the processing of exceptions or variable-length argument lists and are not documented here because they may change from time to time; we do not recommend general use of these functions.

The remaining functions are provided for optimization purposes.

GCC includes built-in versions of many of the functions in the standard C library. The versions prefixed with __builtin_ are always treated as having the same meaning as the C library function even if you specify the -fno-builtin option. (see C Dialect Options) Many of these functions are only optimized in certain cases; if they are not optimized in a particular case, a call to the library function is emitted.

Outside strict ISO C mode (-ansi, -std=c90, -std=c99 or -std=c11), the functions _exit, alloca, bcmp, bzero, dcgettext, dgettext, dremf, dreml, drem, exp10f, exp10l, exp10, ffsll, ffsl, ffs, fprintf_unlocked, fputs_unlocked, gammaf, gammal, gamma, gammaf_r, gammal_r, gamma_r, gettext, index, isascii, j0f, j0l, j0, j1f, j1l, j1, jnf, jnl, jn, lgammaf_r, lgammal_r, lgamma_r, mempcpy, pow10f, pow10l, pow10, printf_unlocked, rindex, scalbf, scalbl, scalb, signbit, signbitf, signbitl, signbitd32, signbitd64, signbitd128, significandf, significandl, significand, sincosf, sincosl, sincos, stpcpy, stpncpy, strcasecmp, strdup, strfmon, strncasecmp, strndup, toascii, y0f, y0l, y0, y1f, y1l, y1, ynf, ynl and yn may be handled as built-in functions. All these functions have corresponding versions prefixed with __builtin_, which may be used even in strict C90 mode.

The ISO C99 functions _Exit, acoshf, acoshl, acosh, asinhf, asinhl, asinh, atanhf, atanhl, atanh, cabsf, cabsl, cabs, cacosf, cacoshf, cacoshl, cacosh, cacosl, cacos, cargf, cargl, carg, casinf, casinhf, casinhl, casinh, casinl, casin, catanf, catanhf, catanhl, catanh, catanl, catan, cbrtf, cbrtl, cbrt, ccosf, ccoshf, ccoshl, ccosh, ccosl, ccos, cexpf, cexpl, cexp, cimagf, cimagl, cimag, clogf, clogl, clog, conjf, conjl, conj, copysignf, copysignl, copysign, cpowf, cpowl, cpow, cprojf, cprojl, cproj, crealf, creall, creal, csinf, csinhf, csinhl, csinh, csinl, csin, csqrtf, csqrtl, csqrt, ctanf, ctanhf, ctanhl, ctanh, ctanl, ctan, erfcf, erfcl, erfc, erff, erfl, erf, exp2f, exp2l, exp2, expm1f, expm1l, expm1, fdimf, fdiml, fdim, fmaf, fmal, fmaxf, fmaxl, fmax, fma, fminf, fminl, fmin, hypotf, hypotl, hypot, ilogbf, ilogbl, ilogb, imaxabs, isblank, iswblank, lgammaf, lgammal, lgamma, llabs, llrintf, llrintl, llrint, llroundf, llroundl, llround, log1pf, log1pl, log1p, log2f, log2l, log2, logbf, logbl, logb, lrintf, lrintl, lrint, lroundf, lroundl, lround, nearbyintf, nearbyintl, nearbyint, nextafterf, nextafterl, nextafter, nexttowardf, nexttowardl, nexttoward, remainderf, remainderl, remainder, remquof, remquol, remquo, rintf, rintl, rint, roundf, roundl, round, scalblnf, scalblnl, scalbln, scalbnf, scalbnl, scalbn, snprintf, tgammaf, tgammal, tgamma, truncf, truncl, trunc, vfscanf, vscanf, vsnprintf and vsscanf are handled as built-in functions except in strict ISO C90 mode (-ansi or -std=c90).

There are also built-in versions of the ISO C99 functions acosf, acosl, asinf, asinl, atan2f, atan2l, atanf, atanl, ceilf, ceill, cosf, coshf, coshl, cosl, expf, expl, fabsf, fabsl, floorf, floorl, fmodf, fmodl, frexpf, frexpl, ldexpf, ldexpl, log10f, log10l, logf, logl, modfl, modf, powf, powl, sinf, sinhf, sinhl, sinl, sqrtf, sqrtl, tanf, tanhf, tanhl and tanl that are recognized in any mode since ISO C90 reserves these names for the purpose to which ISO C99 puts them. All these functions have corresponding versions prefixed with __builtin_.

The ISO C94 functions iswalnum, iswalpha, iswcntrl, iswdigit, iswgraph, iswlower, iswprint, iswpunct, iswspace, iswupper, iswxdigit, towlower and towupper are handled as built-in functions except in strict ISO C90 mode (-ansi or -std=c90).

The ISO C90 functions abort, abs, acos, asin, atan2, atan, calloc, ceil, cosh, cos, exit, exp, fabs, floor, fmod, fprintf, fputs, frexp, fscanf, isalnum, isalpha, iscntrl, isdigit, isgraph, islower, isprint, ispunct, isspace, isupper, isxdigit, tolower, toupper, labs, ldexp, log10, log, malloc, memchr, memcmp, memcpy, memset, modf, pow, printf, putchar, puts, scanf, sinh, sin, snprintf, sprintf, sqrt, sscanf, strcat, strchr, strcmp, strcpy, strcspn, strlen, strncat, strncmp, strncpy, strpbrk, strrchr, strspn, strstr, tanh, tan, vfprintf, vprintf and vsprintf are all recognized as built-in functions unless -fno-builtin is specified (or -fno-builtin-function is specified for an individual function). All of these functions have corresponding versions prefixed with __builtin_.

GCC provides built-in versions of the ISO C99 floating-point comparison macros that avoid raising exceptions for unordered operands. They have the same names as the standard macros ( isgreater, isgreaterequal, isless, islessequal, islessgreater, and isunordered) , with __builtin_ prefixed. We intend for a library implementor to be able to simply #define each standard macro to its built-in equivalent. In the same fashion, GCC provides fpclassify, isfinite, isinf_sign and isnormal built-ins used with __builtin_ prefixed. The isinf and isnan built-in functions appear both with and without the __builtin_ prefix.

(emphasis added by me)

strcpy() revisited

Re: strcpy() revisited

Re: strcpy() revisited

Re: strcpy() revisited

Re: strcpy() revisited

Re: strcpy() revisited

Re: strcpy() revisited

Re: strcpy() revisited

Re: strcpy() revisited

Re: strcpy() revisited

Re: strcpy() revisited