gcc / clib bug?

Discussion of chess software programming and technical issues.

Moderator: Ras

User avatar
hgm
Posts: 28393
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

gcc / clib bug?

Post by hgm »

Code: Select all

 struct t_tree {
    struct t_tree *left, *right;
    char name;                    /* Not just 1 char - space for whole name */
  };                              /* is allocated.  Maybe a little cheesy? */
  
  struct t_dirs {
    struct t_dirs *left, *right;
    time_t mtime;                 /* dir's modification time */
    struct t_tree *files;
    char name;                    /* ditto */
  };

// in searc_directory():

      if (!*i) {                          /* if dir isn't in dir tree, add him */
        *i = malloc(sizeof(struct t_dirs) + strlen(dir));
        (*i)->left = (*i)->right = NULL;
        (*i)->files = NULL;
        strcpy(&(*i)->name, dir);
      }

The ICS code, when compiled with gcc 4.6.3 on Ubuntu 12.04 crashes. I could trace the problem to the above code. Although the code is very ugly, using a field defined as scalar char as if it is a string, it doesn't seem really in error: the struct as declared could hold the terminating 0-byte, and it is allocated in such a way that it is guaranteed there is enough empty space behind it to hold the entire string (which happened to be "./players/m" at the point where it crashed).

So how come the code crashes, with the OS complaint of "buffer overflow"?

Code: Select all

*** buffer overflow detected ***: bin/chessd terminated
======= Backtrace: =========
/lib/i386-linux-gnu/libc.so.6(__fortify_fail+0x45)[0xb770c045]
/lib/i386-linux-gnu/libc.so.6(+0x102e1a)[0xb770ae1a]
/lib/i386-linux-gnu/libc.so.6(+0x10214d)[0xb770a14d]
./lib/chessd.so(search_directory+0xe5)[0xb75b8385]
./lib/chessd.so(+0x17b14)[0xb7585b14]
./lib/chessd.so(process_input+0x62b)[0xb7586eab]
./lib/chessd.so(select_loop+0x9fb)[0xb75a9abb]
bin/chessd(main+0x3ea)[0x8048eba]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0xb76214d3]
bin/chessd[0x8048fa1]
The error disappears when I replace name by name[2] in the struct defenition, which suggest the allocated space is one byte short. Is strcpy somehow using a byte it should not use, at the end of the string?
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: gcc / clib bug?

Post by AlvaroBegue »

You are using the language in a way that doesn't conform to the standard, so the compiler is allowed to do anything it wishes, including issuing a buffer-overflow error.

There is a fairly detailed discussion about this here: http://stackoverflow.com/questions/3711 ... d-behavior
User avatar
hgm
Posts: 28393
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: gcc / clib bug?

Post by hgm »

That's fine in theory, but I would like to understand why the implementation I am dealing with actually crashes, and crashes so bad that the operating system notices it, producing the stack trace. If it would be because the compiler used array-bound checking, I would be shocked (because that would make the code so inefficient), but it would be in its rights to complain. But I know that it doesn't, because in that case it should also have complained when I changed the declaration to name[2], as the strlen("./player/m") equals 10, and would clearly overrun the declared buffer.

So knowing that there is no bound checking, and that the malloc did leave enough empty space behind the struct, that the C standard does force &(*i)->name to be a valid pointer into this allocated memory region, how could strcpy (which should have no knowledge of how the pointer was created) crash?

Some of the formal arguments offered in the link you give seem extremely doubtful to me. I cannot imagine that using code like

Code: Select all

char p = malloc(100);
p[99] = 'a';
would be a violation of the standard (and could thus result in undefined behavior), just because p was not pointing in any statically or automatically declared array. That would mean that according to the standard C would be a completely useless language, as it is no longer possible to access malloc'ed memory in a compliant way. I never studied the standard, but I would say that what malloc(N) returns MUST be equivalent to an array of N char. So creating a pointer to any of the elements in it must make it valid to access all elements behind it that are still in the allocated region by incrementing the (char*) pointer. Absolutely nothing undefined there.

What would happen when I declare

Code: Select all

struct x { char name; };
struct y { char longName[1000]; }

struct x *p;
struct y *q;

p = (struct y *) malloc(sizeof(struct x) + 7);
q = (struct x *) y;
strcpy(&(q->name), "blablah");
Would that leave room for any undefined behavior?
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: gcc / clib bug?

Post by AlvaroBegue »

hgm wrote:That's fine in theory, but I would like to understand why the implementation I am dealing with actually crashes, and crashes so bad that the operating system notices it, producing the stack trace.
Yes, it would be good to understand what it's doing under the hood to produce this crash, but I don't have any good ideas as to what it is. Can you reproduce the problem with a 10-line program?
So knowing that there is no bound checking, and that the malloc did leave enough empty space behind the struct, that the C standard does force &(*i)->name to be a valid pointer into this allocated memory region, how could strcpy (which should have no knowledge of how the pointer was created) crash?
The thing is, when the compiler optimizes your code, it can assume you are not going to invoke undefined behavior. If you do, some optimization somewhere might not be sound.
Some of the formal arguments offered in the link you give seem extremely doubtful to me. I cannot imagine that using code like

Code: Select all

char p = malloc(100);
p[99] = 'a';
would be a violation of the standard (and could thus result in undefined behavior), just because p was not pointing in any statically or automatically declared array.
I agree, that should work. And I am pretty sure the standard covers it, although I am not going to check.
[...] I never studied the standard, but I would say that what malloc(N) returns MUST be equivalent to an array of N char.
Yes, but if you access that array through a pointer to a smaller array, the compiler can assume you are not going to access it beyond the boundary of the smaller array.

Although it's not your case, the processor may have different addressing modes for small indices, and the compiler can choose to use it, if the array you are pointing at is small enough.
What would happen when I declare

Code: Select all

struct x { char name; };
struct y { char longName[1000]; }

struct x *p;
struct y *q;

p = (struct y *) malloc(sizeof(struct x) + 7);
q = (struct x *) y;
strcpy(&(q->name), "blablah");
Would that leave room for any undefined behavior?
That doesn't compile. You can't use the type `y' as if it where a variable, when assigning to `q'.
User avatar
hgm
Posts: 28393
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: gcc / clib bug?

Post by hgm »

Code: Select all

struct x { char name; };
struct y { char longName[1000]; }

struct x *p;
struct y *q;

p = (struct y *) malloc(sizeof(struct x) + 7);
q = (struct x *) p;
strcpy(&(q->name), "blablah"); 
Oh, sorry, I meant to cast p, not y.

Unfortunately I cannot reproduce this at all, as I don't even have that compiler. It is someone else having this problem, who reported it to me (as I seem to be the de-facto maintainer of the ICS code).

Although I agree that in principle optimizations could use invalid assumptions, I don't see how that could be the case here. The crash occurs in a library routine strcpy, which should be completely general, and completely unaware of how the code that calls it generated the pointer it gets passed as destination. In order to work for fully defined behavior (i.e. is I pass an empty string to strcpy, so it would fit in the scalar char field), it MUST pass a pointer to an element within the malloc'ed region, with enough bytes behind it. Just like strcpy cannot know how the pointer was generated, the optimizer cannot know what strcpy does. I could very well link it with my own defenition of strcpy in another object file.
syzygy
Posts: 5743
Joined: Tue Feb 28, 2012 11:56 pm

Re: gcc / clib bug?

Post by syzygy »

[quote="hgm"]

Code: Select all

*** buffer overflow detected ***: bin/chessd terminated
======= Backtrace: =========
/lib/i386-linux-gnu/libc.so.6(__fortify_fail+0x45)[0xb770c045][/quote]
http://gcc.gnu.org/ml/gcc-patches/2004-09/msg02055.html
[quote]The intent of this patch is to add some checks that have no or non-measurable runtime overhead, so something that can be enabled for all programs and libraries in an operating system. The patch certainly doesn't prevent all buffer overflows, but should prevent many common ones. It works by computing a constant (conservative) number of bytes remaining to the end of object(s) each destination pointer passed to memory and string functions, if possible checking for overflows at compile time, if not possible passing that constant size to special checking alternatives of the memory/string functions.[/quote]
I suppose the crash will disappear if the code is compiled with -D_FORTIFY_SOURCE flags.
User avatar
hgm
Posts: 28393
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: gcc / clib bug?

Post by hgm »

Thanks, we will try this.

Another problem that surfaced:

On other commands the ICS crashes with the error that it cannot find the routine floorf in the chessd.so file. Not surprising, because the symbol floorf occurs nowhere in the ICS code.

Doing a 'grep floorf *' does find it in binary and object files, though, in particular in comproc.o. The source file comproc.c does contain calls to floor and ceil:

Code: Select all

int count;
float start_perc, stop_perc;
         startpoint = floor((float) count * start_perc);
         stoppoint = ceil((float) count * stop_perc) - 1;
As I understand the standard the (float) cast is meaningless, and the operands of the multiply should be casted to double before being multiplied. But it seems the compiler tries to be smart here, by doing the multiply as float, and even passing a float to floor. Which it renames to floorf to allow this.

But floorf doesn't seem to be known in clib.so, so the run-time system now expects it in my own code...
User avatar
Evert
Posts: 2929
Joined: Sat Jan 22, 2011 12:42 am
Location: NL

Re: gcc / clib bug?

Post by Evert »

hgm wrote:The error disappears when I replace name by name[2] in the struct defenition, which suggest the allocated space is one byte short. Is strcpy somehow using a byte it should not use, at the end of the string?
Have you tried name[1]? That seems more reasonable to me (and avoids the ugly & when passing the element), and I suppose it might just be possible that the compiler gets confused because it assumes that name is a scalar rather than the first element of an array (no idea what type of optimisation it would do to get confused by though).

If name[2] works, have you checked the length of the strings when the code runs to verify that they indeed don't fit in the allocated space?
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: gcc / clib bug?

Post by Sven »

hgm wrote:

Code: Select all

struct x { char name; };
struct y { char longName[1000]; }

struct x *p;
struct y *q;

p = (struct y *) malloc(sizeof(struct x) + 7);
q = (struct x *) p;
strcpy(&(q->name), "blablah"); 
Oh, sorry, I meant to cast p, not y.

Unfortunately I cannot reproduce this at all, as I don't even have that compiler. It is someone else having this problem, who reported it to me (as I seem to be the de-facto maintainer of the ICS code).

Although I agree that in principle optimizations could use invalid assumptions, I don't see how that could be the case here. The crash occurs in a library routine strcpy, which should be completely general, and completely unaware of how the code that calls it generated the pointer it gets passed as destination. In order to work for fully defined behavior (i.e. is I pass an empty string to strcpy, so it would fit in the scalar char field), it MUST pass a pointer to an element within the malloc'ed region, with enough bytes behind it. Just like strcpy cannot know how the pointer was generated, the optimizer cannot know what strcpy does. I could very well link it with my own defenition of strcpy in another object file.
I can only say that people who write "clean" software do not have to bother with this type of problems. None of the code snippets you gave here makes much sense to me, except if you say we are in an "obfuscated C" contest.
I would NEVER even attempt to store the result of a function call like malloc() that returns a pointer to "something" in a variable that has less than the size of a pointer.
I would NEVER attempt to declare a variable 'c' of type char, assign "something" to it, and then access "something else" in the way 'c[99]'.
And I would NEVER declare two pointer variables x and y pointing to data of two different types, and then assign addresses to them that point each to data of the other type.
And I would NEVER attempt to copy a string to an address which I (presumably incompletely) stored in only one character - even if I "know" that something like structure alignment seems to make four characters out of one, or anything similar.

Why would ANYONE need to do so, instead of writing it the normal, straightforward way?

If you declare q as a pointer to 'struct y' then, according to typing concept, the expression q->name should be impossible without casting q to 'struct x *' since 'struct y' has no member 'name'. So I don't know what you intended to write.

I think that crashing is really the best that can happen here, if it compiles at all. Anything else would be disappointing for me ;-)

Fighting bad programming practice regards,
Sven
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: gcc / clib bug?

Post by Sven »

hgm wrote:

Code: Select all

 struct t_tree {
    struct t_tree *left, *right;
    char name;                    /* Not just 1 char - space for whole name */
  };                              /* is allocated.  Maybe a little cheesy? */
  
  struct t_dirs {
    struct t_dirs *left, *right;
    time_t mtime;                 /* dir's modification time */
    struct t_tree *files;
    char name;                    /* ditto */
  };

// in searc_directory():

      if (!*i) {                          /* if dir isn't in dir tree, add him */
        *i = malloc(sizeof(struct t_dirs) + strlen(dir));
        (*i)->left = (*i)->right = NULL;
        (*i)->files = NULL;
        strcpy(&(*i)->name, dir);
      }
A better solution might be something like this:

Code: Select all

  struct t_dirs {
    struct t_dirs *left, *right;
    time_t mtime;                 /* dir's modification time */
    struct t_tree *files;
  };

  struct t_dirs * p = *i;
  if (!p) {                          /* if dir isn't in dir tree, add him */
    p = malloc(sizeof(struct t_dirs) + strlen(dir) + 1);
    p->left = p->right = NULL;
    p->files = NULL;
    strcpy(p + sizeof(struct t_dirs), dir);
  }
Sven