Could you check this little piece of C code?

Discussion of chess software programming and technical issues.

Moderator: Ras

wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: Could you check this little piece of C code?

Post by wgarvin »

Sven Schüle wrote:Could you give a concrete example how your "aliasing" situation could look like in order to create different behaviour between left-to-right and right-to-left evaluation of the operands of == in the expression "*++scan == *++match", where scan and match are both pointers?

If I understand you correctly you say that something like

Code: Select all

scan = (unsigned char *) &match;
could create such a problem. But even in this case the modification '++scan' does not change the contents of the 'match' variable. This would be the case only when dealing with references instead of pointers, which is not given here.
Actually you are very close to describing such a case! If you start with something like:

Code: Select all

scan = (unsigned char *) &match;
Then when you increment the match pointer, the value returned by *scan might change. (It would work the other way around too).

Consider this sub-expression:

Code: Select all

 (*++scan == *++match) 
I think the evaluation could be performed in at least 3 different orders:
(1) increment scan, dereference scan, increment match, dereference match, compare the two dereferenced values [notice that this one increments match after scan is dereferenced]
(2) increment match, dereference match, increment scan, dereference scan, compare the two dereferenced values
(3) increment scan, increment match, dereference scan, dereference match, compare the two dereferenced values [notice that this one increments match before scan is dereferenced]

[Edit: I think section "6.5 Expressions" of the standard, allows any "character type" lvalue expression to access any part of any object. In other words char*, unsigned char* and signed char* must all be assumed to alias with anything, because they are pointers to a "character type". I found that in the article mentioned below]

Code: Select all

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

    * a type compatible with the effective type of the object,
    * a qualified version of a type compatible with the effective type of the object,
    * a type that is the signed or unsigned type corresponding to the effective type of the object,
    * a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
    * an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
    * a character type.
Meanwhile, for more strict-aliasing gotchas than you ever imagined or wanted to know about, I recommend this article:
http://cellperformance.beyond3d.com/art ... asing.html
Last edited by wgarvin on Wed Mar 17, 2010 5:53 pm, edited 1 time in total.
User avatar
Bo Persson
Posts: 262
Joined: Sat Mar 11, 2006 8:31 am
Location: Malmö, Sweden
Full name: Bo Persson

Re: Could you check this little piece of C code?

Post by Bo Persson »

wgarvin wrote:
If they are char* pointers and that is the source of the problem, then it means two things:
(1) the compiler is probably generating stupid code for it also, and
(2) changing them to unsigned char* might remove the bizarro potential aliasing, which would fix the warning and the bad codegen (but I'm not at all sure about this... try it and see.)
For historical reasons, char* pointers can point to anything, as that is what happened in C code long before void* was invented.

Apart from the warning problem, there is a potential performance problem from using char pointers. If the compiler cannot figure out if they actually alias anything important, it has to be extremely conservative in its optimizations.

Making the pointers unsigned char* doesn't help either.
User avatar
Bo Persson
Posts: 262
Joined: Sat Mar 11, 2006 8:31 am
Location: Malmö, Sweden
Full name: Bo Persson

Re: Could you check this little piece of C code?

Post by Bo Persson »

wgarvin wrote:
I can't find a clear statement somewhere about whether unsigned char* gets the same free pass from the strict aliasing rule that char* does.. maybe Dann or someone can dig it out of the spec if they have time?
The types char and unsigned char are identical on some hardware, so they must follow the same rules. Including the aliasing rules.
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: Could you check this little piece of C code?

Post by wgarvin »

Bo Persson wrote:
wgarvin wrote:
I can't find a clear statement somewhere about whether unsigned char* gets the same free pass from the strict aliasing rule that char* does.. maybe Dann or someone can dig it out of the spec if they have time?
The types char and unsigned char are identical on some hardware, so they must follow the same rules. Including the aliasing rules.
Whoops, I just edited that part out of my post. You are correct (they are all "character types" according to the standard).

I guess we can conclude that the behaviour of the original code snippet was ambiguous and the warnings were legitimate, and rewriting it to avoid side effects in the comparison expressions is the safest way to fix it.
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Could you check this little piece of C code?

Post by Sven »

wgarvin wrote:
Sven Schüle wrote:Could you give a concrete example how your "aliasing" situation could look like in order to create different behaviour between left-to-right and right-to-left evaluation of the operands of == in the expression "*++scan == *++match", where scan and match are both pointers?

If I understand you correctly you say that something like

Code: Select all

scan = (unsigned char *) &match;
could create such a problem. But even in this case the modification '++scan' does not change the contents of the 'match' variable. This would be the case only when dealing with references instead of pointers, which is not given here.
Actually you are very close to describing such a case! If you start with something like:

Code: Select all

scan = (unsigned char *) &match;
Then when you increment the match pointer, the value returned by *scan might change. (It would work the other way around too).

Consider this sub-expression:

Code: Select all

 (*++scan == *++match) 
I think the evaluation could be performed in at least 3 different orders:
(1) increment scan, dereference scan, increment match, dereference match, compare the two dereferenced values [notice that this one increments match after scan is dereferenced]
(2) increment match, dereference match, increment scan, dereference scan, compare the two dereferenced values
(3) increment scan, increment match, dereference scan, dereference match, compare the two dereferenced values [notice that this one increments match before scan is dereferenced]
You can safely exclude (3) IMO since evaluation of one operand of == will be completed before evalulating the other one.

It is kind of weird. I still believe that the original code example is *not* ambiguous. It is possible, though, that a peculiar compiler does not know about that due to these aliasing problems.

If I look at a standard strcmp() implementation like this:

Code: Select all

int strcmp(const char *s1, const char *s2)
{
    while((*s1 && *s2) && (*s1++ == *s2++));
    return *(--s1) - *(--s2);
}
then I see the same kind of expression. So our conclusion is that this is also "ambiguous"? Really weird. Makes programming a pain.

Sven
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Could you check this little piece of C code?

Post by Sven »

wgarvin wrote:I guess we can conclude that the behaviour of the original code snippet was ambiguous and the warnings were legitimate, and rewriting it to avoid side effects in the comparison expressions is the safest way to fix it.
There *are* no side effects in that code, so there is nothing to avoid IMO. The compiler simply cannot know that potential aliasing problems are in fact not around in the given case, so he emits that remark. I still vote for the original code while locally suppressing that compiler remark. I see nothing ambiguous or unsafe in that original piece of code.

Sven
User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Could you check this little piece of C code?

Post by michiguel »

bob wrote:
michiguel wrote:
wgarvin wrote:
Milos wrote:
Dann Corbit wrote:No, actually, I am now convinced that the result is not unspecified. Even though the arguments can be evaluated in any order for any fragment between sequence points in the statement, in this case it does not matter.

After all, if it performs the first assignment before the second or the second before the first (for instance) the result is the same, since the assignments are identical.
You finally got it. The result of the original expression is in general case defined but unspecified. However, since all the elements are identical it is also specified. Since, compiler cannot distinguish a special case from a general one, it issues the warning.
The warning is syntactically correct but semantically wrong.
If the language guaranteed that scan could not point into the 4 bytes occupied by the pointer variable match, and vice versa, then the original expression would not be unspecified at all, because each subexpression of the form (*++p) has only one evaluation order, and each && is a sequence point. So the warning would be spurious in that case.
[Edit: between each sequence point, the two pointer increments have to occur before any reads that depend on the value of that pointer, so the result is unambigous even if there are multiple possible evaluation orders. Its possible the compiler does not track enough state or analyse that state in a nuanced enough way to detect this, though.]

However, if you allow the possibility of that bizarro aliasing, then the result of each comparison is now influenced by the order in which the increments and dereferences are evaluated. In which case the warning is legitimate, even though its about an aliasing situation which nobody would do deliberately.

Were scan and match declared as char* pointers? If thats the case, it might be forcing the compiler to assume that they could alias anything. Even if the compiler is able to prove in this particular case that the bizarro aliasing is impossible, its under no obligation to do that, and issuing the warning is a reasonable thing for it to do.

If they are char* pointers and that is the source of the problem, then it means two things:
(1) the compiler is probably generating stupid code for it also, and
(2) changing them to unsigned char* might remove the bizarro potential aliasing, which would fix the warning and the bad codegen (but I'm not at all sure about this... try it and see.)
It is already unsigned char *scan, *match;


Edit: You could probably also replace it with this:

Code: Select all

do {
      } while(
            (++scan, ++match, *scan == *match) && (++scan, ++match, *scan == *match) &&
            (++scan, ++match, *scan == *match) && (++scan, ++match, *scan == *match) &&
            (++scan, ++match, *scan == *match) && (++scan, ++match, *scan == *match) &&
            (++scan, ++match, *scan == *match) && (++scan, ++match, *scan == *match) &&
            scan < strend); 
But I think your version is clearer.

Yes, your code works without warning.

I think that the compiler does not like "side effects" before and after the == in general, regardless of their correctness.

Miguel
One simple question... why not just write this as a simple loop and let the compiler "unroll" it into the above mess??? Would be easier to read, understand, and change.
Of course, it it were my code I would not write anything but

Code: Select all

	do {
		++scan; ++match;
	} while (scan < strend && *scan == *match); 
However, I am afraid to touch the unrolling in case the author counted on that somewhere else on the code. I doubt it, but...

Miguel
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Could you check this little piece of C code?

Post by Sven »

michiguel wrote:
bob wrote:One simple question... why not just write this as a simple loop and let the compiler "unroll" it into the above mess??? Would be easier to read, understand, and change.
Of course, it it were my code I would not write anything but

Code: Select all

	do {
		++scan; ++match;
	} while (scan < strend && *scan == *match);
However, I am afraid to touch the unrolling in case the author counted on that somewhere else on the code.
I think that way it won't be unrolled since the compiler does not know the number of instances to use. Also I doubt that this variant is semantically equivalent to the original code since there the "scan < strend" check was done every 8 comparisons only.

Maybe you (or Bob) meant this:

Code: Select all

bool done = false;
do {
    for (int i = 0; i < 8 && !done; i++) { // inner loop could be unrolled
        ++scan;
        ++match;
        if (*scan == *match) done = true; // to break the outer loop but avoid "goto" :-)
    }
} while (!done && scan < strend);
Don't know whether compilers do create efficient code for that ...

Sven
User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Could you check this little piece of C code?

Post by michiguel »

Sven Schüle wrote:
michiguel wrote:
bob wrote:One simple question... why not just write this as a simple loop and let the compiler "unroll" it into the above mess??? Would be easier to read, understand, and change.
Of course, it it were my code I would not write anything but

Code: Select all

	do {
		++scan; ++match;
	} while (scan < strend && *scan == *match);
However, I am afraid to touch the unrolling in case the author counted on that somewhere else on the code.
I think that way it won't be unrolled since the compiler does not know the number of instances to use. Also I doubt that this variant is semantically equivalent to the original code since there the "scan < strend" check was done every 8 comparisons only.

Maybe you (or Bob) meant this:

Code: Select all

bool done = false;
do {
    for (int i = 0; i < 8 && !done; i++) { // inner loop could be unrolled
        ++scan;
        ++match;
        if (*scan == *match) done = true; // to break the outer loop but avoid "goto" :-)
    }
} while (!done && scan < strend);
Don't know whether compilers do create efficient code for that ...

Sven
I was not referring to that. If it were my code, I would not be concern about unrolling at all because most likely it won't change performance there. Just for the sake of exercise, I might prefer

Code: Select all

do {
	int i;
	++scan; ++match;
	for (i = 0; *scan == *match && i < 7 ; i++)
		++scan; ++match;
} while (*scan == *match && scan < strend);
*IF* this piece of code were so critical for performance, I will not go for this at all and I may like to eliminate the comparison completely with a sentinel rather than doing it every 8 round and expanding the memory allocated by 8 bytes. I would use only 1 extra byte after the array.

Code: Select all

/* make sure that at the end of the array *scan != *match */
*strend = 0; /* this is at the end of *scan by definition */
match [strend-scan] = 1;

/* find the first difference */
do { 
	++scan; ++match;
} while (*scan == *match );
but if performance is an issue, I would not stop there and I may try to compare at least 4 bytes at once with a long int (8 if it is 64 bits). If we are going to optimize, we should go all the way. I prefer not doing it at all and write straightforward code.

Miguel
Dann Corbit
Posts: 12803
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Could you check this little piece of C code?

Post by Dann Corbit »

michiguel wrote:
Sven Schüle wrote:
michiguel wrote:
bob wrote:One simple question... why not just write this as a simple loop and let the compiler "unroll" it into the above mess??? Would be easier to read, understand, and change.
Of course, it it were my code I would not write anything but

Code: Select all

	do {
		++scan; ++match;
	} while (scan < strend && *scan == *match);
However, I am afraid to touch the unrolling in case the author counted on that somewhere else on the code.
I think that way it won't be unrolled since the compiler does not know the number of instances to use. Also I doubt that this variant is semantically equivalent to the original code since there the "scan < strend" check was done every 8 comparisons only.

Maybe you (or Bob) meant this:

Code: Select all

bool done = false;
do {
    for (int i = 0; i < 8 && !done; i++) { // inner loop could be unrolled
        ++scan;
        ++match;
        if (*scan == *match) done = true; // to break the outer loop but avoid "goto" :-)
    }
} while (!done && scan < strend);
Don't know whether compilers do create efficient code for that ...

Sven
I was not referring to that. If it were my code, I would not be concern about unrolling at all because most likely it won't change performance there. Just for the sake of exercise, I might prefer

Code: Select all

do {
	int i;
	++scan; ++match;
	for (i = 0; *scan == *match && i < 7 ; i++)
		++scan; ++match;
} while (*scan == *match && scan < strend);
*IF* this piece of code were so critical for performance, I will not go for this at all and I may like to eliminate the comparison completely with a sentinel rather than doing it every 8 round and expanding the memory allocated by 8 bytes. I would use only 1 extra byte after the array.

Code: Select all

/* make sure that at the end of the array *scan != *match */
*strend = 0; /* this is at the end of *scan by definition */
match [strend-scan] = 1;

/* find the first difference */
do { 
	++scan; ++match;
} while (*scan == *match );
but if performance is an issue, I would not stop there and I may try to compare at least 4 bytes at once with a long int (8 if it is 64 bits). If we are going to optimize, we should go all the way. I prefer not doing it at all and write straightforward code.

Miguel
This last statement is profound and wonderful (I agree with it wholeheartedly).

BTW, most compilers are very good at unrolling for you anyway.