gcc miscompilation in Stockfish 1.6

bob · Post by **bob** » Mon Dec 28, 2009 4:45 pm

mcostalba wrote:
bob wrote: GCC is a very robust compiler, breaking it is almost always a user issue rather than a compiler issue. As it is here.
If with -O1 and -O3 the functionality is different it is a compiler bug by definition. BTW rewriting Score as a struct instead of an enum (as posted above) does not change anything.

We already know it is not faster, but it is cleaner in this way because you avoid a lot of double assignments / updates scattered through the code (-70 lines of code if I remember correctly).

that's not necessarily a compiler bug. For example, if you have an undefined variable (used before it is defined) it will almost certainly behave differently at different levels of optimization, and it is not a compiler bug at all. There are lots of ways to produce just this kind of behaviour, and all of them are programming bugs rather than compiler bugs.

bob · Post by **bob** » Mon Dec 28, 2009 4:48 pm

mcostalba wrote:
Bo Persson wrote: In my experience you find a genuine code generation bug about once every 10 years. This might very well be your turn, but I wouldn't bet on it.
May be it is my turn

I think I have found the miscompiled line, with following version it works for me:
Code: Select all
/// Score enum keeps a midgame and an endgame value in a single
/// integer (enum), first LSB 16 bits are used to store endgame
/// value, while upper bits are used for midgame value.

enum Score { ENSURE_32_BIT_SIZE = 1 << 31 };

inline Value eg_value(Score s) { return Value(((s & 0xffff) << 16) >> 16); }
The first thing is to ensure gcc uses an integer for the enum and not a smaller type (and this is allowed by standard so is not a gcc bug), but the crucial point is the rewrite of the eg_value() function that now gives the correct result also with gcc -O3 (and I have verified both Intel and MSVC compile also this version correctly).

So the problem was this one:
Code: Select all
inline Value eg_value(Score s) { return Value(int16_t(s & 0xffff)); }
Function eg_value() extracts the signed lower 16 bits from the 32 bit enum 's'. This function got miscompiled under gcc -O3 with some recent gcc version.

If someone could test with a 64 bit compile (I don't have a 64 bit machine), I would be very glad so that I can at last release 1.6.2 with all the stuff fixed, included a rewrite of pop_1st_bit() to avoid the reported aliasing issue.

There are several ways to produce this kind of behaviour. I assume you already use -Wall so that it will produce complaints about uninitialized variables and such? next issue is pointers. Higher optimization levels assume that two different pointers do not point to the same memory address. Eugene's egtb.cpp is an example of code that will completely break because of this if you don't tell the compiler that this is an issue.

bob · Post by **bob** » Mon Dec 28, 2009 4:51 pm

mcostalba wrote:
Gian-Carlo Pascutto wrote:
mcostalba wrote: Sorry, I don't understand this. Why should be an error a right shift of a negative number ?
It's not an error, it's implementation defined. Meaning, if you switch compilers or compiler versions, or even optimization settings, the result is allowed to change.

If you care for the exact outcome (which I would presume), then the code is wrong.

I would think the right shift of an int is sign extended.
There is nothing in the C standard that guarantees that. See section 6.5.7, point 5:

http://www.open-std.org/JTC1/SC22/WG14/ ... /n1256.pdf
Thanks,

so I presume also this one should be changed
Code: Select all
inline Value mg_value(Score s) { return Value((int(s) + 32768) >> 16); } 
Have you a suggestion on how to extract the _signed_ 16 bits of the lower and upper part of a 32 bit field (integer) ?

Yes. Don't store negative numbers like that. Do like the IEEE floating point exponents and use "excess N (N=127 in 32 bit IEEE)". That is, add a number that is guaranteed to produce a positive result. Then subtract it after extracting the positive number. Then you get away from this kind of issue.

mcostalba · Post by **mcostalba** » Mon Dec 28, 2009 4:59 pm

Gerd Isenberg wrote: You may use div instead of shift right:
Code: Select all
inline Value mg_value(Score s) {return Value(int(s & ~0xffff) / 0x10000);} 
inline Value eg_value(Score s) {return Value(int(short(s &  0xffff)));} 

Thanks regarding the divide it works with a little adjustment to take in account the lower 16 bits carry:

Code: Select all

inline Value mg_value(Score s) { return Value(((int(s) + 32768) & ~0xffff) / 0x10000); }

Instead cast to short doesn't because according to the standard, "If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined."

See http://www.velocityreviews.com/forums/t ... d-int.html

So probably I will use the follow one that works and it _seems_ standard compliant

Code: Select all

inline Value eg_value(Score s) { return Value((int)(unsigned(s) & 0x7fffu) - (int)(unsigned(s) & 0x8000u)); }

mcostalba · Post by **mcostalba** » Mon Dec 28, 2009 5:04 pm

Zach Wegner wrote:
zamar wrote:Note that here compiler is free to choose between signed and unsigned type:
Code: Select all
enum Score { ENSURE_32_BIT_SIZE = 1 << 31 };
while here compiler is forced to use signed int:
Code: Select all
enum Score { ENSURE_32_BIT_SIZE_MAX = 1 << 31,  ENSURE_32_BIT_SIZE_MIN = -(1<<31)};
No, for the last one it would have to use something more than 32 bit. In 32 bits, (1<<31) and -(1<<31) are the same number. And in the first example, it would have to use unsigned ints, bit 31 is the sign bit so (1<<31) is negative in signed arithmetic. Your examples would work if you change the 31s to 30s, though.

Ok, what about something like this ?

Code: Select all

enum Score { ENSURE_32_BITS_SIZE_P = (1 << 30),  ENSURE_32_BITS_SIZE_N = -(1<<30)};

Bo Persson · Post by **Bo Persson** » Mon Dec 28, 2009 5:30 pm

mcostalba wrote:
Gian-Carlo Pascutto wrote: There is nothing in the C standard that guarantees that. See section 6.5.7, point 5:

http://www.open-std.org/JTC1/SC22/WG14/ ... /n1256.pdf
Ok, so what about this one ?
Code: Select all
inline Value eg_value(Score s) { return Value((int)(unsigned(s) & 0x7fffu) - (int)(unsigned(s) & 0x8000u)); }
I have verified it works with all the compilers and does not rely on shifts or cast to shorts. I have found it googling around and it seems safe to me.

And how much do we now save, compared to actually storing two separate values of type short?

Gerd Isenberg · Post by **Gerd Isenberg** » Mon Dec 28, 2009 5:35 pm

mcostalba wrote:
Gerd Isenberg wrote: You may use div instead of shift right:
Code: Select all
inline Value mg_value(Score s) {return Value(int(s & ~0xffff) / 0x10000);} 
inline Value eg_value(Score s) {return Value(int(short(s &  0xffff)));} 
Thanks regarding the divide it works with a little adjustment to take in account the lower 16 bits carry:
Code: Select all
inline Value mg_value(Score s) { return Value(((int(s) + 32768) & ~0xffff) / 0x10000); }
Instead cast to short doesn't because according to the standard, "If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined."

See http://www.velocityreviews.com/forums/t ... d-int.html

So probably I will use the follow one that works and it _seems_ standard compliant
Code: Select all
inline Value eg_value(Score s) { return Value((int)(unsigned(s) & 0x7fffu) - (int)(unsigned(s) & 0x8000u)); }

I forgot you didn't have SWAR of signed shorts inside an signed int, but upper dependent on lower - therefor Bob's advice for a bias to positive integers seems very reasonable. The standard compliant for the lower part is very elegant and simple and was new to me.

mcostalba · Post by **mcostalba** » Mon Dec 28, 2009 5:39 pm

Bo Persson wrote: And how much do we now save, compared to actually storing two separate values of type short?

It is too late to roll back, there are hundreds of lines of code to change scattered in many files if we want to come back to the "separated variables" approach.

As an afterthought I agree that this trick of using a single integer for two scores absolutely does not deserve the effort to make it work !

Anyhow it is already in, so let's keep it. I hope that other authors that will want to use the same approach have the road already well marked, because I think we have done the grunt work to fix all the subtle issues of this trick.

Lesson learned: avoid tricks as much as possible

On the good side I have to say that using only a single Score allows to simplify the code in many places in evaluation, and this is good because evaluation is already difficult as is and a bit of simplification there is a good thing IMHO.

Bo Persson · Post by **Bo Persson** » Mon Dec 28, 2009 5:40 pm

Gian-Carlo Pascutto wrote:
mcostalba wrote: Sorry, I don't understand this. Why should be an error a right shift of a negative number ?
It's not an error, it's implementation defined. Meaning, if you switch compilers or compiler versions, or even optimization settings, the result is allowed to change.

If you care for the exact outcome (which I would presume), then the code is wrong.

I would think the right shift of an int is sign extended.
There is nothing in the C standard that guarantees that. See section 6.5.7, point 5:

http://www.open-std.org/JTC1/SC22/WG14/ ... /n1256.pdf

On the other hand, gcc does define the shift as expected.

http://gcc.gnu.org/onlinedocs/gcc-4.4.2 ... ementation

I would expect all compilers for x86 hardware to do exactly that. The exception is there for 1's complement or signed magnitude hardware, where shifting bits is really interesting.

Sven · Post by **Sven** » Mon Dec 28, 2009 5:41 pm

Bo Persson wrote:
mcostalba wrote:
Gian-Carlo Pascutto wrote: There is nothing in the C standard that guarantees that. See section 6.5.7, point 5:

http://www.open-std.org/JTC1/SC22/WG14/ ... /n1256.pdf
Ok, so what about this one ?
Code: Select all
inline Value eg_value(Score s) { return Value((int)(unsigned(s) & 0x7fffu) - (int)(unsigned(s) & 0x8000u)); }
I have verified it works with all the compilers and does not rely on shifts or cast to shorts. I have found it googling around and it seems safe to me.
And how much do we now save, compared to actually storing two separate values of type short?

Exactly what I wanted to say, right in this minute. I think the whole issue has gone too far into bit twiddling "at its best" while not being necessary at all.

I just renew my "class Score" proposal. This one compiles under GCC and includes everything you would really need. There is no bit operation involved, just two 16-bit values are kept and operated upon simultaneously. The user code needs a little bit of editing but I think it should be worth the small effort:

Code: Select all

class Score {
public:
    // constructors
    Score() : mg(0), eg(0) {}
    Score(int mg_val, int eg_val) : mg(mg_val), eg(eg_val) {}
    Score(int eg_val) : mg(0), eg(eg_val) {} // maybe subject of discussion

    // accessors
    int mg_value() const { return int(mg); }
    int eg_value() const { return int(eg); }

    // unary minus operator
    Score operator-() { return Score(-int(mg), -int(eg)); }

    // binary member operators
    Score operator+(Score s) { return Score(mg + s.mg, eg + s.eg); }
    Score operator-(Score s) { return Score(mg - s.mg, eg - s.eg); }
    Score operator+=(Score s) { mg += s.mg; eg += s.eg; return *this; }
    Score operator-=(Score s) { mg -= s.mg; eg -= s.eg; return *this; }

    // binary friend operators
    friend bool operator==(Score const & s1, Score const & s2)
        { return s1.mg == s2.mg && s1.eg == s2.eg; }
    friend bool operator!=(Score const & s1, Score const & s2) { return !(s1 == s2); }
    friend Score operator*(int i, Score s) { return Score(i * s.mg, i * s.eg); }
    friend Score operator*(Score s, int i) { return i * s; }

    // Division must be handled separately for each term
    friend Score operator/(Score s, int i) { return Score(s.mg / i, s.eg / i); }

    // Only declared but not defined. We don't want to multiply two scores due to
    // a very high risk of overflow. So user should explicitly convert to integer.
    friend Score operator*(Score s1, Score s2);

private:
    int16_t mg;
    int16_t eg;
};

Maybe someone comes up with something even better?

Sven

gcc miscompilation in Stockfish 1.6

Re: gcc miscompilation in Stockfish 1.6

Re: gcc miscompilation in Stockfish 1.6

Re: gcc miscompilation in Stockfish 1.6

Re: gcc miscompilation in Stockfish 1.6

Re: gcc miscompilation in Stockfish 1.6

Re: gcc miscompilation in Stockfish 1.6

Re: gcc miscompilation in Stockfish 1.6

Re: gcc miscompilation in Stockfish 1.6

Re: gcc miscompilation in Stockfish 1.6

Re: gcc miscompilation in Stockfish 1.6