Mysterious segfault

Posted: Mon May 07, 2007 11:48 am
by Tord Romstad
Glaurung 2-epsilon, the development version of Glaurung 2 which I released yesterday, apparently doesn't work in 64-bit Linux, although it does work correctly in 32-bit Linux. In 64-bit Linux, it segfaults immediately at startup. Bernard Bauer has run my program in gdb and found that the segfault occurs in the following function:

static const int BitTable[64] = {
  63, 30, 3, 32, 25, 41, 22, 33, 15, 50, 42, 13, 11, 53, 19, 34, 61, 29, 2,
  51, 21, 43, 45, 10, 18, 47, 1, 54, 9, 57, 0, 35, 62, 31, 40, 4, 49, 5, 52,
  26, 60, 6, 23, 44, 46, 27, 56, 16, 7, 39, 48, 24, 59, 14, 12, 55, 38, 28,
  58, 20, 37, 17, 36, 8
};

Square pop_1st_bit(Bitboard *b) {
  Bitboard bb = *b ^ (*b - 1);
  uint32 fold = int(bb) ^ int(bb >> 32);
  *b &= (*b - 1);
  return Square(BitTable[(fold * 0x783a9b23) >> 26]);
}
The exact location of the crash is the line with *b &= (*b - 1).

What is wrong here, and why does this error occur only in 64-bit mode?

Tord

Re: Mysterious segfault

Posted: Mon May 07, 2007 12:12 pm
by Alessandro Scotti
One thing that might be worth trying is:
int index = (fold * 0x...)
and then checking the bounds... maybe the 32-bit "trick" is fooling the compiler.

Re: Mysterious segfault

Posted: Mon May 07, 2007 1:01 pm
by Gerd Isenberg
Alessandro Scotti wrote:One thing that might be worth trying is:
int index = (fold * 0x...)
and then checking the bounds... maybe the 32-bit "trick" is fooling the compiler.
I guess you are right. The 32*32-bit product might be 64-bit, and shr 26 then leaves more than six bits alive, so an additional & 63 might be necessary. Alternatively, conditionally compile a pure de Bruijn approach for 64-bit with shr 58!
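
A minimal sketch of the effect Gerd describes, assuming the product really is evaluated in a 64-bit type. The function names and the <cstdint> types are illustrative and do not come from Glaurung.

```cpp
#include <cassert>
#include <cstdint>

// Product evaluated in 32 bits: it wraps modulo 2^32, so shifting
// right by 26 leaves exactly six bits (0..63).
inline unsigned index_with_32bit_fold(std::uint32_t fold) {
    return (fold * UINT32_C(0x783a9b23)) >> 26;
}

// Hypothetical failure mode: the same expression evaluated in 64 bits.
// The upper half of the product survives the shift.
inline std::uint64_t index_with_64bit_fold(std::uint64_t fold) {
    return (fold * UINT64_C(0x783a9b23)) >> 26;
}

// The "& 63" workaround: keep only the six bits the table expects.
inline unsigned masked_index(std::uint64_t fold) {
    return static_cast<unsigned>(((fold * UINT64_C(0x783a9b23)) >> 26) & 63);
}

// Small usage example comparing the three variants.
inline void check_examples() {
    const std::uint32_t fold = 0xFFFFFFFFu;
    assert(index_with_32bit_fold(fold) < 64);   // safe table index
    assert(index_with_64bit_fold(fold) > 63);   // would read past BitTable
    assert(masked_index(fold) == index_with_32bit_fold(fold));
}
```

For fold values below 2^32 the masked 64-bit version selects the same six bits as the 32-bit version, because bits 26 to 31 of the product do not depend on the higher bits.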

Re: Mysterious segfault

Posted: Mon May 07, 2007 1:53 pm
by hgm
How should I understand this? Is 'int' by default 64-bit in 64-bit mode, so that the literal constant is taken as a 64-bit signed int, and both are converted to u64 before the product is calculated?

In that case you might have to write the constant differently, or cast it to u32 before multiplying.

Re: Mysterious segfault

Posted: Mon May 07, 2007 3:16 pm
by Gerd Isenberg
hgm wrote:How should I understand this? Is 'int' by default 64-bit in 64-bit mode, so that the literal constant is taken as a 64-bit signed int, and both are converted to u64 before the product is calculated?

In that case you might have to write the constant differently, or cast it to u32 before multiplying.
int is still 32-bit (as is long under Win64, where a 64-bit integer requires long long). Maybe a constant literal in conjunction with a multiplication is implicitly a long expression here. This is only a vague idea; maybe Tord can confirm the reason for the crash later.

Re: Mysterious segfault

Posted: Mon May 07, 2007 4:29 pm
by hgm
Well, then I would say this definitely is a compiler error. The standard requires converting the operands to their common type and performing the operation in that type, and the common type of u32 and int32 should be u32. I don't think multiplication is defined as having a double-length result (although on most hardware such an instruction is of course available). If it was, it should also be defined as such in 32-bit mode...
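
The conversion rule hgm refers to can be checked at compile time. The following is only a sketch, assuming a C++11 compiler and a platform where int is 32 bits wide and std::uint32_t is unsigned int (as on typical 32-bit and 64-bit Linux). ProductType and table_index are hypothetical names; the constant is the one from pop_1st_bit.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Stand-in for the program's own typedef (assumption: exactly 32 bits).
typedef std::uint32_t uint32;

// Type of "uint32 operand * int literal", as in pop_1st_bit.
typedef decltype(uint32(1) * 0x783a9b23) ProductType;

// Under the assumptions above, the common type is unsigned int,
// so the product is not widened to 64 bits.
static_assert(std::is_same<ProductType, unsigned int>::value,
              "product of uint32 and int literal should be unsigned int");
static_assert(sizeof(ProductType) == 4, "product should be 32 bits wide");

// With a genuinely 32-bit fold, the index always fits the 64-entry table.
inline unsigned table_index(uint32 fold) {
    unsigned index = (fold * 0x783a9b23) >> 26;
    assert(index < 64);  // the bounds check suggested earlier in the thread
    return index;
}
```

If these static assertions hold, an out-of-range index would have to come from somewhere other than the multiplication itself, for example from a typedef that is wider than its name suggests.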

Re: Mysterious segfault

Posted: Mon May 07, 2007 5:16 pm
by Tord Romstad
hgm wrote:Well, then I would say this definitely is a compiler error.
Perhaps it is - I don't know the standard well enough to be sure. At any rate, this compiler error seems to be sufficiently common that working around it is worth the effort.

I decided to solve the problem by introducing an optional non-folding bitscan for 64-bit CPUs. Thanks to everyone for your help!

Tord
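
For readers wondering what a non-folding bitscan might look like: the sketch below is not Tord's actual fix, which is not shown in this thread, but one common 64-bit de Bruijn variant with a shift by 58. It assumes Bitboard is a 64-bit unsigned integer, returns a plain int instead of Square, and builds its lookup table at startup from a widely used de Bruijn constant. All names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint64_t Bitboard;  // assumption: 64-bit unsigned bitboard

// A 64-bit de Bruijn sequence: every 6-bit window is unique.
const Bitboard DeBruijn64 = UINT64_C(0x03f79d71b4cb0a89);

// Lookup table filled at construction, so no hand-typed table is needed.
struct DeBruijnTable {
    int index[64];
    DeBruijnTable() {
        for (int sq = 0; sq < 64; ++sq)
            index[((Bitboard(1) << sq) * DeBruijn64) >> 58] = sq;
    }
};

// Returns the index of the least significant set bit and clears it.
// Precondition: *b != 0.
inline int pop_1st_bit_64(Bitboard *b) {
    static const DeBruijnTable table;
    assert(*b != 0);
    Bitboard lsb = *b & (~*b + 1);  // isolate the lowest set bit
    *b &= (*b - 1);                 // clear it in the caller's bitboard
    return table.index[(lsb * DeBruijn64) >> 58];
}
```

Because the whole computation stays in one 64-bit type, the index is always in the range 0 to 63 regardless of how wide int or long happen to be.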

Re: Mysterious segfault

Posted: Mon May 07, 2007 7:17 pm
by bob
Actually, if you multiply two 32-bit ints, you get a 64-bit result. It has always been this way. But if you don't follow that with a divide, which also takes a 64-bit dividend, then you lose the extra bits. This has been a hardware feature to prevent problems with things like

a = b * c / d;

If it was all done in 32 bits, you would need to know the actual values for a, b and c before producing the assembly language to execute the operations. The 64-bit multiply result (on 32-bit machines) nicely side-steps this issue. If it were not for this, you couldn't multiply two values where each might be more than 16 bits wide...
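
Whatever the multiply instruction produces, at the C++ language level the expression b * c / d is evaluated in the operands' common type, so the extra bits are kept only if one operand is widened explicitly. A small sketch with unsigned 32-bit operands; muldiv_32 and muldiv_wide are hypothetical names, and unsigned types are used so that the wraparound is well defined.

```cpp
#include <cassert>
#include <cstdint>

// Everything in 32 bits: the product wraps modulo 2^32 before the divide.
inline std::uint32_t muldiv_32(std::uint32_t b, std::uint32_t c, std::uint32_t d) {
    assert(d != 0);
    return b * c / d;
}

// Widen one operand first: the full 64-bit product reaches the divide.
// The final narrowing assumes the quotient fits in 32 bits.
inline std::uint32_t muldiv_wide(std::uint32_t b, std::uint32_t c, std::uint32_t d) {
    assert(d != 0);
    return static_cast<std::uint32_t>(static_cast<std::uint64_t>(b) * c / d);
}
```

For example, with b = c = 100000 and d = 1000, the 32-bit version yields 1410065 because the product 10^10 wraps to 1410065408, while the widened version yields the expected 10000000.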

Re: Mysterious segfault

Posted: Mon May 07, 2007 8:32 pm
by Tord Romstad
hgm wrote:Well, then I would say this definitely is a compiler error.
In the end, it turned out not to be a compiler error, but yet another stupid programmer error. It turned out that I had the following type declaration:

typedef unsigned int uint32;
Which is of course incorrect when using GCC in 64-bit Linux!

I really should learn to use GNU Autoconf and Automake to handle this kind of issue, instead of bothering this group with all my stupid questions. :(

Tord

Re: Mysterious segfault

Posted: Mon May 07, 2007 9:14 pm
by Gerd Isenberg
Tord Romstad wrote:
hgm wrote:Well, then I would say this definitely is a compiler error.
In the end, it turned out not to be a compiler error, but yet another stupid programmer error. It turned out that I had the following type declaration:

typedef unsigned int uint32;
Which is of course incorrect when using GCC in 64-bit Linux!

I really should learn to use GNU Autoconf and Automake to handle this kind of issue, instead of bothering this group with all my stupid questions. :(

Tord
I think it is of general interest to post about such bugs here!
And I still don't get it. What was wrong with the typedef?
Isn't sizeof(int) == 4 with your compiler?

Gerd
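
Gerd's question can be settled at compile time rather than by inspection. The following is a hedged sketch, assuming C++11 and <cstdint>, neither of which appears in the thread. uint32 is the typedef name from the post; is_32_bits is a hypothetical helper.

```cpp
#include <climits>
#include <cstdint>

// Let the standard library pick an exactly-32-bit unsigned type.
typedef std::uint32_t uint32;

// Hypothetical helper: true if T occupies exactly 32 bits.
template <typename T>
constexpr bool is_32_bits() {
    return sizeof(T) * CHAR_BIT == 32;
}

// Fails to compile on any platform where the typedef has the wrong width,
// instead of crashing at run time.
static_assert(is_32_bits<uint32>(), "uint32 must be exactly 32 bits wide");
```

On 64-bit Linux with GCC (the LP64 model), unsigned int is normally 32 bits wide while unsigned long is 64 bits wide, so a check like this would pass for the first and fail for the second.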