Mysterious segfault

Tord Romstad
Posts: 1808
Joined: Wed Mar 08, 2006 9:19 pm
Location: Oslo, Norway

Mysterious segfault

Post by Tord Romstad »

Glaurung 2-epsilon, the development version of Glaurung 2 which I released yesterday, apparently doesn't work in 64-bit Linux, although it does work correctly in 32-bit Linux. In 64-bit Linux, it segfaults immediately at startup. Bernard Bauer has run my program in gdb and found that the segfault occurs in the following function:

Code:

static const int BitTable[64] = {
  63, 30, 3, 32, 25, 41, 22, 33, 15, 50, 42, 13, 11, 53, 19, 34, 61, 29, 2,
  51, 21, 43, 45, 10, 18, 47, 1, 54, 9, 57, 0, 35, 62, 31, 40, 4, 49, 5, 52,
  26, 60, 6, 23, 44, 46, 27, 56, 16, 7, 39, 48, 24, 59, 14, 12, 55, 38, 28,
  58, 20, 37, 17, 36, 8
};

Square pop_1st_bit(Bitboard *b) {
  Bitboard bb = *b ^ (*b - 1);
  uint32 fold = int(bb) ^ int(bb >> 32);
  *b &= (*b - 1);
  return Square(BitTable[(fold * 0x783a9b23) >> 26]);
}

The exact location of the crash is the line with *b &= (*b - 1).

What is wrong here, and why does this error occur only in 64-bit mode?
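For testing the routine in isolation, here is a minimal self-contained sketch of the same folding bitscan. It assumes `Bitboard` is `std::uint64_t` and `Square` is a plain `int` (the post does not show Glaurung's definitions). It also takes `fold` from `<cstdint>`'s `std::uint32_t`, so the multiplication is done in exactly 32 bits whatever a hand-written `uint32` typedef or `unsigned long` happens to be on the platform. The casts go through `std::uint32_t` rather than `int`. This gives the same 32-bit pattern without relying on a signed conversion.

```cpp
#include <cassert>
#include <cstdint>

// Assumed types for this sketch; the post does not show Glaurung's definitions.
typedef std::uint64_t Bitboard;
typedef int Square;

static const int BitTable[64] = {
  63, 30, 3, 32, 25, 41, 22, 33, 15, 50, 42, 13, 11, 53, 19, 34, 61, 29, 2,
  51, 21, 43, 45, 10, 18, 47, 1, 54, 9, 57, 0, 35, 62, 31, 40, 4, 49, 5, 52,
  26, 60, 6, 23, 44, 46, 27, 56, 16, 7, 39, 48, 24, 59, 14, 12, 55, 38, 28,
  58, 20, 37, 17, 36, 8
};

// Returns the index of the lowest set bit of *b and clears that bit.
Square pop_1st_bit(Bitboard *b) {
  assert(*b != 0);  // there must be a bit to pop
  // Mask with every bit up to and including the lowest set bit of *b.
  Bitboard bb = *b ^ (*b - 1);
  // Fold the 64-bit mask into 32 bits. Because fold is exactly 32 bits wide,
  // the product below wraps modulo 2^32 and ">> 26" leaves six bits (0..63).
  std::uint32_t fold = std::uint32_t(bb) ^ std::uint32_t(bb >> 32);
  *b &= (*b - 1);  // clear the lowest set bit
  return Square(BitTable[(fold * 0x783a9b23u) >> 26]);
}
```

On a new platform, a quick way to validate the table is to loop over all 64 single-bit boards and check that each one returns its own index.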

Tord
Alessandro Scotti

Re: Mysterious segfault

Post by Alessandro Scotti »

One thing that might be worth trying is:
int index = (fold * 0x...)
and then checking the bounds... maybe the 32-bit "trick" is fooling the compiler.
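Alessandro's suggestion can be sketched as a debug helper that computes the `BitTable` index into a variable and checks it before the lookup. The names `table_index` and `index_in_bounds` are hypothetical. The multiplier is the one from Tord's function. The template parameter `Fold` lets you compare a 32-bit type for `fold` against a 64-bit one.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical debug helper: computes the BitTable index from
// bb = *b ^ (*b - 1) the same way pop_1st_bit does, but with the type of
// `fold` as a template parameter so 32-bit and 64-bit variants can be compared.
template <typename Fold>
std::uint64_t table_index(std::uint64_t bb) {
  // Same fold as the original, including the detour through int. (Converting
  // an out-of-range value to int is implementation-defined before C++20;
  // GCC reduces it modulo 2^32.)
  Fold fold = Fold(int(bb) ^ int(bb >> 32));
  return std::uint64_t((fold * 0x783a9b23) >> 26);
}

// The bounds check: a valid BitTable subscript is 0..63.
template <typename Fold>
bool index_in_bounds(std::uint64_t bb) {
  return table_index<Fold>(bb) < 64;
}
```

With a 32-bit `Fold`, every index stays below 64. With a 64-bit `Fold`, even a board whose lowest set bit is bit 1 already produces an index far past the end of the table.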
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Mysterious segfault

Post by Gerd Isenberg »

Alessandro Scotti wrote:
One thing that might be worth trying is:
int index = (fold * 0x...)
and then checking the bounds... maybe the 32-bit "trick" is fooling the compiler.
I guess you are right. The 32*32-bit product might be 64-bit, and then shr 26 leaves more than six bits alive, so an additional & 63 might be necessary. Alternatively, use a conditional compile of a pure de Bruijn approach for 64-bit with shr 58!
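Gerd's first remedy can be checked directly. Bits 0..31 of a product depend only on bits 0..31 of its operands. So if you shift the 64-bit product right by 26 and mask with `& 63`, you keep exactly the six bits the 32-bit computation would have produced, even when `fold` was sign-extended to 64 bits. In this minimal sketch, `index32` and `index64_masked` are hypothetical names, and `std::uint64_t` stands in for an accidentally 64-bit `fold`.

```cpp
#include <cassert>
#include <cstdint>

// Intended computation: the product wraps modulo 2^32, so ">> 26" yields 0..63.
unsigned index32(std::uint32_t fold) {
  return (fold * 0x783a9b23u) >> 26;
}

// Same computation with a 64-bit fold (for example a sign-extended one).
// The product keeps its upper 32 bits, so ">> 26" alone can exceed 63.
// "& 63" keeps only bits 26..31 of the product, which are the same bits
// that the 32-bit product has.
unsigned index64_masked(std::uint64_t fold) {
  return unsigned(((fold * 0x783a9b23u) >> 26) & 63);
}
```

The mask adds an AND to every call. The alternative Gerd mentions, a pure 64-bit de Bruijn multiplication with a shift of 58, avoids the fold altogether.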
hgm
Posts: 27787
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Mysterious segfault

Post by hgm »

How should I understand this? Is 'int' by default 64-bit in 64-bit mode, so that the literal constant is taken as a 64-bit signed int, and both are converted to u64 before the product is calculated?

In that case you might have to write the constant differently, or cast it to u32 before multiplying.
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Mysterious segfault

Post by Gerd Isenberg »

hgm wrote:
How should I understand this? Is 'int' by default 64-bit in 64-bit mode, so that the literal constant is taken as a 64-bit signed int, and both are converted to u64 before the product is calculated?

In that case you might have to write the constant differently, or cast it to u32 before multiplying.
An int is still 32-bit (so is long under Win64, which requires long long for 64 bits). Maybe constant literals in conjunction with multiplication are implicitly a long expression here. It's only a vague idea; maybe Tord can confirm the reason for the crash later.
hgm
Posts: 27787
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Mysterious segfault

Post by hgm »

Well, then I would say this definitely is a compiler error. The standard requires converting the operands to a common type and performing the operation in that type, and the common type of u32 and int32 should be u32. I don't think multiplication is defined as having a double-length result (although on most hardware such an instruction is of course available). If it were, it would have to be defined that way in 32-bit mode too...
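hgm's reading of the conversion rules can be checked at compile time with `decltype`. The type of a product, and hence its width, follows from the usual arithmetic conversions applied to the operand types. It does not follow from what the hardware multiply instruction produces. The sketch below is minimal, assumes C++11 or later, and assumes a 32-bit `int`, as on both 32- and 64-bit Linux.

```cpp
#include <cassert>
#include <type_traits>

// With a 32-bit int, 0x783a9b23 fits in int, so the literal has type int.
static_assert(std::is_same<decltype(0x783a9b23), int>::value,
              "literal is int");

// unsigned int * int: the int operand is converted to unsigned int, so the
// product is unsigned int and wraps modulo 2^32.
static_assert(std::is_same<decltype(0u * 0x783a9b23), unsigned int>::value,
              "32-bit unsigned product");

// unsigned long * int: the product is unsigned long. unsigned long is 64 bits
// under LP64 (64-bit Linux) but 32 bits under LLP64 (64-bit Windows).
static_assert(std::is_same<decltype(0ul * 0x783a9b23), unsigned long>::value,
              "product takes the wider operand's type");

// unsigned long long * int: a 64-bit product under both data models.
static_assert(std::is_same<decltype(0ull * 0x783a9b23), unsigned long long>::value,
              "64-bit product");
```

So a 32-bit unsigned `fold` times the `int` literal is computed in 32 bits on both 32-bit and 64-bit Linux. The product only becomes 64-bit if `fold` itself has a 64-bit type, such as `unsigned long` under LP64.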
Tord Romstad
Posts: 1808
Joined: Wed Mar 08, 2006 9:19 pm
Location: Oslo, Norway

Re: Mysterious segfault

Post by Tord Romstad »

hgm wrote:
Well, then I would say this definitely is a compiler error.
Perhaps it is; I don't know the standard well enough to be sure. At any rate, this compiler error seems common enough that working around it is worth the effort.

I decided to solve the problem by introducing an optional non-folding bitscan for 64-bit CPUs. Thanks to everyone for your help!
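One possible non-folding bitscan for 64-bit CPUs is the pure de Bruijn approach Gerd mentioned, with a shift of 58. The sketch below is only an illustration and not necessarily what Glaurung uses. The constant is a known 64-bit de Bruijn sequence. The names `DeBruijn64`, `DeBruijnTable`, `init_debruijn_table` and `pop_1st_bit_64` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint64_t Bitboard;

// A 64-bit de Bruijn constant: every 6-bit window read by ">> 58" after a
// left shift by 0..63 is distinct, so the window identifies the shift.
const std::uint64_t DeBruijn64 = 0x03f79d71b4cb0a89ULL;

// Maps the top six bits of (isolated bit * DeBruijn64) back to the bit index.
static int DeBruijnTable[64];

// Builds the table from the constant instead of hard-coding 64 entries.
void init_debruijn_table() {
  for (int i = 0; i < 64; ++i)
    DeBruijnTable[(DeBruijn64 << i) >> 58] = i;
}

// Returns the index of the lowest set bit of *b and clears that bit.
// Requires init_debruijn_table() to have been called.
int pop_1st_bit_64(Bitboard *b) {
  assert(*b != 0);                    // there must be a bit to pop
  Bitboard lsb = *b & (0 - *b);       // isolate the lowest set bit
  *b ^= lsb;                          // clear it
  return DeBruijnTable[(lsb * DeBruijn64) >> 58];
}
```

Because the table is derived from the multiplier at startup, the two cannot get out of sync.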

Tord
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Mysterious segfault

Post by bob »

Actually, if you multiply two 32-bit ints, the hardware gives you a 64-bit result; it has always been this way. But if you don't follow that with a divide, which also takes a 64-bit dividend, then you lose the extra bits. This hardware feature exists to prevent problems with things like

a = b * c / d;

If it were all done in 32 bits, you would need to know the actual values of b, c and d before producing the assembly language to execute the operations. The 64-bit multiply result (on 32-bit machines) nicely side-steps this issue. If it were not for this, you couldn't multiply two values where each might be more than 16 bits wide...
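In C and C++, the double-width product bob describes is only visible if the source asks for it. Two 32-bit unsigned operands give a 32-bit result that wraps, while widening one operand first keeps all 64 bits. This minimal sketch shows the difference, including the `a = b * c / d` case. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Product in the operands' own 32-bit type: the upper 32 bits are discarded.
std::uint32_t mul32(std::uint32_t b, std::uint32_t c) {
  return b * c;
}

// Widen one operand first so the full 64-bit product is kept.
std::uint64_t mul_wide(std::uint32_t b, std::uint32_t c) {
  return std::uint64_t(b) * c;
}

// a = b * c / d without losing the upper bits of the intermediate product.
// Precondition: the quotient fits in 32 bits.
std::uint32_t mul_div(std::uint32_t b, std::uint32_t c, std::uint32_t d) {
  assert(d != 0);
  return std::uint32_t(mul_wide(b, c) / d);
}
```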
Tord Romstad
Posts: 1808
Joined: Wed Mar 08, 2006 9:19 pm
Location: Oslo, Norway

Re: Mysterious segfault

Post by Tord Romstad »

hgm wrote:
Well, then I would say this definitely is a compiler error.
In the end, it was not a compiler error but yet another stupid programmer error. I had the following type declaration:

Code:

typedef unsigned int uint32;
Which is of course incorrect when using GCC in 64-bit Linux!

I really should learn to use GNU Autoconf and Automake to handle this kind of issue, instead of bothering this group with all my stupid questions. :(
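Whatever the exact declaration was, you can catch a mis-sized typedef at build time without Autoconf. Take the fixed-width types from `<cstdint>` and guard them with `static_assert`. This is a minimal sketch and assumes a C++11 compiler. The `uint32`/`uint64` names follow the post; the rest is illustrative.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Fixed-width types from the standard header, instead of guessing which
// built-in type has which size. For example, unsigned long is 64 bits under
// LP64 (64-bit Linux/GCC) but 32 bits under LLP64 (64-bit Windows), so a
// typedef based on it silently changes width between platforms.
typedef std::uint32_t uint32;
typedef std::uint64_t uint64;

// Fail the build, rather than segfault at runtime, if a width is wrong.
static_assert(sizeof(uint32) * CHAR_BIT == 32, "uint32 must be exactly 32 bits");
static_assert(sizeof(uint64) * CHAR_BIT == 64, "uint64 must be exactly 64 bits");
```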

Tord
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Mysterious segfault

Post by Gerd Isenberg »

Tord Romstad wrote:
hgm wrote:
Well, then I would say this definitely is a compiler error.
In the end, it was not a compiler error but yet another stupid programmer error. I had the following type declaration:

Code:

typedef unsigned int uint32;
Which is of course incorrect when using GCC in 64-bit Linux!

I really should learn to use GNU Autoconf and Automake to handle this kind of issue, instead of bothering this group with all my stupid questions. :(

Tord
I think it is of general interest to post about such bugs here!
And I still don't get it. What was wrong with the typedef?
Isn't sizeof(int) == 4 with your compiler?

Gerd