Mysterious segfault

sje · Post by **sje** » Mon May 07, 2007 9:18 pm

Here's some code from near the very beginning of Symbolic's initialization:

// Assertions

  Assert&#40;sizeof&#40;CTByte&#41;  == 1&#41;;
  Assert&#40;sizeof&#40;CTSByte&#41; == 1&#41;;

  Assert&#40;sizeof&#40;CTWord&#41;  == 2&#41;;
  Assert&#40;sizeof&#40;CTSWord&#41; == 2&#41;;

  Assert&#40;sizeof&#40;CTDwrd&#41;  == 4&#41;;
  Assert&#40;sizeof&#40;CTSDwrd&#41; == 4&#41;;

  Assert&#40;sizeof&#40;CTQwrd&#41;  == 8&#41;;
  Assert&#40;sizeof&#40;CTSQwrd&#41; == 8&#41;;

It's been helpful when porting across different compilers and hosts.

Tord Romstad · Post by **Tord Romstad** » Mon May 07, 2007 9:34 pm

Gerd Isenberg wrote:
Tord Romstad wrote:
hgm wrote:Well, then I would say this definitely is a compiler error.
In the end, it turned out not to be a compiler error, but yet another stupid programmer error. It turned out that I had the following type declaration:
Code: Select all
typedef unsigned int uint32;
Which is of course incorrect when using GCC in 64-bit Linux!

I really should learn to use GNU Autoconf and Automake in order to handle this kind of issues, instead of bothering this group with all my stupid questions.

Tord
I think it is of general interest to post about such bugs here!
And I still don't get it. What was wrong with the typedef?
Isn't sizeof(int) == 4 with your compiler?

Yes, it is. I made yet another stupid mistake and copied the above code after I had fixed the bug. I had "typedef unsigned long uint32", and a "long", as you certainly know, is a 64-bit int on some systems (not on mine, though). I hope it makes sense now.

I apologize for the confusion.

Tord

Guetti · Post by **Guetti** » Mon May 07, 2007 9:39 pm

It seems to work now on my gentoo linux x86-64 system after uncommenting #define USE_32BIT_ATTACKS and USE_FOLDED_BITSCAN!! Almost 800 KNPS seems not bad!
Test system: AMD Athlon(tm) 64 Processor 3400+

andreas@belugabox ~/g2-epsilon/src $ ./glaurung
uci
id name Glaurung 2-epsilon
id author Tord Romstad
option name Hash type spin default 32 min 4 max 1024
option name Ponder type check default true
option name OwnBook type check default true
option name UCI_Chess960 type check default false
option name UCI_EngineAbout type string default Glaurung by Tord Romstad, see http://www.glaurungchess.com
uciok
go depth 15
info depth 2
info depth 2 score cp 11 time 1 nodes 45 nps 45000 pv b1c3 b8c6
info depth 3
info depth 3 score cp 54 time 1 nodes 197 nps 197000 pv b1c3 b8c6 g1f3
info depth 4
info depth 4 score cp 11 time 2 nodes 353 nps 176500 pv b1c3 b8c6 g1f3 g8f6
info depth 5
info depth 5 score cp 15 time 3 nodes 771 nps 257000 pv b1c3 b8c6 g1f3 g8f6 e2e3
info depth 6
info depth 6 score cp 11 time 4 nodes 1583 nps 395750 pv b1c3 b8c6 g1f3 g8f6 e2e3 e7e6
info depth 7
info depth 7 score cp 21 time 9 nodes 3557 nps 395222 pv b1c3 b8c6 g1f3 g8f6 e2e3 e7e6 f1d3
info depth 8
info depth 8 score cp 11 time 14 nodes 6378 nps 455571 pv b1c3 b8c6 g1f3 g8f6 e2e3 e7e6 f1d3 f8c5
info depth 9
info depth 9 score cp 35 time 40 nodes 17227 nps 430675 pv b1c3 b8c6 g1f3 g8f6 e2e3 e7e6 f1d3 c6b4 e1g1 b4d3 c2d3
info depth 10
info depth 10 score cp 11 time 122 nodes 37686 nps 308901 pv b1c3 b8c6 g1f3 g8f6 e2e3 e7e6 f1d3 f8c5 e1g1 e8g8
info depth 11
info depth 11 score cp 21 time 244 nodes 80707 nps 330766 pv b1c3 b8c6 g1f3 g8f6 e2e3 e7e6 f1d3 f8d6 e1g1 f6g4 g2g3
info depth 12
info depth 12 score cp 23 time 483 nodes 180209 nps 373103 pv b1c3 b8c6 g1f3 g8f6 e2e3 e7e6 f1d3 c6b4 e1g1 b4d3 c2d3 f8d6 f3d4
info depth 13
info depth 13 score cp 11 time 1042 nodes 419731 nps 402812 pv b1c3 b8c6 g1f3 g8f6 e2e3 e7e6 f1d3 c6b4 e1g1 b4d3 c2d3 f8d6 c3b5 e8g8 b5d6 c7d6
info currmove d2d4 currmovenumber 2
info nodes 420000 nps 402684 time 1042
info currmove e2e4 currmovenumber 3
info currmove d2d3 currmovenumber 4
info currmove g1h3 currmovenumber 5
info currmove e2e3 currmovenumber 6
info currmove a2a3 currmovenumber 7
info currmove g2g3 currmovenumber 8
info currmove b1a3 currmovenumber 9
info currmove c2c4 currmovenumber 10
info currmove h2h3 currmovenumber 11
info currmove b2b4 currmovenumber 12
info currmove g2g4 currmovenumber 13
info currmove b2b3 currmovenumber 14
info currmove h2h4 currmovenumber 15
info currmove a2a4 currmovenumber 16
info currmove c2c3 currmovenumber 17
info currmove g1f3 currmovenumber 18
info currmove f2f4 currmovenumber 19
info currmove f2f3 currmovenumber 20
info depth 14
info currmove b1c3 currmovenumber 1
info depth 14 score cp 11 time 1666 nodes 932312 nps 559611 pv b1c3 b8c6 g1f3 g8f6 e2e3 e7e6 f1e2 f8e7 d2d4 d7d5 e1g1 e8g8 c1d2 c8d7
info currmove d2d4 currmovenumber 2
info currmove e2e4 currmovenumber 3
info nodes 1290000 nps 628654 time 2052
info depth 14 score cp 23 time 2774 nodes 1995791 nps 719463 pv e2e4 b8c6 g1f3 g8f6 b1c3 e7e6 f1b5 f8b4 d2d3 e8g8 e1g1 d7d5 b5c6 b7c6 f3e5
info currmove d2d3 currmovenumber 4
info currmove g1h3 currmovenumber 5
info currmove e2e3 currmovenumber 6
info currmove b1a3 currmovenumber 7
info currmove a2a3 currmovenumber 8
info currmove c2c4 currmovenumber 9
info currmove g2g3 currmovenumber 10
info currmove h2h3 currmovenumber 11
info currmove g2g4 currmovenumber 12
info currmove b2b4 currmovenumber 13
info currmove h2h4 currmovenumber 14
info currmove b2b3 currmovenumber 15
info currmove a2a4 currmovenumber 16
info currmove c2c3 currmovenumber 17
info currmove g1f3 currmovenumber 18
info currmove f2f4 currmovenumber 19
info currmove f2f3 currmovenumber 20
info depth 15
info currmove e2e4 currmovenumber 1
info nodes 2220000 nps 724306 time 3064
info nodes 3210000 nps 786186 time 4083
info depth 15 score cp 25 time 4300 nodes 3423806 nps 796233 pv e2e4 b8c6 g1f3 g8f6 b1c3 e7e6 f1b5 f8b4 d2d3 e8g8 e1g1 d7d5 b5c6 b7c6 f3e5 c8b7
info currmove b1c3 currmovenumber 2
info currmove d2d4 currmovenumber 3
info currmove g1h3 currmovenumber 4
info currmove e2e3 currmovenumber 5
info currmove b2b3 currmovenumber 6
info currmove b1a3 currmovenumber 7
info currmove a2a4 currmovenumber 8
info currmove d2d3 currmovenumber 9
info currmove h2h3 currmovenumber 10
info currmove a2a3 currmovenumber 11
info currmove g2g4 currmovenumber 12
info currmove b2b4 currmovenumber 13
info currmove c2c4 currmovenumber 14
info currmove g2g3 currmovenumber 15
info currmove h2h4 currmovenumber 16
info currmove c2c3 currmovenumber 17
info currmove f2f4 currmovenumber 18
info currmove g1f3 currmovenumber 19
info currmove f2f3 currmovenumber 20
bestmove e2e4 ponder b8c6

Gerd Isenberg · Post by **Gerd Isenberg** » Mon May 07, 2007 10:18 pm

Tord Romstad wrote: Yes, it is. I made yet another stupid mistake and copied the above code after I had fixed the bug. I had "typedef unsigned long uint32", and a "long", as you certainly know, is a 64-bit int on some systems (not on mine, though). I hope it makes sense now.

I apologize for the confusion.

Tord

He he he!
At least Bob got confused as well.

LoopList · Post by **LoopList** » Mon May 07, 2007 10:20 pm

Hi Tord!

The following code seems to produce a bug:

Code: Select all

void UCIInputParser&#58;&#58;skip_whitespace&#40;) &#123;
    while&#40;isspace&#40;this->inputLine&#91;this->currentIndex&#93;))
      this->currentIndex++;
  &#125;

this->inputLine is a Bad Pointer. You should use a Bounds Checker Tool in order to fix all bugs!

Gerd Isenberg · Post by **Gerd Isenberg** » Mon May 07, 2007 10:57 pm

bob wrote:actually if you multiply two 32 bit ints, you get a 64 bit result. Always has been this way. But if you don't follow that with a divide, which also has a 64 bit dividend, then you lose the extra bits. This has been a hardware feature to prevent problems with things like

a = b * c / d;

if it was all done in 32 bits, you would need to know the actual values for a, b and c before producing the assembly language to execute the operations. the 64 bit multiply result (on 32 bit machines) nicely side-steps this issue. If it were not for this, you couldn't multiply two values where each might be more than 16 bits wide...

This is what vc2005 makes out of Tord's routine with 32*32-bit mul, shift right 26. It looses precision (as intended). The intermediate product is still 32-bit. I think this is specified by C and is not caused by compiler's ability to understand the semantic of the routine

After
imul eax, 783a9b23H
the higher 32-bit of rax are zero!

Why the compiler uses the 64-bit
shr rax, 26
instead of the shorter
shr eax, 26
is another question...

In assembly only one operand mul/imul have double sized results on x86 or x64 with implicit destination edx:eax or rdx:rax.

Code: Select all

static const int BitTable&#91;64&#93; = &#123;
  63, 30, 3, 32, 25, 41, 22, 33, 15, 50, 42, 13, 11, 53, 19, 34, 61, 29, 2,
  51, 21, 43, 45, 10, 18, 47, 1, 54, 9, 57, 0, 35, 62, 31, 40, 4, 49, 5, 52,
  26, 60, 6, 23, 44, 46, 27, 56, 16, 7, 39, 48, 24, 59, 14, 12, 55, 38, 28,
  58, 20, 37, 17, 36, 8
&#125;;

Square pop_1st_bit&#40;Bitboard *b&#41; &#123;
  Bitboard bb = *b ^ (*b - 1&#41;;
  uint32 fold = int&#40;bb&#41; ^ int&#40;bb >> 32&#41;;
  *b &= (*b - 1&#41;;
  return Square&#40;BitTable&#91;&#40;fold * 0x783a9b23&#41; >> 26&#93;);
&#125;


b$ = 8
?pop_1st_bit@@YAHPEA_K@Z PROC
  00000	48 8b 01	 mov	 rax, QWORD PTR &#91;rcx&#93;
  00003	48 8d 50 ff	 lea	 rdx, QWORD PTR &#91;rax-1&#93;
  00007	4c 8b c2	 mov	 r8, rdx
  0000a	48 23 d0	 and	 rdx, rax
  0000d	4c 33 c0	 xor	 r8, rax
  00010	48 89 11	 mov	 QWORD PTR &#91;rcx&#93;, rdx
  00013	48 8d 0d 00 00
	00 00		 lea	 rcx, OFFSET FLAT&#58;BitTable
  0001a	49 8b c0	 mov	 rax, r8
  0001d	48 c1 e8 20	 shr	 rax, 32
  00021	41 33 c0	 xor	 eax, r8d
  00024	69 c0 23 9b 3a
	78		 imul	 eax, 783a9b23H
  0002a	48 c1 e8 1a	 shr	 rax, 26
  0002e	8b 04 81	 mov	 eax, DWORD PTR &#91;rcx+rax*4&#93;
  00031	c3		 ret	 0
?pop_1st_bit@@YAHPEA_K@Z ENDP

Gerd Isenberg · Post by **Gerd Isenberg** » Mon May 07, 2007 11:14 pm

even if i use explicite division, the assembly is the same

Code: Select all

  return Square&#40;BitTable&#91;fold * 0x783a9b23 / 0x04000000&#93;);

hgm · Post by **hgm** » Tue May 08, 2007 12:22 pm

Indeed, I don't think Bob is right. He is right about that hardware allows you to do obtain double-length products and use double-length dividends, but the C standard forbids the compiler to make use of this hardware feature. int32*int32 should give you int32 with the high-order 32-bits clipped. If you loses high-order bits because of that, it is called integer overflow.

There is no C operator that directly uses the hardware feature of double-precision integer multiply/divide, just like there is no operator to use rotate or rotate-through-carry instructions. (The latter might be supplied as intrinsics.)

For multiplication you could use the expression

Code: Select all

int32 a, b;

(&#40;int 64&#41; a&#41; * (&#40;int64&#41; b&#41;;

to force a 64-bit result, in the hope that the optimizer recognizes that it is not really needed to extend the precision of the factors before multiplication, but that it could use the 32x32->64 instruction. For division, you could inform the compiler tat you are not interested in the high-order 32-bit of the quotient by casting it to 32-bit:

Code: Select all

int64 c; int32 d;

&#40;int32&#41; &#40;c /d&#41;;

So to get the effect that Bob wants one should write

Code: Select all

a = &#40;int32&#41; ( (&#40;int64&#41; a&#41; * (&#40;int64&#41; b&#41; / d&#41;;

and hope for a smart optimizer. In 64-bit mode you could double all the lengths.

bob · Post by **bob** » Tue May 08, 2007 7:50 pm

hgm wrote:Indeed, I don't think Bob is right. He is right about that hardware allows you to do obtain double-length products and use double-length dividends, but the C standard forbids the compiler to make use of this hardware feature. int32*int32 should give you int32 with the high-order 32-bits clipped. If you loses high-order bits because of that, it is called integer overflow.

There is no C operator that directly uses the hardware feature of double-precision integer multiply/divide, just like there is no operator to use rotate or rotate-through-carry instructions. (The latter might be supplied as intrinsics.)

For multiplication you could use the expression
Code: Select all
int32 a, b;

(&#40;int 64&#41; a&#41; * (&#40;int64&#41; b&#41;;
to force a 64-bit result, in the hope that the optimizer recognizes that it is not really needed to extend the precision of the factors before multiplication, but that it could use the 32x32->64 instruction. For division, you could inform the compiler tat you are not interested in the high-order 32-bit of the quotient by casting it to 32-bit:
Code: Select all
int64 c; int32 d;

&#40;int32&#41; &#40;c /d&#41;;
So to get the effect that Bob wants one should write
Code: Select all
a = &#40;int32&#41; ( (&#40;int64&#41; a&#41; * (&#40;int64&#41; b&#41; / d&#41;;
and hope for a smart optimizer. In 64-bit mode you could double all the lengths.

Actually, what I was quoting came from an old FORTRAN test. Where the compiler was smart enough to do the 2*32 bit multiply, and follow it up with the 64 bit divide, and then check for overflow after the divide. code looked like this (assuming a = b * c / d)

Code: Select all

         mov   eax, b
         imul   c
         idiv    d
         jo       overflow
         mov   a, eax

That avoids the ugliness in C where you need to know the size of the values first, before you decide to do the divide before the multiply or after...

Mysterious segfault

Helpful assertions

Re: Mysterious segfault

Re: Mysterious segfault

Re: Mysterious segfault

Re: Mysterious segfault

Re: Mysterious segfault

Re: Mysterious segfault

Re: Mysterious segfault

Re: Mysterious segfault