Kogge Stone, Vector Based

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

smatovic
Posts: 2645
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Kogge Stone, Vector Based

Post by smatovic »

Maybe it is of interest for someone,
the 8 directions for Kogge-Stone are done via an ulong8 vector datatype,
on my gpu it performs as fast as magic bitboards...

Code: Select all


        // generate pseudo legal moves

        while( bbWork )

            bbTemp = 0;
            bbMoves = 0;

            // pop 1st bit
            pos     = pop_1st_bit(bbWork);

            piece   = getPiece(board, pos);


            // all 8 directions in parrallel via vector ulong8
            pro8 = ~bbBlockers;
            gen8 = SetMaskBB[pos];
            r8 = shift8[piece];
            pro8 &= avoidWrap8[piece];

            // do kogge stone
            gen8 |= pro8 & (&#40;gen8 << r8&#41; | &#40;gen8 >> &#40;64-r8&#41;));
            pro8 &=       (&#40;pro8 << r8&#41; | &#40;pro8 >> &#40;64-r8&#41;));
            gen8 |= pro8 & (&#40;gen8 << 2*r8&#41; | &#40;gen8 >> &#40;64-2*r8&#41;));
            pro8 &=       (&#40;pro8 << 2*r8&#41; | &#40;pro8 >> &#40;64-2*r8&#41;));
            gen8 |= pro8 & (&#40;gen8 << 4*r8&#41; | &#40;gen8 >> &#40;64-4*r8&#41;));

            // Shift one for Captures
            gen8 = (&#40;gen8 << r8&#41; | &#40;gen8 >> &#40;64-r8&#41;)) & avoidWrap8&#91;piece&#93;;
            bbTemp = gen8.s0 | gen8.s1 | gen8.s2 | gen8.s3 | gen8.s4 | gen8.s5 | gen8.s6 | gen8.s7;     

            // Verify
            bbTemp &= ( piece == PAWN&#41; ? &#40;PawnAttackTables&#91;som*64+pos&#93; & bbOpposite&#41;  | &#40;PawnAttackTables&#91;&#40;som+2&#41;*64+pos&#93; & ~bbBlockers&#41;  &#58; AttackTables&#91;som*6*64+piece*64+pos&#93; ;

            // Pawn double square
            if ( piece == PAWN && ( ( som == WHITE && &#40;pos>>3&#41; == 1&#41; || &#40;som == BLACK && &#40;pos>>3&#41; == 6 ) ) ) &#123;
                to  = &#40;som == BLACK&#41;? pos-8  &#58; pos+8;
                cpt = &#40;som == BLACK&#41;? pos-16 &#58; pos+16;
                if (   (~bbBlockers & SetMaskBB&#91;to&#93;) && (~bbBlockers & SetMaskBB&#91;cpt&#93;) )
                    bbTemp |= SetMaskBB&#91;cpt&#93;;
            &#125;

            // Captures
            bbMoves  = &#40;bbTemp & bbOpposite&#41;;
            // Non cpatures
            bbMoves |= &#40;qs == 0&#41;? bbTemp & ~bbBlockers &#58; 0 ;


full code:
https://github.com/smatovic/Zeta/tree/z ... koggestone


--
Srdja
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Kogge Stone, Vector Based

Post by Gerd Isenberg »

Wow!

You process a vector of 8 bitboards with individial shifts per SIMD instruction, i.e. {9, 1, -7, -8, -9, -1, 7, 8} for a queen - positive amounts shifting left, negative shifting right, but you still need the generalized Kogge-Stone with rotate, since shifting left by negative amount is not shifting right, but shifing left with amount and 63 (mod 64).

Do you have any links to the concrete SIMD instruction set?

Gerd
smatovic
Posts: 2645
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: Kogge Stone, Vector Based

Post by smatovic »

the idea for the generalized shift is taken from the chessprogramming wiki,
Do you have any links to the concrete SIMD instruction set?
For Nvidia there is the PTX instruction set:

www.nvidia.com/content/CUDA-ptx_isa_1.4.pdf

for AMD CAL

http://coachk.cs.ucf.edu/courses/CDA693 ... amming.pdf

--
Srdja
smatovic
Posts: 2645
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: Kogge Stone, Vector Based

Post by smatovic »

Just tested the idea with the AMD OpenCL compiler,
it doesn't work.

Code: Select all

Error&#58; Building Program &#40;clBuildProgram&#41;
, status&#58;-11buildlog&#58; "/tmp/OCLfv1xCw.cl", line 1357&#58; error&#58; vector operation requires identical
          element types
              gen8 |= pro8 & (&#40;gen8 << r8&#41; | &#40;gen8 >> &#40;64-r8&#41;));
                                    ^

--
Srdja
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Kogge Stone, Vector Based

Post by Gerd Isenberg »

smatovic wrote: For Nvidia there is the PTX instruction set:

www.nvidia.com/content/CUDA-ptx_isa_1.4.pdf
Thanks. Pity - PTX has no rotate instruction. It is called SIMT (single-instruction, multiple-thread) rather than SIMD. So let say a shl.b64 a, b instruction is executed in 8 treads, each operating on corresponding elements of the vectors?
smatovic
Posts: 2645
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: Kogge Stone, Vector Based

Post by smatovic »

So let say a shl.b64 a, b instruction is executed in 8 treads, each operating on corresponding elements of the vectors?
Afaik, yes.

On Nvidia Fermi devices 32 threads are executed in SIMT fashion in one "Warp" over 4 clock cycles.

--
Srdja
smatovic
Posts: 2645
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: Kogge Stone, Vector Based, 2*4 directions, ulong4

Post by smatovic »

another solution is to split left and right shifts in 2*4 directions via ulong4.....


Code: Select all

__constant ulong4 shift4&#91;&#93; =

&#123;
&#123;     0, 0, 0, 0 &#125;,                      // EMTPY
&#123;     9, 0, 7, 8 &#125;,                     // PAWN
&#123;    17,10, 6,15 &#125;,                    // KNIGTH
&#123;     9, 1, 7, 8 &#125;,                   // KING
&#123;     9, 0, 7, 0 &#125;,                  // BISHOP
&#123;     0, 1, 0, 8 &#125;,                 // ROOK
&#123;     9, 1, 7, 8 &#125;                 // QUEEN
&#125;;    

Code: Select all

     
            // 2*4 directions via ulong4, left and right shifting
            s4 = shift4&#91;&#40;piece>>1&#41;&#93;;

            // 1*4 directions, left shift, in parrallel via vector ulong4
            pro4 = ~bbBlockers;
            gen4 = SetMaskBB&#91;pos&#93;;

            // do kogge stone
            gen4 |= pro4 & &#40;gen4 << s4&#41;;
            pro4 &=        &#40;pro4 << s4&#41;;
            gen4 |= pro4 & &#40;gen4 << 2*s4&#41;;
            pro4 &=        &#40;pro4 << 2*s4&#41;;
            gen4 |= pro4 & &#40;gen4 << 4*s4&#41;;

            // Shift one further
            gen4 =  &#40;gen4 << s4&#41;;
            bbTemp = gen4.s0 | gen4.s1 | gen4.s2 | gen4.s3;     

            // 1*4 directions, right shift, in parrallel via vector ulong4
            pro4 = ~bbBlockers;
            gen4 = SetMaskBB&#91;pos&#93;;

            // do kogge stone
            gen4 |= pro4 & &#40;gen4 >> s4&#41;;
            pro4 &=        &#40;pro4 >> s4&#41;;
            gen4 |= pro4 & &#40;gen4 >> 2*s4&#41;;
            pro4 &=        &#40;pro4 >> 2*s4&#41;;
            gen4 |= pro4 & &#40;gen4 >> 4*s4&#41;;

            // Shift one further
            gen4 =  &#40;gen4 >> s4&#41;;
            bbTemp |= gen4.s0 | gen4.s1 | gen4.s2 | gen4.s3;     

            // Verify
            bbTemp &= ( &#40;piece>>1&#41; == PAWN&#41; ? &#40;PawnAttackTables&#91;som*64+pos&#93; & bbOpposite&#41;  | &#40;PawnAttackTables&#91;&#40;som+2&#41;*64+pos&#93; & ~bbBlockers&#41;  &#58; AttackTables&#91;&#40;som*7*64&#41;+(&#40;piece>>1&#41;*64&#41;+pos&#93; ;

            // Captures
            bbMoves  = &#40;bbTemp & bbOpposite&#41;;
            // Non cpatures
            bbMoves |= &#40;qs == 0&#41;? bbTemp & ~bbBlockers &#58; 0 ;


--
Srdja
smatovic
Posts: 2645
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: Kogge Stone, Vector Based, 2*4 directions, ulong4

Post by smatovic »

argh, i forgot the wraps....but the example shows how the idea works...

--
Srdja
smatovic
Posts: 2645
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: Kogge Stone, Vector Based, 2*4 directions...update

Post by smatovic »

just for the record...

the different shifts did not perform well on the gpu,
so i redesigned the move gen a bit,
the same code is run for all pieces,
except castle, en passant and pawn promotion moves.

OepnCL pseudo code:

Code: Select all

    // ####################################################################
    // ### generate all moves except castle, en passant and pawn promo  ### 
    // ####################################################################
  __private bool qs = false;
  __private Bitboard bbTemp;
  __private Bitboard bbWork;
  __private Bitboard bbMove;
  __private Bitboard bbBlockers;
  __private Bitboard bbMe;
  __private Bitboard bbOpp;
  __private ulong4 bbWrap4;
  __private ulong4 bbPro4;
  __private ulong4 bbGen4;
...

for each $piece on $sq do

    bbWork  = SETMASKBB&#40;sq&#41;;
    // get koggestone wraps
    bbWrap4 = wraps4&#91;0&#93;;
    // generator and propagator &#40;piece and empty squares&#41;
    bbGen4  = &#40;ulong4&#41;bbWork;
    bbPro4  = &#40;ulong4&#41;(~bbBlockers&#41;;
    // kogge stone shift left via ulong4 vector
    bbPro4 &= bbWrap4;
    bbGen4 |= bbPro4  & &#40;bbGen4 << shift4&#41;;
    bbPro4 &=           &#40;bbGen4 << shift4&#41;;
    bbGen4 |= bbPro4  & &#40;bbGen4 << 2*shift4&#41;;
    bbPro4 &=           &#40;bbGen4 << 2*shift4&#41;;
    bbGen4 |= bbPro4  & &#40;bbGen4 << 4*shift4&#41;;
    bbGen4  = bbWrap4 & &#40;bbGen4 << shift4&#41;;
    bbTemp |= bbGen4.s0|bbGen4.s1|bbGen4.s2|bbGen4.s3;
    // get koggestone wraps
    bbWrap4 = wraps4&#91;1&#93;;
    // set generator and propagator &#40;piece and empty squares&#41;
    bbGen4  = &#40;ulong4&#41;bbWork;
    bbPro4  = &#40;ulong4&#41;(~bbBlockers&#41;;
    // kogge stone shift right via ulong4 vector
    bbPro4 &= bbWrap4;
    bbGen4 |= bbPro4  & &#40;bbGen4 >> shift4&#41;;
    bbPro4 &=           &#40;bbGen4 >> shift4&#41;;
    bbGen4 |= bbPro4  & &#40;bbGen4 >> 2*shift4&#41;;
    bbPro4 &=           &#40;bbGen4 >> 2*shift4&#41;;
    bbGen4 |= bbPro4  & &#40;bbGen4 >> 4*shift4&#41;;
    bbGen4  = bbWrap4 & &#40;bbGen4 >> shift4&#41;;
    bbTemp |= bbGen4.s0|bbGen4.s1|bbGen4.s2|bbGen4.s3;
    // consider knights
    bbTemp  = &#40;piece==KNIGHT&#41;?BBFULL&#58;bbTemp;
    // verify captures
    n       = &#40;piece==PAWN&#41;?stm&#58;piece;
    bbWork  = AttackTables&#91;n*64+sq&#93;;
    bbMove  = bbWork&bbTemp&bbOpp;
    // verify non captures
    bbWork  = &#40;piece==PAWN&#41;?&#40;AttackTablesPawnPushes&#91;stm*64+sq&#93;)&#58;bbWork;
    bbMove |= &#40;qs&#41;?BBEMPTY&#58;&#40;bbWork&bbTemp&~bbBlockers&#41;;

end

here the needed consta&#324;t tables

Code: Select all

// move generator costants
// wraps for kogge stone shifts
__constant ulong4 shift4 = &#40;ulong4&#41;( 9, 1, 7, 8&#41;;

__constant ulong4 wraps4&#91;2&#93; =
&#123;
  // wraps shift left
  &#40;ulong4&#41;(
            0xfefefefefefefe00, // <<9
            0xfefefefefefefefe, // <<1 */
            0x7f7f7f7f7f7f7f00, // <<7 */
            0xffffffffffffff00  // <<8 */
          ),

  // wraps shift right
  &#40;ulong4&#41;(
            0x007f7f7f7f7f7f7f, // >>9
            0x7f7f7f7f7f7f7f7f, // >>1 
            0x00fefefefefefefe, // >>7
            0x00ffffffffffffff  // >>8
          )
&#125;;
__constant Bitboard AttackTablesPawnPushes&#91;2*64&#93; = 
&#123;
  /* white pawn pushes */
  0x100,0x200,0x400,0x800,0x1000,0x2000,0x4000,0x8000,0x1010000,0x2020000,0x4040000,0x8080000,0x10100000,0x20200000,0x40400000,0x80800000,0x1000000,0x2000000,0x4000000,0x8000000,0x10000000,0x20000000,0x40000000,0x80000000,0x100000000,0x200000000,0x400000000,0x800000000,0x1000000000,0x2000000000,0x4000000000,0x8000000000,0x10000000000,0x20000000000,0x40000000000,0x80000000000,0x100000000000,0x200000000000,0x400000000000,0x800000000000,0x1000000000000,0x2000000000000,0x4000000000000,0x8000000000000,0x10000000000000,0x20000000000000,0x40000000000000,0x80000000000000,0x100000000000000,0x200000000000000,0x400000000000000,0x800000000000000,0x1000000000000000,0x2000000000000000,0x4000000000000000,0x8000000000000000,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,
  /* black pawn pushes */
  0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x1,0x2,0x4,0x8,0x10,0x20,0x40,0x80,0x100,0x200,0x400,0x800,0x1000,0x2000,0x4000,0x8000,0x10000,0x20000,0x40000,0x80000,0x100000,0x200000,0x400000,0x800000,0x1000000,0x2000000,0x4000000,0x8000000,0x10000000,0x20000000,0x40000000,0x80000000,0x100000000,0x200000000,0x400000000,0x800000000,0x1000000000,0x2000000000,0x4000000000,0x8000000000,0x10100000000,0x20200000000,0x40400000000,0x80800000000,0x101000000000,0x202000000000,0x404000000000,0x808000000000,0x1000000000000,0x2000000000000,0x4000000000000,0x8000000000000,0x10000000000000,0x20000000000000,0x40000000000000,0x80000000000000
&#125;;
/* piece attack tables */
__constant Bitboard AttackTables&#91;7*64&#93; = 
&#123;
  /* white pawn */
  0x200,0x500,0xa00,0x1400,0x2800,0x5000,0xa000,0x4000,0x20000,0x50000,0xa0000,0x140000,0x280000,0x500000,0xa00000,0x400000,0x2000000,0x5000000,0xa000000,0x14000000,0x28000000,0x50000000,0xa0000000,0x40000000,0x200000000,0x500000000,0xa00000000,0x1400000000,0x2800000000,0x5000000000,0xa000000000,0x4000000000,0x20000000000,0x50000000000,0xa0000000000,0x140000000000,0x280000000000,0x500000000000,0xa00000000000,0x400000000000,0x2000000000000,0x5000000000000,0xa000000000000,0x14000000000000,0x28000000000000,0x50000000000000,0xa0000000000000,0x40000000000000,0x200000000000000,0x500000000000000,0xa00000000000000,0x1400000000000000,0x2800000000000000,0x5000000000000000,0xa000000000000000,0x4000000000000000,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,
  /* black pawn */
  0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x2,0x5,0xa,0x14,0x28,0x50,0xa0,0x40,0x200,0x500,0xa00,0x1400,0x2800,0x5000,0xa000,0x4000,0x20000,0x50000,0xa0000,0x140000,0x280000,0x500000,0xa00000,0x400000,0x2000000,0x5000000,0xa000000,0x14000000,0x28000000,0x50000000,0xa0000000,0x40000000,0x200000000,0x500000000,0xa00000000,0x1400000000,0x2800000000,0x5000000000,0xa000000000,0x4000000000,0x20000000000,0x50000000000,0xa0000000000,0x140000000000,0x280000000000,0x500000000000,0xa00000000000,0x400000000000,0x2000000000000,0x5000000000000,0xa000000000000,0x14000000000000,0x28000000000000,0x50000000000000,0xa0000000000000,0x40000000000000,
  /* knight */
  0x20400,0x50800,0xa1100,0x142200,0x284400,0x508800,0xa01000,0x402000,0x2040004,0x5080008,0xa110011,0x14220022,0x28440044,0x50880088,0xa0100010,0x40200020,0x204000402,0x508000805,0xa1100110a,0x1422002214,0x2844004428,0x5088008850,0xa0100010a0,0x4020002040,0x20400040200,0x50800080500,0xa1100110a00,0x142200221400,0x284400442800,0x508800885000,0xa0100010a000,0x402000204000,0x2040004020000,0x5080008050000,0xa1100110a0000,0x14220022140000,0x28440044280000,0x50880088500000,0xa0100010a00000,0x40200020400000,0x204000402000000,0x508000805000000,0xa1100110a000000,0x1422002214000000,0x2844004428000000,0x5088008850000000,0xa0100010a0000000,0x4020002040000000,0x400040200000000,0x800080500000000,0x1100110a00000000,0x2200221400000000,0x4400442800000000,0x8800885000000000,0x100010a000000000,0x2000204000000000,0x4020000000000,0x8050000000000,0x110a0000000000,0x22140000000000,0x44280000000000,0x88500000000000,0x10a00000000000,0x20400000000000,
  /* king */
  0x302,0x705,0xe0a,0x1c14,0x3828,0x7050,0xe0a0,0xc040,0x30203,0x70507,0xe0a0e,0x1c141c,0x382838,0x705070,0xe0a0e0,0xc040c0,0x3020300,0x7050700,0xe0a0e00,0x1c141c00,0x38283800,0x70507000,0xe0a0e000,0xc040c000,0x302030000,0x705070000,0xe0a0e0000,0x1c141c0000,0x3828380000,0x7050700000,0xe0a0e00000,0xc040c00000,0x30203000000,0x70507000000,0xe0a0e000000,0x1c141c000000,0x382838000000,0x705070000000,0xe0a0e0000000,0xc040c0000000,0x3020300000000,0x7050700000000,0xe0a0e00000000,0x1c141c00000000,0x38283800000000,0x70507000000000,0xe0a0e000000000,0xc040c000000000,0x302030000000000,0x705070000000000,0xe0a0e0000000000,0x1c141c0000000000,0x3828380000000000,0x7050700000000000,0xe0a0e00000000000,0xc040c00000000000,0x203000000000000,0x507000000000000,0xa0e000000000000,0x141c000000000000,0x2838000000000000,0x5070000000000000,0xa0e0000000000000,0x40c0000000000000,
  /* bishop */
  0x8040201008040200,0x80402010080500,0x804020110a00,0x8041221400,0x182442800,0x10204885000,0x102040810a000,0x102040810204000,0x4020100804020002,0x8040201008050005,0x804020110a000a,0x804122140014,0x18244280028,0x1020488500050,0x102040810a000a0,0x204081020400040,0x2010080402000204,0x4020100805000508,0x804020110a000a11,0x80412214001422,0x1824428002844,0x102048850005088,0x2040810a000a010,0x408102040004020,0x1008040200020408,0x2010080500050810,0x4020110a000a1120,0x8041221400142241,0x182442800284482,0x204885000508804,0x40810a000a01008,0x810204000402010,0x804020002040810,0x1008050005081020,0x20110a000a112040,0x4122140014224180,0x8244280028448201,0x488500050880402,0x810a000a0100804,0x1020400040201008,0x402000204081020,0x805000508102040,0x110a000a11204080,0x2214001422418000,0x4428002844820100,0x8850005088040201,0x10a000a010080402,0x2040004020100804,0x200020408102040,0x500050810204080,0xa000a1120408000,0x1400142241800000,0x2800284482010000,0x5000508804020100,0xa000a01008040201,0x4000402010080402,0x2040810204080,0x5081020408000,0xa112040800000,0x14224180000000,0x28448201000000,0x50880402010000,0xa0100804020100,0x40201008040201,
  /* rook */
  0x1010101010101fe,0x2020202020202fd,0x4040404040404fb,0x8080808080808f7,0x10101010101010ef,0x20202020202020df,0x40404040404040bf,0x808080808080807f,0x10101010101fe01,0x20202020202fd02,0x40404040404fb04,0x80808080808f708,0x101010101010ef10,0x202020202020df20,0x404040404040bf40,0x8080808080807f80,0x101010101fe0101,0x202020202fd0202,0x404040404fb0404,0x808080808f70808,0x1010101010ef1010,0x2020202020df2020,0x4040404040bf4040,0x80808080807f8080,0x1010101fe010101,0x2020202fd020202,0x4040404fb040404,0x8080808f7080808,0x10101010ef101010,0x20202020df202020,0x40404040bf404040,0x808080807f808080,0x10101fe01010101,0x20202fd02020202,0x40404fb04040404,0x80808f708080808,0x101010ef10101010,0x202020df20202020,0x404040bf40404040,0x8080807f80808080,0x101fe0101010101,0x202fd0202020202,0x404fb0404040404,0x808f70808080808,0x1010ef1010101010,0x2020df2020202020,0x4040bf4040404040,0x80807f8080808080,0x1fe010101010101,0x2fd020202020202,0x4fb040404040404,0x8f7080808080808,0x10ef101010101010,0x20df202020202020,0x40bf404040404040,0x807f808080808080,0xfe01010101010101,0xfd02020202020202,0xfb04040404040404,0xf708080808080808,0xef10101010101010,0xdf20202020202020,0xbf40404040404040,0x7f80808080808080,
  /* queen */
  0x81412111090503fe,0x2824222120a07fd,0x404844424150efb,0x8080888492a1cf7,0x10101011925438ef,0x2020212224a870df,0x404142444850e0bf,0x8182848890a0c07f,0x412111090503fe03,0x824222120a07fd07,0x4844424150efb0e,0x80888492a1cf71c,0x101011925438ef38,0x20212224a870df70,0x4142444850e0bfe0,0x82848890a0c07fc0,0x2111090503fe0305,0x4222120a07fd070a,0x844424150efb0e15,0x888492a1cf71c2a,0x1011925438ef3854,0x212224a870df70a8,0x42444850e0bfe050,0x848890a0c07fc0a0,0x11090503fe030509,0x22120a07fd070a12,0x4424150efb0e1524,0x88492a1cf71c2a49,0x11925438ef385492,0x2224a870df70a824,0x444850e0bfe05048,0x8890a0c07fc0a090,0x90503fe03050911,0x120a07fd070a1222,0x24150efb0e152444,0x492a1cf71c2a4988,0x925438ef38549211,0x24a870df70a82422,0x4850e0bfe0504844,0x90a0c07fc0a09088,0x503fe0305091121,0xa07fd070a122242,0x150efb0e15244484,0x2a1cf71c2a498808,0x5438ef3854921110,0xa870df70a8242221,0x50e0bfe050484442,0xa0c07fc0a0908884,0x3fe030509112141,0x7fd070a12224282,0xefb0e1524448404,0x1cf71c2a49880808,0x38ef385492111010,0x70df70a824222120,0xe0bfe05048444241,0xc07fc0a090888482,0xfe03050911214181,0xfd070a1222428202,0xfb0e152444840404,0xf71c2a4988080808,0xef38549211101010,0xdf70a82422212020,0xbfe0504844424140,0x7fc0a09088848281,
&#125;;
I implemented it also on cpu, but it is not competitive to magic bitboards...at least it should be SIMD friendly now...

--
Srdja