BitBoard Tests Magic v Non-Rotated 32 Bits v 64 Bits

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Harald
Posts: 317
Joined: Thu Mar 09, 2006 1:07 am

Re: BitBoard Tests (Sherwin Union)

Post by Harald »

bb_attacks_sherwin_union:

Code: Select all

/*******************************************************************/
/*
The bitboard attacks of Michael Sherwin with union 
as part of the board structure.

(c) 2006 Harald Lüßen
*/
/*******************************************************************/


#include "bb_ifdef.h"
#include "bb_basics.h"
#include "bb_bitboard.h"
#include "bb_board.h"
#include "bb_main.h"


#if USE_SHERWIN_UNION_BITBOARDS()

/*******************************************************************/

/*
    directions and shifts
    +-----+-----+-----+
    |<<= 9|<<= 8|<<= 7|
    +-----+-----+-----+
    |<<= 1|     |>>= 1|
    +-----+-----+-----+
    |>>= 7|>>= 8|>>= 9|
    +-----+-----+-----+

    We use this mapping of the normal board squares to bitboard bits
    +-------------------------+
    | 63 62 61 60 59 58 57 56 | 8
    | 55 54 53 52 51 50 49 48 | 7
    | 47 46 45 44 43 42 41 40 | 6
    | 39 38 37 36 35 35 33 32 | 5
    | 31 30 29 28 27 26 25 24 | 4
    | 23 22 21 20 19 18 17 16 | 3
    | 15 14 13 12 11 10  9  8 | 2
    |  7  6  5  4  3  2  1  0 | 1
    +-------------------------+
       a  b  c  d  e  f  g  h
*/

/*******************************************************************/

// Make sure this is done right.
#define BIG_ENDIAN&#40;)    &#40;0&#41;
#define LITTLE_ENDIAN&#40;) &#40;1&#41;

#if BIG_ENDIAN&#40;) && !LITTLE_ENDIAN&#40;)
// Big end first means row 8 is first byte with my square encoding.
struct BBBytes
&#123;
    unsigned char row8;
    unsigned char row7;
    unsigned char row6;
    unsigned char row5;
    unsigned char row4;
    unsigned char row3;
    unsigned char row2;
    unsigned char row1;
&#125;;
#elif LITTLE_ENDIAN&#40;) && !BIG_ENDIAN&#40;)
// Little end first means row 1 is first byte with my square encoding.
struct BBBytes
&#123;
    unsigned char row1;
    unsigned char row2;
    unsigned char row3;
    unsigned char row4;
    unsigned char row5;
    unsigned char row6;
    unsigned char row7;
    unsigned char row8;
&#125;;
#else
    #error big little endian
#endif


union BBUnion
&#123;
    Bits64   bb;
    BBBytes  bbb;
&#125;;


/*******************************************************************/

// Bishops

/*
The typical 1-bits of a bishop on d4 are shown below, 
with - indicating a don't care bit&#58;
 . . . . . . . -
 - . . . . . 1 .  giving 1 bit  or 2 different values from row 7
 . 1 . . . 1 . .  giving 2 bits or 4 different values from row 6
 . . 1 . 1 . . .  giving 2 bits or 4 different values from row 5
 . . . B . . . .  giving 0 bits or 0 different values from row 4
 . . 1 . 1 . . .  giving 2 bits or 4 different values from row 3
 . 1 . . . 1 . .  giving 2 bits or 4 different values from row 2
 - . . . . . - .

The values are just a mapping of the relevant bits to a compact value. 
For bits named a to i the value for Bd4 will be&#58;
 . . . . . . . -
 - . . . . . i .  giving 1 bit  or 2 different values from row 7
 . g . . . h . .  giving 2 bits or 4 different values from row 6
 . . e . f . . .  giving 2 bits or 4 different values from row 5
 . . . B . . . .  giving 0 bits or 0 different values from row 4
 . . c . d . . .  giving 2 bits or 4 different values from row 3
 . a . . . b . .  giving 2 bits or 4 different values from row 2
 - . . . . . - .

-> 0x0000000ihgfedcba as index. With 2^9 = 512 index values. 

There are 4 squares with 9 bits like Bd4. Other squares need another amount of bits.

 6 5 5 5 5 5 5 6
 5 5 5 5 5 5 5 5
 5 5 7 7 7 7 5 5
 5 5 7 9 9 7 5 5
 5 5 7 9 9 7 5 5
 5 5 7 7 7 7 5 5
 5 5 5 5 5 5 5 5
 6 5 5 5 5 5 5 6

 4 squares with 9 bits for 512 indices are 2048 indices.
12 squares with 7 bits for 128 indices are 1536 indices.
44 squares with 5 bits for  32 indices are 1408 indices.
 4 squares with 6 bits for  64 indices are  256 indices.
                                     Total 5248 indices.
*/

// The big X mask for all moves from sq, but only inner 6x6 area
Bitboard bishopBits&#91;64&#93;;

void initBishopBits&#40;)
&#123;
    int sq;
    for ( sq = 0; sq < 64; ++sq )
    &#123;
        bishopBits&#91;sq&#93; = 0;
        int i;
        for ( i = sq - 9; i >= 0 && i % 8 != 7; i -= 9 )
        &#123;
            bishopBits&#91;sq&#93; |= C64&#40;1&#41; << i;
        &#125;
        for ( i = sq - 7; i >= 0 && i % 8 != 0; i -= 7 )
        &#123;
            bishopBits&#91;sq&#93; |= C64&#40;1&#41; << i;
        &#125;
        for ( i = sq + 9; i < 64 && i % 8 != 0; i += 9 )
        &#123;
            bishopBits&#91;sq&#93; |= C64&#40;1&#41; << i;
        &#125;
        for ( i = sq + 7; i < 64 && i % 8 != 7; i += 7 )
        &#123;
            bishopBits&#91;sq&#93; |= C64&#40;1&#41; << i;
        &#125;
        bishopBits&#91;sq&#93; &= C64&#40;0x007e7e7e7e7e7e00&#41;;
        //logf << "bishopBits " << sq << endl;
        //logf << bishopBits&#91;sq&#93;.txt8lines&#40;) << endl;
    &#125;
&#125;


// Number of needed bits per square for bishops
const byte squareBitsB&#91;64&#93; = 
&#123;
  6, 5, 5, 5, 5, 5, 5, 6,
  5, 5, 5, 5, 5, 5, 5, 5,
  5, 5, 7, 7, 7, 7, 5, 5,
  5, 5, 7, 9, 9, 7, 5, 5,
  5, 5, 7, 9, 9, 7, 5, 5,
  5, 5, 7, 7, 7, 7, 5, 5,
  5, 5, 5, 5, 5, 5, 5, 5,
  6, 5, 5, 5, 5, 5, 5, 6,
&#125;;


/*
The big Table with index-bits for all rows.
The indices are in this order&#58;
0000999999999 4 times 9 bit
0001999999999
0010999999999
0011999999999
0100007777777 12 times 7 bit
01....7777777
0110117777777
0111000666666 4 times 6 bit
0111001666666
0111010666666
0111011666666
0111100055555 44 times 5 bit
........55555
1010001155555
bishopRows&#91;sq&#93;&#91;row2to7&#93;&#91;bitsOnRow&#93;
*/
short int bishopRows&#91;64&#93;&#91;6&#93;&#91;64&#93;;

void initBishopRows&#40;)
&#123;
    int baseIndex = 0;
    for ( int bits = 9; bits >= 5; --bits )
    &#123;
        for ( int sq = 0; sq < 64; ++sq )
        &#123;
            if ( squareBitsB&#91;sq&#93; != bits )
                continue;
            Bitboard bb = bishopBits&#91;sq&#93;;
            bb >>= 9;
            int shift = 0;
            for ( int row = 0; row < 6; ++row )
            &#123;
                int p = &#40;bb >> &#40;row * 8&#41;) & 0x3f;
                for ( int pattern = 0; pattern < 64; ++pattern )
                &#123;
                    int index = 0;
                    int s = shift;
                    for ( int i = 0; i < 6; ++i )
                    &#123;
                        if ( p & &#40;1 << i&#41; )
                        &#123;
                            index |= ( &#40;pattern & &#40;1 << i&#41;) ? &#40;1 << s&#41; &#58; 0 );
                            s++;
                            if ( pattern == 63 )
                                shift++;
                        &#125;
                    &#125;
                    bishopRows&#91;sq&#93;&#91;row&#93;&#91;pattern&#93; = baseIndex + index;
                    //logf << "bishopRows " << sq << " " << row << " " << pattern << " &#58; ";
                    //logf << bishopRows&#91;sq&#93;&#91;row&#93;&#91;pattern&#93; << endl;
                &#125;
            &#125;
            baseIndex += &#40;1 << bits&#41;;
        &#125;
    &#125;
&#125;


// From index to attack bits
Bitboard bishopAttackTable&#91;5248&#93;;

void initBishopAttacks&#40;)
&#123;
    int baseIndex = 0;
    for ( int bits = 9; bits >= 5; --bits )
    &#123;
        for ( int sq = 0; sq < 64; ++sq )
        &#123;
            if ( squareBitsB&#91;sq&#93; != bits )
                continue;
            Bitboard bb = bishopBits&#91;sq&#93;;
            for ( int index = 0; index < &#40;1 << bits&#41;; ++ index )
            &#123;
                Bitboard occ = 0;
                int i = index;
                for ( int rsq = 0; rsq < 64; ++rsq )
                &#123;
                    if ( bb.test_bit&#40; rsq ) )
                    &#123;
                        if ( i & 1 )
                            occ.set_bit&#40; rsq );
                        i >>= 1;
                    &#125;
                &#125;
                Bitboard att = 0;
                int j;
                for ( j = sq + 9; j < 64 && &#40;j & 7&#41; != 0; j += 9 )
                &#123;
                    att.set_bit&#40; j );
                    if ( occ.test_bit&#40; j ) )
                        break;
                &#125;
                for ( j = sq + 7; j < 64 && &#40;j & 7&#41; != 7; j += 7 )
                &#123;
                    att.set_bit&#40; j );
                    if ( occ.test_bit&#40; j ) )
                        break;
                &#125;
                for ( j = sq - 9; j >= 0 && &#40;j & 7&#41; != 7; j -= 9 )
                &#123;
                    att.set_bit&#40; j );
                    if ( occ.test_bit&#40; j ) )
                        break;
                &#125;
                for ( j = sq - 7; j >= 0 && &#40;j & 7&#41; != 0; j -= 7 )
                &#123;
                    att.set_bit&#40; j );
                    if ( occ.test_bit&#40; j ) )
                        break;
                &#125;
                bishopAttackTable&#91;baseIndex + index&#93; = att;
            &#125;
            baseIndex += &#40;1 << bits&#41;;
        &#125;
    &#125;
&#125;


/**
The bishop attack bitboard for a square and occupied bitboard
*/
Bitboard bishopAttacks&#40; int sq, Bitboard occ )
&#123;
    // The remaining blocking pieces in the X-rays
    occ  &= bishopBits&#91;sq&#93;; 
    occ >>= 1;
    BBUnion occu;
    occu.bb = occ;

    // Since every square has its set of row values the six row lookups 
    // simply map any blockers to specific bits that when ored together 
    // gives an offset in the bishop attack table.
    short int *bRows = &bishopRows&#91;sq&#93;&#91;0&#93;&#91;0&#93;;
    int index = &#40;bRows +   0&#41;&#91;occu.bbb.row2&#93;  // row 2
              | &#40;bRows +  64&#41;&#91;occu.bbb.row3&#93;  // row 3
              | &#40;bRows + 128&#41;&#91;occu.bbb.row4&#93;  // row 4
              | &#40;bRows + 192&#41;&#91;occu.bbb.row5&#93;  // row 5
              | &#40;bRows + 256&#41;&#91;occu.bbb.row6&#93;  // row 6
              | &#40;bRows + 320&#41;&#91;occu.bbb.row7&#93;; // row 7
    return bishopAttackTable&#91;index&#93;;
&#125;


/*******************************************************************/

// Rooks

/*
The required bit number per square for rooks looks like this&#58;

12 11 11 11 11 11 11 12    4 x 12 bit =  4 x 4096 entries = 16384
11 10 10 10 10 10 10 11   24 x 11 bit = 24 x 2048 entries = 49152
11 10 10 10 10 10 10 11   36 x 10 bit = 36 x 1024 entries = 36864
11 10 10 10 10 10 10 11                            total = 102400
11 10 10 10 10 10 10 11
11 10 10 10 10 10 10 11
11 10 10 10 10 10 10 11
12 11 11 11 11 11 11 12
*/

// The big + mask for all moves from sq
Bitboard rookBits&#91;64&#93;;

void initRookBits&#40;)
&#123;
    int sq;
    for ( sq = 0; sq < 64; ++sq )
    &#123;
        rookBits&#91;sq&#93; = 0;
        int i;
        for ( i = sq - 1; i >= 0 && i % 8 != 7; --i )
        &#123;
            rookBits&#91;sq&#93; |= C64&#40;1&#41; << i;
        &#125;
        for ( i = sq - 8; i >= 0; i -= 8 )
        &#123;
            rookBits&#91;sq&#93; |= C64&#40;1&#41; << i;
        &#125;
        for ( i = sq + 1; i < 64 && i % 8 != 0; ++i )
        &#123;
            rookBits&#91;sq&#93; |= C64&#40;1&#41; << i;
        &#125;
        for ( i = sq + 8; i < 64; i += 8 )
        &#123;
            rookBits&#91;sq&#93; |= C64&#40;1&#41; << i;
        &#125;
        if ( &#40;sq & 7&#41; != 7 )
            rookBits&#91;sq&#93; &= C64&#40;0x7f7f7f7f7f7f7f7f&#41;;
        if ( &#40;sq & 7&#41; != 0 )
            rookBits&#91;sq&#93; &= C64&#40;0xfefefefefefefefe&#41;;
        if ( &#40;sq / 8&#41; != 7 )
            rookBits&#91;sq&#93; &= C64&#40;0x00ffffffffffffff&#41;;
        if ( &#40;sq / 8&#41; != 0 )
            rookBits&#91;sq&#93; &= C64&#40;0xffffffffffffff00&#41;;
        //logf << "rookBits " << sq << endl;
        //logf << rookBits&#91;sq&#93;.txt8lines&#40;) << endl;
    &#125;
&#125;


// Number of needed bits per square for rooks
const byte squareBitsR&#91;64&#93; = 
&#123;
  12, 11, 11, 11, 11, 11, 11, 12,
  11, 10, 10, 10, 10, 10, 10, 11,
  11, 10, 10, 10, 10, 10, 10, 11,
  11, 10, 10, 10, 10, 10, 10, 11,
  11, 10, 10, 10, 10, 10, 10, 11,
  11, 10, 10, 10, 10, 10, 10, 11,
  11, 10, 10, 10, 10, 10, 10, 11,
  12, 11, 11, 11, 11, 11, 11, 12,
&#125;;


/*
The big Table with index-bits for all rows.
The indices are in this order&#58;
00000cccccccccccc 4 times 12 bit
00001cccccccccccc
00010cccccccccccc
00011cccccccccccc
001000bbbbbbbbbbb 24 times 11 bit
0.....bbbbbbbbbbb
100111bbbbbbbbbbb
1010000aaaaaaaaaa 36 times 10 bit
.......aaaaaaaaaa
1110011aaaaaaaaaa
This index is very big because its values &#40;0-102400&#41; do not fit in 2 byte short int.
rookRows&#91;sq&#93;&#91;row1to8&#93;&#91;bitsOnRow&#93;
*/
int rookRows&#91;64&#93;&#91;8&#93;&#91;256&#93;;

void initRookRows&#40;)
&#123;
    int baseIndex = 0;
    for ( int bits = 12; bits >= 10; --bits )
    &#123;
        for ( int sq = 0; sq < 64; ++sq )
        &#123;
            if ( squareBitsR&#91;sq&#93; != bits )
                continue;
            Bitboard bb = rookBits&#91;sq&#93;;
            int shift = 0;
            for ( int row = 0; row < 8; ++row )
            &#123;
                int p = &#40;bb >> &#40;row * 8&#41;) & 0xff;
                for ( int pattern = 0; pattern < 256; ++pattern )
                &#123;
                    int index = 0;
                    int s = shift;
                    for ( int i = 0; i < 8; ++i )
                    &#123;
                        if ( p & &#40;1 << i&#41; )
                        &#123;
                            index |= ( &#40;pattern & &#40;1 << i&#41;) ? &#40;1 << s&#41; &#58; 0 );
                            s++;
                            if ( pattern == 255 )
                                shift++;
                        &#125;
                    &#125;
                    rookRows&#91;sq&#93;&#91;row&#93;&#91;pattern&#93; = baseIndex + index;
                    //logf << "rookRows " << sq << " " << row << " " << pattern << " &#58; ";
                    //logf << rookRows&#91;sq&#93;&#91;row&#93;&#91;pattern&#93; << endl;
                &#125;
            &#125;
            baseIndex += &#40;1 << bits&#41;;
        &#125;
    &#125;
&#125;


// From index to attack bits
Bitboard rookAttackTable&#91;102400&#93;;

void initRookAttacks&#40;)
&#123;
    int baseIndex = 0;
    for ( int bits = 12; bits >= 10; --bits )
    &#123;
        for ( int sq = 0; sq < 64; ++sq )
        &#123;
            if ( squareBitsR&#91;sq&#93; != bits )
                continue;
            Bitboard bb = rookBits&#91;sq&#93;;
            for ( int index = 0; index < &#40;1 << bits&#41;; ++ index )
            &#123;
                Bitboard occ = 0;
                int i = index;
                for ( int rsq = 0; rsq < 64; ++rsq )
                &#123;
                    if ( bb.test_bit&#40; rsq ) )
                    &#123;
                        if ( i & 1 )
                            occ.set_bit&#40; rsq );
                        i >>= 1;
                    &#125;
                &#125;
                Bitboard att = 0;
                int j;
                for ( j = sq + 1; j < 64 && &#40;j & 7&#41; != 0; ++j )
                &#123;
                    att.set_bit&#40; j );
                    if ( occ.test_bit&#40; j ) )
                        break;
                &#125;
                for ( j = sq + 8; j < 64; j += 8 )
                &#123;
                    att.set_bit&#40; j );
                    if ( occ.test_bit&#40; j ) )
                        break;
                &#125;
                for ( j = sq - 1; j >= 0 && &#40;j & 7&#41; != 7; --j )
                &#123;
                    att.set_bit&#40; j );
                    if ( occ.test_bit&#40; j ) )
                        break;
                &#125;
                for ( j = sq - 8; j >= 0; j -= 8 )
                &#123;
                    att.set_bit&#40; j );
                    if ( occ.test_bit&#40; j ) )
                        break;
                &#125;
                rookAttackTable&#91;baseIndex + index&#93; = att;
            &#125;
            baseIndex += &#40;1 << bits&#41;;
        &#125;
    &#125;
&#125;


/**
The rook attack bitboard for a square and occupied bitboard
*/
Bitboard rookAttacks&#40; int sq, Bitboard occ )
&#123;
    // The remaining blocking pieces in the +-rays
    occ &= rookBits&#91;sq&#93;; 
    BBUnion occu;
    occu.bb = occ;

    // Since every square has its set of row values the six row lookups 
    // simply map any blockers to specific bits that when ored together 
    // gives an offset in the bishop attack table.
    int *rRows = &rookRows&#91;sq&#93;&#91;0&#93;&#91;0&#93;;
    int index = &#40;rRows +    0&#41;&#91;occu.bbb.row1&#93;  // row 1
              | &#40;rRows +  256&#41;&#91;occu.bbb.row2&#93;  // row 2
              | &#40;rRows +  512&#41;&#91;occu.bbb.row3&#93;  // row 3
              | &#40;rRows +  768&#41;&#91;occu.bbb.row4&#93;  // row 4
              | &#40;rRows + 1024&#41;&#91;occu.bbb.row5&#93;  // row 5
              | &#40;rRows + 1280&#41;&#91;occu.bbb.row6&#93;  // row 6
              | &#40;rRows + 1536&#41;&#91;occu.bbb.row7&#93;  // row 7
              | &#40;rRows + 1792&#41;&#91;occu.bbb.row8&#93;; // row 8
    return rookAttackTable&#91;index&#93;;
&#125;


/*******************************************************************/

const Bitboard dirMaskRight&#91;8&#93; =
&#123;
    // 0, line_h, line_gh, line_fh, line_eh, line_dh, line_ch, line_bh, 
    0, C64&#40;0x0101010101010101&#41;, C64&#40;0x0303030303030303&#41;, C64&#40;0x0707070707070707&#41;, C64&#40;0x0f0f0f0f0f0f0f0f&#41;, 
    C64&#40;0x1f1f1f1f1f1f1f1f&#41;, C64&#40;0x3f3f3f3f3f3f3f3f&#41;, C64&#40;0x7f7f7f7f7f7f7f7f&#41;
&#125;;

const Bitboard dirMaskLeft&#91;8&#93; =
&#123;
    // line_ag, line_af, line_ae, line_ad, line_ac, line_ab, line_a, 0, 
    C64&#40;0xfefefefefefefefe&#41;, C64&#40;0xfcfcfcfcfcfcfcfc&#41;, C64&#40;0xf8f8f8f8f8f8f8f8&#41;, C64&#40;0xf0f0f0f0f0f0f0f0&#41;, 
    C64&#40;0xe0e0e0e0e0e0e0e0&#41;, C64&#40;0xc0c0c0c0c0c0c0c0&#41;, C64&#40;0x8080808080808080&#41;, 0
&#125;;

const Bitboard dirMaskUp&#91;8&#93; =
&#123;
    // row_28, row_38, row_48, row_58, row_68, row_78, row_8, 0, 
    C64&#40;0xffffffffffffff00&#41;, C64&#40;0xffffffffffff0000&#41;, C64&#40;0xffffffffff000000&#41;, C64&#40;0xffffffff00000000&#41;, 
    C64&#40;0xffffff0000000000&#41;, C64&#40;0xffff000000000000&#41;, C64&#40;0xff00000000000000&#41;, 0
&#125;;

const Bitboard dirMaskDown&#91;8&#93; =
&#123;
    // 0, row_1, row_12, row_13, row_14, row_15, row_16, row_17,
    0, C64&#40;0x00000000000000ff&#41;, C64&#40;0x000000000000ffff&#41;, C64&#40;0x0000000000ffffff&#41;, C64&#40;0x00000000ffffffff&#41;, 
    C64&#40;0x000000ffffffffff&#41;, C64&#40;0x0000ffffffffffff&#41;, C64&#40;0x00ffffffffffffff&#41;
&#125;;


/*******************************************************************/
/*******************************************************************/

/**
Prepare the slider attacks back transformation table.
Put the scattered bits of an sliding attack pattern 
back to the original bitboard.
*/
void Board&#58;&#58;init_slider_attacks_index&#40;)
&#123;
    initBishopBits&#40;);
    initBishopRows&#40;);
    initBishopAttacks&#40;);
    initRookBits&#40;);
    initRookRows&#40;);
    initRookAttacks&#40;);
&#125;


/*******************************************************************/

/**
Get a bitboard with all positions set to 1 which can be attacked 
from a bishop, rook or queen on the square moving in the direction.
*/
Bitboard Board&#58;&#58;direction_attacks&#40; byte square, byte dir ) const
&#123;
    Bitboard result;
    Bitboard occ = wpieces_ | bpieces_;

    // 4 3 2
    // 5 0 1
    // 6 7 8
    switch ( dir )
    &#123;
      case 1&#58;
        result = rookAttacks&#40; square, occ );
        result &= dirMaskRight&#91;square & 7&#93;;
        break;
      case 5&#58;
        result = rookAttacks&#40; square, occ );
        result &= dirMaskLeft&#91;square & 7&#93;;
        break;
      case 7&#58;
        result = rookAttacks&#40; square, occ );
        result &= dirMaskDown&#91;square >> 3&#93;;
        break;
      case 3&#58;
        result = rookAttacks&#40; square, occ );
        result &= dirMaskUp&#91;square >> 3&#93;;
        break;
      case 8&#58;
        result = bishopAttacks&#40; square, occ );
        result &= dirMaskRight&#91;square & 7&#93;;
        result &= dirMaskDown&#91;square >> 3&#93;;
        break;
      case 4&#58;
        result = bishopAttacks&#40; square, occ );
        result &= dirMaskLeft&#91;square & 7&#93;;
        result &= dirMaskUp&#91;square >> 3&#93;;
        break;
      case 2&#58;
        result = bishopAttacks&#40; square, occ );
        result &= dirMaskRight&#91;square & 7&#93;;
        result &= dirMaskUp&#91;square >> 3&#93;;
        break;
      case 6&#58;
        result = bishopAttacks&#40; square, occ );
        result &= dirMaskLeft&#91;square & 7&#93;;
        result &= dirMaskDown&#91;square >> 3&#93;;
        break;
      default&#58;
        result = 0;
        break;
    &#125;

    return result;
&#125;


/*******************************************************************/

/**
Get a bitboard with all positions set to 1 which can be attacked 
from a rook or queen on the square.
*/
Bitboard Board&#58;&#58;orthogonal_attacks&#40; byte square ) const
&#123;
    Bitboard occ = wpieces_ | bpieces_;
    Bitboard result = rookAttacks&#40; square, occ );
    return result;
&#125;


/*******************************************************************/

/**
Get a bitboard with all positions set to 1 which can be attacked 
from a bishop or queen on the square.
*/
Bitboard Board&#58;&#58;diagonal_attacks&#40; byte square ) const
&#123;
    Bitboard occ = wpieces_ | bpieces_;
    Bitboard result = bishopAttacks&#40; square, occ );
    return result;
&#125;


/*******************************************************************/

#endif // #if USE_SHERWIN_UNION_BITBOARDS&#40;)
Harald
Posts: 317
Joined: Thu Mar 09, 2006 1:07 am

Re: BitBoard Tests (Romi Pitty)

Post by Harald »

bb_attacks_romi_pitty:

Code: Select all

/*******************************************************************/
/*
The bitboard attacks of RomiChess &#40;Michael Sherwin&#41; with a little bit 
of the Pitty method &#40;Carey&#41; as part of the board structure.

&#40;c&#41; 2007 Harald Lüßen
*/
/*******************************************************************/


#include "bb_ifdef.h"
#include "bb_basics.h"
#include "bb_bitboard.h"
#include "bb_board.h"
#include "bb_main.h"


#if USE_ROMI_BITBOARDS&#40;)

/*******************************************************************/

/*
    directions and shifts
    +-----+-----+-----+
    |<<= 9|<<= 8|<<= 7|
    +-----+-----+-----+
    |<<= 1|     |>>= 1|
    +-----+-----+-----+
    |>>= 7|>>= 8|>>= 9|
    +-----+-----+-----+

    We use this mapping of the normal board squares to bitboard bits
    +-------------------------+
    | 63 62 61 60 59 58 57 56 | 8
    | 55 54 53 52 51 50 49 48 | 7
    | 47 46 45 44 43 42 41 40 | 6
    | 39 38 37 36 35 35 33 32 | 5
    | 31 30 29 28 27 26 25 24 | 4
    | 23 22 21 20 19 18 17 16 | 3
    | 15 14 13 12 11 10  9  8 | 2
    |  7  6  5  4  3  2  1  0 | 1
    +-------------------------+
       a  b  c  d  e  f  g  h
*/

/*******************************************************************/

// Bishops

// The 4 rays for all moves from sq. Last index for special BSF/BSR results.
Bitboard rayUpLeft&#91;65&#93;;
Bitboard rayUpRight&#91;65&#93;;
Bitboard rayDownLeft&#91;65&#93;;
Bitboard rayDownRight&#91;65&#93;;

void initBishopRays&#40;)
&#123;
    int sq;
    for ( sq = 0; sq < 64; ++sq )
    &#123;
        rayUpLeft&#91;sq&#93; = 0;
        rayUpRight&#91;sq&#93; = 0;
        rayDownLeft&#91;sq&#93; = 0;
        rayDownRight&#91;sq&#93; = 0;
        int i;
        for ( i = sq - 9; i >= 0 && i % 8 != 7; i -= 9 )
        &#123;
            rayDownRight&#91;sq&#93; |= C64&#40;1&#41; << i;
        &#125;
        for ( i = sq - 7; i >= 0 && i % 8 != 0; i -= 7 )
        &#123;
            rayDownLeft&#91;sq&#93; |= C64&#40;1&#41; << i;
        &#125;
        for ( i = sq + 9; i < 64 && i % 8 != 0; i += 9 )
        &#123;
            rayUpLeft&#91;sq&#93; |= C64&#40;1&#41; << i;
        &#125;
        for ( i = sq + 7; i < 64 && i % 8 != 7; i += 7 )
        &#123;
            rayUpRight&#91;sq&#93; |= C64&#40;1&#41; << i;
        &#125;
        //logf << "initBishopRays " << sq << endl;
        //logf << rayDownRight&#91;sq&#93;.txt8lines&#40;) << endl;
        //logf << rayDownLeft&#91;sq&#93;.txt8lines&#40;) << endl;
        //logf << rayUpRight&#91;sq&#93;.txt8lines&#40;) << endl;
        //logf << rayUpLeft&#91;sq&#93;.txt8lines&#40;) << endl;
    &#125;
    rayUpLeft&#91;64&#93; = 0;
    rayUpRight&#91;64&#93; = 0;
    rayDownLeft&#91;64&#93; = 0;
    rayDownRight&#91;64&#93; = 0;
&#125;


/**
The bishop attack bitboard for a square and occupied bitboard
*/
Bitboard bishopAttacks&#40; int sq, Bitboard occ )
&#123;
    Bitboard    ray = rayUpLeft&#91;sq&#93;;
    Bitboard occray = ray & occ;
    Bitboard attack = ray ^ rayUpLeft&#91;occray.lsb_nr&#40;)&#93;;
       ray  = rayUpRight&#91;sq&#93;;
    occray  = ray & occ;
    attack |= ray ^ rayUpRight&#91;occray.lsb_nr&#40;)&#93;;
       ray  = rayDownLeft&#91;sq&#93;;
    occray  = ray & occ;
    attack |= ray ^ rayDownLeft&#91;occray.msb_nr&#40;)&#93;;
       ray  = rayDownRight&#91;sq&#93;;
    occray  = ray & occ;
    attack |= ray ^ rayDownRight&#91;occray.msb_nr&#40;)&#93;;
    //logf << "bishopAttacks " << sq << endl;
    //logf << occ.txt8lines&#40;) << ',' << endl;
    //logf << attack.txt8lines&#40;) << endl;
    return attack;
&#125;


/*******************************************************************/

// Rooks

// The 4 rays for all moves from sq. Last index for special BSF/BSR results.
Bitboard rayUp&#91;65&#93;;
Bitboard rayDown&#91;65&#93;;
Bitboard rayLeft&#91;65&#93;;
Bitboard rayRight&#91;65&#93;;

void initRookRays&#40;)
&#123;
    int sq;
    for ( sq = 0; sq < 64; ++sq )
    &#123;
        rayUp&#91;sq&#93; = 0;
        rayDown&#91;sq&#93; = 0;
        rayLeft&#91;sq&#93; = 0;
        rayRight&#91;sq&#93; = 0;
        int i;
        for ( i = sq - 1; i >= 0 && i % 8 != 7; --i )
        &#123;
            rayRight&#91;sq&#93; |= C64&#40;1&#41; << i;
        &#125;
        for ( i = sq - 8; i >= 0; i -= 8 )
        &#123;
            rayDown&#91;sq&#93; |= C64&#40;1&#41; << i;
        &#125;
        for ( i = sq + 1; i < 64 && i % 8 != 0; ++i )
        &#123;
            rayLeft&#91;sq&#93; |= C64&#40;1&#41; << i;
        &#125;
        for ( i = sq + 8; i < 64; i += 8 )
        &#123;
            rayUp&#91;sq&#93; |= C64&#40;1&#41; << i;
        &#125;
        //logf << "initRookRays " << sq << endl;
        //logf << rayRight&#91;sq&#93;.txt8lines&#40;) << endl;
        //logf << rayDown&#91;sq&#93;.txt8lines&#40;) << endl;
        //logf << rayLeft&#91;sq&#93;.txt8lines&#40;) << endl;
        //logf << rayUp&#91;sq&#93;.txt8lines&#40;) << endl;
    &#125;
    rayUp&#91;64&#93; = 0;
    rayDown&#91;64&#93; = 0;
    rayLeft&#91;64&#93; = 0;
    rayRight&#91;64&#93; = 0;
&#125;


/**
The rook attack bitboard for a square and occupied bitboard
*/
Bitboard rookAttacks&#40; int sq, Bitboard occ )
&#123;
    Bitboard    ray = rayUp&#91;sq&#93;;
    Bitboard occray = ray & occ;
    Bitboard attack = ray ^ rayUp&#91;occray.lsb_nr&#40;)&#93;;
       ray  = rayLeft&#91;sq&#93;;
    occray  = ray & occ;
    attack |= ray ^ rayLeft&#91;occray.lsb_nr&#40;)&#93;;
       ray  = rayDown&#91;sq&#93;;
    occray  = ray & occ;
    attack |= ray ^ rayDown&#91;occray.msb_nr&#40;)&#93;;
       ray  = rayRight&#91;sq&#93;;
    occray  = ray & occ;
    attack |= ray ^ rayRight&#91;occray.msb_nr&#40;)&#93;;
    //logf << "rookAttacks " << sq << endl;
    //logf << occ.txt8lines&#40;) << ',' << endl;
    //logf << attack.txt8lines&#40;) << endl;
    return attack;
&#125;


/*******************************************************************/

/**
Prepare the slider attacks tables.
*/
void Board&#58;&#58;init_slider_attacks_index&#40;)
&#123;
    initBishopRays&#40;);
    initRookRays&#40;);
&#125;


/*******************************************************************/

/**
Get a bitboard with all positions set to 1 which can be attacked 
from a bishop, rook or queen on the square moving in the direction.
*/
Bitboard Board&#58;&#58;direction_attacks&#40; byte sq, byte dir ) const
&#123;
    Bitboard result;
    Bitboard occ = wpieces_ | bpieces_;

    // 4 3 2
    // 5 0 1
    // 6 7 8
    switch ( dir )
    &#123;
      case 1&#58;
        result = rayRight&#91;sq&#93;;
        occ &= result;
        result ^= rayRight&#91;occ.msb_nr&#40;)&#93;;
        break;
      case 5&#58;
        result = rayLeft&#91;sq&#93;;
        occ &= result;
        result ^= rayLeft&#91;occ.lsb_nr&#40;)&#93;;
        break;
      case 7&#58;
        result = rayDown&#91;sq&#93;;
        occ &= result;
        result ^= rayDown&#91;occ.msb_nr&#40;)&#93;;
        break;
      case 3&#58;
        result = rayUp&#91;sq&#93;;
        occ &= result;
        result ^= rayUp&#91;occ.lsb_nr&#40;)&#93;;
        break;
      case 8&#58;
        result = rayDownRight&#91;sq&#93;;
        occ &= result;
        result ^= rayDownRight&#91;occ.msb_nr&#40;)&#93;;
        break;
      case 4&#58;
        result = rayUpLeft&#91;sq&#93;;
        occ &= result;
        result ^= rayUpLeft&#91;occ.lsb_nr&#40;)&#93;;
        break;
      case 2&#58;
        result = rayUpRight&#91;sq&#93;;
        occ &= result;
        result ^= rayUpRight&#91;occ.lsb_nr&#40;)&#93;;
        break;
      case 6&#58;
        result = rayDownLeft&#91;sq&#93;;
        occ &= result;
        result ^= rayDownLeft&#91;occ.msb_nr&#40;)&#93;;
        break;
      default&#58;
        result = 0;
        break;
    &#125;

    return result;
&#125;


/*******************************************************************/

/**
Get a bitboard with all positions set to 1 which can be attacked 
from a rook or queen on the square.
*/
Bitboard Board&#58;&#58;orthogonal_attacks&#40; byte square ) const
&#123;
    Bitboard occ = wpieces_ | bpieces_;
    Bitboard result = rookAttacks&#40; square, occ );
    return result;
&#125;


/*******************************************************************/

/**
Get a bitboard with all positions set to 1 which can be attacked 
from a bishop or queen on the square.
*/
Bitboard Board&#58;&#58;diagonal_attacks&#40; byte square ) const
&#123;
    Bitboard occ = wpieces_ | bpieces_;
    Bitboard result = bishopAttacks&#40; square, occ );
    return result;
&#125;


/*******************************************************************/

#endif // #if USE_ROMI_BITBOARDS&#40;)
Harald
Posts: 317
Joined: Thu Mar 09, 2006 1:07 am

Re: BitBoard Tests Magic v Non-Rotated 32 Bits v 64 Bits

Post by Harald »

Hi, I coded two new bitboard methods, partly on Michael Sherwin's request. The 3 test positions are the same as always:

Code: Select all

Opening      rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
Middle game  1r1q1rk1/2p1bppp/p5b1/3pP3/Bn1Pn3/2N1BN1P/1P2QPP1/R2R2K1 w - - 0 1
Near endgame r2r4/pp3p2/4bkpp/8/7P/3B1P2/PP4P1/1K1R3R b - -
There is a possible error of 1-2% depending on a changed test environment
on my machine. E.g. other background virus scanner.

Sherwin Index with union:
For each square each row of a bitboard generates a few bits of an index
to an bitboard attack table. It's a challenge to do the initialisation right.
The row acces is done with a union.

Code: Select all

  9	-0.05	4.5M	0&#58;31.90	d2-d4  d7-d5 Ng1-f3 Nb8-c6  e2-e3 Ng8-f6  c2-c4 Nc6-b4 Nb1-c3  c7-c6 
  8	+0.15	8.0M	0&#58;52.17	Ra1-c1 Rb8-c8 Nf3-d2 Ne4xc3 Rc1xc3 Nb4-a2 Rc3-c6 Na2-b4  g2-g4 Nb4xc6 Ba4xc6 
  9	+0.16	4.1M	0&#58;22.29	Rd8xd3 Rd1xd3 Be6-f5 Rh1-d1 Ra8-e8 Kb1-c2 Re8-e2 Rd1-d2 Bf5xd3 Kc2xd3 
Romi Pitty:
Use a combination of single rays combined wth the occupied bitboard
and a second ray starting at the most/least significant bit (bsr, bsf).
It is implemented similar in Romichess and Carey calls this algorithm pit(t)y.

Code: Select all

  9	-0.05	4.5M	0&#58;29.34	d2-d4  d7-d5 Ng1-f3 Nb8-c6  e2-e3 Ng8-f6  c2-c4 Nc6-b4 Nb1-c3  c7-c6 
  8	+0.15	8.0M	0&#58;48.28	Ra1-c1 Rb8-c8 Nf3-d2 Ne4xc3 Rc1xc3 Nb4-a2 Rc3-c6 Na2-b4  g2-g4 Nb4xc6 Ba4xc6 
  9	+0.16	4.1M	0&#58;20.56	Rd8xd3 Rd1xd3 Be6-f5 Rh1-d1 Ra8-e8 Kb1-c2 Re8-e2 Rd1-d2 Bf5xd3 Kc2xd3 
The new Statistic is this:

Code: Select all

+----------------------------+-----------------------+--------------+-----------+
| Name                       | Inventor or Supporter | Table Size   | Test-Time |
+----------------------------+-----------------------+--------------+-----------+
| Rotated Bitboards &#40;naive&#41;  | Robert Hyatt          |   77.2 KByte |  87.78 s  |
| Rotated Bitboards &#40;aligned&#41;| ?                     |   35.0 KByte |  86.65 s  |
| Rotated Bitboards &#40;switch&#41; | &#40;Harald Lüßen?)       |   14.9 KByte |  88.91 s  |
| Rotated Indices            | Alessandro Damiani    |  128.3 KByte |  78.93 s  |
| Exploding Bitboards        | Harald Lüßen          |    3.5 KByte | 101.79 s  |
| Magic Multiplication       | Gerd Isenberg         |    9.7 KByte |  91.87 s  |
| Magic Multiplication 32 bit| Gerd Isenberg         |    9.7 KByte |  81.37 s  |
| Sherwin Index              | Michael Sherwin       | 1402.4 KByte |  90.03 s  |
| Sherwin Index with union   | Michael Sherwin       | 1402.4 KByte | 106.36 s  |
| Pradu Minimize Magic       | Pradyumna Kannan      |  844.0 KByte |  81.42 s  |
| Pradu Perfect Magic Hash   |  and                  |  627.4 KByte |  82.09 s  |
| Pradu Big                  | Lasse Hansen          | 2306.0 KByte |  82.33 s  |
| Kogge-Stone                | Steffan Westcott      |    0.0 KByte |  99.32 s  |
| Simple Shift               | --                    |    0.0 KByte | 110.25 s  |
| Naive Shift                | --                    |    0.0 KByte | 111.69 s  |
| Romi/Pitty                 | --                    |    4.1 KByte |  98.18 s  |
+----------------------------+-----------------------+--------------+-----------+
The union access is slower than the shift and mask access. Either I did
something wrong or the compiler optimisation did the trick.
I am not good at assembler, 32/64 bit coding, cache lines and branch
prediction. If I should try another code please prepare a better variant
ready to use based on my source.

For the Romy/Pitty code I had to fix a bug in my lsb_nr (bsf) assembler
function. A nice side effect.

There is another method in discussion in this thread (Gerd's bswap).
I am still trying to understand it. What does bswap() do? Perhaps I
implement it one day.

Harald
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: BitBoard Tests Magic v Non-Rotated 32 Bits v 64 Bits

Post by Gerd Isenberg »

Aleks Peshkov wrote:
Gerd Isenberg wrote:The fascinating thing is the number of possible approaches to get sliding attacks. It is still a nice cycle hunting challenge and contest for some of us ;-)
I am very happy with forgotten Gerd's "bswap" approach:

Code: Select all

    struct M &#123;
        BitBoard singleton; //optional well-known "bitmask" &#40;1 << sq&#41;
        BitBoard vertical; //attack direction on empty board
        BitBoard diagonal;
        BitBoard antidiag;
        BitBoard knight;
        BitBoard king;
        BitBoard whitePawn;
        BitBoard blackPawn;
    &#125; mask&#91;64&#93;;

    BitBoard attack&#40;Square from, BitBoard occupied, BitBoard dirMask&#41; const &#123;
        bit64_t forward = occupied & dirMask;
        bit64_t reverse = bswap&#40;forward&#41;;
        forward -= mask&#91;from&#93;.singleton;
        reverse -= mask&#91;from ^ 070&#93;.singleton;
        forward ^= bswap&#40;reverse&#41;;
        return BitBoard&#40;forward & dirMask&#41;;
    &#125;

    BitBoard bishop&#40;Square from, BitBoard occupied&#41; const &#123;
        return
            attack&#40;from, occupied, mask&#91;from&#93;.diagonal&#41; |
            attack&#40;from, occupied, mask&#91;from&#93;.antidiag&#41;;
    &#125;

    BitBoard rook&#40;Square from, BitBoard occupied&#41; const &#123;
        return
            attack&#40;from, occupied, mask&#91;from&#93;.vertical&#41; |
            rankAttack&#40;from, occupied&#41;; //any traditional approach
    &#125;
32 bit and cache friendly. Memory acceses nicely interleaved with simple arithmetic.
Wow - you really did some fine improvements!
I had some more xors inside the code.
Seems to outperform kindergarten bishops now.

Code: Select all

?_bishopAttacks@@YA_K_KI@Z PROC	
  00000	40 53		 push	 rbx
  00002	4c 8d 15 00 00
	00 00		 lea	 r10, OFFSET FLAT&#58;?mask
  00009	8b d2		 mov	 edx, edx
  0000b	4c 8b c2	 mov	 r8, rdx
  0000e	48 83 f2 38	 xor	 rdx, 56
  00012	49 c1 e0 05	 shl	 r8, 5
  00016	48 c1 e2 05	 shl	 rdx, 5
  0001a	4b 8b 5c 10 08	 mov	 rbx, QWORD PTR &#91;r8+r10+8&#93;
  0001f	4f 8b 4c 10 10	 mov	 r9, QWORD PTR &#91;r8+r10+16&#93;
  00024	4f 8b 04 10	 mov	 r8, QWORD PTR &#91;r8+r10&#93;
  00028	4a 8b 14 12	 mov	 rdx, QWORD PTR &#91;rdx+r10&#93;
  0002c	4c 8b db	 mov	 r11, rbx
  0002f	49 8b c1	 mov	 rax, r9
  00032	48 23 c1	 and	 rax, rcx
  00035	4c 23 d9	 and	 r11, rcx
  00038	4c 8b d0	 mov	 r10, rax
  0003b	49 2b c0	 sub	 rax, r8
  0003e	49 8b cb	 mov	 rcx, r11
  00041	48 0f c9	 bswap	 rcx
  00044	49 0f ca	 bswap	 r10
  00047	4d 2b d8	 sub	 r11, r8
  0004a	48 2b ca	 sub	 rcx, rdx
  0004d	4c 2b d2	 sub	 r10, rdx
  00050	48 0f c9	 bswap	 rcx
  00053	49 0f ca	 bswap	 r10
  00056	49 33 c2	 xor	 rax, r10
  00059	49 33 cb	 xor	 rcx, r11
  0005c	49 23 c1	 and	 rax, r9
  0005f	48 23 cb	 and	 rcx, rbx
  00062	48 03 c1	 add	 rax, rcx
  00065	5b		 pop	 rbx
  00066	c3		 ret	 0
?_bishopAttacks@@YA_K_KI@Z ENDP
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: BitBoard Tests Magic v Non-Rotated 32 Bits v 64 Bits

Post by Michael Sherwin »

Harald wrote:Hi, I coded two new bitboard methods, partly on Michael Sherwin's request. The 3 test positions are the same as always:

Code: Select all

Opening      rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
Middle game  1r1q1rk1/2p1bppp/p5b1/3pP3/Bn1Pn3/2N1BN1P/1P2QPP1/R2R2K1 w - - 0 1
Near endgame r2r4/pp3p2/4bkpp/8/7P/3B1P2/PP4P1/1K1R3R b - -
There is a possible error of 1-2% depending on a changed test environment
on my machine. E.g. other background virus scanner.

Sherwin Index with union:
For each square each row of a bitboard generates a few bits of an index
to an bitboard attack table. It's a challenge to do the initialisation right.
The row acces is done with a union.

Code: Select all

  9	-0.05	4.5M	0&#58;31.90	d2-d4  d7-d5 Ng1-f3 Nb8-c6  e2-e3 Ng8-f6  c2-c4 Nc6-b4 Nb1-c3  c7-c6 
  8	+0.15	8.0M	0&#58;52.17	Ra1-c1 Rb8-c8 Nf3-d2 Ne4xc3 Rc1xc3 Nb4-a2 Rc3-c6 Na2-b4  g2-g4 Nb4xc6 Ba4xc6 
  9	+0.16	4.1M	0&#58;22.29	Rd8xd3 Rd1xd3 Be6-f5 Rh1-d1 Ra8-e8 Kb1-c2 Re8-e2 Rd1-d2 Bf5xd3 Kc2xd3 
Romi Pitty:
Use a combination of single rays combined wth the occupied bitboard
and a second ray starting at the most/least significant bit (bsr, bsf).
It is implemented similar in Romichess and Carey calls this algorithm pit(t)y.

Code: Select all

  9	-0.05	4.5M	0&#58;29.34	d2-d4  d7-d5 Ng1-f3 Nb8-c6  e2-e3 Ng8-f6  c2-c4 Nc6-b4 Nb1-c3  c7-c6 
  8	+0.15	8.0M	0&#58;48.28	Ra1-c1 Rb8-c8 Nf3-d2 Ne4xc3 Rc1xc3 Nb4-a2 Rc3-c6 Na2-b4  g2-g4 Nb4xc6 Ba4xc6 
  9	+0.16	4.1M	0&#58;20.56	Rd8xd3 Rd1xd3 Be6-f5 Rh1-d1 Ra8-e8 Kb1-c2 Re8-e2 Rd1-d2 Bf5xd3 Kc2xd3 
The new Statistic is this:

Code: Select all

+----------------------------+-----------------------+--------------+-----------+
| Name                       | Inventor or Supporter | Table Size   | Test-Time |
+----------------------------+-----------------------+--------------+-----------+
| Rotated Bitboards &#40;naive&#41;  | Robert Hyatt          |   77.2 KByte |  87.78 s  |
| Rotated Bitboards &#40;aligned&#41;| ?                     |   35.0 KByte |  86.65 s  |
| Rotated Bitboards &#40;switch&#41; | &#40;Harald Lüßen?)       |   14.9 KByte |  88.91 s  |
| Rotated Indices            | Alessandro Damiani    |  128.3 KByte |  78.93 s  |
| Exploding Bitboards        | Harald Lüßen          |    3.5 KByte | 101.79 s  |
| Magic Multiplication       | Gerd Isenberg         |    9.7 KByte |  91.87 s  |
| Magic Multiplication 32 bit| Gerd Isenberg         |    9.7 KByte |  81.37 s  |
| Sherwin Index              | Michael Sherwin       | 1402.4 KByte |  90.03 s  |
| Sherwin Index with union   | Michael Sherwin       | 1402.4 KByte | 106.36 s  |
| Pradu Minimize Magic       | Pradyumna Kannan      |  844.0 KByte |  81.42 s  |
| Pradu Perfect Magic Hash   |  and                  |  627.4 KByte |  82.09 s  |
| Pradu Big                  | Lasse Hansen          | 2306.0 KByte |  82.33 s  |
| Kogge-Stone                | Steffan Westcott      |    0.0 KByte |  99.32 s  |
| Simple Shift               | --                    |    0.0 KByte | 110.25 s  |
| Naive Shift                | --                    |    0.0 KByte | 111.69 s  |
| Romi/Pitty                 | --                    |    4.1 KByte |  98.18 s  |
+----------------------------+-----------------------+--------------+-----------+
The union access is slower than the shift and mask access. Either I did
something wrong or the compiler optimisation did the trick.
I am not good at assembler, 32/64 bit coding, cache lines and branch
prediction. If I should try another code please prepare a better variant
ready to use based on my source.


Responce: I will take a final best guess and submit it to you. Thanks!

For the Romy/Pitty code I had to fix a bug in my lsb_nr (bsf) assembler
function. A nice side effect.

There is another method in discussion in this thread (Gerd's bswap).
I am still trying to understand it. What does bswap() do? Perhaps I
implement it one day.

Harald
Hi Harald,

Thank you! It is good to have answers! :D

I suppose that for 32 bit machines it would then be best to keep the row indici look-up in-register and just split occ into 2 32 bit halves as that would save 17 32 bit instructions. I wish that I would have just believed you and Gerd about the in-register stuff as now all that needs to be answered is the, '32 bit friendly' in-register question.

OTOH, all very instructive! :D And that big-endian and little-endian code can be useful to a lot of people that want to write portable code. Me included. :D

Best,
Mike
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: BitBoard Tests Magic v Non-Rotated 32 Bits v 64 Bits

Post by Gerd Isenberg »

Harald wrote: There is another method in discussion in this thread (Gerd's bswap).
I am still trying to understand it. What does bswap() do? Perhaps I
implement it one day.
Harald
Bswap reverses the bytes in a 64-bit qword. Byte 0 exchanges with byte 7, 1<->6, 2<->5 and 3<->4.
It performs a little- big-endian or vice versa conversion and makes negative vertical or diagonal directions positive.
So that one can perform the o^(o-2r) trick to generate attacks in both directions. Aleks found some nice optimizations according to the xor of the positive and reversed negative results. Bswap-attacks really becomes an ipc monster. Considering the very moderate lookup size it is clearly one of the favourites. Bswap64 intrinsic is even available in 32-bit mode, performing two 32-bit bswaps and exchanging the semantic of two registers.

from the snoob-test

Code: Select all

	start = clock&#40;);
	for &#40;x = first, cnt=1; &#40;y = snoob&#40;x&#41;) > x; x = y, cnt++) &#123;
		for &#40;int sq = startsq; sq < 64; sq++)
			atta += bishopAttacks&#40;y, sq&#41;;
	&#125;
	stop = clock&#40;);

Code: Select all

              secs  runs       time per run
pradu         2.093 7624512*64 4.289 ns
bswap         2.313 7624512*64 4.740 ns
kindergarten  3.312 7624512*64 6.787 ns
Clearly the advantage of Pradu's approach is despite number of instructions and code size the register usage.
Four scratch registers only - four less than bswap and kindergarten.
That may outweight the occasional L1 misses.
The vc2005 generated 64-bit assembly as a reference.

PraduMagicBishopAttacks

Code: Select all

?PraduMagicBishopAttacks@@YA_K_KI@Z PROC
  00000	8b c2		 mov	 eax, edx
  00002	4c 8d 05 00 00
	00 00		 lea	 r8, OFFSET FLAT&#58;?prabish
  00009	48 c1 e0 05	 shl	 rax, 5
  0000d	4a 8b 54 00 08	 mov	 rdx, QWORD PTR &#91;rax+r8+8&#93;
  00012	48 23 d1	 and	 rdx, rcx
  00015	42 0f b6 4c 00
	18		 movzx	 ecx, BYTE PTR &#91;rax+r8+24&#93;
  0001b	4a 0f af 54 00
	10		 imul	 rdx, QWORD PTR &#91;rax+r8+16&#93;
  00021	4a 8b 04 00	 mov	 rax, QWORD PTR &#91;rax+r8&#93;
  00025	48 d3 ea	 shr	 rdx, cl
  00028	48 8b 04 d0	 mov	 rax, QWORD PTR &#91;rax+rdx*8&#93;
  0002c	c3		 ret	 0
?PraduMagicBishopAttacks@@YA_K_KI@Z ENDP
kindergartenBishopAttacks

Code: Select all

?kindergartenBishopAttacks@@YA_K_KI@Z PROC
  00000	40 53		 push	 rbx
  00002	44 8b d2	 mov	 r10d, edx
  00005	4c 8b c9	 mov	 r9, rcx
  00008	8b c2		 mov	 eax, edx
  0000a	48 c1 e0 05	 shl	 rax, 5
  0000e	48 8d 1d 00 00
	00 00		 lea	 rbx, OFFSET FLAT&#58;__ImageBase
  00015	41 83 e2 07	 and	 r10d, 7
  00019	4c 8b 84 18 08
	00 00 00	 mov	 r8, QWORD PTR ?smsk@@3PAUSMsk@@A&#91;rax+rbx+8&#93;
  00021	48 8b 94 18 10
	00 00 00	 mov	 rdx, QWORD PTR ?smsk@@3PAUSMsk@@A&#91;rax+rbx+16&#93;
  00029	49 bb 02 02 02
	02 02 02 02 02	 mov	 r11, 0202020202020202H
  00033	48 8b c2	 mov	 rax, rdx
  00036	48 23 c1	 and	 rax, rcx
  00039	49 8b c8	 mov	 rcx, r8
  0003c	49 23 c9	 and	 rcx, r9
  0003f	49 0f af c3	 imul	 rax, r11
  00043	49 0f af cb	 imul	 rcx, r11
  00047	48 c1 e8 3a	 shr	 rax, 58
  0004b	49 8d 04 c2	 lea	 rax, QWORD PTR &#91;r10+rax*8&#93;
  0004f	48 c1 e9 3a	 shr	 rcx, 58
  00053	48 8b 84 c3 00
	00 00 00	 mov	 rax, QWORD PTR ?fillUpAttacks@@3PAY07_KA&#91;rbx+rax*8&#93;
  0005b	48 23 c2	 and	 rax, rdx
  0005e	49 8d 0c ca	 lea	 rcx, QWORD PTR &#91;r10+rcx*8&#93;
  00062	48 8b 94 cb 00
	00 00 00	 mov	 rdx, QWORD PTR ?fillUpAttacks@@3PAY07_KA&#91;rbx+rcx*8&#93;
  0006a	49 23 d0	 and	 rdx, r8
  0006d	48 0b c2	 or	 rax, rdx
  00070	5b		 pop	 rbx
  00071	c3		 ret	 0
?kindergartenBishopAttacks@@YA_K_KI@Z ENDP
brianr
Posts: 536
Joined: Thu Mar 09, 2006 3:01 pm

Re: BitBoard Tests Magic v Non-Rotated 32 Bits v 64 Bits

Post by brianr »

Gerd Isenberg wrote:Hi Brian,

how fast is 64-bit with optimization off or a 64-bit debug version?

Strange the the 32-bit magic is that fast with VS2003 despite 64-bit imul calls with up to three 32-bit imuls. I would have expected something like 20-30 seconds for 64-bit mode if you do a lot of 64-bit stuff which is likely with bitboards - considering the doubled register width as well as number of registers. There seems something broken with 2008 beta like all optimization disabled. Would be interesting so see some generated assembly though or to do some profiling.

Also to consider - if you change some code + datastructures to compare, things interact with other stuff due the huge processor heuristics which may behave chaotical.

Can you explain how you do sliding attacks in none-rotated?

Gerd

Btw. sq xor 63 instead of 63-sq may safe a register.
I can't get Tinker to compile with debug or without optimization with the VS2008 Beta2.
Something about addressing exceptions right up in the default settings command parsing,
which is odd since Tinker almost never crashes and seems among the more stable engines.

However, I was able to test Buzz, which has much cleaner code.
The 64 bit VS2008 Beta2 speedup for a simple test of the starting position to 12 plys went from 37sec 297Knps to 14 sec and 759Knps.

I'm thinking one reason for the "minimal" Tinker results earlier are because all of the non-rotated BB code is still in there and
the only change with "magic" BBs is in the basic attack and movegen functions.
Profiling shows Tinker spends 11% in movegen and about 7% in attack.
Eval, search and SEE were not changed.

So, even if magic sped things up by say 50%, it looks like that would only be about a net 10% (half of 18%) at most overall impact,
without more drastic changes throughout.