Fast rook mobility counting

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Fast rook mobility counting

Post by mcostalba »

This is a way to quickly count rook mobility bitboard also in case of taboo squares.

Suppose you have following rook mobility bitboard

Code: Select all

   abcdefgh
8  00100000
7  00100000
6  01X11000
5  00100000
4  00100000
3  00100000
2  00000000
1  00000000

Rook is on square c6, so

Rank rk = 5 and file fl = 2 (starting from 0)

The counting is split in two operation, the first is counting the horizontal ones, this can be done with a fast lookup in a 8bit table:

Code: Select all

BitCount8Bit[(b >> rk*8) & 0xFF]);
The second is counting the vertical line, this can be done with a multiply, with the same logic of the standard counting function in C version.

Code: Select all

const uint64 FileA = 0x0101010101010101ULL;

(((b >> fl) & FileA) * FileA) >> 56;
First shift the vertical line to the first file, then mask with first file to remove horizontal line then multiply by vertical first line so that the 'ones' are all summed starting from bit 56 on. Finally shift down of 56.

So the complete counting function is

Code: Select all

const uint64 FileA = 0x0101010101010101ULL;

int rook_mob_count(uint64 b, int rk, int fl)
{
    return BitCount8Bit[(b >> rk*8) & 0xFF]) + (((b >> fl) & FileA) * FileA) >> 56;
}
Pradu
Posts: 287
Joined: Sat Mar 11, 2006 3:19 am
Location: Atlanta, GA

Re: Fast rook mobility counting

Post by Pradu »

Nice. For all bishop squares you can for sure do it with two multiplys an add, and a rightshift using the same technique. For many bishop squares you can for sure do it with just one multiply and a right shift.

It is possible to do it for both rooks and bishops with one multiply and a right-shift with a lookup but I'm not sure exactly how big or small the lookups can get:

*(lookup_base[sq] + occ*magic[sq]>>someshift[sq])

Hopefully, because we are trying to do popcounts, they can get significantly smaller than move-bitboard databases.
Pradu
Posts: 287
Joined: Sat Mar 11, 2006 3:19 am
Location: Atlanta, GA

Re: Fast rook mobility counting

Post by Pradu »

Pradu wrote:Hopefully, because we are trying to do popcounts, they can get significantly smaller than move-bitboard databases.
Well, it's for sure going to be significantly smaller. For each diagonal/anti-diagonal and file you can for sure reduce it to a minimal hash by making every bit shift to the same index. It is not clear how much worse coupling them would make it.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Fast rook mobility counting

Post by mcostalba »

For a diagonal

Code: Select all

   abcdefgh
8  00000000
7  10000000
6  01000000
5  00x00000
4  00010000
3  00001000
2  00000100
1  00000010

Bishop is in c5 so rk = 4 and fl = 2
We can sum all the ones starting from square a7 (that is bit 48)

Code: Select all

const uint64 Diag = 0x0810204081020481ULL;

(b * Diag) >> 48;

Diag is a number with '1' spaced by 7 bits instead of 8:

Diag = .....10000001000000100000010000001
For a diagonal in the direction a1-h8 instead we need two use a number with the ones spaced by 9 bits.

Code: Select all

const uint64 ADiag = 0x8040201008040201ULL;

(b * ADiag) >> shift[sq];

ADiag is a number with '1' spaced by 9 bits

ADiag = .....100000000100000000100000001
It is interesting to note that Diag and ADiag does NOT depend on the square, only the shift depends on the square.

It would be very interesting to find a number BothDiag so that we sum all ones in 1 go, the diagonal summing on a bit and the anti-diagonal on a different bit so that for a full bishop attack it becomes

Code: Select all

const uint64 BothDiag = ?????????ULL;

int bishop_mob_count(uint64 b, int sq) 
{
    uint64 bb = b * BothDiag;

    return (bb >> shiftD[sq]) + (bb >> shiftAD[sq]);
}
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Fast rook mobility counting

Post by Gerd Isenberg »

Not bad, the ranks need a lookup. Other lines have zero gaps large enough to accumulate bytes with one mul. Tracing the critical path in this optimized assembly, about 12 .. 13 cycles I guess.

Code: Select all

; in:  ecx (cl) square
;      rdx b
; out: eax orthogonal popcount of b
mov   ch, cl
mov   rax, 0101010101010101H
and   ecx, 0x3807  ; swar mask ch = rkx8, cl = fl
mov   r08, rdx
shr   rdx, cl ; (b >> fl)
mov   cl, ch
shr   r08, cl ; (b >> rkx8)
and   rdx, rax
and   r08, ffh
imul  rax, rdx
lea   rdx, BitCount8Bit
shr   rax, 56
movzx edx, byteptr [rdx + r08]
add   eax, edx
ret   0
Take care that you always pass the right bitboards to your specialized popcounts ;-)
Pradu
Posts: 287
Joined: Sat Mar 11, 2006 3:19 am
Location: Atlanta, GA

Re: Fast rook mobility counting

Post by Pradu »

mcostalba wrote:It would be very interesting to find a number BothDiag so that we sum all ones in 1 go, the diagonal summing on a bit and the anti-diagonal on a different bit so that for a full bishop attack it becomes

Code: Select all

const uint64 BothDiag = ?????????ULL;

int bishop_mob_count(uint64 b, int sq) 
{
    uint64 bb = b * BothDiag;

    return (bb >> shiftD[sq]) + (bb >> shiftAD[sq]);
}
This may not be possible for squares where the bits are too close to each other. However you would be able to find one that works for both for some of the edges (say bishop on A4 or something) and maybe some others where the bits are relatively far apart. For these squares you could generate BothDiag[sq] like this:

Code: Select all

lowest_bit_in_popcnt = LBIP
LBIP = 1 << shift&#91;sq&#93;

BothDiag=0
for each possible 1 bit in the bitboard &#40;the_1_bit&#41;&#58;
BothDiag|=&#40;LBIP/the_1_bit&#41;
The popcnt function would be:

Code: Select all

&#40;bitboard*BothDiag&#91;sq&#93;)>>shift&#91;sq&#93;
However the following is possible for all squares:

Code: Select all

 *&#40;base_address&#91;sq&#93;+&#40;bitboard*BothDiag&#91;sq&#93; >> shift&#91;sq&#93;)) 
Pradu
Posts: 287
Joined: Sat Mar 11, 2006 3:19 am
Location: Atlanta, GA

Re: Fast rook mobility counting

Post by Pradu »

I misread yoru post, I just read your posted code... let me think about it.

Code: Select all

const uint64 BothDiag = ?????????ULL;

int bishop_mob_count&#40;uint64 b, int sq&#41; 
&#123;
    uint64 bb = b * BothDiag;

    return &#40;bb >> shiftD&#91;sq&#93;) + &#40;bb >> shiftAD&#91;sq&#93;);
&#125;
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Fast rook mobility counting

Post by mcostalba »

We have a problem with bishops but also with rooks regarding the MSB of the 64 bit bitboard.

The MSB could be too high that we cannot accumulate on that, as example the a1-h8 diagonal

Code: Select all

    abcdefgh
7  00000001
6  00000010
5  00000100
4  0000x000
3  00010000
2  00100000
1  01000000
0  10000000

Piece is in d4 so sq = 36
Here we cannot accumulate on h8 because we go in overflow. A general solution that works also with rooks vertical files too near the H file is the following:

Code: Select all

int count&#40;uint64 b, int sq&#41; &#123;

   return (&#40;b & Mask&#91;sq&#93;) * Acc&#41; >> Shift&#91;sq&#93; + &#40;b >> shiftMSB&#91;sq&#93;);

&#125;
Here Mask[36] is a bitboard with ones on the diagonal a1-h8 BUT the h8 square so to clear the MSB bit, we will accumulate on bit 54 (square g6) instead. And of course Shift[36] == 54

The last term of the formula shifts down the MSB bit, in our case shiftMSB[36] == 63, and so count is incremented if the cleared bit was set. Note that we don't need any mask on the last term because the big shift already removes all the bits but the MSB.

Note also that we don't shift anymore the bitboard in first term but we can mask directly "in place". The Acc constant are indipendent from the diagonal position, but depends only on orientation.

The same formula applies for files, diagonals and antidiagonals so that we can unify in a template:

Code: Select all

enum Direction &#123;
  A1_A8,
  A1_H8,
  A8_H1,
  A1_H1 // Horizontal
&#125;;

template<Direction D>
inline int count&#40;uint64_t b, int sq&#41; &#123;

  const uint64_t // evaluated at compile time
  Acc = &#40;D == A1_A8 ? 0x0101010101010101ULL &#58;
         D == A1_H8 ? 0x8040201008040201ULL &#58; 0x0810204081020481ULL&#41;;

  return (&#40;b & Mask&#91;D&#93;&#91;sq&#93;) * Acc&#41; >> Shift&#91;D&#93;&#91;sq&#93; + &#40;b >> shiftMSB&#91;D&#93;&#91;sq&#93;);

&#125;

// Template specialization for the case of horizontal line
template<>
inline int count<A1_H1>&#40;uint64_t b, int sq&#41; &#123;

  int rk = sq >> 3; // as per square definition in Glaurung
  return BitCount8Bit&#91;&#40;b >> rk*8&#41; & 0xFF&#93;;  // rk*8 could be further optimized, see the above instruction
&#125;

It is of course trivial the sequence of functions to call to get bitcount for bishops, rooks and even queens.

Code: Select all

// Count rook mobility
int r_mob = count<A1_H1>&#40;b, sq&#41; + count<A1_A8>&#40;b, sq&#41;;

// Count bishop mobility
int b_mob = count<A1_H8>&#40;b, sq&#41; + count<A8_H1>&#40;b, sq&#41;;
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Fast rook mobility counting

Post by mcostalba »

In case you don't like 3 lookups we can do with 2

Code: Select all

template<Direction D>
inline int count&#40;uint64_t b, int sq&#41; &#123;

  const uint64_t // evaluated at compile time
  Acc = &#40;D == A1_A8 ? 0x0101010101010101ULL &#58;
         D == A1_H8 ? 0x8040201008040201ULL &#58; 0x0810204081020481ULL&#41;;

  b &= Mask&#91;D&#93;&#91;sq&#93;;
  return (&#40;b * Acc&#41; >> Shift&#91;D&#93;&#91;sq&#93;) & 0xFF + &#40;b >= &#40;1 << 54&#41;);

&#125;
In this case Mask[D][sq] has all the bits set, also the one on the last rank. The point is we don't care if we have an overflow in h8 because we accumulate on g6 as seen before and shift down from there. But we need to preserve the bit on the last rank if it is set so that comparison on the second term is true and we add it afterwords.