DIRECT BITBOARD MOVEGENERATION

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: DIRECT BITBOARD MOVEGENERATION

Post by Gerd Isenberg »

Desperado wrote: - if i could(would) trust my magic-bitboards, now this version is also
competetive with the fast magics.
It might be competitive with magic bitboards in some loop or pure cycle test or even perft (like HQ). But it takes much more registers than magic. That's the big advantage of magic bitboards, beside short code, surrounding code of a real chess program can keep more locals in registers (even scratch register rax, rcx, rdx, r8..r11), which need otherwise to re-load from stack.
User avatar
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: DIRECT BITBOARD MOVEGENERATION

Post by Desperado »

The assembly for complete bishop attacks:

Code: Select all


	mov	QWORD PTR [rsp+8], rbx
	mov	QWORD PTR [rsp+16], rdi
	movzx	r9d, r8b
	lea	rdi, OFFSET FLAT:msk
	mov	r11, rcx
	mov	rbx, -9223372036854775808		; 8000000000000000H
	mov	rax, r9
	lea	r9, QWORD PTR [r9+r9*2]
	xor	rax, 56					; 00000038H
	lea	r10, QWORD PTR [rax+rax*2]
	mov	rax, QWORD PTR [rdi+r9*8+3088]
	mov	r8, QWORD PTR [rdi+r10*8+4624]
	and	rax, rcx
	and	r8, rdx
	or	r8, rbx
	mov	rcx, r8
	neg	rcx
	and	rcx, r8
	mov	r8, rax
	neg	r8
	bswap	rcx
	and	r8, rax
	mov	rax, QWORD PTR [rdi+r10*8+3088]
	and	rax, rdx
	add	r8, r8
	or	rax, rbx
	mov	rbx, QWORD PTR [rsp+8]
	sub	r8, rcx
	mov	rcx, QWORD PTR [rdi+r9*8+4624]
	and	r8, QWORD PTR [rdi+r9*8+3072]
	mov	rdx, rax
	neg	rdx
	and	rcx, r11
	and	rdx, rax
	mov	rax, rcx
	neg	rax
	bswap	rdx
	and	rax, rcx
	add	rax, rax
	sub	rax, rdx
	and	rax, QWORD PTR [rdi+r9*8+4608]
	mov	rdi, QWORD PTR [rsp+16]
	or	rax, r8
Seems a little bit longer,but i dont know much about interpreting the asm-instructions. Of course some micro-opt. can be done, and things like you
mentioned in some of the posts before, like UI_08 type for sq64...

And yes, ROD needs OD style for the rank...until a _single isolated bit_ can be reversed easily.(dont want to believe bswap is the optimal solution, maybe there nothing faster,but equal and more general !)

Would be happy for some remarks on the assembly.

Thx
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: DIRECT BITBOARD MOVEGENERATION

Post by Gerd Isenberg »

Desperado wrote:The assembly for complete bishop attacks:

Would be happy for some remarks on the assembly.
Not that optimal, register pressure is greater than expected, needs the caller save registers rbx and rdi, beside the vc scratch registers rax, rcx, rdx, r8..r11:

Code: Select all

   mov   QWORD PTR [rsp+8], rbx
   mov   QWORD PTR [rsp+16], rdi 
   ....
   mov   rbx, QWORD PTR [rsp+8] 
   ...
   mov   rdi, QWORD PTR [rsp+16] 
Because of the register pressure, both lines are not that scheduled in parallel, only a bit in the middle. A "pain" in 64-bit mode (as in my assembly) is the need for one register to address the global table.

Code: Select all

   lea   rdi, OFFSET FLAT:msk
The 64-bit constant load also takes one register, it seems:

Code: Select all

  mov   rbx, -9223372036854775808      ; 8000000000000000H 
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: DIRECT BITBOARD MOVEGENERATION

Post by mcostalba »

Desperado wrote: 2: I thought there is an easier way to reverse a _single bit_(isolated but not index avail.) bitboard.
What do you mean exactly ? Could you do some examples of what you need?

Thanks
Marco
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: DIRECT BITBOARD MOVEGENERATION

Post by mcostalba »

Be 2n the length of the word, 8 in our case so that n = 4.

Code: Select all

Our number is 2^x

a)  0100 0000   2^x


b)  0000 0100   (2^x >> n)  = 2^ (x-n)


c)  0010 0000   (2^(x-n))*(2^(x-n)) = 2^(2x-2n)


d) 0000 00010  (2^x) / 2^(2x-2n) = 2^ (x-2x+2n) = x^(2n-x)

If I don't have misunderstood (d) is what we are looking for.

So in code should be:

Code: Select all

b = lo;
b >>=32;
b *=b;
lo /=b;
User avatar
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: DIRECT BITBOARD MOVEGENERATION

Post by Desperado »

hi,

first some little changes (improvements).

Code: Select all

	 lo = msk[MSK_SWNE][sq64].flipped_hi & occ[1];	//occ[1] == bsw,doesnt need 2nd adress,saves xor
	 lo|=bR8;										          //tried bts -> was worse
	 lo&=-lo;										          //LSB
	 lo = _byteswap_uint64(lo);						    //lsb becomes msb of lo

	 hi = msk[MSK_SWNE][sq64].line_hi & occ[0];		//occ[0] == occ
	 hi&=-hi;

	 diag = msk[MSK_SWNE][sq64].line_ex & (2*hi-lo);

Hi Marco,

Code: Select all


Bitboard(f3):

00000000
00000000
00000000
00000000
00000000
00000100
00000000
00000000

Bitboard(c6) = Reverse(f3): flipped vertical&mirroredHorizontal

00000000
00000000
00100000
00000000
00000000
00000000
00000000
00000000

this is what i like to have for all 64 squares without
extracting the bitindex before.
If this can be done as cheap as vertical-flip(bswap),
i would use an incremental reversed bb instead of incremental flipped bb (bsw in the code)

The advantage would be, that also ranks can be computed that way.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: DIRECT BITBOARD MOVEGENERATION

Post by mcostalba »

Desperado wrote:hi,

first some little changes (improvements).

Code: Select all

	 lo = msk[MSK_SWNE][sq64].flipped_hi & occ[1];	//occ[1] == bsw,doesnt need 2nd adress,saves xor
	 lo|=bR8;										          //tried bts -> was worse
	 lo&=-lo;										          //LSB
	 lo = _byteswap_uint64(lo);						    //lsb becomes msb of lo

	 hi = msk[MSK_SWNE][sq64].line_hi & occ[0];		//occ[0] == occ
	 hi&=-hi;

	 diag = msk[MSK_SWNE][sq64].line_ex & (2*hi-lo);

Hi Marco,

Code: Select all


Bitboard(f3):

00000000
00000000
00000000
00000000
00000000
00000100
00000000
00000000

Bitboard(c6) = Reverse(f3): flipped vertical&mirroredHorizontal

00000000
00000000
00100000
00000000
00000000
00000000
00000000
00000000

this is what i like to have for all 64 squares without
extracting the bitindex before.
If this can be done as cheap as vertical-flip(bswap),
i would use an incremental reversed bb instead of incremental flipped bb (bsw in the code)

The advantage would be, that also ranks can be computed that way.
Are you looking for something like this (but applied to bitbord) ?

Code: Select all

inline Square flip_square(Square s) {
  return Square(int(s) ^ FlipMask);
}
User avatar
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: DIRECT BITBOARD MOVEGENERATION

Post by Desperado »

oh, you posted while i prepared my post.

And your last question: we don t know "s".(no index).
(yes applied to bb, so to say)
just the isolated bit(board) (like in the example i posted for f3)

(reverse version)

a1...h8
b1...g8
c1...f8
. . .

AND NOT

(bswap version)

a1...a8
b1...b8
c1...c8
. . .
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: DIRECT BITBOARD MOVEGENERATION

Post by mcostalba »

Desperado wrote:oh, you posted while i prepared my post.

And your last question: we don t know "s".(no index).
I would think we could do with a division:

Code: Select all

Bitboard(b):

00000000
00000000
00000000
00000000
00000000
00000100
00000000
00000000

Bitboard(center):

00000000
00000000
00000000
10000000
00000000
00000000
00000000
00000000


Bitboard(center/b):

00000000
00000000
00000000
00000000
00000000
00000000
00100000
00000000

Bitboard(center*center/b):

00000000
00000000
00100000
00000000
00000000
00000000
00000000
00000000
At the end formula should be

Code: Select all

flipped&#40;b&#41; = K/b;  // where K = &#40;1<<32&#41;^2
More or less...

Perhaps there is a faster way to divide for a power of 2 number.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: DIRECT BITBOARD MOVEGENERATION

Post by mcostalba »

mcostalba wrote: Perhaps there is a faster way to divide for a power of 2 number.
Perhaps you can use a bsf() function that is faster then bsr64() and it is enough the 32 bit version:

Code: Select all

fipped&#40;b&#41; = 1 << &#40;64 - bsf32&#40;unsigned&#40;b&#41;);