It might be competitive with magic bitboards in some loop or pure cycle test or even perft (like HQ). But it takes much more registers than magic. That's the big advantage of magic bitboards, beside short code, surrounding code of a real chess program can keep more locals in registers (even scratch register rax, rcx, rdx, r8..r11), which need otherwise to re-load from stack.Desperado wrote: - if i could(would) trust my magic-bitboards, now this version is also
competetive with the fast magics.
DIRECT BITBOARD MOVEGENERATION
Moderators: hgm, Rebel, chrisw
-
- Posts: 2251
- Joined: Wed Mar 08, 2006 8:47 pm
- Location: Hattingen, Germany
Re: DIRECT BITBOARD MOVEGENERATION
-
- Posts: 879
- Joined: Mon Dec 15, 2008 11:45 am
Re: DIRECT BITBOARD MOVEGENERATION
The assembly for complete bishop attacks:
Seems a little bit longer,but i dont know much about interpreting the asm-instructions. Of course some micro-opt. can be done, and things like you
mentioned in some of the posts before, like UI_08 type for sq64...
And yes, ROD needs OD style for the rank...until a _single isolated bit_ can be reversed easily.(dont want to believe bswap is the optimal solution, maybe there nothing faster,but equal and more general !)
Would be happy for some remarks on the assembly.
Thx
Code: Select all
mov QWORD PTR [rsp+8], rbx
mov QWORD PTR [rsp+16], rdi
movzx r9d, r8b
lea rdi, OFFSET FLAT:msk
mov r11, rcx
mov rbx, -9223372036854775808 ; 8000000000000000H
mov rax, r9
lea r9, QWORD PTR [r9+r9*2]
xor rax, 56 ; 00000038H
lea r10, QWORD PTR [rax+rax*2]
mov rax, QWORD PTR [rdi+r9*8+3088]
mov r8, QWORD PTR [rdi+r10*8+4624]
and rax, rcx
and r8, rdx
or r8, rbx
mov rcx, r8
neg rcx
and rcx, r8
mov r8, rax
neg r8
bswap rcx
and r8, rax
mov rax, QWORD PTR [rdi+r10*8+3088]
and rax, rdx
add r8, r8
or rax, rbx
mov rbx, QWORD PTR [rsp+8]
sub r8, rcx
mov rcx, QWORD PTR [rdi+r9*8+4624]
and r8, QWORD PTR [rdi+r9*8+3072]
mov rdx, rax
neg rdx
and rcx, r11
and rdx, rax
mov rax, rcx
neg rax
bswap rdx
and rax, rcx
add rax, rax
sub rax, rdx
and rax, QWORD PTR [rdi+r9*8+4608]
mov rdi, QWORD PTR [rsp+16]
or rax, r8
mentioned in some of the posts before, like UI_08 type for sq64...
And yes, ROD needs OD style for the rank...until a _single isolated bit_ can be reversed easily.(dont want to believe bswap is the optimal solution, maybe there nothing faster,but equal and more general !)
Would be happy for some remarks on the assembly.
Thx
-
- Posts: 2251
- Joined: Wed Mar 08, 2006 8:47 pm
- Location: Hattingen, Germany
Re: DIRECT BITBOARD MOVEGENERATION
Not that optimal, register pressure is greater than expected, needs the caller save registers rbx and rdi, beside the vc scratch registers rax, rcx, rdx, r8..r11:Desperado wrote:The assembly for complete bishop attacks:
Would be happy for some remarks on the assembly.
Code: Select all
mov QWORD PTR [rsp+8], rbx
mov QWORD PTR [rsp+16], rdi
....
mov rbx, QWORD PTR [rsp+8]
...
mov rdi, QWORD PTR [rsp+16]
Code: Select all
lea rdi, OFFSET FLAT:msk
Code: Select all
mov rbx, -9223372036854775808 ; 8000000000000000H
-
- Posts: 2684
- Joined: Sat Jun 14, 2008 9:17 pm
Re: DIRECT BITBOARD MOVEGENERATION
What do you mean exactly ? Could you do some examples of what you need?Desperado wrote: 2: I thought there is an easier way to reverse a _single bit_(isolated but not index avail.) bitboard.
Thanks
Marco
-
- Posts: 2684
- Joined: Sat Jun 14, 2008 9:17 pm
Re: DIRECT BITBOARD MOVEGENERATION
Be 2n the length of the word, 8 in our case so that n = 4.
If I don't have misunderstood (d) is what we are looking for.
So in code should be:
Code: Select all
Our number is 2^x
a) 0100 0000 2^x
b) 0000 0100 (2^x >> n) = 2^ (x-n)
c) 0010 0000 (2^(x-n))*(2^(x-n)) = 2^(2x-2n)
d) 0000 00010 (2^x) / 2^(2x-2n) = 2^ (x-2x+2n) = x^(2n-x)
If I don't have misunderstood (d) is what we are looking for.
So in code should be:
Code: Select all
b = lo;
b >>=32;
b *=b;
lo /=b;
-
- Posts: 879
- Joined: Mon Dec 15, 2008 11:45 am
Re: DIRECT BITBOARD MOVEGENERATION
hi,
first some little changes (improvements).
Hi Marco,
this is what i like to have for all 64 squares without
extracting the bitindex before.
If this can be done as cheap as vertical-flip(bswap),
i would use an incremental reversed bb instead of incremental flipped bb (bsw in the code)
The advantage would be, that also ranks can be computed that way.
first some little changes (improvements).
Code: Select all
lo = msk[MSK_SWNE][sq64].flipped_hi & occ[1]; //occ[1] == bsw,doesnt need 2nd adress,saves xor
lo|=bR8; //tried bts -> was worse
lo&=-lo; //LSB
lo = _byteswap_uint64(lo); //lsb becomes msb of lo
hi = msk[MSK_SWNE][sq64].line_hi & occ[0]; //occ[0] == occ
hi&=-hi;
diag = msk[MSK_SWNE][sq64].line_ex & (2*hi-lo);
Code: Select all
Bitboard(f3):
00000000
00000000
00000000
00000000
00000000
00000100
00000000
00000000
Bitboard(c6) = Reverse(f3): flipped vertical&mirroredHorizontal
00000000
00000000
00100000
00000000
00000000
00000000
00000000
00000000
extracting the bitindex before.
If this can be done as cheap as vertical-flip(bswap),
i would use an incremental reversed bb instead of incremental flipped bb (bsw in the code)
The advantage would be, that also ranks can be computed that way.
-
- Posts: 2684
- Joined: Sat Jun 14, 2008 9:17 pm
Re: DIRECT BITBOARD MOVEGENERATION
Are you looking for something like this (but applied to bitbord) ?Desperado wrote:hi,
first some little changes (improvements).
Hi Marco,Code: Select all
lo = msk[MSK_SWNE][sq64].flipped_hi & occ[1]; //occ[1] == bsw,doesnt need 2nd adress,saves xor lo|=bR8; //tried bts -> was worse lo&=-lo; //LSB lo = _byteswap_uint64(lo); //lsb becomes msb of lo hi = msk[MSK_SWNE][sq64].line_hi & occ[0]; //occ[0] == occ hi&=-hi; diag = msk[MSK_SWNE][sq64].line_ex & (2*hi-lo);
this is what i like to have for all 64 squares withoutCode: Select all
Bitboard(f3): 00000000 00000000 00000000 00000000 00000000 00000100 00000000 00000000 Bitboard(c6) = Reverse(f3): flipped vertical&mirroredHorizontal 00000000 00000000 00100000 00000000 00000000 00000000 00000000 00000000
extracting the bitindex before.
If this can be done as cheap as vertical-flip(bswap),
i would use an incremental reversed bb instead of incremental flipped bb (bsw in the code)
The advantage would be, that also ranks can be computed that way.
Code: Select all
inline Square flip_square(Square s) {
return Square(int(s) ^ FlipMask);
}
-
- Posts: 879
- Joined: Mon Dec 15, 2008 11:45 am
Re: DIRECT BITBOARD MOVEGENERATION
oh, you posted while i prepared my post.
And your last question: we don t know "s".(no index).
(yes applied to bb, so to say)
just the isolated bit(board) (like in the example i posted for f3)
(reverse version)
a1...h8
b1...g8
c1...f8
. . .
AND NOT
(bswap version)
a1...a8
b1...b8
c1...c8
. . .
And your last question: we don t know "s".(no index).
(yes applied to bb, so to say)
just the isolated bit(board) (like in the example i posted for f3)
(reverse version)
a1...h8
b1...g8
c1...f8
. . .
AND NOT
(bswap version)
a1...a8
b1...b8
c1...c8
. . .
-
- Posts: 2684
- Joined: Sat Jun 14, 2008 9:17 pm
Re: DIRECT BITBOARD MOVEGENERATION
I would think we could do with a division:Desperado wrote:oh, you posted while i prepared my post.
And your last question: we don t know "s".(no index).
Code: Select all
Bitboard(b):
00000000
00000000
00000000
00000000
00000000
00000100
00000000
00000000
Bitboard(center):
00000000
00000000
00000000
10000000
00000000
00000000
00000000
00000000
Bitboard(center/b):
00000000
00000000
00000000
00000000
00000000
00000000
00100000
00000000
Bitboard(center*center/b):
00000000
00000000
00100000
00000000
00000000
00000000
00000000
00000000
Code: Select all
flipped(b) = K/b; // where K = (1<<32)^2
Perhaps there is a faster way to divide for a power of 2 number.
-
- Posts: 2684
- Joined: Sat Jun 14, 2008 9:17 pm
Re: DIRECT BITBOARD MOVEGENERATION
Perhaps you can use a bsf() function that is faster then bsr64() and it is enough the 32 bit version:mcostalba wrote: Perhaps there is a faster way to divide for a power of 2 number.
Code: Select all
fipped(b) = 1 << (64 - bsf32(unsigned(b));