New simple and fast bitboard move generator

Discussion of chess software programming and technical issues.

Moderators: hgm, Harvey Williamson, bob

Tony

Re: New simple and fast bitboard move generator

Post by Tony » Wed May 09, 2007 6:10 am

Gerd Isenberg wrote:
Michael Sherwin wrote:Would this fix the problem?

The corner squares do not matter as far as occupancy goes, so just make sure the corner bits are set.

Also make sure the appropriate corner bit is set for the direction bits.

Now there will always be at least one set bit to find!

And it takes less instructions!!

Code: Select all

typedef struct { 
  u16 idx[64]; 
  u64 bitsN; 
  u64 bitsE; 
  u64 bitsS; 
  u64 bitsW; 
} square; 

u64 rookAttacks(u64 occ, square *sq) { 
    unsigned long ln, le, ls, lw; 
    occ |= fourCorners;
    _BitScanForward64(&ln, occ & sq->bitsN); 
    _BitScanForward64(&le, occ & sq->bitsE); 
    _BitScanReverse64(&ls, occ & sq->bitsS); 
    _BitScanReverse64(&lw, occ & sq->bitsW); 
    u32 idx = sq->idx[ln] | sq->idx[le] | sq->idx[ls] | sq->idx[lw]; 
    return rookLookup[idx]; 
Yes, very good - that would work!
Hmm, don't think so.

For east and west, one should use the A and H file. For N & S, 1st and 2nd rank.

It still would go wrong if the rook is on the border squares ? ie N attacks for B8 ?

Cheers,

Tony

Michael Sherwin
Posts: 2798
Joined: Fri May 26, 2006 1:00 am
Location: OH, USA

Re: New simple and fast bitboard move generator

Post by Michael Sherwin » Wed May 09, 2007 7:47 am

Tony wrote:
Gerd Isenberg wrote:
Michael Sherwin wrote:Would this fix the problem?

The corner squares do not matter as far as occupancy goes, so just make sure the corner bits are set.

Also make sure the appropriate corner bit is set for the direction bits.

Now there will always be at least one set bit to find!

And it takes less instructions!!

Code: Select all

typedef struct { 
  u16 idx[64]; 
  u64 bitsN; 
  u64 bitsE; 
  u64 bitsS; 
  u64 bitsW; 
} square; 

u64 rookAttacks(u64 occ, square *sq) { 
    unsigned long ln, le, ls, lw; 
    occ |= fourCorners;
    _BitScanForward64(&ln, occ & sq->bitsN); 
    _BitScanForward64(&le, occ & sq->bitsE); 
    _BitScanReverse64(&ls, occ & sq->bitsS); 
    _BitScanReverse64(&lw, occ & sq->bitsW); 
    u32 idx = sq->idx[ln] | sq->idx[le] | sq->idx[ls] | sq->idx[lw]; 
    return rookLookup[idx]; 
Yes, very good - that would work!
Hmm, don't think so.

For east and west, one should use the A and H file. For N & S, 1st and 2nd rank.

It still would go wrong if the rook is on the border squares ? ie N attacks for B8 ?

Cheers,

Tony
I think that it would be okay.

A rook on B8 would have the north direction bit for H8 set and that corner bit is set, so the bitscan would find only that bit for north. Then in the index lookup for bit 63 would be a 0 (no contribution towards the index).

A rook on H8 would also have bit 63 set in the north direction and once again that index lookup will be a 0.

In actuality only 'twoCorners are needed, bit 0 and bit 63, but fourCorners sounded better!

Further, a rook on A1 would then have its S and W directions 0 bit set and idx[0] returns a 0. And for N and E bit 63 is set and idx[63] always returns a 0 (if it gets that far with out first finding a set bit).

Please correct me if I am wrong! 8-)
Regards,
Mike

Tony

Re: New simple and fast bitboard move generator

Post by Tony » Wed May 09, 2007 8:24 am

Michael Sherwin wrote:
Tony wrote:
Gerd Isenberg wrote:
Michael Sherwin wrote:Would this fix the problem?

The corner squares do not matter as far as occupancy goes, so just make sure the corner bits are set.

Also make sure the appropriate corner bit is set for the direction bits.

Now there will always be at least one set bit to find!

And it takes less instructions!!

Code: Select all

typedef struct { 
  u16 idx[64]; 
  u64 bitsN; 
  u64 bitsE; 
  u64 bitsS; 
  u64 bitsW; 
} square; 

u64 rookAttacks(u64 occ, square *sq) { 
    unsigned long ln, le, ls, lw; 
    occ |= fourCorners;
    _BitScanForward64(&ln, occ & sq->bitsN); 
    _BitScanForward64(&le, occ & sq->bitsE); 
    _BitScanReverse64(&ls, occ & sq->bitsS); 
    _BitScanReverse64(&lw, occ & sq->bitsW); 
    u32 idx = sq->idx[ln] | sq->idx[le] | sq->idx[ls] | sq->idx[lw]; 
    return rookLookup[idx]; 
Yes, very good - that would work!
Hmm, don't think so.

For east and west, one should use the A and H file. For N & S, 1st and 2nd rank.

It still would go wrong if the rook is on the border squares ? ie N attacks for B8 ?

Cheers,

Tony
I think that it would be okay.

A rook on B8 would have the north direction bit for H8 set and that corner bit is set, so the bitscan would find only that bit for north. Then in the index lookup for bit 63 would be a 0 (no contribution towards the index).

A rook on H8 would also have bit 63 set in the north direction and once again that index lookup will be a 0.

In actuality only 'twoCorners are needed, bit 0 and bit 63, but fourCorners sounded better!

Further, a rook on A1 would then have its S and W directions 0 bit set and idx[0] returns a 0. And for N and E bit 63 is set and idx[63] always returns a 0 (if it gets that far with out first finding a set bit).

Please correct me if I am wrong! 8-)
Ah, ok. I didn't get that the "& sq->bitsN" for B8 would leave the corner squares set.

Tony

Gerd Isenberg
Posts: 2103
Joined: Wed Mar 08, 2006 7:47 pm
Location: Hattingen, Germany

Re: New simple and fast bitboard move generator

Post by Gerd Isenberg » Wed May 09, 2007 7:31 pm

This source with vc2005 results in assembly which looks closest to the hand made one. Other versions are either more sequential or use more than the seven scratch registers, so that it has to save/restore caller safe registers (rbx) on the stack...

Code: Select all

#pragma intrinsic(_BitScanForward64)
#pragma intrinsic(_BitScanReverse64)

typedef struct
{
    u64 bitsN;
    u64 bitsE;
    u64 bitsS;
    u64 bitsW;
    unsigned short index[64];
} SQUARE;

u64 rookAttacks(u64 occ, SQUARE *sq) {
    u64 n, e, s, w;
    unsigned long ln, le;
    occ |= 0x8100000000000081;
    n = occ & sq->bitsN;
    e = occ & sq->bitsE;
    s = occ & sq->bitsS;
    w = occ & sq->bitsW;
    _BitScanForward64(&ln, n);
    _BitScanForward64(&le, e);
    _BitScanReverse64((unsigned long*)&s, s);
    _BitScanReverse64((unsigned long*)&w, w);
    u32 idx = sq->index[ln] | sq->index[le] | sq->index[s] | sq->index[w];
    return rookLookup[idx];
}


occ$ = 8
sq$ = 16
?rookAttacks@@YA_K_KPEAUSQUARE@@@Z PROC
  00000	4c 8b 52 10	 mov	 r10, QWORD PTR [rdx+16]
  00004	4c 8b 42 18	 mov	 r8, QWORD PTR [rdx+24]
  00008	48 b8 81 00 00
	00 00 00 00 81	 mov	 rax, 8100000000000081H
  00012	48 0b c8	 or	 rcx, rax
  00015	48 8b 02	 mov	 rax, QWORD PTR [rdx]
  00018	4c 8b da	 mov	 r11, rdx
  0001b	48 23 c1	 and	 rax, rcx
  0001e	4c 23 d1	 and	 r10, rcx
  00021	4c 23 c1	 and	 r8, rcx
  00024	4c 0f bc c8	 bsf	 r9, rax
  00028	48 8b 42 08	 mov	 rax, QWORD PTR [rdx+8]
  0002c	4d 0f bd c0	 bsr	 r8, r8
  00030	48 23 c1	 and	 rax, rcx
  00033	4d 0f bd d2	 bsr	 r10, r10
  00037	48 0f bc c8	 bsf	 rcx, rax
  0003b	0f b7 44 4a 20	 movzx	 eax, WORD PTR [rdx+rcx*2+32]
  00040	43 0f b7 4c 43
	20		 movzx	 ecx, WORD PTR [r11+r8*2+32]
  00046	42 0f b7 54 4a
	20		 movzx	 edx, WORD PTR [rdx+r9*2+32]
  0004c	48 0b c2	 or	 rax, rdx
  0004f	48 0b c1	 or	 rax, rcx
  00052	43 0f b7 4c 53
	20		 movzx	 ecx, WORD PTR [r11+r10*2+32]
  00058	48 0b c1	 or	 rax, rcx
  0005b	48 8d 0d 00 00
	00 00		 lea	 rcx, OFFSET FLAT:?rookLookup
  00062	48 8b 04 c1	 mov	 rax, QWORD PTR [rcx+rax*8]
  00066	c3		 ret	 0
?rookAttacks@@YA_K_KPEAUSQUARE@@@Z ENDP
; Function compile flags: /Ogtpy
This is the additional overhead to build the sq*:

Code: Select all

SQUARE squares[64];

u64 rookAttacks(u64 occ, u32 sq) {
    return rookAttacks(occ, squares+sq);
}

occ$ = 8
sq$ = 16
?rookAttacks@@YA_K_KI@Z PROC				; rookAttacks, COMDAT
  00000	8b c2		 mov	 eax, edx
  00002	4c 8d 14 80	 lea	 r10, QWORD PTR [rax+rax*4]
  00006	48 8d 05 00 00
	00 00		 lea	 rax, OFFSET FLAT:?squares
  0000d	49 c1 e2 05	 shl	 r10, 5
  00011	4c 03 d0	 add	 r10, rax
  00014	48 b8 81 00 00
	00 00 00 00 81	 mov	 rax, 8100000000000081H
  0001e	4d 8b 42 18	 mov	 r8, QWORD PTR [r10+24]
  00022	4d 8b 4a 10	 mov	 r9, QWORD PTR [r10+16]
  00026	48 0b c8	 or	 rcx, rax
  00029	49 8b 02	 mov	 rax, QWORD PTR [r10]
  0002c	4c 23 c1	 and	 r8, rcx
  0002f	4c 23 c9	 and	 r9, rcx
  00032	48 23 c1	 and	 rax, rcx
  00035	4d 0f bd c0	 bsr	 r8, r8
  00039	4d 0f bd c9	 bsr	 r9, r9
  0003d	48 0f bc d0	 bsf	 rdx, rax
  00041	49 8b 42 08	 mov	 rax, QWORD PTR [r10+8]
  00045	41 0f b7 54 52
	20		 movzx	 edx, WORD PTR [r10+rdx*2+32]
  0004b	48 23 c1	 and	 rax, rcx
  0004e	48 0f bc c8	 bsf	 rcx, rax
  00052	41 0f b7 44 4a
	20		 movzx	 eax, WORD PTR [r10+rcx*2+32]
  00058	43 0f b7 4c 42
	20		 movzx	 ecx, WORD PTR [r10+r8*2+32]
  0005e	48 0b c2	 or	 rax, rdx
  00061	48 0b c1	 or	 rax, rcx
  00064	43 0f b7 4c 4a
	20		 movzx	 ecx, WORD PTR [r10+r9*2+32]
  0006a	48 0b c1	 or	 rax, rcx
  0006d	48 8d 0d 00 00
	00 00		 lea	 rcx, OFFSET FLAT:?rookLookup
  00074	48 8b 04 c1	 mov	 rax, QWORD PTR [rcx+rax*8]
  00078	c3		 ret	 0
?rookAttacks@@YA_K_KI@Z ENDP

Michael Sherwin
Posts: 2798
Joined: Fri May 26, 2006 1:00 am
Location: OH, USA

Re: New simple and fast bitboard move generator

Post by Michael Sherwin » Wed May 09, 2007 11:30 pm

Thanks Gerd,

:D

Mike
Regards,
Mike

Post Reply