ChessUSA.com TalkChess.com
Hosted by Your Move Chess & Games
 
 FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 

New simple and fast bitboard move generator
Post new topic    TalkChess.com Forum Index -> Computer Chess Club: Programming and Technical Discussions Flat
View previous topic :: View next topic  
Author Message
Gerd Isenberg



Joined: 08 Mar 2006
Posts: 1941
Location: Hattingen, Germany

PostPost subject: Re: New simple and fast bitboard move generator    Posted: Sun May 06, 2007 8:52 am Reply to topic Reply with quote

Assuming bsf/bsr doesn't change the target register if source is zero, and zero is never a found index here, a pre-initialization of the target-register is not necessary if it is already the source. To use only one cacheline for the occ-ands instead of four, i suggest to pack four bitboards into a struct (quad-bitboard) and to have an array of 64 quad-bitboards. Is your index unique over all squares or a char-range only per square? I tried it on the fly with 16-bit short indices, to get some assembly to inspect.

Code:
u64 bits[64][4];
unsigned short indexN[64][64];
unsigned short indexE[64][64];
unsigned short indexS[64][64];
unsigned short indexW[64][64];
u64 rookLookup[...];

__forceinline
u64 rookAttacks(u64 occ, u32 sq) {
    u64 n = occ & bits[sq][0];
    u64 e = occ & bits[sq][1];
    u64 s = occ & bits[sq][2];
    u64 w = occ & bits[sq][3];
    _BitScanForward64((unsigned long*)&n, n);
    _BitScanForward64((unsigned long*)&e, e);
    _BitScanReverse64((unsigned long*)&s, s);
    _BitScanReverse64((unsigned long*)&w, w);
    u32 idx = indexN[sq][n] | indexE[sq][e] | indexS[sq][s] | indexW[sq][w];
    return rookLookup[idx];
}


occ$ = 8
sq$ = 16
?rookAttacks@@YA_K_KI@Z PROC
  00000   4c 8d 1d 00 00
   00 00       lea    r11, OFFSET FLAT:__ImageBase
  00007   4c 8b c9    mov    r9, rcx
  0000a   8b c2       mov    eax, edx
  0000c   48 c1 e0 05    shl    rax, 5
  00010   44 8b d2    mov    r10d, edx
  00013   4e 8b 84 18 00
   00 00 00    mov    r8, QWORD PTR ?bits[rax+r11]
  0001b   4a 8b 94 18 08
   00 00 00    mov    rdx, QWORD PTR ?bits[rax+r11+8]
  00023   49 c1 e2 06    shl    r10, 6
  00027   48 23 d1    and    rdx, rcx
  0002a   4c 23 c1    and    r8, rcx
  0002d   4a 8b 8c 18 10
   00 00 00    mov    rcx, QWORD PTR ?bits[rax+r11+16]
  00035   4a 8b 84 18 18
   00 00 00    mov    rax, QWORD PTR ?bits[rax+r11+24]
  0003d   49 23 c9    and    rcx, r9
  00040   49 23 c1    and    rax, r9
  00043   48 0f bd c9    bsr    rcx, rcx
  00047   4d 0f bc c8    bsf    r9, r8
  0004b   4c 0f bc c2    bsf    r8, rdx
  0004f   48 0f bd c0    bsr    rax, rax
  00053   49 03 c2    add    rax, r10
  00056   41 0f b7 94 43
   00 00 00 00    movzx    edx, WORD PTR ?indexW[r11+rax*2]
  0005f   49 8d 04 0a    lea    rax, QWORD PTR [r10+rcx]
  00063   4b 8d 0c 02    lea    rcx, QWORD PTR [r10+r8]
  00067   41 0f b7 8c 4b
   00 00 00 00    movzx    ecx, WORD PTR ?indexE[r11+rcx*2]
  00070   41 0f b7 84 43
   00 00 00 00    movzx    eax, WORD PTR ?indexS[r11+rax*2]
  00079   48 0b d0    or    rdx, rax
  0007c   48 0b d1    or    rdx, rcx
  0007f   4b 8d 0c 0a    lea    rcx, QWORD PTR [r10+r9]
  00083   41 0f b7 8c 4b
   00 00 00 00    movzx    ecx, WORD PTR ?indexN[r11+rcx*2]
  0008c   48 0b d1    or    rdx, rcx
  0008f   49 8b 84 d3 00
   00 00 00    mov    rax, QWORD PTR ?rookLookup[r11+rdx*8]
  00097   c3       ret    0
?rookAttacks@@YA_K_KI@Z ENDP


Quite modest register usage and the independent instruction chains until the middle of the routine. May be it is improvable - at least code size wise in writing assembly. But again, so far you need to transer six cachelines by nine memory reads, instaed of two in kindergarten-bitboards out of 4.5 KByte. It would be nice to combine the four index-arrays to one, since a found 6-inner blocking square, already implies the attack direction as well. But in the case of no blocking square is found by bsf/bsr, one needs the direction.

For comparison - kindergarten rook-attacks. Shifts by cl seems the bottleneck to probably perform better ipc - but one has to consider shadow-registers and OOOE. The one-dimensional firstRankAttacks2-array saves one C-instruction with explicit *4 scaling instead of implicit *8. The resulting code was shorter and even saved a register with vc2005. Unfortunately the similar trick doesn't work to save the shr 3 with a one-dimensional aFileAttacks[8*64] array.
Code:

extern u64 CACHE_ALIGN aFileAttacks[64][8];     // 4 KByte
extern u8  CACHE_ALIGN firstRankAttacks2[64*8]; // 0.5 KByte


u64 rookAttacks(u64 occ, u32 sq) {
    u32 k = sq &  7;
    u32 i = sq & 56;
    u32 n = (u32)(occ >> i)    & 2*63;
    u64 d = firstRankAttacks2[4*n+k]; // [n/2][k]
    u64 e = 0x0101010101010101 & (occ >> k);
    u64 r = 0x0080402010080400 *  e   >> 58;
    u64 garten  =   aFileAttacks[r][i >> 3];
    return       (  d << i) | (garten << k);   
}

occ$ = 8
sq$ = 16
?rookAttacks@@YA_K_KI@Z PROC
  00000   44 8b c2    mov    r8d, edx
  00003   83 e2 38    and    edx, 56
  00006   4c 8b d1    mov    r10, rcx
  00009   44 8b ca    mov    r9d, edx
  0000c   49 8b d2    mov    rdx, r10
  0000f   41 83 e0 07    and    r8d, 7
  00013   41 8b c8    mov    ecx, r8d
  00016   48 b8 01 01 01
   01 01 01 01 01    mov    rax, 0101010101010101H
  00020   4c 8d 1d 00 00
   00 00       lea    r11, OFFSET FLAT:__ImageBase
  00027   48 d3 ea    shr    rdx, cl
  0002a   48 23 d0    and    rdx, rax
  0002d   48 b8 00 04 08
   10 20 40 80 00    mov    rax, 0080402010080400H
  00037   48 0f af d0    imul    rdx, rax
  0003b   49 8b c1    mov    rax, r9
  0003e   48 c1 e8 03    shr    rax, 3
  00042   48 c1 ea 3a    shr    rdx, 58
  00046   48 8d 14 d0    lea    rdx, QWORD PTR [rax+rdx*8]
  0004a   49 8b 84 d3 00
   00 00 00    mov    rax, QWORD PTR ?aFileAttacks[r11+rdx*8]
  00052   48 d3 e0    shl    rax, cl
  00055   49 8b c9    mov    rcx, r9
  00058   49 d3 ea    shr    r10, cl
  0005b   41 83 e2 7e    and    r10d, 126
  0005f   43 8d 14 90    lea    edx, DWORD PTR [r8+r10*4]
  00063   42 0f b6 94 1a
   00 00 00 00    movzx    edx, BYTE PTR ?firstRankAttacks2[rdx+r11]
  0006c   48 d3 e2    shl    rdx, cl
  0006f   48 0b c2    or    rax, rdx
  00072   c3       ret    0

Gerd
Back to top
View user's profile Send private message
Display posts from previous:   
Subject Author Date/Time
New simple and fast bitboard move generator Michael Sherwin Sat May 05, 2007 3:52 pm
      Re: New simple and fast bitboard move generator Gerd Isenberg Sat May 05, 2007 5:43 pm
            Re: New simple and fast bitboard move generator Gerd Isenberg Sun May 06, 2007 8:52 am
                  Re: New simple and fast bitboard move generator Michael Sherwin Sun May 06, 2007 10:31 am
                        Re: New simple and fast bitboard move generator Gerd Isenberg Sun May 06, 2007 11:16 am
                              Re: New simple and fast bitboard move generator Michael Sherwin Sun May 06, 2007 1:21 pm
                                    Re: New simple and fast bitboard move generator Gerd Isenberg Sun May 06, 2007 1:52 pm
                                          Re: New simple and fast bitboard move generator Michael Sherwin Sun May 06, 2007 3:15 pm
                              Re: New simple and fast bitboard move generator Michael Sherwin Sun May 06, 2007 10:29 pm
                                    Re: New simple and fast bitboard move generator Gerd Isenberg Mon May 07, 2007 8:40 am
                                          Re: New simple and fast bitboard move generator Michael Sherwin Mon May 07, 2007 11:06 am
                                          Re: New simple and fast bitboard move generator Gerd Isenberg Mon May 07, 2007 8:08 pm
                                                Re: New simple and fast bitboard move generator Gerd Isenberg Mon May 07, 2007 11:22 pm
                                                      Re: New simple and fast bitboard move generator Michael Sherwin Tue May 08, 2007 3:18 am
                                                            Re: New simple and fast bitboard move generator Gerd Isenberg Tue May 08, 2007 6:11 am
                                                Re: New simple and fast bitboard move generator Michael Sherwin Tue May 08, 2007 9:25 pm
                                                      Re: New simple and fast bitboard move generator Michael Sherwin Tue May 08, 2007 10:55 pm
                                                      Re: New simple and fast bitboard move generator Gerd Isenberg Wed May 09, 2007 3:18 am
                                                            Re: New simple and fast bitboard move generator Tony Wed May 09, 2007 6:10 am
                                                                  Re: New simple and fast bitboard move generator Michael Sherwin Wed May 09, 2007 7:47 am
                                                                        Re: New simple and fast bitboard move generator Tony Wed May 09, 2007 8:24 am
                                                      Re: New simple and fast bitboard move generator Gerd Isenberg Wed May 09, 2007 7:31 pm
                                                            Re: New simple and fast bitboard move generator Michael Sherwin Wed May 09, 2007 11:30 pm
            Re: New simple and fast bitboard move generator Tony Mon May 07, 2007 6:46 am
                  Re: New simple and fast bitboard move generator Gerd Isenberg Mon May 07, 2007 8:51 am
Post new topic    TalkChess.com Forum Index -> Computer Chess Club: Programming and Technical Discussions

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum




Powered by phpBB © 2001, 2005 phpBB Group
Enhanced with Moby Threads