DIRECT BITBOARD MOVEGENERATION

Gerd Isenberg · Post by **Gerd Isenberg** » Sat Jul 25, 2009 9:17 pm

Gerd Isenberg wrote: For intels with fast bsr but slower bswap, this approach might be not that bad as I initially thought. Alternatively one can save 5 ops by upper/lower lookups from the very same cacheline to end up with 9 ops per line (file, rank, dia, antidia) and two independent instruction chains until the final lea/and:
Code: Select all
U64 lineAttacks&#40;U64 occ, enumLine line, enumSquare sq&#41; &#123;
   U64 low, high;
   low  = smsk&#91;sq&#93;&#91;line&#93;.lower & occ;
   high = smsk&#91;sq&#93;&#91;line&#93;.upper & occ;
   low  = -C64&#40;1&#41; << bsr64&#40;low|1&#41;; // ms1b of low &#40;at least bit zero&#41; -> -low
   high = high & -high;            // ls1b of high &#40;if any&#41;
   return smsk&#91;sq&#93;&#91;line&#93;.maskEx & &#40;2*high+low&#41;; // lea option due to add -low
&#125; 

It also looks quite register friendly. 3 regs per line, 6 KByte memory needed for all lines ...

Code: Select all

; rcx - occ
; rdx - pointer to mask structure&#91;sq&#93;&#91;line&#93; &#40;not changed&#41;
mov   rax, rcx
and   rcx, &#91;rdx + LOW&#93;
or    rcx, 1
mov   r8, -1
bsr   rcx, rcx
and   rax, &#91;rdx + HIGH&#93;
shl   r8, cl
mov   rcx, rax
neg   rcx
and   rax, rcx
lea   rax, &#91;2*rax + r8&#93;
and   rax, &#91;rdx + MASKEX&#93;

Does it beat Hyperbola or even Magic?

Desperado · Post by **Desperado** » Sat Jul 25, 2009 10:30 pm

hi.

the thing i worked out with the less time i have left at the moment.
would be happy again for any idea. if anyone would like to the the
performance, feel free

.

Any improvements, explanations what may be wrong here, just
report...thx

Code: Select all

BTB file&#40;const BTB occ,const SQR_ID sq64&#41;
	&#123;
	 //locals
	 BTB fle,lo,hi;

	 //hi&#58; occupancy bits above sq &#40;rnk8 bit is set independent&#41;
	 //lo&#58; occupancy bits below sq &#40;rnk1 bit is set independent&#41;
	 //msk_hi&#58; precalculated bits above sq on file
	 //msk_bit_on_rnkx&#58; corresponding bit for file on rnk x

	 hi = &#40;occ & msk.hi&#91;sq64&#93;) | msk.msk_bit_on_rnk8&#91;sq64&#93;;
	 lo = &#40;occ & msk.lo&#91;sq64&#93;) | msk.msk_bit_on_rnk1&#91;sq64&#93;;

	 hi &= -hi;									//lsb of bits above sq
	 lo  = BB&#40;bsr64&#40;lo&#41;);						//msb of bits below sq

	 fle = ((&#40;hi<<1&#41;-lo&#41; & msk.msk_file&#91;sq64&#93;); //connecting bits 

	 return&#40;fle&#41;;
	&#125;

Gerd Isenberg · Post by **Gerd Isenberg** » Sat Jul 25, 2009 10:57 pm

Your routine is fine, except three optimizations already posted elsewhere in this thread. "High" don't needs the upper bit set, since you may borrow from hidden 2^64 anyway. Only "low" needs to be at least one in case of empty set, so just or "low" by one.

Code: Select all

BTB file&#40;const BTB occ,const SQR_ID sq64&#41;
&#123;
    //locals
    BTB fle,lo,hi;

    //hi&#58; occupancy bits above sq &#40;may be empty&#41;
    //lo&#58; occupancy bits below sq &#40;at least bit zero&#41;
    //msk_hi&#58; precalculated bits above sq on file
    //msk_bit_on_rnkx&#58; corresponding bit for file on rnk x

    hi = &#40;occ & msk.hi&#91;sq64&#93;);
    lo = &#40;occ & msk.lo&#91;sq64&#93;) | 1;

    hi &= -hi;                           //lsb of bits above sq
    lo  = BB&#40;bsr64&#40;lo&#41;);                  //msb of bits below sq

    fle = ((&#40;hi<<1&#41;-lo&#41; & msk.msk_file&#91;sq64&#93;); //connecting bits
    return&#40;fle&#41;;
&#125;

Desperado · Post by **Desperado** » Sat Jul 25, 2009 11:06 pm

of course !

.

20 % faster immediatelly. (about...)

Gerd Isenberg · Post by **Gerd Isenberg** » Sat Jul 25, 2009 11:20 pm

Desperado wrote:of course ! .

20 % faster immediatelly. (about...)

A possible micro optimization is to use -low = -1 << bsr, for x86 lea-instruction with 2*high+(-low) instead of (high<<1)-low. ) Nine operations. Few registers, independent instruction chains. Your idea has become hot now. Needs a name

Desperado · Post by **Desperado** » Sat Jul 25, 2009 11:51 pm

you are making fun !?

Desperado · Post by **Desperado** » Sun Jul 26, 2009 9:09 am

hi.

well, if i am thinking for a name, what do you think about this?

"Significant Bit Solution (SBS)".

So, thank you Gerd for your improvements and explanations.
At the moment i dont know how to get it better.
Maybe the next days (when time is left), i will compare with my magic
bitboard scheme.

Gerd Isenberg · Post by **Gerd Isenberg** » Sun Jul 26, 2009 11:12 am

Desperado wrote:hi.

well, if i am thinking for a name, what do you think about this?

"Significant Bit Solution (SBS)".

Sounds a bit frumpy, imho. My proposals:

Hoffmann's Obstructed Difference Action

Desperado's Obstructed Difference
Obstructed Difference

mcostalba · Post by **mcostalba** » Sun Jul 26, 2009 11:35 am

Gerd Isenberg wrote:Your idea has become hot now. Needs a name

If idea remains hot after speed test vs magic bitboards I would be very glad to try to integrate in Stockfish and test there in real games with a complete chess engine around it.

In any case it is very interesting. The only little issue that I note is the use of bsr64 that is not usable on 32 bit machines, as the one that I have and on which I can test BTW

One of the advantage of magic bitboards is that there exsist a 32 bit friendly version, and this is very important for a candidate general solution to sliding attacks calculation as I think could be this new one.

xsadar · Post by **xsadar** » Sun Jul 26, 2009 6:59 pm

Gerd Isenberg wrote:
Desperado wrote:hi.

well, if i am thinking for a name, what do you think about this?

"Significant Bit Solution (SBS)".
Sounds a bit frumpy, imho. My proposals:

Hoffmann's Obstructed Difference Action
Desperado's Obstructed Difference
Obstructed Difference

I would vote "Obstruction Difference" (instead of obstructed) or "Hoffman's Obstruction Difference" (omit "action", but we can leave the smiley in the name if you want

).
"Obstructed difference" sounds like it's obstructing the difference somehow, but "obstruction difference" sounds like what it's doing: taking the difference of the obstructions.

By the way, thanks to both of you for working this out. Oddly, just before I read this the other day, I was thinking there had to be a way to generate moves similar to this, but couldn't quite work it out in my head.

DIRECT BITBOARD MOVEGENERATION

Re: DIRECT BITBOARD MOVEGENERATION

Re: DIRECT BITBOARD MOVEGENERATION

Re: DIRECT BITBOARD MOVEGENERATION

Re: DIRECT BITBOARD MOVEGENERATION

Re: DIRECT BITBOARD MOVEGENERATION

Re: DIRECT BITBOARD MOVEGENERATION

Re: DIRECT BITBOARD MOVEGENERATION

Re: DIRECT BITBOARD MOVEGENERATION

Re: DIRECT BITBOARD MOVEGENERATION

Re: DIRECT BITBOARD MOVEGENERATION