DIRECT BITBOARD MOVEGENERATION

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: DIRECT BITBOARD MOVEGENERATION

Post by Desperado »

Well,

reversed = (UI_64) 1 << (63-bsf64(bb))

is what i need. (uhps...63 not 64 btw :) )


Unfortunately, bsf is not that faster as we would like it. Same for lzcnt.
Furthermore i was creating the version only because to eliminate the bitscans.
And because the bit is already isolated, i thought of logical operations
(like rot,or,and...) to get the bb reversed.

i dont give up,and keep it always in mind.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: DIRECT BITBOARD MOVEGENERATION

Post by mcostalba »

Desperado wrote:Well,

reversed = (UI_64) 1 << (63-bsf64(bb))

is what i need. (uhps...63 not 64 btw :) )


Unfortunately, bsf is not that faster as we would like it. Same for lzcnt.
Furthermore i was creating the version only because to eliminate the bitscans.
And because the bit is already isolated, i thought of logical operations
(like rot,or,and...) to get the bb reversed.

i dont give up,and keep it always in mind.
Here what comes to my mind is a De Bruijn Multiplication with a following lookup that retrieve the needed bitboard directly, without passing through the square.
User avatar
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: DIRECT BITBOARD MOVEGENERATION

Post by Desperado »

This idea is to find somewhere else in this thread already.
Doesn t profit, because in this case we may better use
the kindgarten-bb approach.(mul,sh,lookup attacks).

And something totally different now: how can i add

PersonX wrote:

text

to my post.(doesnt see a tag for that...)
User avatar
rvida
Posts: 481
Joined: Thu Apr 16, 2009 12:00 pm
Location: Slovakia, EU

Re: DIRECT BITBOARD MOVEGENERATION

Post by rvida »

Desperado wrote: And something totally different now: how can i add

PersonX wrote:

text

to my post.(doesnt see a tag for that...)
Its Quote="PersonX" tag, ends with /Quote. Place it in [].
User avatar
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: DIRECT BITBOARD MOVEGENERATION

Post by Desperado »

Thx Richard.

But not to forget beside the bit-twiddles...

Reverse OD improvements(snap for single line):
Desperado wrote:hi,

first some little changes (improvements).

Code: Select all

	 lo = msk&#91;MSK_SWNE&#93;&#91;sq64&#93;.flipped_hi & occ&#91;1&#93;;	//occ&#91;1&#93; == bsw,doesnt need 2nd adress,saves xor
	 lo|=bR8;										          //tried bts -> was worse
	 lo&=-lo;										          //LSB
	 lo = _byteswap_uint64&#40;lo&#41;;						    //lsb becomes msb of lo

	 hi = msk&#91;MSK_SWNE&#93;&#91;sq64&#93;.line_hi & occ&#91;0&#93;;		//occ&#91;0&#93; == occ
	 hi&=-hi;

	 diag = msk&#91;MSK_SWNE&#93;&#91;sq64&#93;.line_ex & &#40;2*hi-lo&#41;;

a little correction: it must be bH8 instead of bR8 of course...

I want to add, that this code runs 4x slower when 32 bit comiled than 64 bit compiled!
User avatar
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: DIRECT BITBOARD MOVEGENERATION

Post by Desperado »

Hi again.

Now some days have been passed by and i thought about implentation of
OD (ROD). While doing this just another idea came to my mind, which i like to present. (maybe discuss and work out).

1. So, i am not really able to decide which implementation i should use,
because of so many different architectures. Well, i am interested what
guides some of you are following, making such decisions ?

2. As it turned out, there wasnt a really fast way to get the MSBs aside
from using fast machine instructions. A de bruijn _magic_ calculation
for a _single_ bit wasnt the right way with respect to the Kindergarten
approach...

But we can do a lookup for _both_ MSBs in parallel with the magic
approach.(one magic calculation, result: 2msbs)

Memory Requ. (about):

Code: Select all


   a. Index&#58; 4K Content&#58; 4K &#40;1byte / entry&#41; &#40;maybe&#58; pre/post masked&#41;

       - shift sq to h8
       - magic lookup
       - shift result back to sq
 
       the byte is holding the relative shift numbers for corresponding
       directions.

       byte = &#40;shift_nb_of_files_ww<<3&#41; | &#40;shift_nb_of_ranks_ss&#41; &#40;example for rook&#41;

Code: Select all


      b. Index&#58; 16K Content&#58; 16K &#40;1byte / entry&#41;

      magic lookup for each square itself &#40;not shared&#41;

       byte = &#40;shift_nb_of_files_ww<<3&#41; | &#40;shift_nb_of_ranks_ss&#41; &#40;example for rook&#41;
Of course i have in mind some alternatives, but i hope these 2 are enough to show the basic idea.

Well, i am very interested in point _1._ and some remarks on it.

Thx
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: DIRECT BITBOARD MOVEGENERATION

Post by Gerd Isenberg »

Desperado wrote:Hi again.

Now some days have been passed by and i thought about implentation of
OD (ROD). While doing this just another idea came to my mind, which i like to present. (maybe discuss and work out).

1. So, i am not really able to decide which implementation i should use,
because of so many different architectures. Well, i am interested what
guides some of you are following, making such decisions ?

2. As it turned out, there wasnt a really fast way to get the MSBs aside
from using fast machine instructions. A de bruijn _magic_ calculation
for a _single_ bit wasnt the right way with respect to the Kindergarten
approach...

But we can do a lookup for _both_ MSBs in parallel with the magic
approach.(one magic calculation, result: 2msbs)

Memory Requ. (about):

Code: Select all


   a. Index&#58; 4K Content&#58; 4K &#40;1byte / entry&#41; &#40;maybe&#58; pre/post masked&#41;

       - shift sq to h8
       - magic lookup
       - shift result back to sq
 
       the byte is holding the relative shift numbers for corresponding
       directions.

       byte = &#40;shift_nb_of_files_ww<<3&#41; | &#40;shift_nb_of_ranks_ss&#41; &#40;example for rook&#41;

Code: Select all


      b. Index&#58; 16K Content&#58; 16K &#40;1byte / entry&#41;

      magic lookup for each square itself &#40;not shared&#41;

       byte = &#40;shift_nb_of_files_ww<<3&#41; | &#40;shift_nb_of_ranks_ss&#41; &#40;example for rook&#41;
Of course i have in mind some alternatives, but i hope these 2 are enough to show the basic idea.

Well, i am very interested in point _1._ and some remarks on it.

Thx
I suggest to hide the implementation, but to keep your original idea of OD using bsr, maybe with one alternative approach (f.i. 32-bit Kindergarten BB) which you may conditionally compile on K8 or 32-bit systems.

For your other proposals, well it is to hot today.
Takes some time ;-)

Cheers,
Gerd