Well,
reversed = (UI_64) 1 << (63-bsf64(bb))
is what i need. (uhps...63 not 64 btw )
Unfortunately, bsf is not that faster as we would like it. Same for lzcnt.
Furthermore i was creating the version only because to eliminate the bitscans.
And because the bit is already isolated, i thought of logical operations
(like rot,or,and...) to get the bb reversed.
i dont give up,and keep it always in mind.
DIRECT BITBOARD MOVEGENERATION
Moderators: hgm, Rebel, chrisw
-
- Posts: 2684
- Joined: Sat Jun 14, 2008 9:17 pm
Re: DIRECT BITBOARD MOVEGENERATION
Here what comes to my mind is a De Bruijn Multiplication with a following lookup that retrieve the needed bitboard directly, without passing through the square.Desperado wrote:Well,
reversed = (UI_64) 1 << (63-bsf64(bb))
is what i need. (uhps...63 not 64 btw )
Unfortunately, bsf is not that faster as we would like it. Same for lzcnt.
Furthermore i was creating the version only because to eliminate the bitscans.
And because the bit is already isolated, i thought of logical operations
(like rot,or,and...) to get the bb reversed.
i dont give up,and keep it always in mind.
-
- Posts: 879
- Joined: Mon Dec 15, 2008 11:45 am
Re: DIRECT BITBOARD MOVEGENERATION
This idea is to find somewhere else in this thread already.
Doesn t profit, because in this case we may better use
the kindgarten-bb approach.(mul,sh,lookup attacks).
And something totally different now: how can i add
PersonX wrote:
text
to my post.(doesnt see a tag for that...)
Doesn t profit, because in this case we may better use
the kindgarten-bb approach.(mul,sh,lookup attacks).
And something totally different now: how can i add
PersonX wrote:
text
to my post.(doesnt see a tag for that...)
-
- Posts: 481
- Joined: Thu Apr 16, 2009 12:00 pm
- Location: Slovakia, EU
Re: DIRECT BITBOARD MOVEGENERATION
Its Quote="PersonX" tag, ends with /Quote. Place it in [].Desperado wrote: And something totally different now: how can i add
PersonX wrote:
text
to my post.(doesnt see a tag for that...)
-
- Posts: 879
- Joined: Mon Dec 15, 2008 11:45 am
Re: DIRECT BITBOARD MOVEGENERATION
Thx Richard.
But not to forget beside the bit-twiddles...
Reverse OD improvements(snap for single line):
I want to add, that this code runs 4x slower when 32 bit comiled than 64 bit compiled!
But not to forget beside the bit-twiddles...
Reverse OD improvements(snap for single line):
a little correction: it must be bH8 instead of bR8 of course...Desperado wrote:hi,
first some little changes (improvements).
Code: Select all
lo = msk[MSK_SWNE][sq64].flipped_hi & occ[1]; //occ[1] == bsw,doesnt need 2nd adress,saves xor lo|=bR8; //tried bts -> was worse lo&=-lo; //LSB lo = _byteswap_uint64(lo); //lsb becomes msb of lo hi = msk[MSK_SWNE][sq64].line_hi & occ[0]; //occ[0] == occ hi&=-hi; diag = msk[MSK_SWNE][sq64].line_ex & (2*hi-lo);
I want to add, that this code runs 4x slower when 32 bit comiled than 64 bit compiled!
-
- Posts: 879
- Joined: Mon Dec 15, 2008 11:45 am
Re: DIRECT BITBOARD MOVEGENERATION
Hi again.
Now some days have been passed by and i thought about implentation of
OD (ROD). While doing this just another idea came to my mind, which i like to present. (maybe discuss and work out).
1. So, i am not really able to decide which implementation i should use,
because of so many different architectures. Well, i am interested what
guides some of you are following, making such decisions ?
2. As it turned out, there wasnt a really fast way to get the MSBs aside
from using fast machine instructions. A de bruijn _magic_ calculation
for a _single_ bit wasnt the right way with respect to the Kindergarten
approach...
But we can do a lookup for _both_ MSBs in parallel with the magic
approach.(one magic calculation, result: 2msbs)
Memory Requ. (about):
Of course i have in mind some alternatives, but i hope these 2 are enough to show the basic idea.
Well, i am very interested in point _1._ and some remarks on it.
Thx
Now some days have been passed by and i thought about implentation of
OD (ROD). While doing this just another idea came to my mind, which i like to present. (maybe discuss and work out).
1. So, i am not really able to decide which implementation i should use,
because of so many different architectures. Well, i am interested what
guides some of you are following, making such decisions ?
2. As it turned out, there wasnt a really fast way to get the MSBs aside
from using fast machine instructions. A de bruijn _magic_ calculation
for a _single_ bit wasnt the right way with respect to the Kindergarten
approach...
But we can do a lookup for _both_ MSBs in parallel with the magic
approach.(one magic calculation, result: 2msbs)
Memory Requ. (about):
Code: Select all
a. Index: 4K Content: 4K (1byte / entry) (maybe: pre/post masked)
- shift sq to h8
- magic lookup
- shift result back to sq
the byte is holding the relative shift numbers for corresponding
directions.
byte = (shift_nb_of_files_ww<<3) | (shift_nb_of_ranks_ss) (example for rook)
Code: Select all
b. Index: 16K Content: 16K (1byte / entry)
magic lookup for each square itself (not shared)
byte = (shift_nb_of_files_ww<<3) | (shift_nb_of_ranks_ss) (example for rook)
Well, i am very interested in point _1._ and some remarks on it.
Thx
-
- Posts: 2250
- Joined: Wed Mar 08, 2006 8:47 pm
- Location: Hattingen, Germany
Re: DIRECT BITBOARD MOVEGENERATION
I suggest to hide the implementation, but to keep your original idea of OD using bsr, maybe with one alternative approach (f.i. 32-bit Kindergarten BB) which you may conditionally compile on K8 or 32-bit systems.Desperado wrote:Hi again.
Now some days have been passed by and i thought about implentation of
OD (ROD). While doing this just another idea came to my mind, which i like to present. (maybe discuss and work out).
1. So, i am not really able to decide which implementation i should use,
because of so many different architectures. Well, i am interested what
guides some of you are following, making such decisions ?
2. As it turned out, there wasnt a really fast way to get the MSBs aside
from using fast machine instructions. A de bruijn _magic_ calculation
for a _single_ bit wasnt the right way with respect to the Kindergarten
approach...
But we can do a lookup for _both_ MSBs in parallel with the magic
approach.(one magic calculation, result: 2msbs)
Memory Requ. (about):
Code: Select all
a. Index: 4K Content: 4K (1byte / entry) (maybe: pre/post masked) - shift sq to h8 - magic lookup - shift result back to sq the byte is holding the relative shift numbers for corresponding directions. byte = (shift_nb_of_files_ww<<3) | (shift_nb_of_ranks_ss) (example for rook)
Of course i have in mind some alternatives, but i hope these 2 are enough to show the basic idea.Code: Select all
b. Index: 16K Content: 16K (1byte / entry) magic lookup for each square itself (not shared) byte = (shift_nb_of_files_ww<<3) | (shift_nb_of_ranks_ss) (example for rook)
Well, i am very interested in point _1._ and some remarks on it.
Thx
For your other proposals, well it is to hot today.
Takes some time
Cheers,
Gerd