32 bit versions for bitscan64

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: 32 bit versions for bitscan64

Post by Desperado »

Ok.

Well, as you pointed out, it doesn't perform that much worse,
and we shouldn't forget that it also works in the reverse direction (bsr).

Thanks a lot for your trials and time investment.

cheers
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: 32 bit versions for bitscan64

Post by Desperado »

Just to have it complete: the bsr function would now
look like this.

Code: Select all

//****************************************************************************
//* PROCEDURE	: bsr64														 *
//* DESCRIPTION	: 32 bit version											 *
//****************************************************************************

ULONG bsr64(BTB_T bb)
	{
	 ULONG id;

	 _BitScanReverse(&id,(UI_32)(bb>>32)) ? 
	 id+=32 : 
	 _BitScanReverse(&id,(UI_32)bb);
	 
	 return(id);
	}
:)
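For checking the composition off-Windows, the same high-word-first split can be sketched in portable C. This is only a sketch with hypothetical helper names (bsr32_ref, bsr64_ref); the loop stands in for the _BitScanReverse intrinsic, and bb is assumed nonzero, as in the intrinsic version:

Code: Select all

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for _BitScanReverse: on success returns nonzero and
   stores the index of the highest set bit in *id; returns 0 for v == 0. */
static int bsr32_ref(unsigned long *id, uint32_t v)
{
    if (v == 0) return 0;
    *id = 31;
    while (!(v & 0x80000000u)) { v <<= 1; --*id; }
    return 1;
}

/* Same high-word-first composition as the bsr64 above; bb must be nonzero. */
unsigned long bsr64_ref(uint64_t bb)
{
    unsigned long id;
    if (bsr32_ref(&id, (uint32_t)(bb >> 32)))
        id += 32;      /* hit in the high dword: offset the index by 32 */
    else
        bsr32_ref(&id, (uint32_t)bb);
    return id;
}
```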
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: 32 bit versions for bitscan64

Post by Michael Sherwin »

Desperado wrote:Just to have it complete: the bsr function would now
look like this.

Code: Select all

//****************************************************************************
//* PROCEDURE	: bsr64														 *
//* DESCRIPTION	: 32 bit version											 *
//****************************************************************************

ULONG bsr64(BTB_T bb)
	{
	 ULONG id;

	 _BitScanReverse(&id,(UI_32)(bb>>32)) ? 
	 id+=32 : 
	 _BitScanReverse(&id,(UI_32)bb);
	 
	 return(id);
	}
:)
Just curious! Wouldn't it be faster to make id global to avoid the return value?
(In MP versions (it would work in SP versions too), a pointer to a structure holding both bb and id could be passed to bsr64?)
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: 32 bit versions for bitscan64

Post by mcostalba »

I think the function is inlined anyway, at least for PGO compiles, and accessing a local variable is way faster than a global one.

Besides, I really don't like these kinds of ugly, old-fashioned tricks ;-)

Anyhow, I don't think it is faster, but of course you can always _test it_ (this is the evil word, I know :D ) and come back with some numbers...

Thanks
Marco
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: 32 bit versions for bitscan64

Post by Michael Sherwin »

mcostalba wrote:I think the function is inlined anyway, at least for PGO compiles, and accessing a local variable is way faster than a global one.

Besides, I really don't like these kinds of ugly, old-fashioned tricks ;-)

Anyhow, I don't think it is faster, but of course you can always _test it_ (this is the evil word, I know :D ) and come back with some numbers...

Thanks
Marco
I was just (slightly) curious. Not enough to do any testing of my own! :D

Here is what I do in RomiChess. And I am perfectly happy with it--given the limitations of the 64-bit compiler.

asm.c:

Code: Select all

__inline s32 FirstBit(u64 bits)
{
    __asm 
    {
        mov edx, 0
        bsf eax, dword ptr bits
        jnz done
        mov edx, 32
        bsf eax, dword ptr bits+4
        jnz done
        mov eax, 32 
done:   add eax, edx
    }
}

__inline s32 LastBit(u64 bits)
{
    __asm 
    {
        mov edx, 32
        bsr eax, dword ptr bits+4
        jnz done
        mov edx, 0
        bsr eax, dword ptr bits
        jnz done
        mov eax, 64
done:   add eax, edx
    }
}
x64.c:

Code: Select all

__inline s32 FirstBit(u64 bits) {
	s32 index;
	if(_BitScanForward64(&index, bits))
		return index;
	return 64;
}

__inline s32 LastBit(u64 bits) {
	s32 index;
	if(_BitScanReverse64(&index, bits))
		return index;
	return 64;
}
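For completeness, the sentinel convention shared by both versions above (the index of the bit, or 64 for an empty bitboard) can be pinned down with a portable reference. FirstBit_ref/LastBit_ref are hypothetical names for checking semantics only, not RomiChess code:

Code: Select all

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t  s32;
typedef uint64_t u64;

/* Portable reference: index of the lowest set bit, or 64 if empty. */
s32 FirstBit_ref(u64 bits)
{
    s32 i;
    for (i = 0; i < 64; i++)
        if (bits & ((u64)1 << i)) return i;
    return 64;
}

/* Portable reference: index of the highest set bit, or 64 if empty. */
s32 LastBit_ref(u64 bits)
{
    s32 i;
    for (i = 63; i >= 0; i--)
        if (bits & ((u64)1 << i)) return i;
    return 64;
}
```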
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: 32 bit versions for bitscan64

Post by Desperado »

Michael Sherwin wrote:
Desperado wrote:Just to have it complete: the bsr function would now
look like this.

Code: Select all

//****************************************************************************
//* PROCEDURE	: bsr64														 *
//* DESCRIPTION	: 32 bit version											 *
//****************************************************************************

ULONG bsr64(BTB_T bb)
	{
	 ULONG id;

	 _BitScanReverse(&id,(UI_32)(bb>>32)) ? 
	 id+=32 : 
	 _BitScanReverse(&id,(UI_32)bb);
	 
	 return(id);
	}
:)
Just curious! Wouldn't it be faster to make id global to avoid the return value?
(In MP versions (it would work in SP versions too), a pointer to a structure holding both bb and id could be passed to bsr64?)
Hi Michael.

I think hiding the implementation is a good choice, even at the cost of some speed.

The following is a snapshot of my engine implementation.

Code: Select all


//****************************************************************************
//* BITSCAN CHOICE															 *
//*																			 *
//* 1: SOFTSCANS, needed in debug mode and may work much faster than         *
//*    the intrinsics, depending on machine.                                 *
//*    a: further choice between MattsFoldingTrick and double conversion     *
//*       where dc works significantly faster on my amd-k8                   *
//*    b: reverse scan only with double conversion                           *
//*                                                                          *
//* 2: HARDSCANS, using intrinsics. Special routines for 32 bit compile      *
//*    ONLY AVAILABLE IN RELEASE MODE!                                       *
//*    The 32bit intrinsic versions may only perform equal or slightly       *
//*    better than SOFTSCANS, but work in both directions. On fast bitscan   *
//*    machines I expect that the 32bit intrinsic scans work faster than     *
//*    the double conversions.                                               *
//****************************************************************************

#if _DEBUG | _SOFTSCAN

//****************************************************************************
//* FUNCTION	: bsf64														 *
//* DESCRIPTION	: 64 bit version (Matts Folding Trick)						 *
//****************************************************************************


#if _MATTSCAN

ULONG bsf64(BTB_T bb)
	{
	 UI_32 fld;

	 ASSERT(bb!=0,"bsf error");

	 bb ^= bb-1;
	 fld =(UI_32) bb ^ (bb>>32);

	 return(index[fld*0x78291ACF >> 26]);
	}

#else

//****************************************************************************
//* FUNCTION	: bsf64														 *
//* DESCRIPTION	: 64 bit version (double conversion)						 *
//*				  works best for amd-k8 model								 *
//****************************************************************************

ULONG bsf64(BTB_T bb)
	{
	 union {
			double d;
			struct 
				{
				 unsigned int mantissal : 32;
				 unsigned int mantissah : 20;
				 unsigned int exponent : 11;
				 unsigned int sign : 1;
				};
			} ud;

   
	 ud.d = (double)(bb & -bb); // isolated LS1B to double
   
	 return ud.exponent - 1023;
	}

#endif //MATTSCAN

//****************************************************************************
//* FUNCTION	: bsr64														 *
//* DESCRIPTION	: 64 bit version	(double conversion)						 *
//****************************************************************************

ULONG bsr64(UI_64 bb)
	{
	 union {
			double d;
			struct 
				{
				 unsigned int mantissal : 32;
				 unsigned int mantissah : 20;
				 unsigned int exponent : 11;
				 unsigned int sign : 1;
				};
			} ud;

	 ud.d = (double)(bb & ~(bb >> 32));  // avoid rounding error
	 
	 return (ud.exponent - 1023);
	}

#elif _COMPILE64

//****************************************************************************
//* FUNCTION	: bsf64														 *
//* DESCRIPTION	: 64 bit version intrinsic									 *
//****************************************************************************

ULONG bsf64(BTB_T bb)

	{ULONG id=64; _BitScanForward64(&id,bb);return(id);}

//****************************************************************************
//* FUNCTION	: bsr64														 *
//* DESCRIPTION	: 64 bit version intrinsic									 *
//****************************************************************************

ULONG bsr64(BTB_T bb)

	{ULONG id=64;_BitScanReverse64(&id,bb);return(id);}

#elif _COMPILE32

//****************************************************************************
//* FUNCTION	: bsf64														 *
//* DESCRIPTION	: 32 bit version intrinsic									 *
//****************************************************************************

ULONG bsf64(UI_64 bb)
	{
	 ULONG id;

	 _BitScanForward(&id,(UI_32)bb) ? 
	 id : 
	 id += 31 + _BitScanForward(&id,(UI_32)(bb>>32));

	 return(id);
	}

//****************************************************************************
//* FUNCTION	: bsr64														 *
//* DESCRIPTION	: 32 bit version intrinsic									 *
//****************************************************************************

ULONG bsr64(BTB_T bb)
	{
	 ULONG id;

	 _BitScanReverse(&id,(UI_32)(bb>>32)) ? 
	 id+=32 : 
	 _BitScanReverse(&id,(UI_32)bb);
	 
	 return(id);
	}

#endif
That's why I hide the code in a function :)

regards
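The double-conversion scans in the snapshot above can also be written as a self-contained sketch with fixed-width types (dbits, bsf64_dc, and bsr64_dc are hypothetical names). It assumes little-endian IEEE-754 doubles, as on x86; bb is assumed nonzero, as in the snapshot:

Code: Select all

```c
#include <assert.h>
#include <stdint.h>

/* Bit-field view of a double; assumes little-endian IEEE-754 (x86). */
typedef union {
    double d;
    struct {
        unsigned int mantissal : 32;
        unsigned int mantissah : 20;
        unsigned int exponent  : 11;
        unsigned int sign      : 1;
    };
} dbits;

unsigned bsf64_dc(uint64_t bb)        /* bb != 0 */
{
    dbits ud;
    ud.d = (double)(bb & (0 - bb));   /* isolated LS1B converts exactly   */
    return ud.exponent - 1023;        /* 2^k has biased exponent 1023 + k */
}

unsigned bsr64_dc(uint64_t bb)        /* bb != 0 */
{
    dbits ud;
    /* clearing every bit that has a set bit 32 above it keeps the value
       below the round-up threshold, so the exponent is never off by one */
    ud.d = (double)(bb & ~(bb >> 32));
    return ud.exponent - 1023;
}
```

The mask in the bsr version never touches the most significant set bit, so the result equals the true bit-scan-reverse even for values like 0xFFFFFFFFFFFFFFFF, where a plain conversion would round up to 2^64.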
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: 32 bit versions for bitscan64

Post by Michael Sherwin »

Desperado wrote:
Michael Sherwin wrote:
Desperado wrote:Just to have it complete: the bsr function would now
look like this.

Code: Select all

//****************************************************************************
//* PROCEDURE	: bsr64														 *
//* DESCRIPTION	: 32 bit version											 *
//****************************************************************************

ULONG bsr64(BTB_T bb)
	{
	 ULONG id;

	 _BitScanReverse(&id,(UI_32)(bb>>32)) ? 
	 id+=32 : 
	 _BitScanReverse(&id,(UI_32)bb);
	 
	 return(id);
	}
:)
Just curious! Wouldn't it be faster to make id global to avoid the return value?
(In MP versions (it would work in SP versions too), a pointer to a structure holding both bb and id could be passed to bsr64?)
Hi Michael.

I think hiding the implementation is a good choice, even at the cost of some speed.

The following is a snapshot of my engine implementation.

Code: Select all


//****************************************************************************
//* BITSCAN CHOICE															 *
//*																			 *
//* 1: SOFTSCANS, needed in debug mode and may work much faster than         *
//*    the intrinsics, depending on machine.                                 *
//*    a: further choice between MattsFoldingTrick and double conversion     *
//*       where dc works significantly faster on my amd-k8                   *
//*    b: reverse scan only with double conversion                           *
//*                                                                          *
//* 2: HARDSCANS, using intrinsics. Special routines for 32 bit compile      *
//*    ONLY AVAILABLE IN RELEASE MODE!                                       *
//*    The 32bit intrinsic versions may only perform equal or slightly       *
//*    better than SOFTSCANS, but work in both directions. On fast bitscan   *
//*    machines I expect that the 32bit intrinsic scans work faster than     *
//*    the double conversions.                                               *
//****************************************************************************

#if _DEBUG | _SOFTSCAN

//****************************************************************************
//* FUNCTION	: bsf64														 *
//* DESCRIPTION	: 64 bit version (Matts Folding Trick)						 *
//****************************************************************************


#if _MATTSCAN

ULONG bsf64(BTB_T bb)
	{
	 UI_32 fld;

	 ASSERT(bb!=0,"bsf error");

	 bb ^= bb-1;
	 fld =(UI_32) bb ^ (bb>>32);

	 return(index[fld*0x78291ACF >> 26]);
	}

#else

//****************************************************************************
//* FUNCTION	: bsf64														 *
//* DESCRIPTION	: 64 bit version (double conversion)						 *
//*				  works best for amd-k8 model								 *
//****************************************************************************

ULONG bsf64(BTB_T bb)
	{
	 union {
			double d;
			struct 
				{
				 unsigned int mantissal : 32;
				 unsigned int mantissah : 20;
				 unsigned int exponent : 11;
				 unsigned int sign : 1;
				};
			} ud;

   
	 ud.d = (double)(bb & -bb); // isolated LS1B to double
   
	 return ud.exponent - 1023;
	}

#endif //MATTSCAN

//****************************************************************************
//* FUNCTION	: bsr64														 *
//* DESCRIPTION	: 64 bit version	(double conversion)						 *
//****************************************************************************

ULONG bsr64(UI_64 bb)
	{
	 union {
			double d;
			struct 
				{
				 unsigned int mantissal : 32;
				 unsigned int mantissah : 20;
				 unsigned int exponent : 11;
				 unsigned int sign : 1;
				};
			} ud;

	 ud.d = (double)(bb & ~(bb >> 32));  // avoid rounding error
	 
	 return (ud.exponent - 1023);
	}

#elif _COMPILE64

//****************************************************************************
//* FUNCTION	: bsf64														 *
//* DESCRIPTION	: 64 bit version intrinsic									 *
//****************************************************************************

ULONG bsf64(BTB_T bb)

	{ULONG id=64; _BitScanForward64(&id,bb);return(id);}

//****************************************************************************
//* FUNCTION	: bsr64														 *
//* DESCRIPTION	: 64 bit version intrinsic									 *
//****************************************************************************

ULONG bsr64(BTB_T bb)

	{ULONG id=64;_BitScanReverse64(&id,bb);return(id);}

#elif _COMPILE32

//****************************************************************************
//* FUNCTION	: bsf64														 *
//* DESCRIPTION	: 32 bit version intrinsic									 *
//****************************************************************************

ULONG bsf64(UI_64 bb)
	{
	 ULONG id;

	 _BitScanForward(&id,(UI_32)bb) ? 
	 id : 
	 id += 31 + _BitScanForward(&id,(UI_32)(bb>>32));

	 return(id);
	}

//****************************************************************************
//* FUNCTION	: bsr64														 *
//* DESCRIPTION	: 32 bit version intrinsic									 *
//****************************************************************************

ULONG bsr64(BTB_T bb)
	{
	 ULONG id;

	 _BitScanReverse(&id,(UI_32)(bb>>32)) ? 
	 id+=32 : 
	 _BitScanReverse(&id,(UI_32)bb);
	 
	 return(id);
	}

#endif
That's why I hide the code in a function :)

regards
Hi Michael,

Yes, I need to start thinking in terms of portable, easy-to-maintain-and-debug code, rather than the 'hard wired' stuff that I tend to write! :)
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: 32 bit versions for bitscan64

Post by Michael Sherwin »

Michael Sherwin wrote:
mcostalba wrote:I think the function is inlined anyway, at least for PGO compiles, and accessing a local variable is way faster than a global one.

Besides, I really don't like these kinds of ugly, old-fashioned tricks ;-)

Anyhow, I don't think it is faster, but of course you can always _test it_ (this is the evil word, I know :D ) and come back with some numbers...

Thanks
Marco
I was just (slightly) curious. Not enough to do any testing of my own! :D

Here is what I do in RomiChess. And I am perfectly happy with it--given the limitations of the 64-bit compiler.

asm.c:

Code: Select all

__inline s32 FirstBit(u64 bits)
{
    __asm 
    {
        mov edx, 0
        bsf eax, dword ptr bits
        jnz done
        mov edx, 32
        bsf eax, dword ptr bits+4
        jnz done
        mov eax, 32 
done:   add eax, edx
    }
}

__inline s32 LastBit(u64 bits)
{
    __asm 
    {
        mov edx, 32
        bsr eax, dword ptr bits+4
        jnz done
        mov edx, 0
        bsr eax, dword ptr bits
        jnz done
        mov eax, 64
done:   add eax, edx
    }
}
x64.c:

Code: Select all

__inline s32 FirstBit(u64 bits) {
	s32 index;
	if(_BitScanForward64(&index, bits))
		return index;
	return 64;
}

__inline s32 LastBit(u64 bits) {
	s32 index;
	if(_BitScanReverse64(&index, bits))
		return index;
	return 64;
}
However, if the latest processors are faster at executing 32-bit bsf/bsr, then I might be interested in testing something like the following:

note: it's been a long time since I've written assembly, so I am not sure if the following code is correct. However, it (the correct code) did test slower when I tried it on older machines.

Code: Select all

__inline s32 FirstBit(u64 bits)
{
    __asm 
    {
        mov ebx, 0
        mov ecx, 0
        bsf ebx, dword ptr bits
        bsf ecx, dword ptr bits+4
        shl ebx, 6
        mov eax, [bsfTbl + ebx + ecx]
    }
}
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: 32 bit versions for bitscan64

Post by wgarvin »

Michael Sherwin wrote:However, if the latest processors are faster at executing 32-bit bsf/bsr, then I might be interested in testing something like the following:

note: it's been a long time since I've written assembly, so I am not sure if the following code is correct. However, it (the correct code) did test slower when I tried it on older machines.

Code: Select all

__inline s32 FirstBit(u64 bits)
{
    __asm 
    {
        mov ebx, 0
        mov ecx, 0
        bsf ebx, dword ptr bits
        bsf ecx, dword ptr bits+4
        shl ebx, 6
        mov eax, [bsfTbl + ebx + ecx]
    }
}

If you're willing to rely on bsf to not change the register contents when it fails, you could try something like this:

Code: Select all

__inline s32 FirstBit(u64 bits) 
{ 
    __asm 
    { 
        mov eax, 32 
        bsf eax, dword ptr bits+4 
        add eax, 32
        bsf eax, dword ptr bits
    } 
}
If you don't trust bsf or don't like the length of that dependence chain, then you can try something like this instead:

Code: Select all

__inline s32 FirstBit(u64 bits) 
{ 
    __asm 
    { 
        mov edx, 32
        bsf ecx, dword ptr bits+4 
        cmovz ecx, edx
        bsf eax, dword ptr bits
        lea ecx, [ecx+32]
        cmovz eax, ecx
    } 
}
I didn't compile those, YMMV... they both return 64 if the bitboard is empty; if you want -1 instead, you could just replace the first 32 with 0xFFFFFFDF.
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: 32 bit versions for bitscan64

Post by Desperado »

Michael Sherwin wrote:

Code: Select all

__inline s32 FirstBit(u64 bits)
{
    __asm 
    {
        mov ebx, 0
        mov ecx, 0
        bsf ebx, dword ptr bits
        bsf ecx, dword ptr bits+4
        shl ebx, 6
        mov eax, [bsfTbl + ebx + ecx]
    }
}
my proposals:

Code: Select all


ULONG bsf64a(BTB_T bb)
	{
	 _asm bsf eax,DWORD PTR bb+4
	 _asm add eax,32
	 _asm bsf eax,DWORD PTR bb
	 //return eax
	}

or

Code: Select all


ULONG bsf64a(BTB_T bb)
	{
	 _asm bsf eax,DWORD PTR bb
	 _asm jnz DONE
	 _asm bsf eax,DWORD PTR bb+4
	 _asm add eax,32
DONE: ;
	 // return eax
	}
Hm, both are working (assert: bb != 0).

1. is branchless, but needs 2 scans (for fast-bitscan machines)
2. is branching, but may save a scan (for slow-bitscan machines)

(it's been a long time since I last tried assembler :lol: )

8-)
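Both proposals compute the same thing; in portable C the shared logic (scan the low dword first, else the high dword plus 32, bb != 0) can be sketched like this. bsf32_ref and bsf64_split are hypothetical reference names, not engine code:

Code: Select all

```c
#include <assert.h>
#include <stdint.h>

/* Reference 32-bit forward scan; v must be nonzero. */
static int bsf32_ref(uint32_t v)
{
    int i = 0;
    while (!(v & 1u)) { v >>= 1; i++; }
    return i;
}

/* The logic both asm proposals implement: scan the low dword, and only
   fall back to the high dword (offset 32) when the low dword is empty. */
int bsf64_split(uint64_t bb)          /* bb != 0, as asserted above */
{
    uint32_t lo = (uint32_t)bb;
    return lo ? bsf32_ref(lo) : 32 + bsf32_ref((uint32_t)(bb >> 32));
}
```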