Looking for intrinsic "least significant bit" on Visual Studio

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

chrisw
Posts: 4321
Joined: Tue Apr 03, 2012 4:28 pm

Re: Looking for intrinsic "least significant bit" on Visual Studio

Post by chrisw »

OliverBr wrote: Thu Sep 03, 2020 7:16 pm
mar wrote: Thu Sep 03, 2020 6:45 pm I did some googling and you you want is:

Code: Select all

#include <immintrin.h>
then use

Code: Select all

_tzcnt_u64(x)
_lzcnt_u64(x)
note that those are BMI1 (actually lzcnt was orginally ABM and tzcnt was added later as part of BMI1 as far as I understand)
Thank you. To already existing includes

Code: Select all

#include <windows.h>
#include <conio.h>
...
_lzcnt_u64(x)
does the job. I didn't need to include explicitly immintrin.h, but I think VS2019 is doing it implicitly.
It works perfectly.

_BitScanForward worked, too, but is slower. Both are clearly faster than own implemented methods.
Just swapped it into mine ....

was using code derived from this over-heavy Microsoft example:

Code: Select all

// BitScanForward.cpp
// compile with: /EHsc
#include <iostream>
#include <intrin.h>
using namespace std;

#pragma intrinsic(_BitScanForward)

int main()
{
   unsigned long mask = 0x1000;
   unsigned long index;
   unsigned char isNonzero;

   cout << "Enter a positive integer as the mask: " << flush;
   cin >> mask;
   isNonzero = _BitScanForward(&index, mask);
   if (isNonzero)
   {
      cout << "Mask: " << mask << " Index: " << index << endl;
   }
   else
   {
      cout << "No set bits found.  Mask is zero." << endl;
   }
}
and replaced with this:

Code: Select all

inline int scanforward64(u64 x)
{
	return ((int)_tzcnt_u64(x));
}
Mine is a little non-deterministic at present, but it seems to deliver about 1% speed up.
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Looking for intrinsic "least significant bit" on Visual Studio

Post by Gerd Isenberg »

chrisw wrote: Fri Sep 04, 2020 5:49 pm snipped ...

Mine is a little non-deterministic at present, but it seems to deliver about 1% speed up.
Is this an AMD processor as well?
chrisw
Posts: 4321
Joined: Tue Apr 03, 2012 4:28 pm

Re: Looking for intrinsic "least significant bit" on Visual Studio

Post by chrisw »

Gerd Isenberg wrote: Fri Sep 04, 2020 11:14 pm
chrisw wrote: Fri Sep 04, 2020 5:49 pm snipped ...

Mine is a little non-deterministic at present, but it seems to deliver about 1% speed up.
Is this an AMD processor as well?
Intel i7 Laptop

in debug mode, this is the asm.
maybe something to do with casting to an int, while s2 is u64 (unnecessarily)

Code: Select all

u64 s2 = scanforward64(to_bb);
00007FF7C36485B1  tzcnt       rax,qword ptr [rbp+38h]  
00007FF7C36485B7  cdqe  
00007FF7C36485B9  mov         qword ptr [rbp+40h],rax
chrisw
Posts: 4321
Joined: Tue Apr 03, 2012 4:28 pm

Re: Looking for intrinsic "least significant bit" on Visual Studio

Post by chrisw »

chrisw wrote: Sat Sep 05, 2020 12:14 am
Gerd Isenberg wrote: Fri Sep 04, 2020 11:14 pm
chrisw wrote: Fri Sep 04, 2020 5:49 pm snipped ...

Mine is a little non-deterministic at present, but it seems to deliver about 1% speed up.
Is this an AMD processor as well?
Intel i7 Laptop

in debug mode, this is the asm.
maybe something to do with casting to an int, while s2 is u64 (unnecessarily)

Code: Select all

u64 s2 = scanforward64(to_bb);
00007FF7C36485B1  tzcnt       rax,qword ptr [rbp+38h]  
00007FF7C36485B7  cdqe  
00007FF7C36485B9  mov         qword ptr [rbp+40h],rax
again in debug mode, the old asm code for VS version BitScanForward:

Code: Select all

u64 s2 = scanforward64(to_bb);
00007FF6822A8961  mov         rax,qword ptr [rbp+38h]  
00007FF6822A8965  bsf         rax,rax  
00007FF6822A8969  mov         dword ptr [rbp+0E8h],eax  
00007FF6822A896F  movsxd      rax,dword ptr [rbp+0E8h]  
00007FF6822A8976  mov         qword ptr [rbp+40h],rax 
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Looking for intrinsic "least significant bit" on Visual Studio

Post by Gerd Isenberg »

chrisw wrote: Sat Sep 05, 2020 10:58 am
chrisw wrote: Sat Sep 05, 2020 12:14 am
Gerd Isenberg wrote: Fri Sep 04, 2020 11:14 pm
chrisw wrote: Fri Sep 04, 2020 5:49 pm snipped ...

Mine is a little non-deterministic at present, but it seems to deliver about 1% speed up.
Is this an AMD processor as well?
Intel i7 Laptop

in debug mode, this is the asm.
maybe something to do with casting to an int, while s2 is u64 (unnecessarily)

Code: Select all

u64 s2 = scanforward64(to_bb);
00007FF7C36485B1  tzcnt       rax,qword ptr [rbp+38h]  
00007FF7C36485B7  cdqe  
00007FF7C36485B9  mov         qword ptr [rbp+40h],rax
again in debug mode, the old asm code for VS version BitScanForward:

Code: Select all

u64 s2 = scanforward64(to_bb);
00007FF6822A8961  mov         rax,qword ptr [rbp+38h]  
00007FF6822A8965  bsf         rax,rax  
00007FF6822A8969  mov         dword ptr [rbp+0E8h],eax  
00007FF6822A896F  movsxd      rax,dword ptr [rbp+0E8h]  
00007FF6822A8976  mov         qword ptr [rbp+40h],rax 
Debug mode assembly sucks. Saw the 3 cycles latency 1 cycle reciprocal throughput for recent intel processors - tzcnt as well bsf - should result in same speed.
https://gmplib.org/~tege/x86-timing.pdf see bottom pg 3 (Oups, Agner Fog's site seems infected).
Anyway, general recommendation for x64 - use tzcnt (if available) rather than bsf.
chrisw
Posts: 4321
Joined: Tue Apr 03, 2012 4:28 pm

Re: Looking for intrinsic "least significant bit" on Visual Studio

Post by chrisw »

Gerd Isenberg wrote: Sat Sep 05, 2020 12:11 pm
chrisw wrote: Sat Sep 05, 2020 10:58 am
chrisw wrote: Sat Sep 05, 2020 12:14 am
Gerd Isenberg wrote: Fri Sep 04, 2020 11:14 pm
chrisw wrote: Fri Sep 04, 2020 5:49 pm snipped ...

Mine is a little non-deterministic at present, but it seems to deliver about 1% speed up.
Is this an AMD processor as well?
Intel i7 Laptop

in debug mode, this is the asm.
maybe something to do with casting to an int, while s2 is u64 (unnecessarily)

Code: Select all

u64 s2 = scanforward64(to_bb);
00007FF7C36485B1  tzcnt       rax,qword ptr [rbp+38h]  
00007FF7C36485B7  cdqe  
00007FF7C36485B9  mov         qword ptr [rbp+40h],rax
again in debug mode, the old asm code for VS version BitScanForward:

Code: Select all

u64 s2 = scanforward64(to_bb);
00007FF6822A8961  mov         rax,qword ptr [rbp+38h]  
00007FF6822A8965  bsf         rax,rax  
00007FF6822A8969  mov         dword ptr [rbp+0E8h],eax  
00007FF6822A896F  movsxd      rax,dword ptr [rbp+0E8h]  
00007FF6822A8976  mov         qword ptr [rbp+40h],rax 
Debug mode assembly sucks. Saw the 3 cycles latency 1 cycle reciprocal throughput for recent intel processors - tzcnt as well bsf - should result in same speed.
https://gmplib.org/~tege/x86-timing.pdf see bottom pg 3 (Oups, Agner Fog's site seems infected).
Anyway, general recommendation for x64 - use tzcnt (if available) rather than bsf.
In allegedly optimised compile mode
the intrisic:

Code: Select all

u64 s2 = scanforward64(to_bb);
00007FF603653159  tzcnt       rax,r9  
00007FF60365315E  movsxd      rdx,eax  
the old VS Bitscanforward

Code: Select all

u64 s2 = scanforward64(to_bb);
00007FF6A7C230C9  bsf         rax,r9  
00007FF6A7C230CD  movsxd      rdx,eax  
No difference apart from tzcnt and bsf. Looks like the compiler optimised away part of the old VS code. Maybe my 1% was indeed down to non-determinism.
OliverBr
Posts: 725
Joined: Tue Dec 18, 2007 9:38 pm
Location: Munich, Germany
Full name: Dr. Oliver Brausch

Re: Looking for intrinsic "least significant bit" on Visual Studio

Post by OliverBr »

MOBMAT wrote: Fri Sep 04, 2020 4:15 pm This shows all the options...

https://en.wikipedia.org/wiki/Find_first_set
This one is most interesting:

Code: Select all

ctz(x) = popcount((x & −x) − 1)
Chess Engine OliThink: http://brausch.org/home/chess
OliThink GitHub:https://github.com/olithink
chrisw
Posts: 4321
Joined: Tue Apr 03, 2012 4:28 pm

Re: Looking for intrinsic "least significant bit" on Visual Studio

Post by chrisw »

OliverBr wrote: Sun Sep 06, 2020 12:44 pm
MOBMAT wrote: Fri Sep 04, 2020 4:15 pm This shows all the options...

https://en.wikipedia.org/wiki/Find_first_set
This one is most interesting:

Code: Select all

ctz(x) = popcount((x & −x) − 1)
Most useful would be ctz_0, or bsf_0, which returns the bitno and clears it in one instruction, eg

bsf_0 r1, r2 // return bitno in r1, clears the corresponding bit in r2
OliverBr
Posts: 725
Joined: Tue Dec 18, 2007 9:38 pm
Location: Munich, Germany
Full name: Dr. Oliver Brausch

Re: Looking for intrinsic "least significant bit" on Visual Studio

Post by OliverBr »

chrisw wrote: Sun Sep 06, 2020 1:09 pm
OliverBr wrote: Sun Sep 06, 2020 12:44 pm
MOBMAT wrote: Fri Sep 04, 2020 4:15 pm This shows all the options...

https://en.wikipedia.org/wiki/Find_first_set
This one is most interesting:

Code: Select all

ctz(x) = popcount((x & −x) − 1)
Most useful would be ctz_0, or bsf_0, which returns the bitno and clears it in one instruction, eg

bsf_0 r1, r2 // return bitno in r1, clears the corresponding bit in r2
This eliminates all except the lowest bit:

Code: Select all

x &= -x
This clears the lowest bit in x:

Code: Select all

x &= x - 1
I am not sure if there are such nice tricks with the most significant bit.
Chess Engine OliThink: http://brausch.org/home/chess
OliThink GitHub:https://github.com/olithink
chrisw
Posts: 4321
Joined: Tue Apr 03, 2012 4:28 pm

Re: Looking for intrinsic "least significant bit" on Visual Studio

Post by chrisw »

OliverBr wrote: Sun Sep 06, 2020 2:28 pm
chrisw wrote: Sun Sep 06, 2020 1:09 pm
OliverBr wrote: Sun Sep 06, 2020 12:44 pm
MOBMAT wrote: Fri Sep 04, 2020 4:15 pm This shows all the options...

https://en.wikipedia.org/wiki/Find_first_set
This one is most interesting:

Code: Select all

ctz(x) = popcount((x & −x) − 1)
Most useful would be ctz_0, or bsf_0, which returns the bitno and clears it in one instruction, eg

bsf_0 r1, r2 // return bitno in r1, clears the corresponding bit in r2
This eliminates all except the lowest bit:

Code: Select all

x &= -x
This clears the lowest bit in x:

Code: Select all

x &= x - 1
Yup, I was just dreaming/fantasising of the bsf and the bit clear coming in one asm instruction. My code is full of:
While(bb)
{
Square = bitscanforward(bb);
// do some stuff with square bla bla
// .....
// strip out the bit
bb &= bb-1;
// and loop until bb all used up
}

Would be nice to do the housekeeping on bb and the bit number fetch all in one asm. Well, that was the dream.
I am not sure if there are such nice tricks with the most significant bit.
Msbit is way more problematical. Rule for chess engines is work with lsbit and move upwards. I didn’t find a reason yet to work the other way round. Guessing, but I think chess engine programmers requirements for bit twiddling are dwarfed by the crypto people, and am not sure what their requirements are. I think popcount is important to them, not sure about ls/msbits