emadsen wrote: ↑Wed Nov 17, 2021 5:45 pm
I just realized System.Numerics.BitOperations class "provides utility methods for intrinsic bit-twiddling operations. The methods use hardware intrinsics when available on the underlying platform; otherwise, they use optimized software fallbacks." See the doc. So no need for preprocessor directives, assuming Microsoft's software fallback code performs well.
I must have introduced the preprocessor directives when I was using System.Runtime.Intrinsics.X86 prior to the availability of System.Numerics.BitOperations.
I think a lot of those numeric bit operations were put in to satisfy the chess programming community once bitboards were introduced. I didn't realize how many programmers are hooked on this huge time sink. It really takes you to the edge of programming techniques, though.
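For illustration, a minimal C# sketch of the System.Numerics.BitOperations calls emadsen mentions (assuming .NET Core 3.0 or later; the class and method names are from Microsoft's documentation):

using System.Numerics;

public static class BitboardOps
{
    // Index of the least significant set bit; uses TZCNT/BSF where available.
    // Note: returns 64 when the input is 0.
    public static int LowestSetBit(ulong bitboard) => BitOperations.TrailingZeroCount(bitboard);

    // Number of set bits; uses POPCNT where available.
    public static int Count(ulong bitboard) => BitOperations.PopCount(bitboard);
}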
Nah, we chess programmers get the scraps of the cryptography, AI, and compression instructions.
If we got our own instruction, it would be one that returns all sliding-piece attacks for the whole board in one cycle.
But compression gifted us PEXT, and that's currently the fastest slider lookup.
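To illustrate the PEXT-based slider lookup referred to above, here is a rough sketch only: Bmi2.X64.ParallelBitExtract is the actual .NET intrinsic, but RookMask and RookAttackTable are hypothetical precomputed tables whose initialization is omitted, and real code would check Bmi2.X64.IsSupported first.

using System.Runtime.Intrinsics.X86;

public static class PextSliders
{
    // Hypothetical precomputed data: per-square relevant-occupancy masks and
    // attack sets indexed by the extracted occupancy bits.
    static readonly ulong[] RookMask = new ulong[64];
    static readonly ulong[][] RookAttackTable = new ulong[64][];

    public static ulong RookAttacks(int square, ulong occupancy)
    {
        // PEXT gathers the occupancy bits under the mask into a dense index.
        ulong index = Bmi2.X64.ParallelBitExtract(occupancy, RookMask[square]);
        return RookAttackTable[square][index];
    }
}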
Anyway, my very first movegen was in C# with the naive approach of bijective dictionaries (an attacked-squares list and an attacked-by list for each square), and it got 3 MNps.
Okay, this works, but it's slow as molasses, maybe slower. There are too many ifs.
public static int numTrailingBinaryZeros(UInt64 n)
{
    // Linear scan over bits 0..62; falls through to 63 for bit 63 (or n == 0).
    UInt64 mask = 1;
    for (int i = 0; i < 63; i++, mask <<= 1)
        if ((n & mask) != 0)
            return i;
    return 63;
}

Anybody know anything faster?
Chessnut1071 wrote: ↑Wed Nov 17, 2021 9:35 pm
Thanks, but been there, done that, have the t-shirt. It doesn't work on my .NET Framework 4.7.3. Do you know which version I need?
I don't think the bitscan or popcount CPU intrinsics are supported in .NET Framework. You have to upgrade to .NET Core 3, .NET 5, or .NET 6.
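For what it's worth, the move is mostly a project-file change; a minimal project file targeting one of those runtimes could look like this (the exact settings are illustrative):

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net6.0</TargetFramework>
  </PropertyGroup>
</Project>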
Chessnut1071 wrote: ↑Wed Nov 17, 2021 9:24 pm
Okay, this works, but it's slow as molasses, maybe slower. There are too many ifs.
public static int numTrailingBinaryZeros(UInt64 n)
{
    UInt64 mask = 1;
    for (int i = 0; i < 63; i++, mask <<= 1)
        if ((n & mask) != 0)
            return i;
    return 63;
}

Anybody know anything faster?
The way people did it before CPUs had bit-scan-forward instructions was table[(n & -n) * DEBRUIJN >> 58], with a suitably initialized table and a De Bruijn constant.
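As an aside, a sketch of that technique in C#, using the 64-bit De Bruijn constant commonly cited on the Chess Programming Wiki and generating the lookup table at startup instead of hard-coding it:

public static class DeBruijnScan
{
    const ulong DeBruijn = 0x03f79d71b4cb0a89UL;  // a 64-bit De Bruijn sequence
    static readonly int[] Table = new int[64];

    static DeBruijnScan()
    {
        // (1UL << i) * DeBruijn == DeBruijn << i; each single-bit input
        // hashes to a distinct 6-bit index.
        for (int i = 0; i < 64; i++)
            Table[(int)((DeBruijn << i) >> 58)] = i;
    }

    // Index of the least significant set bit; n must be non-zero.
    public static int TrailingZeros(ulong n)
    {
        ulong isolated = n & (~n + 1);  // equivalent to n & -n
        return Table[(int)((isolated * DeBruijn) >> 58)];
    }
}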
Chessnut1071 wrote: ↑Wed Nov 17, 2021 9:35 pm
Thanks, but been there, done that, have the t-shirt. It doesn't work on my .NET Framework 4.7.3. Do you know which version I need?
YES, IT WORKS FOR YOU TOO. You need the System.Numerics NuGet package. There you have raw BMI bit operations...
hgm wrote: ↑Wed Nov 17, 2021 10:32 pm
The way people did it before CPUs had bit-scan-forward instructions was table[(n & -n) * DEBRUIJN >> 58], with a suitably initialized table and a De Bruijn constant.
It's literally perfect hashing with the hash function X * Y >> Z, which has some nice properties. Mainly it's fast to implement, and if it's not good enough you can add a plus in there too. I don't know why this community, or many C++ people, call some things DEBRUIJN and others "Magic" when it's always perfect hashing into a lookup table to begin with.
That being said, if bitscanforward is available it's many times faster.
Well, 'many' is a bit of an exaggeration. It uses multiply + shift + lookup, instead of the single instruction bsf. But often the bit number is not what you finally want, and a lookup indexed by it would have to be used anyway. With the DeBruijn multiplier you could tabulate the desired result directly, rather than the bit number. Then it is just 3 instructions vs 2, both involving a lookup. Slower, for sure, but only marginally so.
The reason for calling the multiplier DEBRUIJN here is that there is a mathematical theory that proves there exists a constant that does achieve the perfect hashing, and provides a simple recipe for calculating one. No such theory exists for the multipliers used in magic bitboards; these have to be found by trial and error, and the hashing is usually only perfect by virtue of the fact that the hashed data is highly degenerate. In any case, the multipliers there are not De Bruijn numbers.
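A sketch of what is meant by tabulating the desired result directly (KnightAttacks is a hypothetical precomputed per-square attack table; the constant and indexing are the same as in the De Bruijn sketch above):

public static class DirectLookup
{
    const ulong DeBruijn = 0x03f79d71b4cb0a89UL;

    // Hypothetical: knight attack sets indexed by square, filled elsewhere.
    static readonly ulong[] KnightAttacks = new ulong[64];

    // The same attack sets re-indexed by the De Bruijn hash, so the
    // intermediate bit number is never materialized.
    static readonly ulong[] KnightAttacksByHash = new ulong[64];

    static DirectLookup()
    {
        for (int sq = 0; sq < 64; sq++)
            KnightAttacksByHash[(int)((DeBruijn << sq) >> 58)] = KnightAttacks[sq];
    }

    // Attacks of the least significant knight: isolate, multiply, shift,
    // one table load; no separate bit-scan step.
    public static ulong AttacksOfLowestKnight(ulong knights)
    {
        ulong isolated = knights & (~knights + 1);
        return KnightAttacksByHash[(int)((isolated * DeBruijn) >> 58)];
    }
}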
hgm wrote: ↑Thu Nov 18, 2021 10:46 am
Well, 'many' is a bit of an exaggeration. It uses multiply + shift + lookup, instead of the single instruction bsf. But often the bit number is not what you finally want, and a lookup indexed by it would have to be used anyway. With the DeBruijn multiplier you could tabulate the desired result directly, rather than the bit number. Then it is just 3 instructions vs 2, both involving a lookup. Slower, for sure, but only marginally so.
The reason for calling the multiplier DEBRUIJN here is that there is a mathematical theory that proves there exists a constant that does achieve the perfect hashing, and provides a simple recipe for calculating one. No such theory exists for the multipliers used in magic bitboards; these have to be found by trial and error, and the hashing is usually only perfect by virtue of the fact that the hashed data is highly degenerate. In any case, the multipliers there are not De Bruijn numbers.
Aaaah, finally something new for me. I thought the DEBRUIJN constant was also found by brute force (which only takes 1-2 s): a search for a factor that maps all masks from 1ull << 0 to 1ull << 63 into 0..63 perfectly.
Thanks!
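If it helps, a quick sketch of the brute-force search described above: random 64-bit factors are tried until one maps every single-bit mask to a distinct index (the shift by 58 assumes a 64-entry table):

using System;

public static class FactorSearch
{
    // Finds a factor F such that ((1UL << i) * F) >> 58 is distinct for all i in 0..63.
    public static ulong FindPerfectFactor(Random rng)
    {
        var bytes = new byte[8];
        var seen = new bool[64];
        while (true)
        {
            rng.NextBytes(bytes);
            ulong factor = BitConverter.ToUInt64(bytes, 0);
            Array.Clear(seen, 0, seen.Length);
            bool perfect = true;
            for (int i = 0; i < 64 && perfect; i++)
            {
                int index = (int)(((1UL << i) * factor) >> 58);
                if (seen[index]) perfect = false;
                seen[index] = true;
            }
            if (perfect) return factor;
        }
    }
}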