Comparison of all known Sliding lookup algorithms
Moderator: Ras
-
- Posts: 396
- Joined: Sat Feb 04, 2017 11:57 pm
- Location: USA
Re: Comparison of all known Sliding lookup algorithms
I saw a link (not in wiki) about move generation using a GPU.
https://github.com/ankan-ban/perft_gpu
Is this a "thing" and does it work as fast as claimed?
Granted, not everyone could take advantage of an engine that requires a GPU, but it could be offered as yet another flavor of an engine release.
Has anyone implemented an engine with code like this?
Vince
https://github.com/ankan-ban/perft_gpu
Is this a "thing" and does it work as fast as claimed?
Granted, not everyone could take advantage of an engine that requires a GPU, but it could be offered as yet another flavor of an engine release.
Has anyone implemented an engine with code like this?
Vince
i7-6700K @ 4.00Ghz 32Gb, Win 10 Home, EGTBs on PCI SSD
Benchmark: Stockfish15.1 NNUE x64 bmi2 (nps): 1277K
Benchmark: Stockfish15.1 NNUE x64 bmi2 (nps): 1277K
-
- Posts: 2098
- Joined: Wed Jul 13, 2011 9:04 pm
- Location: Madrid, Spain.
Re: Comparison of all known sliding lookup algorithms.
Hello:
perft(15)
It also confirmed Perft(14) value:
Re: yet another attempt on Perft(14)
Waiting for Perft(16)...
Regards from Spain.
Ajedrecista.
It worked. That tool was used to compute Perft(15) of the starting position back in 2017:MOBMAT wrote: ↑Sun Jan 15, 2023 7:32 am I saw a link (not in wiki) about move generation using a GPU.
https://github.com/ankan-ban/perft_gpu
Is this a "thing" and does it work as fast as claimed?
Granted, not everyone could take advantage of an engine that requires a GPU, but it could be offered as yet another flavor of an engine release.
Has anyone implemented an engine with code like this?
Vince
perft(15)
It also confirmed Perft(14) value:
Re: yet another attempt on Perft(14)
Waiting for Perft(16)...
Regards from Spain.
Ajedrecista.
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Comparison of all known Sliding lookup algorithms
New Algorithm Found: Kindergarten Super SISSY Bitboards
A very big congraulation to our friend - Mike Sherwin.
Core algorihtm:
https://github.com/Gigantua/Chess_Moveg ... yBoard.hpp
https://github.com/Gigantua/Chess_Moveg ... garten.hpp
https://github.com/Gigantua/Chess_Moveg ... /Sissy.hpp
How does it fare against similar algorithms:
Which is extremely well! Especially since this is a genuinely new algorithm and its lookup size is better or tied to both SISSY and Kindergarten boards. Essentially this is one of the very fastest algorithms (on my platform which is x86 - zen3)
This is a very productive year so far and in Jan 2023 we already have 2 new algorithms.
There were no rounds of optimisation on it yet so it looks very good to be optimized further on first glance.
As always: In real production engine code you have L2 and register pressure from other parts so to determine the best algorithm for your particular solution and platform its important to test.
Very cool stuff!
A very big congraulation to our friend - Mike Sherwin.
Core algorihtm:
Code: Select all
static uint64_t Bishop(int sq, uint64_t occ)
{
uint64_t dBlockers = occ & dMask[sq];
uint64_t aBlockers = occ & aMask[sq];
uint64_t dIndex = ((dBlockers * 0x0202020202020202ull) >> 58);
uint64_t aIndex = ((aBlockers * 0x0202020202020202ull) >> 58);
return (dSubset[sq][dIndex] | aSubset[sq][aIndex]) & ~(1ull << sq);
}
https://github.com/Gigantua/Chess_Moveg ... garten.hpp
https://github.com/Gigantua/Chess_Moveg ... /Sissy.hpp
How does it fare against similar algorithms:
Code: Select all
Slide Arithmetic 253.626650 256 [2kb] bzhi_u64, blsmsk_u64 no Jakob Progsch and Daniel Inführ http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78693&p=914767&hilit=SlideArithm#p914767
Slide Arithmetic Inline 111.662891 0 [0kb] bzhi_u64, blsmsk_u64 no Jakob Progsch and Daniel Inführ http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78693&p=914767&hilit=SlideArithm#p914767
Kindergarten 452.184844 16640 [130kb] imul64 no Urban Koistinen https://www.chessprogramming.org/Kindergarten_Bitboards
SISSY Bitboards 196.301034 180416 [1409kb] none no Michael Sherwin http://www.talkchess.com/forum3/viewtopic.php?f=7&t=73083
Kindergarten Super SISSY Bitboards 420.917347 16640 [130kb] imul64 no Michael Sherwin https://www.talkchess.com/forum3/viewtopic.php?f=7&t=81234&start=30
Fancy Magic BB - Variable shift 391.531947 93376 [729kb] imul64 yes Pradu Kannan https://www.chessprogramming.org/Magic_Bitboards#Fancy
FoldingHash - 4x fancy magic 217.259060 6468 [50kb] none no Daniel Inführ tbd
Plain Magic BB 520.643515 295168 [2306kb] imul64 no Lasse Hansen https://www.chessprogramming.org/Magic_Bitboards#Plain
Black Magic BB - Fixed shift 537.245916 88891 [694kb] imul64 no Onno Garms and Volker Annuss https://www.chessprogramming.org/Magic_Bitboards#Fixed_shift_Fancy
Pext constexpr 770.780235 107904 [843kb] pext_u64 yes Zach Wegner https://www.chessprogramming.org/BMI2#PEXTBitboards
This is a very productive year so far and in Jan 2023 we already have 2 new algorithms.
There were no rounds of optimisation on it yet so it looks very good to be optimized further on first glance.
As always: In real production engine code you have L2 and register pressure from other parts so to determine the best algorithm for your particular solution and platform its important to test.
Very cool stuff!
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Daniel Inführ - Software Developer
-
- Posts: 396
- Joined: Sat Feb 04, 2017 11:57 pm
- Location: USA
Re: Comparison of all known Sliding lookup algorithms
in case you are still collecting data....
SHIFTED BITBOARDS....are they represented?
I looked through the move gen comparison code but didn't find this method of generating moves. Is it there?
A little background....
I implemented a chess engine a number of years ago and used the sliding move algorithm as described by an article describing how it was done in the NagaSkaki engine.
https://mayothi.com/nagaskakichess6.html
I understood the concept and used it. While debugging the code one day, I realized that the speed could be increased by applying a condition prior to do all the shifts as it is not necessary when the blocker (see code below) value is zero. This spead up the code by 10% or so. I then found the Wiki page on shifted bitboards.
https://www.chessprogramming.org/Shifted_Bitboards
which had incorporated the improvement. Yay, I was right. here is the code snippet for one ray.
This is the code from the Wiki page, I implemented it slightly different by having separate rayAttack arrays for each direction. Since each ray requires unique shift values, it was more readable to explicitly reference each ray attack arrays by name.
Recently while looking at this code again, I think I found another way to improve the speed. Instead of doing all the shifts to set bits, the same thing can be accomplished by finding the first occurrence of a set bit in the direction of the ray (Firstbit) and then OR'ing in the rayAttack[] for the square indicated by Firstbit. I haven't touched my engine code in some time and I can't spend the time right now to test it, but If someone already has Shifted Bitboard code in place, perhaps you could try it. I would think one table entry lookup would be faster than all the shifts.
Code: Select all
Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
Megalooks Random Positions/s:
Exploading: 96.80MOps 6 kB Optimal perf: imul64
Reference: 68.02MOps 8 kB Optimal perf: none
KoggeStone: 72.40MOps 0 kB Optimal perf: none
RotatedBoard: 121.11MOps 14 kB Optimal perf: none
QBB Algo: 115.08MOps 0 kB Optimal perf: countr_zero, countl_zero
BobMike: 157.78MOps 8 kB Optimal perf: countr_zero, countl_zero
Obstr. Diff: 161.61MOps 6 kB Optimal perf: countl_zero
Obstr. Inline: 105.25MOps 0 kB Optimal perf: countl_zero
SlideArithm: 160.35MOps 2 kB Optimal perf: bzhi_u64, blsmsk_u64
SISSY BB: 111.78MOps 1409 kB Optimal perf: none
Hyperbola Qsc: 177.06MOps 2 kB Optimal perf: bswap
Hash Variable: 338.37MOps 729 kB Optimal perf: imul64
Hash Plain: 414.60MOps 2306 kB Optimal perf: imul64
Hash Fancy: 456.21MOps 694 kB Optimal perf: imul64
Pext : 629.59MOps 843 kB Optimal perf: pext_u64
HyperCube: 196.17MOps 841 kB Optimal perf: none
I looked through the move gen comparison code but didn't find this method of generating moves. Is it there?
A little background....
I implemented a chess engine a number of years ago and used the sliding move algorithm as described by an article describing how it was done in the NagaSkaki engine.
https://mayothi.com/nagaskakichess6.html
I understood the concept and used it. While debugging the code one day, I realized that the speed could be increased by applying a condition prior to do all the shifts as it is not necessary when the blocker (see code below) value is zero. This spead up the code by 10% or so. I then found the Wiki page on shifted bitboards.
https://www.chessprogramming.org/Shifted_Bitboards
which had incorporated the improvement. Yay, I was right. here is the code snippet for one ray.
Code: Select all
U64 rayAttacks[8][64]; // requires initialization
U64 getRightRayAttacks(U64 occupied, enumSquare square) {
U64 attacks = rayAttacks[right][square];
U64 blocker = attacks & occupied;
if ( blocker ) {
blocker = (blocker << 1) | (blocker << 2)
| (blocker << 3) | (blocker << 4)
| (blocker << 5) | (blocker << 6);
attacks ^= (blocker & attacks);
}
return attacks;
}
Recently while looking at this code again, I think I found another way to improve the speed. Instead of doing all the shifts to set bits, the same thing can be accomplished by finding the first occurrence of a set bit in the direction of the ray (Firstbit) and then OR'ing in the rayAttack[] for the square indicated by Firstbit. I haven't touched my engine code in some time and I can't spend the time right now to test it, but If someone already has Shifted Bitboard code in place, perhaps you could try it. I would think one table entry lookup would be faster than all the shifts.
i7-6700K @ 4.00Ghz 32Gb, Win 10 Home, EGTBs on PCI SSD
Benchmark: Stockfish15.1 NNUE x64 bmi2 (nps): 1277K
Benchmark: Stockfish15.1 NNUE x64 bmi2 (nps): 1277K
-
- Posts: 396
- Joined: Sat Feb 04, 2017 11:57 pm
- Location: USA
Re: Comparison of all known Sliding lookup algorithms
I tried the Cuda_compare app on my windows 10 PC and all it displayed was the model of my GPU which is a NVIDIA GeForce GTX 1080.
Is it to old to take advantage of the CUDA features?
Vince
Is it to old to take advantage of the CUDA features?
Vince
i7-6700K @ 4.00Ghz 32Gb, Win 10 Home, EGTBs on PCI SSD
Benchmark: Stockfish15.1 NNUE x64 bmi2 (nps): 1277K
Benchmark: Stockfish15.1 NNUE x64 bmi2 (nps): 1277K
-
- Posts: 965
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Comparison of all known Sliding lookup algorithms
Thanks Daniel for including Kindergarten Super Split Index Sub-Set Yielding Bitboards. On my Ryzen 9 3950x KSSISSY Bitboards actually came ahead of Kindergarten Bitboards 330.7 to 313.3. Wow! And that is without any optimization that is still to be done. My goal was to combine the best of Kindergarten with the best of SISSY to create a sliding piece generator that is faster than either.dangi12012 wrote: ↑Sat Jan 21, 2023 7:58 pm New Algorithm Found: Kindergarten Super SISSY Bitboards
A very big congraulation to our friend - Mike Sherwin.
Core algorihtm:https://github.com/Gigantua/Chess_Moveg ... yBoard.hppCode: Select all
static uint64_t Bishop(int sq, uint64_t occ) { uint64_t dBlockers = occ & dMask[sq]; uint64_t aBlockers = occ & aMask[sq]; uint64_t dIndex = ((dBlockers * 0x0202020202020202ull) >> 58); uint64_t aIndex = ((aBlockers * 0x0202020202020202ull) >> 58); return (dSubset[sq][dIndex] | aSubset[sq][aIndex]) & ~(1ull << sq); }
https://github.com/Gigantua/Chess_Moveg ... garten.hpp
https://github.com/Gigantua/Chess_Moveg ... /Sissy.hpp
How does it fare against similar algorithms:
Which is extremely well! Especially since this is a genuinely new algorithm and its lookup size is better or tied to both SISSY and Kindergarten boards. Essentially this is one of the very fastest algorithms (on my platform which is x86 - zen3)Code: Select all
Slide Arithmetic 253.626650 256 [2kb] bzhi_u64, blsmsk_u64 no Jakob Progsch and Daniel Inführ http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78693&p=914767&hilit=SlideArithm#p914767 Slide Arithmetic Inline 111.662891 0 [0kb] bzhi_u64, blsmsk_u64 no Jakob Progsch and Daniel Inführ http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78693&p=914767&hilit=SlideArithm#p914767 Kindergarten 452.184844 16640 [130kb] imul64 no Urban Koistinen https://www.chessprogramming.org/Kindergarten_Bitboards SISSY Bitboards 196.301034 180416 [1409kb] none no Michael Sherwin http://www.talkchess.com/forum3/viewtopic.php?f=7&t=73083 Kindergarten Super SISSY Bitboards 420.917347 16640 [130kb] imul64 no Michael Sherwin https://www.talkchess.com/forum3/viewtopic.php?f=7&t=81234&start=30 Fancy Magic BB - Variable shift 391.531947 93376 [729kb] imul64 yes Pradu Kannan https://www.chessprogramming.org/Magic_Bitboards#Fancy FoldingHash - 4x fancy magic 217.259060 6468 [50kb] none no Daniel Inführ tbd Plain Magic BB 520.643515 295168 [2306kb] imul64 no Lasse Hansen https://www.chessprogramming.org/Magic_Bitboards#Plain Black Magic BB - Fixed shift 537.245916 88891 [694kb] imul64 no Onno Garms and Volker Annuss https://www.chessprogramming.org/Magic_Bitboards#Fixed_shift_Fancy Pext constexpr 770.780235 107904 [843kb] pext_u64 yes Zach Wegner https://www.chessprogramming.org/BMI2#PEXTBitboards
This is a very productive year so far and in Jan 2023 we already have 2 new algorithms.
There were no rounds of optimisation on it yet so it looks very good to be optimized further on first glance.
As always: In real production engine code you have L2 and register pressure from other parts so to determine the best algorithm for your particular solution and platform its important to test.
Very cool stuff!
Code: Select all
Microsoft Windows [Version 10.0.19044.2486]
C:\Users\mjshe\source\repos\Chess_Movegen-main\x64\Release>Movegen_Compare
Verify Engines...OK!
AMD Ryzen 9 3950X 16-Core Processor
Million Lookups/s Random Squares, Random Occupation/s:
Name Performance [MQueens/s] Tablesize Dependencies Template Author Reference
SBAMG o^(o-3cbn) 185.814316 576 [4kb] countl_zero, bswap yes Syed Fahad http://www.talkchess.com/forum3/viewtopic.php?t=59845
SBAMG Inline 119.276919 0 [0kb] countl_zero, bswap yes Syed Fahad and Daniel Inführ http://www.talkchess.com/forum3/viewtopic.php?t=59845
GaloisField - AVX512 4.978956 0 [0kb] AVX512F_GFNI no Daniel Inführ (dangi12012) http://www.talkchess.com/forum3/viewtopic.php?f=7&t=81335
Hyperbola Quintessence o^(o-2r) 233.537094 256 [2kb] bswap no Ryan Mack https://www.chessprogramming.org/Hyperbola_Quintessence
Hyperbola Quintessence Inline 75.355508 0 [0kb] bswap yes Ryan Mack https://www.chessprogramming.org/Hyperbola_Quintessence
Genetic 8 Ray 31.337153 0 [0kb] bswap no Daniel Inführ (dangi12012) Abstract C++ Syntax Tree Sifter (c) Daniel Infuehr
Bitrotation 28.589173 0 [0kb] ReverseBits no Daniel Inführ (dangi12012) http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79078&start=20
Binary Neural Network 13.306265 5852 [45kb] pdep_u64, AVX2 no Daniel Inführ (dangi12012) http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79332
Exploding Bitboards 50.149227 768 [6kb] imul64 no Harald Lüßen http://www.open-aurec.com/wbforum/viewtopic.php?f=4&t=4523&start=80
Reference (Switch Lookup) 26.109347 0 [0kb] none yes Daniel Inführ (dangi12012) http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78235&p=907362&hilit=espresso#p907362
AVX Branchless Shift 104.793444 0 [0kb] AVX2 no Daniel Inführ (dangi12012) http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79005&start=60
Pext Emulated 47.447844 107904 [843kb] none no Zach Wegner https://randombit.net/bitbashing/posts/haswell_bit_permutations.html
Dumb7 Fill 49.996975 0 [0kb] none no Gunnar Andersson https://www.chessprogramming.org/Dumb7Fill
Kogge-Stone 91.967173 0 [0kb] none no Peter M. Kogge, Harold S. Stone https://www.chessprogramming.org/Kogge-Stone_Algorithm
Rotated Bitboards 35.294128 1848 [14kb] none no Robert Hyatt https://www.chessprogramming.org/Rotated_Bitboards
QBBEngine 151.972066 0 [0kb] countr_zero, countl_zero yes Fabio Gobbato https://www.chessprogramming.org/QBBEngine
QBBEngine - Shifted Mask 151.707237 0 [0kb] countr_zero, countl_zero no Fabio Gobbato http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79005&start=90#p924623
Classical Bob-Mike 162.468393 1024 [8kb] countr_zero, countl_zero yes Robert Hyatt and Michael Sherwin https://www.chessprogramming.org/Classical_Approach
Advanced Bob-Mike 177.823511 520 [4kb] countr_zero, countl_zero no Michael Sherwin and Daniel Inführ http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79078&start=50#p924653
Leorik 187.654933 128 [1kb] countl_zero no Thomas Jahn (lithander) https://github.com/lithander/MinimalChessEngine
Leorik Inline 84.522435 0 [0kb] countl_zero no Thomas Jahn (lithander) https://github.com/lithander/MinimalChessEngine
Obstruction Difference 173.490832 768 [6kb] countl_zero no Michael Hoffmann http://www.talkchess.com/forum3/viewtopic.php?t=29087
Obstruction Difference Inline 74.818061 0 [0kb] countl_zero yes Michael Hoffmann http://www.talkchess.com/forum3/viewtopic.php?t=29087
Genetic Obstruction Difference 203.630736 384 [3kb] countl_zero no Daniel Inführ and Michael Hoffmann http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79701
Genetic Obstruction Difference V2 231.707289 768 [6kb] countl_zero no Daniel Inführ http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79701
Slide Arithmetic 191.577916 256 [2kb] bzhi_u64, blsmsk_u64 no Jakob Progsch and Daniel Inführ http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78693&p=914767&hilit=SlideArithm#p914767
Slide Arithmetic Inline 82.584805 0 [0kb] bzhi_u64, blsmsk_u64 no Jakob Progsch and Daniel Inführ http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78693&p=914767&hilit=SlideArithm#p914767
Kindergarten 313.312327 16640 [130kb] imul64 no Urban Koistinen https://www.chessprogramming.org/Kindergarten_Bitboards
SISSY Bitboards 160.812640 180416 [1409kb] none no Michael Sherwin http://www.talkchess.com/forum3/viewtopic.php?f=7&t=73083
Kindergarten Super SISSY Bitboards 330.705693 16640 [130kb] imul64 no Michael Sherwin https://www.talkchess.com/forum3/viewtopic.php?f=7&t=81234&start=30
Fancy Magic BB - Variable shift 270.239473 93376 [729kb] imul64 yes Pradu Kannan https://www.chessprogramming.org/Magic_Bitboards#Fancy
FoldingHash - 4x fancy magic 171.511485 6468 [50kb] none no Daniel Inführ tbd
Plain Magic BB 373.323776 295168 [2306kb] imul64 no Lasse Hansen https://www.chessprogramming.org/Magic_Bitboards#Plain
Black Magic BB - Fixed shift 447.642602 88891 [694kb] imul64 no Onno Garms and Volker Annuss https://www.chessprogramming.org/Magic_Bitboards#Fixed_shift_Fancy
Pext constexpr 38.538225 107904 [843kb] pext_u64 yes Zach Wegner https://www.chessprogramming.org/BMI2#PEXTBitboards
HyperCube 51.757490 107680 [841kb] none yes Daniel Inführ (dangi12012) http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79004&p=916723&hilit=hypercube#p916723
PLEASE COPY PASTE ABOVE OUTPUT and share here:
http://www.talkchess.com/forum3/posting.php?mode=reply&f=7&t=79005
C:\Users\mjshe\source\repos\Chess_Movegen-main\x64\Release>
-
- Posts: 965
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Comparison of all known Sliding lookup algorithms
Result of two optimizations for KSSISSY Bitboards.
1) Created a struct for the main look-up tables.
Gain of about 3 mqps.
2)Removed the addition of & ~(1ull << sq). Not needed.
return (ss.dSubset[sq][dIndex] | ss.aSubset[sq][aIndex]) & ~(1ull << sq);
to
return (ss.dSubset[sq][dIndex] | ss.aSubset[sq][aIndex]);
Gain of about 38 mqps.
More optimizations coming soon.
Kindergarten 346.258390 16640 [130kb] imul64 no Urban Koistinen https://www.chessprogramming.org/Kindergarten_Bitboards
Kindergarten Super SISSY Bitboards 368.475904 16640 [130kb] imul64 no Michael Sherwin https://www.talkchess.com/forum3/viewto ... 4&start=30
1) Created a struct for the main look-up tables.
Code: Select all
struct SubSets {
uint64_t vSubset[64][64];
uint64_t hSubset[64][64];
uint64_t dSubset[64][64];
uint64_t aSubset[64][64];
} ss;
2)Removed the addition of & ~(1ull << sq). Not needed.
return (ss.dSubset[sq][dIndex] | ss.aSubset[sq][aIndex]) & ~(1ull << sq);
to
return (ss.dSubset[sq][dIndex] | ss.aSubset[sq][aIndex]);
Gain of about 38 mqps.
More optimizations coming soon.

Kindergarten 346.258390 16640 [130kb] imul64 no Urban Koistinen https://www.chessprogramming.org/Kindergarten_Bitboards
Kindergarten Super SISSY Bitboards 368.475904 16640 [130kb] imul64 no Michael Sherwin https://www.talkchess.com/forum3/viewto ... 4&start=30
-
- Posts: 965
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Comparison of all known Sliding lookup algorithms
Another successful optimization!
Got rid of the hMask[] look-up for the rook.
I'm out of ideas for further optimizations at the moment. Hopefully there are one or two more optimizations possible?
Very, very close to catching Black Magic BB!
Got rid of the hMask[] look-up for the rook.
Code: Select all
static uint64_t Rook(int sq, uint64_t occ)
{
uint64_t vBlockers = (occ >> (sq & 7)) & 0x0101010101010101ull;
uint64_t vIndex = ((vBlockers * 0x0080402010080400ull) >> 58);
uint64_t hIndex = (occ >> (sq & 0b11111000u) + 1) & 0x000000000000003f;
return (ss.vSubset[sq][vIndex] | ss.hSubset[sq][hIndex]);
}
Code: Select all
Kindergarten Super SISSY Bitboards 381.401345 16640 [130kb] imul64 no Michael Sherwin https://www.talkchess.com/forum3/viewtopic.php?f=7&t=81234&start=30
Plain Magic BB 432.728403 295168 [2306kb] imul64 no Lasse Hansen https://www.chessprogramming.org/Magic_Bitboards#Plain
Black Magic BB - Fixed shift 392.721303 88891 [694kb] imul64 no Onno Garms and Volker Annuss
Very, very close to catching Black Magic BB!

-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Comparison of all known Sliding lookup algorithms
Another idea:
(No computer to verify)
Cache locality should help.
Just look where sq is used. Ideally sq should be used only once to lookup a struct where all masks and pointers are stored that are needed. This enables avx2 and other tricks
I gotta do the comparison with clang15 the differences to msvc are often huge.
(No computer to verify)
Code: Select all
struct SubSets {
uint64_t vSubset[64];
uint64_t hSubset[64];
uint64_t dSubset[64];
uint64_t aSubset[64];
} ss[64]
Just look where sq is used. Ideally sq should be used only once to lookup a struct where all masks and pointers are stored that are needed. This enables avx2 and other tricks
I gotta do the comparison with clang15 the differences to msvc are often huge.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Daniel Inführ - Software Developer
-
- Posts: 965
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Comparison of all known Sliding lookup algorithms
I tried it but using MSVC 2022 it reported (minus)17 mqps.dangi12012 wrote: ↑Sun Jan 22, 2023 9:29 pm Another idea:
(No computer to verify)
Cache locality should help.Code: Select all
struct SubSets { uint64_t vSubset[64]; uint64_t hSubset[64]; uint64_t dSubset[64]; uint64_t aSubset[64]; } ss[64]
Just look where sq is used. Ideally sq should be used only once to lookup a struct where all masks and pointers are stored that are needed. This enables avx2 and other tricks
I gotta do the comparison with clang15 the differences to msvc are often huge.

On a brighter note rearranging the bishop code gained 2.2 mqps. That helped MSVC 2022 but it might not help clang. I also rearranged the rook code to be consistent with no change.
381.2 -> 383.4

Code: Select all
static uint64_t Bishop(int sq, uint64_t occ)
{
return
ss.dSubset[sq][(((occ & dMask[sq]) * 0x0202020202020202ull) >> 58)] +
ss.aSubset[sq][(((occ & aMask[sq]) * 0x0202020202020202ull) >> 58)];
}
static uint64_t Rook(int sq, uint64_t occ)
{
uint64_t hIndex = (occ >> (sq & 0b11111000u) + 1) & 0x000000000000003f;
uint64_t vIndex = ((((occ >> (sq & 7)) & 0x0101010101010101ull) * 0x0080402010080400ull) >> 58);
return ss.vSubset[sq][vIndex] | ss.hSubset[sq][hIndex];
}