Comparison of all known Sliding lookup algorithms

Discussion of chess software programming and technical issues.

Moderator: Ras

MOBMAT
Posts: 396
Joined: Sat Feb 04, 2017 11:57 pm
Location: USA

Re: Comparison of all known Sliding lookup algorithms

Post by MOBMAT »

I saw a link (not in wiki) about move generation using a GPU.
https://github.com/ankan-ban/perft_gpu

Is this a "thing" and does it work as fast as claimed?
Granted, not everyone could take advantage of an engine that requires a GPU, but it could be offered as yet another flavor of an engine release.

Has anyone implemented an engine with code like this?

Vince
i7-6700K @ 4.00Ghz 32Gb, Win 10 Home, EGTBs on PCI SSD
Benchmark: Stockfish15.1 NNUE x64 bmi2 (nps): 1277K
User avatar
Ajedrecista
Posts: 2097
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: Comparison of all known sliding lookup algorithms.

Post by Ajedrecista »

Hello:
MOBMAT wrote: Sun Jan 15, 2023 7:32 am I saw a link (not in wiki) about move generation using a GPU.
https://github.com/ankan-ban/perft_gpu

Is this a "thing" and does it work as fast as claimed?
Granted, not everyone could take advantage of an engine that requires a GPU, but it could be offered as yet another flavor of an engine release.

Has anyone implemented an engine with code like this?

Vince
It worked. That tool was used to compute Perft(15) of the starting position back in 2017:

perft(15)

It also confirmed Perft(14) value:

Re: yet another attempt on Perft(14)

Waiting for Perft(16)...

Regards from Spain.

Ajedrecista.
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Comparison of all known Sliding lookup algorithms

Post by dangi12012 »

New Algorithm Found: Kindergarten Super SISSY Bitboards

A very big congraulation to our friend - Mike Sherwin.

Core algorihtm:

Code: Select all

	static uint64_t Bishop(int sq, uint64_t occ)
	{
		uint64_t dBlockers = occ & dMask[sq];
		uint64_t aBlockers = occ & aMask[sq];
		uint64_t dIndex = ((dBlockers * 0x0202020202020202ull) >> 58);
		uint64_t aIndex = ((aBlockers * 0x0202020202020202ull) >> 58);
		return (dSubset[sq][dIndex] | aSubset[sq][aIndex]) & ~(1ull << sq);
	}
https://github.com/Gigantua/Chess_Moveg ... yBoard.hpp
https://github.com/Gigantua/Chess_Moveg ... garten.hpp
https://github.com/Gigantua/Chess_Moveg ... /Sissy.hpp

How does it fare against similar algorithms:

Code: Select all

Slide Arithmetic                   253.626650                    256     [2kb]       bzhi_u64, blsmsk_u64     no        Jakob Progsch and Daniel Inführ              http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78693&p=914767&hilit=SlideArithm#p914767
Slide Arithmetic Inline            111.662891                    0       [0kb]       bzhi_u64, blsmsk_u64     no        Jakob Progsch and Daniel Inführ              http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78693&p=914767&hilit=SlideArithm#p914767
Kindergarten                       452.184844                    16640   [130kb]     imul64                   no        Urban Koistinen                              https://www.chessprogramming.org/Kindergarten_Bitboards
SISSY Bitboards                    196.301034                    180416  [1409kb]    none                     no        Michael Sherwin                              http://www.talkchess.com/forum3/viewtopic.php?f=7&t=73083
Kindergarten Super SISSY Bitboards 420.917347                    16640   [130kb]     imul64                   no        Michael Sherwin                              https://www.talkchess.com/forum3/viewtopic.php?f=7&t=81234&start=30
Fancy Magic BB - Variable shift    391.531947                    93376   [729kb]     imul64                   yes       Pradu Kannan                                 https://www.chessprogramming.org/Magic_Bitboards#Fancy
FoldingHash - 4x fancy magic       217.259060                    6468    [50kb]      none                     no        Daniel Inführ                                tbd
Plain Magic BB                     520.643515                    295168  [2306kb]    imul64                   no        Lasse Hansen                                 https://www.chessprogramming.org/Magic_Bitboards#Plain
Black Magic BB - Fixed shift       537.245916                    88891   [694kb]     imul64                   no        Onno Garms and Volker Annuss                 https://www.chessprogramming.org/Magic_Bitboards#Fixed_shift_Fancy
Pext constexpr                     770.780235                    107904  [843kb]     pext_u64                 yes       Zach Wegner                                  https://www.chessprogramming.org/BMI2#PEXTBitboards
Which is extremely well! Especially since this is a genuinely new algorithm and its lookup size is better or tied to both SISSY and Kindergarten boards. Essentially this is one of the very fastest algorithms (on my platform which is x86 - zen3)
This is a very productive year so far and in Jan 2023 we already have 2 new algorithms.
There were no rounds of optimisation on it yet so it looks very good to be optimized further on first glance.

As always: In real production engine code you have L2 and register pressure from other parts so to determine the best algorithm for your particular solution and platform its important to test.
Very cool stuff!
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
MOBMAT
Posts: 396
Joined: Sat Feb 04, 2017 11:57 pm
Location: USA

Re: Comparison of all known Sliding lookup algorithms

Post by MOBMAT »

in case you are still collecting data....

Code: Select all

Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
Megalooks Random Positions/s:
Exploading: 	96.80MOps	6 kB	Optimal perf: imul64
Reference: 	68.02MOps	8 kB	Optimal perf: none
KoggeStone: 	72.40MOps	0 kB	Optimal perf: none
RotatedBoard: 	121.11MOps	14 kB	Optimal perf: none
QBB Algo: 	115.08MOps	0 kB	Optimal perf: countr_zero, countl_zero
BobMike: 	157.78MOps	8 kB	Optimal perf: countr_zero, countl_zero
Obstr. Diff: 	161.61MOps	6 kB	Optimal perf: countl_zero
Obstr. Inline: 	105.25MOps	0 kB	Optimal perf: countl_zero
SlideArithm: 	160.35MOps	2 kB	Optimal perf: bzhi_u64, blsmsk_u64
SISSY BB: 	111.78MOps	1409 kB	Optimal perf: none
Hyperbola Qsc: 	177.06MOps	2 kB	Optimal perf: bswap
Hash Variable: 	338.37MOps	729 kB	Optimal perf: imul64
Hash Plain: 	414.60MOps	2306 kB	Optimal perf: imul64
Hash Fancy: 	456.21MOps	694 kB	Optimal perf: imul64
Pext  : 	629.59MOps	843 kB	Optimal perf: pext_u64
HyperCube: 	196.17MOps	841 kB	Optimal perf: none
SHIFTED BITBOARDS....are they represented?
I looked through the move gen comparison code but didn't find this method of generating moves. Is it there?

A little background....

I implemented a chess engine a number of years ago and used the sliding move algorithm as described by an article describing how it was done in the NagaSkaki engine.

https://mayothi.com/nagaskakichess6.html

I understood the concept and used it. While debugging the code one day, I realized that the speed could be increased by applying a condition prior to do all the shifts as it is not necessary when the blocker (see code below) value is zero. This spead up the code by 10% or so. I then found the Wiki page on shifted bitboards.

https://www.chessprogramming.org/Shifted_Bitboards

which had incorporated the improvement. Yay, I was right. here is the code snippet for one ray.

Code: Select all

U64 rayAttacks[8][64]; // requires initialization

U64 getRightRayAttacks(U64 occupied, enumSquare square) {
   U64 attacks = rayAttacks[right][square];
   U64 blocker = attacks & occupied;
   if ( blocker ) {
      blocker = (blocker << 1) | (blocker << 2)
              | (blocker << 3) | (blocker << 4)
              | (blocker << 5) | (blocker << 6);
      attacks ^= (blocker & attacks);
   }
   return attacks;
}
This is the code from the Wiki page, I implemented it slightly different by having separate rayAttack arrays for each direction. Since each ray requires unique shift values, it was more readable to explicitly reference each ray attack arrays by name.

Recently while looking at this code again, I think I found another way to improve the speed. Instead of doing all the shifts to set bits, the same thing can be accomplished by finding the first occurrence of a set bit in the direction of the ray (Firstbit) and then OR'ing in the rayAttack[] for the square indicated by Firstbit. I haven't touched my engine code in some time and I can't spend the time right now to test it, but If someone already has Shifted Bitboard code in place, perhaps you could try it. I would think one table entry lookup would be faster than all the shifts.
i7-6700K @ 4.00Ghz 32Gb, Win 10 Home, EGTBs on PCI SSD
Benchmark: Stockfish15.1 NNUE x64 bmi2 (nps): 1277K
MOBMAT
Posts: 396
Joined: Sat Feb 04, 2017 11:57 pm
Location: USA

Re: Comparison of all known Sliding lookup algorithms

Post by MOBMAT »

I tried the Cuda_compare app on my windows 10 PC and all it displayed was the model of my GPU which is a NVIDIA GeForce GTX 1080.

Is it to old to take advantage of the CUDA features?

Vince
i7-6700K @ 4.00Ghz 32Gb, Win 10 Home, EGTBs on PCI SSD
Benchmark: Stockfish15.1 NNUE x64 bmi2 (nps): 1277K
Mike Sherwin
Posts: 965
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

Re: Comparison of all known Sliding lookup algorithms

Post by Mike Sherwin »

dangi12012 wrote: Sat Jan 21, 2023 7:58 pm New Algorithm Found: Kindergarten Super SISSY Bitboards

A very big congraulation to our friend - Mike Sherwin.

Core algorihtm:

Code: Select all

	static uint64_t Bishop(int sq, uint64_t occ)
	{
		uint64_t dBlockers = occ & dMask[sq];
		uint64_t aBlockers = occ & aMask[sq];
		uint64_t dIndex = ((dBlockers * 0x0202020202020202ull) >> 58);
		uint64_t aIndex = ((aBlockers * 0x0202020202020202ull) >> 58);
		return (dSubset[sq][dIndex] | aSubset[sq][aIndex]) & ~(1ull << sq);
	}
https://github.com/Gigantua/Chess_Moveg ... yBoard.hpp
https://github.com/Gigantua/Chess_Moveg ... garten.hpp
https://github.com/Gigantua/Chess_Moveg ... /Sissy.hpp

How does it fare against similar algorithms:

Code: Select all

Slide Arithmetic                   253.626650                    256     [2kb]       bzhi_u64, blsmsk_u64     no        Jakob Progsch and Daniel Inführ              http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78693&p=914767&hilit=SlideArithm#p914767
Slide Arithmetic Inline            111.662891                    0       [0kb]       bzhi_u64, blsmsk_u64     no        Jakob Progsch and Daniel Inführ              http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78693&p=914767&hilit=SlideArithm#p914767
Kindergarten                       452.184844                    16640   [130kb]     imul64                   no        Urban Koistinen                              https://www.chessprogramming.org/Kindergarten_Bitboards
SISSY Bitboards                    196.301034                    180416  [1409kb]    none                     no        Michael Sherwin                              http://www.talkchess.com/forum3/viewtopic.php?f=7&t=73083
Kindergarten Super SISSY Bitboards 420.917347                    16640   [130kb]     imul64                   no        Michael Sherwin                              https://www.talkchess.com/forum3/viewtopic.php?f=7&t=81234&start=30
Fancy Magic BB - Variable shift    391.531947                    93376   [729kb]     imul64                   yes       Pradu Kannan                                 https://www.chessprogramming.org/Magic_Bitboards#Fancy
FoldingHash - 4x fancy magic       217.259060                    6468    [50kb]      none                     no        Daniel Inführ                                tbd
Plain Magic BB                     520.643515                    295168  [2306kb]    imul64                   no        Lasse Hansen                                 https://www.chessprogramming.org/Magic_Bitboards#Plain
Black Magic BB - Fixed shift       537.245916                    88891   [694kb]     imul64                   no        Onno Garms and Volker Annuss                 https://www.chessprogramming.org/Magic_Bitboards#Fixed_shift_Fancy
Pext constexpr                     770.780235                    107904  [843kb]     pext_u64                 yes       Zach Wegner                                  https://www.chessprogramming.org/BMI2#PEXTBitboards
Which is extremely well! Especially since this is a genuinely new algorithm and its lookup size is better or tied to both SISSY and Kindergarten boards. Essentially this is one of the very fastest algorithms (on my platform which is x86 - zen3)
This is a very productive year so far and in Jan 2023 we already have 2 new algorithms.
There were no rounds of optimisation on it yet so it looks very good to be optimized further on first glance.

As always: In real production engine code you have L2 and register pressure from other parts so to determine the best algorithm for your particular solution and platform its important to test.
Very cool stuff!
Thanks Daniel for including Kindergarten Super Split Index Sub-Set Yielding Bitboards. On my Ryzen 9 3950x KSSISSY Bitboards actually came ahead of Kindergarten Bitboards 330.7 to 313.3. Wow! And that is without any optimization that is still to be done. My goal was to combine the best of Kindergarten with the best of SISSY to create a sliding piece generator that is faster than either.

Code: Select all

Microsoft Windows [Version 10.0.19044.2486]

C:\Users\mjshe\source\repos\Chess_Movegen-main\x64\Release>Movegen_Compare
Verify Engines...OK!

AMD Ryzen 9 3950X 16-Core Processor

Million Lookups/s Random Squares, Random Occupation/s:
Name                               Performance [MQueens/s]       Tablesize           Dependencies             Template  Author                                       Reference
SBAMG o^(o-3cbn)                   185.814316                    576     [4kb]       countl_zero, bswap       yes       Syed Fahad                                   http://www.talkchess.com/forum3/viewtopic.php?t=59845
SBAMG Inline                       119.276919                    0       [0kb]       countl_zero, bswap       yes       Syed Fahad and Daniel Inführ                 http://www.talkchess.com/forum3/viewtopic.php?t=59845
GaloisField - AVX512               4.978956                      0       [0kb]       AVX512F_GFNI             no        Daniel Inführ (dangi12012)                   http://www.talkchess.com/forum3/viewtopic.php?f=7&t=81335
Hyperbola Quintessence o^(o-2r)    233.537094                    256     [2kb]       bswap                    no        Ryan Mack                                    https://www.chessprogramming.org/Hyperbola_Quintessence
Hyperbola Quintessence Inline      75.355508                     0       [0kb]       bswap                    yes       Ryan Mack                                    https://www.chessprogramming.org/Hyperbola_Quintessence
Genetic 8 Ray                      31.337153                     0       [0kb]       bswap                    no        Daniel Inführ (dangi12012)                   Abstract C++ Syntax Tree Sifter (c) Daniel Infuehr
Bitrotation                        28.589173                     0       [0kb]       ReverseBits              no        Daniel Inführ (dangi12012)                   http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79078&start=20
Binary Neural Network              13.306265                     5852    [45kb]      pdep_u64, AVX2           no        Daniel Inführ (dangi12012)                   http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79332
Exploding Bitboards                50.149227                     768     [6kb]       imul64                   no        Harald Lüßen                                  http://www.open-aurec.com/wbforum/viewtopic.php?f=4&t=4523&start=80
Reference (Switch Lookup)          26.109347                     0       [0kb]       none                     yes       Daniel Inführ (dangi12012)                   http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78235&p=907362&hilit=espresso#p907362
AVX Branchless Shift               104.793444                    0       [0kb]       AVX2                     no        Daniel Inführ (dangi12012)                   http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79005&start=60
Pext Emulated                      47.447844                     107904  [843kb]     none                     no        Zach Wegner                                  https://randombit.net/bitbashing/posts/haswell_bit_permutations.html
Dumb7 Fill                         49.996975                     0       [0kb]       none                     no        Gunnar Andersson                             https://www.chessprogramming.org/Dumb7Fill
Kogge-Stone                        91.967173                     0       [0kb]       none                     no        Peter M. Kogge, Harold S. Stone              https://www.chessprogramming.org/Kogge-Stone_Algorithm
Rotated Bitboards                  35.294128                     1848    [14kb]      none                     no        Robert Hyatt                                 https://www.chessprogramming.org/Rotated_Bitboards
QBBEngine                          151.972066                    0       [0kb]       countr_zero, countl_zero yes       Fabio Gobbato                                https://www.chessprogramming.org/QBBEngine
QBBEngine - Shifted Mask           151.707237                    0       [0kb]       countr_zero, countl_zero no        Fabio Gobbato                                http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79005&start=90#p924623
Classical Bob-Mike                 162.468393                    1024    [8kb]       countr_zero, countl_zero yes       Robert Hyatt and Michael Sherwin             https://www.chessprogramming.org/Classical_Approach
Advanced Bob-Mike                  177.823511                    520     [4kb]       countr_zero, countl_zero no        Michael Sherwin and Daniel Inführ            http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79078&start=50#p924653
Leorik                             187.654933                    128     [1kb]       countl_zero              no        Thomas Jahn (lithander)                      https://github.com/lithander/MinimalChessEngine
Leorik Inline                      84.522435                     0       [0kb]       countl_zero              no        Thomas Jahn (lithander)                      https://github.com/lithander/MinimalChessEngine
Obstruction Difference             173.490832                    768     [6kb]       countl_zero              no        Michael Hoffmann                             http://www.talkchess.com/forum3/viewtopic.php?t=29087
Obstruction Difference Inline      74.818061                     0       [0kb]       countl_zero              yes       Michael Hoffmann                             http://www.talkchess.com/forum3/viewtopic.php?t=29087
Genetic Obstruction Difference     203.630736                    384     [3kb]       countl_zero              no        Daniel Inführ and Michael Hoffmann           http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79701
Genetic Obstruction Difference V2  231.707289                    768     [6kb]       countl_zero              no        Daniel Inführ                                http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79701
Slide Arithmetic                   191.577916                    256     [2kb]       bzhi_u64, blsmsk_u64     no        Jakob Progsch and Daniel Inführ              http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78693&p=914767&hilit=SlideArithm#p914767
Slide Arithmetic Inline            82.584805                     0       [0kb]       bzhi_u64, blsmsk_u64     no        Jakob Progsch and Daniel Inführ              http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78693&p=914767&hilit=SlideArithm#p914767
Kindergarten                       313.312327                    16640   [130kb]     imul64                   no        Urban Koistinen                              https://www.chessprogramming.org/Kindergarten_Bitboards
SISSY Bitboards                    160.812640                    180416  [1409kb]    none                     no        Michael Sherwin                              http://www.talkchess.com/forum3/viewtopic.php?f=7&t=73083
Kindergarten Super SISSY Bitboards 330.705693                    16640   [130kb]     imul64                   no        Michael Sherwin                              https://www.talkchess.com/forum3/viewtopic.php?f=7&t=81234&start=30
Fancy Magic BB - Variable shift    270.239473                    93376   [729kb]     imul64                   yes       Pradu Kannan                                 https://www.chessprogramming.org/Magic_Bitboards#Fancy
FoldingHash - 4x fancy magic       171.511485                    6468    [50kb]      none                     no        Daniel Inführ                                tbd
Plain Magic BB                     373.323776                    295168  [2306kb]    imul64                   no        Lasse Hansen                                 https://www.chessprogramming.org/Magic_Bitboards#Plain
Black Magic BB - Fixed shift       447.642602                    88891   [694kb]     imul64                   no        Onno Garms and Volker Annuss                 https://www.chessprogramming.org/Magic_Bitboards#Fixed_shift_Fancy
Pext constexpr                     38.538225                     107904  [843kb]     pext_u64                 yes       Zach Wegner                                  https://www.chessprogramming.org/BMI2#PEXTBitboards
HyperCube                          51.757490                     107680  [841kb]     none                     yes       Daniel Inführ (dangi12012)                   http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79004&p=916723&hilit=hypercube#p916723


PLEASE COPY PASTE ABOVE OUTPUT and share here:
 http://www.talkchess.com/forum3/posting.php?mode=reply&f=7&t=79005
C:\Users\mjshe\source\repos\Chess_Movegen-main\x64\Release>
Mike Sherwin
Posts: 965
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

Re: Comparison of all known Sliding lookup algorithms

Post by Mike Sherwin »

Result of two optimizations for KSSISSY Bitboards.

1) Created a struct for the main look-up tables.

Code: Select all

	struct SubSets {
	  uint64_t vSubset[64][64];
	  uint64_t hSubset[64][64];
	  uint64_t dSubset[64][64];
	  uint64_t aSubset[64][64];
	} ss;
Gain of about 3 mqps.

2)Removed the addition of & ~(1ull << sq). Not needed.
return (ss.dSubset[sq][dIndex] | ss.aSubset[sq][aIndex]) & ~(1ull << sq);
to
return (ss.dSubset[sq][dIndex] | ss.aSubset[sq][aIndex]);
Gain of about 38 mqps.

More optimizations coming soon. :D

Kindergarten 346.258390 16640 [130kb] imul64 no Urban Koistinen https://www.chessprogramming.org/Kindergarten_Bitboards
Kindergarten Super SISSY Bitboards 368.475904 16640 [130kb] imul64 no Michael Sherwin https://www.talkchess.com/forum3/viewto ... 4&start=30
Mike Sherwin
Posts: 965
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

Re: Comparison of all known Sliding lookup algorithms

Post by Mike Sherwin »

Another successful optimization!

Got rid of the hMask[] look-up for the rook.

Code: Select all

	static uint64_t Rook(int sq, uint64_t occ)
	{
		uint64_t vBlockers = (occ >> (sq & 7)) & 0x0101010101010101ull;
		uint64_t vIndex = ((vBlockers * 0x0080402010080400ull) >> 58);
		uint64_t hIndex = (occ >> (sq & 0b11111000u) + 1) & 0x000000000000003f;
		return (ss.vSubset[sq][vIndex] | ss.hSubset[sq][hIndex]);
	}

Code: Select all

Kindergarten Super SISSY Bitboards 381.401345                    16640   [130kb]     imul64                   no        Michael Sherwin                              https://www.talkchess.com/forum3/viewtopic.php?f=7&t=81234&start=30
Plain Magic BB                     432.728403                    295168  [2306kb]    imul64                   no        Lasse Hansen                                 https://www.chessprogramming.org/Magic_Bitboards#Plain
Black Magic BB - Fixed shift       392.721303                    88891   [694kb]     imul64                   no        Onno Garms and Volker Annuss
I'm out of ideas for further optimizations at the moment. Hopefully there are one or two more optimizations possible?
Very, very close to catching Black Magic BB! :D
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Comparison of all known Sliding lookup algorithms

Post by dangi12012 »

Another idea:
(No computer to verify)

Code: Select all

struct SubSets {
	  uint64_t vSubset[64];
	  uint64_t hSubset[64];
	  uint64_t dSubset[64];
	  uint64_t aSubset[64];
	} ss[64]
Cache locality should help.

Just look where sq is used. Ideally sq should be used only once to lookup a struct where all masks and pointers are stored that are needed. This enables avx2 and other tricks

I gotta do the comparison with clang15 the differences to msvc are often huge.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Mike Sherwin
Posts: 965
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

Re: Comparison of all known Sliding lookup algorithms

Post by Mike Sherwin »

dangi12012 wrote: Sun Jan 22, 2023 9:29 pm Another idea:
(No computer to verify)

Code: Select all

struct SubSets {
	  uint64_t vSubset[64];
	  uint64_t hSubset[64];
	  uint64_t dSubset[64];
	  uint64_t aSubset[64];
	} ss[64]
Cache locality should help.

Just look where sq is used. Ideally sq should be used only once to lookup a struct where all masks and pointers are stored that are needed. This enables avx2 and other tricks

I gotta do the comparison with clang15 the differences to msvc are often huge.
I tried it but using MSVC 2022 it reported (minus)17 mqps. :(

On a brighter note rearranging the bishop code gained 2.2 mqps. That helped MSVC 2022 but it might not help clang. I also rearranged the rook code to be consistent with no change.

381.2 -> 383.4 :D

Code: Select all

	static uint64_t Bishop(int sq, uint64_t occ)
	{
		return 
		  ss.dSubset[sq][(((occ & dMask[sq]) * 0x0202020202020202ull) >> 58)] + 
		  ss.aSubset[sq][(((occ & aMask[sq]) * 0x0202020202020202ull) >> 58)];
	}

	static uint64_t Rook(int sq, uint64_t occ)
	{
		uint64_t hIndex = (occ >> (sq & 0b11111000u) + 1) & 0x000000000000003f;
		uint64_t vIndex = ((((occ >> (sq & 7)) & 0x0101010101010101ull) * 0x0080402010080400ull) >> 58);
		return ss.vSubset[sq][vIndex] | ss.hSubset[sq][hIndex];
	}