Comparison of all known Sliding lookup algorithms

Discussion of chess software programming and technical issues.

Moderator: Ras

dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Comparison of all known Sliding lookup algorithms

Post by dangi12012 »

lithander wrote: Wed Feb 09, 2022 1:48 pm Neural networks are exciting, I get it. But why are you not trying to apply machine learning approaches to topics where there's potential benefit to real chess engines? Like the evaluation or move ordering or maybe even reductions and extensions? In all these situations all bits of gamestate are potentially relevant and exactly in what way the input correlates with the output is hard to define so it seems like a more promising area to try and apply machine learning approaches.

...but maybe I just don't get what the real world usecase for Binary Neural Networks Sliding Inference is. Is there one?
There are multiple reasons: First of all the biggest reason is the above post - get access to something that is literally 3 orders of magnitude faster than my 16 Core Ryzen CPU. That is exciting in itself and could potentially be the fastest movegen yet (for nvidia).
Second: You can have the same backend for movegeneration and moveevaluation that improves locality. Why even write any code at all for ches?
Third: There is no need to maintain a mailbox for Neural networks!!!

Fourth: This is infinitely malluable and binary networks are underused or not used at all in chess which is an oversight imo - the dotproduct for any type is defined as this: x0*y0+xn*yn. With binary networks its this: popcnt(x^y). 4 popcnts can be done in ONE clockcycle. (very cheap). 64 additions are more expensive (take 2 cycle of avx2 at least and still need 4 more shuffle operations to reduce back to a single number)

Best reason:
The native language of the first layer of any network in chess machine learning code can go from char[64] to uint64_t which is an 8x improvement in memory and the operations to infer the output are also cheaper! (See above!)

Why movegen:
Its an obvious target to see how fast it can be made and if learning can achieve 100.00% accuracy. I have definitely proven that 100% accuracy can be achieved.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
User avatar
lithander
Posts: 915
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Comparison of all known Sliding lookup algorithms

Post by lithander »

Okay, that's unfamiliar territory for me but I begin to understand the big picture idea.

However, regarding the details...
dangi12012 wrote: Wed Feb 09, 2022 6:48 pm the dotproduct for any type is defined as this: x0*y0+xn*yn. With binary networks its this: popcnt(x^y). 4 popcnts can be done in ONE clockcycle. (very cheap).
Why isn't it popcnt(x&y)? You only want to count a 1 where both xn and yn happen to be 1? Thats AND not XOR?
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Comparison of all known Sliding lookup algorithms

Post by dangi12012 »

Update!

New Algo!
AVX Branchless Shift
I was getting tired of the lack of AVX2 so I sat down and mixed together Kogge Stone - Dumb7 Fill and NO HEADACHES algo to create this branchless version of what normally is a conditional loop:

You can see its dumb7 fill like but does not infer 4 sliders at once. It infers 1 slider at a time - but in 4 directions at once.
Image
Performance is great because it beats QBBEngine by a small amount! :)

Update2:
Neural Network speed improved by 3x.
This was made possible by training against the 16 bits of pext and not against the occupancy itself.
Training code is here If you run it - it should produce most C++ code needed: https://github.com/Gigantua/Chess_BinaryNeuralNetwork


This is the performance for truly sampling random positions:

Code: Select all

Million Lookups/s Random Squares, Random Occupation/s:
Name                               Performance [MQueens/s]       Tablesize           Dependencies             Template  Author                                       Reference
Binary Neural Network              52.163979                     5852    [45kb]      pdep_u64, AVX2           no        Daniel Inführ (dangi12012)                   Not released yet
Exploding Bitboards                64.496087                     768     [6kb]       imul64                   no        Harald Lüßen                                  http://www.open-aurec.com/wbforum/viewtopic.php?f=4&t=4523&start=80
Reference (Switch Lookup)          35.547266                     0       [0kb]       none                     yes       Daniel Inführ (dangi12012)                   http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78235&p=907362&hilit=espresso#p907362
AVX Branchless Shift               185.208336                    0       [0kb]       none                     no        Daniel Inführ (dangi12012)
Pext Emulated                      66.814698                     107904  [843kb]     none                     no        Zach Wegner                                  https://randombit.net/bitbashing/posts/haswell_bit_permutations.html
Dumb7 Fill                         68.671650                     0       [0kb]       none                     no        Gunnar Andersson                             https://www.chessprogramming.org/Dumb7Fill
Kogge-Stone                        111.041916                    0       [0kb]       none                     no        Peter M. Kogge, Harold S. Stone              https://www.chessprogramming.org/Kogge-Stone_Algorithm
Rotated Bitboards                  46.262581                     1848    [14kb]      none                     no        Robert Hyatt                                 https://www.chessprogramming.org/Rotated_Bitboards
QBBEngine                          176.931912                    0       [0kb]       countr_zero, countl_zero yes       Fabio Gobbato                                https://www.chessprogramming.org/QBBEngine
Classical Bob-Mike                 196.873001                    1024    [8kb]       countr_zero, countl_zero yes       Robert Hyatt and Michael Sherwin             https://www.chessprogramming.org/Classical_Approach
Leorik                             216.790810                    128     [1kb]       countl_zero              no        Thomas Jahn (lithander)                      https://github.com/lithander/MinimalChessEngine
Leorik Inline                      103.871642                    0       [0kb]       countl_zero              no        Thomas Jahn (lithander)                      https://github.com/lithander/MinimalChessEngine
Obstruction Difference             241.152710                    768     [6kb]       countl_zero              no        Michael Hoffmann                             http://www.talkchess.com/forum3/viewtopic.php?t=29087
Obstruction Difference Inline      90.061827                     0       [0kb]       countl_zero              yes       Michael Hoffmann                             http://www.talkchess.com/forum3/viewtopic.php?t=29087
Slide Arithmetic                   242.023315                    256     [2kb]       bzhi_u64, blsmsk_u64     no        Jakob Progsch and Daniel Inführ              http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78693&p=914767&hilit=SlideArithm#p914767
Slide Arithmetic Inline            105.799674                    0       [0kb]       bzhi_u64, blsmsk_u64     no        Jakob Progsch and Daniel Inführ              http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78693&p=914767&hilit=SlideArithm#p914767
SBAMG o^(o-3cbn)                   235.724430                    576     [4kb]       countl_zero, bswap       yes       Syed Fahad                                   http://www.talkchess.com/forum3/viewtopic.php?t=59845
SBAMG Inline                       93.100479                     0       [0kb]       countl_zero, bswap       yes       Syed Fahad and Daniel Inführ                 http://www.talkchess.com/forum3/viewtopic.php?t=59845
Hyperbola Quintessence o^(o-2r)    293.503304                    256     [2kb]       bswap                    no        Ryan Mack                                    https://www.chessprogramming.org/Hyperbola_Quintessence
Hyperbola Quintessence Inline      93.087840                     0       [0kb]       bswap                    yes       Ryan Mack                                    https://www.chessprogramming.org/Hyperbola_Quintessence
Kindergarten                       446.025357                    16640   [130kb]     imul64                   no        Urban Koistinen                              https://www.chessprogramming.org/Kindergarten_Bitboards
SISSY Bitboards                    201.621711                    180416  [1409kb]    none                     no        Michael Sherwin                              http://www.talkchess.com/forum3/viewtopic.php?f=7&t=73083
Fancy Magic BB - Variable shift    381.589575                    93376   [729kb]     imul64                   yes       Pradu Kannan                                 https://www.chessprogramming.org/Magic_Bitboards#Fancy
Plain Magic BB                     478.667390                    295168  [2306kb]    imul64                   no        Lasse Hansen                                 https://www.chessprogramming.org/Magic_Bitboards#Plain
Black Magic BB - Fixed shift       522.343232                    88891   [694kb]     imul64                   no        Onno Garms and Volker Annuss                 https://www.chessprogramming.org/Magic_Bitboards#Fixed_shift_Fancy
Pext constexpr                     742.179286                    107904  [843kb]     pext_u64                 yes       Zach Wegner                                  https://www.chessprogramming.org/BMI2#PEXTBitboards
HyperCube                          69.705720                     107680  [841kb]     none                     yes       Daniel Inführ (dangi12012)                   http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79004&p=916723&hilit=hypercube#p916723
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
User avatar
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Comparison of all known Sliding lookup algorithms

Post by Desperado »

dangi12012 wrote: Thu Feb 10, 2022 10:15 pm Update!

New Algo!
AVX Branchless Shift
I was getting tired of the lack of AVX2 so I sat down and mixed together Kogge Stone - Dumb7 Fill and NO HEADACHES algo to create this branchless version of what normally is a conditional loop:

You can see its dumb7 fill like but does not infer 4 sliders at once. It infers 1 slider at a time - but in 4 directions at once.
Image
Performance is great because it beats QBBEngine by a small amount! :)

Update2:
Neural Network speed improved by 3x.
This was made possible by training against the 16 bits of pext and not against the occupancy itself.
Training code is here If you run it - it should produce most C++ code needed: https://github.com/Gigantua/Chess_BinaryNeuralNetwork


This is the performance for truly sampling random positions:

Code: Select all

Million Lookups/s Random Squares, Random Occupation/s:
Name                               Performance [MQueens/s]       Tablesize           Dependencies             Template  Author                                       Reference
Binary Neural Network              52.163979                     5852    [45kb]      pdep_u64, AVX2           no        Daniel Inführ (dangi12012)                   Not released yet
Exploding Bitboards                64.496087                     768     [6kb]       imul64                   no        Harald Lüßen                                  http://www.open-aurec.com/wbforum/viewtopic.php?f=4&t=4523&start=80
Reference (Switch Lookup)          35.547266                     0       [0kb]       none                     yes       Daniel Inführ (dangi12012)                   http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78235&p=907362&hilit=espresso#p907362
AVX Branchless Shift               185.208336                    0       [0kb]       none                     no        Daniel Inführ (dangi12012)
Pext Emulated                      66.814698                     107904  [843kb]     none                     no        Zach Wegner                                  https://randombit.net/bitbashing/posts/haswell_bit_permutations.html
Dumb7 Fill                         68.671650                     0       [0kb]       none                     no        Gunnar Andersson                             https://www.chessprogramming.org/Dumb7Fill
Kogge-Stone                        111.041916                    0       [0kb]       none                     no        Peter M. Kogge, Harold S. Stone              https://www.chessprogramming.org/Kogge-Stone_Algorithm
Rotated Bitboards                  46.262581                     1848    [14kb]      none                     no        Robert Hyatt                                 https://www.chessprogramming.org/Rotated_Bitboards
QBBEngine                          176.931912                    0       [0kb]       countr_zero, countl_zero yes       Fabio Gobbato                                https://www.chessprogramming.org/QBBEngine
Classical Bob-Mike                 196.873001                    1024    [8kb]       countr_zero, countl_zero yes       Robert Hyatt and Michael Sherwin             https://www.chessprogramming.org/Classical_Approach
Leorik                             216.790810                    128     [1kb]       countl_zero              no        Thomas Jahn (lithander)                      https://github.com/lithander/MinimalChessEngine
Leorik Inline                      103.871642                    0       [0kb]       countl_zero              no        Thomas Jahn (lithander)                      https://github.com/lithander/MinimalChessEngine
Obstruction Difference             241.152710                    768     [6kb]       countl_zero              no        Michael Hoffmann                             http://www.talkchess.com/forum3/viewtopic.php?t=29087
Obstruction Difference Inline      90.061827                     0       [0kb]       countl_zero              yes       Michael Hoffmann                             http://www.talkchess.com/forum3/viewtopic.php?t=29087
Slide Arithmetic                   242.023315                    256     [2kb]       bzhi_u64, blsmsk_u64     no        Jakob Progsch and Daniel Inführ              http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78693&p=914767&hilit=SlideArithm#p914767
Slide Arithmetic Inline            105.799674                    0       [0kb]       bzhi_u64, blsmsk_u64     no        Jakob Progsch and Daniel Inführ              http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78693&p=914767&hilit=SlideArithm#p914767
SBAMG o^(o-3cbn)                   235.724430                    576     [4kb]       countl_zero, bswap       yes       Syed Fahad                                   http://www.talkchess.com/forum3/viewtopic.php?t=59845
SBAMG Inline                       93.100479                     0       [0kb]       countl_zero, bswap       yes       Syed Fahad and Daniel Inführ                 http://www.talkchess.com/forum3/viewtopic.php?t=59845
Hyperbola Quintessence o^(o-2r)    293.503304                    256     [2kb]       bswap                    no        Ryan Mack                                    https://www.chessprogramming.org/Hyperbola_Quintessence
Hyperbola Quintessence Inline      93.087840                     0       [0kb]       bswap                    yes       Ryan Mack                                    https://www.chessprogramming.org/Hyperbola_Quintessence
Kindergarten                       446.025357                    16640   [130kb]     imul64                   no        Urban Koistinen                              https://www.chessprogramming.org/Kindergarten_Bitboards
SISSY Bitboards                    201.621711                    180416  [1409kb]    none                     no        Michael Sherwin                              http://www.talkchess.com/forum3/viewtopic.php?f=7&t=73083
Fancy Magic BB - Variable shift    381.589575                    93376   [729kb]     imul64                   yes       Pradu Kannan                                 https://www.chessprogramming.org/Magic_Bitboards#Fancy
Plain Magic BB                     478.667390                    295168  [2306kb]    imul64                   no        Lasse Hansen                                 https://www.chessprogramming.org/Magic_Bitboards#Plain
Black Magic BB - Fixed shift       522.343232                    88891   [694kb]     imul64                   no        Onno Garms and Volker Annuss                 https://www.chessprogramming.org/Magic_Bitboards#Fixed_shift_Fancy
Pext constexpr                     742.179286                    107904  [843kb]     pext_u64                 yes       Zach Wegner                                  https://www.chessprogramming.org/BMI2#PEXTBitboards
HyperCube                          69.705720                     107680  [841kb]     none                     yes       Daniel Inführ (dangi12012)                   http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79004&p=916723&hilit=hypercube#p916723
Well, in my opinion, the results over the chat look very inconsistent. On the one hand in the ranking of the algorithms, on the other hand the concrete numbers that are presented. Even on my computer, the ranking per measurement is not stable.

Sorry, if I ask a stupid question. But what is random sampling supposed to do here? From my point of view essential components for a comparison of the algorithms are missing. (assuming that I have interpreted the term correctly at this point).

Basically, you have a board state and an action that leads to a subsequent state. One does not jump through randomly possible states. In the performance context this can even mean that you never get to the next relevant state.

So, it is absolutely necessary to include an action like execute move in the performance comparison. With a uniform implementation, the algorithms remain comparable and the effort for the action is constant and does not affect the ranking.

A significant difference is that the performance of some algorithms changes relatively when the board states are connected by an action.

For example, if the target squares for a sliding piece do not change,
algorithms that do the computation on the fly, can access a cache and use the last computation. This benefits these algorithms more than algorithms such as Magics that only compute an index and perform a memory lookup.This means in practice and in ranking, the results shift in favor of direct computations. It also means that the optimization of the algorithms is far from being as complete as you presented it.

I actually find it interesting to see the comparison of the algorithms.
But so in the big picture of this thread, at least for me personally, the fundamentals for direct comparison are pretty skewed at this point.
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Comparison of all known Sliding lookup algorithms

Post by dangi12012 »

Desperado wrote: Fri Feb 11, 2022 8:50 pm Sorry, if I ask a stupid question. But what is random sampling supposed to do here? From my point of view essential components for a comparison of the algorithms are missing. (assuming that I have interpreted the term correctly at this point).

Basically, you have a board state and an action that leads to a subsequent state. One does not jump through randomly possible states. In the performance context this can even mean that you never get to the next relevant state.

So, it is absolutely necessary to include an action like execute move in the performance comparison. With a uniform implementation, the algorithms remain comparable and the effort for the action is constant and does not affect the ranking.

A significant difference is that the performance of some algorithms changes relatively when the board states are connected by an action.

For example, if the target squares for a sliding piece do not change,
algorithms that do the computation on the fly, can access a cache and use the last computation. This benefits these algorithms more than algorithms such as Magics that only compute an index and perform a memory lookup.This means in practice and in ranking, the results shift in favor of direct computations.

I actually find it interesting to see the comparison of the algorithms.
But so in the big picture of this thread, at least for me personally, the fundamentals for direct comparison are pretty skewed at this point.
Hello Desperado, Thanks for the question!
The big point I want to make is that no objective slider algorithm ranking can exist because the performance of these algorithms depend very much on the hardware and context (like you noted above). For example in a real engine there will be existing L2 cache pressure so that fancy magic may not perform as expected.

Biggest differences:
Hardware differences (intel, amd) - Ram speed and timings, DDR5 vs DDR4 vs DDR4
Compiler and flags (huge difference) (clang, gcc msvc)
Core clocks in the context AVX2 vs no AVX2
Mailslot vs Bitboard and general register pressure of other engine code (huge difference)

What I have created here is a dependency free and self encapsulated C++20 (constexpr compiletime initialized) code that you can test with a uniform interface in the context of your application easily by dropping in a single header. So the real ranking can only exist if you test perf where you would normally call these algos inside your own application. Please keep the header intact since this is mandated by the licence.

With these differences in mind I implemented these 3 Test modes:
Image

- Completely random sampling (bitboards)
- Known square sampling (mailslots with a 64xloop)
- Threaded environment random sampling

What I will implement is sorting this output by perf then the output will look cleaner.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
User avatar
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Comparison of all known Sliding lookup algorithms

Post by Desperado »

dangi12012 wrote: Fri Feb 11, 2022 9:07 pm
Desperado wrote: Fri Feb 11, 2022 8:50 pm Sorry, if I ask a stupid question. But what is random sampling supposed to do here? From my point of view essential components for a comparison of the algorithms are missing. (assuming that I have interpreted the term correctly at this point).

Basically, you have a board state and an action that leads to a subsequent state. One does not jump through randomly possible states. In the performance context this can even mean that you never get to the next relevant state.

So, it is absolutely necessary to include an action like execute move in the performance comparison. With a uniform implementation, the algorithms remain comparable and the effort for the action is constant and does not affect the ranking.

A significant difference is that the performance of some algorithms changes relatively when the board states are connected by an action.

For example, if the target squares for a sliding piece do not change,
algorithms that do the computation on the fly, can access a cache and use the last computation. This benefits these algorithms more than algorithms such as Magics that only compute an index and perform a memory lookup.This means in practice and in ranking, the results shift in favor of direct computations.

I actually find it interesting to see the comparison of the algorithms.
But so in the big picture of this thread, at least for me personally, the fundamentals for direct comparison are pretty skewed at this point.
Hello Desperado, Thanks for the question!
The big point I want to make is that no objective slider algorithm ranking can exist because the performance of these algorithms depend very much on the hardware and context (like you noted above). For example in a real engine there will be existing L2 cache pressure so that fancy magic may not perform as expected.

Biggest differences:
Hardware differences (intel, amd) - Ram speed and timings, DDR5 vs DDR4 vs DDR4
Compiler and flags (huge difference) (clang, gcc msvc)
Core clocks in the context AVX2 vs no AVX2
Mailslot vs Bitboard and general register pressure of other engine code (huge difference)

What I have created here is a dependency free and self encapsulated C++20 (constexpr compiletime initialized) code that you can test with a uniform interface in the context of your application easily by dropping in a single header. So the real ranking can only exist if you test perf where you would normally call these algos inside your own application. Please keep the header intact since this is mandated by the licence.

With these differences in mind I implemented these 3 Test modes:
Image

- Completely random sampling (bitboards)
- Known square sampling (mailslots with a 64xloop)
- Threaded environment random sampling

What I will implement is sorting this output by perf then the output will look cleaner.
Well, first make them consistent in ranking and in the performance output. Your reports vary a lot. The reasons are not only performance updates. There is somthing fishy. Just have a look in your reports, the relative rankings, the corresponding numbers or details (like thread counts).

I still say that random sampling does not provide any useful information, until you explain to me, what it is good for. (in this kind of perf. test)
What extra information do random bitboards provide? All extra pieces are useless and unconnected board states handicap some algorithms.
You might say that the handicap of one algorithm is the strength of another. No, because this kind of handicap you introduce does not exist in a real world application. I am not aware of a single application where this kind perfomance test is related to the real performance of the algo.

Well, to plug the algorithms into an existing framework is possible.
What most people are interested in, is the result of your passion, to compare the algos. So, present your efforts in a meaninful way.

All developers know how to classify the results and what to consider in hardware, software and specifics of an algorithm in the result. If not, one or two meaningful comments usually help. Nevertheless. The algorithms should be presented by a state-action pair. Perft with bulk counting then puts the priority on generation instead of move execution. So you get a very good and general impression about the relative ranking. Everybody is aware that this will be an upperbound performance compared to a real chess engine.

Regards.
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Comparison of all known Sliding lookup algorithms

Post by dangi12012 »

Desperado wrote: Fri Feb 11, 2022 10:12 pm Well, first make them consistent in ranking and in the performance output. Your reports vary a lot. The reasons are not only performance updates. There is somthing fishy. Just have a look in your reports, the relative rankings, the corresponding numbers or details (like thread counts).

I still say that random sampling does not provide any useful information, until you explain to me, what it is good for. (in this kind of perf. test)
What extra information do random bitboards provide? All extra pieces are useless and unconnected board states handicap some algorithms.
You might say that the handicap of one algorithm is the strength of another. No, because this kind of handicap you introduce does not exist in a real world application. I am not aware of a single application where this kind perfomance test is related to the real performance of the algo.
Regards.
Yes you are correct. I mixed up the CLANG and MSVC outputs there. And this is my point - just by switching compilers the performance ranking is completely changed. Same thing will happen when going AMD to Intel. So take it with a grain of salt and here are the outputs that are consistent with previous postings.

If you have ideas how to improve the testing framework - pull request are welcome.
Also if you share your outputs that would be great too!

Perft is not an engine - and gives the sense that this ranking is meaningful in itself - which it is -
But only in the context of the code you intend to run it in!

Also the summary is really simple: If you can use PEXT - do it. If not - profile you app.
If you intend to write GPGPU - wait for my upcoming work!

Code: Select all

AMD Ryzen 9 5950X 16-Core Processor

Million Lookups/s Known Squares, Random Occupation/s:
Name                               Performance [MQueens/s]       Tablesize           Dependencies             Template  Author                                       Reference
Binary Neural Network              48.023759                     5852    [45kb]      pdep_u64, AVX2           no        Daniel Inführ (dangi12012)                   http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79332
Exploding Bitboards                128.830259                    768     [6kb]       imul64                   no        Harald Lüßen                                  http://www.open-aurec.com/wbforum/viewtopic.php?f=4&t=4523&start=80
Reference (Switch Lookup)          170.292341                    0       [0kb]       none                     yes       Daniel Inführ (dangi12012)                   http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78235&p=907362&hilit=espresso#p907362
AVX Branchless Shift               166.325569                    0       [0kb]       AVX2                     no        Daniel Inführ (dangi12012)                   http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79005&start=60
Pext Emulated                      83.768208                     107904  [843kb]     none                     no        Zach Wegner                                  https://randombit.net/bitbashing/posts/haswell_bit_permutations.html
Dumb7 Fill                         94.121495                     0       [0kb]       none                     no        Gunnar Andersson                             https://www.chessprogramming.org/Dumb7Fill
Kogge-Stone                        152.343302                    0       [0kb]       none                     no        Peter M. Kogge, Harold S. Stone              https://www.chessprogramming.org/Kogge-Stone_Algorithm
Rotated Bitboards                  49.943653                     1848    [14kb]      none                     no        Robert Hyatt                                 https://www.chessprogramming.org/Rotated_Bitboards
QBBEngine                          195.516860                    0       [0kb]       countr_zero, countl_zero yes       Fabio Gobbato                                https://www.chessprogramming.org/QBBEngine
Classical Bob-Mike                 221.942174                    1024    [8kb]       countr_zero, countl_zero yes       Robert Hyatt and Michael Sherwin             https://www.chessprogramming.org/Classical_Approach
Leorik                             195.065454                    128     [1kb]       countl_zero              no        Thomas Jahn (lithander)                      https://github.com/lithander/MinimalChessEngine
Leorik Inline                      265.642063                    0       [0kb]       countl_zero              no        Thomas Jahn (lithander)                      https://github.com/lithander/MinimalChessEngine
Obstruction Difference             219.632164                    768     [6kb]       countl_zero              no        Michael Hoffmann                             http://www.talkchess.com/forum3/viewtopic.php?t=29087
Obstruction Difference Inline      483.565205                    0       [0kb]       countl_zero              yes       Michael Hoffmann                             http://www.talkchess.com/forum3/viewtopic.php?t=29087
Slide Arithmetic                   223.454481                    256     [2kb]       bzhi_u64, blsmsk_u64     no        Jakob Progsch and Daniel Inführ              http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78693&p=914767&hilit=SlideArithm#p914767
Slide Arithmetic Inline            256.748370                    0       [0kb]       bzhi_u64, blsmsk_u64     no        Jakob Progsch and Daniel Inführ              http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78693&p=914767&hilit=SlideArithm#p914767
SBAMG o^(o-3cbn)                   264.806950                    576     [4kb]       countl_zero, bswap       yes       Syed Fahad                                   http://www.talkchess.com/forum3/viewtopic.php?t=59845
SBAMG Inline                       324.539167                    0       [0kb]       countl_zero, bswap       yes       Syed Fahad and Daniel Inführ                 http://www.talkchess.com/forum3/viewtopic.php?t=59845
Hyperbola Quintessence o^(o-2r)    263.473369                    256     [2kb]       bswap                    no        Ryan Mack                                    https://www.chessprogramming.org/Hyperbola_Quintessence
Hyperbola Quintessence Inline      322.005612                    0       [0kb]       bswap                    yes       Ryan Mack                                    https://www.chessprogramming.org/Hyperbola_Quintessence
Kindergarten                       345.842567                    16640   [130kb]     imul64                   no        Urban Koistinen                              https://www.chessprogramming.org/Kindergarten_Bitboards
SISSY Bitboards                    201.853456                    180416  [1409kb]    none                     no        Michael Sherwin                              http://www.talkchess.com/forum3/viewtopic.php?f=7&t=73083
Fancy Magic BB - Variable shift    232.096813                    93376   [729kb]     imul64                   yes       Pradu Kannan                                 https://www.chessprogramming.org/Magic_Bitboards#Fancy
Plain Magic BB                     850.501730                    295168  [2306kb]    imul64                   no        Lasse Hansen                                 https://www.chessprogramming.org/Magic_Bitboards#Plain
Black Magic BB - Fixed shift       981.836033                    88891   [694kb]     imul64                   no        Onno Garms and Volker Annuss                 https://www.chessprogramming.org/Magic_Bitboards#Fixed_shift_Fancy
Pext constexpr                     1412.897101                   107904  [843kb]     pext_u64                 yes       Zach Wegner                                  https://www.chessprogramming.org/BMI2#PEXTBitboards
HyperCube                          282.754577                    107680  [841kb]     none                     yes       Daniel Inführ (dangi12012)                   http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79004&p=916723&hilit=hypercube#p916723
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Sopel
Posts: 391
Joined: Tue Oct 08, 2019 11:39 pm
Full name: Tomasz Sobczyk

Re: Comparison of all known Sliding lookup algorithms

Post by Sopel »

Your benchmarks seem to have variance of about 50%.
dangi12012 wrote:No one wants to touch anything you have posted. That proves you now have negative reputations since everyone knows already you are a forum troll.

Maybe you copied your stockfish commits from someone else too?
I will look into that.
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Comparison of all known Sliding lookup algorithms

Post by dangi12012 »

Question for the community:
Could somebody run Movegen_Compare on a Raspberry PI?

Code: Select all

git clone https://github.com/Gigantua/Chess_Movegen.git
cd Chess_Movegen
make
./movegen_compare
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
abulmo2
Posts: 462
Joined: Fri Dec 16, 2016 11:04 am
Location: France
Full name: Richard Delorme

Re: Comparison of all known Sliding lookup algorithms

Post by abulmo2 »

dangi12012 wrote: Wed Feb 16, 2022 11:33 pm Question for the community:
Could somebody run Movegen_Compare on a Raspberry PI?

Code: Select all

git clone https://github.com/Gigantua/Chess_Movegen.git
cd Chess_Movegen
make
./movegen_compare
Compilation failed on my Raspberry PI 4. It complains about the builtin_ia32_* functions . They are for Intel-like cpu, not ARM. It
also complains about the cpuid asm code.
Richard Delorme