dangi12012 wrote: ↑Thu Feb 10, 2022 10:15 pm
Update!
New Algo!
AVX Branchless Shift
I was getting tired of the lack of AVX2 so I sat down and mixed together Kogge Stone - Dumb7 Fill and NO HEADACHES algo to create this branchless version of what normally is a conditional loop:
You can see its dumb7 fill like but does not infer 4 sliders at once. It infers 1 slider at a time - but in 4 directions at once.

Performance is great because it beats QBBEngine by a small amount!
Update2:
Neural Network speed improved by 3x.
This was made possible by training against the 16 bits of pext and not against the occupancy itself.
Training code is here If you run it - it should produce most C++ code needed:
https://github.com/Gigantua/Chess_BinaryNeuralNetwork
This is the performance for truly sampling random positions:
Code: Select all
Million Lookups/s Random Squares, Random Occupation/s:
Name Performance [MQueens/s] Tablesize Dependencies Template Author Reference
Binary Neural Network 52.163979 5852 [45kb] pdep_u64, AVX2 no Daniel Inführ (dangi12012) Not released yet
Exploding Bitboards 64.496087 768 [6kb] imul64 no Harald Lüßen http://www.open-aurec.com/wbforum/viewtopic.php?f=4&t=4523&start=80
Reference (Switch Lookup) 35.547266 0 [0kb] none yes Daniel Inführ (dangi12012) http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78235&p=907362&hilit=espresso#p907362
AVX Branchless Shift 185.208336 0 [0kb] none no Daniel Inführ (dangi12012)
Pext Emulated 66.814698 107904 [843kb] none no Zach Wegner https://randombit.net/bitbashing/posts/haswell_bit_permutations.html
Dumb7 Fill 68.671650 0 [0kb] none no Gunnar Andersson https://www.chessprogramming.org/Dumb7Fill
Kogge-Stone 111.041916 0 [0kb] none no Peter M. Kogge, Harold S. Stone https://www.chessprogramming.org/Kogge-Stone_Algorithm
Rotated Bitboards 46.262581 1848 [14kb] none no Robert Hyatt https://www.chessprogramming.org/Rotated_Bitboards
QBBEngine 176.931912 0 [0kb] countr_zero, countl_zero yes Fabio Gobbato https://www.chessprogramming.org/QBBEngine
Classical Bob-Mike 196.873001 1024 [8kb] countr_zero, countl_zero yes Robert Hyatt and Michael Sherwin https://www.chessprogramming.org/Classical_Approach
Leorik 216.790810 128 [1kb] countl_zero no Thomas Jahn (lithander) https://github.com/lithander/MinimalChessEngine
Leorik Inline 103.871642 0 [0kb] countl_zero no Thomas Jahn (lithander) https://github.com/lithander/MinimalChessEngine
Obstruction Difference 241.152710 768 [6kb] countl_zero no Michael Hoffmann http://www.talkchess.com/forum3/viewtopic.php?t=29087
Obstruction Difference Inline 90.061827 0 [0kb] countl_zero yes Michael Hoffmann http://www.talkchess.com/forum3/viewtopic.php?t=29087
Slide Arithmetic 242.023315 256 [2kb] bzhi_u64, blsmsk_u64 no Jakob Progsch and Daniel Inführ http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78693&p=914767&hilit=SlideArithm#p914767
Slide Arithmetic Inline 105.799674 0 [0kb] bzhi_u64, blsmsk_u64 no Jakob Progsch and Daniel Inführ http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78693&p=914767&hilit=SlideArithm#p914767
SBAMG o^(o-3cbn) 235.724430 576 [4kb] countl_zero, bswap yes Syed Fahad http://www.talkchess.com/forum3/viewtopic.php?t=59845
SBAMG Inline 93.100479 0 [0kb] countl_zero, bswap yes Syed Fahad and Daniel Inführ http://www.talkchess.com/forum3/viewtopic.php?t=59845
Hyperbola Quintessence o^(o-2r) 293.503304 256 [2kb] bswap no Ryan Mack https://www.chessprogramming.org/Hyperbola_Quintessence
Hyperbola Quintessence Inline 93.087840 0 [0kb] bswap yes Ryan Mack https://www.chessprogramming.org/Hyperbola_Quintessence
Kindergarten 446.025357 16640 [130kb] imul64 no Urban Koistinen https://www.chessprogramming.org/Kindergarten_Bitboards
SISSY Bitboards 201.621711 180416 [1409kb] none no Michael Sherwin http://www.talkchess.com/forum3/viewtopic.php?f=7&t=73083
Fancy Magic BB - Variable shift 381.589575 93376 [729kb] imul64 yes Pradu Kannan https://www.chessprogramming.org/Magic_Bitboards#Fancy
Plain Magic BB 478.667390 295168 [2306kb] imul64 no Lasse Hansen https://www.chessprogramming.org/Magic_Bitboards#Plain
Black Magic BB - Fixed shift 522.343232 88891 [694kb] imul64 no Onno Garms and Volker Annuss https://www.chessprogramming.org/Magic_Bitboards#Fixed_shift_Fancy
Pext constexpr 742.179286 107904 [843kb] pext_u64 yes Zach Wegner https://www.chessprogramming.org/BMI2#PEXTBitboards
HyperCube 69.705720 107680 [841kb] none yes Daniel Inführ (dangi12012) http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79004&p=916723&hilit=hypercube#p916723
Well, in my opinion, the results over the chat look very inconsistent. On the one hand in the ranking of the algorithms, on the other hand the concrete numbers that are presented. Even on my computer, the ranking per measurement is not stable.
Sorry, if I ask a stupid question. But what is random sampling supposed to do here? From my point of view essential components for a comparison of the algorithms are missing. (assuming that I have interpreted the term correctly at this point).
Basically, you have a board state and an action that leads to a subsequent state. One does not jump through randomly possible states. In the performance context this can even mean that you never get to the next relevant state.
So, it is absolutely necessary to include an action like execute move in the performance comparison. With a uniform implementation, the algorithms remain comparable and the effort for the action is constant and does not affect the ranking.
A significant difference is that the performance of some algorithms changes relatively when the board states are connected by an action.
For example, if the target squares for a sliding piece do not change,
algorithms that do the computation on the fly, can access a cache and use the last computation. This benefits these algorithms more than algorithms such as Magics that only compute an index and perform a memory lookup.This means in practice and in ranking, the results shift in favor of direct computations. It also means that the optimization of the algorithms is far from being as complete as you presented it.
I actually find it interesting to see the comparison of the algorithms.
But so in the big picture of this thread, at least for me personally, the fundamentals for direct comparison are pretty skewed at this point.