I would like to release the sourcecode to the fastest Chess movegenerator I was able to write during the last 2½ years.
It is able to generate 2 Billion Moves per Thread per Second. No Hashing is used - and it still is single threaded.
What makes it so fast is summarized below - but in short: Make/Unmake and a Movelist is not needed and 2x slower than a visitor pattern - which is even more powerful and faster and implemented in this movegen. Also expanding the enumeration template for color to include enpassant and all castling squares helps a lot!
Summary:
Explanation Article: https://www.codeproject.com/Articles/53 ... egenerator
Sourcecode: https://github.com/Gigantua/Gigantua
Linux/Windows Binary: https://www.codeproject.com/KB/Articles ... nLinux.zip
It is a Bitboard movegenerator. As it turns out incremental bitboards are slower for this type of generator. Remembering previous from | to square is more expensive than recalculating many of the moves. It is possible to add that to the sourcecode easily and maybe I will merge the exploration branches in the future.
It uses heavy template instantiation with a template inlined "callback" function. What makes this so fast is that you dont need a movelist anymore for this approach. When a legal move is encountered it instantly is decided what to do with it. Expand more - Store - Or sort into a list of good moves. All possible with this approach.
The pin and checkmask is what makes bitboards so good. With a handful of & operations all moves are pruned to only cointain legal moves. An I mean all moves. Even the most ugliest of pinned - prevention of promotion - or check double check pin combos are almost free and certainly branchless!
Also what I think no one implemented before me: Enpassant and Castling are moved to a Boardstate Template. You can input any fen - the appropriate template is called with a single switch and off you go enumerating. The cost of a single switch at the very start of enumerating saves billions of noop instructions for castling / enpassant. The Boardstate contains: Moving Color, HasEnpassant, 4 Castling Squares. Which makes the C++ compiler emit 64 functions for all 6 independent flags.
The beauty of this is that the current color implemented as a template removes all lookups like pawn move direction etc. Code checks for Castling are only in the assembly if it is possible to castle. Once a Rook / King moves - Not a single instruction is used for castling. Not even a single if. It gets if constexpr pruned away.
The reciever for all possible moves are 8 Methods that are named after each piece and movetype - and the from - to squares are there for you in bitboard form. So its almost free to count all castling moves etc. It wont even add a single if instruction. This approach needs no movelist at all - but you can still store the move inside the callback.
What I would like for you the reader to do: Try to find any mistakes or slow stuff in my code. Tear it apart - think about it - ideally comment below with ideas how to make it faster - or better send a pull request into the github repo!
https://github.com/Gigantua/Gigantua
Compile with Visual studio or Linux - Clang -march=native -std=c++20 -lstdc++ -O3 Gigantua.cpp -flto -o giga_clang
Greetings! - Daniel Inführ
References:
http://www.talkchess.com/forum3/viewtop ... =7&t=78230
Perft Start 7: 3195901860 2191ms 1458.31 MNodes/s
Perft Kiwi 6: 8031647685 4004ms 2005.51 MNodes/s
Perft aggregate: 18999768562 10149ms 1871.94 MNodes/s
Gigantua: 2 Gigamoves per Second per Core move generator - Sourcecode Release
Moderator: Ras
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Gigantua: 2 Gigamoves per Second per Core move generator - Sourcecode Release
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Daniel Inführ - Software Developer
-
- Posts: 179
- Joined: Tue Jun 15, 2021 8:11 pm
- Full name: Emanuel Torres
Re: Gigantua: 2 Gigamoves per Second per Core move generator - Sourcecode Release
That's nice, thanks for releasing the source.
So... are you going to credit me or what? Lol I literally see my code from this post in your repo (this line and onwards), after which you suddenly claimed a 30% performance boost overnight in spite of having worked on this for 2.5 years =)
So... are you going to credit me or what? Lol I literally see my code from this post in your repo (this line and onwards), after which you suddenly claimed a 30% performance boost overnight in spite of having worked on this for 2.5 years =)
[Moderation warning] This signature violated the rule against commercial exhortations.
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Gigantua: 2 Gigamoves per Second per Core move generator - Sourcecode Release
I shall credit you for a faster bitloop. - Unrolled switch is 80% slower = 1/5th of the speed because branching on popcnt is not predictable. changing from while to for was a good idea.
It was 5% faster and as you can read in the source - there are still many while loops - because the for loop is slower when you actually need the source square as a single bit for the bitboard. 1ull<<sq is slower there.
Now go forth and find more optimisations i missed - it was a good tip!
It was 5% faster and as you can read in the source - there are still many while loops - because the for loop is slower when you actually need the source square as a single bit for the bitboard. 1ull<<sq is slower there.
Now go forth and find more optimisations i missed - it was a good tip!
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Daniel Inführ - Software Developer
-
- Posts: 469
- Joined: Fri Dec 16, 2016 11:04 am
- Location: France
- Full name: Richard Delorme
Re: Gigantua: 2 Gigamoves per Second per Core move generator - Sourcecode Release
I know it is supposed to run at high speed only on recent AMD or Intel CPU, but the number I get on my (not so) old Ryzen 1700x are very far from yours:dangi12012 wrote: ↑Wed Oct 06, 2021 6:39 pm Perft Start 7: 3195901860 2191ms 1458.31 MNodes/s
Perft Kiwi 6: 8031647685 4004ms 2005.51 MNodes/s
Perft aggregate: 18999768562 10149ms 1871.94 MNodes/s
(Run under Fedora, after compiling it with clang 12)
Code: Select all
Perft Start 7: 3195901860 21051ms 151.813 MNodes/s
Perft Kiwi 6: 8031647685 39598ms 202.825 MNodes/s
Perft Midgame 6: 6923051137 35298ms 196.126 MNodes/s
Perft Endgame 6: 849167880 2911ms 291.63 MNodes/s
Perft aggregate: 18999768562 101618ms 186.971 MNodes/s
Code: Select all
perft 7 : 3195901860 leaves in 5.754 s 555431950 leaves/s
perft 6 : 8031647685 leaves in 12.853 s 624892362 leaves/s
perft 6 : 6923051137 leaves in 8.334 s 830732235 leaves/s
perft 6 : 849167880 leaves in 1.454 s 584123663 leaves/s
So an aggregate time of 28.395 s.
Richard Delorme
-
- Posts: 179
- Joined: Tue Jun 15, 2021 8:11 pm
- Full name: Emanuel Torres
Re: Gigantua: 2 Gigamoves per Second per Core move generator - Sourcecode Release
That's expected. You basically can't use PEXT on Zen or Zen 2. It could clog up your whole system, as in your example.
It's already configurable.
Code: Select all
//This is the most important definition for performance!
namespace Lookup = Chess_Lookup::Lookup_Pext;
[Moderation warning] This signature violated the rule against commercial exhortations.
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Gigantua: 2 Gigamoves per Second per Core move generator - Sourcecode Release
What you can do on Zen1/2 is have your compiler emit the PEXT emulation. That will be faster than the horrible horrbile Zen1/2 microcode:abulmo2 wrote: ↑Thu Oct 07, 2021 1:25 am The compilation time of your perft is also impressive. About 10 minutes on my computer. I guess you are using PEXT instruction, which are fast on ryzen 5000 but slow on previous version. You should maybe provide an alternative code without PEXT for older CPU.
Code: Select all
Movemap.hpp - struct SliderPext_t
_ForceInline constexpr uint64_t operator[](const uint64_t blocker) const
{
if (std::is_constant_evaluated()) {
return AttackPtr[_pext_u64_emulated(blocker, Mask)];
}
else {
return AttackPtr[_pext_u64(blocker, Mask)];
}
}
Should be much faster on Zen2.
As for compiletime you can define _ForceInline as whitespace. - much faster compile - much slower executable.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Daniel Inführ - Software Developer
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Gigantua: 2 Gigamoves per Second per Core move generator - Sourcecode Release
We did it guys! Top post on Reddit C++.
Nice
Nice
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Daniel Inführ - Software Developer
-
- Posts: 749
- Joined: Tue May 22, 2007 11:13 am
Re: Gigantua: 2 Gigamoves per Second per Core move generator - Sourcecode Release
Your use of macros starting with an underscore and a capital letter (such as _Compiletime) is undefined behavior in C++. These are all reserved for the C++ Standard Library. Use them at your own risk.
-
- Posts: 1957
- Joined: Tue Apr 19, 2016 6:08 am
- Location: U.S.A
- Full name: Andrew Grant
-
- Posts: 1276
- Joined: Wed Mar 08, 2006 8:28 pm
- Location: Florida, USA
Re: Gigantua: 2 Gigamoves per Second per Core move generator - Sourcecode Release
This is awesome - congrats! I've always maintained that Perft is something worth optimizing. If your engine has a good perft speed you can probably add a decent evaluation too.
– Steve
– Steve
http://www.chessprogramming.net - Juggernaut & Maverick Chess Engine