I think the secret lies in many optimizations that make the code look simple yet effective. I don't know how else to explain a boost of 25-30% on Gigantua, which was already a much faster perft engine than the rest as far as I know.Aleks Peshkov wrote: ↑Wed Sep 17, 2025 3:00 amDid not found any secret that any other bitboard legal generator on the planet does not do. BTW Stockfish goes fine with pseudolegal generator. But the code looks nice and pleasant to see. Well done!chessbit wrote: ↑Tue Sep 16, 2025 10:05 pm For those interested, I have released the code on my dusty github: https://github.com/thuijbregts/chessbit
There are some notes about the implementation, and also a binary if you want to try it yourself. I would be interested to know how it performs on your machine!
Writing the fastest move generator. Up to 4BNodes/s
Moderator: Ras
- 
				chessbit
- Posts: 36
- Joined: Fri Dec 29, 2023 4:47 pm
- Location: Belgium
- Full name: thomas albert
Re: Writing the fastest move generator. Up to 4BNodes/s
- 
				chessbit
- Posts: 36
- Joined: Fri Dec 29, 2023 4:47 pm
- Location: Belgium
- Full name: thomas albert
Re: Writing the fastest move generator. Up to 4BNodes/s
Glad that it works with clang++. I would really recommend using the binary available on github, which was compiled with the Intel compiler. The code is much faster than with clang, normally.
Not much faster than what, btw?
- 
				Steve Maughan  
- Posts: 1298
- Joined: Wed Mar 08, 2006 8:28 pm
- Location: Florida, USA
Re: Writing the fastest move generator. Up to 4BNodes/s
Fascinating. You may we'll be right. My latest engine (Juggernaut) is written in Delphi (Pascal), which isn't the most performant language. I've tried to squeeze every last drop of speed out of it but it does perft 7 from the starting position in ~20 seconds (~160 Mnps), with bulk counting, on a fast laptop. In a normal middlegame position it searches ~7 mnps. The ratio is approximately 20:1, hence the 100 Mnps. Juggernaut also updates the evaluation and position hash, even in the perft — maybe this is more of a factor than I realized.
Thanks,
Steve
http://www.chessprogramming.net - Juggernaut & Maverick Chess Engine
			
						- 
				Aleks Peshkov
- Posts: 951
- Joined: Sun Nov 19, 2006 9:16 pm
- Location: Russia
- Full name: Aleks Peshkov
Re: Writing the fastest move generator. Up to 4BNodes/s
It is not. If a perft engine reports real nodes per second (not perft number per second), difference in NPS between perft(minimax) and alpha-beta should be less then 2 times.
- 
				syzygy
- Posts: 5780
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Writing the fastest move generator. Up to 4BNodes/s
Not much faster than what I got with gcc, which seemed to refuse to inline everything that was supposed to be inlined.
- 
				Dmi3
- Posts: 5
- Joined: Thu Sep 11, 2025 11:15 am
- Location: Russia
- Full name: Dmitry Molodov
Re: Writing the fastest move generator. Up to 4BNodes/s
Why? It all depends on the cost of QS, the presence or absence of Null-Move pruning (or Internal Iterative Deepening), and the cost of evaluation (incremental PESTO vs NEURO).Aleks Peshkov wrote: ↑Wed Sep 17, 2025 5:06 pmIf a perft engine reports real nodes per second (not perft number per second), difference in NPS between perft(minimax) and alpha-beta should be less then 2 times.
- 
				Aleks Peshkov
- Posts: 951
- Joined: Sun Nov 19, 2006 9:16 pm
- Location: Russia
- Full name: Aleks Peshkov
Re: Writing the fastest move generator. Up to 4BNodes/s
Bulk node counting is close amount of work to what chess program with legal move generator and simple PST eval do at leafs. In my perft I do TT lookup and update all incremental stuff (evaluation, repetition stack, attack tables in my case).Dmi3 wrote: ↑Wed Sep 17, 2025 5:42 pmWhy? It all depends on the cost of QS, the presence or absence of Null-Move pruning (or Internal Iterative Deepening), and the cost of evaluation (incremental PESTO vs NEURO).Aleks Peshkov wrote: ↑Wed Sep 17, 2025 5:06 pmIf a perft engine reports real nodes per second (not perft number per second), difference in NPS between perft(minimax) and alpha-beta should be less then 2 times.
- 
				Rein Halbersma
- Posts: 751
- Joined: Tue May 22, 2007 11:13 am
Re: Writing the fastest move generator. Up to 4BNodes/s
3 years ago I realized that with C++20, you can simply put constexpr in front of everything and compute perft inside a static_assert. Did this for my draughts engine to depth 7 and 1M nodes. https://damforum.nl/viewtopic.php?t=8530abulmo2 wrote: ↑Wed Sep 17, 2025 11:55 am The trick that makes chessbit faster is CTFE (compile time function evaluation) abusing metaprogramming with constexpr & templates. The recursive call to perft and the move generation are intricate and the compiler eliminates all the recursive calls up to depth 18 (the upper limit in chessbit) by slowly building a huge executable. On my computer chessbit executable size is 12.8 Mb versus 83kb for mperft.
- 
				benvining
- Posts: 44
- Joined: Fri May 30, 2025 10:18 pm
- Location: Chicago
- Full name: Ben Vining
Re: Writing the fastest move generator. Up to 4BNodes/s
Awesome! I was only able to get up to depth 4 within constexpr before my compiler complained that I'd hit the constexpr step limit. I haven't fussed with any flags for this, did you need to do that?
			
			
									
						
							BenBot on GitHub: https://github.com/benthevining/BenBot
			
						- 
				abulmo2
- Posts: 479
- Joined: Fri Dec 16, 2016 11:04 am
- Location: France
- Full name: Richard Delorme
Re: Writing the fastest move generator. Up to 4BNodes/s
I am basically a Linux user, not a Windows one. But I did try you executable on my son's computer:chessbit wrote: ↑Wed Sep 17, 2025 12:50 pm
Could you try the executable available on github? The code was optimized with the Intel compiler from the start, clang and other compilers yield (much) worse results. I think it would be fairer to compare different engines with the best possible compilation results. For instance, I don't know if your engine would be better optimized with the Intel compiler.
I got 1.55BNps +/- 0.01Bnps. On the same computer, compiling your code with clang (the vanilla version, not the one included with MSVC) , I got slightly faster result: 1.58bnps +/- 0.01Bnps.
Well the title of the thread is about a move generator, not perftThe thread is deviating into chess engine discussions, but this was not the intended goal of this project. I only wrote this code with a perft engine in mind. That being said, I think it would not be too difficult to change the existing code for the purpose of a chess engine. I might implement one just to prove it but I need a break.

Even for perft, I do not think it is the fastest one, as it is missing a hashtable and parallel computation. For example, my program mperft, on my son's computer, using an hashtable is faster:
Code: Select all
.\mperft-1.0.exe -d 8 -H 24 -b
[...]
perft  8 :     84998978956 leaves in     16.563 s   5131858900 leaves/s
The fastest perft can do parallel computation on GPUs: https://github.com/ankan-ban/perft_gpu. This one was fast enough to compute perft 15 of the starting position in a few days on distributed computers.
Richard Delorme