100M nodes (not perft numbers) per second is not possible on consumer one thread CPU. 10M x2 sure doable. x3? I doubt.Steve Maughan wrote: ↑Tue Sep 16, 2025 6:12 pmWhich number are you skeptical of, the 2300 or the 2600? You could easily split the move generation into captures then quiets. Add a basic piece square table on top of an alpha beta search and I don't think the nps world drop to less than 100 Mnps (I could be wrong). That should be 2300 ELO. Juggernaut 1.01 is basically this with a TT. It does 10 Mnps and has an ELO of 2150.syzygy wrote: ↑Tue Sep 16, 2025 5:11 pmNot if you are stuck with a fixed move order because you don't generate lists of moves.Steve Maughan wrote: ↑Mon Sep 15, 2025 5:16 amA chess engine with a basic search at this speed (I'm assuming it would slow down to <100 mnps) with Pesto piece-square evaluation would be easily north of 2300 ELO, and more likely closer to 2600 ELO.
Steve
Writing the fastest move generator. Up to 4BNodes/s
Moderator: Ras
-
- Posts: 917
- Joined: Sun Nov 19, 2006 9:16 pm
- Location: Russia
- Full name: Aleks Peshkov
Re: Writing the fastest move generator. Up to 4BNodes/s
-
- Posts: 1295
- Joined: Wed Mar 08, 2006 8:28 pm
- Location: Florida, USA
Re: Writing the fastest move generator. Up to 4BNodes/s
It would be easy to split a move generator into captures and quiet moves. Strength is generally correlated with speed, so I respectfully disagree that there's no relationship. Maybe one day I'll have the time to dig deeper.syzygy wrote: ↑Tue Sep 16, 2025 10:18 pmI am skeptical of the suitability of the move generator for use in a chess engine.Steve Maughan wrote: ↑Tue Sep 16, 2025 6:12 pmWhich number are you skeptical of, the 2300 or the 2600? You could easily split the move generation into captures then quiets. Add a basic piece square table on top of an alpha beta search and I don't think the nps world drop to less than 100 Mnps (I could be wrong). That should be 2300 ELO. Juggernaut 1.01 is basically this with a TT. It does 10 Mnps and has an ELO of 2150.syzygy wrote: ↑Tue Sep 16, 2025 5:11 pmNot if you are stuck with a fixed move order because you don't generate lists of moves.Steve Maughan wrote: ↑Mon Sep 15, 2025 5:16 amA chess engine with a basic search at this speed (I'm assuming it would slow down to <100 mnps) with Pesto piece-square evaluation would be easily north of 2300 ELO, and more likely closer to 2600 ELO.
A program optimised for perft will not generate captures first, for example. And once you have a decent search, most moves should not ever be looked at. There is no relationship between 4BNodes/sec and Elo.
— Steve
http://www.chessprogramming.net - Juggernaut & Maverick Chess Engine
-
- Posts: 5714
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Writing the fastest move generator. Up to 4BNodes/s
I needed to addchessbit wrote: ↑Tue Sep 16, 2025 10:05 pm For those interested, I have released the code on my dusty github: https://github.com/thuijbregts/chessbit
There are some notes about the implementation, and also a binary if you want to try it yourself. I would be interested to know how it performs on your machine!
Code: Select all
#define __popcnt64(x) _mm_popcnt_u64(X)
Code: Select all
#define __forceinline __attribute__((always_inline))
With a typo in "always_inline" (which made gcc simply ignore the attribute), I got:
Code: Select all
perft 6
Depth: 6
Nodes: 119060324
Time: 78 ms
Average: 1512.49 Mn/s
perft 7
Depth: 7
Nodes: 3195901860
Time: 1898 ms
Average: 1683.04 Mn/s
perft 8
Depth: 8
Nodes: 84998978956
Time: 50675 ms
Average: 1677.33 Mn/s
I will try with clang.
-
- Posts: 5714
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Writing the fastest move generator. Up to 4BNodes/s
With clang++ compilation works, but it is not that much faster:
Code: Select all
perft 6
Depth: 6
Nodes: 119060324
Time: 81 ms
Average: 1465.79 Mn/s
perft 7
Depth: 7
Nodes: 3195901860
Time: 1511 ms
Average: 2114.58 Mn/s
perft 8
Depth: 8
Nodes: 84998978956
Time: 49892 ms
Average: 1703.65 Mn/s
-
- Posts: 5714
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Writing the fastest move generator. Up to 4BNodes/s
Why do you think that would be easy?Steve Maughan wrote: ↑Tue Sep 16, 2025 10:29 pmIt would be easy to split a move generator into captures and quiet moves. Strength is generally correlated with speed, so I respectfully disagree that there's no relationship. Maybe one day I'll have the time to dig deeper.
-
- Posts: 5714
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Writing the fastest move generator. Up to 4BNodes/s
To maximize nps, I recommend pure minimaxAleks Peshkov wrote: ↑Tue Sep 16, 2025 10:19 pm 100M nodes (not perft numbers) per second is not possible on consumer one thread CPU. 10M x2 sure doable. x3? I doubt.

-
- Posts: 917
- Joined: Sun Nov 19, 2006 9:16 pm
- Location: Russia
- Full name: Aleks Peshkov
Re: Writing the fastest move generator. Up to 4BNodes/s
Did not found any secret that any other bitboard legal generator on the planet does not do. BTW Stockfish goes fine with pseudolegal generator. But the code looks nice and pleasant to see. Well done!chessbit wrote: ↑Tue Sep 16, 2025 10:05 pm For those interested, I have released the code on my dusty github: https://github.com/thuijbregts/chessbit
There are some notes about the implementation, and also a binary if you want to try it yourself. I would be interested to know how it performs on your machine!
-
- Posts: 917
- Joined: Sun Nov 19, 2006 9:16 pm
- Location: Russia
- Full name: Aleks Peshkov
-
- Posts: 475
- Joined: Fri Dec 16, 2016 11:04 am
- Location: France
- Full name: Richard Delorme
Re: Writing the fastest move generator. Up to 4BNodes/s
Looking at the code, I do not think the move generator is that much faster than the one usually used in modern engines.Steve Maughan wrote: ↑Mon Sep 15, 2025 5:16 amA chess engine with a basic search at this speed (I'm assuming it would slow down to <100 mnps) with Pesto piece-square evaluation would be easily north of 2300 ELO, and more likely closer to 2600 ELO.
— Steve
Five years ago, I made a simple perft based on magic move generation: https://github.com/abulmo/MPerft, using the pext instruction if available. On my computer, using bulk counting (as chessbit does), I have got about 800 Mnps from the starting position :
Code: Select all
perft 7 : 3195901860 leaves in 3.979 s 803109301 leaves/s
Code: Select all
perft 7 : 3195901860 leaves in 47.741 s 66942888 leaves/s
Code: Select all
perft 7
Depth: 7
Nodes: 3195901860
Time: 1975 ms
Average: 1617.49 Mn/s
The trick that makes chessbit faster is CTFE (compile time function evaluation) abusing metaprogramming with constexpr & templates. The recursive call to perft and the move generation are intricate and the compiler eliminates all the recursive calls up to depth 18 (the upper limit in chessbit) by slowly building a huge executable. On my computer chessbit executable size is 12.8 Mb versus 83kb for mperft.
Mperft is basically what I am using in the chess engine Dumb https://github.com/abulmo/Dumb, changing the language from C to D, and optimising it for a chess engine by using a pseudolegal move generator, no bulk counting, etc. The speed of perft drops to 49 Mnps. Yes, the optimisations that make a chess engine faster slow down perft.
Dumb is (and will stay) a simple engine with a pesto-like evaluation function, and some simple search pruning and move sorting. Its speed is only 7 Mnps but its strength on CCRL is around 2850 Elo. By removing SEE (static exchange evaluation), the speed should increase over 10 Mnps, but the Elo should go down. And, as suggested by Ronald, using only minimax should increase the speed up to 20 Mnps.
So I do not think chessbit is of interest in building a chess engine. And a speed of 100 Mnps on a single core is surely an illusion on current hardware, even for a simple engine.
Richard Delorme
-
- Posts: 27
- Joined: Fri Dec 29, 2023 4:47 pm
- Location: Belgium
- Full name: thomas albert
Re: Writing the fastest move generator. Up to 4BNodes/s
Could you try the executable available on github? The code was optimized with the Intel compiler from the start, clang and other compilers yield (much) worse results. I think it would be fairer to compare different engines with the best possible compilation results. For instance, I don't know if your engine would be better optimized with the Intel compiler.abulmo2 wrote: ↑Wed Sep 17, 2025 11:55 amLooking at the code, I do not think the move generator is that much faster than the one usually used in modern engines.Steve Maughan wrote: ↑Mon Sep 15, 2025 5:16 amA chess engine with a basic search at this speed (I'm assuming it would slow down to <100 mnps) with Pesto piece-square evaluation would be easily north of 2300 ELO, and more likely closer to 2600 ELO.
— Steve
Five years ago, I made a simple perft based on magic move generation: https://github.com/abulmo/MPerft, using the pext instruction if available. On my computer, using bulk counting (as chessbit does), I have got about 800 Mnps from the starting position :Without bulk counting, which is closer to what we can get with a chess engine, I have got only about 70 Mnps:Code: Select all
perft 7 : 3195901860 leaves in 3.979 s 803109301 leaves/s
On the same computer & OS, chessbit, compiled with clang++ with the modifications proposed by Ronald de Man (Syzygy), reaches about 1600 Mnps ;Code: Select all
perft 7 : 3195901860 leaves in 47.741 s 66942888 leaves/s
So, on my computer chessbit is only 2x faster than mperft.Code: Select all
perft 7 Depth: 7 Nodes: 3195901860 Time: 1975 ms Average: 1617.49 Mn/s
The trick that makes chessbit faster is CTFE (compile time function evaluation) abusing metaprogramming with constexpr & templates. The recursive call to perft and the move generation are intricate and the compiler eliminates all the recursive calls up to depth 18 (the upper limit in chessbit) by slowly building a huge executable. On my computer chessbit executable size is 12.8 Mb versus 83kb for mperft.
Mperft is basically what I am using in the chess engine Dumb https://github.com/abulmo/Dumb, changing the language from C to D, and optimising it for a chess engine by using a pseudolegal move generator, no bulk counting, etc. The speed of perft drops to 49 Mnps. Yes, the optimisations that make a chess engine faster slow down perft.
Dumb is (and will stay) a simple engine with a pesto-like evaluation function, and some simple search pruning and move sorting. Its speed is only 7 Mnps but its strength on CCRL is around 2850 Elo. By removing SEE (static exchange evaluation), the speed should increase over 10 Mnps, but the Elo should go down. And, as suggested by Ronald, using only minimax should increase the speed up to 20 Mnps.
So I do not think chessbit is of interest in building a chess engine. And a speed of 100 Mnps on a single core is surely an illusion on current hardware, even for a simple engine.
The thread is deviating into chess engine discussions, but this was not the intended goal of this project. I only wrote this code with a perft engine in mind. That being said, I think it would not be too difficult to change the existing code for the purpose of a chess engine. I might implement one just to prove it but I need a break
