Yes, with gcc there is a flag -fconstexpr-ops-limit which defaults to 2^25. I put it at 2^35.
Writing the fastest move generator. Up to 4BNodes/s
Moderator: Ras
-
- Posts: 751
- Joined: Tue May 22, 2007 11:13 am
Re: Writing the fastest move generator. Up to 4BNodes/s
-
- Posts: 23
- Joined: Fri Dec 29, 2023 4:47 pm
- Location: Belgium
- Full name: thomas albert
Re: Writing the fastest move generator. Up to 4BNodes/s
The point of this engine was "no tricks". With multi threading, chessbit can calculate perft 8 in 4s on my machine, so about 4 times faster than mperft. If you're interested, I have posted a branch on github with multi threading.abulmo2 wrote: ↑Fri Sep 19, 2025 2:20 pmI am basically a Linux user, not a Windows one. But I did try you executable on my son's computer:chessbit wrote: ↑Wed Sep 17, 2025 12:50 pm
Could you try the executable available on github? The code was optimized with the Intel compiler from the start, clang and other compilers yield (much) worse results. I think it would be fairer to compare different engines with the best possible compilation results. For instance, I don't know if your engine would be better optimized with the Intel compiler.
I got 1.55BNps +/- 0.01Bnps. On the same computer, compiling your code with clang (the vanilla version, not the one included with MSVC) , I got slightly faster result: 1.58bnps +/- 0.01Bnps.
Well the title of the thread is about a move generator, not perftThe thread is deviating into chess engine discussions, but this was not the intended goal of this project. I only wrote this code with a perft engine in mind. That being said, I think it would not be too difficult to change the existing code for the purpose of a chess engine. I might implement one just to prove it but I need a break.
Even for perft, I do not think it is the fastest one, as it is missing a hashtable and parallel computation. For example, my program mperft, on my son's computer, using an hashtable is faster:
that's 5.13 Bnps.Code: Select all
.\mperft-1.0.exe -d 8 -H 24 -b [...] perft 8 : 84998978956 leaves in 16.563 s 5131858900 leaves/s
The fastest perft can do parallel computation on GPUs: https://github.com/ankan-ban/perft_gpu. This one was fast enough to compute perft 15 of the starting position in a few days on distributed computers.
I'm interested in a hashtable though. I assume it's not the same as a transposition table with a zobrist key? Could you point me to a source that explains the concept?
Also I'm surprised clang made a better binary for you. I think the rootcause might be your pc's CPU, perhaps an Intel, for which the binary is not optimized. Indeed, the binary I posted is optimized for AVX2 but I think it also runs better for AMD CPUs. However, if you can provide the flags you used, I will try also on my machine to compare. Thanks.
-
- Posts: 914
- Joined: Sun Nov 19, 2006 9:16 pm
- Location: Russia
- Full name: Aleks Peshkov
Re: Writing the fastest move generator. Up to 4BNodes/s
It is usual transposition table with zobrist hashing. I use "prefer largest perft number" replacement scheme, but any would work.
The more you improve perft the more you get for working chess engine, where you will not be interested in perft anymore.

-
- Posts: 23
- Joined: Fri Dec 29, 2023 4:47 pm
- Location: Belgium
- Full name: thomas albert
Re: Writing the fastest move generator. Up to 4BNodes/s
I have implemented the zobrist hash, but the TT is never useful in my engine because I visit every branch directly upon making a move. I do not add them to a list, and then iterate over them. Is there a way to make the TT useful otherwise? I guess I'll have to think about it.Aleks Peshkov wrote: ↑Fri Sep 19, 2025 8:06 pmIt is usual transposition table with zobrist hashing. I use "prefer largest perft number" replacement scheme, but any would work.
The more you improve perft the more you get for working chess engine, where you will not be interested in perft anymore.![]()
-
- Posts: 23
- Joined: Fri Dec 29, 2023 4:47 pm
- Location: Belgium
- Full name: thomas albert
Re: Writing the fastest move generator. Up to 4BNodes/s
I have done with the list. Although it makes the perft about 50% slower (maybe this can be optimized though), with the TT it's like two times faster. I'll implement it with multi threading then.chessbit wrote: ↑Fri Sep 19, 2025 8:38 pmI have implemented the zobrist hash, but the TT is never useful in my engine because I visit every branch directly upon making a move. I do not add them to a list, and then iterate over them. Is there a way to make the TT useful otherwise? I guess I'll have to think about it.Aleks Peshkov wrote: ↑Fri Sep 19, 2025 8:06 pmIt is usual transposition table with zobrist hashing. I use "prefer largest perft number" replacement scheme, but any would work.
The more you improve perft the more you get for working chess engine, where you will not be interested in perft anymore.![]()
-
- Posts: 914
- Joined: Sun Nov 19, 2006 9:16 pm
- Location: Russia
- Full name: Aleks Peshkov
Re: Writing the fastest move generator. Up to 4BNodes/s
It should not be 50% slower. You should not store leaf nodes in TT and need not to calculate zobrist for them.
-
- Posts: 23
- Joined: Fri Dec 29, 2023 4:47 pm
- Location: Belgium
- Full name: thomas albert
Re: Writing the fastest move generator. Up to 4BNodes/s
No, I meant that iterating through a list instead of executing moves upon finding them in the recursive function took a big performance hit.Aleks Peshkov wrote: ↑Fri Sep 19, 2025 10:49 pmIt should not be 50% slower. You should not store leaf nodes in TT and need not to calculate zobrist for them.
That being said, I have implemented multi threading with a TT, and getting the following results:
Code: Select all
perft -t 8
TT Hits: 181722817
Depth: 8
Nodes: 84998978956
Time: 2184 ms
Average: 38915.6 Mn/s
perft -t 9
TT Hits: 3798708241
Depth: 9
Nodes: 2439530234167
Time: 44567 ms
Average: 54738.2 Mn/s
-
- Posts: 914
- Joined: Sun Nov 19, 2006 9:16 pm
- Location: Russia
- Full name: Aleks Peshkov
Re: Writing the fastest move generator. Up to 4BNodes/s
Calculate nodes (positions visited) per second, not nonsensical perft numbers per second.
What list are you talk? If you get a TT hit nothing is generated. If you get a TT miss, you count as normal without any lists.
What is real NPS I meant (single-thread, 128M hash)
Code: Select all
info nodes 279896535 time 28528 nps 9811231
info depth 8 perft 84998978956
-
- Posts: 23
- Joined: Fri Dec 29, 2023 4:47 pm
- Location: Belgium
- Full name: thomas albert
Re: Writing the fastest move generator. Up to 4BNodes/s
I assume that when you generate the moves, you add them to a list. Then when all moves are generated for a depth, you iterate over them and check the TT if it already exists?Aleks Peshkov wrote: ↑Sat Sep 20, 2025 1:39 amCalculate nodes (positions visited) per second, not nonsensical perft numbers per second.
What list are you talk? If you get a TT hit nothing is generated. If you get a TT miss, you count as normal without any lists.
What is real NPS I meant (single-thread, 128M hash)Number of real visited nodes 300 times less then perft number.Code: Select all
info nodes 279896535 time 28528 nps 9811231 info depth 8 perft 84998978956
If not, I'm interested to know how it's implemented?
In the perft engine, moves are "made" as soon as they are found, so the TT can never be useful, because every position is a unique one. That's what I mean by iterating over a list, and the loss of performance.
-
- Posts: 28382
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Writing the fastest move generator. Up to 4BNodes/s
You check whether the position reached by a move has already been searched. But you don't do that only in the leaf nodes. (Where in perft it would indeed be pretty useless, as you just want to count those, not process them in any other way that you could save time on.) You do it in every node in the tree. And that is also very useful in perft. The position after, say, 1. e3 d5 2. d4 is the same as that after 1. d4 e5 2. e3, and when calculating perft(7) you don't want to do the perft(4) from that position twice.
Engines store moves in a list, because they typically want to search them in a different order than they were generated. Unlike perft, the size of an alpha-beta tree is highly dependent on the move ordering. A factor 2 in speed that you might gain by skipping the sorting can easily mean you have to sort a thousand times larger tree.
Engines store moves in a list, because they typically want to search them in a different order than they were generated. Unlike perft, the size of an alpha-beta tree is highly dependent on the move ordering. A factor 2 in speed that you might gain by skipping the sorting can easily mean you have to sort a thousand times larger tree.