Writing the fastest move generator. Up to 4BNodes/s

Discussion of chess software programming and technical issues.

Moderator: Ras

Rein Halbersma
Posts: 751
Joined: Tue May 22, 2007 11:13 am

Re: Writing the fastest move generator. Up to 4BNodes/s

Post by Rein Halbersma »

benvining wrote: Fri Sep 19, 2025 6:11 am Awesome! I was only able to get up to depth 4 within constexpr before my compiler complained that I'd hit the constexpr step limit. I haven't fussed with any flags for this, did you need to do that?
Yes, with gcc there is a flag -fconstexpr-ops-limit which defaults to 2^25. I put it at 2^35.
chessbit
Posts: 23
Joined: Fri Dec 29, 2023 4:47 pm
Location: Belgium
Full name: thomas albert

Re: Writing the fastest move generator. Up to 4BNodes/s

Post by chessbit »

abulmo2 wrote: Fri Sep 19, 2025 2:20 pm
chessbit wrote: Wed Sep 17, 2025 12:50 pm
Could you try the executable available on github? The code was optimized with the Intel compiler from the start, clang and other compilers yield (much) worse results. I think it would be fairer to compare different engines with the best possible compilation results. For instance, I don't know if your engine would be better optimized with the Intel compiler.
I am basically a Linux user, not a Windows one. But I did try you executable on my son's computer:
I got 1.55BNps +/- 0.01Bnps. On the same computer, compiling your code with clang (the vanilla version, not the one included with MSVC) , I got slightly faster result: 1.58bnps +/- 0.01Bnps.
The thread is deviating into chess engine discussions, but this was not the intended goal of this project. I only wrote this code with a perft engine in mind. That being said, I think it would not be too difficult to change the existing code for the purpose of a chess engine. I might implement one just to prove it but I need a break :).
Well the title of the thread is about a move generator, not perft :-)
Even for perft, I do not think it is the fastest one, as it is missing a hashtable and parallel computation. For example, my program mperft, on my son's computer, using an hashtable is faster:

Code: Select all

.\mperft-1.0.exe -d 8 -H 24 -b
[...]
perft  8 :     84998978956 leaves in     16.563 s   5131858900 leaves/s
that's 5.13 Bnps.

The fastest perft can do parallel computation on GPUs: https://github.com/ankan-ban/perft_gpu. This one was fast enough to compute perft 15 of the starting position in a few days on distributed computers.
The point of this engine was "no tricks". With multi threading, chessbit can calculate perft 8 in 4s on my machine, so about 4 times faster than mperft. If you're interested, I have posted a branch on github with multi threading.
I'm interested in a hashtable though. I assume it's not the same as a transposition table with a zobrist key? Could you point me to a source that explains the concept?

Also I'm surprised clang made a better binary for you. I think the rootcause might be your pc's CPU, perhaps an Intel, for which the binary is not optimized. Indeed, the binary I posted is optimized for AVX2 but I think it also runs better for AMD CPUs. However, if you can provide the flags you used, I will try also on my machine to compare. Thanks.
Aleks Peshkov
Posts: 914
Joined: Sun Nov 19, 2006 9:16 pm
Location: Russia
Full name: Aleks Peshkov

Re: Writing the fastest move generator. Up to 4BNodes/s

Post by Aleks Peshkov »

chessbit wrote: Fri Sep 19, 2025 6:26 pm I'm interested in a hashtable though. I assume it's not the same as a transposition table with a zobrist key? Could you point me to a source that explains the concept?
It is usual transposition table with zobrist hashing. I use "prefer largest perft number" replacement scheme, but any would work.

The more you improve perft the more you get for working chess engine, where you will not be interested in perft anymore. :)
chessbit
Posts: 23
Joined: Fri Dec 29, 2023 4:47 pm
Location: Belgium
Full name: thomas albert

Re: Writing the fastest move generator. Up to 4BNodes/s

Post by chessbit »

Aleks Peshkov wrote: Fri Sep 19, 2025 8:06 pm
chessbit wrote: Fri Sep 19, 2025 6:26 pm I'm interested in a hashtable though. I assume it's not the same as a transposition table with a zobrist key? Could you point me to a source that explains the concept?
It is usual transposition table with zobrist hashing. I use "prefer largest perft number" replacement scheme, but any would work.

The more you improve perft the more you get for working chess engine, where you will not be interested in perft anymore. :)
I have implemented the zobrist hash, but the TT is never useful in my engine because I visit every branch directly upon making a move. I do not add them to a list, and then iterate over them. Is there a way to make the TT useful otherwise? I guess I'll have to think about it.
chessbit
Posts: 23
Joined: Fri Dec 29, 2023 4:47 pm
Location: Belgium
Full name: thomas albert

Re: Writing the fastest move generator. Up to 4BNodes/s

Post by chessbit »

chessbit wrote: Fri Sep 19, 2025 8:38 pm
Aleks Peshkov wrote: Fri Sep 19, 2025 8:06 pm
chessbit wrote: Fri Sep 19, 2025 6:26 pm I'm interested in a hashtable though. I assume it's not the same as a transposition table with a zobrist key? Could you point me to a source that explains the concept?
It is usual transposition table with zobrist hashing. I use "prefer largest perft number" replacement scheme, but any would work.

The more you improve perft the more you get for working chess engine, where you will not be interested in perft anymore. :)
I have implemented the zobrist hash, but the TT is never useful in my engine because I visit every branch directly upon making a move. I do not add them to a list, and then iterate over them. Is there a way to make the TT useful otherwise? I guess I'll have to think about it.
I have done with the list. Although it makes the perft about 50% slower (maybe this can be optimized though), with the TT it's like two times faster. I'll implement it with multi threading then.
Aleks Peshkov
Posts: 914
Joined: Sun Nov 19, 2006 9:16 pm
Location: Russia
Full name: Aleks Peshkov

Re: Writing the fastest move generator. Up to 4BNodes/s

Post by Aleks Peshkov »

chessbit wrote: Fri Sep 19, 2025 10:08 pm I have done with the list. Although it makes the perft about 50% slower (maybe this can be optimized though), with the TT it's like two times faster. I'll implement it with multi threading then.
It should not be 50% slower. You should not store leaf nodes in TT and need not to calculate zobrist for them.
chessbit
Posts: 23
Joined: Fri Dec 29, 2023 4:47 pm
Location: Belgium
Full name: thomas albert

Re: Writing the fastest move generator. Up to 4BNodes/s

Post by chessbit »

Aleks Peshkov wrote: Fri Sep 19, 2025 10:49 pm
chessbit wrote: Fri Sep 19, 2025 10:08 pm I have done with the list. Although it makes the perft about 50% slower (maybe this can be optimized though), with the TT it's like two times faster. I'll implement it with multi threading then.
It should not be 50% slower. You should not store leaf nodes in TT and need not to calculate zobrist for them.
No, I meant that iterating through a list instead of executing moves upon finding them in the recursive function took a big performance hit.

That being said, I have implemented multi threading with a TT, and getting the following results:

Code: Select all

perft -t 8
TT Hits:                181722817
Depth:          8
Nodes:          84998978956
Time:           2184 ms
Average:        38915.6 Mn/s
perft -t 9
TT Hits:                3798708241
Depth:          9
Nodes:          2439530234167
Time:           44567 ms
Average:        54738.2 Mn/s
I think there is room for optimization but I wonder how these numbers compare with others? The way it is implemented is the same as a chess engine, so you could plug this directly in a search.
Aleks Peshkov
Posts: 914
Joined: Sun Nov 19, 2006 9:16 pm
Location: Russia
Full name: Aleks Peshkov

Re: Writing the fastest move generator. Up to 4BNodes/s

Post by Aleks Peshkov »

chessbit wrote: Sat Sep 20, 2025 12:05 am No, I meant that iterating through a list instead of executing moves upon finding them in the recursive function took a big performance hit.
Calculate nodes (positions visited) per second, not nonsensical perft numbers per second.

What list are you talk? If you get a TT hit nothing is generated. If you get a TT miss, you count as normal without any lists.

What is real NPS I meant (single-thread, 128M hash)

Code: Select all

info nodes 279896535 time 28528 nps 9811231
info depth 8 perft 84998978956
Number of real visited nodes 300 times less then perft number.
chessbit
Posts: 23
Joined: Fri Dec 29, 2023 4:47 pm
Location: Belgium
Full name: thomas albert

Re: Writing the fastest move generator. Up to 4BNodes/s

Post by chessbit »

Aleks Peshkov wrote: Sat Sep 20, 2025 1:39 am
chessbit wrote: Sat Sep 20, 2025 12:05 am No, I meant that iterating through a list instead of executing moves upon finding them in the recursive function took a big performance hit.
Calculate nodes (positions visited) per second, not nonsensical perft numbers per second.

What list are you talk? If you get a TT hit nothing is generated. If you get a TT miss, you count as normal without any lists.

What is real NPS I meant (single-thread, 128M hash)

Code: Select all

info nodes 279896535 time 28528 nps 9811231
info depth 8 perft 84998978956
Number of real visited nodes 300 times less then perft number.
I assume that when you generate the moves, you add them to a list. Then when all moves are generated for a depth, you iterate over them and check the TT if it already exists?
If not, I'm interested to know how it's implemented?
In the perft engine, moves are "made" as soon as they are found, so the TT can never be useful, because every position is a unique one. That's what I mean by iterating over a list, and the loss of performance.
User avatar
hgm
Posts: 28382
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Writing the fastest move generator. Up to 4BNodes/s

Post by hgm »

You check whether the position reached by a move has already been searched. But you don't do that only in the leaf nodes. (Where in perft it would indeed be pretty useless, as you just want to count those, not process them in any other way that you could save time on.) You do it in every node in the tree. And that is also very useful in perft. The position after, say, 1. e3 d5 2. d4 is the same as that after 1. d4 e5 2. e3, and when calculating perft(7) you don't want to do the perft(4) from that position twice.

Engines store moves in a list, because they typically want to search them in a different order than they were generated. Unlike perft, the size of an alpha-beta tree is highly dependent on the move ordering. A factor 2 in speed that you might gain by skipping the sorting can easily mean you have to sort a thousand times larger tree.