A comparison of some Perft programs 

Discussion of chess software programming and technical issues.

Moderator: Ras

User avatar
hgm
Posts: 28480
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: A comparison of some Perft programs 

Post by hgm »

The latest (mailbox) engine I built used incrementally updated mobility. I wonder if this system could also be used in perft. After all, the final ply of a perft basically just returns the mobility. With the catch that it must be the legal mobility, rather than the pseudo-legal one. So you would have to deal with discovery and blocking of pins as well as of slider moves.

The most expensive part of incremental (pseudo-legal) mobility is adjusting the mobility for the blocking of opponent slider moves by non-captures in the forelast ply. The discovery is less of a problem, as it is caused by evacuating the from-square. And the piece on that square usually has many moves, that can share the result of that calculation. The to-squares typically are scattered much more, so they have to be treated individually. Pre-calculation of, say, how many opponent slider moves go over each square is not competitive if the result of the caculation is not used multiple times, especially if for many squares it would not be used at all.

As an individual test for slider blocking one can test for alignment with each of the opponent sliders, which is a comparatively cheap table access based on the difference of square numbers, and which rules out most sliders. That an 'expensive' ray scan is then necessary for sliders that pass the test does not hurt very much. (And even then such a ray scan is of course extremely cheap compared to full move generation of all opponent's pieces.) I still don't like it that the alignment test has to be done for all opponent sliders individually, though.

Yesterday the following idea occurred to me. You could make a board-size table that holds a 'signature' for every square. This signature is then a set of ranks, files, diagonals and anti-diagonals (8 + 8 + 13 + 13 = 42 in total), where the bits are set for those rays the square is on. Alignment of two squares can then be tested by ANDing their signatures. If a set bit remains, there would be alignment, and the bit would tell you on which ray.

A pre-calculation could then OR the signature for all enemy sliders, after ANDing those with the directions they move in. (E.g. for a Rook we would clear the diagonal bits.) We could then AND the signature at the to-square of each move with this 'slider summary' as a primary filter. Moves for which this result is zero will immediately be eliminated as blocking any slider moves. If any bits are set these identify the ray on which blocking might occur, and as a consequence to some extent the slider thay could be blocked (orthogonal or diagonal). So you would typically only have to test 2 or 3 individually. It would also be possible to prepare a small table that for each ray that contains sliders specifies the sliders (moving along that ray) they contain. So you can simply look up which sliders the to-square is aligned with.
Sopel
Posts: 398
Joined: Tue Oct 08, 2019 11:39 pm
Full name: Tomasz Sobczyk

Re: A comparison of some Perft programs 

Post by Sopel »

phhnguyen wrote: Mon Jan 12, 2026 11:30 am
Sopel wrote: Sun Jan 11, 2026 9:42 pm FWIW on 7800x3d on windows I get ~300Mnps for SF 17.1 and 1650Mnps for gigantua on perft 7
Your information is ambiguous to me. SF doesn't print out both elapsed and speed when computing Perft. How can you get the speed of 300 Mbps? If it is the speed of normal search, we cannot use it to compare with Gigantua since (AFAK) it uses leave nodes from bulk counting, but not nodes by making/undoing to calculate Perft speed.

The best if you could do, please measure the time for calculating Perft 8 for both programs. SF doesn't print out elapsed, but you can easily add a few commands to SF code (e.g., such as: TimePoint startPoint = now();... elapsed = now() - startPoint;), recompile and run. You may find a 3rd party program to run it for timing, too. For Gigantua, you can download its 64-bit exe file from its GitHub and run it as below. It will calculate elapsed time itself:

Code: Select all

Gigantua.exe "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1" 8
FYI: I have just re-run some Perft programs from my PC after resetting. There is no chance for Gigantua on my old PC: it is still lagging behind SF. Hope it performs better on your PC.
time can be measured by outside forces, and I can do division
dangi12012 wrote:No one wants to touch anything you have posted. That proves you now have negative reputations since everyone knows already you are a forum troll.

Maybe you copied your stockfish commits from someone else too?
I will look into that.
abulmo2
Posts: 491
Joined: Fri Dec 16, 2016 11:04 am
Location: France
Full name: Richard Delorme

Re: A comparison of some Perft programs 

Post by abulmo2 »

phhnguyen wrote: Mon Jan 12, 2026 11:03 am
petero2 wrote: Sun Jan 11, 2026 8:51 pm
phhnguyen wrote: Sat Jan 10, 2026 11:25 pm 7) MPerft by Richard Delorme
https://github.com/abulmo/MPerft

The project was updated 6 years ago. I downloaded and compiled it via the make command.

It is the fastest and significantly faster than other programs in this comparison, 17 times faster than Stockfish. On the screen, it prints clearly that it uses hashing and multithreading.
Are you sure it is using multithreading? On my computer it says it is using hashing and bulk counting, but it doesn't say anything about multiple threads, and only uses one thread. Also looking at the source code I see nothing about threads in it.
You are right! It is my mistake! MPerft uses bulk counting and hashing only!
I fixed this in version 3.0. Now mperft uses also multithreading. On my old ryzen 9 5950:

Code: Select all

$ mperft-3.0-x86-64-v3 -d 8 -b -h 256 -t 32 -q
perft  8 :     84998978956 leaves in      0.553 s 153729146951 leaves/s
Richard Delorme
abulmo2
Posts: 491
Joined: Fri Dec 16, 2016 11:04 am
Location: France
Full name: Richard Delorme

Re: A comparison of some Perft programs 

Post by abulmo2 »

phhnguyen wrote: Mon Jan 12, 2026 10:59 am
hgm wrote: Sun Jan 11, 2026 11:46 am The key to doing fast perfts seems to be hashing.
Agreed. Hashing is one of the largest gain methods for Perft.

However, from my experience as well as reading somewhere, hashing may give a factor of about 2 times speed. My computer has 8 cores, and multithreading could bring me a factor of 8 times faster :D
With the latest MPerft that implement multithreading, I got this on an AMD Ryzen 9 5950x.
  • bulk counting: x10
  • multithreading with 16 threads: x14
  • transposition table of 256Mbytes (16M entries): x17
Richard Delorme