A comparison of some Perft programs 

abulmo2
Posts: 495
Joined: Fri Dec 16, 2016 11:04 am
Location: France
Full name: Richard Delorme

Re: A comparison of some Perft programs.

Post by abulmo2 »

Ajedrecista wrote: Tue Jun 23, 2026 7:56 pm Hello Richard:

You wrote a faster perft counter than gperft 1.1 on my system! Under the same or very similar conditions, Perft(9) was computed in circa 99 seconds with gperft 1.1 (hash = 1 GB and 4 threads) and circa 90 seconds (-10%) with MPerft 5.2 v2 (AVX compatible), using --fast (hash = 1 GB and 4 threads) and --nullmove 9! The difference was that gperft showed a divided perft and MPerft did not (I realised later that I ran MPerft without the --div argument), which might close the gap a little.

It is true that gperft 1.2d exists and is faster than 1.1, but I regrettably missed that download and it is no longer available. OTOH, I tested MPerft 5.2 v2 (AVX compatible), so there must be additional gains with v3 (AVX2 compatible) and v3-128 (AVX2 compatible, with a 128-bit counter for crunching big numbers).

Another thing that appeals to me is that the program comes already compiled, since compiling is a bottleneck for a dummy like me who does not know how to do it. By the way: 5.3 is mentioned in the Readme of your GitHub, but it is not in the Release section... is it unfinished?

Big, big thank you to Richard!
Thank you for your kind words. I just released version 5.3, although it is not finished... :D . In version 5.3, I mostly optimized the hashtable.
The performance of the different versions on my system (Ryzen 9 5950X at 4.2 GHz) is as follows (using the arguments -n 9 -h 1024 -t 16 -q):
- mperft-5.3-x86-64: 7.382s
- mperft-5.3-x86-64-v2: 6.357s (-13.9%)
- mperft-5.3-x86-64-v3: 6.003s ( -5.6%)
The x86-64-v2 version brings popcount and the x86-64-v3 version pext (plus some compiler optimisations), so you do not lose much speed with v2. The 128-bit version is slower, mainly because fewer transposition table entries are available using the same amount of memory.
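
As a quick check of how the percentages in the list are computed: each one compares a build with the one listed just before it, not with the baseline. A small Python sketch using only the timings quoted above (the variable and function names are illustrative):

```python
# Wall-clock times in seconds for perft(9), copied from the list above.
timings = [
    ("x86-64", 7.382),
    ("x86-64-v2", 6.357),
    ("x86-64-v3", 6.003),
]

def relative_change(new, old):
    """Percentage change of `new` relative to `old` (negative means faster)."""
    return (new / old - 1.0) * 100.0

# Step-by-step changes: each build against the previous one in the list.
steps = [
    round(relative_change(cur[1], prev[1]), 1)
    for prev, cur in zip(timings, timings[1:])
]

# Cumulative change of the v3 build against the plain x86-64 build.
overall = round(relative_change(timings[-1][1], timings[0][1]), 1)

print(steps)
print(overall)
```

The cumulative figure is derived here from the quoted timings; it is not a number reported in the post.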

gperft was incredibly fast. It is unfortunate the program is no longer available.
Richard Delorme
Ajedrecista
Posts: 2253
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: A comparison of some Perft programs.

Post by Ajedrecista »

Hello Richard:
abulmo2 wrote: Fri Jun 26, 2026 8:14 pm Thank you for your kind words. I just released version 5.3, although it is not finished... :D . In version 5.3, I mostly optimized the hashtable.
The performance of the different versions on my system (Ryzen 9 5950X at 4.2 GHz) is as follows (using the arguments -n 9 -h 1024 -t 16 -q):
- mperft-5.3-x86-64: 7.382s
- mperft-5.3-x86-64-v2: 6.357s (-13.9%)
- mperft-5.3-x86-64-v3: 6.003s ( -5.6%)
The x86-64-v2 version brings popcount and the x86-64-v3 version pext (plus some compiler optimisations), so you do not lose much speed with v2. The 128-bit version is slower, mainly because fewer transposition table entries are available using the same amount of memory.

[...]
Thank you for your explanations: so, v3 does not bring a large gain over v2, but everything counts. It is good to know that the 128-bit version is slower, which was unexpected to me. Summarizing: go for v3 when possible for usual runs and reserve the 128-bit version for something really large, to avoid overflows.

There is a note on the 5.2 release that says 'I hope the 128 bit version to now count right above 64 bit numbers'. That does not inspire confidence! Let us try something: in the past, I worked around overflows with a method combining different tools:

Re: KBNk ---> perft(20) result.

I got a result that was later confirmed by Paul (gperft's author). Since I cannot run the 128-bit version right now, you or anyone else could give it a try, just to confirm:

Code:

k7/8/8/8/8/8/8/2B1K1N1 w - - 0 1

perft(20) = 75,072,759,878,600,741,186
I tried v2 (without 128-bit counters) for perft(19) of the same position (the total has one overflow, but the divided counts do not) and got:

Code:

.\mperft-5.2-windows>mperft-5.2-x86-64-v2.exe --nullmove 19 --div --fast --fen "k7/8/8/8/8/8/8/2B1K1N1 w - - 0 1"
Magic Perft version 5.2 (c) Richard Delorme 2020 - 2026
Bitboard move generation based on magic bitboards.
Perft setting: hashtable size: 1024 Mbytes (67108868 entries); with 4 threads; with nullmove counting.
  a b c d e f g h
8 k . . . . . . . 8
7 . . . . . . . . 7
6 . . . . . . . . 6
5 . . . . . . . . 5
4 . . . . . . . . 4
3 . . . . . . . . 3
2 . . . . . . . . 2
1 . . B . K . N . 1
  a b c d e f g h
w,
depth: 19
 c1a3    659,457,009,336,590,439 positions in           5.668 116.343 Ppos/s
 c1b2  1,264,611,181,709,709,388 positions in           1.069   1.182 Epos/s
 c1d2    740,040,237,489,377,013 positions in           0.957 773.245 Ppos/s
 c1e3    515,757,908,580,425,750 positions in           0.629 819.638 Ppos/s
 c1f4    539,487,494,231,792,182 positions in           0.638 844.362 Ppos/s
 c1g5    891,663,162,564,721,957 positions in           0.407   2.190 Epos/s
 c1h6    780,567,379,191,018,161 positions in           0.171   4.550 Epos/s
 e1d1    692,541,386,956,704,134 positions in           0.147   4.680 Epos/s
 e1d2    703,028,745,285,875,868 positions in           0.143   4.905 Epos/s
 e1e2  1,050,134,052,410,078,727 positions in           0.051  20.526 Epos/s
 e1f1    735,804,983,672,832,601 positions in           0.026  27.523 Epos/s
 e1f2  1,157,859,680,063,453,488 positions in           0.012  95.214 Epos/s
 g1e2    945,371,558,234,344,460 positions in           0.004 197.125 Epos/s
 g1f3  1,385,438,910,975,932,839 positions in           0.001   1.027 Zpos/s
 g1h3    895,280,075,533,759,867 positions in           0.000   6.463 Zpos/s
total   : 12,957,043,766,236,616,874 positions in           9.929   1.305 Epos/s
With the JetChess trick, and knowing that there is only one overflow, from what I wrote back then:

Code:

JetChess 1.0.0.0:

k7/8/8/8/8/8/8/2B1K1N1 w - - 0 1

perft(19)

  1  Bc1-b2  1264611181709709388
  2  Bc1-a3  659457009336590439
  3  Bc1-d2  740040237489377013
  4  Bc1-e3  515757908580425750
  5  Bc1-f4  539487494231792182
  6  Bc1-g5  891663162564721957
  7  Bc1-h6  780567379191018161
  8  Ng1-e2  945371558234344460
  9  Ng1-f3  1385438910975932839
 10  Ng1-h3  895280075533759867
 11  Ke1-f1  735804983672832601
 12  Ke1-d1  692541386956704134
 13  Ke1-d2  703028745285875868
 14  Ke1-e2  1050134052410078727
 15  Ke1-f2  1157859680063453488

Total:   -5489700307472934742
Then: -5,489,700,307,472,934,742 + 1 × 2^64 = 12,957,043,766,236,616,874... exactly the same as MPerft. Furthermore, every divided count matches between JetChess and MPerft for this perft(19) count.
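
The correction above can be reproduced with arbitrary-precision integers. A minimal Python sketch, assuming the JetChess total is a signed 64-bit value that wrapped around exactly once (the helper name `unwrap_int64` is made up for illustration); the divided counts are copied from the listings above:

```python
# Divided perft(19) counts for k7/8/8/8/8/8/8/2B1K1N1 w - - 0 1, as listed above.
DIVIDED_19 = {
    "c1a3": 659_457_009_336_590_439,
    "c1b2": 1_264_611_181_709_709_388,
    "c1d2": 740_040_237_489_377_013,
    "c1e3": 515_757_908_580_425_750,
    "c1f4": 539_487_494_231_792_182,
    "c1g5": 891_663_162_564_721_957,
    "c1h6": 780_567_379_191_018_161,
    "e1d1": 692_541_386_956_704_134,
    "e1d2": 703_028_745_285_875_868,
    "e1e2": 1_050_134_052_410_078_727,
    "e1f1": 735_804_983_672_832_601,
    "e1f2": 1_157_859_680_063_453_488,
    "g1e2": 945_371_558_234_344_460,
    "g1f3": 1_385_438_910_975_932_839,
    "g1h3": 895_280_075_533_759_867,
}

def unwrap_int64(value, wraps):
    """Undo `wraps` wrap-arounds of a 64-bit counter (hypothetical helper)."""
    return value + wraps * 2**64

# Python integers never overflow, so this is the exact sum of the divided counts.
true_total = sum(DIVIDED_19.values())

# Signed 64-bit total as printed by JetChess, corrected for one wrap-around.
jetchess_total = -5_489_700_307_472_934_742
corrected = unwrap_int64(jetchess_total, 1)

print(true_total)
print(corrected == true_total)
```

The number of wrap-arounds has to be known (or estimated) beforehand; the sketch does not derive it.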

I want to raise a warning: I did exactly the same with perft(20) and the divided counts match except for Nf3:

Code:

JetChess 1.0.0.0:

k7/8/8/8/8/8/8/2B1K1N1 w - - 0 1

perft(20)

  1  Bc1-b2  7381849532908251042
  2  Bc1-a3  3789314683705030887
  3  Bc1-d2  4308176281282487428
  4  Bc1-e3  2998731023870432062
  5  Bc1-f4  3123318871803074928
  6  Bc1-g5  5178532290991801515
  7  Bc1-h6  4542810120280543237
  8  Ng1-e2  5413637977425633758
  9  Ng1-f3  7943137449825604502
 10  Ng1-h3  5179627520227528635
 11  Ke1-f1  4299702185286081370
 12  Ke1-d1  4009569653362539651
 13  Ke1-d2  4047788944120303364
 14  Ke1-e2  6100661826137759280
 15  Ke1-f2  6755901517373669527

Total:   1285783583762534722

Code:

.\mperft-5.2-windows>mperft-5.2-x86-64-v2.exe --nullmove 20 --div --fast --fen "k7/8/8/8/8/8/8/2B1K1N1 w - - 0 1"
Magic Perft version 5.2 (c) Richard Delorme 2020 - 2026
Bitboard move generation based on magic bitboards.
Perft setting: hashtable size: 1024 Mbytes (67108868 entries); with 4 threads; with nullmove counting.
  a b c d e f g h
8 k . . . . . . . 8
7 . . . . . . . . 7
6 . . . . . . . . 6
5 . . . . . . . . 5
4 . . . . . . . . 4
3 . . . . . . . . 3
2 . . . . . . . . 2
1 . . B . K . N . 1
  a b c d e f g h
w,
depth: 20
 c1a3  3,789,314,683,705,030,887 positions in          11.943 317.283 Ppos/s
 c1b2  7,381,849,532,908,251,042 positions in           1.105   6.679 Epos/s
 c1d2  4,308,176,281,282,487,428 positions in           1.162   3.706 Epos/s
 c1e3  2,998,731,023,870,432,062 positions in           0.750   3.993 Epos/s
 c1f4  3,123,318,871,803,074,928 positions in           0.788   3.959 Epos/s
 c1g5  5,178,532,290,991,801,515 positions in           0.488  10.603 Epos/s
 c1h6  4,542,810,120,280,543,237 positions in           0.195  23.230 Epos/s
 e1d1  4,009,569,653,362,539,651 positions in           0.174  23.034 Epos/s
 e1d2  4,047,788,944,120,303,364 positions in           0.170  23.701 Epos/s
 e1e2  6,100,661,826,137,759,280 positions in           0.068  88.899 Epos/s
 e1f1  4,299,702,185,286,081,370 positions in           0.027 158.014 Epos/s
 e1f2  6,755,901,517,373,669,527 positions in           0.012 539.924 Epos/s
 g1e2  5,413,637,977,425,633,758 positions in           0.004   1.236 Zpos/s
 g1f3  7,078,446,321,370,469,270 positions in           0.001   5.818 Zpos/s
 g1h3  5,179,627,520,227,528,635 positions in           0.000  82.920 Zpos/s
total   :    421,092,455,307,399,490 positions in          16.893  24.926 Ppos/s
full time:          17.259 s
For Nf3, I get 7,943,137,449,825,604,502 from JetChess and 7,078,446,321,370,469,270 from MPerft, in a count that should not face 64-bit overflows because 2^63 ~ 9.22e+18 and those counts are less than 8e+18 each. MPerft gets around 8.6469e+17 fewer leaf nodes than JetChess, and JetChess' result was confirmed by Paul/gperft... The exact difference of 864,691,128,455,135,232 is a curious number in itself when taking logarithms: log2(864,691,128,455,135,232) = 58 + ln(3)/ln(2), so 864,691,128,455,135,232 = 3 × 2⁵⁸, and that 58 reminded me of what I read before in the Readme: 'On 64 bits versions, the leaf counter is limited to 2⁵⁸'.
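
The factorisation of the difference can be checked without logarithms. A short Python sketch that only does the arithmetic on the two Nf3 counts quoted above; it says nothing about why MPerft loses exactly this amount:

```python
# Nf3 counts at perft(20), copied from the two listings above.
jetchess_nf3 = 7_943_137_449_825_604_502
mperft_nf3 = 7_078_446_321_370_469_270

diff = jetchess_nf3 - mperft_nf3

# `diff & -diff` isolates the lowest set bit, i.e. the largest power of two
# dividing diff; its bit length minus one is the exponent.
trailing_zeros = (diff & -diff).bit_length() - 1
odd_part = diff >> trailing_zeros

# Both counts fit in a signed 64-bit integer, so a plain 64-bit overflow
# cannot explain the mismatch.
fits_in_int64 = jetchess_nf3 < 2**63 and mperft_nf3 < 2**63

print(diff)
print(trailing_zeros, odd_part)
print(fits_in_int64)
```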

When I run perft(19) of the position resulting after Nf3 (k7/8/8/8/8/5N2/8/2B1K3 b - - 0 1), I get the same result from both perft counters. I do not know what is going on, since counts above 2⁵⁸ are everywhere, yet only a single divided count is 'broken'. The explanation might be simple after all: why am I using a non-128-bit version for results with 64-bit overflows? Let each job be done by the appropriate version.

Going back to perft(20): if I sum all the divided counts of MPerft and compare with the true result, the sum falls short by the same 3 × 2⁵⁸, which means that all the divided counts reported by MPerft are correct (without needing corrections for overflows), except the already mentioned Nf3.
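
The same comparison for the whole perft(20) run, as a Python sketch. The divided counts are the ones printed by MPerft above, and `TRUE_PERFT_20` is the perft(20) value quoted earlier in this post; the variable names are illustrative:

```python
# Divided perft(20) counts as printed by mperft-5.2-x86-64-v2 above.
MPERFT_DIVIDED_20 = {
    "c1a3": 3_789_314_683_705_030_887,
    "c1b2": 7_381_849_532_908_251_042,
    "c1d2": 4_308_176_281_282_487_428,
    "c1e3": 2_998_731_023_870_432_062,
    "c1f4": 3_123_318_871_803_074_928,
    "c1g5": 5_178_532_290_991_801_515,
    "c1h6": 4_542_810_120_280_543_237,
    "e1d1": 4_009_569_653_362_539_651,
    "e1d2": 4_047_788_944_120_303_364,
    "e1e2": 6_100_661_826_137_759_280,
    "e1f1": 4_299_702_185_286_081_370,
    "e1f2": 6_755_901_517_373_669_527,
    "g1e2": 5_413_637_977_425_633_758,
    "g1f3": 7_078_446_321_370_469_270,  # the only count that disagrees with JetChess
    "g1h3": 5_179_627_520_227_528_635,
}

# perft(20) for k7/8/8/8/8/8/8/2B1K1N1 w - - 0 1, as quoted earlier in the post.
TRUE_PERFT_20 = 75_072_759_878_600_741_186

mperft_sum = sum(MPERFT_DIVIDED_20.values())   # exact, no overflow in Python
shortfall = TRUE_PERFT_20 - mperft_sum

# What an unsigned 64-bit accumulator would show for the same sum.
wrapped_total = mperft_sum % 2**64
wraps = mperft_sum // 2**64

print(shortfall == 3 * 2**58)
print(wrapped_total, wraps)
```

The wrapped value equals the total printed by MPerft (421,092,455,307,399,490), which would be consistent with its total being accumulated in an unsigned 64-bit counter.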

------------
abulmo2 wrote: Fri Jun 26, 2026 8:14 pm [...]

gperft was incredibly fast. It is unfortunate the program is no longer available.
Yes... :-( I still have 1.0 (Windows only), 1.0.1, 1.0.2, 1.0.3 and 1.1, while I skipped 1.2d, as said before.

Regards from Spain.

Ajedrecista.