A comparison of some Perft programs 

abulmo2
Posts: 496
Joined: Fri Dec 16, 2016 11:04 am
Location: France
Full name: Richard Delorme

Re: A comparison of some Perft programs.

Post by abulmo2 »

Ajedrecista wrote: Tue Jun 23, 2026 7:56 pm Hello Richard:

You wrote a faster perft counter than gperft 1.1 on my system! Under the same or very similar conditions, perft(9) was computed in circa 99 seconds with gperft 1.1 (hash = 1 GB and 4 threads) and circa 90 seconds (-10%) with MPerft 5.2 v2 (AVX compatible) (using --fast (hash = 1 GB and 4 threads) and --nullmove 9)! The difference was that gperft showed a divided perft and MPerft did not (I realised later that I ran MPerft without the --div argument), which might close the gap a little.

It is true that gperft 1.2d exists and is faster than 1.1, but I unfortunately missed that download and it is no longer available. OTOH, I tested MPerft 5.2 v2 (AVX compatible), so there must be additional gains with v3 (AVX2 compatible) and v3-128 (AVX2 compatible and 128-bit counter for crunching big numbers).

Another appeal to me is the precompiled binaries, since compiling is a bottleneck for a dummy like me who does not know how to compile. By the way: 5.3 is mentioned in the Readme of your GitHub, but it is not in the Release section... is it unfinished?

Big, big thank you to Richard!
Thank you for your kind words. I just released version 5.3, although it is not finished... :D In version 5.3, I mostly optimized the hashtable.
The timings of the different versions on my system (Ryzen 9 5950X at 4.2 GHz) are (using the arguments -n 9 -h 1024 -t 16 -q):
- mperft-5.3-x86-64: 7.382s
- mperft-5.3-x86-64-v2: 6.357s (-13.9% vs. x86-64)
- mperft-5.3-x86-64-v3: 6.003s (-5.6% vs. v2)
The x86-64-v2 version brings popcount and the x86-64-v3 version pext (+ some compiler optimisations). So you do not lose much speed with v2. The 128-bit version is slower, mainly because fewer transposition table entries are available using the same amount of memory.
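
The "fewer entries for the same memory" point can be estimated from the "Perft setting" lines printed in the logs of this thread. The following is a minimal Python sketch, not MPerft code: the helper name `bytes_per_entry` is made up for illustration, "Mbytes" is assumed to mean 2^20 bytes, and the two figures come from different builds (5.2 v2 and 5.3 v3-128), so the entry sizes are only an inference.

```python
MIB = 1 << 20  # assumption: the "Mbytes" in the logs are 2**20 bytes

def bytes_per_entry(hash_mib, entries):
    """Average bytes of hash memory per reported table entry."""
    return hash_mib * MIB / entries

# Figures printed in the "Perft setting" lines of the logs in this thread.
entry_64 = bytes_per_entry(1024, 67_108_868)        # 64-bit build (5.2 v2)
entry_128 = bytes_per_entry(32_768, 1_073_741_828)  # 128-bit build (5.3 v3)

# Entries that would fit in the same 1024 MiB at each (rounded) entry size.
slots_64 = 1024 * MIB // round(entry_64)
slots_128 = 1024 * MIB // round(entry_128)

print(round(entry_64), round(entry_128))  # inferred bytes per entry
print(slots_64, slots_128)                # entries in 1024 MiB
```

Under these assumptions the 128-bit build gets about half as many entries from the same hash size, which is consistent with the slowdown described above.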

gperft was incredibly fast. It is unfortunate the program is no longer available.
Richard Delorme
Ajedrecista
Posts: 2253
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: A comparison of some Perft programs.

Post by Ajedrecista »

Hello Richard:
abulmo2 wrote: Fri Jun 26, 2026 8:14 pm Thank you for your kind words. I just released version 5.3, although it is not finished... :D In version 5.3, I mostly optimized the hashtable.
The timings of the different versions on my system (Ryzen 9 5950X at 4.2 GHz) are (using the arguments -n 9 -h 1024 -t 16 -q):
- mperft-5.3-x86-64: 7.382s
- mperft-5.3-x86-64-v2: 6.357s (-13.9% vs. x86-64)
- mperft-5.3-x86-64-v3: 6.003s (-5.6% vs. v2)
The x86-64-v2 version brings popcount and the x86-64-v3 version pext (+ some compiler optimisations). So you do not lose much speed with v2. The 128-bit version is slower, mainly because fewer transposition table entries are available using the same amount of memory.

[...]
Thank you for your explanations: so, v3 does not bring a large gain over v2, but everything counts. It is good to know that the 128-bit version is slower, which was unexpected to me. Summarizing: go for v3 when possible for usual runs and reserve the 128-bit version for something really large, to overcome overflows.

There is a note on the 5.2 release that says 'I hope the 128 bit version to now count right above 64 bit numbers'. That does not inspire confidence! Let us try something: in the past, I overcame overflows with a clever method combining different tools:

Re: KBNk ---> perft(20) result.

I got a result that was later confirmed by Paul (gperft's author). Since I cannot run the 128-bit version right now, you or anyone else could give it a try, just to confirm:

Code: Select all

k7/8/8/8/8/8/8/2B1K1N1 w - - 0 1

perft(20) = 75,072,759,878,600,741,186
I tried v2 (without 128-bit counters) for perft(19) of the same position (the total has one overflow, but the divided counts do not) and got:

Code: Select all

.\mperft-5.2-windows>mperft-5.2-x86-64-v2.exe --nullmove 19 --div --fast --fen "k7/8/8/8/8/8/8/2B1K1N1 w - - 0 1"
Magic Perft version 5.2 (c) Richard Delorme 2020 - 2026
Bitboard move generation based on magic bitboards.
Perft setting: hashtable size: 1024 Mbytes (67108868 entries); with 4 threads; with nullmove counting.
  a b c d e f g h
8 k . . . . . . . 8
7 . . . . . . . . 7
6 . . . . . . . . 6
5 . . . . . . . . 5
4 . . . . . . . . 4
3 . . . . . . . . 3
2 . . . . . . . . 2
1 . . B . K . N . 1
  a b c d e f g h
w,
depth: 19
 c1a3    659,457,009,336,590,439 positions in           5.668 116.343 Ppos/s
 c1b2  1,264,611,181,709,709,388 positions in           1.069   1.182 Epos/s
 c1d2    740,040,237,489,377,013 positions in           0.957 773.245 Ppos/s
 c1e3    515,757,908,580,425,750 positions in           0.629 819.638 Ppos/s
 c1f4    539,487,494,231,792,182 positions in           0.638 844.362 Ppos/s
 c1g5    891,663,162,564,721,957 positions in           0.407   2.190 Epos/s
 c1h6    780,567,379,191,018,161 positions in           0.171   4.550 Epos/s
 e1d1    692,541,386,956,704,134 positions in           0.147   4.680 Epos/s
 e1d2    703,028,745,285,875,868 positions in           0.143   4.905 Epos/s
 e1e2  1,050,134,052,410,078,727 positions in           0.051  20.526 Epos/s
 e1f1    735,804,983,672,832,601 positions in           0.026  27.523 Epos/s
 e1f2  1,157,859,680,063,453,488 positions in           0.012  95.214 Epos/s
 g1e2    945,371,558,234,344,460 positions in           0.004 197.125 Epos/s
 g1f3  1,385,438,910,975,932,839 positions in           0.001   1.027 Zpos/s
 g1h3    895,280,075,533,759,867 positions in           0.000   6.463 Zpos/s
total   : 12,957,043,766,236,616,874 positions in           9.929   1.305 Epos/s
With the JetChess trick, and knowing there is only one overflow, from what I wrote back then:

Code: Select all

JetChess 1.0.0.0:

k7/8/8/8/8/8/8/2B1K1N1 w - - 0 1

perft(19)

  1  Bc1-b2  1264611181709709388
  2  Bc1-a3  659457009336590439
  3  Bc1-d2  740040237489377013
  4  Bc1-e3  515757908580425750
  5  Bc1-f4  539487494231792182
  6  Bc1-g5  891663162564721957
  7  Bc1-h6  780567379191018161
  8  Ng1-e2  945371558234344460
  9  Ng1-f3  1385438910975932839
 10  Ng1-h3  895280075533759867
 11  Ke1-f1  735804983672832601
 12  Ke1-d1  692541386956704134
 13  Ke1-d2  703028745285875868
 14  Ke1-e2  1050134052410078727
 15  Ke1-f2  1157859680063453488

Total:   -5489700307472934742
Then: -5,489,700,307,472,934,742 + 1 × 2^64 = 12,957,043,766,236,616,874... exactly the same as MPerft. Furthermore, every divided count matches between JetChess and MPerft for this perft(19) count.
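
The correction above can be reproduced with arbitrary-precision integers. This is a minimal Python sketch of the arithmetic only; the helper name `wrap_signed64` is made up for illustration, and treating the JetChess total as a signed 64-bit two's complement value is an assumption based on the negative number it printed.

```python
# Divided perft(19) counts for k7/8/8/8/8/8/8/2B1K1N1 w - - 0 1,
# copied from the listings above (both programs agree on every move).
DIVIDE_19 = {
    "c1a3": 659_457_009_336_590_439,
    "c1b2": 1_264_611_181_709_709_388,
    "c1d2": 740_040_237_489_377_013,
    "c1e3": 515_757_908_580_425_750,
    "c1f4": 539_487_494_231_792_182,
    "c1g5": 891_663_162_564_721_957,
    "c1h6": 780_567_379_191_018_161,
    "e1d1": 692_541_386_956_704_134,
    "e1d2": 703_028_745_285_875_868,
    "e1e2": 1_050_134_052_410_078_727,
    "e1f1": 735_804_983_672_832_601,
    "e1f2": 1_157_859_680_063_453_488,
    "g1e2": 945_371_558_234_344_460,
    "g1f3": 1_385_438_910_975_932_839,
    "g1h3": 895_280_075_533_759_867,
}

def wrap_signed64(n):
    """Reduce n modulo 2**64 and reinterpret it as a signed 64-bit value."""
    n &= (1 << 64) - 1
    return n - (1 << 64) if n >= (1 << 63) else n

true_total = sum(DIVIDE_19.values())  # exact sum, no overflow in Python
wrapped = wrap_signed64(true_total)   # what a signed 64-bit total would show

print(true_total)
print(wrapped)
print(wrapped + 2**64 == true_total)  # adding one 2**64 undoes the wrap
```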

I want to raise a warning: I did exactly the same with perft(20) and the divided counts match except for Nf3:

Code: Select all

JetChess 1.0.0.0:

k7/8/8/8/8/8/8/2B1K1N1 w - - 0 1

perft(20)

  1  Bc1-b2  7381849532908251042
  2  Bc1-a3  3789314683705030887
  3  Bc1-d2  4308176281282487428
  4  Bc1-e3  2998731023870432062
  5  Bc1-f4  3123318871803074928
  6  Bc1-g5  5178532290991801515
  7  Bc1-h6  4542810120280543237
  8  Ng1-e2  5413637977425633758
  9  Ng1-f3  7943137449825604502
 10  Ng1-h3  5179627520227528635
 11  Ke1-f1  4299702185286081370
 12  Ke1-d1  4009569653362539651
 13  Ke1-d2  4047788944120303364
 14  Ke1-e2  6100661826137759280
 15  Ke1-f2  6755901517373669527

Total:   1285783583762534722

Code: Select all

.\mperft-5.2-windows>mperft-5.2-x86-64-v2.exe --nullmove 20 --div --fast --fen "k7/8/8/8/8/8/8/2B1K1N1 w - - 0 1"
Magic Perft version 5.2 (c) Richard Delorme 2020 - 2026
Bitboard move generation based on magic bitboards.
Perft setting: hashtable size: 1024 Mbytes (67108868 entries); with 4 threads; with nullmove counting.
  a b c d e f g h
8 k . . . . . . . 8
7 . . . . . . . . 7
6 . . . . . . . . 6
5 . . . . . . . . 5
4 . . . . . . . . 4
3 . . . . . . . . 3
2 . . . . . . . . 2
1 . . B . K . N . 1
  a b c d e f g h
w,
depth: 20
 c1a3  3,789,314,683,705,030,887 positions in          11.943 317.283 Ppos/s
 c1b2  7,381,849,532,908,251,042 positions in           1.105   6.679 Epos/s
 c1d2  4,308,176,281,282,487,428 positions in           1.162   3.706 Epos/s
 c1e3  2,998,731,023,870,432,062 positions in           0.750   3.993 Epos/s
 c1f4  3,123,318,871,803,074,928 positions in           0.788   3.959 Epos/s
 c1g5  5,178,532,290,991,801,515 positions in           0.488  10.603 Epos/s
 c1h6  4,542,810,120,280,543,237 positions in           0.195  23.230 Epos/s
 e1d1  4,009,569,653,362,539,651 positions in           0.174  23.034 Epos/s
 e1d2  4,047,788,944,120,303,364 positions in           0.170  23.701 Epos/s
 e1e2  6,100,661,826,137,759,280 positions in           0.068  88.899 Epos/s
 e1f1  4,299,702,185,286,081,370 positions in           0.027 158.014 Epos/s
 e1f2  6,755,901,517,373,669,527 positions in           0.012 539.924 Epos/s
 g1e2  5,413,637,977,425,633,758 positions in           0.004   1.236 Zpos/s
 g1f3  7,078,446,321,370,469,270 positions in           0.001   5.818 Zpos/s
 g1h3  5,179,627,520,227,528,635 positions in           0.000  82.920 Zpos/s
total   :    421,092,455,307,399,490 positions in          16.893  24.926 Ppos/s
full time:          17.259 s
For Nf3, I get 7,943,137,449,825,604,502 from JetChess and 7,078,446,321,370,469,270 from MPerft, in a count that should not face 64-bit overflows because 2^63 ~ 9.22e+18 and those counts are less than 8e+18 each. MPerft gets around 8.6469e+17 fewer leaf nodes than JetChess, and JetChess' result was confirmed by Paul/gperft... The exact difference of 864,691,128,455,135,232 is a curious number in itself when taking logarithms: log2(864,691,128,455,135,232) = 58 + ln(3)/ln(2) → 864,691,128,455,135,232 = 3 × 2⁵⁸, and that 58 reminded me of what I read before in the Readme: 'On 64 bits versions, the leaf counter is limited to 2⁵⁸'.
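
The same factorisation can be found without logarithms by dividing out powers of two. This is a minimal Python sketch; the helper name `split_power_of_two` is made up for illustration.

```python
# g1f3 counts at perft(20), copied from the two listings above.
jetchess_nf3 = 7_943_137_449_825_604_502
mperft_nf3 = 7_078_446_321_370_469_270

def split_power_of_two(n):
    """Return (odd, k) such that n == odd * 2**k, for n > 0."""
    k = 0
    while n % 2 == 0:
        n //= 2
        k += 1
    return n, k

diff = jetchess_nf3 - mperft_nf3
odd, k = split_power_of_two(diff)

print(diff)    # difference between the two Nf3 counts
print(odd, k)  # odd factor and exponent of two
```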

When I run perft(19) of the resulting position after Nf3 (k7/8/8/8/8/5N2/8/2B1K3 b - - 0 1), I get the same result from both perft counters. I do not know what is going on, with overflows of 2⁵⁸ being everywhere but only 'crashing' a single divided count. The explanation might be simple after all: why am I using a non-128-bit version for results with 64-bit overflows? Let each job be done by the appropriate version.

Going back to perft(20), if I sum all the divided counts of MPerft and compare with the true result, the sum falls short again by the same distance of 3 × 2⁵⁸, which means that all the divided counts reported by MPerft are correct (without needing corrections for overflows), except the already mentioned Nf3.
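
This bookkeeping can be checked in a few lines. This is a minimal Python sketch of the arithmetic only; reading the printed totals as sums reduced modulo 2^64 is an assumption that happens to fit the numbers, not something taken from either program's source.

```python
TRUE_PERFT_20 = 75_072_759_878_600_741_186  # confirmed result quoted above

# Divided perft(20) counts printed by mperft-5.2-x86-64-v2 (listing above).
MPERFT_DIVIDE_20 = [
    3_789_314_683_705_030_887,  # c1a3
    7_381_849_532_908_251_042,  # c1b2
    4_308_176_281_282_487_428,  # c1d2
    2_998_731_023_870_432_062,  # c1e3
    3_123_318_871_803_074_928,  # c1f4
    5_178_532_290_991_801_515,  # c1g5
    4_542_810_120_280_543_237,  # c1h6
    4_009_569_653_362_539_651,  # e1d1
    4_047_788_944_120_303_364,  # e1d2
    6_100_661_826_137_759_280,  # e1e2
    4_299_702_185_286_081_370,  # e1f1
    6_755_901_517_373_669_527,  # e1f2
    5_413_637_977_425_633_758,  # g1e2
    7_078_446_321_370_469_270,  # g1f3 (the count that differs)
    5_179_627_520_227_528_635,  # g1h3
]

exact_sum = sum(MPERFT_DIVIDE_20)         # exact, no overflow in Python
shortfall = TRUE_PERFT_20 - exact_sum     # distance to the true result
mperft_total = exact_sum % 2**64          # total as a 64-bit counter shows it
jetchess_total = TRUE_PERFT_20 % 2**64    # true result reduced the same way

print(shortfall == 3 * 2**58)
print(mperft_total)
print(jetchess_total)
```

Under that assumption, both printed totals are reproduced: the MPerft one from its own divided counts, and the JetChess one from the true result.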

------------
abulmo2 wrote: Fri Jun 26, 2026 8:14 pm [...]

gperft was incredibly fast. It is unfortunate the program is no longer available.
Yes... :-( I still have 1.0 (Windows only), 1.0.1, 1.0.2, 1.0.3 and 1.1, while I skipped 1.2d, as said before.

Regards from Spain.

Ajedrecista.

Re: A comparison of some Perft programs.

Post by abulmo2 »

Ajedrecista wrote: Sat Jun 27, 2026 12:57 pm There is a note on the 5.2 release that says 'I hope the 128 bit version to now count right above 64 bit numbers'. That does not inspire confidence! Let us try something: in the past, I overcame overflows with a clever method combining different tools:
Version 5.2 fixes a few bugs in the 128-bit version. Some intermediate variables were 64 bits wide instead of 128 bits, which could lead to spurious overflows. Note that 128-bit numbers are poorly supported by the main C/C++ compilers. For example, under Linux you can neither write a 128-bit literal nor print out a 128-bit number (MPerft uses workarounds for both). There is also a lack of 128-bit results to compare with. What I know is that versions 5.2 and 5.3 give consistent results, which are probably exact.
Re: KBNk ---> perft(20) result.

I got a result that was later confirmed by Paul (gperft's author). Since I cannot run the 128-bit version right now, you or anyone else could give it a try, just to confirm:

Code: Select all

k7/8/8/8/8/8/8/2B1K1N1 w - - 0 1

perft(20) = 75,072,759,878,600,741,186
The 128-bit version gives the following result:

Code: Select all

mperft-5.3-x86-64-v3-128bits -f "k7/8/8/8/8/8/8/2B1K1N1 w - - 0 1" 20 --fast --div
Magic Perft version 5.3 (c) Richard Delorme 2020 - 2026
Bitboard move generation based on magic (pext) bitboards.
Using 128 bits counter & Zobrist's key.
Perft setting: hashtable size: 32768 Mbytes (1073741828 entries); with 32 threads; with nullmove counting.
  a b c d e f g h
8 k . . . . . . . 8
7 . . . . . . . . 7
6 . . . . . . . . 6
5 . . . . . . . . 5
4 . . . . . . . . 4
3 . . . . . . . . 3
2 . . . . . . . . 2
1 . . B . K . N . 1
  a b c d e f g h
w, 
depth: 20
 c1a3  3,789,314,683,705,030,887 positions in        0:00.771   4.913 Epos/s
 c1b2  7,381,849,532,908,251,042 positions in        0:00.128  57.618 Epos/s
 c1d2  4,308,176,281,282,487,428 positions in        0:00.123  34.772 Epos/s
 c1e3  2,998,731,023,870,432,062 positions in        0:00.078  38.056 Epos/s
 c1f4  3,123,318,871,803,074,928 positions in        0:00.091  34.065 Epos/s
 c1g5  5,178,532,290,991,801,515 positions in        0:00.054  95.537 Epos/s
 c1h6  4,542,810,120,280,543,237 positions in        0:00.022 203.186 Epos/s
 e1d1  4,009,569,653,362,539,651 positions in        0:00.021 184.209 Epos/s
 e1d2  4,047,788,944,120,303,364 positions in        0:00.024 162.760 Epos/s
 e1e2  6,100,661,826,137,759,280 positions in        0:00.007 849.395 Epos/s
 e1f1  4,299,702,185,286,081,370 positions in        0:00.004   1.026 Zpos/s
 e1f2  6,755,901,517,373,669,527 positions in        0:00.002   2.945 Zpos/s
 g1e2  5,413,637,977,425,633,758 positions in        0:00.005   1.070 Zpos/s
 g1f3  7,943,137,449,825,604,502 positions in        0:00.000  16.228 Zpos/s
 g1h3  5,179,627,520,227,528,635 positions in        0:00.000 120.027 Zpos/s
total   : 75,072,759,878,600,741,186 positions in        0:01.336  56.185 Epos/s
full time:        0:15.425 s
So it confirms your result.
Note that using simple endgame positions it is quite easy to overflow 128-bit numbers.

Code: Select all

$ mperft-5.3-x86-64-v3-128bits -f "3qk3/8/8/8/8/8/8/4K3 w - - 0 1" 40 --fast -l
Magic Perft version 5.3 (c) Richard Delorme 2020 - 2026
Bitboard move generation based on magic (pext) bitboards.
Using 128 bits counter & Zobrist's key.
Perft setting: hashtable size: 32768 Mbytes (1073741828 entries); with 32 threads; with nullmove counting.
  a b c d e f g h
8 . . . q k . . . 8
7 . . . . . . . . 7
6 . . . . . . . . 6
5 . . . . . . . . 5
4 . . . . . . . . 4
3 . . . . . . . . 3
2 . . . . . . . . 2
1 . . . . K . . . 1
  a b c d e f g h
w, 
perft  1:                                                   3 positions in        0:00.000   1.144 Mpos/s
perft  2:                                                  63 positions in        0:00.000   4.332 Mpos/s
perft  3:                                                 329 positions in        0:00.000  33.657 Mpos/s
perft  4:                                               8,548 positions in        0:00.000 195.918 Mpos/s
perft  5:                                              40,679 positions in        0:00.000 157.544 Mpos/s
perft  6:                                           1,089,960 positions in        0:00.000   3.104 Gpos/s
perft  7:                                           5,114,254 positions in        0:00.005 879.489 Mpos/s
perft  8:                                         138,446,205 positions in        0:00.009  14.705 Gpos/s
perft  9:                                         643,479,181 positions in        0:00.002 216.262 Gpos/s
perft 10:                                      17,516,410,462 positions in        0:00.007   2.292 Tpos/s
perft 11:                                      81,232,421,029 positions in        0:00.010   8.065 Tpos/s
perft 12:                                   2,218,131,608,219 positions in        0:00.024  88.763 Tpos/s
perft 13:                                  10,262,215,985,523 positions in        0:00.029 345.340 Tpos/s
perft 14:                                 280,829,349,563,385 positions in        0:00.057   4.911 Ppos/s
perft 15:                               1,297,998,818,932,585 positions in        0:00.063  20.395 Ppos/s
perft 16:                              35,576,609,082,824,949 positions in        0:00.104 339.010 Ppos/s
perft 17:                             164,247,550,974,734,442 positions in        0:00.117   1.399 Epos/s
perft 18:                           4,507,326,418,311,453,626 positions in        0:00.175  25.738 Epos/s
perft 19:                          20,793,374,300,613,418,561 positions in        0:00.234  88.737 Epos/s
perft 20:                         571,166,912,404,442,796,749 positions in        0:00.283   2.014 Zpos/s
perft 21:                       2,632,953,960,379,504,344,150 positions in        0:00.338   7.777 Zpos/s
perft 22:                      72,379,192,429,587,502,526,718 positions in        0:00.387 186.795 Zpos/s
perft 23:                     333,452,076,802,942,292,304,794 positions in        0:00.538 619.437 Zpos/s
perft 24:                   9,172,137,475,372,452,097,338,908 positions in        0:00.578  15.846 Ypos/s
perft 25:                  42,233,162,321,751,010,925,985,535 positions in        0:00.615  68.591 Ypos/s
perft 26:               1,162,263,592,652,410,085,819,279,036 positions in        0:00.720   1.612 Rpos/s
perft 27:               5,349,185,013,275,714,786,934,684,836 positions in        0:00.814   6.565 Rpos/s
perft 28:             147,268,705,138,733,701,686,330,652,207 positions in        0:00.866 169.975 Rpos/s
perft 29:             677,513,878,027,131,424,006,971,650,216 positions in        0:01.014 668.106 Rpos/s
perft 30:          18,658,587,063,416,864,255,387,511,935,553 positions in        0:01.143  16.321 Qpos/s
perft 31:          85,809,762,525,500,954,233,243,488,589,996 positions in        0:01.323  64.825 Qpos/s
perft 32:       2,363,782,025,397,195,327,917,872,446,390,159 positions in        0:01.431 1650.808 Qpos/s
perft 33:      10,867,665,630,870,920,541,588,165,318,233,518 positions in        0:01.421 7645.622 Qpos/s
perft 34:     299,429,665,589,366,176,247,021,570,069,696,829 positions in        0:01.645 181938.125 Qpos/s
perft 35:   1,376,300,979,740,609,421,424,212,750,564,342,557 positions in        0:01.602 858976.412 Qpos/s
perft 36:  37,926,410,097,876,340,281,000,619,340,372,744,001 positions in        0:01.837 20638904.569 Qpos/s
perft 37: 174,287,433,517,717,864,381,080,632,595,041,305,625 positions in        0:01.948 89441256.434 Qpos/s
perft 38: 337,202,520,174,289,936,700,713,709,013,056,098,182 positions in        0:01.941 173638694.175 Qpos/s
perft 39: 339,337,238,885,727,500,452,289,895,318,291,273,129 positions in        0:02.089 162369860.575 Qpos/s
perft 40: 144,584,449,479,129,745,543,978,057,051,695,167,225 positions in        0:02.280 63388176.495 Qpos/s
total   :  14,180,018,784,607,675,732,780,915,642,408,237,558 positions in        0:25.670 552395.974 Qpos/s
Using 128 bits, I am limited to 2^122 in the hashtable (the other 6 bits are used to store the depth), which is 5,316,911,983,139,663,491,615,228,241,121,378,304, so any number bigger than that is suspicious in MPerft.
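
Applied to the listing above, that limit gives a simple magnitude check. This is a minimal Python sketch, not MPerft code: the names `COUNT_BITS`, `LIMIT` and `is_suspicious` are made up for illustration, and the check only flags values at or above 2^122. A value below the limit is not thereby proven correct.

```python
COUNT_BITS = 128 - 6      # per the post: 6 of the 128 bits hold the depth
LIMIT = 1 << COUNT_BITS   # 2**122

# A few perft values for 3qk3/8/8/8/8/8/8/4K3 w - - 0 1 from the listing above.
PERFT = {
    34: 299_429_665_589_366_176_247_021_570_069_696_829,
    35: 1_376_300_979_740_609_421_424_212_750_564_342_557,
    36: 37_926_410_097_876_340_281_000_619_340_372_744_001,
    37: 174_287_433_517_717_864_381_080_632_595_041_305_625,
}

def is_suspicious(count):
    """True when the count no longer fits in the 122-bit hashtable field."""
    return count >= LIMIT

first_suspicious = min(d for d, n in PERFT.items() if is_suspicious(n))

print(LIMIT)
print(first_suspicious)  # first listed depth whose count reaches 2**122
```

With these values, depth 35 is the last one under the limit and everything from depth 36 onwards falls in the suspicious range.
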
If I remember correctly, the number of positions is around 10^800 - 10^1000 in chess, taking into account the 3-fold repetition & 50-move rules (which perft does not), so 128 bits is obviously a low limit.