Hello Richard:
abulmo2 wrote: ↑Fri Jun 26, 2026 8:14 pm
Thank you for your kind words. I just release version 5.3; although it is not finished...
In version 5.3, I mostly optimized the hashtable.
The performance of the different versions on my system (Ryzen 9 5950x at 4.2 Ghz) are (using the arguments -n 9 -h 1024 -t 16 -q):
- mperft-5.3-x86-64: 7.382s
- mperft-5.3-x86-64-v2: 6.357s (-13.9%)
- mperft-5.3-x86-64-v3: 6.003s ( -5.6%)
The x86-64-v2 version brings popcount and the x86-64-v3 version pext (+some compiler optimisations). So you do not lose much speed. The 128-bit version is slower, mainly because less transposition table entries are available using the same amount of memory.
[...]
Thank you for your explanations: so, v3 does not bring a large gain over v2, but everything counts. It is good to know that the 128-bit version is slower, which was unexpected to me.
In summary: use v3 whenever possible for ordinary runs, and reserve the 128-bit version for really large counts, to avoid overflows.
There is a note on the 5.2 release that says 'I hope the 128 bit version to now count right above 64 bit numbers'. That does not inspire confidence! Let us try something: in the past, I overcame overflows with a clever method combining different tools:
Re: KBNk ---> perft(20) result.
I got a result that was later confirmed by Paul (gperft's author). Since I cannot run the 128-bit version right now, you or anyone else could give it a try, just to confirm:
Code:
k7/8/8/8/8/8/8/2B1K1N1 w - - 0 1
perft(20) = 75,072,759,878,600,741,186
I tried v2 (without 128-bit counters) for perft(19) of the same position (the total has one overflow, but the divided counts do not) and got:
Code:
.\mperft-5.2-windows>mperft-5.2-x86-64-v2.exe --nullmove 19 --div --fast --fen "k7/8/8/8/8/8/8/2B1K1N1 w - - 0 1"
Magic Perft version 5.2 (c) Richard Delorme 2020 - 2026
Bitboard move generation based on magic bitboards.
Perft setting: hashtable size: 1024 Mbytes (67108868 entries); with 4 threads; with nullmove counting.
a b c d e f g h
8 k . . . . . . . 8
7 . . . . . . . . 7
6 . . . . . . . . 6
5 . . . . . . . . 5
4 . . . . . . . . 4
3 . . . . . . . . 3
2 . . . . . . . . 2
1 . . B . K . N . 1
a b c d e f g h
w,
depth: 19
c1a3 659,457,009,336,590,439 positions in 5.668 116.343 Ppos/s
c1b2 1,264,611,181,709,709,388 positions in 1.069 1.182 Epos/s
c1d2 740,040,237,489,377,013 positions in 0.957 773.245 Ppos/s
c1e3 515,757,908,580,425,750 positions in 0.629 819.638 Ppos/s
c1f4 539,487,494,231,792,182 positions in 0.638 844.362 Ppos/s
c1g5 891,663,162,564,721,957 positions in 0.407 2.190 Epos/s
c1h6 780,567,379,191,018,161 positions in 0.171 4.550 Epos/s
e1d1 692,541,386,956,704,134 positions in 0.147 4.680 Epos/s
e1d2 703,028,745,285,875,868 positions in 0.143 4.905 Epos/s
e1e2 1,050,134,052,410,078,727 positions in 0.051 20.526 Epos/s
e1f1 735,804,983,672,832,601 positions in 0.026 27.523 Epos/s
e1f2 1,157,859,680,063,453,488 positions in 0.012 95.214 Epos/s
g1e2 945,371,558,234,344,460 positions in 0.004 197.125 Epos/s
g1f3 1,385,438,910,975,932,839 positions in 0.001 1.027 Zpos/s
g1h3 895,280,075,533,759,867 positions in 0.000 6.463 Zpos/s
total : 12,957,043,766,236,616,874 positions in 9.929 1.305 Epos/s
With the JetChess trick, and knowing that there is only one overflow, from what I wrote back then:
Code:
JetChess 1.0.0.0:
k7/8/8/8/8/8/8/2B1K1N1 w - - 0 1
perft(19)
1 Bc1-b2 1264611181709709388
2 Bc1-a3 659457009336590439
3 Bc1-d2 740040237489377013
4 Bc1-e3 515757908580425750
5 Bc1-f4 539487494231792182
6 Bc1-g5 891663162564721957
7 Bc1-h6 780567379191018161
8 Ng1-e2 945371558234344460
9 Ng1-f3 1385438910975932839
10 Ng1-h3 895280075533759867
11 Ke1-f1 735804983672832601
12 Ke1-d1 692541386956704134
13 Ke1-d2 703028745285875868
14 Ke1-e2 1050134052410078727
15 Ke1-f2 1157859680063453488
Total: -5489700307472934742
Then: -5,489,700,307,472,934,742 + 1 × 2^64 = 12,957,043,766,236,616,874, exactly the same as MPerft. Furthermore, every divided count matches between JetChess and MPerft for this perft(19) count.
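A minimal Python sketch of that correction, for anyone who wants to repeat it. It assumes that JetChess accumulates its total in a signed 64-bit integer (which is what the negative total suggests, but is not confirmed by its author); the helper name as_signed64 is hypothetical, and the divided counts are simply copied from the listings above.

```python
# Divided perft(19) counts for k7/8/8/8/8/8/8/2B1K1N1 w - - 0 1,
# copied from the MPerft/JetChess listings in this post.
DIVIDE_19 = {
    "c1a3": 659_457_009_336_590_439,
    "c1b2": 1_264_611_181_709_709_388,
    "c1d2": 740_040_237_489_377_013,
    "c1e3": 515_757_908_580_425_750,
    "c1f4": 539_487_494_231_792_182,
    "c1g5": 891_663_162_564_721_957,
    "c1h6": 780_567_379_191_018_161,
    "e1d1": 692_541_386_956_704_134,
    "e1d2": 703_028_745_285_875_868,
    "e1e2": 1_050_134_052_410_078_727,
    "e1f1": 735_804_983_672_832_601,
    "e1f2": 1_157_859_680_063_453_488,
    "g1e2": 945_371_558_234_344_460,
    "g1f3": 1_385_438_910_975_932_839,
    "g1h3": 895_280_075_533_759_867,
}

MASK64 = (1 << 64) - 1


def as_signed64(n: int) -> int:
    """Reduce n modulo 2^64 and reinterpret it as a two's-complement signed value.

    Hypothetical helper: it only mimics what a signed 64-bit counter would show.
    """
    n &= MASK64
    return n - (1 << 64) if n >= (1 << 63) else n


# Python integers have arbitrary precision, so this sum is exact.
exact_total = sum(DIVIDE_19.values())

# What a signed 64-bit accumulator would display for the same sum.
wrapped_total = as_signed64(exact_total)

# Number of multiples of 2^64 that must be added back to undo the wrap-around.
wraps = (exact_total - wrapped_total) >> 64

print("wrapped total  :", wrapped_total)
print("wraps of 2^64  :", wraps)
print("corrected total:", wrapped_total + wraps * 2**64)
```

The wrapped total reproduces the negative JetChess figure, and a single multiple of 2^64 brings it back to the MPerft total. The number of wraps can only be computed this way because the divided counts themselves fit in 64 bits.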
I want to raise a warning: I did exactly the same with perft(20), and the divided counts match except for Nf3:
Code:
JetChess 1.0.0.0:
k7/8/8/8/8/8/8/2B1K1N1 w - - 0 1
perft(20)
1 Bc1-b2 7381849532908251042
2 Bc1-a3 3789314683705030887
3 Bc1-d2 4308176281282487428
4 Bc1-e3 2998731023870432062
5 Bc1-f4 3123318871803074928
6 Bc1-g5 5178532290991801515
7 Bc1-h6 4542810120280543237
8 Ng1-e2 5413637977425633758
9 Ng1-f3 7943137449825604502
10 Ng1-h3 5179627520227528635
11 Ke1-f1 4299702185286081370
12 Ke1-d1 4009569653362539651
13 Ke1-d2 4047788944120303364
14 Ke1-e2 6100661826137759280
15 Ke1-f2 6755901517373669527
Total: 1285783583762534722
Code:
.\mperft-5.2-windows>mperft-5.2-x86-64-v2.exe --nullmove 20 --div --fast --fen "k7/8/8/8/8/8/8/2B1K1N1 w - - 0 1"
Magic Perft version 5.2 (c) Richard Delorme 2020 - 2026
Bitboard move generation based on magic bitboards.
Perft setting: hashtable size: 1024 Mbytes (67108868 entries); with 4 threads; with nullmove counting.
a b c d e f g h
8 k . . . . . . . 8
7 . . . . . . . . 7
6 . . . . . . . . 6
5 . . . . . . . . 5
4 . . . . . . . . 4
3 . . . . . . . . 3
2 . . . . . . . . 2
1 . . B . K . N . 1
a b c d e f g h
w,
depth: 20
c1a3 3,789,314,683,705,030,887 positions in 11.943 317.283 Ppos/s
c1b2 7,381,849,532,908,251,042 positions in 1.105 6.679 Epos/s
c1d2 4,308,176,281,282,487,428 positions in 1.162 3.706 Epos/s
c1e3 2,998,731,023,870,432,062 positions in 0.750 3.993 Epos/s
c1f4 3,123,318,871,803,074,928 positions in 0.788 3.959 Epos/s
c1g5 5,178,532,290,991,801,515 positions in 0.488 10.603 Epos/s
c1h6 4,542,810,120,280,543,237 positions in 0.195 23.230 Epos/s
e1d1 4,009,569,653,362,539,651 positions in 0.174 23.034 Epos/s
e1d2 4,047,788,944,120,303,364 positions in 0.170 23.701 Epos/s
e1e2 6,100,661,826,137,759,280 positions in 0.068 88.899 Epos/s
e1f1 4,299,702,185,286,081,370 positions in 0.027 158.014 Epos/s
e1f2 6,755,901,517,373,669,527 positions in 0.012 539.924 Epos/s
g1e2 5,413,637,977,425,633,758 positions in 0.004 1.236 Zpos/s
g1f3 7,078,446,321,370,469,270 positions in 0.001 5.818 Zpos/s
g1h3 5,179,627,520,227,528,635 positions in 0.000 82.920 Zpos/s
total : 421,092,455,307,399,490 positions in 16.893 24.926 Ppos/s
full time: 17.259 s
For Nf3, I get 7,943,137,449,825,604,502 from JetChess and 7,078,446,321,370,469,270 from MPerft, in a count that should not face 64-bit overflows because 2^63 ≈ 9.22e+18 and those counts are less than 8e+18 each. MPerft gets around 8.6469e+17 fewer leaf nodes than JetChess, and JetChess' result was confirmed by Paul/gperft... The exact difference of 864,691,128,455,135,232 is a curious number in itself when taking logarithms: log2(864,691,128,455,135,232) = 58 + ln(3)/ln(2), that is, 864,691,128,455,135,232 = 3 × 2⁵⁸, and that 58 reminded me of what I had read in the Readme:
'On 64 bits versions, the leaf counter is limited to 2⁵⁸'.
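The arithmetic behind the Nf3 discrepancy can be checked with exact integers instead of logarithms. This is only a small Python sketch using the two counts quoted in this post; the variable names are mine, and it says nothing about how MPerft stores its counters internally.

```python
# perft(20) divided counts for Ng1-f3, as reported by each program in this post.
jetchess_nf3 = 7_943_137_449_825_604_502
mperft_nf3 = 7_078_446_321_370_469_270

# Exact difference (Python integers do not overflow).
diff = jetchess_nf3 - mperft_nf3

# Split the difference into a multiple of 2^58 plus a remainder.
quotient, remainder = divmod(diff, 1 << 58)

# Both counts are below 2^63, so neither overflows a signed 64-bit integer.
fits_in_int64 = max(jetchess_nf3, mperft_nf3) < (1 << 63)

print("difference     :", diff)
print("multiple of 2^58:", quotient, "remainder:", remainder)
print("fits in int64  :", fits_in_int64)
```

A zero remainder with a quotient of 3 is what makes the difference exactly 3 × 2⁵⁸ rather than merely close to it.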
When I run perft(19) on the position resulting after Nf3 (k7/8/8/8/8/5N2/8/2B1K3 b - - 0 1), I get the same result from both perft counters. I do not know what is going on, with overflows of 2⁵⁸ being everywhere but only 'crashing' a single divided count. The explanation might be simple after all: why am I using a non-128-bit version for results with 64-bit overflows? Let each job be done by the appropriate version.
Going back to perft(20), if I sum all the divided counts of MPerft and compare the sum with the true result, it falls short again by the same 3 × 2⁵⁸, which means that all the divided counts reported by MPerft are correct (without needing corrections for overflows), except for the already mentioned Nf3.
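That comparison is easy to reproduce. The Python sketch below sums the MPerft perft(20) divided counts exactly as listed in this post and compares the sum with the perft(20) value given earlier; reducing the sum modulo 2^64 is my assumption about how the 'total' line of a 64-bit build is produced, not something taken from the MPerft source.

```python
# MPerft 5.2 (x86-64-v2) divided perft(20) counts, copied from the listing above.
MPERFT_DIVIDE_20 = {
    "c1a3": 3_789_314_683_705_030_887,
    "c1b2": 7_381_849_532_908_251_042,
    "c1d2": 4_308_176_281_282_487_428,
    "c1e3": 2_998_731_023_870_432_062,
    "c1f4": 3_123_318_871_803_074_928,
    "c1g5": 5_178_532_290_991_801_515,
    "c1h6": 4_542_810_120_280_543_237,
    "e1d1": 4_009_569_653_362_539_651,
    "e1d2": 4_047_788_944_120_303_364,
    "e1e2": 6_100_661_826_137_759_280,
    "e1f1": 4_299_702_185_286_081_370,
    "e1f2": 6_755_901_517_373_669_527,
    "g1e2": 5_413_637_977_425_633_758,
    "g1f3": 7_078_446_321_370_469_270,  # the only count that differs from JetChess
    "g1h3": 5_179_627_520_227_528_635,
}

# perft(20) of k7/8/8/8/8/8/8/2B1K1N1 w - - 0 1, as quoted earlier in this post.
TRUE_PERFT_20 = 75_072_759_878_600_741_186

# Exact sum of the divided counts (no overflow with Python integers).
divide_sum = sum(MPERFT_DIVIDE_20.values())

# How far the sum falls short of the true result.
shortfall = TRUE_PERFT_20 - divide_sum

print("sum of divided counts:", divide_sum)
print("shortfall            :", shortfall)
print("equals 3 * 2^58      :", shortfall == 3 * 2**58)

# Assumption: a 64-bit total is the exact sum reduced modulo 2^64.
print("sum mod 2^64         :", divide_sum % 2**64)
print("true value mod 2^64  :", TRUE_PERFT_20 % 2**64)
```

Under that assumption, the sum modulo 2^64 can be compared with the 'total' line printed by MPerft, and the true value modulo 2^64 with the JetChess total.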
------------
abulmo2 wrote: ↑Fri Jun 26, 2026 8:14 pm[...]
gperft was incredibly fast. It is unfortunate the program is no more available.
Yes...

I still have 1.0 (Windows only), 1.0.1, 1.0.2, 1.0.3 and 1.1; I skipped 1.2d, as I said before.
Regards from Spain.
Ajedrecista.