Fastest perft

Discussion of chess software programming and technical issues.

Moderator: Ras

User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: Fastest perft

Post by sje »

Code: Select all

Depth: 7   Count: 3,195,901,860   Elapsed: 16.5221  (1.93432e+08 Hz / 5.16979e-09 s)
Depth: 8   Count: 84,998,978,956   Elapsed: 53.3478  (1.5933e+09 Hz / 6.27628e-10 s)
Depth: 9   Count: 2,439,530,234,167   Elapsed: 429.863  (5.67514e+09 Hz / 1.76207e-10 s)
Depth: 10   Count: 69,352,859,712,417   Elapsed: 5208.12  (1.33163e+10 Hz / 7.50959e-11 s)
User avatar
hgm
Posts: 28353
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Fastest perft

Post by hgm »

Yes, you still beat me by a factor 3; my perft(10) took 15,556 sec (from the position after 1. h3 h6, but that should be nearly equal to that of the opening position).

Still not bad for only a single thread on a 2.4GHz C2D, with 512MB hash. (It never used more than 380MB according to the task manager, though.)
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: Fastest perft

Post by sje »

Knock off 40% of the above times if run on the 3.4 GHz i7-2600 instead on the 2.66 GHz Xeon 5150 pair.
User avatar
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: Fastest perft

Post by stegemma »

While verifying my new engine, i've found a little bug. Using Quick Perft with this FEN:

perft 5 "4k3/8/8/8/8/2r3r1/8/R3K2R w KQ -"

you get different results, without warning, from this bad FEN:

perft 5 "4k3/8/8/8/8/2r3r1/8/R3K2R w KQkq -"
User avatar
Ajedrecista
Posts: 2097
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: Fastest perft

Post by Ajedrecista »

Hello:

First of all: sorry for bumping this topic. I downloaded perft.c source of qperft (by Muller) and I compiled it by myself, using:

Code: Select all

gcc -o perft perft.c
The size of perft.c is around 52.7 KB (in a Notepad), while my own compiled perft.exe (using MinGW32) is around 108.7 KB (very far from 21.5 KB of the executable that can be downloaded at Muller's web). Also, my executable is around 35% slower for Perft(7) than the official one (in my computer), but it is not the problem for me. The real problem is that the official perft.exe works perfectly, while my own compiled executable not: using the starting position, it runs perfectly until Perft(6)... but problems start at seventh ply: if I split the values, all of them are correct, but the sum is wrong (is negative!): -1099065436, which is exactly Perft(7) - 2^(32). My system is 32-bits and I suspected that it could be one reason, but the official .exe is flawless; I saw in the code the next comment:

Code: Select all

if(depth > 7) { // the count will not fit in 32 bits
Although in my case it was depth > 6 for the total perft (not the splitted values, that were fine). Perft(8) is even worse than Perft(7) as expected: I include log.txt file here:

Code: Select all

perft 8 -2
2. h2h3 moves =        380 ( 0.000 sec)
2. h2h4 moves =        420 ( 0.000 sec)
2. g2g3 moves =        420 ( 0.000 sec)
2. g2g4 moves =        421 ( 0.000 sec)
2. f2f3 moves =        380 ( 0.000 sec)
2. f2f4 moves =        401 ( 0.000 sec)
2. e2e3 moves =        599 ( 0.000 sec)
2. e2e4 moves =        600 ( 0.000 sec)
2. d2d3 moves =        539 ( 0.000 sec)
2. d2d4 moves =        560 ( 0.000 sec)
2. c2c3 moves =        420 ( 0.000 sec)
2. c2c4 moves =        441 ( 0.000 sec)
2. b2b3 moves =        420 ( 0.000 sec)
2. b2b4 moves =        421 ( 0.000 sec)
2. a2a3 moves =        380 ( 0.015 sec)
2. a2a4 moves =        420 ( 0.000 sec)
2. g1f3 moves =        440 ( 0.000 sec)
2. g1h3 moves =        400 ( 0.000 sec)
2. b1a3 moves =        400 ( 0.000 sec)
2. b1c3 moves =        440 ( 0.000 sec)
2. h2h3 moves =       8457 ( 0.000 sec)
2. h2h4 moves =       9329 ( 0.000 sec)
2. g2g3 moves =       9345 ( 0.000 sec)
2. g2g4 moves =       9328 ( 0.000 sec)
2. f2f3 moves =       8457 ( 0.000 sec)
2. f2f4 moves =       8929 ( 0.000 sec)
2. e2e3 moves =      13134 ( 0.000 sec)
2. e2e4 moves =      13160 ( 0.000 sec)
2. d2d3 moves =      11959 ( 0.000 sec)
2. d2d4 moves =      12435 ( 0.000 sec)
2. c2c3 moves =       9272 ( 0.000 sec)
2. c2c4 moves =       9744 ( 0.000 sec)
2. b2b3 moves =       9345 ( 0.000 sec)
2. b2b4 moves =       9332 ( 0.000 sec)
2. a2a3 moves =       8457 ( 0.000 sec)
2. a2a4 moves =       9329 ( 0.000 sec)
2. g1f3 moves =       9748 ( 0.000 sec)
2. g1h3 moves =       8881 ( 0.000 sec)
2. b1a3 moves =       8885 ( 0.000 sec)
2. b1c3 moves =       9755 ( 0.000 sec)
2. h2h3 moves =     181044 ( 0.016 sec)
2. h2h4 moves =     218829 ( 0.000 sec)
2. g2g3 moves =     217210 ( 0.015 sec)
2. g2g4 moves =     214048 ( 0.000 sec)
2. f2f3 moves =     178889 ( 0.000 sec)
2. f2f4 moves =     198473 ( 0.016 sec)
2. e2e3 moves =     402988 ( 0.000 sec)
2. e2e4 moves =     405385 ( 0.016 sec)
2. d2d3 moves =     328511 ( 0.000 sec)
2. d2d4 moves =     361790 ( 0.015 sec)
2. c2c3 moves =     222861 ( 0.000 sec)
2. c2c4 moves =     240082 ( 0.000 sec)
2. b2b3 moves =     215255 ( 0.016 sec)
2. b2b4 moves =     216145 ( 0.000 sec)
2. a2a3 moves =     181046 ( 0.000 sec)
2. a2a4 moves =     217832 ( 0.000 sec)
2. g1f3 moves =     233491 ( 0.000 sec)
2. g1h3 moves =     198502 ( 0.000 sec)
2. b1a3 moves =     198572 ( 0.000 sec)
2. b1c3 moves =     234656 ( 0.016 sec)
2. h2h3 moves =    4463070 ( 0.078 sec)
2. h2h4 moves =    5385554 ( 0.093 sec)
2. g2g3 moves =    5346260 ( 0.079 sec)
2. g2g4 moves =    5239875 ( 0.078 sec)
2. f2f3 moves =    4404141 ( 0.078 sec)
2. f2f4 moves =    4890429 ( 0.078 sec)
2. e2e3 moves =    9726018 ( 0.141 sec)
2. e2e4 moves =    9771632 ( 0.140 sec)
2. d2d3 moves =    8073082 ( 0.110 sec)
2. d2d4 moves =    8879566 ( 0.109 sec)
2. c2c3 moves =    5417640 ( 0.047 sec)
2. c2c4 moves =    5866666 ( 0.062 sec)
2. b2b3 moves =    5310358 ( 0.047 sec)
2. b2b4 moves =    5293555 ( 0.031 sec)
2. a2a3 moves =    4463267 ( 0.032 sec)
2. a2a4 moves =    5363555 ( 0.031 sec)
2. g1f3 moves =    5723523 ( 0.047 sec)
2. g1h3 moves =    4877234 ( 0.015 sec)
2. b1a3 moves =    4856835 ( 0.016 sec)
2. b1c3 moves =    5708064 ( 0.016 sec)
2. h2h3 moves =  106678423 ( 1.218 sec)
2. h2h4 moves =  138495290 ( 1.391 sec)
2. g2g3 moves =  135987651 ( 1.250 sec)
2. g2g4 moves =  130293018 ( 1.188 sec)
2. f2f3 moves =  102021008 ( 0.968 sec)
2. f2f4 moves =  119614841 ( 1.063 sec)
2. e2e3 moves =  306138410 ( 2.078 sec)
2. e2e4 moves =  309478263 ( 2.078 sec)
2. d2d3 moves =  227598692 ( 1.250 sec)
2. d2d4 moves =  269605599 ( 1.453 sec)
2. c2c3 moves =  144074944 ( 0.453 sec)
2. c2c4 moves =  157756443 ( 0.500 sec)
2. b2b3 moves =  133233975 ( 0.282 sec)
2. b2b4 moves =  134087476 ( 0.250 sec)
2. a2a3 moves =  106743106 ( 0.125 sec)
2. a2a4 moves =  137077337 ( 0.203 sec)
2. g1f3 moves =  147678554 ( 0.203 sec)
2. g1h3 moves =  120669525 ( 0.078 sec)
2. b1a3 moves =  120142144 ( 0.047 sec)
2. b1c3 moves =  148527161 ( 0.078 sec)
2. h2h3 moves = -1434558616 (15.125 sec)
2. h2h4 moves = -583844181 (18.547 sec)
2. g2g3 moves = -653534373 (16.375 sec)
2. g2g4 moves = -828762594 (15.328 sec)
2. f2f3 moves = -1566351428 (10.281 sec)
2. f2f4 moves = -1095927890 (12.000 sec)
2. e2e3 moves = -550543673 (29.985 sec)
2. e2e4 moves = -487826371 (30.265 sec)
2. d2d3 moves = 1798281323 (17.281 sec)
2. d2d4 moves = -1405352642 (20.672 sec)
2. c2c3 moves = -488738172 ( 7.063 sec)
2. c2c4 moves =  -95299680 ( 7.765 sec)
2. b2b3 moves = -715667679 ( 4.235 sec)
2. b2b4 moves = -725899667 ( 4.015 sec)
2. a2a3 moves = -1431555643 ( 1.719 sec)
2. a2a4 moves = -618657677 ( 3.000 sec)
2. g1f3 moves = -357613200 ( 3.156 sec)
2. g1h3 moves = -1073689014 ( 1.125 sec)
2. b1a3 moves = -1101444719 ( 0.860 sec)
2. b1c3 moves = -368282956 ( 1.234 sec)
Perft(8) wrong value that I obtained was -900366964, which is exactly Perft(8) - 10·2^(33) = Perft(8) - 5·2^(34)... why? Also all splitted values are wrong: (wrong splitted value) = (true splitted value) - 2^(32) (or -2^(33) for d4, e3 and e4); please note that d3 splitted (wrong) value is the only positive number and it is also: (wrong value for d3) = (true value for d3) - 2^(32). What is wrong with the code that is available online at this link? I surely will not understand nothing, but it is baffling for me that the results are wrong without changing this code, and the official qperft executable works fine. One could think that the official .exe has not _exactly_ the same code (i.e. few minor changes), but I am not convinced of this and I think it is more about 32-bit vs. 64-bit compilers (although I am clueless).

qperft is the very first source code (and the only one for the momment) that I managed to compile, so everybody can guess that I am not an expert compiler. It would be nice than someone could confirm if it is due to the 32-bit issue, but OTOH the official perft.exe works perfectly (maybe compiled with MinGW64 or another 64-bit compiler?).

The improve of qperft is more than noticeable and it is almost as fast as JetChess (not so fast yet) without splitting (split slows down the count) in my computer (thank you very much!). Just an example: now Perft(7) without splitting takes around 11 seconds with official qperft, while JetChess takes around 8.5 seconds. I guess that multi-core support would boost speed and surpass JetChess in 32-bit systems (I do not know how fast are both programs on 64-bit systems), but I also suppose that it is very hard and maybe not worthy, but very appreciated. Thanks in advance for reading such long post.

Regards from Spain.

Ajedrecista.
Pablo Vazquez
Posts: 154
Joined: Thu May 31, 2007 9:05 pm
Location: Madrid, Spain

Re: Fastest perft

Post by Pablo Vazquez »

Hi Jesús,

IIRC, Mingw has problems with the lld format: it thinks the corresponding argument is going to be 32 bit long. So you have to use I64d instead, just replace the 2 occurrences in the code (lines 1393 and 1481).

Regarding the size, you have to strip the executable with the -s option. Regarding speed, try compiling with different levels of optimizations: -O1 or -O2 or -O3, and see which one is the fastest.

Regards
User avatar
hgm
Posts: 28353
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Fastest perft

Post by hgm »

Indeed, the problem is purely in the printing. When linking with the MSVC library (as MinGW does), the print routine does not understand %lld, and would just print an int (which happens to be the low 32 bits of the long long int on a PC), and mess up everything after that (the high 32 bits are printed like they were the time). As said by Pablo, you would have to replace it by %I64d or %I64u. I have not made the code portable enough that it automatically builds correctly under both Linux and Windows.

To compile for Windows I used gcc+Cygwin (which should be equivalent to MinGW), as:

gcc -O2 -s -mno-cygwin -fomit-frame-pointer qperft.c -o qperft.exe

The -fomit-frame-pointer probably gives you the extra speedup.
User avatar
Ajedrecista
Posts: 2097
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: Fastest perft

Post by Ajedrecista »

Hello again:
Pablo Vazquez wrote:Hi Jesús,

IIRC, Mingw has problems with the lld format: it thinks the corresponding argument is going to be 32 bit long. So you have to use I64d instead, just replace the 2 occurrences in the code (lines 1393 and 1481).

Regarding the size, you have to strip the executable with the -s option. Regarding speed, try compiling with different levels of optimizations: -O1 or -O2 or -O3, and see which one is the fastest.

Regards
It works! Thanks for the quick answer. I only change these two lines of code and it runs perfect. I suspected that could be the length of the argument, but since I have no clue about programming, I could not replace it properly. Muchas gracias Pablo. Finally I compiled in the following way:

Code: Select all

gcc -o qperft -O3 -s qperft.c
The size optimization (-s) is more than noticeable: using -O1 and -O2, qperft.exe size was 27 or 28 KB, while using -O3, qperft.exe size is 34 KB (-O3 is slightly faster in my computer). It is now faster than the official version in my computer (official version = perft.exe; own compiled version = qperft.exe):

Code: Select all

Perft(8) of initial position without splitting, using 256 MB of hash.
---------------------------------------------------------------------
perft.exe

Perft(1) = 20 (0.000 sec)
Perft(2) = 400 (0.000 sec)
Perft(3) = 8902 (0.000 sec)
Perft(4) = 197281 (0.000 sec)
Perft(5) = 4865609 (0.063 sec)
Perft(6) = 119060324 (0.875 sec)
Perft(7) = 3195901860 (10.734 sec)
Perft(8) = 84998978956 (147.000 sec)
----------------------------------------------------------------------
qperft.exe

Perft(1) = 20 (0.000 sec)
Perft(2) = 400 (0.000 sec)
Perft(3) = 8902 (0.000 sec)
Perft(4) = 197281 (0.015 sec)
Perft(5) = 4865609 (0.047 sec)
Perft(6) = 119060324 (0.734 sec)
Perft(7) = 3195901860 (9.157 sec)
Perft(8) = 84998978956 (130.609 sec)
For comparison (in my computer), JetChess takes (more less): 48 ms for Perft(5), 710 ms for Perft(6), 8.5 seconds for Perft(7) and 2 minutes for Perft(8) using the same hash. qperft (also perft) can compete now with JetChess (in my computer)! Great news. Maybe 9.157 seconds was a bit lucky (it usually takes 9.9 seconds), but anyway, when running Perft(7), qperft is consistently faster than perft in around 1 or 1.5 seconds.

Also thanks for Muller for explaning it in easy words (I finally took I64d).

Regards from Spain.

Ajedrecista.