Fastest perft

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

abulmo
Posts: 151
Joined: Thu Nov 12, 2009 6:31 pm

Re: Fastest perft

Post by abulmo »

hgm wrote: I have posted my latest version, which does not have the flag=0;, and uses separate routines (leaf_perft and move_count in stead of perft and move_gen) for the final ply (making it 25% faster). It might not optimally use the hash table for low-depth perfts, as it uses a sectioning of it optimized to calculate perft(13).

http://hgm.nubati.net/q2perft.c

I am currently on Linux; when I get back to Windows I will make a (32-bit) Windows binary. I always use optimizationlevel -O2, btw (gcc -O2 -s q2perft.c -o qperft.exe)
On my 64-bit linux, this version crashes immediately. I guess line 1435 should be :
Hash = (union _bucket *) (((unsigned long)Hash) + 63 & ~63);
to work correctly on OSes with 64 bit long pointers.

Once corrected, this version is indead faster than JetChess, and does perft(8) in 38s (compiled with icc -O2 -xHost q2perft.c -o q2perft) on my machine.
Richard
tvrzsky
Posts: 128
Joined: Sat Sep 23, 2006 7:10 pm
Location: Prague

Re: Fastest perft

Post by tvrzsky »

Hola Pablo!

I am sorry to say that frc-perft is MUCH slower than JetChess in my computer: while Perft(7) of starting position takes less than 9 seconds with JetChess, it took 80.89 seconds (!) with frc-perft. I do not know if I did something wrong. All the counters I have seen are good, but I prefer JetChess right now.

Thanks for explaining how to compile. Surely, I will be unable of doing it correctly (I will try it...), so probably I will have to wait Muller compiles (thank you Muller!). Pablo, you were very accurate discovering the issue of flag=0, congrats!

Regards from Spain.

Ajedrecista.
I think that this ratio is characteristic for running perft with/without hash. At least my engine does perft(7) in 82 sec without and in 15 sec with hash (Windows 32bit, Core2Duo@1.6GHz).
Filip
User avatar
hgm
Posts: 27796
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Fastest perft

Post by hgm »

I did some comparison of the hash-less perfts onmy 1.3GHz Celeron-M, by having them calculate perft(7), with the following results:

Code: Select all

 75.45 sec   qperft (Linux)
 77.95 sec   i-perft
 82.27 sec   qperft (*)
 94.56 sec   qperft (Windows)
110.81 sec   frcperft-win32 (downloaded binary)
180.79 sec   frcperft-win32 (compiled from source using makefile)
The qperft version marked by (*) was compiled with the same setting as was used to compile frcperft in the accompanying makefile:

gcc -O3 -fomit-frame-pointer -fstrict-aliasing -fno-exceptions -s q2perft.c

Still not as good as the Linux version, though.
abulmo
Posts: 151
Joined: Thu Nov 12, 2009 6:31 pm

Re: Fastest perft

Post by abulmo »

Pablo Vazquez wrote:
If you want the fastest perft, frcperft is the one you are looking for: http://members.ziggo.nl/allard.siemelink/spark/. But it doesn't use a hash table.
Without hash, oliperft looks faster than frcperft, for perft 8 :
frcperft (no-hash): 3m36s
oliperft (no-hash): 2m45s

oliperft could be even faster by using special asm (or intrinsic) functions for its getLsb and bitcount functions like frcperft does. By doing so, I get :
oliperft(no-hash, bsf64 + popcount) : 2m07s.
Richard
User avatar
hgm
Posts: 27796
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Fastest perft

Post by hgm »

How does that compare to qperft? Is frcperft-win64 really orders of magnitude faster than frcperft-win32? When I translate your time of 216 sec for perft(8) to perft(7), which should be some 30 times faster, it would be something like 7.5 sec. While I have 110 sec for frcperft-win32. Of course my machine is pretty slow, but not 15 times slower than yours...

What happens when you run frcperft-win32 or qperft without hash?

I really should start working on getting that extra factor 10 on qperft...
tvrzsky
Posts: 128
Joined: Sat Sep 23, 2006 7:10 pm
Location: Prague

Re: Fastest perft

Post by tvrzsky »

hgm wrote: I have posted my latest version, which does not have the flag=0;, and uses separate routines (leaf_perft and move_count in stead of perft and move_gen) for the final ply (making it 25% faster). It might not optimally use the hash table for low-depth perfts, as it uses a sectioning of it optimized to calculate perft(13).

http://hgm.nubati.net/q2perft.c

I am currently on Linux; when I get back to Windows I will make a (32-bit) Windows binary. I always use optimizationlevel -O2, btw (gcc -O2 -s q2perft.c -o qperft.exe)
Unfortunately, my binaries crash on memory acces violation on line 1289 when running with hash at the stage perft(3). Both Visual Studio 2010 and gcc4.5.0 compiles (32bit exes in Windows 7 64bit).
tvrzsky
Posts: 128
Joined: Sat Sep 23, 2006 7:10 pm
Location: Prague

Re: Fastest perft

Post by tvrzsky »

hgm wrote:How does that compare to qperft? Is frcperft-win64 really orders of magnitude faster than frcperft-win32? When I translate your time of 216 sec for perft(8) to perft(7), which should be some 30 times faster, it would be something like 7.5 sec. While I have 110 sec for frcperft-win32. Of course my machine is pretty slow, but not 15 times slower than yours...

What happens when you run frcperft-win32 or qperft without hash?

I really should start working on getting that extra factor 10 on qperft...

Code: Select all

E:\work\chessprog\Engines\frcperft-1.0\bin>frcperft-win32.exe

frcperft 1.0, (C) 2008-2011 AJ Siemelink
single threaded, no hashing, mode=FAST, extract=BSF32, count=LOOP

rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
interactive mode, type 'help'+Enter for help
% perft 7
perft 1             20       0.00s      1.$ mnps 1712.0 ticks/move
perft 2            400       0.00s      1.$ mnps  299.5 ticks/move
perft 3           8902       0.00s      1.$ mnps   49.4 ticks/move
perft 4         197281       0.01s     39.5 mnps   41.7 ticks/move
perft 5        4865609       0.11s     43.4 mnps   38.6 ticks/move
perft 6      119060324       2.74s     43.4 mnps   38.3 ticks/move
perft 7     3195901860      70.46s     45.4 mnps   36.7 ticks/move

Code: Select all

E:\work\chessprog\Engines\frcperft-1.0\bin>frcperft-win64.exe

frcperft 1.0, (C) 2008-2011 AJ Siemelink
single threaded, no hashing, mode=FAST, extract=BUILTIN, count=LOOP

rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
interactive mode, type 'help'+Enter for help
% perft 7
perft 1             20       0.00s      1.$ mnps 3330.5 ticks/move
perft 2            400       0.00s      1.$ mnps  247.8 ticks/move
perft 3           8902       0.00s      1.$ mnps   22.2 ticks/move
perft 4         197281       0.00s      1.$ mnps   16.7 ticks/move
perft 5        4865609       0.04s    108.1 mnps   15.2 ticks/move
perft 6      119060324       1.09s    109.5 mnps   15.2 ticks/move
perft 7     3195901860      27.91s    114.5 mnps   14.5 ticks/move
kongsian
Posts: 46
Joined: Thu Jun 15, 2006 11:21 am

Re: Fastest perft

Post by kongsian »

hgm wrote: Well, it doesn't work for me. (Under Ubuntu 10.04.)

I used make, and I get a binary 'perft' in melee/src/perft. But when I run it, it just starts eating CPU, without printing anything. It does not even print the Usage when I run it without arguments. It just hangs, eating CPU...
Thanks for finding this out. The Debruijn numbers were wrong and have been replaced with working ones. These are used for bit scanning; they are used if the gcc intrinsic function cannot be used. Are you on 32-bit? Or maybe an old gcc version?

Anyway you can try this again. MeleeChess uses bitboard, so it sucks under 32-bit.

Kong Sian
abulmo
Posts: 151
Joined: Thu Nov 12, 2009 6:31 pm

Re: Fastest perft

Post by abulmo »

hgm wrote:How does that compare to qperft?
Quick Perft by H.G. Muller
Perft mode: No hashing, bulk counting in horizon nodes

perft( 1)= 20 ( 0.000 sec)
perft( 2)= 400 ( 0.000 sec)
perft( 3)= 8902 ( 0.000 sec)
perft( 4)= 197281 ( 0.000 sec)
perft( 5)= 4865609 ( 0.020 sec)
perft( 6)= 119060324 ( 0.580 sec)
perft( 7)= 3195901860 (15.420 sec)
perft( 8)= 84998978956 (415.940 sec)

real 7m13.652s
user 7m11.965s
Is frcperft-win64 really orders of magnitude faster than frcperft-win32?
Almost 3 times (x2.75), even 4 times (x3.77) if you add sse4.2 (ie popcount) on the 64 bit versions :

here are some frcperft results, for perft 7 :
single threaded, no hashing, mode=FAST, extract=BSF32, count=LOOP
perft 7 3195901860 29.75s 107.4 mnps 31.7 ticks/move (ie lux32 binary)

single threaded, no hashing, mode=FAST, extract=BSF64, count=LOOP (ie lux64 binary):
perft 7 3195901860 10.81s 295.6 mnps 11.5 ticks/move

single threaded, no hashing, mode=FAST, extract=BSF64, count=POPCNT (recompiled)
perft 7 3195901860 7.88s 405.6 mnps 8.4 ticks/move

oliperft is even more impressive here :
oliperft 7
7 0 668 3195901860

and with bsf64 and popcount in assembly language :
7 0 563 3195901860

ie only 5.63 sec for perft(7) without hash.
When I translate your time of 216 sec for perft(8) to perft(7), which should be some 30 times faster, it would be something like 7.5 sec. While I have 110 sec for frcperft-win32. Of course my machine is pretty slow, but not 15 times slower than yours...
If your CPU does not support 64 bit nor the popcount instruction, it is possible.
I really should start working on getting that extra factor 10 on qperft...
I wonder what performance one could achieve if adding an hash table to oliperft or recompiling JetChess to use 64bit functionality and popcount ?
Richard
Pablo Vazquez
Posts: 154
Joined: Thu May 31, 2007 9:05 pm
Location: Madrid, Spain

Re: Fastest perft

Post by Pablo Vazquez »

abulmo wrote:
Pablo Vazquez wrote:
If you want the fastest perft, frcperft is the one you are looking for: http://members.ziggo.nl/allard.siemelink/spark/. But it doesn't use a hash table.
Without hash, oliperft looks faster than frcperft, for perft 8 :
frcperft (no-hash): 3m36s
oliperft (no-hash): 2m45s

oliperft could be even faster by using special asm (or intrinsic) functions for its getLsb and bitcount functions like frcperft does. By doing so, I get :
oliperft(no-hash, bsf64 + popcount) : 2m07s.
Here are my numbers:

PIV 3.6GHz Ubuntu 64-bit, no hashing:

Qperft

Code: Select all

Quick Perft by H.G. Muller
Perft mode: No hashing, bulk counting in horizon nodes

perft( 1)=           20 ( 0.000 sec)
perft( 2)=          400 ( 0.000 sec)
perft( 3)=         8902 ( 0.000 sec)
perft( 4)=       197281 ( 0.000 sec)
perft( 5)=      4865609 ( 0.070 sec)
perft( 6)=    119060324 ( 1.600 sec)
perft( 7)=   3195901860 (43.220 sec)
frcperft64

Code: Select all

% perft 7
perft 1             20       0.00s      inf mnps 1280.5 ticks/move
perft 2            400       0.00s      inf mnps   86.4 ticks/move
perft 3           8902       0.00s      inf mnps   37.1 ticks/move
perft 4         197281       0.01s     19.7 mnps   34.1 ticks/move
perft 5        4865609       0.04s    121.6 mnps   31.7 ticks/move
perft 6      119060324       1.13s    105.4 mnps   32.4 ticks/move
perft 7     3195901860      28.42s    112.5 mnps   30.2 ticks/move
frcperft32

Code: Select all

% perft 7
perft 1             20       0.00s      inf mnps 1626.5 ticks/move
perft 2            400       0.00s      inf mnps  187.3 ticks/move
perft 3           8902       0.00s      inf mnps   92.9 ticks/move
perft 4         197281       0.01s     19.7 mnps   87.8 ticks/move
perft 5        4865609       0.12s     40.5 mnps   90.0 ticks/move
perft 6      119060324       2.89s     41.2 mnps   82.7 ticks/move
perft 7     3195901860      77.51s     41.2 mnps   82.3 ticks/move
Oliperft is too slow in my computer: with hashing it takes 25 seconds to depth 7, only a bit faster than frcperft without hashing.[/code]