gperft

Macintosh
Posts: 13
Joined: Wed Jun 26, 2013 8:23 pm
Location: Jena, Germany

Re: gperft

Post by Macintosh »

Hi Paul,

In case you need some additional timings, here are mine for an i7-3770S @ 3.1 GHz (nominally, because Intel's Turbo Boost changes the clock rate depending on power consumption, temperature and the number of active CPU cores):

Code:

All for the starting position, running on Windows.
gperft 1 thread, no hash:
depth 7: 9.077 seconds
depth 8: 251.750 seconds

gperft 4 threads (hyper-threading enabled; 4 real cores), 8GB hash:
depth 7: 0.592 seconds
depth 8: 6.793 seconds
depth 9: 74.979 seconds

gperft 8 threads (hyper-threading enabled, 4 real cores, 4 hyper-threads), 8GB hash:
depth 7: 0.407 seconds
depth 8: 4.467 seconds
depth 9: 47.527 seconds
I think there are already some threads about hyper-threading on this forum, so I will spare you the details of running 1 through 8 threads. However, if you would like those numbers, I would gladly run them for you! ;-)
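To put the hyper-threading effect in numbers, here is a small sketch that divides the 4-thread times by the 8-thread times for each depth. The timings are copied from the listing above; the variable and function names are made up for this illustration.

```python
# Timings in seconds, copied from the post (starting position, Windows, 8GB hash).
hashed_4_threads = {7: 0.592, 8: 6.793, 9: 74.979}
hashed_8_threads = {7: 0.407, 8: 4.467, 9: 47.527}


def speedup(baseline, improved):
    """Return baseline/improved for every depth present in both runs."""
    return {d: baseline[d] / improved[d] for d in sorted(baseline) if d in improved}


# How much faster are 8 threads (4 cores + 4 hyper-threads) than 4 threads?
ht_gain = speedup(hashed_4_threads, hashed_8_threads)
for depth, factor in ht_gain.items():
    print(f"depth {depth}: 8 threads are {factor:.2f}x faster than 4 threads")
```

On these numbers the gain from the four extra hyper-threads is roughly 1.45x to 1.58x, growing slightly with depth.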

Greetings from Jena


Marcus
Macintosh
Posts: 13
Joined: Wed Jun 26, 2013 8:23 pm
Location: Jena, Germany

Re: gperft

Post by Macintosh »

Hi all,

I implemented a dynamic split depth to cope with "unusual" positions such as "kings only".
Doing so gave a significant speed improvement for those positions when XORing the depth into the hash key. I do not think I have overlooked anything, as "normal" (starting and middlegame) positions do NOT suffer a penalty from it (within the usual uncertainty of this kind of speed measurement).
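For readers who have not seen the depth trick, here is a minimal sketch of one plausible reading of it. It is not gperft's or my actual code. A perft count depends on the position and on the remaining depth, so the depth is folded into the hash key with an XOR before probing the table. A toy subtraction game stands in for chess, and all names (`POSITION_KEYS`, `DEPTH_KEYS`, `perft_hashed`, ...) are hypothetical.

```python
import random

rng = random.Random(2013)  # fixed seed so the sketch is reproducible
MAX_N, MAX_DEPTH = 64, 16
# Stand-in for Zobrist keys: one random 64-bit key per toy position.
POSITION_KEYS = [rng.getrandbits(64) for _ in range(MAX_N + 1)]
# One random 64-bit key per remaining depth, XORed into the position key.
DEPTH_KEYS = [rng.getrandbits(64) for _ in range(MAX_DEPTH + 1)]

TABLE_SIZE = 1 << 12
table = [None] * TABLE_SIZE  # each slot holds (full_key, node_count)


def children(n):
    """Toy move generator: subtract 1, 2 or 3, never going below 0."""
    return [n - step for step in (1, 2, 3) if n - step >= 0]


def perft_plain(n, depth):
    """Reference perft without any hashing."""
    if depth == 0:
        return 1
    return sum(perft_plain(c, depth - 1) for c in children(n))


def perft_hashed(n, depth):
    """Perft with a table keyed on position XOR depth."""
    if depth == 0:
        return 1
    key = POSITION_KEYS[n] ^ DEPTH_KEYS[depth]  # depth folded into the key
    slot = key % TABLE_SIZE
    entry = table[slot]
    if entry is not None and entry[0] == key:
        return entry[1]  # same position at the same remaining depth
    count = sum(perft_hashed(c, depth - 1) for c in children(n))
    table[slot] = (key, count)  # simple always-replace scheme
    return count


print(perft_hashed(20, 10) == perft_plain(20, 10))
```

Because the depth is part of the key, the same position searched to different depths lands in different slots instead of overwriting itself, and no separate depth field is needed in the entry.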

Thanks again, Paul, for sharing this trick!

Greetings from Jena


Marcus
ibid
Posts: 89
Joined: Mon Jun 13, 2011 12:09 pm

Re: gperft

Post by ibid »

Macintosh wrote: Hi Paul,

In case you need some additional timings, here are mine for an i7-3770S @ 3.1 GHz (nominally, because Intel's Turbo Boost changes the clock rate depending on power consumption, temperature and the number of active CPU cores):

Code:

All for the starting position, running on Windows.
gperft 1 thread, no hash:
depth 7: 9.077 seconds
depth 8: 251.750 seconds

gperft 4 threads (hyper-threading enabled; 4 real cores), 8GB hash:
depth 7: 0.592 seconds
depth 8: 6.793 seconds
depth 9: 74.979 seconds

gperft 8 threads (hyper-threading enabled, 4 real cores, 4 hyper-threads), 8GB hash:
depth 7: 0.407 seconds
depth 8: 4.467 seconds
depth 9: 47.527 seconds
I think there are already some threads about hyper-threading on this forum, so I will spare you the details of running 1 through 8 threads. However, if you would like those numbers, I would gladly run them for you! ;-)

Thank you! You got the interesting ones -- 1, 4 and 8 threads. I am rather amazed how much hyper-threading speeds gperft up.

Your times are also very close to what I get under Windows on my machine (Phenom 1090T @ 3.6 GHz). For the single-threaded run I get 9.799 and 260.329 seconds -- a bit slower, but I imagine your turbo is kicking in with only a single thread going. With 4 threads I get 0.552, 6.709 and 75.300 seconds, which are all very close (I used only 6 GB, but that would make very little difference on such short runs). The result using all 6 cores is pretty close to your hyper-threading result too: 0.391, 4.524 and 50.885 seconds. That gives me a pretty good idea where my CPU ranks among the Intel CPUs (at least as far as gperft is concerned).

-paul
ibid
Posts: 89
Joined: Mon Jun 13, 2011 12:09 pm

gperft 1.0.1

Post by ibid »

A little bit faster (4% for Windows, 6-9% for Linux). Also fixed a bug which effectively limited hash tables to only 128 GB. :)

https://www.dropbox.com/s/bxoogxwkbncxd68/gperft.zip

-paul
ibid
Posts: 89
Joined: Mon Jun 13, 2011 12:09 pm

gperft 1.0.2

Post by ibid »

New version. A few small changes, mostly aimed at making things more cache-friendly, good for perhaps 1%. A bigger speedup comes from a change of compiler: the Linux build is about 7-8% faster with gcc 4.8.1 than with my old 4.4.5 (mostly because I added -flto and PGO), and the Windows build gains a heftier 10-14% with MinGW (4.8.1 again) compared to Visual Studio 2010 (Release build + PGO).

Despite the fact that both are now built with basically the same compiler, the Linux build is still 7-10% faster. Is this typical for a MinGW build? It seems odd to me, since the basic perft is, except for a handful of printf's, pure computation. And yet it is 7% slower -- it seems it should be very similar on Linux and Windows. The multithreaded search adds spinlocks and such, so I can see why there might be some difference in that case.

Anyhow, I think I have removed any dependencies on oddball DLLs from the Windows build, and the Linux one works with my somewhat elderly Debian setup -- but please let me know if there are issues...

https://www.dropbox.com/s/bxoogxwkbncxd68/gperft.zip

-paul
abulmo
Posts: 151
Joined: Thu Nov 12, 2009 6:31 pm

Re: gperft 1.0.2

Post by abulmo »

ibid wrote: Despite the fact that both are now built with basically the same compiler, the Linux build is still 7-10% faster. Is this typical for a MinGW build? It seems odd to me, since the basic perft is, except for a handful of printf's, pure computation. And yet it is 7% slower -- it seems it should be very similar on Linux and Windows. The multithreaded search adds spinlocks and such, so I can see why there might be some difference in that case.

Windows and Linux use different ABIs, in particular regarding whether function parameters are passed in registers or on the stack. I wonder if this is the cause of the slowdown observed under Windows.
Richard
Macintosh
Posts: 13
Joined: Wed Jun 26, 2013 8:23 pm
Location: Jena, Germany

Re: gperft 1.0.2

Post by Macintosh »

Hi Paul,

I wanted to perform some speed tests with turbo-boost on/off and with hyper-threading in use or not. I took some measurements with my own program, also with multiple (hyper-)threads but no hash.

Unfortunately, I could not get gperft to run several threads without hash tables. I tried "-memory 0" and "-hash 0 0 3" (for example), but gperft still comes up with a 1 MB upper hash.

Code:

gperft.exe -hash 0 0 1 -threads 2 "" 8
gperft 1.0.2 (windows)
Low hash table not in use.
High hash table ready (1 MB, 2-5 ply).
Using 2 threads (split after 3 ply).
Depth is 8.
...
Am I missing something, or does gperft use at least some hash as soon as the multi-threaded version is used?

Regards


Marcus
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: gperft 1.0.2

Post by Sven »

Macintosh wrote: Am I missing something, or does gperft use at least some hash as soon as the multi-threaded version is used?
According to the README.txt gperft has two modes: single-threaded with no hash table and multi-threaded with hash tables. I tested the Windows version and found the same behaviour that you describe: with the -threads option there is at least a minimum of 1 MB "upper hash" in use even with -hash 0 0 1.

Sven
ibid
Posts: 89
Joined: Mon Jun 13, 2011 12:09 pm

Re: gperft 1.0.2

Post by ibid »

Macintosh wrote: Am I missing something, or does gperft use at least some hash as soon as the multi-threaded version is used?
Hello Marcus,

No, you aren't missing anything. gperft was originally intended for reasonably fast deep perfts (so multi-threaded and with hash tables). The no-hash-table search was added as something of an afterthought so people could do comparisons and such. I didn't see a reason to support threads there, so I never added it.

Apparently people do have a use for it, and it is trivial to add as an option to the existing threaded code, so I shall do so. I'll have time later tonight and will make new binaries -- I'll post something here when it is ready.

You'll just use the -threads option (without any -memory or -hash) to access it. You can also use -split to vary the depth at which it assigns work to threads.
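To make the idea of a split depth concrete, here is a rough sketch of how work could be handed to threads after a fixed number of plies. It only illustrates the general technique and makes no claim about how gperft does it internally. A toy subtraction game stands in for chess, and `frontier`, `perft_split` and the `threads`/`split` parameters are hypothetical names that merely mirror the command-line options.

```python
from concurrent.futures import ThreadPoolExecutor


def children(n):
    """Toy move generator: subtract 1, 2 or 3, never going below 0."""
    return [n - step for step in (1, 2, 3) if n - step >= 0]


def perft(n, depth):
    """Plain single-threaded perft: count the leaves at exactly `depth` plies."""
    if depth == 0:
        return 1
    return sum(perft(c, depth - 1) for c in children(n))


def frontier(n, plies):
    """All positions reached after exactly `plies` moves (duplicates kept)."""
    positions = [n]
    for _ in range(plies):
        positions = [c for p in positions for c in children(p)]
    return positions


def perft_split(n, depth, threads=4, split=3):
    """Expand the first `split` plies serially, then farm out the subtrees."""
    split = min(split, depth)  # never split deeper than the search itself
    work = frontier(n, split)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(pool.map(lambda p: perft(p, depth - split), work))


print(perft_split(20, 9, threads=4, split=3) == perft(20, 9))
```

A deeper split produces more and smaller work items, which balances the load better when subtrees differ a lot in size. Note that CPython threads will not actually speed up this pure-Python loop because of the global interpreter lock; the sketch only shows how the work is divided.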

-paul
ibid
Posts: 89
Joined: Mon Jun 13, 2011 12:09 pm

gperft 1.0.3

Post by ibid »

Updated version: you can now do multi-threaded perfts without hash tables.
For example:

Code:

$ gperft -threads 4 "8/2p5/3p4/KP5r/1R3p1k/8/4P1P1/8 w - -" 9
gperft 1.0.3 (windows)
Using simple perft function.
Using 4 threads (split after 3 ply).
Depth is 9.

--------
--p-----
---p----
KP-----r  w - -
-R---p-k
--------
----P-P-
--------

Rxf4+      885,894,814
Rb1      5,688,812,201
Rb2      3,576,462,883
Rb3      4,258,569,600
Ra4      3,399,463,489
Rc4      4,966,212,754
Rd4      4,521,736,449
Re4      3,920,504,030
e3       3,087,382,842
g3+      1,220,138,764
e4       2,321,736,882
g4       3,628,220,761
Ka4      4,031,452,737
Ka6      4,580,161,609
TOTAL   50,086,749,815
48.906 seconds
The link is the same:
https://www.dropbox.com/s/bxoogxwkbncxd68/gperft.zip
Please let me know if there are any issues...
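As a quick sanity check on output like the above, the per-move counts can be summed and compared against the TOTAL line, and the total divided by the elapsed time gives a throughput figure. The sketch below does just that with the numbers copied from the listing; the parsing helper `parse_divide` is a made-up name and assumes one "move count" pair per line.

```python
# Per-move counts copied from the depth 9 run in the post.
divide_output = """\
Rxf4+      885,894,814
Rb1      5,688,812,201
Rb2      3,576,462,883
Rb3      4,258,569,600
Ra4      3,399,463,489
Rc4      4,966,212,754
Rd4      4,521,736,449
Re4      3,920,504,030
e3       3,087,382,842
g3+      1,220,138,764
e4       2,321,736,882
g4       3,628,220,761
Ka4      4,031,452,737
Ka6      4,580,161,609
"""
reported_total = 50_086_749_815  # the TOTAL line
elapsed_seconds = 48.906         # the timing line


def parse_divide(text):
    """Map each root move to its leaf count, dropping thousands separators."""
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        move, count = line.split()
        counts[move] = int(count.replace(",", ""))
    return counts


counts = parse_divide(divide_output)
total = sum(counts.values())
print("root moves:", len(counts))
print("sum matches TOTAL:", total == reported_total)
print(f"leaf nodes per second: {total / elapsed_seconds:,.0f}")
```

The 14 root-move counts do add up to the reported total, which works out to a little over one billion leaf nodes per second on 4 threads without hash tables.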

-paul