Stockfish-1.7 use of multiple CPUs

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

royb
Posts: 536
Joined: Thu Mar 09, 2006 12:53 am

Stockfish-1.7 use of multiple CPUs

Post by royb »

First off, a huge thank you to the Stockfish development team!! The progress made by the team in terms of strength improvement is fantastic!

I'm not suggesting the comments below point to a bug; rather, this is something that makes me wonder what is really going on ...

When I use 1.7 or 1.7.1 and configure the engine to use 2 cores (max that I have on my laptop), I notice that CPU utilization is not consistent as the engine analyzes away in the beginning. At first, only 1 CPU is utilized 100% and slowly the second CPU is utilized for a bit and then again goes idle. Overall, I see 200% CPU utilization for perhaps half of the time the engine analyzes if I let it run for 60 seconds or so. After 60 seconds, the utilization stays pretty constant near 200%.

Might this be due to the way splitting is done? I'm using the defaults for all parameters except explicitly setting the number of threads to 2, and Hash set to 1024 MB (I have 8 GB of RAM). I'm running Ubuntu 9.10 64-bit and the engine runs under SCID 4.2.2.
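For reference, that configuration corresponds to a UCI exchange along these lines. The exact traffic SCID sends is my assumption, but "Threads" and "Hash" are the standard UCI option names Stockfish exposes (engine responses elided):

```
uci
setoption name Threads value 2
setoption name Hash value 1024
isready
ucinewgame
position startpos
go infinite
```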

Stockfish-1.6 seemed to do this also, but CPU#2 was busier in the early stages of analysis (just from memory -- an overall "feel").

Comments from the developers as to what causes this variation in CPU utilization are greatly appreciated.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Stockfish-1.7 use of multiple CPUs

Post by mcostalba »

royb wrote: Comments from the developers as to what causes this variation in CPU utilization are greatly appreciated.
Hi Roy,

thanks for reporting. Unfortunately I am not able to reproduce this behaviour. I also have a dual-core machine (Intel Core 2 Duo T520), and I see 100% (or near 100%) utilization on both cores, in both Windows Vista and Linux Mandriva.

So I am sorry but I don't think I can be helpful to you on this point :-(
royb
Posts: 536
Joined: Thu Mar 09, 2006 12:53 am

Re: Stockfish-1.7 use of multiple CPUs

Post by royb »

Another odd thing is that with 2 cores in use, the kn/sec figure is actually lower (roughly 1300 knps with one core versus about 850 knps with 2 cores). At least that's true for 45 seconds of calculation on this position:

2r3k1/5ppp/p1B1b1n1/Np4q1/1P1Q2P1/P2R3P/5P2/6K1 b - - 0 1

(taken from another post here at CCC).

Even after 90 seconds, knps for the 2 core analysis is only 1600 knps while the single core analysis shows 1400 knps (all with 1 GB of hash on my 8 GB system).
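Those figures can be turned into a scaling ratio directly. A throwaway Python sketch, using only the numbers quoted in this post:

```python
# knps figures quoted above (90 s of analysis, 1 GB hash on this machine).
one_core_knps = 1400.0   # 1 thread
two_core_knps = 1600.0   # 2 threads

nps_scaling = two_core_knps / one_core_knps
per_core_efficiency = nps_scaling / 2

print(f"2-thread NPS scaling: {nps_scaling:.2f}x")          # 1.14x
print(f"per-core efficiency:  {per_core_efficiency:.0%}")   # 57%
```

Of course, raw NPS scaling understates an SMP search's value somewhat, since the two runs search different trees; it is still far below the near-2x one would hope for.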
LarsA

Re: Stockfish-1.7 use of multiple CPUs

Post by LarsA »

royb wrote:When I use 1.7 or 1.7.1 and configure the engine to use 2 cores (max that I have on my laptop), I notice that CPU utilization is not consistent as the engine analyzes away in the beginning. At first, only 1 CPU is utilized 100% and slowly the second CPU is utilized for a bit and then again goes idle. [snip]
Just downloaded Stockfish 1.7.1 and I'm extremely impressed with the engine!
I'm running the Linux 64-bit build on an Ubuntu 9.10 64-bit distro (core2duo cpu).
Default engine settings with 2 Threads and 512 MB Hash.

Upon starting the engine, I immediately noticed the symptoms described by Roy Brunjes. SF somehow starts off utilizing less than 50% of the cores, but gets going after a while (~1 minute).

I tried compiling it myself with the default Makefile with similar results.

Could this be a problem related to the Ubuntu distro? Sounds a bit strange to me, but who knows...


Don't know if it helps, but here's some analysis from Anand-Topalov game 4:

[d] 3r1rk1/q4pp1/n1bNp2p/p7/pn2P1N1/6P1/1P1Q1PBP/2RR2K1 w - - 1 2


2 Threads:
info depth 1 score cp 92 time 491 nodes 2115 nps 4307 pv g4e5
info depth 2 score cp 56 time 492 nodes 2778 nps 5646 pv d2c3
info depth 3 score cp 40 time 493 nodes 3334 nps 6762 pv d2c3
info depth 4 score cp 101 time 498 nodes 9423 nps 18921 pv g4e5
info depth 5 score cp 92 time 500 nodes 11343 nps 22686 pv g4e5
info depth 6 score cp 88 time 505 nodes 18832 nps 37291 pv g4e5
info depth 7 score cp 88 time 628 nodes 30771 nps 48998 pv g4e5
info depth 8 score cp 88 time 632 nodes 35895 nps 56795 pv g4e5
info depth 9 score cp 32 time 706 nodes 146668 nps 207745 pv g4e5
info depth 10 score cp 56 time 1296 nodes 281838 nps 217467 pv g4e5
info depth 11 score cp 40 time 1479 nodes 339201 nps 229344 pv g4e5
info depth 12 score cp 40 time 2050 nodes 502029 nps 244892 pv g4e5
info depth 13 score cp 56 time 2152 nodes 648008 nps 301118 pv g4e5
info depth 14 score cp 80 time 3687 nodes 946030 nps 256585 pv g4e5
info depth 15 score cp 40 time 4523 nodes 2056524 nps 454681 pv g4e5
info depth 16 score cp 44 time 5567 nodes 2785200 nps 500305 pv g4e5
info depth 17 score cp 56 time 11816 nodes 5783377 nps 489453 pv g4e5
info depth 18 score cp 36 time 28074 nodes 24147823 nps 860148 pv g4e5
info depth 19 score cp 153 time 58330 nodes 76233532 nps 1306935 pv g4h6
info depth 20 score cp 165 time 66213 nodes 88369861 nps 1334630 pv g4h6
info depth 21 score cp 165 time 78182 nodes 105314931 nps 1347048 pv g4h6

1 Thread:
info depth 1 score cp 92 time 497 nodes 2115 nps 4255 pv g4e5
info depth 2 score cp 56 time 497 nodes 2778 nps 5589 pv d2c3
info depth 3 score cp 40 time 498 nodes 3334 nps 6694 pv d2c3
info depth 4 score cp 101 time 504 nodes 9423 nps 18696 pv g4e5
info depth 5 score cp 92 time 506 nodes 11343 nps 22416 pv g4e5
info depth 6 score cp 88 time 513 nodes 17795 nps 34688 pv g4e5
info depth 7 score cp 88 time 519 nodes 23911 nps 46071 pv g4e5
info depth 8 score cp 56 time 572 nodes 74308 nps 129909 pv g4e5
info depth 9 score cp 76 time 650 nodes 148630 nps 228661 pv g4e5
info depth 10 score cp 72 time 743 nodes 237669 nps 319877 pv g4e5
info depth 11 score cp 64 time 1000 nodes 490874 nps 490874 pv g4e5
info depth 12 score cp 72 time 1151 nodes 645875 nps 561142 pv g4e5
info depth 13 score cp 76 time 1390 nodes 880218 nps 633250 pv g4e5
info depth 14 score cp 52 time 1996 nodes 1493028 nps 748010 pv g4e5
info depth 15 score cp 44 time 5761 nodes 5255262 nps 912213 pv g4e5
info depth 16 score cp 24 time 19559 nodes 19062351 nps 974607 pv g4e5
info depth 17 score cp 129 time 42931 nodes 42520796 nps 990445 pv g4h6
info depth 18 score cp 165 time 115198 nodes 116311420 nps 1009665 pv g4h6
info depth 19 score cp 157 time 166092 nodes 167390309 nps 1007816 pv g4h6
info depth 20 score cp 169 time 226736 nodes 227279096 nps 1002395 pv g4h6
info depth 21 score cp 177 time 258223 nodes 259183323 nps 1003718 pv g4h6
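The two logs above can be compared on time-to-depth rather than NPS. A small Python sketch (the two sample lines are copied verbatim from the depth-18 entries above; note the runs search different trees and reach different scores, so this is only indicative):

```python
import re

def parse_info(line):
    # Pull the numeric fields out of a UCI "info" line into a dict.
    return {k: int(v) for k, v in
            re.findall(r"\b(depth|time|nodes|nps)\s+(\d+)", line)}

# Depth-18 lines copied from the two logs above (times are in ms).
two_threads = "info depth 18 score cp 36 time 28074 nodes 24147823 nps 860148 pv g4e5"
one_thread  = "info depth 18 score cp 165 time 115198 nodes 116311420 nps 1009665 pv g4h6"

t2, t1 = parse_info(two_threads), parse_info(one_thread)
print("time-to-depth speedup: %.2f" % (t1["time"] / t2["time"]))  # 4.10
print("NPS ratio (2T/1T):     %.2f" % (t2["nps"] / t1["nps"]))    # 0.85
```

The striking part is that the 2-thread run reaches depth 18 over 4x faster despite running at lower NPS, which suggests the two searches diverged rather than that threading itself helped that much.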
UncombedCoconut
Posts: 319
Joined: Fri Dec 18, 2009 11:40 am
Location: Naperville, IL

Re: Stockfish-1.7 use of multiple CPUs

Post by UncombedCoconut »

LarsA wrote:Upon starting the engine, I immediately noticed the symptoms described by Roy Brunjes. SF somehow starts off utilizing less than 50% of the cores, but gets going after a while (~1 minute).

I tried compiling it myself with the default Makefile with similar results.

Could this be a problem related to the Ubuntu distro? Sounds a bit strange to me, but who knows...
I'm not seeing this effect under Arch Linux. Both cores are always busy, and the 2-thread/1-thread NPS scaling never falls that far below 100%.
I wonder whether CPU frequency scaling could be involved. I bring it up because it just broke on my laptop, and the relative NPS between our runs makes it look like you're using the "ondemand" governor.
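On Linux the active governor can be read from sysfs. A minimal Python sketch (this is the standard cpufreq sysfs path; it may be absent on kernels or VMs without cpufreq support, hence the fallback):

```python
from pathlib import Path

def cpu_governor(cpu: int = 0) -> str:
    # Standard cpufreq sysfs location for the active scaling governor.
    p = Path(f"/sys/devices/system/cpu/cpu{cpu}/cpufreq/scaling_governor")
    return p.read_text().strip() if p.exists() else "unavailable"

print(cpu_governor())  # e.g. "ondemand" or "performance"
```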
My results, formatted like yours:

2 Threads:
info depth 1 score cp 92 time 116 nodes 2115 nps 18232 pv g4e5
info depth 2 score cp 56 time 119 nodes 2778 nps 23344 pv d2c3
info depth 3 score cp 40 time 121 nodes 3334 nps 27553 pv d2c3
info depth 4 score cp 101 time 142 nodes 9423 nps 66359 pv g4e5
info depth 5 score cp 92 time 148 nodes 11343 nps 76641 pv g4e5
info depth 6 score cp 88 time 165 nodes 18826 nps 114096 pv g4e5
info depth 7 score cp 88 time 194 nodes 30699 nps 158242 pv g4e5
info depth 8 score cp 88 time 207 nodes 35838 nps 173130 pv g4e5
info depth 9 score cp 88 time 289 nodes 76385 nps 264307 pv g4e5
info depth 10 score cp 60 time 1132 nodes 275069 nps 242993 pv g4e5
info depth 11 score cp 44 time 1865 nodes 661106 nps 354480 pv g4e5
info depth 12 score cp 48 time 1958 nodes 691265 nps 353046 pv g4e5
info depth 13 score cp 48 time 2527 nodes 969986 nps 383848 pv g4e5
info depth 14 score cp 48 time 3043 nodes 1238046 nps 406850 pv g4e5
info depth 15 score cp 56 time 5504 nodes 2600461 nps 472467 pv g4e5
info depth 16 score cp 56 time 12031 nodes 5521645 nps 458951 pv g4e5
info depth 17 score cp 48 time 17589 nodes 8308540 nps 472371 pv g4e5
info depth 18 score cp 52 time 20089 nodes 9252022 nps 460551 pv g4e5
info depth 19 score cp 157 time 289487 nodes 140514689 nps 485392 pv g4h6
info depth 20 score cp 173 time 361620 nodes 178148238 nps 492639 pv g4h6
info depth 21 score cp 173 time 440999 nodes 220985947 nps 501103 pv g4h6

1 Thread:
info depth 1 score cp 92 time 115 nodes 2115 nps 18391 pv g4e5
info depth 2 score cp 56 time 118 nodes 2778 nps 23542 pv d2c3
info depth 3 score cp 40 time 120 nodes 3334 nps 27783 pv d2c3
info depth 4 score cp 101 time 140 nodes 9423 nps 67307 pv g4e5
info depth 5 score cp 92 time 147 nodes 11343 nps 77163 pv g4e5
info depth 6 score cp 88 time 170 nodes 17795 nps 104676 pv g4e5
info depth 7 score cp 88 time 194 nodes 23911 nps 123252 pv g4e5
info depth 8 score cp 56 time 373 nodes 74308 nps 199217 pv g4e5
info depth 9 score cp 76 time 645 nodes 148630 nps 230434 pv g4e5
info depth 10 score cp 72 time 967 nodes 237669 nps 245779 pv g4e5
info depth 11 score cp 64 time 1873 nodes 490874 nps 262079 pv g4e5
info depth 12 score cp 72 time 2408 nodes 645883 nps 268223 pv g4e5
info depth 13 score cp 76 time 3242 nodes 880313 nps 271533 pv g4e5
info depth 14 score cp 52 time 5392 nodes 1493222 nps 276932 pv g4e5
info depth 15 score cp 48 time 9600 nodes 2653534 nps 276409 pv g4e5
info depth 16 score cp 48 time 21987 nodes 6086610 nps 276827 pv g4e5
info depth 17 score cp 165 time 130787 nodes 37189139 nps 284348 pv g4h6
info depth 18 score cp 165 time 161618 nodes 45914755 nps 284094 pv g4h6
info depth 19 score cp 185 time 235685 nodes 67109564 nps 284742 pv g4h6
info depth 20 score cp 185 time 317653 nodes 91195409 nps 287091 pv g4h6
info depth 21 score cp 202 time 705294 nodes 201857341 nps 286203 pv g4h6
royb
Posts: 536
Joined: Thu Mar 09, 2006 12:53 am

Re: Stockfish-1.7 use of multiple CPUs

Post by royb »

UncombedCoconut wrote:
LarsA wrote:Upon starting the engine, I immediately noticed the symptoms described by Roy Brunjes. SF somehow starts off utilizing less than 50% of the cores, but gets going after a while (~1 minute).

I tried compiling it myself with the default Makefile with similar results.

Could this be a problem related to the Ubuntu distro? Sounds a bit strange to me, but who knows...
I'm not seeing this effect under Arch Linux. Both cores are always busy, and 2T/1T scaling never goes so far under 100%.
I wonder whether CPU frequency scaling could be involved. I bring it up because it just broke on my laptop and our relative NPS makes it look like you're using the "ondemand" governor.
My results, formatted like yours:

[quoted analysis output snipped]
Actually, I always run with the "ondemand" governor active. However, even when I explicitly set the CPU frequency governor to "performance", the performance numbers reported by Stockfish are unchanged.

Roy
LarsA

Re: Stockfish-1.7 use of multiple CPUs

Post by LarsA »

UncombedCoconut wrote:
LarsA wrote:Upon starting the engine, I immediately noticed the symptoms described by Roy Brunjes. SF somehow starts off utilizing less than 50% of the cores, but gets going after a while (~1 minute).

I tried compiling it myself with the default Makefile with similar results.

Could this be a problem related to the Ubuntu distro? Sounds a bit strange to me, but who knows...
I'm not seeing this effect under Arch Linux. Both cores are always busy, and 2T/1T scaling never goes so far under 100%.
I wonder whether CPU frequency scaling could be involved. I bring it up because it just broke on my laptop and our relative NPS makes it look like you're using the "ondemand" governor.
My results, formatted like yours:

[quoted analysis output snipped]
Thank you for the suggestions!
I just tried changing the frequency scaling governor to both "performance" and "always max", but sadly it didn't affect Stockfish's performance.

It's been a few years since I worked with gprof (or C++ for that matter), but I think I'll have to do some profiling to see what's really going on here. Something pthread-related is my main theory for now. Of course, profiling could turn up nothing :-)
LarsA

Re: Stockfish-1.7 use of multiple CPUs

Post by LarsA »

Hi!

After upgrading to Ubuntu 10.04 the problems disappeared.
royb
Posts: 536
Joined: Thu Mar 09, 2006 12:53 am

Re: Stockfish-1.7 use of multiple CPUs

Post by royb »

LarsA wrote:Hi!

After upgrading to Ubuntu 10.04 the problems disappeared.
Lars,

Same for me. I upgraded one of my systems to 10.04 and the problem disappeared. Apparently there was something broken in 9.10 and I'm guessing it had to do with cpufreq ...

Roy