Crafty-23.3 very slow

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Crafty-23.3 very slow

Post by bob »

zullil wrote:
bob wrote: Strange indeed. Is it possible to set up a remote login at some point? I could at least look at things as it runs and perhaps get an idea. Last time I ran on a Nehalem box it ran like the blazes. It might just need some tuning. One thing is for sure: LMR is way more aggressive (just compare the depths on your two 8-thread searches; the one running at 1/2 normal speed went 3 plies deeper). It might be that smpsn needs tuning. If you want to run a few tests, try smpsn=2000, smpsn=4000, and even smpsn=8000. Run a test for 60 secs with 8 threads, and run the same position 4 times. Then change the smpsn value and repeat. You can fine-tune it further, but changing it by the amounts above will usually at least point you to the right area to try...

I'll run a couple of these tests on my 8-core box to see if it is also more sensitive to this than I realize...
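
The test matrix described above (three smpsn values, four 60-second runs each on 8 threads, same position) could be tracked with something like the following Python sketch. The names `test_plan` and `summarize` are made up for illustration and are not part of Crafty; the sketch only enumerates the runs and averages nps figures recorded by hand.

```python
from statistics import mean

SMPSN_VALUES = (2000, 4000, 8000)  # values suggested in the post
RUNS_PER_VALUE = 4                 # same position, repeated
SEARCH_SECONDS = 60
THREADS = 8


def test_plan():
    """List every (smpsn, run number) pair that needs to be run."""
    return [(smpsn, run)
            for smpsn in SMPSN_VALUES
            for run in range(1, RUNS_PER_VALUE + 1)]


def summarize(nps_by_smpsn):
    """Map each smpsn value to (mean nps, max - min) over its recorded runs."""
    return {smpsn: (mean(values), max(values) - min(values))
            for smpsn, values in nps_by_smpsn.items()}


if __name__ == "__main__":
    for smpsn, run in test_plan():
        print(f"smpsn={smpsn} run {run}: "
              f"{THREADS} threads, {SEARCH_SECONDS} s, same position")
```

If the spread within one smpsn value is as large as the difference between values, the parameter is probably not what is causing the slowdown.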
Varying the smpsn parameter had essentially no effect.

I'm coming to the conclusion that this is a gcc-related issue. Have you tried compiling with gcc? I know you use icc.

Here are summary results for 23.2 and 23.3 respectively, each compiled with three versions of gcc.

Code:

gcc version 4.2.1 (Apple Inc. build 5664)
Crafty v23.2 (8 cpus)
              time=1:02  mat=-1  n=820623467  fh=94%  nps=13.2M       

gcc version 4.4.4 (GCC) 
Crafty v23.2 (8 cpus)
              time=1:00  mat=-1  n=1244905755  fh=94%  nps=20.7M      
 
gcc version 4.5.0 (GCC) 
Crafty v23.2 (8 cpus)
              time=1:00  mat=-1  n=611061880  fh=94%  nps=10.1M       



gcc version 4.2.1 (Apple Inc. build 5664)
Crafty v23.3 (8 cpus)
              time=1:01  mat=-1  n=823531770  fh=87%  nps=13.5M       

gcc version 4.4.4 (GCC) 
Crafty v23.3 (8 cpus)
              time=1:00  mat=-1  n=484844694  fh=86%  nps=8.1M       

gcc version 4.5.0 (GCC) 
Crafty v23.3 (8 cpus)
              time=1:02  mat=-1  n=531897007  fh=88%  nps=8.6M       
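
To put the summary lines above in relative terms, here is a small Python sketch that parses them and reports how the reported nps changed from 23.2 to 23.3 for each compiler. The regular expression assumes the exact `time=m:ss ... nps=x.xM` layout shown in this post; other Crafty output formats are not handled, and the function names are illustrative only.

```python
import re

# Summary lines copied from the results above, keyed by (Crafty version, gcc version).
RESULTS = {
    ("23.2", "4.2.1"): "time=1:02  mat=-1  n=820623467  fh=94%  nps=13.2M",
    ("23.2", "4.4.4"): "time=1:00  mat=-1  n=1244905755  fh=94%  nps=20.7M",
    ("23.2", "4.5.0"): "time=1:00  mat=-1  n=611061880  fh=94%  nps=10.1M",
    ("23.3", "4.2.1"): "time=1:01  mat=-1  n=823531770  fh=87%  nps=13.5M",
    ("23.3", "4.4.4"): "time=1:00  mat=-1  n=484844694  fh=86%  nps=8.1M",
    ("23.3", "4.5.0"): "time=1:02  mat=-1  n=531897007  fh=88%  nps=8.6M",
}

LINE_RE = re.compile(
    r"time=(\d+):(\d+)\s+mat=-?\d+\s+n=(\d+)\s+fh=(\d+)%\s+nps=([\d.]+)M"
)


def parse_summary(line):
    """Return (seconds, nodes, fail-high %, reported nps in millions)."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognized summary line: {line!r}")
    minutes, seconds, nodes, fh, nps = m.groups()
    return int(minutes) * 60 + int(seconds), int(nodes), int(fh), float(nps)


def nps_change(gcc):
    """Fractional change in reported nps from 23.2 to 23.3 for one gcc version."""
    old = parse_summary(RESULTS[("23.2", gcc)])[3]
    new = parse_summary(RESULTS[("23.3", gcc)])[3]
    return (new - old) / old


if __name__ == "__main__":
    for gcc in ("4.2.1", "4.4.4", "4.5.0"):
        print(f"gcc {gcc}: {nps_change(gcc):+.0%}")
```

On these figures the large drop is specific to gcc 4.4.4; gcc 4.2.1 is essentially unchanged between the two versions, and gcc 4.5.0 is slow for both.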
 
Here's the relevant portion of the Makefile. I changed the CC and CXX flags as needed.

Code:

darwin:
        $(MAKE) target=FreeBSD \
                CC=gcc-mp-4.5 CXX=g++-mp-4.5 \
                CFLAGS='-Wall -Wextra -pipe -O3' \
                CXFLAGS='-Wall -pipe -O3' \
                LDFLAGS=$(LDFLAGS) \
                LIBS='-lstdc++' \
                opt='$(opt) -DCPUS=8' \
                crafty-make
Some recent change in the Crafty source really affected gcc-4.4.
My first thought would be to remove the -O3 and see what happens with 1 cpu vs 8. Then use -O2 if the first test scales reasonably (you should expect nps to be at least 7x faster, and usually pretty close to 8x). If -O2 breaks it, we have something to look at.

Never mind. Just tried gcc here. Something is beyond wrong as my numbers get very close to yours suddenly. I'll try to experiment around a bit to see if the problem is an obvious one... ugh...
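
The 1-cpu vs 8-cpu scaling check mentioned above (nps at least 7x faster, usually close to 8x) comes down to a simple ratio. A minimal Python sketch follows; the function names and the numbers in the usage example are placeholders, not measurements from this thread.

```python
def smp_speedup(nps_one_cpu, nps_n_cpus):
    """Ratio of multi-threaded nps to single-threaded nps."""
    if nps_one_cpu <= 0:
        raise ValueError("single-cpu nps must be positive")
    return nps_n_cpus / nps_one_cpu


def scales_reasonably(nps_one_cpu, nps_n_cpus, min_speedup=7.0):
    """True if the speedup meets the 'at least 7x on 8 cpus' rule of thumb."""
    return smp_speedup(nps_one_cpu, nps_n_cpus) >= min_speedup


if __name__ == "__main__":
    # HYPOTHETICAL figures in millions of nodes per second, for illustration only.
    one_cpu, eight_cpus = 1.0, 7.4
    print(f"speedup: {smp_speedup(one_cpu, eight_cpus):.1f}x, "
          f"ok: {scales_reasonably(one_cpu, eight_cpus)}")
```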

Post by bob »

BTW for me, gcc sucks for either 23.2 or 23.3... not sure why, yet...
zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Post by zullil »

Sorry---and relieved---to hear that. At least it's not just me. Thanks for the help.

Post by bob »

Have you tried downloading Intel's free compiler for Linux? I suspect it will work on your Mac Unix system.

Post by zullil »

The free compiler for Linux won't install (well, maybe it would with some hacking) on OS X. I can download the compiler for Mac OS X, but it's for evaluation and stops working in 30 days.

After discovering that binaries compiled with recent versions of gcc were almost as fast as those produced with icc, I decided to give up on icc. Hope you can discover why gcc is struggling with 23.2 and 23.3.
jefk
Posts: 1085
Joined: Sun Jul 25, 2010 10:07 pm
Location: the Netherlands
Full name: Jef Kaan

Post by jefk »

Is Crafty 23.3 slow(er)?
Well, I found some other compilations of the 23.3 version, namely
on the site of Jim Ablett's WinBoard projects; I haven't looked at
the nps, though. While it does seem to run a bit slower
than the 23.2 version, I find the evaluation function much
better, and as a result I have included it in my latest Bookbuilder
package (4.09), which you can download at superchess.com.

The 23.3 compilations now also seem to be available on the site of
Peter Skinner, http://www.webkikr.com/;
I still have to compare which one is best for me.
Best regards,
jef

Post by bob »

23.3 is a little slower, but just a few percent. Whenever you get more aggressive in pruning and reduce the effective branching factor, NPS suffers, because you do a lot of work at the front of a node to generate moves and such, and then throw some of that away without using it, which slows things down a bit. But here we are talking about a drop of well over 50% with gcc. On my cluster box I use on ICC (8 cores) I see speeds of 20M and up. Using gcc, this drops to 8M or so, for reasons that are (so far) unknown...

I have tried to profile, but the damned profiler doesn't work with parallel search; it produces corrupted data...

Post by zullil »

Newer versions of gcc have -fprofile-correction, which handles parallel threads, I think.

By the way, I reinstalled icc and voila:

Code:

Crafty v23.3 (8 cpus)

White(1): setboard r2qr1k1/pppb2p1/2np1p1p/6B1/1b5Q/3B1N2/PPP2PPP/2KR3R w - - 0 1 
White(1): st 60
search time set to 60.00.
White(1): display stats
display statistics at end of each search.
White(1): go
              time limit 1:00 (+0.00) (1:00)
              depth   time  score   variation (1)
starting thread 1
starting thread 2
starting thread 3
starting thread 4
starting thread 5
starting thread 6
starting thread 7
                8     0.03  -0.61   1. Be3 Ba5 2. Bg6 Re7 3. Qh5 Ne5 4.
                                    Nxe5 Rxe5 (s=4)
                8->   0.04  -0.61   1. Be3 Ba5 2. Bg6 Re7 3. Qh5 Ne5 4.
                                    Nxe5 Rxe5
                9     0.05  -0.63   1. Be3 Be6 2. Bg6 Re7 3. Be4 Ba5 4.
                                    Qh5 f5 5. Bd5 Bxd5 6. Rxd5
                9->   0.07  -0.63   1. Be3 Be6 2. Bg6 Re7 3. Be4 Ba5 4.
                                    Qh5 f5 5. Bd5 Bxd5 6. Rxd5 (s=3)
               10     0.08  -0.63   1. Be3 Be6 2. Bg6 Re7 3. Be4 Ba5 4.
                                    Qh5 f5 5. Bd5 Bxd5 6. Rxd5 (s=2)
               10->   0.09  -0.63   1. Be3 Be6 2. Bg6 Re7 3. Be4 Ba5 4.
                                    Qh5 f5 5. Bd5 Bxd5 6. Rxd5 (s=6)
               11     0.10  -0.63   1. Be3 Be6 2. Bg6 Re7 3. Be4 Ba5 4.
                                    Qh5 f5 5. Bd5 Bxd5 6. Rxd5 (s=5)
               11->   0.13  -0.63   1. Be3 Be6 2. Bg6 Re7 3. Be4 Ba5 4.
                                    Qh5 f5 5. Bd5 Bxd5 6. Rxd5 (s=4)
               12     0.17  -0.72   1. Be3 Be6 2. Bg6 Re7 3. c3 Ba5 4.
                                    Kb1 Qc8 5. Qh5 Bg4 6. Qd5+ Qe6 7. Rhe1
                                    <HT> (s=3)
               12->   0.25  -0.72   1. Be3 Be6 2. Bg6 Re7 3. c3 Ba5 4.
                                    Kb1 Qc8 5. Qh5 Bg4 6. Qd5+ Qe6 7. Rhe1
                                    <HT> (s=4)
               13     0.30  -0.67   1. Be3 Be6 2. Bg6 Re7 3. c3 Ba5 4.
                                    Qa4 Bb6 5. Rhe1 Bxe3+ 6. Rxe3 Qd7 7.
                                    h4 (s=3)
               13->   0.44  -0.67   1. Be3 Be6 2. Bg6 Re7 3. c3 Ba5 4.
                                    Qa4 Bb6 5. Rhe1 Bxe3+ 6. Rxe3 Qd7 7.
                                    h4 (s=5)
               14     0.51  -0.69   1. Be3 Be6 2. Bg6 Re7 3. c3 Ba5 4.
                                    Qa4 Bb6 5. Rhe1 Bxe3+ 6. Rxe3 Qd7 7.
                                    Be4 a5 (s=4)
               14->   0.81  -0.69   1. Be3 Be6 2. Bg6 Re7 3. c3 Ba5 4.
                                    Qa4 Bb6 5. Rhe1 Bxe3+ 6. Rxe3 Qd7 7.
                                    Be4 a5 (s=8)
               15     0.94  -0.72   1. Be3 Be6 2. Bg6 Re7 3. c3 Ba5 4.
                                    Qa4 Bb6 5. Rhe1 Bxe3+ 6. Rxe3 Qd7 7.
                                    h4 Ne5 8. Qxd7 <HT> (s=7)
               15->   1.69  -0.72   1. Be3 Be6 2. Bg6 Re7 3. c3 Ba5 4.
                                    Qa4 Bb6 5. Rhe1 Bxe3+ 6. Rxe3 Qd7 7.
                                    h4 Ne5 8. Qxd7 <HT> (s=8)
               16     1.88  -0.72   1. Be3 Be6 2. Bg6 Re7 3. c3 Ba5 4.
                                    Qa4 Bb6 5. Rhe1 Bxe3+ 6. Rxe3 Qd7 7.
                                    Be4 f5 8. Bxc6 bxc6 (s=7)
               16->   3.46  -0.72   1. Be3 Be6 2. Bg6 Re7 3. c3 Ba5 4.
                                    Qa4 Bb6 5. Rhe1 Bxe3+ 6. Rxe3 Qd7 7.
                                    Be4 f5 8. Bxc6 bxc6 (s=10)
               17     3.92  -0.72   1. Be3 Be6 2. Bg6 Re7 3. c3 Ba5 4.
                                    Qa4 Bb6 5. Rhe1 Bxe3+ 6. Rxe3 Qd7 7.
                                    Be4 f5 8. Bxc6 Qxc6 9. Qxc6 bxc6 (s=9)
               17->   6.77  -0.72   1. Be3 Be6 2. Bg6 Re7 3. c3 Ba5 4.
                                    Qa4 Bb6 5. Rhe1 Bxe3+ 6. Rxe3 Qd7 7.
                                    Be4 f5 8. Bxc6 Qxc6 9. Qxc6 bxc6 (s=8)
               18     7.38  -0.72   1. Be3 Be6 2. Bg6 Re7 3. c3 Ba5 4.
                                    Qa4 Bb6 5. Rhe1 Bxe3+ 6. Rxe3 Qd7 7.
                                    Be4 f5 8. Bxc6 Qxc6 9. Qxc6 bxc6 (s=7)
               18->  12.73  -0.72   1. Be3 Be6 2. Bg6 Re7 3. c3 Ba5 4.
                                    Qa4 Bb6 5. Rhe1 Bxe3+ 6. Rxe3 Qd7 7.
                                    Be4 f5 8. Bxc6 Qxc6 9. Qxc6 bxc6 (s=10)
               19    24.62  -0.56   1. Be3 Be6 2. Bg6 Re7 3. a3 Ba5 4.
                                    Qa4 a6 5. Qf4 Ne5 6. Nxe5 fxe5 7. Qh4
                                    c6 8. Qh5 Bd5 9. f3 b5 10. h4 (s=9)
               19->  30.11  -0.56   1. Be3 Be6 2. Bg6 Re7 3. a3 Ba5 4.
                                    Qa4 a6 5. Qf4 Ne5 6. Nxe5 fxe5 7. Qh4
                                    c6 8. Qh5 Bd5 9. f3 b5 10. h4
               20    32.11  -0.56   1. Be3 Be6 2. Bg6 Re7 3. a3 Ba5 4.
                                    Qa4 a6 5. Qf4 Ne5 6. Nxe5 fxe5 7. Qh4
                                    c6 8. Qh5 Bd5 9. f3 b5 10. h4 Qc7
               20->  40.64  -0.56   1. Be3 Be6 2. Bg6 Re7 3. a3 Ba5 4.
                                    Qa4 a6 5. Qf4 Ne5 6. Nxe5 fxe5 7. Qh4
                                    c6 8. Qh5 Bd5 9. f3 b5 10. h4 Qc7 (s=5)
               21    47.79  -0.58   1. Be3 Be6 2. Bg6 Re7 3. a3 Ba5 4.
                                    Qa4 a6 5. Qf4 Ne5 6. Nxe5 fxe5 7. Qh4
                                    c6 8. Qh5 Bd5 9. f3 b5 10. h4 Qc7 11.
                                    Bf5 (s=4)
              time=1:00  mat=-1  n=1215686131  fh=90%  nps=20.0M      
              extensions=20.3M qchecks=61.5M reduced=137.5M pruned=445.0M
              predicted=0  evals=303.0M  50move=0  EGTBprobes=0  hits=0
              SMP->  splits=193740  aborts=22589  data=59/512  elap=1:00
terminating SMP processes.
White(1): Be3
              time used:   1:00
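
The effective branching factor mentioned earlier in the thread can be roughly eyeballed from a log like the one above by comparing the times at which successive iterations complete (the `->` lines). The Python sketch below does that for an excerpt of this log. It is only a crude proxy: timings from a parallel search are noisy, the regular expression assumes times printed as plain seconds, and the function names are illustrative.

```python
import re

# Completed-iteration lines ("->") excerpted from the log above: depth, then seconds.
LOG_EXCERPT = """\
               15->   1.69  -0.72   1. Be3 Be6 2. Bg6 Re7 3. c3 Ba5 4.
               16->   3.46  -0.72   1. Be3 Be6 2. Bg6 Re7 3. c3 Ba5 4.
               17->   6.77  -0.72   1. Be3 Be6 2. Bg6 Re7 3. c3 Ba5 4.
               18->  12.73  -0.72   1. Be3 Be6 2. Bg6 Re7 3. c3 Ba5 4.
               19->  30.11  -0.56   1. Be3 Be6 2. Bg6 Re7 3. a3 Ba5 4.
               20->  40.64  -0.56   1. Be3 Be6 2. Bg6 Re7 3. a3 Ba5 4.
"""

# Depth followed by "->", then the elapsed time in seconds.
DONE_RE = re.compile(r"^\s*(\d+)->\s+(\d+\.\d+)\s", re.MULTILINE)


def completion_times(log_text):
    """Map each completed depth to the elapsed seconds at which it finished."""
    return {int(depth): float(secs) for depth, secs in DONE_RE.findall(log_text)}


def time_ratios(times):
    """Map depth d to time(d) / time(d - 1) wherever both depths are present."""
    return {d: times[d] / times[d - 1] for d in sorted(times) if d - 1 in times}


if __name__ == "__main__":
    for depth, ratio in time_ratios(completion_times(LOG_EXCERPT)).items():
        print(f"depth {depth}: {ratio:.2f}x the time of depth {depth - 1}")
```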


Post by bob »

I know. And that's crazy-looking. I can't imagine what a compiler could do to cause that kind of SMP slow-down, unless there are some hidden library calls inserted that require synchronization primitives to avoid some sort of internal data corruption. I'm looking into it, since I always use gcc on AMD boxes: the Intel compiler seems to produce some sort of bad code when run on AMD, which slows things down about as much as gcc is doing here.