It occurs to me that Snow Leopard on my box can be booted with either a 32-bit kernel (the default, and what I have been using) or a 64-bit kernel.
I wonder if the choice I make will have any effect on scaling.
Are there particular lines in the output from
sysctl -a
that would be of interest here?
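If it helps, the CPU-topology keys are probably the lines worth pulling out of that dump. Something like the following should show how many physical and logical processors the kernel sees and whether the hardware is 64-bit capable (the exact key names are my assumption for 10.6, so double-check them against your own sysctl -a output):
Code: Select all
# Pull just the CPU-topology keys instead of scanning all of sysctl -a.
# hw.physicalcpu vs hw.logicalcpu shows whether hyper-threading is exposed (8 vs 16 here);
# key names assumed for OS X 10.6 -- grep the full dump if any are missing.
sysctl hw.ncpu hw.physicalcpu hw.logicalcpu hw.cpu64bit_capable
sysctl -a | grep -E 'machdep\.cpu\.(core|thread)_count'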
Another Crafty-23.1 Nehalem scaling problem
-
- Posts: 6442
- Joined: Tue Jan 09, 2007 12:31 am
- Location: PA USA
- Full name: Louis Zulli
Re: Another Crafty-23.1 Nehalem scaling problem
I've come to several conclusions:
1) Crafty is not the problem.
2) icc is not working correctly for me.
3) The slightly poor scaling with mt=8 shown below is likely due to the fact that hyper-threading currently cannot be disabled in OS X 10.6.2.
There is a checkbox in a preference pane that is supposed to do this, but testing (with Crafty and gnubg) has confirmed that toggling the status of that checkbox does not disable hyper-threading. Am about to file a bug report with Apple.
Thanks, Bob, for your assistance.
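(For anyone who wants to reproduce the checkbox test in point 3 without firing up Crafty: comparing physical and logical/active CPU counts before and after toggling the checkbox should tell the story. The key names below are my assumption for 10.6.)
Code: Select all
# Run this once with the checkbox on and once with it off; if the checkbox really
# disabled hyper-threading, hw.logicalcpu / hw.activecpu should drop to match
# hw.physicalcpu (key names assumed for OS X 10.6).
sysctl hw.physicalcpu hw.logicalcpu hw.activecpu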
zullil wrote:Something seems amiss with my icc, so I switched back to gcc. This is on the new Snow Leopard system. Scaling seems pretty good, I guess. Will repeat this exact experiment with my old Leopard system on the same hardware.
bob wrote:Here's what I would expect:
log.001: time=30.20 mat=0 n=97218373 fh=95% nps=3.2M
log.002: time=30.35 mat=0 n=198382851 fh=95% nps=6.5M
log.003: time=31.01 mat=0 n=399493603 fh=94% nps=12.9M
log.004: time=30.89 mat=0 n=690102470 fh=94% nps=22.3M
I ran the same position for 30 seconds, using 1, 2, 4 and 8 cpus. Scaling is about 7 on this box with the current version (almost identical to 23.1). I had slightly better scaling numbers on a Nehalem, but we don't currently have one up and running... But Nehalem ought to be somewhat better than this Core 2 Xeon box, since the Nehalem has a better memory system.
Code: Select all
darwin:
	$(MAKE) target=FreeBSD \
		CC=gcc CXX=g++ \
		CFLAGS='$(CFLAGS) -O3 -msse4.2' \
		CXFLAGS='$(CFLAGS) -O3 -msse4.2' \
		LDFLAGS=$(LDFLAGS) \
		LIBS='-lpthread -lstdc++' \
		opt='-DCPUS=8 -DINLINE64' \
		crafty-make
Code: Select all
max threads set to 1.
Crafty v23.1 (1 cpus)
White(1): setboard 1rbr2k1/1q2bpp1/2pppn2/6B1/p3P3/2N2P2/PPP4P/1K1RQBR1 w - - 1 19
time=30.59  mat=0  n=97395988  fh=91%  nps=3.2M
extensions=3.3M qchecks=2.9M reduced=7.8M pruned=38.2M
predicted=0  evals=44.0M  50move=0  EGTBprobes=0  hits=0
SMP->  splits=0  aborts=0  data=0/512  elap=30.59

max threads set to 2.
Crafty v23.1 (2 cpus)
White(1): setboard 1rbr2k1/1q2bpp1/2pppn2/6B1/p3P3/2N2P2/PPP4P/1K1RQBR1 w - - 1 19
time=31.11  mat=0  n=175166321  fh=91%  nps=5.6M
extensions=5.9M qchecks=5.1M reduced=13.6M pruned=66.9M
predicted=0  evals=81.0M  50move=0  EGTBprobes=0  hits=0
SMP->  splits=490  aborts=69  data=5/512  elap=31.11

max threads set to 4.
Crafty v23.1 (4 cpus)
White(1): setboard 1rbr2k1/1q2bpp1/2pppn2/6B1/p3P3/2N2P2/PPP4P/1K1RQBR1 w - - 1 19
time=30.93  mat=0  n=345958868  fh=91%  nps=11.2M
extensions=12.7M qchecks=11.2M reduced=27.4M pruned=137.6M
predicted=0  evals=153.1M  50move=0  EGTBprobes=0  hits=0
SMP->  splits=4036  aborts=648  data=14/512  elap=30.93

max threads set to 8.
Crafty v23.1 (8 cpus)
White(1): setboard 1rbr2k1/1q2bpp1/2pppn2/6B1/p3P3/2N2P2/PPP4P/1K1RQBR1 w - - 1 19
time=30.78  mat=0  n=614520779  fh=90%  nps=20.0M
extensions=24.6M qchecks=22.7M reduced=48.3M pruned=250.0M
predicted=0  evals=262.9M  50move=0  EGTBprobes=0  hits=0
SMP->  splits=72982  aborts=13450  data=41/512  elap=30.78
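(For the record, the arithmetic behind point 3 above: the mt=8 speedup in the run just quoted is nps(8)/nps(1) = 20.0M/3.2M ≈ 6.3, against the ≈ 7.0 that 22.3M/3.2M gives for Bob's log lines quoted above.)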
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Another Crafty-23.1 Nehalem scaling problem
zullil wrote:I've come to several conclusions:
1) Crafty is not the problem.
2) icc is not working correctly for me.
3) The slightly poor scaling with mt=8 shown below is likely due to the fact that hyper-threading currently cannot be disabled in OS X 10.6.2. There is a checkbox in a preference pane that is supposed to do this, but testing (with Crafty and gnubg) has confirmed that toggling the status of that checkbox does not disable hyper-threading. Am about to file a bug report with Apple.
Thanks, Bob, for your assistance.
Is there no BIOS setting that turns this off? Every machine I have worked on has allowed it. More importantly, any recent Linux kernel handles hyper-threading just fine: it schedules one process per physical core first, and only then starts placing processes on logical cores that share resources on an already-busy physical core. If the Mac OS X scheduler doesn't understand this, then leaving HT on is a real problem.
In general, when these machines boot up, you see something like "Press <F2> to enter setup" or some such. Somewhere in there you will find a setting like "logical processor" that is on or off. You want it off; then HT is disabled. The wording is not always clear, but look for something like that. It varies, but it has been there on every box I have used with PIV or Nehalem processors.
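On the Linux side, the sibling layout the scheduler uses is visible directly in sysfs, so there is no need to guess which logical CPUs share a core (standard sysfs paths assumed; worth double-checking on your kernel):
Code: Select all
# For each logical CPU, list the logical CPUs that share its physical core.
# A line like "cpu0: 0,8" means cpu0 and cpu8 are hyper-threaded siblings.
for c in /sys/devices/system/cpu/cpu[0-9]*; do
    echo "$(basename $c): $(cat $c/topology/thread_siblings_list)"
done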
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Another Crafty-23.1 Nehalem scaling problem
Something is definitely up. When you got the Nehalem box, did it come with a new OS X kernel, one that actually understands hyper-threading? I used to see just your kind of results when HT first came out. The older Linux kernels would see 4 logical CPUs on my dual PIV box, and with 2 threads running they were just as likely to put both threads on one physical processor, using its two logical processors, which is far worse than running one thread per physical CPU. Newer kernels handle this perfectly now.
zullil wrote:Here are results from my 10.5.8 Leopard system on the same box. I'm coming to the conclusion that OS X is still learning to deal with Nehalem. I'm giving up on this now--too frustrating. Thanks for the help.
Code: Select all
max threads set to 1.
Crafty v23.1 (1 cpus)
White(1): setboard 1rbr2k1/1q2bpp1/2pppn2/6B1/p3P3/2N2P2/PPP4P/1K1RQBR1 w - - 1 19
time=30.79  mat=0  n=97395988  fh=91%  nps=3.2M
extensions=3.3M qchecks=2.9M reduced=7.8M pruned=38.2M
predicted=0  evals=44.0M  50move=0  EGTBprobes=0  hits=0
SMP->  splits=0  aborts=0  data=0/512  elap=30.79

max threads set to 2.
Crafty v23.1 (2 cpus)
White(1): setboard 1rbr2k1/1q2bpp1/2pppn2/6B1/p3P3/2N2P2/PPP4P/1K1RQBR1 w - - 1 19
time=30.44  mat=0  n=163726848  fh=91%  nps=5.4M
extensions=5.8M qchecks=5.1M reduced=12.9M pruned=64.3M
predicted=0  evals=73.3M  50move=0  EGTBprobes=0  hits=0
SMP->  splits=523  aborts=84  data=7/512  elap=30.44

max threads set to 4.
Crafty v23.1 (4 cpus)
White(1): setboard 1rbr2k1/1q2bpp1/2pppn2/6B1/p3P3/2N2P2/PPP4P/1K1RQBR1 w - - 1 19
time=30.32  mat=0  n=301277071  fh=91%  nps=9.9M
extensions=11.1M qchecks=9.8M reduced=24.3M pruned=120.4M
predicted=0  evals=132.3M  50move=0  EGTBprobes=0  hits=0
SMP->  splits=3574  aborts=643  data=15/512  elap=30.32

max threads set to 8.
Crafty v23.1 (8 cpus)
White(1): setboard 1rbr2k1/1q2bpp1/2pppn2/6B1/p3P3/2N2P2/PPP4P/1K1RQBR1 w - - 1 19
time=30.05  mat=0  n=468325941  fh=91%  nps=15.6M
extensions=17.6M qchecks=15.8M reduced=37.9M pruned=192.6M
predicted=0  evals=200.4M  50move=0  EGTBprobes=0  hits=0
SMP->  splits=53739  aborts=9453  data=38/512  elap=30.05
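(The same arithmetic on this Leopard log gives 15.6M/3.2M ≈ 4.9 at mt=8, and only 9.9M/3.2M ≈ 3.1 at mt=4, well below the ≈ 7 expected earlier in the thread, which is consistent with the kernel/scheduler being the variable here.)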
-
- Posts: 6442
- Joined: Tue Jan 09, 2007 12:31 am
- Location: PA USA
- Full name: Louis Zulli
Re: Another Crafty-23.1 Nehalem scaling problem
I've just discovered a bit more.
I can boot OS X 10.6.2 with either a 32-bit kernel or a 64-bit kernel. If I boot using the 64-bit kernel, then the Preference pane checkbox to enable/disable H-T doesn't work. But with the 32-bit kernel it functions correctly. With H-T off and mt=8, all eight physical cores are active.
Hoping to have this all sorted out soon!
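A quick way to confirm which kernel actually booted, and to force one or the other without holding keys at startup (this is from memory for Snow Leopard, so treat the boot-arg as an assumption and verify before relying on it):
Code: Select all
# The kernel version string ends with RELEASE_I386 (32-bit) or RELEASE_X86_64 (64-bit).
uname -v
# Request the 64-bit kernel on subsequent boots (assumed Snow Leopard boot-arg);
# use arch=i386, or clear boot-args, to return to the 32-bit kernel.
sudo nvram boot-args="arch=x86_64"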
-
- Posts: 6442
- Joined: Tue Jan 09, 2007 12:31 am
- Location: PA USA
- Full name: Louis Zulli
Re: Another Crafty-23.1 Nehalem scaling problem
(And with H-T on and mt=8 it uses 8 distinct physical cores. So I guess it really shouldn't matter how H-T is set, as long as I stick to mt <= 8.)
zullil wrote:I've just discovered a bit more.
I can boot OS X 10.6.2 with either a 32-bit kernel or a 64-bit kernel. If I boot using the 64-bit kernel, then the Preference pane checkbox to enable/disable H-T doesn't work. But with the 32-bit kernel it functions correctly. With H-T off and mt=8, all eight physical cores are active.
Hoping to have this all sorted out soon!
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Another Crafty-23.1 Nehalem scaling problem
So long as you can verify that 8 physical cores are being used. The cores are numbered quite non-intuitively to me, in that the logical/physical numbering occurs in an order that is not what I would expect. One of the guys at Intel who works on the Linux process scheduler explained the scheme to me, and I am not sure I could remember it now, since 6 months have elapsed.
zullil wrote:(And with H-T on and mt=8 it uses 8 distinct physical cores. So I guess it really shouldn't matter how H-T is set, as long as I stick to mt <= 8.)
zullil wrote:I've just discovered a bit more.
I can boot OS X 10.6.2 with either a 32-bit kernel or a 64-bit kernel. If I boot using the 64-bit kernel, then the Preference pane checkbox to enable/disable H-T doesn't work. But with the 32-bit kernel it functions correctly. With H-T off and mt=8, all eight physical cores are active.
Hoping to have this all sorted out soon!
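For anyone who wants to recover that numbering on a Linux box rather than remember the scheme, /proc/cpuinfo has the mapping (standard field names, though the exact layout varies a bit by kernel):
Code: Select all
# One line per logical CPU: processor number, physical package, and core id.
# Logical CPUs that share the same physical id and core id are hyper-threaded siblings.
grep -E '^(processor|physical id|core id)' /proc/cpuinfo | paste - - -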