Howard E wrote: The last two tests do this:
8 threads (HT on only): 4.6M nps for Arasan 11.3
8 threads (HT on only): 7.7M nps for Bright 0.4a
There are only 4 physical processors on my machine,
so the other 4 are virtual.
But it looks like the nps count is not accurate for comparing the performance of MP-capable engines. I am coming out of the single-processor dark ages and just found this out.
I have been testing on a dual-socket quad-core Nehalem, and turning on SMT actually lowers the NPS for Crafty. Using mt=16 to use the logical processors hurts further because of the SMP search overhead. I'm not so sure SMT is that great on dual-socket machines when the program is pretty well tuned with respect to cache usage, etc...
Howard E wrote: Computer:
Core i7-920 (modest overclock of the base clock from 133 to 150 MHz),
so 2660 MHz to 3000 MHz (20 * clock).
For single-core apps, a motherboard setting called
turbo yields 21 * clock, so 3150 MHz.
8 GB RAM, 512 MB hash allotted for chess programs.
Test:
nps count from the new-game starting position
ht is hyper-threading
T is threads
nps is in millions, except Rybka's count
1. Rybka 2.2mpox64
ht=on 862.784
ht=off 744.450
2. Arasan 11.3
      ht=off   ht=on
1T    1.7      1.8
2T    3.2      2.9
4T    4.3      3.7
8T    -        4.6   (ht on only)
3. Bright 0.4a
      ht=off   ht=on
1T    1.6      1.6
2T    3.1      3.0
4T    6.1      5.1
8T    -        7.7   (ht on only)
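For anyone who wants to see the scaling rather than the raw counts, here is a minimal sketch that turns the Arasan and Bright figures above into speedup and per-thread efficiency relative to the 1-thread number. The dictionary layout and the `scaling` helper are my own naming, not anything from the engines; the numbers are copied from the tables.

```python
# NPS figures in millions, copied from the Arasan and Bright tables above.
# 8T was only measured with hyper-threading on, so ht_off has no 8T entry.
nps = {
    "Arasan 11.3": {
        "ht_off": {1: 1.7, 2: 3.2, 4: 4.3},
        "ht_on": {1: 1.8, 2: 2.9, 4: 3.7, 8: 4.6},
    },
    "Bright 0.4a": {
        "ht_off": {1: 1.6, 2: 3.1, 4: 6.1},
        "ht_on": {1: 1.6, 2: 3.0, 4: 5.1, 8: 7.7},
    },
}


def scaling(series):
    """Return {threads: (speedup, efficiency)} relative to the 1-thread NPS."""
    base = series[1]
    return {t: (v / base, v / base / t) for t, v in series.items()}


for engine, modes in nps.items():
    for mode, series in modes.items():
        for threads, (speedup, eff) in scaling(series).items():
            print(f"{engine:12s} {mode:6s} {threads}T "
                  f"speedup {speedup:.2f}  efficiency {eff:.0%}")
```

Keep in mind this only describes NPS scaling; it says nothing about how much of that extra node count is useful search.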
Thanks Howard, this is useful information.
Bright's scaling without hyperthreading seems pretty good: 1.6, 3.1, 6.1.
(As I only have a dual-core computer, I was not sure of the 4-cpu nps.)
But if hyperthreading is enabled, the 4-cpu nps only reaches 5.1.
It seems that Bright (and Arasan too?!) needs to set the processor affinity to cope with hyperthreading (to make sure each thread gets its own physical cpu).
I haven't read the other discussion yet, but it seems to me that larger nps is better, so yes, Bright would perform best (although just marginally better than 4 threads and no hyperthreading) with 8 threads and hyperthreading enabled.
Since the 8-thread (HT) nps is only about 25% better than the 4-thread (no HT) nps, you'd need to play a large number (1000's) of games to actually prove it.
NPS is irrelevant. All that counts is time to depth. And I know of no program that, given the choice of 4 cpus at 2M nodes per second per cpu or 8 cpus at 1M nodes per second per cpu, would produce a faster time to depth on the 8 cpus. Both configurations would search the same total NPS. But the search overhead would make the 8-cpu version slower, and weaker.
It takes at _least_ a 30% improvement in NPS to make hyper-threading worthwhile for chess. And I have not seen that kind of improvement, making it a losing proposition.
Easy enough to test. Just run with SMT off and 4 physical processors and search a group of positions to a fixed depth. Then turn SMT on and run the same positions with 8 threads. The latter will take longer to complete, making the point quite clear.
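To put rough numbers on that threshold, here is a minimal sketch. It assumes time to depth is simply nodes searched divided by NPS, and that doubling the thread count inflates the tree by a fixed fraction. The function name is made up for illustration, and the 0.30 overhead is just the figure quoted above used as an input, not something measured here. The NPS gains are the 8T (HT on) versus 4T (HT off) numbers Howard posted.

```python
def time_to_depth_ratio(nps_gain, overhead):
    """Time to depth with SMT on, divided by time to depth with SMT off.

    nps_gain: fractional NPS increase from SMT (0.26 means 26% more nodes/sec)
    overhead: fractional growth of the tree searched with the extra threads
    A result above 1.0 means the SMT configuration is slower to depth.
    """
    return (1.0 + overhead) / (1.0 + nps_gain)


# 8 threads with HT on versus 4 threads with HT off, from the posted NPS.
gains = {
    "Bright 0.4a": 7.7 / 6.1 - 1.0,
    "Arasan 11.3": 4.6 / 4.3 - 1.0,
}

ASSUMED_OVERHEAD = 0.30  # assumption: the break-even figure quoted above

for engine, gain in gains.items():
    ratio = time_to_depth_ratio(gain, ASSUMED_OVERHEAD)
    print(f"{engine}: NPS gain {gain:.1%}, time-to-depth ratio {ratio:.2f}")
```

Under that assumed overhead both ratios come out above 1.0, i.e. slower to depth with SMT on. A different overhead changes the answer, which is why the fixed-depth test is the real arbiter.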
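The procedure can be sketched as a small timing harness. This is only an outline: `search` is a hypothetical callable that you would replace with whatever drives your engine to a fixed depth, and `fake_search` is a stand-in so the sketch runs on its own. The one position listed is the standard starting position; a real test would use a group of varied positions.

```python
import time


def total_time_to_depth(search, positions, depth):
    """Sum the wall-clock seconds `search(fen, depth)` takes over all positions."""
    total = 0.0
    for fen in positions:
        start = time.perf_counter()
        search(fen, depth)
        total += time.perf_counter() - start
    return total


def fake_search(fen, depth):
    # Stand-in workload so the sketch is runnable; replace with a call
    # that makes a real engine search `fen` to `depth` and then returns.
    sum(range(1000 * depth))


positions = [
    "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1",
]

elapsed = total_time_to_depth(fake_search, positions, depth=10)
print(f"total time to depth: {elapsed:.4f} s")
```

Run it once with SMT off and 4 threads, once with SMT on and 8 threads, same positions and depth, and compare the two totals.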
I do not think 'all that counts is not nps' and not nps to fixed depth.
All that counts is time to find the best move.
For the following position I ran Crafty 2 times.
The results differ pretty much.
[D] r1b1N2k/1pBn2p1/p3Q2p/5n2/8/2q5/2PRB1PP/7K w
You are right that time to solution is what matters. But time to depth is usually a good predictor of that (especially averaged over a lot of positions). NPS is not.
BBauer wrote: I do not think 'all that counts is not nps' and not nps to fixed depth.
All that counts is time to find the best move.
For the following position I ran Crafty 2 times.
The results differ pretty much.
[D] r1b1N2k/1pBn2p1/p3Q2p/5n2/8/2q5/2PRB1PP/7K w
Time to depth is all that matters. I'd hope everyone knows that for SMP testing, you need to run a position several times and average the results rather than just making one run...
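A minimal sketch of the averaging, assuming you can wrap one fixed-depth search in a callable. `timed_runs` and `summarize` are illustrative names and the lambda is a placeholder workload, not an engine call.

```python
import statistics
import time


def timed_runs(run, repeats):
    """Call `run()` `repeats` times and return the wall-clock seconds of each call."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Return (mean, sample standard deviation); deviation is 0.0 for one sample."""
    mean = statistics.mean(samples)
    spread = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return mean, spread


# Placeholder workload; replace with one fixed-depth SMP search of a position.
samples = timed_runs(lambda: sum(range(100_000)), repeats=5)
mean, spread = summarize(samples)
print(f"{len(samples)} runs: mean {mean:.4f} s, stdev {spread:.4f} s")
```

The spread matters as much as the mean here: if two single runs of the same position differ a lot, that variation has to be smaller than the effect you are trying to measure before a comparison means anything.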