I found something in your parallel search code...

Daniel Shawul wrote:
Recently I had a chance to run scorpio on an 8 processor machine.
Unfortunately it did not seem to do better than the 4 processor version at all! I see that all the cpus are busy (100%), but they are obviously sitting idle. The work allocation scheme is pretty straightforward, but I don't know if that is good enough to keep the processors alive. The 4 processor version scales almost 4x nps-wise as expected, but the 8 cpu run does not increase the nps at all!! Bewildered, I took crafty and ran it there, and it swept to 10 Mnps with no problem! I do not get many chances to run on that machine, but I can test on 4 cpus as much as I want, which I hope will help me figure out the problem. I have heard of problems of this kind happening to other guys here, so if you could point me to things to watch out for, it would be much appreciated.
Daniel
Every time you need to find out whether you can do a split, you take a global lock and then loop through every processor's state looking for idle processors.
This looks really expensive, and the more processors there are, the worse it gets. Most of the time there are no idle processors, yet you have to run this loop for every move, and while you do it you lock everything up.
I suggest you maintain a counter of idle CPUs, and simply check that counter to see whether a split is worth attempting.
Here is your code:
Code:
if(n_processors > 1
&& pstack->depth > 2 * UNITDEPTH
) {
register int i;
/*
attach idle processors to current processor
*/
l_lock(lock_smp);
for(i = 0;i < n_processors;i++) {
if(processors[i].state == WAIT) {
attach_processor(i);
}
}
if(n_workers) {
attach_processor(processor_id);
for(i = 0; i < n_processors; i++) {
if(workers[i])
processors[i].state = GO;
}
}
l_unlock(lock_smp);
...
}
And here is how it could look with the idle counter:

Code:
if(g_n_idle_processors > 0 // there has to be an available CPU
&& pstack->depth > 2 * UNITDEPTH
) {
register int i;
/*
attach idle processors to current processor
*/
l_lock(lock_smp);
if (g_n_idle_processors) { // check again, in case another thread acquired the CPUs while we were blocked on the lock
for(i = 0;i < n_processors;i++) {
if(processors[i].state == WAIT) {
attach_processor(i);
}
}
if(n_workers) {
attach_processor(processor_id);
for(i = 0; i < n_processors; i++) {
if(workers[i])
processors[i].state = GO;
}
}
}
l_unlock(lock_smp);
...
}
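For this to work, the counter of course has to be kept in sync wherever a processor changes state. A minimal sketch of how that could be done, assuming a pthreads build; the names g_n_idle_processors and lock_smp come from the code above, while the helper names processor_becomes_idle() and processor_attached() are my own invention, not Scorpio's:

```c
#include <pthread.h>

static pthread_mutex_t lock_smp = PTHREAD_MUTEX_INITIALIZER;
static volatile int g_n_idle_processors = 0;

/* Call when a processor enters the WAIT state. */
void processor_becomes_idle(void) {
    pthread_mutex_lock(&lock_smp);
    g_n_idle_processors++;
    pthread_mutex_unlock(&lock_smp);
}

/* Call when attach_processor() grabs an idle processor. */
void processor_attached(void) {
    pthread_mutex_lock(&lock_smp);
    g_n_idle_processors--;
    pthread_mutex_unlock(&lock_smp);
}

/* Cheap pre-check: read the counter WITHOUT taking the lock.
   A stale read is harmless because the code re-checks under
   the lock before actually splitting. */
int split_candidate(void) {
    return g_n_idle_processors > 0;
}
```

The unlocked read in split_candidate() can be momentarily stale, which is why the re-check under lock_smp in the split code above is still needed; the point is only that the expensive lock-and-scan is skipped in the common case where nobody is idle.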