diep wrote:hgm wrote:I am still puzzled very much as to why this 10% nps drop would occur. If there is normally nothing that wants to run, why should it suddenly try to eat CPU because some below-normal processes start running? It is certainly not the GUI that starts eating CPU.
SMP search usually.
Remove 1 or 2% of system time from a system and the SMP search has big problems. At 5% it is not uncommon to lose 2 ply of search depth or so.
Remove 1 full core and some engines are a factor 1000 slower. Glaurung is a good example of that (and of course Stockfish).
Realize that the type of search Glaurung is doing gives a better speedup than what Rybka is doing. Rybka can more easily lose a core though...
Everyone has to choose for himself what he likes most. In Diep I'm making an ATTEMPT at all 3 things: a good speedup AND not being too sensitive to losing a few percent of system time AND being scalable to hundreds of cores.
But that makes the search a LOT more complicated (understatement).
I will give 1 concrete example.
Other engines do it via a different method, but the effect is the same.
In the 2002 'supercomputer design' of Diep the situation is this: at a position P with depthleft D, processor p gets a fail high while other processors are still searching moves there. I have used 2 techniques for this; the first one up until December 2009.
The first technique was that I aborted the entire tree of cpu's there, at P(D-i) with i >= 0, and simply waited with that cpu until all other cpu's were gone from that position, then continued the search.
In Diep 2010 and also in the current Diep, what I do is give all cpu's at P(D-i) an abort, and then p directly continues from P. The idea is that especially if you have a lot of cores, say hundreds in the future (such algorithms take years to fix, so it's a matter of designing for the future), we don't have to sit and wait for some far remote cpu to be aborted.
The same reasoning applies if you have more searching processes active than cores available at this moment: it means one process is simply not in the runqueue. So the earliest it can get scheduled is, say, 30 milliseconds or so from now (in theory 10, but forget that in Linux and Windows). Such delays you simply cannot afford. You slow down by a factor 1000 or so if you have more processes than processors available.
So Diep 2002-2009 also suffered from the same phenomenon as Glaurung; the effect was a tad smaller, but the problem is similar.
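To put numbers on the difference between the two schemes, here is a minimal toy timing model. It is not Diep's code; the function names and the latencies are assumptions picked for illustration (a helper that is not in the runqueue is assumed to notice the abort only after 30 ms).

```python
# Toy timing model, hypothetical names and numbers (not taken from Diep).
# All times are milliseconds after processor p sees the fail high.

def resume_time_wait(abort_latencies_ms):
    """2002-2009 style: p waits until every helper has left the position."""
    return max(abort_latencies_ms, default=0.0)

def resume_time_abort_and_go(abort_latencies_ms):
    """2010 style: p flags the helpers and continues immediately."""
    return 0.0

# Three helpers react within microseconds; the fourth is not in the
# runqueue and only sees the abort at its next time slice (assumed 30 ms).
latencies = [0.002, 0.003, 0.005, 30.0]

print(resume_time_wait(latencies))          # 30.0
print(resume_time_abort_and_go(latencies))  # 0.0
```

In the waiting scheme a single descheduled process sets the delay for everybody, which is why running more search processes than cores hurts so badly there.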
So just aborting the cpu's and running on is a good idea, no?
On small machines, even with just 16 cores, it appears to have 2 major disadvantages.
a) it's more administration to do so, which slows you down. That is possible to fix, but it requires a really careful and sharp mathematical model of what is allowed at which spot at which time.
b) a simple split problem I hadn't foreseen. If p is so fast to give the other cpu's an abort and directly move on, it is nearly always the first to arrive back in position P(D+1), which is the father position of P(D). As we know we already had a fail high in P(D), we know we have a fail low in P(D+1). With Principal Variation Search in Diep, where I start the root window at (-inf,inf), we can directly prove that if I get back a fail low, there are 2 possibilities. Either it was a nullmove that failed high, in which case the above story is useless, or it was a normal move.
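The step from a fail high in the child to a fail low in the father position is just the negamax sign flip. A minimal sketch, assuming a plain negamax window convention; the helper names and the scores are made up for illustration and are not from Diep:

```python
# Negamax window bookkeeping, hypothetical helper names.

def child_window(alpha, beta):
    """The child searches the negated, swapped window of its father."""
    return -beta, -alpha

def classify(score, alpha, beta):
    """Classify a score against the window (alpha, beta)."""
    if score >= beta:
        return "fail high"
    if score <= alpha:
        return "fail low"
    return "exact"

# Father position searched with a PVS null window (assumed values).
alpha, beta = 10, 11
c_alpha, c_beta = child_window(alpha, beta)   # (-11, -10)

child_score = -5                              # >= c_beta, so the child fails high
print(classify(child_score, c_alpha, c_beta)) # fail high

move_score = -child_score                     # the score as seen from the father
print(classify(move_score, alpha, beta))      # fail low
```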
If it was a normal move, that means that we can directly SPLIT in position P(D+1). Now the 2002-2009 search of Diep would directly pick up those idle cpu's, as it always WAITED for them to be removed from position P(D).
That obviously means we directly have cpu's to split with, causing a MAGNIFICENT scaling for the 2002-2009 search, provided you get all the system time. In the new 2010+ SMP model I have yet to see the same scaling occur, as p returns so quickly from P(D) to P(D+1) that the cpu's it aborted have, in the average case, simply had little chance yet to return from their job.
Fixing this problem isn't gonna be easy. But sure, it suffers less if you suddenly lose a core, for example because some service gets awakened (or VNC is active). But still you don't want to run a chess program on fewer cores than you have active processes.
It'll lose a factor 2 in speed or so, yet that costs a ply or 2-3.
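The split problem can be illustrated with the same kind of toy model: count how many aborted helpers are already idle at the moment p is back in P(D+1) and wants to split. Names and timings are assumptions for illustration only, not measurements from Diep.

```python
# Toy model of helper availability at split time (hypothetical numbers).

def idle_helpers_at(t_split_ms, abort_latencies_ms):
    """Count helpers that have finished aborting by the time p wants to split."""
    return sum(1 for t in abort_latencies_ms if t <= t_split_ms)

# Time (ms) each helper needs to notice the abort and unwind its search.
latencies = [0.02, 0.05, 0.08]

# 2002-2009 style: p only reaches P(D+1) after the slowest helper is gone.
t_wait = max(latencies)
print(idle_helpers_at(t_wait, latencies))   # 3

# 2010 style: p is back in P(D+1) almost at once (assumed 0.01 ms).
t_fast = 0.01
print(idle_helpers_at(t_fast, latencies))   # 0
```

So the faster scheme pays for its speed by finding nobody to split with, unless the split at P(D+1) is somehow retried once the helpers come free.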