Best engine for greater than 8-core SMP system

Post by **bob** » Mon Dec 13, 2010 11:10 pm

Milos wrote:
bob wrote: With respect to depth, if you crank up Crafty on (say) a 16-CPU system in a middlegame position and tell it to "go", it will start to search and display the NPS frequently. It will start off at about half the speed it reaches after 30 seconds or so. I've not noticed any "stabilization" of parallel performance after a certain depth. The deeper the search, the less overhead we seem to see, in general, because the further from the tips you split, the better the move ordering is and the less likely you are to split at a CUT node, which kills performance. Is 30 minutes per search better than 1? Yes. A lot? Probably not. But there is a steady gain, at least for as far as I have measured. I have not compared 1-day searches to 1-hour searches, for obvious reasons.
So the logical question follows: if you leave it running on a 16-CPU system for days, would 16 be the asymptotic speedup for most positions, or would there still be a majority of positions that can never reach linear speedup, no matter how long they are left to run?
And what would the asymptotic average value be (13, 14, or maybe even 15 for a 16-CPU system)?

I don't think 16 is the asymptote for current approaches. It is obviously the theoretical limit, but we won't get there on average. There is an asymptote somewhere below that. I have not attempted to measure it, because once we get past 5-6 minutes per move nobody cares: tournament games rarely see that kind of time per move.
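For anyone who wants to put numbers on this, here is a minimal sketch of the usual time-to-depth speedup calculation. The function names and the timings are made up for illustration; they are not Crafty output or measurements from this thread.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Time-to-depth speedup: 1-CPU time divided by N-CPU time for the same depth."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("times must be positive")
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, n_cpus: int) -> float:
    """Fraction of the theoretical (linear) speedup actually achieved."""
    if n_cpus <= 0:
        raise ValueError("n_cpus must be positive")
    return speedup(t_serial, t_parallel) / n_cpus


if __name__ == "__main__":
    # Hypothetical timings in seconds to reach the same depth; NOT real data.
    t1, t16 = 320.0, 25.0
    print(f"speedup    = {speedup(t1, t16):.2f}")
    print(f"efficiency = {efficiency(t1, t16, 16):.2f}")
```

Repeating this over many positions at increasing time limits would show whether the average keeps creeping toward 16 or flattens out below it.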

I have noticed this "deeper=better" effect most often on positions that are problematic for a parallel search. The speedup might be quite bad for 30 seconds or 2 minutes, but it begins to level out as the search goes deeper.

The bad positions are the ones where we change our mind a lot, because move ordering is bad. But it is hard to find a position that will behave like that, ply after ply, for 30+ plies. Almost all finally settle down and then move ordering ramps up along with the parallel speedup. The two are almost perfectly correlated, for obvious reasons.
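To illustrate why splitting at a CUT node hurts, here is a deliberately crude toy model of a single split point. It assumes every move costs one unit of time and that the threads take moves in lock-step batches, which is not how Crafty or any real engine schedules work. The function `split_cost` and its parameters are hypothetical.

```python
import math
from typing import Optional, Tuple


def split_cost(first_cutoff: Optional[int], n_moves: int,
               n_threads: int) -> Tuple[int, int, int]:
    """Toy cost of splitting one node across n_threads.

    first_cutoff: 1-based index of the first move that fails high,
                  or None if no move does (an ALL node).
    Returns (serial_time, parallel_time, wasted_work) in move units.
    """
    if n_moves < 1 or n_threads < 1:
        raise ValueError("n_moves and n_threads must be >= 1")
    if first_cutoff is not None and not 1 <= first_cutoff <= n_moves:
        raise ValueError("first_cutoff must be in 1..n_moves")

    # A serial search stops at the cutoff move, or searches everything.
    last_needed = n_moves if first_cutoff is None else first_cutoff
    serial_time = last_needed

    # Threads search moves in batches of n_threads; a whole batch is
    # searched even if the cutoff move is the first one in it.
    parallel_time = math.ceil(last_needed / n_threads)
    searched = min(n_moves, parallel_time * n_threads)
    wasted_work = searched - last_needed
    return serial_time, parallel_time, wasted_work


if __name__ == "__main__":
    # CUT node with good ordering: the first move already fails high.
    print(split_cost(1, 32, 16))
    # ALL node: every move must be searched, so the split pays off fully.
    print(split_cost(None, 32, 16))
```

In this model the CUT node gains nothing from the split and the other 15 threads do work a serial search would never have done, while the ALL node gets the full 16x. Better move ordering makes it easier to guess which kind of node you are at before splitting.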
