Latest SMP update

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Post by bob

I've finally done the verification process to the best of my ability. After a couple of weeks of testing, tuning, instrumenting, etc., I don't find anything odd anywhere.

First, the new version using 8 cores vs. itself using 1 core, searching to a fixed depth of 12 plies (Kai's test):

Rank Name           Elo   +   -  games  score  oppo.  draws
   1 Crafty-25.0   2602   2   2  12451    51%   2598    51%
   2 Crafty-25.0x  2598   2   2  12451    49%   2602    51%

25.0 is the latest version using 8 cores; 25.0x is the same version using only one core. So there is no unexpected strength gain from the SMP search at fixed depth, which is what I expected.

So far I only have the 1-cpu run and four 20-cpu runs to work with. I'll give those results now and fill in the 2, 4, 8 and 16 cpu results as they become available.

I did the usual 4 runs over the same set of positions. For each run I give the average speedup (the sum of the individual 1-cpu search times divided by the sum of the individual 20-cpu search times), then the geometric mean of the per-position speedups:

run 1: 13.10 / 13.54 (mean / geometric mean)
run 2: 11.84 / 12.41
run 3: 10.53 / 13.54
run 4: 16.61 / 16.61

combined mean: 13.04
combined geometric mean: 13.95
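A minimal Python sketch of the two speedup figures, assuming the per-position search times are available as lists of seconds. The function names and the sample times are hypothetical, not Crafty's actual test harness. The ratio of sums is dominated by the positions that take longest to search, while the geometric mean gives every position equal weight.

```python
from statistics import geometric_mean


def average_speedup(serial_times, parallel_times):
    """Sum of the 1-cpu search times divided by the sum of the N-cpu times."""
    if len(serial_times) != len(parallel_times):
        raise ValueError("need one serial and one parallel time per position")
    return sum(serial_times) / sum(parallel_times)


def geometric_mean_speedup(serial_times, parallel_times):
    """Geometric mean of the per-position speedups t1 / tN."""
    if len(serial_times) != len(parallel_times):
        raise ValueError("need one serial and one parallel time per position")
    return geometric_mean([t1 / tn for t1, tn in zip(serial_times, parallel_times)])


# Hypothetical search times (seconds) for three positions.
serial = [100.0, 200.0, 50.0]
parallel = [10.0, 25.0, 2.5]
print(average_speedup(serial, parallel))         # 350 / 37.5
print(geometric_mean_speedup(serial, parallel))  # cube root of 10 * 8 * 20
```

If the combined figure is formed as the geometric mean of the four per-run geometric means (an assumption, since the method isn't stated), `geometric_mean([13.54, 12.41, 13.54, 16.61])` gives about 13.94, close to the 13.95 above.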

Next, NPS scaling: average = 16.4x (20.0x would be the maximum)

Finally, search overhead, i.e. the percentage of extra nodes searched compared to the serial search: 26.8%
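The NPS scaling and overhead numbers can be sketched the same way, assuming total node counts and search times are available for the 1-cpu and N-cpu runs over the same positions. The names and numbers here are hypothetical, and the post doesn't say whether its figures are computed from totals or averaged per position.

```python
def nps_scaling(serial_nodes, serial_time, parallel_nodes, parallel_time):
    """Parallel nodes/second divided by serial nodes/second (ideally close to N for N cpus)."""
    return (parallel_nodes / parallel_time) / (serial_nodes / serial_time)


def search_overhead_pct(serial_nodes, parallel_nodes):
    """Extra nodes searched by the parallel search, as a % of the serial node count."""
    return 100.0 * (parallel_nodes - serial_nodes) / serial_nodes


# Hypothetical totals: the serial search takes 10 s for 1,000,000 nodes,
# the parallel search takes 1 s but visits 1,250,000 nodes.
print(nps_scaling(1_000_000, 10.0, 1_250_000, 1.0))  # 12.5 (x NPS scaling)
print(search_overhead_pct(1_000_000, 1_250_000))     # 25.0 (% overhead)
```

In this toy case the time-to-depth speedup is 10 s / 1 s = 10, which is exactly 12.5 / 1.25: the extra 25% of nodes eats part of the raw NPS gain.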

On initial investigation, the overhead looks much more like what I have seen over the years; 30% is a number I have quoted too many times to count.

I am probably going to re-run these, because a couple of students were running 1-, 2- and 4-cpu tests for a few minutes here and there during the middle 2 tests, which would likely inflate those times a bit. Even so, a speedup of 14x out of 20 (or really 14x out of 16x, since the NPS scaling puts a hard upper bound of about 16x on the possible speedup) is pretty good.

It will be interesting to see the 16 cpu numbers. Something tells me the new algorithmic approach is going to be right up there with DTS, since it does everything DTS does except for the iterative vs. recursive search issue. This version does "pre-splits", which give about the same benefit as the DTS algorithm, with maybe a bit more accuracy.
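The reason NPS scaling caps the speedup can be written out in a few lines. The notation is introduced here, not taken from the post: $N_1, T_1$ are the nodes and time of the 1-cpu search, $N_p, T_p$ those of the parallel search, $R$ is the NPS scaling, $o$ is the overhead as a fraction, and $S$ is the speedup.

```latex
\begin{aligned}
R &= \frac{N_p / T_p}{N_1 / T_1}, \qquad o = \frac{N_p - N_1}{N_1}, \\
S &= \frac{T_1}{T_p} = R \cdot \frac{N_1}{N_p} = \frac{R}{1 + o} \le R \quad (\text{when } o \ge 0), \\
S &\approx \frac{16.4}{1 + 0.268} \approx 12.9 .
\end{aligned}
```

Plugging in the reported averages gives roughly 12.9, in the same range as the measured mean of 13.04. The agreement can only be approximate, presumably because the three figures are averaged over the positions separately. The bound $S \le R$ holds only as long as the parallel search does not search fewer nodes than the serial one.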

More later. If there are questions, fire away...