Crafty SMP measurement

bob · Post by **bob** » Mon Apr 04, 2016 6:04 am

I have a question, brought on by my imminent retirement from UAB. I have finally decided that 46 years is enough, so as of the middle of May, I will be happily unemployed. I have one final paper I want to write, dealing with Crafty's SMP search, and here's the question:

(1) I have decided that one time control is not very useful, so I am thinking about running tests for 10 seconds per move, then 60 seconds per move, and finally 3-5 minutes per move. I plan on reporting the speedup for each time control, since it is pretty clear that the speedup is worse at fast time controls. I might, after analyzing the data, decide to drop the 10 to 5, and maybe the 60 to 30 seconds per move...

(2) the next issue is more significant. I have always computed the speedup for max cores, then used the SAME search depth for all the other runs (2, 4, 8, 16, 20 and beyond) but those 2, 4 etc runs take correspondingly longer. So I intend to do this:

(a) I am going to run the 20-core run for 10 minutes per position, or 4 hours.

(b) I am going to run the 1-core run to the max depth that was completed for each 20 core run. This run will take days.

(c) and for the rest, I plan on running EVERY test for 10 minutes per position, as I did for (a). And then when I report speedups, they will actually use the same time per move rather than the same depth. It doesn't seem very interesting to see a 2.0 speedup where 1 cpu takes 120 minutes and the 2 cpu run takes 60 minutes. What I will do is, for each number of cores, take the max depth reached within each time limit I want to report, and then use the time the 1 core run needed to reach that same depth. This way each test will be to the same time limit, although obviously the runs with more cores will go deeper.
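As a rough sketch of the bookkeeping described in (c), the per-position calculation might look like the Python below. The log format (a mapping from completed depth to elapsed seconds), the function names and the sample numbers are all assumptions made for illustration; none of them come from Crafty. The sketch also assumes the speedup is the 1-core time to reach a depth divided by the N-core time to finish that same depth; dividing by the time limit itself would be another reasonable reading.

```python
def max_depth_within(depth_times, limit):
    """Return the deepest iteration completed within `limit` seconds, or None."""
    done = [depth for depth, seconds in depth_times.items() if seconds <= limit]
    return max(done) if done else None


def speedup_at_limit(one_core, n_core, limit):
    """Return (depth, speedup) for one position at one time limit.

    `one_core` and `n_core` map completed depth -> elapsed seconds
    (hypothetical log format). Returns None if no depth qualifies or the
    1-core run never reached that depth.
    """
    depth = max_depth_within(n_core, limit)
    if depth is None or depth not in one_core:
        return None
    return depth, one_core[depth] / n_core[depth]


# Made-up timings for a single position, for illustration only.
one_core = {20: 30.0, 21: 70.0, 22: 160.0, 23: 380.0}
four_core = {20: 9.0, 21: 21.0, 22: 50.0, 23: 118.0}

print(speedup_at_limit(one_core, four_core, 60.0))
```

Because the limit is just a parameter, the same pair of logs can be queried at 10, 60 or 180 seconds without rerunning anything.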

Any thoughts on this sort of measurement? I REALLY want to see the speedup for fast, not-so-fast and 3+ minutes per move to see how they compare. This lets me run the 1 core test which will take, as I said, days, and then each other test will only need 10 minutes per position or 4 hours. Total time will be 1 core + 5 * 4 hours since I will run 2, 4, 8, 16 and 20. And I will repeat each test (excluding 1 core) four times to deal with variability issues. So whatever 1 cpu takes + just under 4 days for the multi-core runs, which is not so bad.
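The run-time estimate above can be checked with a few lines of arithmetic. This is only a sketch: the count of 24 positions is inferred from "10 minutes per position, or 4 hours" and is not stated outright, and the constant names are made up for illustration.

```python
MINUTES_PER_POSITION = 10
POSITIONS = 24                    # assumption: 4 hours / 10 minutes per position
CORE_COUNTS = [2, 4, 8, 16, 20]   # multi-core configurations to be run
REPEATS = 4                       # each multi-core test repeated four times

hours_per_run = POSITIONS * MINUTES_PER_POSITION / 60
single_pass_hours = hours_per_run * len(CORE_COUNTS)
total_hours = single_pass_hours * REPEATS
total_days = total_hours / 24

print(f"one pass over all core counts: {single_pass_hours:.0f} hours")
print(f"with repeats: {total_hours:.0f} hours ({total_days:.2f} days)")
```

Under these assumptions the multi-core runs come to 80 hours, which fits the "just under 4 days" figure; the 1-core run is extra.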

Any thoughts or suggestions?

zullil · Post by **zullil** » Mon Apr 04, 2016 6:28 pm

bob wrote: Any thoughts or suggestions?

Make sure you retain access to the UAB computing resources. Get that in writing.

Hope you enjoy your retirement.

mvk · Post by **mvk** » Mon Apr 04, 2016 6:33 pm

In chess game playing or analysis, we primarily add cores to search a larger tree at the same time control, not to search the same tree faster: the game conditions and the available time for analysis don't change, they are dictated by the environment. We can only add cores and see how the system changes.

The larger tree gives added strength through Thompson's mechanism. Therefore I suggest quantifying the gain in strength directly (through Elo or wilo), rather than through an indirect measure as you do when comparing to a single-core tree at 1000x more time: we don't have 1000x more time when we have a single core, we have the same amount of time. So that is what should be compared, from my POV. Same time X different configs -> resulting performance.

I think of it like this: if our computers were 1000x faster today, our single-core search would be quite different from what we do now, because we would use the additional processing in a different manner: software improvements or tuning that still need to be discovered, or those we do know but can't use at the current time controls, will play a role. Tree shaping, raw speed and time control are interlinked. We know that because not every improvement works out the same at every time control, even when compensating for draw rate differences.

Comparison with Stockfish scaling and Komodo scaling using the same metrics to put the Crafty program in perspective.
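For the direct strength measurement suggested above, a match between two configurations at the same time control (say N cores versus 1 core) can be turned into an Elo difference with the standard logistic formula. The sketch below is only illustrative: the function name and the match result are hypothetical, and it gives a point estimate with no error bars.

```python
import math


def elo_difference(wins, draws, losses):
    """Elo difference implied by a match score, using the logistic Elo model."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    if score <= 0.0 or score >= 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


# Hypothetical 100-game match, numbers invented for illustration.
print(round(elo_difference(wins=40, draws=45, losses=15), 1))
```

A real comparison would need enough games to bring the confidence interval well below the differences being measured.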

bob · Post by **bob** » Mon Apr 04, 2016 8:22 pm

mvk wrote: In chess game playing or analysis, we primarily add cores to search a larger tree at the same time control, not to search the same tree faster: the game conditions and the available time for analysis don't change, they are dictated by the environment. We can only add cores and see how the system changes.

The larger tree gives added strength through Thompson's mechanism. Therefore I suggest quantifying the gain in strength directly (through Elo or wilo), rather than through an indirect measure as you do when comparing to a single-core tree at 1000x more time: we don't have 1000x more time when we have a single core, we have the same amount of time. So that is what should be compared, from my POV. Same time X different configs -> resulting performance.

I think of it like this: if our computers were 1000x faster today, our single-core search would be quite different from what we do now, because we would use the additional processing in a different manner: software improvements or tuning that still need to be discovered, or those we do know but can't use at the current time controls, will play a role. Tree shaping, raw speed and time control are interlinked. We know that because not every improvement works out the same at every time control, even when compensating for draw rate differences.

Comparison with Stockfish scaling and Komodo scaling using the same metrics to put the Crafty program in perspective.

This was my thinking also. Hence my idea of taking a 3 minute search at 2 cores and measuring the equivalent time for the same search at 1 core. Then a 3 minute search at 4 cores compared to the same search at 1, etc. A LOT less time required, and the numbers actually mean something. The 60 minute search at 1 core taking 30 minutes at 2 cores is not very relevant IMO.

I am going to run this experiment over the next week or two and see how the numbers look. The way I am running it I can use most any time per move since shorter times are always included in the longer time runs...

Mincho Georgiev · Post by **Mincho Georgiev** » Tue Apr 05, 2016 9:24 am

I really hope we will have access to it. Is it gonna be proprietary after you submit it to UAB?
I'd be very interested in fixed time per move results (serial vs parallel) compared to fixed time per 40 moves (game) test. (ELO)

bob · Post by **bob** » Tue Apr 05, 2016 5:42 pm

Mincho Georgiev wrote: I really hope we will have access to it. Is it gonna be proprietary after you submit it to UAB?
I'd be very interested in fixed time per move results (serial vs parallel) compared to fixed time per 40 moves (game) test. (ELO)

No, it won't be proprietary.. I will submit to the JICGA and also post an electronic copy on my web page...

Mincho Georgiev · Post by **Mincho Georgiev** » Tue Apr 05, 2016 6:33 pm

bob wrote:
Mincho Georgiev wrote: I really hope we will have access to it. Is it gonna be proprietary after you submit it to UAB?
I'd be very interested in fixed time per move results (serial vs parallel) compared to fixed time per 40 moves (game) test. (ELO)
No, it won't be proprietary.. I will submit to the JICGA and also post an electronic copy on my web page...

Awesome, thank you!
