Crafty SMP measurement
Posted: Mon Apr 04, 2016 6:04 am
I have a question, brought on by my imminent retirement from UAB. I have finally decided that 46 years is enough, so as of the middle of May, I will be happily unemployed. I have one final paper I want to write, dealing with Crafty's SMP search, and here's the question:
(1) I have decided that one time control is not very useful, so I am thinking about running tests for 10 seconds per move, then 60 seconds per move, and finally 3-5 minutes per move. I plan on reporting the speedup for each time control, since it is pretty clear that the speedup is worse at fast time controls. I might, after analyzing the data, decide to drop the 10 seconds per move to 5, and maybe the 60 seconds per move to 30...
(2) The next issue is more significant. I have always computed the speedup for max cores, then used the SAME search depth for all the other runs (2, 4, 8, 16, 20 and beyond), but those 2, 4, etc. runs take correspondingly longer. So I intend to do this:
(a) I am going to run the 20-core run for 10 minutes per position, or 4 hours.
(b) I am going to run the 1-core run to the max depth that was completed for each position in the 20-core run. This run will take days.
(c) And for the rest, I plan on running EVERY test for 10 minutes per position, as I did for (a). Then when I report speedups, they will actually use the same time per move rather than the same depth. It doesn't seem very interesting to see a 2.0 speedup where 1 cpu takes 120 minutes and the 2 cpu run takes 60 minutes. What I will do is, for each different number of cores, take the max depth reached within each time limit I want to report, and then look up the time the 1-core run needed to complete that same depth. This way each test will be to the same time limit, although obviously each test will go deeper as additional cores are used.
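To make (c) concrete, here is a rough Python sketch of the lookup as I read it. It is not Crafty code. The log format (a list of (depth, cumulative seconds) pairs per position), the function names and the sample numbers are all made up for illustration. It also assumes the multi-core time used in the ratio is the time at which that depth actually completed, not the nominal time limit.

```python
def deepest_completed(log, limit):
    """Return (depth, seconds) for the deepest iteration finished within
    `limit` seconds, or None if no iteration finished in time.
    `log` is a hypothetical list of (depth, cumulative_seconds) pairs."""
    done = [(depth, secs) for depth, secs in log if secs <= limit]
    return max(done) if done else None


def speedup(one_core_log, n_core_log, limit):
    """Same-time speedup for one position: find the deepest depth the
    N-core run completed within `limit`, then divide the 1-core time to
    complete that depth by the N-core time to complete it."""
    hit = deepest_completed(n_core_log, limit)
    if hit is None:
        return None
    depth, t_n = hit
    t_1 = dict(one_core_log).get(depth)
    if t_1 is None:  # the 1-core run never reached that depth
        return None
    return t_1 / t_n


# Made-up logs for a single position (depth, cumulative seconds).
one_core = [(20, 8.0), (21, 20.0), (22, 50.0), (23, 130.0)]
four_core = [(20, 3.0), (21, 7.0), (22, 16.0), (23, 40.0)]

for limit in (10, 60):
    print(limit, speedup(one_core, four_core, limit))
```

With these invented numbers the 10-second limit compares at depth 21 and the 60-second limit at depth 23, so each reported time control gets its own reference depth from the single long 1-core run.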
Any thoughts on this sort of measurement? I REALLY want to see the speedup for fast, not-so-fast and 3+ minutes per move to see how they compare. This lets me run the 1 core test which will take, as I said, days, and then each other test will only need 10 minutes per position or 4 hours. Total time will be 1 core + 5 * 4 hours since I will run 2, 4, 8, 16 and 20. And I will repeat each test (excluding 1 core) four times to deal with variability issues. So whatever 1 cpu takes + just under 4 days for the multi-core runs, which is not so bad.
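The arithmetic behind that estimate, as a quick Python check. The only number not stated above is the position count, which is inferred here from 4 hours at 10 minutes per position; the constant names are mine.

```python
MINUTES_PER_POSITION = 10
HOURS_PER_RUN = 4                 # one full pass over the test positions
CORE_COUNTS = [2, 4, 8, 16, 20]   # multi-core configurations to run
REPEATS = 4                       # each multi-core test repeated four times

# Inferred, not stated in the post: 4 hours / 10 minutes per position.
positions = HOURS_PER_RUN * 60 // MINUTES_PER_POSITION

multi_core_hours = len(CORE_COUNTS) * REPEATS * HOURS_PER_RUN
multi_core_days = multi_core_hours / 24

print(positions, multi_core_hours, round(multi_core_days, 2))
# prints: 24 80 3.33
```

The 1-core run comes on top of this and is the part that takes days.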
Any thoughts or suggestions?