Bob, I think that if one tries to fully "emulate" a sequential algorithm such as alpha-beta on a parallel system, you need a lot of communication between the processes, simply because of that sequential nature. Rybka Cluster runs on "ordinary" PCs, fast ones, probably some octals but maybe also a few slower or smaller quads, all owned by Lukas Cimiotti. So you start with uneven capabilities per cluster node. Maybe that is not true and they are five full octals, but that does not change the argument substantially. I think the computers communicate over some network setup with limited bandwidth. I don't know much about it, but I think that with your InfiniBand cluster Crafty could do a lot more, communication-wise? So I think it is a trade-off, and Cluster Rybka just has to do the best she can with very limited message passing. The CPUs simply need to work more independently than in a fully sequential algorithm. In your example you would like the CPUs to spend 90% of their time on the first move, dividing the work very closely, but this creates too much overhead in the PC cluster.

bob wrote:
This is rare, but does happen. But it does not "mean nothing". There is no other viable way to measure parallel speedup. We don't need guesses, approximations, and such, when precise numbers are easy to obtain...

Uri Blass wrote:
Time-to-ply measurement means nothing if the program does not play the same move at the same depth.

bob wrote:
The problem is that from a parallel processing research point of view, "speedup" is _the_ number we want to see. That is a linear function, whereas Elo is not necessarily linear. Everyone seems to believe in diminishing returns, which means Elo doesn't linearly increase with speedup/search depth. Comparing SMP searches can only be done with time-to-ply measurements...
We don't really care what the strength improvement is, just what the parallel speedup is...

Uri Blass wrote:
The problem is that fixed-depth search times may be misleading, because the same depth with 4 CPUs does not mean the same as the same depth with 1 CPU.

bob wrote:
I was more interested in the speedup scaling, since Vas has claimed publicly that his speedup is better than anybody else's... I just don't have any good boxes with Windows or I would try that myself...

Uri Blass wrote:
I think that fixed depth may be misleading because Rybka may play better at fixed depth with more cores thanks to doing less pruning at the same depth.

bob wrote:
Those numbers sound pretty reasonable. I'm not so happy with "those" that report numbers that are simply fictional, and which anybody that has done any reading or research into parallel search could immediately recognize as bogus.
I still hope that one day someone will post some real numbers on Rybka's parallel speedup on an 8-way box, by running some "normal" positions to a fixed depth using 1, 2, 4 and 8 processors. He claims to scale better than any other program. Somehow I doubt it. Maybe "as good as" if he is lucky. But so far we just have urban legend to go on. Speedups for my program are quite easy to produce and anybody can do it.
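A rough sketch of the time-to-depth measurement described above, assuming you already have, for each processor count, the seconds needed to reach the same fixed depth on the same set of positions. The function name, the dictionary layout, and the choice to compare total times (rather than, say, averaging per-position ratios) are assumptions made for illustration, and the timings are invented placeholders, not measurements of any engine.

```python
def time_to_depth_speedup(times_by_cpus):
    """Return {cpus: (speedup, efficiency)} from time-to-depth measurements.

    times_by_cpus maps a CPU count to a list of search times in seconds,
    one per test position, all searched to the same fixed depth.
    The 1-CPU entry is the baseline.
    """
    baseline = times_by_cpus[1]
    base_total = sum(baseline)
    results = {}
    for cpus, times in sorted(times_by_cpus.items()):
        if len(times) != len(baseline):
            raise ValueError("every CPU count needs a time for every position")
        speedup = base_total / sum(times)          # total-time ratio vs 1 CPU
        results[cpus] = (speedup, speedup / cpus)  # efficiency = speedup per CPU
    return results


# Placeholder timings, invented for illustration only (not measurements).
example_times = {
    1: [100.0, 80.0, 120.0],
    2: [60.0, 50.0, 40.0],
    4: [32.0, 30.0, 18.0],
}

for cpus, (speedup, efficiency) in time_to_depth_speedup(example_times).items():
    print(f"{cpus} CPUs: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

This only measures how much faster a given depth is reached. It says nothing about whether the move chosen at that depth is the same, which is the objection raised in the replies.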
It is possible to test it simply by playing a fixed-depth match between Rybka single core and Rybka 4 cores.
Uri
And the only good test of effective speedup is games between Rybka on 4 CPUs and Rybka on 1 CPU with unequal time controls.
For example, if Rybka on 4 CPUs wins by a score like 5300:4700 after 10,000 ponder-off games despite a 3:1 time handicap, then you can say that the effective speedup is more than 3:1, and it may be a good idea to try a 3.5:1 time handicap.
I have no time for this type of test; hopefully others can do it.
Uri
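As a sketch of how a result like 5300:4700 over 10,000 games could be read: the standard logistic Elo formula turns the score fraction into a rating difference, and a conservative bound on the standard error shows whether the margin is clearly above 50%. The function names are made up for this example. The error bound assumes independent games and uses the worst case of no draws (per-game standard deviation at most 0.5), since the draw rate is not given.

```python
import math


def elo_difference(score_fraction):
    """Elo difference implied by a score fraction (logistic Elo model)."""
    return 400.0 * math.log10(score_fraction / (1.0 - score_fraction))


def min_z_score(points, games):
    """Lower bound on how many standard errors the score is above 50%.

    A single game result lies in [0, 1], so its standard deviation is at
    most 0.5; draws only make it smaller, so the true z-score is at least this.
    """
    score = points / games
    max_std_error = 0.5 / math.sqrt(games)
    return (score - 0.5) / max_std_error


points, games = 5300, 10_000  # the hypothetical result from the post
print(f"Elo difference: {elo_difference(points / games):+.1f}")
print(f"at least {min_z_score(points, games):.1f} standard errors above 50%")
```

If the handicapped side still scores clearly above 50%, the effective speedup exceeds the handicap ratio, and the ratio can be raised (for example to 3.5:1) until the score drops to about even.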
It is possible that SMP Rybka plays better moves at depth 10 than single-processor Rybka because SMP Rybka does less pruning.
Uri
If it plays better moves by pruning less, then the sequential algorithm ought to prune less and play better as well. This argument is circular and leads nowhere...
Eelco