Yeah, this.

hgm wrote:
Sorry, that is nonsense. If A and B always played the same game against each other, and A happened to win it, that would not prove that A is stronger at all. It could very well be that B would win starting from every position that does not occur in that game. (In practice this could even happen, e.g. because the opening book of the far stronger engine B contains an error that allows a book win.)
Ricardo is right, with the caveat that one should not go to extremes: if the randomness were so high that it started to decide the result by chance (e.g. by randomly starving processes of CPU so that they always lose on time before they can complete 20 moves), that would qualify as "too much randomness". But in typical testing conditions we are very far from that limit.
I guess I should have been even clearer and said that all randomness is good, as long as you have fixed the conditions you want to fix: the programs, the time control and the available hardware power (plus perhaps a few other implicit things I'm forgetting now). Most other things should be left to randomness, which will help your statistical measurements. Non-determinism coming from the parallel search definitely helps testing.
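To make the statistical point concrete, here is a minimal sketch (my own illustration, not anyone's actual test harness) of the usual score estimate with an approximate 95% error bar. The function name `score_with_error` is hypothetical, and the formula assumes each game is an independent sample, which is exactly what the randomness in the test setup is supposed to provide.

```python
import math

def score_with_error(results):
    """Mean score and approximate 95% half-width of a match.

    results: per-game scores from engine A's point of view
    (1.0 = win, 0.5 = draw, 0.0 = loss). Assumes the games are
    independent samples; needs at least two games.
    """
    n = len(results)
    mean = sum(results) / n
    # Sample variance of the per-game scores.
    var = sum((r - mean) ** 2 for r in results) / (n - 1)
    # Standard error of the mean, scaled to a ~95% normal interval.
    stderr = math.sqrt(var / n)
    return mean, 1.96 * stderr

# 100 varied games: the error bar reflects real sampling spread.
print(score_with_error([1.0] * 30 + [0.5] * 40 + [0.0] * 30))
# 100 copies of one deterministic game: the formula reports a zero
# error bar, but this is really a single sample, as hgm describes.
print(score_with_error([1.0] * 100))
```

The second call shows why repeating an identical game is misleading: the arithmetic happily reports a perfect score with no uncertainty, even though only one game's worth of information was collected.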