Yes, it would be more random. The point is to make the games random in the first place. Randomness is essential in running these tests: as your mathematician friend pointed out in an email, you can't draw statistical conclusions from results that aren't random. You don't know what kind of distribution testing by time gives at all. In fact, for a given search and search time, I'd expect a typical engine that checks the timer every X nodes to have very few possible node counts, likely only two or three in a controlled environment like yours, depending on X. The picture gets much more complicated because an unpredictable (but not random) element is added at every move, which of course leads to a large tree of possible games, though not one with much statistical meaning. On this point, I would say that a random number of nodes for each move, rather than per game, would be better, with each engine of course getting the same number of nodes to keep it fair.

bob wrote:
The only difference I can see between the two approaches is the distribution of node counts. If you use 3M +/- 500K, you would probably get a uniform distribution evenly scattered over the interval, unless you modified the PRNG to produce a normal rather than uniform distribution (easy enough to do, to be sure). If we set the time so that the average search covers about 3M nodes, and assume some upper/lower bound of, say, 500K nodes, we get a normal distribution centered on 3M. Now are you going to tell me that the uniform distribution is somehow better? Or that it somehow more accurately simulates the real world?
So, again, I don't see any possible advantage other than repeatability, which is worthless in this context: we already know how to get perfect repeatability, but it produces too many duplicate games to provide useful information...
If you run the same test twice, using random numbers to produce a uniform distribution from 2.5M to 3.5M, that would seem to be _more_ random than the current normal distribution centered on 3M. Why would we want to go _more_ random when we know the results change with each different node count?
When you talk about a normal distribution coming from the PRNG, then there is probably not much difference. I think, and this is merely philosophical, not a point I'm trying to make, that testing with a uniform distribution would give better results: if an engine can produce a good move across a wide range of possible search times, it uses a better, more robust and "universal" algorithm, not one that merely plays well given "around 3M nodes". But search times do not follow a normal distribution. I would guess that your setup reduces variance quite a lot, so that the distribution is much narrower and much less random.
An additional point is that most engines only check for timeout every X nodes, so a window of +/-500K only guarantees 1M/X different possible samples. Mine, for instance, checks every 10000 nodes, so that's 100 different possible node counts. Ideally, engines should be modified to check every node. Since the games are based on node count rather than time, this wouldn't hurt; it would only add robustness.
A good analogy to this is Monte Carlo analysis, which in a way this could be considered to be. If you don't have a good random number source, then the MC samples are biased, and you can't draw reasonable conclusions from them, as your first post in the other thread shows quite clearly.