MartinBryant wrote:
OK Karl,
here's another experiment for you, not quite the one we were discussing but perhaps shedding some more light...
I thought it would be interesting to play a game between two engines at a fixed node count and then gradually increase that count to see how long it took until the game changed. I thought I would increase the node count by 1, 2, 4, 8, 16, etc. nodes, doubling each time in case it took quite a while for a change to occur.
[snip]
So in 12 games we have 7 unique games, where the maximum percentage change was 0.1024%.
Also, most changes (all much smaller than our previously discussed 1000) produce a game change.
Now I guess this data doesn't help much with your desired correlation test but to a layman it does seem to add even more fuel to the 'boy isn't it erratic!' side of the argument.

Yes, it is impressive that a change of a mere two nodes made the game different. It is impressive that the games were changed by node counts well below what we were discussing.
To me it is even more telling that not all twelve games were drawn. There was one win for black in there along with eleven draws. But as long as we are just eyeballing results, eleven draws out of twelve doesn't seem like a highly erratic result.
To relate this to the Crafty vs. Fruit discussion, suppose that our bet about correlated game results had arisen in a slightly different way than it actually did. Suppose that we had been discussing the probability of draws as opposed to decisive results. Suppose that a great mass of data from all sorts of positions shows that Spike playing itself draws 40% of the time. I might have claimed that, due to correlated game results, if I can peek at the result of the first playout from a position, and if that first playout is a draw, I'll bet that more than half of the remaining playouts from this position are draws. You would have said that my bet is a loser because there are only 40% draws on average, and there is a ton of variation in playouts due to changing node counts. I would have won this bet about correlation in convincing fashion.
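To put a rough number on that hypothetical bet, here is a minimal sketch of what the "no correlation" side would predict. It assumes the supposed 40% draw rate from the paragraph above, treats the remaining playouts as independent, and takes 11 remaining playouts (twelve games minus the one peeked at). The function name and constants are illustrative only.

```python
from math import comb


def prob_at_least(k_min: int, n: int, p: float) -> float:
    """P(K >= k_min) for K ~ Binomial(n, p), i.e. n independent playouts."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# Hypothetical numbers: 40% is the supposed draw rate from the post, and
# 11 assumes one of the twelve games was the "peeked" first playout.
P_DRAW = 0.40
REMAINING = 11

# "More than half" of 11 remaining playouts means at least 6 draws.
p_majority = prob_at_least(REMAINING // 2 + 1, REMAINING, P_DRAW)
# At least 10 draws out of the 11 remaining playouts.
p_ten_plus = prob_at_least(10, REMAINING, P_DRAW)

print(f"P(majority draws | independent playouts) = {p_majority:.4f}")
print(f"P(10+ draws | independent playouts)      = {p_ten_plus:.6f}")
```

If playouts really were independent, the first number is the chance the bet pays off; a result like eleven draws in twelve games sits far out in the tail measured by the second.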

MartinBryant wrote:
So is there another experiment you could propose using Spike v Spike or Spike v A.N.Other (I presume I can find another UCI engine that supports fixed node counts) which would help you?

Yes indeed. Thanks for offering! Let's try Spike against another bot where node counts can be fixed. Let's pick 201 random starting positions that are roughly balanced. For each position, pick which color Spike plays at random.
Now play out each position twice with Spike playing the same color both times. In the first playout, use 1,000,000 nodes for both engines, and in the second use 1% more, i.e. 1,010,000 nodes for both engines. If the two bots' NPS (nodes per second) are vastly different, choose a constant ratio of node counts to make the games roughly even. For example, if the opponent is slower/smarter by a factor of two, in the first game use 1,000,000 nodes for Spike and 500,000 for the opponent, and in the second game use 1,010,000 nodes for Spike and 505,000 for the opponent. Use whatever ratio makes the engines roughly the same strength, since we get the most information from nearly equal games.
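The node budgets described above can be sketched as a small helper. This is only an illustration of the arithmetic; `node_budgets`, `opponent_ratio`, and `bump` are made-up names, and nothing here talks to an actual engine.

```python
def node_budgets(base_nodes: int, opponent_ratio: float = 1.0, bump: float = 0.01):
    """Return [(spike_nodes, opponent_nodes)] for the two playouts of one position.

    opponent_ratio is the constant ratio of opponent nodes to Spike nodes
    (0.5 if the opponent is slower/smarter by a factor of two).
    bump is the fractional increase for the second playout (0.01 = 1% more).
    """
    budgets = []
    for scale in (1.0, 1.0 + bump):
        spike = round(base_nodes * scale)
        opponent = round(spike * opponent_ratio)
        budgets.append((spike, opponent))
    return budgets


# Equal node counts for both engines.
print(node_budgets(1_000_000))
# Opponent gets half of Spike's nodes in both playouts.
print(node_budgets(1_000_000, opponent_ratio=0.5))
```

Keeping the ratio fixed across both playouts means the only thing that changes between the first and second game is the 1% bump.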
For each pair of games record the ordered pair of Spike's two results, e.g. (0.5, 1) for a draw and win, or (1, 0) for a win and a loss. That will give us 201 measurements of random variable X (=first playout) and random variable Y (=second playout). If there is no correlation, then Spike's wins, draws, and losses in the first playout won't especially line up with its wins, draws, and losses in the second playout. I'm betting they will line up, though, perhaps even enough for you to eyeball without calculating the coefficient of correlation.
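For anyone who would rather calculate than eyeball, here is a minimal sketch of the correlation coefficient for such ordered pairs, plus a tally of how often each (first, second) combination occurs. It uses the plain Pearson formula on Spike's scores (0, 0.5, 1); the sample data is made up purely to show the call and is not a real result.

```python
from collections import Counter
from math import sqrt


def pearson(pairs):
    """Pearson correlation between first-playout (X) and second-playout (Y) scores."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one playout never varies")
    return cov / sqrt(var_x * var_y)


# Made-up placeholder pairs (Spike's score in playout 1, playout 2).
example = [(1, 1), (0.5, 0.5), (0, 0.5), (0.5, 0.5), (1, 0.5), (0, 0)]

r = pearson(example)
table = Counter(example)  # how often each (first, second) outcome pair occurs

print(f"r = {r:.3f}")
for outcome, count in sorted(table.items()):
    print(outcome, count)
```

A value of r near 0 would mean the first playout tells you little about the second; a value well above 0 is the "lining up" being bet on. The tally is the eyeball version of the same thing.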
As Kenny Dail was saying, I expect there to be more correlation in game results if the 201 positions are chosen from a bit later in the opening. Although I'm asking you to tilt the result in a way that I expect to be in my favor by picking such positions instead of running from the initial position of chess, using later positions is a good idea anyway since it is more like what Bob was doing, right?
It is way cool of you to offer to run an experiment, since I'm not able to do anything but spout off theories myself.