Michel wrote:
> Yes thanks for the experiment! I agree that the point was to see how much better non-uniform sampling can be, all performance issues aside. Your SD reduction seems to be 2.8. I am curious to see if the node expansion approach can do better. Theoretically I think it cannot. But who knows.
>
> EDIT: Just to be absolutely sure (I know the question will sound stupid): I assume you keep the hash table between the 1000-walk runs, so that the later runs use a hash table with "ideal" weights?

Yes, I kept the hash table between the 1000-walk runs. The weights seemed to have stabilized after about 70000 1000-walk runs.

This was just an experiment to see how far you could get with non-uniform sampling. For the record, I ran a longer test overnight and with a bigger hash table. The standard deviation for the average-of-1000 walks seems to have stabilized at about 3.75e16, compared to 1.05e17 for uniform sampling.
I have now tried to use node expansion to combine random walks with subtree splitting where the variance is high. The method doesn't seem to work, but here is what I did:
Each tree node contains an estimate of the subtree size below the node and an estimate of the standard deviation for the subtree size estimate.
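A minimal sketch of such a node might look as follows. This is my own illustration, not taken from the linked code; the class and field names are invented:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node type: each node carries an estimate of the size of
// the subtree below it and the standard deviation of that estimate.
class EstNode {
    double sizeEst;   // estimated subtree size
    double sdEst;     // estimated standard deviation of sizeEst
    final List<EstNode> children = new ArrayList<>(); // empty while a leaf

    EstNode(double sizeEst, double sdEst) {
        this.sizeEst = sizeEst;
        this.sdEst = sdEst;
    }

    boolean isLeaf() { return children.isEmpty(); }

    // Variance is the natural quantity to compare when selecting a leaf.
    double variance() { return sdEst * sdEst; }
}
```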
1. Start with a single node representing the root position.
2. Find a leaf node by starting at the root and going to the child node with the highest estimated variance. Repeat until a leaf node is reached.
3. Split the selected leaf node by adding child nodes for each possible move.
4. Use N random walks for each new child node to get estimates for the tree size and standard deviation. (I have tried N=100 and N=1000.)
5. Propagate the new information back up to the root node, by summing the data for the child nodes.
6. Repeat from step 2.
In step 5, I have also tried to combine the child node data with the data from random walks made before the child nodes were added. I did this as a weighted average of the children's information and the node's own information from random walks.
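For step 5, if the child estimates are treated as independent, the subtree size is the sum of the children's sizes and the variance of that sum is the sum of the children's variances; a weighted average like the one described above would then naturally use inverse-variance weights. A self-contained sketch under that independence assumption (all names are mine):

```java
// Sketch of the propagation in step 5, assuming independent child
// estimates. Not the implementation from the linked code.
final class Propagation {
    // Sum of independent estimates: sizes add, variances add.
    static double[] sumChildren(double[] sizes, double[] sds) {
        double size = 0, var = 0;
        for (int i = 0; i < sizes.length; i++) {
            size += sizes[i];
            var += sds[i] * sds[i];
        }
        return new double[] { size, Math.sqrt(var) };
    }

    // Inverse-variance weighted average of two independent estimates,
    // e.g. the children's summed estimate and the node's own earlier
    // random-walk estimate. Returns {combined estimate, combined sd}.
    static double[] combine(double x1, double sd1, double x2, double sd2) {
        double w1 = 1.0 / (sd1 * sd1);
        double w2 = 1.0 / (sd2 * sd2);
        return new double[] { (w1 * x1 + w2 * x2) / (w1 + w2),
                              Math.sqrt(1.0 / (w1 + w2)) };
    }
}
```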
The problem with this algorithm is that I consistently get estimates that are too small when I try to estimate perft(12). When using N=100 in step 4, I quickly get estimates that are more than 5 standard deviations too low. When I use N=1000, it takes longer for the problem to show up, but eventually it happens in this case too.
If I modify step 2 so that it repeatedly picks a random child node until a leaf node is found, the bias seems to go away. (But this makes the method quite pointless; in that case you might as well just do repeated random walks instead.)
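The difference between the two descent rules in step 2 can be sketched like this (the node type and names are hypothetical stand-ins, not taken from the linked code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Contrast between the original (greedy) and modified (random) descent
// rules from step 2, on a hypothetical node type.
final class Descent {
    static class Node {
        double variance;
        final List<Node> children = new ArrayList<>();
    }

    // Original step 2: always follow the child with the highest
    // estimated variance. This is the rule that appears to introduce
    // the bias described above.
    static Node greedyLeaf(Node root) {
        Node n = root;
        while (!n.children.isEmpty()) {
            Node best = n.children.get(0);
            for (Node c : n.children)
                if (c.variance > best.variance)
                    best = c;
            n = best;
        }
        return n;
    }

    // Modified step 2: pick a uniformly random child until a leaf is
    // reached; empirically the bias seems to go away, at the cost of
    // losing the targeted splitting.
    static Node randomLeaf(Node root, Random rnd) {
        Node n = root;
        while (!n.children.isEmpty())
            n = n.children.get(rnd.nextInt(n.children.size()));
        return n;
    }
}
```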
My guess is that step 2 introduces bias because subtrees that happen to have their size underestimated may also tend to have their variance underestimated, thereby causing them to stay around longer as leaf nodes before they are further subdivided. The effect is that at any given point in time, the tree will on average have an over-representation of nodes with underestimated subtree sizes.
Unless someone can come up with an idea for how to get an unbiased estimator using subtree splitting, I am going to change my mind and instead believe that an optimized version of my dynamic weight adjustment method will be more efficient.
For the record, here is the code I used: http://web.comhem.se/petero2home/PerfTE ... 2.java.txt