Well you said the sqrt is irrelevant but clearly the example says otherwise (unless you find something wrong with it).
The example is wrong since it does only one step. My theorem is for what happens in the limit.
The limit does not imply variance=0.
I could write out the proof (which is trivial), but mathematics on this forum is unreadable since the forum does not implement MathML.
Please note that the example uses sqrt(mean) as the weight, not sqrt(mean^2 + variance). I already mentioned that lim_{N->inf} sqrt(mean^2 + variance) = mean because variance -> 0. But here you are saying variance does not go to zero in the limit, so I am a little confused. Also, the example is not only one step. Assume a perft() estimator which gives you the exact leaf-node estimates. The problem is that 4 * sqrt(4) != 2 * sqrt(8) when we sum the leaf nodes of the left and right moves, so equality of weights is not maintained, as suggested by their perft counts of 16, even in the limit. If the weighting were just a multiple of the mean, then it would work every time.
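The arithmetic in that example is easy to check; a toy sketch (the tree shape and exact leaf counts below are made up for illustration):

```python
import math

# Toy subtrees: two sibling moves, each with an exact perft of 16 leaves.
# The left move has 4 children of 4 leaves each, the right has 2 of 8 each.
left_children = [4, 4, 4, 4]
right_children = [8, 8]

# Weighting each child by sqrt(its leaf count) and summing up the tree:
w_left = sum(math.sqrt(n) for n in left_children)    # 4 * sqrt(4)  = 8.0
w_right = sum(math.sqrt(n) for n in right_children)  # 2 * sqrt(8) ~= 5.66
print(w_left, w_right)  # unequal, although both subtrees hold 16 leaves

# A weight proportional to the mean itself is additive, so equality holds:
print(sum(left_children), sum(right_children))  # 16 16
```

So sqrt-of-mean weights do not sum consistently up the tree even with exact leaf counts, which is the point of the example.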
This also reminds me of a question to ask you. Your weight sqrt(mean^2 + variance) was derived for MC with weights and multipliers. There we use multipliers, but inside the tree we don't: we just sum the perfts of the child nodes to get the parent's. So it may not be immediately appropriate as a UCT selection weight. Note that it becomes similar to weighting by the mean node count, since the variance is relatively smaller than the mean. That is probably why the results I got with your formula were closer to mean proportioning.
It got to about 4.14e12. This is also what I predicted it would reach, by comparing the results at 100 million iterations. About 5.6x more effort is usually required for the one starting at 0 ply to get to the same variance.
Please note that the example uses sqrt(mean) as weight not sqrt(mean^2 + variance).
Then the example certainly does not invalidate my theorem since the theorem is for the weight sqrt(mean^2 + variance).
Well you said the sqrt was not relevant, only that we don't have a tree, so I gave you an example for it. You also seem to be confused when you say variance does not go to zero in the limit. Anyway, let's stop it at that and discuss my second question.
Do you think it is appropriate to use your formula as-is for UCT selection, given that it was originally derived for weighting in MC, which has multipliers? It seems to me it is almost the same as proportioning by the mean value when used in UCT ...
Well you said sqrt was not relevant but only that we don't have a tree.
Ok, I see the confusion. I thought you were worried about the fact that a square root is not an additive function. I said this is not relevant. But my theorem works _only_ for the weights sqrt(mean^2 + variance). For another weight assignment it would be wrong.
is almost the same as proportioning by the mean value when used in UCT ...
Well if you go back to the beginning of this thread this is what I was saying.
It is also what people use in Monte Carlo integration (in higher dimension they use
an even cruder approximation for efficiency).
Michel wrote:
Well if you go back to the beginning of this thread this is what I was saying.
It is also what people use in Monte Carlo integration (in higher dimension they use
an even cruder approximation for efficiency).
Yes, I stumbled onto that page while reading about quasi-Monte Carlo methods for variance reduction. The VEGAS kind of algorithms, which try to match probabilities with a certain function, have a name when used for an MAB problem: "probability matching methods". Most of the methods I tested were of that kind, sub-tree size proportioning being one of them.
X = certain_ratio - (child_visits / total_visits)
My selection method, which estimates the variance reduction for each move after just one game and picks the best by (sigma^2/N - sigma^2/(N+1)), may be hard to beat. The estimate of the population standard deviation (sigma) will change after one game, since it is itself approximated by the sample standard deviation. So maybe that can be improved, but other than that I am pessimistic about other formulas giving improvements. Who knows, though?
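Reading the criterion as sigma^2/N - sigma^2/(N+1), i.e. the expected drop in the variance of a move's mean estimate if it receives one more game, a minimal sketch of the selection rule (the per-move statistics below are made up):

```python
def variance_reduction(sample_var, n):
    # sigma^2/N - sigma^2/(N+1) == sigma^2 / (N * (N + 1)),
    # with sigma^2 approximated by the sample variance observed so far.
    return sample_var / n - sample_var / (n + 1)

# Hypothetical per-move statistics: (sample variance, games played so far).
moves = {"e2e4": (2.5e24, 40), "d2d4": (9.0e24, 55), "g1f3": (1.0e24, 30)}

# Pick the move whose next game shrinks the total variance the most.
best = max(moves, key=lambda m: variance_reduction(*moves[m]))
print(best)
```

Note the rule naturally trades off high sample variance against visit count, since the score decays like 1/N^2.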
As to reduction of variance in the MC part, I am going to experiment with a quasi-Monte Carlo method which may work for an ordered tree. The idea is: if I did a perft with random numbers (U1, U2, U3, ... UN), then also do a perft with (1 - U1, 1 - U2, ... 1 - UN) and average the results. Trying both ends of the move list gives an average in the middle. This has reduced the variance of integration of some monotone functions by as much as 70%. It may work for an LMR tree, for example.
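This mirrored-pair trick is usually called antithetic variates; before wiring it into a perft walk, it can be sanity-checked on a 1-D monotone integrand (the function and sample sizes below are arbitrary):

```python
import random

def sample_var(xs):
    # Unbiased sample variance of a list of estimates.
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

f = lambda u: u * u  # a monotone integrand on [0, 1]; true integral = 1/3

random.seed(1)
plain = [f(random.random()) for _ in range(100000)]

random.seed(1)
pairs = []
for _ in range(50000):
    u = random.random()
    pairs.append(0.5 * (f(u) + f(1.0 - u)))  # average the mirrored pair

print(sample_var(plain), sample_var(pairs))  # the pairs vary far less
```

For a monotone f, f(U) and f(1-U) are negatively correlated, which is where the variance reduction comes from; in the perft setting U would map to a position in the ordered move list, so the pair explores both ends of the list.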
I am a bit lost. Is it now settled which method is better?
The UCT-type methods give a StdDev of about 4e12 for Perft(12) with 100e6 random walks, it seems. But a random walk is 12 nodes, so the cost is
4e12 * sqrt(12 * 100e6) = 1.4e17
The data posted by HGM for a 5-ply full-width search of 500e6 nodes seems to give a StdDev of 0.012%, which comes down to 7.5e12. So the total cost is
7.5e12 * sqrt(500e6) = 1.7e17
So in this case UCT seems somewhat better, at the cost of a much more complex implementation however.
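The two cost figures can be reproduced directly; a quick check of the arithmetic, using the StdDev * sqrt(total nodes visited) cost model from the post:

```python
import math

# UCT-type: StdDev ~4e12 after 100e6 random walks of 12 nodes each.
uct_cost = 4e12 * math.sqrt(12 * 100e6)

# HGM's 5-ply full-width search: StdDev ~7.5e12 after 500e6 nodes.
fullwidth_cost = 7.5e12 * math.sqrt(500e6)

print(f"{uct_cost:.3g} vs {fullwidth_cost:.3g}")  # ~1.39e17 vs ~1.68e17
```

So the gap between the two methods under this cost model is only about 20%.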
EDIT:
I am assuming that in HGM's data "nodes" means all nodes visited, not just leaf nodes.
If we do the count with leaf nodes then UCT is much farther ahead, but this is probably unfair, as one should really count the number of move-generation calls and not the number of leaf nodes.
EDIT2:
The latter remark immediately suggests an optimization in Peter's method. Rather
than do a single random walk at a node, do several. This saves calls to GenMoves.
Of course this is not entirely clear, as doing multiple random walks at nodes higher in the tree leaves fewer resources for doing random walks at the root. This is again an optimization problem.
Michel wrote:
The latter remark immediately suggests an optimization in Peter's method. Rather than do a single random walk at a node, do several. This saves calls to GenMoves.
Of course this is not entirely clear, as doing multiple random walks at nodes higher in the tree leaves fewer resources for doing random walks at the root. This is again an optimization problem.
I already tried this approach. With my implementation of Peter's method I selected exactly two instead of exactly one random move. The results were horrible (the "Error" value was huge), so I immediately discarded it.
A possible explanation comes to mind from your last remark. For depth=12 and fullDepth=3, for instance, selecting two moves instead of one increases the number of random walks per full-width leaf by a factor of 2^(12-3-1) = 256. Keeping the total effort constant (measured in the total number of nodes visited; btw, I fully agree that measuring this way makes a lot of sense) means starting only 1/256 of the previous number of walks from the root. The overall number of root-to-leaf paths is of course the same, but the nature of these paths is different: many of them have a common beginning (starting from the first selective ply!) and differ only towards the leaves, since at each selective node you "fork" the path. So I could imagine that this is counter-productive for getting enough randomness.
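The 256 factor follows from counting forks along a walk; a tiny sketch (the parameter names mirror depth/fullDepth from the post, everything else is illustrative):

```python
def walks_per_leaf(moves_per_node, depth, full_depth):
    # Below the full-width part, every selective ply after the first one
    # forks the walk, multiplying the number of root-to-leaf paths that
    # a single started walk produces.
    return moves_per_node ** (depth - full_depth - 1)

print(walks_per_leaf(2, 12, 3))  # 2^(12-3-1) = 256
```

With one move per selective node the factor is 1, which is the original method; anything larger trades independent walks from the root for correlated forked paths.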
Maybe someone else is more successful with that idea.