Quote:
Another remark: One has to be pretty careful here, because it is very tempting to come up with examples where you set some of the probabilities to zero in order to reduce variance. But then you fall into the trap that the limit of an expectation value (or of a variance, which is also an expectation value) is in general not equal to the expectation value of the limit distribution. If you let probabilities approach zero, the weight you must give the result when it does happen approaches infinity, and this hugely drives up the variance. Only when you take that limit first, setting the probability to exactly zero, do you get rid of the variance. But that is meaningless: it is not the variance of any unbiased estimate, because it becomes independent of the size of that sub-tree, which the true average is not.

I am not sure if I follow precisely what you say here, but I had made similar considerations.
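To make the limit argument concrete, here is a minimal Python sketch. It assumes the estimator picks sub-tree i with probability p_i and returns X_i/p_i, where X_i is unbiased for x_i with variance sigma_i^2; the helper names `estimator_mean` and `estimator_variance` are hypothetical. The variance is sum((x_i^2+sigma_i^2)/p_i) - (sum x_i)^2, so it grows like x_i^2/p_i as p_i goes to 0. Setting p_i to exactly 0 gives zero variance, but then the mean no longer depends on x_i at all.

```python
def estimator_mean(x, p):
    """Expected value of X_i / p_i when branch i is drawn with probability p_i.

    Branches with p_i == 0 are never drawn, so they contribute nothing:
    the mean then no longer depends on their x_i at all.
    """
    return sum(xi for xi, pi in zip(x, p) if pi > 0)


def estimator_variance(x, sigma2, p):
    """Variance of X_i / p_i, where X_i is unbiased for x[i] with variance sigma2[i].

    Only meaningful when every p_i > 0 (otherwise the estimator is biased).
    """
    total = sum(x)
    second_moment = sum((xi * xi + s2) / pi for xi, s2, pi in zip(x, sigma2, p))
    return second_moment - total * total


x = [10.0, 1.0]      # hypothetical sub-tree sizes; the true total is 11
sigma2 = [0.0, 0.0]  # assume both sub-trees are counted exactly, for simplicity
for eps in (0.1, 0.01, 0.001):
    # variance grows roughly like 1/eps as the small branch's probability shrinks
    print(eps, estimator_variance(x, sigma2, [1 - eps, eps]))
# with eps exactly 0 the small branch is dropped: the mean is 10, not 11
print(estimator_mean(x, [1.0, 0.0]))
```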
I understand it like this: if you assign very small probabilities,
then the distribution becomes very skewed and you need very large
samples before the central limit theorem kicks in (which is what allows
you to use the variance to compute error bars).
So saying that the weights should be proportional to tree size
is an exaggeration. One should not assign probabilities that
are small compared to 1/N, where N is the sample size one wants to use.
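A quick seeded simulation illustrates the skewness point. It assumes a hypothetical two-branch tree with sizes 10 and 1 (counted exactly), a rare-branch probability p = 0.001 and N = 100 samples; the function names are hypothetical. The rare branch is missed entirely with probability (1-p)^N, about 0.90 here, so the median estimate sits near 10 even though the estimator is unbiased and its mean over many runs stays close to 11.

```python
import random
import statistics


def sample_mean_estimate(x, p, n, rng):
    """Average of n independent draws of x[i] / p[i], branch i chosen with probability p[i]."""
    draws = rng.choices(range(len(x)), weights=p, k=n)
    return sum(x[i] / p[i] for i in draws) / n


def typical_estimate(x, p, n, trials=2000, seed=1):
    """Median and mean of the n-sample estimate over many independent runs."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimates = [sample_mean_estimate(x, p, n, rng) for _ in range(trials)]
    return statistics.median(estimates), statistics.mean(estimates)


x = [10.0, 1.0]     # hypothetical sub-tree sizes; the true total is 11
p = [0.999, 0.001]  # the small branch gets a probability far below 1/N
n = 100             # sample size N
print("P(rare branch never drawn):", (1 - p[1]) ** n)
print("median, mean of estimates:", typical_estimate(x, p, n))
```

With p_2 raised to something comparable to 1/N or larger, the median and the mean move together and the usual error bars become trustworthy much sooner.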
But as the example in my post shows: even a very crude biasing
of the weights gives a substantial reduction of the variance.
EDIT: I am not sure you noticed my "optimal" value for the weights.
I reproduce it here. If one wants to estimate x_1+...+x_n and
one has unbiased estimators X_i for x_i with variance sigma_i^2, then,
if I calculated correctly, one should choose X_i with probability

p_i = mu*sqrt(x_i^2 + sigma_i^2)

where mu is adjusted such that the p_i sum to 1.
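The formula can be checked numerically. The sketch assumes that sub-tree i is chosen with probability p_i and that X_i/p_i is returned; that estimator is unbiased and its variance is sum((x_i^2+sigma_i^2)/p_i) - (sum x_i)^2. Minimizing this under sum p_i = 1 with a Lagrange multiplier gives p_i proportional to sqrt(x_i^2+sigma_i^2), with minimal variance (sum sqrt(x_i^2+sigma_i^2))^2 - (sum x_i)^2. The x_i and sigma_i^2 below are hypothetical and treated as known; in a real search they would themselves have to be estimated. The function names are hypothetical.

```python
import math


def optimal_probabilities(x, sigma2):
    """p_i = mu * sqrt(x_i^2 + sigma_i^2), with mu chosen so that the p_i sum to 1."""
    a = [math.sqrt(xi * xi + s2) for xi, s2 in zip(x, sigma2)]
    mu = 1.0 / sum(a)  # the normalizing constant mu
    return [mu * ai for ai in a]


def estimator_variance(x, sigma2, p):
    """Variance of X_i / p_i when branch i is drawn with probability p_i (all p_i > 0)."""
    total = sum(x)
    return sum((xi * xi + s2) / pi for xi, s2, pi in zip(x, sigma2, p)) - total * total


x = [3.0, 1.0, 0.5]        # hypothetical sub-tree sizes x_i
sigma2 = [16.0, 0.0, 1.0]  # hypothetical variances sigma_i^2 of the estimators X_i
p_opt = optimal_probabilities(x, sigma2)
uniform = [1.0 / len(x)] * len(x)
print("optimal p:", p_opt)
print("variance, uniform:", estimator_variance(x, sigma2, uniform))
print("variance, optimal:", estimator_variance(x, sigma2, p_opt))
```

Note that the optimal weights are not proportional to x_i alone: a sub-tree whose own estimate is noisy (large sigma_i) gets extra weight.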