Well you said the sqrt is irrelevant but clearly the example says otherwise (unless you find something wrong with it).
The example is wrong since it does only one step. My theorem is for what happens in the limit.
The limit does not imply variance=0.
I could write out the proof (which is trivial), but mathematics on this forum is unreadable since the forum does not implement MathML.
Please note that the example uses sqrt(mean) as the weight, not sqrt(mean^2 + variance). I already mentioned that lim_{N->inf} sqrt(mean^2 + variance) = mean because variance -> 0. But here you are saying variance does not go to zero in the limit, so I am a little confused. Also, the example is not only one step. Assume a perft() estimator which gives you the exact leaf-node estimates. The problem is that 4 * sqrt(4) != 2 * sqrt(8) when we sum the leaf nodes of the left and right moves, so equality of weights is not maintained, as suggested by their perft counts of 16, even in the limit. If the weighting were just a multiple of the mean, then it would work every time.
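The arithmetic in that example is easy to check; a toy sketch (the tree shape and exact leaf counts below are made up for illustration):

```python
import math

# Toy subtrees: two sibling moves, each with an exact perft of 16 leaves.
# The left move has 4 children of 4 leaves each, the right has 2 of 8 each.
left_children = [4, 4, 4, 4]
right_children = [8, 8]

# Weighting each child by sqrt(its leaf count) and summing up the tree:
w_left = sum(math.sqrt(n) for n in left_children)    # 4 * sqrt(4)  = 8.0
w_right = sum(math.sqrt(n) for n in right_children)  # 2 * sqrt(8) ~= 5.66
print(w_left, w_right)  # unequal, although both subtrees hold 16 leaves

# A weight proportional to the mean itself is additive, so equality holds:
print(sum(left_children), sum(right_children))  # 16 16
```

So sqrt-of-mean weights do not sum consistently up the tree even with exact leaf counts, which is the point of the example.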
This also reminds me of a question to ask you. Your weight sqrt(mean^2 + variance) was derived for MC with weights and multipliers. There we use multipliers, but inside the tree we don't: we just sum the perfts of the child nodes to get the parent's. So it may not be immediately appropriate as a UCT selection weight. Note that it becomes similar to weighting by the mean node count, since the variance is relatively smaller than the mean. That is probably why the results I got with your formula were closer to mean proportioning.
It got to about 4.14e12. This is also what I predicted it would reach, by comparing the results at 100 million iterations. About 5.6x more effort is usually required for the one starting at 0 ply to get to the same variance.
Please note that the example uses sqrt(mean) as weight not sqrt(mean^2 + variance).
Then the example certainly does not invalidate my theorem since the theorem is for the weight sqrt(mean^2 + variance).
Well you said the sqrt was not relevant, only that we don't have a tree, so I gave you an example for it. You also seem to be confused when you say variance does not go to zero in the limit. Anyway, let's stop it at that and discuss my second question.
Do you think it is appropriate to use your formula as-is for UCT selection, given that it was originally derived for weighting in MC, which has multipliers? It seems to me it is almost the same as proportioning by the mean value when used in UCT ...
Well you said sqrt was not relevant but only that we don't have a tree.
Ok, I see the confusion. I thought you were worried about the fact that a square root is not an additive function. I said this is not relevant. But my theorem works _only_ for the weights sqrt(mean^2 + variance). For another weight assignment it would be wrong.
is almost the same as proportioning by the mean value when used in UCT ...
Well if you go back to the beginning of this thread this is what I was saying.
It is also what people use in Monte Carlo integration (in higher dimension they use
an even cruder approximation for efficiency).
Michel wrote:
Well if you go back to the beginning of this thread this is what I was saying.
It is also what people use in Monte Carlo integration (in higher dimension they use
an even cruder approximation for efficiency).
Yes, I stumbled onto that page while reading about quasi-Monte Carlo methods for variance reduction. The VEGAS kind of algorithms, which try to match probabilities with a certain function, have a name when used for an MAB problem: "probability matching methods". Most of the methods I tested were of that kind, sub-tree size proportioning being one of them.
X = certain_ratio - (child_visits / total_visits)
My selection method, which estimates the variance reduction for each move after just one game and picks the best by (sigma^2/N - sigma^2/(N+1)), may be hard to beat. The estimate of the population standard deviation (sigma) will change after one game, since it is itself approximated by the sample standard deviation. So maybe that can be improved, but other than that I am pessimistic about other formulas giving improvements. Who knows, though?
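Reading the criterion as sigma^2/N - sigma^2/(N+1), i.e. the expected drop in the variance of a move's mean estimate if it receives one more game, a minimal sketch of the selection rule (the per-move statistics below are made up):

```python
def variance_reduction(sample_var, n):
    # sigma^2/N - sigma^2/(N+1) == sigma^2 / (N * (N + 1)),
    # with sigma^2 approximated by the sample variance observed so far.
    return sample_var / n - sample_var / (n + 1)

# Hypothetical per-move statistics: (sample variance, games played so far).
moves = {"e2e4": (2.5e24, 40), "d2d4": (9.0e24, 55), "g1f3": (1.0e24, 30)}

# Pick the move whose next game shrinks the total variance the most.
best = max(moves, key=lambda m: variance_reduction(*moves[m]))
print(best)
```

Note the rule naturally trades off high sample variance against visit count, since the score decays like 1/N^2.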
As to reduction of variance in the MC part, I am going to experiment with a quasi-Monte Carlo method which may work for an ordered tree. The idea is: if I did a perft with random numbers (U1, U2, U3, ... UN), then also do a perft with (1 - U1, 1 - U2, ... 1 - UN) and average the results. Trying both ends of the move list gives an average in the middle. This has reduced the variance of integration of some monotone functions by as much as 70%. It may work for an LMR tree, for example.
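This mirrored-pair trick is usually called antithetic variates; before wiring it into a perft walk, it can be sanity-checked on a 1-D monotone integrand (the function and sample sizes below are arbitrary):

```python
import random

def sample_var(xs):
    # Unbiased sample variance of a list of estimates.
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

f = lambda u: u * u  # a monotone integrand on [0, 1]; true integral = 1/3

random.seed(1)
plain = [f(random.random()) for _ in range(100000)]

random.seed(1)
pairs = []
for _ in range(50000):
    u = random.random()
    pairs.append(0.5 * (f(u) + f(1.0 - u)))  # average the mirrored pair

print(sample_var(plain), sample_var(pairs))  # the pairs vary far less
```

For a monotone f, f(U) and f(1-U) are negatively correlated, which is where the variance reduction comes from; in the perft setting U would map to a position in the ordered move list, so the pair explores both ends of the list.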
I am a bit lost. Is it now settled which method is better?
The UCT-type methods give a StdDev of about 4e12 for Perft(12) with 100e6 random walks, it seems. But a random walk is 12 nodes, so the cost is
4e12 * sqrt(12 * 100e6) = 1.4e17
The data posted by HGM for a 5-ply full-width search of 500e6 nodes seems to give a StdDev of 0.012%, which comes down to 7.5e12. So the total cost is
7.5e12 * sqrt(500e6) = 1.7e17
So in this case UCT seems somewhat better, at the cost of a much more complex implementation however.
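The two cost figures can be reproduced directly; a quick check of the arithmetic, using the StdDev * sqrt(total nodes visited) cost model from the post:

```python
import math

# UCT-type: StdDev ~4e12 after 100e6 random walks of 12 nodes each.
uct_cost = 4e12 * math.sqrt(12 * 100e6)

# HGM's 5-ply full-width search: StdDev ~7.5e12 after 500e6 nodes.
fullwidth_cost = 7.5e12 * math.sqrt(500e6)

print(f"{uct_cost:.3g} vs {fullwidth_cost:.3g}")  # ~1.39e17 vs ~1.68e17
```

So the gap between the two methods under this cost model is only about 20%.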
EDIT:
I am assuming that in HGM's data "nodes" means all nodes visited, not just leaf nodes.
If we do the count with leaf nodes then UCT is much farther ahead, but this is probably unfair, as one should really count the number of move-generation calls and not the number of leaf nodes.
EDIT2:
The latter remark immediately suggests an optimization in Peter's method. Rather
than do a single random walk at a node, do several. This saves calls to GenMoves.
Of course this is not entirely clear, as doing multiple random walks at nodes higher in the tree leaves fewer resources for doing random walks at the root. This is again an optimization problem.
Michel wrote:
The latter remark immediately suggests an optimization in Peter's method. Rather than do a single random walk at a node, do several. This saves calls to GenMoves.
Of course this is not entirely clear, as doing multiple random walks at nodes higher in the tree leaves fewer resources for doing random walks at the root. This is again an optimization problem.
I already tried this approach. With my implementation of Peter's method I selected exactly two instead of exactly one random move. The results were horrible (the "Error" value was huge), so I immediately discarded it.
A possible explanation comes to mind from your last remark. For depth=12 and fullDepth=3, for instance, selecting two moves instead of one increases the number of random walks per full-width leaf by a factor of 2^(12-3-1) = 256. Keeping the total effort constant (measured in the total number of nodes visited; btw, I fully agree that measuring this way makes a lot of sense) means starting only 1/256 of the previous number of walks from the root. The overall number of root-to-leaf paths is of course the same, but the nature of these paths is different: many of them have a common beginning (starting from the first selective ply!) and differ only towards the leaves, since at each selective node you "fork" the path. So I could imagine that this is counter-productive for getting enough randomness.
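The 256 factor follows from counting forks along a walk; a tiny sketch (the parameter names mirror depth/fullDepth from the post, everything else is illustrative):

```python
def walks_per_leaf(moves_per_node, depth, full_depth):
    # Below the full-width part, every selective ply after the first one
    # forks the walk, multiplying the number of root-to-leaf paths that
    # a single started walk produces.
    return moves_per_node ** (depth - full_depth - 1)

print(walks_per_leaf(2, 12, 3))  # 2^(12-3-1) = 256
```

With one move per selective node the factor is 1, which is the original method; anything larger trades independent walks from the root for correlated forked paths.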
Maybe someone else is more successful with that idea.