Perft(13) betting pool

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Defect of a node?

Post by Daniel Shawul »

petero2 wrote:
Michel wrote:
"cost" is defined as sqrt(mean^2 + variance)
My conjecture is that the correct measure to decide splitting is what
I call the "defect".

sqrt(mean^2 + variance)-mean
I tested this method and a method with the much simpler node expansion criterion "ply<5". This means that the in-memory tree very quickly grows to a tree with exactly perft(5) leaf nodes. Here are the standard deviations I measured after 100e6 random walks:

Code: Select all

split condition   stddev
sqrt(m^2+var)     3.9826e+12
sqrt(m^2+var)-m   3.9805e+12
ply<5             3.9642e+12

I don't know if the differences are significant, but it does seem to support your idea that tree growing is overkill for this problem, at least when you start from the initial chess position.
Maybe starting from a more tactical position would be different.
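For reference, the two splitting criteria compared in the table above can be written as a tiny helper (illustrative only; `mean` and `variance` are the sub-tree statistics being discussed):

```python
import math

def split_scores(mean, variance):
    """Return Michel's 'cost' sqrt(mean^2 + variance) and his conjectured
    'defect' sqrt(mean^2 + variance) - mean for a candidate split node."""
    cost = math.sqrt(mean * mean + variance)
    return cost, cost - mean
```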
Sorry but too much sour grapes! I have already heard from both you and Michel how tree growth is "not workable", "not worth it" etc... So let me get things straight once and for all.

Even _dynamic_ weighting is not your or Michel's idea, let alone the tree growing algorithm. You, Michel and everyone else were engrossed in static ways of determining weights for the MC part of the tree, while I consistently said a magician is required. Well, we tried the static capture heuristic and didn't get much, did we? Then I explicitly replied to you, mentioning _twice_ that I could get better results with dynamic proportioning and UCT. How else can you get dynamic weights if you don't somehow store the results of the tree itself? Hashtable, tree etc. is all semantics! So indeed dynamic weights are also an idea derived from UCT.
The reason why I wanted you to start from the root is that within the tree we are using _uniform_ sampling. One way to decrease the variance could be to use an upper confidence bound, saving the first 1-3 plies in memory. I mentioned the following formula a while back, which is used for UCT; I am sure it can be modified to give much better results than any heuristic, since this one learns dynamically. But it requires complicated programming...

Code: Select all

X = Xmean + Constant * sqrt( log(parent_nodes) / child_nodes )
Select the node with maximum X value. The second term, which stands in for the variance, is meant for a mean score between 0 < X < 1, but I am sure it can be modified for our case. Anyway, even with simple proportioning based on the mean node counts in the tree, this _dynamic_ adjustment should be better than any heuristic.
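As a sketch (not Daniel's actual code), the selection rule could look like the following, assuming each child stores a running mean normalized to [0, 1] and a visit count:

```python
import math

def ucb_select(children, constant=1.0):
    """Pick the child maximizing mean + C * sqrt(log(parent_visits) / visits).

    `children` is a list of dicts with keys 'mean' (normalized to [0, 1])
    and 'visits'; unvisited children are always tried first.
    """
    parent_visits = sum(c["visits"] for c in children)
    best, best_score = None, float("-inf")
    for c in children:
        if c["visits"] == 0:
            return c  # explore unvisited children before anything else
        score = c["mean"] + constant * math.sqrt(
            math.log(parent_visits) / c["visits"])
        if score > best_score:
            best, best_score = c, score
    return best
```

Note how the second term rewards rarely-visited children: a child with a slightly lower mean but far fewer visits can still be selected, which is exactly the dynamic behavior described above.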
This post was in reply to you, regarding your bad results with the static capture heuristic.
So please tell me, did you come up with the dynamic weighting by yourself??
Then I followed it up with the whole description in my next post. Again you were still in static tests.
Here is the idea for this kickass algorithm. You can say it is UCT for perft, but I haven't actually implemented it, so it could all be for nothing if I made a mistake in the planning somewhere.

Code: Select all

a) Start from the root and do 10 Monte carlo simulations for each move
b) Pick the best node to expand using the UCB formula I provided above
c) Expand the selected node, which will be one of the biggest nodes, with the right formula
d) Do this recursively for the children of that big node
So this algorithm explores the bigger nodes more often, while the smaller nodes get their estimates from a small number of simulations. No need for a heuristic, no need for fixed-depth expansion unlike what we are doing now, i.e. ply 3-6 fixed with uniform sampling.

If I have to write out my imagination, this algorithm will be so good as to be the best. But I could be wrong somewhere. Opinions?
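A toy, self-contained rendering of steps (a)-(d), run on a uniform tree where the true count is known so the behavior is easy to check. This is only an illustration of the plan above, not an implementation of it; the `Node` class, the playout, and the mean normalization are all invented here:

```python
import math

BRANCHING, DEPTH = 3, 4            # toy uniform tree; true perft = 3**4 = 81

class Node:
    def __init__(self, depth):
        self.depth = depth          # plies remaining below this node
        self.children = None        # None until step (a) expands it
        self.visits = 0
        self.mean = 0.0             # running mean subtree-size estimate

    def update(self, est):
        self.visits += 1
        self.mean += (est - self.mean) / self.visits

def playout(node):
    # One random walk weighted by 1/probability per ply; on a uniform
    # tree every walk gives exactly BRANCHING ** depth.
    est = 1
    for _ in range(node.depth):
        est *= BRANCHING
    return est

def iterate(node, constant=1.0, n_init=10):
    """One growth iteration implementing steps (a)-(d)."""
    if node.depth == 0:
        node.update(1)
        return
    if node.children is None:
        # (a) expand and seed each child with a few MC simulations
        node.children = [Node(node.depth - 1) for _ in range(BRANCHING)]
        for c in node.children:
            for _ in range(n_init):
                c.update(playout(c))
    else:
        # (b) select the child to refine with the UCB formula; the mean
        # is normalized by the siblings' total so it lives near [0, 1]
        visits = sum(c.visits for c in node.children)
        total = sum(c.mean for c in node.children)
        big = max(node.children,
                  key=lambda c: c.mean / total
                      + constant * math.sqrt(math.log(visits) / c.visits))
        # (c)+(d) recurse: the selected big child is expanded in turn
        iterate(big, constant, n_init)
    # back up: a node's estimate is the sum of its children's estimates
    node.visits += 1
    node.mean = sum(c.mean for c in node.children)

root = Node(DEPTH)
for _ in range(50):
    iterate(root)
# root.mean now equals the true count (81) on this deterministic toy tree
```

On a real perft tree the playouts are noisy and the bigger/noisier children accumulate more visits, which is the point of the scheme; here the tree is uniform, so the estimate is exact from the first iteration.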
We gave you credit for modifying HG's method into a pure MC, even though MC is a well known algorithm. This is an achievement, as it made things simpler, at least for me...
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Defect of a node?

Post by Daniel Shawul »

And here is a new best result when I let it expand more. It is clear I can do that as much as I want to get the variance down.
Mean 6.284728e+016
StdDev1 3.273917e+012
StdDev2 3.837780e+012
The first stddev is the one I have been reporting up to now, so that is much lower than the 4.0e12 estimates so far. But I prefer the 3.83778e12 estimate, which is still the best so far, since that one is associated with the unbiased mean. The more I decide to expand, the more the bias.
Detail result:

Code: Select all

wasted 0.00
Time : 6589032ms Tree : nodes 5448629 depth 5/0 pps 15176 visits 100000000
0 h2h3   1.782537e+015 +- 5.677375e+011 1.814645e+015 +- 6.586233e+011   3007181    179137
0 h2h4   2.568554e+015 +- 6.674070e+011 2.618871e+015 +- 7.881380e+011   4155713    232790
0 g2g3   2.452181e+015 +- 6.511507e+011 2.497848e+015 +- 7.659198e+011   3955737    223801
0 g2g4   2.177061e+015 +- 6.072715e+011 2.218104e+015 +- 7.139257e+011   3440677    212783
0 f2f3   1.529340e+015 +- 5.245851e+011 1.552443e+015 +- 6.031390e+011   2567459    168521
0 f2f4   2.014427e+015 +- 6.074470e+011 2.050898e+015 +- 7.047999e+011   3442554    200779
0 e2e3   7.049659e+015 +- 1.111169e+012 7.161492e+015 +- 1.294366e+012  11519246    506986
0 e2e4   7.155433e+015 +- 1.116985e+012 7.263627e+015 +- 1.294026e+012  11640160    494503
0 d2d3   4.502213e+015 +- 8.468002e+011 4.589482e+015 +- 1.016031e+012   6689999    343223
0 d2d4   6.210609e+015 +- 1.022905e+012 6.327090e+015 +- 1.212181e+012   9761911    436299
0 c2c3   2.703132e+015 +- 6.924851e+011 2.750726e+015 +- 8.095179e+011   4473887    233905
0 c2c4   3.060255e+015 +- 7.317014e+011 3.118954e+015 +- 8.590323e+011   4994958    252487
0 b2b3   2.361856e+015 +- 6.388795e+011 2.407226e+015 +- 7.526590e+011   3808046    222604
0 b2b4   2.366240e+015 +- 6.389858e+011 2.412551e+015 +- 7.494126e+011   3809312    220644
0 a2a3   1.792392e+015 +- 5.699437e+011 1.825004e+015 +- 6.610101e+011   3030595    180717
0 a2a4   2.519513e+015 +- 6.589673e+011 2.572188e+015 +- 7.807706e+011   4051276    229216
0 g1h3   2.095797e+015 +- 6.072664e+011 2.132012e+015 +- 7.064724e+011   3440508    201852
0 g1f3   2.651979e+015 +- 6.654146e+011 2.702961e+015 +- 7.880087e+011   4130939    238939
0 b1c3   2.681367e+015 +- 6.973724e+011 2.730308e+015 +- 8.158274e+011   4537259    251004
0 b1a3   2.064341e+015 +- 6.162088e+011 2.100854e+015 +- 7.152882e+011   3542583    203033
Perft 6.173889e+016 +- 3.273917e+012  6.284728e+016 +- 3.837780e+012
move e2e4
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: Defect of a node?

Post by Michel »

Sorry but too much sour grapes! I have already heard from both you and Michel how tree growth is "not workable", "not worth it" etc... So let me get things straight once and for all.
Calm down. I think you misunderstood what I was saying.

Whatever tree splitting algorithm is used, all that counts is the shape of the final subtree. So if you can predict the shape of the final subtree you can generate it in advance and save the work of discovering it dynamically.

I now think that perft trees (in chess) are so regular that for any sensible tree splitting criterion you basically get an almost uniform subtree in the end. By uniform _in this context_ I mean that all leaves are at the same distance from the root. In practice the leaves are at distance d and d-1 from the root, for a certain d.

The fact that you use a uniform subtree does not mean that you use uniform weights. The weights are still adjusted dynamically. You seem to be confusing the two.

For artificially generated trees, tree splitting likely does work, and I think that in that case my splitting criterion based on sqrt(mean^2+variance)-mean will show its benefits (remember that this is not an arbitrary assessment; there is a bit of theory behind it).
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Defect of a node?

Post by Daniel Shawul »

Michel wrote:
Sorry but too much sour grapes! I have already heard from both you and Michel how tree growth is "not workable", "not worth it" etc... So let me get things straight once and for all.
Calm down. I think you misunderstood what I was saying.

Whatever tree splitting algorithm is used, all that counts is the shape of the final subtree. So if you can predict the shape of the final subtree you can generate it in advance and save the work of discovering it dynamically.

I now think that perft trees (in chess) are so regular that for any sensible tree splitting criterion you basically get an almost uniform subtree in the end. By uniform _in this context_ I mean that all leaves are at the same distance from the root. In practice the leaves are at distance d and d-1 from the root, for a certain d.

The fact that you use a uniform subtree does not mean that you use uniform weights. The weights are still adjusted dynamically. You seem to be confusing the two.

For artificially generated trees, tree splitting likely does work, and I think that in that case my splitting criterion based on sqrt(mean^2+variance)-mean will show its benefits (remember that this is not an arbitrary assessment; there is a bit of theory behind it).
I am calm, and I perfectly understand what you are saying. And I did more tests than you give me credit for. For example, I tested with a pre-generated tree, even a uniform tree up to depth=4. The only difference was that I saved some "wasted" nodes when I had that as a way of avoiding bias. Then I made the test with perft(3) leaf nodes grown dynamically. All methods would grow the tree to depth=3 in a couple of iterations, so it is as if it were pre-generated anyway. Uniform sampling inside the tree performed worst by far, as the graph clearly shows! The _limited_ depth version test I carried out is all about that. So obviously you need non-uniform sampling inside the 3-ply tree. So again, what do you mean by using weights alone, since you have to pass through the internal nodes anyway and you can't use weights there? Even if you have weights at the leaf nodes, you would have to re-adjust them dynamically every simulation, taking you back to UCT... So pre-grown or not doesn't make a bit of difference AFAIAC.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Defect of a node?

Post by Daniel Shawul »

The first stddev is the one I have been reporting up to now, so that is much lower than the 4.0e12 estimates so far. But I prefer the 3.83778e12 estimate, which is still the best so far, since that one is associated with the unbiased mean. The more I decide to expand, the more the bias.
This turned out not to be the case. Even though I can shred the variance of the first sequence of random numbers down to 2.5e12 by using lower values of N simulations before splitting, the second sequence of random numbers started giving larger values. Surely there is an optimum somewhere, but I am not going to bother finding it. Anyway, I am switching focus to testing with a less frequently calculated first sequence of perfts, and also a couple of other ideas for variance reduction inside the MC part this time.

Code: Select all

Time : 6403922ms Tree : nodes 9177654 depth 5/0 pps 15615 visits 100000016
0 h2h3   1.766181e+015 +- 5.215487e+011 1.813382e+015 +- 6.592160e+011   2888467    253834
0 h2h4   2.551712e+015 +- 6.218982e+011 2.620731e+015 +- 7.892811e+011   4106921    362322
0 g2g3   2.436623e+015 +- 6.053940e+011 2.499547e+015 +- 7.658254e+011   3891832    333268
0 g2g4   2.162810e+015 +- 5.603509e+011 2.218915e+015 +- 7.080184e+011   3334248    274522
0 f2f3   1.515409e+015 +- 4.782346e+011 1.552844e+015 +- 6.007535e+011   2428625    207731
0 f2f4   1.998872e+015 +- 5.621314e+011 2.051063e+015 +- 7.013837e+011   3355499    282826
0 e2e3   7.027216e+015 +- 1.059463e+012 7.159130e+015 +- 1.315015e+012  11919253   1150909
0 e2e4   7.133000e+015 +- 1.065527e+012 7.265329e+015 +- 1.321850e+012  12056270   1136197
0 d2d3   4.483619e+015 +- 7.967010e+011 4.587187e+015 +- 1.017632e+012   6740136    533671
0 d2d4   6.184575e+015 +- 9.719802e+011 6.325351e+015 +- 1.230489e+012  10032117    934659
0 c2c3   2.688746e+015 +- 6.491614e+011 2.751532e+015 +- 8.094271e+011   4474900    374845
0 c2c4   3.046656e+015 +- 6.882674e+011 3.120403e+015 +- 8.590486e+011   5030280    419144
0 b2b3   2.342552e+015 +- 5.915192e+011 2.408296e+015 +- 7.541956e+011   3715486    328434
0 b2b4   2.352046e+015 +- 5.926430e+011 2.411985e+015 +- 7.443838e+011   3729615    319988
0 a2a3   1.777197e+015 +- 5.237964e+011 1.825797e+015 +- 6.633409e+011   2913556    259623
0 a2a4   2.505831e+015 +- 6.137205e+011 2.572574e+015 +- 7.812871e+011   3999652    354120
0 g1h3   2.080177e+015 +- 5.617254e+011 2.133785e+015 +- 7.089516e+011   3350626    286751
0 g1f3   2.636913e+015 +- 6.193342e+011 2.703358e+015 +- 7.815280e+011   4073127    333381
0 b1c3   2.667366e+015 +- 6.509850e+011 2.729358e+015 +- 8.134836e+011   4500075    399287
0 b1a3   2.050768e+015 +- 5.707648e+011 2.100404e+015 +- 7.143496e+011   3459331    296889
Perft 6.140827e+016 +- 3.068742e+012  6.285097e+016 +- 3.856983e+012
And an even smaller one

Code: Select all

Time : 5782594ms Tree : nodes 27799619 depth 5/0 pps 17293 visits 100000000
0 h2h3   1.736958e+015 +- 4.384240e+011 1.815260e+015 +- 9.123759e+011   2990520    745333
0 h2h4   2.502029e+015 +- 5.116218e+011 2.620076e+015 +- 1.085081e+012   4072562   1073868
0 g2g3   2.392628e+015 +- 5.023985e+011 2.498309e+015 +- 1.134003e+012   3927022   1013837
0 g2g4   2.123288e+015 +- 4.678906e+011 2.218492e+015 +- 9.939452e+011   3406010    830628
0 f2f3   1.492583e+015 +- 4.048312e+011 1.554152e+015 +- 7.503766e+011   2549796    582451
0 f2f4   1.961390e+015 +- 4.672760e+011 2.051777e+015 +- 1.002480e+012   3397137    858677
0 e2e3   6.930430e+015 +- 8.633096e+011 7.157823e+015 +- 1.941372e+012  11595592   3372713
0 e2e4   7.036678e+015 +- 8.686754e+011 7.264484e+015 +- 1.902902e+012  11740154   3417681
0 d2d3   4.418302e+015 +- 6.672523e+011 4.587952e+015 +- 1.465322e+012   6926879   1803338
0 d2d4   6.095809e+015 +- 7.943953e+011 6.330096e+015 +- 1.838796e+012   9818292   2824445
0 c2c3   2.642887e+015 +- 5.397222e+011 2.752418e+015 +- 1.137403e+012   4532175   1199619
0 c2c4   2.989416e+015 +- 5.674621e+011 3.120617e+015 +- 1.324983e+012   5009926   1342374
0 b2b3   2.302021e+015 +- 4.911594e+011 2.407405e+015 +- 9.670110e+011   3753218    966597
0 b2b4   2.308663e+015 +- 4.919500e+011 2.410406e+015 +- 1.006921e+012   3765302    962567
0 a2a3   1.746746e+015 +- 4.396997e+011 1.826924e+015 +- 1.018873e+012   3008004    755631
0 a2a4   2.457783e+015 +- 5.061977e+011 2.573956e+015 +- 1.090063e+012   3986554   1046360
0 g1h3   2.046017e+015 +- 4.688346e+011 2.134364e+015 +- 1.218835e+012   3419860    863590
0 g1f3   2.588700e+015 +- 5.131210e+011 2.702784e+015 +- 1.031646e+012   4096351   1036367
0 b1c3   2.617637e+015 +- 5.356776e+011 2.729853e+015 +- 1.089800e+012   4464472   1185359
0 b1a3   2.015958e+015 +- 4.770116e+011 2.100728e+015 +- 1.005556e+012   3540174    903438
Perft 6.040593e+016 +- 2.535242e+012  6.285788e+016 +- 5.544666e+012
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: Defect of a node?

Post by Michel »

So again what do you mean by using weights alone since you have to pass through the internal nodes anyway and you can't use weights in there.
Well, the weight at an internal node is simply the sum of the weights of the leaf nodes (of the subtree) below this node. If you follow this rule you will select the leaf nodes with the correct frequency.

It is unfortunately slightly tricky to keep this up to date since we are not in a tree but in a directed graph. So naive back propagation does not work entirely correctly (this was the error in my implementation).

EDIT.

It would probably be better to design a data structure which allows you to jump to the leaf nodes directly, skipping the internal nodes, which are not relevant anyway.
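A tiny sketch of the rule described above, using a hypothetical dict-based tree (not Michel's implementation): leaves carry their own weight, internal weights are the sums below them, and selection descends proportionally to weight.

```python
import random

def recompute_weights(node):
    """Bottom-up pass: an internal node's weight becomes the sum of the
    leaf weights below it. As noted above, this naive propagation
    over-counts shared subtrees once transpositions turn the tree into
    a directed graph."""
    if not node["children"]:
        return node["weight"]
    node["weight"] = sum(recompute_weights(c) for c in node["children"])
    return node["weight"]

def select_leaf(node, rng=random):
    """Descend from the root, choosing each child with probability
    proportional to its weight; each leaf is then reached with frequency
    proportional to its own weight."""
    while node["children"]:
        node = rng.choices(node["children"],
                           weights=[c["weight"] for c in node["children"]])[0]
    return node
```

For example, with two leaves of weight 1 and 3, the root gets weight 4 and the heavier leaf is selected about 75% of the time.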
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Defect of a node?

Post by Daniel Shawul »

Michel wrote:
So again what do you mean by using weights alone since you have to pass through the internal nodes anyway and you can't use weights in there.
Well, the weight at an internal node is simply the sum of the weights of the leaf nodes (of the subtree) below this node. If you follow this rule you will select the leaf nodes with the correct frequency.

It is unfortunately slightly tricky to keep this up to date since we are not in a tree but in a directed graph. So naive back propagation does not work entirely correctly (this was the error in my implementation).

EDIT.

It would probably be better to design a data structure which allows you to jump to the leaf nodes directly, skipping the internal nodes, which are not relevant anyway.
You would have to use non-uniform sampling in there too, so it becomes the same as UCT anyway, pre-grown or not. You cannot use the lightweight (less memory intensive) depth-first approach, because you need statistics to be stored inside the internal nodes as well. The back propagation operator for perft just adds means, variances etc... All my formulas (weights, I suppose) are based on this, so I don't need to back propagate them directly. Using pre-grown or not does not make a difference. BTW, back propagating uncertainty is more difficult for MIN/MAX operations as in tree searching, i.e. before UCT. Rémi Coulom had some complex formulas for that. In UCT it is common to select the most frequently visited node without even looking at the actual score...

About your formula: what is wrong with calculating the weights from the back-propagated mean and variance, instead of back propagating the weights directly (unless I misunderstood you)? That is what I did when testing your formula. Also, I find the use of the mean in the variance reduction formula more and more vague. Earlier, when we depended solely on static estimates, it made sense to proportion weights based on sub-tree sizes, i.e. the mean. Now that we have good estimates of the variance at any time during the iteration, I don't see much use for it. It is a MAB problem with the main goal of reducing variance. My solution for it is a greedy selection of the arm that leads to the highest reduction in variance, i.e. sigma^2/N - sigma^2/(N+1).
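The greedy criterion can be written out directly (an illustrative sketch, not Daniel's code): give the next simulation to the arm whose contribution to the total variance drops the most.

```python
def variance_gain(sigma2, n):
    """Drop in an arm's variance contribution from one extra simulation:
    sigma^2/N - sigma^2/(N+1)."""
    return sigma2 / n - sigma2 / (n + 1)

def pick_arm(arms):
    """Greedy MAB selection; `arms` is a list of (sigma2, n) pairs.
    Returns the index of the arm with the largest expected gain."""
    return max(range(len(arms)), key=lambda i: variance_gain(*arms[i]))
```

Note the gain shrinks roughly like sigma^2/N^2, so high-variance, lightly-sampled arms are favored and heavily-sampled arms are starved, which is the intended behavior.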
User avatar
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Defect of a node?

Post by hgm »

I still believe it will be very hard to beat the "fixed move-acceptance probability approach", (if that can be beaten at all).
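As I understand the fixed move-acceptance idea (a sketch under that assumption, not hgm's actual code): at every node each legal move is independently accepted with a fixed probability p, and every accepted subtree is weighted by 1/p, which keeps the estimate unbiased. Here `children(position)` is a hypothetical move-generation helper.

```python
import random

def mc_perft(position, depth, children, p=0.25, rng=random):
    """Unbiased perft estimate with a fixed move-acceptance probability.

    `children(position)` returns the successor positions; each one is
    recursed into with probability p, and its count is scaled by 1/p to
    compensate for the moves that were skipped."""
    if depth == 0:
        return 1.0
    total = 0.0
    for child in children(position):
        if rng.random() < p:
            total += mc_perft(child, depth - 1, children, p, rng) / p
    return total

# With p = 1.0 every move is accepted and the exact count comes back;
# smaller p trades work for variance.
```

The appeal is exactly the "trivial implementation" Michel mentions below: no tree in memory, no statistics, just one constant.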
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: Defect of a node?

Post by Michel »

I still believe it will be very hard to beat the "fixed move-acceptance probability approach", (if that can be beaten at all).
I think it can be beaten asymptotically. It is not clear if it can be beaten for realistic perft values, given the trivial implementation.

A while ago I computed the propagation of (nodes visited)*variance for both your method and Peter's method with uniform sampling, for a uniform tree, and the propagation in Peter's case was better.

This was based on a pure implementation of your method though. If I understood correctly, you keep track in your computation of the number of accepted moves at each ply; I don't know how this affects the result.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Defect of a node?

Post by Daniel Shawul »

Well, the variance reduction methods in the full width part have clearly been improved compared to uniform depth expansion. They are _complementary_ to the variance reduction methods in the selection (Monte Carlo) part of the tree, so everyone can benefit from them. You may be right with your claim for the selective part, as it has not been conclusively proven that a pure MC approach (Peter's) is better than what you do.

What I am planning to do in the MC part is slightly different. I am going to run two or more perfts for a leaf node, using a variance reduction idea known as "antithetic variates". The idea is that for a monotone function, using negatively correlated sequences of random numbers can reduce the variance by up to 70% or so, I read. For an ordered tree as in LMR, this may work if we have good move ordering based on tree size.
I don't know much about the method, so I'd welcome input from anyone who has an idea about its workability.