Michel wrote: Ok, for those jumping into this thread now.
I think that Daniel's and Peter's dynamic adjustment algorithms are very similar.
The differences are:
(1) Peter also keeps track of the variance (in principle this should allow
for more accurate weight selection).
(2) Peter uses a hash table to keep statistics over the whole tree.
Daniel intends to keep statistics close to the root (for the moment only
at the root). Since in Peter's replacement strategy nodes with the biggest
subtree are kept, in the limit both approaches will presumably converge
to the same thing.
(3) Peter also uses a "fullDepth" search. Perhaps this is a waste of
resources? It is more or less equivalent to a uniform MC search
for fullDepth plies, and we are trying to get rid of uniform searches.
Oh no, Michel. They are very, very different algorithms, and here are _some_ reasons why.
Basically, Peter is trying to tune the wrong part of the tree, i.e. the MC part. He also still tries to use those weights and multipliers.
I have none of those because I am working on a tree that I keep in memory.
UCT is a very powerful tree-search algorithm whose result converges to that of
alpha-beta given an infinite number of simulations.
a) For each node I keep the following:
- the average perft value
- the number of visits of the parent and of each child
{also RAVE (history-heuristic counts), but these are not necessary for perft, at least not yet.}
The standard deviation is _indirectly_ accounted for through the visit counts of the parent and child nodes: more-visited nodes
will have lower variance. So it is there, if you look at the second term in the formula I gave a couple of pages ago.
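The bookkeeping in a) amounts to a visit count plus a running average per node; no per-sample storage and no explicit variance are needed. A minimal sketch (the class and names are mine, purely illustrative):

```python
class PerftNode:
    """Per-node statistics: visit count and running mean of perft samples."""
    def __init__(self):
        self.visits = 0
        self.mean = 0.0

    def add_sample(self, perft_estimate):
        # incremental running mean -- individual samples need not be stored
        self.visits += 1
        self.mean += (perft_estimate - self.mean) / self.visits
```

The uncertainty shows up implicitly: the exploration term of the selection formula shrinks as `visits` grows, which is exactly the "more-visited nodes have lower variance" effect.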
b) Peter is trying to hash the Monte Carlo part. This is very wrong.
The MC part has to be fast and should just use a simple heuristic, like reducing capture weights or anything similar
that pops into our minds. It doesn't matter much whether it works or not.
By the way, there is an equivalent implementation of UCT done using hash tables.
Again, it stores only nodes that are expanded from the root.
c) UCT is a dynamic learner. It knows which nodes to expand dynamically and forms a very selective tree.
So it can go very, very deep with a low exploration coefficient. This is absent from Peter's method because, again,
he is doing the MC part only; the upper tree still uses uniform-sampling MC.
d) Keeping all the first-3-ply nodes is unnecessary, because UCT perft gets it much more correctly,
as in the case of LMR perft. Within a few iterations the first move will be expanded deeply enough to give good estimates. Wait till I am done with it.
e) I can seed initial learning in both the MC and upper-tree parts with whatever heuristic.
For example, RAVE (Rapid Action Value Estimation) is a history-heuristic-type algorithm that
tries to estimate the score of moves. In the perft case it is probably not applicable, but there is a capability
to introduce any prior knowledge as a bias in the selection formula (the second term of the equation), in both parts of the tree (upper and MC). The MC part should be simple,
and as such should not use hash tables at all.
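As one concrete reading of e): a prior score could enter the selection formula as an extra additive term that decays as real visits accumulate. The shape below is only a hypothetical sketch (the weight and the decay schedule are made up, not from this thread):

```python
import math

def biased_ucb(mean, parent_visits, child_visits, prior, prior_weight=0.5, c=1.0):
    """UCB score with an additive prior-knowledge bias (illustrative only)."""
    exploration = c * math.sqrt(math.log(parent_visits) / child_visits)
    # the prior dominates early and fades as real evidence (visits) accumulates
    bias = prior_weight * prior / (1.0 + child_visits)
    return mean + exploration + bias
```

With `prior = 0` this reduces to the plain formula, so any heuristic (RAVE scores, move ordering, etc.) can be plugged in without changing the rest of the search.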
Added after looking at your second post:
f) I do 10 simulations for each node first and store the perft counts. So I already have them stored and averaged at each node; I don't need any heuristic to estimate tree size.
It is already there from doing a few simulations before exploration starts. You then balance exploration and exploitation by using a formula. The constant I have been raising is there to get more exploration of the other nodes as well.
I even explained the whole thing to Peter, and to whoever read it, before you guys started talking dynamic. But no one listens until you start showing results.
The reason why I wanted you to start from the root is that within the tree we are using _uniform_ sampling. One way to decrease the variance could be to use an upper confidence bound, saving the first 1-3 plies in memory. I mentioned the following formula a while back; it is the one used for UCT. I am sure it can be modified to give much better results than any heuristic, since this one learns dynamically. But it requires complicated programming.
Code: Select all
X = Xmean + Constant * sqrt(log(parent_visits) / child_visits)
Select the node with the maximum X value. The second term, which stands in for the variance, is meant for a mean score 0 < X < 1, but I am sure it can be modified for our case. Anyway, even using simple proportioning based on mean node counts in the tree, this _dynamic_ adjustment should be better than any heuristic.
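Read literally, the selection step is just an argmax over the children. A tiny helper might look like this (names are illustrative; an infinite score for unvisited children forces everything to be sampled at least once):

```python
import math

def ucb_score(mean, parent_visits, child_visits, c=1.0):
    # X = Xmean + Constant * sqrt(log(parent_visits) / child_visits)
    return mean + c * math.sqrt(math.log(parent_visits) / child_visits)

def select(children, parent_visits, c=1.0):
    """children: list of (mean, visits) pairs; returns the index with maximum X."""
    def score(pair):
        mean, visits = pair
        if visits == 0:
            return float("inf")  # unvisited children get priority
        return ucb_score(mean, parent_visits, visits, c)
    return max(range(len(children)), key=lambda i: score(children[i]))
```

Note how a rarely visited child can outscore a better-looking but heavily visited one: that is the exploration term doing its job.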
Here is the idea for this kickass algorithm. You can say it is UCT for perft, but I haven't actually implemented it, so it could all be for nothing if I made some mistake in planning somewhere.
Code: Select all
a) Start from the root and do 10 Monte Carlo simulations for each move.
b) Pick the best node to expand using the UCB formula I provided above.
c) Expand the selected node, which, with the right formula, will be one of the biggest nodes.
d) Do this recursively for the children of that big node.
So this algorithm explores the bigger nodes more often, while the smaller nodes get their estimates from a small number of simulations. No need for a heuristic, and no need for fixed-depth expansion, unlike what we are doing now, i.e. plies 3-6 fixed with uniform sampling.
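A rough sketch of steps a)-d) on a toy tree with a fixed branching factor per ply (so the exact perft is known: 4*3*2 = 24). The node layout, backup rule, and all names here are my assumptions about how the plan could be realized, not a finished implementation:

```python
import math

# Toy game: fixed branching per ply, so the exact perft(3) is 4*3*2 = 24.
BRANCH = [4, 3, 2]

class Node:
    def __init__(self, depth):
        self.depth = depth
        self.visits = 0
        self.mean = 0.0       # running average of rollout estimates from this node
        self.children = None  # expanded lazily

def rollout(depth):
    # Uniform MC estimate: product of move counts along a path to a leaf.
    # (In a real game you would play random moves; here every path is identical.)
    est = 1
    while depth < len(BRANCH):
        est *= BRANCH[depth]
        depth += 1
    return est

def ucb(child, parent_visits, c=1.0):
    if child.visits == 0:
        return float("inf")
    return child.mean + c * math.sqrt(math.log(parent_visits) / child.visits)

def simulate(node, c=1.0, n_init=10):
    """One iteration: descend by UCB, expand the frontier, seed children with rollouts."""
    node.visits += 1
    if node.depth >= len(BRANCH):
        return  # leaf position: nothing to refine
    if node.children is None:
        # step a): expand and seed each child with n_init MC simulations
        node.children = [Node(node.depth + 1) for _ in range(BRANCH[node.depth])]
        for ch in node.children:
            for _ in range(n_init):
                ch.visits += 1
                ch.mean += (rollout(ch.depth) - ch.mean) / ch.visits
        return
    # steps b)-d): pick the child with the highest UCB score and recurse into it
    best = max(node.children, key=lambda ch: ucb(ch, node.visits, c))
    simulate(best, c, n_init)

def estimate(node):
    # perft(node) = sum of perft over children; unexpanded nodes use their MC mean
    if node.children is None:
        return node.mean
    return sum(estimate(ch) for ch in node.children)
```

Running, say, 50 iterations of `simulate(root)` and reading `estimate(root)` gives the perft estimate; bigger subtrees attract more visits because their node counts dominate the first term of the UCB score. In a real engine the means are huge compared with the exploration term, so the constant (or a normalization of the means) would need tuning, as noted above for the 0 < X < 1 assumption.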
If I had to write out my imagination, this algorithm would be so good as to be the best. But I could be wrong somewhere. Opinions?