Michel wrote: Ok, for those jumping into this thread now.
I think that Daniel's and Peter's dynamic adjustment algorithms are very similar.
The differences are:
(1) Peter also keeps track of the variance (in principle this should allow
for more accurate weight selection).
(2) Peter uses a hash table to keep statistics over the whole tree.
Daniel intends to keep statistics close to the root (for the moment only
at the root). Since in Peter's replacement strategy nodes with the biggest
subtree are kept, in the limit both approaches will presumably converge
to the same thing.
(3) Peter also uses a "fullDepth" search. Perhaps this is a waste of
resources? It is more or less equivalent to a uniform MC search
for fullDepth plies, and we are trying to get rid of uniform searches.
Oh no, Michel. They are very, very different algorithms, and here are _some_ reasons why.
Basically, Peter is trying to tune the wrong part of the tree, i.e. the MC part. He also still tries to use those weights and multipliers.
I have none of those because I am working on a tree that I keep in memory.
UCT is a very powerful tree-search algorithm whose result converges to that of
alpha-beta given an infinite number of simulations.
a) For each node I keep the following:
- the average perft value
- the number of visits of the parent and of each child
{also RAVE (history-heuristic counts), but these are not necessary for perft, at least not yet.}
The standard deviation is _indirectly_ accounted for through the visit counts of the parent and child nodes: more-visited nodes
will have lower variance. So it is there, if you look at the second term in the formula I gave a couple of pages ago.
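The bookkeeping in a) amounts to a visit count plus a running average per node; no per-sample storage and no explicit variance are needed. A minimal sketch (the class and names are mine, purely illustrative):

```python
class PerftNode:
    """Per-node statistics: visit count and running mean of perft samples."""
    def __init__(self):
        self.visits = 0
        self.mean = 0.0

    def add_sample(self, perft_estimate):
        # incremental running mean -- individual samples need not be stored
        self.visits += 1
        self.mean += (perft_estimate - self.mean) / self.visits
```

The uncertainty shows up implicitly: the exploration term of the selection formula shrinks as `visits` grows, which is exactly the "more-visited nodes have lower variance" effect.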
b) Peter is trying to hash the Monte Carlo part. This is very wrong.
The MC part has to be fast and should just use a simple heuristic, like reducing capture weights or anything similar
that pops into our minds. It doesn't matter much whether it works or not.
By the way, there is an equivalent implementation of UCT done using hash tables.
Again, it stores only nodes that are expanded from the root.
c) UCT is a dynamic learner. It knows which nodes to expand dynamically and forms a very selective tree.
So it can go very, very deep with a low exploration coefficient. This is absent from Peter's method because, again,
he is doing the MC part only; the upper tree still uses uniform-sampling MC.
d) Keeping all the first-3-ply nodes is unnecessary, because UCT perft gets it much more correctly,
as in the case of LMR perft. Within a few iterations the first move will be expanded deeply enough to give good estimates. Wait till I am done with it.
e) I can seed initial learning in both the MC and upper-tree parts with whatever heuristic.
For example, RAVE (Rapid Action Value Estimation) is a history-heuristic-type algorithm that
tries to estimate the score of moves. In the perft case it is probably not applicable, but there is a capability
to introduce any prior knowledge as a bias in the selection formula (the second term of the equation), in both parts of the tree (upper and MC). The MC part should be simple,
and as such should not use hash tables at all.
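As one concrete reading of e): a prior score could enter the selection formula as an extra additive term that decays as real visits accumulate. The shape below is only a hypothetical sketch (the weight and the decay schedule are made up, not from this thread):

```python
import math

def biased_ucb(mean, parent_visits, child_visits, prior, prior_weight=0.5, c=1.0):
    """UCB score with an additive prior-knowledge bias (illustrative only)."""
    exploration = c * math.sqrt(math.log(parent_visits) / child_visits)
    # the prior dominates early and fades as real evidence (visits) accumulates
    bias = prior_weight * prior / (1.0 + child_visits)
    return mean + exploration + bias
```

With `prior = 0` this reduces to the plain formula, so any heuristic (RAVE scores, move ordering, etc.) can be plugged in without changing the rest of the search.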
Added after looking at your second post:
f) I do 10 simulations for each node first and store the perft counts. So I already have them stored and averaged at each node; I don't need any heuristic to estimate tree size.
It is already there from doing a few simulations before exploration starts. You then balance exploration and exploitation by using a formula. The constant I have been raising is there to get more exploration of the other nodes as well.
I even explained the whole thing to Peter, and to whoever read it, before you guys started talking dynamic. But no one listens until you start showing results.
The reason why I wanted you to start from the root is that within the tree we are using _uniform_ sampling. One way to decrease the variance could be to use an upper confidence bound, saving the first 1-3 plies in memory. I mentioned the following formula a while back; it is the one used for UCT. I am sure it can be modified to give much better results than any heuristic, since this one learns dynamically. But it requires complicated programming.
Code: Select all
X = Xmean + Constant * sqrt(log(parent_visits) / child_visits)
Select the node with the maximum X value. The second term, which stands in for the variance, is meant for a mean score 0 < X < 1, but I am sure it can be modified for our case. Anyway, even using simple proportioning based on mean node counts in the tree, this _dynamic_ adjustment should be better than any heuristic.
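Read literally, the selection step is just an argmax over the children. A tiny helper might look like this (names are illustrative; an infinite score for unvisited children forces everything to be sampled at least once):

```python
import math

def ucb_score(mean, parent_visits, child_visits, c=1.0):
    # X = Xmean + Constant * sqrt(log(parent_visits) / child_visits)
    return mean + c * math.sqrt(math.log(parent_visits) / child_visits)

def select(children, parent_visits, c=1.0):
    """children: list of (mean, visits) pairs; returns the index with maximum X."""
    def score(pair):
        mean, visits = pair
        if visits == 0:
            return float("inf")  # unvisited children get priority
        return ucb_score(mean, parent_visits, visits, c)
    return max(range(len(children)), key=lambda i: score(children[i]))
```

Note how a rarely visited child can outscore a better-looking but heavily visited one: that is the exploration term doing its job.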
Here is the idea for this kickass algorithm. You can say it is UCT for perft, but I haven't actually implemented it, so it could all be for nothing if I made some mistake in planning somewhere.
Code: Select all
a) Start from the root and do 10 Monte Carlo simulations for each move.
b) Pick the best node to expand using the UCB formula I provided above.
c) Expand the selected node, which, with the right formula, will be one of the biggest nodes.
d) Do this recursively for the children of that big node.
So this algorithm explores the bigger nodes more often, while the smaller nodes get their estimates from a small number of simulations. No need for a heuristic, and no need for fixed-depth expansion, unlike what we are doing now, i.e. plies 3-6 fixed with uniform sampling.
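A rough sketch of steps a)-d) on a toy tree with a fixed branching factor per ply (so the exact perft is known: 4*3*2 = 24). The node layout, backup rule, and all names here are my assumptions about how the plan could be realized, not a finished implementation:

```python
import math

# Toy game: fixed branching per ply, so the exact perft(3) is 4*3*2 = 24.
BRANCH = [4, 3, 2]

class Node:
    def __init__(self, depth):
        self.depth = depth
        self.visits = 0
        self.mean = 0.0       # running average of rollout estimates from this node
        self.children = None  # expanded lazily

def rollout(depth):
    # Uniform MC estimate: product of move counts along a path to a leaf.
    # (In a real game you would play random moves; here every path is identical.)
    est = 1
    while depth < len(BRANCH):
        est *= BRANCH[depth]
        depth += 1
    return est

def ucb(child, parent_visits, c=1.0):
    if child.visits == 0:
        return float("inf")
    return child.mean + c * math.sqrt(math.log(parent_visits) / child.visits)

def simulate(node, c=1.0, n_init=10):
    """One iteration: descend by UCB, expand the frontier, seed children with rollouts."""
    node.visits += 1
    if node.depth >= len(BRANCH):
        return  # leaf position: nothing to refine
    if node.children is None:
        # step a): expand and seed each child with n_init MC simulations
        node.children = [Node(node.depth + 1) for _ in range(BRANCH[node.depth])]
        for ch in node.children:
            for _ in range(n_init):
                ch.visits += 1
                ch.mean += (rollout(ch.depth) - ch.mean) / ch.visits
        return
    # steps b)-d): pick the child with the highest UCB score and recurse into it
    best = max(node.children, key=lambda ch: ucb(ch, node.visits, c))
    simulate(best, c, n_init)

def estimate(node):
    # perft(node) = sum of perft over children; unexpanded nodes use their MC mean
    if node.children is None:
        return node.mean
    return sum(estimate(ch) for ch in node.children)
```

Running, say, 50 iterations of `simulate(root)` and reading `estimate(root)` gives the perft estimate; bigger subtrees attract more visits because their node counts dominate the first term of the UCB score. In a real engine the means are huge compared with the exploration term, so the constant (or a normalization of the means) would need tuning, as noted above for the 0 < X < 1 assumption.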
If I had to write out my imagination, this algorithm would be so good as to be the best. But I could be wrong somewhere. Opinions?