Well what you describe below is exactly UCT with Michel's formula AFAIAC. I had difficulty testing his formula because it lacked
the "visits" count for a node. Your step (3) decides if to split a node without comparing it to other sibling
nodes. To achieve that you needed more heuristic since you do not compare with other siblings to pick the best.
You just decide if a node is worth splitting by itself.
The basic idea of UCT tree search over plain MC (without tree) is to investigate (split) nodes worth splitting i.e bigger nodes for perft.
So when you do that you gain much of what UCT is meant for. The tree will dynamically grow towards the bigger nodes f.i in an LMR tree.
It will grow towards the best looking move for tree search. There are many methods to control how aggressively you want to control the growth of the tree..
I will go through your steps one by one:
My algorithm is quite similar to UCT but it is not exactly like UCT. Here is what I do:
1. Start with an in-memory tree with only one node representing the root position.
I start with the root node expanded (20 moves) but that is no difference. For a UCT search, you
have to return a move to play eventually so it is usual to start with a splitted root node.
2. Do a random walk starting from the root, until a leaf node is reached. The probability controlling which child to visit are calculated from data in the child nodes. This probably causes some cache misses, but as a compensation I store moves in the tree nodes too, so I don't have to call the move generator until I reach a leaf node.
This is the monte carlo part. A light MC visits all children uniformly while a heavy MC use different heuristics that usually make it slower but could be better overall.
BTW "heavy" and "light" is not my terminology. It is an accepted terminology in the GO programming field.
3. When a leaf node is reached, I decide if it should be split. A node is split if there are fewer than 4M nodes in the tree, the node has been visited at least 20 times, the remaining depth is > 1, and the "cost" for this node is at least 90% of the largest cost for a leaf node seen during the last 1000 random walks. "cost" is defined as sqrt(mean^2 + variance).
This is the formula part that you do differently. You do not compare to other siblings.
-> Your first criterion 4M nodes is to not run out of memory. There are two solution for this when used for tree search (a) Store less visited part of the tree on disk
(b) Prune parts of the tree. The latter is not applicable for perft because we need exact estimates.
-> Your second criterion UCT has it too. Mine is I think at 1000 right now. Lowering that should expand the tree more.
-> Your third criterion is Michel's formula a heuristic to have an idea of the cost of the node sqrt(mean^2 + variance)
4. If a leaf node is split, child nodes are created for all legal moves.
Same as UCT and a major component because you decide here to visit the bigger nodes. The tree
will be guided to the bigger nodes.
5. If a leaf node is not split, I continue with a uniform probability random walk until the "perft" depth is reached. No hashing or tree node lookup is performed in this step. Just move generation and make/unmake moves.
Same. A light MC search. If you start using your previous static heuristic for captures or the HT method, it will become a heavy MC.
6. As the recursion unwinds, the mean and variance estimates are updated for all visited in-memory tree nodes.
Same. I do exactly that and is easy for perft. But this back propagation step is more difficult
for tree search. Remi coulom had a complex Backup operator before UCT was invented.
7. A single perft estimate computed from the just completed random walk is logged. If 1000 such values have been accumulated, print the average to standard output.
8. Go back to step 2.
I optimized my implementation a bit, and now it takes around 57 minutes on a 2.4 GHz Core2 CPU (using only one core) to generate 100e6 tree walks for perft(12). The code currently spends over 60% of the CPU time in a function that removes illegal moves from a pseudo-legal move list. I think this function can be optimized quite a lot though. Currently it just plays all moves and tests which moves leave the king in check.
The source code is here:
http://web.comhem.se/petero2home/PerfTE ... 4.java.txt. If you actually want to run it, you also need BitBoard.java from CuckooChess.
I think comparison by number of simulations should be fine. I was under the impression that it was the HT version you tested, in which case time tests would be more appropriate.