CheckersGuy wrote:
As for tuning the "regular" evaluation parameters, I don't quite understand what you mean by "using mcts to train eval parameters".

syzygy wrote:
Run random simulations to get some sort of "averaged" estimation of the evaluation, the idea being that the tactics of the position are washed out sufficiently by the nature of mcts. Then tune the eval weights to better predict that washed-out, tactic-free estimation.

Milos wrote:
Alpha0 NN has only tactics of depth 8 encoded in weights and it works.

syzygy wrote:
I think this is more or less what Stéphane is now working on.

The article on Wikipedia actually describes this fairly well: https://en.wikipedia.org/wiki/Monte_Carlo_tree_search
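Syzygy's idea above can be sketched in a few lines: average many random playouts to get a "washed-out", tactic-free target value, then nudge linear eval weights so the static eval predicts that target. Everything here is a toy illustration, not any engine's actual API; `playout`, the feature vector, and the learning rate are all assumptions.

```python
import random  # a real playout function would use this for random move choice

def rollout_value(position, playout, n=100):
    """Average the outcomes of n random playouts from `position`.

    `playout` is a hypothetical callable: it plays random moves from the
    position to the end and returns the game result as a number.
    """
    return sum(playout(position) for _ in range(n)) / n

def tune_step(weights, features, target, lr=0.001):
    """One gradient step pulling the linear eval w.f toward the rollout target."""
    prediction = sum(w * f for w, f in zip(weights, features))
    error = target - prediction
    # Move each weight in the direction that reduces the prediction error.
    return [w + lr * error * f for w, f in zip(weights, features)]
```

Repeating `tune_step` over many positions, each labelled by `rollout_value`, is the "tune the eval weights to better predict that washed-out estimation" step.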
I don't really expect it to work, but then I wouldn't have expected AlphaZero Chess to work, either.
MCTS should be used for playing, not training.
Here is what could actually work.
Instead of alpha-beta, use the same MCTS as in Alpha0.
When you reach a leaf node, instead of only calling the eval, perform an optimized MultiPV depth-8 search + QS (over the moves that wouldn't be cut by LMR) and return the score. Transform the best move's score into v, and the ratio of the move scores into a probability for each move to be played, all using some linear function.
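The score-to-(v, policy) mapping is only described as "some linear function" above, so the concrete choice here is an assumption: a sigmoid on the best centipawn score for the value and a softmax over the MultiPV scores for the move probabilities, with made-up scale constants.

```python
import math

def value_from_cp(best_cp, scale=400.0):
    """Map the best move's centipawn score to v in (-1, 1).

    Uses the familiar logistic pawn-to-winning-chances curve; the scale of
    400 cp is an assumption, not anything from Alpha0 or SF.
    """
    return 2.0 / (1.0 + 10.0 ** (-best_cp / scale)) - 1.0

def policy_from_cp(move_cps, temp=100.0):
    """Turn a list of MultiPV centipawn scores into move probabilities.

    Softmax with an assumed temperature: better-scoring moves get higher
    probability, and the probabilities sum to 1.
    """
    exps = [math.exp(cp / temp) for cp in move_cps]
    total = sum(exps)
    return [e / total for e in exps]
```

With these, the MultiPV+QS probe plays exactly the role of the NN in Alpha0's MCTS: one call returns both a value and a move prior for the leaf.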
If you could, for example, perform this MultiPV depth-8 search + QS in 10 ms, then on a 64-core machine you'd have 6,400 MCTS evaluations per second. That is still quite shy of the 80k performed by Alpha0, but with a more powerful machine you'd get there.
And despite all the hype about Alpha0's NN eval, I really doubt it is better than a depth-8 MultiPV search + QS of SF.
If I now understand the AlphaZero MCTS correctly, it is actually a sort of book-builder search. Build a tree by expanding leaf nodes and back up the results to the root, using some rule to select the next leaf node to be expanded.
If this works well with SF's alpha-beta search and evaluation, that would be very funny.
Basically, MCTS consists of four steps:
1. Selection
2. Expansion
3. Simulation
4. Backpropagation (backing the result up to the root, as described above)
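The selection/expansion/simulation steps listed above, plus the backing-up of results to the root that syzygy describes, fit in one loop. This is a toy sketch on an abstract game interface — `legal_moves`, `play`, `is_terminal`, and `result` are assumed methods, not from any real engine — with UCT selection and a random playout as the simulation.

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.visits = 0
        self.wins = 0.0

def uct(child, parent_visits, c=1.4):
    """Classic UCT: exploitation term plus exploration bonus."""
    if child.visits == 0:
        return float("inf")   # always try unvisited children first
    return (child.wins / child.visits
            + c * math.sqrt(math.log(parent_visits) / child.visits))

def mcts_iteration(root):
    # 1. Selection: descend by UCT until we reach a leaf of the tree.
    node = root
    while node.children and not node.state.is_terminal():
        node = max(node.children, key=lambda ch: uct(ch, node.visits))
    # 2. Expansion: add all children of a non-terminal leaf, pick one.
    if not node.state.is_terminal():
        for mv in node.state.legal_moves():
            node.children.append(Node(node.state.play(mv), parent=node))
        node = random.choice(node.children)
    # 3. Simulation: random playout from the chosen node to the end.
    state = node.state
    while not state.is_terminal():
        state = state.play(random.choice(state.legal_moves()))
    value = state.result()
    # 4. Backpropagation: carry the playout result up to the root.
    while node is not None:
        node.visits += 1
        node.wins += value
        node = node.parent
```

Milos's variant above replaces step 3 with the MultiPV depth-8 search + QS probe, so the playout is swapped for a shallow alpha-beta evaluation while the rest of the loop stays the same.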