Dear all,
I tried implementing an MCTS search just for fun in a nine men's morris game (I deliberately chose something I have no experience with so that the nice property of MCTS - no expert knowledge on eval needed - could be used).
I read some sources on MCTS and have a working program now but I have some probably rather stupid questions - I couldn't really find that much information on "standard MCTS" (at least much less than could be found on traditional alpha beta tree search).
1) When I do MCTS, I can choose to expand only the most promising child of the leaf node I end up with in the exploration phase; or I can choose to expand all children. I have found both implementations mentioned in pseudocode in the net. I would expect expanding only the most promising child to be better (because not wasting time on other children) - do you have any thoughts on this?
2) Memory management: 9 men's morris has the ugly property of often having very few moves, and sometimes very many (jumps in the endgame + mill giving large numbers of options), sometimes just a single digit number, and in a few instances nearly 100. MCTS runs until it's out of memory, I think, and so I guess I should use a malloc'ed array for each node that I free after the search; also when I expand a node, and add just one child to a node, I would only need to allocate memory for that one child node, and could add necessary memory later for more child nodes?
That seems a bit a lot of malloc() and free() for my taste, but I guess it's not avoidable? Or is there a nice solution for this type of problem?
3) Transpositions: I haven't done this yet, but I assume that having a hashtable to recognize identical positions reached through different paths would be beneficial?
4) I read some of Daniel Shawul's posts on Scorpio MCTS - is the general feeling here that MCTS should use a min-max backup operator for a game like chess (and nine men's morris, or checkers or...)? When would using an averaging operator and when using a min-max operator be best? Is there a general rule of thumb on that?
5) UCT search has this constant which biases exploration against exploitation. Some people mention changing this during the search - is there any general feeling on this (if and when and by how much typically this parameter should be adjusted)?
best regards
Martin
MCTS beginner questions
Re: MCTS beginner questions
fierz wrote: 1) When I do MCTS, I can choose to expand only the most promising child of the leaf node I end up with in the exploration phase; or I can choose to expand all children. I have found both implementations mentioned in pseudocode in the net. I would expect expanding only the most promising child to be better (because not wasting time on other children) - do you have any thoughts on this?
This feels a bit of a contradictory statement at the start. You expand the leaf that has the highest upper confidence bound. There can only be one such node. If you're at a node which already has children, you have not finished iterating the UCT selection and you're not at a leaf.
Your problem is perhaps that in a node that has not been visited yet, you have no estimated win rate to calculate the UCB at the parent? This is the FPU (first play urgency) problem. There's no perfect solution for that. You can assign a win rate of 1.0, a draw, the parent node's value, or something even smarter. This is where additional knowledge/priors can help a lot.
fierz wrote: 2) Memory management: 9 men's morris has the ugly property of often having very few moves, and sometimes very many (jumps in the endgame + mill giving large numbers of options), sometimes just a single digit number, and in a few instances nearly 100. MCTS runs until it's out of memory, I think, and so I guess I should use a malloc'ed array for each node that I free after the search; also when I expand a node, and add just one child to a node, I would only need to allocate memory for that one child node, and could add necessary memory later for more child nodes?
That seems a bit a lot of malloc() and free() for my taste, but I guess it's not avoidable? Or is there a nice solution for this type of problem?
Not really. There are depth-first reformulations of MCTS, but AFAIK the paper is in Japanese.
It's also possible to store the tree implicitly in a transposition table, but this will involve more work at each node to recalculate the children.
You can allocate a large memory buffer for nodes and store indices instead of pointers. Aside from halving the pointer size on 64-bit systems, it will save you the malloc round trips.
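A minimal sketch of such an index-addressed node pool (all names are illustrative, not from any particular engine):

```c
#include <stdlib.h>

/* A node addressed by 32-bit indices into one big buffer;
   children of a node are stored contiguously, 0 means "none". */
typedef struct {
    double   wins;
    int      visits;
    unsigned first_child;
    unsigned num_children;
} Node;

typedef struct {
    Node    *nodes;
    unsigned used;
    unsigned capacity;
} NodePool;

int pool_init(NodePool *p, unsigned capacity)
{
    p->nodes = malloc(capacity * sizeof(Node));
    p->used = 1;                 /* reserve index 0 as the null index */
    p->capacity = capacity;
    return p->nodes != NULL;
}

/* Allocate n contiguous children; returns the index of the first,
   or 0 when the pool is full (the search then stops growing the tree). */
unsigned pool_alloc(NodePool *p, unsigned n)
{
    if (p->used + n > p->capacity)
        return 0;
    unsigned first = p->used;
    p->used += n;
    return first;
}

void pool_reset(NodePool *p) { p->used = 1; }  /* reuse, no free() needed */
```

One malloc at startup, one free at program exit; everything in between is a bump of `used`.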
fierz wrote: 3) Transpositions: I haven't done this yet, but I assume that having a hashtable to recognize identical positions reached through different paths would be beneficial?
Unclear.
Most literature seems to assume TTables for MCTS.
But it was removed from Leela Zero after we failed to prove any advantage for having a TTable. Note that it does recognize transpositions in the cache for evaluating neural networks.
fierz wrote: 4) I read some of Daniel Shawul's posts on Scorpio MCTS - is the general feeling here that MCTS should use a min-max backup operator for a game like chess (and nine men's morris, or checkers or...)? When would using an averaging operator and when using a min-max operator be best? Is there a general rule of thumb on that?
I think this is a misunderstanding. MCTS starts as an averaging operator and converges to mini-max in the limit. The speed with which this happens depends on the parametrisation and algorithmic tweaks.
fierz wrote: 5) UCT search has this constant which biases exploration against exploitation. Some people mention changing this during the search - is there any general feeling on this (if and when and by how much typically this parameter should be adjusted)?
I haven't tried this specifically, but probably there are various ways to tweak UCT that boil down to the same thing.
Re: MCTS beginner questions
fierz wrote: Dear all,
I tried implementing an MCTS search just for fun in a nine men's morris game (I deliberately chose something I have no experience with so that the nice property of MCTS - no expert knowledge on eval needed - could be used).
I read some sources on MCTS and have a working program now but I have some probably rather stupid questions - I couldn't really find that much information on "standard MCTS" (at least much less than could be found on traditional alpha beta tree search).
1) When I do MCTS, I can choose to expand only the most promising child of the leaf node I end up with in the exploration phase; or I can choose to expand all children. I have found both implementations mentioned in pseudocode in the net. I would expect expanding only the most promising child to be better (because not wasting time on other children) - do you have any thoughts on this?
You expand not the most promising child, but the node with the highest upper confidence bound, i.e. one calculated with a balance of exploitation (highest score) and exploration (lowest number of visits). Note that you create all children of the selected node during expansion, so that is maybe where the confusion is coming from.
fierz wrote: 2) Memory management: 9 men's morris has the ugly property of often having very few moves, and sometimes very many (jumps in the endgame + mill giving large numbers of options), sometimes just a single digit number, and in a few instances nearly 100. MCTS runs until it's out of memory, I think, and so I guess I should use a malloc'ed array for each node that I free after the search; also when I expand a node, and add just one child to a node, I would only need to allocate memory for that one child node, and could add necessary memory later for more child nodes?
That seems a bit a lot of malloc() and free() for my taste, but I guess it's not avoidable? Or is there a nice solution for this type of problem?
I keep a memory pool of nodes that will keep on growing but not shrinking. This avoids malloc and free calls.
fierz wrote: 3) Transpositions: I haven't done this yet, but I assume that having a hashtable to recognize identical positions reached through different paths would be beneficial?
A waste of time IMO; the complexity is not worth it.
fierz wrote: 4) I read some of Daniel Shawul's posts on Scorpio MCTS - is the general feeling here that MCTS should use a min-max backup operator for a game like chess (and nine men's morris, or checkers or...)? When would using an averaging operator and when using a min-max operator be best? Is there a general rule of thumb on that?
Minmax is superior to averaging for tactical games like chess. That is the sole reason why MCTS has not been so successful in chess compared to Go. Minmax and averaging backup operators both eventually converge to minmax, but the rate of convergence is hugely different.
fierz wrote: 5) UCT search has this constant which biases exploration against exploitation. Some people mention changing this during the search - is there any general feeling on this (if and when and by how much typically this parameter should be adjusted)?
I decrease the exploration coefficient as we approach the end of the allotted time, say from 0.3 to 0.05, so that I explore more during the initial stages and exploit whatever PV we have found during the later stages.
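As a sketch, a linear version of such a schedule might look like this (only the 0.3 to 0.05 range comes from the description above; the linear shape and the names are assumptions for illustration):

```c
/* Exploration coefficient decayed linearly over the time budget.
   time_used and time_budget can be in any consistent unit. */
double exploration_coefficient(double time_used, double time_budget)
{
    const double c_start = 0.30;   /* explore broadly early on */
    const double c_end   = 0.05;   /* exploit the PV near the end */
    double frac = time_used / time_budget;
    if (frac < 0.0) frac = 0.0;    /* clamp to [0, 1] */
    if (frac > 1.0) frac = 1.0;
    return c_start + (c_end - c_start) * frac;
}
```

The returned value would simply replace the constant c in the UCB term on every selection step.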
Cheers,
Daniel
Re: MCTS beginner questions
Gian-Carlo Pascutto wrote: This feels a bit of a contradictory statement at the start. You expand the leaf that has the highest upper confidence bound. There can only be one such node. If you're at a node which already has children, you have not finished iterating the UCT selection and you're not at a leaf.
I seem to have written that paragraph in a confusing way. I select a single node to expand, but then I can either do a single rollout from that node, or one rollout for each of its children, effectively exploring all possible continuations from there and perhaps getting better statistics (or perhaps not...).
Gian-Carlo Pascutto wrote: You can allocate a large memory buffer for nodes and store indices instead of pointers. Aside from saving half the size on 64-bits, it will save you the malloc round trips.
I thought about this - in my normal alphabeta engines I have such a large memory buffer, for example for the move list. But I still have a problem with this, namely what happens on the next search. Let's say I do my very first search, and I keep a list of indices which belong to each node. To make an example, node 1 would have indices 1-15, node 2 indices 16-18, node 3 indices 19-44 and so on. This would obviously work wonderfully.
But now the engine is finished searching, and a new search 2 ply later in the game will be made (if it is playing against an opponent). From what I read on MCTS, one is supposed to keep the tree of the last search in memory and not throw it away, because that would be a waste of good data. Let's say that nodes 1 and 3 still belong to the new starting position, whereas node 2 was for a position that can no longer be reached. So I would want to keep my indices 1-15 and 19-44 but free up the memory from 16-18, which makes a mess because I would need to find another position with exactly 3 legal moves to fit in there. How do I solve this? Do I just trash the entire tree from the last search and start again?
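One conceivable fix - a sketch of my own, not something confirmed in this thread - is to copy the surviving subtree into a second buffer after each move and swap the buffers, so the kept nodes are compacted to the front and the bump-pointer allocation stays simple:

```c
/* Node layout assumed here (illustrative): children of a node are stored
   contiguously in the pool, addressed by index; index 0 means "none". */
typedef struct {
    double   wins;
    int      visits;
    unsigned first_child;
    unsigned num_children;
} Node;

/* Recursively copy the children of src[s] into dst, keeping each set of
   siblings contiguous; *used is the bump pointer into dst. */
static void copy_children(const Node *src, unsigned s,
                          Node *dst, unsigned d, unsigned *used)
{
    unsigned n = src[s].num_children;
    if (n == 0) { dst[d].first_child = 0; return; }
    unsigned first = *used;          /* reserve one contiguous sibling run */
    *used += n;
    dst[d].first_child = first;
    for (unsigned i = 0; i < n; i++) {
        dst[first + i] = src[src[s].first_child + i];
        copy_children(src, src[s].first_child + i, dst, first + i, used);
    }
}

/* Copy the subtree rooted at new_root into dst (start with *used = 1,
   index 0 reserved as null); returns the new index of that root. */
unsigned reuse_subtree(const Node *src, unsigned new_root,
                       Node *dst, unsigned *used)
{
    unsigned d = (*used)++;
    dst[d] = src[new_root];
    copy_children(src, new_root, dst, d, used);
    return d;
}
```

This costs one pass over the kept subtree per move played, which should be cheap compared to the search itself.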
Gian-Carlo Pascutto wrote: I think this is a misunderstanding. MCTS starts as an averaging operator and converges to mini-max in the limit. The speed with which this happens depends on the parametrisation and algorithmic tweaks.
I think Daniel was very clear on this, and it's not a misunderstanding. I can choose to average the value of the node based on the results of all children, or I can back up the value of the best child in minimax/negamax style. Of course, you would expect MCTS to focus on the best child in the long run anyway, but perhaps this takes more time to converge than if you back up the best child.
To me backing up the best child makes a lot of sense - if you have one move that scores 9/10 and the other scores 1/10, why would you back up a value of 0.5?
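For concreteness, the two backup operators applied to that 9/10 vs 1/10 example can be sketched like this (illustrative code with made-up names, not from Scorpio or any other engine); averaging yields 0.5, while the minimax-style backup yields 0.9:

```c
/* Both operators over the same child statistics, stored from the parent's
   point of view: wins[i] out of visits[i] for move i. */

/* Averaging: parent value = total wins over total visits of all children. */
double backup_average(const double *wins, const int *visits, int n)
{
    double w = 0.0;
    int v = 0;
    for (int i = 0; i < n; i++) { w += wins[i]; v += visits[i]; }
    return v ? w / v : 0.5;
}

/* Minimax-style: parent value = win rate of the best visited child. */
double backup_minimax(const double *wins, const int *visits, int n)
{
    double best = 0.0;
    for (int i = 0; i < n; i++)
        if (visits[i] > 0 && wins[i] / visits[i] > best)
            best = wins[i] / visits[i];
    return best;
}
```

In negamax form one would flip the child values with 1.0 - x at each ply; that is left out here for clarity.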
Re: MCTS beginner questions
fierz wrote: To me backing up the best child makes a lot of sense - if you have one move that scores 9/10 and the other scores 1/10, why would you back up a value of 0.5?
It depends on the confidence in those values, right?
Note that on the next visit, you will explore the 9/10 one again, and you're backing up 0.66 already. The next visit will back up 0.7, etc. Also, if you had one node at 9/10, why did you expand another, instead of deepening the 9/10 one and backing up 9/10? I expect that in a practically tuned engine, you won't run into that scenario much because you will exploit the 9/10 arm a few times.
I tried forgetful approaches (i.e. ones that track the latest values much faster) but never got an improvement, at least for Go.
Re: MCTS beginner questions
Gian-Carlo Pascutto wrote: It depends on the confidence in those values, right?
Note that on the next visit, you will explore the 9/10 one again, and you're backing up 0.66 already. The next visit will back up 0.7, etc. Also, if you had one node at 9/10, why did you expand another, instead of deepening the 9/10 one and backing up 9/10? I expect that in a practically tuned engine, you won't run into that scenario much because you will exploit the 9/10 arm a few times.
I tried forgetful approaches (so which track the latest values much faster) but never got an improvement, at least for Go.
My math would make the 9/10 go to 10/11, and I would back up 11/21 = 0.524 - but of course I agree that it's unlikely that you have 1/10 and 9/10. I could imagine, though, that you are looking at some line that looks good for, say, white, because it has exactly one refutation where black has to make all the right moves. As long as you haven't found this line, you will be looking at all kinds of moves where white is winning all the time, until you hit on the refutation; then you need to visit it a lot until it is recognized. If you could back up minimax-style values, you would back up the "correct" value much faster.
At least my nine men's morris engine seems to suffer from blindness to forced wins... but perhaps something else is wrong!
cheers
Martin
Re: MCTS beginner questions
Gian-Carlo Pascutto wrote: ...
Most literature seems to assume TTables for MCTS.
But it was removed from Leela Zero after we failed to prove any advantage for having a TTable. Note that it does recognize transpositions in the cache for evaluating neural networks.
...
Not really an MCTS question, but let me ask here: how exactly do you upgrade from one NN to a bigger one? I mean, how is the 'learning' (weight adjustment) from all the training games with the previous NN kept when you start using the new one? Why should this be better/faster long term than just starting with a large NN?
Re: MCTS beginner questions
jp wrote: Not really an MCTS question, but let me ask here: how exactly do you upgrade from one NN to a bigger one? I mean, how is the 'learning' (weight adjustment) from all the training games with the previous NN kept when you start using the new one? Why should this be better/faster long term than just starting with a large NN?
Look for net2net, which is used in Leela Go and Chess.
Re: MCTS beginner questions
brianr wrote: Look for net2net which is used in Leela Go and Chess
I don't think they are following Net2Net; rather, they are taking the training game history and fitting the new net to it.
I believe that the benefit of starting with smaller nets and expanding as needed is that you get more training data that way. 800 playouts on a 6x128 net goes a lot faster than 800 playouts on a 40x128 net.
Re: MCTS beginner questions
brianr wrote: Look for net2net which is used in Leela Go and Chess
Thanks. I don't see much on net2net, but it sounds like it's using the old NN to train the new one. Like supervised learning, is it?
Robert Pope wrote: I don't think they are following Net2Net, but they are taking the training game history and fitting the new net to them.
Thanks. What do you mean by "training game history"? I was guessing it's not the actual games themselves, as that sounds just like retraining from scratch with the new NN, although with a ready-made stash of games.