Dear all,
I tried implementing an MCTS search just for fun in a nine men's morris game (I deliberately chose something I have no experience with so that the nice property of MCTS - no expert knowledge on eval needed - could be used).
I read some sources on MCTS and have a working program now but I have some probably rather stupid questions - I couldn't really find that much information on "standard MCTS" (at least much less than could be found on traditional alpha beta tree search).
1) When I do MCTS, I can choose to expand only the most promising child of the leaf node I end up with in the exploration phase; or I can choose to expand all children. I have found both implementations mentioned in pseudocode in the net. I would expect expanding only the most promising child to be better (because not wasting time on other children) - do you have any thoughts on this?
2) Memory management: 9 men's morris has the ugly property of often having very few moves, and sometimes very many (jumps in the endgame + mill giving large numbers of options), sometimes just a single digit number, and in a few instances nearly 100. MCTS runs until it's out of memory, I think, and so I guess I should use a malloc'ed array for each node that I free after the search; also when I expand a node, and add just one child to a node, I would only need to allocate memory for that one child node, and could add necessary memory later for more child nodes?
That seems a bit a lot of malloc() and free() for my taste, but I guess it's not avoidable? Or is there a nice solution for this type of problem?
3) Transpositions: I haven't done this yet, but I assume that having a hashtable to recognize identical positions reached through different paths would be beneficial?
4) I read some of Daniel Shawul's posts on Scorpio MCTS - is the general feeling here that MCTS should use a min-max backup operator for a game like chess (and nine men's morris, or checkers or...)? When would using an averaging operator and when using a min-max operator be best? Is there a general rule of thumb on that?
5) UCT search has this constant which biases exploration against exploitation. Some people mention changing this during the search - is there any general feeling on this (if and when and by how much typically this parameter should be adjusted)?
best regards
Martin
MCTS beginner questions
Re: MCTS beginner questions
fierz wrote: 1) When I do MCTS, I can choose to expand only the most promising child of the leaf node I end up with in the exploration phase; or I can choose to expand all children. I have found both implementations mentioned in pseudocode in the net. I would expect expanding only the most promising child to be better (because not wasting time on other children) - do you have any thoughts on this?
This feels a bit of a contradictory statement at the start. You expand the leaf that has the highest upper confidence bound. There can only be one such node. If you're at a node which already has children, you have not finished iterating the UCT selection and you're not at a leaf.
Your problem is perhaps that in a node that has not been visited yet, you have no estimated win rate to calculate the UCB at the parent? This is the FPU (first play urgency) problem. There's no perfect solution for that. You can assign a win rate of 1.0, a draw, the parent node's value, or something even smarter. This is where additional knowledge/priors can help a lot.
fierz wrote: 2) Memory management: 9 men's morris has the ugly property of often having very few moves, and sometimes very many (jumps in the endgame + mill giving large numbers of options), sometimes just a single digit number, and in a few instances nearly 100. MCTS runs until it's out of memory, I think, and so I guess I should use a malloc'ed array for each node that I free after the search; also when I expand a node, and add just one child to a node, I would only need to allocate memory for that one child node, and could add necessary memory later for more child nodes?
That seems a bit a lot of malloc() and free() for my taste, but I guess it's not avoidable? Or is there a nice solution for this type of problem?
Not really. There are depth-first reformulations of MCTS, but AFAIK the paper is in Japanese.
It's also possible to store the tree implicitly in a transposition table, but this will involve more work at each node to recalculate the children.
You can allocate a large memory buffer for nodes and store indices instead of pointers. Aside from halving the pointer size on 64-bit systems, it will save you the malloc round trips.
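A minimal sketch of such an index-addressed node pool (all names are illustrative, not from any particular engine):

```c
#include <stdlib.h>

/* A node addressed by 32-bit indices into one big buffer;
   children of a node are stored contiguously, 0 means "none". */
typedef struct {
    double   wins;
    int      visits;
    unsigned first_child;
    unsigned num_children;
} Node;

typedef struct {
    Node    *nodes;
    unsigned used;
    unsigned capacity;
} NodePool;

int pool_init(NodePool *p, unsigned capacity)
{
    p->nodes = malloc(capacity * sizeof(Node));
    p->used = 1;                 /* reserve index 0 as the null index */
    p->capacity = capacity;
    return p->nodes != NULL;
}

/* Allocate n contiguous children; returns the index of the first,
   or 0 when the pool is full (the search then stops growing the tree). */
unsigned pool_alloc(NodePool *p, unsigned n)
{
    if (p->used + n > p->capacity)
        return 0;
    unsigned first = p->used;
    p->used += n;
    return first;
}

void pool_reset(NodePool *p) { p->used = 1; }  /* reuse, no free() needed */
```

One malloc at startup, one free at program exit; everything in between is a bump of `used`.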
fierz wrote: 3) Transpositions: I haven't done this yet, but I assume that having a hashtable to recognize identical positions reached through different paths would be beneficial?
Unclear.
Most literature seems to assume TTables for MCTS.
But it was removed from Leela Zero after we failed to prove any advantage for having a TTable. Note that it does recognize transpositions in the cache for evaluating neural networks.
fierz wrote: 4) I read some of Daniel Shawul's posts on Scorpio MCTS - is the general feeling here that MCTS should use a min-max backup operator for a game like chess (and nine men's morris, or checkers or...)? When would using an averaging operator and when using a min-max operator be best? Is there a general rule of thumb on that?
I think this is a misunderstanding. MCTS starts as an averaging operator and converges to mini-max in the limit. The speed with which this happens depends on the parametrisation and algorithmic tweaks.
fierz wrote: 5) UCT search has this constant which biases exploration against exploitation. Some people mention changing this during the search - is there any general feeling on this (if and when and by how much typically this parameter should be adjusted)?
I haven't tried this specifically, but probably there are various ways to tweak UCT that boil down to the same thing.
Re: MCTS beginner questions
fierz wrote: Dear all,
I tried implementing an MCTS search just for fun in a nine men's morris game (I deliberately chose something I have no experience with so that the nice property of MCTS - no expert knowledge on eval needed - could be used).
I read some sources on MCTS and have a working program now but I have some probably rather stupid questions - I couldn't really find that much information on "standard MCTS" (at least much less than could be found on traditional alpha beta tree search).
1) When I do MCTS, I can choose to expand only the most promising child of the leaf node I end up with in the exploration phase; or I can choose to expand all children. I have found both implementations mentioned in pseudocode in the net. I would expect expanding only the most promising child to be better (because not wasting time on other children) - do you have any thoughts on this?
You expand not the most promising child, but the node with the highest upper confidence bound, i.e. one calculated with a balance of exploitation (highest score) and exploration (lowest number of visits). Note that you create all children of the selected node during expansion, so that is maybe where the confusion is coming from.
fierz wrote: 2) Memory management: 9 men's morris has the ugly property of often having very few moves, and sometimes very many (jumps in the endgame + mill giving large numbers of options), sometimes just a single digit number, and in a few instances nearly 100. MCTS runs until it's out of memory, I think, and so I guess I should use a malloc'ed array for each node that I free after the search; also when I expand a node, and add just one child to a node, I would only need to allocate memory for that one child node, and could add necessary memory later for more child nodes?
That seems a bit a lot of malloc() and free() for my taste, but I guess it's not avoidable? Or is there a nice solution for this type of problem?
I keep a memory pool of nodes that will keep on growing but not shrinking. This avoids malloc and free calls.
fierz wrote: 3) Transpositions: I haven't done this yet, but I assume that having a hashtable to recognize identical positions reached through different paths would be beneficial?
A waste of time IMO; the complexity is not worth it.
fierz wrote: 4) I read some of Daniel Shawul's posts on Scorpio MCTS - is the general feeling here that MCTS should use a min-max backup operator for a game like chess (and nine men's morris, or checkers or...)? When would using an averaging operator and when using a min-max operator be best? Is there a general rule of thumb on that?
Minmax is superior to averaging for tactical games like chess. That is the sole reason why MCTS has not been so successful in chess compared to Go. Minmax and averaging backup operators both eventually converge to minmax, but the rate of convergence is hugely different.
fierz wrote: 5) UCT search has this constant which biases exploration against exploitation. Some people mention changing this during the search - is there any general feeling on this (if and when and by how much typically this parameter should be adjusted)?
I decrease the exploration coefficient as we approach the end of the allotted time, say from 0.3 to 0.05, so that I explore more during the initial stages and exploit whatever PV we have found during the later stages.
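As a sketch, a linear version of such a schedule might look like this (only the 0.3 to 0.05 range comes from the description above; the linear shape and the names are assumptions for illustration):

```c
/* Exploration coefficient decayed linearly over the time budget.
   time_used and time_budget can be in any consistent unit. */
double exploration_coefficient(double time_used, double time_budget)
{
    const double c_start = 0.30;   /* explore broadly early on */
    const double c_end   = 0.05;   /* exploit the PV near the end */
    double frac = time_used / time_budget;
    if (frac < 0.0) frac = 0.0;    /* clamp to [0, 1] */
    if (frac > 1.0) frac = 1.0;
    return c_start + (c_end - c_start) * frac;
}
```

The returned value would simply replace the constant c in the UCB term on every selection step.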
Cheers,
Daniel
Re: MCTS beginner questions
Gian-Carlo Pascutto wrote: This feels a bit of a contradictory statement at the start. You expand the leaf that has the highest upper confidence bound. There can only be one such node. If you're at a node which already has children, you have not finished iterating the UCT selection and you're not at a leaf.
I seem to have written that paragraph in a confusing way. I select a single node to expand, but then I can either do a single rollout from that node, or one rollout for each of its children, effectively exploring all possible continuations from there and perhaps getting better statistics (or perhaps not...).
Gian-Carlo Pascutto wrote: You can allocate a large memory buffer for nodes and store indices instead of pointers. Aside from saving half the size on 64-bits, it will save you the malloc round trips.
I thought about this - in my normal alphabeta engines I have such a large memory buffer, for example for the move list. But I still have a problem with this, namely what happens on the next search. Let's say I do my very first search, and I keep a list of indices which belong to each node. To make an example, node 1 would have indices 1-15, node 2 indices 16-18, node 3 indices 19-44 and so on. This would obviously work wonderfully.
But now the engine is finished searching, and a new search 2 ply later in the game will be made (if it is playing against an opponent). From what I read on MCTS, one is supposed to keep the tree of the last search in memory and not throw it away, because that would be a waste of good data. Let's say that nodes 1 and 3 still belong to the new starting position, whereas node 2 was for a position that can no longer be reached. So I would want to keep my indices 1-15 and 19-44 but free up the memory from 16-18, which makes a mess because I would need to find another position with exactly 3 legal moves to fit in there. How do I solve this? Do I just trash the entire tree from the last search and start again?
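One conceivable fix - a sketch of my own, not something confirmed in this thread - is to copy the surviving subtree into a second buffer after each move and swap the buffers, so the kept nodes are compacted to the front and the bump-pointer allocation stays simple:

```c
/* Node layout assumed here (illustrative): children of a node are stored
   contiguously in the pool, addressed by index; index 0 means "none". */
typedef struct {
    double   wins;
    int      visits;
    unsigned first_child;
    unsigned num_children;
} Node;

/* Recursively copy the children of src[s] into dst, keeping each set of
   siblings contiguous; *used is the bump pointer into dst. */
static void copy_children(const Node *src, unsigned s,
                          Node *dst, unsigned d, unsigned *used)
{
    unsigned n = src[s].num_children;
    if (n == 0) { dst[d].first_child = 0; return; }
    unsigned first = *used;          /* reserve one contiguous sibling run */
    *used += n;
    dst[d].first_child = first;
    for (unsigned i = 0; i < n; i++) {
        dst[first + i] = src[src[s].first_child + i];
        copy_children(src, src[s].first_child + i, dst, first + i, used);
    }
}

/* Copy the subtree rooted at new_root into dst (start with *used = 1,
   index 0 reserved as null); returns the new index of that root. */
unsigned reuse_subtree(const Node *src, unsigned new_root,
                       Node *dst, unsigned *used)
{
    unsigned d = (*used)++;
    dst[d] = src[new_root];
    copy_children(src, new_root, dst, d, used);
    return d;
}
```

This costs one pass over the kept subtree per move played, which should be cheap compared to the search itself.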
Gian-Carlo Pascutto wrote: I think this is a misunderstanding. MCTS starts as an averaging operator and converges to mini-max in the limit. The speed with which this happens depends on the parametrisation and algorithmic tweaks.
I think Daniel was very clear on this, and it's not a misunderstanding. I can choose to average the value of the node based on the results of all children, or I can back up the value of the best child in minimax/negamax style. Of course, you would expect MCTS to focus on the best child in the long run anyway, but perhaps this takes more time to converge than if you back up the best child.
To me backing up the best child makes a lot of sense - if you have one move that scores 9/10 and the other scores 1/10, why would you back up a value of 0.5?
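For concreteness, the two backup operators applied to that 9/10 vs 1/10 example can be sketched like this (illustrative code with made-up names, not from Scorpio or any other engine); averaging yields 0.5, while the minimax-style backup yields 0.9:

```c
/* Both operators over the same child statistics, stored from the parent's
   point of view: wins[i] out of visits[i] for move i. */

/* Averaging: parent value = total wins over total visits of all children. */
double backup_average(const double *wins, const int *visits, int n)
{
    double w = 0.0;
    int v = 0;
    for (int i = 0; i < n; i++) { w += wins[i]; v += visits[i]; }
    return v ? w / v : 0.5;
}

/* Minimax-style: parent value = win rate of the best visited child. */
double backup_minimax(const double *wins, const int *visits, int n)
{
    double best = 0.0;
    for (int i = 0; i < n; i++)
        if (visits[i] > 0 && wins[i] / visits[i] > best)
            best = wins[i] / visits[i];
    return best;
}
```

In negamax form one would flip the child values with 1.0 - x at each ply; that is left out here for clarity.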
Re: MCTS beginner questions
fierz wrote: To me backing up the best child makes a lot of sense - if you have one move that scores 9/10 and the other scores 1/10, why would you back up a value of 0.5?
It depends on the confidence in those values, right?
Note that on the next visit, you will explore the 9/10 one again, and you're backing up 0.66 already. The next visit will back up 0.7, etc. Also, if you had one node at 9/10, why did you expand another, instead of deepening the 9/10 one and backing up 9/10? I expect that in a practically tuned engine, you won't run into that scenario much because you will exploit the 9/10 arm a few times.
I tried forgetful approaches (i.e. ones that track the latest values much faster) but never got an improvement, at least for Go.
Re: MCTS beginner questions
Gian-Carlo Pascutto wrote: It depends on the confidence in those values, right?
Note that on the next visit, you will explore the 9/10 one again, and you're backing up 0.66 already. The next visit will back up 0.7, etc. Also, if you had one node at 9/10, why did you expand another, instead of deepening the 9/10 one and backing up 9/10? I expect that in a practically tuned engine, you won't run into that scenario much because you will exploit the 9/10 arm a few times.
I tried forgetful approaches (so which track the latest values much faster) but never got an improvement, at least for Go.
My math would make the 9/10 go to 10/11, and I would back up 11/21 = 0.524 - but of course I agree that it's unlikely that you have 1/10 and 9/10. I could imagine, though, that you are looking at some line that looks good for, say, white, because it has exactly one refutation where black has to make all the right moves. As long as you haven't found this line, you will be looking at all kinds of moves where white is winning all the time, until you hit on the refutation; then you need to visit it a lot until it is recognized. If you could back up minimax-style values, you would back up the "correct" value much faster.
At least my nine men's morris engine seems to suffer from blindness to forced wins... but perhaps something else is wrong!
cheers
Martin
Re: MCTS beginner questions
Gian-Carlo Pascutto wrote: ...
Most literature seems to assume TTables for MCTS.
But it was removed from Leela Zero after we failed to prove any advantage for having a TTable. Note that it does recognize transpositions in the cache for evaluating neural networks.
...
Not really an MCTS question, but let me ask here: how exactly do you upgrade from one NN to a bigger one? I mean, how is the 'learning' (weight adjustment) from all the training games with the previous NN kept when you start using the new one? Why should this be better/faster long term than just starting with a large NN?
Re: MCTS beginner questions
jp wrote: Not really an MCTS question, but let me ask here: how exactly do you upgrade from one NN to a bigger one? I mean, how is the 'learning' (weight adjustment) from all the training games with the previous NN kept when you start using the new one? Why should this be better/faster long term than just starting with a large NN?
Look for net2net, which is used in Leela Go and Chess.
Re: MCTS beginner questions
brianr wrote: Look for net2net which is used in Leela Go and Chess
I don't think they are following Net2Net; rather, they are taking the training game history and fitting the new net to it.
I believe that the benefit of starting with smaller nets and expanding as needed is that you get more training data that way. 800 playouts on a 6x128 net goes a lot faster than 800 playouts on a 40x128 net.
Re: MCTS beginner questions
brianr wrote: Look for net2net which is used in Leela Go and Chess
Thanks. I don't see much on net2net, but it sounds like it's using the old NN to train the new one. Like supervised learning, is it?
Robert Pope wrote: I don't think they are following Net2Net, but they are taking the training game history and fitting the new net to them.
Thanks. What do you mean by "training game history"? I was guessing it's not the actual games themselves, as that sounds just like retraining from scratch with the new NN, although with a ready-made stash of games.