MCTS vs Alpha-beta

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Dann Corbit, Harvey Williamson

Werewolf
Posts: 1348
Joined: Thu Sep 18, 2008 8:24 pm

MCTS vs Alpha-beta

Post by Werewolf » Sat Aug 15, 2020 4:23 pm

With Lc0 a lot of the attention went on the NN and how it was better than a hand-crafted Evaluation Function.

But I'm trying to understand the search side of things.
I have a basic, layman's, working knowledge of alpha-beta search in chess and virtually no real understanding of MCTS, which I think is used in Lc0 and Komodo MCTS?

Can I ask:
1) Where are details of how the search works, specific to chess?
2) What are the pros and cons of MCTS vs AB?
3) Does the Lc0 MCTS play the game out fully or just to a fixed point? If it is fixed, how deep?

I've heard MCTS is less able to handle tactics but I'm not sure why.

smatovic
Posts: 1644
Joined: Wed Mar 10, 2010 9:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic
Contact:

Re: MCTS vs Alpha-beta

Post by smatovic » Sat Aug 15, 2020 4:55 pm

Werewolf wrote:
Sat Aug 15, 2020 4:23 pm
With Lc0 a lot of the attention went on the NN and how it was better than a hand-crafted Evaluation Function.

But I'm trying to understand the search side of things.
I have a basic, layman's, working knowledge of alpha-beta search in chess and virtually no real understanding of MCTS, which I think is used in Lc0 and Komodo MCTS?

Can I ask:
1) Where are details of how the search works, specific to chess?
2) What are the pros and cons of MCTS vs AB?
3) Does the Lc0 MCTS play the game out fully or just to a fixed point? If it is fixed, how deep?

I've heard MCTS is less able to handle tactics but I'm not sure why.
https://www.chessprogramming.org/UCT#Ra ... an_et_al.
UCT has been shown to outperform traditional minimax based approaches in several challenging domains such as Go and KriegSpiel, although minimax search still prevails in other domains such as Chess. This work provides insights into the properties of adversarial search spaces that play a key role in the success or failure of UCT and similar sampling-based approaches. We show that certain "early loss" or "shallow trap" configurations, while unlikely in Go, occur surprisingly often in games like Chess (even in grandmaster games). We provide evidence that UCT, unlike minimax search, is unable to identify such traps in Chess and spends a great deal of time exploring much deeper game play than needed.
MCTS-UCT was used successfully in Go; MCTS-PUCT was used successfully in A0/Lc0 together with the NN policy/value heads.
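The difference between the two selection rules can be sketched like this (toy Python for illustration only, not any engine's actual code; the c values are placeholder constants):

```python
import math

# N = parent visit count, n = child visit count, q = child mean value,
# p = the NN policy prior for the child; c is an exploration constant.

def ucb1(q, n, N, c=1.414):
    # Classic UCT selection: the exploration term grows for
    # rarely visited children, independent of any prior.
    return q + c * math.sqrt(math.log(N) / n)

def puct(q, n, N, p, c=1.5):
    # A0/Lc0-style PUCT: the NN prior p biases exploration toward
    # moves the policy head likes, even before they are visited.
    return q + c * p * math.sqrt(N) / (1 + n)
```

The key practical difference: under PUCT an unvisited move with a near-zero prior gets almost no exploration bonus, whereas plain UCB1 treats all children alike.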

--
Srdja

dkappe
Posts: 817
Joined: Tue Aug 21, 2018 5:52 pm
Full name: Dietrich Kappe

Re: MCTS vs Alpha-beta

Post by dkappe » Sat Aug 15, 2020 4:56 pm

MCTS is an umbrella term. There are lots of similar algorithms grouped under it. Essentially, these are algorithms that sample the most promising parts of the search tree, rather than doing an exhaustive search.

The chess programming wiki has a reasonable overview of the topic. https://www.chessprogramming.org/UCT

In the future, MCTS will continue to be a viable option for those wanting to double down on expensive evaluation. Note that a0lite, searching just 400 nps using UCT, was able to scrounge a few points in the TCEC qualifying league against engines searching 7 orders of magnitude more nodes.

Cornfed
Posts: 169
Joined: Sun Apr 26, 2020 9:40 pm
Full name: Brian D. Smith

Re: MCTS vs Alpha-beta

Post by Cornfed » Sat Aug 15, 2020 5:29 pm

I find the only real strength (and it is important) in MCTS is for ANALYSIS - particularly for building up your repertoire for OTB play.

Being able to see just how narrow a path your opponent has to walk if you play 2nd/3rd/4th 'best' moves (compared to 'best' engine eval moves) is SO important. Practical play is what guides OTB games, not 'perfect analysis'. It is human vs human play and mistakes big and small rule the day there.

That is why I intend to get the next version of Komodo - for the MCTS it offers.

You can't really do the same by letting, say, regular or NN Stockfish look at 4 or 5 PVs. From everything I've read, MCTS is perfect for that. I do hope Team Komodo doesn't abandon it; I think there is a market for MCTS used for OTB prep.

DrCliche
Posts: 21
Joined: Sun Aug 19, 2018 8:57 pm
Full name: Nickolas Reynolds

Re: MCTS vs Alpha-beta

Post by DrCliche » Sat Aug 15, 2020 6:02 pm

A simplified explanation of how A0 and LC0 implement MCTS -- or, more accurately, pUCT -- goes something like this:
  • There is a list of everything that could be a move in any position. For example, if there were a queen on a1, it could move to h8, so a1-h8 is on that list, regardless of what the actual position in the current game is.
  • The "policy head" of the neural network assigns a score to every entry on that list. If the neural networks are doing their job well, "moves" that don't represent actual moves in the current position should be given a score of 0 by the neural network, though I believe this is also explicitly enforced externally. (As in, external to the neural network, all impossible moves are zeroed out.)
  • A "rollout" is performed starting from the highest scoring move on the list. The game tree is navigated by following the branch with the highest score at each node until an unexplored leaf is reached, and then the "value head" of the neural network assigns a value to that leaf.
  • This information is propagated up the game tree and scores are adjusted accordingly using some formula.
  • Any nodes that were visited have their scores adjusted by some formula so that they are less likely to be visited in the future. This forces exploration and is governed by a hyperparameter of the algorithm. (A human picks it.)
  • When thinking time runs out, the most visited node is selected as the move.
As to why this procedure is bad (relative to good AB search) at tactics, it's because many tactical sequences are quite deep, and tend to involve series of moves with features that are "generally bad", and as such are assigned low intrinsic weight by the policy head. For example, hanging a queen is generally bad, and the policy head will most likely rank any move that hangs a queen quite poorly.

If there's a good tactical sequence where enough intermediate positions have generally bad features -- all of which will likely be assigned low intrinsic weight -- then it's very possible that sequence will never be explored enough for the algorithm to realize the truth of the position.
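To make that concrete, here is a rough numeric illustration (toy numbers, not Lc0 internals) of how a tiny policy prior starves a move of visits under the PUCT selection rule q + c*p*sqrt(N_parent)/(1 + n_child):

```python
import math

def puct_score(q, prior, parent_visits, child_visits, c=1.5):
    # Standard A0-style PUCT score; c is an illustrative constant.
    return q + c * prior * math.sqrt(parent_visits) / (1 + child_visits)

# An unvisited "queen sac" the policy head dislikes (prior 0.005) vs. a
# quiet move it likes (prior 0.40) that already looks merely OK (q = 0.1):
sac = puct_score(q=0.0, prior=0.005, parent_visits=1000, child_visits=0)
quiet = puct_score(q=0.1, prior=0.40, parent_visits=1000, child_visits=50)

print(sac)    # the sac's exploration bonus stays tiny
print(quiet)  # the quiet move keeps winning the selection step
```

Even after 1000 visits to the parent, the low-prior move still scores below a mediocre quiet move, so the sacrifice line may simply never get searched deeply enough to reveal its point.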

h1a8
Posts: 453
Joined: Fri Jun 04, 2010 5:23 am

Re: MCTS vs Alpha-beta

Post by h1a8 » Sat Aug 15, 2020 8:20 pm

DrCliche wrote:
Sat Aug 15, 2020 6:02 pm
A simplified explanation of how A0 and LC0 implement MCTS -- or, more accurately, pUCT -- goes something like this:
  • There is a list of everything that could be a move in any position. For example, if there were a queen on a1, it could move to h8, so a1-h8 is on that list, regardless of what the actual position in the current game is.
  • The "policy head" of the neural network assigns a score to every entry on that list. If the neural networks are doing their job well, "moves" that don't represent actual moves in the current position should be given a score of 0 by the neural network, though I believe this is also explicitly enforced externally. (As in, external to the neural network, all impossible moves are zeroed out.)
  • A "rollout" is performed on the highest scoring move on the list. The game tree is navigated to a certain depth, following the branch with the highest score at each node, until the desired depth is reached, and then the "value head" of the neural network assigns a value to the terminal node.
  • This information is propagated up the game tree and scores are adjusted accordingly using some formula.
  • Any nodes that were visited have their scores adjusted by some formula so that they are less likely to be visited in the future. This forces exploration and is governed by a hyperparameter of the algorithm. (A human picks it.)
  • When thinking time runs out, the most visited node is selected as the move.
As to why this procedure is bad (relative to good AB search) at tactics, it's because many tactical sequences are quite deep, and tend to involve series of moves with features that are "generally bad", and as such are assigned low intrinsic weight by the policy head. For example, hanging a queen is generally bad, and the policy head will most likely rank any move that hangs a queen quite poorly.

If there's a good tactical sequence where enough intermediate positions have generally bad features -- all of which will likely be assigned low intrinsic weight -- then it's very possible that sequence will never be explored enough for the algorithm to realize the truth of the position.
Thank you. I have to absorb this a little longer to fully understand.
But aren't NN engines the ones hanging queens and making a lot of sacrifices for positional (not tactical) advantage?

So those nodes are visited the most because those moves are chosen.

dkappe
Posts: 817
Joined: Tue Aug 21, 2018 5:52 pm
Full name: Dietrich Kappe

Re: MCTS vs Alpha-beta

Post by dkappe » Sat Aug 15, 2020 9:07 pm

h1a8 wrote:
Sat Aug 15, 2020 8:20 pm

But aren't NN engines the ones hanging queens and making a lot of sacrifices for positional (not tactical) advantage?
You will find that small nets looking at hundreds or thousands of nodes per move will occasionally fall for simple tactics, but they don't just hang material.

Big nets on GPUs are looking at hundreds of thousands or millions of nodes per move. The kind of tactical oversight they fall for is the extremely deep stuff that only the latest Stockfish versions can find on monster hardware. Even those aren't that common and might happen to another AB engine that doesn't search as deep.

Certainly MCTS/NN engines will value dynamic elements of a position if they improve their chances of winning.
h1a8 wrote:
Sat Aug 15, 2020 8:20 pm

So those nodes are visited the most because those moves are chosen.
It’s the other way around. A move is played because it was visited the most.

Nay Lin Tun
Posts: 694
Joined: Mon Jan 16, 2012 5:34 am

Re: MCTS vs Alpha-beta

Post by Nay Lin Tun » Sun Aug 16, 2020 4:57 am

h1a8 wrote:
Sat Aug 15, 2020 8:20 pm
DrCliche wrote:
Sat Aug 15, 2020 6:02 pm
A simplified explanation of how A0 and LC0 implement MCTS -- or, more accurately, pUCT -- goes something like this:
  • There is a list of everything that could be a move in any position. For example, if there were a queen on a1, it could move to h8, so a1-h8 is on that list, regardless of what the actual position in the current game is.
  • The "policy head" of the neural network assigns a score to every entry on that list. If the neural networks are doing their job well, "moves" that don't represent actual moves in the current position should be given a score of 0 by the neural network, though I believe this is also explicitly enforced externally. (As in, external to the neural network, all impossible moves are zeroed out.)
  • A "rollout" is performed on the highest scoring move on the list. The game tree is navigated to a certain depth, following the branch with the highest score at each node, until the desired depth is reached, and then the "value head" of the neural network assigns a value to the terminal node.
  • This information is propagated up the game tree and scores are adjusted accordingly using some formula.
  • Any nodes that were visited have their scores adjusted by some formula so that they are less likely to be visited in the future. This forces exploration and is governed by a hyperparameter of the algorithm. (A human picks it.)
  • When thinking time runs out, the most visited node is selected as the move.
As to why this procedure is bad (relative to good AB search) at tactics, it's because many tactical sequences are quite deep, and tend to involve series of moves with features that are "generally bad", and as such are assigned low intrinsic weight by the policy head. For example, hanging a queen is generally bad, and the policy head will most likely rank any move that hangs a queen quite poorly.

If there's a good tactical sequence where enough intermediate positions have generally bad features -- all of which will likely be assigned low intrinsic weight -- then it's very possible that sequence will never be explored enough for the algorithm to realize the truth of the position.
Thank you. I have to absorb this a little longer to fully understand.
But aren't NN engines the ones hanging queens and making a lot of sacrifices for positional (not tactical) advantage?

So those nodes are visited the most because those moves are chosen.
Leela won't hang any material 99.99% of the time, even with zero search, i.e. 1 node (policy only). I don't know how and where you got those rumours from.

Leela with zero search plays around 2400 blitz on lichess (stronger than most lower-rated titled players: NMs, CMs, etc.).

https://lichess.org/@/Leela1Node

Werewolf
Posts: 1348
Joined: Thu Sep 18, 2008 8:24 pm

Re: MCTS vs Alpha-beta

Post by Werewolf » Sun Aug 30, 2020 4:03 pm

smatovic wrote:
Sat Aug 15, 2020 4:55 pm
Werewolf wrote:
Sat Aug 15, 2020 4:23 pm
With Lc0 a lot of the attention went on the NN and how it was better than a hand-crafted Evaluation Function.

But I'm trying to understand the search side of things.
I have a basic, layman's, working knowledge of alpha-beta search in chess and virtually no real understanding of MCTS, which I think is used in Lc0 and Komodo MCTS?

Can I ask:
1) Where are details of how the search works, specific to chess?
2) What are the pros and cons of MCTS vs AB?
3) Does the Lc0 MCTS play the game out fully or just to a fixed point? If it is fixed, how deep?

I've heard MCTS is less able to handle tactics but I'm not sure why.
https://www.chessprogramming.org/UCT#Ra ... an_et_al.
UCT has been shown to outperform traditional minimax based approaches in several challenging domains such as Go and KriegSpiel, although minimax search still prevails in other domains such as Chess. This work provides insights into the properties of adversarial search spaces that play a key role in the success or failure of UCT and similar sampling-based approaches. We show that certain "early loss" or "shallow trap" configurations, while unlikely in Go, occur surprisingly often in games like Chess (even in grandmaster games). We provide evidence that UCT, unlike minimax search, is unable to identify such traps in Chess and spends a great deal of time exploring much deeper game play than needed.
MCTS-UCT was used successfully in Go; MCTS-PUCT was used successfully in A0/Lc0 together with the NN policy/value heads.

--
Srdja
I read the full article you posted, which made my head spin, but it was helpful.

So to sum up, Lc0 isn't doing full-game "rollouts", but rather it's building a kind of speculative, highly asymmetric game tree?

dkappe
Posts: 817
Joined: Tue Aug 21, 2018 5:52 pm
Full name: Dietrich Kappe

Re: MCTS vs Alpha-beta

Post by dkappe » Sun Aug 30, 2020 4:48 pm

The base version of a0lite (https://github.com/dkappe/a0lite) has an MCTS/UCT algorithm that is only 95 lines of Python code. Have a play with it. It certainly helped my understanding of how it works.
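In the same spirit, the core select/expand/evaluate/backpropagate loop can be sketched in even less code (toy Python, not a0lite's actual source; the legal_moves, apply_move and evaluate callbacks are hypothetical stand-ins for move generation and the network's value head):

```python
import math

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

    def ucb1(self, c=1.414):
        if self.visits == 0:
            return float("inf")  # always try unvisited children first
        return (self.value / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def search(root, legal_moves, apply_move, evaluate, iterations=100):
    for _ in range(iterations):
        node = root
        # 1. Selection: walk down, always picking the best-UCB1 child.
        while node.children:
            node = max(node.children, key=Node.ucb1)
        # 2. Expansion: add a child for each legal move at the leaf.
        for mv in legal_moves(node.state):
            node.children.append(Node(apply_move(node.state, mv), node))
        # 3. Evaluation: a value head would go here; we use a callback.
        value = evaluate(node.state)
        # 4. Backpropagation: update visits and values back to the root.
        while node:
            node.visits += 1
            node.value += value
            value = -value  # flip perspective each ply
            node = node.parent
    # As in Lc0, the most-visited child is the chosen move.
    return max(root.children, key=lambda c: c.visits)
```

Swap the evaluate callback for a policy/value network and the UCB1 rule for PUCT, and you have the A0/Lc0 family in miniature.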

Post Reply