With Lc0 a lot of the attention went on the NN and how it was better than a hand-crafted Evaluation Function.
But I'm trying to understand the search side of things.
I have a basic, layman's working knowledge of alpha-beta search in chess and virtually no real understanding of MCTS, which I think is used in Lc0 and Komodo MCTS?
Can I ask:
1) Where are details of how the search works, specific to chess?
2) What are the pros and cons of MCTS vs AB?
3) Does the Lc0 MCTS play the game out fully or just to a fixed point? If it is fixed, how deep?
I've heard MCTS is less able to handle tactics but I'm not sure why.
MCTS vs Alpha-beta
Moderators: hgm, Rebel, chrisw
-
- Posts: 2658
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: MCTS vs Alpha-beta
Werewolf wrote: ↑Sat Aug 15, 2020 6:23 pm With Lc0 a lot of the attention went on the NN and how it was better than a hand-crafted Evaluation Function. [...]
https://www.chessprogramming.org/UCT#Ra ... an_et_al.
MCTS-UCT was used successfully in Go; MCTS-PUCT was used successfully in A0/LC0 with an NN policy/value head. From the linked paper:

"UCT has been shown to outperform traditional minimax based approaches in several challenging domains such as Go and KriegSpiel, although minimax search still prevails in other domains such as Chess. This work provides insights into the properties of adversarial search spaces that play a key role in the success or failure of UCT and similar sampling-based approaches. We show that certain 'early loss' or 'shallow trap' configurations, while unlikely in Go, occur surprisingly often in games like Chess (even in grandmaster games). We provide evidence that UCT, unlike minimax search, is unable to identify such traps in Chess and spends a great deal of time exploring much deeper game play than needed."
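For readers who want the formulas behind the two names, here is a rough Python sketch of the selection rules being contrasted: UCB1 (classic UCT) and the A0-style PUCT rule. This is illustrative only, not Lc0's actual code; the exploration constants `c` are arbitrary example defaults, and real implementations differ in detail.

```python
import math

def ucb1(child_value_sum, child_visits, parent_visits, c=1.41):
    """Classic UCT selection score (UCB1): exploit the mean value,
    explore children with few visits relative to the parent."""
    if child_visits == 0:
        return float("inf")  # unvisited children are tried first
    return (child_value_sum / child_visits
            + c * math.sqrt(math.log(parent_visits) / child_visits))

def puct(child_value_sum, child_visits, parent_visits, prior, c=1.5):
    """A0/LC0-style PUCT: the NN policy prior steers exploration,
    so low-prior lines (e.g. moves into 'shallow traps') get few visits."""
    q = child_value_sum / child_visits if child_visits else 0.0
    u = c * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + u
```

The key difference for the "trap" discussion: UCB1 explores purely from visit statistics, while PUCT multiplies the exploration term by the policy prior, so a move the net considers bad is starved of visits unless its backed-up value rescues it.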
--
Srdja
-
- Posts: 1631
- Joined: Tue Aug 21, 2018 7:52 pm
- Full name: Dietrich Kappe
Re: MCTS vs Alpha-beta
MCTS is an umbrella term. There are lots of similar algorithms grouped under it. Essentially, these are algorithms that sample the most promising parts of the search tree, rather than doing an exhaustive search.
The chess programming wiki has a reasonable overview of the topic. https://www.chessprogramming.org/UCT
In the future, MCTS will continue to be a viable option for those wanting to double down on expensive evaluation. Note that a0lite, searching just 400 nps using UCT, was able to scrounge a few points in the TCEC qualifying league against engines searching 7 orders of magnitude more nodes.
Fat Titz by Stockfish, the engine with the bodaciously big net. Remember: size matters. If you want to learn more about this engine just google for "Fat Titz".
-
- Posts: 511
- Joined: Sun Apr 26, 2020 11:40 pm
- Full name: Brian D. Smith
Re: MCTS vs Alpha-beta
I find the only real strength (and it is important) in MCTS is for ANALYSIS - particularly for building up your repertoire for OTB play.
Being able to see just how narrow a path your opponent has to walk if you play 2nd/3rd/4th 'best' moves (compared to 'best' engine eval moves) is SO important. Practical play is what guides OTB games, not 'perfect analysis'. It is human vs human play and mistakes big and small rule the day there.
That is why I intend on getting the next version of Komodo - for the MCTS it offers.
You can't really do the same by letting, say, regular or NN Stockfish look at 4 or 5 PVs. From everything I read, MCTS is perfect for that. I do hope Team Komodo does not abandon it. I think there is a market for MCTS used for OTB prep.
-
- Posts: 65
- Joined: Sun Aug 19, 2018 10:57 pm
- Full name: Nickolas Reynolds
Re: MCTS vs Alpha-beta
A simplified explanation of how A0 and LC0 implement MCTS -- or, more accurately, pUCT -- goes something like this:
As to why this procedure is bad (relative to good AB search) at tactics, it's because many tactical sequences are quite deep, and tend to involve series of moves with features that are "generally bad", and as such are assigned low intrinsic weight by the policy head. For example, hanging a queen is generally bad, and the policy head will most likely rank any move that hangs a queen quite poorly.
- There is a list of everything that could be a move in any position. For example, if there were a queen on a1, it could move to h8, so a1-h8 is on that list, regardless of what the actual position in the current game is.
- The "policy head" of the neural network assigns a score to every entry on that list. If the neural networks are doing their job well, "moves" that don't represent actual moves in the current position should be given a score of 0 by the neural network, though I believe this is also explicitly enforced externally. (As in, external to the neural network, all impossible moves are zeroed out.)
- A "rollout" is performed on the highest scoring move on the list. The game tree is navigated to a certain depth, following the branch with the highest score at each node, until the desired depth is reached, and then the "value head" of the neural network assigns a value to the terminal node.
- This information is propagated up the game tree and scores are adjusted accordingly using some formula.
- Any nodes that were visited have their scores adjusted by some formula so that they are less likely to be visited in the future. This forces exploration and is governed by a hyperparameter of the algorithm. (A human picks it.)
- When thinking time runs out, the most visited node is selected as the move.
If there's a good tactical sequence where enough intermediate positions have generally bad features -- all of which will likely be assigned low intrinsic weight -- then it's very possible that sequence will never be explored enough for the algorithm to realize the truth of the position.
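The steps above can be sketched in Python. This is a hedged illustration of the scheme as described, not Lc0's actual implementation: `policy_value` and `legal_moves` are hypothetical stubs standing in for the network's policy/value heads and move generation (note that expanding only legal moves is where impossible moves get zeroed out), and the sign flip during backup assumes the value head scores the position for the side to move.

```python
import math

class Node:
    def __init__(self, prior):
        self.prior = prior      # P(s,a) from the policy head
        self.visits = 0         # N(s,a)
        self.value_sum = 0.0    # W(s,a); mean value Q = W/N
        self.children = {}      # move -> Node

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node, c_puct=1.5):
    # Pick the child maximising Q + U. c_puct is the human-chosen
    # exploration hyperparameter mentioned above.
    n = max(1, node.visits)
    return max(node.children.items(),
               key=lambda kv: kv[1].q() + c_puct * kv[1].prior
                              * math.sqrt(n) / (1 + kv[1].visits))

def simulate(root, legal_moves, policy_value, depth_limit=64):
    # One "rollout": walk down the tree by highest score, expand the
    # leaf, evaluate it with the value head, and back the value up
    # the visited path, flipping sign for the side to move.
    path, node = [root], root
    for _ in range(depth_limit):
        if not node.children:
            priors, value = policy_value(node)   # NN: policy + value
            for mv in legal_moves(node):         # only legal moves kept
                node.children[mv] = Node(priors.get(mv, 0.0))
            break
        _, node = select_child(node)
        path.append(node)
    else:
        _, value = policy_value(node)            # hit the depth cap
    for n in reversed(path):                     # backpropagation
        n.visits += 1
        n.value_sum += value
        value = -value
    return root

def best_move(root):
    # When thinking time runs out, play the most-visited child.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```

With a stub network that gives one move a much higher prior, repeated calls to `simulate` concentrate visits on that move, which is exactly why a low-prior tactical shot can be starved of exploration.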
-
- Posts: 508
- Joined: Fri Jun 04, 2010 7:23 am
Re: MCTS vs Alpha-beta
DrCliche wrote: ↑Sat Aug 15, 2020 8:02 pm A simplified explanation of how A0 and LC0 implement MCTS -- or, more accurately, pUCT -- goes something like this: [...] As to why this procedure is bad (relative to good AB search) at tactics, it's because many tactical sequences are quite deep, and tend to involve series of moves with features that are "generally bad", and as such are assigned low intrinsic weight by the policy head. For example, hanging a queen is generally bad, and the policy head will most likely rank any move that hangs a queen quite poorly. [...]
Thank you. I have to absorb this a little longer to fully understand.
But aren't NN engines the ones hanging queens and making a lot of sacrifices for positional (not tactical) advantage?
So those nodes are visited the most because those moves are chosen.
-
- Posts: 1631
- Joined: Tue Aug 21, 2018 7:52 pm
- Full name: Dietrich Kappe
Re: MCTS vs Alpha-beta
You will find that small nets looking at hundreds or thousands of nodes per move will occasionally fall for simple tactics, but they won't just hang material.
Big nets on GPUs are looking at hundreds of thousands or millions of nodes per move. The kind of tactical oversight they fall for is the extremely deep stuff that only the latest Stockfish versions can find on monster hardware. Even those aren't that common and might happen to another AB engine that doesn't search as deep.
Certainly mcts/nn engines will value dynamic elements of a position if they improve their chances of winning.
It’s the other way around. A move is played because it was visited the most.
-
- Posts: 708
- Joined: Mon Jan 16, 2012 6:34 am
Re: MCTS vs Alpha-beta
h1a8 wrote: ↑Sat Aug 15, 2020 10:20 pm [...] But aren't NN engines the ones hanging queens and making a lot of sacrifices for positional (not tactical) advantage? So those nodes are visited the most because those moves are chosen.
Leela won't hang any material 99.99% of the time even with zero search, i.e. 1 node (policy only). I don't know where you get those rumours from.
Leela with zero search is about 2400 blitz on Lichess (stronger than most lower-rated titled players: NMs, CMs, etc.).
https://lichess.org/@/Leela1Node
-
- Posts: 1796
- Joined: Thu Sep 18, 2008 10:24 pm
Re: MCTS vs Alpha-beta
smatovic wrote: ↑Sat Aug 15, 2020 6:55 pm [...]
I read the full article you posted; it made my head spin, but it was helpful.
So to sum up, Lc0 isn't doing full-game "rollouts", but rather it's building a kind of speculative, highly asymmetric game tree?
-
- Posts: 1631
- Joined: Tue Aug 21, 2018 7:52 pm
- Full name: Dietrich Kappe
Re: MCTS vs Alpha-beta
The base version of a0lite (https://github.com/dkappe/a0lite) has an MCTS/UCT algorithm that is only 95 lines of Python code. Have a play with it. It certainly helped my understanding of how it works.