MCTS evaluation question

maksimKorzh · Post by **maksimKorzh** » Tue Nov 03, 2020 12:21 am

Hi guys, after getting criticized for embedding sf nnue into my engine I decided to change the exploration direction
and now learning MCTS algorithm. What I came up with so far is working Tic Tac Toe game that uses UCB1 formula
for selecting nodes and light (random) rollouts for the simulation phase. I've also been reading an article describing how
to use NN to guide MCTS. I don't say that MCTS guided by NN is that clear from the implementation perspective but
at least it's surprisingly clear conceptually - we just alter UCB1 (or it's UCT variation) so that NN predictions influence
which node to consider. But there're still some points that are completely unclear for me, so I'm asking for some clarifications.

So let's imagine we're making a rollout. I read that rollout should be done until terminal state of the game is reached, e.g. win/draw/loss.
So does it mean that MCTS based search doesn't have a static evaluation like alpha beta search does?

I also know (from Daniel Shawul) that alpha-beta rollouts are used, so do alpha-beta rollouts play until the terminal node has been reached as well?
Or they can return a value at a certain depth?

Last thing that confuses me the most - I read that Alpha Zero uses 2 NN - one to guide MCTS which is clear and another for the evaluation and this
really confuses me the most because static evaluation is needed only when we can't search until the terminal state right? So if we do search until the terminal state then why do we need static eval, we don't right?

In regards to above - does Leela have an evaluation NN (I guess so...) and if it does then HOW is that different from sf nnue.
I mean is different conceptually or just a different architecture and implementation but the same gist?

I would appreciate any insights. Thanks in advance!

P.S. I've been exploring source code of A0lite by dkappe and saw that it's definitely using a separate NN for eval while MCTS is UCT based but it's still a mystery for me like how (WHEN) the score (is it STATIC EVAL?) is getting assigned.

Madeleine Birchfield · Tue Nov 03, 2020 12:49 am

MCTS is just a search algorithm, you could replace your alpha beta minimax search with whatever MCTS based search, and use your evaluation function whether NNUE or handcrafted with MCTS. That is what Komodo and the Stockfish derivatives BrainLearn and ShashChess do with their MCTS option.

What differs between traditional engines and Allie, Leela, and Scorpio are that in Allie, Leela, and Scorpio, in addition to evaluation, move ordering is done in the neural network as well, instead of using handwritten heuristics as in Stockfish, Komodo, Ethereal, et cetera. This is what 'use NN to guide MCTS' means, it is using a neural network for move ordering in the search.

You mentioned that 'I read that Alpha Zero uses 2 NN - one to guide MCTS which is clear and another for the evaluation'; that is not true. It is AlphaGo and AlphaGo Zero that used two neural networks, one for move ordering to guide the MCTS, and the other for the evaluation. When Deepmind moved to AlphaZero, it fused the two networks into one neural network that has two outputs, policy (for move ordering) and vlaue (for evaluation).

So in a sense, AlphaGo and AlphaGo Zero are closer to traditional engines than AlphaZero style engines, in the sense that a clear boundary between search and evaluation could still be drawn and their move ordering/policy network could still be considered as part of the search, while AlphaZero's and Leela's networks kind of blur the boundaries between search and evaluation.

The MCTS algorithm that Daniel Shawul uses in Scorpio is a hybrid MCTS/AB search, and the alpha-beta rollouts is a feature unique to that search algorithm. Other MCTS algorithms like PUCT or RENTS do not have any alpha-beta rollouts.

maksimKorzh · Post by **maksimKorzh** » Tue Nov 03, 2020 1:52 am

Madeleine Birchfield wrote: ↑Tue Nov 03, 2020 12:49 am MCTS is just a search algorithm, you could replace your alpha beta minimax search with whatever MCTS based search, and use your evaluation function whether NNUE or handcrafted with MCTS. That is what Komodo and the Stockfish derivatives BrainLearn and ShashChess do with their MCTS option.

What differs between traditional engines and Allie, Leela, and Scorpio are that in Allie, Leela, and Scorpio, in addition to evaluation, move ordering is done in the neural network as well, instead of using handwritten heuristics as in Stockfish, Komodo, Ethereal, et cetera. This is what 'use NN to guide MCTS' means, it is using a neural network for move ordering in the search.

You mentioned that 'I read that Alpha Zero uses 2 NN - one to guide MCTS which is clear and another for the evaluation'; that is not true. It is AlphaGo and AlphaGo Zero that used two neural networks, one for move ordering to guide the MCTS, and the other for the evaluation. When Deepmind moved to AlphaZero, it fused the two networks into one neural network that has two outputs, policy (for move ordering) and vlaue (for evaluation).

So in a sense, AlphaGo and AlphaGo Zero are closer to traditional engines than AlphaZero style engines, in the sense that a clear boundary between search and evaluation could still be drawn and their move ordering/policy network could still be considered as part of the search, while AlphaZero's and Leela's networks kind of blur the boundaries between search and evaluation.

The MCTS algorithm that Daniel Shawul uses in Scorpio is a hybrid MCTS/AB search, and the alpha-beta rollouts is a feature unique to that search algorithm. Other MCTS algorithms like PUCT or RENTS do not have any alpha-beta rollouts.

Thanks for such an extended answer.
So I narrow my question - in Leela where policies and value are the outputs of a single NN - how VALUE is calculated, I mean does it take board position into a count? Does it take care of material/positional score? Does it happen in terminal nodes or in non terminal nodes? And the most important question - does rollout actually reach the win/draw/loss state?

Madeleine Birchfield · Tue Nov 03, 2020 8:29 am

maksimKorzh wrote: ↑Tue Nov 03, 2020 1:52 am Thanks for such an extended answer.
So I narrow my question - in Leela where policies and value are the outputs of a single NN - how VALUE is calculated, I mean does it take board position into a count? Does it take care of material/positional score? Does it happen in terminal nodes or in non terminal nodes? And the most important question - does rollout actually reach the win/draw/loss state?

I'm not sure about the exact details of the implementation of Leela's neural network; somebody like Dietrich Kappe or Albert Silver might be a better person to ask about as both of them has had experience with training nets compatible with Leela, or you could go to ask the Leela developers themselves. However, you could have a look at Leela's network code directly here:

https://github.com/LeelaChessZero/lc0/t ... src/neural
https://github.com/LeelaChessZero/lc0/b ... /weights.h

Joerg Oster · Post by **Joerg Oster** » Tue Nov 03, 2020 12:07 pm

maksimKorzh wrote: ↑Tue Nov 03, 2020 1:52 am Thanks for such an extended answer.
So I narrow my question - in Leela where policies and value are the outputs of a single NN - how VALUE is calculated, I mean does it take board position into a count? Does it take care of material/positional score? Does it happen in terminal nodes or in non terminal nodes? And the most important question - does rollout actually reach the win/draw/loss state?

Well, Lc0 doesn't do Rollouts/Simulations, but solely rely on the evaluation part of the NN to evaluate a node.

You don't have to evaluate terminal nodes, as you already know the result. Win, Loss or Draw.
That's why they are terminal!

This page https://int8.io/monte-carlo-tree-search ... Alpha_Zero
has a nice introduction to MCTS/UCT and also shows the differences between the basic UCT formula to guide the search and the p-UCT formula used in Alpha Zero.

maksimKorzh · Post by **maksimKorzh** » Tue Nov 03, 2020 12:40 pm

Joerg Oster wrote: ↑Tue Nov 03, 2020 12:07 pm
maksimKorzh wrote: ↑Tue Nov 03, 2020 1:52 am Thanks for such an extended answer.
So I narrow my question - in Leela where policies and value are the outputs of a single NN - how VALUE is calculated, I mean does it take board position into a count? Does it take care of material/positional score? Does it happen in terminal nodes or in non terminal nodes? And the most important question - does rollout actually reach the win/draw/loss state?
Well, Lc0 doesn't do Rollouts/Simulations, but solely rely on the evaluation part of the NN to evaluate a node.

You don't have to evaluate terminal nodes, as you already know the result. Win, Loss or Draw.
That's why they are terminal!

This page https://int8.io/monte-carlo-tree-search ... Alpha_Zero
has a nice introduction to MCTS/UCT and also shows the differences between the basic UCT formula to guide the search and the p-UCT formula used in Alpha Zero.

Thank you so much, very insightful resources.

mvanthoor · Post by **mvanthoor** » Thu Nov 05, 2020 11:37 am

maksimKorzh wrote: ↑Tue Nov 03, 2020 12:21 am Hi guys, after getting criticized for embedding sf nnue into my engine I decided to change the exploration direction
and now learning MCTS algorithm.

Cool

I hope you intend to make video's about it; I look forward to your implementations and explanations.

Even though I personally don't really like the coding style you use (to each their own; it's your code and you are the one who has to maintain or use it), I think your video's and explanations are very good. You don't explain any new stuff, but sometimes you explain things just a bit differently than the well-known sources. It has happened a few times that one of your video's switched my understanding of a topic from "I think I understand this... but I could be wrong, so I need to read more about it" to "I understand this, and it works like..."

MCTS is something I want to study as well, later on. Maybe make it an option. At some point I'll probably also look into NNUE, but only if I can write the code myself and train my own networks using my own engine.

All of that stuff will take some time yet

maksimKorzh · Post by **maksimKorzh** » Thu Nov 05, 2020 2:38 pm

mvanthoor wrote: ↑Thu Nov 05, 2020 11:37 am
maksimKorzh wrote: ↑Tue Nov 03, 2020 12:21 am Hi guys, after getting criticized for embedding sf nnue into my engine I decided to change the exploration direction
and now learning MCTS algorithm.
Cool I hope you intend to make video's about it; I look forward to your implementations and explanations.

Even though I personally don't really like the coding style you use (to each their own; it's your code and you are the one who has to maintain or use it), I think your video's and explanations are very good. You don't explain any new stuff, but sometimes you explain things just a bit differently than the well-known sources. It has happened a few times that one of your video's switched my understanding of a topic from "I think I understand this... but I could be wrong, so I need to read more about it" to "I understand this, and it works like..."

MCTS is something I want to study as well, later on. Maybe make it an option. At some point I'll probably also look into NNUE, but only if I can write the code myself and train my own networks using my own engine.

All of that stuff will take some time yet

Even though I personally don't really like the coding style you use

Well)))) I prefer "not best practice style code" but that you can fully rely on in terms of understanding of how it works rather that "best practice/production/cool way how pros do" code that you can't figure out of what's going on there because code forces you to get clear with best practices first and only then hopefully with the subject itself (I always get stuck at first pays and usually never get to the subject - that's the reason for my videos to exist).

Cool I hope you intend to make video's about it

Without an intent to make a video series I wouldn't even start that))) I would ever create BBC if it wasn't the matter of video series.

It has happened a few times that one of your video's switched my understanding of a topic from "I think I understand this... but I could be wrong, so I need to read more about it" to "I understand this, and it works like..."

now you know the reason why am I making these videos)

MCTS is something I want to study as well, later on. Maybe make it an option. At some point I'll probably also look into NNUE, but only if I can write the code myself and train my own networks using my own engine.

Well, first of all MCTS is just another search algorithm as it was already mentioned in this thread before. I'm going (already working!) to create 1000 times simplified Leela type engine. Note that even though NNUE is a neural net still it's completely different conceptually from Leela's net. According to my code monkey's understanding SF NNUE is a REGRESSION network which takes board position as input and gives an estimate evaluation value in centi pawns as the output (I name what regression is not because you don't know it but to settle the difference between classification and regression problems in my head - code monkeys need LOTS of repetitions to claim they understand something and start making use of it). On the other hand Leela type net has two outputs - one is logit probability for every move which affects the formula (UCT) balancing between exploration/exploitation and another is winning probabilty which is NOT in centi pawns but just a value of how likely the side is about to win in the current position. Another thing I've finally realized is that AlphaGo used 2 nets - one did output logit probabilities (to influaence the move ordering) and another did winning probability. AlphaZero and Leela already using 1 NN with 2 outputs... In my dumb implementation I would rather go for 2 nets because it's easier for me to understand how they operate.

And one last thing - NNUE can be used instead of eval NN in Leela type engines but I guess (someone please correct me if I'm wrong) it takes all the fun from the idea of reinforcement learning, I mean the idea behind Leela type of NN is that it improves through the self-play while NNUE according to CPW: "using a mixture of supervised and reinforcement learning methods".

Anyway, at the moment I've implemented tictactoe game with MCTS in python (and making a series on that as well btw) and the next step would be do create 2 NNs to use instead of random rollouts. So as far as I currently understand the idea - it's the matter of MCTS to generate training data (logit probabilities for move ordering and winning probability) so we can then compare them with actual node considerations during search along with actual winning results. As they claim in one of the online tutorials: "it's all we need to train our network(s)"

mvanthoor · Post by **mvanthoor** » Thu Nov 05, 2020 3:14 pm

maksimKorzh wrote: ↑Thu Nov 05, 2020 2:38 pm ... neural nets ...

I think it's supercool you're doing this. Even though I can obviously find out most things on my own (probably in the same way as you do), your video's/summaries will be a great help.

I'll have to look into downloading and archiving your video's like I've done with BlueFever's series. It may be a long time before I come around to stuff like this. You're writing small engines to explore and explain concepts; I'm trying to write a chess engine that, hopefully, will still be around in another 15-20 years (and maintained by myself). Therefore I won't be trying lots of different things at the same time, obviously.

Madeleine Birchfield · Fri Nov 06, 2020 6:08 am

maksimKorzh wrote: ↑Tue Nov 03, 2020 12:21 am Hi guys, after getting criticized for embedding sf nnue into my engine

Have you tried implementing the NNUE algorithm into BBC yourself instead of using Stockfish's NNUE code?

MCTS evaluation question

MCTS evaluation question

Re: MCTS evaluation question

Re: MCTS evaluation question

Re: MCTS evaluation question

Re: MCTS evaluation question

Re: MCTS evaluation question

Re: MCTS evaluation question

Re: MCTS evaluation question

Re: MCTS evaluation question

Re: MCTS evaluation question