Search Question

Discussion of chess software programming and technical issues.

Moderators: hgm, chrisw, Rebel

Werewolf
Posts: 1909
Joined: Thu Sep 18, 2008 10:24 pm

Search Question

Post by Werewolf »

Is it fair to say Lc0's search is based on the assumption the opponent's play will be sub-optimal?

Is it fair to say Stockfish's search is based on the assumption the opponent's play will be optimal?

If these two statements are correct, is it fair to say Lc0 will tend to choose moves which are more tricky/trappy/difficult for sub-optimal opponents such as humans?
User avatar
towforce
Posts: 11999
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK
Full name: Graham Laight

Re: Search Question

Post by towforce »

Just a guess - I don't actually know - but I would think that they're both using alpha-beta pruning (link), in which case they'll both be "assuming best play" from their opponent.
The simple reveals itself after the complex has been exhausted.
User avatar
Eelco de Groot
Posts: 4622
Joined: Sun Mar 12, 2006 2:40 am
Full name: Eelco de Groot

Re: Search Question

Post by Eelco de Groot »

I am probably the last person who should answer any Lc0 questions, but I believe I have read somewhere that AlphaZero and the like mostly use some form of Monte Carlo search; I think that is true for the Go programs too. You can get somewhat broader searches that way, but there is no asymmetric evaluation that I know of. You could get that by using a different network for Black and White, for instance, I suppose. If one of those networks is weaker than your strongest choice, the question is whether you would really gain a lot, even if it (the weaker network) modeled the opponent better.

If you are within the draw margin in chess you can take some chances, because the draw would still be safe. But if you are ahead, assuming optimal play is better; think of mate searches, for instance, in the extreme case. And if you are behind, taking chances will on the whole not do much good (on average, I mean), so it is mainly useful there if you need a win at all costs anyway (or you need a draw, and the extra chance of a loss from taking more risk doesn't matter much, because the chance of a loss is already very big).

Asymmetric Lc0, if that exists, would be similar to all the "contempt" attempts in alpha-beta chess. Presumably Torch knows more about this than Stockfish. But the asymmetry is not something specific to Lc0, I believe; playing style mainly differs through differently trained networks (see also programs like Patricia, an NNUE engine). What Werewolf is referring to sounds more like the contempt approach in alpha-beta + NNUE these days.
Debugging is twice as hard as writing the code in the first
place. Therefore, if you write the code as cleverly as possible, you
are, by definition, not smart enough to debug it.
-- Brian W. Kernighan
User avatar
shawn
Posts: 84
Joined: Fri Jun 28, 2024 9:24 am
Full name: Shawn Xu

Re: Search Question

Post by shawn »

Werewolf wrote: Thu Jul 04, 2024 12:52 pm Is it fair to say Lc0's search is based on the assumption the opponent's play will be sub-optimal?

Is it fair to say Stockfish's search is based on the assumption the opponent's play will be optimal?

If these two statements are correct, is it fair to say Lc0 will tend to choose moves which are more tricky/trappy/difficult for sub-optimal opponents such as humans?
No, Lc0 assumes perfect play. However, it is possible to configure Lc0 in such a way that it chooses sub-optimal moves that achieve a higher win rate against weaker opponents. This is called WDL Contempt.
User avatar
shawn
Posts: 84
Joined: Fri Jun 28, 2024 9:24 am
Full name: Shawn Xu

Re: Search Question

Post by shawn »

towforce wrote: Fri Jul 05, 2024 9:13 pm Just a guess - I don't actually know - but I would think that they're both using alpha-beta pruning (link), in which case they'll both be "assuming best play" from their opponent.
Lc0 is unique among top engines in that it uses MCTS instead of alpha-beta pruning. MCTS allows Leela to batch evaluate positions with a heavy neural network, which is not possible under a traditional alpha-beta minimax search.
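A hypothetical sketch (names invented, not Lc0's actual code) of the batching point: MCTS selection can gather many leaves first and hand them to the network in a single call, while alpha-beta visits leaves one at a time in an order that depends on earlier results.

```python
# Hypothetical sketch, not Lc0's actual code: MCTS selection gathers
# a batch of leaf positions and evaluates them with one network call.
# Alpha-beta evaluates leaves one by one, in an order that depends on
# earlier results, so it cannot batch this way.

def evaluate_batch(positions):
    # Stand-in for a single forward pass of the network over the
    # whole batch; a real engine would run the NN on the GPU here.
    return [len(p) / 100.0 for p in positions]   # dummy values

def gather_and_evaluate(leaves, batch_size=4):
    batch = leaves[:batch_size]        # leaves chosen by tree selection
    values = evaluate_batch(batch)     # ONE call for all of them
    return dict(zip(batch, values))    # values to backpropagate later

results = gather_and_evaluate(["pos-a", "pos-b", "pos-c", "pos-d", "pos-e"])
print(len(results))   # 4 positions evaluated in a single network call
```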
User avatar
towforce
Posts: 11999
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK
Full name: Graham Laight

Re: Search Question

Post by towforce »

shawn wrote: Sat Jul 06, 2024 1:24 am
towforce wrote: Fri Jul 05, 2024 9:13 pm Just a guess - I don't actually know - but I would think that they're both using alpha-beta pruning (link), in which case they'll both be "assuming best play" from their opponent.
Lc0 is unique among top engines in that it uses MCTS instead of alpha-beta pruning. MCTS allows Leela to batch evaluate positions with a heavy neural network, which is not possible under a traditional alpha-beta minimax search.

Thank you for correcting my misconception. Always good to learn! :)
The simple reveals itself after the complex has been exhausted.
Werewolf
Posts: 1909
Joined: Thu Sep 18, 2008 10:24 pm

Re: Search Question

Post by Werewolf »

shawn wrote: Sat Jul 06, 2024 1:21 am
Werewolf wrote: Thu Jul 04, 2024 12:52 pm Is it fair to say Lc0's search is based on the assumption the opponent's play will be sub-optimal?

Is it fair to say Stockfish's search is based on the assumption the opponent's play will be optimal?

If these two statements are correct, is it fair to say Lc0 will tend to choose moves which are more tricky/trappy/difficult for sub-optimal opponents such as humans?
No, Lc0 assumes perfect play. However, it is possible to configure Lc0 in such a way that it chooses sub-optimal moves that achieve a higher win rate against weaker opponents. This is called WDL Contempt.
And are you certain about this? Recently Larry K asserted that Lc0 / A0 choose moves on this basis (I think "unsound" was the word he used, IIRC), and various sources support this view.
On the Lc0 discord they are saying it is (effectively) true that Lc0 assumes sub-optimal play, not because of contempt but because of:

"MCTS doesn’t assume sub optimal play entirely, but Leela prefers moves whose suboptimal play leads to her winning more in search. This comes about in two ways:

1.) Leela thinks one move is better than another for a while and builds up a lot of nodes, but then Leela searches another move whose Q increases more than its U decreases, consecutively over and over again; that move would eventually get searched more and be deemed the better move by Q (though this isn't actually guaranteed).

2.) This scenario is especially apparent in endgames and you will typically see leela’s eval go negative in drawn games in events like CCC and TCEC because leela will prefer positions whose suboptimal moves are wins for her.

So I would not say that Leela is risky, but I would say that she is tricky. It cares about suboptimal moves, but at infinite search it will prefer the best move."
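For reference, the Q and U in that quote come from the PUCT formula that AlphaZero-style searches use to pick which child to visit next. A minimal Python sketch with invented numbers (the constant and priors are illustrative, not taken from Lc0):

```python
import math

# Minimal sketch of the PUCT selection rule in AlphaZero-style
# searches; the Q and U in the quote are these two terms. The
# constant, priors, and counts below are invented for illustration.

C_PUCT = 1.5

def puct_score(q, n_child, prior, n_parent):
    # Q: mean value of the child so far; U: exploration bonus that
    # shrinks as the child is visited more often.
    u = C_PUCT * prior * math.sqrt(n_parent) / (1 + n_child)
    return q + u

# Two candidate moves: one heavily searched, one whose Q keeps rising.
children = {
    "old_favourite": {"q": 0.30, "n": 900, "prior": 0.5},
    "newcomer":      {"q": 0.35, "n": 100, "prior": 0.3},
}
n_parent = sum(c["n"] for c in children.values())

best = max(children, key=lambda m: puct_score(
    children[m]["q"], children[m]["n"], children[m]["prior"], n_parent))
print(best)   # the newcomer wins on Q + U and will be searched more
```

As a move is visited more its U shrinks, so only a genuinely higher Q keeps it on top in the long run.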
User avatar
hgm
Posts: 28205
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Search Question

Post by hgm »

MCTS attempts to find the best move against what it considers best counterplay. But for a finite search it is not perfect at this, as it also keeps track of the reliability of the scores it assigns to the moves. So it will also spend time on sub-optimal moves of the opponent, just to get a more reliable score on those, in order to establish that these are indeed sub-optimal.

It is true that the searches of the sub-optimal moves are weighted into the score of the position initially, and that this effect only disappears after it is clear which move is best, so that only that one is searched deeper and eventually dominates the score. At that stage, positions where you can make many bad moves, especially bad moves that require a lot of search before you see they are bad, will be evaluated lower for a longer time, and the opponent would thus prefer to bring you into such positions.
Werewolf
Posts: 1909
Joined: Thu Sep 18, 2008 10:24 pm

Re: Search Question

Post by Werewolf »

hgm wrote: Sat Jul 06, 2024 5:57 pm MCTS attempts to find the best move against what it considers best counterplay. But for a finite search it is not perfect at this, as it also keeps track of the reliability of the scores it assigns to the moves. So it will also spend time on sub-optimal moves of the opponent, just to get a more reliable score on those, in order to establish that these are indeed sub-optimal.

It is true that the searches of the sub-optimal moves are weighted into the score of the position initially, and that this effect only disappears after it is clear which move is best, so that only that one is searched deeper and eventually dominates the score. At that stage, positions where you can make many bad moves, especially bad moves that require a lot of search before you see they are bad, will be evaluated lower for a longer time, and the opponent would thus prefer to bring you into such positions.
Thank you that’s helpful.
However, it seems some of that could loosely apply to A-B where inferior moves are considered until they are shown to be inferior and pruned back.

I can’t see anything conclusive about MCTS which lends itself to a more testing style against inferior players, such as humans. Would you agree?
User avatar
hgm
Posts: 28205
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Search Question

Post by hgm »

I have never looked at the LC0 source, so I don't know how exactly they implemented MCTS. But in the basic MCTS flavor you would calculate the value of a position by the average of the value of the daughter positions, weighted by the number of times the moves leading to those have been tried. In minimax / alpha-beta you would just take the score of the best move. In the latter case having searched inferior moves would not affect the score of the position. But with MCTS they do, which game-theoretically is of course wrong. In the limit of infinite search the contribution of the weaker moves would eventually vanish, because the move-selection criterion would try them less and less frequently as it becomes more obvious (statistically) they are inferior.
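That difference between the two backup rules can be put into numbers (all values and visit counts below are invented):

```python
# Invented numbers contrasting the two backup rules described above.
# Values are from our point of view; each tuple is
# (value, visit count) for one opponent reply.
replies = [(-0.10, 500),   # opponent's best reply: slightly bad for us
           (+0.60, 100),   # a reply that blunders
           (+0.40, 100)]   # a milder inaccuracy

# Minimax / alpha-beta: assume the opponent plays his best reply,
# i.e. the one worst for us. Having searched the inferior replies
# changes nothing.
minimax_value = min(v for v, _ in replies)

# Basic MCTS: the position scores as the visit-weighted average, so
# every searched reply, including the blunders, pulls on the score.
total = sum(n for _, n in replies)
mcts_value = sum(v * n for v, n in replies) / total

print(minimax_value)            # -0.1
print(round(mcts_value, 3))     # 0.071: the blunders lift the score
```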

For a finite search the sub-optimal branches still contribute to the score. So you might choose a move that offers many opportunities for the opponent to 'blunder', (i.e. 'set a trap'), as these slightly suppress his score, even though the best move is slightly better for him than in a branch where he cannot blunder. (And the deeper the search that is needed to recognize the blunder, the more weight it will acquire before it is finally discarded!)
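A small invented example of that trap effect: under averaging, a move that invites blunders can outscore a game-theoretically safer move.

```python
# Invented numbers sketching the 'trap' preference described above.
# Move A reaches a position where the opponent's best reply leaves us
# slightly worse, but he has a tempting blunder; move B reaches a
# quiet position where his only reply holds us to equality.

def averaged(replies):
    # Visit-weighted average value, as in a basic MCTS backup.
    return sum(v * n for v, n in replies) / sum(n for _, n in replies)

move_a = [(-0.05, 300), (+0.70, 200)]   # best reply, plus a deep trap
move_b = [(0.00, 500)]                  # a single, blunder-free reply

# Game-theoretically B is at least as good: the opponent's best reply
# against A (-0.05 for us means +0.05 for him) beats his reply
# against B (0.00). But the averaging search rates A higher, and so
# may 'set the trap'.
print(round(averaged(move_a), 3))   # 0.25
print(round(averaged(move_b), 3))   # 0.0
```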

Of course this way of looking at things is a bit dubious. Game-theoretically there is no such thing as a 'slightly better' move. You either win, draw or lose, and the only sub-optimal moves are those that blunder away 0.5 or 1 point. We are arguing here in terms of a heuristic evaluation, which tries to estimate the winning chances. In this context the number of blunders the opponent can make is a very significant part of the evaluation; using the game-theoretical (WDL) evaluation would make you play very poorly indeed. You would never blunder, but you would never put any pressure on the opponent to make him blunder. You would just play random moves that do not deteriorate your result.