Why there is no interest in Computer with odds Vs Humans match?

lkaufman · Post by **lkaufman** » Tue Aug 06, 2019 8:07 am

jp wrote: ↑Tue Aug 06, 2019 7:25 am
lkaufman wrote: ↑Fri Aug 02, 2019 5:58 pm
jp wrote: ↑Fri Aug 02, 2019 6:35 am If it sees a clear refutation of the move, will it really still play the bluff?
Yes. Of course the devil is in the details, but it basically chooses its move by a weighted average of all plausible replies, so if one is bad for the engine but the other 29 are good for the engine, it might take the chance, depending on how "bad" and "good" and the number of visits. Remember, engines don't evaluate lines as won or drawn, but just as a win prob. (or centipawn score, which are convertible into each other). So it's not a simple question.
When I posted before, I was assuming that the number of visits would work out in a way that would avoid bluffing. Later, I thought different chess engines may be doing MCTS differently, so there might be differences in bluffing.

Does Komodo MCTS use UCT? I'm not sure from what Mark wrote a few months ago.

mjlef wrote: ↑Wed May 08, 2019 4:24 pm For MCTS you need to keep a tree in memory. In Komodo MCTS, a node in the tree consists of a visit count, sim of win probabilities, a policy value, a like to the next sibling, and a link to a child node. The key is determining which node to visit. So you start at the root node (the present position) and look at all its children, apply a formula like UCT1 to decide which node to select. The you look at the selected nodes children again using UCT1, and do on until you hit one of these:

a. Checkmate
b. Stalemate
c. 50 move draw
d. rep draw
e. leaf node

A leaf node has no children yet, so if that is selected, you generate its children. Lco/leela/lphaZero all use a NN to generate the win prob at this point, and fill in policy network information in the children. Then you back up the win probability, remembering to flip it as you go up the tree so it is from the point of view of that side. If you use a win prob in the range of 0.0..1.0 you just return 1.0 - winprob as you go up the tree to thr root. You increase the count and add with winprob to the nodes's winprob sum.

I see in that thread reference to the CP wiki, which says

UCT ... deals with the flaw of Monte-Carlo Tree Search, when a program may favor a losing move with only one or a few forced refutations, but due to the vast majority of other moves provides a better random playout score than other, better moves.

,
where:
Xj is the win ratio of the child
n is the number of times the parent has been visited
nj is the number of times the child has been visited
C is a constant to adjust the amount of exploration and incorporates the sqrt(2) from the UCB1 formula.
The first component of the UCB1 formula above corresponds to exploitation, as it is high for moves with high average win ratio. The second component corresponds to exploration, since it is high for moves with few simulations.

What we use is closer to what AlphaZero and Lc0 use, but we have changed it substantially. I think that they would all behave similarly in terms of "bluffing". But the degree of "bluffing" will vary from one implementation to another. I think that Lc0 will probably "bluff" to a greater degree when losing than Komodo MCTS; Komodo MCTS won't go quite as "crazy" in a lost position. A way to test this bluffing would be to have Komodo and Komodo MCTS each play a series of handicap games against the same top GM or GMs with the same handicaps to see which would do better, but it would be rather expensive to play a lot of serious games with each engine. I suppose I could run the same experiment against some weaker engine (or even Komodo time-handicapped), but it will only tell us if the bluffing strategy works against engines.

jp · Post by jp » Mon Aug 12, 2019 11:54 am

lkaufman wrote: ↑Tue Aug 06, 2019 8:07 am What we use is closer to what AlphaZero and Lc0 use, but we have changed it substantially. I think that they would all behave similarly in terms of "bluffing". But the degree of "bluffing" will vary from one implementation to another. I think that Lc0 will probably "bluff" to a greater degree when losing than Komodo MCTS; Komodo MCTS won't go quite as "crazy" in a lost position.

All engines using some form of PUCT should converge to zero bluffing from the search, though at different rates. I'd think therefore that the main differences in degree of bluffing will come from the evaluation. Komodo's evaluation I assume is the same as in the non-MCTS version and I'd expect it to have negligible bluffing. The NN IDs I think do have significant bluffing on their own, apart from any search bluffing. I also think the NN bluffing is not just in a losing or very bad position.

lkaufman · Post by **lkaufman** » Mon Aug 12, 2019 4:17 pm

jp wrote: ↑Mon Aug 12, 2019 11:54 am
lkaufman wrote: ↑Tue Aug 06, 2019 8:07 am What we use is closer to what AlphaZero and Lc0 use, but we have changed it substantially. I think that they would all behave similarly in terms of "bluffing". But the degree of "bluffing" will vary from one implementation to another. I think that Lc0 will probably "bluff" to a greater degree when losing than Komodo MCTS; Komodo MCTS won't go quite as "crazy" in a lost position.
All engines using some form of PUCT should converge to zero bluffing from the search, though at different rates. I'd think therefore that the main differences in degree of bluffing will come from the evaluation. Komodo's evaluation I assume is the same as in the non-MCTS version and I'd expect it to have negligible bluffing. The NN IDs I think do have significant bluffing on their own, apart from any search bluffing. I also think the NN bluffing is not just in a losing or very bad position.

The search bluffing is much greater in Komodo than in Lc0 because the NPS rate is much lower (Komodo relies more on quality of searches than on quantity), so the theoretical convergence remains theoretical. The bluffing is in all positions, since best play is not assumed, but in a position where many moves win most people wouldn't use the term to describe a winning move that is not the fastest win against perfect play. But it will happen in fairly even positions, Komodo MCTS might risk a somewhat inferior position if the opponent finds a brilliant sequence in order to get a good position if he doesn't find it.

Why there is no interest in Computer with odds Vs Humans match?

Re: Why there is no interest in Computer with odds Vs Humans match?

Re: Why there is no interest in Computer with odds Vs Humans match?

Re: Why there is no interest in Computer with odds Vs Humans match?