MCTS without random playout?

Antonio Torrecillas · Post by **Antonio Torrecillas** » Sun Jan 01, 2012 8:02 pm

Hi,

I'm working again on my unconventional chess engine and looking for new ways to improve the probability Bstar algorithm.I have tried to eliminate the double expansion of the nodes (real + optimism),so as to unify the two phases of the search.
This thinking has led me to an algorithm very similar to MCTS.

Let us examine this:(I know I've oversimplified all algorithms. This is only to highlight the similarities and essential differences.)

Code: Select all

MCTS algorithm&#58;
===============
Selection step.
   from root to a leaf, select the nodes according to heuristic UCT. &#40;Upper Confidence Bound applied to tree&#41;.
Play out.
   Random or pseudo-random self-play.
expansion.
   add node to the tree.
Backpropagation.  
   Propagating the values back in the tree &#40;negamax or minimax&#41;

Code: Select all

PB* algorithm&#58;
==============
TraceDown step.
  from root to a leaf, select the nodes according to heuristic PB*.&#40;alternate best move with best optimism probability&#41;.
Expand.
  Alpha-beta, add node to the tree.
Backup.
   Propagating the values back in the tree &#40;negamax or minimax&#41;

Code: Select all

How to name algorithm ?&#58;
========================
Selection step.
   from root to a leaf, select the nodes according to heuristic UCT.
Expand.
   Alpha-beta, add node to the tree.
Backup.
   Propagating the values back in the tree &#40;negamax or minimax&#41;

This is my question:
What name should have a MCTS algorithm with the "random play out" replaced by an "alpha-beta"?

Without randomness it looks like it no longer deserve the words "Monte Carlo".

regards, Antonio.

Dave_N · Post by **Dave_N** » Sun Jan 01, 2012 8:36 pm

EABTS (expanding alpha beta tree search )

Dave_N · Post by **Dave_N** » Sun Jan 01, 2012 8:47 pm

UCABTS (Upper Confidence Alpha Beta Tree Search) or UCAB

Daniel Shawul · Post by **Daniel Shawul** » Mon Jan 02, 2012 5:43 pm

I am not sure I understand your method but if you replace the random playout by alpha-beta (shallow search / evaluation i suppose), it means the algorithm becomes deterministic. Or is your evaluation probabilistic as well which gives multiple values instead of one heuristic value ? Just a "keep it up" thought for the new year : ) Good luck.

Antonio Torrecillas · Post by **Antonio Torrecillas** » Mon Jan 02, 2012 7:33 pm

You're right, the algorithm is deterministic and so is PB*.
PB* has two evaluation values, the normal value comes from an alpha-beta.(the depth is called probe depth).
The second value is the lower bound that comes from a nullmove + alpha-beta.(pessimistic value).
Note that the pessimistic value of one side is the optimistic value of the other side.
Then we define a target value, usually the value of the best move at root plus some extra centipawns.
Now we can determine a probability that a node is best at root.
if the Real_value is >= Target_value the probability is 100%.
if the target_value is >= optimistic_value the probability is 0%.
in between we use the proportional rule.
PB* is deterministic in that it always choose the best real value or the best probability.

the main drawback of PB* is this double probe search (real + pessimistic) and the double phase search (select + verify).
nevertheless, its performance is better than MCTS in chess. Well, at least in my experiments.

The new approach is a kind of Greedy algorithm (always choose the best heuristic value) on top of alpha-beta who serve as evaluation.
or a mcts without random playout.

Because I'll publish the sources, I don't want to use terms that may be ambiguous or confusing.
That is the reason for my post.

best regards.

Daniel Shawul · Post by **Daniel Shawul** » Mon Jan 02, 2012 9:19 pm

I was not not familiar with PB* before.Thanks for the explanation. It should definitely be better than MCTS at least for chess because random play outs are bad with tactics even when using a heavy play out scheme. You may have luck in other games with it but I lost hope with UCT for chess. Just to make sure I am following you, your algorithm generates a tree in memory where nodes are selected with UCT rule (instead of a PB* heuristic rule) but it is the same as PB* for the rest. The eval triplet at the leaf nodes beinf:

Code: Select all

normal = alpha-beta
pessimistic = forced nullmove + alpha-beta 
optimistic = alpha-beta + forced nullmove + alpha-beta

Only two of these are needed as you mentinoned (real + pessimstic). So you iteratively increment the depth of the alpha-beta search on each visit of the leaf node. So it sounds like an algorithm for selectivity in the deeper part of the tree (nodes stored in memory). The UCT tree is grown very selectively especially if you use a low exploration coefficient very similar to an LMR + pvs tree. It should be competitive with standard pvs search and I am not surprized that it beats mcts for chess.
I look forward to your code.
regards,
Daniel

Antonio Torrecillas · Post by **Antonio Torrecillas** » Tue Jan 03, 2012 7:53 pm

Just to solve or dissolve the interest in this type of algorithms let me provide some data.
The current implementation with Greedy search and a simple evaluation (ufo evaluation = material + piece square ) yields the following results:
Rocinante 2.0 (Greedy+UFO)<> Clueless V1.4 TC:60/10 +694-2993=313 ELO:-227
Rocinante 2.0 (Greedy+UFO)<> Rocinante 1.01 JA (PB*+UFO) TC:60/24 +68-16=16 ELO:+200
(Rocinante 2.0 will have an UCI option to select the search algorithm: Greedy or PB*)

Obviously I should try other engines and control time, but we can estimate an ELO (CCRL 40/4) of 1800 for Rocinante 2.0
In the schedule I have yet to try an probe AB fixed time rather than fixed depth. Check how it responds with a more complete evaluation function and I'm thinking a way to integrate the nullmove heuristic in the search.

still remains a lot of fun with this idea.

Note: I decided to call this algorithm as Greedy search. If someone in the community considers this term inappropriate, please argue why.

Daniel, For more details on PB* check http://chessprogramming.wikispaces.com/B%2A
Or download the JA package of Rocinante 1.01. and look at Bstar.cpp
The best is the paper "B* Probability Based Search " Hans Berliner and Chris McConnell.

smatovic · Post by **smatovic** » Fri Mar 02, 2012 8:15 pm

Hi Antonio,

i tried to implement this win count based best first minimax and have to say that my version can not compete with an AB depth 4 searcher. But it plays a lot of better than an plain MC or UCT approach...so may i ask what uct score formula you use? my looks like this:

Code: Select all

uct_score =  child.wins / child.visits  + 10 *  sqrt&#40; log&#40;parent.visits&#41; / &#40;5*child.visits&#41; ) ;

On Leaf nodes i do an AB-Qsearch, if the score is positive i add a win. The Score from the AB search is backed up to the root and the root move with the best score is selectet.

--
Srdja

Antonio Torrecillas · Fri Mar 02, 2012 10:37 pm

Hi Srdja,
currently I use the following formula:

Code: Select all

Area&#91;i&#93; = Real&#91;i&#93; + sqrt&#40;&#40;2000.0 * log&#40;np&#41;)/&#40;aux->SubtreeSize+1&#41;);

Real is the eval for this node (AB + quiesce depth 1-3 function of the number of moves. So deeper in end games and forced positions).
np is the total of visit (parent node)
SubtreeSize is the number of visit of this node.

The branch with best Area is explored.
AB values are backed up as you describe.

I'm working in the eval and I just did a quick test against Clueless +33-49=18 TC:60/10

There is a clear improvement with a better eval, but Rocinante is still weak tactically.

Regards,Antonio.

smatovic · Post by **smatovic** » Sat Mar 03, 2012 5:41 pm

thanks,

i still play with these values....interesting how the engine changes its play...

Currently i got with this formula the best results:

Code: Select all

uct_score =  child.wins / child.visits  + 0.01 *  sqrt&#40; &#40;2000 * log&#40;parent.visits&#41;)  / child.visits ) ;

--
Srdja

MCTS without random playout?

MCTS without random playout?

Re: MCTS without random playout?

Re: MCTS without random playout?

Re: MCTS without random playout?

Re: MCTS without random playout?

Re: MCTS without random playout?

Re: MCTS without random playout?

Re: MCTS without random playout?

Re: MCTS without random playout?

Re: MCTS without random playout?