MCTS without random playout?

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Antonio Torrecillas
Posts: 90
Joined: Sun Nov 02, 2008 4:43 pm
Location: Barcelona

MCTS without random playout?

Post by Antonio Torrecillas »

Hi,

I'm working again on my unconventional chess engine and looking for new ways to improve the probability Bstar algorithm.I have tried to eliminate the double expansion of the nodes (real + optimism),so as to unify the two phases of the search.
This thinking has led me to an algorithm very similar to MCTS.

Let us examine this:(I know I've oversimplified all algorithms. This is only to highlight the similarities and essential differences.)

Code: Select all

MCTS algorithm:
===============
Selection step.
   from root to a leaf, select the nodes according to heuristic UCT. (Upper Confidence Bound applied to tree).
Play out.
   Random or pseudo-random self-play.
expansion.
   add node to the tree.
Backpropagation.  
   Propagating the values back in the tree (negamax or minimax)

Code: Select all

PB* algorithm:
==============
TraceDown step.
  from root to a leaf, select the nodes according to heuristic PB*.(alternate best move with best optimism probability).
Expand.
  Alpha-beta, add node to the tree.
Backup.
   Propagating the values back in the tree (negamax or minimax)

Code: Select all

How to name algorithm ?:
========================
Selection step.
   from root to a leaf, select the nodes according to heuristic UCT.
Expand.
   Alpha-beta, add node to the tree.
Backup.
   Propagating the values back in the tree (negamax or minimax)
This is my question:
What name should have a MCTS algorithm with the "random play out" replaced by an "alpha-beta"?

Without randomness it looks like it no longer deserve the words "Monte Carlo".

regards, Antonio.
Dave_N
Posts: 153
Joined: Fri Sep 30, 2011 7:48 am

Re: MCTS without random playout?

Post by Dave_N »

EABTS (expanding alpha beta tree search )
Dave_N
Posts: 153
Joined: Fri Sep 30, 2011 7:48 am

Re: MCTS without random playout?

Post by Dave_N »

UCABTS (Upper Confidence Alpha Beta Tree Search) or UCAB
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: MCTS without random playout?

Post by Daniel Shawul »

I am not sure I understand your method but if you replace the random playout by alpha-beta (shallow search / evaluation i suppose), it means the algorithm becomes deterministic. Or is your evaluation probabilistic as well which gives multiple values instead of one heuristic value ? Just a "keep it up" thought for the new year : ) Good luck.
Antonio Torrecillas
Posts: 90
Joined: Sun Nov 02, 2008 4:43 pm
Location: Barcelona

Re: MCTS without random playout?

Post by Antonio Torrecillas »

You're right, the algorithm is deterministic and so is PB*.
PB* has two evaluation values, the normal value comes from an alpha-beta.(the depth is called probe depth).
The second value is the lower bound that comes from a nullmove + alpha-beta.(pessimistic value).
Note that the pessimistic value of one side is the optimistic value of the other side.
Then we define a target value, usually the value of the best move at root plus some extra centipawns.
Now we can determine a probability that a node is best at root.
if the Real_value is >= Target_value the probability is 100%.
if the target_value is >= optimistic_value the probability is 0%.
in between we use the proportional rule.
PB* is deterministic in that it always choose the best real value or the best probability.

the main drawback of PB* is this double probe search (real + pessimistic) and the double phase search (select + verify).
nevertheless, its performance is better than MCTS in chess. Well, at least in my experiments.

The new approach is a kind of Greedy algorithm (always choose the best heuristic value) on top of alpha-beta who serve as evaluation.
or a mcts without random playout.

Because I'll publish the sources, I don't want to use terms that may be ambiguous or confusing.
That is the reason for my post.

best regards.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: MCTS without random playout?

Post by Daniel Shawul »

I was not not familiar with PB* before.Thanks for the explanation. It should definitely be better than MCTS at least for chess because random play outs are bad with tactics even when using a heavy play out scheme. You may have luck in other games with it but I lost hope with UCT for chess. Just to make sure I am following you, your algorithm generates a tree in memory where nodes are selected with UCT rule (instead of a PB* heuristic rule) but it is the same as PB* for the rest. The eval triplet at the leaf nodes beinf:

Code: Select all

normal = alpha-beta
pessimistic = forced nullmove + alpha-beta 
optimistic = alpha-beta + forced nullmove + alpha-beta
Only two of these are needed as you mentinoned (real + pessimstic). So you iteratively increment the depth of the alpha-beta search on each visit of the leaf node. So it sounds like an algorithm for selectivity in the deeper part of the tree (nodes stored in memory). The UCT tree is grown very selectively especially if you use a low exploration coefficient very similar to an LMR + pvs tree. It should be competitive with standard pvs search and I am not surprized that it beats mcts for chess.
I look forward to your code.
regards,
Daniel
Antonio Torrecillas
Posts: 90
Joined: Sun Nov 02, 2008 4:43 pm
Location: Barcelona

Re: MCTS without random playout?

Post by Antonio Torrecillas »

Just to solve or dissolve the interest in this type of algorithms let me provide some data.
The current implementation with Greedy search and a simple evaluation (ufo evaluation = material + piece square ) yields the following results:
Rocinante 2.0 (Greedy+UFO)<> Clueless V1.4 TC:60/10 +694-2993=313 ELO:-227
Rocinante 2.0 (Greedy+UFO)<> Rocinante 1.01 JA (PB*+UFO) TC:60/24 +68-16=16 ELO:+200
(Rocinante 2.0 will have an UCI option to select the search algorithm: Greedy or PB*)

Obviously I should try other engines and control time, but we can estimate an ELO (CCRL 40/4) of 1800 for Rocinante 2.0
In the schedule I have yet to try an probe AB fixed time rather than fixed depth. Check how it responds with a more complete evaluation function and I'm thinking a way to integrate the nullmove heuristic in the search.

still remains a lot of fun with this idea. :-)

Note: I decided to call this algorithm as Greedy search. If someone in the community considers this term inappropriate, please argue why.

Daniel, For more details on PB* check http://chessprogramming.wikispaces.com/B%2A
Or download the JA package of Rocinante 1.01. and look at Bstar.cpp
The best is the paper "B* Probability Based Search " Hans Berliner and Chris McConnell.
smatovic
Posts: 2658
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: MCTS without random playout?

Post by smatovic »

Hi Antonio,

i tried to implement this win count based best first minimax and have to say that my version can not compete with an AB depth 4 searcher. But it plays a lot of better than an plain MC or UCT approach...so may i ask what uct score formula you use? my looks like this:

Code: Select all

uct_score =  child.wins / child.visits  + 10 *  sqrt&#40; log&#40;parent.visits&#41; / &#40;5*child.visits&#41; ) ;
On Leaf nodes i do an AB-Qsearch, if the score is positive i add a win. The Score from the AB search is backed up to the root and the root move with the best score is selectet.


--
Srdja
Antonio Torrecillas
Posts: 90
Joined: Sun Nov 02, 2008 4:43 pm
Location: Barcelona

Re: MCTS without random playout?

Post by Antonio Torrecillas »

Hi Srdja,
currently I use the following formula:

Code: Select all

Area&#91;i&#93; = Real&#91;i&#93; + sqrt&#40;&#40;2000.0 * log&#40;np&#41;)/&#40;aux->SubtreeSize+1&#41;);
Real is the eval for this node (AB + quiesce depth 1-3 function of the number of moves. So deeper in end games and forced positions).
np is the total of visit (parent node)
SubtreeSize is the number of visit of this node.

The branch with best Area is explored.
AB values are backed up as you describe.

I'm working in the eval and I just did a quick test against Clueless +33-49=18 TC:60/10

There is a clear improvement with a better eval, but Rocinante is still weak tactically.

Regards,Antonio.
smatovic
Posts: 2658
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: MCTS without random playout?

Post by smatovic »

thanks,

i still play with these values....interesting how the engine changes its play...

Currently i got with this formula the best results:

Code: Select all

uct_score =  child.wins / child.visits  + 0.01 *  sqrt&#40; &#40;2000 * log&#40;parent.visits&#41;)  / child.visits ) ; 
--
Srdja