MCTS without random playout?

Discussion of chess software programming and technical issues.

Moderator: Ras

Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: MCTS without random playout?

Post by Daniel Shawul »

Hello Antonio,
I was reading a recent paper that summarizes MCTS research over the past five years when I came across a section that looks like what you are doing here.
They call it MCTS alpha-beta, and apparently it is used in the strongest LOA player (section 4.8.7). You can find the paper by Bjornsson referenced there.
----
cheers
Antonio Torrecillas
Posts: 92
Joined: Sun Nov 02, 2008 4:43 pm
Location: Madrid
Full name: Antonio Torrecillas

Re: MCTS without random playout?

Post by Antonio Torrecillas »

OK, if the term MCTS AB is accepted, I'll call it that, though I don't like the term Monte Carlo for something that is not random; that's misleading.

Thanks Daniel.

Regards.
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: MCTS without random playout?

Post by diep »

Antonio Torrecillas wrote:Hi,

I'm working again on my unconventional chess engine and looking for new ways to improve the probability B* algorithm. I have tried to eliminate the double expansion of the nodes (real + optimism), so as to unify the two phases of the search.
This thinking has led me to an algorithm very similar to MCTS.

Let us examine this (I know I've oversimplified all the algorithms; this is only to highlight the similarities and essential differences):

Code: Select all

MCTS algorithm:
===============
Selection step.
   From the root to a leaf, select nodes according to the UCT heuristic (Upper Confidence Bounds applied to Trees).
Playout.
   Random or pseudo-random self-play.
Expansion.
   Add the node to the tree.
Backpropagation.
   Propagate the values back up the tree (negamax or minimax).

Code: Select all

PB* algorithm:
==============
TraceDown step.
   From the root to a leaf, select nodes according to the PB* heuristic (alternate the best move with the best optimism probability).
Expand.
   Alpha-beta, add the node to the tree.
Backup.
   Propagate the values back up the tree (negamax or minimax).

Code: Select all

How to name this algorithm?
===========================
Selection step.
   From the root to a leaf, select nodes according to the UCT heuristic.
Expand.
   Alpha-beta, add the node to the tree.
Backup.
   Propagate the values back up the tree (negamax or minimax).
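For concreteness, a minimal C++ sketch of that third variant could look as follows. All names here (Node, alphabeta, generate_moves, make_move, unmake_move) are hypothetical placeholders, not code from any engine in this thread; the point is only the shape of the loop: UCT selection, a shallow alpha-beta probe in place of the random playout, and a negamax-style backup.

Code: Select all

// Sketch only: MCTS where the playout is replaced by a shallow alpha-beta probe.
// Each node stores its statistics from the point of view of the side to move there.
#include <cmath>
#include <limits>
#include <vector>

struct Node {
    std::vector<Node> children;
    int    move   = 0;     // move leading to this node (engine-specific encoding)
    double value  = 0.0;   // sum of backed-up scores, side-to-move point of view
    int    visits = 0;
};

// Dummy stand-ins for the engine's own routines (hypothetical placeholders).
int  alphabeta(int /*alpha*/, int /*beta*/, int /*depth*/) { return 0; }
void make_move(int /*move*/)   {}
void unmake_move(int /*move*/) {}
std::vector<int> generate_moves() { return {}; }

// Selection: pick the child maximizing its (negated) mean value plus a UCT exploration term.
Node& select_child(Node& n, double c = 1.4) {
    Node*  best     = &n.children.front();
    double best_ucb = -std::numeric_limits<double>::infinity();
    for (Node& ch : n.children) {
        double mean = ch.visits ? -ch.value / ch.visits : 0.0;   // child stores the opponent's view
        double ucb  = mean + c * std::sqrt(std::log(n.visits + 1.0) / (ch.visits + 1.0));
        if (ucb > best_ucb) { best_ucb = ucb; best = &ch; }
    }
    return *best;
}

// One iteration: walk down with UCT, expand the leaf, evaluate it with a shallow
// alpha-beta probe (the "playout"), and propagate the score back up negamax-style.
double simulate(Node& n, int probe_depth) {
    double score;
    if (n.children.empty()) {
        for (int m : generate_moves()) {                          // Expand
            Node child;
            child.move = m;
            n.children.push_back(child);
        }
        score = alphabeta(-30000, 30000, probe_depth) / 100.0;    // alpha-beta "playout"
    } else {
        Node& ch = select_child(n);                               // Selection step
        make_move(ch.move);
        score = -simulate(ch, probe_depth);                       // negamax sign flip
        unmake_move(ch.move);
    }
    n.visits += 1;                                                // Backup
    n.value  += score;
    return score;
}

Calling simulate(root, probe_depth) repeatedly from the root position grows the tree exactly as in plain MCTS, only with deterministic leaf evaluations instead of random playouts.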
This is my question:
What name should an MCTS algorithm have when the random playout is replaced by an alpha-beta search?

Without randomness it seems it no longer deserves the words "Monte Carlo".

regards, Antonio.
hi Antonio,

As a side note, have you investigated CNS already?
That is, conspiracy number search.

I refer to the original CNS, not the later failed CNS-2 that P.Conners used; and I need to note that its author didn't reply to a single email of mine, though the email address was correct (and for his tournament he DID reply at that email address), despite the big effort I had put in, at least 6 months of work, to implement CNS.

CNS is a highly interesting search algorithm that allows a lot of additions.

For example, you can easily add nullmove, and all sorts of interesting approaches to selective search should work better with it; multicut-type ideas, for example, can be incorporated extensively.

Classical B* is of course a bit simplistically formulated, without knowing exactly how you search.
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: MCTS without random playout?

Post by diep »

Antonio Torrecillas wrote:You're right, the algorithm is deterministic and so is PB*.
PB* has two evaluation values: the normal value comes from an alpha-beta search (its depth is called the probe depth).
The second value is a lower bound that comes from a nullmove + alpha-beta search (the pessimistic value).
Note that the pessimistic value of one side is the optimistic value of the other side.
Then we define a target value, usually the value of the best move at the root plus some extra centipawns.
Now we can determine the probability that a node is best at the root:
If real_value >= target_value, the probability is 100%.
If target_value >= optimistic_value, the probability is 0%.
In between we use a proportional rule.
PB* is deterministic in that it always chooses the best real value or the best probability.
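Written out, that rule amounts to something like the following small sketch (hypothetical names; the linear interpolation is one plausible reading of "the proportional rule", not necessarily the exact formula in the engine).

Code: Select all

// Sketch of the PB* probability rule described above (hypothetical names).
//   real_value : score from the normal alpha-beta probe
//   opt_value  : optimistic bound (the opponent's pessimistic value, negated)
//   target     : value of the best move at the root plus some margin, in centipawns
// Returns the probability that this node can still become best at the root.
double pbstar_probability(int real_value, int opt_value, int target) {
    if (real_value >= target) return 1.0;   // the real value already reaches the target
    if (target >= opt_value)  return 0.0;   // even the optimistic bound falls short
    // Linear "proportional rule" in between.
    return (double)(opt_value - target) / (double)(opt_value - real_value);
}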

The main drawback of PB* is this double probe search (real + pessimistic) and the two-phase search (select + verify).
Nevertheless, its performance is better than MCTS in chess, at least in my experiments.

The new approach is a kind of greedy algorithm (always choose the best heuristic value) on top of an alpha-beta that serves as the evaluation;
or an MCTS without random playout.

Because I'll publish the sources, I don't want to use terms that may be ambiguous or confusing.
That is the reason for my post.

best regards.
Combining 2 different values in a non-transparent manner is very difficult, Antonio.

I can give an even simpler example there.

In 2004 and 2005 I carried out another few massive attempts to search a lot deeper with Diep.

One attempt that took me nearly half a year, well over 3 months of full-time work in fact, was the following idea.

Diep forward prunes the last few plies, usually the last 4 plies, in the following manner:

if ( (score = -qsearch(-beta, -beta+1)) >= beta ) return score;   // nullmove verified with a qsearch only

So a simple qsearch nullmove. However, in 90%+ of these positions the static evaluation differs from beta by more than 3 pawns.

Now, a big problem in most genuinely selective forms of search that don't use a simple depth-limited iteration, which also includes the CNS I just posted about, is how to terminate the search efficiently.

Typically "brute force" programs are way more efficient in the last few plies.

In Diep I wanted to speed up this nullmove pruning further in case of a big difference between the evaluation and beta; let's assume for now that alfa+1 == beta, since in Diep's case we are speaking about principal variation search.

(the name PVS is overloaded too much not to write it out, I'd guess)

Diep is a slow searcher as well: about 200k nps on a fast core, or somewhat above 150k nps on each of the cores I've got over here, say a 2.3-2.5GHz core.

So I wrote a fast beancounter program that searches, in 32 bits, at around 3 million nps, and that already on a K7 2.1GHz processor. Single core of course; this fast beancounter obviously isn't parallel.

Now the idea is this:

Nullmove will work great of course, yet it wastes nodes when we're 3 pawns or more above beta; on average it then wastes 7 nodes, and such positions make up over 50% of all the positions that Diep searches in a single search. So it basically wastes Diep's EXPENSIVE nodes on nullmove here. I simply want to prune, yet direct hard forward pruning, as is so popular nowadays in the beancounters that dominate right now, doesn't work for Diep of course. So I basically want something a bit less reliable than nullmove, yet nearly as reliable, and above all very quick:

if ( depth < 5 && eval >= beta + 3*PAWN_VALUE ) {      // PAWN_VALUE: the engine's pawn score
    // nullmove will handle this
    int score = quicksearch(eval, alfa, beta);          // quick, cheap verification search
    if ( score >= beta + 3*PAWN_VALUE ) return beta;
}

Note this is the fundamental idea; of course if it had worked it could have been expanded.

It just didn't work at all. Whatever I tried, it didn't work. I still feel it is an interesting idea. It searched up to 3 plies deeper; especially when trying it in the last 5 plies it was really searching a lot deeper.

Combining the 2 different scores proved difficult; also a big problem is the difference in what the qsearch picks up - Diep picks up way more in its qsearch.

If you search at 3 million nps you just don't have the time to do fancy things in qsearch like I do in Diep.

Now I simply ignored the tactical difference between the two approaches and played some games. The difference was so huge it wasn't even funny: normal Diep was hundreds of Elo stronger than this approach.

For the 2005 world championship I had really counted on getting this to work and outsearching everyone there. Three weeks before the world championship I had to abort this hard work, and the event became a disaster for Diep.
Antonio Torrecillas
Posts: 92
Joined: Sun Nov 02, 2008 4:43 pm
Location: Madrid
Full name: Antonio Torrecillas

Re: MCTS without random playout?

Post by Antonio Torrecillas »

Hi Vincent,

So far I've tested in increasing order of success:
MCTS (random playout), probability B*, and now MCTS (AB playout).


My feeling is that CNS probably has the same drawbacks as PB*, since it was an attempt to apply conspiracy number theory to B*.
But I can't say for sure; maybe someday I'll give it a try.

regards Antonio.
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: MCTS without random playout?

Post by diep »

Antonio Torrecillas wrote:Hi Vincent,

So far I've tested in increasing order of success:
MCTS (random playout), probability B*, and now MCTS (AB playout).


My feeling is that CNS probably has the same drawbacks as PB*, since it was an attempt to apply conspiracy number theory to B*.
But I can't say for sure; maybe someday I'll give it a try.

regards Antonio.
A few questions:

How strong is the unconventional engine's evaluation function, and do you use a quiescence search?

Why are you focusing on Monte Carlo?

Basically I see Monte Carlo using randomness as the weakest form of search. You can see today's deep-searching engines, such as Stockfish, more as a kind of guided Monte Carlo search with a focus on checking the main lines.
smatovic
Posts: 3234
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: MCTS without random playout?

Post by smatovic »

Hi Vincent,

I've read some papers about CNS and CCNS; these algorithms look very promising, especially CCNS with its alpha-beta bounds for the leaf-node search.

Unfortunately CCNS is not trivial... I will need some time to get into this...

--
Srdja
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: MCTS without random playout?

Post by diep »

smatovic wrote:Hi Vincent,

I've read some papers about CNS and CCNS; these algorithms look very promising, especially CCNS with its alpha-beta bounds for the leaf-node search.

Unfortunately CCNS is not trivial... I will need some time to get into this...

--
Srdja
CCNS, especially the parallel CCNS from Ulf Lorenz, is not interesting at all.
First of all, he did not respond to any email of mine at all when I was putting full-time work into CNS, which means he's guilty, as he wrote in the paper that it is tactically very strong. Yet when I asked for details to compare my CNS with what he had achieved, he didn't give any, didn't even respond. That's a very bad way to do science. Very disgusting.

The 2 bounds that were introduced make the algorithm a lot weaker of course, as you throw away quality; meanwhile you need to search more nodes in order to get 2 confirmations within the search window, whereas its weakness already is that it is so inefficient in the "last few plies", as far as you can say that about these algorithms.

That's all overdone, therefore.

Additionally, another thing I find not so interesting is the 2-ply full-width searches that were introduced somewhere. If you simply don't know how to parallelize your software, then it won't help to 'rape' your CNS algorithm by doing 2-ply searches everywhere near the leaves.

Basically CNS is most interesting, yet leave out all the crap that needs too much overhead.
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: MCTS without random playout?

Post by diep »

To put it in perspective: if I use a window like that in Diep, it starts having a factor 6 overhead for everything.

Also, a 2-ply full-width search is tactically incredibly strong, and I find it cheating to use that 'building block' to solve simple tactical problems with CNS; for the claim about how tactically strong it is, the extensions in your full-width 2-ply search then make all the difference between finding something or not.

And in the first place, if you take a look at CNS, you want to know how strong it is at its core.

Either your CNS is gonna see something or the algorithm sucks; it's most interesting to take a look at it when it is in its purest form, as the purest form is the interesting framework that shows the concept.

Note that I speak specifically about tactics with respect to CNS. As a normal framework to get a high-Elo chess engine it's not so interesting.

Basically, if you look at top games where both sides play great, in at least a number of games both sides try, directly or indirectly, to maximize their mobility.

Let's assume now that the objectively best moves maximize the mobility along the path for both sides.

This actually is in many positions the best play for both sides.

CNS dramatically fails to search this best line very deeply, as it always expands the path with the smallest conspiracy number. So it is really trying to figure out the forced lines to great depth.
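For readers who have not seen conspiracy numbers before: the conspiracy number of a node for a target value t is the minimum number of leaf evaluations that would have to change for the node's minimax value to reach t, and CNS keeps expanding leaves along the path where that number is smallest. A minimal backup sketch for the "raise to at least t" direction, with hypothetical structures:

Code: Select all

// Conspiracy-number backup, sketch only (hypothetical names).
// cn_up(n, t): minimum number of leaf values that must change for the
// minimax value of n to become >= t.  The "lower to at most t" direction
// is the mirror image with the roles of max and min swapped.
#include <algorithm>
#include <climits>
#include <vector>

struct CnsNode {
    int value = 0;                  // current minimax value of this node
    bool max_node = true;           // true if the side to move here maximizes
    std::vector<CnsNode> children;  // empty for a leaf
};

int cn_up(const CnsNode& n, int t) {
    if (n.value >= t) return 0;          // nothing needs to conspire
    if (n.children.empty()) return 1;    // this single leaf must change its value
    if (n.max_node) {                    // max node: one child reaching t is enough
        int best = INT_MAX;
        for (const CnsNode& c : n.children)
            best = std::min(best, cn_up(c, t));
        return best;
    }
    int sum = 0;                         // min node: every child below t must rise
    for (const CnsNode& c : n.children)
        if (c.value < t)
            sum += cn_up(c, t);
    return sum;
}

The expansion rule described above then follows, at every node, the children that keep this count smallest, which is exactly why the narrow, forced lines get most of the attention.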

I feel however that there are 2 domains for research there with respect to CNS.

a) Combine today's superselective depth-limited searches with CNS; if the CNS already picks up much of the tactics within the search, maybe you can
prune more / reduce more.

This is not an easy challenge. It is fundamental research.

b) Now, where CNS expands the smallest conspiracy number first, you can very well imagine that we can also reformulate maximizing the mobility as a goal for expansion.

So that would be a "double-sided CNS": one that both expands the smallest conspiracy number and maximizes the mobility.

Now obviously you can invent a new name for such a search as well.


c) Similar to a): combine depth-limited alpha-beta with CNS.

We all realize this is years of work to do well, simply because the classic brute-force search software has been algorithmically optimized and enhanced so much. I would find b) the most interesting, as I realize all too well that for a higher Elo rating b) is the most interesting; a) and c) won't top the charts there, I'd guess, but that's a blindfolded guess, as maybe someone will find a genius way to combine CNS with other techniques.

It's too much for a PhD thesis, for example; usually PhD students aren't strong enough in game tree search - it takes 10+ years to understand something about game tree search - especially because for the fundamentals of it, especially how to think and apply logic within search, you have to get through tons of disinformation first, not seldom deliberately written down wrongly in the forums.
smatovic
Posts: 3234
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: MCTS without random playout?

Post by smatovic »

diep wrote: Additionally, another thing I find not so interesting is the 2-ply full-width searches that were introduced somewhere. If you simply don't know how to parallelize your software, then it won't help to 'rape' your CNS algorithm by doing 2-ply searches everywhere near the leaves.
I agree.
diep wrote: Basically CNS is most interesting, yet leave out all the crap that needs too much overhead.
But how do you handle a 16-bit score value with CNS?

I could store the C-numbers for 8-bit scores at each expanded node, but I am not sure whether this implementation is too coarse-grained.
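For illustration only, the coarse-grained idea described above might look roughly like this; the names, the clamping range and the grid size are assumptions, with 256 thresholds mirroring the 8-bit suggestion.

Code: Select all

// Sketch: quantize 16-bit centipawn scores onto a small threshold grid and
// keep one pair of conspiracy numbers per threshold at each expanded node.
#include <array>
#include <cstdint>

constexpr int BUCKETS   = 256;     // 8-bit grid of thresholds, as suggested above
constexpr int SCORE_MIN = -3200;   // clamping range in centipawns (assumption)
constexpr int SCORE_MAX =  3200;

// Map a 16-bit score to a bucket index in 0..BUCKETS-1.
int score_to_bucket(int16_t score) {
    int s = score;
    if (s < SCORE_MIN) s = SCORE_MIN;
    if (s > SCORE_MAX) s = SCORE_MAX;
    return (s - SCORE_MIN) * (BUCKETS - 1) / (SCORE_MAX - SCORE_MIN);
}

// Per-node conspiracy numbers, one entry per threshold bucket.
struct CnTable {
    std::array<uint8_t, BUCKETS> cn_up;    // conspirators needed to reach at least this threshold
    std::array<uint8_t, BUCKETS> cn_down;  // conspirators needed to drop to at most this threshold
};

Whether such a grid is too coarse is exactly the open question; the table costs 2*BUCKETS bytes per expanded node, so the grid size directly trades resolution against memory.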

--
Srdja