A new algorithm accounting for the uncertainty in eval funcs

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

tholden
Posts: 15
Joined: Thu Oct 23, 2014 1:46 pm

A new algorithm accounting for the uncertainty in eval funcs

Post by tholden »

Dear all,

The document below details a new algorithm for solving games such as chess, which I'd love to persuade someone to implement. Even if you've no intention of implementing it, I would still love to hear your comments. I'm not naive enough to think it will beat cutting-edge algorithms, but it will at least play differently, arguably in a more "human-like" manner, and there is considerable value in that.

Before you dismiss me as another nut-job, let me say in my defence that I am a professional academic economist, so I have a good understanding of decision-making under uncertainty. I also used to be a professional computer games programmer at EA, so I have some experience of practical implementation issues. And I have implemented basic alpha-beta in the past, so I'm aware of the "standard" approach to these problems as well.

http://www.tholden.org/wp-content/uploa ... -Chess.pdf

Any comments would be much appreciated,

Best regards,

Tom
PK
Posts: 893
Joined: Mon Jan 15, 2007 11:23 am
Location: Warsza

Re: A new algorithm accounting for the uncertainty in eval f

Post by PK »

It seems that you are thinking along the same lines as the inventors of Monte Carlo Tree Search. This algorithm is currently used in all the major programs playing the Asian game of go (weiqi, baduk). The game is notoriously hard to evaluate, so the programs basically play a bunch of random games, and the aggregate score replaces the evaluation function's output. Then the similarities start: once a move has been tried sufficiently often it becomes a node, and the node's value is calculated by a function that takes into account the scores of its child nodes and how often each move was tried. You will understand the equations much better than I do. In chess, however, such attempts have not worked.
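
Roughly, the selection step looks like this (just a sketch of the standard UCT rule; the Node class and names are illustrative, not taken from any real go program):

Code: Select all

import math

class Node:
    def __init__(self):
        self.children = []   # expanded moves
        self.visits = 0      # how often this node has been tried
        self.wins = 0.0      # aggregate score of the random games played through it

def uct_select(node, c=1.4):
    # Untried children go first; otherwise pick the child maximising average
    # playout score plus an exploration bonus that shrinks as it is tried more.
    def score(child):
        if child.visits == 0:
            return float('inf')
        return (child.wins / child.visits
                + c * math.sqrt(math.log(max(1, node.visits)) / child.visits))
    return max(node.children, key=score)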
User avatar
hgm
Posts: 27789
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: A new algorithm accounting for the uncertainty in eval f

Post by hgm »

Well, as you already remark, as soon as you make the value of a node depend not only on that of the best move, but start to weigh in the existence of moves that are close in score, you have to actually find an exact score for those moves, rather than just an upper bound (as alpha-beta does). That can be very expensive.

I once tried something like this in micro-Max, giving a 10 cP bonus to a node where there is a second move within 10 cP of the first. This means you always have to lower alpha by 10 cP, so as to recognize moves that fail low by little enough that the bonus might still bring them into the window. And that even holds after you raise alpha to the score of the best move, as long as you haven't yet found a second move in that node that is close.

It wasn't competitive at all with the normal search. I did not try judging whether there was a nearby move by means of a strongly reduced search, though. Modern engines do this to determine whether a move is singular. (They then search it deeper, though, rather than penalizing it in score.) And it does not seem to hurt them.
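
Schematically the idea amounts to something like the following (a sketch, not actual micro-Max code; evaluate(), legal_moves() and make_move() stand in for whatever the engine already provides):

Code: Select all

INF    = float('inf')
BONUS  = 10   # centipawn bonus when a second move is close to the best one
MARGIN = 10   # how close the second-best move must be to count

def search(pos, depth, alpha, beta):
    if depth == 0:
        return evaluate(pos)                     # placeholder static eval
    best = second = -INF
    for move in legal_moves(pos):                # placeholder move generator
        # Lower alpha by MARGIN so moves that fail low by only a little
        # still get exact scores instead of mere upper bounds.
        a = max(alpha, best) - MARGIN
        score = -search(make_move(pos, move), depth - 1, -beta, -a)
        if score > best:
            best, second = score, best
        elif score > second:
            second = score
        if best >= beta:
            return best                          # ordinary beta cutoff
    if second >= best - MARGIN:
        best += BONUS                            # reward having a nearby alternative
    return best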

Another thing that worried me is the dependence between moves. Very often two moves have exactly the same score. But that offers only false security against a score drop, because it turns out they derive that score from the same leaf, reached through a transposition of moves. And such dependencies are very hard to detect.
tholden
Posts: 15
Joined: Thu Oct 23, 2014 1:46 pm

Re: A new algorithm accounting for the uncertainty in eval f

Post by tholden »

I am aware of the Monte Carlo tree search approach. What I'm suggesting is somewhere in between that approach and alpha-beta: still using evaluation functions for guidance, rather than playing out to maximum depth, but exploring the tree with a random (though not uniform) search.
tholden
Posts: 15
Joined: Thu Oct 23, 2014 1:46 pm

Re: A new algorithm accounting for the uncertainty in eval f

Post by tholden »

The concern about it turning into a brute-force search of the entire game tree is addressed in the doc. This may be avoided since there is no need to evaluate all nodes to the same depth. The sparsity of the search can easily be tuned by tweaking the exploration probabilities, so either more or less sparsity than alpha-beta is achievable, though it is unlikely to match alpha-beta's efficiency in performing the "right" evaluations. Where it should perform better, though, is in the mapping from evaluation scores to the move chosen.

I'm not sure I see why having multiple sequences of moves leading to the same state need be problematic. At the cost of some loss of efficiency, you can just treat them as different states. Or you can use a hash table for the tree nodes to detect identical states, and then work with a game lattice rather than a game tree.
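
Concretely, something along these lines, assuming the engine already exposes some position_key() (in practice a Zobrist hash) for each state:

Code: Select all

node_table = {}                    # position key -> shared node

class Node:
    def __init__(self, pos):
        self.pos = pos
        self.parents = []          # several move orders may lead here
        self.children = []
        self.probs = None          # e.g. (P(win), P(draw), P(loss))

def get_node(pos):
    key = position_key(pos)        # placeholder; a Zobrist hash in practice
    node = node_table.get(key)
    if node is None:
        node = Node(pos)
        node_table[key] = node     # first time this position is seen
    return node                    # all transpositions map to the same node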
kinderchocolate
Posts: 454
Joined: Mon Nov 01, 2010 6:55 am
Full name: Ted Wong

Re: A new algorithm accounting for the uncertainty in eval f

Post by kinderchocolate »

Your algorithm sounds like a standard lattice algorithm. The maths is easy, but I'm not very sure from the document how you'd derive those probabilities. Say I give you an arbitrary position: how would you use the algorithm to derive the probabilities and make a move?
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: A new algorithm accounting for the uncertainty in eval f

Post by AlvaroBegue »

Eric Baum and Warren Smith have some old papers about handling uncertainty in scores, but I am not optimistic that the whole enterprise makes sense. For the most part, the only job of the evaluation function is to estimate the expected value of the utility of winning, or any quantity that respects that ordering.

In the context of MCTS, when using heuristics to initialize the statistics in a new node in the tree, you may want an estimate of the expected value and an idea of how certain it is, so you can properly combine the heuristic with the evidence you obtain from running playouts. That's the only context in which I think it makes sense to have an estimate of the uncertainty in the evaluation.
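
One common way to do that blend (just a sketch; the constant and the function name are illustrative) is to treat the heuristic as a prior worth a fixed number of virtual playouts, so its weight fades as real evidence accumulates:

Code: Select all

PRIOR_VISITS = 20   # how many playouts the heuristic estimate is "worth"

def node_value(heuristic_estimate, playout_wins, playout_visits):
    # More confidence in the heuristic would be expressed as a larger
    # PRIOR_VISITS; with no playouts yet this returns the heuristic itself.
    total = PRIOR_VISITS + playout_visits
    return (PRIOR_VISITS * heuristic_estimate + playout_wins) / total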
tholden
Posts: 15
Joined: Thu Oct 23, 2014 1:46 pm

Re: A new algorithm accounting for the uncertainty in eval f

Post by tholden »

Not sure what a "standard lattice algorithm" is. Link? The google results for "lattice algorithm chess" did not seem to be at all related.

The algorithm for deriving the probabilities was described in the doc. You explore the tree in a random fashion, with exploration probabilities proportional to the probabilities of weak dominance. When new leaves are evaluated for the first time, you use the evaluation function to get their probabilities. Probabilities are then propagated up the tree using the formula given there.
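
In rough pseudo-Python, one iteration looks something like the sketch below. The bodies of eval_to_probs(), weight() and backup() are crude stand-ins purely to make it concrete; the actual dominance probabilities and the update formula are the ones given in the doc, and evaluate() is whatever static eval the engine already has:

Code: Select all

import math, random

class Node:
    def __init__(self, pos):
        self.pos = pos
        self.children = []   # filled in by a move generator, omitted here
        self.probs = None    # (P(win), P(draw), P(loss)) for the side to move

def eval_to_probs(cp):
    # Stand-in: turn a centipawn score into a (win, draw, loss) triple.
    p = 1.0 / (1.0 + math.exp(-cp / 200.0))    # chance the mover is better
    draw = 0.4 * (1.0 - abs(2.0 * p - 1.0))    # draws most likely near equality
    return (p * (1.0 - draw), draw, (1.0 - p) * (1.0 - draw))

def weight(child):
    # Stand-in for the weak-dominance probability: a child's triple is stored
    # from its own mover's view, so favour children that are bad for that mover.
    if child.probs is None:
        return 1.0               # unvisited children get a default weight
    w, d, l = child.probs
    return l + 0.5 * d

def backup(children):
    # Stand-in for the doc's update formula: take the child that is best for
    # the parent's mover and flip its triple to the parent's point of view.
    w, d, l = min((ch.probs for ch in children if ch.probs is not None),
                  key=lambda t: t[0] + 0.5 * t[1])
    return (l, d, w)

def explore(node):
    # One iteration: walk down at random, evaluate a new leaf, update upwards.
    if not node.children:
        node.probs = eval_to_probs(evaluate(node.pos))
        return
    child = random.choices(node.children,
                           weights=[weight(c) for c in node.children])[0]
    explore(child)
    node.probs = backup(node.children)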

Could you be more specific in your question?
tholden
Posts: 15
Joined: Thu Oct 23, 2014 1:46 pm

Re: A new algorithm accounting for the uncertainty in eval f

Post by tholden »

Why don't you think "the whole enterprise makes sense"? The expectation of the maximum of a collection of noisy random variables is very different from the maximum of their expectations. This is the crux of the problem with the standard approach. The result of an evaluation function deep in the tree is highly noisy, and if game play ever arrived at that point, we would have a much more precise estimate of its true expected value. Failing to take the uncertainty into account is just mathematically wrong. (Though it is clearly an approximation that has done OK, of course.)
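
A quick numerical illustration (my own toy example): take ten moves whose true values are all exactly zero, each seen through independent evaluation noise with a standard deviation of one pawn. The maximum of the expectations is zero, but the expected value of the noisy maximum is not:

Code: Select all

import random

random.seed(1)
moves, noise_sd, trials = 10, 1.0, 100000

# Average, over many trials, of the best-looking noisy score among the moves.
mean_of_max = sum(max(random.gauss(0.0, noise_sd) for _ in range(moves))
                  for _ in range(trials)) / trials
print(mean_of_max)   # about 1.54 pawns of pure optimism, not 0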

Your comment about MCTS seems to go some way towards what I'm doing. But what I'm proposing is certainly a long way from standard MCTS.
tholden
Posts: 15
Joined: Thu Oct 23, 2014 1:46 pm

Re: A new algorithm accounting for the uncertainty in eval f

Post by tholden »

Thank you for pointing me to those Baum and Smith papers by the way. They are certainly pursuing a similar idea. In particular the approximating assumptions (a) and (b) in the 1997 AI paper are precisely the ones I made.

Where we differ is on the following points:

* They approximate the distribution of a conventional evaluation function, which makes their life much harder. I, on the other hand, use the realisation that a "perfect" evaluation function would only ever return win, draw or lose, which lets me work with just three probabilities rather than entire distributions over the real line. This greatly simplifies the computations (see the sketch after this list).
* I also propose a heuristic for deciding which child to expand. Their heuristic is interesting, and it would be nice to see performance comparisons, but my hunch is that the extra complexity of their leaf-expansion decision will cost more than the efficiency it buys, particularly since it appears to make parallel versions of the algorithm much harder to implement.
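
As an illustration of why the three-outcome representation keeps the arithmetic cheap: under an independence assumption across the two moves (my own simplification for this illustration), the probability that one move's outcome is at least as good as another's is just a couple of multiplications, whereas with full distributions over the real line it would be an integral:

Code: Select all

def prob_weakly_dominates(a, b):
    # a and b are (P(win), P(draw), P(loss)) triples for two moves, assumed
    # independent; outcomes are ordered win > draw > loss.
    wa, da, la = a
    wb, db, lb = b
    return wa + da * (db + lb) + la * lb

print(prob_weakly_dominates((0.5, 0.3, 0.2), (0.3, 0.4, 0.3)))   # 0.77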