Different eval for white/black

Discussion of chess software programming and technical issues.

Moderator: Ras

syzygy
Posts: 5801
Joined: Tue Feb 28, 2012 11:56 pm

Re: Different eval for white/black

Post by syzygy »

hgm wrote:
syzygy wrote:Exactly. So your opponent might now have a line of play that you are blind to. So you're not going to do well against all strategies anymore, which means you're not going to do well against (objectively) optimal play.
This doesn't necessarily follow. And the whole point is that you will not need to do well against objectively optimal play, because you will be faced with the silly moves you can seduce your opponent to make, and want to exploit those.
The proposed algorithm not only assumes that you know how the opponent evaluates positions, but also that the opponent knows how you evaluate positions.

If you really want to seduce the opponent into falling into a known weakness it has, you would have to let it minimax based only on its own evaluation up to the depth you expect it to reach. If you do that for all positions in the tree, you know the opponent's strategy. You can then find the path that optimises your own evaluation. But this isn't going to work as it is too computationally expensive.
In my view, the "right" way to take into account expected opponent weaknesses and strengths is by making the evaluation asymmetric. If your opponent playing as black is bad with knights, then assign white a bonus if the position has many knights.
Most misconceptions of your opponent cannot be efficiently exploited this way. E.g. when he undervalues Archbishops and thinks they are an equal trade for Knight + Bishop, you have to offer him N + B + P for A to make him surely squander his Archbishop in the illusion it gains him a Pawn. I don't know how any asymmetric eval could encourage you to do that.
Give white a bonus for positions where black is likely to accept an objectively bad trade.
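
For instance, something like this illustrative sketch (the count() helper and the 15cp weight are entirely made up):

Code: Select all

# asymmetric evaluation term: our side is rewarded simply for keeping
# knights on the board, on the theory that this opponent misplays them
KNIGHT_PRESENCE_BONUS = 15  # centipawns per knight, a pure guess

def knight_presence_term(pos):
    # added to our score only, so eval(white) != -eval(black)
    return KNIGHT_PRESENCE_BONUS * (count(pos, WHITE_KNIGHT) +
                                    count(pos, BLACK_KNIGHT))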
hgm
Posts: 28419
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Different eval for white/black

Post by hgm »

syzygy wrote:The proposed algorithm not only assumes that you know how the opponent evaluates positions, but also that the opponent knows how you evaluate positions.
Indeed, this is true.
If you really want to seduce the opponent into falling into a known weakness it has, you would have to let it minimax based only on its own evaluation up to the depth you expect it to reach. If you do that for all positions in the tree, you know the opponent's strategy. You can then find the path that optimises your own evaluation. But this isn't going to work as it is too computationally expensive.
This is true. In nodes where he is to move you would have to decide his choice based on minimaxing his scores only. So in contrast to what I thought the proposed algorithm was, the value of a node where all move scores are pairs (mine, his) would have to be the (mine, his) pair of the move with the maximum 'mine' in nodes where I am to move, and the (mine, his) pair of the move with the maximum 'his' in nodes where he has the move.

That isn't really more expensive than minimax, except that you now of course have to evaluate every leaf twice. But minimax is too expensive compared to alpha-beta. I am not sure if you cannot rescue part of alpha-beta, however. There must be branches that are irrelevant for both members of the pair. In the degenerate case, where both evals are equal, you can obviously cut just as much as always. One presumes that the more similar the evaluations are, the closer the minimal tree will approach that of simple alpha-beta.
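
The propagation rule itself is compact. A minimal sketch in Python, with hypothetical evaluate / legal_moves / make helpers and no mate or stalemate handling:

Code: Select all

INF = float("inf")

def pair_minimax(pos, depth):
    # returns (mine, his): the node's score for the side to move and
    # for the opponent, each measured with that side's own evaluation
    if depth == 0:
        return evaluate(pos, side_to_move(pos)), evaluate(pos, opponent(pos))
    best = (-INF, INF)
    for m in legal_moves(pos):
        child_mine, child_his = pair_minimax(make(pos, m), depth - 1)
        # the child's 'mine' belongs to the opponent, its 'his' to us
        if child_his > best[0]:
            best = (child_his, child_mine)
    return best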
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Different eval for white/black

Post by matthewlai »

hgm wrote:
mcostalba wrote:..., they did exactly this: different king safety evaluation according to whether you were the side to move (at root) or not.
Unless I misunderstood the OP this is not what he proposed at all. In his idea both evaluations are always made in every leaf, irrespective of who has the move in the root, and both propagated towards the root based on the chosen move. It is the side to move at the current ply level that decides which of the two evaluations will be used to determine which move is best.

I wonder how badly this interferes with alpha-beta pruning, however. If side A takes a beta cutoff, his score would be a lower bound, because there are unsearched moves that might score higher. There is no guarantee at all that the opponent (B)'s score associated with that move is a lower bound. One of the unsearched moves might have a better A-score, but a lower B-score (both from A point of view), so that the true B-score of the node would actually be lower.
That's exactly what I meant.

Alpha-beta would probably require some tweaking, but even then it may become non-optimal. For example, instead of passing alpha as beta to the next ply, we can pass the opponent's score associated with the position that generated the alpha, or something to that effect. This is giving me a headache just thinking about it, though.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Opening preparation

Post by matthewlai »

sje wrote:
bob wrote:This should be doable. But it represents a LOT of work for a minimal gain. For example, when you attend the next WCCC with a "tuned program" all you can tune against is existing versions. But most participants will show up with code that has some new ideas included, which means you are now playing against a different opponent than the one you tuned against. Against static programs it would certainly work, but it does represent an enormous computational cost.
GM level human players do this by preparing opening repertoires targeted to opponents based on prior opponent play. This once provided a big tactical advantage in those years long past when most games went twenty to thirty ply deep with prepared opening moves on both sides. Today with huge opening databases which force diverse play, the idea doesn't work so well.

Having different positional evaluations or search customizations based on opponent modeling sounds like a big bug magnet; tough to implement and tougher to debug. Also, it would be very difficult to accurately measure results against human players -- how do you expect any human to play the thousand or more games needed for testing?
It would definitely be a big bug magnet, but then, so were most big ideas when first introduced.

There would need to be a faster tuning method than gameplay.

For example, it may be possible to take the player's past games and pick, say, 1000 positions where it is the player's move, along with the move she decided on.

Then we can filter out tactical positions (say, those where the best move scores more than 50cp higher than the second best).

Then we just have to tune an eval that, when combined with search, will generate those same moves as much as possible.

This way, no more than 100 or so past games are required, and most established players do have more than that many games publicly available.
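
The data-gathering step could look something like this sketch, using the python-chess library and an external UCI engine (the file name, depth and exact filter are placeholders):

Code: Select all

import chess
import chess.engine
import chess.pgn

TACTICAL_MARGIN = 50  # centipawns, as above

def quiet_training_pairs(pgn_path, player_name, engine_path="stockfish"):
    """Collect (FEN, move played) pairs from the player's quiet positions."""
    pairs = []
    engine = chess.engine.SimpleEngine.popen_uci(engine_path)
    with open(pgn_path) as f:
        while (game := chess.pgn.read_game(f)) is not None:
            board = game.board()
            for move in game.mainline_moves():
                header = "White" if board.turn == chess.WHITE else "Black"
                if game.headers.get(header) == player_name:
                    infos = engine.analyse(board, chess.engine.Limit(depth=10),
                                           multipv=2)
                    if len(infos) == 2:
                        best = infos[0]["score"].relative.score(mate_score=100000)
                        second = infos[1]["score"].relative.score(mate_score=100000)
                        if best - second <= TACTICAL_MARGIN:  # not tactical
                            pairs.append((board.fen(), move.uci()))
                board.push(move)
    engine.quit()
    return pairs

The tuning step itself (adjusting eval weights until search reproduces as many of the recorded moves as possible) is the hard part and is left out here.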
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Different eval for white/black

Post by matthewlai »

syzygy wrote: The proposed algorithm not only assumes that you know how the opponent evaluates positions, but also that the opponent knows how you evaluate positions.
That is true, but if you use the same eval, you are not only assuming the opponent knows how you evaluate positions, but also that she evaluates positions the same way as you.

I would say the assumptions in the proposed algorithm are more valid.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
syzygy
Posts: 5801
Joined: Tue Feb 28, 2012 11:56 pm

Re: Different eval for white/black

Post by syzygy »

matthewlai wrote:
syzygy wrote: The proposed algorithm not only assumes that you know how the opponent evaluates positions, but also that the opponent knows how you evaluate positions.
That is true, but if you use the same eval, you are not only assuming the opponent knows how you evaluate positions, but also that she evaluates positions the same way as you.

I would say the assumptions in the proposed algorithm are more valid.
No, regular minimax does not assume that your opponent evaluates positions the same way as you. That really is a misconception. Regular minimax simply optimises the evaluation against all possible counterplay.

Putting your money on the assumption that your opponent is inferior is extremely dangerous under most circumstances. Only when swindling is the only option left could it be of help.
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

An exceptional case

Post by sje »

An exceptional case where opponent modeling is simple and does work is when:

1. The opponent always selects a move wholly randomly,

And

2. The program's objective is to mate as quickly as possible

Then

3. The program's optimal strategy is to play as recklessly and aggressively as possible, while leaving the randomly moving opponent a reasonable number of replies so that the optimal response is not likely to be picked.
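
Under these two assumptions the search is plain expectimax on the mate distance. A rough sketch with hypothetical helpers:

Code: Select all

INF = float("inf")

def expected_moves_to_mate(pos, depth, our_move):
    # minimise the expected number of our moves until mate, assuming the
    # opponent replies uniformly at random; is_checkmate, is_draw,
    # legal_moves and make are hypothetical helpers
    if is_checkmate(pos):
        return 0.0
    if depth == 0 or is_draw(pos):
        return INF  # crude: treat unresolved lines as "no quick mate"
    replies = legal_moves(pos)
    if our_move:
        return 1.0 + min(expected_moves_to_mate(make(pos, m), depth - 1, False)
                         for m in replies)
    # random opponent: average over his replies instead of assuming the
    # best one, which is exactly why reckless play comes out on top here
    return sum(expected_moves_to_mate(make(pos, m), depth - 1, True)
               for m in replies) / len(replies)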

A variant of this approach is seen in simultaneous exhibitions, where weaker players are rapidly eliminated with aggressive but slightly unsound play, thus allowing more time to hammer on the stronger challengers.
syzygy
Posts: 5801
Joined: Tue Feb 28, 2012 11:56 pm

Re: Different eval for white/black

Post by syzygy »

syzygy wrote:No, regular minimax does not assume that your opponent evaluates positions the same way as you. That really is a misconception. Regular minimax simply optimises the evaluation against all possible counterplay.

Putting your money on the assumption that your opponent is inferior is extremely dangerous under most circumstances. Only when swindling is the only option left could it be of help.
In this paper the same (in my view misguided) view is taken:
While human players adjust their playing strategy according to their opponent, computer programs, which are based on the minimax algorithm, use the same playing strategy against a novice as against an expert. This is due to the assumption of minimax that the opponent uses the same strategy as the player.
Wrong. That computer programs don't adjust their play to the opponent is for the simple reason that they use the same search engine against all opponents... Same search, same evaluation. Nothing to do with any assumption that the opponent uses the "same strategy".

That minimax is symmetric merely reflects the fact that it deals with zero-sum games. Chess is a zero-sum game. Minimax is therefore the correct approach for chess. In my maybe not so humble opinion, the approach of the paper's authors is simply flawed.

The paper contains a rather painful statement:
Tic-tac-toe on a 3 x 3 board. There is a simple winning strategy for this game. (...)
(With best play tic-tac-toe is of course a draw; there is no winning strategy.)

The most interesting part is the adaptation of alpha-beta pruning. Regular alpha-beta relies on the evaluation for white being the opposite of the evaluation for black. This does not exclude asymmetric evaluations; it just means that if v is the position's value from white's point of view and w is the position's value from black's point of view, then v + w = 0. If v + w <> 0 in general, then some pruning can still be done provided that a bound on |v + w| is known.
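
To make that last point concrete, here is a speculative sketch (mine, not the paper's) of a pair search with such a shifted cutoff. 'bound' is the known bound B on |v + w|, the helpers are hypothetical, and the root is called with parent_best = -INF:

Code: Select all

INF = float("inf")

def pair_search(pos, depth, parent_best, bound):
    # returns (mine, his): scores for the side to move and the opponent
    if depth == 0:
        return evaluate(pos, side_to_move(pos)), evaluate(pos, opponent(pos))
    best = (-INF, INF)
    for m in legal_moves(pos):
        child_mine, child_his = pair_search(make(pos, m), depth - 1,
                                            best[0], bound)
        if child_his > best[0]:  # better for the side to move here
            best = (child_his, child_mine)
        # whatever we finally pick scores at least best[0] for us, hence
        # at most bound - best[0] for the parent; once that cannot beat
        # the parent's best alternative, the remaining moves are useless
        if bound - best[0] <= parent_best:
            break  # with bound = 0 this is an ordinary beta cutoff
    return best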
The most interesting part is the adaptation of alpha-beta pruning. Regular alpha-beta relies on the evaluation for white being the opposite of the evaluation for black. This does not exclude asymmetric evaluations; it just means that if v is the position's value from white's point of view and w is the position's value from black's point of view, then v + w = 0. If v +w <> 0 in general, then some pruning can still be done provided that a bound on |v + w| is known.