Chess with incomplete information

hgm · Post by **hgm** » Fri Dec 13, 2013 8:05 pm

Consider the following game (an abstraction of 'Chess 2'):

The board and pieces are those of orthodox Chess, but the players also each have a number of 'coins' in hand. For each piece you capture, you get one extra stone in hand.

The coins can affect the board in the following way: whenever a capture is made, the side that lost the piece can retaliate against the capturer by paying for its removal with coins. For this he will have to outbid the opponent, however, who can pay coins for the survival of the piece.

The bidding takes place 'double blind': both sides make their bid without the other seeing it, then the bids are revealed, and if the bids happen to be equal, or the piece's owner bids more, the piece stays. Otherwise it is removed. Both sides lose all the coins they have been bidding, also the side that lost. The players are aware of the number of coins they each have before the bidding starts.

That's all. What makes this essentially different from normal Chess is the fact that you don't take turns in bidding, but have to fix your own bid before you know what your opponent will be bidding. This makes that you cannot in advance know what the result of your 'move' will be; even if you bid so high that he cannot top you, you don't know with how much coins he will be left. So something else than minimax and alpha-beta will be needed to play this game!

My idea was to replace the minimax search by this:

Code: Select all

Search(int stm, int depth)
{
  xstm = Opponent(stm);
  bestScore = (depth > 0 ? -INF : Evaluate()); // stand pat
  for(ALL_MOVES) {
    if(!IsCapture && depth <= 0) continue; // no non-capts in QS
    myMaxBid = coins[stm];
    hisMaxBid = coins[xstm];
    if(!IsCapture()) myMaxBid = hisMaxBid = 0; // no bidding on non-capt.
    MakeMove(); // board only
    for(myBid = 0; myBid <= myMaxBid; myBid++) {
      for(hisBid = 0; hisBid <= hisMaxBid; hisBid++) {
        coins[stm] -= myBid; coins[xstm -= hisBid;
        outcomeScore[myBid][hisBid] = -Search(xstm); // recursion
        coins[stm] += myBid; coins[xstm += hisBid;
      }
    }
    UnMake();
    score = Optimize(outcomeSore, myMaxBid, hisMaxBid);
    if(score > bestScore) bestScore = score;
  }
  return bestScore;
}

The crux is in the routine Optimize(), which calculate the optimal probabilities for bidding each number of stones for both sides, depending on the search scores of all possible outcomes, as stored in the rectangular array outcomeScores[][]. With these probabilities it can calculate the expectation value of the game state that will materialize after the bidding, and will return that. This is then used as the score for the capture, in a normal minimax way.

Any comments or suggestions?

tpetzke · Post by **tpetzke** » Fri Dec 13, 2013 8:19 pm

At the end it will probably not be a lot different than normal minimax. The actual move list that you have in a given position is expanded. So a PxP capture is no longer a single move it is now a list of moves the have the start, target square and captured piece in common but use a different amount of coins and depending on the numbers the attacker loses material or not.

When you evaluate the leaf nodes you will also have to evaluate the number of coins that you and the opponent still have, like an additional material factor.

So actually your branching factor explodes but the core algorithm doesn't have to change.

Thomas...

Daniel Shawul · Post by **Daniel Shawul** » Fri Dec 13, 2013 8:35 pm

You didn't mention the number of coins each player start with. If it is not many, the brute force way can be the best. But I doubt the coins are few, so a Monte-carlo search is probably better as it is usually the case for other incomplete information games. Another lazy option is to always bid with a number of coins that give an average outcome in a short SEE. This reminds me of a Google competition I participated in. In that game when two players are on a stand off forming walls, and then forwarding a couple of ants to fight with a certain formation simultaneously, one of the players will loose its ants based on the resulting formation which tells which player dominated. The best program that won the competition used a 3-ply alpha-beta which is applied to multiple opponents. I think the situation here with players bidding simultaneously is the same.

Edit: I should also add that in that competition they were deliberately trying to avoid alpha-beta solutions by constraining the time allowed for making moves to 500ms. You just couldn't use brute force search with the thousand ants you and opponents have. One thing we can do here is also to compute a database of patterns and look up. Here for example it is possible to precompute bidding patters and store winning probabilities for 5 vs 5, 6 vs 4 etc. just like a database. Some people did that.

wgarvin · Post by **wgarvin** » Fri Dec 13, 2013 8:48 pm

hgm wrote:The coins can affect the board in the following way: whenever a capture is made, the side that lost the piece can retaliate against the capturer by paying for its removal with coins. For this he will have to outbid the opponent, however, who can pay coins for the survival of the piece.

The bidding takes place 'double blind': both sides make their bid without the other seeing it, then the bids are revealed, and if the bids happen to be equal, or the piece's owner bids more, the piece stays. Otherwise it is removed. Both sides lose all the coins they have been bidding, also the side that lost. The players are aware of the number of coins they each have before the bidding starts.

I love this bidding mechanic by the way (in general--maybe not for a chess variant). I played a board game recently, I think it was Game of Thrones, that had this mechanic. Players accumulated some kind of token during gameplay, and at certain times all players would have to "bid" a certain number of tokens, with the bids revealed simultaneously, to try and win a political role or something like that. Winning the bidding war would give them a short-to-medium-term advantage of some sort over other players but it would also use up tokens, which had other more immediate uses as well. So some strategizing about what to spend them on, when and how much to bid etc. definitely made a difference in that game.

Poker programs might be the example to follow, for the bidding mechanic. But I don't really understand how to combine their MC approach with tree search.

Daniel Shawul · Post by **Daniel Shawul** » Fri Dec 13, 2013 10:10 pm

It sound difficult to write a SEE here because the bidding is done at every stage. A player can capture a defended pawn with his queen QXP and then prevent its queen from being captured by bidding high. The coins and pieces are highly intertwined, so I think a SEE should have the number of coins of both sides. They can't be assigned a value though. Pre-calculating could be of great help here, once we know the types of pieces that will be captured in the exchange process without bidding stopping it. So an example PQP , with 5 coins v 4 coins. The exchanges are QxP,PXQ,BXP. The white player wins a pawn but only if it bid with all its coins, otherwise it is doomed. In any case, probabilistic evaluation of this line should not-encourage it, since there is a risk of loosing a queen with just 1 coin advantage.

Edit: 'Piece value' for coins seems to be very important and should be a part of the SEE outcome. We want to know how many coins we will be left with after the exchange, since a playe with no coins will probably loose the next N captures where N is the number of pieces the opponent have.

hgm · Post by **hgm** » Fri Dec 13, 2013 11:18 pm

I think in Chess 2 players start with 3 coins each. So say they also do that here.

hgm · Post by **hgm** » Sat Dec 14, 2013 10:12 am

wgarvin wrote:I love this bidding mechanic by the way (in general--maybe not for a chess variant). ...

This type of game is an instance of the "Prisoner's Dilemma". Because this might be something well outside the experience of Chess programmers, perhaps I should say some more about this.

In general, in such games, the opponent would always win when he could adapt his choice to yours, if he would heve known it. This rules out that there could be a single deterministic choice that is best. If there was, the opponent would know it too, and adapt his choice accordingly to exploit it. E.g. suppose both players could bid either 0 or 1, with the following pay-off matrix (POV of player A):

Code: Select all

   0  1 <--- Player A
0  2 -3
1 -1  2
|___ Player B

So A receives 2 from B on equal bids, but has to pay 3 to B if he bids 1 and B bids 0, etc. At first glance this seems fair, as 2+2 = 1+3. But that is only of relevance when A and B were random movers that played 0 and 1 with 50% probability. 0 cannot be the single best choice for A, as it is refuted by B playing 1. Likewise 1 cannot be best choice, being refuted by B playing 0. To win on average, A would have to keep B in the dark about his choice. Playing 0 and 1 randomly, 50/50, also is a failing strategy: B would refute it by always playing 0, so that in half the cases he pays 2, but in the other half he receives 3.

But by optimizing the probabilities with which A randomly plays 0 and 1, A can have the upper hand: suppose A plays 1 with probability p (and thus 0 with probability 1-p), and B plays 1 with probability q. Then the expected payoff is

2*(1-p)*(1-q) + 2*p*q - (1-p)*q - 3*p*(1-q)
= 2 - 2*p - 2*q + 2*p*q + 2*p*q - q + p*q - 3*p + 3*p*q
= 2 - 5*p - 3*q + 8*p*q

Now the best strategy for each player must have this payoff independent to small changes in the probability they control, for if it were not, they could change that probability in one direction or the other to improve their result. So the derivatives with respect to p and q must be zero:

-5 + 8*q = 0
-3 + 8*p = 0

or q = 5/8 and p = 3/8. If player A sets p = 3/8, the payoff becomes

2 - 5*(3/8) - 3*q + 8*(3/8)*q
= 2 - 15/8 - 3*q + 3*q
= 16/8 - 15/8 = 1/8

In other words, it is positive (in favor of A), and independent of q (so there is nothing B can do against it; however he sets his probabilities, he will consistently lose). Deeper analysis shows that this is because although 2+2 = 1+3 (irrelevant), 2*2 > 1*3 (relevant).

This example shows how this type of situation should be handled: depending on the payoff matrix, both players will choose to play their possible bids with fixed probabilities to optimize their result. In the Chess-2-like game I proposed here as a model, the payoff matrix itself would have to be obtained by recursively searching the outcome of every possible biddings. Once these are known, the Optimize() routine would do a calculation similar to the one I presented in this post.

(Note that it can happen that the condition of zero derivative cannot be met for a probability in the range [0,1]. In that case the optimum would be an edge extremum, obtained by setting the probability 0 or 1. E.g. when you have 4 coins, and you know the opponent only has 1, you should obviously never bid 3 or 4.)

The Optimize() routine finally tells you the expectation value of the scores in the payoff matrix. To minimax it against the alternative moves, you would have to compare such expectation values against absolutely deterministic scores, or in general against expectation values with another probability distribution. This also is non-trivial. E.g. when you are a minor ahead, you would prefer certain loss of a Pawn over a 50-50 probability of losing or gaining a Queen (which on average is neutral). While when you were a minor behind you would actually prefer that. In both cases the gamble to gain/lose a Queen gives you a 50% chance to win the game, but in the first case it is compared against a certain win, while in the second case against an even more certain loss. I think this problem would go away if you used winning probabilities as scores everywhere, because the expectation value of stochastically picked winning probabilities is still a pure winning probability, indistinguishable from any other winning probability of the same value. (If you ignore draws, that is, so that the probability implies the distribution of the two possible outcomes, win or lose.) Not sure how to factor in draws, however. But I guess that would be a general problem in Chess engines, even in orthodox Chess.

hgm · Post by **hgm** » Sat Dec 14, 2013 10:42 am

Daniel Shawul wrote:'Piece value' for coins seems to be very important and should be a part of the SEE outcome. We want to know how many coins we will be left with after the exchange, since a playe with no coins will probably loose the next N captures where N is the number of pieces the opponent have.

Indeed, I expect the coins would have appreciable value. Probably more than a Pawn. I never actually played this game, but I guess you would try to reserve them to render the opponent's most powerful pieces powerless. If you have the majority of coins, he could never capture anything with his Queen without being bought away. It seems a waste to spend that advantage on removing a minor; a Queen that cannot capture devaluates much more than a the value of minor.

hgm · Post by **hgm** » Tue Dec 24, 2013 2:42 pm

The really difficult problem here is to find an equivalent of alpha-beta pruning. The algorithm for optimizing the bidding probabilities, and calculating the expectation value of the capture from it, assumes exact scores are known for all of the outcomes. It would be very inconvenient if this would force you to search all of the possible outcomes with open window.

Of course when all outcomes are lower bounds, you can apply the algorithm to them as if they were exact scores, and the expectation value it will come up with will be the lower bound of the capture. And even when some outcomes have the opposite bound, this would not matter if you can afford to play them with zero probability. Just as in alpha-beta you don't have to choose the best move, but are happy with any move that gets above beta, you can also afford to play the bidding sub-optimally (with wrong probabilities). This can save you effort, e.g. if you could afford to sub-optimally set some probabilities to zero. Then you would not have to search the corresponding outcomes at all. Clevely exploiting this is the equivalent of clever move ordering for the move choice.

For instance, if you can bid 0, 1 or 2, and the opponent only 0 or 1. If bidding 1 with 100% probability (e.g. to ensure your capturing Bishop survives, because he cannot outbid you), only two of the six possible outcomes can materialize, and the worst of case those would obviously be if the opponent bid 0 (as he loses the bidding anyway, and wants to have as many coins left as possible). So if you only search the (1,0) outcome, a score above beta would be enough to cause the cutoff. In other cases you might not care if the capturing piece is bought away (e.g. when you play PxR, the gain of the rook might already guarantee a cutoff even if the P is bought away for merely 1 coin). So then you would bid 0 with 100% probability, and only have to search the (0,1) and (0,0) outcomes to see if it is better for the opponent to spend a coin for buying away the Pawn, or keep the coin and endure the Pawn. If both these outcomes would score above beta, there would be no need to search any of the outcomes for when you bid 1 or 2.

Counting on the (1,0) outcome alone to give a score above beta is not sufficient reason to search it with the window at beta. Because the search could fail low. And that does not imply the capture would fail low with optimal bidding. You would have to improve the bidding by involving other bids, to force the opponent away from the 0-bid that is obviously best against a 1-bid. (0,0) is obviously better (because you don't spend the coin, and the piece still survives), and by bidding it with finite probability you increase the expectation against a 0-bid (which, again, forces the opponent to also use a 1-bid). Suppose a coin is worth 100, it could very well be that you finally get the cutoff with this capture by having the outcome of (1,0) score >= beta-50, and that of (0,0) score >= beta+50, and play them with 50% probability each. But how would you know in advance that you would have to search the (1,0) outcome with a window at beta-50? A higher window could have made the search fail low, and would not reveal the required lower bound.

So it seems essential to have some smart logic to analyze the situation (how much you are above beta, and how much you stand to lose if you lose the bidding), and decide based on that what likely is the cheapest way to achieve a cutoff, in terms of biddings to search, and windows to use in this search.

Daniel Shawul · Post by **Daniel Shawul** » Tue Dec 24, 2013 6:05 pm

I don't think it is a case of the 'prisoner's dilemma' because the current game is a zero-sum game. Players can not get a better overall payoff by deviating from the nash equilibrium, since their gains add up to zero.

Nonethless the payoff matrix can be used to make a rational decision after calculating payoffs for all moves that the opponent can make. For simplicity consider only a one-capture tree where all the sub-trees onwards do not have a capture meaning perfect information to determine a heuristic eval. Further captures in the sub-trees only add to the uncertainity in the heuristic eval which is already uncertain anyway. Then you will have one payoff matrix for all the possible bidding that has the heuristic evals in each square. Then assuming a 'random bidder' for the opponent model, we can just pick the move that leads to the highest average eval. Depending on the value of the piece at stake and the number of coins I and my opponent have, a better bidding strategy can be devised.

For the MCTS approach, which is the more practical approach and easier to implement, it seems that a nash equilbrium may exist according to this paper.

Chess with incomplete information

Chess with incomplete information

Re: Chess with incomplete information

Re: Chess with incomplete information

Re: Chess with incomplete information

Re: Chess with incomplete information

Re: Chess with incomplete information

Re: Chess with incomplete information

Re: Chess with incomplete information

Re: Chess with incomplete information

Re: Chess with incomplete information