Chess with incomplete information

Daniel Shawul · Post by **Daniel Shawul** » Tue Dec 24, 2013 6:59 pm

The really difficult problem here is to find an equivalent of alpha-beta pruning. The algorithm for optimizing the bidding probabilities, and calculating the expectation value of the capture from it, assumes exact scores are known for all of the outcomes. It would be very inconvenient if this would force you to search all of the possible outcomes with open window.

You can use alpha-beta and its enhancements in the non-capture part of the search but not at the ply we are bidding for piece removal. There a player tries to maximize its average score, so you can think of it as an MAX-AVG algorithm, elsewhere it is MAX-MIN. The MIN player is absent with the previous because the opponent doesn't know of what move we (MAX) made to be able to minimize it. So if in the payoff matrix MAX is the row and MIN col, MAX will choose the row that maximizes its payoff for all the possible biddings that MIN can make. We can't use alpha-beta here because we need real scores for all the bids that MIN makes. The key here is to model opponent's bidding strategy, so that we can use a better weighted average than the 'random bidder'.

AlvaroBegue · Post by **AlvaroBegue** » Tue Dec 24, 2013 7:27 pm

Three thoughts:

1) I believe a single Nash equilibrium is guaranteed to exist because it's a two-player zero-sum game. If you have already computed the payout matrix, finding the Nash equilibrium is a linear programing problem.

2) Perhaps it would be easier to come up with an alpha-beta algorithm that uses only zero-window searches. If you use zero-window searches to prove bounds for the entries of the payout matrix, sometimes you get other bounds for free (e.g., when the result of a search is above beta after you bid some amount i and the opponent bids some amount j and you win the auction, the same bound is true in situations where the opponent bids more than j but not enough to win the auction). I don't have a good suggestion for what searches to perform, but I am thinking of something that iteratively narrows things down, like MTD(f). Perhaps in the process of trying to compute bounds for the score based on the known bounds for the entries in the payoff matrix can tell you something about what entry you should really know more about, but I need to think about it some more.

3) Using MCTS indeed sounds more straight-forward to implement. I haven't read the article that Daniel Shawul posted yet, but my intuition is that any randomized multi-armed bandit algorithm that tends to sample the moves with the highest success ratio most of the time will naturally converge to the Nash equilibrium. I don't think anyone has managed to make a reasonable chess engine using MCTS, but this game is different enough that it is worth a try.

Daniel Shawul · Post by **Daniel Shawul** » Tue Dec 24, 2013 10:33 pm

I dug more into this and here are some observations

a) Michael Buro's solution of the game known as 'oshi-zumo' game, which is exactly the same as the Chess-2 game except that this one is even simpler with only coin bidding to drive a sumo-wrestler off the edge. There is no other moves (non-captures) to cloud the crux of the matter, only bidding with many coins at the start. He used linear programing to solve the game for specific configuration. Probably not practical to use LP, but very interesting nonetheless. You may not be able to access the 'solving the oshi-zumo game' paper directly.

b) MCTS is probably an overkill for this particular game where the number of coins <= 3 and also captures are very rare compared to non-caps. A progressive widening approach where many bidding combinations are added with iterative depth is probably a more viable approach, because most of the tree does not have simultaneous moves. Alpha-beta algorithms exist for simultaneous action games (real time combat) that work pretty well in practice. First we assume one player will reveal its move to the other, which is obviously wrong, but the hope is that the alpha-beta search will be deep enough to compensate for it. To even-out the advantage of the first player revealing its move to the other all the time, we can randomize color selection. The Randomized alpha-beta search (RAB) does that . So a simple approach is to randomize the color at even depths, and switch it at odd depths. Source is here

Daniel Shawul · Post by **Daniel Shawul** » Wed Dec 25, 2013 5:56 am

AlvaroBegue wrote:Three thoughts:

1) I believe a single Nash equilibrium is guaranteed to exist because it's a two-player zero-sum game. If you have already computed the payout matrix, finding the Nash equilibrium is a linear programing problem.

Yes. The maximum number of pieces is luckily 3 here, unlike the oshi-zumo game that starts with 50. So the matrix size is of maximum 4x4 when both has 3 coins each, then equations can be pre-computed for 4x4, 4x3 etc to find the mixed strategy nash equilibrium quickly for a given payoffs.

2) Perhaps it would be easier to come up with an alpha-beta algorithm that uses only zero-window searches. If you use zero-window searches to prove bounds for the entries of the payout matrix, sometimes you get other bounds for free (e.g., when the result of a search is above beta after you bid some amount i and the opponent bids some amount j and you win the auction, the same bound is true in situations where the opponent bids more than j but not enough to win the auction). I don't have a good suggestion for what searches to perform, but I am thinking of something that iteratively narrows things down, like MTD(f). Perhaps in the process of trying to compute bounds for the score based on the known bounds for the entries in the payoff matrix can tell you something about what entry you should really know more about, but I need to think about it some more.

Perhaps the following will work. Assume we both have 3 coins and i decide to bid with 3 coins, and opponent bids with 0,1 or 2. The outcome is that he looses the piece for all cases and also looses different amount of coins. The coin is included in the heuristic eval so it is pretty much like any other piece. So if I calculate the payoff for opponent bidding with 0 first, I can use it as an upper bound for 1 and 2, because in all cases it looses the piece and loosing more coins is never going to help him. I don't know if we can use the alpha-beta bounds passed to us when we are about to bid for capturing a piece, but atleast this very restricted form beta-pruning can be used there after which we can re-establish zero-window searches with the non-capture moves. But it needs some thought.

3) Using MCTS indeed sounds more straight-forward to implement. I haven't read the article that Daniel Shawul posted yet, but my intuition is that any randomized multi-armed bandit algorithm that tends to sample the moves with the highest success ratio most of the time will naturally converge to the Nash equilibrium. I don't think anyone has managed to make a reasonable chess engine using MCTS, but this game is different enough that it is worth a try.

One thing to bear in mind is that the current game is played with the mixed strategy nash equilibrium unlike other 2-player games we are used to such as Go, Chess etc where there exists a pure nash equilibrium. I know that MCTS for GO is proven to converge to alpha-beta, but it maybe a little different for simultaneous action games.

The simple pure strategy solution I proposed , MAX-AVG, is not a mixed strategy nash equlibrium but can avoid costly operations when the number of coins is large. If two pure strategies result in the same payoff (i.e. two rows with the same average payoff), we can convert it to a sort of mixed strategy by randomly choosing one of the strategies.

In any case, I have changed my mind about MCTS for Chess-2 even if the coins were more than 3, mainly because captures are rare except in qsearch. The randomized alpha-beta search with switching of sides seems to be easier to implement and more efficient .
_________________

Daniel Shawul · Post by **Daniel Shawul** » Wed Dec 25, 2013 2:04 pm

Perhaps the following will work. Assume we both have 3 coins and i decide to bid with 3 coins, and opponent bids with 0,1 or 2. The outcome is that he looses the piece for all cases and also looses different amount of coins. The coin is included in the heuristic eval so it is pretty much like any other piece. So if I calculate the payoff for opponent bidding with 0 first, I can use it as an upper bound for 1 and 2, because in all cases it looses the piece and loosing more coins is never going to help him. I don't know if we can use the alpha-beta bounds passed to us when we are about to bid for capturing a piece, but atleast this very restricted form beta-pruning can be used there after which we can re-establish zero-window searches with the non-capture moves. But it needs some thought.

Similarly we can adjust the alpha-bound when our opponent is bidding with larger number of coins than us. This led me to question if calculating payoff values is even necessary for clearly inferior cases. It looks like for any number of coins we bid with, there are exactly 3 critical cases that we need to consider, namely, us outbidding opponent by +1, equaling it 0, or not loosing any coins (bid by 0). So for our 4x4 matrix with 3 coins each

Code: Select all

  0 1 2 3 
0|X|X|-|-| 
1|X|X|X|-| 
2|X|-|X|X| 
2|X|-|-|X|

So we saved 6 payoff computations because they are inferior to other moves. Maybe we can set the dashes to -inf? It may even be possible to calculate the nash equilibrium directly for a sparse matrix.
Edit1:
If it is absolutely necessary we calculate some value for the dashes, we can set it to the critical value minus value of coins lost. So for the first raw, the two dashes where opponent bids by 2 and 3 get payoff(0,1) - coin_val and payoff(0,1) - 2 * coin_val. This will not be exact because loosing just one more coin may lead to a loss of piece that is otherwise avoidable, but in light of the possible savings in computation it maybe worth a try.

Edit2: If alpha-beta is to be used for calculating payoffs for the dashes, it seems a tighter upper/lower bound can be used that lowers/raises by the value of coins lost. For example, once we calculate payoff(3,0), we can set beta of payoff(3,1) calculation to payoff(3,0) - coin_val instead of payoff(3,0) which was what I suggested before.

Daniel Shawul · Post by **Daniel Shawul** » Wed Dec 25, 2013 3:12 pm

This is getting ridiculous. It seems it is possible to further reduce the number of critical payoff computations because a player doesn't have to outbid the opponent to retain its piece. We can consider two different cases:

a) Black to move (i.e. to capture) and white bidding to save its threatened piece:

Code: Select all

  0 1 2 3 
0|X|X|-|-| 
1|X|-|X|-| 
2|X|-|-|X| 
3|X|-|-|-|

White doesn't have to outbid but just equal opponent's bid.

b) White to move (i.e. to capture) and black bidding to save it's threatened piece

Code: Select all

  0 1 2 3 
0|X|-|-|-| 
1|X|X|-|-| 
2|X|-|X|-| 
3|X|-|-|X|

White must outbid black to capture his piece.

So in both cases, the number of critical payoff computations is decreased by 3 to bring down to 7 from the original 16.

rjgibert · Post by **rjgibert** » Thu Dec 26, 2013 3:44 am

One thing I noticed about searching in this game: Any fail low you get for a position with an n:m coin ratio will also always produce a fail low for coin ratios of n-i:m or n:m+j or n-i:m+j for positive i and j. The "inverse" of this observation is also true of fail highs.

This leads to...

There might be an improved way of implementing "null move" pruning. Instead of giving up turn to move, you can give up a coin or add a coin to the other side instead. Might work or might not work. If it works, then an advantage is, it avoids zugzwang problems of null move.

Daniel Shawul · Post by **Daniel Shawul** » Thu Dec 26, 2013 4:01 am

rjgibert wrote:One thing I noticed about searching in this game: Any fail low you get for a position with an n:m coin ratio will also always produce a fail low for coin ratios of n-i:m or n:m+j or n-i:m+j for positive i and j. The "inverse" of this observation is also true of fail highs.

This leads to...

There might be an improved way of implementing "null move" pruning. Instead of giving up turn to move, you can give up a coin or add a coin to the other side instead. Might work or might not work. If it works, then an advantage is, it avoids zugzwang problems of null move.

Well that is obvious because coin values are added to the heuristic eval, so standard null move without modification taks care of it. In all your cases given payoff with for coins (n,m), the rest of the cases (n-i,m), (n,m+j), (n-i,m+j) give the first player lower number of coins for positive i,j so the (n,m) payoff is an upper bound to all.

Daniel Shawul · Post by **Daniel Shawul** » Thu Dec 26, 2013 4:07 am

The bidding takes place 'double blind': both sides make their bid without the other seeing it, then the bids are revealed, and if the bids happen to be equal, or the piece's owner bids more, the piece stays. Otherwise it is removed. Both sides lose all the coins they have been bidding, also the side that lost. The players are aware of the number of coins they each have before the bidding starts.

There is an ambiguous case of both bidding with (0,0), in which the current description suggests that the piece stays since the bids are equal. But this means once the players run out of coins, no piece can be captured which is ridiculous. Hence we have to define it so that (0,0) is considered as 'no bid took place' to allow the piece to be captured. Oshu-zumo actually terminates the game in this case.

What led me to this is my further attempt to further reduce the number of critical payoff computations. If the two players bid with equal amounts (1,1) or (2,2) or (3,3), are the results the same? Since (3,3) allows for longer bidding sequence, and assuming there are enough captures exist, then the answer is no. But if (0,0) was considered a valid bid, they should be equal in which case I would have been able to save 2 more payoff computations...

Well this suggests (0,0) has to be considered as a 'no-bid' for a sensible game.

Daniel Shawul · Post by **Daniel Shawul** » Thu Dec 26, 2013 4:59 am

With this modified rule, i.e. Bid(0,0) considered as a no-bid that allows the piece to be captured, we can save one payoff calculation, hooray! But unfortunately, we will give it back in the second case when black is bidding to save his piece.

Black to capture:

Code: Select all

  0 1 2 3 
0|X|-|-|-| 
1|X|-|X|-| 
2|X|-|-|X| 
3|X|-|-|-|

Bid(0,1) now is non-critical because (0,0) already allows for the capture to go on.

White to capture:

Code: Select all

 0 1 2 3 
0|X|X|-|-| 
1|X|X|-|-| 
2|X|-|X|-| 
3|X|-|-|X|

In any case for N number of coins each, we have reduced cost of payoff computation from O(N^2) to O(N), which makes the game feasible for any number of coins. We can calculate approximate nash equilibrium, by calculating payoffs of the non-critical cells (dashes) from the critical ones by reducing/adding coin values. I wonder if we would be able to get probability ranges if we use bounded payoffs instead of exact ones e.g. Bid(0,2) has a payoff of atmost 10 instead of exactly 10...

Chess with incomplete information

Re: Chess with incomplete information

Re: Chess with incomplete information

Re: Chess with incomplete information

Re: Chess with incomplete information

Re: Chess with incomplete information

Re: Chess with incomplete information

Re: Chess with incomplete information

Re: Chess with incomplete information

Re: Chess with incomplete information

Re: Chess with incomplete information