Why minimax is fundamentally flawed

hgm · Post by **hgm** » Sun Nov 09, 2014 8:09 pm

A known flaw of minimax is that it is in no hurry to achieve anything, with as a result that it might not achieve anything at all, but keeps postponing the gain it sees indefinitely. This is of course well known in the case of checkmates, and many people for this reason do not evaluate mate with the game-theoretical maximum score, but make the score depend on the distance from the root. By this kludge minimax will go for the fastest mate.

The problem is that most moves in a Chess game must be made without a forced mate being within the horizon, and the mate-score kludge does of course nothing for gains other than checkmates. So it can still happen that some necessary intermediate objective on the path to a win, such as a promotion, is postponed forever.

I have hardly ever seen a clearer example of this than in the following game from the CSVN tourney yesterday:

[pgn][Event "ICS Rated standard match"]
[Site "winboard.nl"]
[Date "2014.11.09"]
[Round "-"]
[White "Tornado"]
[Black "Rookie"]
[Result "1/2-1/2"]
[WhiteElo "1897"]
[BlackElo "1787"]

1. e4 e5 2. Nf3 Nc6 3. Bc4 Nf6 4. d3 Bc5 5. c3 a5 6. d4 exd4 7. cxd4 Bb4+
8. Kf1 d5 9. exd5 Nxd5 10. a3 Be7 11. Nc3 Be6 12. Bb3 Qd7 13. Kg1 O-O-O 14.
Ba4 Nb6 15. Bb5 Bg4 16. h3 Bxf3 17. Qxf3 Qe6 18. Bxc6 Qxc6 19. Qg4+ Kb8 20.
Be3 Nd5 21. Bd2 Rhe8 22. Rc1 Nxc3 23. Rxc3 Qb6 24. Bf4 Bd6 25. Bxd6 Qxd6
26. Re3 g6 27. Qe2 Rxe3 28. fxe3 c5 29. dxc5 Qxc5 30. Kh2 Qe5+ 31. g3 h5
32. Kg2 Qe4+ 33. Kh2 h4 34. gxh4 Rd3 35. Re1 Rb3 36. Qf2 Qe5+ 37. Kg2 Rxb2
38. Re2 Qe4+ 39. Kg3 Rb5 40. Kh2 Qd3 41. Qf4+ Ka7 42. Qd4+ Qxd4 43. exd4
Rb3 44. Re5 Rxa3 45. h5 gxh5 46. Rxh5 Ka6 47. Rh6+ b6 48. h4 Rd3 49. Rd6 a4
50. h5 a3 51. Rd8 Rd2+ 52. Kg3 a2 53. h6 Rd3+ 54. Kg2 Rxd4 55. Ra8+ Kb7 56.
Rxa2 Rh4 57. Rf2 Rxh6 58. Rxf7+ Ka6 59. Rf4 b5 60. Kf3 Re6 61. Rg4 Ka5 62.
Rg1 b4 63. Rb1 Ka4 64. Ra1+ Kb5 65. Rb1 Kc4 66. Rc1+ Kd3 67. Kg3 b3 68. Rc7
Rb6 69. Rd7+ Kc4 70. Rd1 b2 71. Rb1 Kb3 72. Kf4 Kc2 73. Rxb2+ Rxb2 74. Ke4
Rb4+ 75. Ke5 Kd2 76. Kd5 Kd3 77. Ke5 Rb5+ 78. Kf6 Kd4 79. Ke6 Kc4 80. Kd6
Rg5 81. Ke6 Kd3 82. Kf6 Rg8 83. Kf7 Rg5 84. Kf6 Rb5 85. Ke6 Kc3 86. Kd6 Rg5
87. Ke6 Rg7 88. Kf6 Ra7 89. Ke5 Rc7 90. Kd6 Rb7 91. Ke5 Rb5+ 92. Kd6 Rh5
93. Ke6 Kd2 94. Kd6 Rh7 95. Ke5 Rh2 96. Kd4 Rh4+ 97. Ke5 Rh6 98. Kd4 Rh7
99. Ke5 Rh6 100. Kd4 Rc6 101. Ke5 Ke3 102. Kd5 Rh6 103. Ke5 Rc6 104. Kd5
Rb6 105. Kc5 Rh6 106. Kd5 Rf6 107. Ke5 Rg6 108. Kd5 Ra6 109. Ke5 Kd2 110.
Ke4 Rg6 111. Kd4 Rh6 112. Ke4 Rh8 113. Kd4 Rh6 114. Ke4 Rg6 115. Kd4 Rg4+
116. Ke5 Rg8 117. Kd4 Rg7 118. Ke5 Ke3 119. Kf6 Rg4 120. Ke6 Ke4 121. Kd6
Rg6+ 122. Kc5 Rh6 123. Kc4 Rc6+
{Draw by the 50 move rule} 1/2-1/2
[/pgn]

At low search depth Rookie has no problem checkmating in KRK at all. But at this long TC, it cannot find the win, and falls victim to the 50-move rule! As Marcel explained afterwards: Rookie was thinking ~20 ply here, which is not enough to see a forced mate. So its search just goes for the position with the best heuristic evaluation. Which will be something with the strong King in the center, and the bare King pressed against the edge. But such a position can be forced with far fewer moves than the available 20 ply. So there are lots of branches that start by wasting 6 or 8 ply, and then decisively force the 'optimal' position in the remaining moves. And minimax cannot see any difference between the lines that immediately 'talk shop' and those that initially merely waste time. And as the latter are of course much more abundant, the chances that it picks a move that makes progress are very slim indeed. Of course when it would have accidentally reached the 'optimal' position, it would have seen that it was not really optimal at all, but a better position would come in the horizon, (namely the checkmate).

Now there are of course many ways in use to patch this problem. E.g.

* One could use 3-men tablebases. But this masks the problem, rather than solving it. It would work in this specific KRK case, but the same problem can occur in much more complex end-games, for which tablebases are not feasible.

* One can attempt to abuse the 50-move rule to blackmail a program into making progress where it would just fool around on its own, thinking it could always make the progress later. In KRK this will of course not help, because there is no other way to reset the move counter than a forced Rook sac, which makes it draw anyway.

* One could prevent local maxima in the heuristic eval, and require the eval increases monotonically towards the mate position. Then there is an incentive to progress as much as possible, which excludes pointless fooling around. The snag is of course that this basically requires the evaluation to be perfect, and with perfect evaluation search would be pointless to begin with. You would just make the move with the highest evaluation gain, and this would always bring you to the mate, as there would always be a move that increases the evaluation. Search is only needed to bridge dips in the evaluation, to be able to reach a better local optimum from the current one.

So there still is need for something similar to the mate-score kludge for other gains than checkmates. Note that this cannot be the same trick of applying a discount to the heuristic eval based on the distance to the root, in the same way as you discount mate scores based on that distance. Because in a fixed-depth search, for instance, all evaluations would have the same distance to the root, so this would still not distinguish one branch from another. You really want to select the branch where the gain that can be foreseen is reached as early as possible. So you want to discount it with the depth at which the gain actually materialized. For checkmates this is of course the end of the branch, as one never searches beyond checkmate.

So if at the end of the branch there is a certain eval score higher than the eval in the root (meaning that some way to make progress can be seen within the horizon), you would have to discount the score of that branch by the distance (mesasured from the root) of the point where the evaluation reached the value it has in the leaves. Beyond that point it has apparently just been wasting time, not upping the evaluation any further. This is exactly what a 'delayed-loss bonus' does: it adds one eval point to the score propagated towards the root when this score is below the current eval. Thus it counts the number of full-moves between the root and the point where the evaluation reaches (or exceeds) the value it has in the leave. In absence of anything better to do (as visible within the horizon), the program would then take the move that brings it to that eval in the quickest way. Meaning that it will quickly reach it, rather than fooling around thinking it can always go there later. And once it is there, it can see further, and hopefully does see an even more profitable goal, which it otherwise would never get to see. (Like the KRK mate in the example.)

elcabesa · Post by **elcabesa** » Sun Nov 09, 2014 9:27 pm

you can also tell the engine that KRvsk is won and giving higher score to losing king near the edge and positions whare kings are close toghether

mvk · Post by **mvk** » Sun Nov 09, 2014 9:31 pm

elcabesa wrote:you can also tell the engine that KRvsk is won and giving higher score to losing king near the edge and positions whare kings are close toghether

That is what the program does of course. Evaluation is +13.611 here. The problem is a combination of things, as HGM mentions. Take one factor out and there is no problem in this particular end game. The specific case (KRK in Rookie v3) is easy to solve (or mask). The observation would still be valid.

syzygy · Post by **syzygy** » Sun Nov 09, 2014 10:09 pm

I have never seen my engine find a way to make progress at iteration D and then decide at iteration D+d to first wander a bit around.

I have the feeling that Rookie may have a subtle bug that prevents it from finding mate in KRK and who knows whether this bug is also responsible for its delaying tactics.

hgm · Post by **hgm** » Sun Nov 09, 2014 10:30 pm

It is true that iterative deepening to a certain extent protects you against stalling, because the quickest route will be found first, when the depth is still so low that it is only the fasted path that can reach the 'optimum'. But it is not really a reliable method.

One reason to change after it found a way to force the local optimum just at the horizon (the first time it can see it) is that by searching two more ply it will realize it is in zugzwang to step off the optimum again. It won't like this, and will try to find a way to reach the optimum at the new horizon, two ply later. This might very well involve inserting a stalling move as the first ply. Two iterations deeper the same problem occurs, which might be solvable by deleting the stalling move again, but just as likely it could inject another stalling move, at the beginning or somewhere along the branch, leaving the stalling initial move in place.

Uri Blass · Post by **Uri Blass** » Sun Nov 09, 2014 10:33 pm

syzygy wrote:I have never seen my engine find a way to make progress at iteration D and then decide at iteration D+d to first wander a bit around.

I have the feeling that Rookie may have a subtle bug that prevents it from finding mate in KRK and who knows whether this bug is also responsible for its delaying tactics.

I totally agree.

Even if 2 moves have the same score there is no reason for the program to change its mind after finding good move at depth D

I could understand first wandering a bit around if it produce a better score but not if it leads to the same score.

Uri Blass · Post by **Uri Blass** » Sun Nov 09, 2014 10:42 pm

hgm wrote:It is true that iterative deepening to a certain extent protects you against stalling, because the quickest route will be found first, when the depth is still so low that it is only the fasted path that can reach the 'optimum'. But it is not really a reliable method.

One reason to change after it found a way to force the local optimum just at the horizon (the first time it can see it) is that by searching two more ply it will realize it is in zugzwang to step off the optimum again. It won't like this, and will try to find a way to reach the optimum at the new horizon, two ply later. This might very well involve inserting a stalling move as the first ply. Two iterations deeper the same problem occurs, which might be solvable by deleting the stalling move again, but just as likely it could inject another stalling move, at the beginning or somewhere along the branch, leaving the stalling initial move in place.

I could understand if you have only one line that lead to a good score when the score goes down inside that line so you prefer to delay it but it is clearly not the case in KRK

If you do not see the mate then stupid move now should lead to the same score as stupid move in the next move so there is no reason to prefer stupid move now(when you did not choose it first).

I also expect strong programs to see the shortest mate in KRK at long time control by search(stockfish has no problem to see the shortest mate).

mvk · Post by **mvk** » Sun Nov 09, 2014 10:57 pm

syzygy wrote:I have never seen my engine find a way to make progress at iteration D and then decide at iteration D+d to first wander a bit around.

I have the feeling that Rookie may have a subtle bug that prevents it from finding mate in KRK and who knows whether this bug is also responsible for its delaying tactics.

Yes, there is one open issue with mate which I haven't fixed yet and which affects grafting. Fixing that could possibly solve KRK again at long time controls, because then it can "graft" its way over the evaluation's maximum value "plateau". (A plateau which is there in the first place because the mopup heuristic is too simple also: Just an enemy king position table, and a term for the king-to-king distance. Nothing more. The rook position is not scored). So taking one of these two issues away should already solve it.

A mildly interesting thing is that at very short time controls the program usually secures the win. At long time controls it usually doesn't find the win, unless the mate is within the horizon. What is really suspect from my point of view, and might indicate an unknown bug, is that if it can find progress at fast time controls, why would it switch to delaying moves at deeper searches? Iterative deepening normally doesn't behave like that: If there are multiple equivalent paths leading to the same maximum heuristic evaluation, it should stick to the first one it finds, usually at a shallow iteration. The shallow iterations normally make sense (just move the king closer, and drive the opponent king to the edge). [Edit: I missed that HG already gave a plausible explanation for that in KRK above]

Using DTZ also solves it of course. That is my real plan for this. But not for next week.

syzygy · Post by **syzygy** » Mon Nov 10, 2014 7:56 am

hgm wrote:It is true that iterative deepening to a certain extent protects you against stalling, because the quickest route will be found first, when the depth is still so low that it is only the fasted path that can reach the 'optimum'. But it is not really a reliable method.

Theoretically there may be cases where it goes wrong. In practice this seems to occur so rarely that I am confident that any "solution" (replace the minimax principle by somthing else?) would negatively affect playing strength.

Of course for Rookie solving this problem would increase playing strength against programs that do not resign.

Maarten Claessens · Post by **Maarten Claessens** » Mon Nov 10, 2014 9:44 am

I did a little experiment starting with this position:

[d]8/8/8/8/8/2k5/1R6/K7 w - - 0 1

This is a mate in 16. Against Critter 1.6 with gaviota-bases it takes WaDuuttie 24 moves to mate when the search is limited to 7 ply. Any search-depth above 7 results in the mate. With a 6 ply search however progress if found too late and the 50-move rule is hit.

Why minimax is fundamentally flawed

Why minimax is fundamentally flawed

Re: Why minimax is fundamentally flawed

Re: Why minimax is fundamentally flawed

Re: Why minimax is fundamentally flawed

Re: Why minimax is fundamentally flawed

Re: Why minimax is fundamentally flawed

Re: Why minimax is fundamentally flawed

Re: Why minimax is fundamentally flawed

Re: Why minimax is fundamentally flawed

Re: Why minimax is fundamentally flawed