Ways to avoid "Draw Death" in Computer Chess

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Sun Jul 30, 2017 6:03 pm

Uri Blass wrote:
Lyudmil Tsvetkov wrote:
hgm wrote:
Lyudmil Tsvetkov wrote:I understand the worst-mover is the engine that always picks the move with worst eval(-1200cps or so).
This will not work very well. Scores are bad when the opponent has a single line of play that makes them bad. But against an opponent that cannot recognize this line, the chances that they will end up as bad as you thought they were are very slim.

To quickly beat a random mover you should not minimize the score of his best move, but the average score of all his moves. Minimax would be no good.

And when the opponent is not a random mover, but actively tries to lose as well, playing the move that would give you the lowest score against an opponent that is trying to win would probably be fatal. That opponent will never do the moves you counted on when obtaining the score on the basis of which you selected your move. The worst-mover would only worry about the opponent's best moves. Which will be avoided like the plague. So that basically the worst-mover is totally blind for everything that can happen.

I probably would not have a problem at all to make a worst-mover score 100% against me. Uri already pointed out the strategy.
chances that a random-mover will pick some wins from better positions are of course much greater than chances it will pick wins from worse positions.

so, naturally, a worst-mover is simply the engine that picks all the worst moves, while the best-mover the perfect engine, picking all the best moves.

I don't see how any different strategy from choosing the lowest eval score could lead to better results.

of course, SF with such conditions is the best worst-mover, as it will reach big depths, modifying a weaker engine like that would not be very useful.
If I want to lose against random mover then the best strategy is to force the random mover to win.

If I have a big material advantage it is easy to force the opponent to win.

If you change the sign of SF evaluation to make the best move the worst move then SF does not have a good evaluation for that type of game because basically winning material if you do not win too much is good to force the opponent to mate later and SF is not going to try to win material in order to get a position that it can force the opponent to mate.

Evaluation that say that winning material is good except the last 2 connected pawns is probably good when you should also evaluate the distance to the following structure of KPP vs K that allow forcing mate and if you get closer to the structure it is also good.

[D]8/8/8/8/QR6/2kp4/2p5/2K5 b - - 2 1

this is not a stalemate proper, the 'stalemated' side still has a single move left.

rather, it looks to me as an anti-zugzwang(gosh, with what kind of terms one has to come up with here...)

in any case, such artificial positions are achievable by the stronger side in maybe one in every couple of trillion positions, so basically lack any real effect on the game.

picking the worst eval move, on the other hand, should clearly lead to positions that are worst and thus give much higher chances to the very weak engine to still blind-find an appropriate winning move.

hgm · Post by **hgm** » Sun Jul 30, 2017 6:41 pm

It is not supposed to be a stalemate. It is an example of how you force a worst-mover to checkmate you, after he stupidly allowed you to strip him of all material except a few Pawns (which you intentionally left him).

And every game of a program that is really clever in trying to lose against your 'Stockfish worst mover' would end similar to that. Not one in a trillion, but a trillion in a trillion. The worst mover is really extremely inept in losing.

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Sun Jul 30, 2017 9:15 pm

hgm wrote:It is not supposed to be a stalemate. It is an example of how you force a worst-mover to checkmate you, after he stupidly allowed you to strip him of all material except a few Pawns (which you intentionally left him).

And every game of a program that is really clever in trying to lose against your 'Stockfish worst mover' would end similar to that. Not one in a trillion, but a trillion in a trillion. The worst mover is really extremely inept in losing.

anyone ever built a SF worst-mover to check in practice?

hgm · Post by **hgm** » Sun Jul 30, 2017 10:47 pm

You are the one that introduced it in the discussion, so you should know.

leavenfish · Post by **leavenfish** » Mon Jul 31, 2017 5:09 am

Ummm....outlaw engines?
Hey, if chess.com can use their magic to find out who is cheating and who is not...it should in theory be possible.

Rebel · Post by **Rebel** » Mon Jul 31, 2017 8:23 am

Laskos wrote:It shows that Computer Chess is heading towards "Draw Death" with stronger engines, more cores and longer time control.

The notorious diminishing returns = 0 claim by Vincent Diepenveen at depth=14 made somewhere in the beginning of this century is becoming a reality after all only that he was wrong about a factor 4-5 on depth he had seen well that sooner or later this would happen.

While moving in late and not read the whole thread I think that new formula for calculating the LOS and ELO are needed with as new ingrediënt the draw rate percentage.

For example, if X vs Y play 1000 games ending in 505-495 with 990 games ending in a draw then X is clearly better.

hgm · Post by **hgm** » Mon Jul 31, 2017 8:43 am

Rebel wrote:The notorious diminishing returns = 0 claim by Vincent Diepenveen at depth=14 made somewhere in the beginning of this century is becoming a reality after all only that he was wrong about a factor 4-5 on depth he had seen well that sooner or later this would happen.

Or you could say that he did not anticipate engines would lie a factor 4-5 about their depth.

Laskos · Post by **Laskos** » Mon Jul 31, 2017 12:38 pm

Rebel wrote:
Laskos wrote:It shows that Computer Chess is heading towards "Draw Death" with stronger engines, more cores and longer time control.
The notorious diminishing returns = 0 claim by Vincent Diepenveen at depth=14 made somewhere in the beginning of this century is becoming a reality after all only that he was wrong about a factor 4-5 on depth he had seen well that sooner or later this would happen.

I too consider that in the current paradigm, if current top engines will increase the depth by a factor of 2, to say depths 60-70, with all their extensions and reductions, they will be close to non-losing engines from starting opening position, most of the games will be draws even in fast games (from balanced positions). They will have a pretty good view on the whole relevant part of the game. The tree is getting bushier at shallow depths with increasing depth (iteration), it is not just deepening, but also widening, and widening effect in an engine like Stockfish is accountable for significant part of nodes, maybe half of them. So the argument that reductions like LMR in current paradigm will forever make chess engines "weak" seems to me to not stand.

While moving in late and not read the whole thread I think that new formula for calculating the LOS and ELO are needed with as new ingrediënt the draw rate percentage.

For example, if X vs Y play 1000 games ending in 505-495 with 990 games ending in a draw then X is clearly better.

Excellent intuition by you. Take +2 -0 =98 result. LOS with uniform prior is draw-independent 0.875. That is what most people use.

At first sight, the things are even worse. We can think before the match "most of the games will be drawn, the engines are very close in strength". And take, when calculating LOS a prior like this:

Unnormalized prior: [4*score*(1-score)]**8000
It looks like that:

And LOS in this case is even worse, LOS=0.723, and draw dependent. So, the +2 -0 =98 is completely undecided.

Now, if we are more informed, for example about Win/Loss ratio which is the slowest to decay with "Draw Death" according to Andreas data:

We see that Win/Loss ratio is significantly away from 1 even for very drawish chess. So, we can think "despite many draws, engines are very different in strength", and choose a different prior, for Win/Loss significantly different from 1 even for high draw rate:

Unnormalized prior: W**2 * L**(1/2)
Which for draw rate of 95% (the shape and LOS are independent of draw rate) looks like that:

With that sort of prior knowledge (considering that engines differ significantly in strength despite extremely high draw rates), we get a draw independent LOS=0.936 for +2 -0 =98 match, almost a significant result.

Rebel · Post by **Rebel** » Mon Jul 31, 2017 3:18 pm

Laskos wrote: Unnormalized prior: W**2 * L**(1/2)

Yes, something like that. I leave that in the good hands of the mathematicians (like you) among us, I am certainly not.

I am not an expert in ELO calculation but I suspect with the Draw Death Doom Day [DDDD

] in mind the number of games should start to play a prominent role. If we take my previous example (X vs Y) and multiply the number of games by 10 we get:

Code: Select all

Match-1 - 100   games - 50.5-49.5 - &#40;50.5%) - draws 99
Match-2 - 1000  games - 505 - 495 - &#40;50.5%) - draws 990
Match-3 - 10000 games - 5050-4950 - &#40;50.5%) - draws 9900

Match-1 with only 1 win doesn't say much about ELO.
Match-2 with 10 wins already says a whole lot.
Match-3 with 100 wins and no single loss in 10,000 games then X is clearly superior over Y and it should be made visible in ELO.

From a gut feeling I would say Match-1 +10 elo, Match-2 +50 elo and Match-3 +200-300 elo. Are there already formulas for that?

Laskos · Post by **Laskos** » Mon Jul 31, 2017 3:46 pm

Rebel wrote:
Laskos wrote: Unnormalized prior: W**2 * L**(1/2)
Yes, something like that. I leave that in the good hands of the mathematicians (like you) among us, I am certainly not.

I am not an expert in ELO calculation but I suspect with the Draw Death Doom Day [DDDD ] in mind the number of games should start to play a prominent role. If we take my previous example (X vs Y) and multiply the number of games by 10 we get:
Code: Select all
Match-1 - 100   games - 50.5-49.5 - &#40;50.5%) - draws 99
Match-2 - 1000  games - 505 - 495 - &#40;50.5%) - draws 990
Match-3 - 10000 games - 5050-4950 - &#40;50.5%) - draws 9900
Match-1 with only 1 win doesn't say much about ELO.
Match-2 with 10 wins already says a whole lot.
Match-3 with 100 wins and no single loss in 10,000 games then X is clearly superior over Y and it should be made visible in ELO.

From a gut feeling I would say Match-1 +10 elo, Match-2 +50 elo and Match-3 +200-300 elo. Are there already formulas for that?

I am not a mathematician, actually a physicist, but some shallow math I should know. And I like to have some empirical grounds. There is a quantity to use for that, t-value (related easily in this to p-value), it has a standard deviation 1, and 1.96 for 95% confidence (always as we discuss).

Here the expression for it is: (score - 1/2) / sigma

So:
Match-1: t-value = 1.01 +/- 1.96 (undecided)
Match-2: t-value = 3.18 +/- 1.96 (significant)
Match-3: t-value = 10.1 +/- 1.96 (highly significant)

Ways to avoid "Draw Death" in Computer Chess

Re: Ways to avoid "Draw Death" in Computer Chess

Re: Ways to avoid "Draw Death" in Computer Chess

Re: Ways to avoid "Draw Death" in Computer Chess

Re: Ways to avoid "Draw Death" in Computer Chess

Re: Ways to avoid "Draw Death" in Computer Chess

Re: Ways to avoid "Draw Death" in Computer Chess

Re: Ways to avoid "Draw Death" in Computer Chess

Re: Ways to avoid "Draw Death" in Computer Chess

Re: Ways to avoid "Draw Death" in Computer Chess

Re: Ways to avoid "Draw Death" in Computer Chess