## Playing the endgame like a boss !!

Discussion of anything and everything relating to chess playing software and machines.

Moderators: bob, hgm, Harvey Williamson

jp
Posts: 1320
Joined: Mon Apr 23, 2018 5:54 am

### Re: Playing the endgame like a boss !!

Eduard wrote:
Fri Mar 15, 2019 7:35 am
I've seen on chess.com how Leela checkmated with the queen and also with knight and bishop, but always before the 50th move. The problem must therefore be related to the 50-move rule.
Yeah, it's interesting that it converted KRvK in exactly 49 moves.

hgm
Posts: 24435
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller
Contact:

### Re: Playing the endgame like a boss !!

An example from the monthly blitz where it is not so lucky as to diffuse into a winning conversion, and instead moves aimlessly from one 'certain win' to another...

M ANSARI
Posts: 3426
Joined: Thu Mar 16, 2006 6:10 pm

### Re: Playing the endgame like a boss !!

hgm wrote:
Fri Mar 15, 2019 12:36 pm
The problem with that is that when you do not know how to convert 'certain wins', they suddenly become a lot less certain...

Your statement also bypasses the fact that win probabilities as determined by the NN are not infinitely accurate, but are necessarily polluted by a great deal of noise. So in practice, if you have the choice between an estimated win probability of 90.1% with an estimated remaining game length of 20 moves, and an estimated win probability of 90.0% with an estimated remaining length of 50 moves, and the estimation noise is 3%, it would be really foolish to go for the 50-move line as if that extra 0.1% were real. The estimated remaining length is likely a much more reliable indicator of whether you are dealing with a 90%+3% case rather than a 90.1%-3% case.

So what I am basically saying is that withholding the duration info from the NN during training will severely degrade the accuracy with which it can eventually estimate the win probabilities. And taking the theoretically best decision based on compromised data will in practice often lead to the wrong decision. If game length had been folded into the reward function during training, the NN would, in the example above, probably not have said 90.1% vs 90% +/- 3%, but 88% vs 92% +/- 1%. And that would enable it to go for what is the highest win probability in reality, rather than the imagined one based on inaccurate estimates.
Actually that is a very good point! There must be ways to fix it in the way the engine is trained. Very ironic, as that is what AI is all about. Maybe it is time to start thinking of a more "correct" training scheme; perhaps there are parameters that can be added to its training that can teach it to stop this nonsense.
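hgm's noise argument above can be sketched numerically. The following toy Monte Carlo simulation (my own illustration, not any engine's code) uses the hypothetical figures from the post: two options whose true win probabilities differ by only 0.1%, read through 3% Gaussian estimation noise. Picking the option with the higher noisy estimate turns out to be barely better than a coin flip.

```python
import random

random.seed(42)

SIGMA = 0.03      # assumed estimation noise, per the post
TRIALS = 100_000

def noisy_estimate(true_p):
    """A win-probability reading polluted by Gaussian estimation noise."""
    return true_p + random.gauss(0.0, SIGMA)

# How often does the truly better 90.1% option actually show a higher
# estimate than the 90.0% one?
wins = sum(noisy_estimate(0.901) > noisy_estimate(0.900)
           for _ in range(TRIALS))
print(wins / TRIALS)  # close to 0.5: the 0.1% edge drowns in the noise
```

With these numbers the "better" option wins the comparison only about 51% of the time, which is why the estimated remaining game length is the more informative tiebreaker.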

Uri Blass
Posts: 8750
Joined: Wed Mar 08, 2006 11:37 pm
Location: Tel-Aviv Israel

### Re: Playing the endgame like a boss !!

hgm wrote:
Fri Mar 15, 2019 8:07 am
mwyoung wrote:
Fri Mar 15, 2019 4:06 am
I have seen Lc0 play in a style where it does not care how long it takes to mate, as long as it wins. And this makes perfect sense if you learned the game from ZERO, and all that matters is wins, losses, and draws. You get no bonus for finding the shortest win.

If you fix this issue, I don't know if you could call Lc0 'zero' any more.
I don't agree. In any game a faster win is preferable to a slower win. That is no more domain-specific knowledge than the fact that a win is preferable to a loss. It is not enough to know how to reach a position from which you theoretically can force a win if you do not know how to actually convert it. You have to train game-playing entities to make progress towards a win, especially in the Zero approach.

LC0 is just trained for the wrong thing. And this likely slows down its training considerably: in many of its training examples it will not be able to recognize that it did something good, because its inability to convert the won position masks it.
A faster win is not preferable in every game.
For example, I can define chessX to have the same rules as chess, except that the winner gets (n-1)/n points and the loser gets 1/n points, where n is the number of moves in the game.

In chessX it is clear that a slower win is better for the winner.
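Uri's hypothetical chessX scoring rule can be written down directly; a longer game visibly pays the winner more:

```python
def chessx_score(n):
    """Score split in the hypothetical game 'chessX' from the post:
    the winner gets (n-1)/n points, the loser 1/n, for an n-move game."""
    return (n - 1) / n, 1 / n

# A slower win pays the winner more:
print(chessx_score(20))  # (0.95, 0.05)
print(chessx_score(50))  # (0.98, 0.02)
```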

hgm
Posts: 24435
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller
Contact:

### Re: Playing the endgame like a boss !!

You are of course right in that case, although it is debatable whether a win is truly a win if it doesn't achieve an all-or-nothing score. I was thinking of games with a binary result, plus perhaps a draw. Things can be very different in games where the goal is to collect points, especially if the score is made explicitly dependent on the game duration.

But I would still not consider that domain-specific knowledge; the scoring system is part of the game rules. And when the scoring depends on duration, it would be even more detrimental not to use the game duration or the actual score during training, and instead only tell the NN whether it got more points than the opponent.

Anyway, a performance like the one I posted is rather embarrassing.

jp
Posts: 1320
Joined: Mon Apr 23, 2018 5:54 am

### Re: Playing the endgame like a boss !!

jp wrote:
Fri Mar 15, 2019 3:22 pm
Eduard wrote:
Fri Mar 15, 2019 7:35 am
I've seen on chess.com how Leela checkmated with the queen and also with knight and bishop, but always before the 50th move. The problem must therefore be related to the 50-move rule.
Yeah, it's interesting that it converted KRvK in exactly 49 moves.
I think there was another 49-move conversion today near the end of CCC6...

Alexander Lim
Posts: 42
Joined: Sun Mar 10, 2019 12:16 am
Full name: Alexander Lim

### Re: Playing the endgame like a boss !!

One of the problems so far is that we haven't been able to compare Leela's endgame play with another NN engine's. Is the issue in the value head, the policy head, the MCTS, or the training process? Chess Fighter uses all of the above and doesn't display any such issues.

First of all here is CF's continuation after 120. ... Ke3:

(Looks like I entered the position incorrectly but it shouldn't affect the conclusions of this post)

CF's evaluation ranges from -1 to 1, which I multiply by 10 to get a display output from -10 to +10, where +10 == white win.

There are two stages to analyse here:

Stage 1: From move 1 to 16 the NN is guiding the game gradually towards a position where mate is likely. Notice the gradual increase in evaluation from +5.52 to +9.21. The game is definitely winning for white but not yet won and CF knows this.

Stage 2: On move 17 there is a sudden jump in evaluation from +9.21 to +9.96. This means CF has now 'seen' the mate, and without fail the MCTS algorithm converges on the mate 4 moves later.

Now for Leela:

Stage 1: This lasts roughly from move 120 to move 234 (114 moves!). Using a 6x64 Leela net, most of its evals hover around 95% (+9.75 on the CF scale). The problem is that this is too high: together with noise, there is just not enough wiggle room for the eval to climb towards a mating position. It also seems the position at move 234 was the result of random shuffling and/or the threat of the 50-move rule.

Stage 2: Actually this part is fine, though it's not clear whether it was the threat of the 50-move rule that forced Leela to play the mate. Does Leela still troll around mating positions?

I think most people already know that the problem is Leela's evals being saturated around 95%-100%, so I've not said anything new there. But the question is why? One thought was that Leela has played so many millions of games that the evals of winning positions eventually all converge to the extremes. If that were the case, then CF's eval should go to +1 also. Here are CF's evals for various generations after 120. ... Ke3:

Scale from -1 to 1. The first number is the static eval of the position; the second is the average of a 10,000-node search.

gen 2000 0.38/0.38
gen 4000 0.44/0.41
gen 6000 0.62/0.57
gen 8000 0.55/0.46
gen 10000 0.60/0.54
gen 12000 0.55/0.51
gen 14000 0.51/0.47

Although we're only talking about hundreds of thousands of self-play games (not millions), it looks like the eval is stabilising around +0.5, which I think is key to getting aesthetically pleasing endgames.
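The "wiggle room" point can be made with a back-of-the-envelope calculation (my own illustration): count how many standard deviations of eval noise fit between the current eval and the saturation ceiling. The 3% noise figure is the hypothetical number hgm used earlier in the thread; 0.95 and 0.5 are the Leela and CF levels quoted in this post.

```python
NOISE = 0.03  # assumed eval noise, per hgm's earlier hypothetical figure

def headroom_in_sigmas(eval_now, ceiling=1.0, noise=NOISE):
    """How many noise standard deviations separate the current eval
    from the saturation ceiling."""
    return (ceiling - eval_now) / noise

print(headroom_in_sigmas(0.95))  # ~1.7 sigma: progress drowns in noise
print(headroom_in_sigmas(0.50))  # ~16.7 sigma: plenty of room to climb
```

At 0.95 the entire remaining range to the ceiling is under two noise sigmas, so a genuine step toward mate is indistinguishable from noise; at 0.5 there is ample room for a visible gradient.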

Apparently Demis Hassabis said AlphaZero does not suffer from these endgame problems (one of the Leela developers mentions this in a YouTube video). Are there any AlphaZero games played out to mate that would confirm this?

Of the three main stages (opening, middlegame, and endgame), it seems most people (judging by the forum posts) are resigned to the idea that NN engines just won't ever play the endgame well (the Ender project only plays the endgame). I really don't think that's the case. NN engines will eventually dominate in all stages of the game, and I'm sure they will have something to teach us about the endgame as well.

I also think they will eventually conquer the tactics arena at some stage, though that will probably require radically new algorithms and network structures.

For now there's only one NN that plays the endgame like a boss, and it's not Leela!

To finish off here is CF's continuation of the Leela-Marvin game after 51. ... Kd8. Note the gradual increase in eval (mate is seen on move 34).

Alex

Alexander Lim
Posts: 42
Joined: Sun Mar 10, 2019 12:16 am
Full name: Alexander Lim

### Re: Playing the endgame like a boss !!

By the way, the purpose of my previous post (aside from showing off Chess Fighter's amazing endgame prowess) was to illustrate that Leela's naughty endgame / trolling behaviour is not an inherent problem of NNs or MCTS. It's just... well... Leela!

hgm
Posts: 24435
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller
Contact:

### Re: Playing the endgame like a boss !!

Of course it is not an intrinsic NN problem. The evaluation you need to swiftly win KRK, even with a very shallow search, is extremely simple (a centralization bonus for the kings, somewhat larger for the bare one, which a simple PST will already provide). An NN as complex as Leela's can very easily calculate that function in its value head, and the policy head would hardly matter in cases where a two-ply search is already sufficient. Besides, you cannot expect miracles from the policy head: it is trained to guide the search towards a high evaluation from the value head, and if the latter always reports a certain win, the policy head won't care what you play.
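The kind of evaluation described above can be sketched in a few lines. This is a minimal illustration, not any engine's actual code, and all the weights are made-up numbers: from the strong side's view, the bare king is penalized for centralization (driving it to the edge), the strong king is rewarded for it, and the kings are pulled together.

```python
def centralization(sq):
    """0 in a corner, up to 3 in the four center squares (sq is 0..63)."""
    f, r = sq % 8, sq // 8
    return 3 - max(abs(2 * f - 7), abs(2 * r - 7)) // 2

def krk_eval(strong_king, bare_king):
    """Toy KRK evaluation from the strong side's perspective, using only
    king-centralization PST terms and king distance. Hypothetical weights."""
    f1, r1 = strong_king % 8, strong_king // 8
    f2, r2 = bare_king % 8, bare_king // 8
    king_distance = max(abs(f1 - f2), abs(r1 - r2))
    # Drive the bare king outward, centralize our own king, close in.
    return (-20 * centralization(bare_king)
            + 10 * centralization(strong_king)
            - 5 * king_distance)

# Boxing the bare king toward the edge scores better than letting it
# sit in the center:
print(krk_eval(36, 7) > krk_eval(36, 27))  # True
```

Even a two-ply search maximizing this score makes monotonic progress: every move that pushes the bare king outward or closes the king distance raises the eval, which is exactly the gradient a saturated value head cannot provide.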

But when you train the net to ignore the difference between low DTM (distance to mate) and high DTM, insisting that the value head should deliver 100.0% all the time because any KRK position with the strong side to move is a certain win, you make it impossible for the engine to find the path to the win, unless by pure accident it gets the mate within its horizon.

And yes, if the value head did not report the saturation value +1 but something around +0.5, there would be room for it to encode progress, and the problem would go away. But then it would greatly err on the absolute value of KRK, giving +0.5 for a certain win. If it had not been trained to similarly underestimate other wins, you would have built in an inhibition against converting to KRK, which very well might be the only way to win more complex endgames.
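One way to get progress-encoding values without that inconsistency is to fold game duration into the training target itself, as discussed earlier in the thread. The sketch below is my own illustration of the idea (not Leela's or AlphaZero's actual scheme, and the discount factor is made up): discount the final result by the number of moves still to be played, so a quick win trains toward a higher value than a slow one, while every win stays well clear of a draw.

```python
GAMMA = 0.99  # hypothetical per-move discount factor

def training_target(result, moves_remaining, gamma=GAMMA):
    """Duration-discounted value target.
    result: +1 win, 0 draw, -1 loss, from the side to move."""
    return result * gamma ** moves_remaining

print(round(training_target(+1, 20), 3))  # 0.818: a 20-move win
print(round(training_target(+1, 50), 3))  # 0.605: a 50-move win
```

Because all wins are discounted by the same rule, the net is not taught to underestimate some wins relative to others, so no inhibition against converting into KRK is built in; the ordering win > draw > loss is preserved at every depth.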