TalkChess.com

Posted: **Wed Jul 08, 2020 8:51 am**

Before discussing the SF wdl data, one should understand what it actually represents.

There are two principal definitions of wdl.

1. Given a particular position, one could define it as the empirical probabilities that an engine scores a win, draw, or loss against itself when playing from that position. Most people commenting here seem to be using this definition, but if one thinks about it, one realizes that it is fraught with difficulties. The most obvious one is that one needs a mechanism to introduce variety, and this may influence the result. Another difficulty is that it requires an unreasonable amount of resources to measure wdl probabilities in this way.
2. Instead, it is more practical to define wdl data as follows: identify some characteristics of a position (e.g. eval (after search), game phase, tablebase lookup, drawishness heuristics, ...) and define w, d, l as the probability that a position with these characteristics, randomly selected from a fixed corpus of positions, has the outcome win, draw, or loss. Usually we have no means of knowing the game-theoretic outcome of a position, so as a proxy we should use the outcome of a match played from the position which was selected. The drawback of this definition is that it depends on the selected characteristics (where do you stop?) and on the corpus of positions.
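
A minimal Python sketch of the second definition, assuming the only characteristics are the eval (in centipawns) and the ply count, each cut into fixed-width bins. The bin widths, the `bin_key`/`estimate_wdl` helpers and the toy corpus are hypothetical, and plain frequency counting stands in for the curve fit Stockfish actually uses. Changing what `bin_key` looks at is exactly the "where do you stop?" choice.

```python
from collections import Counter, defaultdict

def bin_key(eval_cp, ply, eval_width=50, ply_width=20):
    """Map a position's characteristics (eval, ply) to a coarse bin."""
    return (eval_cp // eval_width, ply // ply_width)

def estimate_wdl(corpus, eval_width=50, ply_width=20):
    """Estimate (w, d, l) per bin as outcome frequencies over a fixed corpus.

    corpus: iterable of (eval_cp, ply, outcome), outcome in {"w", "d", "l"}.
    The outcome is the result of the game played from that position, used
    as a proxy for the unknown game-theoretic value.
    """
    counts = defaultdict(Counter)
    for eval_cp, ply, outcome in corpus:
        counts[bin_key(eval_cp, ply, eval_width, ply_width)][outcome] += 1
    table = {}
    for key, c in counts.items():
        n = sum(c.values())
        table[key] = (c["w"] / n, c["d"] / n, c["l"] / n)
    return table

# Hypothetical toy corpus: four positions around +1.2 near ply 30, one at 0.00.
corpus = [(120, 30, "w"), (110, 35, "d"), (130, 32, "w"), (140, 38, "l"),
          (-10, 5, "d")]
print(estimate_wdl(corpus))
# {(2, 1): (0.5, 0.25, 0.25), (-1, 0): (0.0, 1.0, 0.0)}
```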

Stockfish sort of uses definition 2, using the ply count and the eval as characteristics, and a corpus of positions taken from Fishtest LTC games. I say "sort of" since ply count is not truly a characteristic of the position but serves here as a proxy for game phase. The data for game phase exists but happens to be more difficult to fit.

Posted: **Wed Jul 08, 2020 9:07 am**

Michel wrote: ↑Wed Jul 08, 2020 8:51 am The most obvious one is that one needs a mechanism to introduce variety and this may influence the result. Another difficulty is that it requires an unreasonable amount of resources to measure wdl probabilities in this way.

For LazySMP engines, running with multiple threads should introduce suitable variety without deliberate weakening. But the resource cost is just way too high to be practical.
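
To put a rough number on that cost: if one estimates a probability p from n independent games, its binomial standard error is sqrt(p(1 - p)/n). A minimal sketch solving for n; the target precision and the hypothetical set of 1,000 positions are illustrative choices, not figures from the thread.

```python
import math

def games_needed(p, target_se):
    """Independent games from one position needed so that the standard error
    of an estimated probability p is at most target_se.

    Binomial standard error: se = sqrt(p * (1 - p) / n), so n = p(1 - p) / se^2.
    """
    return math.ceil(p * (1 - p) / target_se ** 2)

# Worst case p = 0.5, aiming for +/- 1 percentage point (one standard error).
per_position = games_needed(0.5, 0.01)
print(per_position)            # roughly 2,500 games
print(per_position * 1_000)    # for a hypothetical set of 1,000 positions
```

That is about 2,500 self-play games for a single position, and millions for even a modest table of positions, before accounting for any bias the variety mechanism introduces.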

Posted: **Thu Jul 09, 2020 3:41 am**

Michel wrote: ↑Wed Jul 08, 2020 8:51 am Usually we have no means of knowing the game theoretic outcome of a position, so as a proxy we should use the outcome of a match played from the position which was selected.

What proxy was used that causes K v K positions to appear as if White has winning chances?

The feature is still in diapers and being sent to do adult jobs.

Posted: **Thu Jul 09, 2020 1:21 pm**

Ovyron wrote: ↑Thu Jul 09, 2020 3:41 am
Michel wrote: ↑Wed Jul 08, 2020 8:51 am Usually we have no means of knowing the game theoretic outcome of a position, so as a proxy we should use the outcome of a match played from the position which was selected.
What proxy was used that causes K v K positions to appear as white with winning chances?

SF has still won and lost many games in Fishtest after reaching a 0.00 score at relatively high move numbers. So if you base win probability on score and game_ply, the probability will indeed be positive.

Posted: **Thu Jul 16, 2020 10:04 pm**

Probably asked somewhere else, but I can't find it. What's the impact of using ply in the calculation? The problem here is we don't have such information if we start from a non-initial position.

Posted: **Thu Jul 16, 2020 10:33 pm**

kinderchocolate wrote: ↑Thu Jul 16, 2020 10:04 pm Probably asked somewhere else, but I can't find it. What's the impact of using ply in the calculation? The problem here is we don't have such information if we start from a non-initial position.

Yes, your concern about ply count not being available for many positions one might want to analyze was already raised. My suggestion is to not use WDL scoring for such positions. Or for anything, for that matter.

But, if you want to understand how ply affects WDL scores, study this code: https://github.com/official-stockfish/S ... i.cpp#L186

I believe PawnValueEg = 206.
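
A sketch of a model of this kind, assuming it has the following shape: convert the internal eval to centipawns using PawnValueEg (206, as stated above), feed it into a logistic curve whose offset a and width b depend on the ply, take the loss probability as the win probability of the negated eval, and let the draw probability be the remainder. The polynomial-in-ply form, the rescaling of ply, the clamp to ±1000 cp and above all the coefficient values are placeholders chosen for illustration; they are not the numbers in the linked file.

```python
import math

PAWN_VALUE_EG = 206  # as quoted in the thread; treat as an assumption

def internal_to_cp(v):
    """Internal eval -> centipawns, where 100 cp is one endgame pawn."""
    return 100.0 * v / PAWN_VALUE_EG

def poly(coeffs, m):
    """Evaluate c0 + c1*m + c2*m^2 + ..."""
    return sum(c * m ** i for i, c in enumerate(coeffs))

def win_rate(v, ply, a_coeffs, b_coeffs, max_ply=240):
    """Logistic win probability for internal eval v at the given ply.

    a(ply) is the eval (cp) at which the win rate is 50%, b(ply) the width
    of the curve. Both are polynomials in a clamped, rescaled ply count.
    """
    m = min(max_ply, ply) / 64.0
    a, b = poly(a_coeffs, m), poly(b_coeffs, m)
    x = max(-1000.0, min(1000.0, internal_to_cp(v)))
    return 1.0 / (1.0 + math.exp((a - x) / b))

def wdl(v, ply, a_coeffs, b_coeffs):
    """(win, draw, loss); draw >= 0 as long as a(ply) >= 0 and b(ply) > 0."""
    w = win_rate(v, ply, a_coeffs, b_coeffs)
    l = win_rate(-v, ply, a_coeffs, b_coeffs)
    return w, 1.0 - w - l, l

# Hypothetical coefficients: the offset a grows with ply, the width stays fixed.
A, B = [200.0, 50.0], [60.0]
print(internal_to_cp(600))   # about 291 cp
print(wdl(0, 10, A, B))      # 0.00 eval early in the game
print(wdl(0, 200, A, B))     # 0.00 eval late: more draws, but w and l stay > 0
```

Because a logistic curve never reaches exactly 0, any finite a and b give a 0.00 eval a small positive win probability, which is why a model built only on (eval, ply) cannot call K v K a certain draw.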

Posted: **Thu Jul 16, 2020 11:44 pm**

Thanks. I think the probability of winning is a good measure for chess reporting, and is in fact better than "cp":

- cp is a programming concept, not one meant for chess analysis
- cp is heavily implementation-dependent

Reporting a probability obtained by fitting a sigmoid curve is a nice way to normalize away these implementation differences. I attach a plot of Stockfish's WDL code.
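
Fitting such a sigmoid is ordinary logistic regression. A minimal sketch, assuming outcomes are reduced to win (1) versus not-win (0) and using synthetic data drawn from a made-up curve (a = 150 cp, b = 50 cp) rather than Fishtest statistics; Newton's method is one standard way to do the fit, not necessarily the procedure behind the Stockfish patch.

```python
import math
import random

def fit_logistic(xs, ys, iters=25):
    """Maximum-likelihood fit of P(y = 1 | x) = 1 / (1 + exp((a - x) / b)).

    Fits logit = b0 + b1 * t with t = x / 100 (rescaled for conditioning)
    by Newton's method, then converts back: b = 100 / b1, a = -b0 * b.
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            t = x / 100.0
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * t)))
            r, w = y - p, p * (1.0 - p)   # residual and Newton weight
            g0 += r
            g1 += r * t
            h00 += w
            h01 += w * t
            h11 += w * t * t
        det = h00 * h11 - h01 * h01
        # Newton step: beta += (X^T W X)^{-1} X^T (y - p), via the 2x2 inverse
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    b = 100.0 / b1
    return -b0 * b, b

# Synthetic data from a made-up curve; 1 = win, 0 = draw or loss.
random.seed(0)
xs = [random.uniform(-400, 400) for _ in range(20_000)]
ys = [1 if random.random() < 1 / (1 + math.exp((150 - x) / 50)) else 0 for x in xs]
a, b = fit_logistic(xs, ys)
print(f"a = {a:.1f} cp, b = {b:.1f} cp")   # should land close to 150 and 50
```

The fitted a is the eval at which a win becomes a coin flip and b controls how quickly the curve saturates, so where a curve "saturates" is governed by both parameters together.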

The data at https://github.com/glinscott/fishtest/w ... n-fishtest saturates around 400, but the SF code saturates around 600. I'm not sure why the author of the patch reported "The model fits rather accurately the LTC fishtest statistics". The saturation point is critically important in the model, so, if I'm not mistaken, the patch was very badly programmed.

600 is a little less than a knight in SF.

At cp == 0, the winning chance in the Fishtest link is a little less than 1 (hard to see). In the SF code it is 0.076 (the vertical line in the plot).

Basically, the code tells us that with an advantage of somewhere between a pawn and a knight, it's an almost certain win. Being up a pawn gives approximately a 25% winning chance, not including draws.

Posted: **Thu Jul 16, 2020 11:53 pm**

I would like to add WDL for chess analysis, because it's a much better statistic than unscaled scores. But I have some concerns:

- It doesn't look like it fits the Fishtest data properly.
- The winning probability goes up as ply rises. This is bad, because in many practical endgames, such as the common rook endgames, the winning chance actually drops relative to the middlegame for the same score. Chess engines tend to overestimate winning chances without a tablebase.

Posted: **Thu Jul 16, 2020 11:57 pm**

If I were to add it to analysis, I might just drop the ply parameter and hard-code it to 10. At ply 10, a knight advantage looks to be about 75% winning, and I like a piece up being 75% winning.

Posted: **Thu Jul 16, 2020 11:59 pm**

You got several things wrong.

Fishtest has 400 cp adjudication, so any game that reaches it for a few plies gets marked as a win, even though in some instances playing on would end in a draw.
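
How adjudication bends the statistics can be sketched with a simplified rule. The 400 cp threshold comes from the post; the requirement of 3 consecutive plies and the single eval sequence (one number per ply, from White's point of view) are assumptions, not Fishtest's actual configuration.

```python
def adjudicate(evals_cp, threshold=400, min_plies=3):
    """Return "w", "l", or None for a sequence of White-relative evals.

    Simplified, hypothetical rule: once |eval| >= threshold for min_plies
    consecutive plies, the game is scored for the side that is ahead.
    """
    run_sign, run_len = 0, 0
    for e in evals_cp:
        sign = 1 if e >= threshold else -1 if e <= -threshold else 0
        if sign != 0 and sign == run_sign:
            run_len += 1
        else:
            run_sign, run_len = sign, (1 if sign != 0 else 0)
        if run_sign != 0 and run_len >= min_plies:
            return "w" if run_sign > 0 else "l"
    return None  # not adjudicated: the result on the board decides

# A position that stays above 400 cp is labelled a win even if the
# remaining moves (never played) would have drifted back to a draw.
print(adjudicate([120, 420, 450, 430, 300, 0]))  # w
print(adjudicate([120, 420, 300, 450, 430]))     # None
```

Every game cut off this way counts as a win for the positions leading up to it, which pushes the empirical win rate toward 1 at around 400 cp regardless of whether those positions were really winnable.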

Stockfish's internal units aren't the same as centipawns. 600 or so would be the value of a knight in internal units, not in cp. Of course, a position that is a knight down usually snowballs into something much worse quickly.

You also got the (cp, ply) -> WDL function very wrong: with Stockfish's actual WDL formula, the draw probability increases as the ply count increases, rather than the winning probability increasing as in your graph.

Stockfish has included WDL stats in engine output
