Stockfish has included WDL stats in engine output

Discussion of chess software programming and technical issues.

Moderators: bob, hgm, Harvey Williamson

Michel
Posts: 2102
Joined: Sun Sep 28, 2008 11:50 pm

Re: Stockfish has included WDL stats in engine output

Post by Michel » Wed Jul 08, 2020 6:51 am

Before discussing the SF wdl data one should understand what they actually represent.

There are two principal definitions of wdl.
  1. Given a particular position one could define it as the empirical probabilities that an engine scores a win, draw, loss against itself when playing from that position. Most people commenting here seem to be using this definition, but if one thinks about it one realizes that it is fraught with difficulties. The most obvious one is that one needs a mechanism to introduce variety and this may influence the result. Another difficulty is that it requires an unreasonable amount of resources to measure wdl probabilities in this way.
  2. Instead it is more practical to define wdl data as follows: identify some characteristics of a position (e.g. eval (after search), game phase, table base lookup, drawishness heuristics, ...) and define w, d, l as the probability that a position, randomly selected from a fixed corpus of positions with these characteristics, has the outcome win, draw, loss. Usually we have no means of knowing the game theoretic outcome of a position, so as a proxy we should use the outcome of a match played from the position which was selected. The drawback of this definition is that it depends on the selected characteristics (where do you stop?) and on the corpus of positions.
Stockfish sort of uses definition 2, using the ply count and the eval as characteristics, and a corpus of positions taken from Fishtest LTC games. I say "sort of" since ply count is not truly a characteristic of the position but serves here as a proxy for game phase. The data for game phase exists but happens to be more difficult to fit.
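To make definition 2 concrete, here is a minimal sketch of how such frequencies could be tabulated from a corpus. It is only an illustration under stated assumptions: the function name `empirical_wdl`, the bucket widths and the toy corpus are all hypothetical, and this is not how Fishtest or Stockfish actually process their data.

```python
from collections import defaultdict

def empirical_wdl(samples, eval_bin=20, ply_bin=16):
    """Estimate (w, d, l) frequencies per (eval, ply) bucket.

    samples: iterable of (eval_cp, ply, outcome), where outcome is
    'w', 'd' or 'l' and is the result of the game the position came from.
    The bucket widths are arbitrary illustrative choices.
    """
    counts = defaultdict(lambda: {"w": 0, "d": 0, "l": 0})
    for eval_cp, ply, outcome in samples:
        # The "characteristics" of the position: a coarse eval and ply bucket.
        key = (eval_cp // eval_bin, ply // ply_bin)
        counts[key][outcome] += 1

    table = {}
    for key, c in counts.items():
        n = c["w"] + c["d"] + c["l"]
        table[key] = (c["w"] / n, c["d"] / n, c["l"] / n)
    return table

# Hypothetical toy corpus, not real Fishtest data.
corpus = [
    (5, 40, "d"), (10, 42, "d"), (15, 44, "w"), (0, 47, "l"),
    (150, 40, "w"), (155, 41, "w"), (145, 45, "d"),
]
table = empirical_wdl(corpus)
```

The result depends on both the chosen characteristics (here eval and ply) and the corpus, which is exactly the drawback mentioned above.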
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.

Alayan
Posts: 321
Joined: Tue Nov 19, 2019 7:48 pm
Full name: Alayan Feh

Re: Stockfish has included WDL stats in engine output

Post by Alayan » Wed Jul 08, 2020 7:07 am

Michel wrote:
Wed Jul 08, 2020 6:51 am
The most obvious one is that one needs a mechanism to introduce variety and this may influence the result. Another difficulty is that it requires an unreasonable amount of resources to measure wdl probabilities in this way.
For LazySMP engines, running the engine with multiple threads should introduce suitable variety without deliberate weakening. But the resource cost is just way too high to be practical.

Ovyron
Posts: 4346
Joined: Tue Jul 03, 2007 2:30 am

Re: Stockfish has included WDL stats in engine output

Post by Ovyron » Thu Jul 09, 2020 1:41 am

Michel wrote:
Wed Jul 08, 2020 6:51 am
Usually we have no means of knowing the game theoretic outcome of a position, so as a proxy we should use the outcome of a match played from the position which was selected.
What proxy was used that causes K v K positions to appear as white with winning chances?

The feature is still in diapers and being sent to do adult jobs.

syzygy
Posts: 4598
Joined: Tue Feb 28, 2012 10:56 pm

Re: Stockfish has included WDL stats in engine output

Post by syzygy » Thu Jul 09, 2020 11:21 am

Ovyron wrote:
Thu Jul 09, 2020 1:41 am
Michel wrote:
Wed Jul 08, 2020 6:51 am
Usually we have no means of knowing the game theoretic outcome of a position, so as a proxy we should use the outcome of a match played from the position which was selected.
What proxy was used that causes K v K positions to appear as white with winning chances?
SF has still won and lost many games in Fishtest after reaching a 0.00 score at relatively high move numbers. So if you base win probability on score and game_ply, the probability will indeed be positive.

kinderchocolate
Posts: 428
Joined: Mon Nov 01, 2010 5:55 am
Full name: Ted Wong

Re: Stockfish has included WDL stats in engine output

Post by kinderchocolate » Thu Jul 16, 2020 8:04 pm

Probably asked somewhere else, but I can't find it. What's the impact of using ply in the calculation? The problem here is we don't have such information if we start from a non-initial position.

zullil
Posts: 6326
Joined: Mon Jan 08, 2007 11:31 pm
Location: PA USA
Full name: Louis Zulli

Re: Stockfish has included WDL stats in engine output

Post by zullil » Thu Jul 16, 2020 8:33 pm

kinderchocolate wrote:
Thu Jul 16, 2020 8:04 pm
Probably asked somewhere else, but I can't find it. What's the impact of using ply in the calculation? The problem here is we don't have such information if we start from a non-initial position.
Yes, your concern about ply count not being available for many positions one might want to analyze was already raised. My suggestion is to not use WDL scoring for such positions. Or for anything, for that matter. :evil:

But, if you want to understand how ply affects WDL scores, study this code: https://github.com/official-stockfish/S ... i.cpp#L186

I believe PawnValueEg = 206.
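For readers who do not want to dig through the source, the general shape can be sketched as a sigmoid in the eval. This is a sketch only, assuming a logistic form with two parameters `a` (the eval at which the win probability is 50%) and `b` (the width of the curve). The values of `A` and `B` below are placeholders, not the coefficients in the linked code, and in a ply-dependent model they would be functions of ply rather than constants.

```python
import math

def win_rate(x_cp, a, b):
    """Logistic win probability for an eval of x_cp centipawns."""
    return 1.0 / (1.0 + math.exp((a - x_cp) / b))

def wdl(x_cp, a, b):
    """Return (win, draw, loss); loss is the win rate seen from the other side."""
    w = win_rate(x_cp, a, b)
    l = win_rate(-x_cp, a, b)
    return w, 1.0 - w - l, l

# Placeholder parameters, NOT Stockfish's fitted values.
A, B = 150.0, 60.0
w, d, l = wdl(0.0, A, B)
```

Note that with this form the win probability at an eval of 0.00 is small but never exactly zero, which is consistent with the explanation given earlier in the thread for drawn positions showing a nonzero win chance.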

kinderchocolate
Posts: 428
Joined: Mon Nov 01, 2010 5:55 am
Full name: Ted Wong

Re: Stockfish has included WDL stats in engine output

Post by kinderchocolate » Thu Jul 16, 2020 9:44 pm

Thanks. I think probability of winning is a good measure for chess reporting, and is in fact better than "cp":
  • cp is a programming concept, not a chess-analysis concept
  • cp is heavily implementation dependent
Reporting a probability obtained by fitting a sigmoid curve is a nice way to normalize these differences. I attach a plot of Stockfish's WDL code.
  • The curve at https://github.com/glinscott/fishtest/w ... n-fishtest saturates around 400, but the SF code saturates around 600. I'm not sure why the author of the patch reported "The model fits rather accurately the LTC fishtest statistics". The saturation point is critically important in the model, so if I'm not mistaken the patch was very badly programmed.
  • 600 is a little less than a knight in SF
  • At cp==0, the winning chance in the Fishtest link is a little less than 1 (hard to see). The SF code gives 0.076 (vertical line in the plot).
Basically, the code tells us that an advantage somewhere between a pawn and a knight is an almost certain win. Being up a pawn gives approximately a 25% winning chance, not including draws.
Attachment: ABCD.png (40.24 KiB)

kinderchocolate
Posts: 428
Joined: Mon Nov 01, 2010 5:55 am
Full name: Ted Wong

Re: Stockfish has included WDL stats in engine output

Post by kinderchocolate » Thu Jul 16, 2020 9:53 pm

I would like to add WDL for chess analysis, because it's a much better statistic than unscaled scores. But my concerns:
  • It doesn't look like it fits the Fishtest data properly.
  • Probability goes up as ply rises. This is bad, because in many practical endgames, such as the common rook endgames, the winning chance actually drops relative to the middlegame for the same score. Chess engines tend to overestimate winning chances without a tablebase.

kinderchocolate
Posts: 428
Joined: Mon Nov 01, 2010 5:55 am
Full name: Ted Wong

Re: Stockfish has included WDL stats in engine output

Post by kinderchocolate » Thu Jul 16, 2020 9:57 pm

If I were to add it to analysis, I might just drop the ply parameter and hard-code it to 10. It looks like at 10, a knight advantage is about 75% winning. I like it to be 75% winning for a piece up.

Alayan
Posts: 321
Joined: Tue Nov 19, 2019 7:48 pm
Full name: Alayan Feh

Re: Stockfish has included WDL stats in engine output

Post by Alayan » Thu Jul 16, 2020 9:59 pm

You got several things wrong.

Fishtest has 400cp adjudication, so any game that stays at that score for a few plies gets marked as a win, though in some instances playing on would end in a draw.

Stockfish internal units aren't the same as centipawns. 600 or so would be the value of a knight in internal units, not cp. Of course usually a position down a knight snowballs into much worse quickly.
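As a rough illustration of the unit difference, here is a small sketch. It assumes the conversion cp = 100 * v / PawnValueEg and uses the value PawnValueEg = 206 quoted earlier in the thread; both the function name `internal_to_cp` and the exact conversion are assumptions for illustration, not taken from the Stockfish source.

```python
# Value quoted earlier in the thread ("I believe PawnValueEg = 206").
PAWN_VALUE_EG = 206

def internal_to_cp(v):
    """Convert an internal score to centipawns, assuming cp = 100 * v / PawnValueEg."""
    return 100.0 * v / PAWN_VALUE_EG

print(round(internal_to_cp(600)))  # prints 291
```

Under these assumptions, 600 internal units is a bit under 300 cp, so a curve that saturates around 600 in internal units is not directly comparable with one plotted in centipawns.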

You got the (cp, ply) -> wdl function very wrong: with Stockfish's actual WDL formula the draw probability increases as the ply count increases, rather than the winning probability increasing as in your graph.
