Stockfish has included WDL stats in engine output

zullil · Post by **zullil** » Sat Jul 04, 2020 1:27 pm

nnnnnnnn wrote: ↑Sat Jul 04, 2020 12:23 pm
nnnnnnnn wrote: ↑Thu Jul 02, 2020 5:48 am wdl may be just what I need which is to determine the win/loss/draw probability...

(In fact I asked exactly this question last week, http://www.talkchess.com/forum3/viewtop ... =7&t=74243 , how to determine when to stop search based on a cp 0 score, and nobody mentioned wdl, which would have solved the issue.)
Just to be clear, based on information in another thread: the current WDL implementation is not helpful to me. It accounts for neither material nor tablebase information nor perpetual check (as someone pointed out in another thread, it reports an 11% chance of a win in K v. K, for example).

It's actually 10.5% only if K vs K is the game's initial position, i.e., the FEN fullmove number is 1. If the fullmove number is 70 then it's 3.2%. But it rises to 3.8% if the fullmove number is 120.

Hope this "feature" is removed completely, and soon.

zullil · Post by **zullil** » Sat Jul 04, 2020 1:32 pm

AndrewGrant wrote: ↑Sat Jul 04, 2020 12:30 pm

Agreed. Just as we should not care to see CP scores from Leela, we should not care to see WDL scores from Stockfish. It has no basis. It's purely cosmetic, and almost condescending to the user.

I agree with cosmetic. And probably with condescending. But the numbers are based on something. Not something that I find useful, but something. See

https://github.com/official-stockfish/S ... ci_showwdl

syzygy · Post by **syzygy** » Sat Jul 04, 2020 4:05 pm

hgm wrote: ↑Sat Jul 04, 2020 10:41 am You cannot say that a GUI not implementing it is not fully UCI-compliant, though. Or did you mean that UCI compliance should make a GUI resistant against whatever garbage an engine might cough up? The specs require that of the engine; I am not sure they also require it of the GUI; they are written very much from an engine POV.

Yes, according to what GCP wrote a UCI compliant GUI should not crash on garbage ("even though UCI does say the interface should ignore stuff it doesn't understand"). In practice many GUIs will of course crash, and I see no reason to tempt such crashes if the engine could just default to not including the information.

However, I can't find this requirement in this copy of the spec:
http://wbec-ridderkerk.nl/html/UCIProtocol.html
Perhaps I am overlooking it or it is included in some later version. Or GCP simply misread the spec.

One could argue that, since the UCI protocol was designed to make life easy for engine programmers and not for GUI programmers, a GUI should never crash on garbage coming from the engine, whereas an engine may happily crash if it receives garbage from the GUI. But there is still no reason to tempt such crashes.

syzygy · Post by **syzygy** » Sat Jul 04, 2020 4:17 pm

syzygy wrote: ↑Sat Jul 04, 2020 4:05 pm However, I can't find this requirement in this copy of the spec:
http://wbec-ridderkerk.nl/html/UCIProtocol.html
Perhaps I am overlooking it or it is included in some later version. Or GCP simply misread the spec.

It must be this sentence:

UCI spec wrote:* if the engine or the GUI receives an unknown command or token it should just ignore it and try to parse the rest of the string.

Well, yes. The problem with the UCI spec is that it is way too loosely defined. (Or should I say "one problem"?)
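To illustrate what that sentence of the spec asks of a GUI, here is a minimal sketch of a tolerant `info` line parser in Python. The function name `parse_info`, the small set of recognised keys, and the sample line are assumptions made for illustration; this is not taken from any actual GUI.

```python
def parse_info(line):
    """Parse a UCI 'info' line, ignoring any token that is not recognised."""
    tokens = line.split()
    if not tokens or tokens[0] != "info":
        return {}

    def to_int(text):
        # Return an int, or None if the text is not a valid integer.
        try:
            return int(text)
        except ValueError:
            return None

    result = {}
    i = 1
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("depth", "nodes", "nps", "time") and i + 1 < len(tokens):
            value = to_int(tokens[i + 1])
            if value is not None:
                result[tok] = value
                i += 2
            else:
                i += 1  # malformed value: skip the keyword and carry on
        elif tok == "score" and i + 2 < len(tokens):
            kind, value = tokens[i + 1], to_int(tokens[i + 2])
            if kind in ("cp", "mate") and value is not None:
                result["score"] = (kind, value)
                i += 3
            else:
                i += 1
        elif tok == "pv":
            # Everything after 'pv' is the principal variation.
            result["pv"] = tokens[i + 1:]
            break
        else:
            # Unknown token (for example 'wdl' or its numbers): ignore it
            # and try to parse the rest of the string.
            i += 1
    return result


if __name__ == "__main__":
    sample = "info depth 20 seldepth 25 score cp 13 wdl 105 790 105 nodes 1000 pv e2e4 e7e5"
    print(parse_info(sample))
```

A parser written this way simply steps over `wdl` and the three numbers that follow it, so an engine that emits the extra field does no harm. A GUI that indexes tokens by fixed position would not be so lucky.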

MikeB · Post by **MikeB** » Sat Jul 04, 2020 10:06 pm

lucasart wrote: ↑Thu Jul 02, 2020 6:25 am
I am appalled by this patch... Really doubt this kind of silly gimmick would have made it when Marco was the maintainer.

An alpha beta search trying to pretend to be a MCTS, for marketing reasons.

I would expect as much from commercial closed source engines, like when Rybka was underreporting its search depth to fool naive users into thinking that it was a quantum leap in quality: we compute much less, but much better. Of course the BS was eventually exposed.

That seems like an overreaction. An open source engine "pretending to be a MCTS, for marketing reasons" - that's a reach, considering it is an open source engine with no marketing department, no revenues, and the data was pulled from millions of games on Fishtest. Many people would find this data interesting - but if you don't, simply keep it turned off - it passed non-regression testing. I applaud the current maintainers for this patch. The venom that can be generated on this forum is venomous. Rybka was done in secret for what I would say were obviously deceptive marketing reasons - to hide its origin. No different than Houdini increasing nodes by 1/7 (nodes = nodes + nodes / 7). This patch was done in the open, it was open for discussion - the science behind it was provided and discussed - totally unlike the two examples you provided. That is why your post is simply venomous - the analogy you created is nonsensical and fake, and its sole purpose was to discredit the patch because you don't like it. Oh my, the sky is falling, this never would have happened if Marco was the maintainer. Please, spare me.

Pio · Post by **Pio** » Sat Jul 04, 2020 11:58 pm

MikeB wrote: ↑Sat Jul 04, 2020 10:06 pm
lucasart wrote: ↑Thu Jul 02, 2020 6:25 am
I am appalled by this patch... Really doubt this kind of silly gimmick would have made it when Marco was the maintainer.

An alpha beta search trying to pretend to be a MCTS, for marketing reasons.

I would expect as much from commercial closed source engines, like when Rybka was underreporting its search depth to fool naive users into thinking that it was a quantum leap in quality: we compute much less, but much better. Of course the BS was eventually exposed.
That seems like an overreaction. An open source engine "pretending to be a MCTS, for marketing reasons" - that's a reach, considering it is an open source engine with no marketing department, no revenues, and the data was pulled from millions of games on Fishtest. Many people would find this data interesting - but if you don't, simply keep it turned off - it passed non-regression testing. I applaud the current maintainers for this patch. The venom that can be generated on this forum is venomous. Rybka was done in secret for what I would say were obviously deceptive marketing reasons - to hide its origin. No different than Houdini increasing nodes by 1/7 (nodes = nodes + nodes / 7). This patch was done in the open, it was open for discussion - the science behind it was provided and discussed - totally unlike the two examples you provided. That is why your post is simply venomous - the analogy you created is nonsensical and fake, and its sole purpose was to discredit the patch because you don't like it. Oh my, the sky is falling, this never would have happened if Marco was the maintainer. Please, spare me.

+1

I guess the reported centipawn score is also for marketing reasons.

Honestly I have always thought that it was strange that the win probability was not the standard. There is no problem using win probabilities in an alpha beta framework (just think opponent’s win probability = 1 - my win probability). Thinking in win probabilities is much more intuitive and can also help when tuning the evaluation, deciding on pruning thresholds... But I guess it is hard for some people to think outside the box, and that is probably why we have so many Stockfish lookalikes even though their authors think they have done something original. I think there are lots of people doing a great job making weak and fun engines, but unfortunately the weak Stockfishes get most of the attention.

MikeB · Post by **MikeB** » Sun Jul 05, 2020 2:31 am

Pio wrote: ↑Sat Jul 04, 2020 11:58 pm <snip>
Honestly I have always thought that it was strange that the win probability was not the standard. There is no problem using win probabilities in an alpha beta framework (just think opponent’s win probability = 1 - my win probability). Thinking in win probabilities is much more intuitive and can also help when tuning the evaluation, deciding on pruning thresholds...

+1
You are absolutely correct. The first known attempt to evaluate the pieces was made by H. M. Taylor in 1876 [4], reported by Coxeter (1940, pp. 162-165 [5]). The value of a piece was taken as proportional to the average number of squares controlled, averaged over all 64 positions of the piece on the board. Later, Claude Shannon, in his 1949 paper "Programming a Computer for Playing Chess", also proposed the {1, 3, 3, 5, 9} point values for {pawn, knight, bishop, rook, queen}. I am not certain who first proposed those piece values, but I believe they were considered common knowledge prior to Shannon's paper. Anyway, piece values were developed as a means not only to teach new players but also to develop the first chess programs. Now in 2020, we have machine learning models that do not even consider the piece values at all but only consider win, draw and loss probabilities based on millions and millions of games played at very fast time controls. I happen to believe scoring probabilities is the more natural way to score an expected outcome of a game - any game, frankly - for a human. That is also one reason why betting is so popular.

Centipawn evaluations typically range from -100 to +100 centipawn. But the expected outcome for a loss is at 100% from -5 to -100 (roughly) and the expected outcome of a win is at 100% from +5 to +100 (roughly). How many times do we (or did we) pay attention to centipawn below -5 and above +5 - we do all the time! But the reality is that the game is already over. Our attention is perhaps better utilized focusing on games where the win or loss is below 100% if we want to learn something from the game. We have all this real estate (190 centipawns) where the outcome is already a foregone conclusion - and we have this small window of 10 centipawns (minus 5 to plus 5) where it is not. I really think that is why many players will prefer centipawn - not that the information will give them some new profound insight - it is simply a much more natural filter to focus on what is important.

Prediction - a hundred years from now, centipawn evaluation output will be obsolete. People in business today are very much wired to probabilities. In business today everything is risk evaluated as what can go right, what can go wrong. Business decisions, investing, insurance decisions, government decisions, medical treatment decisions - they are all based on probability and risk - not on some contrived and abstract centipawn evaluation that is relatively loosely connected to a win probability, the desired outcome. It only came into being because there was nothing better. Now granted - Stockfish is still centipawn-like scoring internally - and the highest rated CP move is still being played - all SF is doing is assigning that centipawn scoring to a percent winning score and then using some historical data that fits to a model based on millions of games to output WDL. The WDL is simply a model - and we all know how well models performed during the recent COVID-19 crisis - sometimes not very well. In some respects we do calculate the scoring percentage in our head anyway e.g., I'm two pawns up, I like my winning chances.

It's human nature for people to be resistant to change. I have seen it all my life. It is probably the single biggest reason why some very successful companies in their day are no longer here - they did not change, or did not change quickly enough. That list is quite long.

syzygy · Post by **syzygy** » Sun Jul 05, 2020 3:20 pm

Pio wrote: ↑Sat Jul 04, 2020 11:58 pm Honestly I have always thought that it was strange that the win probability was not the standard. There is no problem using win probabilities in an alpha beta framework (just think opponent’s win probability = 1 - my win probability). Thinking in win probabilities is much more intuitive and can also help when tuning the evaluation, deciding on pruning thresholds...

Since you're overlooking draws here, it doesn't seem to be that intuitive...
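A small sketch of the point about draws, in Python. The helper names `flip` and `expected_score` and the sample numbers are made up for illustration. The identity "opponent's value = 1 - my value" holds for the expected score W + D/2, but not for the bare win probability once draws are possible.

```python
import math


def flip(wdl):
    """Return the (win, draw, loss) triple from the opponent's point of view."""
    win, draw, loss = wdl
    return (loss, draw, win)


def expected_score(wdl):
    """Expected game score: a win counts 1, a draw 1/2, a loss 0."""
    win, draw, _loss = wdl
    return win + 0.5 * draw


if __name__ == "__main__":
    mine = (0.2, 0.7, 0.1)  # hypothetical probabilities, not engine output
    theirs = flip(mine)

    # The expected score negates cleanly between the two sides...
    print(math.isclose(expected_score(theirs), 1.0 - expected_score(mine)))

    # ...but the opponent's win probability is my loss probability,
    # which differs from 1 - my win probability whenever draws exist.
    print(theirs[0], 1.0 - mine[0])
```

So a single number can be backed up through alpha beta by negation only if it is something like the expected score; a full W/D/L triple needs W and L swapped instead.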

Reporting scores in centipawns has always worked quite fine, corresponds to how all engines used to work (this has changed now) and avoids making false promises that only confuse users.

Leela works very differently from conventional engines and naturally produces W/D/L predictions. For Leela to work with existing GUIs, it is forced to unnaturally convert W/D/L predictions to centipawn scores. That may not be ideal, but it is as it is. I don't think the solution is to require all existing engines now to be modified to produce W/D/L predictions.

syzygy · Post by **syzygy** » Sun Jul 05, 2020 3:23 pm

MikeB wrote: ↑Sun Jul 05, 2020 2:31 am
Pio wrote: ↑Sat Jul 04, 2020 11:58 pm <snip>
Honestly I have always thought that it was strange that the win probability was not the standard. There is no problem using win probabilities in an alpha beta framework (just think opponent’s win probability = 1 - my win probability). Thinking in win probabilities is much more intuitive and can also help when tuning the evaluation, deciding on pruning thresholds...
+1
You are absolutely correct. The first known attempt to evaluate the pieces was made by H. M. Taylor in 1876 [4], reported by Coxeter (1940, pp. 162-165 [5]). The value of a piece was taken as proportional to the average number of squares controlled, averaged over all 64 positions of the piece on the board. Later, Claude Shannon, in his 1949 paper "Programming a Computer for Playing Chess", also proposed the {1, 3, 3, 5, 9} point values for {pawn, knight, bishop, rook, queen}. I am not certain who first proposed those piece values, but I believe they were considered common knowledge prior to Shannon's paper. Anyway, piece values were developed as a means not only to teach new players but also to develop the first chess programs.

Aren't you now giving two counterexamples to the statement that you qualify as "absolutely correct"?

I really think that is why many players will prefer centipawn - not that the information will give them some new profound insight - it is simply a much more natural filter to focus on what is important.

In some respects we do calculate the scoring percentage in our head anyway e.g., I'm two pawns up, I like my winning chances.

Indeed, so we agree on what is more intuitive for humans.

It's human nature for people to be resistant to change. I have seen it all my life. It is probably the single biggest reason why some very successful companies in their day are no longer here - they did not change, or did not change quickly enough. That list is quite long.

It is clear that the NN approach to chess is vastly more promising than many people, including me, had expected. However, SF is not yet bankrupt and even if it were, artificially converting centipawn scores into W/D/L probabilities would do nothing to save it. It would be like Kodak pixelating their photos on film just to look digital.

hgm · Post by **hgm** » Sun Jul 05, 2020 3:51 pm

MikeB wrote: ↑Sun Jul 05, 2020 2:31 am Centipawn evaluations typically range from -100 to +100 centipawn. But the expected outcome for a loss is at 100% from -5 to -100 (roughly) and the expected outcome of a win is at 100% from +5 to +100 (roughly).

I think you are confusing centiPawns with Pawns. When you are 5 Pawns up it is a 100% win. When you are only 100 cP up, it is usually a draw. In most end-games 1 Pawn is not yet a decisive advantage.
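To make the cP-versus-Pawn scale concrete, here is a rough Python sketch of how a centipawn score could be mapped to W/D/L with a logistic curve. The function name `wdl_from_cp` and the parameters `a` and `b` are invented for illustration; they are not the coefficients fitted from Fishtest games, and a real model may also depend on other inputs such as the move number.

```python
import math


def wdl_from_cp(cp, a=200.0, b=95.0):
    """Map a centipawn score to (win, draw, loss) probabilities.

    'a' shifts the point where a win becomes a coin flip and 'b' sets how
    steep the curve is. Both defaults are hypothetical, chosen only to
    illustrate the shape; with a >= 0 the draw term is never negative.
    """
    win = 1.0 / (1.0 + math.exp((a - cp) / b))
    loss = 1.0 / (1.0 + math.exp((a + cp) / b))
    draw = 1.0 - win - loss
    return (win, draw, loss)


if __name__ == "__main__":
    for cp in (0, 100, 500):
        win, draw, loss = wdl_from_cp(cp)
        print(f"{cp:4d} cP  W={win:.3f}  D={draw:.3f}  L={loss:.3f}")
```

With these made-up values, +100 cP is still mostly a draw while +500 cP is close to a certain win, in line with the distinction above. Note also that a score of 0 yields a nonzero win probability for both sides, since a curve like this looks only at the score and not at what is actually on the board.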
