Stockfish has included WDL stats in engine output

Pio · Post by **Pio** » Sun Jul 05, 2020 10:13 pm

syzygy wrote: ↑Sun Jul 05, 2020 3:20 pm
Pio wrote: ↑Sat Jul 04, 2020 11:58 pm Honestly I have always thought that it was strange that the win probability was not the standard. There is no problem using win probabilities in an alpha beta framework (just think opponent’s win probability = 1 - my win probability). Thinking in win probabilities is much more intuitive and can also help when tuning the evaluation, deciding on pruning thresholds...
Since you're overlooking draws here, it doesn't seem to be that intuitive...

Reporting scores in centipawns has always worked quite fine, corresponds to how all engines used to work (this has changed now) and avoids making false promises that only confuse users.

Leela works very differently from conventional engines and naturally produces W/D/L predictions. For Leela to work with existing GUIs, it is forced to unnaturally convert W/D/L predictions to centipawn scores. That may not be ideal, but it is as it is. I don't think the solution is to require all existing engines now to be modified to produce W/D/L predictions.

You are right that it is impossible to return a draw prediction from an alpha beta framework unless it is exact, i.e. you know it is a draw with best play, or it is the score at the end of the PV.

That was not my point however. My point is that it is simple to convert an existing alpha beta engine to work with win probabilities (or should I say win or draw probabilities to satisfy you). Using my way of probabilities in alpha beta has many obvious advantages as I mentioned in my previous post. An additional gain is that you can compress the storage space for the transposition table, since a size of 10 bits should be more than enough and 8 bits might be sufficient. With 8 bits you could let 1 bit indicate whether the score is a special score, and use the remaining 7 bits to hold either the “win or draw” probability, with a granularity of less than 1 %, or the distance to mate.
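As a rough illustration of the 8-bit layout described above, here is a minimal Python sketch. The names (`MATE_FLAG`, `pack_probability`, `pack_mate`, `unpack`) are hypothetical and not taken from any engine, and the sketch assumes the flag bit only marks mate scores; it does not encode which side is mating, which a real table entry would also need.

```python
MATE_FLAG = 0x80     # assumed layout: top bit set means "special score" (mate distance)
PAYLOAD_MASK = 0x7F  # low 7 bits hold either a probability step or a mate distance


def pack_probability(p: float) -> int:
    """Quantise a win-or-draw probability in [0, 1] into 7 bits (flag bit clear)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return round(p * PAYLOAD_MASK)  # 0..127


def pack_mate(distance: int) -> int:
    """Store a distance to mate (0..127) with the flag bit set."""
    if not 0 <= distance <= PAYLOAD_MASK:
        raise ValueError("mate distance does not fit in 7 bits")
    return MATE_FLAG | distance


def unpack(byte: int):
    """Return ("mate", distance) or ("prob", probability) for one stored byte."""
    payload = byte & PAYLOAD_MASK
    if byte & MATE_FLAG:
        return ("mate", payload)
    return ("prob", payload / PAYLOAD_MASK)


kind, value = unpack(pack_probability(0.5))  # probability rounded to the nearest 1/127 step
```

With 7 bits the probability step is 1/127, roughly 0.79 %, which is in line with the "less than 1 %" granularity mentioned in the post.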

Even though the draw information cannot be reported correctly, you could get an estimate by returning the W/D/L information from the end of the PV if you choose to use the draw probability within the engine. Even though the draw probability by itself will not be passed up in the alpha beta framework, it could still be used to report an estimate to the GUI, and it could also be used within the engine as a way of controlling how much you want to go for a win (contempt is the word, I think). For example, if you meet a weaker engine, you could count draws for less, gambling on more uncertain lines in the hope of a win. The same goes if you desperately need a win or draw in the last game of a tournament.
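One way the contempt idea above could look in code, as a sketch under stated assumptions: the search backs up a single scalar `win + draw_weight * draw`, the draw weight flips to `1 - draw_weight` at each ply so that the opponent's score stays equal to one minus mine, and the W/D/L triple from the end of the PV is carried along only for reporting. The `WDL` class, the toy tree of nested lists and all numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WDL:
    """Leaf estimate from the point of view of the side to move at that leaf."""
    win: float
    draw: float
    loss: float

    def flipped(self) -> "WDL":
        # Same estimate seen from the other side.
        return WDL(self.loss, self.draw, self.win)


def scalar(wdl: WDL, draw_weight: float) -> float:
    # draw_weight = 0.5 is neutral; lower values count draws for less.
    return wdl.win + draw_weight * wdl.draw


def negamax(node, alpha: float, beta: float, draw_weight: float):
    """Return (score, wdl_at_end_of_pv), both for the side to move at `node`."""
    if isinstance(node, WDL):
        return scalar(node, draw_weight), node
    best_score, best_wdl = -1.0, None
    for child in node:
        # The opponent's window and draw weight are the mirror images of ours.
        child_score, child_wdl = negamax(
            child, 1.0 - beta, 1.0 - alpha, 1.0 - draw_weight
        )
        score = 1.0 - child_score
        if score > best_score:
            best_score, best_wdl = score, child_wdl.flipped()
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # usual alpha-beta cutoff
    return best_score, best_wdl


# Two root moves; each leaf is given from the opponent's point of view.
drawish = WDL(win=0.05, draw=0.90, loss=0.05)
sharp = WDL(win=0.40, draw=0.25, loss=0.35)
tree = [drawish, sharp]

neutral_score, neutral_wdl = negamax(tree, 0.0, 1.0, draw_weight=0.5)
risky_score, risky_wdl = negamax(tree, 0.0, 1.0, draw_weight=0.2)
```

In this toy tree the neutral weight picks the drawish line, while counting draws for less switches the choice to the sharper line, even though that line is objectively slightly worse.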

On a side note, I have thought that it would be really nice to have a different metric for endgame table bases. Would it not be nice to have an EGTB that for each position gives one of the moves that minimises the proof tree of the position? I understand it will take more space, but it could be very useful for people wanting to train a neural network on endgame positions as well as for people wanting natural play from the engine in the endgame. Just an idea (since I am not able to do it by myself without a lot of work).

syzygy · Post by **syzygy** » Sun Jul 05, 2020 10:47 pm

Pio wrote: ↑Sun Jul 05, 2020 10:13 pm My point is that it is simple to convert an existing alpha beta engine to work with win probabilities (or should I say win or draw probabilities to satisfy you). Using my way of probabilities in alpha beta has many obvious advantages as I mentioned in my previous post. An additional gain is that you can compress the storage space for the transposition table, since a size of 10 bits should be more than enough and 8 bits might be sufficient. With 8 bits you could let 1 bit indicate whether the score is a special score, and use the remaining 7 bits to hold either the “win or draw” probability, with a granularity of less than 1 %, or the distance to mate.

Feel free to prepare a patch for Stockfish. I don't expect your patch will pass, but perhaps I am wrong.

But I hope you do realise that you will have to rewrite Stockfish's evaluation almost completely. You can't just convert SF's current (usually additive) scoring components into probabilities or "probability components". (And this thread has "Stockfish" in the title.)

Or are you only proposing to map SF's evaluation onto some kind of logarithmic scale to save a few bits in the transposition table? That would not have much to do with probabilities, I would think. But again, if it makes SF stronger, why not.

On a side note, I have thought that it would be really nice to have a different metric for endgame table bases. Would it not be nice to have an EGTB that for each position gives one of the moves that minimises the proof tree of the position? I understand it will take more space, but it could be very useful for people wanting to train a neural network on endgame positions as well as for people wanting natural play from the engine in the endgame. Just an idea (since I am not able to do it by myself without a lot of work).

It is very easy to come up with "you should think outside the box!" ideas if you don't intend to implement them yourself. No need to think about practicalities, like, could it ever work at all, does it even make sense.

MikeB · Post by **MikeB** » Sun Jul 05, 2020 11:56 pm

hgm wrote: ↑Sun Jul 05, 2020 3:51 pm
MikeB wrote: ↑Sun Jul 05, 2020 2:31 am Centipawn evaluations typically range from -100 to +100 centipawn, but the expected outcome for a loss is at 100% from -5 to -100 (roughly) and the expected outcome of a win is at 100% from +5 to +100 (roughly).
I think you are confusing centiPawns with Pawns. When you are 5 Pawns up it is a 100% win. When you are only 100 cP up, it is usually a draw. In most endgames 1 Pawn is not yet a decisive advantage.

Correct.

Pio · Post by **Pio** » Mon Jul 06, 2020 12:52 am

syzygy wrote: ↑Sun Jul 05, 2020 10:47 pm
Pio wrote: ↑Sun Jul 05, 2020 10:13 pm My point is that it is simple to convert an existing alpha beta engine to work with win probabilities (or should I say win or draw probabilities to satisfy you). Using my way of probabilities in alpha beta has many obvious advantages as I mentioned in my previous post. An additional gain is that you can compress the storage space for the transposition table, since a size of 10 bits should be more than enough and 8 bits might be sufficient. With 8 bits you could let 1 bit indicate whether the score is a special score, and use the remaining 7 bits to hold either the “win or draw” probability, with a granularity of less than 1 %, or the distance to mate.
Feel free to prepare a patch for Stockfish. I don't expect your patch will pass, but perhaps I am wrong.

But I hope you do realise that you will have to rewrite Stockfish's evaluation almost completely. You can't just convert SF's current (usually additive) scoring components into probabilities or "probability components". (And this thread has "Stockfish" in the title.)

Or are you only proposing to map SF's evaluation onto some kind of logarithmic scale to save a few bits in the transposition table? That would not have much to do with probabilities, I would think. But again, if it makes SF stronger, why not.

On a side note, I have thought that it would be really nice to have a different metric for endgame table bases. Would it not be nice to have an EGTB that for each position gives one of the moves that minimises the proof tree of the position? I understand it will take more space, but it could be very useful for people wanting to train a neural network on endgame positions as well as for people wanting natural play from the engine in the endgame. Just an idea (since I am not able to do it by myself without a lot of work).
It is very easy to come up with "you should think outside the box!" ideas if you don't intend to implement them yourself. No need to think about practicalities, like, could it ever work at all, does it even make sense.

As you have seen, I mostly like to have far-fetched ideas and not to implement them. I guess that is mostly because I am not a very skilled programmer, and I think it is more fun to play games with my daughter than to program. I have implemented some of my ideas though (like a very fast and simple matrix inversion algorithm based on block matrices that I invented before I had even heard about it). It just takes some time.

I have lots of ideas in many different areas of physics and mathematics and I am also interested in biology. Optimisation and ANNs are also very interesting. The problem with me is that I enjoy the thinking more than the doing.

At least I don’t need to make others feel bad to feel better about myself. I know my limitations and know what I am good at. I also don’t have the need to point out when others are wrong in mathematics for example.

In this forum I enjoy reading HGM’s posts because I can learn from him. I also enjoyed Miguel’s, Daniel’s, Don’s and Heiner’s posts when they were active, since they all had really nice ideas that were very similar to mine.

syzygy · Post by **syzygy** » Mon Jul 06, 2020 9:55 am

Pio wrote: ↑Mon Jul 06, 2020 12:52 am At least I don’t need to make others feel bad to feel better about myself. I know my limitations and know what I am good at. I also don’t have the need to point out when others are wrong in mathematics for example.

You may not have liked my response, but what you wrote is (my emphasis added):

Honestly I have always thought that it was strange that the win probability was not the standard. There is no problem using win probabilities in an alpha beta framework (just think opponent’s win probability = 1 - my win probability). Thinking in win probabilities is much more intuitive and can also help when tuning the evaluation, deciding on pruning thresholds... But I guess it is hard for some people to think outside the box and that is probably why we have so many Stockfish lookalikes even though they think they have done something original. I think there are lots of people doing a great job making weak and fun engines but unfortunately the weak stockfishes get most of the attention.

You did seem to be casting judgment on others here. Others who find it "hard to think outside the box" but have actually implemented something themselves.

If you have good ideas but no time or ability to implement them yourself, it could still be interesting to hear about those ideas if they are sufficiently developed so that perhaps somebody else can do something with them. But "just use probabilities in your alpha-beta chess engine" does not get close to that stage.

My point is that it is simple to convert an existing alpha beta engine to work with win probabilities (or should I say win or draw probabilities to satisfy you). Using my way of probabilities in alpha beta has many obvious advantages as I mentioned in my previous post.

No, it is not simple. If it is simple to you, then please explain it to us. You are talking about "[your] way of probabilities in alpha beta" that has "many obvious advantages" but you have not explained your way at all.

How are you going to express knight on A1 as a probability and combine it with other features? Linear scoring is unlikely to give the best possible evaluation function (as shown by the NN engines), but it is simple and intuitive to work with for human programmers.

Milos · Post by **Milos** » Mon Jul 06, 2020 2:01 pm

syzygy wrote: ↑Sun Jul 05, 2020 10:47 pm.
But I hope you do realise that you will have to rewrite Stockfish's evaluation almost completely. You can't just convert SF's current (usually additive) scoring components into probabilities or "probability components". (And this thread has "Stockfish" in the title.)

No need, NN eval for SF is already there, and at fixed nodes it is already noticeably stronger than the original SF eval.

odomobo · Post by **odomobo** » Mon Jul 06, 2020 5:48 pm

Milos wrote: ↑Mon Jul 06, 2020 2:01 pm
syzygy wrote: ↑Sun Jul 05, 2020 10:47 pm.
But I hope you do realise that you will have to rewrite Stockfish's evaluation almost completely. You can't just convert SF's current (usually additive) scoring components into probabilities or "probability components". (And this thread has "Stockfish" in the title.)
No need, NN eval for SF is already there, and at fixed nodes it is already noticeably stronger than the original SF eval.

But the NN eval has to be something like 1000x slower, right? This doesn't seem like a good comparison to me, because one of Stockfish's features is its fast evaluation.

Milos · Post by **Milos** » Mon Jul 06, 2020 5:51 pm

odomobo wrote: ↑Mon Jul 06, 2020 5:48 pm
Milos wrote: ↑Mon Jul 06, 2020 2:01 pm
syzygy wrote: ↑Sun Jul 05, 2020 10:47 pm.
But I hope you do realise that you will have to rewrite Stockfish's evaluation almost completely. You can't just convert SF's current (usually additive) scoring components into probabilities or "probability components". (And this thread has "Stockfish" in the title.)
No need, NN eval for SF is already there, and at fixed nodes it is already noticeably stronger than the original SF eval.
But the NN eval has to be something like 1000x slower, right? This doesn't seem like a good comparison to me, because one of Stockfish's features is its fast evaluation.

NN eval is only twice as slow. There is plenty of information on this forum, but people seem to be quite uninformed.
Hint: try searching for SF-NNUE.

odomobo · Post by **odomobo** » Mon Jul 06, 2020 5:55 pm

Ah, that's not so bad then

Alayan · Post by **Alayan** » Mon Jul 06, 2020 6:31 pm

Pio wrote: ↑Sun Jul 05, 2020 10:13 pm That was not my point however. My point is that it is simple to convert an existing alpha beta engine to work with win probabilities (or should I say win or draw probabilities to satisfy you). Using my way of probabilities in alpha beta has many obvious advantages as I mentioned in my previous post. An additional gain is that you can compress the storage space for the transposition table, since a size of 10 bits should be more than enough and 8 bits might be sufficient. With 8 bits you could let 1 bit indicate whether the score is a special score, and use the remaining 7 bits to hold either the “win or draw” probability, with a granularity of less than 1 %, or the distance to mate.

The main point of evaluation is to produce position ordering. If position A has a better eval than position B, prefer position A.

The secondary point of evaluation is to guide search. Some feature might not be worth the weight it is assigned, and would be incorrect in a leaf node that's backed up to the root, but it will push search to consider positions with this feature more, and descendant leaf nodes where the feature is no longer present will tell whether there was actually something there or not.

Internal winpct gives no advantage over standard internal units in either case. Increased granularity close to 0.00 isn't an advantage if your evaluation is too inaccurate anyway to take meaningful advantage of it. Meanwhile, you just made serial computation of the position's evaluation a total headache, as you can't just add winpct the way one can add cp. If you use conversion functions to go back and forth from an additive model (or something equivalent), then you're just wasting a lot of energy on useless computations. If you don't go with a linear model, there is nothing "simple" about converting an existing engine. It will be very complex, which also means hard to tune and improve, and you'll lose Elo because even if the model allows good enough values to stay on par with the linear one, you won't find them.
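To make the additivity point concrete, here is a small sketch. It assumes a plain logistic mapping with an arbitrary 400 cp scale (`SCALE_CP` is a made-up constant, not Stockfish's actual WDL model) and three hypothetical feature bonuses of 300 cp each.

```python
import math

SCALE_CP = 400.0  # assumed scale for illustration only


def cp_to_winpct(cp: float) -> float:
    """Map a centipawn score to an expected score in (0, 1) with a logistic curve."""
    return 1.0 / (1.0 + 10.0 ** (-cp / SCALE_CP))


def winpct_to_cp(p: float) -> float:
    """Inverse of cp_to_winpct, valid for 0 < p < 1."""
    return -SCALE_CP * math.log10(1.0 / p - 1.0)


# Hypothetical feature bonuses: adding them in cp is trivial.
feature_terms_cp = [300.0, 300.0, 300.0]
additive = cp_to_winpct(sum(feature_terms_cp))

# Naively adding each feature's probability gain instead overshoots past 1.0.
naive = 0.5 + sum(cp_to_winpct(t) - 0.5 for t in feature_terms_cp)

# The mapping is monotonic, so it leaves the ordering of positions unchanged.
evals_cp = [40.0, -120.0, 310.0, 15.0]
same_order = sorted(evals_cp) == sorted(evals_cp, key=cp_to_winpct)
```

Here `additive` stays below 1.0 while `naive` exceeds it, so the per-feature terms would have to be combined through `winpct_to_cp` and back, which is exactly the round trip described above as wasted computation. `same_order` being true reflects that the conversion changes nothing for position ordering.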

Besides, centipawn output makes no false promise. It gives an estimated advantage, but if someone with a clue makes the mental effort to think of it in winning probabilities, contextual information will be used - type of position, engine depth, eval trends...

Meanwhile, with raw WDL output, many people make the gross mistake of forgetting context. The actual WDL values will be off for almost all situations. They are tuned for training conditions, but even so, they are only a guess. WDL may look more serious, but if the engine is missing something important, its numbers will be way off. And for any games played in different conditions, the WDL predictions are not applicable. If Stockfish proclaims +2 in a human blitz game position that White goes on to lose, the interpretation "White blundered a +2 position" still stands. If Stockfish were to proclaim 95% win instead, the interpretation "White blundered a 95% win position" would be completely wrong because, in the context of that game, there never was a 95% win position for White.

Milos wrote: ↑Mon Jul 06, 2020 2:01 pm
syzygy wrote: ↑Sun Jul 05, 2020 10:47 pm.
But I hope you do realise that you will have to rewrite Stockfish's evaluation almost completely. You can't just convert SF's current (usually additive) scoring components into probabilities or "probability components". (And this thread has "Stockfish" in the title.)
No need, NN eval for SF is already there, and at fixed nodes it is already noticeably stronger than the original SF eval.

That's completely irrelevant to the discussion at hand. The NN takes input features, then outputs an eval in SF-internal units. It doesn't need winpct to work at all, because winpct gives nothing for position ordering and search exploration - and to perform well with SF's search without heavy modifications, the eval needs similarities with SF's original eval.

Converting SF's hand-written eval to use winpct instead of cp is what was suggested, and it's unfeasible without severe Elo loss.
