New Stockfish 15.1 evaluations

jkominek · Post by **jkominek** » Fri Feb 03, 2023 7:02 am

Uri Blass wrote: ↑Thu Feb 02, 2023 3:28 pm ... I do not know based on what time control and what hardware +1 means expected result of 0.75 (I guess it is not exactly about probability to win because it may be something like 50.1% for a win 49.8% for a draw and 0.1% for a loss).
Plutie wrote: ↑Thu Feb 02, 2023 3:45 pm +1.00 is equal to a 50% win chance at move 32, fitted to fishtest LTC data (60s+0.6s @ 1.328m nps)
CornfedForever wrote: ↑Thu Feb 02, 2023 6:43 pm Right, you beat me to that. I (may be wrong and apologize if so) think I remember (can't find the post) Larry K saying a similar thing had been done to Dragon. He may well have been referring to something else though.
Milton wrote: ↑Fri Feb 03, 2023 5:36 am So if an evaluation of "1" means a 50% chance of a win, would this be equivalent to an evaluation of "0" (i.e. no advantage to either side) under the previous scheme?

The latest Stockfish WDL model, hot off the press, predicts a win rate of 49.7%, a draw rate of 50.3%, and no losses for a normalized evaluation of +1 pawn. That works out to an expected score of 74.85% of the available points. (Expected score is what my mind usually runs towards given the wording "win rate"; the terminology can be confusing.)
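
To make the distinction concrete: the 74.85% figure is just the win rate plus half the draw rate. A minimal Python sketch of that arithmetic follows; the function name expected_score is made up for illustration and is not anything from the Stockfish source.

```python
def expected_score(win: float, draw: float, loss: float) -> float:
    """Expected points per game, scoring a win as 1, a draw as 0.5, a loss as 0."""
    total = win + draw + loss
    if abs(total - 1.0) > 1e-6:
        raise ValueError("win, draw and loss rates must sum to 1")
    return win + 0.5 * draw


# Rates quoted above for a normalized evaluation of +1.00 pawn.
score = expected_score(0.497, 0.503, 0.0)
print(f"{score:.4f}")  # prints 0.7485
```

So a "50% win chance" and a "75% expected score" describe the same +1.00 evaluation, as long as losses are negligible.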

Eval   Pawns    Win%   Draw%   Loss%  Points
   0    0.00     0.3    99.4     0.3   50.00
  98    0.25     1.3    98.6     0.1   50.60
 197    0.50     5.4    94.6     0.0   52.70
 295    0.75    19.1    80.9     0.0   59.55
 394    1.00    49.7    50.3     0.0   74.85
 453    1.15    70.0    30.0     0.0   85.00
 591    1.50    94.5     5.5     0.0   97.25
 788    2.00    99.7     0.3     0.0   99.85
 985    2.50   100.0     0.0     0.0  100.00

This table is for the anchor point of ply = 64 (move 32). The model is actually a symmetric 2D function, wdl: (eval, ply) -> win_rate, with loss_rate = wdl(-eval, ply); the draw rate is whatever is left over so that the three sum to one. Eval is Stockfish's internal evaluation value.
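
Here is a rough Python sketch of that symmetry at the ply = 64 anchor only. It assumes the win rate is a logistic curve in the internal eval, and the two parameters A and B are eyeballed by me from the table above. They are not the coefficients in the Stockfish source, and the ply dependence is left out entirely.

```python
import math

# Hypothetical parameters for ply = 64, back-fitted by eye from the table.
# They are NOT Stockfish's actual coefficients.
A = 394.0  # internal eval at which the win rate is about 50%
B = 68.0   # width of the logistic curve, in internal eval units


def win_rate(internal_eval: float, a: float = A, b: float = B) -> float:
    """Assumed logistic win rate for the side to which the eval refers."""
    return 1.0 / (1.0 + math.exp((a - internal_eval) / b))


def wdl(internal_eval: float) -> tuple[float, float, float]:
    """Return (win, draw, loss); the loss rate is the win rate of the negated eval."""
    w = win_rate(internal_eval)
    l = win_rate(-internal_eval)
    d = 1.0 - w - l  # the draw rate is whatever is left over
    return w, d, l


for internal_eval in (0, 197, 394, 788):
    w, d, l = wdl(internal_eval)
    print(f"{internal_eval:4d}  W={100 * w:5.1f}  D={100 * d:5.1f}  L={100 * l:5.1f}")
```

With these guessed parameters the curve lands within about a percentage point of the table rows, which is all the sketch is meant to show.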

The win/draw line referred to by Larry Kaufman is defined as the boundary where win_rate = draw_rate.

Uri Blass · Post by **Uri Blass** » Fri Feb 03, 2023 2:47 pm

jkominek wrote: ↑Fri Feb 03, 2023 7:02 am ...

I see only percentages of wins, draws, and losses.
Is there information on the exact number of games rather than percentages?

0.1% may mean 10 out of 10000 or 100 out of 100000, and I have no idea how many games with that exact evaluation the numbers are based on.

I also read the following in a previous post:
"+1.00 is equal to a 50% win chance at move 32, fitted to fishtest LTC data (60s+0.6s @ 1.328m nps)"

I am not sure whether this means that +1.00 at a move number other than 32 implies different probabilities.

I would like to have more exact data.
For example:
How many games do you have with an evaluation of 2.50, for which you claim 100% wins, and is it really 100% or maybe something like 9997 wins and 3 draws out of 10000?
How many games do you have with an evaluation of 2.49, and how many of them are wins, draws, and losses?

jkominek · Post by **jkominek** » Fri Feb 03, 2023 11:02 pm

Uri Blass wrote: ↑Fri Feb 03, 2023 2:47 pm I see only percentages of wins, draws, and losses.

Yes, that's right. A WDL model is always going to be constructed in terms of probabilities. The percentages reflect that.

Uri Blass wrote: ↑Fri Feb 03, 2023 2:47 pm Is there information on the exact number of games rather than percentages?

0.1% may mean 10 out of 10000 or 100 out of 100000, and I have no idea how many games with that exact evaluation the numbers are based on.

You mean in terms of the raw data used as the basis for curve fitting? According to the commit message it was trained on 400 million positions. Taking a guess at average game length, that equates to around 35-40 million games. To know how many data points contribute to each grid cell in the (eval, ply) domain you'd have to get into the data in detail (which I don't have).
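
The reason the counts matter can be shown with a standard Wilson score interval: the same observed percentage is much less certain when it comes from fewer games. The sketch below is generic statistics, not something taken from the Stockfish fitting scripts, and the game counts are the hypothetical ones from the question.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical counts: the same 0.1% loss rate from two sample sizes,
# and a "100%" win rate that was observed in only 10000 games.
for successes, n in ((10, 10_000), (100, 100_000), (10_000, 10_000)):
    low, high = wilson_interval(successes, n)
    print(f"{successes}/{n}: {100 * low:.3f}% to {100 * high:.3f}%")
```

Even a perfect 10000 out of 10000 leaves room for a true win rate slightly below 100%, which is the point being raised.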

In https://github.com/official-stockfish/S ... /pull/4373 the Stockfish developer vondele (a real name? I'm not sure) posts six plots depicting the results of his work. In the four 2D contour plots one can compare the contour lines of the raw data (right side, top pair) with the smoothed curves (bottom pair).

Uri Blass wrote: ↑Fri Feb 03, 2023 2:47 pm I also read the following in a previous post:
"+1.00 is equal to a 50% win chance at move 32, fitted to fishtest LTC data (60s+0.6s @ 1.328m nps)"

I am not sure whether this means that +1.00 at a move number other than 32 implies different probabilities.

Yes, at plies other than 64 (that is, move 32) the model predicts different probabilities. The differences are small, though. In the 2D plots I mentioned, this shows up in how far the model contours deviate from vertical, since the y-axis is the ply number.

Uri Blass wrote: ↑Fri Feb 03, 2023 2:47 pm I would like to have more exact data.
For example:
How many games do you have with an evaluation of 2.50, for which you claim 100% wins, and is it really 100% or maybe something like 9997 wins and 3 draws out of 10000?
How many games do you have with an evaluation of 2.49, and how many of them are wins, draws, and losses?

For that level of detail I suppose your best course of action is to hop onto the Stockfish Discord and ask vondele himself.
