Stockfish 16 evals
Moderator: Ras
-
- Posts: 6259
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
- Full name: Larry Kaufman
Stockfish 16 evals
I noticed something odd about Stockfish 16 evals. 1.00 is supposed to be the point where White should win 50% of the games. But when I check many opening positions near that value, it seems that about 0.93 is the eval that gives a 50% White win probability, whereas 1.00 gives about 58% win probability to 42% draw probability. Does this mean that the calibration is wrong, or that it is aimed more at the middlegame and the opening is not typical?
Komodo rules!
-
- Posts: 2128
- Joined: Wed Jul 13, 2011 9:04 pm
- Location: Madrid, Spain.
Re: Stockfish 16 evals.
Hello Larry:
This calibration is at move 32 (64 plies), not at the opening. If you take a look at the following links, you will see some graphics, one of them with 'at move 32' in the title, which I embed here (the top leftmost graphic):
https://github.com/vondele/WLD_model
https://github.com/official-stockfish/S ... evaluation

The bottom leftmost graphic also has some formulas fitted with move 32 as the basis (note all the x/32, where the x-axis is the number of moves).
------------
I found the patch: it was between SF 15 and SF 15.1 (list of commits here), on November 5th, 2022:
Normalize evaluation
https://github.com/official-stockfish/S ... 09adddR213
src/uci.cpp wrote:
[...]
// Enforce that NormalizeToPawnValue corresponds to a 50% win rate at ply 64
[...]
Regards from Spain.
Ajedrecista.
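For readers who want to see how a normalization pinned to move 32 can produce the kind of numbers reported above, here is a minimal Python sketch. It assumes a logistic win-rate curve whose parameters a and b depend on move/32, which is how the linked WLD_model page appears to fit its curves (note the x/32 terms). The coefficients, the linear shape of a(m), and the names `win_probability` and `NORMALIZE_TO_PAWN` are illustrative assumptions chosen to mimic the figures in this thread; they are not Stockfish's actual values.

```python
import math

# Illustrative coefficients only (NOT Stockfish's real ones), highest power first.
A_COEFFS = (0.0, 0.0, 21.0, 307.0)  # a(m) = 21*m + 307, in internal units
B_COEFFS = (0.0, 0.0, 0.0, 63.0)    # b(m) = 63, in internal units


def poly(coeffs, m):
    """Evaluate a polynomial in m with Horner's rule."""
    result = 0.0
    for c in coeffs:
        result = result * m + c
    return result


# Internal score that is shown as 1.00: the value of a at move 32 (m = 1).
NORMALIZE_TO_PAWN = poly(A_COEFFS, 1.0)


def win_probability(uci_pawns, move):
    """Modelled win probability for a displayed score at a given move number."""
    m = move / 32.0
    a = poly(A_COEFFS, m)  # internal score giving a 50% win rate at this move
    b = poly(B_COEFFS, m)  # controls how steep the curve is at this move
    internal = uci_pawns * NORMALIZE_TO_PAWN
    return 1.0 / (1.0 + math.exp((a - internal) / b))


print(win_probability(1.00, 32))  # exactly 0.5 by construction
print(win_probability(1.00, 0))   # roughly 0.58 with these made-up coefficients
```

With these made-up numbers, 1.00 means 50% only at move 32; earlier in the game the same displayed score maps to a higher win probability, because the 50% point a(m) is lower there.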
-
- Posts: 6259
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
- Full name: Larry Kaufman
Re: Stockfish 16 evals
OK, that explains the discrepancy, but shouldn't an eval of 1.00 have a fairly constant win prob. throughout the game? Dropping from 58% in the opening to 50% at move 32 seems like a huge disparity which would affect play adversely; Stockfish would make wrong trading decisions due to this.
Komodo rules!
-
- Posts: 643
- Joined: Fri Mar 30, 2018 7:20 am
- Full name: Andreas Matthies
Re: Stockfish 16 evals
From my understanding there is just a constant factor used to scale the internal score to a UCI output score, and this factor is calculated by looking at many recent LTC games at fishtest with the goal of getting "score 1 at move 32 == win probability 50%". So this has no influence at all on moves and decisions to trade pieces.
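A small Python sketch of why a constant scale factor cannot change which move is chosen: dividing every internal score by the same positive number preserves their order. The factor 328 and the candidate moves with their internal scores are hypothetical values made up for the illustration.

```python
# Hypothetical normalization constant (internal units per displayed pawn).
SCALE = 328.0


def to_uci_pawns(internal_score):
    """Convert an internal score to the displayed score with one constant factor."""
    return internal_score / SCALE


# Hypothetical internal scores for three candidate moves.
candidates = {"Nf3": 95, "d4": 110, "Bxc6": 40}

# Best move according to the internal scores.
best_internal = max(candidates, key=candidates.get)

# Best move according to the scaled (displayed) scores.
best_uci = max(candidates, key=lambda mv: to_uci_pawns(candidates[mv]))

# The full ranking is also unchanged by the scaling.
rank_internal = sorted(candidates, key=candidates.get)
rank_uci = sorted(candidates, key=lambda mv: to_uci_pawns(candidates[mv]))

print(best_internal, best_uci)
```

The search only ever compares internal scores with each other, so under this assumption the scaling affects what is printed, not what is played.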
-
- Posts: 2584
- Joined: Mon Feb 08, 2016 12:43 am
- Full name: Brendan J Norman
Re: Stockfish 16 evals
I don't know about anything technical here, but I do know that SF has always given exaggerated evals of +1 or more, especially in opening positions where others like Dragon or Lc0 simply say +0.25 or something.
I just ignore it as if it's a person, and say "Okay then, SF likes this position for white" and try not to treat it like an all-knowing God.
-
- Posts: 6259
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
- Full name: Larry Kaufman
Re: Stockfish 16 evals
RubiChess wrote: ↑Sat Jul 29, 2023 6:55 am
From my understanding there is just a constant factor used to scale the internal score to a UCI output score, and this factor is calculated by looking at many recent LTC games at fishtest with the goal of getting "score 1 at move 32 == win probability 50%". So this has no influence at all on moves and decisions to trade pieces.

Right, but this doesn't address the question of why a position with a win prob. of 50% has a significantly different internal score at move 32 than at move 1. That should cause poor decisions.
Komodo rules!
-
- Posts: 6259
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
- Full name: Larry Kaufman
Re: Stockfish 16 evals
BrendanJNorman wrote: ↑Sat Jul 29, 2023 8:54 am
I don't know about anything technical here, but I do know that SF has always given exaggerated evals of +1 or more, especially in opening positions where others like Dragon or Lc0 simply say +0.25 or something.
I just ignore it as if it's a person, and say "Okay then, SF likes this position for white" and try not to treat it like an all-knowing God.

I think you are conflating SF evals before the scaling (SF 15.1 I believe), when they were absurdly high, and after, when they had a clear meaning of 1 = 50% win prob. (though at move 32, so not quite right at move 1). SF (or Lc0 or Torch or Dragon) evals are not quite at the accuracy of God, but they are (usually) close to it; if the SF eval at move 32 after a deep search is +1.5 or more, it is probably winning, and if it is 0.5 or less, it is probably a draw with perfect play.
Komodo rules!
-
- Posts: 643
- Joined: Fri Mar 30, 2018 7:20 am
- Full name: Andreas Matthies
Re: Stockfish 16 evals
RubiChess wrote: ↑Sat Jul 29, 2023 6:55 am
From my understanding there is just a constant factor used to scale the internal score to a UCI output score, and this factor is calculated by looking at many recent LTC games at fishtest with the goal of getting "score 1 at move 32 == win probability 50%". So this has no influence at all on moves and decisions to trade pieces.
lkaufman wrote: ↑Sat Jul 29, 2023 7:02 pm
Right, but this doesn't address the question of why a position with a win prob. of 50% has a significantly different internal score at move 32 than at move 1. That should cause poor decisions.

Well, one could argue that every score of a position different from 0 or mate in x is wrong by definition. And the internal score of a position is not there to express exact probabilities of winning (which even heavily depend on the opponent); it is just a tool that helps to find the best move.
-
- Posts: 6259
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
- Full name: Larry Kaufman
Re: Stockfish 16 evals
RubiChess wrote: ↑Sat Jul 29, 2023 6:55 am
From my understanding there is just a constant factor used to scale the internal score to a UCI output score, and this factor is calculated by looking at many recent LTC games at fishtest with the goal of getting "score 1 at move 32 == win probability 50%". So this has no influence at all on moves and decisions to trade pieces.
lkaufman wrote: ↑Sat Jul 29, 2023 7:02 pm
Right, but this doesn't address the question of why a position with a win prob. of 50% has a significantly different internal score at move 32 than at move 1. That should cause poor decisions.
RubiChess wrote: ↑Sat Jul 29, 2023 10:10 pm
Well, one could argue that every score of a position different from 0 or mate in x is wrong by definition. And the internal score of a position is not there to express exact probabilities of winning (which even heavily depend on the opponent); it is just a tool that helps to find the best move.

If the engine "believes" that position x has a higher expected score than position y, it should give a higher eval to position x. It has nothing to do with the "truth" or the opponent; the engine should be trying to improve its expected score according to its own "beliefs".
Komodo rules!
-
- Posts: 643
- Joined: Fri Mar 30, 2018 7:20 am
- Full name: Andreas Matthies
Re: Stockfish 16 evals
That's what hopefully every A/B engine does including Stockfish.
I didn't say that the score of a position depends on the opponent. What I said is that the probability of winning the game depends on the opponent. And this means that it is difficult to impossible to find a correct and general model for a score-to-WDL conversion.
SF chooses the definition "a win probability of 50% at ply 64 and against an opponent of (almost) the same strength should be represented by a UCI score of 1". So it fixes two parameters: the ply and the opponent's strength.
lkaufman wrote: ↑Fri Jul 28, 2023 11:04 pm
OK, that explains the discrepancy, but shouldn't an eval of 1.00 have a fairly constant win prob. throughout the game? Dropping from 58% in the opening to 50% at move 32 seems like a huge disparity which would affect play adversely; Stockfish would make wrong trading decisions due to this.

Feel free to train a net that gives a perfect 1.0 score for 50% win probability at every ply and game phase. I'm sure this would improve evaluation and game play in general. But it is obviously not so easy.
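To make the "fixed at ply 64" definition concrete, here is a hedged Python sketch of the inverse question: which displayed score corresponds to a 50% win probability at other points in the game? It assumes a model in which the internal 50% point a(m) varies with m = move/32 while the display factor is frozen at a(1). The linear a(m) and its coefficients are invented for illustration (picked to land near the 0.93 mentioned at the top of the thread) and are not taken from Stockfish.

```python
def fifty_percent_internal(move):
    """Hypothetical internal score giving a 50% win rate at this move number."""
    m = move / 32.0
    return 21.0 * m + 307.0  # illustrative linear fit, not Stockfish's


# The display factor is fixed once, at move 32 (ply 64).
NORMALIZE_TO_PAWN = fifty_percent_internal(32)


def fifty_percent_uci(move):
    """Displayed score that corresponds to a 50% win probability at this move."""
    return fifty_percent_internal(move) / NORMALIZE_TO_PAWN


for move in (0, 16, 32, 48):
    print(move, round(fifty_percent_uci(move), 3))
```

Under these assumptions the displayed 50% point is exactly 1.00 only at move 32 and sits a little below 1.00 in the opening, which is the pattern Larry describes.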