Stockfish 16 evals

lkaufman
Posts: 6259
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Stockfish 16 evals

Post by lkaufman »

I noticed something odd about Stockfish 16 evals. 1.00 is supposed to be the point where White should win 50% of the games. But when I check many opening positions near that borderline, it seems that about 0.93 is the eval that gives a 50% White win prob., whereas 1.00 corresponds to about 58% wins and 42% draws. Does this mean that the calibration is wrong, or that it is aimed more at the middlegame and the opening is not typical?
Komodo rules!
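
The two data points in the post above (0.93 for 50%, 1.00 for 58%) are enough for a rough back-of-the-envelope check. The sketch below assumes, purely for illustration, that the win probability follows a logistic curve in the displayed eval; the names `scale` and `win_prob` are made up here and do not come from Stockfish.

```python
import math

# Figures reported in the post (displayed eval, in pawns).
EVAL_AT_50_PERCENT = 0.93
EVAL_OBSERVED = 1.00
WIN_PROB_OBSERVED = 0.58

# ASSUMPTION: win probability is logistic in the displayed eval.
# Solve 0.58 = 1 / (1 + exp((0.93 - 1.00) / scale)) for scale.
odds = WIN_PROB_OBSERVED / (1.0 - WIN_PROB_OBSERVED)
scale = (EVAL_OBSERVED - EVAL_AT_50_PERCENT) / math.log(odds)


def win_prob(eval_pawns: float) -> float:
    """Win probability under the assumed logistic curve."""
    return 1.0 / (1.0 + math.exp((EVAL_AT_50_PERCENT - eval_pawns) / scale))


print(f"implied logistic scale: {scale:.3f} pawns")
print(f"win prob at 0.93: {win_prob(0.93):.2f}")
print(f"win prob at 1.00: {win_prob(1.00):.2f}")
```

If the logistic assumption is roughly right, a small implied scale means the win probability changes quickly around the 50% point, so a shift of 0.07 pawns is enough to move it by several percentage points.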
Ajedrecista
Posts: 2128
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: Stockfish 16 evals.

Post by Ajedrecista »

Hello Larry:

This calibration is at move 32 (64 plies), not at the opening. If you look at the following links, you will see some graphs; one of them has 'at move 32' in the title, and I embed it here (it is the top-left graph):

https://github.com/vondele/WLD_model
https://github.com/official-stockfish/S ... evaluation

[Image: the top-left graph from the linked pages, with 'at move 32' in its title]

The bottom-left graph also has some formulas fitted with move 32 as the basis (note all the x/32 terms, where the x-axis is the number of moves).

------------

I found the patch: it was between SF 15 and SF 15.1 (list of commits here), on November 5th, 2022:

Normalize evaluation
src/uci.cpp wrote:[...]
// Enforce that NormalizeToPawnValue corresponds to a 50% win rate at ply 64
[...]
https://github.com/official-stockfish/S ... 09adddR213
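
The quoted comment can be read as follows: the internal score at which the modelled win rate is 50% is a polynomial in the ply, and the normalization constant is pinned to that polynomial's value at ply 64. Below is a minimal sketch of that idea, assuming a cubic in m = ply / 64. The coefficient values in `AS` are hypothetical placeholders chosen for illustration, not the numbers in Stockfish's source, and `a_of` and `to_uci_pawns` are made-up helper names.

```python
# HYPOTHETICAL cubic coefficients for a(m); NOT Stockfish's fitted values.
AS = (1.0, -3.0, 22.0, 300.0)


def a_of(ply: int) -> float:
    """Internal score at which the modelled win rate is 50% at this ply."""
    m = ply / 64.0  # ply 64 (move 32) maps to m = 1
    return ((AS[0] * m + AS[1]) * m + AS[2]) * m + AS[3]


# At m = 1 the polynomial collapses to the plain sum of its coefficients,
# so the normalization constant can be tied to that sum.
NORMALIZE_TO_PAWN_VALUE = int(sum(AS))


def to_uci_pawns(internal_score: float) -> float:
    """Convert an internal score to the displayed eval in pawns."""
    return internal_score / NORMALIZE_TO_PAWN_VALUE


print(NORMALIZE_TO_PAWN_VALUE)
print(to_uci_pawns(a_of(64)))  # the 50% point at ply 64, in displayed pawns
print(to_uci_pawns(a_of(0)))   # the 50% point at ply 0, in displayed pawns
```

With a constraint of this shape, the displayed 1.00 matches the 50% point only at ply 64; at any other ply the 50% point is whatever the polynomial gives there.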

Regards from Spain.

Ajedrecista.
lkaufman
Posts: 6259
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Stockfish 16 evals

Post by lkaufman »

OK, that explains the discrepancy, but shouldn't an eval of 1.00 have a fairly constant win prob. throughout the game? Dropping from 58% in the opening to 50% at move 32 seems like a huge disparity which would affect play adversely; Stockfish would make wrong trading decisions due to this.
Komodo rules!
RubiChess
Posts: 643
Joined: Fri Mar 30, 2018 7:20 am
Full name: Andreas Matthies

Re: Stockfish 16 evals

Post by RubiChess »

From my understanding there is just a constant factor used to scale the internal score to a UCI output score, and this factor is calculated by looking at many recent LTC games at fishtest with the goal to get this "score 1 at move 32 == win probability 50%". So this has no influence at all on moves and decisions to trade pieces.
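
The point about the constant factor can be illustrated with a few lines: dividing every score by the same positive constant changes the numbers that are printed but not which move ranks highest. This is only a sketch; the divisor and the move scores below are invented for the example.

```python
# Placeholder divisor; the real constant is fitted from fishtest games.
NORMALIZE_TO_PAWN_VALUE = 320  # hypothetical value


def to_uci_centipawns(internal_score: int) -> float:
    """Scale an internal score to the centipawn value shown over UCI."""
    return 100.0 * internal_score / NORMALIZE_TO_PAWN_VALUE


# Made-up internal scores for three candidate moves.
candidate_moves = {"Nf3": 95, "Bxc6": 140, "O-O": 120}

best_internal = max(candidate_moves, key=candidate_moves.get)
reported = {move: to_uci_centipawns(s) for move, s in candidate_moves.items()}
best_reported = max(reported, key=reported.get)

print(best_internal, best_reported)
```

Because the scaling is monotonic, it cannot by itself change a trading decision. Whether the internal score should mean the same win probability at every ply is a separate question.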
BrendanJNorman
Posts: 2584
Joined: Mon Feb 08, 2016 12:43 am
Full name: Brendan J Norman

Re: Stockfish 16 evals

Post by BrendanJNorman »

I don't know about anything technical here, but I do know that SF has always given exaggerated evals of +1 or more, especially in opening positions where others like Dragon or Lc0 simply say +0.25 or something.

I just ignore it as if it's a person, and say "Okay then, SF likes this position for white" :lol: and try not to treat it like an all-knowing God. :lol:
lkaufman
Posts: 6259
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Stockfish 16 evals

Post by lkaufman »

RubiChess wrote: Sat Jul 29, 2023 6:55 am From my understanding there is just a constant factor used to scale the internal score to a UCI output score, and this factor is calculated by looking at many recent LTC games at fishtest with the goal to get this "score 1 at move 32 == win probability 50%". So this has no influence at all on moves and decisions to trade pieces.
Right, but this doesn't address the question of why a position with a win prob. of 50% has a significantly different internal score at move 32 than at move 1. That should cause poor decisions.
Komodo rules!
lkaufman
Posts: 6259
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Stockfish 16 evals

Post by lkaufman »

BrendanJNorman wrote: Sat Jul 29, 2023 8:54 am I don't know about anything technical here, but I do know that SF has always given exaggerated evals of +1 or more, especially in opening positions where others like Dragon or Lc0 simply say +0.25 or something.

I just ignore it as if it's a person, and say "Okay then, SF likes this position for white" :lol: and try not to treat it like an all-knowing God. :lol:
I think you are conflating SF evals before the scaling was introduced (in SF 15.1, I believe), when they were absurdly high, and after, when they had a clear meaning of 1 = 50% win prob. (though at move 32, so not quite right at move 1). SF (or Lc0 or Torch or Dragon) evals are not quite at the accuracy of God, but they are (usually) close to it; if the SF eval at move 32 after a deep search is +1.5 or more, it is probably winning, and if it is +0.5 or less, it is probably a draw with perfect play.
Komodo rules!
RubiChess
Posts: 643
Joined: Fri Mar 30, 2018 7:20 am
Full name: Andreas Matthies

Re: Stockfish 16 evals

Post by RubiChess »

lkaufman wrote: Sat Jul 29, 2023 7:02 pm
RubiChess wrote: Sat Jul 29, 2023 6:55 am From my understanding there is just a constant factor used to scale the internal score to a UCI output score, and this factor is calculated by looking at many recent LTC games at fishtest with the goal to get this "score 1 at move 32 == win probability 50%". So this has no influence at all on moves and decisions to trade pieces.
Right, but this doesn't address the question of why a position with a win prob. of 50% has a significantly different internal score at move 32 than at move 1. That should cause poor decisions.
Well, one could argue that every score of a position different from 0 or mate in x is wrong by definition. And the internal score of a position is not there to express exact probabilities of winning (which depend heavily on the opponent anyway); it is just a tool that helps to find the best move.
lkaufman
Posts: 6259
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Stockfish 16 evals

Post by lkaufman »

RubiChess wrote: Sat Jul 29, 2023 10:10 pm
lkaufman wrote: Sat Jul 29, 2023 7:02 pm
RubiChess wrote: Sat Jul 29, 2023 6:55 am From my understanding there is just a constant factor used to scale the internal score to a UCI output score, and this factor is calculated by looking at many recent LTC games at fishtest with the goal to get this "score 1 at move 32 == win probability 50%". So this has no influence at all on moves and decisions to trade pieces.
Right, but this doesn't address the question of why a position with a win prob. of 50% has a significantly different internal score at move 32 than at move 1. That should cause poor decisions.
Well, one could argue that every score of a position different from 0 or mate in x is wrong by definition. And the internal score of a position is not there to express exact probabilities of winning (which depend heavily on the opponent anyway); it is just a tool that helps to find the best move.
If the engine "believes" that position X has a higher expected score than position Y, it should give a higher eval to position X. It has nothing to do with the "truth" or the opponent; the engine should be trying to improve its expected score according to its own "beliefs".
Komodo rules!
RubiChess
Posts: 643
Joined: Fri Mar 30, 2018 7:20 am
Full name: Andreas Matthies

Re: Stockfish 16 evals

Post by RubiChess »

lkaufman wrote: Sun Jul 30, 2023 7:21 am If the engine "believes" that position X has a higher expected score than position Y, it should give a higher eval to position X.
That's what hopefully every A/B engine does including Stockfish.
lkaufman wrote: Sun Jul 30, 2023 7:21 am It has nothing to do with the "truth" or the opponent; the engine should be trying to improve its expected score according to its own "beliefs".
I didn't say that the score of a position depends on the opponent. What I said is that the probability of winning the game depends on the opponent. And this means that it is difficult, if not impossible, to find a correct and general model for a score-to-WDL conversion.
SF chooses the definition "a win probability of 50% at ply 64 and against an opponent of (almost) the same strength should be represented by a UCI score of 1". So it fixes two parameters: the ply and the opponent's strength.
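
To make the "fixes two parameters" point concrete, here is a small sketch of a score-to-WDL conversion that depends on the ply. It assumes a logistic win-rate curve whose midpoint `a` and width `b` are cubic polynomials in ply / 64, with the loss rate taken from the same curve at the negated score. All coefficients are hypothetical placeholders, not Stockfish's fitted values, and the function names are invented for this example.

```python
import math

# HYPOTHETICAL polynomial coefficients; NOT Stockfish's fitted values.
AS = (1.0, -3.0, 22.0, 300.0)   # midpoint a(m): score giving a 50% win rate
BS = (-2.0, 12.0, -14.0, 64.0)  # width b(m): how fast the curve rises


def _cubic(coeffs, m: float) -> float:
    """Evaluate a cubic polynomial with Horner's scheme."""
    return ((coeffs[0] * m + coeffs[1]) * m + coeffs[2]) * m + coeffs[3]


def win_rate(internal_score: float, ply: int) -> float:
    """Modelled win probability for the side to move."""
    m = ply / 64.0
    a = _cubic(AS, m)
    b = _cubic(BS, m)
    return 1.0 / (1.0 + math.exp((a - internal_score) / b))


def wdl(internal_score: float, ply: int):
    """Return (win, draw, loss) probabilities that sum to 1."""
    win = win_rate(internal_score, ply)
    loss = win_rate(-internal_score, ply)
    return win, 1.0 - win - loss, loss


# Internal score that is displayed as +1.00 under the ply-64 normalization.
one_pawn = _cubic(AS, 1.0)
for ply in (0, 64, 128):
    w, d, l = wdl(one_pawn, ply)
    print(f"ply {ply:3d}: win={w:.3f} draw={d:.3f} loss={l:.3f}")
```

In a model of this shape, the same displayed +1.00 maps to exactly 50% wins only at ply 64; at other plies the win share moves with the fitted polynomials, which is the kind of drift discussed in this thread.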
lkaufman wrote: Fri Jul 28, 2023 11:04 pm OK, that explains the discrepancy, but shouldn't an eval of 1.00 have a fairly constant win prob. throughout the game? Dropping from 58% in the opening to 50% at move 32 seems like a huge disparity which would affect play adversely; Stockfish would make wrong trading decisions due to this.
Feel free to train a net that gives a perfect 1.0 score for 50% win probability at every ply and in every game phase. I'm sure this would improve evaluation and game play in general. But it is obviously not so easy.