Stockfish 16 evals

Post by **lkaufman** » Sun Jul 30, 2023 7:17 pm

RubiChess wrote: ↑Sun Jul 30, 2023 2:28 pm
lkaufman wrote: ↑Sun Jul 30, 2023 7:21 am If the engine "believes" that position x has a higher expected score than position y, it should give a higher eval to position x.
That's what hopefully every A/B engine does including Stockfish.

"Hopefully" is the key word here. Clearly Stockfish is not actually doing this. A position in the opening with an eval slightly below 1 has a higher expected score than a position at move 32 with an eval slightly above 1, yet it gets the lower eval.

lkaufman wrote: ↑Sun Jul 30, 2023 7:21 am It has nothing to do with the "truth" or the opponent, the engine should be trying to improve its expected score according to its own "beliefs".
I didn't say that the score of a position depends on the opponent. What I said is that the probability of winning the game depends on the opponent. And this means that it is difficult to impossible to find a correct and general model for a score-to-WDL conversion.
SF chooses the definition "a win probability of 50% at ply 64 and against an opponent of (almost) the same strength should be represented by a UCI score of 1". So it fixes two parameters: the ply and the opponent's strength.

Of course win prob. means against an equal opponent (unless otherwise stated). Only one parameter, ply, is worth talking about.

lkaufman wrote: ↑Fri Jul 28, 2023 11:04 pm OK, that explains the discrepancy, but shouldn't an eval of 1.00 have a fairly constant win prob. throughout the game? Dropping from 58% in the opening to 50% at move 32 seems like a huge disparity which would affect play adversely; Stockfish would make wrong trading decisions due to this.
Feel free to train a net that gives perfect 1.0 score for 50% win probability in every ply and game phase. I'm sure this would improve evaluation and game play in general. But it is obviously not so easy.

That might not be easy, but it should be fairly easy to apply a scaling factor that is a function of ply that would make 1.0 = 50% win prob. on average for every ply. I might propose that for Torch if I determine that Torch has a similar issue in this regard to Stockfish.
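To make the idea concrete, here is a rough sketch of what such a ply-dependent scaling factor could look like. Everything in it is an assumption for illustration: the logistic shape of the win prob. curve, the function names, and all the numbers (the 0.8 to 1.0 ramp and the 0.3 spread) are made up and are not fitted to Stockfish, Torch, or any game data.

```python
import math

# All constants below are made-up illustrations, not values fitted to any engine.
SPREAD = 0.3  # hypothetical logistic spread parameter, in pawns


def fifty_percent_eval(ply: int) -> float:
    """Hypothetical raw eval (in pawns) that corresponds to a 50% win prob. at this ply.

    Rises linearly from 0.8 at ply 0 to 1.0 at ply 64, then stays at 1.0.
    """
    return 0.8 + 0.2 * min(ply, 64) / 64.0


def win_prob(raw_eval: float, ply: int) -> float:
    """Assumed logistic model of win prob. against an equal opponent."""
    return 1.0 / (1.0 + math.exp((fifty_percent_eval(ply) - raw_eval) / SPREAD))


def rescaled_eval(raw_eval: float, ply: int) -> float:
    """Scale the raw eval so that 1.0 means a 50% win prob. at every ply."""
    return raw_eval / fifty_percent_eval(ply)
```

With these made-up numbers, a raw eval of 1.00 at ply 0 would be rescaled to 1.25, while a raw 1.00 at ply 64 stays at 1.00. In practice the per-ply 50% point would have to be estimated from actual game results rather than assumed.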

Post by **syzygy** » Mon Jul 31, 2023 11:47 pm

RubiChess wrote: ↑Sun Jul 30, 2023 2:28 pm Feel free to train a net that gives perfect 1.0 score for 50% win probability in every ply and game phase. I'm sure this would improve evaluation and game play in general. But it is obviously not so easy.

Training to get it perfect is one thing, but if such a skew is present in the training data, perhaps it is possible to remove it before training starts?

I am speaking from a position of utter ignorance, though.
