I have a database of 40,000 ultra-bullet games (1.5''+0.015'') played by Stockfish 7. With a PGN tool provided by Ferdinand Mosca, I extracted the Expected Performance (%) as a function of Move Number and Eval. From the engine's point of view, Move Number stands in for material count. The plots of Expected Performance as a function of move number, for different values of the displayed Eval, are here (Stockfish 7):
Ideally the lines would be constant across every phase of the game (material). That's because an engine has only Eval as its arbiter, and a varying Expected Performance for the same Eval will degrade its play. From this plot, taking an Eval of 1.7 as an example, an engine can go from 88% to 78% expected performance merely by changing total material, while thinking it is no worse off as far as eval goes (the same 170cp). In fact that is a loss of more than 100 Elo points. The distortion appears for all values of Eval over most of the Move Numbers.
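As a quick check of the "more than 100 Elo" figure, here is a minimal sketch that converts the two expected-performance values into Elo differences. It assumes the standard logistic Elo model; the function name `elo_from_score` is just an illustrative choice.

```python
import math

def elo_from_score(p: float) -> float:
    """Elo difference implied by an expected score p (0 < p < 1),
    assuming the standard logistic Elo model."""
    return 400.0 * math.log10(p / (1.0 - p))

high = elo_from_score(0.88)  # expected performance at one material level
low = elo_from_score(0.78)   # expected performance at another, same 170cp eval
print(f"88% -> {high:.0f} Elo, 78% -> {low:.0f} Elo, gap {high - low:.0f} Elo")
```

Under that model the gap comes out at roughly 126 Elo, which is consistent with the claim above.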
The next engine to look at would be Texel, whose Eval is tuned to a logistic curve of expected performance, theoretically (though not in practice) independent of the phase of the game.
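For readers unfamiliar with the idea, a logistic mapping from eval to expected score can be sketched as below. This is only an illustration of the general technique, not Texel's actual code: the scaling constant `k`, the grid-search range, and the function names are all assumptions made for the example.

```python
def expected_score(eval_cp: float, k: float = 1.0) -> float:
    """Logistic mapping from an eval in centipawns to an expected score in (0, 1).
    k is a scaling constant that would have to be fitted per engine (assumed here)."""
    return 1.0 / (1.0 + 10.0 ** (-k * eval_cp / 400.0))

def mean_squared_error(samples, k: float) -> float:
    """samples: iterable of (eval_cp, observed_score) pairs."""
    samples = list(samples)
    return sum((score - expected_score(cp, k)) ** 2 for cp, score in samples) / len(samples)

def fit_k(samples, lo: int = 10, hi: int = 300) -> float:
    """Crude grid search for k in steps of 0.01 (range is an arbitrary assumption)."""
    samples = list(samples)
    return min((i / 100.0 for i in range(lo, hi + 1)),
               key=lambda k: mean_squared_error(samples, k))
```

Fitting `k` separately on positions from different move-number ranges would be one way to see whether the mapping really is independent of game phase.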
But this is to be expected, isn't it, as each engine has stronger and weaker points in every phase of the game? Of course it is not desirable.
Also, are you proposing, for example, that if Stockfish adjusted the resulting value at the end of the eval function to follow a straight line, it would gain some strength?
Do you have the scripts? It would be nice to do the same with Andscacs. I can also show the results if anyone is interested.
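The original scripts are not shown in this thread. As a rough idea of what such a script has to do, here is a minimal sketch of the binning step only. It assumes the (move number, eval, game result) triples have already been extracted from the PGN by some other tool; the record layout, bin widths, and function name are hypothetical.

```python
from collections import defaultdict

def performance_table(records, eval_bin_cp: int = 20, move_bin: int = 10):
    """Bin records by move number and eval, then average the game result per bin.

    records: iterable of (move_number, eval_cp, score), where score is the final
    result from the evaluating engine's point of view (1.0, 0.5 or 0.0).
    Returns {(move_bin_start, eval_bin_start): (expected_performance_percent, count)}.
    """
    cells = defaultdict(lambda: [0.0, 0])
    for move_no, eval_cp, score in records:
        # Floor both coordinates to the start of their bin.
        key = (move_no // move_bin * move_bin, eval_cp // eval_bin_cp * eval_bin_cp)
        cells[key][0] += score
        cells[key][1] += 1
    return {key: (100.0 * total / n, n) for key, (total, n) in cells.items()}

# Tiny made-up input, only to show the shape of the output.
demo = performance_table([(12, 170, 1.0), (15, 175, 0.5), (31, 170, 0.0)])
```

Plotting one eval bin across all move bins would then give a single line of the kind discussed above; bins with few samples should be treated with caution.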
Laskos wrote: I have a database of 40,000 ultra-bullet games (1.5''+0.015'') played by Stockfish 7. With a PGN tool provided by Ferdinand Mosca, I extracted the Expected Performance (%) as a function of Move Number and Eval. From the engine's point of view, Move Number stands in for material count. The plots of Expected Performance as a function of move number, for different values of the displayed Eval, are here (Stockfish 7):
Ideally the lines would be constant across every phase of the game (material). That's because an engine has only Eval as its arbiter, and a varying Expected Performance for the same Eval will degrade its play. From this plot, taking an Eval of 1.7 as an example, an engine can go from 88% to 78% expected performance merely by changing total material, while thinking it is no worse off as far as eval goes (the same 170cp). In fact that is a loss of more than 100 Elo points. The distortion appears for all values of Eval over most of the Move Numbers.
The next engine to look at would be Texel, whose Eval is tuned to a logistic curve of expected performance, theoretically (though not in practice) independent of the phase of the game.
I don't think move number is the best way to measure this effect. I would think that total material left on the board would be a better metric. These graphs with and without passers might be instructive as well. In fact, I would think breaking them up by general pawn formation may tell an interesting story.
Regards,
Forrest
But this is to be expected, isn't it, as each engine has stronger and weaker points in every phase of the game? Of course it is not desirable.
Also, are you proposing, for example, that if Stockfish adjusted the resulting value at the end of the eval function to follow a straight line, it would gain some strength?
Yes, chess is a game of skill; playing randomly is not a good idea. Until tablebases solve chess, the goal as the game progresses is to improve expected performance, not a wrong eval. Improving the eval instead of expected performance is much more likely to give weaker play than improving expected performance. I will probably do the same for Andscacs tomorrow; is version 0.86 OK?
Do you have the scripts? It would be nice to do the same with Andscacs. I can also show the results if anyone is interested.
Zenmastur wrote:
I don't think move number is the best way to measure this effect. I would think that total material left on the board would be a better metric. These graphs with and without passers might be instructive as well. In fact, I would think breaking them up by general pawn formation may tell an interesting story.
Regards,
Forrest
Theoretically it's doable, but I spent 1-2 hours on this using move number only, for all these data points on a database of 40,000 games. Computing material would take one or two orders of magnitude longer. Maybe it's better to build a map between move number and average material, computed on a much smaller database (this is statistically sound, as the dispersion of material for a fixed move number is not that high).
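A minimal sketch of such a map follows, assuming positions from the smaller database are available as (move number, FEN) pairs. The piece values (1/3/3/5/9) and the function names are assumptions for illustration; no PGN parsing is shown.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Conventional piece values in pawns (an assumption; kings count as zero).
PIECE_VALUE = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9}

def material_from_fen(fen: str) -> int:
    """Total material of both sides, read from the piece-placement field of a FEN."""
    placement = fen.split()[0]
    return sum(PIECE_VALUE.get(ch.lower(), 0) for ch in placement)

def material_by_move(samples):
    """samples: iterable of (move_number, fen).
    Returns {move_number: (average_material, dispersion)}, where dispersion
    is the population standard deviation of material at that move number."""
    by_move = defaultdict(list)
    for move_no, fen in samples:
        by_move[move_no].append(material_from_fen(fen))
    return {m: (mean(v), pstdev(v)) for m, v in sorted(by_move.items())}

START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
start_material = material_from_fen(START)
```

The dispersion column makes it possible to check, rather than assume, that material at a fixed move number is tight enough for the move-number plots to be re-labelled by average material.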