Anomaly of Stockfish eval depending on phases of the game

Laskos · Post by **Laskos** » Thu May 05, 2016 5:50 pm

I have a database of 40,000 ultra-bullet games (1.5''+0.015'') played by Stockfish 7. With a PGN tool provided by Ferdinand Mosca, I extracted the Expected Performance (%) as a function of Move Number and Eval. From the engine's point of view, Move Number stands in for material count. The plots of Expected Performance as a function of move number, for different values of the shown Eval, are here (Stockfish 7):
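For anyone who wants to reproduce this kind of table, here is a minimal sketch of the tabulation. It is not Ferdinand Mosca's actual tool. It assumes each position has already been pulled out of the PGN as a hypothetical `(move_number, eval_cp, score)` record, where `score` is the final game result (1, 0.5 or 0) from the point of view of the side that reported the eval. The bucket widths are arbitrary choices for illustration.

```python
from collections import defaultdict

def expected_performance(records, move_bin=10, eval_bin=50):
    """Average game score (in %) per (move-number bucket, eval bucket).

    records: iterable of (move_number, eval_cp, score) tuples, where score
    is 1.0, 0.5 or 0.0 from the point of view of the side that reported
    eval_cp. The record layout and the bin widths are assumptions.
    """
    totals = defaultdict(lambda: [0.0, 0])  # bucket -> [sum of scores, count]
    for move_number, eval_cp, score in records:
        # Floor each value to the lower edge of its bucket.
        bucket = (move_number // move_bin * move_bin,
                  eval_cp // eval_bin * eval_bin)
        totals[bucket][0] += score
        totals[bucket][1] += 1
    return {bucket: 100.0 * s / n for bucket, (s, n) in totals.items()}

# Tiny made-up sample, only to show the call shape (not real game data).
sample = [
    (12, 170, 1.0), (14, 160, 1.0), (18, 175, 0.5), (19, 155, 1.0),
    (61, 170, 1.0), (63, 180, 0.5), (66, 165, 0.5), (68, 150, 1.0),
]
table = expected_performance(sample)
for bucket in sorted(table):
    print(bucket, f"{table[bucket]:.1f}%")
```

Plotting one line per eval bucket against the move-number bucket should give curves of the same kind as the ones discussed in this thread.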

Ideally the lines would be constant across every phase of the game (material). That's because an engine has only Eval as arbiter, and a varying Expected Performance for the same Eval will degrade its play. From this plot, taking an Eval of 1.7 as an example, an engine can go from 88% to 78% expected performance merely by changing total material, while believing it is no worse off because the eval is unchanged (the same 170cp). In fact it is a loss of more than 100 Elo points. The distortion appears for all values of Eval at most Move Numbers.
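The "more than 100 Elo points" figure can be checked with the standard logistic Elo model, which is assumed here as the conversion between expected score and rating difference:

```python
import math

def elo_from_score(p):
    """Elo difference implied by an expected score p (0 < p < 1),
    using the standard logistic Elo model."""
    return 400.0 * math.log10(p / (1.0 - p))

high = elo_from_score(0.88)  # expected performance at one end of the curve
low = elo_from_score(0.78)   # same eval (170cp), different phase of the game
print(f"88% -> {high:.0f} Elo, 78% -> {low:.0f} Elo, loss {high - low:.0f} Elo")
# prints: 88% -> 346 Elo, 78% -> 220 Elo, loss 126 Elo
```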

The next one to look at would be Texel, whose Eval is fitted to a logistic curve of expected performance, in theory independently of the phase of the game (in practice not).
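As a rough illustration of what fitting an eval to a logistic curve means, here is a sketch under stated assumptions. The logistic form with a scaling constant `k` is the one commonly associated with Texel-style tuning; the value of `k`, the candidate grid and the synthetic samples below are all made up for the example and are not Texel's actual parameters or data.

```python
def expected_score(eval_cp, k=1.0):
    """Logistic map from an eval in centipawns to an expected score in (0, 1).
    k is a scaling constant that has to be fitted to game results."""
    return 1.0 / (1.0 + 10.0 ** (-k * eval_cp / 400.0))

def fit_k(samples, candidates):
    """Pick the k from candidates that minimises the mean squared error
    between observed scores and the logistic prediction.

    samples: list of (eval_cp, observed_score) pairs.
    """
    def mse(k):
        return sum((score - expected_score(cp, k)) ** 2
                   for cp, score in samples) / len(samples)
    return min(candidates, key=mse)

# Synthetic samples generated from k = 1.5, so the fit should recover it.
samples = [(cp, expected_score(cp, 1.5)) for cp in range(-300, 301, 50)]
candidates = [x / 10 for x in range(5, 31)]  # 0.5, 0.6, ..., 3.0
print(fit_k(samples, candidates))  # prints: 1.5
```

One way to test phase independence with this would be to fit `k` separately for each move-number bucket and compare the values. If the eval were perfectly calibrated across phases, the fitted `k` would not depend on the bucket.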

cdani · Post by **cdani** » Thu May 05, 2016 7:08 pm

Nice graph! Thanks.

But this is to be expected, as each engine has stronger or weaker points in each phase of the game, isn't it? Of course it is not desirable.

Also, are you proposing, for example, that if Stockfish adjusts the resulting eval value at the end of the eval function so that it follows a straight line, it will gain some strength?

Do you have the scripts? It would be nice to do the same with Andscacs. I can also show the results if anyone is interested.

Zenmastur · Post by **Zenmastur** » Thu May 05, 2016 7:55 pm

Laskos wrote: I have a database of 40,000 ultra-bullet games (1.5''+0.015'') played by Stockfish 7. With a PGN tool provided by Ferdinand Mosca, I extracted the Expected Performance (%) as a function of Move Number and Eval. From the engine's point of view, Move Number stands in for material count. The plots of Expected Performance as a function of move number, for different values of the shown Eval, are here (Stockfish 7):

Ideally the lines would be constant across every phase of the game (material). That's because an engine has only Eval as arbiter, and a varying Expected Performance for the same Eval will degrade its play. From this plot, taking an Eval of 1.7 as an example, an engine can go from 88% to 78% expected performance merely by changing total material, while believing it is no worse off because the eval is unchanged (the same 170cp). In fact it is a loss of more than 100 Elo points. The distortion appears for all values of Eval at most Move Numbers.

The next one to look at would be Texel, whose Eval is fitted to a logistic curve of expected performance, in theory independently of the phase of the game (in practice not).

I don't think move number is the best way to measure this effect. I would think that total material left on the board would be a better metric. These graphs with and without passers might be instructive as well. In fact, I would think breaking them up by general pawn formation may tell an interesting story.

Regards,

Forrest

Laskos · Post by **Laskos** » Thu May 05, 2016 10:30 pm

cdani wrote:Nice graph! Thanks.

But this is to be expected, as each engine has stronger or weaker points in each phase of the game, isn't it? Of course it is not desirable.

Also, are you proposing, for example, that if Stockfish adjusts the resulting eval value at the end of the eval function so that it follows a straight line, it will gain some strength?

Yes. Chess is a game of skill, and playing randomly is not a good idea. Until tablebases solve chess, the goal as the game progresses is to improve expected performance, not a miscalibrated eval. Improving the eval instead of expected performance will very likely be weaker than improving expected performance directly. Probably tomorrow I will do the same for Andscacs, 0.86 version is ok?

Do you have the scripts? It would be nice to do the same with Andscacs. I can also show the results if anyone is interested.

Laskos · Post by **Laskos** » Thu May 05, 2016 10:36 pm

Zenmastur wrote:
I don't think move number is the best way to measure this effect. I would think that total material left on the board would be a better metric. These graphs with and without passers might be instructive as well. In fact, I would think breaking them up by general pawn formation may tell an interesting story.

Regards,

Forrest

In theory it's doable, but I spent 1-2 hours on this with move number alone, for all these data points on a database of 40,000 games. Computing material as well would take an order of magnitude or two longer. It may be better to build a map from move number to average material, computed on a much smaller database (statistically this is sound, as the dispersion of material at a fixed move number is not that high).
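A minimal sketch of such a map, assuming the positions of the smaller database are already available as hypothetical `(move_number, fen)` pairs. The piece values are the conventional 1/3/3/5/9 scale, chosen only for illustration and not taken from any engine.

```python
from collections import defaultdict

# Conventional piece values in pawns; an assumption, not any engine's values.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}

def material_from_fen(fen):
    """Total material of both sides, read from the piece-placement field
    of a FEN string. Digits and '/' separators contribute nothing."""
    placement = fen.split()[0]
    return sum(PIECE_VALUES.get(ch.lower(), 0) for ch in placement)

def average_material_by_move(positions):
    """positions: iterable of (move_number, fen) pairs.
    Returns {move_number: mean total material}, ordered by move number."""
    sums = defaultdict(lambda: [0, 0])  # move_number -> [material sum, count]
    for move_number, fen in positions:
        sums[move_number][0] += material_from_fen(fen)
        sums[move_number][1] += 1
    return {m: total / n for m, (total, n) in sorted(sums.items())}

# Two hand-written positions, only to show the call shape.
start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
rook_ending = "8/5pk1/6p1/8/8/6P1/r4PK1/1R6 w - - 0 40"
material_map = average_material_by_move([(1, start), (1, start), (40, rook_ending)])
print(material_map)  # prints: {1: 78.0, 40: 14.0}
```

The resulting dictionary can then be used to relabel the move-number axis of the existing plots with average material, without recomputing material for all 40,000 games.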

cdani · Post by **cdani** » Fri May 06, 2016 1:00 am

Laskos wrote: Probably tomorrow I will do the same for Andscacs, 0.86 version is ok?

Thanks! Better take this one:
www.andscacs.com/andscacs086085.zip

As Andscacs is not very well tuned, I expect something worse than Stockfish.

cdani · Post by **cdani** » Fri May 06, 2016 1:06 am

cdani wrote:
Laskos wrote: Probably tomorrow I will do the same for Andscacs, 0.86 version is ok?
Thanks! Better take this one:
www.andscacs.com/andscacs086085.zip

As Andscacs is not very well tuned, I expect something worse than Stockfish.

If you want I can generate the games for you. Just tell me.

Laskos · Post by **Laskos** » Fri May 06, 2016 10:01 am

cdani wrote:
Laskos wrote: Probably tomorrow I will do the same for Andscacs, 0.86 version is ok?
Thanks! Better take this one:
www.andscacs.com/andscacs086085.zip

As Andscacs is not very well tuned, I expect something worse than Stockfish.

For now, here is Texel (1.06beta). The lines are a bit straighter than Stockfish's, but still not quite constant.

Laskos · Post by **Laskos** » Fri May 06, 2016 12:20 pm

cdani wrote:
Laskos wrote: Probably tomorrow I will do the same for Andscacs, 0.86 version is ok?
Thanks! Better take this one:
www.andscacs.com/andscacs086085.zip

As Andscacs is not very well tuned, I expect something worse than Stockfish.

And here is the plot for Andscacs086085 at 1.5''+0.015'', with pretty large distortions at high evals:

Laskos · Post by **Laskos** » Fri May 06, 2016 9:28 pm

Andscacs 086101 with the database you supplied me at 5+0.03:
