Anomaly of Stockfish eval depending on phases of the game

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Anomaly of Stockfish eval depending on phases of the game

Post by Laskos »

I have a database of 40,000 ultra-bullet games (1.5''+0.015'') of Stockfish 7. With a PGN tool provided by Ferdinand Mosca, I extracted the Expected Performance (%) as a function of Move Number and Eval. From the engine's point of view, Move Number stands in for material count. The plots of Expected Performance as a function of Move Number, for different values of the displayed Eval, are here (Stockfish 7):

Image
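
For readers who want to reproduce this kind of tabulation, here is a minimal sketch of the binning step. It is not the PGN tool mentioned above: the record layout (move number, eval in pawns, final score from the evaluating side's view) and the `eval_step` bucket width are assumptions made purely for illustration.

```python
from collections import defaultdict

def expected_performance(records, eval_step=0.5):
    """Mean game score (%) per (move number, eval bucket).

    records: iterable of (move_number, eval_pawns, score), where score is
    the final game result for the side that reported the eval:
    1.0 win, 0.5 draw, 0.0 loss. This layout is hypothetical.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of scores, count]
    for move_number, eval_pawns, score in records:
        # Snap the eval to the nearest bucket centre (e.g. 1.7 -> 1.5).
        bucket = round(eval_pawns / eval_step) * eval_step
        entry = totals[(move_number, bucket)]
        entry[0] += score
        entry[1] += 1
    return {key: 100.0 * s / n for key, (s, n) in totals.items()}
```

Plotting the resulting percentages against move number, one line per eval bucket, gives curves of the kind shown in the image.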

Ideally the lines would be constant across every phase of the game (material). That's because an engine has only Eval as its arbiter, and a varying Expected Performance for the same Eval will degrade its play. From this plot, for an Eval of 1.7, as an example, an engine can go from 88% to 78% expected performance merely by changing total material, while thinking it is no worse off as far as eval goes (the same 170cp). In fact, that is a loss of more than 100 Elo points. The distortion appears for all values of Eval over most of the Move Numbers.
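
The Elo figure can be checked with the usual logistic relation between expected score and rating difference. This is a small sketch assuming that standard model; the function name is only illustrative.

```python
import math

def elo_from_score(p):
    """Elo difference implied by expected score p (0 < p < 1), logistic model."""
    return 400.0 * math.log10(p / (1.0 - p))

# Drop in implied Elo when expected performance falls from 88% to 78%.
drop = elo_from_score(0.88) - elo_from_score(0.78)
print(round(drop))
```

Under this model the drop comes out somewhat above 120 Elo, in line with the "more than 100" stated above.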

The next engine to look at would be Texel, whose Eval is tuned against a logistic model of expected performance, in theory (though not in practice) independently of the phase of the game.
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: Anomaly of Stockfish eval depending on phases of the gam

Post by cdani »

Nice graph! Thanks.

But isn't this to be expected, since each engine has stronger and weaker points in every phase of the game? Of course, it is not desirable.

Also, are you proposing, for example, that if Stockfish rescaled the final value of its eval function so that these lines became straight, it would gain some strength?

Do you have the scripts? It would be nice to do the same with Andscacs. I can also post the results if anyone is interested.
Zenmastur
Posts: 919
Joined: Sat May 31, 2014 8:28 am

Re: Anomaly of Stockfish eval depending on phases of the gam

Post by Zenmastur »

Laskos wrote: I have a database of 40,000 ultra-bullet games (1.5''+0.015'') of Stockfish 7. With a PGN tool provided by Ferdinand Mosca, I extracted the Expected Performance (%) as a function of Move Number and Eval. From the engine's point of view, Move Number stands in for material count. The plots of Expected Performance as a function of Move Number, for different values of the displayed Eval, are here (Stockfish 7):

Image

Ideally the lines would be constant across every phase of the game (material). That's because an engine has only Eval as its arbiter, and a varying Expected Performance for the same Eval will degrade its play. From this plot, for an Eval of 1.7, as an example, an engine can go from 88% to 78% expected performance merely by changing total material, while thinking it is no worse off as far as eval goes (the same 170cp). In fact, that is a loss of more than 100 Elo points. The distortion appears for all values of Eval over most of the Move Numbers.

The next engine to look at would be Texel, whose Eval is tuned against a logistic model of expected performance, in theory (though not in practice) independently of the phase of the game.
I don't think move number is the best way to measure this effect. Total material left on the board would be a better metric. Graphs with and without passed pawns might be instructive as well. In fact, breaking them up by general pawn formation may tell an interesting story.

Regards,

Forrest
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Anomaly of Stockfish eval depending on phases of the gam

Post by Laskos »

cdani wrote: Nice graph! Thanks.

But isn't this to be expected, since each engine has stronger and weaker points in every phase of the game? Of course, it is not desirable.

Also, are you proposing, for example, that if Stockfish rescaled the final value of its eval function so that these lines became straight, it would gain some strength?

Do you have the scripts? It would be nice to do the same with Andscacs. I can also post the results if anyone is interested.
Yes. Chess is a game of skill, and playing randomly is not a good idea. Until tablebases solve chess, the goal as the game progresses is to improve expected performance, not a distorted eval. Optimizing eval instead of expected performance will most likely be weaker than optimizing expected performance directly. I will probably do the same for Andscacs tomorrow; is version 0.86 OK?
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Anomaly of Stockfish eval depending on phases of the gam

Post by Laskos »

Zenmastur wrote:
I don't think move number is the best way to measure this effect. Total material left on the board would be a better metric. Graphs with and without passed pawns might be instructive as well. In fact, breaking them up by general pawn formation may tell an interesting story.

Regards,

Forrest
In theory it's doable, but with move number alone I already spent 1-2 hours on all these data points for a database of 40,000 games. Computing material as well would take one or two orders of magnitude longer. It may be better to build a map between move number and average material from a much smaller database (this is statistically sound, as the dispersion of material at a fixed move number is not that high).
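
A map of that kind could be sketched roughly as follows. The assumptions here are that sampled positions are available as FEN strings paired with their move number, and that the conventional 1/3/3/5/9 piece values are an acceptable measure of material; neither comes from the tool used for the plots.

```python
from collections import defaultdict

# Conventional piece values in pawns (an assumption); kings are not counted.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9}

def total_material(fen):
    """Sum both sides' material from the piece-placement field of a FEN."""
    placement = fen.split()[0]  # ignore side to move, castling rights, etc.
    return sum(PIECE_VALUES.get(ch.lower(), 0) for ch in placement)

def average_material_by_move(samples):
    """samples: iterable of (move_number, fen) pairs (hypothetical layout)."""
    sums = defaultdict(lambda: [0, 0])  # move -> [material sum, count]
    for move_number, fen in samples:
        entry = sums[move_number]
        entry[0] += total_material(fen)
        entry[1] += 1
    return {move: total / count for move, (total, count) in sums.items()}
```

The resulting dictionary can then be used to relabel the move-number axis with average material, without recomputing material for every position in the full database.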
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: Anomaly of Stockfish eval depending on phases of the gam

Post by cdani »

Laskos wrote: Probably tomorrow I will do the same for Andscacs, 0.86 version is ok?
Thanks! Better take this one:
www.andscacs.com/andscacs086085.zip

As Andscacs is not very well tuned, I expect something worse than Stockfish :-)
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: Anomaly of Stockfish eval depending on phases of the gam

Post by cdani »

cdani wrote:
Laskos wrote: Probably tomorrow I will do the same for Andscacs, 0.86 version is ok?
Thanks! Better take this one:
www.andscacs.com/andscacs086085.zip

As Andscacs is not very well tuned, I expect something worse than Stockfish :-)
If you want, I can generate the games for you. Just tell me.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Anomaly of Stockfish eval depending on phases of the gam

Post by Laskos »

cdani wrote:
Laskos wrote: Probably tomorrow I will do the same for Andscacs, 0.86 version is ok?
Thanks! Better take this one:
www.andscacs.com/andscacs086085.zip

As Andscacs is not very well tuned, I expect something worse than Stockfish :-)
For now, here is Texel (1.06beta). The lines are a bit straighter than Stockfish's, but still not quite constant.
Image
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Anomaly of Stockfish eval depending on phases of the gam

Post by Laskos »

cdani wrote:
Laskos wrote: Probably tomorrow I will do the same for Andscacs, 0.86 version is ok?
Thanks! Better take this one:
www.andscacs.com/andscacs086085.zip

As Andscacs is not very well tuned, I expect something worse than Stockfish :-)
And here is the plot for Andscacs086085 at 1.5''+0.015''; the distortions at high evals are pretty large:
Image
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Anomaly of Stockfish eval depending on phases of the gam

Post by Laskos »

Andscacs 086101, using the database you supplied, at 5+0.03:
Image