More accurate evaluation function leads to worse play?

Discussion of chess software programming and technical issues.

Moderator: Ras

xylist
Posts: 2
Joined: Fri Feb 07, 2025 6:06 am
Full name: Zhongle C. Qu

More accurate evaluation function leads to worse play?

Post by xylist »

Hi everyone. I've recently been trying to make my engine's evaluation function more accurate. To measure accuracy, I selected 50k quiet positions (side to move not in check, best move not a capture or check), analyzed them at low depth with both my engine and Stockfish, and compared the results with the static evaluation. I computed R^2 scores (coefficient of determination), and there is indeed an improvement (from 0.38 to 0.52). However, when I tested the engine, its playing strength actually dropped (-180 Elo). This feels so counterintuitive. Shouldn't a more accurate evaluation function result in a gain in playing strength?
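For anyone who wants to reproduce the measurement, here is a minimal sketch of the R^2 computation. The function name `r_squared` and all the centipawn numbers are made up for illustration; they are not the data behind the 0.38 and 0.52 figures.

```python
def r_squared(reference, predicted):
    """Coefficient of determination of `predicted` against `reference`."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("need two non-empty lists of equal length")
    mean_ref = sum(reference) / len(reference)
    # Residual sum of squares: how far the static scores are from the reference.
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    # Total sum of squares: spread of the reference scores around their mean.
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    if ss_tot == 0:
        raise ValueError("reference scores are all identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical centipawn scores for five quiet positions (illustration only).
search_scores = [30, -15, 120, 0, -60]   # low-depth search results
static_old = [10, 20, 60, -30, -10]      # old static evaluation
static_new = [25, -5, 90, 10, -40]       # new static evaluation

print(r_squared(search_scores, static_old))
print(r_squared(search_scores, static_new))
```

A higher value only says that the static scores track the reference scores more closely on average. It says nothing about evaluation speed or about how the search behaves with the new scores.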
benvining
Posts: 31
Joined: Fri May 30, 2025 10:18 pm
Full name: Ben Vining

Re: More accurate evaluation function leads to worse play?

Post by benvining »

Maybe not if the upgraded eval function takes way more time than the simpler one?
xylist
Posts: 2
Joined: Fri Feb 07, 2025 6:06 am
Full name: Zhongle C. Qu

Re: More accurate evaluation function leads to worse play?

Post by xylist »

benvining wrote: Thu Jul 17, 2025 5:30 am Maybe not if the upgraded eval function takes way more time than the simpler one?
The upgraded one is around 1.3x slower, so that shouldn't be a huge problem. However, I noticed that the branching factor has increased a little, and the engine is now looking at a lot more nodes. I have no idea why this is happening.
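One common way to put a number on "the branching factor has increased" is to take the ratio of node counts between consecutive iterative-deepening depths. Below is a rough sketch under that assumption. The helper name `effective_branching_factors` and the node counts are hypothetical, not measurements from my engine.

```python
def effective_branching_factors(nodes_per_depth):
    """Ratio of node counts between consecutive iterative-deepening depths."""
    if any(n <= 0 for n in nodes_per_depth):
        raise ValueError("node counts must be positive")
    return [cur / prev for prev, cur in zip(nodes_per_depth, nodes_per_depth[1:])]


# Hypothetical node counts for depths 1..5 (illustration only).
old_eval_nodes = [40, 160, 640, 2560, 10240]
new_eval_nodes = [40, 200, 1000, 5000, 25000]

print(effective_branching_factors(old_eval_nodes))
print(effective_branching_factors(new_eval_nodes))
```

Even a small increase compounds with depth: in these made-up numbers the second list already needs more than twice as many nodes at depth 5.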
Bo Persson
Posts: 259
Joined: Sat Mar 11, 2006 8:31 am
Location: Malmö, Sweden
Full name: Bo Persson

Re: More accurate evaluation function leads to worse play?

Post by Bo Persson »

When you have a more "accurate" evaluation, you might also get more distinct scores. If you have scores 10, 10, 10 you can get cut-offs from "no improvement", but scores 11, 10, 12 might require more search to tell them apart.

It is common to have to balance speed and "accuracy" in the program, and to realize that some evaluation terms might just be too expensive to compute. Getting the correct answer too late doesn't help.
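The point about cut-offs on equal scores can be illustrated with a toy example. This is only a sketch, assuming a plain fail-hard alpha-beta over a hand-made two-ply tree. The helpers `alphabeta`, `count_evals` and `quantize`, and the tree itself, are invented for illustration and are not taken from any engine.

```python
import math


def alphabeta(node, alpha, beta, maximizing, counter):
    """Fail-hard alpha-beta over a nested-list tree; leaves are integer scores."""
    if not isinstance(node, list):
        counter[0] += 1  # one static evaluation
        return node
    if maximizing:
        for child in node:
            alpha = max(alpha, alphabeta(child, alpha, beta, False, counter))
            if alpha >= beta:
                break  # cut-off: the opponent already has something at least as good
        return alpha
    for child in node:
        beta = min(beta, alphabeta(child, alpha, beta, True, counter))
        if alpha >= beta:
            break  # cut-off: this move cannot improve on the current best
    return beta


def count_evals(tree):
    """Return (root score, number of leaf evaluations) for a full-window search."""
    counter = [0]
    value = alphabeta(tree, -math.inf, math.inf, True, counter)
    return value, counter[0]


def quantize(tree, step):
    """Round every leaf score down to a multiple of `step` (a coarser eval)."""
    if isinstance(tree, list):
        return [quantize(child, step) for child in tree]
    return tree // step * step


# Three root moves, each with two replies; fine-grained scores.
fine_tree = [[10, 12], [11, 15], [12, 14]]
coarse_tree = quantize(fine_tree, 5)  # [[10, 10], [10, 15], [10, 10]]

print(count_evals(fine_tree))
print(count_evals(coarse_tree))
```

On this toy tree the fine-grained scores need 6 leaf evaluations and the coarse ones need 4, because a reply that merely equals the current best already causes a cut-off. That does not make the coarse scores better; it only shows why more distinct scores can cost extra nodes.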
Tibono
Posts: 128
Joined: Sat Aug 01, 2015 6:16 pm
Location: France
Full name: Eric Bonneau

Re: More accurate evaluation function leads to worse play?

Post by Tibono »

I think a better evaluation is one that your engine is more "comfortable" (i.e. efficient) with.
Getting closer to Stockfish's eval may have moved it away from the positions it handles best.
Just my 2 cents...