The value of an evaluation function

Discussion of chess software programming and technical issues.

Rebel
Posts: 6946
Joined: Thu Aug 18, 2011 12:04 pm

The value of an evaluation function

Post by Rebel »

Has anyone done a similar experiment for his EVAL?

I found quite a few surprises.

http://www.top-5000.nl/eval.htm
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: The value of an evaluation function

Post by Daniel Shawul »

Rebel wrote: Has anyone done a similar experiment for his EVAL?

I found quite a few surprises.

http://www.top-5000.nl/eval.htm
I have an engine whose evaluation uses only material and a PST (a generic table that centralizes all pieces). It was born out of necessity, since it supports many chess variants and other games. Its standard chess version is rated about 2400 Elo on CCRL. The only other "evaluation" term it probably has is an indirect, very, very crude king safety in the search: I don't remember exactly how, but it prefers lines that avoid weakening the king. So I think it is possible to reach even higher Elo ratings with material + PST alone, if your search is highly selective and you add some "spice" to the search that prefers certain lines.
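A material + PST evaluation of the kind described above fits in a few lines. This is a minimal illustration, not the engine's actual code: the piece values, the square numbering (a1 = 0 ... h8 = 63), the board representation (a dict from square to piece letter, uppercase for White) and the single centralization table shared by all pieces are assumptions.

```python
# Minimal material + PST evaluation (illustrative values, not from any engine).
# Squares are numbered a1 = 0 ... h8 = 63; uppercase pieces are White.
PIECE_VALUES = {"P": 100, "N": 300, "B": 300, "R": 500, "Q": 900, "K": 0}


def centralization_bonus(square: int, weight: int = 4) -> int:
    """One table for every piece: 0 in the corners, largest on d4/e4/d5/e5."""
    file, rank = square % 8, square // 8
    # Distance from the centre in half-squares: 1 on the d/e files, 7 on a/h.
    df = abs(2 * file - 7)
    dr = abs(2 * rank - 7)
    return weight * (14 - df - dr) // 2


def evaluate(board: dict) -> int:
    """Score in centipawns from White's point of view."""
    score = 0
    for square, piece in board.items():
        value = PIECE_VALUES[piece.upper()] + centralization_bonus(square)
        score += value if piece.isupper() else -value
    return score


# White knight on d4 (27) vs black knight on a8 (56).
print(evaluate({27: "N", 56: "n"}))  # 24
```

Because the same table is applied to every piece, the king is pulled toward the centre as well, which is one reason an engine built this way needs some other source of king safety.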

About your results:

* I am not surprised by your PST result, as a PST can be a good substitute for mobility in general. Maybe PST + mobility is redundant in that regard.

* Some have reported that pawn structure evaluation is not worth much for an attacking engine.

* King safety has been highly valued in my engine for a long time, and it made a significant difference the last time I checked.
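The point that a PST can stand in for mobility is easy to see for knights: on an empty board a knight's number of moves depends only on its square, so a table of those counts is already a centralizing PST. The sketch assumes a1 = 0 ... h8 = 63 numbering and a hypothetical bonus per available move; a real mobility term also accounts for blocked and attacked squares, which a static table cannot.

```python
# Knight move counts on an empty board, used as a PST (illustrative sketch).
# Squares are numbered a1 = 0 ... h8 = 63.
KNIGHT_JUMPS = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]


def knight_moves_from(square: int) -> int:
    """Number of knight moves from `square` that stay on the board."""
    file, rank = square % 8, square // 8
    return sum(
        1 for df, dr in KNIGHT_JUMPS if 0 <= file + df < 8 and 0 <= rank + dr < 8
    )


def knight_mobility_pst(bonus_per_move: int = 4) -> list:
    """Hypothetical knight PST: a fixed bonus per move available on an empty board."""
    return [bonus_per_move * knight_moves_from(sq) for sq in range(64)]


table = knight_mobility_pst()
print(table[0], table[1], table[27])  # 8 12 32  (a1: 2 moves, b1: 3, d4: 8)
```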
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: The value of an evaluation function

Post by bob »

Rebel wrote: Has anyone done a similar experiment for his EVAL?

I found quite a few surprises.

http://www.top-5000.nl/eval.htm
Did this a few years ago when we started the Crafty rewrite to get rid of black/white code. I removed everything, and rewrote each component part by part, and tested. Numbers were similar although I had more accurate answers since each test was 30K games as per usual...
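Component tests like these are usually reported as an Elo difference computed from the match score of the full engine against the version with one term removed. A minimal sketch using the standard logistic Elo formula; the function name and the sample result are hypothetical.

```python
import math


def elo_difference(wins: int, draws: int, losses: int) -> float:
    """Elo difference implied by a match score (logistic model).

    Positive means the first engine (e.g. the full eval) is stronger.
    """
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    if score <= 0.0 or score >= 1.0:
        raise ValueError("a 0% or 100% score gives an unbounded Elo difference")
    return -400.0 * math.log10(1.0 / score - 1.0)


# Hypothetical result: full engine vs. the same engine with one term removed.
print(round(elo_difference(wins=60, draws=20, losses=20), 1))  # 147.2
```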

I've done the same thing for search features like null-move, LMR, and pruning, not to mention testing each extension (to discover that all but checks either hurt or do not help one bit)...
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: The value of an evaluation function

Post by lucasart »

Rebel wrote: Has anyone done a similar experiment for his EVAL?

I found quite a few surprises.

http://www.top-5000.nl/eval.htm
Sounds like a good idea. I'll try removing features that might add little value. The ones I'm most skeptical about, and will begin with, are doubled, isolated and backward pawns. It's clear that removing PSQ, passed pawns or king safety would be a disaster (though these features can surely be improved).
Ideally I'd like to spot useless things and remove them. That would be a good start :)
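Doubled and isolated pawns can both be counted from one side's per-file pawn counts. A minimal sketch, assuming a1 = 0 ... h8 = 63 numbering; backward pawns need rank and attack information and are left out, and no penalty weights are included.

```python
# Doubled and isolated pawn counts for one side (illustrative sketch).
# Squares are numbered a1 = 0 ... h8 = 63; only the file (square % 8) matters here.


def pawn_structure_counts(pawn_squares) -> tuple:
    """Return (doubled, isolated) for one side's pawns."""
    per_file = [0] * 8
    for sq in pawn_squares:
        per_file[sq % 8] += 1

    # Every pawn beyond the first on a file counts as one doubled pawn.
    doubled = sum(n - 1 for n in per_file if n > 1)

    # A pawn is isolated if no friendly pawn stands on either adjacent file.
    isolated = 0
    for f, n in enumerate(per_file):
        left = per_file[f - 1] if f > 0 else 0
        right = per_file[f + 1] if f < 7 else 0
        if n and left == 0 and right == 0:
            isolated += n
    return doubled, isolated


# Pawns on a2, c2, c3, d4: one doubled pawn (c-file), one isolated pawn (a2).
print(pawn_structure_counts([8, 10, 18, 27]))  # (1, 1)
```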
Dan Honeycutt
Posts: 5258
Joined: Mon Feb 27, 2006 4:31 pm
Location: Atlanta, Georgia

Re: The value of an evaluation function

Post by Dan Honeycutt »

bob wrote: Did this a few years ago when we started the Crafty rewrite to get rid of black/white code. I removed everything, and rewrote each component part by part, and tested. Numbers were similar although I had more accurate answers since each test was 30K games as per usual...
How do you get 30K games without a lot of duplicates? Do you have a large number of opponents you test against or a bunch of starting positions that you use?

Best
Dan H.
Piotr Cichy
Posts: 75
Joined: Sun Jul 30, 2006 11:13 pm
Location: Kalisz, Poland

Re: The value of an evaluation function

Post by Piotr Cichy »

Rebel wrote: Has anyone done a similar experiment for his EVAL?

I found quite a few surprises.

http://www.top-5000.nl/eval.htm
I made a similar test with nanoSzachy about 4 years ago (http://talkchess.com/forum/viewtopic.ph ... 61&t=36104) and my results were very close to yours.

Mobility                                                 +130 ELO
Passed pawns                                              +60 ELO
Positional (rook on (half)open file, knight outpost etc)  +60 ELO
Bishop pair                                               +45 ELO
PST                                                       +45 ELO
Pawn shield around king                                   +21 ELO
Pawn structure                                            +20 ELO
King safety                                                +3 ELO

I also ran the test for the latest version of my engine (this time with 20000 games per test); here are some results:

Passed pawns                                             +129 ELO
Mobility                                                  +47 ELO
King safety                                               +41 ELO
Pawn structure                                            +39 ELO

I managed to improve the passed pawn and king safety evaluation. Interestingly, this time mobility and pawn structure were worth less, although their evaluation code had not changed. Maybe the eval components depend very strongly on each other.
CRoberson
Posts: 2053
Joined: Mon Mar 13, 2006 2:31 am
Location: North Carolina, USA

Re: The value of an evaluation function

Post by CRoberson »

Yes, I did some of this years ago. My primary experiment was to cut out everything but piece count from the eval and test the crippled engine against the full one. The result was about a 600 Elo gap; the eval made that much of a difference. However, the dumbed-down program was able to search 2 to 3 plies deeper on average, which says the eval knowledge is worth even more than that gap shows.
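To put a 600 Elo gap in perspective, the usual logistic Elo model gives the expected score of the weaker side: about 3%. A minimal sketch; the function name is hypothetical.

```python
def expected_score(elo_diff: float) -> float:
    """Expected score of a player who is `elo_diff` Elo stronger (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


print(round(expected_score(-600), 3))  # 0.031
```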

I sent my results to Bob and he tried the same idea with Crafty. He ran more test games, but the results were nearly the same.

As for the contribution of individual components, that is harder to measure than your testing suggests. A component will contribute a different amount depending on what else is in the program. Let's say you have only piece count and you add mobility to get version A. For version B, you add mobility to a program that already has everything but mobility. I think the gain of A over its base version is different from the gain of B over its base version.
jdart
Posts: 4361
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: The value of an evaluation function

Post by jdart »

Do you have scoring terms for particular kinds of material imbalance, such as a rook exchanged for a minor piece? I didn't see that in the list.
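A term of the kind asked about here could be written as a correction that applies only when one side has an extra rook and the other an extra minor piece. This is a hypothetical sketch: the piece-count dictionaries and the `adjust` weight (in centipawns, to be tuned, and possibly negative) are assumptions, not anyone's engine code. Queens and pawns are ignored.

```python
# Hypothetical rook-vs-minor imbalance term; `adjust` is an untuned placeholder.


def rook_for_minor_imbalance(white: dict, black: dict, adjust: int = 0) -> int:
    """Centipawn correction from White's point of view.

    white/black map piece letters to counts, e.g. {"R": 2, "N": 1, "B": 2}.
    Applies only when one side has exactly one extra rook and the other
    exactly one extra minor piece (the "exchange" imbalance).
    """
    rook_diff = white.get("R", 0) - black.get("R", 0)
    minor_diff = (white.get("N", 0) + white.get("B", 0)) - (
        black.get("N", 0) + black.get("B", 0)
    )
    if rook_diff == 1 and minor_diff == -1:
        return adjust  # White has the rook, Black the minor
    if rook_diff == -1 and minor_diff == 1:
        return -adjust  # Black has the rook, White the minor
    return 0


print(rook_for_minor_imbalance({"R": 2, "N": 1, "B": 2}, {"R": 1, "N": 2, "B": 2}, adjust=-25))  # -25
```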

I haven't modified my eval function significantly in some time and it is probably time to look at it again. I did try some king safety changes fairly recently but I didn't find any that were worthwhile (and some made things worse), which is not too surprising since this part has gone through a gradual tuning process over several versions.

--Jon
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: The value of an evaluation function

Post by bob »

Dan Honeycutt wrote:
bob wrote: Did this a few years ago when we started the Crafty rewrite to get rid of black/white code. I removed everything, and rewrote each component part by part, and tested. Numbers were similar although I had more accurate answers since each test was 30K games as per usual...
How do you get 30K games without a lot of duplicates? Do you have a large number of opponents you test against or a bunch of starting positions that you use?

Best
Dan H.
I have 5 good opponents I play against, and 3000 different starting positions; from each position I play a pair of games with colors reversed. 5 x 3000 x 2 = 30,000. I actually have far more than 3000 positions for cases where I am trying to measure 1-2 Elo, which takes something like 100K games...
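The reason for such large match sizes shows up in a rough error estimate. Assuming independent games and a score near 50%, the standard error of the score shrinks with the square root of the number of games and can be converted to Elo with the slope of the logistic curve at 50%. This normal-approximation sketch (function name hypothetical) gives about ±3.9 Elo at 95% confidence for 30,000 games with no draws, and about ±2.2 Elo for 100,000; draws reduce the variance and narrow the margin somewhat.

```python
import math


def elo_error_margin(games: int, draw_rate: float = 0.0, z: float = 1.96) -> float:
    """Approximate confidence margin (in Elo) around an even (50%) match score."""
    # Per-game score variance at a 50% score: wins and losses each occur with
    # probability (1 - draw_rate) / 2 and deviate from the mean by 0.5.
    variance = (1.0 - draw_rate) * 0.25
    sigma_score = math.sqrt(variance / games)
    # Slope of Elo = 400 * log10(p / (1 - p)) at p = 0.5, in Elo per unit score.
    slope = 400.0 / (math.log(10) * 0.25)
    return z * slope * sigma_score


print(round(elo_error_margin(30000), 1))   # 3.9
print(round(elo_error_margin(100000), 1))  # 2.2
```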
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: The value of an evaluation function

Post by bob »

CRoberson wrote: Yes, I did some of this years ago. My primary experiment was to cut out everything but piece count from the eval and test the crippled engine against the full one. The result was about a 600 Elo gap; the eval made that much of a difference. However, the dumbed-down program was able to search 2 to 3 plies deeper on average, which says the eval knowledge is worth even more than that gap shows.

I sent my results to Bob and he tried the same idea with Crafty. He ran more test games, but the results were nearly the same.

As for the contribution of individual components, that is harder to measure than your testing suggests. A component will contribute a different amount depending on what else is in the program. Let's say you have only piece count and you add mobility to get version A. For version B, you add mobility to a program that already has everything but mobility. I think the gain of A over its base version is different from the gain of B over its base version.
Also, a lot of common terms are distinctly correlated. Take a rook on an open file: what better way to increase its mobility? "A knight on the rim is dim" is the same thing. There are lots of terms that can almost replace each other, and they definitely interact with each other...
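The rook example can be made concrete by counting a rook's moves along its own file: an open or half-open file directly produces the extra mobility that a separate "rook on open file" bonus would also reward, so the two terms partly score the same thing. A minimal sketch, assuming a1 = 0 ... h8 = 63 numbering and separate sets of own and enemy occupied squares; horizontal moves are ignored and the function name is hypothetical.

```python
# Vertical rook mobility along its own file (illustrative sketch).
# Squares are numbered a1 = 0 ... h8 = 63.


def rook_file_mobility(rook_square: int, own: set, enemy: set) -> int:
    """Count rook moves up and down its file; a capture counts as a move."""
    file, rank = rook_square % 8, rook_square // 8
    moves = 0
    for step in (1, -1):
        r = rank + step
        while 0 <= r < 8:
            sq = r * 8 + file
            if sq in own:
                break  # blocked by a friendly piece
            moves += 1  # empty square or capture
            if sq in enemy:
                break  # cannot move past an enemy piece
            r += step
    return moves


print(rook_file_mobility(3, own={11}, enemy=set()))  # 0: own pawn on d2 closes the file
print(rook_file_mobility(3, own=set(), enemy={51}))  # 6: half-open file up to d7
```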