Has anyone done a similar experiment for his EVAL?
I found quite a couple of surprises.
http://www.top-5000.nl/eval.htm
The value of an evaluation function
-
Daniel Shawul
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: The value of an evaluation function
Rebel wrote: "Has anyone done a similar experiment for his EVAL? I found quite a couple of surprises. http://www.top-5000.nl/eval.htm"
I have an engine that uses only material and PST (one generic table that centralizes all pieces). It was born out of need, since the engine supports many chess variants and other games. The standard chess version is rated at about 2400 Elo on CCRL. The only other "evaluation" term it probably has is a very crude, indirect form of king safety in the search. I don't remember exactly how it works, but it prefers lines that avoid a weak king position. So I think it is possible to reach even higher Elo ratings with material + PST if your search is highly selective and you add some "spice" to the search that prefers certain lines.
About your results:
* I am not surprised by your PST result, as PST can be a good substitute for mobility in general. Maybe PST + mobility is redundant in that regard.
* Some have reported that pawn structure evaluation is not worth much for an attacking engine.
* King safety has been highly valued in my engine for a long time, and it made a significant difference the last time I checked.
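For readers who have not seen one, here is a minimal sketch of what a "material + PST" evaluation of the kind described above can look like. It is not taken from the engine discussed in this post: the piece values, the bonus scale, and the names `PIECE_VALUE`, `centralization_bonus`, and `evaluate` are all assumptions made for illustration.

```python
from typing import Dict

# Hypothetical piece values in centipawns (not from any engine in this thread).
PIECE_VALUE = {"P": 100, "N": 320, "B": 330, "R": 500, "Q": 900, "K": 0}


def centralization_bonus(square: int) -> int:
    """Generic 'centralize everything' table: bigger bonus nearer the centre.

    Squares are numbered 0 (a1) to 63 (h8).
    """
    file, rank = square % 8, square // 8
    # Distance from the four central squares along each axis, 0..3.
    file_dist = max(3 - file, file - 4)
    rank_dist = max(3 - rank, rank - 4)
    return 5 * (6 - (file_dist + rank_dist))  # 0 in a corner, 30 in the centre


def evaluate(board: Dict[int, str]) -> int:
    """Material + PST score from White's point of view.

    `board` maps a square index to a piece letter; uppercase is White,
    lowercase is Black. The table is symmetric, so no mirroring is needed.
    """
    score = 0
    for square, piece in board.items():
        value = PIECE_VALUE[piece.upper()] + centralization_bonus(square)
        score += value if piece.isupper() else -value
    return score
```

With a white knight on e4 (square 28) against a black knight on a8 (square 56), material cancels out and the score is just the difference in centralization. A real engine would normally use a separate table per piece type and game phase; a single shared table is the simplest possible version.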
-
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: The value of an evaluation function
Rebel wrote: "Has anyone done a similar experiment for his EVAL? I found quite a couple of surprises. http://www.top-5000.nl/eval.htm"
Did this a few years ago when we started the Crafty rewrite to get rid of black/white code. I removed everything, and rewrote each component part by part, and tested. Numbers were similar although I had more accurate answers since each test was 30K games as per usual...
I've done the same thing for search features like null-move, LMR, and pruning, not to mention testing each extension (to discover that all but checks either hurt or do not help one bit)...
-
lucasart
- Posts: 3232
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: The value of an evaluation function
Rebel wrote: "Has anyone done a similar experiment for his EVAL? I found quite a couple of surprises. http://www.top-5000.nl/eval.htm"
Sounds like a good idea. I'll try removing features that might add little value. The ones I'm most skeptical about, and will begin with, are doubled, isolated, and backward pawns. It's clear that removing PSQ, passed pawns, or king safety would be a disaster (though these features can surely be improved).
Ideally I'd like to spot useless things and remove them. That would be a good start.
-
Dan Honeycutt
- Posts: 5258
- Joined: Mon Feb 27, 2006 4:31 pm
- Location: Atlanta, Georgia
Re: The value of an evaluation function
bob wrote: "Did this a few years ago when we started the Crafty rewrite to get rid of black/white code. I removed everything, and rewrote each component part by part, and tested. Numbers were similar although I had more accurate answers since each test was 30K games as per usual..."
How do you get 30K games without a lot of duplicates? Do you have a large number of opponents you test against or a bunch of starting positions that you use?
Best
Dan H.
-
Piotr Cichy
- Posts: 75
- Joined: Sun Jul 30, 2006 11:13 pm
- Location: Kalisz, Poland
Re: The value of an evaluation function
Rebel wrote: "Has anyone done a similar experiment for his EVAL? I found quite a couple of surprises. http://www.top-5000.nl/eval.htm"
I made a similar test with nanoSzachy about 4 years ago (http://talkchess.com/forum/viewtopic.ph ... 61&t=36104) and my results were very close to yours.
Mobility                                                      +130 Elo
Passed pawns                                                   +60 Elo
Positional (rook on (half)open file, knight outpost etc)       +60 Elo
Bishop pair                                                    +45 Elo
PST                                                            +45 Elo
Pawn shield around king                                        +21 Elo
Pawn structure                                                 +20 Elo
King safety                                                     +3 Elo

Passed pawns                                                  +129 Elo
Mobility                                                       +47 Elo
King safety                                                    +41 Elo
Pawn structure                                                 +39 Elo
-
CRoberson
- Posts: 2053
- Joined: Mon Mar 13, 2006 2:31 am
- Location: North Carolina, USA
Re: The value of an evaluation function
Yes, I did some of this years ago. My primary experiment was to cut out everything but piece count from the eval and test the crippled engine against the full one. The result was about a 600 Elo gap. The eval made that much of a difference. Note that the dumbed-down program was able to search 2 to 3 plies deeper on average, which suggests the eval knowledge is worth even more than the gap alone indicates.
I sent my results to Bob and he tried the same idea with Crafty. He ran more test games, but the results were nearly the same.
As for the contribution of individual components, that is harder to pin down than your testing suggests. A component will have a different contribution depending on what else is in the program. Let's say you have only piece count and you add mobility to get version A. For version B, you add mobility to a program that already had everything but mobility in it. I think the gain of A over its base version is different from the gain of B over its base version.
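The A-versus-B comparison above can be written down as a small test harness. This is only a sketch: `measure_elo` is a hypothetical callback that would play a match with the given set of eval terms switched on (piece count always included) and return a rating relative to a fixed reference, and the numbers in `toy_elo` are invented purely to exercise the function. They are not measurements from any engine.

```python
from typing import Callable, FrozenSet, Iterable, Tuple


def marginal_gains(
    term: str,
    all_terms: Iterable[str],
    measure_elo: Callable[[FrozenSet[str]], float],
) -> Tuple[float, float]:
    """Return (gain when added to piece count only, gain when added last)."""
    bare: FrozenSet[str] = frozenset()
    full = frozenset(all_terms)
    # Version A: piece count only, then add the term.
    gain_on_bare = measure_elo(bare | {term}) - measure_elo(bare)
    # Version B: everything except the term, then add the term.
    gain_on_full = measure_elo(full) - measure_elo(full - {term})
    return gain_on_bare, gain_on_full


def toy_elo(terms: FrozenSet[str]) -> float:
    """Stand-in for real matches, with made-up numbers.

    Mobility and PST are assumed to overlap, so together they are worth
    less than the sum of their separate values.
    """
    elo = 0.0
    if "mobility" in terms:
        elo += 100.0
    if "pst" in terms:
        elo += 60.0
    if {"mobility", "pst"} <= terms:
        elo -= 40.0
    return elo


gain_a, gain_b = marginal_gains("mobility", ["mobility", "pst"], toy_elo)
```

In this toy setup the same term comes out at two different values depending on what it is added to, which is the effect described above. With real engines each call to `measure_elo` means a full match, so the two numbers also carry their own error bars.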
-
jdart
- Posts: 4361
- Joined: Fri Mar 10, 2006 5:23 am
- Location: http://www.arasanchess.org
Re: The value of an evaluation function
Do you have scoring terms for particular kinds of material imbalance such as Rook exchanged for a minor? I didn't see that in the list.
I haven't modified my eval function significantly in some time and it is probably time to look at it again. I did try some king safety changes fairly recently but I didn't find any that were worthwhile (and some made things worse), which is not too surprising since this part has gone through a gradual tuning process over several versions.
--Jon
-
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: The value of an evaluation function
bob wrote: "Did this a few years ago when we started the Crafty rewrite to get rid of black/white code. I removed everything, and rewrote each component part by part, and tested. Numbers were similar although I had more accurate answers since each test was 30K games as per usual..."
Dan Honeycutt wrote: "How do you get 30K games without a lot of duplicates? Do you have a large number of opponents you test against or a bunch of starting positions that you use? Best, Dan H."
I have 5 good opponents I play against, and I have 3000 different starting positions that I use to play a pair of games with alternating colors. 5 x 3000 x 2 = 30,000. I actually have way more than 3000 positions for those cases where I am looking to measure 1-2 Elo, which needs in the range of 100K games...
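To see why the game counts above are so large, here is a rough sketch of how the error bar on a measured Elo difference shrinks with the number of games. It assumes independent games, two evenly matched sides, and a 30% draw rate; the draw rate and the function names are assumptions for illustration, not figures from this thread. It uses the usual logistic Elo formula and a normal approximation for the 95% interval.

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by an expected score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_margin_95(games: int, draw_ratio: float = 0.3) -> float:
    """Approximate 95% error margin (in Elo) for an even match.

    With equal win and loss probabilities, the per-game variance of the
    score around 0.5 is (1 - draw_ratio) / 4.
    """
    variance = (1.0 - draw_ratio) / 4.0
    std_error = math.sqrt(variance / games)
    return elo_from_score(0.5 + 1.96 * std_error)


games_per_run = 5 * 3000 * 2          # opponents x positions x colors
margin_30k = elo_margin_95(games_per_run)
margin_100k = elo_margin_95(100_000)
```

Under these assumptions a 30,000-game run gives a margin of a little over 3 Elo either way, and 100,000 games brings it to a bit under 2 Elo, which is in line with the "1-2 Elo needs in the range of 100K games" remark. Games that share opponents and starting positions are not fully independent, so real margins may differ somewhat.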
-
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: The value of an evaluation function
CRoberson wrote: "Yes, I did some of this years ago. My primary experiment was to cut out all but piece count from the eval and test the crippled engine vs the full one. The result was about a 600 Elo gap. The eval made that much of a difference. However, the dumbed down program was able to search 2 to 3 ply deeper on average which says there is more to the value of the eval knowledge.
I sent my results to Bob and he tried the same idea with Crafty. He ran more test games, but the results were nearly the same.
As for the contribution of individual components, that is more difficult than your testing. A component will have a different contribution value based on what else is in the program. Lets say you have only piece count and you add mobility to get version A. In version B, you add mobility, but it had everything but mobility in it. I think the gain of A over its base version is different than in B over its base version."
Also, a lot of common terms are distinctly correlated. Take a rook on an open file: what better way to increase mobility? "A knight on the rim is dim" is the same kind of thing. There are lots of terms that can almost replace each other, and definitely interact with each other...
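The rook example can be made concrete with a tiny sketch. The function below counts how many squares a rook can reach along its own file, treating every blocker as a friendly piece (so captures are ignored). The function name and the simplification are assumptions for illustration only.

```python
from typing import Set


def rook_file_mobility(rook_rank: int, own_blockers: Set[int]) -> int:
    """Count squares reachable along the file from `rook_rank` (0..7).

    `own_blockers` holds the ranks on the same file occupied by friendly
    pieces; the rook stops before the first one in each direction.
    """
    count = 0
    for step in (1, -1):
        rank = rook_rank + step
        while 0 <= rank <= 7 and rank not in own_blockers:
            count += 1
            rank += step
    return count


open_file = rook_file_mobility(0, set())      # rook on the first rank, empty file
closed_file = rook_file_mobility(0, {1})      # own pawn directly in front
```

A rook on the first rank of an empty file reaches all seven squares ahead of it, while the same rook behind its own pawn reaches none. An "open file" bonus and a mobility term therefore reward much the same thing, which is why adding both tends to give less than the sum of their separate gains.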