The value of an evaluation function

Discussion of chess software programming and technical issues.

Moderators: hgm, Dann Corbit, Harvey Williamson

bpfliegel
Posts: 71
Joined: Fri Mar 16, 2012 10:16 am

Re: The value of an evaluation function

Post by bpfliegel »

bob wrote:
CRoberson wrote:Yes, I did some of this years ago. My primary experiment was to cut out all but the piece count from the eval and test the crippled engine against the full one. The result was about a 600 Elo gap; the eval made that much of a difference. However, the dumbed-down program was able to search 2 to 3 plies deeper on average, which says there is even more to the value of the eval knowledge.

I sent my results to Bob and he tried the same idea with Crafty. He ran more test games, but the results were nearly the same.

As for the contribution of individual components, that is more difficult than your testing. A component will have a different contribution depending on what else is in the program. Let's say you have only piece count and you add mobility to get version A. For version B you also add mobility, but its base version already had everything except mobility. I think the gain of A over its base version is different from the gain of B over its base version.
Also, a lot of common terms are strongly correlated. For example, a rook on an open file: what better way to increase mobility? "A knight on the rim is dim" is the same. There are lots of terms that can almost replace each other, and they definitely interact with each other...
In digital signal processing, extracted features should be as orthogonal as possible (I really loved seeing that word in Tord's article)... I think that applies in chess programming as well.
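
To make that overlap concrete, here is a minimal C sketch (illustrative only; the weights and the rook_terms helper are made up, not from any engine in this thread) of how a rook on an open file gets rewarded twice, once by the explicit open-file bonus and once again through mobility, since the open file is exactly what gives the rook its extra squares:

Code:

/* Illustrative only: two correlated eval terms scoring the same fact. */
#define OPEN_FILE_BONUS 20   /* centipawns, made-up weight */
#define MOBILITY_WEIGHT  2   /* centipawns per reachable square, made-up */

/* pawns_on_file[f] = pawns of either color on file f (0..7) */
int rook_terms(int rook_file, const int pawns_on_file[8], int reachable_squares)
{
    int score = 0;
    if (pawns_on_file[rook_file] == 0)
        score += OPEN_FILE_BONUS;                 /* explicit open-file term */
    score += MOBILITY_WEIGHT * reachable_squares; /* mobility counts the open
                                                     file's squares again */
    return score;
}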

Balint
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: The value of an evaluation function

Post by bob »

bpfliegel wrote:
bob wrote:
CRoberson wrote:Yes, I did some of this years ago. My primary experiment was to cut out all but the piece count from the eval and test the crippled engine against the full one. The result was about a 600 Elo gap; the eval made that much of a difference. However, the dumbed-down program was able to search 2 to 3 plies deeper on average, which says there is even more to the value of the eval knowledge.

I sent my results to Bob and he tried the same idea with Crafty. He ran more test games, but the results were nearly the same.

As for the contribution of individual components, that is more difficult than your testing. A component will have a different contribution depending on what else is in the program. Let's say you have only piece count and you add mobility to get version A. For version B you also add mobility, but its base version already had everything except mobility. I think the gain of A over its base version is different from the gain of B over its base version.
Also, a lot of common terms are strongly correlated. For example, a rook on an open file: what better way to increase mobility? "A knight on the rim is dim" is the same. There are lots of terms that can almost replace each other, and they definitely interact with each other...
In digital signal processing, extracted features should be as orthogonal as possible (I really loved seeing that word in Tord's article)... I think that applies in chess programming as well.

Balint
I agree. It is just not easy when you are using common chess knowledge, which in many cases is redundant, as with the well-known examples I mentioned...
bpfliegel
Posts: 71
Joined: Fri Mar 16, 2012 10:16 am

Re: The value of an evaluation function

Post by bpfliegel »

bob wrote:
bpfliegel wrote:
bob wrote:
CRoberson wrote:Yes, I did some of this years ago. My primary experiment was to cut out all but the piece count from the eval and test the crippled engine against the full one. The result was about a 600 Elo gap; the eval made that much of a difference. However, the dumbed-down program was able to search 2 to 3 plies deeper on average, which says there is even more to the value of the eval knowledge.

I sent my results to Bob and he tried the same idea with Crafty. He ran more test games, but the results were nearly the same.

As for the contribution of individual components, that is more difficult than your testing. A component will have a different contribution depending on what else is in the program. Let's say you have only piece count and you add mobility to get version A. For version B you also add mobility, but its base version already had everything except mobility. I think the gain of A over its base version is different from the gain of B over its base version.
Also, a lot of common terms are strongly correlated. For example, a rook on an open file: what better way to increase mobility? "A knight on the rim is dim" is the same. There are lots of terms that can almost replace each other, and they definitely interact with each other...
In digital signal processing, extracted features should be as orthogonal as possible (I really loved seeing that word in Tord's article)... I think that applies in chess programming as well.

Balint
I agree. It is just not easy when you are using common chess knowledge, which in many cases is redundant, as with the well-known examples I mentioned...
Yes, in most cases the features are by their very nature non-independent - and this is especially true in chess :)
F. Bluemers
Posts: 868
Joined: Thu Mar 09, 2006 11:21 pm
Location: Nederland

Re: The value of an evaluation function

Post by F. Bluemers »

Rebel wrote:Has anyone done a similar experiment for his EVAL?

I found quite a few surprises.

http://www.top-5000.nl/eval.htm
c) The incredibly low impact of Pawn Eval (doubled pawns, isolated pawns, backward pawns, pawn pressure, pawn formation) of only 2% (see match 2.4), in contrast to passed pawn EVAL (13.6%, see match 2.5).
It might be interesting to repeat this one at fixed depth, to see if, and by how much, the speed cost hurts your pawn eval.
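
Mechanically this is a small change: a minimal sketch, assuming hypothetical Position/Move types and a search() entry point (none of this is Rebel's or Crafty's actual code), of iterative deepening that stops on ply instead of on the clock:

Code:

typedef struct Position Position;      /* hypothetical engine types */
typedef int Move;

Move search(Position *pos, int depth); /* the engine's normal search */

#define FIXED_DEPTH 10

/* Iterative deepening with no time check: the full and the stripped
   version both search exactly FIXED_DEPTH plies, so any Elo difference
   measures knowledge, not speed. */
Move think_fixed_depth(Position *pos)
{
    Move best = 0;
    for (int depth = 1; depth <= FIXED_DEPTH; depth++)
        best = search(pos, depth);
    return best;
}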
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: The value of an evaluation function

Post by diep »

bob wrote:
Rebel wrote:Has anyone done a similar experiment for his EVAL?

I found quite a few surprises.

http://www.top-5000.nl/eval.htm
Did this a few years ago when we started the Crafty rewrite to get rid of the separate black/white code. I removed everything, then rewrote each component part by part and tested. The numbers were similar, although I had more accurate answers since each test was 30K games, as usual...

I've done the same thing for search features like null-move, LMR, and pruning, not to mention testing each extension (only to discover that all of them except checks either hurt or do not help one bit)...
If the numbers were similar, then you also have some major bugs in your program :)

341 5 1 1.0%

How the hell can you have 5 draws and 1 win?

Some years ago, when I started a parameter-tuning experiment, I did the same thing, though instead of returning 1, 3, 3, 5, 9 I returned higher values:

1, 4, 4, 6, 12

That gives a hard 0% score then, not 1 win and 5 draws.
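
For the record, the whole crippled eval fits in a few lines; a sketch with both value sets side by side (the counts[][] layout is an illustrative assumption, not Diep's actual code):

Code:

enum { PAWN, KNIGHT, BISHOP, ROOK, QUEEN, NPIECES };

static const int value_classic[NPIECES]  = { 100, 300, 300, 500,  900 };
static const int value_inflated[NPIECES] = { 100, 400, 400, 600, 1200 };

/* counts[side][piece] = number of pieces of that type for that side;
   returns a centipawn score from side 0's point of view.  This is the
   entire "evaluation" of the crippled engine; pass either value table. */
int material_only_eval(const int counts[2][NPIECES],
                       const int values[NPIECES])
{
    int score = 0;
    for (int p = 0; p < NPIECES; p++)
        score += values[p] * (counts[0][p] - counts[1][p]);
    return score;
}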

Also, I played 1000 games at a 1 0 time control...
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: The value of an evaluation function

Post by diep »

CRoberson wrote:Yes, I did some of this years ago. My primary experiment was to cut out all but the piece count from the eval and test the crippled engine against the full one. The result was about a 600 Elo gap; the eval made that much of a difference. However, the dumbed-down program was able to search 2 to 3 plies deeper on average, which says there is even more to the value of the eval knowledge.

I sent my results to Bob and he tried the same idea with Crafty. He ran more test games, but the results were nearly the same.

As for the contribution of individual components, that is more difficult than your testing. A component will have a different contribution depending on what else is in the program. Let's say you have only piece count and you add mobility to get version A. For version B you also add mobility, but its base version already had everything except mobility. I think the gain of A over its base version is different from the gain of B over its base version.
Something that just counts material in the simplistic 1, 3, 3, 5, 9 manner, even while searching 30 or so plies, is not 600 Elo worse.

It's a hard 0%.

It's more than a 750 Elo point difference. I don't know its exact Elo, to be honest, as simple material + PSQ is already very strong (even when searching 10+ plies less). I would guess it's around 1500 Elo or so.

The normal version of Diep is, what, 3200 Elo or so?
So that's a gap of about 1700 Elo points.

Just calculate the statistical score that something 1700 Elo points weaker is going to get over the 350 games Ed played.
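
For reference, the standard Elo model puts the weaker side's expected score at E = 1 / (1 + 10^(diff/400)); a quick check of what a 1700-point gap means over 350 games:

Code:

#include <stdio.h>
#include <math.h>

int main(void)
{
    double diff = 1700.0;   /* Elo gap */
    double games = 350.0;   /* roughly what Ed played */
    double e = 1.0 / (1.0 + pow(10.0, diff / 400.0));
    printf("expected score per game: %f\n", e);        /* ~0.000056 */
    printf("expected points over %g games: %f\n",
           games, e * games);                          /* ~0.02     */
    return 0;
}

That is roughly 0.02 points in 350 games, which is why even one win plus five draws is far outside the model.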

Honestly, I would be surprised if a 1500 Elo human at a standard time control would lose a match against such a 1, 3, 3, 5, 9 engine.

They learn quickly, you know.
But we are talking about a huge gap here.

Now please note that Diep, as opposed to Rebel, is a very aggressive engine; I would say too aggressive.

Knowledge causes that overoptimism.

Yet the fact is that in theory you should see a 0% score, not a few draws and a win.
Rebel
Posts: 6946
Joined: Thu Aug 18, 2011 12:04 pm

Re: The value of an evaluation function

Post by Rebel »

F. Bluemers wrote:
Rebel wrote:Has anyone done a similar experiment for his EVAL?

I found quite a few surprises.

http://www.top-5000.nl/eval.htm
c) The incredibly low impact of Pawn Eval (doubled pawns, isolated pawns, backward pawns, pawn pressure, pawn formation) of only 2% (see match 2.4), in contrast to passed pawn EVAL (13.6%, see match 2.5).
It might be interesting to repeat this one at fixed depth, to see if, and by how much, the speed cost hurts your pawn eval.
That was exactly the first thing I thought about. But the speed cost is only about 5%, and thus pawn eval is a very good investment for 14-15 Elo after all (see the back-of-envelope sketch below). I am more inclined to conclude that either:

1. 15 Elo for pawn eval (passers excluded) is normal;
2. a bug plays a role;
3. or a badly tuned part, or parts, plays a role.
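
A back-of-envelope version of that trade-off, assuming the common rule of thumb of very roughly 70 Elo per doubling of speed (that figure is an assumption, not a number from this thread):

Code:

#include <stdio.h>
#include <math.h>

int main(void)
{
    double elo_per_doubling = 70.0;   /* assumed rule of thumb */
    double speed_factor = 0.95;       /* pawn eval costs ~5% speed */
    double elo_cost = -elo_per_doubling * log2(speed_factor);
    printf("Elo lost to the slowdown: ~%.1f\n", elo_cost);  /* ~5.2 */
    printf("net gain for a 15 Elo term: ~%.1f\n",
           15.0 - elo_cost);                                /* ~9.8 */
    return 0;
}

Under that assumption the 5% slowdown costs only about 5 Elo, so a term worth 14-15 Elo is indeed a clear net win.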