Help on evaluation

Discussion of chess software programming and technical issues.

Moderator: Ras

xmas79
Posts: 286
Joined: Mon Jun 03, 2013 7:05 pm
Location: Italy

Help on evaluation

Post by xmas79 »

Hello,
My engine development has been on hold for about a year, but now I have some spare time and I'm going to dive into tuning my eval. As always, my preferred opponent is Fairy-Max.

Over time I added a lot of parameters to the evaluation (bishop pair, passed pawns, doubled pawns, etc.), and after a small refactoring step I tried to measure the impact each parameter has on game play. Result: practically zero, unless I push the values way out of the ballpark. The most important thing that drives the quality of play is the PST :shock: :shock: :shock: ! More than a year ago (when I ran the first tests) I had a hand-tuned PST, and I remember that I got +200 Elo against Fairy-Max. That PST is gone and another one is in, giving +120 instead (I didn't pay attention to such things at the time I replaced it).

The thing is: no matter which eval features I enable or disable (I coded them so I can disable each term individually), I see neither progression nor regression. If I disable only the PST, the result is a complete disaster.

Is the PST really that important? Or is this due to very poorly tuned evaluation terms?
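One way to make the "disable each term individually" experiment systematic is to keep every term behind a named switch and build one engine variant per switched-off term. A minimal Python sketch, assuming a hypothetical `Evaluator` class and a toy position dict with precomputed term values (this is not the author's code):

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# A hypothetical eval term: takes a position, returns centipawns (White POV).
EvalTerm = Callable[[dict], int]


@dataclass
class Evaluator:
    terms: Dict[str, EvalTerm]
    enabled: Dict[str, bool] = field(default_factory=dict)

    def evaluate(self, pos: dict) -> int:
        # Sum every term that is switched on; terms default to enabled.
        return sum(term(pos) for name, term in self.terms.items()
                   if self.enabled.get(name, True))

    def ablate(self, name: str) -> "Evaluator":
        # Return a copy with one term switched off, e.g. for an A/B match.
        flags = dict(self.enabled)
        flags[name] = False
        return Evaluator(self.terms, flags)


# Toy usage: precomputed term values stand in for a real position.
terms = {
    "material": lambda p: p["material"],
    "pst": lambda p: p["pst"],
    "bishop_pair": lambda p: p["bishop_pair"],
}
full = Evaluator(terms)
no_pst = full.ablate("pst")
pos = {"material": 100, "pst": 25, "bishop_pair": 30}
print(full.evaluate(pos), no_pst.evaluate(pos))  # 155 130
```

Each ablated variant then plays the same match. A term whose removal does not move the score beyond the error margin may be redundant with what the PST already encodes, or simply mistuned.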
Henk
Posts: 7251
Joined: Mon May 27, 2013 10:31 am

Re: Help on evaluation

Post by Henk »

At least a PST is fast: it's cheap Elo points. But a PST is difficult to update and hard to interpret. After a while you ask yourself why it got these values.
op12no2
Posts: 551
Joined: Tue Feb 04, 2014 12:25 pm
Location: Gower, Wales
Full name: Colin Jenkins

Re: Help on evaluation

Post by op12no2 »

Hi Natale,

I know from my own experience that FairyMax can be beaten pretty much 100% of the time with just a Material+PST eval.

So maybe attack it from that angle first? Simplify the eval to just Material+PST and tweak the search until it wins pretty much all of the time, and then see if adding eval params makes a difference. It *could* be that your search is damping the potential of the eval changes.
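A Material+PST-only evaluation of the kind suggested above can be sketched as follows. This is a minimal sketch, assuming a hypothetical board given as a dict from square index (0 = a1 ... 63 = h8) to piece letter (uppercase = White). The piece values and table formulas are illustrative placeholders, not tuned values from any engine. Note that the PST part costs one table lookup per piece, which is why it is nearly free.

```python
VALUES = {"P": 100, "N": 320, "B": 330, "R": 500, "Q": 900, "K": 0}


def centralization(sq: int) -> int:
    """0 on the rim ... 3 on d4/e4/d5/e5."""
    f, r = sq % 8, sq // 8
    df = max(3 - f, f - 4)  # files away from the d/e files
    dr = max(3 - r, r - 4)  # ranks away from the 4th/5th ranks
    return 3 - max(df, dr)


def build_pst() -> dict:
    # One 64-entry table per piece type, from White's point of view.
    pst = {kind: [0] * 64 for kind in VALUES}
    for sq in range(64):
        c = centralization(sq)
        pst["N"][sq] = 10 * c - 15               # knights dislike the rim
        pst["B"][sq] = 5 * c                     # mild bishop centralization
        pst["P"][sq] = 5 * max(sq // 8 - 1, 0)   # reward pawn advances
    return pst


PST = build_pst()


def evaluate(board: dict) -> int:
    """Material + PST in centipawns from White's point of view."""
    score = 0
    for sq, piece in board.items():
        kind = piece.upper()
        if piece.isupper():
            score += VALUES[kind] + PST[kind][sq]
        else:
            # Mirror the square vertically (a1 <-> a8) for Black.
            score -= VALUES[kind] + PST[kind][sq ^ 56]
    return score
```

For example, `evaluate({4: "K", 60: "k", 28: "N", 56: "n"})` (White knight on e4, Black knight on a8) returns 30: equal material, with the whole difference coming from the knight table.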
Aleks Peshkov
Posts: 916
Joined: Sun Nov 19, 2006 9:16 pm
Location: Russia
Full name: Aleks Peshkov

Re: Help on evaluation

Post by Aleks Peshkov »

I think the primary reason is that FairyMax's evaluation is just a PST, and FairyMax is a fast searcher. Take Stockfish with time odds as the opponent and your Occam's razor tuning would be counterproductive.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Help on evaluation

Post by matthewlai »

xmas79 wrote:
[...]
Is the PST really that important? Or is this due to very poorly tuned evaluation terms?
PST is indeed by far the most important evaluation feature. As an added bonus, it's also almost free.

The bishop pair is pretty significant as well, and also almost free.

After that, mobility is probably the most important.

Then king safety (just give a bonus for pieces close to the enemy king).

Unfortunately, this is where the cheap/easy stuff ends. Everything else requires more work for less benefit, and you'll have to start testing extensively.
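The "cheap" terms listed above (bishop pair, mobility, and king safety as a bonus for pieces close to the enemy king, often called king tropism) could look roughly like this. It is a sketch under stated assumptions: the board is a hypothetical dict from square (0 = a1 ... 63 = h8) to piece letter, both kings are present, all weights are made-up placeholders, and mobility is shown only for knights to keep it short (sliding pieces would need ray scanning).

```python
BISHOP_PAIR = 30                              # hypothetical bonus, centipawns
TROPISM = {"N": 3, "B": 2, "R": 2, "Q": 5}    # hypothetical, per step closer
KNIGHT_MOBILITY = 4                           # hypothetical, per reachable square
KNIGHT_STEPS = [(1, 2), (2, 1), (2, -1), (1, -2),
                (-1, -2), (-2, -1), (-2, 1), (-1, 2)]


def chebyshev(a: int, b: int) -> int:
    # King-move distance between two squares.
    return max(abs(a % 8 - b % 8), abs(a // 8 - b // 8))


def knight_mobility(board: dict, sq: int, white: bool) -> int:
    # Count knight targets that are on the board and not blocked by own pieces.
    f, r = sq % 8, sq // 8
    count = 0
    for df, dr in KNIGHT_STEPS:
        nf, nr = f + df, r + dr
        if 0 <= nf < 8 and 0 <= nr < 8:
            target = board.get(nr * 8 + nf)
            if target is None or target.isupper() != white:
                count += 1
    return count


def cheap_terms(board: dict) -> int:
    """Bishop pair + knight mobility + king tropism, White's point of view."""
    score = 0
    for white, sign in ((True, 1), (False, -1)):
        own = {sq: p for sq, p in board.items() if p.isupper() == white}
        enemy_king = next(sq for sq, p in board.items()
                          if p == ("k" if white else "K"))
        if sum(p.upper() == "B" for p in own.values()) >= 2:
            score += sign * BISHOP_PAIR
        for sq, p in own.items():
            kind = p.upper()
            # Max distance is 7, so adjacent pieces get the largest bonus.
            score += sign * TROPISM.get(kind, 0) * (7 - chebyshev(sq, enemy_king))
            if kind == "N":
                score += sign * KNIGHT_MOBILITY * knight_mobility(board, sq, white)
    return score
```

Everything here is a handful of loops over the pieces, which is why these terms are cheap compared with the pawn-structure and king-safety refinements that follow.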

Pawn features are definitely important, but not quite as important as the above (especially since you already have a pawn PST). However, with pawn hash tables they are also practically free. That's why most engines also have an elaborate pawn structure eval.
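A pawn hash table caches the pawn-structure score keyed on the pawn placement alone, so the pawn eval is recomputed only when the pawns actually change (which is rare relative to the number of evaluated nodes). A minimal sketch, assuming hypothetical doubled/isolated penalties and the same square-indexed dict board as above; a real engine would typically use a fixed-size table indexed by a Zobrist pawn key rather than an unbounded Python dict.

```python
DOUBLED = -15   # hypothetical penalty per extra pawn on a file
ISOLATED = -10  # hypothetical penalty per pawn with no friendly neighbour file


def pawn_bitboards(board: dict) -> tuple:
    # The cache key: one 64-bit mask per side, built from pawns only.
    w = b = 0
    for sq, p in board.items():
        if p == "P":
            w |= 1 << sq
        elif p == "p":
            b |= 1 << sq
    return w, b


def pawn_structure(bb: int) -> int:
    # File-based terms only, so no mirroring is needed for Black.
    counts = [0] * 8
    for sq in range(64):
        if bb >> sq & 1:
            counts[sq % 8] += 1
    score = 0
    for f, n in enumerate(counts):
        if n == 0:
            continue
        if n > 1:
            score += DOUBLED * (n - 1)
        left = counts[f - 1] if f > 0 else 0
        right = counts[f + 1] if f < 7 else 0
        if left == 0 and right == 0:
            score += ISOLATED * n
    return score


class PawnHash:
    def __init__(self):
        self.table = {}
        self.hits = 0

    def probe(self, board: dict) -> int:
        key = pawn_bitboards(board)
        if key in self.table:
            self.hits += 1          # structure seen before: free lookup
            return self.table[key]
        w, b = key
        value = pawn_structure(w) - pawn_structure(b)
        self.table[key] = value
        return value
```

Here doubled isolated pawns on a2/a3 against an isolated pawn on e7 score -25, and a second probe of the same pawn structure is served from the table.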
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
xmas79
Posts: 286
Joined: Mon Jun 03, 2013 7:05 pm
Location: Italy

Re: Help on evaluation

Post by xmas79 »

op12no2 wrote:
[...] Simplify the eval to just Material+PST and tweak the search until it wins pretty much all of the time, and then see if adding eval params makes a difference. It *could* be that your search is damping the potential of the eval changes.
Hi Colin,
this is how I started two years ago... I used to beat Fairy-Max with my PST+Material from the beginning with good margins (up to +200 Elo over 30k games). My search is OK and hasn't changed much since then. However, it was not a 100% (or close to it) win rate, and I wanted to score better against Fairy-Max. So I looked at other eval terms and kept adding them. But something must be wrong, since adding them produces no measurable effect on game play. I can "feel" it even by watching live games: it doesn't get better at chess. It must be something else... I'm pretty sure the eval terms kick in, since every eval term was debugged. But now I'm going to start debugging the whole eval again...
xmas79
Posts: 286
Joined: Mon Jun 03, 2013 7:05 pm
Location: Italy

Re: Help on evaluation

Post by xmas79 »

Aleks Peshkov wrote:I think the primary reason is that FairyMax's evaluation is just a PST, and FairyMax is a fast searcher. Take Stockfish with time odds as the opponent and your Occam's razor tuning would be counterproductive.
Hi Aleks,
thanks, but I'm not there yet... I will do some tuning once I understand what's going on here.

BTW, my search is 2x-5x faster than Fairy-Max when using only the PST. However, IMHO we should not compare the NPS of two different engines: NPS seems to me to be meaningful only when referred to the same searched tree.
xmas79
Posts: 286
Joined: Mon Jun 03, 2013 7:05 pm
Location: Italy

Re: Help on evaluation

Post by xmas79 »

matthewlai wrote:
PST is indeed by far the most important evaluation feature. As an added bonus, it's also almost free.

The bishop pair is pretty significant as well, and also almost free.

After that, mobility is probably the most important.

Then king safety (just give a bonus for pieces close to the enemy king).

Unfortunately, this is where the cheap/easy stuff ends. Everything else requires more work for less benefit, and you'll have to start testing extensively.

Pawn features are definitely important, but not quite as important as the above (especially since you already have a pawn PST). However, with pawn hash tables they are also practically free. That's why most engines also have an elaborate pawn structure eval.
Hi Matthew,
I have them all; however, I "see" that the most important feature I have is the pawn PST. This alone avoids moves such as h4, h5, a4, a5, g4, etc. I simply didn't expect it to be THAT important.

IMO the order of importance of the features is:
1) PST
2) King safety
3) Pawn structure
4) the rest.

I haven't played much with the eval recently, and I don't have much experience, so I'm probably off by some amount. But I'm starting to play with these settings in order to see where my "bottleneck" is.

Thanks.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Help on evaluation

Post by matthewlai »

xmas79 wrote:
I have them all; however, I "see" that the most important feature I have is the pawn PST. This alone avoids moves such as h4, h5, a4, a5, g4, etc. I simply didn't expect it to be THAT important.
[...]
It's probably useful not to think of the PST as just one feature, but as many features combined.

It also gives you a knight and bishop centralization bonus (which indirectly predicts mobility), a rook-on-7th bonus, a king de-centralization bonus in the opening, a king centralization bonus in the endgame, etc. All of those are very important features.

That's assuming you have phase-dependent PSTs. If you don't, adding them is probably a good idea.
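Phase-dependent PSTs are usually implemented as a "tapered" eval: each square has a middlegame and an endgame value, blended by a game-phase counter derived from the remaining non-pawn material. A sketch, assuming the common convention of weights N=1, B=1, R=2, Q=4 (24 for the starting material) and hypothetical king tables that reproduce the opening de-centralization / endgame centralization idea mentioned above:

```python
PHASE_WEIGHT = {"N": 1, "B": 1, "R": 2, "Q": 4}
MAX_PHASE = 24  # starting material: 4 N + 4 B + 4 R + 2 Q


def centralization(sq: int) -> int:
    # 0 on the rim ... 3 on d4/e4/d5/e5 (square 0 = a1 ... 63 = h8).
    f, r = sq % 8, sq // 8
    return 3 - max(max(3 - f, f - 4), max(3 - r, r - 4))


def game_phase(board: dict) -> int:
    phase = sum(PHASE_WEIGHT.get(p.upper(), 0) for p in board.values())
    return min(phase, MAX_PHASE)  # promotions could push it above 24


def king_mg(sq: int) -> int:
    # Middlegame (hypothetical): stay on the back rank, tucked near a corner.
    rank, file = sq // 8, sq % 8
    return -20 * rank + (10 if file in (1, 2, 6) else 0)


def king_eg(sq: int) -> int:
    # Endgame (hypothetical): walk towards the centre.
    return 10 * centralization(sq)


def tapered(mg: int, eg: int, phase: int) -> int:
    # Linear blend: phase 24 = pure middlegame, phase 0 = pure endgame.
    return (mg * phase + eg * (MAX_PHASE - phase)) // MAX_PHASE


def king_term(board: dict) -> int:
    phase = game_phase(board)
    score = 0
    for sq, p in board.items():
        if p == "K":
            score += tapered(king_mg(sq), king_eg(sq), phase)
        elif p == "k":
            m = sq ^ 56  # mirror so Black uses the same tables
            score -= tapered(king_mg(m), king_eg(m), phase)
    return score
```

The same `tapered` blend can be applied to every piece's table (and to other terms), so a single PST pair per piece covers both the opening and the endgame behaviour.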
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: Help on evaluation

Post by cdani »

xmas79 wrote:Hi Colin,
this is how I started two years ago... I used to beat Fairy-Max with my PST+Material from the beginning with good margins (up to +200 Elo over 30k games). My search is OK and hasn't changed much since then. However, it was not a 100% (or close to it) win rate, and I wanted to score better against Fairy-Max.
If I understand you correctly, this is probably the problem. It's better to tune other search parameters against stronger engines that are not far from yours in strength. If you are beating Fairy-Max by 70% or more, the margin is too large for it to be a relevant test.
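The reasoning can be checked with the logistic Elo model: the expected score is steepest at equal strength, so the same improvement moves the match score less when you are already far ahead. A sketch with hypothetical helper names:

```python
import math


def expected_score(elo_diff: float) -> float:
    # Logistic Elo model: expected score for the side that is elo_diff stronger.
    return 1.0 / (1.0 + 10 ** (-elo_diff / 400.0))


def score_gain(base_diff: float, improvement: float = 10.0) -> float:
    """Change in expected score from an `improvement`-Elo patch,
    given the current Elo lead `base_diff` over the test opponent."""
    return expected_score(base_diff + improvement) - expected_score(base_diff)
```

Under this model a 10-Elo patch shifts the score by roughly 1.4 percentage points against an equal opponent but only about 1.0 point at a +200 lead, so more games are needed to detect it against a much weaker opponent. (A 70% score corresponds to roughly +147 Elo.)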