Hello,
My engine development has been on hold for about a year, but now I have some spare time and I'm going to dive into tuning my eval. As always, my preferred opponent is Fairy-Max.
Over time I added a lot of parameters to the evaluation (bishop pair, passed pawns, doubled pawns, etc.), and after a small refactoring step I tried to see what impact each parameter has on game play. Result: practically zero, unless I set values far outside the reasonable range. The thing that matters most for the quality of play is the PST! More than a year ago (when I ran the first tests) I had a hand-tuned PST, and I remember scoring +200 Elo against Fairy-Max. That PST is gone and another is in, scoring +120 instead (I wasn't paying attention to such things when I replaced it).
The thing is: no matter how many eval features I enable or disable (I coded them so that each term can be switched off individually), I see neither progress nor regression. If I disable only the PST, the result is a complete disaster.
Are the PSTs really that important? Or is this due to badly tuned evaluation terms?
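For readers who want to reproduce this kind of on/off experiment, here is a minimal sketch of individually switchable eval terms. It is not the poster's code: the `Position` type, the term names, and the scores are all hypothetical, and a real engine would compute each term from the board.

```python
from typing import Callable, Dict, Set

# Hypothetical position type: a dict of precomputed per-term scores in
# centipawns. A real engine would derive these from the board instead.
Position = Dict[str, int]

# Each evaluation term is a named function (all names are illustrative).
EVAL_TERMS: Dict[str, Callable[[Position], int]] = {
    "material": lambda pos: pos.get("material", 0),
    "pst": lambda pos: pos.get("pst", 0),
    "bishop_pair": lambda pos: pos.get("bishop_pair", 0),
    "passed_pawns": lambda pos: pos.get("passed_pawns", 0),
    "doubled_pawns": lambda pos: pos.get("doubled_pawns", 0),
}


def evaluate(pos: Position, enabled: Set[str]) -> int:
    """Sum only the terms whose names are in `enabled`."""
    return sum(term(pos) for name, term in EVAL_TERMS.items() if name in enabled)


def term_contributions(pos: Position) -> Dict[str, int]:
    """Report every term separately, to see which ones dominate the total."""
    return {name: term(pos) for name, term in EVAL_TERMS.items()}
```

Logging `term_contributions` over a set of test positions is one way to check whether the extra terms are simply too small next to the PST to change any move choice.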
At least PST is fast. It's cheap Elo. But a PST is hard to update and hard to interpret: after a while you ask yourself why it has the values it has.
I know from my own experience that FairyMax can be beaten pretty much 100% of the time with just a Material+PST eval.
So maybe attack it from that angle first? Simplify the eval to just Material+PST and tweak the search until it wins pretty much all of the time, and then see whether adding eval params makes a difference. It *could* be that your search is damping the effect of the eval changes.
I think the primary reason is that FairyMax's evaluation is just PST, and FairyMax is a fast searcher. Take Stockfish with time odds as the opponent, and your Occam's razor tuning would be counterproductive.
PST is indeed by far the most important evaluation feature. As an added bonus, it's also almost free.
Bishop pair is pretty significant as well, and also almost free.
After that, mobility is probably the most important.
Then king safety (just give a bonus for pieces close to enemy king).
Unfortunately, this is where the cheap/easy stuff ends. Everything else requires more work for less benefit, and you'll have to start testing extensively.
Pawn features are definitely important, but not quite as important as above (especially since you have pawn PST already). However, with pawn hash tables, they are also practically free. That's why most engines also have elaborate pawn structure eval.
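The simple king-safety idea mentioned above (a bonus for pieces close to the enemy king) could be sketched roughly as follows. This is only an illustration under assumptions of my own: squares are numbered 0..63 with a1 = 0 and h8 = 63, distance is measured in king moves (Chebyshev distance), and the `weight` value is an arbitrary placeholder, not a tuned number.

```python
from typing import Iterable


def chebyshev_distance(sq_a: int, sq_b: int) -> int:
    """Number of king moves between two squares (0 = a1, 63 = h8)."""
    file_a, rank_a = sq_a % 8, sq_a // 8
    file_b, rank_b = sq_b % 8, sq_b // 8
    return max(abs(file_a - file_b), abs(rank_a - rank_b))


def king_proximity_bonus(piece_squares: Iterable[int],
                         enemy_king_sq: int,
                         weight: int = 2) -> int:
    """Sum a bonus per piece that grows as the piece nears the enemy king.

    The maximum distance on the board is 7, so a piece in the far corner
    gets 0 and an adjacent piece gets 6 * weight.
    """
    return sum(weight * (7 - chebyshev_distance(sq, enemy_king_sq))
               for sq in piece_squares)
```

For example, with the enemy king on g8 (square 62), a piece on f6 (square 45) is two king moves away and earns 2 * (7 - 2) = 10, while a piece on a1 earns nothing.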
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
Hi Colin,
This is how I started two years ago... I have been beating Fairy-Max with my PST+Material eval since the beginning, with good margins (up to +200 Elo over 30k games). The search is OK and hasn't changed much since then. However, it was not a 100% (or very close) win rate, and I wanted to score better against Fairy-Max. So I looked at other eval terms and kept adding them. But something must be wrong, since adding them produces no measurable effect on gameplay. I can "feel" it even by watching live games: it doesn't get better at chess. It must be something else... I'm pretty sure the eval terms kick in, since every eval term was debugged. But now I'm going to start debugging the whole eval again...
Aleks Peshkov wrote: I think the primary reason is that FairyMax's evaluation is just PST, and FairyMax is a fast searcher. Take Stockfish with time odds as the opponent, and your Occam's razor tuning would be counterproductive.
Hi Aleks,
thanks, but I'm not there yet... I will do some tuning once I understand what's going on here.
BTW, my search is 2x-5x faster than Fairy-Max when using only PST. However, IMHO we should not compare the NPS of two different engines. NPS seems to me relevant only when it refers to the same search tree.
Hi Mattew,
I have them all, but I "see" that the most important feature I have is the pawn PST. This alone avoids moves such as h4, h5, a4, a5, g4, etc. I simply didn't expect it to be THAT important.
IMO the importance of the features is the following:
1) PST
2) King safety
3) Pawn structure
4) the rest.
I haven't played much with the eval recently, and I don't have much experience, so I'm probably off by some amount. But I'm starting to play with these settings to see where my "bottleneck" is.
Thanks.
It's probably useful to not think of the PST as just one feature, but as many features combined.
It also gives you a knight and bishop centralization bonus (which indirectly predicts mobility), rook on 7th bonus, king de-centralization bonus in the opening, and centralization bonus in the endgame, etc. All those are very important features.
That's assuming you have phase-dependent PSTs. If you don't, adding them is probably a good idea.
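One common way to get phase-dependent PSTs is to keep a middlegame and an endgame value per square and interpolate between them by the remaining material (often called tapered eval). The sketch below is only an illustration: the phase weights (minor pieces 1, rooks 2, queens 4, total 24) are a widely used convention that I am assuming here, and the function names are hypothetical.

```python
from typing import Dict

# Assumed phase weights per non-pawn piece, counted for both sides together.
PHASE_WEIGHTS: Dict[str, int] = {"N": 1, "B": 1, "R": 2, "Q": 4}
# Full starting material: 4 knights + 4 bishops + 4 rooks * 2 + 2 queens * 4.
MAX_PHASE = 24


def game_phase(piece_counts: Dict[str, int]) -> int:
    """Return 24 with all pieces on the board, falling to 0 as they come off."""
    phase = sum(PHASE_WEIGHTS[piece] * count
                for piece, count in piece_counts.items()
                if piece in PHASE_WEIGHTS)
    return min(phase, MAX_PHASE)  # promotions could push the sum above 24


def tapered_score(mg_score: int, eg_score: int, phase: int) -> int:
    """Blend the middlegame and endgame PST values linearly by game phase."""
    return (mg_score * phase + eg_score * (MAX_PHASE - phase)) // MAX_PHASE
```

With a king square worth +30 in the middlegame table and -30 in the endgame table, the blended value is +30 at full material, 0 at phase 12, and -30 once all pieces are gone, which is how one table pair can encode "hide the king early, centralize it late".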
xmas79 wrote: Hi Colin,
This is how I started two years ago... I have been beating Fairy-Max with my PST+Material eval since the beginning, with good margins (up to +200 Elo over 30k games). The search is OK and hasn't changed much since then. However, it was not a 100% (or very close) win rate, and I wanted to score better against Fairy-Max.
If I understand you correctly, this is probably the problem. To tune other search parameters, it is better to test against stronger engines whose strength is not far from yours. If you are scoring 70% or more against Fairy-Max, the gap is too large for the test to be relevant.
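To relate the score percentages and Elo figures quoted in this thread, here is a small sketch assuming the standard logistic Elo model, with draws counted as half a point. The function names are my own.

```python
import math


def expected_score(elo_diff: float) -> float:
    """Expected score (0..1) for a player rated `elo_diff` above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_from_score(score: float) -> float:
    """Elo difference implied by a match score strictly between 0 and 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)
```

Under this model a 70% score corresponds to roughly +147 Elo, +120 Elo to about a 67% score, and +200 Elo to about a 76% score. The score curve flattens away from 50%, so the further the two engines are apart, the less a given Elo change moves the match result.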