sedicla wrote: My engine has an Elo of about 2500. I was wondering whether I should focus more on tuning the current evaluation terms or on adding new ideas.
I am going to start a new development cycle, and I'm not sure how much I can improve by tuning current items, or when I should stop. I have a feeling that I can improve, but maybe I will waste time and not improve that much. On the other hand, if I introduce new items, I will add more variables to the process. Anyway, I would appreciate it if anyone could comment on that...
What you want is not just a quantity of terms; you need to cover any chess principles that are not already covered.
I consider too much knowledge, i.e. the "Diep" approach, a different kind of brute force - not smart. More is not "better"; quality counts more than quantity. A good chess evaluation function does require significant quantity though. When possible, try to put your knowledge in a form where more can be added without a big performance hit. For example, pawn structure evaluation is almost free, and any single new pawn structure feature is virtually free, because the result is stored in a pawn structure hash table and only recomputed when the pawns change. You can do a lot with material signature hash tables in the same way.
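To show why extra pawn terms cost so little, here is a minimal sketch of a pawn structure cache. Everything in it is an assumption made for illustration: the `(file, rank)` square representation, the names `pawn_cache` and `pawn_structure_score`, and the placeholder penalties. A real engine would normally key the table on a Zobrist hash of the pawns rather than on a `frozenset`.

```python
from typing import Dict, FrozenSet, Tuple

Square = Tuple[int, int]  # (file 0-7, rank 0-7); hypothetical representation

# Cache keyed only by the pawn configuration of both sides.
pawn_cache: Dict[Tuple[FrozenSet[Square], FrozenSet[Square]], int] = {}
cache_stats = {"hits": 0, "misses": 0}

DOUBLED_PENALTY = 10   # placeholder weights in centipawns, not tuned values
ISOLATED_PENALTY = 15


def side_penalty(pawns: FrozenSet[Square]) -> int:
    """Penalty for the doubled and isolated pawns of one side."""
    files = [f for f, _ in pawns]
    penalty = 0
    for f in set(files):
        count = files.count(f)
        if count > 1:
            penalty += DOUBLED_PENALTY * (count - 1)
        if (f - 1) not in files and (f + 1) not in files:
            penalty += ISOLATED_PENALTY * count
    return penalty


def pawn_structure_score(white: FrozenSet[Square], black: FrozenSet[Square]) -> int:
    """Score from White's point of view, computed once per pawn configuration."""
    key = (white, black)
    cached = pawn_cache.get(key)
    if cached is not None:
        cache_stats["hits"] += 1
        return cached
    cache_stats["misses"] += 1
    # Any new pawn term added here is paid for only on a cache miss.
    score = side_penalty(black) - side_penalty(white)
    pawn_cache[key] = score
    return score
```

Because pawns move rarely compared to pieces, most positions in a search share a pawn configuration with one already seen, so the expensive part runs only on the misses.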
Piece-square tables are limited because they are not dynamic. You can add knowledge for free through them, but anything beyond the most general things is better addressed in a more dynamic way.
You really must cover all important chess principles to have a really strong evaluation function. Some of them (such as king safety) encompass a lot and are a black art; they can be improved forever and you will never feel that you have them right.
The tuning is critical. The conventional wisdom used to be that the weights did not matter that much as long as you were in the general ballpark - but we have found that is just not true. That wisdom came from an era when massive automated testing did not exist or was much more limited for the few that had it. We have gotten tons of Elo improvement over the years from tuning weights. It is at the point now where we cannot change ANY value without noticing a small Elo loss. There is the question of whether you have merely found a local optimum - I won't address that here - but try to get the really big terms right. By "big" I do not necessarily mean the heaviest weights, but the terms that define the skeleton of your evaluation function: the weights of the pieces, how they change in different phases of the game and how they interact with other pieces (such as the bishop pair and other things), as well as the basic pawn structure terms and mobility. Really get those right and in balance before tuning the secondary terms. By "big" I mean terms that affect every single game - the ubiquitous terms, you might say.
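As an illustration of what those skeleton terms look like as tunable parameters, here is a small sketch of phase-dependent piece values with a bishop pair bonus, blended by game phase (often called tapered evaluation). This is a common technique and not necessarily how any particular engine does it. All numbers are untuned placeholders, and the names `PIECE_VALUES`, `BISHOP_PAIR`, `game_phase` and `material_score` are made up for the example.

```python
from typing import Dict, Tuple

# (middlegame, endgame) values in centipawns; placeholders, not tuned values.
PIECE_VALUES: Dict[str, Tuple[int, int]] = {
    "P": (100, 120), "N": (320, 300), "B": (330, 320),
    "R": (500, 530), "Q": (950, 980),
}
BISHOP_PAIR = (30, 50)  # bonus for holding two or more bishops

# How much each piece contributes to the "middlegame-ness" of a position.
PHASE_WEIGHT = {"P": 0, "N": 1, "B": 1, "R": 2, "Q": 4}
MAX_PHASE = 24  # 12 per side: 2 knights + 2 bishops + 2 rooks * 2 + 1 queen * 4


def game_phase(white: Dict[str, int], black: Dict[str, int]) -> int:
    """MAX_PHASE with all pieces on the board, 0 with only kings and pawns."""
    phase = sum(w * (white.get(p, 0) + black.get(p, 0))
                for p, w in PHASE_WEIGHT.items())
    return min(phase, MAX_PHASE)  # clamp in case of promotions


def side_material(counts: Dict[str, int]) -> Tuple[int, int]:
    """Return (middlegame, endgame) material totals for one side."""
    mg = sum(PIECE_VALUES[p][0] * n for p, n in counts.items())
    eg = sum(PIECE_VALUES[p][1] * n for p, n in counts.items())
    if counts.get("B", 0) >= 2:
        mg += BISHOP_PAIR[0]
        eg += BISHOP_PAIR[1]
    return mg, eg


def material_score(white: Dict[str, int], black: Dict[str, int]) -> int:
    """Material balance from White's point of view, blended by game phase."""
    phase = game_phase(white, black)
    w_mg, w_eg = side_material(white)
    b_mg, b_eg = side_material(black)
    mg, eg = w_mg - b_mg, w_eg - b_eg
    return (mg * phase + eg * (MAX_PHASE - phase)) // MAX_PHASE
```

Every constant above takes part in nearly every evaluation call, which is why these are the values worth getting into balance first.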
Clop is a pretty good tool for getting things in the right ballpark. Clop is no good if you don't have the patience to run a LOT of games - it is subject to the same rules of statistics as playing matches, so you need tens of thousands of games to converge on reasonable values. We don't make very heavy use of Clop as we are good at manual tuning, but we have found it useful. When we add a new evaluation feature, it is a good tool for finding good starting values if you don't trust your own guess. We sometimes start from that and then tune manually from there, primarily with massive testing of various weights.
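To put a number on "a LOT of games", here is a rough sketch of the usual match statistics: the Elo difference implied by a match score, with an approximate 95% interval from a normal approximation. This is not what Clop does internally; it is just the standard back-of-the-envelope calculation, and the function names are invented for the example. It assumes the score and both interval bounds fall strictly between 0 and 1.

```python
import math
from typing import Tuple


def elo_from_score(score: float) -> float:
    """Elo difference implied by an expected score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_estimate(wins: int, draws: int, losses: int,
                 z: float = 1.96) -> Tuple[float, float, float]:
    """Return (elo, lower, upper) for a match result.

    z = 1.96 gives roughly a 95% interval under a normal approximation.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / n
    stderr = math.sqrt(variance / n)
    return (elo_from_score(score),
            elo_from_score(score - z * stderr),
            elo_from_score(score + z * stderr))


if __name__ == "__main__":
    for games in (1_000, 40_000):
        # Hypothetical even match with 40% draws.
        w = l = int(games * 0.3)
        d = games - w - l
        elo, lo, hi = elo_estimate(w, d, l)
        print(f"{games:>6} games: {elo:+.1f} Elo, interval [{lo:+.1f}, {hi:+.1f}]")
```

With these made-up draw rates, 1,000 games leave an uncertainty of around 17 Elo either way, while 40,000 games bring it under 3 Elo. Since a single weight change is usually worth only a few Elo, anything much less than tens of thousands of games cannot resolve it.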