Help on evaluation

Henk · Post by **Henk** » Wed Aug 19, 2015 3:49 pm

matthewlai wrote:
xmas79 wrote:
matthewlai wrote:
xmas79 wrote:Hello,
my engine development has been stopped for about one year, but now I have some spare time and I'm going to dive into tuning my eval. As always, my preferred opponent is Fairy-Max.

Over time I added a lot of parameters to the evaluation (bishop pairs, passed pawns, doubled pawns, etc.), and after a small refactoring step, I tried to see what impact each parameter has on game play. Result: practically zero, unless I really go out of the ballpark. The most important thing that drives the quality of the playing style is the PST! More than one year ago (when I ran the first tests) I had a hand-tuned PST, and I remember that I got +200 WRT Fairy-Max. That PST is gone and another is in, with +120 instead (I didn't pay attention to such things at the time I replaced it).

The thing is: no matter how many eval features I enable/disable (I coded them so I can disable each term individually), I see neither progress nor regression. If I disable only the PST, then it is a complete disaster.

Are PSTs really that important? Or is this due to very untuned evaluation terms?
PST is indeed by far the most important evaluation feature. As an added bonus, it's also almost free.

Bishop pair is pretty significant as well, and also almost free.

After that, mobility is probably the most important.

Then king safety (just give a bonus for pieces close to enemy king).

Unfortunately, this is where the cheap/easy stuff ends. Everything else requires more work for less benefit, and you'll have to start testing extensively.

Pawn features are definitely important, but not quite as important as above (especially since you have pawn PST already). However, with pawn hash tables, they are also practically free. That's why most engines also have elaborate pawn structure eval.
Hi Matthew,
I have them all, however I "see" that the most important feature I have is the pawn PST. This alone avoids moves such as h4, h5, a4, a5, g4 etc... I simply didn't expect them to be THAT important.

IMO the importance of the features is the following:
1) PST
2) King safety
3) Pawn structure
4) the rest.

I haven't played much with eval recently, and I don't have much experience, so I'm probably off by some amount. But I'm starting to play with these settings in order to see where my "bottleneck" is.

Thanks.
It's probably useful to not think of the PST as just one feature, but as many features combined.

It also gives you a knight and bishop centralization bonus (which indirectly predicts mobility), rook on 7th bonus, king de-centralization bonus in the opening, and centralization bonus in the endgame, etc. All those are very important features.

That's assuming you have phase-dependent PSTs. If you don't, it's probably a good idea.

That's also the problem with PST: it combines multiple features, which makes it difficult to debug, for instance when you want to display the king safety score on its own.
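For readers who have not used phase-dependent PSTs, here is a minimal sketch of the usual interpolation between an opening table and an endgame table. The phase weights, the toy king tables and every name in it are assumptions made for illustration, not taken from any engine in this thread.

```python
# Hypothetical sketch of a phase-dependent (tapered) PST lookup.
# Phase weights and table values are illustrative assumptions.

# Phase weight per non-pawn piece type; the full starting set sums to MAX_PHASE.
PHASE_WEIGHT = {"N": 1, "B": 1, "R": 2, "Q": 4}
MAX_PHASE = 24  # 4 knights + 4 bishops + 4 rooks * 2 + 2 queens * 4


def game_phase(piece_counts):
    """Return the phase in [0, MAX_PHASE]: MAX_PHASE = opening, 0 = bare endgame."""
    phase = sum(PHASE_WEIGHT[p] * n
                for p, n in piece_counts.items() if p in PHASE_WEIGHT)
    return min(phase, MAX_PHASE)


def tapered(opening_score, endgame_score, phase):
    """Linearly interpolate between the two scores (integer floor division)."""
    return (opening_score * phase
            + endgame_score * (MAX_PHASE - phase)) // MAX_PHASE


def center_distance(sq):
    """Distance from the centre, 0 (central squares) to 3 (edge); a1 = 0, h8 = 63."""
    file, rank = sq % 8, sq // 8
    return max(abs(2 * file - 7), abs(2 * rank - 7)) // 2


# Toy king tables: stay away from the centre early, head for it late.
KING_OPENING = [10 * center_distance(sq) for sq in range(64)]
KING_ENDGAME = [-10 * center_distance(sq) for sq in range(64)]


def king_pst(sq, phase):
    """Phase-dependent king square value."""
    return tapered(KING_OPENING[sq], KING_ENDGAME[sq], phase)
```

Keeping the opening and endgame parts as separate tables also makes it easier to print each contribution separately when debugging.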

jdart · Post by **jdart** » Wed Aug 19, 2015 9:32 pm

I think you need to test against a stronger engine.

If you do not have good king safety/king attack scoring, you will get killed by an engine that has these features.

--Jon

hgm · Post by **hgm** » Wed Aug 19, 2015 11:15 pm

Aleks Peshkov wrote: I think the primary reason is that FairyMax's evaluation is just PST, plus FairyMax is a fast searcher. Take Stockfish with time odds as the opponent and your Occam's razor tuning would be counterproductive.

Note that Fairy-Max does not really use PST, just a centralization table that is used by K, B, N and P (where for the P there is an extra, game-phase-dependent push bonus). Q and R don't use it, and just diffuse freely over the board. So it has no way to incorporate niceties such as Rooks on 7th rank, long Bishop diagonals, or asymmetry (as white and black use the same table).

In addition, it does have some explicit and implicit King safety, which in self-testing really added significant Elo. The implicit part is due to giving a hefty penalty for any King move. This makes it very allergic to checks that could force the King to move, and hence it avoids giving the opponent safe checking possibilities. The explicit part is that it tries to keep a Pawn in front of the King.

It also has some multi-Pawn evaluation, discouraging pushing if the Pawn two squares beside it is missing.
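To illustrate why a single centralization table shared by both colours cannot express something like a Rook on the 7th rank, here is a small sketch. It is not Fairy-Max's actual code; the table formula, the scale and the helper names are assumptions chosen for the example.

```python
# Illustrative sketch only: one centralization table shared by both colours.
# This is NOT Fairy-Max's code; formula and scale are assumptions.

def centralization(sq):
    """Value growing with squared distance from the board centre (a1 = 0, h8 = 63)."""
    file, rank = sq % 8, sq // 8
    return (2 * file - 7) ** 2 + (2 * rank - 7) ** 2  # 2 in the centre, 98 in a corner


TABLE = [centralization(sq) for sq in range(64)]


def move_delta(from_sq, to_sq):
    """Positional gain of a move: positive when the piece moves toward the centre."""
    return TABLE[from_sq] - TABLE[to_sq]


def square(name):
    """Convert algebraic notation such as 'e4' to a 0..63 index."""
    return (ord(name[0]) - ord("a")) + 8 * (int(name[1]) - 1)
```

Because the table is symmetric, `TABLE[square("a2")]` equals `TABLE[square("a7")]`, so the 2nd and 7th ranks look identical to it. A move like g1 to f3 still gets a positive `move_delta`, which is all the centralization term is meant to reward.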

I have been working the past week on another 'toy engine', one step up from micro-Max, if you want. The idea was that I wanted this to be a 'full-featured' engine, but without very complex speed optimizations that would require lots of code. But, unlike micro-Max, it does have true move sorting, after pushing all the moves from move generation onto a move stack.

As evaluation features it uses:
*) end-game and opening PST
*) (hashed) pawn structure (passers, isolated/backward and doubled Pawns)
*) some special material terms, such as Bishop pairs, Rooks benefitting from open files etc.
*) king shelter (pawn shield, open files in or against it)
*) king siege (squares attacked next to the king, how many and by which piece)
*) discounting of evaluation based on drawish material (KBK, KNNK, KRKN, KRNKR...)
*) recognizers for specific draw patterns in some end-games (KPK, KBPK, KQKP)
*) patterns for common piece traps (white B on a7, black P on b6, white R on h1 and K on g1, etc.)
*) piece mobility

The mobility and king siege are the expensive terms; the rest is quite cheap. (Pawn structure is cheap because it gets hashed with a high hit rate.) I calculate them as a side effect of move generation, which is a very cheap way of doing it. In the full-width search it usually tries a null move, so that gives the opponent's mobility and siege for free. In QS it does a 'dummy' null-move search, which just generates (and counts) moves, and returns immediately after. Of course, before this and before generating its own moves, it tries to stand pat on a lazy version of the evaluation, which uses the siege and mobility terms from the parent node, with the mobility corrected for the contribution the captured piece made to it (but not for mobility changes due to the moving piece and (un)blocking). That means calculating these terms hardly involves any dedicated code: just a few extra lines in the move generator, an extra conditional return immediately after it, and a dummy call.
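The idea of getting mobility as a by-product of move generation can be sketched roughly as follows. This is only an illustration restricted to sliding pieces; the board representation (a dict keyed by file and rank) and all function names are assumptions, not the engine described above.

```python
# Hypothetical sketch: mobility counted as a by-product of move generation.
# Board layout, names and the restriction to sliders are assumptions.

DIRECTIONS = {
    "R": [(1, 0), (-1, 0), (0, 1), (0, -1)],
    "B": [(1, 1), (1, -1), (-1, 1), (-1, -1)],
}
DIRECTIONS["Q"] = DIRECTIONS["R"] + DIRECTIONS["B"]


def generate_slider_moves(board, side):
    """board maps (file, rank) -> (side, piece). Returns (moves, mobility)."""
    moves, mobility = [], 0
    for (f, r), (owner, piece) in board.items():
        if owner != side or piece not in DIRECTIONS:
            continue
        for df, dr in DIRECTIONS[piece]:
            nf, nr = f + df, r + dr
            while 0 <= nf < 8 and 0 <= nr < 8:
                target = board.get((nf, nr))
                if target is not None and target[0] == side:
                    break  # blocked by an own piece
                moves.append(((f, r), (nf, nr)))
                mobility += 1  # the only line dedicated to mobility
                if target is not None:
                    break  # a capture ends the ray
                nf, nr = nf + df, nr + dr
    return moves, mobility


def count_mobility_only(board, side):
    """Stand-in for the 'dummy' null-move call: generate, count, return at once."""
    return generate_slider_moves(board, side)[1]
```

Calling `count_mobility_only` for the side not to move plays the role of the dummy null-move search: the moves are generated and counted but never searched.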

xmas79 · Post by **xmas79** » Wed Aug 19, 2015 11:20 pm

matthewlai wrote:...That's assuming you have...

I have all of this... Still trying to understand....

xmas79 · Post by **xmas79** » Wed Aug 19, 2015 11:25 pm

cdani wrote: If I understand you well, this is probably the problem. To tune other search parameters it is better to tune them against stronger engines, not far from yours. If you are beating Fairy-Max by 70% or more, the gap is too large for the test to be relevant.

This is something I thought, however I don't understand why I should play against another engine if I still lose some games against Fairy-Max (say 1 every 4). I'm asking myself how it is possible that I can't make any progress even by adding small/a ton of eval terms. Surely untuned, ok, but something should change against that specific opponent, no?

xmas79 · Post by **xmas79** » Wed Aug 19, 2015 11:27 pm

jdart wrote:I think you need to test against a stronger engine.

If you do not have good king safety/king attack scoring, you will get killed by an engine that has these features.

--Jon

I added this feature and the engine changed "style" by killing Fairy-Max in midgame instead of winning in endgame. Must be tuned of course...

xmas79 · Post by **xmas79** » Wed Aug 19, 2015 11:30 pm

hgm wrote:I have been working the past week on another 'toy engine', one step up from micro-Max, if you want..

Except for hashed pawn evaluation, all these features are present in my engine too. I could try to test it and see what's the result... Is it downloadable?

cdani · Post by **cdani** » Wed Aug 19, 2015 11:45 pm

xmas79 wrote:
cdani wrote: If I understand you well, this is probably the problem. To tune other search parameters it is better to tune them against stronger engines, not far from yours. If you are beating Fairy-Max by 70% or more, the gap is too large for the test to be relevant.
This is something I thought, however I don't understand why I should play against another engine if I still lose some games against Fairy-Max (say 1 every 4). I'm asking myself how it is possible that I can't make any progress even by adding small/a ton of eval terms. Surely untuned, ok, but something should change against that specific opponent, no?

You said it yourself: it is because you beat Fairy-Max anyway, so you don't see the effect of the changes very often.

So you will see the effect of the changes more often playing against more advanced engines. Just try it and you will see the results.
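To put the percentages and Elo figures in this thread side by side, here is a small sketch using the standard logistic Elo formula. The function names are made up for the example, and it treats the match score as a single number (draws counted as half a point).

```python
import math


def expected_score(elo_diff):
    """Expected score (0..1) for a player rated elo_diff above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_from_score(score):
    """Elo difference implied by a match score strictly between 0 and 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def score_shift(base_elo, gain):
    """How much the expected score moves when strength rises by `gain` Elo."""
    return expected_score(base_elo + gain) - expected_score(base_elo)
```

Under this model a 75% score corresponds to roughly +191 Elo, and `score_shift(190, 10)` is smaller than `score_shift(0, 10)`: the same improvement moves the result less when the match is already lopsided.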

Daniel Anulliero · Post by **Daniel Anulliero** » Thu Aug 20, 2015 4:28 pm

I think tuning your eval to win every game against Fairy-Max (so against only one engine) is dangerous.
Your eval ends up adapted to just one engine.
Testing your changes against engines from 1700 to 2300 Elo would be better, imho.
