Help on evaluation


Henk
Posts: 7220
Joined: Mon May 27, 2013 10:31 am

Re: Help on evaluation

Post by Henk »

matthewlai wrote:
xmas79 wrote:
matthewlai wrote:
xmas79 wrote:Hello,
my engine development has been stopped for about one year, but now I have some spare time and I'm going to dive into tuning my eval. As always, my preferred opponent is Fairy-Max.

Over time I added a lot of parameters to the evaluation (bishop pair, passed pawns, doubled pawns, etc.), and after a small refactoring step I tried to see what impact each parameter has on game play. Result: practically zero, unless I really go out of the ballpark. The most important thing that drives the quality of the playing style is the PST :shock: :shock: :shock: ! More than one year ago (when I ran the first tests) I had a hand-tuned PST, and I remember that I got +200 WRT Fairy-Max. That PST is gone and another is in, with +120 instead (I didn't pay attention to such things at the time I replaced it).

The thing is: no matter how many eval features I enable/disable (I coded them so I can disable each term individually), I see neither progression nor regression. If I disable only the PST, then it is a complete disaster.

Are the PSTs so important? Or is this due to very untuned evaluation terms?
PST is indeed by far the most important evaluation feature. As an added bonus, it's also almost free.

Bishop pair is pretty significant as well, and also almost free.

After that, mobility is probably the most important.

Then king safety (just give a bonus for pieces close to enemy king).

Unfortunately, this is where the cheap/easy stuff ends. Everything else requires more work for less benefit, and you'll have to start testing extensively.

Pawn features are definitely important, but not quite as important as above (especially since you have pawn PST already). However, with pawn hash tables, they are also practically free. That's why most engines also have elaborate pawn structure eval.
Hi Matthew,
I have them all, however I "see" that the most important feature I have is the pawn PST. This alone avoids moves such as h4, h5, a4, a5, g4 etc... I simply didn't expect them to be THAT important.

IMO the importance of the features is the following:
1) PST
2) King safety
3) Pawn structure
4) the rest.

I haven't played much with eval recently, and I don't have much experience. So I'm probably off by some amount. But I'm starting to play with these settings in order to see where my "bottleneck" is.

Thanks.
It's probably useful to not think of the PST as just one feature, but as many features combined.

It also gives you a knight and bishop centralization bonus (which indirectly predicts mobility), rook on 7th bonus, king de-centralization bonus in the opening, and centralization bonus in the endgame, etc. All those are very important features.

That's assuming you have phase-dependent PSTs. If you don't, it's probably a good idea.
That's also the problem: a PST has multiple features combined, which makes it difficult to debug. For instance, when you want to display the king safety score, part of it is hidden inside the PST.
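To make the "phase-dependent PST" idea and the debugging concern concrete, here is a minimal sketch of a tapered PST evaluation that also returns a per-piece-type breakdown, so the blended contribution of each table can at least be displayed separately. Everything in it is an assumption for illustration: the phase weights, the `centralization` helper, the table values and the function names come from no engine mentioned in this thread.

```python
# Hypothetical sketch: tapered (phase-dependent) PST scoring with a breakdown.
# Squares are 0..63 with a1 = 0 and h8 = 63; scores are from White's view.

MAX_PHASE = 24  # assumed: 8 minors * 1 + 4 rooks * 2 + 2 queens * 4
PHASE_WEIGHT = {"P": 0, "N": 1, "B": 1, "R": 2, "Q": 4, "K": 0}


def centralization(sq):
    """Return 3 for the four centre squares down to 0 on the board edge."""
    f, r = sq % 8, sq // 8
    return 3 - max(abs(2 * f - 7), abs(2 * r - 7)) // 2


def pst_value(piece, sq):
    """Return an (opening, endgame) pair; the values are made up."""
    c = centralization(sq)
    if piece == "K":
        return (-10 * c, 10 * c)  # hide in the opening, centralize later
    if piece in ("N", "B"):
        return (5 * c, 5 * c)
    return (0, 0)


def game_phase(pieces):
    """Return MAX_PHASE with all pieces on the board, 0 in a bare endgame."""
    return min(MAX_PHASE, sum(PHASE_WEIGHT[p] for _, p, _ in pieces))


def evaluate_pst(pieces):
    """pieces is a list of (color, piece, square) tuples, color 'w' or 'b'."""
    phase = game_phase(pieces)
    breakdown = {}
    for color, piece, sq in pieces:
        sign = 1 if color == "w" else -1
        if color == "b":
            sq ^= 56  # mirror vertically so both sides share one table
        mg, eg = pst_value(piece, sq)
        blended = (mg * phase + eg * (MAX_PHASE - phase)) / MAX_PHASE
        breakdown[piece] = breakdown.get(piece, 0) + sign * blended
    return sum(breakdown.values()), breakdown


if __name__ == "__main__":
    # White Ke1, Ne4 against a lone black Ke8.
    total, parts = evaluate_pst([("w", "K", 4), ("b", "K", 60), ("w", "N", 28)])
    print(total, parts)
```

Printing the breakdown next to the explicit king safety term is one way to see how much "king safety" is really coming from the king's PST.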
jdart
Posts: 4367
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Help on evaluation

Post by jdart »

I think you need to test against a stronger engine.

If you do not have good king safety/king attack scoring, you will get killed by an engine that has these features.

--Jon
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Help on evaluation

Post by hgm »

Aleks Peshkov wrote:I think the primary reason is that FairyMax evaluation is just PST plus FairyMax is a fast searcher. Take Stockfish with time odds as the opponent and your Occam's razor tuning would be counterproductive.
Note that Fairy-Max does not really use PST, just a centralization table that is used by K, B, N and P (where for the P there is an extra, game-phase-dependent push bonus). Q and R don't use it, and just diffuse freely over the board. So it has no way to incorporate niceties such as Rooks on the 7th rank, long Bishop diagonals, or asymmetry (as white and black use the same table).

In addition, it does have some explicit and implicit King safety, which in self-testing really added significant Elo. The implicit part is due to giving a hefty penalty for any King move. This makes it very allergic to checks that could force the King to move, and hence it avoids giving the opponent safe checking possibilities. The explicit part is that it tries to keep a Pawn in front of the King.

It also has some multi-Pawn evaluation, discouraging pushing if the Pawn two squares beside it is missing.

I have been working the past week on another 'toy engine', one step up from micro-Max, if you want. The idea was that I wanted this to be a 'full-featured' engine, but without very complex speed optimizations that would require lots of code. But, unlike micro-Max, it does have true move sorting, after pushing all the moves from move generation onto a move stack.

As evaluation features it uses:
*) end-game and opening PST
*) (hashed) pawn structure (passers, isolated/backward and doubled Pawns)
*) some special material terms, such as Bishop pairs, Rooks benefitting from open files etc.
*) king shelter (pawn shield, open files in or against it)
*) king siege (squares attacked next to the king, how many and by which piece)
*) discounting of evaluation based on drawish material (KBK, KNNK, KRKN, KRNKR...)
*) recognizers for specific draw patterns in some end-games (KPK, KBPK, KQKP)
*) patterns for common piece traps (white B on a7, black P on b6, white R on h1 and K on g1, etc.)
*) piece mobility

The mobility and king siege are the expensive terms; the rest is quite cheap. (Pawn structure is cheap because it gets hashed with a high hit rate.) I calculate these two as a side effect of move generation, which is a very cheap way of doing it. In the full-width search it usually tries null move, so that gives the opponent's mobility and siege for free. In QS it does a 'dummy' null-move search, which just generates (and counts) moves, and returns immediately after. Of course, before this, and before generating its own moves, it tries to stand pat on a lazy version of the evaluation, which uses the siege and mobility terms from the parent node, with the mobility corrected for the contribution the captured piece made to it (but not for mobility changes due to the moving piece and (un)blocking). That means calculating these terms hardly involves any dedicated code, just a few extra lines in the move generator, an extra conditional return immediately after it, and a dummy call.
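As a rough illustration of getting mobility "for free" from move generation, here is a minimal sketch. It is not hgm's code: the board representation (a dict keyed by (file, rank)), the names `generate_moves` and `mobility_difference`, and the choice to leave king moves out of the count are all assumptions. Moves are pseudo-legal only and pawns are skipped for brevity.

```python
# Hypothetical sketch: count mobility as a side effect of move generation.
# board maps (file, rank), each 0..7, to a (color, piece) tuple.

STEPS = {
    "N": [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)],
    "B": [(1, 1), (1, -1), (-1, 1), (-1, -1)],
    "R": [(1, 0), (-1, 0), (0, 1), (0, -1)],
}
STEPS["Q"] = STEPS["B"] + STEPS["R"]
STEPS["K"] = STEPS["Q"]
SLIDERS = {"B", "R", "Q"}


def generate_moves(board, side):
    """Return (pseudo-legal moves, mobility count) for the given side."""
    moves = []
    mobility = 0
    for (f, r), (color, piece) in board.items():
        if color != side or piece not in STEPS:
            continue  # pawns are omitted from this sketch
        for df, dr in STEPS[piece]:
            tf, tr = f + df, r + dr
            while 0 <= tf < 8 and 0 <= tr < 8:
                target = board.get((tf, tr))
                if target is not None and target[0] == side:
                    break  # blocked by an own piece
                moves.append(((f, r), (tf, tr)))
                if piece != "K":
                    mobility += 1  # the only line dedicated to mobility
                if target is not None or piece not in SLIDERS:
                    break  # capture, or a leaper that makes a single step
                tf, tr = tf + df, tr + dr
    return moves, mobility


def mobility_difference(board, side):
    """Own mobility minus the opponent's, the latter from a 'dummy' generation."""
    other = "b" if side == "w" else "w"
    _, own = generate_moves(board, side)
    _, opp = generate_moves(board, other)  # generate, count, discard the moves
    return own - opp


if __name__ == "__main__":
    board = {(0, 0): ("w", "R"), (4, 0): ("w", "K"),
             (1, 7): ("b", "N"), (4, 7): ("b", "K")}
    print(mobility_difference(board, "w"))
```

In a real search the counts would be stored with the node so that a child can reuse them for a lazy stand-pat, as described above; that bookkeeping is left out here.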
xmas79
Posts: 286
Joined: Mon Jun 03, 2013 7:05 pm
Location: Italy

Re: Help on evaluation

Post by xmas79 »

matthewlai wrote:...That's assuming you have...
I have all of this... Still trying to understand....
xmas79
Posts: 286
Joined: Mon Jun 03, 2013 7:05 pm
Location: Italy

Re: Help on evaluation

Post by xmas79 »

cdani wrote:If I understand you well, this is probably the problem. To tune other search parameters it is better to test against stronger engines, not far from yours in strength. If you are beating Fairy-Max by 70% or more, the gap is too large for the test to be relevant.
This is something I thought, however I don't understand why I should play against another engine if I still lose some games against Fairy-Max (say 1 in every 4). I'm asking myself how it is possible that I can't make any progress whether I add a few or a ton of eval terms. Surely untuned, ok, but something should change against that specific opponent, no?
xmas79
Posts: 286
Joined: Mon Jun 03, 2013 7:05 pm
Location: Italy

Re: Help on evaluation

Post by xmas79 »

jdart wrote:I think you need to test against a stronger engine.

If you do not have good king safety/king attack scoring, you will get killed by an engine that has these features.

--Jon
I added this feature and the engine changed "style" by killing Fairy-Max in midgame instead of winning in endgame. Must be tuned of course...
xmas79
Posts: 286
Joined: Mon Jun 03, 2013 7:05 pm
Location: Italy

Re: Help on evaluation

Post by xmas79 »

hgm wrote:I have been working the past week on another 'toy engine', one step up from micro-Max, if you want..
Except for hashed pawn evaluation, all these features are present in my engine too. I could try to test it and see what's the result... Is it downloadable?
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: Help on evaluation

Post by cdani »

xmas79 wrote:
cdani wrote:If I understand you well, this is probably the problem. To tune other search parameters it is better to test against stronger engines, not far from yours in strength. If you are beating Fairy-Max by 70% or more, the gap is too large for the test to be relevant.
This is something I thought, however I don't understand why I should play against another engine if I still lose some games against Fairy-Max (say 1 in every 4). I'm asking myself how it is possible that I can't make any progress whether I add a few or a ton of eval terms. Surely untuned, ok, but something should change against that specific opponent, no?
You said it: it is because you beat Fairy-Max anyway, so you don't see the effect of the changes very often.

So you will see the effect of the changes more often playing against more advanced engines. Just try it and you will see the results.
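To put rough numbers on this, the sketch below converts a score fraction to an Elo difference with the usual logistic formula, and estimates how many games a small gain needs before it stands out from noise. The game-count estimate is a simplification I am assuming for illustration: it ignores draws and uses a plain two-standard-error rule, so treat it as an order-of-magnitude guide only. The function names are made up.

```python
# Hypothetical sketch: score <-> Elo conversion and a crude sample-size estimate.
import math


def elo_from_score(score):
    """Elo difference implied by a score fraction (logistic model)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def score_from_elo(diff):
    """Expected score fraction for a given Elo advantage."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def games_for_two_sigma(base_elo, gain_elo):
    """Games needed for a gain to reach two standard errors (no draws assumed)."""
    p0 = score_from_elo(base_elo)
    p1 = score_from_elo(base_elo + gain_elo)
    variance = p1 * (1.0 - p1)  # per-game variance of a win/loss result
    return math.ceil(4.0 * variance / (p1 - p0) ** 2)


if __name__ == "__main__":
    print(round(elo_from_score(0.75)))   # losing 1 game in 4, no draws
    print(games_for_two_sigma(0, 10))    # evenly matched opponent
    print(games_for_two_sigma(191, 10))  # opponent about 191 Elo weaker
```

Losing one game in four corresponds to a gap of roughly 191 Elo. Under this simplified model, the same 10 Elo improvement needs more games to show up against the weaker opponent than against an evenly matched one.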
Daniel Anulliero
Posts: 759
Joined: Fri Jan 04, 2013 4:55 pm
Location: Nice

Re: Help on evaluation

Post by Daniel Anulliero »

I think tuning your eval to win every game against Fairy-Max (so against only one engine) is dangerous.
Your eval ends up adapted to just that one engine.
Testing your changes against engines from 1700 to 2300 Elo would be better, IMHO.