How much evaluation before diminishing returns?

Post by **lojic** » Wed Feb 03, 2021 10:19 pm

I've been perusing the Stockfish Evaluation Guide, and there's a ton of great info there for a newbie like me. Are there rules of thumb for maximizing the benefit/cost ratio of evaluation terms? Obviously anything added to the evaluation slows the engine down, so at some point the shallower search caused by the slowdown will outweigh the benefit of the better evaluation.

I realize this depends on many factors, and I expect the difference between mailbox (my engine) and bitboards (most, if not all, top engines) may be pretty large.

I guess I'm looking for a short list of evaluation factors that are generally viewed as "worth it". For example, here is the JavaScript code for the middle game evaluation from the above guide:


```javascript
// Helper functions are defined in the guide. Each term is computed for White,
// then for the colour-flipped position (i.e. Black), and the difference is added.
function middle_game_evaluation(pos, nowinnable) {
  var v = 0;
  v += piece_value_mg(pos) - piece_value_mg(colorflip(pos));
  v += psqt_mg(pos) - psqt_mg(colorflip(pos));
  v += imbalance_total(pos);
  v += pawns_mg(pos) - pawns_mg(colorflip(pos));
  v += pieces_mg(pos) - pieces_mg(colorflip(pos));
  v += mobility_mg(pos) - mobility_mg(colorflip(pos));
  v += threats_mg(pos) - threats_mg(colorflip(pos));
  v += passed_mg(pos) - passed_mg(colorflip(pos));
  v += space(pos) - space(colorflip(pos));
  v += king_mg(pos) - king_mg(colorflip(pos));
  if (!nowinnable) v += winnable_total_mg(pos, v);
  return v;
}
```

I think everyone agrees that the sum of piece values is worth it :)
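
A minimal sketch of the `x(pos) - x(colorflip(pos))` pattern from the guide's code, applied to the piece-value sum. It assumes a hypothetical mailbox board stored as 8 strings (rank 8 first, uppercase for White, `.` for empty) and placeholder piece values rather than the guide's; `colorflip`, `pieceValue` and `materialEval` here are illustrative, not the guide's implementations.

```javascript
// Hypothetical mailbox board: 8 strings, board[0] = rank 8, board[7] = rank 1.
// Uppercase = White, lowercase = Black, "." = empty square.
// Placeholder centipawn values (not the Stockfish guide's numbers).
const VALUE = { P: 100, N: 320, B: 330, R: 500, Q: 900, K: 0 };

// Mirror the board top-to-bottom and swap colours, so that a function
// written for White can also score Black.
function colorflip(board) {
  return board
    .slice()
    .reverse()
    .map(row =>
      [...row]
        .map(ch => (ch === ch.toUpperCase() ? ch.toLowerCase() : ch.toUpperCase()))
        .join("")
    );
}

// Sum of White's piece values (uppercase letters only).
function pieceValue(board) {
  let sum = 0;
  for (const row of board) {
    for (const ch of row) sum += VALUE[ch] ?? 0;
  }
  return sum;
}

// Same shape as the guide: White's term minus the flipped position's term.
function materialEval(board) {
  return pieceValue(board) - pieceValue(colorflip(board));
}

const start = [
  "rnbqkbnr", "pppppppp", "........", "........",
  "........", "........", "PPPPPPPP", "RNBQKBNR",
];
console.log(materialEval(start)); // 0
```

Writing each term once for White and reusing it through `colorflip` keeps the evaluation symmetric by construction, at the cost of evaluating every term twice.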

Is the only way to answer this question to make a change, have the engine play N games to see whether it improves, and keep repeating that process? I guess I need to invest the time to write whatever is needed to run my engine in an app that can automate playing many games overnight.
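
If you go the match-testing route, the result still has to be turned into an Elo estimate with an error bar. A rough sketch, assuming the standard logistic Elo model and treating games as independent trials; `eloFromScore` and `eloInterval` are hypothetical helpers, and the normal-approximation interval is only a rough guide for small samples.

```javascript
// Logistic Elo model: expected score s corresponds to an Elo difference
// of -400 * log10(1/s - 1). Valid only for 0 < s < 1.
function eloFromScore(score) {
  return -400 * Math.log10(1 / score - 1);
}

// Rough 95% interval from a wins/draws/losses record, using the per-game
// variance of the score (draws included) and a normal approximation.
// Bounds can come out as NaN/Infinity for tiny or lopsided samples.
function eloInterval(wins, draws, losses) {
  const n = wins + draws + losses;
  const score = (wins + 0.5 * draws) / n;
  const variance =
    (wins * (1 - score) ** 2 + draws * (0.5 - score) ** 2 + losses * score ** 2) / n;
  const margin = 1.96 * Math.sqrt(variance / n);
  return {
    elo: eloFromScore(score),
    low: eloFromScore(score - margin),
    high: eloFromScore(score + margin),
  };
}

// Example: result of an overnight match between the new and old version.
console.log(eloInterval(150, 200, 100));
```

The interval shrinks only with the square root of the number of games, which is why small evaluation tweaks need thousands of games before the error bar is smaller than the effect.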

Post by **mvanthoor** » Wed Feb 03, 2021 10:31 pm

The few things that always seem to be large Elo winners are:

1. Detecting passed pawns (even better if they're defended), so you can create your own and stop your opponent's.
2. Detecting doubled/tripled pawns (so you can avoid them, and create them for your opponent).
3. Mobility
4. King safety

Not necessarily in that order; it depends on the engine. TSCP, for example, gains its 1725 CCRL Elo rating almost exclusively from points 1 and 2, against engines that don't have these features yet. My engine is faster, searches deeper, and has better PSTs than TSCP, but TSCP is still about 30 Elo stronger. It ruins the pawn structure and creates one (or five) passed pawns, almost to the exclusion of anything else. Because my engine has only material count and PSTs (and no hash table either, which is a handicap in an endgame with few pieces), it doesn't recognize the danger. Sometimes it even loses from a winning position, because the promotion of TSCP's passed pawns is _just_ beyond its current search horizon.
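
A sketch of passed-pawn detection for a mailbox engine, assuming a hypothetical 8-string board (row 0 = rank 8, uppercase = White). A pawn counts as passed when no enemy pawn stands ahead of it on its own or an adjacent file, and as defended when a friendly pawn sits diagonally behind it. The bonus table and the defended bonus are made-up placeholders.

```javascript
// Placeholder bonuses indexed by relative rank (0 = own back rank).
const PASSED_BONUS = [0, 10, 15, 25, 40, 60, 90, 0];
const DEFENDED_BONUS = 20;

// No enemy pawn ahead on the same or an adjacent file.
function isPassed(board, row, file) {
  const white = board[row][file] === "P";
  const enemy = white ? "p" : "P";
  const step = white ? -1 : 1; // White moves towards row 0, Black towards row 7
  for (let r = row + step; r >= 0 && r < 8; r += step) {
    for (let f = file - 1; f <= file + 1; f++) {
      if (f >= 0 && f < 8 && board[r][f] === enemy) return false;
    }
  }
  return true;
}

// A friendly pawn diagonally behind protects this pawn.
function isDefended(board, row, file) {
  const pawn = board[row][file];
  const behind = row + (pawn === "P" ? 1 : -1);
  if (behind < 0 || behind > 7) return false;
  return (file > 0 && board[behind][file - 1] === pawn) ||
         (file < 7 && board[behind][file + 1] === pawn);
}

// Passed-pawn score from White's point of view.
function passedPawnScore(board) {
  let score = 0;
  for (let row = 0; row < 8; row++) {
    for (let file = 0; file < 8; file++) {
      const ch = board[row][file];
      if ((ch !== "P" && ch !== "p") || !isPassed(board, row, file)) continue;
      const relRank = ch === "P" ? 7 - row : row;
      let bonus = PASSED_BONUS[relRank];
      if (isDefended(board, row, file)) bonus += DEFENDED_BONUS;
      score += ch === "P" ? bonus : -bonus;
    }
  }
  return score;
}

const board = [
  "....k...",
  "p.......",
  "........",
  "....P...", // e5, defended by d4
  "...P....", // d4
  "........",
  "........",
  "....K...",
];
console.log(passedPawnScore(board)); // 75
```

The nested scan is cheap on a mailbox board; it is also exactly the kind of Pawn-only computation that can be cached in a Pawn hash table.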

Easy evaluation terms to add are:

1. A bonus for the bishop pair.
2. A bonus for a rook or queen on a (half-)open file. A fully open file gets a larger bonus than a half-open one.
3. A bonus for attacking one of the opponent's castling squares (using square_attacked()) while he has not yet castled, to try and keep his king in the middle of the board.

I haven't implemented those in my evaluation yet. I know they're good to have, because in a real over-the-board game, the more of 1-3 you have, especially combined with greater mobility than your opponent, the better your attacking chances.
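
A sketch of terms 1 and 2 from the list above, scored from White's point of view on a hypothetical 8-string board (row 0 = rank 8, uppercase = White). The bonus values are placeholders to be tuned; term 3 needs an attack test like `square_attacked()` and is left out here.

```javascript
const BISHOP_PAIR = 30, OPEN_FILE = 20, HALF_OPEN_FILE = 10; // placeholder values

function countOnFile(board, file, piece) {
  let n = 0;
  for (let r = 0; r < 8; r++) if (board[r][file] === piece) n++;
  return n;
}

// Bishop pair plus rook/queen file bonuses for one side.
function easyTerms(board, white) {
  const [own, opp] = white ? ["P", "p"] : ["p", "P"];
  const bishop = white ? "B" : "b";
  const heavy = white ? ["R", "Q"] : ["r", "q"];
  let score = 0;
  let bishops = 0;
  for (let r = 0; r < 8; r++) {
    for (let f = 0; f < 8; f++) {
      const ch = board[r][f];
      if (ch === bishop) bishops++;
      if (!heavy.includes(ch)) continue;
      const ownPawns = countOnFile(board, f, own);
      const oppPawns = countOnFile(board, f, opp);
      if (ownPawns === 0 && oppPawns === 0) score += OPEN_FILE;
      else if (ownPawns === 0) score += HALF_OPEN_FILE; // only enemy pawns on the file
    }
  }
  if (bishops >= 2) score += BISHOP_PAIR;
  return score;
}

function easyEval(board) {
  return easyTerms(board, true) - easyTerms(board, false);
}

const board = [
  "r...k...",
  "p.......",
  "........",
  "........",
  "........",
  "........",
  "PPP.PPPP",
  "..BRKB..", // rook on the open d-file, bishop pair
];
console.log(easyEval(board)); // 50
```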

Post by **hgm** » Wed Feb 03, 2021 11:58 pm

There is this interesting argument: if you add knowledge to an engine that slows it down by 1%, but doesn't improve its play enough to compensate for that, you would not want it. But on an engine that is twice as slow because it already has far more knowledge in its eval, adding the same knowledge would only slow it down by half a percent. And if the added knowledge was not already covered, the benefit it gives could be the same. In that case it could be desirable, because the slowdown is smaller. This is why engines can be classified as fast or slow searchers: the latter have lots of knowledge slowing them down, but still lose strength when any of that knowledge is eliminated, because in percentage terms removing it doesn't speed them up enough.
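
The arithmetic behind this argument, as a sketch under a made-up cost model (time per node = search overhead + evaluation cost, in arbitrary units). The conversion to Elo assumes a fixed Elo gain per doubling of speed, passed in as `eloPerDoubling`; that figure is an assumption, not a measured constant, and it varies by engine and time control.

```javascript
// Made-up cost model (arbitrary units per node): search overhead + evaluation.
// Adding a term of cost termCost makes every node (base + eval + term) / (base + eval)
// times as expensive, i.e. a relative slowdown of termCost / (base + eval).
function relativeSlowdown(baseCost, evalCost, termCost) {
  return termCost / (baseCost + evalCost);
}

// ASSUMPTION: strength changes by a fixed eloPerDoubling per doubling of speed.
// Being (1 + slowdown) times slower costs log2(1 + slowdown) doublings.
function eloLostToSlowdown(slowdown, eloPerDoubling) {
  return eloPerDoubling * Math.log2(1 + slowdown);
}

const fastSlowdown = relativeSlowdown(0.5, 0.5, 0.01); // 1 unit per node: 1%
const slowSlowdown = relativeSlowdown(0.5, 1.5, 0.01); // twice as slow: 0.5%

const eloPerDoubling = 60; // placeholder, not a measured value
console.log(eloLostToSlowdown(fastSlowdown, eloPerDoubling));
console.log(eloLostToSlowdown(slowSlowdown, eloPerDoubling));
```

If the term's knowledge is worth more than the Elo lost to the slowdown, it pays off; the same term can pass that test on the slow engine and fail it on the fast one.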

Indeed, Pawn structure is a pretty important evaluation term. Micro-Max only surpassed TSCP after I greatly increased its aggression with Pawns. Recognizing passers was asking too much, as it would require lots of code. But adding a PST-like bonus that raises the value of a 7th-rank Pawn to 2.5 Pawns, and that of a 6th-rank Pawn (even if not a passer) to 1.7 Pawns, did the trick.
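
A sketch of such a PST-like Pawn bonus, indexed by relative rank so Black's Pawns are mirrored. Only the 6th- and 7th-rank values come from the post; the 1.0 Pawn value on ranks 2 to 5 is an assumption, and this is not Micro-Max's actual code.

```javascript
// Pawn value in centipawns by relative rank (index 0 = rank 1 ... 7 = rank 8).
// Ranks 6 and 7 use the 1.7 / 2.5 Pawn figures; ranks 2-5 at 1.0 Pawn are an
// ASSUMPTION, and ranks 1 and 8 can never hold a Pawn.
const PAWN_BY_RANK = [0, 100, 100, 100, 100, 170, 250, 0];

// rank: 1..8 as seen from White's side; Black's Pawns are mirrored.
function pawnValue(rank, isWhite) {
  const relative = isWhite ? rank : 9 - rank;
  return PAWN_BY_RANK[relative - 1];
}

console.log(pawnValue(7, true));  // 250: White Pawn on the 7th rank
console.log(pawnValue(3, false)); // 170: Black Pawn on its 6th rank
```

Because it is just a table lookup, the term costs practically nothing per node, which fits the benefit/cost question above.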

Note that the number of half-open files, the number of doubled Pawns and the total number of Pawns are NOT independent parameters. I suppose you would only want to award half-open files when you have Rooks to exploit them. But even then, in the presence of Rooks the open-file bonus encourages the creation of doubled Pawns just as much as a direct penalty on doubled Pawns discourages it, assuming both are so much smaller than a full Pawn that the engine will never sacrifice a Pawn to create a half-open file.
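
A quick check of this dependence for one side, assuming Pawns are given as a list of file indices. Every extra Pawn stacked on a file leaves one more file without a friendly Pawn, so with P Pawns and D extra (doubled) Pawns the number of files without friendly Pawns is 8 - P + D. A file bonus b and a doubled-Pawn penalty d then contribute b(8 - P) + (b - d)D, so with equal weights the doubled-Pawn term cancels. `pawnFileStats` and `structureScore` are hypothetical names; "files without friendly Pawns" lumps half-open and fully open files together.

```javascript
// One side's Pawns given as file indices 0..7 (a..h).
function pawnFileStats(pawnFiles) {
  const perFile = new Array(8).fill(0);
  for (const f of pawnFiles) perFile[f]++;
  const filesWithoutPawns = perFile.filter(n => n === 0).length;
  // Every Pawn beyond the first on a file counts as one doubled Pawn.
  const doubledExtra = perFile.reduce((sum, n) => sum + Math.max(0, n - 1), 0);
  return { pawns: pawnFiles.length, filesWithoutPawns, doubledExtra };
}

// File bonus minus doubled-Pawn penalty for one side.
// (In practice you would only award the file bonus when Rooks are present.)
function structureScore(pawnFiles, fileBonus, doubledPenalty) {
  const s = pawnFileStats(pawnFiles);
  return fileBonus * s.filesWithoutPawns - doubledPenalty * s.doubledExtra;
}

const healthy = [0, 1, 2, 3, 4, 5, 6, 7];
const doubled = [2, 2, 3, 4, 5, 5, 5, 6]; // same Pawn count, three doubled Pawns
const stats = pawnFileStats(doubled);
console.log(stats.filesWithoutPawns === 8 - stats.pawns + stats.doubledExtra); // true
console.log(structureScore(healthy, 10, 10), structureScore(doubled, 10, 10)); // 0 0
```

With a file bonus larger than the doubled-Pawn penalty, the doubled structure would actually score higher, which is the hidden incentive described above.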

Also note that doubled Pawns are not unconditionally bad. It is mainly isolated doubled Pawns that are bad. But a single isolated Pawn is already bad in itself, and an isolated doubled Pawn would get twice that penalty, so an additional penalty for the doubling doesn't do that much. Pawn structure is quite important, but just penalizing doubled Pawns won't give you much improvement. You have to be more subtle than that.
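
A sketch of a per-Pawn isolated penalty, assuming Pawns given as file indices and a placeholder penalty value. Because every Pawn on an isolated file is counted, an isolated doubled Pawn automatically gets twice the single-Pawn penalty, as described above.

```javascript
const ISOLATED_PENALTY = 15; // placeholder centipawns per isolated Pawn

// One side's Pawns as file indices 0..7. A Pawn is isolated when neither
// adjacent file holds a friendly Pawn; every Pawn on such a file counts.
function countIsolated(pawnFiles) {
  const perFile = new Array(8).fill(0);
  for (const f of pawnFiles) perFile[f]++;
  let count = 0;
  for (let f = 0; f < 8; f++) {
    const left = f > 0 ? perFile[f - 1] : 0;
    const right = f < 7 ? perFile[f + 1] : 0;
    if (perFile[f] > 0 && left === 0 && right === 0) count += perFile[f];
  }
  return count;
}

console.log(countIsolated([0, 2, 3]) * ISOLATED_PENALTY);    // 15: one isolated a-Pawn
console.log(countIsolated([0, 0, 2, 3]) * ISOLATED_PENALTY); // 30: doubled isolated a-Pawns
```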

Most advanced engines use a separate Pawn hash table, with a separate hash key to which only the Pawns contribute (sometimes Pawns + King). Since Pawn structures change only slowly, you can get very high hit rates even on a 'small' table (like 1MB). That makes evaluation of the Pawn structure very fast, no matter how complex the algorithms that actually compute it. Apart from a score bonus, you can store lots of other information that helps in calculating evaluation terms that also depend on other pieces. E.g. you can store the location of the most-advanced passer, to determine how favorably the Kings are placed relative to it (e.g. whether it is an 'unstoppable passer' in a Pawn ending). Or the quality of the Pawn shield on each of the 4 wings, where the actual location of the King determines which one you include in the total evaluation. Or how much your castling rights are still worth. Or which files are open or half-open. Or how much the value of the white and the black Bishop (if you have one) should be adjusted for the number of friendly Pawns on its color. In KingSlayer I use a 32-byte entry in the Pawn hash (of which only 4 bytes are the key signature) to store all that information.
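
A minimal sketch of a Pawn hash table, not KingSlayer's code: Zobrist keys for Pawns only (so the key can be updated incrementally with two XORs when a Pawn moves), the low bits of a 64-bit key select the slot, and the high 32 bits are kept as a 4-byte signature. `PawnHashTable`, `pawnKey` and `evaluatePawns` are hypothetical names; `evaluatePawns` stands in for whatever expensive Pawn-structure analysis you cache, and square 0 = a1 is an assumed numbering.

```javascript
// Deterministic xorshift64 generator (BigInt) for Zobrist keys.
function makeRng(seed) {
  let s = BigInt.asUintN(64, seed);
  return () => {
    s ^= BigInt.asUintN(64, s << 13n);
    s ^= s >> 7n;
    s ^= BigInt.asUintN(64, s << 17n);
    return s;
  };
}

// PAWN_KEYS[colour][square], colour 0 = White, square 0 = a1 ... 63 = h8.
const rng = makeRng(0x9e3779b97f4a7c15n);
const PAWN_KEYS = [0, 1].map(() => Array.from({ length: 64 }, () => rng()));

// Only Pawns contribute, so a Pawn move updates the key with two XORs.
function pawnKey(whitePawns, blackPawns) {
  let key = 0n;
  for (const sq of whitePawns) key ^= PAWN_KEYS[0][sq];
  for (const sq of blackPawns) key ^= PAWN_KEYS[1][sq];
  return key;
}

// Stand-in for an expensive Pawn-structure evaluation (placeholder terms).
function evaluatePawns(whitePawns, blackPawns) {
  const doubled = squares => {
    const perFile = new Array(8).fill(0);
    for (const sq of squares) perFile[sq & 7]++;
    return perFile.reduce((sum, n) => sum + Math.max(0, n - 1), 0);
  };
  let openFiles = 0xff; // bit f set = no Pawn of either colour on file f
  for (const sq of [...whitePawns, ...blackPawns]) openFiles &= ~(1 << (sq & 7));
  return { score: 10 * (doubled(blackPawns) - doubled(whitePawns)), openFiles };
}

class PawnHashTable {
  constructor(log2Entries) {
    this.size = 1 << log2Entries;
    this.mask = BigInt(this.size - 1);
    this.signatures = new Uint32Array(this.size); // 4-byte check per slot
    this.entries = new Array(this.size).fill(null);
    this.hits = 0;
    this.misses = 0;
  }

  // Return the cached entry for key, computing and storing it on a miss
  // (always-replace policy).
  probe(key, compute) {
    const index = Number(key & this.mask);
    const signature = Number(key >> 32n); // upper 32 bits of the 64-bit key
    if (this.entries[index] !== null && this.signatures[index] === signature) {
      this.hits++;
      return this.entries[index];
    }
    this.misses++;
    const entry = compute();
    this.entries[index] = entry;
    this.signatures[index] = signature;
    return entry;
  }
}

const table = new PawnHashTable(10); // 1024 slots
const white = [8, 9, 10, 11, 12, 13, 14, 15];   // a2..h2
const black = [48, 49, 50, 51, 52, 53, 54, 55]; // a7..h7
const key = pawnKey(white, black);
table.probe(key, () => evaluatePawns(white, black)); // miss: computed
table.probe(key, () => evaluatePawns(white, black)); // hit: cached
console.log(table.hits, table.misses); // 1 1
```

A real engine would keep the key in its position state and XOR Pawn keys in and out in make/unmake, and would store the extra items listed above (passer locations, shield quality, open files) in the entry alongside the score.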
