Passer evaluation

Discussion of chess software programming and technical issues.

Moderator: Ras

hgm
Posts: 28353
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Passer evaluation

Post by hgm »

I have been trying to use Fairy-Max for measuring piece values in the chess variant Metamachy. It is not just that many fairy pieces participate in this variant: the board is 12x12, and Pawns are allowed a double push everywhere on the board. Both these aspects have a profound effect on piece values, so I cannot even be sure of the values of the orthodox pieces.

My first tests revealed that the match results are much less sensitive to an initial material imbalance than in orthodox Chess: Knight odds resulted in a score of only 70%, while in orthodox Chess Pawn odds already scores nearly that much (65-68%). Pawn odds here gave only a 54.5% score, hardly more than the white advantage in Chess.

I figured this might be not so much a problem intrinsic to Metamachy as an artifact caused by Fairy-Max. The latter has no real Pawn-structure evaluation, and in particular cannot recognize passers, while passers in Metamachy are much more dangerous than in Chess: the double push lets them outrun most pieces, and allows them to move over an attacked square in their path, so that you really have to block them to prevent their advance. (Fortunately other Pawns can still capture them en passant on the skipped square, so the concept of a passer is identical in Chess and Metamachy.) This situation leads to many 'surprise wins', where one of the players (which could very well be the one that is behind) suddenly finds an unstoppable promotion appearing within the horizon. That causes significant additional randomization of the results, pushing them towards 50%.

So the value of Pawn structure is on average much larger in Metamachy than in Chess, also because you start with 12 Pawns each instead of 8. During the middle game the value of the Pawn structure drifts randomly when both players are totally ignorant of it. To cure that I made a special version of Fairy-Max for these tests, which does identify passers and can score them. Of course this will only do any good if the scoring is sensible as well.

For identifying the passers I used the 2-pass algorithm I first saw in TSCP: the first pass determines the backward-most Pawn in every file (for each player), and the second pass determines whether there are any opponent Pawns in front of it in its own or the adjacent files. Because this process is rather expensive (Fairy-Max has no piece list, so it has to scan the entire 144-square board to find its Pawns), I use a Pawn-hash table to store the result. Perhaps I was a bit stingy on sizes there: it has only 16K entries, and uses a 32-bit pawn key, so that the signature is only 18 bits.

Anyway, it seems to work. I gave passers a 16cP bonus on their starting rank, increasing to 96cP when they are only one step from promotion. I also gave some extra bonus for when they are protected. A quick test shows there is no terrible slowdown of the nps, and the passer-aware version appears to beat the ignorant one with 58.5% (plus a rather large error margin).

This was just a first try; it is possible that much better play could still be achieved with different scoring of the passers. In other engines I noted that scoring passers can be a mixed blessing, especially w.r.t. encouraging their advance: advancing them too early often makes them indefensible, so that you needlessly lose the most valuable Pawns you had. So perhaps I should switch to game-phase-dependent scoring: holding back the Pawns early on to keep them safe when they are not protected by another Pawn, and only encouraging pushing without protection late in the game.

Code: Select all

int ww[16], bb[16] = { 16 }; /* backward-most white/black Pawn per file; entries 0 and BW+1 are off-board sentinels (0 = no white Pawn, 16 = no black Pawn) */
int Peval()
{ /* passer bonus indexed by distance from promotion, plus a bonus for being protected */
  static int bonus[] = { 0, 0, 4, 6, 6, 10, 10, 16, 16, 24, 24, 0 },
             prot[]  = { 0, 0, 0, 2, 2, 3, 3, 4, 4, 5, 5, 0 };
  int f, r, s=2000; /* offset keeps the score positive, so it fits in 14 bits for hashing */
  bb[BW+1] = 16;
  for(f=0; f<BW; f++) { /* pass 1: find the backward-most Pawn of each color in every file */
    ww[f+1] = 0; bb[f+1] = 16;
    for(r=1; r<BH-1; r++) {
      int p = b[f+16*r] & 15;
      if(p == 1) ww[f+1] = r;                  /* white: remember the last one found */
      if(p == 2 && bb[f+1] == 16) bb[f+1] = r; /* black: remember the first one found */
    }
  }
  for(f=0; f<BW; f++) { /* pass 2: a Pawn with no opponent Pawn ahead of it in this or an adjacent file is a passer */
    r = ww[f+1];
    if(r && r < bb[f+1] && r <= bb[f] && r <= bb[f+2]) /* white passer, possibly protected */
      s += bonus[11-r] + (ww[f] == r+1 || ww[f+2] == r+1)*prot[11-r];
    r = bb[f+1];
    if(r != 16 && r > ww[f+1] && r >= ww[f] && r >= ww[f+2]) /* black passer */
      s -= bonus[r] + (bb[f] == r-1 || bb[f+2] == r-1)*prot[r];
  }
  return s;
}

int JJ; // 32-bit pawn key; the low 14 bits index the table, the high 18 bits serve as signature
int pawnTab[1<<14]; // 16K entries, each holding score ^ key
int Probe(int k)
{
  int i=JJ&0x3FFF, s=pawnTab[i]^JJ;
  if(s & 0xFFFFC000) s = Peval(), pawnTab[i] = s ^ JJ; // nonzero high bits = signature mismatch: recompute
  return (s - 2000)*k >> 1; // strip the offset and scale by the caller-supplied factor k
}
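The game-phase-dependent scoring speculated about above could be grafted onto this quite cheaply: keep two bonus tables instead of one and interpolate. A possible sketch, where the tables, the 0-256 phase scale, and the function name are all invented for illustration, not taken from Fairy-Max:

```c
#include <assert.h>

/* Hypothetical tapered passer bonus: interpolate between a cautious
   middle-game table and a more generous end-game table, weighted by
   the remaining non-Pawn material. All numbers are illustrative.
   Indexing follows the bonus[] table above. */
static int mgBonus[] = { 0, 0, 2, 3, 3,  5,  5,  8,  8, 12, 12, 0 };
static int egBonus[] = { 0, 0, 8,12,12, 20, 20, 32, 32, 48, 48, 0 };

/* phase runs from 256 (opening) down to 0 (pure Pawn ending) */
int TaperedPasser(int rankIndex, int phase)
{
  assert(rankIndex >= 0 && rankIndex < 12);
  return (mgBonus[rankIndex]*phase + egBonus[rankIndex]*(256 - phase)) >> 8;
}
```

With such a scheme an advanced passer that scores 12 in the opening automatically grows to 48 in the ending, which is the kind of 'hold back early, push late' behavior described above.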
hgm
Posts: 28353
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Passer evaluation

Post by hgm »

Even though play is better, the more accurate scoring of Pawn structure did not enable the engine to benefit much more from Pawn odds.

But fortunately I now found the true 'randomizing factor'. An attempt to measure the Queen value gave me the clue. I first tried a Q vs 2R imbalance; the Rooks were massacred. Then I tried Q vs 2R + minor. Again the Queen won big. Even with Q vs 3R the Queen won. Q > 3R??? That didn't seem right...

The problem was that the 12x12 game I was testing has no castling, but instead a 2-square initial jump of the King. On 12x12 that still leaves the King pretty much in the center. Since Fairy-Max has no real King-safety evaluation, it left the King there completely unshielded in a large open space (because even when you still have many Pawns, they advance a lot on such a large board). This gives a Queen a field day: once she manages to slip behind the Pawns she can check and fork to her heart's content, gobbling up several minors or even Rooks, deciding the game in her favor. And it is not much more difficult for a Queen to do this when behind in Pawns or other pieces. But of course it is impossible when you have no Queen.

Fairy-Max relies for its King safety entirely on castling quickly (giving a hefty bonus for doing that), and on effectively freezing the King (and the Pawn directly in front of it) by penalizing other moves. In my initial test setup none of that was possible: the King started on the 1st rank and the Pawns on the 3rd, so there wasn't even a Pawn to freeze. I have now switched to a test setup where castling is allowed, with an initial position that has all pieces on the 2nd rank. Pawn odds (deleting the h-Pawn) then does result in a 65% score.


[diag]
ranks=12
files=12
firstRank=1
promoChoice=L
symmetry=rotate
Pawn::fmWfceFfnnmD::a3-l3
Camel:M:::k2
Prince::KfmnD:Commoner:j2
Cannon:::Canon:c2
Elephant::FA::b2
Knight:N:::d2,i2
Bishop::::e2,h2
Rook::::a2,l2
Eagle:G:FyafsF:Hawk:
Lion::KNAD::
Queen::::f2
King::KisO4::g2
[/diag]
lithander
Posts: 915
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Passer evaluation

Post by lithander »

hgm wrote: Wed May 11, 2022 11:54 am I have been trying to use Fairy-Max for measuring piece values in the chess variant Metamachy. It is not just that many fairy pieces participate in this variant: the board is 12x12, and Pawns are allowed a double push everywhere on the board. Both these aspects have a profound effect on piece values, so I cannot even be sure of the values of the orthodox pieces.
Couldn't you infer the piece values with a process similar to that used for Texel tuning? You'd need a large enough quantity of annotated positions where you know which side is winning. Then the configuration of material values that minimizes the error in predicting the outcome is probably a good approximation of the material values. Gradient descent works better for me than Texel's original method, speed-wise.

And the same approach should also work for passed-pawn evaluation. Just have a dedicated PSQT for passed pawns and the tuner should spit out in detail how valuable a passed pawn is (in addition to its material value) on a specific square. Here's an example from Leorik's tuner:

Code: Select all

PassedPawns - MG
    0,    0,    0,    0,    0,    0,    0,    0,
   37,   58,   17,   40,   30,   61,   11,  -15,
   51,   22,   24,   16,  -16,   11,  -30,  -23,
    9,    7,    3,    1,    2,   26,  -30,  -18,
   12,  -11,  -12,  -12,  -15,   -1,   -5,   17,
   -1,   -5,   -9,  -28,    7,   31,   22,   26,
    4,   11,   13,   -2,   15,   19,   32,   -6,
    0,    0,    0,    0,    0,    0,    0,    0,

PassedPawns - EG
    0,    0,    0,    0,    0,    0,    0,    0,
   66,   30,   71,   25,   38,   -7,   71,  111,
   81,  113,   77,   73,   79,   77,  132,  139,
   69,   64,   53,   44,   30,   14,  100,   84,
   27,   52,   40,   35,   42,   26,   50,   19,
   16,   26,   25,   47,   -1,  -27,   -1,  -13,
    3,    2,  -11,   16,  -10,  -15,  -23,   16,
    0,    0,    0,    0,    0,    0,    0,    0,
The tables get combined like this:
score(phase) = MG + phase * EG
which means the EG values modify the MG values but don't replace them.

Looking at the tables I suspect the dataset is both too small and has a strong bias towards king-side castling. But it captures the essence: passed pawns are more valuable in the endgame than in the early game phase, and the further they advance the better. It is also good to know in what ballpark the bonus should be. Even if I don't use the tables in the end, they are good guidance for implementing a handcrafted pawn-structure evaluation.
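In code the combination rule above is just a per-square linear blend. A minimal sketch, assuming phase runs from 0.0 in the middle game to 1.0 in the end game (the function name is mine, and Leorik's actual phase computation may differ):

```c
/* Blend the MG and EG passed-pawn tables quoted above: the EG value
   is a phase-weighted correction added on top of the MG value.
   phase: 0.0 = pure middle game, 1.0 = pure end game. */
double PassedPawnBonus(const int mg[], const int eg[],
                       int square, double phase)
{
  return mg[square] + phase * eg[square];
}
```

So a passer scoring MG 37 and EG 66 is worth 37 in the middle game and 103 in a pure endgame, with a smooth transition in between.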
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
hgm
Posts: 28353
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Passer evaluation

Post by hgm »

The problem is getting the annotated positions. Only a very small fraction of the games would contain positions with a given complex imbalance. Most games you generate might never have a quiet piece imbalance, just an advantage of one or more Pawns. Imbalances where one side is a full piece ahead are not useful; the game there is already decided, and they only tell you that pieces have a large positive value.

If you start games from an initial position with that imbalance, all games contain it.

By selecting key imbalances (such as BB-NN, BP-R, NN-R, Q-RR) you generate exactly those positions that would contribute most to the squared error for a given piece-value deviation. I don't see how taking individual positions rather than games would be an improvement. It would weight the results differently, because some imbalances might take on average fewer moves to win than others. But somehow I feel that makes them more important rather than less.

Another issue is the quality of the games the positions are taken from. If that is low, because the players did not understand some basic strategic concept (like passers or king safety), it introduces a large randomization in the mapping of advantage to result. That means you need more positions or games to get a reliable average, even when you fit an evaluation that does take account of that concept. It also makes it difficult to get a representative set. In games between engines that do not evaluate king safety, the king safety will always be poor, while in games between more knowledgeable players it might be predominantly good. And if a Queen is worth more when the opponent has poor king safety...
lithander
Posts: 915
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Passer evaluation

Post by lithander »

I have to admit that so far I have cheated: I used existing labeled data-sets and just stuck with one that gave good results for me. I believe it would be possible to start with a trivial initialization of the piece-square tables, play a few thousand self-play games to generate annotated positions, and train less-trivial coefficients from them. Rinse, repeat.
That passed pawns are extra valuable should become visible in the data before the engine knows about it. So I think you can write an engine that just knows about 'features' and train all the coefficients of how those features should impact the evaluation from scratch, based on self-play games of the still-ignorant engine. But I have yet to prove it! (Which I plan to do eventually.)
hgm wrote: Tue May 17, 2022 7:38 am Another issue is the quality of the games the positions are taken from. If that is low, because the players did not understand some basic strategic concept (like passers or king safety), it introduces a large randomization in the mapping of advantage to result. That means you need more positions or games to get a reliable average, even when you fit an evaluation that does take account of that concept. It also makes it difficult to get a representative set. In games between engines that do not evaluate king safety, the king safety will always be poor, while in games between more knowledgeable players it might be predominantly good. And if a Queen is worth more when the opponent has poor king safety...
I have always struggled intuitively with the idea that imbalances even have this "correct" value. Intuitively it feels like the value of a passed pawn or king safety depends on both players: if both players are unaware, the value is low, because you need luck to create and profit from this imbalance. If both players know the concept, the value should be moderate, because they will factor it into their play and try to deny each other the advantage. It should have the highest "value" when only one player is aware of it and the other isn't, because then the knowledgeable player can more often use the imbalance to his advantage.
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
hgm
Posts: 28353
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Passer evaluation

Post by hgm »

lithander wrote: Tue May 17, 2022 1:29 pm I have to admit that so far I have cheated: I used existing labeled data-sets and just stuck with one that gave good results for me. I believe it would be possible to start with a trivial initialization of the piece-square tables, play a few thousand self-play games to generate annotated positions, and train less-trivial coefficients from them. Rinse, repeat.
That passed pawns are extra valuable should become visible in the data before the engine knows about it. So I think you can write an engine that just knows about 'features' and train all the coefficients of how those features should impact the evaluation from scratch, based on self-play games of the still-ignorant engine. But I have yet to prove it! (Which I plan to do eventually.)
I largely agree with that. But for that scheme to work you should use an engine that can recognize the features, even if initially it does not know how to score them. So the relevant evaluation terms all have to be programmed.

The disadvantage is that it becomes a very tedious process, where you have to go through several iterations of playing games and tuning. And initially it learns all kinds of wrong things (which only work because of opponent stupidity), which it has to unlearn in later iterations. Currently it takes me about 5 min per game (single CPU, though I can play a few games in parallel, depending on what else I am using the PC for). And even then Fairy-Max reaches only 5-6 ply in the middle game. So playing 'a few thousand games' is easier said than done.
I have always struggled intuitively with the idea that imbalances even have this "correct" value. Intuitively it feels like the value of a passed pawn or king-safety depends on both players: If both players are unaware the value is low because you need luck to create and profit form this imbalance. If both players know the concept the value should be moderate because they will factor it into their play and try to deny each other the advantage. It should have the highest "value" when only one player is aware of it and the other isn't because then the knowledgeable player can more often use that imbalance to his advantage.
I think the point is that however stupid engines may be, tactically they are always strong. This means they are pretty good at exploiting weaknesses even without knowing about them. If one player messes up his King safety especially badly, e.g. by advancing his entire Pawn shield, sooner or later the opponent will discover "hey, I have a mate-in-8!". Or that a Queen has to be sacrificed to prevent the mate. So as long as it knows that checkmate or gaining a Queen is good, it won't hesitate to exploit the opponent's weak King safety.

And you don't have to know that a Rook is stronger than a Bishop to experience the fact that in the end-game you gobble up opponent Pawns much more easily with the Rook than with the Bishop, which instead loses its Pawns to the Rook. What you want to prevent, though, is that the engine ignorantly trades its Rook for a Bishop because it thinks this is a neutral trade. But when the values are not set equal, one of the two players will always avoid the trade in self-play. It doesn't even matter which value you set higher.

The important thing is indeed whether an initial imbalance is preserved long enough to express itself in the score when the players don't recognize it as valuable. E.g. it won't help to start one King behind a Pawn shield and the other exposed if the engine early in the opening happily advances and trades all the Pawns that were in the shield. So I guess a necessary condition is that the engine should consider the feature valuable enough to preserve it over longer times. Then it doesn't matter so much whether it scores the advantage correctly; the value just has to be large enough to not needlessly squander it.
JohnWoe
Posts: 529
Joined: Sat Mar 02, 2013 11:31 pm

Re: Passer evaluation

Post by JohnWoe »

hgm wrote: Tue May 17, 2022 2:56 pm
lithander wrote: Tue May 17, 2022 1:29 pm I have to admit that so far I have cheated: I used existing labeled data-sets and just stuck with one that gave good results for me. I believe it would be possible to start with a trivial initialization of the piece-square tables, play a few thousand self-play games to generate annotated positions, and train less-trivial coefficients from them. Rinse, repeat.
That passed pawns are extra valuable should become visible in the data before the engine knows about it. So I think you can write an engine that just knows about 'features' and train all the coefficients of how those features should impact the evaluation from scratch, based on self-play games of the still-ignorant engine. But I have yet to prove it! (Which I plan to do eventually.)
I largely agree with that. But for that scheme to work you should use an engine that can recognize the features, even if initially it does not know how to score them. So the relevant evaluation terms all have to be programmed.

The disadvantage is that it becomes a very tedious process, where you have to go through several iterations of playing games and tuning. And initially it learns all kinds of wrong things (which only work because of opponent stupidity), which it has to unlearn in later iterations. Currently it takes me about 5 min per game (single CPU, though I can play a few games in parallel, depending on what else I am using the PC for). And even then Fairy-Max reaches only 5-6 ply in the middle game. So playing 'a few thousand games' is easier said than done.
I have always struggled intuitively with the idea that imbalances even have this "correct" value. Intuitively it feels like the value of a passed pawn or king safety depends on both players: if both players are unaware, the value is low, because you need luck to create and profit from this imbalance. If both players know the concept, the value should be moderate, because they will factor it into their play and try to deny each other the advantage. It should have the highest "value" when only one player is aware of it and the other isn't, because then the knowledgeable player can more often use the imbalance to his advantage.
I think the point is that however stupid engines may be, tactically they are always strong. This means they are pretty good at exploiting weaknesses even without knowing about them. If one player messes up his King safety especially badly, e.g. by advancing his entire Pawn shield, sooner or later the opponent will discover "hey, I have a mate-in-8!". Or that a Queen has to be sacrificed to prevent the mate. So as long as it knows that checkmate or gaining a Queen is good, it won't hesitate to exploit the opponent's weak King safety.

And you don't have to know that a Rook is stronger than a Bishop to experience the fact that in the end-game you gobble up opponent Pawns much more easily with the Rook than with the Bishop, which instead loses its Pawns to the Rook. What you want to prevent, though, is that the engine ignorantly trades its Rook for a Bishop because it thinks this is a neutral trade. But when the values are not set equal, one of the two players will always avoid the trade in self-play. It doesn't even matter which value you set higher.

The important thing is indeed whether an initial imbalance is preserved long enough to express itself in the score when the players don't recognize it as valuable. E.g. it won't help to start one King behind a Pawn shield and the other exposed if the engine early in the opening happily advances and trades all the Pawns that were in the shield. So I guess a necessary condition is that the engine should consider the feature valuable enough to preserve it over longer times. Then it doesn't matter so much whether it scores the advantage correctly; the value just has to be large enough to not needlessly squander it.
You could try a stochastic approach: make a vector of eval features and tune the vector by V x M after every epoch. Obviously the chess engine needs to be rock solid and super fast for the game-play approach to work. I used a stochastic approach when tuning some other AI, and got perfect play with very little work; I was rather surprised how little effort was required. I'd rather let 16 CPUs do the crunching than do it myself.
hgm
Posts: 28353
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Passer evaluation

Post by hgm »

Yes, the principle is very simple. But you would need a chess engine that implements the evaluation features you want to tune, which also means you have to know what these features are. That is the real problem.

From looking at the games the current engine plays I get the impression that its evaluation should be augmented with a term that awards shielding the King from distant checks as long as the opponent has a Queen.
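Such a shielding term is not hard to sketch by hand. The following uses the same 16-wide mailbox board layout as the snippet earlier in the thread, but the function names, the penalty weights, and the open-ray threshold are all invented for illustration:

```c
/* Hypothetical King-shielding term for a 12x12 board stored in
   16-wide rows (as in the earlier snippet): count the empty squares
   along each of the 8 rays from the King until a piece or the board
   edge, and penalize long open rays, but only while the opponent
   still has a Queen to deliver distant checks. Weights are made up. */
int b[192];                        /* stand-in for the board; 0 = empty */
static int step[8] = { 1, -1, 16, -16, 17, -17, 15, -15 };

int OnBoard(int sq) { return sq >= 0 && sq < 192 && (sq & 15) < 12; }

int ShieldPenalty(int kingSq, int oppHasQueen)
{
  int d, s = 0;
  if (!oppHasQueen) return 0;      /* no Queen, no distant-check danger */
  for (d = 0; d < 8; d++) {
    int sq = kingSq + step[d], open = 0;
    while (OnBoard(sq) && b[sq] == 0) open++, sq += step[d];
    if (open > 2) s += 2*(open - 2); /* long open ray towards the King */
  }
  return s;
}
```

The evaluation would then subtract this penalty from the side whose King is exposed, which should discourage stripping the shield while Queens are on the board.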
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Passer evaluation

Post by dangi12012 »

hgm wrote: Wed May 18, 2022 8:59 am From looking at the games the current engine plays I get the impression that its evaluation should be augmented with a term that awards shielding the King from distant checks as long as the opponent has a Queen.
Why don't you let a neural network figure that one out?
Humans cannot handcraft a better evaluation than a multi-layered NN trained with a simple gradient-descent algorithm.

A single-layer network without an activation function is mathematically identical to piece-square tables.
More layers encode deeper terms; it is easiest to think of them as looking a few plies ahead.

My point being that the idea of shielding, among others, will automatically emerge out of the noise.
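The single-layer claim is easy to verify in code: with piece-on-square occupancy counts as inputs and no activation function, the layer's weight vector is exactly a piece-square table. A toy illustration with made-up dimensions, not any particular engine's code:

```c
#define NPIECE 2   /* toy dimensions; a real net would use 6 or 12 piece types */
#define NSQ 4      /* and 64 (or 144) squares */

/* the single linear layer's weights double as the piece-square table */
double weight[NPIECE*NSQ];

/* x[p*NSQ + sq] = number of pieces of type p standing on square sq */
double Forward(const double x[NPIECE*NSQ])
{
  double s = 0;
  for (int i = 0; i < NPIECE*NSQ; i++) s += weight[i]*x[i];
  return s;  /* identical to summing the occupied PSQT cells */
}
```

Training the layer by gradient descent therefore tunes a PSQT; only the additional layers can express interactions between pieces, such as shielding.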
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
hgm
Posts: 28353
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Passer evaluation

Post by hgm »

There certainly is some merit to that idea, but also a lot of practical drawbacks. For one, I would have to write an NN evaluation function, which is much more involved than adding an evaluation term. I have no idea how large such a network would have to be to count the number of empty squares on a ray between the King and the nearest obstacle (piece or board edge); I guess it would need as many layers as there are squares on the ray, since a closer piece should suppress the bonus for any piece that is further away. That is a lot, so the net would have to be big, and running it would be slow. It might also take a lot of extra training to teach the net to discover the pattern before it can tune it.