Stuck trying to come up with my own PST values

emadsen · Post by **emadsen** » Wed Feb 23, 2022 5:58 am

amanjpro wrote: ↑Wed Feb 23, 2022 5:08 am Zahak 6.2 (which is 2800 as per CCRL), was tuned over the same 700_000 quiet positions, and I tuned around 816 parameters with them. If the OP has issues with making progress with tuning, then he has a bug, either in the tuner, eval or search

That’s great but how much Elo did the tuning gain? Did you already have sensible eval param values? Peter Österlund, in his original post, recommended 8+ million positions.

To be fair, I don’t have hard numbers. Just anecdotal evidence of decreased likelihood of overfitting param values when using millions instead of hundred thousands of positions.

amanjpro · Post by **amanjpro** » Wed Feb 23, 2022 6:04 am

emadsen wrote: ↑Wed Feb 23, 2022 5:58 am
amanjpro wrote: ↑Wed Feb 23, 2022 5:08 am Zahak 6.2 (which is 2800 as per CCRL), was tuned over the same 700_000 quiet positions, and I tuned around 816 parameters with them. If the OP has issues with making progress with tuning, then he has a bug, either in the tuner, eval or search
That’s great but how much Elo did the tuning gain? Did you already have sensible eval param values? Peter Österlund, in his original post, recommended 8+ million positions.

To be fair, I don’t have hard numbers. Just anecdotal evidence of decreased likelihood of overfitting param values when using millions instead of hundred thousands of positions.

Around 200-300 elo (from Zahak 2.0 to 3.0). Most of the patches after 3.0 were related to search, and rarely about eval

emadsen · Post by **emadsen** » Wed Feb 23, 2022 6:38 am

amanjpro wrote: ↑Wed Feb 23, 2022 6:04 am Around 200-300 elo (from Zahak 2.0 to 3.0). Most of the patches after 3.0 were related to search, and rarely about eval

That diff introduces tapered eval, which alone can be worth 250 Elo. A chess engine without tapered eval is really handicapped, keeping its king tucked away in his corner far away from his own advancing pawns. I'm pretty certain mvanthoor and lithander have measurements establishing tapered eval's strength contribution in the 250 Elo range.

amanjpro · Post by **amanjpro** » Wed Feb 23, 2022 6:40 am

emadsen wrote: ↑Wed Feb 23, 2022 6:38 am
amanjpro wrote: ↑Wed Feb 23, 2022 6:04 am Around 200-300 elo (from Zahak 2.0 to 3.0). Most of the patches after 3.0 were related to search, and rarely about eval
That diff introduces tapered eval, which alone can be worth 250 Elo. A chess engine without tapered eval is really handicapped, keeping its king tucked away in his corner far away from his own advancing pawns. I'm pretty certain mvanthoor and lithander have measurements establishing tapered eval's strength contribution in the 250 Elo range.

My tapered eval was some 40 elo or so, I added it first then tuned the eval params. I remember the gains roughly were 200 + 40 or so.

Various match results are posted in the PR: https://github.com/amanjpro/zahak/pull/61

Prior to "tapered eval" I already had two phase PSTs, just hand tuned, and I was changing phase based on the number of pieces/pawns in one go. So a move could turn the position from 100% MG to 100% EG

KhepriChess · Post by **KhepriChess** » Wed Feb 23, 2022 6:53 am

emadsen wrote: ↑Wed Feb 23, 2022 2:24 am
KhepriChess wrote: ↑Wed Feb 23, 2022 1:27 am Your recommendation is more or less what I'm aiming for in tuning (and basically what I've tried to aim for in hand tuning up to this point). I've look at PST for a few other engines to get a sense of what they're doing, for some reference.

I'm trying to tune the latter (64 * 6 * 2), every square for every piece for MG and EG.

I've tried with more positions, but then the resulting PSTs are just a bunch of large negative numbers.
Then it isn't.

You asked why your tuning efforts haven't yielded a stronger engine. I suggested it may be because you're tuning 23x more parameters against 1/7th the number of positions I recommended. You responded by saying that's "more or less" what you're doing. It's not.

You could be tuning against noise. How many games included a key maneuver of a white bishop to d6 in the endgame? Not near d6, exactly d6. The problem you're encountering, perhaps, is too easily convincing yourself, "I'm doing the same as others" or "I've done all that can be done." I'd be a little more cautious if I were you of casually dismissing what amounts to a 161x difference in eval minimization factors.

Please don't take this the wrong way. I'm just trying to answer your question as directly as I can. I certainly am not in possession of all the answers. I have a middling engine in terms of playing strength and a few years of experience with chess engine programming. Others have more of each. Listen to what other experienced chess engine authors suggest. If it doesn't align with what I'm saying, then consider my advice an outlier not worth following at this point.

For what it's worth, I think...

Step one is to gather more quiet positions correlated with game results. I suspect games from engines near to the strength of your engine may be more helpful than games from elite engines- though I don't have hard data to prove that. Run these games yourself or download games from CCRL.

Step two is to increase the rigor of your process beginning with decreasing the parameter search space your tuner must traverse. Verify the tuner actually is changing all param values. Also verify the tuner updates all lookup tables dependent on param values prior to calculating the eval error of the next iteration.

Step three is to investigate the numbers the tuner produces. Why does it return many large, negative numbers for PST values? How does this affect the static score of sample positions from your test set? What static eval does the stronger version of your engine return compared to the static eval of the weaker engine? As other have suggested, investigate whether the total value of material + PST changes drastically or only the PST value. If the tuned eval params produce nonsensical static scores, that suggests your tuner is not actually updating param values, or isn't calculating the total error correctly (are you summing the square of each error so positive and negative eval diffs don't cancel each other out?), isn't converting score to win percentage via a sigmoid function correctly, etc.

Thanks for the insight. And I definitely don't take it the wrong way; I posted here for help so some direct answers are what I'm looking for.

I can definitely see that the tuner is updating/saving the values when they are changed (plenty of breakpoints and manual checking really seems to indicate that they are being saved).

My mean square error function is a little odd (to me) because of the parallelism in C#, which honestly I'm not very familiar with. But it looks like:

Code: Select all

private double MeanSquareError(double K, int[] weights)
{
    _weights.UpdateWeights(weights);
    double error = 0;

    Parallel.For(0, _positions.Count, () => (double)0, (i, loop, subtotal) =>
        {
            var position = _positions[i];
            float score = Evaluate(position);

            var sigmoid = 1 / (1 + Math.Pow(10, -K * score / 400));
            subtotal = Math.Pow(position.Result - sigmoid, 2);

            return subtotal;
        }, localState =>
        {
            Add(ref error, localState);
        }
    );

    return error / _positions.Count;
}

I'll need to spend some more time doing some of the things you suggested like comparing the static eval scores.

op12no2 · Post by **op12no2** » Wed Feb 23, 2022 7:03 am

KhepriChess wrote: ↑Wed Feb 23, 2022 1:27 am
op12no2 wrote: ↑Tue Feb 22, 2022 10:27 pm Could you keep the piece values anchored at 1 3 3 5 9 for example and just tune the PSTs? Or set the whole mg/eg bishop (for example) PSTs to 3 and not have piece values? Then there is only one feature coefficient and weight. NB: I find Javascript fast enough, taking ~30 sec per 1M positions per ~1000 features/weights - but yeah that means hours not minutes for tuning...
That's actually what I've been doing. I just have the tuner changing the PST values, and not the piece values. And man, I don't know what you're doing but I was never able to get my JS tuner to be that fast. Even my C# version takes over an hour for 100,000 positions.

If it's just the bishop PSTs that are a problem and you tune 'bishop pair' try anchoring the bishop pair as an expt. My bishop end game PSTs are all -ve as it happens but mid game are not. Re: speed: if you are using the +1/-1 approach, you can gain orders of magnitude by swapping to gradient descent.

emadsen · Post by **emadsen** » Wed Feb 23, 2022 7:09 am

KhepriChess wrote: ↑Wed Feb 23, 2022 6:53 am

Code: Select all

            float score = Evaluate(position);
            var sigmoid = 1 / (1 + Math.Pow(10, -K * score / 400));
            subtotal = Math.Pow(position.Result - sigmoid, 2);

Does position.Result adjust for the side to move? Like this. So if white won the game and it's white's move, the result is 1. But if white won the game and it's black move, the result is 0. And vice-versa.

emadsen · Post by **emadsen** » Wed Feb 23, 2022 7:51 am

I'm not familiar with Parallel.For. I'd worry whether the increment... Add(ref error, localState)... is thread-safe. Or whether you must use Interlocked.Increment. Or whether the .NET LINQ code that invokes your localState => lambda method provides thread-safety protection.

I don't like magic methods like this. Too much hidden from view.

KhepriChess · Post by **KhepriChess** » Wed Feb 23, 2022 6:40 pm

emadsen wrote: ↑Wed Feb 23, 2022 7:09 am
KhepriChess wrote: ↑Wed Feb 23, 2022 6:53 am
Code: Select all
            float score = Evaluate(position);
            var sigmoid = 1 / (1 + Math.Pow(10, -K * score / 400));
            subtotal = Math.Pow(position.Result - sigmoid, 2);
Does position.Result adjust for the side to move? Like this. So if white won the game and it's white's move, the result is 1. But if white won the game and it's black move, the result is 0. And vice-versa.

position.Result doesn't adjust for the side to move, but Evaluate(position) returns a score from white's perspective (so if side to move is black, score = score * -1).

emadsen wrote: ↑Wed Feb 23, 2022 7:51 am I'm not familiar with Parallel.For. I'd worry whether the increment... Add(ref error, localState)... is thread-safe. Or whether you must use Interlocked.Increment. Or whether the .NET LINQ code that invokes your localState => lambda method provides thread-safety protection.

I don't like magic methods like this. Too much hidden from view.

The increment is not inherently thread safe. Normally, you'd just use something like Interlocked.Add(), but that doesn't take doubles. The Add method I use is the one found here https://stackoverflow.com/a/16893641, which with my understanding it should be thread safe.

KhepriChess · Post by **KhepriChess** » Thu Feb 24, 2022 6:38 am

op12no2 wrote: ↑Wed Feb 23, 2022 7:03 am If it's just the bishop PSTs that are a problem and you tune 'bishop pair' try anchoring the bishop pair as an expt. My bishop end game PSTs are all -ve as it happens but mid game are not. Re: speed: if you are using the +1/-1 approach, you can gain orders of magnitude by swapping to gradient descent.

I've looked at that PDF by Andrew Grant about GD and looked at your implementation of it for your engine. It still looks pretty foreign to me and I don't completely understand it.

I made a tweak to my tuner, which has slowed it down immensely; Running against 1000 positions for the past 8 hours. That seems like something is wrong, but I'm not sure what that would be. I'll probably let it continue to run overnight and see what happens tomorrow.

Code: Select all

         while (improved)
         {
            improved = false;

            for (int i = 0; i < weights.Length; i++)
            {
               if (i <= 7 || (i >= 53 && i <= 64)) // pawn squares on the 1st and 8th ranks don't need to be adjusted
               {
                  continue;
               }

               weights[i] += 1;

               double newError = MeanSquareError(K, weights);

               if (newError < bestError)
               {
                  bestError = newError;
                  improved = true;
               }
               else
               {
                  weights[i] -= 2;
                  newError = MeanSquareError(K, weights);

                  if (newError < bestError)
                  {
                     bestError = newError;
                     improved = true;
                  }
                  else
                  {
                     weights[i] += 1; // I added this else block
                  }
               }
            }
         }

Stuck trying to come up with my own PST values

Re: Stuck trying to come up with my own PST values

Re: Stuck trying to come up with my own PST values

Re: Stuck trying to come up with my own PST values

Re: Stuck trying to come up with my own PST values

Re: Stuck trying to come up with my own PST values

Re: Stuck trying to come up with my own PST values

Re: Stuck trying to come up with my own PST values

Re: Stuck trying to come up with my own PST values

Re: Stuck trying to come up with my own PST values

Re: Stuck trying to come up with my own PST values