Tapered Evaluation and MSE (Texel Tuning)

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Desperado »

Pio wrote: Mon Jan 11, 2021 1:07 am
Desperado wrote: Mon Jan 11, 2021 12:25 am Hi Pio,
... What do you mean by the tuner cannot split the average term for a term? ...

Code: Select all

int simpleEval(){
  int mat[PID] = {0,100,300,300,500,900};
  int score = materialscore();
  return score;
}
In simpleEval() the tuner will produce an average score of 300
for a minor piece. The score for a knight cannot be divided into something like
(100+500)/2, which is also 300 for that term (the knight value).

Code: Select all

int taperedEval() {
  int mgMat[PID] = {0, 50, 60, 70, 150, 200};
  int egMat[PID] = {0,150,540,530,850,1600};
  int mg = materialMG();
  int eg = materialEG();
  int score = (mg * phase + eg * (PHASE_MAX - phase)) / PHASE_MAX;
  return score;
}
In taperedEval() the tuner gets the opportunity to divide the term (knight value)
into something like (60+540)/2, which is also 300. The tuner tries to keep the 300, BUT:

If you have 10000 positions and the average mg-phase is 13 of 24, the weights are 13 (mg) : 11 (eg).
Code speaks louder than words...

Code: Select all

#include <stdio.h>

int main(void)
{
    const int mg_eg_sum = 200; // pawn value split across (mg, eg), e.g. (90, 110)
    int mg, eg;
    for (int i = 0; i < mg_eg_sum; i++)
    {
        mg = mg_eg_sum - i;
        eg = i;
        // weighted value for an average phase distribution of 13:11 (max 24)
        printf("\n[%d %d %.4f] ", mg, eg, (double)((mg * 13) + (eg * 11)) / 24);
    }
    getchar();
    return 0;
}
The tuner will bring the mg value to its minimum because the phase factor is dominant.

Code: Select all

[200 0 108.3333]
[199 1 108.2500]
[198 2 108.1667]
[197 3 108.0833]
[196 4 108.0000]
[195 5 107.9167]
[194 6 107.8333]
[193 7 107.7500]
[192 8 107.6667]
[191 9 107.5833]
[190 10 107.5000]
...
[1 199  91.7500]
Now, a short answer to your last post. Please look at the routines minimize() and solve().
There is fairly simple documentation of what the functions do. One epoch loops over all parameters,
modifying each single one by a tiny amount as long as it can be improved. The minimizing code is at most 50 lines
(with code duplication). The complete logic can be seen in this routine; it is very simple.

I gladly accept suggestions for improvement, especially if the code shown has also been read ;-)

Thanks anyway for your ideas and comments.

Regards.
I tried my best to read your code, but I have programmed only very little C++, and that was 15 years ago.

I got the impression that you tried to get a very good estimate of every term by itself with the following code.

Code: Select all



            do
            {
                // back up the current score values; if the result does not
                // improve, we will restore them.
                updateParameters(BACKUP);

                // modify and compute resulting mse
                *param[id].score -= offset;
                lo = mse();

                // no further improvement; there is nothing more to do for
                // the current parameter. Restore the score from the latest
                // modification and continue the parameter loop.
                if(lo >= bestFitness)
                {
                    updateParameters(RESET);
                    break;
                }

                bestFitness = lo;
                printParameters(param);
                printf(" %f ", bestFitness);
            } while(1);

I have no idea what you mean by
The tuner will bring the mg value to its minimum because the phase factor is dominant.
The tuned values won't change in a direct way depending on the phase at all (if the phase is not also tuned). Only the sensitivity will go up with the weight (or phase) a term has, since the term will contribute more to the error.
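
A minimal numerical sketch of that sensitivity point (an illustration under assumed names; tapered() and PHASE_MAX are not code from this thread): the derivative of the blended score with respect to an mg term is phase/PHASE_MAX, so an mg weight contributes more to the error in mg-heavy positions, but the phase does not bias the direction of the change.

Code: Select all

#include <cstdio>

#define PHASE_MAX 24

static double tapered(double mg, double eg, int phase)
{
    return (mg * phase + eg * (PHASE_MAX - phase)) / PHASE_MAX;
}

int main()
{
    const double h = 1.0; // finite-difference step
    for (int phase = 0; phase <= PHASE_MAX; phase += 6) {
        // numerical d(score)/d(mg); equals phase / PHASE_MAX exactly
        double dmg = (tapered(300.0 + h, 300.0, phase) - tapered(300.0, 300.0, phase)) / h;
        std::printf("phase %2d: d(score)/d(mg) = %.4f\n", phase, dmg);
    }
    return 0;
}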

I have read the code you posted, but I have no idea how you are handling the tuning of your tapered eval. Tuning a tapered eval should not be any different from not having a tapered eval. You can just see the different MG and EG terms as the completely distinct terms that they are.

I am trying to help, but I cannot guess code you haven't posted. If you don't want my advice, such as speeding up the convergence exponentially, I won't answer any more.
Hi Pio, of course I appreciate your support!

The point is only that I have already implemented your explanations (at least I think so).
There is also no further code. It is as simple as described.

The things you don't see are loading an EPD file, loading (setting) the parameters, and some helper functions.
The param_t includes a reference to the eval variable, a backup score, and a step size.
Nothing more to see. It works, and convergence within 20 to 50 epochs, in a matter of minutes, isn't bad at all, I think.

So, what concrete changes do you suggest? And what part of the code (in C++ terms) don't you understand?
I am not sure which parts cannot be read in an abstract way. Let me know and I will explain in simple words.

These numbers are an example of how the tuner will "weight" the error with an average phase
distribution of 13:11 (max phase 24).

In the example I thought of a pawn value of 100, mg(100) and eg(100).

Code: Select all

[mg | eg | weighted (phase 13:11)]
[200 0 108.3333]
[199 1 108.2500]
[198 2 108.1667]
[197 3 108.0833]
...
[1 199  91.7500]
The phase ratio of 13:11 means that the error for the mg score is multiplied by 13 and
the error for the eg score is multiplied by 11. This weights the error depending on the ratio.
So, even if the error given by the mg score equals the error of the eg score, the tuner will
set the mg score to "0" and the eg score to "200", because 200 * 11 is less than 200 * 13.
This is a continuous function, which leads to one extreme or the other, [0,200] or [200,0].
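
To spell out the arithmetic: with mg + eg fixed at 200, the weighted value is (13·mg + 11·eg) / 24 = (11·200 + 2·mg) / 24, which is linear and strictly increasing in mg. So whenever the fit prefers a lower blended value, shifting weight from mg to eg is always a descent direction, and the parameters drift toward the [0,200] extreme.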

For a better explanation, you would need to plug the code snippets into your environment and step through the code.

Thanks for your help.
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Ferdy »

Desperado wrote: Sun Jan 10, 2021 10:57 pm

Code: Select all

        // Now the following situations can happen. The current offset does not
        // improve in any direction, so we continue with the next parameter.
        // More interesting is when we improve the bestFitness; then we follow
        // the direction which improves more than the other. We keep the direction
        // and the offset as long as possible. As an option we can switch to another
        // offset while keeping the direction. This might be an optimization and should
        // be tested.

        if(lo < hi && lo < bestFitness)
        {
            bestFitness = lo;
            *param[id].score -= offset;

            do
            {
                // back up the current score values; if the result does not
                // improve, we will restore them.
                updateParameters(BACKUP);

                // modify and compute resulting mse
                *param[id].score -= offset;
                lo = mse();

                // no further improvement; there is nothing more to do for
                // the current parameter. Restore the score from the latest
                // modification and continue the parameter loop.
                if(lo >= bestFitness)
                {
                    updateParameters(RESET);
                    break;
                }

                bestFitness = lo;
                printParameters(param);
                printf(" %f ", bestFitness);
            } while(1);
        }
        ...
That part is interesting indeed. When the error improves on that particular param you keep on testing it, a sort of exploitation. The disadvantage could be that the params following it no longer have a chance to improve.

Did you try going to the next param once a param has already improved the error, a sort of exploration, which is actually what Texel tuning does?
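
A minimal sketch of that exploration variant (my illustration with assumed names, not code from this thread): one trial step per parameter, keep it on improvement, otherwise revert, and always move on to the next parameter.

Code: Select all

#include <cstddef>

// assumed to exist: mean squared error over the training set
double mse(const double* params, std::size_t n);

void exploreEpoch(double* params, std::size_t n, double offset, double& bestFitness)
{
    for (std::size_t id = 0; id < n; ++id) {
        params[id] -= offset;         // single trial step
        double e = mse(params, n);
        if (e < bestFitness)
            bestFitness = e;          // keep the step, go to the next parameter
        else
            params[id] += offset;     // revert, go to the next parameter
    }
}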
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Desperado »

Hi guys, unfortunately it is already late here, just now that I have warmed up ... until tomorrow
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Desperado »

Ferdy wrote: Mon Jan 11, 2021 1:54 am
Desperado wrote: Sun Jan 10, 2021 10:57 pm

(code snippet quoted above omitted)
That part is interesting indeed. When the error improves on that particular param you keep on testing it, a sort of exploitation. The disadvantage could be that the params following it no longer have a chance to improve.

Did you try going to the next param once a param has already improved the error, a sort of exploration, which is actually what Texel tuning does?
Why should subsequent parameters no longer change?
At the latest in the next epoch, the process is repeated with a new offset, and if that is not sufficient, the parameter modifications can also be run through in a random sequence, as described in the code.

If I take a step into a valley and can estimate that the next steps are of the same size, why should I change direction? In this case, by direction I mean choosing the next parameter. In the code I have used the term direction in the context of the slope, just to avoid confusion.

Well, the algorithm can of course be discussed, but at the moment I am very interested in finding out how the drifting apart of the MG/EG values occurs and how it can be stopped.

Thanks for your interest. Regards
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Ferdy »

Desperado wrote: Mon Jan 11, 2021 9:23 am
Ferdy wrote: Mon Jan 11, 2021 1:54 am
Desperado wrote: Sun Jan 10, 2021 10:57 pm

(code snippet quoted above omitted)
That part is interesting indeed. When the error improves on that particular param you keep on testing it, a sort of exploitation. The disadvantage could be that the params following it no longer have a chance to improve.

Did you try going to the next param once a param has already improved the error, a sort of exploration, which is actually what Texel tuning does?
Why should subsequent parameters no longer change?
At the latest in the next epoch, the process is repeated with a new offset, and if that is not sufficient, the parameter modifications can also be run through in a random sequence, as described in the code.

If I take a step into a valley and can estimate that the next steps are of the same size, why should I change direction? In this case, by direction I mean choosing the next parameter. In the code I have used the term direction in the context of the slope, just to avoid confusion.

Well, the algorithm can of course be discussed, but at the moment I am very interested in finding out how the drifting apart of the MG/EG values occurs and how it can be stopped.

Thanks for your interest. Regards
How many training positions did you use to get the following result?

Code: Select all

Material:   P,  N,  B,  R,  Q      P,  N,  B,   R,   Q
Start: MG: 80,300,320,500,980 EG:100,300,320, 500, 980
End:   MG:  5, 24, 32, 48, 68 EG:192,595,610,1020,1990
Could you generate some stats from your training data: how many positions have one side ahead by a queen, rook, bishop, knight, or pawn, and how many are in the mg phase and the eg phase?
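
A hypothetical sketch of such a stats scan (not Ferdy's script; the file name, the phase weights N/B=1, R=2, Q=4 with max 24, and the mg/eg split at phase 12 are all assumptions): it reads the board field of each FEN, counts the piece difference per type, and buckets the position by phase.

Code: Select all

#include <cctype>
#include <cstdio>

int main()
{
    std::FILE* f = std::fopen("training.epd", "r"); // hypothetical file name
    if (!f) return 1;

    long ahead[5] = {0, 0, 0, 0, 0}; // one side ahead in exactly one piece type
    long mgCount = 0, egCount = 0;
    char line[512];

    while (std::fgets(line, sizeof line, f)) {
        int d[5] = {0, 0, 0, 0, 0}; // white minus black counts per piece type
        int phase = 0;              // 1 per minor, 2 per rook, 4 per queen (max 24)
        for (char* p = line; *p && *p != ' '; ++p) { // FEN board field only
            int sign = std::isupper((unsigned char)*p) ? 1 : -1;
            switch (std::tolower((unsigned char)*p)) {
                case 'p': d[0] += sign; break;
                case 'n': d[1] += sign; phase += 1; break;
                case 'b': d[2] += sign; phase += 1; break;
                case 'r': d[3] += sign; phase += 2; break;
                case 'q': d[4] += sign; phase += 4; break;
                default:  break; // digits, '/', kings
            }
        }
        if (phase > 12) ++mgCount; else ++egCount; // assumed split at half of 24
        for (int i = 0; i < 5; ++i) {
            bool othersEqual = true;
            for (int j = 0; j < 5; ++j)
                if (j != i && d[j] != 0) othersEqual = false;
            if (d[i] != 0 && othersEqual) ++ahead[i]; // only this type differs
        }
    }
    std::fclose(f);

    const char* name[5] = {"pawn", "knight", "bishop", "rook", "queen"};
    for (int i = 0; i < 5; ++i)
        std::printf("ahead in %-6s only: %ld\n", name[i], ahead[i]);
    std::printf("mg phase: %ld  eg phase: %ld\n", mgCount, egCount);
    return 0;
}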
Pio
Posts: 334
Joined: Sat Feb 25, 2012 10:42 pm
Location: Stockholm

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Pio »

Ferdy wrote: Mon Jan 11, 2021 9:43 am
Desperado wrote: Mon Jan 11, 2021 9:23 am
Ferdy wrote: Mon Jan 11, 2021 1:54 am
Desperado wrote: Sun Jan 10, 2021 10:57 pm

(code snippet quoted above omitted)
That part is interesting indeed. When the error improves on that particular param you keep on testing it, a sort of exploitation. The disadvantage could be that the params following it no longer have a chance to improve.

Did you try going to the next param once a param has already improved the error, a sort of exploration, which is actually what Texel tuning does?
Why should subsequent parameters no longer change?
At the latest in the next epoch, the process is repeated with a new offset, and if that is not sufficient, the parameter modifications can also be run through in a random sequence, as described in the code.

If I take a step into a valley and can estimate that the next steps are of the same size, why should I change direction? In this case, by direction I mean choosing the next parameter. In the code I have used the term direction in the context of the slope, just to avoid confusion.

Well, the algorithm can of course be discussed, but at the moment I am very interested in finding out how the drifting apart of the MG/EG values occurs and how it can be stopped.

Thanks for your interest. Regards
How many training positions did you use to get the following result?

Code: Select all

Material:   P,  N,  B,  R,  Q      P,  N,  B,   R,   Q
Start: MG: 80,300,320,500,980 EG:100,300,320, 500, 980
End:   MG:  5, 24, 32, 48, 68 EG:192,595,610,1020,1990
Could you generate some stats from your training data: how many positions have one side ahead by a queen, rook, bishop, knight, or pawn, and how many are in the mg phase and the eg phase?
Something must be very wrong, since the MG and EG values are so far off, and the minimisation should produce very big errors for positions where MG >> EG or MG << EG.
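
For instance, with the end values quoted above, an extra pawn in a pure-mg position (phase 24 of 24) is worth (5·24 + 192·0)/24 = 5 cp, while the same extra pawn at phase 0 is worth 192 cp, so a data set with a reasonable mix of phases should punish such a split with large errors at one end or the other.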
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Desperado »

Ferdy wrote: Mon Jan 11, 2021 9:43 am
Desperado wrote: Mon Jan 11, 2021 9:23 am
Ferdy wrote: Mon Jan 11, 2021 1:54 am
Desperado wrote: Sun Jan 10, 2021 10:57 pm

(code snippet quoted above omitted)
That part is interesting indeed. When the error improves on that particular param you keep on testing it, a sort of exploitation. The disadvantage could be that the params following it no longer have a chance to improve.

Did you try going to the next param once a param has already improved the error, a sort of exploration, which is actually what Texel tuning does?
Why should subsequent parameters no longer change?
At the latest in the next epoch, the process is repeated with a new offset, and if that is not sufficient, the parameter modifications can also be run through in a random sequence, as described in the code.

If I take a step into a valley and can estimate that the next steps are of the same size, why should I change direction? In this case, by direction I mean choosing the next parameter. In the code I have used the term direction in the context of the slope, just to avoid confusion.

Well, the algorithm can of course be discussed, but at the moment I am very interested in finding out how the drifting apart of the MG/EG values occurs and how it can be stopped.

Thanks for your interest. Regards
How many training positions did you use to get the following result?

Code: Select all

Material:   P,  N,  B,  R,  Q      P,  N,  B,   R,   Q
Start: MG: 80,300,320,500,980 EG:100,300,320, 500, 980
End:   MG:  5, 24, 32, 48, 68 EG:192,595,610,1020,1990
Could you generate some stats from your training data: how many positions have one side ahead by a queen, rook, bishop, knight, or pawn, and how many are in the mg phase and the eg phase?
Hello Ferdy,

I used batch sizes of 10K, 20K, 50K, 100K and 200K with qs training, and even 1M with static evaluation, over several runs.
The numbers you see are arbitrary and only illustrate the divergence of the scores.

I can provide stats, but I need to write some code for it. I am not sure when I can do it today, but I will do it.
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Ferdy »

I dug up my Texel tuning scripts from before and tried them using this training set.

I used 300K positions per iteration, out of 1M+ positions. Looks normal so far. Unlike your method, I tune the next param once the mse is improved. I also tried one side at a time, that is, +1, and if that fails, -1. If +1 improves the mse, then go to the next param.

In every iteration I change the 300K positions by first shuffling the 1M+ training positions randomly.
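
A minimal sketch of that batching scheme (my illustration, not Ferdy's script; the Position struct and function names are assumptions):

Code: Select all

#include <algorithm>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

struct Position { std::string fen; double result; }; // assumed training record

// Reshuffle the full training set, then take the first batchSize entries so
// that every iteration sees a fresh random subset (e.g. 300K of 1M+).
std::vector<Position> makeBatch(std::vector<Position>& all,
                                std::size_t batchSize, std::mt19937& rng)
{
    std::shuffle(all.begin(), all.end(), rng);
    std::size_t n = std::min(batchSize, all.size());
    return std::vector<Position>(all.begin(), all.begin() + n);
}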

It stops reducing the mse at iteration 32.

[Image: tuning results per iteration]
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Sven »

Ferdy wrote: Mon Jan 11, 2021 11:26 am [...] I tune next param once the mse is improved. Also tried one side at a time that is, +1 if it fails try -1. If +1 improves the mse then go to the next param.
I do roughly the same in Jumbo, and I am satisfied with the resulting convergence behaviour. With a small number of parameters, like 11 (material parameters only), it needs less than a minute to converge from guessed values to a local optimum, even with several 100k positions and single-threaded (Jumbo can use parallel threads for calculating the static evaluation of positions). The full static evaluation is used (with lazy eval disabled), which is quite slow in Jumbo.

This is a simplified version of the essential part of Jumbo's tuning code (I left out things like threading code, special eval topics like the pawn hash, range checking of parameter values, error handling, diagnostic output, etc.):

Code: Select all

#include <cmath>
#include <string>
#include <vector>

struct TuningPosition {
    char    m_fen[120];
    double  m_result; // 1.0/0.5/0.0 from white viewpoint
};

class Tuner {
public:
    void run(std::string trainingFilePath, std::string parameterFilePath, uint maxIterations);

private:
    double averageError                 ();

    std::vector<TuningPosition>         m_tuningPos; // filled by run()
    Board                               m_board;
};

static Parameter * tuningParam[] = {
    &(PARAM_INSTANCE(Material_Pawn_MG)),
    &(PARAM_INSTANCE(Material_Knight_MG)),
    &(PARAM_INSTANCE(Material_Bishop_MG)),
    &(PARAM_INSTANCE(Material_Rook_MG)),
    &(PARAM_INSTANCE(Material_Queen_MG)),

    //&(PARAM_INSTANCE(Material_Pawn_EG)),
    &(PARAM_INSTANCE(Material_Knight_EG)),
    &(PARAM_INSTANCE(Material_Bishop_EG)),
    &(PARAM_INSTANCE(Material_Rook_EG)),
    &(PARAM_INSTANCE(Material_Queen_EG)),

    // ...
};

static double sigmoid(int score)
{
    static double const K = 1.0; // TODO
    return 1.0 / (1.0 + pow(10.0, -K * score / 400));
}

double Tuner::averageError()
{
    double sum = 0.0;
    uint nPos = m_tuningPos.size();
    for (uint i = 0; i < nPos; i++) {
        TuningPosition const & tp = m_tuningPos[i];
        (void) m_board.setupFromFEN(tp.m_fen);
        int score = evaluateForWhite(m_board);
        double sig = sigmoid(score);
        sum += (tp.m_result - sig) * (tp.m_result - sig);
    }
    return sum / nPos;
}

void Tuner::run(std::string trainingFilePath, std::string parameterFilePath, uint maxIterations)
{
    // read training data (FEN + result) from training file into vector "m_tuningPos"
    // ...

    double e0 = averageError();
    bool isImproved = true;
    for (uint it = 0; it < maxIterations && isImproved; it++) {
        isImproved = false;
        for (uint i = 0; i < ARRAY_SIZE(tuningParam); i++) {
            Parameter & param = *(tuningParam[i]);
            constexpr int inc[2] = { +1, -1 };
            bool paramImproved = false;
            // try +1 first, then -1; move to the next param once one of them helps
            for (uint j = 0; j < ARRAY_SIZE(inc) && !paramImproved; j++) {
                param.add(inc[j]);
                double e = averageError();
                paramImproved = (e < e0);
                if (paramImproved) {
                    e0 = e;
                    isImproved = true; // at least one param improved this iteration
                } else {
                    param.add(-inc[j]); // undo the trial step
                }
            }
        }
    }

    // save parameters to parameter file
    // ...
}
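
One possible way to settle the K TODO above (a sketch under my own assumptions, not Jumbo's actual code: it presumes K becomes a settable variable and that the error function is made reachable from outside): before tuning any eval parameter, scan K and keep the value that minimizes the average error of the untuned evaluation.

Code: Select all

// hypothetical: gSigmoidK replaces the local constant K in sigmoid(), and
// Tuner::averageError() is assumed to be accessible for this scan
double gSigmoidK = 1.0;

double tuneK(Tuner & tuner)
{
    double bestK = 1.0, bestErr = 1e30;
    for (double k = 0.25; k <= 3.0; k += 0.05) {
        gSigmoidK = k;
        double e = tuner.averageError();
        if (e < bestErr) { bestErr = e; bestK = k; }
    }
    gSigmoidK = bestK; // keep the best K for the subsequent parameter tuning
    return bestK;
}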
Sven Schüle (engine author: Jumbo, KnockOut, Surprise)
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Ferdy »

Ferdy wrote: Mon Jan 11, 2021 11:26 am I dug up my Texel tuning scripts from before and tried them using this training set.
Stats of that training data.

[Image: stats of the training data]