Tapered Evaluation and MSE (Texel Tuning)

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by hgm »

So the conclusion is that the set of positions sucks, and contains virtually no information on the mg piece values?
User avatar
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Desperado »

Hello,

Can someone confirm the number below, please?

Code: Select all

    double error = mse();
    printf("\nTotal error ccrl-40-15-elo-3200.epd with K=1.0: %.16f", error);
    getchar();
Loaded Epd Positions: 1537380
Total error ccrl-40-15-elo-3200.epd with K=1.0: 0.1216016900516253

Of course, I need to further divide the problem. Although I was convinced that the data was the problem,
I have to accept that there is a hidden problem with my code. Obviously, it's not the general algorithms,
nor the now-simplified scoring function. The bug seems to be hidden in the helper functions, if there is one.
So I want to start by checking if error calculation and epd routines work correctly.

So if someone can do a simple static material evaluation with the vector 100,300,300,500,1000 on the mentioned file,
I can just see if the error sum, mentioned above, is identical. Depending on the result then further steps will follow.
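
For reference, this is the shape of the error I am computing. It is only a sketch (the Entry struct and its field names are placeholders, K = 1.0, and the results are assumed to be from the side-to-move POV), but it should make the number reproducible:

Code: Select all

#include <cmath>
#include <vector>

// Sketch of the usual Texel-style error (K = 1.0, results from the
// side-to-move POV).  Entry and its fields are placeholders for whatever
// the tuner stores per EPD position.
struct Entry {
    double result;   // 0.0, 0.5 or 1.0
    int    eval;     // static material eval in centipawns, stm POV
};

static double mse(const std::vector<Entry>& data, double K = 1.0)
{
    double sum = 0.0;
    for (const Entry& e : data) {
        double p = 1.0 / (1.0 + std::pow(10.0, -K * e.eval / 400.0));
        sum += (e.result - p) * (e.result - p);
    }
    return sum / data.size();
}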

Thanks a lot in advance.
User avatar
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Desperado »

Desperado wrote: Fri Jan 15, 2021 9:00 am Hello,

Can someone confirm the number below, please?

Code: Select all

    double error = mse();
    printf("\nTotal error ccrl-40-15-elo-3200.epd with K=1.0: %.16f", error);
    getchar();
Loaded Epd Positions: 1537380
Total error ccrl-40-15-elo-3200.epd with K=1.0: 0.1216016900516253

Of course, I need to further divide the problem. Although I was convinced that the data was the problem,
I have to accept that there is a hidden problem with my code. Obviously, it's not the general algorithms,
nor the now-simplified scoring function. The bug seems to be hidden in the helper functions, if there is one.
So I want to start by checking if error calculation and epd routines work correctly.

So if someone can do a simple static material evaluation with the vector 100,300,300,500,1000 on the mentioned file,
I can just see if the error sum, mentioned above, is identical. Depending on the result then further steps will follow.

Thanks a lot in advance.
I use the phase encoding 1,2,4 (minor, rook, queen), with a maximum of 24 for mg positions.

The error for the starting vector:

Code: Select all

int Eval::mgMat[7] = {0,110,310,310,510,1010,0};
int Eval::egMat[7] = {0, 90,290,290,490,990,0};
Loaded Epd Positions: 1537380
Total error ccrl-40-15-elo-3200.epd with K=1.0: 0.1215975237992763
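
Spelled out, the phase computation behind those numbers is the following (a sketch, matching the 1/2/4 weights above; the initial position gives 8*1 + 4*2 + 2*4 = 24, which is also the cap):

Code: Select all

// Sketch of the 1/2/4 phase encoding: minors count 1, rooks 2, queens 4.
// The initial position yields 8*1 + 4*2 + 2*4 = 24, which is also the cap.
static int gamePhase(int minors, int rooks, int queens)
{
    int phase = 1 * minors + 2 * rooks + 4 * queens;
    return phase > 24 ? 24 : phase;
}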
User avatar
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Desperado »

Desperado wrote: Fri Jan 15, 2021 9:33 am
Desperado wrote: Fri Jan 15, 2021 9:00 am Hello,

Can someone confirm the number below, please?

Code: Select all

    double error = mse();
    printf("\nTotal error ccrl-40-15-elo-3200.epd with K=1.0: %.16f", error);
    getchar();
Loaded Epd Positions: 1537380
Total error ccrl-40-15-elo-3200.epd with K=1.0: 0.1216016900516253

Of course, I need to further divide the problem. Although I was convinced that the data was the problem,
I have to accept that there is a hidden problem with my code. Obviously, it's not the general algorithms,
nor the now-simplified scoring function. The bug seems to be hidden in the helper functions, if there is one.
So I want to start by checking if error calculation and epd routines work correctly.

So if someone can do a simple static material evaluation with the vector 100,300,300,500,1000 on the mentioned file,
I can just see if the error sum, mentioned above, is identical. Depending on the result then further steps will follow.

Thanks a lot in advance.
I use the phase encoding 1,2,4 (minor, rook, queen), with a maximum of 24 for mg positions.

The error for the starting vector:

Code: Select all

int Eval::mgMat[7] = {0,110,310,310,510,1010,0};
int Eval::egMat[7] = {0, 90,290,290,490,990,0};
Loaded Epd Positions: 1537380
Total error ccrl-40-15-elo-3200.epd with K=1.0: 0.1215975237992763
I did a little experiment with ccrl-40-15-elo-3200.epd (unmodified):

Code: Select all

int Eval::mgMat[7] = {0,100,300,300,500,1000,0};
int Eval::egMat[7] = {0,100,300,300,500,1000,0};

int Eval::full(pos_t* pos)
{
    int cnt;
    score_t score = {0,0};

    for(int c = WHITE; c <= BLACK; c++)
    {
        for(int p = WP + c; p <= WQ + c; p += 2)
        {
            cnt = Bit::popcnt(pos->bb[p]);
            score.mg += cnt * mgMat[PID(p)];
            score.eg += cnt * egMat[PID(p)];
        }

        score.mg = -score.mg;
        score.eg = -score.eg;
    }

    int phase = 1 * Bit::popcnt(Pos::minors(pos));
    phase += 2 * Bit::popcnt(Pos::rooks(pos));
    phase += 4 * Bit::popcnt(Pos::queens(pos));
    phase = min(phase, 24);

    // Trial: return a constant eval (values 0..5 were tried); this skips the tapered score below
    return 4;

    int s = (score.mg * phase + score.eg * (24 - phase)) / 24;
    return pos->stm == WHITE ? s : -s;
}
MSE 0.1008900000000000 // constant eval 0
MSE 0.1008891832250924 // constant eval 1
MSE 0.1009115831442991 // constant eval 2
MSE 0.1008999752977588 // constant eval 3
MSE 0.1009115831442991 // constant eval 4
MSE 0.1009273310704712 // constant eval 5

That makes me think a lot!

1.
The constant evaluation produces a smaller error (maybe the smallest) than the normal evaluation.
Because the constant is at the same time the average, that would mean that the tuner tries to
push the average evaluation towards 1cp, i.e. close to 0.

The easiest way would be to set all material values to 1cp; alternatively, because of the phase
evaluation, the tuner could diverge to a set that also produces an average of 1cp.

Now I set all material values to 1cp and switch off the trial code, and the result of the MSE is 0.1008926319644775

int Eval::mgMat[7] = {0,1,1,1,1,1,0};
int Eval::egMat[7] = {0,1,1,1,1,1,0};
MSE 0.1008926319644775

int Eval::mgMat[7] = {0,-1,-2,-2,-3,-4,0};
int Eval::egMat[7] = {0, 2, 3, 3, 4, 5,0};
MSE: 0.1008921291858372, even smaller than constant 1 (diverging)

2.
Having an optimal evaluation close to zero must mean that the draw rate dominates the data (which could be down to the choice of epd file).
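
A quick way to sanity-check that reasoning (a sketch, not my tuner code): for a constant eval c the MSE is the mean of (R - p)^2 with p = sigmoid(c), so it is minimal when p equals the mean game result. Inverting the sigmoid with K = 1 shows where that optimum sits:

Code: Select all

#include <cmath>
#include <cstdio>

// For a constant evaluation c the MSE is mean((R - p)^2) with p = sigmoid(c),
// minimal when p equals the mean game result.  Inverting the sigmoid
// (K = 1) gives the optimal constant for a few hypothetical mean results.
int main()
{
    const double K = 1.0;
    const double meanResults[] = {0.50, 0.51, 0.55, 0.60};
    for (double m : meanResults) {
        double c = (400.0 / K) * std::log10(m / (1.0 - m));
        std::printf("mean result %.2f -> optimal constant eval %5.1f cp\n", m, c);
    }
}
So with a result average near 0.5, the optimal constant eval is indeed only a few centipawns from zero.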

conclusion

Now, where is my thinking wrong? What does it mean? Looking at this, I am just more surprised by the convergence people have reported.

I must be lacking knowledge of something very essential, but what is it?

At least it is consistent with what my tuner does: the tuner tries to minimize the total points in my param vector.
User avatar
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by hgm »

The reported mse values are suspect. I suppose you are calculating the error as the difference between the eval given in the EPD and what your function calculates. But if you use a constant, the minimum of that should occur when that constant is the average evaluation of all your positions, and grow as the square of the deviation from that. But when you vary the constant, the mse just seems to jump up and down randomly. Either you have a precision problem, or the calculation is incorrect.

I see that you are fitting result predictions on a scale 0 to 1, so that on average you are off by about 0.3. (Lower than 0.5, because of the high draw rate, I suppose.) In that case shifting the constant 7cP should change the result prediction by about 0.01, and the mse by 0.0001 (and 14cP then by 0.0004, etc.).
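
A quick numerical check of those figures (just a sketch, using the same sigmoid with K = 1):

Code: Select all

#include <cmath>
#include <cstdio>

// With K = 1 the slope of 1/(1 + 10^(-s/400)) at s = 0 is ln(10)/1600 per
// centipawn, and near its minimum the mse of a constant prediction grows as
// the square of the shift in predicted probability.
int main()
{
    const double slope = std::log(10.0) / 1600.0;   // ~0.00144 per cp
    const int shifts[] = {7, 14};
    for (int cp : shifts) {
        double dp = slope * cp;                      // change in predicted result
        std::printf("%2d cp -> dP ~ %.4f, d(mse) ~ %.5f\n", cp, dp, dp * dp);
    }
}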

BTW, are you sure the results in your data set are given from side-to-move POV, and not from white POV?
User avatar
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Desperado »

hgm wrote: Fri Jan 15, 2021 12:58 pm The reported mse values are suspect. I suppose you are calculating the error as the difference between the eval given in the EPD and what your function calculates. But if you use a constant, the minimum of that should occur when that constant is the average evaluation of all your positions, and grow as the square of the deviation from that. But when you vary the constant, the mse just seems to jump up and down randomly. Either you have a precision problem, or the calculation is incorrect.

I see that you are fitting result predictions on a scale 0 to 1, so that on average you are off by about 0.3. (Lower than 0.5, because of the high draw rate, I suppose.) In that case shifting the constant 7cP should change the result prediction by about 0.01, and the mse by 0.0001 (and 14cP then by 0.0004, etc.).
Hello HG,

I included the posts I wrote directly before, so that everyone is able to confirm the mse calculation or classify it as wrong.
(That would really help me to continue.)

The functions computing this error have been given in this thread: the sigmoid, the loss function, the evaluation function including the phase calculation, and the data.

At the moment, I am only interested in locating an error, such as index overruns or an assignment operator instead of a comparison operator, or generally in checking the helper functions.

Speculation is no longer necessary. If the mse is identical, I can classify all routines that have to do with it as correct for this data set: the EPD routines and the complete logic for the computation of the mse. If not, I am at least on the track of the first error.
User avatar
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Desperado »

hgm wrote: Fri Jan 15, 2021 12:58 pm ...
BTW, are you sure the results in your data set are given from side-to-move POV, and not from white POV?
Somehow this was not there when I quoted it.

next quote is from: http://talkchess.com/forum3/viewtopic.p ... rl#p847806
...
3. Marks each EPD with the PGN result (0.0, 0.5 or 1.0, POV side on move). Marks the length of the PGN. Marks the continuation move in
...
Good point, HG, thank you! I need to check my implementation and run a test.

Before using this database I used my own data, which gave the result from White's POV.
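
If the results in a file do turn out to be from White's POV, the conversion when loading would be a one-liner; a sketch (assuming WHITE == 0 as in the snippets above, and with a placeholder function name):

Code: Select all

enum { WHITE = 0, BLACK = 1 };   // same convention as in the snippets above

// Sketch: convert a game result stored from White's POV into the
// side-to-move POV the tuner expects.
static double resultFromStmPov(double whitePovResult, int stm)
{
    return stm == WHITE ? whitePovResult : 1.0 - whitePovResult;
}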
Pio
Posts: 334
Joined: Sat Feb 25, 2012 10:42 pm
Location: Stockholm

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Pio »

Desperado wrote: Fri Jan 15, 2021 1:59 pm
hgm wrote: Fri Jan 15, 2021 12:58 pm ...
BTW, are you sure the results in your data set are given from side-to-move POV, and not from white POV?
Somehow this was not there when I quoted it.

http://talkchess.com/forum3/viewtopic.p ... rl#p847806
...
3. Marks each EPD with the PGN result (0.0, 0.5 or 1.0, POV side on move). Marks the length of the PGN. Marks the continuation move in
...
Good point, HG, thank you! I need to check my implementation and run a test.

Before using this database I used my own data, which gave the result from White's POV.
I thought about that this morning too; it could explain it as well. Another problem might be that it looks like you haven’t computed K from your test data (see code below). That means that the mapping between centipawn scores and probabilities might be completely off. Calculate the K value and try what hgm suggested; then I am quite confident you will get reasonable values. If you haven’t calculated the K value, using one value as an anchor will be even worse.

Code: Select all

 static double sigmoid(int score)
{
    static double const K = 1.0; // TODO
    return 1.0 / (1.0 + pow(10.0, -K * score / 400));
}
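
Fitting K is then just a one-dimensional minimisation of the mse over K with the evaluation held fixed; a minimal sketch (a coarse scan, with mseForK standing in for whatever computes the error for a given K):

Code: Select all

#include <functional>

// Sketch: fit K by a coarse scan, minimising the mse of the fixed evaluation.
// mseForK stands for an error function like the one sketched earlier,
// evaluated for a given K.
static double fitK(const std::function<double(double)>& mseForK)
{
    double bestK = 0.1, bestErr = mseForK(bestK);
    for (double K = 0.2; K <= 3.0; K += 0.1) {
        double err = mseForK(K);
        if (err < bestErr) { bestErr = err; bestK = K; }
    }
    return bestK;
}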
User avatar
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Desperado »

I checked the source from http://rebel13.nl/download/data.html
Pgn=0.0 means the side to move loses. The scores are from the POV of whoever is about to move. Pgn=0.5 is a draw.
No doubt about that anymore.

Going on ...
User avatar
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by hgm »

Pio wrote: Fri Jan 15, 2021 2:09 pm I thought about that this morning too; it could explain it as well. Another problem might be that it looks like you haven’t computed K from your test data (see code below). That means that the mapping between centipawn scores and probabilities might be completely off. Calculate the K value and try what hgm suggested; then I am quite confident you will get reasonable values. If you haven’t calculated the K value, using one value as an anchor will be even worse.
An anchor is only needed when the quantities to be fitted can be arbitrarily scaled. Here that is not the case; you want to reproduce a given sigmoid. This should fix all values. (At least when the data points cover the entire space that can be spanned by the parameters. If they only cover a lower-dimensional sub-space, such as when you only include materially balanced positions, then the optimum is degenerate, and you can impose additional requirements to lift that degeneracy.)