Tapered Evaluation and MSE (Texel Tuning)

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by hgm »

So the conclusion is that the set of positions sucks, and contains virtually no information on the mg piece values?
User avatar
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Desperado »

Hello,

Can someone confirm the number below, please?

Code: Select all

    double error = mse();
    printf("\nTotal error ccrl-40-15-elo-3200.epd with K=1.0: %.16f", error);
    getchar();
Loaded Epd Positions: 1537380
Total error ccrl-40-15-elo-3200.epd with K=1.0: 0.1216016900516253

Of course, I need to further divide the problem. Although I was convinced that the data was the problem,
I have to accept that there is a hidden problem with my code. Obviously, it's not the general algorithms,
nor the now-simplified scoring function. The bug seems to be hidden in the helper functions, if there is one.
So I want to start by checking if error calculation and epd routines work correctly.

So if someone can do a simple static material evaluation with the vector 100,300,300,500,1000 on the mentioned file,
I can just see if the error sum, mentioned above, is identical. Depending on the result then further steps will follow.
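
For reference, this is the shape of the error I am computing. It is only a sketch (the Entry struct and its field names are placeholders, K = 1.0, and the results are assumed to be from the side-to-move POV), but it should make the number reproducible:

Code: Select all

#include <cmath>
#include <vector>

// Sketch of the usual Texel-style error (K = 1.0, results from the
// side-to-move POV).  Entry and its fields are placeholders for whatever
// the tuner stores per EPD position.
struct Entry {
    double result;   // 0.0, 0.5 or 1.0
    int    eval;     // static material eval in centipawns, stm POV
};

static double mse(const std::vector<Entry>& data, double K = 1.0)
{
    double sum = 0.0;
    for (const Entry& e : data) {
        double p = 1.0 / (1.0 + std::pow(10.0, -K * e.eval / 400.0));
        sum += (e.result - p) * (e.result - p);
    }
    return sum / data.size();
}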

Thanks a lot in advance.
User avatar
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Desperado »

Desperado wrote: Fri Jan 15, 2021 9:00 am Hello,

Can someone confirm the number below, please?

Code: Select all

    double error = mse();
    printf("\nTotal error ccrl-40-15-elo-3200.epd with K=1.0: %.16f", error);
    getchar();
Loaded Epd Positions: 1537380
Total error ccrl-40-15-elo-3200.epd with K=1.0: 0.1216016900516253

Of course, I need to further divide the problem. Although I was convinced that the data was the problem,
I have to accept that there is a hidden problem with my code. Obviously, it's not the general algorithms,
nor the now-simplified scoring function. The bug seems to be hidden in the helper functions, if there is one.
So I want to start by checking if error calculation and epd routines work correctly.

So if someone can do a simple static material evaluation with the vector 100,300,300,500,1000 on the mentioned file,
I can just see if the error sum, mentioned above, is identical. Depending on the result then further steps will follow.

Thanks a lot in advance.
I use the phase encoding 1,2,4 (minor, rook, queen), with a maximum of 24 for mg positions.

The error for the starting vector:

Code: Select all

int Eval::mgMat[7] = {0,110,310,310,510,1010,0};
int Eval::egMat[7] = {0, 90,290,290,490,990,0};
Loaded Epd Positions: 1537380
Total error ccrl-40-15-elo-3200.epd with K=1.0: 0.1215975237992763
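
Spelled out, the phase computation behind those numbers is the following (a sketch, matching the 1/2/4 weights above; the initial position gives 8*1 + 4*2 + 2*4 = 24, which is also the cap):

Code: Select all

// Sketch of the 1/2/4 phase encoding: minors count 1, rooks 2, queens 4.
// The initial position yields 8*1 + 4*2 + 2*4 = 24, which is also the cap.
static int gamePhase(int minors, int rooks, int queens)
{
    int phase = 1 * minors + 2 * rooks + 4 * queens;
    return phase > 24 ? 24 : phase;
}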
User avatar
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Desperado »

Desperado wrote: Fri Jan 15, 2021 9:33 am
Desperado wrote: Fri Jan 15, 2021 9:00 am Hello,

Can someone confirm the number below, please?

Code: Select all

    double error = mse();
    printf("\nTotal error ccrl-40-15-elo-3200.epd with K=1.0: %.16f", error);
    getchar();
Loaded Epd Positions: 1537380
Total error ccrl-40-15-elo-3200.epd with K=1.0: 0.1216016900516253

Of course, I need to further divide the problem. Although I was convinced that the data was the problem,
I have to accept that there is a hidden problem with my code. Obviously, it's not the general algorithms,
nor the now-simplified scoring function. The bug seems to be hidden in the helper functions, if there is one.
So I want to start by checking if error calculation and epd routines work correctly.

So if someone can do a simple static material evaluation with the vector 100,300,300,500,1000 on the mentioned file,
I can just see if the error sum, mentioned above, is identical. Depending on the result then further steps will follow.

Thanks a lot in advance.
I use the phase encoding 1,2,4 (minor, rook, queen), with a maximum of 24 for mg positions.

The error for the starting vector:

Code: Select all

int Eval::mgMat[7] = {0,110,310,310,510,1010,0};
int Eval::egMat[7] = {0, 90,290,290,490,990,0};
Loaded Epd Positions: 1537380
Total error ccrl-40-15-elo-3200.epd with K=1.0: 0.1215975237992763
I did a little experiment with ccrl-40-15-elo-3200.epd (unmodified):

Code: Select all

int Eval::mgMat[7] = {0,100,300,300,500,1000,0};
int Eval::egMat[7] = {0,100,300,300,500,1000,0};

int Eval::full(pos_t* pos)
{
    int cnt;
    score_t score = {0,0};

    for(int c = WHITE; c <= BLACK; c++)
    {
        for(int p = WP + c; p <= WQ + c; p += 2)
        {
            cnt = Bit::popcnt(pos->bb[p]);
            score.mg += cnt * mgMat[PID(p)];
            score.eg += cnt * egMat[PID(p)];
        }

        score.mg = -score.mg;
        score.eg = -score.eg;
    }

    int phase = 1 * Bit::popcnt(Pos::minors(pos));
    phase += 2 * Bit::popcnt(Pos::rooks(pos));
    phase += 4 * Bit::popcnt(Pos::queens(pos));
    phase = min(phase, 24);

    // Trial: return a constant eval (values 0..5 were tried); this skips the tapered score below
    return 4;

    int s = (score.mg * phase + score.eg * (24 - phase)) / 24;
    return pos->stm == WHITE ? s : -s;
}
MSE 0.1008900000000000 // constant eval 0
MSE 0.1008891832250924 // constant eval 1
MSE 0.1009115831442991 // constant eval 2
MSE 0.1008999752977588 // constant eval 3
MSE 0.1009115831442991 // constant eval 4
MSE 0.1009273310704712 // constant eval 5

That makes me think a lot!

1.
The constant evaluation produces a smaller error (maybe the smallest) than the normal evaluation.
Because the constant is at the same time the average, that would mean that the tuner tries to
push the average evaluation towards 1cp, i.e. close to 0.

The easiest way would be to set all material values to 1cp; alternatively, because of the phase
evaluation, the tuner could diverge to a set that also produces an average of 1cp.

Now I set all material values to 1cp and switch off the trial code, and the result of the MSE is 0.1008926319644775

int Eval::mgMat[7] = {0,1,1,1,1,1,0};
int Eval::egMat[7] = {0,1,1,1,1,1,0};
MSE 0.1008926319644775

int Eval::mgMat[7] = {0,-1,-2,-2,-3,-4,0};
int Eval::egMat[7] = {0, 2, 3, 3, 4, 5,0};
MSE: 0.1008921291858372, even smaller than constant 1 (diverging)

2.
Having an optimal evaluation close to zero must mean that the draw rate dominates the data (which could be down to the choice of epd file).
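
A quick way to sanity-check that reasoning (a sketch, not my tuner code): for a constant eval c the MSE is the mean of (R - p)^2 with p = sigmoid(c), so it is minimal when p equals the mean game result. Inverting the sigmoid with K = 1 shows where that optimum sits:

Code: Select all

#include <cmath>
#include <cstdio>

// For a constant evaluation c the MSE is mean((R - p)^2) with p = sigmoid(c),
// minimal when p equals the mean game result.  Inverting the sigmoid
// (K = 1) gives the optimal constant for a few hypothetical mean results.
int main()
{
    const double K = 1.0;
    const double meanResults[] = {0.50, 0.51, 0.55, 0.60};
    for (double m : meanResults) {
        double c = (400.0 / K) * std::log10(m / (1.0 - m));
        std::printf("mean result %.2f -> optimal constant eval %5.1f cp\n", m, c);
    }
}
So with a result average near 0.5, the optimal constant eval is indeed only a few centipawns from zero.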

conclusion

Now, where is my thinking wrong? What does it mean? Looking at this, I am just more surprised by the convergence people have reported.

I must be lacking knowledge of something very essential, but what is it?

At least it is consistent with what my tuner does: the tuner tries to minimize the total points in my param vector.
User avatar
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by hgm »

The reported mse values are suspect. I suppose you are calculating the error as the difference between the eval given in the EPD and what your function calculates. But if you use a constant, the minimum of that should occur when that constant is the average evaluation of all your positions, and grow as the square of the deviation from that. But when you vary the constant, the mse just seems to jump up and down randomly. Either you have a precision problem, or the calculation is incorrect.

I see that you are fitting result predictions on a scale 0 to 1, so that on average you are off by about 0.3. (Lower than 0.5, because of the high draw rate, I suppose.) In that case shifting the constant 7cP should change the result prediction by about 0.01, and the mse by 0.0001 (and 14cP then by 0.0004, etc.).
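
A quick numerical check of those figures (just a sketch, using the same sigmoid with K = 1):

Code: Select all

#include <cmath>
#include <cstdio>

// With K = 1 the slope of 1/(1 + 10^(-s/400)) at s = 0 is ln(10)/1600 per
// centipawn, and near its minimum the mse of a constant prediction grows as
// the square of the shift in predicted probability.
int main()
{
    const double slope = std::log(10.0) / 1600.0;   // ~0.00144 per cp
    const int shifts[] = {7, 14};
    for (int cp : shifts) {
        double dp = slope * cp;                      // change in predicted result
        std::printf("%2d cp -> dP ~ %.4f, d(mse) ~ %.5f\n", cp, dp, dp * dp);
    }
}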

BTW, are you sure the results in your data set are given from side-to-move POV, and not from white POV?
User avatar
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Desperado »

hgm wrote: Fri Jan 15, 2021 12:58 pm The reported mse values are suspect. I suppose you are calculating the error as the difference between the eval given in the EPD and what your function calculates. But if you use a constant, the minimum of that should occur when that constant is the average evaluation of all your positions, and grow as the square of the deviation from that. But when you vary the constant, the mse just seems to jump up and down randomly. Either you have a precision problem, or the calculation is incorrect.

I see that you are fitting result predictions on a scale 0 to 1, so that on average you are off by about 0.3. (Lower than 0.5, because of the high draw rate, I suppose.) In that case shifting the constant 7cP should change the result prediction by about 0.01, and the mse by 0.0001 (and 14cP then by 0.0004, etc.).
Hello HG,

I included the posts I wrote directly before, so that everyone is able to confirm the mse calculation or classify it as wrong.
(That would really help me to continue.)

The functions computing this error have been given in this thread: the sigmoid, the loss function, the evaluation function including the phase calculation, and the data.

At the moment, I am only interested in locating an error, such as index overruns or an assignment operator instead of a comparison operator, or generally in checking the helper functions.

Speculation is no longer necessary. If the mse is identical, I can classify all routines that have to do with it as correct for this data set: the EPD routines and the complete logic for the computation of the mse. If not, I am at least on the track of the first error.
User avatar
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Desperado »

hgm wrote: Fri Jan 15, 2021 12:58 pm ...
BTW, are you sure the results in your data set are given from side-to-move POV, and not from white POV?
Somehow this was not there when I quoted it.

next quote is from: http://talkchess.com/forum3/viewtopic.p ... rl#p847806
...
3. Marks each EPD with the PGN result (0.0, 0.5 or 1.0, POV side on move). Marks the length of the PGN. Marks the continuation move in
...
Good point, HG, thank you! I need to check my implementation and run a test.

Before using this database I used my own data, which gave the result from White's POV.
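
If the results in a file do turn out to be from White's POV, the conversion when loading would be a one-liner; a sketch (assuming WHITE == 0 as in the snippets above, and with a placeholder function name):

Code: Select all

enum { WHITE = 0, BLACK = 1 };   // same convention as in the snippets above

// Sketch: convert a game result stored from White's POV into the
// side-to-move POV the tuner expects.
static double resultFromStmPov(double whitePovResult, int stm)
{
    return stm == WHITE ? whitePovResult : 1.0 - whitePovResult;
}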
Pio
Posts: 334
Joined: Sat Feb 25, 2012 10:42 pm
Location: Stockholm

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Pio »

Desperado wrote: Fri Jan 15, 2021 1:59 pm
hgm wrote: Fri Jan 15, 2021 12:58 pm ...
BTW, are you sure the results in your data set are given from side-to-move POV, and not from white POV?
Somehow this was not there when I quoted it.

http://talkchess.com/forum3/viewtopic.p ... rl#p847806
...
3. Marks each EPD with the PGN result (0.0, 0.5 or 1.0, POV side on move). Marks the length of the PGN. Marks the continuation move in
...
Good point, HG, thank you! I need to check my implementation and run a test.

Before using this database I used my own data, which gave the result from White's POV.
I thought about that this morning too; it could explain it as well. Another problem might be that it looks like you haven’t computed K from your test data (see code below). That means that the mapping between centipawn scores and probabilities might be completely off. Calculate the K value and try what hgm suggested; then I am quite confident you will get reasonable values. If you haven’t calculated the K value, using one value as an anchor will be even worse.

Code: Select all

 static double sigmoid(int score)
{
    static double const K = 1.0; // TODO
    return 1.0 / (1.0 + pow(10.0, -K * score / 400));
}
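
Fitting K is then just a one-dimensional minimisation of the mse over K with the evaluation held fixed; a minimal sketch (a coarse scan, with mseForK standing in for whatever computes the error for a given K):

Code: Select all

#include <functional>

// Sketch: fit K by a coarse scan, minimising the mse of the fixed evaluation.
// mseForK stands for an error function like the one sketched earlier,
// evaluated for a given K.
static double fitK(const std::function<double(double)>& mseForK)
{
    double bestK = 0.1, bestErr = mseForK(bestK);
    for (double K = 0.2; K <= 3.0; K += 0.1) {
        double err = mseForK(K);
        if (err < bestErr) { bestErr = err; bestK = K; }
    }
    return bestK;
}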
User avatar
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Desperado »

I checked the source from http://rebel13.nl/download/data.html
Pgn=0.0 means the side to move loses. The scores are from the POV of whoever is about to move. Pgn=0.5 is a draw.
No doubt about that anymore.

Going on ...
User avatar
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by hgm »

Pio wrote: Fri Jan 15, 2021 2:09 pm I thought about that this morning too; it could explain it as well. Another problem might be that it looks like you haven’t computed K from your test data (see code below). That means that the mapping between centipawn scores and probabilities might be completely off. Calculate the K value and try what hgm suggested; then I am quite confident you will get reasonable values. If you haven’t calculated the K value, using one value as an anchor will be even worse.
An anchor is only needed when the quantities to be fitted can be arbitrarily scaled. Here that is not the case; you want to reproduce a given sigmoid. This should fix all values. (At least when the data points cover the entire space that can be spanned by the parameters. If they only cover a lower-dimensional sub-space, such as when you only include materially balanced positions, then the optimum is degenerate, and you can impose additional requirements to lift that degeneracy.)