Tapered Evaluation and MSE (Texel Tuning)

Pio · Post by **Pio** » Fri Jan 15, 2021 3:49 pm

hgm wrote: ↑Fri Jan 15, 2021 3:38 pm
Pio wrote: ↑Fri Jan 15, 2021 2:09 pm I thought about that this morning too; it could explain it as well. Another problem might be that it looks like you haven’t computed K from your test data (see code below). That means the mapping between centipawn scores and probabilities might be completely off. Calculate the K value and try what hgm suggested; then I am quite confident you will get reasonable values. If you haven’t calculated the K value, using one value as an anchor will be even worse.
An anchor is only needed when the quantities to be fitted can be arbitrarily scaled. Here that is not the case; you want to reproduce a given sigmoid. This should fix all values. (At least when the data points cover the entire space that can be spanned by the parameters. If they cover only a lower-dimensional sub-space, such as when you only include materially balanced positions, then the optimum is degenerate, and you can impose additional requirements to lift that degeneracy.)

Yes, I know that. That is why I said it will make things a lot worse if you use an anchor without having calculated K.
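For reference, the K being discussed is the scaling constant in the score-to-probability sigmoid. A minimal sketch of fitting it, assuming the commonly used Texel form 1 / (1 + 10^(-K*s/400)) and assuming the MSE has a single minimum in K on the search interval; the names `win_probability`, `mse` and `fit_k` are hypothetical and not taken from anyone's engine:

```python
def win_probability(score_cp, k):
    """Map a centipawn score (White's view) to an expected result in [0, 1]."""
    return 1.0 / (1.0 + 10.0 ** (-k * score_cp / 400.0))

def mse(samples, k):
    """Mean squared error over (score_cp, result) pairs, result in [0, 1]."""
    return sum((r - win_probability(s, k)) ** 2 for s, r in samples) / len(samples)

def fit_k(samples, lo=0.0, hi=4.0, iters=60):
    """Ternary search for K, assuming a single minimum of the MSE on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if mse(samples, m1) < mse(samples, m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2.0

# Synthetic check: labels generated from the sigmoid itself with K = 1.5,
# so the fit should recover a value close to 1.5.
scores = [-600, -250, -80, 0, 40, 120, 300, 700]
samples = [(s, win_probability(s, 1.5)) for s in scores]
print(round(fit_k(samples), 3))
```

With K fitted on the data first, the evaluation parameters are then tuned against a fixed sigmoid, which is the situation hgm describes where no anchor is needed.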

Ferdy · Post by **Ferdy** » Fri Jan 15, 2021 5:30 pm

Desperado wrote: ↑Fri Jan 15, 2021 9:33 am
Desperado wrote: ↑Fri Jan 15, 2021 9:00 am Hello,

can someone confirm the number below please.
Code: Select all
    double error = mse();
    printf("\nTotal error ccrl-40-15-elo-3200.epd with K=1.0: %.16f", error);
    getchar();
Loaded Epd Positions: 1537380
Total error ccrl-40-15-elo-3200.epd with K=1.0: 0.1216016900516253

Of course, I need to further divide the problem. Although I was convinced that the data was the problem,
I have to accept that there is a hidden problem with my code. Obviously, it's not the general algorithms,
nor the now-simplified scoring function. The bug seems to be hidden in the helper functions, if there is one.
So I want to start by checking if error calculation and epd routines work correctly.

So if someone can do a simple static material evaluation with the vector 100,300,300,500,1000 on the mentioned file,
I can just see if the error sum, mentioned above, is identical. Depending on the result then further steps will follow.

Thanks a lot in advance.
I use the phase encoding 1,2,4 (minor, rook, queen) with a maximum of 24 for mg positions.

The error for the starting vector:
Code: Select all
int Eval::mgMat[7] = {0,110,310,310,510,1010,0};
int Eval::egMat[7] = {0, 90,290,290,490,990,0};
Loaded Epd Positions: 1537380
Total error ccrl-40-15-elo-3200.epd with K=1.0: 0.1215975237992763
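The phase encoding above can be sketched as follows. This is only an illustration: it assumes linear interpolation between the mg and eg material sums, floating-point division (an engine would more likely use integer arithmetic), and a score from White's point of view; `tapered_material` is a hypothetical helper name.

```python
PHASE_WEIGHT = {"n": 1, "b": 1, "r": 2, "q": 4}  # minor, rook, queen
MAX_PHASE = 24  # 8 minors * 1 + 4 rooks * 2 + 2 queens * 4

# Starting vector from the post, indexed by piece letter instead of piece number.
MG = {"p": 110, "n": 310, "b": 310, "r": 510, "q": 1010}
EG = {"p": 90, "n": 290, "b": 290, "r": 490, "q": 990}

def tapered_material(fen, mg=MG, eg=EG):
    """Material-only tapered score in centipawns, from White's point of view."""
    board = fen.split()[0]
    mg_score = eg_score = phase = 0
    for ch in board:
        piece = ch.lower()
        if piece not in mg:  # skips digits, '/', and kings
            continue
        sign = 1 if ch.isupper() else -1
        mg_score += sign * mg[piece]
        eg_score += sign * eg[piece]
        phase += PHASE_WEIGHT.get(piece, 0)
    phase = min(phase, MAX_PHASE)  # promotions could push the sum above 24
    return (mg_score * phase + eg_score * (MAX_PHASE - phase)) / MAX_PHASE

fen = "rnb2rk1/4bppp/2p1p3/p6q/Pp6/4NNP1/1PQ1PPBP/R2R2K1 w - -"
print(tapered_material(fen))
```

In this position White is a knight up but a bishop and a pawn down, with phase 22 of 24, so the score is weighted almost entirely toward the mg values.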

I took the ccrl-3200 epd from here https://rebel13.nl/misc/epd.html.

Got this result.

Code: Select all

K: 0, Pos: 4041988, total_sq_error: 416103.0,          mse: 0.1029451349187578
K: 1, Pos: 4041988, total_sq_error: 451012.9131711059, mse: 0.11158195253699563
K: 2, Pos: 4041988, total_sq_error: 520544.6291908135, mse: 0.12878430841229946

Reformatted epd file is here

Example:

Code: Select all

rnb2rk1/4bppp/2p1p3/p6q/Pp6/4NNP1/1PQ1PPBP/R2R2K1 w - -,1-0

Ferdy · Post by **Ferdy** » Fri Jan 15, 2021 6:04 pm

Ferdy wrote: ↑Fri Jan 15, 2021 5:30 pm Got this result.

Code: Select all

K: 0, Pos: 4041988, total_sq_error: 416103.0,          mse: 0.1029451349187578
K: 1, Pos: 4041988, total_sq_error: 451012.9131711059, mse: 0.11158195253699563
K: 2, Pos: 4041988, total_sq_error: 520544.6291908135, mse: 0.12878430841229946

Reformatted epd file is here

Example:

Code: Select all

rnb2rk1/4bppp/2p1p3/p6q/Pp6/4NNP1/1PQ1PPBP/R2R2K1 w - -,1-0

Piece value.

Code: Select all

pvalue = [[100,300,300,500,1000], [100,300,300,500,1000]]

Full result:

Code: Select all

K: 0, Pos: 4041988, total_sq_error: 416103.0,          mse: 0.1029451349187578
K: 1, Pos: 4041988, total_sq_error: 451012.9131711059, mse: 0.11158195253699563
K: 2, Pos: 4041988, total_sq_error: 520544.6291908135, mse: 0.12878430841229946
K: 3, Pos: 4041988, total_sq_error: 584036.6130747806, mse: 0.14449241637401708
K: 4, Pos: 4041988, total_sq_error: 633132.59761596,   mse: 0.15663891075776576
K: 5, Pos: 4041988, total_sq_error: 666981.03949793,   mse: 0.16501311718340825
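A quick consistency check on the first rows of this table, as a sketch. The numbers are copied from the output above; the only assumption is that draws are labeled 0.5, so that at K=0 (where every prediction is 0.5) a decisive game contributes 0.25 to the squared error and a draw contributes nothing.

```python
rows = [  # (K, positions, total squared error, reported mse)
    (0, 4041988, 416103.0, 0.1029451349187578),
    (1, 4041988, 451012.9131711059, 0.11158195253699563),
    (2, 4041988, 520544.6291908135, 0.12878430841229946),
]

# The reported mse is simply the total squared error divided by the position count.
for k, n, total, reported in rows:
    assert abs(total / n - reported) < 1e-9

# At K = 0: total_sq_error = 0.25 * (number of positions from decisive games).
decisive = rows[0][2] / 0.25
print(int(decisive), round(decisive / rows[0][1], 4))
```

Under that assumption, roughly 41% of the positions come from decisive games and the rest from draws.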

Desperado · Post by **Desperado** » Fri Jan 15, 2021 6:41 pm

Hello Ferdy, thank you!

I will check it as soon as possible. This time to be sure, what is the pov of the result score?

1. pov result == side to move
2. pov result == white to move

Thanks.

@All: Running my code on quiet-labeled.epd, I get "normal" results, even with a minibatch size such as 50k.

quiet-labeled.epd
70 115 330 285 345 305 430 520 975 935 best: 0.069547 epoch: 14 SE/B=50K/K=1.6308
70 115 325 290 340 310 425 525 985 935 best: 0.064040 epoch: 15 QS/B=50K/K=1.6797

Ferdy · Post by **Ferdy** » Fri Jan 15, 2021 6:51 pm

Desperado wrote: ↑Fri Jan 15, 2021 6:41 pm Hello Ferdy, thank you!

I will check it as soon as possible. This time to be sure, what is the pov of the result score?

1. pov result == side to move
2. pov result == white to move

Code: Select all

rnb2rk1/4bppp/2p1p3/p6q/Pp6/4NNP1/1PQ1PPBP/R2R2K1 w - -,1-0

The 1-0 result means White wins and 0-1 means Black wins, regardless of which side is to move.
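A minimal sketch of reading such a line with the result taken from White's point of view. The `1/2-1/2` token for draws is an assumption (only `1-0` appears in the example), and both function names are hypothetical:

```python
RESULT_TO_SCORE = {"1-0": 1.0, "0-1": 0.0, "1/2-1/2": 0.5}  # draw token is assumed

def parse_labeled_line(line):
    """Split 'FEN,result' into the FEN string and a White-POV score."""
    fen, result = line.strip().rsplit(",", 1)
    return fen, RESULT_TO_SCORE[result]

def to_white_pov(score_stm, fen):
    """Flip a side-to-move evaluation so it matches the White-POV label."""
    side_to_move = fen.split()[1]
    return score_stm if side_to_move == "w" else -score_stm

fen, score = parse_labeled_line(
    "rnb2rk1/4bppp/2p1p3/p6q/Pp6/4NNP1/1PQ1PPBP/R2R2K1 w - -,1-0"
)
print(fen, score)
```

If the evaluation function returns a score relative to the side to move, it has to be flipped for Black-to-move positions before it goes into the sigmoid; otherwise the error is computed against the wrong label.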

Desperado · Post by **Desperado** » Fri Jan 15, 2021 7:00 pm

Ferdy wrote: ↑Fri Jan 15, 2021 6:04 pm

Ferdy wrote: ↑Fri Jan 15, 2021 5:30 pm Got this result.

Code: Select all

K: 0, Pos: 4041988, total_sq_error: 416103.0,          mse: 0.1029451349187578
K: 1, Pos: 4041988, total_sq_error: 451012.9131711059, mse: 0.11158195253699563
K: 2, Pos: 4041988, total_sq_error: 520544.6291908135, mse: 0.12878430841229946

Reformatted epd file is here

Example:

Code: Select all

rnb2rk1/4bppp/2p1p3/p6q/Pp6/4NNP1/1PQ1PPBP/R2R2K1 w - -,1-0

Piece value.

Code: Select all

pvalue = [[100,300,300,500,1000], [100,300,300,500,1000]]

Full result:

Code: Select all

K: 0, Pos: 4041988, total_sq_error: 416103.0,          mse: 0.1029451349187578
K: 1, Pos: 4041988, total_sq_error: 451012.9131711059, mse: 0.11158195253699563
K: 2, Pos: 4041988, total_sq_error: 520544.6291908135, mse: 0.12878430841229946
K: 3, Pos: 4041988, total_sq_error: 584036.6130747806, mse: 0.14449241637401708
K: 4, Pos: 4041988, total_sq_error: 633132.59761596,   mse: 0.15663891075776576
K: 5, Pos: 4041988, total_sq_error: 666981.03949793,   mse: 0.16501311718340825

Hi, that looks pretty good.

Code: Select all

K=0: MSE 0.1029451349187578
K=1: MSE 0.1115819525369956
K=2: MSE 0.1287843084122995
K=3: MSE 0.1444924163740171
K=4: MSE 0.1566389107577658

So I don't have any issues with the epd utilities or my error computation!

That was very useful for me.

Of course I will do some tuning later and look at the results, because this is simply a different database than before.

Desperado · Post by **Desperado** » Fri Jan 15, 2021 9:50 pm

Hi, before I start checking the update operations of the data structure and the algorithm logic,
I thought I would repeat what I did in the afternoon, but with the current data.

In the latest posts we reported the MSE of the full file for different K and a given parameter vector for the material scores.
Fine! Independent code leads to the same results.

Here is a puzzle that might surprise you.

Code: Select all

int Eval::mgMat[7] = {0,1,1,1,1,1,0};
int Eval::egMat[7] = {0,1,1,1,1,1,0};

K=1: MSE 0.1029094541968299

int Eval::mgMat[7] = {0,-1,-2,-3,-4,-5,0};
int Eval::egMat[7] = {0, 2, 3, 4, 5, 6,0};

K=1: MSE 0.1028872134400059

int Eval::mgMat[7] = {0,100,300,300,500,1000,0};
int Eval::egMat[7] = {0,100,300,300,500,1000,0};

K=1: MSE 0.1115819525369956

Both artificial and meaningless vectors give a significantly better MSE than the starting vector! No doubt this time.

This is consistent with my previous observations.
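One way to read the first of these numbers, as a sketch only: it assumes the Texel-style sigmoid 1 / (1 + 10^(-K*s/400)), which is not spelled out in this thread. With piece values of 1, every score is only a few points from zero, so every prediction stays close to 0.5 and the MSE should land near the K=0 figure reported earlier (0.10295). This says nothing about whether the data or the tuner is at fault.

```python
def win_probability(score_cp, k=1.0):
    """Texel-style sigmoid (assumed form), score in centipawns from White's view."""
    return 1.0 / (1.0 + 10.0 ** (-k * score_cp / 400.0))

# With piece values of 1, a material imbalance is only a few points.
for score in (-5, -1, 0, 1, 5):
    print(score, round(win_probability(score), 4))

# On a 100/300/300/500/1000 scale, a single pawn already moves the prediction.
print(100, round(win_probability(100), 4))
```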

hgm · Post by **hgm** » Fri Jan 15, 2021 10:47 pm

Not really a surprise. It was already clear from the constant eval experiment that there is something horribly wrong in the calculation of the MSE, and that it is not calculating what it should be calculating at all. Even with a completely nonsensical data set (e.g. just random numbers between 0 and 1 assigned to each position) the MSE should have a parabolic dependence on a constant evaluation. You did not have that.
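The parabolic dependence can be illustrated in terms of the constant prediction p (the sigmoid of the constant evaluation): for any labels r_i, mean((r_i - p)^2) = p^2 - 2*p*mean(r) + mean(r^2), which is a parabola in p with its minimum at p = mean(r). A small sketch with random labels; the function names are hypothetical:

```python
import random

def mse_constant(results, p):
    """MSE when every position receives the same prediction p."""
    return sum((r - p) ** 2 for r in results) / len(results)

def parabola(results, p):
    """Closed form: p^2 - 2*p*mean(r) + mean(r^2)."""
    m1 = sum(results) / len(results)
    m2 = sum(r * r for r in results) / len(results)
    return p * p - 2.0 * m1 * p + m2

random.seed(1)
results = [random.random() for _ in range(10000)]  # nonsensical labels in [0, 1)
mean_r = sum(results) / len(results)

# Direct computation and closed form agree for any constant prediction.
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    assert abs(mse_constant(results, p) - parabola(results, p)) < 1e-9

print(round(mean_r, 4), round(mse_constant(results, mean_r), 4))
```

Plotted against the constant evaluation itself rather than p, the curve is this parabola composed with the sigmoid, so it still has a single minimum where the sigmoid output equals the mean result.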

Desperado · Post by **Desperado** » Fri Jan 15, 2021 10:49 pm

Desperado wrote: ↑Fri Jan 15, 2021 9:50 pm Hi, before I start checking the update operations of the data structure and the algorithm logic,
I thought I would repeat what I did in the afternoon, but with the current data.

In the latest posts we reported the MSE of the full file for different K and a given parameter vector for the material scores.
Fine! Independent code leads to the same results.

Here is a puzzle that might surprise you.
Code: Select all
int Eval::mgMat[7] = {0,1,1,1,1,1,0};
int Eval::egMat[7] = {0,1,1,1,1,1,0};

K=1: MSE 0.1029094541968299

int Eval::mgMat[7] = {0,-1,-2,-3,-4,-5,0};
int Eval::egMat[7] = {0, 2, 3, 4, 5, 6,0};

K=1: MSE 0.1028872134400059

int Eval::mgMat[7] = {0,100,300,300,500,1000,0};
int Eval::egMat[7] = {0,100,300,300,500,1000,0};

K=1: MSE 0.1115819525369956
Both artificial and meaningless vectors give a significantly better MSE than the starting vector! No doubt this time.

This is consistent with my previous observations.

Code: Select all

int Eval::mgMat[7] = {0,-20,-50,-45,-140, -5,0};
int Eval::egMat[7] = {0, 80,270,280, 435,685,0};

K=1: MSE 0.0997049328336036

LOL

Desperado · Post by **Desperado** » Fri Jan 15, 2021 10:50 pm

hgm wrote: ↑Fri Jan 15, 2021 10:47 pm Not really a surprise. It was already clear from the constant eval experiment that there is something horribly wrong in the calculation of the MSE, and that it is not calculating what it should be calculating at all. Even with a completely nonsensical data set (e.g. just random numbers between 0 and 1 assigned to each position) the MSE should have a parabolic dependence on a constant evaluation. You did not have that.

HG, you missed something: the computation was verified as correct by Ferdy! There is nothing wrong with the calculation!
He posted several MSE values for different K for the complete file with 4041988 positions. The MSE values are completely identical!
And both of us used our own code base. Just look 3 posts before...

Then I computed the MSE for the meaningless vectors, which have a clearly better MSE.

After comparing the database, we made separate measurements. There is a link, and anyone (including you) can insert the vector and calculate the MSE for the data.
