Tapered Evaluation and MSE (Texel Tuning)


hgm
Posts: 28354
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by hgm »

If the outcome is wrong, then the calculation is wrong. No matter how many times you verified it. It just means that all the verifications are wrong too.

For a constant evaluation the mse must show parabolic dependence on the constant, with a minimum at the eval corresponding to the average win rate. You did not have that.
Ferdy
Posts: 4846
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Ferdy »

Desperado wrote: Fri Jan 15, 2021 10:49 pm
Desperado wrote: Fri Jan 15, 2021 9:50 pm Hi, before I start to check the update operations of the data structure and the algorithm logic,
I thought I would do what I already did in the afternoon, but with the current data.

In the latest post we reported the mse of the full file with different K and a given parameter vector for the material scores.
Fine! Independent code leads to the same results.

Here is a puzzle that might surprise you.

Code: Select all

int Eval::mgMat[7] = {0,1,1,1,1,1,0};
int Eval::egMat[7] = {0,1,1,1,1,1,0};

K=1: MSE 0.1029094541968299

int Eval::mgMat[7] = {0,-1,-2,-3,-4,-5,0};
int Eval::egMat[7] = {0, 2, 3, 4, 5, 6,0};

K=1: MSE 0.1028872134400059

int Eval::mgMat[7] = {0,100,300,300,500,1000,0};
int Eval::egMat[7] = {0,100,300,300,500,1000,0};

K=1: MSE 0.1115819525369956

Both artificial and meaningless vectors produce a significantly better result than the starting vector with respect to the mse! No doubt this time.

This is consistent with my previous observations.

Code: Select all

int Eval::mgMat[7] = {0,-20,-50,-45,-140, -5,0};
int Eval::egMat[7] = {0, 80,270,280, 435,685,0};

K=1: MSE 0.0997049328336036
LOL
The training positions may include positions where the result is a draw although one side is ahead in material, among other configurations. For example:

Code: Select all

3kr3/8/4B1R1/8/3K4/8/8/8 w - -,1/2-1/2
[d]3kr3/8/4B1R1/8/3K4/8/8/8 w - - 91 138

With piece value 1, 1, 1, 1, 1

Code: Select all

K: 1, Pos: 1, total_sq_error: 0.000002071, mse: 0.000002071, wpov_mat_score: 1
With piece value 100, 300, 300, 500, 1000

Code: Select all

K: 1, Pos: 1, total_sq_error: 0.121815269, mse: 0.121815269, wpov_mat_score: 300
The error of the crappy piece values is lower simply because the piece values themselves are lower. If the result is a draw, an evaluation close to zero is expected to have a smaller error.
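To make that concrete, here is a minimal sketch that reproduces the two squared errors above, assuming the sigmoid form 1 / (1 + 10^(-K*s/400)) with K=1 (the outputs match the numbers quoted, so this appears to be the form in use):

Code: Select all

#include <cmath>
#include <cstdio>

// Expected score for a white-POV score s in centipawns, Texel sigmoid with K = 1.
double expected(double s) { return 1.0 / (1.0 + std::pow(10.0, -s / 400.0)); }

int main() {
    const double result = 0.5;  // the example position was drawn
    // White is a bishop ahead: material score +1 with unit values, +300 with normal ones.
    double e1 = result - expected(1.0);
    double e2 = result - expected(300.0);
    std::printf("sq_error(unit values)   = %.9f\n", e1 * e1);  // ~0.000002071
    std::printf("sq_error(normal values) = %.9f\n", e2 * e2);  // ~0.121815269
    return 0;
}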
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Desperado »

hgm wrote: Sat Jan 16, 2021 9:43 am If the outcome is wrong, then the calculation is wrong. No matter how many times you verified it. It just means that all the verifications are wrong too.

For a constant evaluation the mse must show parabolic dependence on the constant, with a minimum at the eval corresponding to the average win rate. You did not have that.
Sorry HG, you are wrong, clearly! Here we go...

Result by calculation for constant eval "0":

total positions: 4041988

white: 1096478, draw: 2377576, black: 567934

A constant eval of "0" yields an expected score of 0.5, because sigmoid(0, 400) = 0.5.

error white wins:
e = 1.0 - 0.5 = 0.5;  sE = e*e = 0.25;  sE * 1096478 = 274119.5

error draws:
e = 0.5 - 0.5 = 0;  sE = e*e = 0;  sE * 2377576 = 0

error black wins:
e = 0.0 - 0.5 = -0.5;  sE = e*e = 0.25;  sE * 567934 = 141983.5

Total Error: 274119.5 + 0 + 141983.5 = 416103
Average Error: 416103 / 4041988 = 0.10294513491875779937990909423779

Result by measurement for constant eval "0":

0.1029451349187578

If you have a constant eval the mse only depends on the distribution of the results.
My code for that part is fine. Ferdy's results are fine too, because we measured identical results.

The vectors with the artificial numbers correspond to a significantly smaller error. That is a fact!
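For reference, a minimal sketch that reproduces the constant-eval calculation above (same assumed sigmoid form as before):

Code: Select all

#include <cmath>
#include <cstdio>

double expected(double s) { return 1.0 / (1.0 + std::pow(10.0, -s / 400.0)); }

int main() {
    // WDL counts from the post above.
    const double wins = 1096478, draws = 2377576, losses = 567934;
    const double total = wins + draws + losses;   // 4041988
    const double p = expected(0.0);               // constant eval 0 -> 0.5
    double mse = (wins   * (1.0 - p) * (1.0 - p)
                + draws  * (0.5 - p) * (0.5 - p)
                + losses * (0.0 - p) * (0.0 - p)) / total;
    std::printf("%.16f\n", mse);                  // 0.1029451349187578
    return 0;
}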
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Desperado »

hgm wrote: Sat Jan 16, 2021 9:43 am If the outcome is wrong, ...
Or is that what you are focusing on? (meaning the result of the game)
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Desperado »

Ferdy wrote: Sat Jan 16, 2021 11:24 am
Desperado wrote: Fri Jan 15, 2021 10:49 pm
Desperado wrote: Fri Jan 15, 2021 9:50 pm Hi, before I start to check the update operations of the data structure and the algorithm logic,
I thought I would do what I already did in the afternoon, but with the current data.

In the latest post we reported the mse of the full file with different K and a given parameter vector for the material scores.
Fine! Independent code leads to the same results.

Here is a puzzle that might surprise you.

Code: Select all

int Eval::mgMat[7] = {0,1,1,1,1,1,0};
int Eval::egMat[7] = {0,1,1,1,1,1,0};

K=1: MSE 0.1029094541968299

int Eval::mgMat[7] = {0,-1,-2,-3,-4,-5,0};
int Eval::egMat[7] = {0, 2, 3, 4, 5, 6,0};

K=1: MSE 0.1028872134400059

int Eval::mgMat[7] = {0,100,300,300,500,1000,0};
int Eval::egMat[7] = {0,100,300,300,500,1000,0};

K=1: MSE 0.1115819525369956

Both artificial and meaningless vectors produce a significantly better result than the starting vector with respect to the mse! No doubt this time.

This is consistent with my previous observations.

Code: Select all

int Eval::mgMat[7] = {0,-20,-50,-45,-140, -5,0};
int Eval::egMat[7] = {0, 80,270,280, 435,685,0};

K=1: MSE 0.0997049328336036
LOL
The training positions may include positions where the result is a draw although one side is ahead in material, among other configurations. For example:

Code: Select all

3kr3/8/4B1R1/8/3K4/8/8/8 w - -,1/2-1/2
[d]3kr3/8/4B1R1/8/3K4/8/8/8 w - - 91 138

With piece value 1, 1, 1, 1, 1

Code: Select all

K: 1, Pos: 1, total_sq_error: 0.000002071, mse: 0.000002071, wpov_mat_score: 1
With piece value 100, 300, 300, 500, 1000

Code: Select all

K: 1, Pos: 1, total_sq_error: 0.121815269, mse: 0.121815269, wpov_mat_score: 300
The error of the crappy piece values is lower simply because the piece values themselves are lower. If the result is a draw, an evaluation close to zero is expected to have a smaller error.
Hello Ferdy,

for the moment (I will come back to this later) I can only tell you that the tuner is not sensitive to the kind of information
you used as an explanation. For the tuner, only the MSE plays a role (without interpretation).

In this special context, I can say that my tuner behaves more correctly than yours (so it seems, but we can check that later).
As long as the tuner finds smaller values, it continues.

When you explained your algorithm, you mentioned that you shuffle the data in between. Of course you then need to update the
reference value (best MSE) after shuffling. I think you take that into account, do you? If not, you would be comparing the MSEs of two
different data sets, which would quickly trip your stop condition of three iterations. Just a thought...
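A self-contained sketch of that pitfall on toy data (all names and numbers here are made up for illustration): if the reference MSE is not recomputed on the freshly shuffled batch before comparing, "improved" just means the new subset happens to score differently.

Code: Select all

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

struct Sample { double score, result; };   // white-POV score and game result

double expected(double s) { return 1.0 / (1.0 + std::pow(10.0, -s / 400.0)); }

// MSE over the first n samples, with a constant offset c added to every score.
double mse(const std::vector<Sample>& d, std::size_t n, double c) {
    double e = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        double diff = d[i].result - expected(d[i].score + c);
        e += diff * diff;
    }
    return e / n;
}

int main() {
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> scores(-300.0, 300.0);
    std::uniform_int_distribution<int> results(0, 2);
    std::vector<Sample> data(10000);
    for (auto& s : data) s = { scores(rng), results(rng) * 0.5 };

    const std::size_t batch = 1000;   // tune on a shuffled subset each pass
    double c = 0.0;
    for (int pass = 0; pass < 5; ++pass) {
        std::shuffle(data.begin(), data.end(), rng);
        double best = mse(data, batch, c);   // re-baseline on the NEW batch first
        if (mse(data, batch, c + 1.0) < best)      c += 1.0;
        else if (mse(data, batch, c - 1.0) < best) c -= 1.0;
        std::printf("pass %d: c = %+.0f, batch mse = %.6f\n", pass, c, mse(data, batch, c));
    }
    return 0;
}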
hgm
Posts: 28354
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by hgm »

Desperado wrote: Sat Jan 16, 2021 11:28 am
hgm wrote: Sat Jan 16, 2021 9:43 am If the outcome is wrong, then the calculation is wrong. No matter how many times you verified it. It just means that all the verifications are wrong too.

For a constant evaluation the mse must show parabolic dependence on the constant, with a minimum at the eval corresponding to the average win rate. You did not have that.
Sorry HG, you are wrong, clearly! Here we go...

Result by calculation for constant eval "0":

total positions: 4041988

white: 1096478, draw: 2377576, black: 567934

A constant eval of "0" yields an expected score of 0.5, because sigmoid(0, 400) = 0.5.

error white wins:
e = 1.0 - 0.5 = 0.5;  sE = e*e = 0.25;  sE * 1096478 = 274119.5

error draws:
e = 0.5 - 0.5 = 0;  sE = e*e = 0;  sE * 2377576 = 0

error black wins:
e = 0.0 - 0.5 = -0.5;  sE = e*e = 0.25;  sE * 567934 = 141983.5

Total Error: 274119.5 + 0 + 141983.5 = 416103
Average Error: 416103 / 4041988 = 0.10294513491875779937990909423779

Result by measurement for constant eval "0":

0.1029451349187578

If you have a constant eval the mse only depends on the distribution of the results.
My code for that part is fine. Ferdy's results are fine too, because we measured identical results.

The vectors with the artificial numbers correspond to a significantly smaller error. That is a fact!
That doesn't address my point at all. You only calculate here for one value of the constant eval. That shows nothing about how the mse depends on that constant.

You did show the mse for several different values of the constant, and it was clearly not fitting a parabola.

What the mse is for several subsets of the data is not relevant.
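To make the parabola argument concrete, here is a sketch that sweeps a constant eval over the WDL counts posted above (same assumed sigmoid form); a correct MSE should vary smoothly with the constant and bottom out near the eval that matches the average score:

Code: Select all

#include <cmath>
#include <cstdio>

double expected(double s) { return 1.0 / (1.0 + std::pow(10.0, -s / 400.0)); }

int main() {
    // WDL counts quoted earlier in the thread.
    const double W = 1096478, D = 2377576, L = 567934, N = W + D + L;
    for (int c = -50; c <= 150; c += 10) {
        double p = expected(c);
        double m = (W * (1-p) * (1-p) + D * (0.5-p) * (0.5-p) + L * p * p) / N;
        std::printf("eval %4d  mse %.10f\n", c, m);
    }
    // The minimum lies where expected(c) equals the average score (W + D/2) / N.
    double avg = (W + 0.5 * D) / N;
    std::printf("expected minimum near eval %.1f\n",
                400.0 * std::log10(avg / (1.0 - avg)));
    return 0;
}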
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Desperado »

hgm wrote: Sat Jan 16, 2021 1:29 pm
Desperado wrote: Sat Jan 16, 2021 11:28 am
hgm wrote: Sat Jan 16, 2021 9:43 am If the outcome is wrong, then the calculation is wrong. No matter how many times you verified it. It just means that all the verifications are wrong too.

For a constant evaluation the mse must show parabolic dependence on the constant, with a minimum at the eval corresponding to the average win rate. You did not have that.
Sorry HG, you are wrong, clearly! Here we go...

Result by calculation for constant eval "0":

total positions: 4041988

white: 1096478, draw: 2377576, black: 567934

A constant eval of "0" yields an expected score of 0.5, because sigmoid(0, 400) = 0.5.

error white wins:
e = 1.0 - 0.5 = 0.5;  sE = e*e = 0.25;  sE * 1096478 = 274119.5

error draws:
e = 0.5 - 0.5 = 0;  sE = e*e = 0;  sE * 2377576 = 0

error black wins:
e = 0.0 - 0.5 = -0.5;  sE = e*e = 0.25;  sE * 567934 = 141983.5

Total Error: 274119.5 + 0 + 141983.5 = 416103
Average Error: 416103 / 4041988 = 0.10294513491875779937990909423779

Result by measurement for constant eval "0":

0.1029451349187578

If you have a constant eval the mse only depends on the distribution of the results.
My code for that part is fine. Ferdy's results are fine too, because we measured identical results.

The vectors with the artificial numbers correspond to a significantly smaller error. That is a fact!
That doesn't address my point at all. You only calculate here for one value of the constant eval. That shows nothing about how the mse depends on that constant.

You did show the mse for several different values of the constant, and it was clearly not fitting a parabola.

What the mse is for several subsets of the data is not relevant.
It is absolutely relevant.

1. It shows clearly the dependency on the WDL distribution. That affects any subset of the same order of magnitude.
2. It shows that the code works correctly.

If you want to say that the data is useless in the way it is analyzed, well, then the topic remains debatable.

My math skills have been rusty since I graduated more than 20 years ago. If I understand you correctly, you want to know how the scores are distributed. For example, how often is the result -4, 20, 127 or any other value present? That would change with any different vector.
Maybe I am wrong and still don't understand what you mean. In my opinion, this would be an indicator of the quality of the data set,
but it would have no connection with the technical determination/calculation of the mean error.

In any case, the analyses are relevant in the mentioned context.
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Desperado »

Desperado wrote: Sat Jan 16, 2021 12:11 pm
Ferdy wrote: Sat Jan 16, 2021 11:24 am
Desperado wrote: Fri Jan 15, 2021 10:49 pm
Desperado wrote: Fri Jan 15, 2021 9:50 pm Hi, before I start to check the update operations of the data structure and the algorithm logic,
I thought I would do what I already did in the afternoon, but with the current data.

In the latest post we reported the mse of the full file with different K and a given parameter vector for the material scores.
Fine! Independent code leads to the same results.

Here is a puzzle that might surprise you.

Code: Select all

int Eval::mgMat[7] = {0,1,1,1,1,1,0};
int Eval::egMat[7] = {0,1,1,1,1,1,0};

K=1: MSE 0.1029094541968299

int Eval::mgMat[7] = {0,-1,-2,-3,-4,-5,0};
int Eval::egMat[7] = {0, 2, 3, 4, 5, 6,0};

K=1: MSE 0.1028872134400059

int Eval::mgMat[7] = {0,100,300,300,500,1000,0};
int Eval::egMat[7] = {0,100,300,300,500,1000,0};

K=1: MSE 0.1115819525369956

Both artificial and meaningless vectors produce a significantly better result than the starting vector with respect to the mse! No doubt this time.

This is consistent with my previous observations.

Code: Select all

int Eval::mgMat[7] = {0,-20,-50,-45,-140, -5,0};
int Eval::egMat[7] = {0, 80,270,280, 435,685,0};

K=1: MSE 0.0997049328336036
LOL
The training positions may include positions where the result is a draw although one side is ahead in material, among other configurations. For example:

Code: Select all

3kr3/8/4B1R1/8/3K4/8/8/8 w - -,1/2-1/2
[d]3kr3/8/4B1R1/8/3K4/8/8/8 w - - 91 138

With piece value 1, 1, 1, 1, 1

Code: Select all

K: 1, Pos: 1, total_sq_error: 0.000002071, mse: 0.000002071, wpov_mat_score: 1
With piece value 100, 300, 300, 500, 1000

Code: Select all

K: 1, Pos: 1, total_sq_error: 0.121815269, mse: 0.121815269, wpov_mat_score: 300
The error of the crappy piece values is lower simply because the piece values themselves are lower. If the result is a draw, an evaluation close to zero is expected to have a smaller error.
Hello Ferdy,

for the moment (I will come back to this later) I can only tell you that the tuner is not sensitive to the kind of information
you used as an explanation. For the tuner, only the MSE plays a role (without interpretation).

In this special context, I can say that my tuner behaves more correctly than yours (so it seems, but we can check that later).
As long as the tuner finds smaller values, it continues.

When you explained your algorithm, you mentioned that you shuffle the data in between. Of course you then need to update the
reference value (best MSE) after shuffling. I think you take that into account, do you? If not, you would be comparing the MSEs of two
different data sets, which would quickly trip your stop condition of three iterations. Just a thought...
This effect is produced in the existing data even in mid-game phases (especially if you use only a pure static evaluation and no quiescence search).
And now the core of the matter becomes visible.
This could be the reason why the tuner minimizes the midgame values so extremely and adjusts the endgame values accordingly.
Especially if this type of error is particularly frequent when the board is almost full.
I guess there are a lot of positions that are materially unbalanced but ended in a draw.

The findings since yesterday especially point more and more to the fact that I am not imagining all this.

Current summary:
1. My mse() calculation works correctly.
2. The existence of artificial/meaningless vectors that produce a smaller mse() is real.
3. My tuner does not run into a stop criterion that prevents it from exploring this space of vectors.
And the tuner should not do that, because the only relevant quantity is the mse value.

I wonder why other tuners using the CPW logic converge, when it can easily be shown that there is a smaller mse for an idiotic vector.
The algorithm itself has not been discussed very intensively in this thread so far, and I am still open to the possibility that
something is horribly wrong on my side, but the arguments/evidence to date support that I am not wrong at all.
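For readers following along, this is the kind of tapered material evaluation under discussion: a generic sketch of the standard scheme with an assumed 24-point phase (not necessarily the engine's actual code). With the "idiotic" vector from earlier in the thread, a side that is a bishop ahead scores near zero on a full board, which is exactly what lowers the error on materially unbalanced mid-game draws:

Code: Select all

#include <cstdio>

enum { EMPTY, PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING };

// The artificial vector that produced MSE 0.0997 above.
int mgMat[7] = {0, -20, -50, -45, -140,  -5, 0};
int egMat[7] = {0,  80, 270, 280,  435, 685, 0};

// counts[p] = (white count - black count) of piece type p.
// phase: 24 = full board ... 0 = bare kings.
int taperedMaterial(const int counts[7], int phase) {
    int mg = 0, eg = 0;
    for (int p = PAWN; p <= QUEEN; ++p) {
        mg += counts[p] * mgMat[p];
        eg += counts[p] * egMat[p];
    }
    return (mg * phase + eg * (24 - phase)) / 24;
}

int main() {
    int upABishop[7] = {0, 0, 0, 1, 0, 0, 0};   // white ahead by one bishop
    std::printf("phase 24 (full board): %d\n", taperedMaterial(upABishop, 24));  // -45
    std::printf("phase  0 (bare kings): %d\n", taperedMaterial(upABishop,  0));  // 280
    return 0;
}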
hgm
Posts: 28354
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by hgm »

Desperado wrote: Sat Jan 16, 2021 1:58 pmIf I understand you correctly, you want to know how the number of scores are distributed. For example, how often is the result -4, 20, 127 or any other present?
No, not at all. I am talking about the constant-evaluation case. So one score would be present umpty-thousand times, and no other, and you know which score that is. I am talking about these results you posted:

MSE 0.1008900000000000 // constant eval 0
MSE 0.1008891832250924 // constant eval 1
MSE 0.1009115831442991 // constant eval 2
MSE 0.1008999752977588 // constant eval 3
MSE 0.1009115831442991 // constant eval 4
MSE 0.1009273310704712 // constant eval 5

This shows beyond any doubt that the mse is not calculated correctly. The graph of MSE vs the constant is not a parabola, but jumps up and down irregularly.
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Desperado »

hgm wrote: Sat Jan 16, 2021 4:42 pm
Desperado wrote: Sat Jan 16, 2021 1:58 pmIf I understand you correctly, you want to know how the number of scores are distributed. For example, how often is the result -4, 20, 127 or any other present?
No, not at all. I am talking about the constant-evaluation case. So one score would be present umpty-thousand times, and no other, and you know which score that is. I am talking about these results you posted:

MSE 0.1008900000000000 // constant eval 0
MSE 0.1008891832250924 // constant eval 1
MSE 0.1009115831442991 // constant eval 2
MSE 0.1008999752977588 // constant eval 3
MSE 0.1009115831442991 // constant eval 4
MSE 0.1009273310704712 // constant eval 5

This shows beyond any doubt that the mse is not calculated correctly. The graph of MSE vs the constant is not a parabola, but jumps up and down irregularly.
This data is outdated. We switched to the same database after you gave the hint about the stm (side to move) results in the database.
First, we worked on different data; second, the interpretation of the result at that point was wrong. So the quoted data is garbage;
I agree with you on that.

But after Ferdy reported the mse (Fri Jan 15, 2021 4:30 pm), everything changed!
There is no longer any doubt about the database, the results, or the computation. So you can follow on from that point.