Tapered Evaluation and MSE (Texel Tuning)
Moderator: Ras
-
- Posts: 28354
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Tapered Evaluation and MSE (Texel Tuning)
If the outcome is wrong, then the calculation is wrong. No matter how many times you verified it. It just means that all the verifications are wrong too.
For a constant evaluation the mse must show parabolic dependence on the constant, with a minimum at the eval corresponding to the average win rate. You did not have that.
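For concreteness, a minimal sketch of that check, assuming the Texel-style sigmoid p = 1/(1 + 10^(-c/400)) referenced later in the thread ("sigmoid(0,400)") and the W/D/L counts Desperado reports below; the sweep range is illustrative.
Code: Select all
#include <cmath>
#include <cstdio>

int main() {
    // W/D/L counts from the post below (4041988 positions in total).
    const double W = 1096478, D = 2377576, L = 567934, N = W + D + L;
    for (int c = -100; c <= 100; c += 20) {
        // Predicted score of a constant eval c under the Texel sigmoid.
        double p = 1.0 / (1.0 + std::pow(10.0, -c / 400.0));
        double m = (W * (1.0 - p) * (1.0 - p)
                  + D * (0.5 - p) * (0.5 - p)
                  + L * (0.0 - p) * (0.0 - p)) / N;
        std::printf("constant eval %+4d  p %.4f  mse %.10f\n", c, p, m);
    }
    return 0;
}
The printed curve is smooth and convex, with its minimum where p equals the average score (W + D/2)/N. Any correctly computed mse over a fixed result distribution must behave this way.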
-
- Posts: 4846
- Joined: Sun Aug 10, 2008 3:15 pm
- Location: Philippines
Re: Tapered Evaluation and MSE (Texel Tuning)
The training positions may include positions where the result is a draw but one side is ahead in material, or has some other favorable configuration. For example:
Desperado wrote: ↑Fri Jan 15, 2021 10:49 pm
Desperado wrote: ↑Fri Jan 15, 2021 9:50 pm
Hi, before I start to check the update operations of the data structure and the algorithm logic,
I thought I would do what I already did in the afternoon, but with the current data.
In the latest post we reported the mse of the full file with different K and a given parameter vector for the material scores.
Fine! Independent code leads to the same results.
Here is a puzzle that might surprise you.
Both artificial and meaningless vectors produce a significantly better mse than the starting vector! No doubt this time.
Code: Select all
int Eval::mgMat[7] = {0,   1,   1,   1,   1,    1, 0};
int Eval::egMat[7] = {0,   1,   1,   1,   1,    1, 0};
K=1: MSE 0.1029094541968299

int Eval::mgMat[7] = {0,  -1,  -2,  -3,  -4,   -5, 0};
int Eval::egMat[7] = {0,   2,   3,   4,   5,    6, 0};
K=1: MSE 0.1028872134400059

int Eval::mgMat[7] = {0, 100, 300, 300, 500, 1000, 0};
int Eval::egMat[7] = {0, 100, 300, 300, 500, 1000, 0};
K=1: MSE 0.1115819525369956
This is consistent with my previous observations. LOL
Code: Select all
int Eval::mgMat[7] = {0, -20, -50, -45, -140,  -5, 0};
int Eval::egMat[7] = {0,  80, 270, 280,  435, 685, 0};
K=1: MSE 0.0997049328336036
Code: Select all
3kr3/8/4B1R1/8/3K4/8/8/8 w - -,1/2-1/2
With piece value 1, 1, 1, 1, 1
Code: Select all
K: 1, Pos: 1, total_sq_error: 0.000002071, mse: 0.000002071, wpov_mat_score: 1
With piece value 100, 300, 300, 500, 1000
Code: Select all
K: 1, Pos: 1, total_sq_error: 0.121815269, mse: 0.121815269, wpov_mat_score: 300
The error of the crappy piece values is lower simply because the values themselves are lower. If the result is a draw, it is expected that an evaluation close to zero has a smaller error.
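Ferdy's two numbers are easy to reproduce. A minimal sketch, assuming the usual Texel sigmoid p = 1/(1 + 10^(-K*s/400)) with K = 1, where s is the white-POV material score of the drawn position above: s = 2 - 1 = 1 with piece values 1,1,1,1,1 (B+R vs R), and s = 800 - 500 = 300 with 100,300,300,500,1000.
Code: Select all
#include <cmath>
#include <cstdio>

// Texel-style sigmoid: expected score from a white-POV eval s.
static double sigmoid(double s, double k) {
    return 1.0 / (1.0 + std::pow(10.0, -k * s / 400.0));
}

int main() {
    const double result = 0.5;              // the position is a draw (1/2-1/2)
    const double scores[] = {1.0, 300.0};   // the two material scores above
    for (double s : scores) {
        double e = result - sigmoid(s, 1.0);
        std::printf("wpov_mat_score %3.0f  sq_error %.9f\n", s, e * e);
    }
    return 0;
}
This prints roughly 0.000002071 for s = 1 and 0.121815269 for s = 300, matching the quoted output: on a drawn position, whichever vector gives the smaller |score| wins the mse comparison.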
-
- Posts: 879
- Joined: Mon Dec 15, 2008 11:45 am
Re: Tapered Evaluation and MSE (Texel Tuning)
Sorry HG, you are clearly wrong! Here we go...
hgm wrote: ↑Sat Jan 16, 2021 9:43 am
If the outcome is wrong, then the calculation is wrong. No matter how many times you verified it. It just means that all the verifications are wrong too.
For a constant evaluation the mse must show parabolic dependence on the constant, with a minimum at the eval corresponding to the average win rate. You did not have that.
Result by calculation for constant eval "0":
total positions: 4041988
white:1096478 draw: 2377576 black: 567934
constant eval: "0" results in "0.5" because of sigmoid(0,400)
error white wins
e = 1 - 0.5 = 0.5;  sE = e*e = 0.25;  sE * 1096478 = 274119.5
error draws
e = 0.5 - 0.5 = 0;  sE = e*e = 0;  sE * 2377576 = 0
error black wins
e = 0.0 - 0.5 = -0.5;  sE = e*e = 0.25;  sE * 567934 = 141983.5
Total Error: 274119.5 + 0 + 141983.5 = 416103
Average Error: 416103 / 4041988 = 0.10294513491875779937990909423779
Result by measurement for constant eval "0":
0.1029451349187578
If you have a constant eval the mse only depends on the distribution of the results.
My code for that part is fine. Ferdy's results are fine too, because we measured identical results.
The vectors with the artificial numbers relate to a significantly smaller error. That is a fact!
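A side note that follows from the same arithmetic: over a fixed W/D/L distribution, the mse of a constant eval is minimized where the sigmoid output equals the average score (W + D/2)/N. A minimal sketch, assuming the same sigmoid(c,400) shape, that locates this minimum for the counts above:
Code: Select all
#include <cmath>
#include <cstdio>

int main() {
    const double W = 1096478, D = 2377576, L = 567934;
    double p = (W + 0.5 * D) / (W + D + L);        // average score, ~0.5654
    double c = 400.0 * std::log10(p / (1.0 - p));  // invert p = 1/(1 + 10^(-c/400))
    std::printf("average score %.4f -> mse-minimizing constant eval ~%+.1f\n", p, c);
    return 0;
}
This prints roughly +46, i.e. the parabola HG describes bottoms out at a small positive constant (white scores above 50% in this set), not at 0.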
-
- Posts: 879
- Joined: Mon Dec 15, 2008 11:45 am
Re: Tapered Evaluation and MSE (Texel Tuning)
Hello Ferdy,
Ferdy wrote: ↑Sat Jan 16, 2021 11:24 am
The training positions may include positions where the result is a draw but one side is ahead in material, or has some other favorable configuration. For example:
Desperado wrote: ↑Fri Jan 15, 2021 10:49 pm
Desperado wrote: ↑Fri Jan 15, 2021 9:50 pm
Hi, before I start to check the update operations of the data structure and the algorithm logic,
I thought I would do what I already did in the afternoon, but with the current data.
In the latest post we reported the mse of the full file with different K and a given parameter vector for the material scores.
Fine! Independent code leads to the same results.
Here is a puzzle that might surprise you.
Both artificial and meaningless vectors produce a significantly better mse than the starting vector! No doubt this time.
Code: Select all
int Eval::mgMat[7] = {0,   1,   1,   1,   1,    1, 0};
int Eval::egMat[7] = {0,   1,   1,   1,   1,    1, 0};
K=1: MSE 0.1029094541968299

int Eval::mgMat[7] = {0,  -1,  -2,  -3,  -4,   -5, 0};
int Eval::egMat[7] = {0,   2,   3,   4,   5,    6, 0};
K=1: MSE 0.1028872134400059

int Eval::mgMat[7] = {0, 100, 300, 300, 500, 1000, 0};
int Eval::egMat[7] = {0, 100, 300, 300, 500, 1000, 0};
K=1: MSE 0.1115819525369956
This is consistent with my previous observations. LOL
Code: Select all
int Eval::mgMat[7] = {0, -20, -50, -45, -140,  -5, 0};
int Eval::egMat[7] = {0,  80, 270, 280,  435, 685, 0};
K=1: MSE 0.0997049328336036
[d]3kr3/8/4B1R1/8/3K4/8/8/8 w - - 91 138
Code: Select all
3kr3/8/4B1R1/8/3K4/8/8/8 w - -,1/2-1/2
With piece value 1, 1, 1, 1, 1
Code: Select all
K: 1, Pos: 1, total_sq_error: 0.000002071, mse: 0.000002071, wpov_mat_score: 1
With piece value 100, 300, 300, 500, 1000
Code: Select all
K: 1, Pos: 1, total_sq_error: 0.121815269, mse: 0.121815269, wpov_mat_score: 300
The error of the crappy piece values is lower simply because the values themselves are lower. If the result is a draw, it is expected that an evaluation close to zero has a smaller error.
For the moment, I can only tell you that the tuner is not sensitive to the kind of information you used as an explanation; I will come back to this later. For the tuner, only the mse plays a role (without interpretation).
In this special context, I can tell that my tuner behaves more correctly than yours (as it seems, but we may check that later).
As long as the tuner finds smaller values, it continues.
When you explained your algorithm, you mentioned that you shuffle the data in between. Of course you need to update the
reference value (best value) after shuffling. I think you take that into account, do you? If not, you would be comparing the mses of two different
data sets, which would quickly trip your stop condition of three iterations. Just a thought...
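To illustrate the suspected pitfall, a minimal sketch of a cpw-style local-search material tuner that reshuffles between passes. Everything in it (names, the 5-element parameter vector, the synthetic data in main) is illustrative rather than taken from either engine; the one essential line is re-anchoring the reference mse on the freshly shuffled batch before any comparison.
Code: Select all
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

struct Pos { int counts[5]; double result; };   // net white-POV piece counts, game result

// Texel-style mse of a material vector over a batch, K = 1.
static double mse(const std::vector<Pos>& batch, const int params[5]) {
    double sum = 0.0;
    for (const Pos& p : batch) {
        double s = 0.0;
        for (int i = 0; i < 5; ++i) s += p.counts[i] * params[i];
        double q = 1.0 / (1.0 + std::pow(10.0, -s / 400.0));
        sum += (p.result - q) * (p.result - q);
    }
    return sum / batch.size();
}

static void tune(std::vector<Pos>& data, int params[5], size_t batchSize) {
    std::mt19937 rng{42};
    const int steps[2] = {+1, -1};
    int stale = 0;
    while (stale < 3) {                               // the "three iterations" stop rule
        std::shuffle(data.begin(), data.end(), rng);  // new sample every pass...
        std::vector<Pos> batch(data.begin(), data.begin() + batchSize);
        double best = mse(batch, params);             // ...so re-anchor 'best' on THIS batch
        bool improved = false;
        for (int i = 0; i < 5; ++i)
            for (int step : steps) {                  // probe +1 and -1 on each parameter
                params[i] += step;
                double e = mse(batch, params);
                if (e < best) { best = e; improved = true; }
                else params[i] -= step;               // no gain: undo the probe
            }
        stale = improved ? 0 : stale + 1;
    }
}

int main() {
    // Synthetic training data, purely illustrative.
    std::mt19937 rng{1};
    std::uniform_int_distribution<int> cnt(-2, 2), res(0, 2);
    std::vector<Pos> data(4096);
    for (Pos& p : data) {
        for (int& c : p.counts) c = cnt(rng);
        p.result = res(rng) * 0.5;                    // 0, 1/2 or 1
    }
    int params[5] = {100, 300, 300, 500, 1000};       // P N B R Q starting vector
    tune(data, params, 1024);
    for (int v : params) std::printf("%d ", v);
    std::printf("\n");
    return 0;
}
Without the marked re-anchoring, 'best' would carry over from the previous batch, so the comparison would mix the errors of two different samples, exactly the failure mode described above.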
-
- Posts: 28354
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Tapered Evaluation and MSE (Texel Tuning)
That doesn't address my point at all. You only calculate here for one value of the constant eval. That shows nothing about how the mse depends on that constant.
Desperado wrote: ↑Sat Jan 16, 2021 11:28 am
Sorry HG, you are clearly wrong! Here we go...
hgm wrote: ↑Sat Jan 16, 2021 9:43 am
If the outcome is wrong, then the calculation is wrong. No matter how many times you verified it. It just means that all the verifications are wrong too.
For a constant evaluation the mse must show parabolic dependence on the constant, with a minimum at the eval corresponding to the average win rate. You did not have that.
Result by calculation for constant eval "0":
total positions: 4041988
white:1096478 draw: 2377576 black: 567934
constant eval: "0" results in "0.5" because of sigmoid(0,400)
error white wins
e = 1 - 0.5 = 0.5;  sE = e*e = 0.25;  sE * 1096478 = 274119.5
error draws
e = 0.5 - 0.5 = 0;  sE = e*e = 0;  sE * 2377576 = 0
error black wins
e = 0.0 - 0.5 = -0.5;  sE = e*e = 0.25;  sE * 567934 = 141983.5
Total Error: 274119.5 + 0 + 141983.5 = 416103
Average Error: 416103 / 4041988 = 0.10294513491875779937990909423779
Result by measurement for constant eval "0":
0.1029451349187578
If you have a constant eval the mse only depends on the distribution of the results.
My code for that part is fine. Ferdy's results are fine too, because we measured identical results.
The vectors with the artificial numbers relate to a significantly smaller error. That is a fact!
You did show the mse for several different values of the constant, and it was clearly not fitting a parabola.
What the mse is for several subsets of the data is not relevant.
-
- Posts: 879
- Joined: Mon Dec 15, 2008 11:45 am
Re: Tapered Evaluation and MSE (Texel Tuning)
It is absolutely relevant.
hgm wrote: ↑Sat Jan 16, 2021 1:29 pm
That doesn't address my point at all. You only calculate here for one value of the constant eval. That shows nothing about how the mse depends on that constant.
Desperado wrote: ↑Sat Jan 16, 2021 11:28 am
Sorry HG, you are clearly wrong! Here we go...
hgm wrote: ↑Sat Jan 16, 2021 9:43 am
If the outcome is wrong, then the calculation is wrong. No matter how many times you verified it. It just means that all the verifications are wrong too.
For a constant evaluation the mse must show parabolic dependence on the constant, with a minimum at the eval corresponding to the average win rate. You did not have that.
Result by calculation for constant eval "0":
total positions: 4041988
white:1096478 draw: 2377576 black: 567934
constant eval: "0" results in "0.5" because of sigmoid(0,400)
error white wins
e = 1 - 0.5 = 0.5;  sE = e*e = 0.25;  sE * 1096478 = 274119.5
error draws
e = 0.5 - 0.5 = 0;  sE = e*e = 0;  sE * 2377576 = 0
error black wins
e = 0.0 - 0.5 = -0.5;  sE = e*e = 0.25;  sE * 567934 = 141983.5
Total Error: 274119.5 + 0 + 141983.5 = 416103
Average Error: 416103 / 4041988 = 0.10294513491875779937990909423779
Result by measurement for constant eval "0":
0.1029451349187578
If you have a constant eval the mse only depends on the distribution of the results.
My code for that part is fine. Ferdy's results are fine too, because we measured identical results.
The vectors with the artificial numbers relate to a significantly smaller error. That is a fact!
You did show the mse for several different values of the constant, and it was clearly not fitting a parabola.
What the mse is for several subsets of the data is not relevant.
1. It clearly shows the dependency on the WDL distribution. That affects any subset with the same order of magnitude.
2. It shows that the code works correctly.
If you want to say that the data is useless in the way it is analyzed, well, then the topic remains debatable.
My math skills have been rusty since I graduated more than 20 years ago. If I understand you correctly, you want to know how the scores are distributed. For example, how often is the score -4, 20, 127, or any other present? That would change with any different vector.
Maybe I am wrong and still don't understand what you mean. In my opinion, this would be an indicator of the quality of the data set,
but would have no connection with the technical determination/calculation of the mean error.
In any case, the analyses are relevant in the mentioned context.
-
- Posts: 879
- Joined: Mon Dec 15, 2008 11:45 am
Re: Tapered Evaluation and MSE (Texel Tuning)
This effect is produced in the existing data even in mid-game phases (especially if you use only a pure static evaluation and no quiescence search).
Desperado wrote: ↑Sat Jan 16, 2021 12:11 pm
Hello Ferdy,
Ferdy wrote: ↑Sat Jan 16, 2021 11:24 am
The training positions may include positions where the result is a draw but one side is ahead in material, or has some other favorable configuration. For example:
Desperado wrote: ↑Fri Jan 15, 2021 10:49 pm
Desperado wrote: ↑Fri Jan 15, 2021 9:50 pm
Hi, before I start to check the update operations of the data structure and the algorithm logic,
I thought I would do what I already did in the afternoon, but with the current data.
In the latest post we reported the mse of the full file with different K and a given parameter vector for the material scores.
Fine! Independent code leads to the same results.
Here is a puzzle that might surprise you.
Both artificial and meaningless vectors produce a significantly better mse than the starting vector! No doubt this time.
Code: Select all
int Eval::mgMat[7] = {0,   1,   1,   1,   1,    1, 0};
int Eval::egMat[7] = {0,   1,   1,   1,   1,    1, 0};
K=1: MSE 0.1029094541968299

int Eval::mgMat[7] = {0,  -1,  -2,  -3,  -4,   -5, 0};
int Eval::egMat[7] = {0,   2,   3,   4,   5,    6, 0};
K=1: MSE 0.1028872134400059

int Eval::mgMat[7] = {0, 100, 300, 300, 500, 1000, 0};
int Eval::egMat[7] = {0, 100, 300, 300, 500, 1000, 0};
K=1: MSE 0.1115819525369956
This is consistent with my previous observations. LOL
Code: Select all
int Eval::mgMat[7] = {0, -20, -50, -45, -140,  -5, 0};
int Eval::egMat[7] = {0,  80, 270, 280,  435, 685, 0};
K=1: MSE 0.0997049328336036
[d]3kr3/8/4B1R1/8/3K4/8/8/8 w - - 91 138
Code: Select all
3kr3/8/4B1R1/8/3K4/8/8/8 w - -,1/2-1/2
With piece value 1, 1, 1, 1, 1
Code: Select all
K: 1, Pos: 1, total_sq_error: 0.000002071, mse: 0.000002071, wpov_mat_score: 1
With piece value 100, 300, 300, 500, 1000
Code: Select all
K: 1, Pos: 1, total_sq_error: 0.121815269, mse: 0.121815269, wpov_mat_score: 300
The error of the crappy piece values is lower simply because the values themselves are lower. If the result is a draw, it is expected that an evaluation close to zero has a smaller error.
For the moment, I can only tell you that the tuner is not sensitive to the kind of information you used as an explanation; I will come back to this later. For the tuner, only the mse plays a role (without interpretation).
In this special context, I can tell that my tuner behaves more correctly than yours (as it seems, but we may check that later).
As long as the tuner finds smaller values, it continues.
When you explained your algorithm, you mentioned that you shuffle the data in between. Of course you need to update the
reference value (best value) after shuffling. I think you take that into account, do you? If not, you would be comparing the mses of two different
data sets, which would quickly trip your stop condition of three iterations. Just a thought...
And now the core of the matter becomes visible.
This could be the reason why the tuner minimizes the midgame values so extremely and adjusts the endgame values accordingly.
Especially if this type of error is particularly frequent when the board is almost full.
I guess there are a lot of positions that are unbalanced in material but end in a draw.
Especially the findings since yesterday point more and more to the fact that I am not imagining all this.
Current summary:
1. my mse() calculation works correctly
2. the existence of artificial/meaningless vectors that produce a smaller mse() is real
3. my tuner does not run into a stop criterion that prevents it from exploring this space of vectors.
And the tuner should not do that, because the only relevant part is the mse value.
I wonder why other tuners using the cpw logic converge, because it can easily be shown that there is a smaller mse for an idiotic vector.
The algorithm itself has not been discussed very intensively in this thread so far, and I am still open to the possibility that
something is horribly wrong on my side, but the arguments/evidence to date support that I am not wrong at all.
-
- Posts: 28354
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Tapered Evaluation and MSE (Texel Tuning)
No, not at all. I am talking about the constant-evaluation case. So one score would be present umpty-thousand times, and no other, and you know which score that is. I am talking about these results you posted:
MSE 0.1008900000000000 // constant eval 0
MSE 0.1008891832250924 // constant eval 1
MSE 0.1009115831442991 // constant eval 2
MSE 0.1008999752977588 // constant eval 3
MSE 0.1009115831442991 // constant eval 4
MSE 0.1009273310704712 // constant eval 5
This shows beyond any doubt that the mse is not calculated correctly. The graph of MSE vs the constant is not a parabola, but jumps up and down irregularly.
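To see what a correct series would look like at this scale: a minimal sketch that, purely for illustration, reuses the W/D/L counts of the newer data set (the distribution behind the six quoted numbers is not known) and evaluates the same constants 0..5 under the sigmoid(c,400) shape:
Code: Select all
#include <cmath>
#include <cstdio>

int main() {
    // Illustration only: W/D/L counts of the newer data set, not the unknown
    // distribution behind the six quoted numbers.
    const double W = 1096478, D = 2377576, L = 567934, N = W + D + L;
    for (int c = 0; c <= 5; ++c) {
        double p = 1.0 / (1.0 + std::pow(10.0, -c / 400.0));
        double m = (W * (1.0 - p) * (1.0 - p)
                  + D * (0.5 - p) * (0.5 - p)
                  + L * p * p) / N;
        std::printf("constant eval %d  mse %.16f\n", c, m);
    }
    return 0;
}
For any fixed result distribution the successive values must slide smoothly, here strictly downward toward the minimum near +46; values that wiggle up and down in the fifth decimal, as quoted, cannot come from a correctly computed mse.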
-
- Posts: 879
- Joined: Mon Dec 15, 2008 11:45 am
Re: Tapered Evaluation and MSE (Texel Tuning)
This data is outdated. We switched to the same database after you gave the hint about the stm results in the database.
hgm wrote: ↑Sat Jan 16, 2021 4:42 pm
No, not at all. I am talking about the constant-evaluation case. So one score would be present umpty-thousand times, and no other, and you know which score that is. I am talking about these results you posted:
MSE 0.1008900000000000 // constant eval 0
MSE 0.1008891832250924 // constant eval 1
MSE 0.1009115831442991 // constant eval 2
MSE 0.1008999752977588 // constant eval 3
MSE 0.1009115831442991 // constant eval 4
MSE 0.1009273310704712 // constant eval 5
This shows beyond any doubt that the mse is not calculated correctly. The graph of MSE vs the constant is not a parabola, but jumps up and down irregularly.
First, we worked on different data; second, the interpretation of the results at that point was wrong. So the quoted data is garbage;
I agree with you on that.
But after Ferdy reported the mse (Fri Jan 15, 2021 4:30 pm), everything changed!
No doubt about the database, the results, and the computation. So you can follow from that point on.