jdart wrote: The value for correctly predicted losses and wins is very close to zero. Not exactly, because the logistic function will only asymptotically approach 1 as the score increases, or 0 as the score decreases.
--Jon

I don't think so. Note that the result could be a win or a loss even with a drawish evaluation (score) of 0. The log-likelihood in this case is log(0.5) = -0.3 whether the result is a win/loss or a draw. A reasonable score of +200 for a win gives a log-likelihood of -0.12, so we cannot assume that only positions with a score > 1000 are correctly predicted as wins. The loss value is also non-zero for wins and draws that are predicted correctly, so I don't see why that is a problem?
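For concreteness, those two log-likelihood numbers can be reproduced with a small sketch (assuming the usual base-10 logistic curve with a 400-point scale; the function name is mine):

```python
import math

def expected_score(score_cp):
    # logistic mapping from a centipawn score to an expected score in (0, 1)
    return 1.0 / (1.0 + 10.0 ** (-score_cp / 400.0))

# drawish score of 0: base-10 log-likelihood is log10(0.5) ~ -0.30,
# whatever the actual game result turns out to be
ll_draw = math.log10(expected_score(0))

# a win predicted with a moderate score of +200
ll_win = math.log10(expected_score(200))

print(round(ll_draw, 2), round(ll_win, 2))  # -0.3 -0.12
```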
tuning via maximizing likelihood
Moderators: hgm, Rebel, chrisw
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: tuning via maximizing likelihood
-
- Posts: 2272
- Joined: Mon Sep 29, 2008 1:50 am
Re: maximum a posteriori
Daniel Shawul wrote: For example, I already have separate MG and EG values for the most important parameters. Is that what you are proposing?
At the very least it makes a lot of sense to have "draw_elo_MG" and "draw_elo_EG". The actual value of draw_elo used for a particular position would be interpolated between these two values depending on the "game phase" (which is a feature of the position). The draw model parameters draw_elo_MG and draw_elo_EG can be optimized together with the parameters used in the evaluation function.
But one could also make draw_elo for a particular position depend on other features of the position, like king safety. This would create some more draw model parameters.
Note that I mainly like this idea for its mathematical elegance. It may not make any difference in practice. On the other hand it may also lead to an objective model for the "drawishness" of a position which seems interesting.
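A minimal sketch of the interpolation being proposed (names are mine; `phase` is assumed to be normalized to [0, 1], with 1.0 meaning full middlegame material):

```python
def draw_elo_for(phase, draw_elo_mg, draw_elo_eg):
    # linear interpolation between the middlegame and endgame draw-model
    # parameters, driven by the game phase of the position
    return phase * draw_elo_mg + (1.0 - phase) * draw_elo_eg

print(draw_elo_for(1.0, 60.0, 90.0))  # 60.0 (pure middlegame)
print(draw_elo_for(0.0, 60.0, 90.0))  # 90.0 (pure endgame)
```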
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: maximum a posteriori
Michel wrote:
For example, I already have separate MG and EG values for the most important parameters. Is that what you are proposing?
At the very least it makes a lot of sense to have "draw_elo_MG" and "draw_elo_EG". The actual value of draw_elo used for a particular position would be interpolated between these two values depending on the "game phase" (which is a feature of the position). The draw model parameters draw_elo_MG and draw_elo_EG can be optimized together with the parameters used in the evaluation function.
But one could also make draw_elo for a particular position depend on other features of the position, like king safety. This would create some more draw model parameters.
Note that I mainly like this idea for its mathematical elegance. It may not make any difference in practice. On the other hand it may also lead to an objective model for the "drawishness" of a position which seems interesting.

Ok, I did a first attempt at this, considering only material. I added slopes for draw_elo and home_elo as:

Code:
draw_elo = ELO_DRAW + phase * ELO_DRAW_SLOPE
home_elo = ELO_HOME + phase * ELO_HOME_SLOPE
The result using the Davidson model after a few iterations is:
Code:
ELO_DRAW=62, ELO_DRAW_SLOPE=-8
ELO_HOME=20, ELO_HOME_SLOPE=-3
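For reference, this is my understanding of the Davidson win/draw/loss model these parameters feed into, with the draw parameter nu obtained from eloDraw via nu = elo_to_gamma(eloDraw) - 1, matching the convention in Daniel's scaling code further down (a sketch, not Daniel's actual tuner code):

```python
def elo_to_gamma(elo):
    # standard Elo-to-gamma conversion: 10^(elo/400)
    return 10.0 ** (elo / 400.0)

def davidson_probs(elo_diff, elo_draw, elo_home=0.0):
    """Win/draw/loss probabilities under the Davidson model."""
    g1 = elo_to_gamma(elo_diff + elo_home)  # side to move, with home advantage
    g2 = 1.0                                # opponent as the reference gamma
    nu = elo_to_gamma(elo_draw) - 1.0       # eloDraw = 0  =>  no draws at all
    # Davidson: P(draw) is proportional to nu * sqrt(g1 * g2)
    denom = g1 + g2 + nu * (g1 * g2) ** 0.5
    return g1 / denom, nu * (g1 * g2) ** 0.5 / denom, g2 / denom

# tuned values from above: eloDraw = 62, eloHome = 20
w, d, l = davidson_probs(0.0, 62.0, 20.0)
print(round(w + d + l, 10))  # 1.0 (the three probabilities sum to one)
```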
Daniel
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: maximum a posteriori
Ok, I added king safety as one of the parameters that affect drawishness, and it seems to be clearly the major factor: it was the first parameter to be quickly modified by the CG iterations. I am not sure I got the definitions of phase right in my previous post, so I will define them here with the code I used, to avoid mistakes in wording it.
Result after a few iterations
Code:
double factor_m = material / 62.0; // cumulative piece values (no pawns) for both sides,
                                   // with values 9-5-3-3 for Q-R-B-N (factor goes from 1.0 down to 0.0)
double factor_k = ksafety / 100.0; // cumulative king safety for both sides, scaled to
                                   // 1 pawn value (factor goes from 0.0 up to maybe 5.0)
int eloH = ELO_HOME + factor_m * ELO_HOME_SLOPE_PHASE
                    + factor_k * ELO_HOME_SLOPE_KSAFETY;
int eloD = ELO_DRAW + factor_m * ELO_DRAW_SLOPE_PHASE
                    + factor_k * ELO_DRAW_SLOPE_KSAFETY;
Code:
ELO_HOME 10
ELO_DRAW 65
ELO_HOME_SLOPE_PHASE -2
ELO_DRAW_SLOPE_PHASE -16
ELO_HOME_SLOPE_KSAFETY 19
ELO_DRAW_SLOPE_KSAFETY -25
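Plugging a couple of extreme positions into these constants shows what the signs mean (just arithmetic with the numbers above; the factor values are illustrative):

```python
ELO_DRAW = 65
ELO_DRAW_SLOPE_PHASE = -16
ELO_DRAW_SLOPE_KSAFETY = -25

def elo_d(factor_m, factor_k):
    # drawishness of a position as a function of material and king safety
    return ELO_DRAW + factor_m * ELO_DRAW_SLOPE_PHASE \
                    + factor_k * ELO_DRAW_SLOPE_KSAFETY

print(elo_d(1.0, 0.0))  # 49.0: full material, quiet kings -> less drawish than...
print(elo_d(0.0, 0.0))  # 65.0: ...a bare endgame
print(elo_d(1.0, 2.0))  # -1.0: mutual king attacks all but erase the draw margin
```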
-
- Posts: 2272
- Joined: Mon Sep 29, 2008 1:50 am
Re: maximum a posteriori
Daniel Shawul wrote:
Result after a few iterations

Code:

ELO_HOME 10
ELO_DRAW 65
ELO_HOME_SLOPE_PHASE -2
ELO_DRAW_SLOPE_PHASE -16
ELO_HOME_SLOPE_KSAFETY 19
ELO_DRAW_SLOPE_KSAFETY -25

Thanks. Very nice! I looked at your code and it is what I had in mind. If the converged values are similar to the above, then it seems that the model is at least compatible with common sense!
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: maximum a posteriori
I have run into two issues:
a) The scale of centipawns vs. Elo is not one-to-one when there are significant draw percentages, so despite our earlier assumption that no scaling is needed, this has to be addressed, especially for Davidson. The tuning results I got using Davidson are significantly ramped up compared to the other models. Assuming the centipawn vs. winning-percentage relation is logistic, we will try to morph the three draw models to give the evaluation scores obtained with no draw model. The criterion I use to calculate the scaling factor is that the slope at eval = 0 should match that of the logistic curve. Let me know if there is a better way to calculate the scaling factor, especially for Davidson, which is slightly off after scaling with this method. For Davidson, the factor (nu = dg) is now calculated so that eloDraw = 0 gives dg = 0; this wasn't the case in the code I posted previously, but I think it should be like that. In code, it is:
Code:
// scale elos so that they look more like elostat's
// match slopes at 0 elo difference using df(x)/dx = K/4
void calculate_scale() {
    const double K = log(10) / 400.0;
    double df;
    if (eloModel == 0) {
        double dg = elo_to_gamma(eloDraw);
        double f = 1 / (1 + dg);
        df = f * (1 - f) * K;
    } else if (eloModel == 1) {
        double dg = elo_to_gamma(eloDraw) - 1;
        df = (dg / pow(2 + dg, 2.0)) * K;
    } else if (eloModel == 2) {
        const double pi = 3.14159265359;
        double x = -eloDraw / 400.0;
        df = exp(-x * x) / (400.0 * sqrt(pi));
    }
    eloScale = (4.0 / K) * df;
    printf("EloScale %f\n", eloScale);
}
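As a sanity check, a Python transcription of calculate_scale (my port, not Daniel's code) confirms that the scale behaves sensibly at the boundary: with eloDraw = 0, model 0 gives a scale of exactly 1 (no draw model, no scaling), while model 1 (the Davidson branch, assuming that is what eloModel 1 is, consistent with the nu = dg remark above) degenerates to 0:

```python
import math

def elo_to_gamma(elo):
    return 10.0 ** (elo / 400.0)

def calculate_scale(elo_model, elo_draw):
    # match the slope of the model's expected-score curve at 0 to the
    # plain logistic slope K/4, where K = ln(10)/400
    K = math.log(10) / 400.0
    if elo_model == 0:
        dg = elo_to_gamma(elo_draw)
        f = 1.0 / (1.0 + dg)
        df = f * (1.0 - f) * K
    elif elo_model == 1:
        dg = elo_to_gamma(elo_draw) - 1.0
        df = (dg / (2.0 + dg) ** 2) * K
    else:
        x = -elo_draw / 400.0
        df = math.exp(-x * x) / (400.0 * math.sqrt(math.pi))
    return (4.0 / K) * df

print(round(calculate_scale(0, 0.0), 9))  # 1.0: no scaling needed
print(calculate_scale(1, 0.0))            # 0.0: Davidson degenerates at eloDraw = 0
```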
b) Some evaluation terms dealing with imbalances are very problematic for tuning. I had a bonus for a major or minor piece vs. pawns. Tuning increased this bonus from a default value of 45 to as much as 500! I think this is because the dataset probably doesn't contain enough positions where a side is a piece up and doesn't win. When I changed the evaluation condition to require that a side be up a piece AND down by at least the piece's value in pawns, the tuning more or less kept the value of 45. I can imagine this kind of thing being problematic for other people tuning their eval too.
I will run the drawishness simulations to convergence once these issues are fixed.
Daniel
-
- Posts: 2929
- Joined: Sat Jan 22, 2011 12:42 am
- Location: NL
Re: maximum a posteriori
Daniel Shawul wrote: b) Some evaluation terms dealing with imbalances are very problematic for tuning. I had a bonus for a major or minor pieces vs pawns bonus. Tuning this increased the value of the bonus from a default value of 45 to even 500! I think this is because in the dataset there probably aren't enough positions where a side is a piece up and not win. When I carefully changed the evaluation condition to be for a side to be up a piece AND down by at least pawns that equal the piece value, then the tuning kept more or less the 45 value. I can imagine this kind of thing could be problematic to other people tuning their eval too.

Yes, I found that ideally the positions should exhibit the features you are trying to fit (other positions just add noise). If you use a fitting method that uses the Jacobian, the condition number gives you a hint of how well the features are represented.
Values can also be unstable due to bugs in the implementation of the evaluation; I've found a few of those that way.
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: maximum a posteriori
About the scaling: it turns out to be better to match the slope at the eloDraw point instead of at 0. Here are pictures showing before and after scaling with the two criteria, i.e. matching the slope at 0 or at the eloDraw point.
First using eloDraw=50 which doesn't show any difference with the two methods
But using a very large value of eloDraw=250, matching the slope at 250 is better for Davidson
The slopes are scaled by 400 times.
Daniel
-
- Posts: 2272
- Joined: Mon Sep 29, 2008 1:50 am
Re: maximum a posteriori
Daniel wrote: b) Some evaluation terms dealing with imbalances are very problematic for tuning. I had a bonus for a major or minor pieces vs pawns bonus. Tuning this increased the value of the bonus from a default value of 45 to even 500! I think this is because in the dataset there probably aren't enough positions where a side is a piece up and not win. When I carefully changed the evaluation condition to be for a side to be up a piece AND down by at least pawns that equal the piece value, then the tuning kept more or less the 45 value. I can imagine this kind of thing could be problematic to other people tuning their eval too.

You know of course as well as I do that one usually tries to address this by adding a (somewhat ad hoc) regularization term to the objective function. The theoretical justification for doing this seems somewhat unclear.
Daniel wrote: About the scaling: it turns out it is better to match the slope at the eloDraw point instead of at 0.

If the aim is to have the evaluation function reflect the expected score of a position via the standard logistic function, then scaling is indeed necessary when one of the standard draw models is used. However, for simple symmetry reasons it seems weird to do the matching at any point other than ev_score = 0.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
-
- Posts: 931
- Joined: Tue Mar 09, 2010 3:46 pm
- Location: New York
- Full name: Álvaro Begué (RuyDos)
Re: maximum a posteriori
Michel wrote: You know of course as well as I that one usually tries to address this by adding a (somewhat ad hoc) regularization term to the objective function. The theoretical justification for doing this seems somewhat unclear.

The theoretical justification for regularization is not that unclear. I'll introduce the standard naming for these things from statistics, because I shouldn't assume everyone is familiar with the terminology. We are considering a large family of models that might explain our observations (all the possible settings of the parameters). The Bayesian approach to this situation requires that we start with a probability distribution over the set of models (known as the "prior"), which expresses our beliefs about what's reasonable (e.g., one of the bonuses you talk about having a value of 500 isn't reasonable). The data allows us to refine this and obtain another probability distribution (known as the "posterior") by using Bayes's formula to compute the probability of a model given the data. One way to estimate our model parameters is to pick the model with the highest posterior probability (the "maximum a posteriori", or MAP, estimator). In case our prior probability distribution is flat, this estimator coincides with the "maximum likelihood estimator".
Now for the relevant part: In some cases regularization is exactly equivalent to starting with a prior probability distribution that is not flat. For instance, L2 regularization of a linear model (a.k.a. "Tikhonov regularization" or "ridge regression") corresponds to a Gaussian prior centered around zero, with diagonal covariance and equal variance in every parameter. I seem to remember that the regularization coefficient is inversely proportional to the variance of the prior, or something like that; I have done the computation before but I can't remember.
https://en.wikipedia.org/wiki/Tikhonov_ ... rpretation
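The correspondence can be written out in a few lines (a sketch; here $\theta$ denotes the parameter vector, $D$ the data, and $\sigma^2$ the prior variance):

```latex
% MAP estimation with a Gaussian prior \theta \sim \mathcal{N}(0, \sigma^2 I):
\hat\theta_{\mathrm{MAP}}
  = \arg\max_{\theta}\,\bigl[\log p(D \mid \theta) + \log p(\theta)\bigr]
  = \arg\min_{\theta}\,\Bigl[-\log p(D \mid \theta)
      + \tfrac{1}{2\sigma^2}\,\lVert\theta\rVert^2\Bigr]
```

So minimizing the negative log-likelihood plus $\lambda\lVert\theta\rVert^2$ is exactly MAP with $\lambda = 1/(2\sigma^2)$: the regularization coefficient is indeed inversely proportional to the variance of the prior.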