jdart wrote:
> r * log( logistic(score) ) + (1 - r) * log( 1 - logistic(score) )
> This may have been covered, at least indirectly, but the issue I see with this function and draws is that the value of the loss is not zero when the predicted score is 0.5 and the actual score is 0.5. The loss function for draws does have its minimum value in that case. But it is not zero.
> Each draw will increase the value of the objective even if it is predicted correctly, and this increase is more than what most loss and win positions contribute (because many will be more or less accurately predicted).
> --Jon
This is a valid point, but as my simulation example shows, the eval scores are identical up to an overall scale for mean squared error, logistic log likelihood and ordered logit.
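A minimal sketch of the point under discussion, assuming an Elo-like logistic (base 10, scale 400; the thread does not fix the scaling) and hypothetical function names. For a draw (r = 0.5) the objective peaks at score 0, where its value is log(0.5) rather than 0, and its derivative with respect to the score is proportional to (r - p), so it vanishes exactly where the prediction matches the result.

```cpp
#include <cassert>
#include <cmath>

// Assumed Elo-like scaling: logistic(score) = 1 / (1 + 10^(-score / 400)).
const double kScale = std::log(10.0) / 400.0;

double logistic(double score) {
    return 1.0 / (1.0 + std::exp(-kScale * score));
}

// Per-position objective from the quoted formula; r is 1, 0.5 or 0.
double objective(double r, double score) {
    assert(r >= 0.0 && r <= 1.0);
    double p = logistic(score);
    return r * std::log(p) + (1.0 - r) * std::log(1.0 - p);
}

// Derivative of the objective with respect to the score: kScale * (r - p).
double objective_gradient(double r, double score) {
    assert(r >= 0.0 && r <= 1.0);
    return kScale * (r - logistic(score));
}
```

The nonzero value at the optimum is a constant offset for each drawn position; the gradient that drives the tuning is zero there.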
tuning via maximizing likelihood
Re: tuning via maximizing likelihood
jdart wrote:
> r * log( logistic(score) ) + (1 - r) * log( 1 - logistic(score) )
> This may have been covered, at least indirectly, but the issue I see with this function and draws is that the value of the loss is not zero when the predicted score is 0.5 and the actual score is 0.5. The loss function for draws does have its minimum value in that case. But it is not zero.
> Each draw will increase the value of the objective even if it is predicted correctly, and this increase is more than what most loss and win positions contribute (because many will be more or less accurately predicted).
> --Jon
This was the objective function discussed in the OP, but it is not really how maximum likelihood is supposed to work in the presence of draws. One needs to choose a draw model. Luckily there are reasonable draw models available which are used by rating tools. This is discussed in subsequent posts.
A draw model involves at least one extra parameter which is a proxy for the draw ratio (draw_elo in the Bayes Elo model), but it may involve more parameters.
Whether or not the textbook approach yields better results in actual games than more ad hoc approaches is of course impossible to know in advance.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
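A minimal sketch of a draw model with a single draw_elo parameter, written in the style used by Bayes Elo: draw_elo shifts both the win and the loss curve, and the leftover probability goes to the draw. Treating the evaluation score as an Elo-like difference, and all the names below, are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>

struct Wdl { double win, draw, loss; };
enum class Result { Loss, Draw, Win };

// Elo-style logistic curve (base 10, scale 400).
double f(double x) {
    return 1.0 / (1.0 + std::pow(10.0, -x / 400.0));
}

// Win/draw/loss probabilities for a given score and draw_elo.
Wdl bayes_elo_probs(double score, double draw_elo) {
    assert(draw_elo >= 0.0);  // a negative value would give a negative draw probability
    double win = f(score - draw_elo);
    double loss = f(-score - draw_elo);
    return {win, 1.0 - win - loss, loss};
}

// Log-likelihood of one observed game result under this model.
double log_likelihood(Result result, double score, double draw_elo) {
    Wdl p = bayes_elo_probs(score, draw_elo);
    switch (result) {
        case Result::Win:  return std::log(p.win);
        case Result::Loss: return std::log(p.loss);
        default:           return std::log(p.draw);
    }
}
```

With this kind of model a draw contributes log(P(draw)) to the likelihood instead of the average of the win and loss terms.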
Re: tuning via maximizing likelihood
jdart wrote:
> r * log( logistic(score) ) + (1 - r) * log( 1 - logistic(score) )
> This may have been covered, at least indirectly, but the issue I see with this function and draws is that the value of the loss is not zero when the predicted score is 0.5 and the actual score is 0.5. The loss function for draws does have its minimum value in that case. But it is not zero.
The value of the loss is also not zero for wins and draws that are predicted correctly, so I don't see why that is a problem?
Originally, I was thinking of a situation where the draws are replaced with wins and losses (no draw models). The implicit draw model in this equation is 'sort of' a special case of the Davidson model where 2 draws = 1 win + 1 loss.
Substituting r=0.5 in the original loss function, you get
Code: Select all
log(P(draw:score)) = 0.5 * (log( logistic(score) ) + log( 1 - logistic(score) ))
log(P(draw:score)) = 0.5 * (log( P(win:score) ) + log( P(loss:score) ))
P(draw:score) = sqrt(P(win:score)*P(loss:score))
Code: Select all
P(win:score) + P(loss:score) + P(draw:score) = 1 + P(draw:score),
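A minimal sketch of why this is only 'sort of' a Davidson model: the three values sum to 1 + P(draw), but dividing them by that sum gives exactly the Davidson probabilities with draw parameter nu = 1. The Elo-like scaling (base 10, scale 400) and all names are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>

struct Wdl { double win, draw, loss; };

double logistic(double score) {
    return 1.0 / (1.0 + std::pow(10.0, -score / 400.0));
}

// Unnormalized values implied by substituting r = 0.5 in the objective.
Wdl implicit_model(double score) {
    double win = logistic(score);
    double loss = 1.0 - win;
    return {win, std::sqrt(win * loss), loss};
}

// Rescale so that win + draw + loss = 1.
Wdl normalized(const Wdl& p) {
    double total = p.win + p.draw + p.loss;
    return {p.win / total, p.draw / total, p.loss / total};
}

// Davidson model with strengths g = 10^(score / 400) and 1, draw parameter nu.
Wdl davidson(double score, double nu) {
    assert(nu >= 0.0);
    double g = std::pow(10.0, score / 400.0);
    double d = nu * std::sqrt(g);
    double total = g + 1.0 + d;
    return {g / total, d / total, 1.0 / total};
}
```

For any score, `normalized(implicit_model(score))` and `davidson(score, 1.0)` agree up to rounding.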
Re: maximum a posteriori
Michel wrote:
> Daniel wrote:
> > Using a Bayesian approach, we can add a prior distribution of parameters which acts like a regularization for the ML estimates. So we do a maximum a posteriori (MAP) estimate rather than a maximum likelihood (ML) estimate. So far, I have not been able to get better results than my existing parameters for the eval terms, so it makes sense to keep that momentum ('or absence of it') going for a while by introducing a prior distribution over the evaluation parameters (theta)
> I do not think putting on a prior will make much of a difference, except perhaps for the speed of convergence.
You are right, I am not getting anything out of it except for slowing down convergence. I did not try what I proposed as a prior, but something that could only serve as regularization. I added a virtual set of results that favour the set of parameters currently being considered, by assigning an MSE of 0 to a tiny fraction of the total number of positions. I think a prior was very useful for Elo estimation because we often have very few games for estimating strength.
> What might be an interesting experiment is to make the draw_elo parameter (in the BE model) itself a linear combination of the features (i.e. it would vary from position to position). A model with a constant draw_elo parameter is obviously incorrect as in the endgame the expected draw rate should be much higher.
I tried making the drawElo and homeAdvantage parameters dynamic and it works. The problem is that I am not getting any improvement from tuning parameters, so I cannot gauge whether determining drawElo dynamically improves things.
> If implemented literally this would roughly double the number of parameters. There is an increased risk of overfitting however so one could cut this down by only including features that are expected to have an influence on the draw ratio, like game phase and perhaps king safety.
I already have separate MG and EG values for most important parameters, is that what you are proposing?
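A minimal sketch of the MAP idea from the quoted post, assuming an independent Gaussian prior centred on the existing parameter values. The thread does not specify the form of the prior, so the Gaussian choice, `sigma`, and the function name are all hypothetical. Under that assumption the prior adds a quadratic penalty to the negative log-likelihood (constant terms dropped).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Negative log-posterior = negative log-likelihood + Gaussian prior penalty.
// theta_prior holds the existing parameter values; sigma controls how strongly
// the estimate is pulled back towards them.
double neg_log_posterior(double neg_log_likelihood,
                         const std::vector<double>& theta,
                         const std::vector<double>& theta_prior,
                         double sigma) {
    assert(sigma > 0.0);
    assert(theta.size() == theta_prior.size());
    double penalty = 0.0;
    for (std::size_t i = 0; i < theta.size(); ++i) {
        double d = theta[i] - theta_prior[i];
        penalty += d * d;
    }
    return neg_log_likelihood + penalty / (2.0 * sigma * sigma);
}
```

A large `sigma` makes the penalty negligible, which is consistent with the remark that a prior matters most when data is scarce.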
Re: tuning via maximizing likelihood
> The value of the loss is also not zero for wins and draws that are predicted correctly, so I don't see why that is a problem?
The value for correctly predicted losses and wins is very close to zero. It is not exactly zero because the logistic function only asymptotically approaches 1 as the score increases, or 0 as the score decreases.
--Jon
Re: tuning via maximizing likelihood
jdart wrote:
> > The value of the loss is also not zero for wins and draws that are predicted correctly, so I don't see why that is a problem?
> The value for correctly predicted losses and wins is very close to zero. Not exactly because the logistic function will only asymptotically approach 1 as the score increases, or 0 as the score decreases.
> --Jon
I don't think so. Note that the result could be a win or a loss even with a drawish evaluation (score) of 0. The log-likelihood in this case is log(0.5)=-0.3 whether the result is a win/loss or draw. A reasonable score of +200 for a win gives you a log-likelihood of -0.12, so we cannot assume that only positions with score >1000 are correctly predicted as a win.
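The figures above can be reproduced with a short sketch, assuming base-10 logarithms and an Elo-style logistic with a scale of 400. Both are assumptions (the post does not state them), but they give the quoted -0.3 and -0.12; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

double logistic(double score) {
    return 1.0 / (1.0 + std::pow(10.0, -score / 400.0));
}

// Base-10 log-likelihood of result r (1 = win, 0.5 = draw, 0 = loss).
double log10_likelihood(double r, double score) {
    assert(r >= 0.0 && r <= 1.0);
    double p = logistic(score);
    return r * std::log10(p) + (1.0 - r) * std::log10(1.0 - p);
}
```

At score 0 the value is about -0.30 for a win, a draw and a loss alike; a win at +200 gives about -0.12, and a win at +1000 about -0.001.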
Re: maximum a posteriori
> I already have separate MG and EG values for most important parameters, is that what you are proposing?
For example.
At the very least it makes a lot of sense to have "draw_elo_MG" and "draw_elo_EG". The actual value of draw_elo used for a particular position would be interpolated between these two values depending on the "game phase" (which is a feature of the position). The draw model parameters draw_elo_MG and draw_elo_EG can be optimized together with the parameters used in the evaluation function.
But one could also make draw_elo for a particular position depend on other features of the position, like king safety. This would create some more draw model parameters.
Note that I mainly like this idea for its mathematical elegance. It may not make any difference in practice. On the other hand it may also lead to an objective model for the "drawishness" of a position which seems interesting.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
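A minimal sketch of the interpolation described above, assuming the game phase is expressed as a number between 1.0 (middlegame) and 0.0 (endgame). That convention and the function name are hypothetical.

```cpp
#include <cassert>

// Blend the two draw model parameters according to the game phase.
// phase = 1.0 uses draw_elo_mg, phase = 0.0 uses draw_elo_eg.
double interpolate_draw_elo(double draw_elo_mg, double draw_elo_eg, double phase) {
    assert(phase >= 0.0 && phase <= 1.0);
    return phase * draw_elo_mg + (1.0 - phase) * draw_elo_eg;
}
```

This is the same as an intercept plus a slope in the phase: draw_elo_eg + phase * (draw_elo_mg - draw_elo_eg).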
Re: maximum a posteriori
Michel wrote:
> > I already have separate MG and EG values for most important parameters, is that what you are proposing?
> For example.
> At the very least it makes a lot of sense to have "draw_elo_MG" and "draw_elo_EG". The actual value of draw_elo used for a particular position would be interpolated between these two values depending on the "game phase" (which is a feature of the position). The draw model parameters draw_elo_MG and draw_elo_EG can be optimized together with the parameters used in the evaluation function.
> But one could also make draw_elo for a particular position depend on other features of the position, like king safety. This would create some more draw model parameters.
> Note that I mainly like this idea for its mathematical elegance. It may not make any difference in practice. On the other hand it may also lead to an objective model for the "drawishness" of a position which seems interesting.
Ok, I did a first attempt at this considering only material. I added slopes for the draw_elo and home_elo as
Code: Select all
draw_elo = ELO_DRAW + phase * ELO_DRAW_SLOPE
home_elo = ELO_HOME + phase * ELO_HOME_SLOPE
The result using the Davidson model after a few iterations is
Code: Select all
ELO_DRAW=62, ELO_DRAW_SLOPE=-8
ELO_HOME=20, ELO_HOME_SLOPE=-3
Daniel
Re: maximum a posteriori
Ok, I added king safety as one of the parameters that affect drawishness, and it seems to be clearly the major factor. It was the first parameter to be quickly modified by the CG iterations. I am not sure I got the definitions of phase right in my previous post, so I will define them here with the code I used, to avoid mistakes in wording it.
Code: Select all
double factor_m = material / 62.0; // cumulative piece material (no pawns) for both sides, with values 9-5-3-3 for Q-R-B-N (factor goes from 1.0 to 0.0)
double factor_k = ksafety / 100.0; // cumulative king safety for both sides, scaled to 1 pawn value (factor goes from 0.0 to maybe 5.0)
int eloH = ELO_HOME + factor_m * ELO_HOME_SLOPE_PHASE
                    + factor_k * ELO_HOME_SLOPE_KSAFETY;
int eloD = ELO_DRAW + factor_m * ELO_DRAW_SLOPE_PHASE
                    + factor_k * ELO_DRAW_SLOPE_KSAFETY;
Result after a few iterations
Code: Select all
ELO_HOME 10
ELO_DRAW 65
ELO_HOME_SLOPE_PHASE -2
ELO_DRAW_SLOPE_PHASE -16
ELO_HOME_SLOPE_KSAFETY 19
ELO_DRAW_SLOPE_KSAFETY -25
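To make the reported values concrete, here is a sketch that wraps the formulas above in a function and evaluates them. The struct and function names are hypothetical, the result is kept as a double instead of being truncated to int, and the numbers are the non-converged values from this post.

```cpp
#include <cassert>

struct DrawModelParams {
    double elo_home, elo_draw;
    double elo_home_slope_phase, elo_draw_slope_phase;
    double elo_home_slope_ksafety, elo_draw_slope_ksafety;
};

struct PositionElo { double home, draw; };

// material: non-pawn material of both sides (0..62, Q-R-B-N = 9-5-3-3).
// ksafety: cumulative king safety of both sides, 100 = one pawn.
PositionElo position_elo(const DrawModelParams& p, int material, int ksafety) {
    assert(material >= 0 && material <= 62);
    assert(ksafety >= 0);
    double factor_m = material / 62.0;
    double factor_k = ksafety / 100.0;
    return {p.elo_home + factor_m * p.elo_home_slope_phase
                       + factor_k * p.elo_home_slope_ksafety,
            p.elo_draw + factor_m * p.elo_draw_slope_phase
                       + factor_k * p.elo_draw_slope_ksafety};
}
```

With the values {10, 65, -2, -16, 19, -25}, a position with no pieces left gets draw Elo 65, full material with no king safety term gets 49, and full material with a king safety term of one pawn gets 24, so the draw parameter is largest in the endgame.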
Re: maximum a posteriori
> Result after a few iterations
> Code:
> ELO_HOME 10
> ELO_DRAW 65
> ELO_HOME_SLOPE_PHASE -2
> ELO_DRAW_SLOPE_PHASE -16
> ELO_HOME_SLOPE_KSAFETY 19
> ELO_DRAW_SLOPE_KSAFETY -25
Thanks. Very nice! I looked at your code and it was what I had in mind.
If the converged values are similar to the above, then it seems that the model is at least compatible with common sense!
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.