This may have been covered, at least indirectly, but the issue I see with this function and draws is that the value of the loss is not zero when the predicted score is 0.5 and the actual score is 0.5. The loss function for draws does have its minimum value in that case. But it is not zero.
Each draw will increase the value of the objective even if it is predicted correctly, and this increase is more than what most loss and win positions contribute (because many will be more or less accurately predicted).
--Jon
This was the objective function discussed in the OP but it is not really how maximum likelihood is supposed to work in the presence of draws. One needs to choose a draw model. Luckily there are reasonable draw models available which are used by rating tools. This is discussed in subsequent posts.
A draw model involves at least one extra parameter which serves as a proxy for the draw ratio (draw_elo in the Bayes Elo model), though it may involve more parameters.
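For concreteness, here is a sketch of how the Bayes Elo model turns a strength difference and a single draw_elo parameter into win/draw/loss probabilities (written from memory of Rémi Coulom's bayeselo; the function and struct names are mine):

```cpp
#include <cassert>
#include <cmath>

// Logistic expected score for an Elo difference d.
double logistic(double d) { return 1.0 / (1.0 + std::pow(10.0, -d / 400.0)); }

struct WDL { double win, draw, loss; };

// Bayes Elo sketch: draw_elo pushes both the win and the loss curves
// down, and the draw probability is whatever remains.
WDL bayeselo_wdl(double delta, double draw_elo) {
    WDL p;
    p.win  = logistic(delta - draw_elo);
    p.loss = logistic(-delta - draw_elo);
    p.draw = 1.0 - p.win - p.loss;
    return p;
}
```

With delta = 0 the win and loss probabilities are symmetric, and a larger draw_elo produces more draws, which is exactly the behaviour the single extra parameter is meant to capture.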
Whether or not the textbook approach yields better results in actual games than more ad hoc approaches is of course impossible to know in advance.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
This may have been covered, at least indirectly, but the issue I see with this function and draws is that the value of the loss is not zero when the predicted score is 0.5 and the actual score is 0.5. The loss function for draws does have its minimum value in that case. But it is not zero.
The value of the loss is not zero for wins and losses that are predicted correctly either, so I don't see why that is a problem?
Originally, I was thinking of a situation where the draws are replaced with wins and losses (no draw model). The implicit draw model in this equation is 'sort of' a special case of the Davidson model where 2 draws = 1 win + 1 loss.
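For reference, the Davidson model itself can be sketched like this (my own naming; nu is the draw parameter). Note that p_draw = nu * sqrt(p_win * p_loss), so scoring each draw as half a win plus half a loss reproduces the Davidson likelihood up to the constant nu:

```cpp
#include <cassert>
#include <cmath>

struct DavidsonWDL { double win, draw, loss; };

// Davidson draw model sketch: each side has a strength gamma, and nu
// controls the draw rate. Names are mine, for illustration only.
DavidsonWDL davidson_wdl(double g1, double g2, double nu) {
    double d = g1 + g2 + nu * std::sqrt(g1 * g2);
    return { g1 / d, nu * std::sqrt(g1 * g2) / d, g2 / d };
}
```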
Substituting r=0.5 in the original loss function, you get a minimum of -log(0.5) at a predicted score of 0.5, which is positive rather than zero.
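Whatever the exact form, the point is easy to check numerically. Assuming the loss is the cross-entropy E(p) = -r*ln(p) - (1-r)*ln(1-p) (my assumption; the formula in the OP may differ):

```cpp
#include <cassert>
#include <cmath>

// Cross-entropy loss for actual score r and predicted score p.
// For r = 0.5 the minimum over p is ln(2) at p = 0.5, i.e. not zero,
// while a confidently predicted win (r = 1, p near 1) approaches zero.
double cross_entropy(double r, double p) {
    return -r * std::log(p) - (1.0 - r) * std::log(1.0 - p);
}
```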
Daniel wrote: Using a Bayesian approach, we can add a prior distribution over the parameters, which acts like a regularization for the ML estimates. So we do a maximum a posteriori (MAP) estimate rather than a maximum likelihood (ML) one. So far, I have not been able to get better results than my existing parameters for the eval terms, so it makes sense to keep that momentum ('or absence of it') going for a while by introducing a prior distribution over the evaluation parameters (theta).
I do not think putting on a prior will make much of a difference, except perhaps for the speed of convergence.
You are right, I am not getting anything out of it except for slowing down convergence. I did not try what I proposed as a prior, but something that could only serve as regularization: I added a virtual set of results that favours the set of parameters currently being considered, by assigning an mse=0 to a tiny fraction of the total number of positions. I think the prior was very useful for elo estimation because we often have very few games for estimating strength.
What might be an interesting experiment is to make the draw_elo parameter (in the BE model) itself a linear combination of the features (i.e. it would vary from position to position). A model with a constant draw_elo parameter is obviously incorrect as in the endgame the expected draw rate should be much higher.
I tried making the drawElo and homeAdvantage parameters dynamic and it works. The problem is I am not getting any improvement from tuning parameters, so I cannot gauge whether determining drawElo dynamically improves things.
If implemented literally this would roughly double the number of parameters. There is an increased risk of overfitting, however, so one could cut this down by only including features that are expected to have an influence on the draw ratio, like game phase and perhaps king safety.
I already have separate MG and EG values for most important parameters, is that what you are proposing ?
The value of the loss is not zero for wins and losses that are predicted correctly either, so I don't see why that is a problem?
The value for correctly predicted losses and wins is very close to zero, though not exactly zero, because the logistic function only approaches 1 asymptotically as the score increases, or 0 as it decreases.
--Jon
I don't think so. Note that the result could be a win or a loss even with a drawish evaluation (score) of 0. The log-likelihood in this case is log(0.5) = -0.3 whether the result is a win/loss or a draw. A reasonable score of +200 for a win gives you a log-likelihood of -0.12, so we cannot assume only positions with score > 1000 are correctly predicted as a win.
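These figures are easy to verify, assuming base-10 logs and the usual logistic 1/(1 + 10^(-score/400)):

```cpp
#include <cassert>
#include <cmath>

// Log-likelihood (base 10) of a win given a centipawn score, under the
// standard logistic win-probability curve with the 400 scaling.
double win_loglik(double score) {
    double p = 1.0 / (1.0 + std::pow(10.0, -score / 400.0));
    return std::log10(p);
}
```

A score of 0 gives log10(0.5) ≈ -0.3, and +200 gives roughly -0.12, matching the numbers above.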
I already have separate MG and EG values for most important parameters, is that what you are proposing ?
For example.
At the very least it makes a lot of sense to have "draw_elo_MG" and "draw_elo_EG". The actual value of draw_elo used for a particular position would be interpolated between these two values depending on the "game phase" (which is a feature of the position). The draw model parameters draw_elo_MG and draw_elo_EG can be optimized together with the parameters used in the evaluation function.
But one could also make draw_elo for a particular position depend on other features of the position, like king safety. This would create some more draw model parameters.
Note that I mainly like this idea for its mathematical elegance. It may not make any difference in practice. On the other hand it may also lead to an objective model for the "drawishness" of a position which seems interesting.
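Concretely, the interpolation could look like this (a sketch; all names are made up):

```cpp
#include <cassert>

// Interpolate draw_elo between a middlegame and an endgame value using
// the game phase (1.0 = full middlegame, 0.0 = bare endgame).
// draw_elo_MG and draw_elo_EG would be tuned together with the
// evaluation parameters; all names here are illustrative.
double interpolated_draw_elo(double phase, double draw_elo_MG, double draw_elo_EG) {
    return phase * draw_elo_MG + (1.0 - phase) * draw_elo_EG;
}
```

Since the expected draw rate is higher in the endgame, one would expect the tuner to settle on draw_elo_EG > draw_elo_MG.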
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
Ok, I did a first attempt at this, considering only material. I added slopes for the draw_elo and home_elo as linear functions of a material-based phase factor.
The drawishness increases with game phase, while the home advantage increases at a lower rate. The starting values for ELO_DRAW and ELO_HOME were 97 and 32 respectively.
Ok, I added king safety as one of the parameters that affect drawishness, and it seems to be clearly the major factor: it was the first parameter to be quickly modified by the CG iterations. I am not sure I got the definitions of phase right in my previous post, so I will define them here with the code I used, to avoid mistakes in wording it.
double factor_m = material / 62.0; // cumulative piece values (no pawns) for both sides, with 9-5-3-3 for Q-R-B-N (factor goes from 1.0 down to 0.0)
double factor_k = ksafety / 100.0; // cumulative king safety for both sides, scaled to 1 pawn value (factor goes from 0.0 up to maybe 5.0)

int eloH = ELO_HOME + factor_m * ELO_HOME_SLOPE_PHASE
                    + factor_k * ELO_HOME_SLOPE_KSAFETY;
int eloD = ELO_DRAW + factor_m * ELO_DRAW_SLOPE_PHASE
                    + factor_k * ELO_DRAW_SLOPE_KSAFETY;
a) The scale of centi-pawns vs elo is not one-to-one when we have significant draw percentages, so despite our previous assumption that no scaling is needed, this needs to be addressed, especially for Davidson. The tuning results I got using Davidson are significantly ramped up compared to the other models. Assuming the centi-pawn vs winning-percentage relation is logistic, we will try to morph the three draw models to give evaluation scores like those obtained with no draw model. The criterion I am using to calculate the scaling factor is that the slope at eval=0 should match that of the logistic curve. Let me know if there is a better method to calculate the scaling factor, especially for Davidson, which is slightly off after scaling with this method. In code, it is:
// scale elos so that they look more like elostat's
// Match slopes at 0 elo difference using df(x)/dx = K/4
void calculate_scale() {
    const double K = log(10) / 400.0;
    double df;
    if (eloModel == 0) {
        double dg = elo_to_gamma(eloDraw);
        double f = 1 / (1 + dg);
        df = f * (1 - f) * K;
    } else if (eloModel == 1) {
        double dg = elo_to_gamma(eloDraw) - 1;
        df = (dg / pow(2 + dg, 2.0)) * K;
    } else if (eloModel == 2) {
        const double pi = 3.14159265359;
        double x = -eloDraw / 400.0;
        df = exp(-x * x) / (400.0 * sqrt(pi));
    }
    eloScale = (4.0 / K) * df;
    printf("EloScale %f\n", eloScale);
}
For Davidson, the factor (nu=dg) is calculated so that eloDraw=0 gives dg=0. This wasn't the case in the previous code I posted, but I think it should be like that.
b) Some evaluation terms dealing with imbalances are very problematic for tuning. I had a bonus for a major or minor piece vs pawns. Tuning increased the value of this bonus from a default of 45 to as much as 500! I think this is because the dataset probably doesn't contain enough positions where a side is a piece up and does not win. When I carefully changed the evaluation condition so that a side must be up a piece AND down by at least the pawns that equal the piece's value, the tuning kept more or less the 45 value. I can imagine this kind of thing could be problematic for other people tuning their eval too.
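The tightened condition was along these lines (a sketch with made-up names and values, not the exact code):

```cpp
#include <cassert>

// Sketch: award the piece-vs-pawns bonus only when the side is up a
// piece AND down at least the equivalent value in pawns, so the tuner
// only sees genuine imbalance positions. Values are illustrative.
const int PAWN  = 100;
const int MINOR = 300;
const int BONUS = 45;

// piece_diff / pawn_diff: side-to-move's surplus in minor pieces / pawns.
int piece_vs_pawns_bonus(int piece_diff, int pawn_diff) {
    if (piece_diff >= 1 && -pawn_diff * PAWN >= MINOR)
        return BONUS;
    return 0;
}
```

Without the pawn condition, positions that are simply a piece up (and winning) dominate, which is what drove the bonus toward 500 during tuning.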
I will run the drawishness simulations to convergence once I have fixed these issues.