Advantage for White; Bayeselo (to Rémi Coulom)
Re: Advantage for White; Bayeselo (to Rémi Coulom)
Thanks to hgm and lucas for the suggestions.
Below you will find updated graphs representing the same data.
1) bin size is 4 Elo points
2) minimum bin size is 4 samples
3) in the elodelta graph I shifted all models by 20 Elo points to compensate for the white-to-move advantage
4) I added the cdf of the normal distribution with sd=250
5) I added the function hgm suggested to estimate draws, scaled by 40/25
Agreed, the Gauss function is a better fit than either the linear function or the logistic function.
Looking at the new graphs I am not so sure about hgm's suggestion regarding the progression of the avgelo score function. You are right that the next step is to take elodelta into account.
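For reference, the two score functions being compared can be sketched in a few lines of Python. This assumes the plotted Gaussian curve is the normal cdf with sd=250 applied to the rating difference (computed here via the error function); the logistic is the classical Elo expectation:

```python
import math

def logistic_score(delta: float) -> float:
    """Classical Elo expectation: logistic with a 400-point scale."""
    return 1.0 / (1.0 + 10.0 ** (-delta / 400.0))

def gauss_score(delta: float, sd: float = 250.0) -> float:
    """Expected score as the normal cdf of the rating difference.

    sd=250 matches the curve added to the graphs; Phi is computed
    from the error function."""
    return 0.5 * (1.0 + math.erf(delta / (sd * math.sqrt(2.0))))

# Compare the two curves at a few rating differences
for d in (0, 100, 200, 400):
    print(d, round(logistic_score(d), 3), round(gauss_score(d), 3))
```

Both curves pass through 0.5 at delta = 0; the Gaussian with sd=250 rises more steeply in the mid-range, which is where the fits differ most.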
Re: Advantage for White; Bayeselo (to Rémi Coulom)
Addendum:
I just did a quick check of the outliers in the avgelo graph and indeed found that the values far below the linear regression have a large average elodelta, while the ones above have a low average elodelta.
Re: Advantage for White; Bayeselo (to Rémi Coulom)
It is interesting that the Gaussian seems to give a better fit, despite the fact that the ratings were derived using the logistic. (Let's hope the logistic doesn't give a better fit on ratings derived with the Gaussian...) It could be that for this large data set, based mostly on low-deltaElo data, the obtained ratings are not very sensitive to the model used.
What worries me is that the empirical data seems steeper than the curve of the model from which they were derived. That means that Elo differences you have to put into the logistic formula to get the true score percentage are larger than those spit out by BayesElo. In other words, BayesElo systematically underestimates rating differences, compressing the rating scale.
I wonder if this is an artifact caused by the prior. Do you calculate the ratings for this data set yourself? If so, could you recalculate them using a smaller prior (e.g. 0.1 instead of the standard 2.0)?

Re: Advantage for White; Bayeselo (to Rémi Coulom)
Rémi Coulom wrote: If the frequency does not match the model, then it is a sign that the model is bad. But if it does match the model, it does not mean that the model is good, because the ratings were computed with the model in the first place.

hgm wrote: I am not sure I buy that. The number of games is so large that splitting the data set in two and deriving ratings from one half would not give significantly different predictions for the ratings. And the other half of the data set would be good enough to define the empirical curve using those derived ratings. If these empirical frequencies then match the model prediction, the model is by definition perfect. Because that was all the model was supposed to do: derive ratings that could be used to accurately predict frequencies.

It is not so obvious.
Imagine the extreme case where the model consists in saying that all the players have the same rating, and there is a given probability of winning and drawing. You can make this model fit the data perfectly. But its predictions are not good.
So your definition of "perfect" may make sense, but "perfect" in the sense of being unbiased is not "perfect" in the sense of making the best possible predictions.
For instance, we could imagine multidimensional models that make much better predictions than a "perfect" one-dimensional model.
Even for one-dimensional models, I can imagine distributions of player ratings that have no bias in predicting the winning frequency, but produce poor predictions.
Rémi
Re: Advantage for White; Bayeselo (to Rémi Coulom)
Rémi Coulom wrote: For instance, we could imagine multidimensional models that make much better predictions than a "perfect" one-dimensional model.

OK, agreed. But I wouldn't say that the one-dimensional model was not good, then. It would be good as a rating model, assigning the best possible ratings and making the best possible winning-frequency predictions that any model basing the winning frequency on the difference of a single pair of numbers could make. That you might be able to do better, in terms of predictions, by assigning each player an average and a variance (say), rather than assuming the same variance for all, is very likely true.
Rémi Coulom wrote: Even for one-dimensional models, I can imagine distributions of player ratings that have no bias in predicting the winning frequency, but produce poor predictions.

Not sure what you mean by that. What else is there to predict about a game than the winning frequency? Do you mean it might predict the winning frequency against a group of players, but not against the individual players of that group?
Re: Advantage for White; Bayeselo (to Rémi Coulom)
hgm wrote: It is interesting that the Gaussian seems to give a better fit, despite the fact that the ratings were derived using the logistic. (...) What worries me is that the empirical data seem steeper than the curve of the model from which they were derived. That means that the Elo differences you have to put into the logistic formula to get the true score percentage are larger than those spit out by BayesElo. In other words, BayesElo systematically underestimates rating differences, compressing the rating scale. I wonder if this is an artifact caused by the prior. Do you calculate the ratings for this data set yourself? If so, could you recalculate them using a smaller prior (e.g. 0.1 instead of the standard 2.0)?

I cannot easily recalculate the Elos; I have taken the data from CCRL.
Taking into account the elodelta of the players, my model predicts the game outcome on average 0.15 percentage points better.
Code:
Elo(x) = 1 / (1 + 10^(-x/400))
Elo^-1(x) = 400 * LOG10(x / (1-x))
P_draw_given_elodelta = Elo(delta - whiteadvantage) * (1 - Elo(delta - whiteadvantage)) * 40/25
Elo_draw = Elo^-1(P_draw_given_elodelta) + 0.096 * eloavg + 75
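As a cross-check, the formulas can be written out in Python. This is a sketch under two assumptions: the elided operators are read as minus signs, and "whiteadvantage" is taken to be the 20-Elo white-to-move shift used earlier in the thread; the constants 40/25, 0.096 and 75 come straight from the post:

```python
import math

WHITE_ADVANTAGE = 20.0  # assumed: the 20-Elo shift mentioned earlier in the thread

def elo(x: float) -> float:
    """Expected score for a rating difference x (logistic)."""
    return 1.0 / (1.0 + 10.0 ** (-x / 400.0))

def elo_inv(p: float) -> float:
    """Inverse of elo(): the rating difference that yields expected score p."""
    return 400.0 * math.log10(p / (1.0 - p))

def p_draw_given_elodelta(delta: float) -> float:
    """hgm's draw estimate: win-loss product, rescaled by 40/25."""
    e = elo(delta - WHITE_ADVANTAGE)
    return e * (1.0 - e) * 40.0 / 25.0

def elo_draw(delta: float, eloavg: float) -> float:
    """Draw term as posted: inverse-logistic of the draw estimate plus a linear avgelo term."""
    return elo_inv(p_draw_given_elodelta(delta)) + 0.096 * eloavg + 75.0
```

With delta equal to the white advantage, the logistic argument is zero and the draw estimate peaks at 0.25 * 40/25 = 0.4.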
Re: Advantage for White; Bayeselo (to Rémi Coulom)
Edmund wrote: I cannot easily recalculate the Elos; I have taken the data from CCRL.

Too bad. In what form do you have the data? Is it a PGN of complete games? Would it be possible to reduce it to just player and result tags? (That would be enough to feed it to BayesElo.)

Edmund wrote: Taking into account the elodelta of the players my model predicts the game outcome on average 0.15 percentage points better.

0.15%? How do you calculate that? In places (e.g. around +200 deltaElo) the difference between the green curve and the data points is as much as 5 percentage points, and the statistical noise (based on the spread of the points) there is very much lower. Is it just that there are comparatively few games in those points, and a huge number of games in the points from -20 to +20 Elo?

It seems optically clear to me that you can improve the fit (i.e. reduce the 0.15% error) by scaling the ratings up by some 15-20%. I.e. use

Elo(x) = 1 / (1 + 10^(-1.2*x/400))

to predict the score.
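The proposed rescaling amounts to a stretch factor inside the logistic; a minimal sketch, with 1.2 standing in for the suggested 15-20% scale-up:

```python
def elo_score(delta: float, stretch: float = 1.0) -> float:
    """Logistic expected score; stretch > 1 widens effective rating gaps,
    i.e. the same nominal Elo difference predicts a more decisive score."""
    return 1.0 / (1.0 + 10.0 ** (-stretch * delta / 400.0))

# At +200 Elo the stretched curve predicts a noticeably higher score
print(elo_score(200.0))        # standard logistic
print(elo_score(200.0, 1.2))   # the suggested 1.2 rescaling
```

The stretch leaves the 50% point at delta = 0 untouched and only steepens the curve, which is exactly the "compressed rating scale" effect being discussed.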

Re: Advantage for White; Bayeselo (to Rémi Coulom)
Rémi Coulom wrote: Even for one-dimensional models, I can imagine distributions of player ratings that have no bias in predicting the winning frequency, but produce poor predictions.

hgm wrote: Not sure what you mean by that. What else is there to predict about a game than the winning frequency? Do you mean it might predict the winning frequency against a group of players, but not against the individual players of that group?

I mean that it is nice to have an unbiased estimator of the probability of winning, but it does not necessarily produce the best predictions. For instance, if you have a formula that produces an unbiased estimate of the probability of winning as a function of the rating difference between the players, you might still beat the quality of prediction of that unbiased estimator with another model that takes the mean rating of the players as an additional parameter. Since it seems that the probability of draws increases with rating, such a more advanced model might produce better predictions than your simple unbiased model.
So a model can be unbiased, and still can be improved in terms of prediction quality.
Prediction quality should be measured on data that were not used for computing the ratings. It can be measured by the average log-probability of the results, for instance.
Rémi
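The average log-probability criterion Rémi mentions can be sketched as follows; the function names and the toy models are hypothetical illustrations, not anything from the thread:

```python
import math

def avg_log_prob(predictions, outcomes):
    """Average log-probability a model assigned to the observed results.

    predictions: one (p_win, p_draw, p_loss) triple per game, from White's side.
    outcomes:    one result string per game: "1-0", "1/2-1/2" or "0-1".
    Higher (closer to zero) means better predictions."""
    index = {"1-0": 0, "1/2-1/2": 1, "0-1": 2}
    total = sum(math.log(p[index[r]]) for p, r in zip(predictions, outcomes))
    return total / len(outcomes)

# Hypothetical illustration: two models scored on the same three games
games = ["1-0", "1/2-1/2", "1-0"]
model_a = [(0.5, 0.3, 0.2)] * 3  # hedges between win and draw
model_b = [(0.6, 0.3, 0.1)] * 3  # more confident in the actual winner
print(avg_log_prob(model_a, games), avg_log_prob(model_b, games))
```

On held-out games this rewards models that put probability mass on what actually happened, which is the sense in which an unbiased model can still be beaten.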
Re: Advantage for White; Bayeselo (to Rémi Coulom)
I downloaded the full PGN, then used a script to strip everything away but the game_id, the result, and both Elos. I then imported the data into Excel, where I am manipulating it now.
Abs(Elodelta) vs. Number of games looks like this:
Re: Advantage for White; Bayeselo (to Rémi Coulom)
OK, I see what you mean. By adding more parameters, one can always get a better fit. I was just looking for the best possible model that takes only the rating difference into account.
I agree that in general predictions had better not be tested on the data they are derived from, because doing so will make you err in the direction of thinking they are better than they really are. But for a very large data set it hardly matters. (E.g. the N/(N-1) correction you need for variances computed from a mean derived from the points themselves, rather than from an independently given one.)
In that light, what do you think of the fact that the data points seem to indicate a steeper-rising curve than the green logistic on which they are supposed to be based? Is this an indication that BayesElo's default approach is not the optimal way to extract the ratings?
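The N/(N-1) aside is easy to see numerically; a minimal sketch showing that the correction ratio shrinks toward 1 as the sample grows:

```python
def variance(xs, ddof=0):
    """Variance about the sample mean; ddof=1 applies the N/(N-1) correction."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - ddof)

xs = [float(i % 7) for i in range(10000)]
biased = variance(xs)             # divides by N
corrected = variance(xs, ddof=1)  # divides by N - 1
print(corrected / biased)         # exactly N/(N-1), ~1.0001 for N = 10000
```

For data sets of the size discussed here the two estimates are indistinguishable, which is the point: train/test contamination corrections of this kind vanish in the large-N limit.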