From this it follows that a draw between two players is twice as strong evidence for their equality as a single decisive game.

As I said above, this is not quite true even for the logistic cdf, since drawelo also influences the likelihood assigned to a win or a loss.

So drawelo does not fall out of the computation I think.

Perhaps he meant between two equal chess players; then it should be true, since if the players are equal, win and loss probabilities are identical regardless of drawelo.
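That symmetry is easy to check numerically. The sketch below uses a drawelo-style parametrization (my own names and scaling, not necessarily BayesElo's exact one): win probability F(x - h), loss probability 1 - F(x + h), draw the remainder, with F the logistic curve and h the drawelo.

```python
import math

SCALE = 400 / math.log(10)  # logistic scale for the usual 400-point Elo convention

def F(x):
    """Logistic rating curve: expected score at rating difference x."""
    return 1.0 / (1.0 + math.exp(-x / SCALE))

def outcome_probs(x, draw_elo):
    """Win/draw/loss probabilities in a drawelo-style model (a sketch,
    not necessarily BayesElo's exact parametrization): the win gets
    F(x - h), the loss 1 - F(x + h), and the draw what is left."""
    p_win = F(x - draw_elo)
    p_loss = 1.0 - F(x + draw_elo)
    p_draw = 1.0 - p_win - p_loss
    return p_win, p_draw, p_loss

# For equal players (x = 0), logistic symmetry F(-h) = 1 - F(h) makes
# win and loss equally likely no matter how large drawelo is.
for h in (0.0, 50.0, 100.0, 200.0):
    w, d, l = outcome_probs(0.0, h)
    assert abs(w - l) < 1e-12
```

So at x = 0 the win/loss split is 50/50 for any drawelo; only the draw fraction moves.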

Indeed, Jeff Sonas has done such verification of the rating model, but he did it for human games, and it is not obvious at all that computer games would follow the same model as what works for humans. So it is important to do it with computer games, preferably between the engines whose results you are analyzing. (If there are enough of those.)

I really don't know if there is any reason to be suspicious of the BayesElo model. The underlying assumption, that a player's instantaneous performance fluctuates around an average with some distribution, and that the game will be drawn if the instantaneous performance is too close to that of the opponent, is quite a natural one.

It all depends on how the rating curve looks, especially in the tails. There is no fundamental reason why two draws should give equivalent information on relative strength as one win and one loss. This would essentially be a coincidence. Almost no rating model would have that.
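The stated assumption is simple enough to simulate directly. The sketch below is purely illustrative (I picked a Gaussian performance distribution for convenience; BayesElo's actual curve differs, which is exactly the tail behavior in question):

```python
import random

def simulate_games(elo_diff, margin, sigma=200.0, n=100_000, seed=1):
    """Monte-Carlo sketch of the assumption above.  Each player's
    instantaneous performance fluctuates around his strength (Gaussian
    here purely for illustration; BayesElo's actual distribution
    differs), and the game is drawn when the two performances land
    within `margin` of each other."""
    rng = random.Random(seed)
    wins = draws = losses = 0
    for _ in range(n):
        perf_a = rng.gauss(elo_diff, sigma)  # player A, `elo_diff` stronger
        perf_b = rng.gauss(0.0, sigma)       # player B
        if abs(perf_a - perf_b) < margin:
            draws += 1
        elif perf_a > perf_b:
            wins += 1
        else:
            losses += 1
    return wins / n, draws / n, losses / n

# Equal players: win and loss frequencies agree, with a sizable draw share.
w, d, l = simulate_games(elo_diff=0.0, margin=100.0)
```

Varying the performance distribution in such a simulation is one way to see how much the tails actually matter for the draw-weighting question.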

hgm wrote:Indeed, Jeff Sonas has done such verification of the rating model, but he did it for human games, and it is not obvious at all that computer games would follow the same model as what works for humans. So it is important to do it with computer games, preferably between the engines whose results you are analyzing. (If there are enough of those.)

I really don't know if there is any reason to be suspicious of the BayesElo model. The underlying assumption, that a player's instantaneous performance fluctuates around an average with some distribution, and that the game will be drawn if the instantaneous performance is too close to that of the opponent, is quite a natural one.

It all depends on how the rating curve looks, especially in the tails. There is no fundamental reason why two draws should give equivalent information on relative strength as one win and one loss. This would essentially be a coincidence. Almost no rating model would have that.

What conclusion did Sonas reach regarding the human games and draws?

If the BayesElo model is correct, then couldn't it be used to break ties in round-robin events? The players with the same score would not necessarily have the same BayesElo rating, and presumably the one with the higher BayesElo rating has in some real sense performed better. Why has this never been tried or suggested (or has it)? What would be the consequences? Would it be similar to any known tiebreak, or quite different?

Michel wrote:As I said above this is not quite true even for the logistic cdf since
drawelo also influences the likelihood assigned to a win or a loss.

So drawelo does not fall out of the computation I think.

Not entirely, but its effect is second-order. What I stated is valid in the limit that F(x+h)-F(x-h) is equal to the derivative (after scaling). And the error in that is only second-order, while the default value for drawElo is quite small.

But even when you start noticing effects of drawElo, it does not follow that the factor two in weight would disappear, or even that it would get smaller.
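The second-order claim can be checked numerically. In the sketch below (my own function names; the logistic scale S is the usual 400-point convention) the scaled draw likelihood (F(x+h) - F(x-h)) / (2h) is compared against the derivative F'(x) it stands in for; second-order error means halving h cuts the discrepancy roughly fourfold.

```python
import math

S = 400 / math.log(10)  # logistic scale (the usual 400-point convention)

def F(x):
    """Logistic rating curve."""
    return 1.0 / (1.0 + math.exp(-x / S))

def Fp(x):
    """Its derivative: F'(x) = F(x) * (1 - F(x)) / S."""
    return F(x) * (1.0 - F(x)) / S

def slope_error(x, h):
    """How far the scaled draw likelihood (F(x+h) - F(x-h)) / (2h)
    is from the derivative F'(x) it stands in for."""
    return abs((F(x + h) - F(x - h)) / (2.0 * h) - Fp(x))

# Second-order error: halving drawElo cuts the error roughly fourfold.
e1 = slope_error(100.0, 50.0)
e2 = slope_error(100.0, 25.0)
```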

lkaufman wrote:What conclusion did Sonas reach regarding the human games and draws?

I don't recall that. I do know that he found the standard models for total score were no good, however, and that a linear model would give a much better fit. That behavior cannot persist in the tails, obviously, but there was little data there.

If the BayesElo model is correct, then couldn't it be used to break ties in round-robin events? The players with the same score would not necessarily have the same BayesElo rating, and presumably the one with the higher BayesElo rating has in some real sense performed better. Why has this never been tried or suggested (or has it)? What would be the consequences? Would it be similar to any known tiebreak, or quite different?

It could be used for that. One reason could be that it needs a computer, while tie breakers like SB (Sonneborn-Berger) can be done by hand. (Well, at least with a pocket calculator for most, nowadays... ) Performance ratings could also be different without double-counting draws, btw, depending on whom you beat. This would probably be similar to SB, though.
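For comparison, here is the by-hand tie-break mentioned above, in one common convention (details vary between federations; the toy round-robin is my own example):

```python
def sonneborn_berger(results, scores):
    """Sonneborn-Berger tie-break (one common convention; details vary):
    the full final score of every opponent you beat, plus half the final
    score of every opponent you drew with."""
    sb = {}
    for player, games in results.items():
        total = 0.0
        for opponent, result in games.items():
            if result == 1.0:
                total += scores[opponent]
            elif result == 0.5:
                total += 0.5 * scores[opponent]
        sb[player] = total
    return sb

# Toy 3-player round robin: A beats B, A draws C, B beats C.
results = {
    "A": {"B": 1.0, "C": 0.5},
    "B": {"A": 0.0, "C": 1.0},
    "C": {"A": 0.5, "B": 0.0},
}
scores = {p: sum(g.values()) for p, g in results.items()}  # A: 1.5, B: 1.0, C: 0.5
sb = sonneborn_berger(results, scores)  # A: 1.25, B: 0.5, C: 0.75
```

Note that in this toy event C outscores B on SB despite the lower raw score, precisely because the tie-break weighs whom you scored against.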

Michel wrote:As I said above this is not quite true even for the logistic cdf since
drawelo also influences the likelihood assigned to a win or a loss.

So drawelo does not fall out of the computation I think.

Not entirely, but its effect is second-order. What I stated is valid in the limit that F(x+h)-F(x-h) is equal to the derivative (after scaling). And the error in that is only second-order, while the default value for drawElo is quite small.

That is correct. But the likelihoods for a win and a loss are F(x-h) and 1-F(x+h). And these differ from the values F(x) and 1-F(x) by first-order terms, not second-order ones.

I admit I did not think of that, but are you sure? For one win and one loss the likelihood becomes

F(x-h) * (1 - F(x+h)) = (F - hF') * (1 - F - hF') + O(h^2)
                      = F * (1-F) - hF' * (1-F) - hF' * F + O(h^2)
                      = F * (1-F) - hF' + O(h^2)

(all F and F' taken in x unless specified otherwise).

Now since F * (1-F) is proportional to F' and of O(1), this means that the shape of the likelihood distribution is F' up to an error of O(h^2). And with Bayes only the shape counts, not the normalization factor (which is indeed 1+O(h), but also O(h) for the draws).
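The expansion can be confirmed numerically as well (function names are mine; S is the usual logistic scale): the gap between F(x-h) * (1 - F(x+h)) and F * (1-F) - hF' should shrink like h^2, i.e. roughly fourfold when h is halved.

```python
import math

S = 400 / math.log(10)  # logistic scale

def F(x):
    """Logistic rating curve."""
    return 1.0 / (1.0 + math.exp(-x / S))

def Fp(x):
    """Derivative of the logistic cdf: F(x) * (1 - F(x)) / S."""
    return F(x) * (1.0 - F(x)) / S

def expansion_error(x, h):
    """|F(x-h) * (1 - F(x+h))  -  (F(x) * (1 - F(x)) - h * F'(x))|,
    which the expansion says should shrink like h^2."""
    exact = F(x - h) * (1.0 - F(x + h))
    approx = F(x) * (1.0 - F(x)) - h * Fp(x)
    return abs(exact - approx)

# Halving h should cut the discrepancy roughly fourfold.
e1 = expansion_error(80.0, 20.0)
e2 = expansion_error(80.0, 10.0)
```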

It suddenly occurred to me that if BayesElo is correct, then when doing normal sequential ratings (like USCF or FIDE) wouldn't it be correct to rate each draw twice, or alternatively to only half-rate wins and losses? That doesn't sound right, but it would seem to be logical if the underlying assumption of BayesElo is right. Only in that way would one draw = one win plus one loss.
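As a sketch of what that would look like, here is a sequential Elo update with the draw applied at double weight. This is purely hypothetical (it is NOT how USCF or FIDE actually rate), just the idea above made concrete:

```python
def elo_update(rating, opp_rating, score, k=20.0, draw_weight=2.0):
    """Sequential Elo update implementing the idea above (purely
    hypothetical -- this is NOT how USCF or FIDE actually rate): a draw
    is applied with double weight, as if it were one win plus one loss."""
    expected = 1.0 / (1.0 + 10.0 ** ((opp_rating - rating) / 400.0))
    weight = draw_weight if score == 0.5 else 1.0
    return rating + k * weight * (score - expected)

# Between equal players a draw changes nothing, whatever the weight;
# between unequal players the lower-rated player gains twice as much
# from a draw as under the standard update.
r_std = elo_update(1500.0, 1700.0, 0.5, draw_weight=1.0)
r_dbl = elo_update(1500.0, 1700.0, 0.5)
```

Note the two schemes only diverge when the players are unequal, which is consistent with the earlier remark that drawelo does not matter between equal players.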

hgm wrote:I admit I did not think of that, but are you sure? For one win and one loss the likelihood becomes

F(x-h) * (1 - F(x+h)) = (F - hF') * (1 - F - hF') + O(h^2)
                      = F * (1-F) - hF' * (1-F) - hF' * F + O(h^2)
                      = F * (1-F) - hF' + O(h^2)

(all F and F' taken in x unless specified otherwise).

Now since F * (1-F) is proportional to F' and of O(1), this means that the shape of the likelihood distribution is F' up to an error of O(h^2). And with Bayes only the shape counts, not the normalization factor (which is indeed 1+O(h), but also O(h) for the draws).

Seems correct! I considered it unlikely that this would work and didn't check it....