hgm wrote:
Rémi Coulom wrote:
Well, I don't see how the behaviour of BayesElo is incorrect. What would be a "correct" way to handle a 100% winning rate? In BayesElo, this is handled by the prior. It looks OK to me. If you feed this situation to BayesElo, it will give a big difference of playing strength between the losing group and the winning group. I cannot see what else it could do.
But that is exactly the point. When you feed this it does not give you a big difference between the two. Perhaps 300 Elo or so. While it is obvious that the difference should be at least 900 Elo.
At least with the default prior setting. If you take a near-zero prior things improve. But without prior BayesElo basically ceases to be BayesElo.
The problem is that BayesElo distributes the virtual draws equally over all pairs of players. And with 200 players, and 2 virtual draws per player, there are 400 virtual draws. As approximately half of the 200*199/2 pairs of players are from different groups (one in A, the other in B), BayesElo counts about 200 virtual draws between the groups. So a result of 100-0 is counted as if it were 200-100. And that doesn't look so bad for group B...
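The counting in the quoted paragraph can be checked with a few lines of Python. This is only a sketch of the arithmetic as hgm states it (400 virtual draws spread evenly over all pairs, each draw worth half a point to each side); the variable names are made up for illustration and nothing here calls BayesElo itself.

```python
from math import comb

players = 200
group_size = 100                  # groups A and B, 100 players each
virtual_draws_per_player = 2      # figure taken from the quoted post

# Total as counted in the quoted post: 2 per player, 400 overall.
total_virtual_draws = players * virtual_draws_per_player

all_pairs = comb(players, 2)              # 200*199/2 pairs of players
cross_pairs = group_size * group_size     # pairs with one player in A, one in B

# Assumption from the quoted post: virtual draws are spread evenly over all pairs.
cross_virtual_draws = total_virtual_draws * cross_pairs / all_pairs

real_wins_a = 100                 # the 100-0 result between the groups
score_a = real_wins_a + cross_virtual_draws / 2   # a draw is half a point each
score_b = cross_virtual_draws / 2

print(f"pairs: {all_pairs}, cross-group pairs: {cross_pairs}")
print(f"virtual draws between groups: {cross_virtual_draws:.1f}")
print(f"effective score: {score_a:.1f} - {score_b:.1f}")
```

Under those assumptions the cross-group share comes out at roughly 201 virtual draws, which is where the "200-100" figure comes from.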
In the case you describe, I don't think there is any problem with the behavior of bayeselo. In case there is a result of 100-0 between the two groups and 200 virtual draws, this means that every player has played only two games. It is not reasonable to give a huge rating difference based on only two games. With such a small number of games, the two groups do not exist as groups. In order to make them groups, they need to play several games inside each group, so that the ratings of all the members get strongly correlated.
Imagine the extreme case where every player has played only one game, against the other group. In this case we have a 100-0 score between the two groups. But it cannot be treated like a 100-0 score between two individuals: the ratings of the players inside each group are completely uncorrelated. So, the two virtual draws will produce a reasonable strength difference.
In order to have this correlation, we need to have more games inside each group. Just one additional game, like in your situation, will produce extremely weak correlation. Especially for those players who won or lost. The more they play games inside each group, the smaller the strength of the prior on the 100 games that link the two groups. So the two groups will move farther apart as they play games inside each group.
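To get a feeling for how much the prior weight on the linking games matters, here is a rough toy calculation. It is not what bayeselo computes: it assumes the plain logistic Elo formula, counts a draw as half a point for each side, and treats each group as one rigid player. The function `elo_gap` and its `virtual_draws` parameter are hypothetical names for this sketch.

```python
from math import log10

def elo_gap(wins, virtual_draws):
    """Elo gap implied by `wins` one-sided results plus `virtual_draws` draws.

    Toy model only: logistic Elo, a draw counted as half a point per side,
    and the two groups treated as two single players.
    """
    stronger = wins + virtual_draws / 2
    weaker = virtual_draws / 2
    return 400 * log10(stronger / weaker)

# 100 real wins linking the groups, with less and less prior weight on that link.
for d in (200, 20, 2, 1):
    print(f"{d:>3} virtual draws on the link: {elo_gap(100, d):6.1f} Elo")
```

In this toy model, 200 virtual draws on the link give a gap of about 120 Elo, and the gap only passes 900 once the prior on the link is down to about one virtual draw. The exact numbers should not be taken as bayeselo output; the point is only that the gap grows without bound as the prior weight on the link goes to zero.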
So, I believe that this is a good example of a situation where bayeselo behaves particularly well, especially if you compare it to Elostat. As soon as each group has played enough games to be strongly correlated, they will reach that 900 Elo-point difference.
In fact, I would say that if there is a problem with bayeselo in this situation, it is rather the reverse of what you said: as the number of games inside each group goes to infinity, the difference in playing strength between the two groups will go to infinity, although its evaluation is based on only 100 games.
In order to prevent this from happening, it might be a good idea to add an "absolute prior" to bayeselo. For instance, add a virtual player with a virtual draw against everybody. That may be a good additional prior option to add to the program.
Rémi