Code: Select all
Rank Name Elo + - games score oppo. draws
1 171 495 154 154 199 59% 415 0%
Rémi
Moderator: Ras
Code: Select all
Rank Name Elo + - games score oppo. draws
1 171 495 154 154 199 59% 415 0%
Well, of course you woud consider a 100-0 result garbage, because 100 is too few games. With you even a 25,000 game result is garbage...bob wrote:I would draw a completely different conclusion. Ratings within each pool should be highly accurate. ratings between the pools can't possibly be accurate with so few games. So the old GIGO applies here. The original data is garbage from the perspective of being useful to predice overall ratings...
If the relative rating of the players within a group is extremely certain from the large number of games within the group, the group indeed starts to behave as if it was a single player (probably receiving a known handicap in some of the gmes).Sven Schüle wrote:There is no rating for a whole group, only for single players. The competition is not a competition of two groups but of individual players. A 100-0 result of A players against B players, where each A player met one B opponent, does not say much about the playing strength of any participant. It is also possible that 95 A players are stronger than their opponents but 5 A players are weaker, or even that only 30 A players are stronger and 70 A players are weaker, you can't tell that from 1 game, nor who is stronger by how much.
Code: Select all
Rank Name Elo + - games score oppo. draws
1 171 495 47 47 199 59% 415 0%
2 168 491 47 47 199 58% 416 0%
[...]
99 134 359 47 47 199 43% 417 0%
100 117 355 47 47 199 43% 417 0%
101 60 -337 47 47 199 59% -417 0%
102 3 -368 47 47 199 56% -417 0%
[...]
The virtual draws are distributed evenly over all the games. If the prior is 2, and only two players play against each other, 2 virtual draws are added to the match. To make this work in the more general case, the number of virtual draws added between two players is, for every game, 1/N1 + 1/N2, where N1 is the number of games played by player 1 and N2 is the number of games played by player 2. The number of virtual draws only depends on the number of players, not on the number of games.hgm wrote:I assumed that, if there were 200 players, like in the example above, and prior = 2, you would assign 2/99 draws along any link, so that each player in total has 2 draws. But this would mean that there are 2/99 * 10,000 is about 202 virtual draws cnnecting group A and B, making the total result efectively 201-101. And that is not compatible with the result you print here so apparently it is not what you do.
So what is your algorithm for assigning the virtual draws to the links of the pairing network???
And in particular, how would you assign them in the other example (of the two gauntlets, of A and B against C1-C1000)?
This is correcthgm wrote: I assumed that, if there were 200 players, like in the example above, and prior = 2, you would assign 2/99 draws along any link, so that each player in total has 2 draws.
I don't understand this. 99 = 98 in the round-robin + 1 against the other group, right ? that is only 100 games between the two groups, so 2/99*100 = 2.02 virtual draws.But this would mean that there are 2/99 * 10,000 is about 202 virtual draws cnnecting group A and B, making the total result efectively 201-101. And that is not compatible with the result you print here so apparently it is not what you do.
OK, this is different from what I thought. So this cannot be the explanation for the strange results I got on the ChessWar PromoDivision. What seems a bit inconsistent to me intuitively, though, is that the number of draws assigned to a link is independent of the number of games played along that link.... unless that number happens to be zero. This causes a suspect discontinuity, somehow.Rémi Coulom wrote:The virtual draws are distributed evenly over all the games. If the prior is 2, and only two players play against each other, 2 virtual draws are added to the match. To make this work in the more general case, the number of virtual draws added between two players is, for every game, 1/N1 + 1/N2, where N1 is the number of games played by player 1 and N2 is the number of games played by player 2. The number of virtual draws only depends on the number of players, not on the number of games.
Well, with the virtual-draw assignment scheme as I understand it now, not the previous example becomes the Achilles heel of BayesElo, but this two-gauntlets one: by assigning 1,001 virtual draws to each link, A and B get connected to the C group each by 1001 virtual draws. If they played each C opponent only once, there were only 1000 real games. So a 990-10 result in favor of A, and B lost 990-10, th virtual draws make the effective result 1490.5-510.5. Which onlly corresponds to some 200 Elo between A or B and the C-group-average, so 400 between A and B.So, if A plays the same number of games against C1-C1000, then 1,001 virtual draws will be added for every pair.
Well, I don't think the new example falls in that categoryAs I said in my previous message, this may cause a problem by adding too few virtual draws between two densely connected parts of the graph.
It may be better to also have an independ prior, ie a virtual player with some draws against everybody. But when I tried to optimize the prior by cross-validation on WBEC data (and other), I found that adding that virtual player did not improve predictions. I suppose in every normal pairing situation, the current scheme of bayeselo should work.
It is always possible to invent artificial situations where the prior is wrong, anyways, whatever the prior.
If we are going to use the output of the program do find out whether a version is stronger than another, then it is good to have a strong prior. Maybe in the case you have in mind, when trying to make prediction from very few games, a weaker prior is better.
Rémi
Hi Rémi,Rémi Coulom wrote:Note, by the way, that on the same data, if you run the "covariance" commande before listing the ratings, you'll get that instead:Covariance takes into consideration the uncertainty of the link between the two groups.Code: Select all
Rank Name Elo + - games score oppo. draws 1 171 495 154 154 199 59% 415 0%
Rémi
I agree that the prior is wrong in this situation.hgm wrote:Well, with the virtual-draw assignment scheme as I understand it now, not the previous example becomes the Achilles heel of BayesElo, but this two-gauntlets one: by assigning 1,001 virtual draws to each link, A and B get connected to the C group each by 1001 virtual draws. If they played each C opponent only once, there were only 1000 real games. So a 990-10 result in favor of A, and B lost 990-10, th virtual draws make the effective result 1490.5-510.5. Which onlly corresponds to some 200 Elo between A or B and the C-group-average, so 400 between A and B.
But the score difference 99% vs 1% for two identical 1000-game gauntlets should allow you to conclude with a pretty large confidence that the Elo difference is over 1200.
Note that comparing gauntlet results is a very common habit, and that a gauntlet against 1000 different opponents with 1 game each would be considered a more reliable measure of Elo difference than one of 10 opponents, 10 games each.
Yes, this is the idea I suggested in my previous posts. Your example convinces me that I should add this as an option to bayeselo.It seems to me what is needed here is an assignment of virtual draws that somhow 'anchor' the C players (perhaps to each othr, or to a virtual player), without putting too much of a budon on the A and B player.
They are the same for all the players.Sven Schüle wrote: Hi Rémi,
could you please post the whole result after "covariance", or at least the most typical part of it, since I would like to see the error margins given for the other group's players, too.
Thanks,
Sven