A plays B, and the measured difference (deltaAB) has an error Eab.
Then, with the same number of games, we can calculate deltaAC, and it will have error Eac = Eab (since the number of games is the same).
Also, with the same number of games, we can calculate deltaCB, and it will have error Ecb = Eab (since the number of games is the same).
So, we can calculate indirectly
deltaAB = deltaAC + deltaCB
Here we can already see that the error of this indirect calculation is bigger than Eab, no matter what, and we are already playing twice as many games.
deltaAC and deltaCB are independent, so the error for the indirect calculation is
IndirectError_ab = sqrt(Eac^2 + Ecb^2)
IndirectError_ab = sqrt(Eac^2 + Eac^2)
IndirectError_ab = sqrt(2*Eac^2)
IndirectError_ab = sqrt(2) * Eac
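If we want IndirectError_ab to equal Eab, we have to make Eac = Eab/sqrt(2). Since the error shrinks as 1/sqrt(number of games), we can do that by playing twice as many games in each of the two matches (A vs C and C vs B), which makes the total 4x the games of the direct A vs B match.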
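As a rough numerical check of the reasoning above, here is a minimal Python sketch. It assumes that the error of a single match scales as k/sqrt(games), with k an arbitrary constant, and that the A-C and C-B matches are independent. The function names are made up for illustration and do not come from any rating tool.

```python
import math

def match_error(games, k=1.0):
    """Error of one head-to-head difference.
    Assumption: it scales as k / sqrt(games); k is an arbitrary constant."""
    return k / math.sqrt(games)

def indirect_error(games_per_match, k=1.0):
    """Error of deltaAC + deltaCB, treating the two matches as independent."""
    e_ac = match_error(games_per_match, k)
    e_cb = match_error(games_per_match, k)
    return math.sqrt(e_ac**2 + e_cb**2)

n = 1000                                  # games in the direct A vs B match
direct = match_error(n)                   # Eab
ratio_same = indirect_error(n) / direct   # same games per match: sqrt(2) times worse
ratio_doubled = indirect_error(2 * n) / direct  # doubled games per match: equal to Eab
total_games_factor = (2 * 2 * n) / n      # two matches of 2n games each

print(ratio_same, ratio_doubled, total_games_factor)
```

With n games per match the indirect error is sqrt(2) times Eab while already using 2n games; matching Eab takes 2n games in each of the two matches, which is 4n in total.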
Miguel
No, you are missing the covariance completely. In the first A vs B test you have a large covariance, and that affects the variance of A - B a great deal. Even HGM agreed that for the example I gave, with two standard errors of 5 Elo each, std(A-B) = 10, which your calculation ignores.
No, I am not missing anything. Whatever you do, Eab is always the same as Eac and Ecb, since you do exactly the same thing in each match.
Miguel
PS: Anyway, if you calculate Eab correctly, there is no covariance; it is a direct measure. But this is not relevant.
But the covariances Cov(A,C) and Cov(B,C) do not contribute when you calculate Var(A-B). That is the trick there.
This is wrong. See above. But to summarize:
Var(A-B) = Var(A) + Var(B) - 2Cov(A,B)
Now when you play A vs C and B vs C,
Var(A-B) = Var(A) + Var(B), since Cov(A,B) = 0 even if we have Cov(A,C) and Cov(B,C). So you see both covariances have no effect.
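To put numbers on the formula, here is a small Python sketch using the 5 Elo standard errors from the example mentioned earlier in the thread. The correlation values are assumptions chosen for illustration: -1 models two ratings reported relative to the pool average in a two-engine match (so A = -B), and 0 models the case where Cov(A,B) = 0. The function name is hypothetical.

```python
import math

def std_of_difference(std_a, std_b, corr_ab):
    """Standard deviation of A - B from
    Var(A-B) = Var(A) + Var(B) - 2*Cov(A,B)."""
    cov_ab = corr_ab * std_a * std_b
    var_diff = std_a**2 + std_b**2 - 2.0 * cov_ab
    return math.sqrt(var_diff)

# Assumption: perfectly anti-correlated ratings (A = -B around a zero average).
anticorrelated = std_of_difference(5.0, 5.0, -1.0)   # 10.0

# Assumption: Cov(A,B) = 0.
uncorrelated = std_of_difference(5.0, 5.0, 0.0)      # sqrt(50), about 7.07

print(anticorrelated, uncorrelated)
```

The anti-correlated case gives the doubling (5 and 5 become 10), while zero covariance gives the usual sqrt(2) factor.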
There are no covariances; all measures are direct measures: DeltaAB, DeltaAC, and DeltaCB.
There is no such thing as a variable A, because what you actually measure when A faces B is its difference with B. BTW, that is not the error that BayesElo reports, which is the error of A compared to the average of the pool. I am not talking about that error; I am talking about the error of DeltaAB.
Miguel
Like I said already, even HG agreed that the error will be twice as much for the example I gave with 180 +/- 5 errors. Why do you think he said it would be 10 Elo? Now we are changing the subject.
That is a way to represent the results, but the direct measure is Engine_A - Engine_B = 200 +/- 20. These are the numbers I am talking about: DeltaAB and Eab.
+100 is the Elo compared to the average of the pool (zero), but that is a conversion done after you have actually found that the difference is 200. You can't calculate one Elo without the other.
You are treating +100 and -100 as separate but not independent measures. Fine, they are of course correlated if you do it that way, but whatever you do to obtain the error, you will get +/- 20. That is Eab, which will be the same as Eac and Ecb if you play a similar match with the same number of games. From that point on, you can easily see that you need 4x the games.
Miguel
Last edited by michiguel on Mon Sep 24, 2012 7:13 am, edited 1 time in total.
Well, then the reported margins of error are wrong, because both elostat and the bayeselo default report a margin of error of 20 (not 10) for your example. When we have multiple opponents, elostat still calculates the variance for each individual by looking at all scores combined (1, 0, 0.5), so it completely disregards the opponent.
We don't have multiple opponents here.
If BE reports +/-20 in a match between A and B (for each engine), then the error of A-B is 40.
Are you saying that when you measure the elo between A and B in a direct match, that is not a direct measure?
ResultSet-EloRating>mm 1 1
00:00:00,00
ResultSet-EloRating>ratings
Rank  Name     Elo   +   -  games  score  oppo.  draws
1     Player0    0  27  27    500    50%      0    20%
2     Player1    0  27  27    500    50%      0    20%
ResultSet-EloRating>elostat
1 iterations
00:00:00,00
ResultSet-EloRating>ratings
Rank  Name     Elo   +   -  games  score  oppo.  draws
1     Player0    0  27  27    500    50%      0    20%
2     Player1    0  27  27    500    50%      0    20%
ResultSet-EloRating>exactdist
00:00:00,06
ResultSet-EloRating>ratings
Rank  Name     Elo   +   -  games  score  oppo.  draws
1     Player0    0  15  15    500    50%      0    20%
2     Player1    0  15  15    500    50%      0    20%
ResultSet-EloRating>
If you remember last time, I noted that exactdist gives about half as large an error margin (15 instead of 27 here)...
Calculate the variance like you did for a 200-200-100 result (200 wins, 200 losses, 100 draws) and tell me whether you get 27 or 15 Elo. The 27 Elo is just the raw calculated s.e.; it did not get divided by 2, unlike your suggestion...
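For anyone who wants to redo that calculation, here is a minimal Python sketch. It is only an approximation under stated assumptions: per-game scores of 1, 0.5 and 0, a normal 95% interval (z = 1.96), and the logistic Elo curve with its slope taken at the observed score. It is not meant to reproduce what elostat or bayeselo do internally, and the function name is made up.

```python
import math

def elo_margin(wins, losses, draws, z=1.96):
    """Approximate 95% margin, in Elo, from the raw per-game score variance."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Variance of the per-game score (1 for a win, 0.5 for a draw, 0 for a loss).
    var = (wins * (1.0 - score) ** 2
           + losses * (0.0 - score) ** 2
           + draws * (0.5 - score) ** 2) / n
    se_score = math.sqrt(var / n)          # standard error of the mean score
    # Slope of elo = -400*log10(1/score - 1) with respect to the score.
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return z * se_score * slope

margin = elo_margin(200, 200, 100)
print(round(margin, 1))   # about 27.2
print(round(margin / 2, 1))  # about 13.6, the same margin divided by 2
```

Under these assumptions the raw calculation lands near 27 Elo, and half of it near 14, which is in the neighbourhood of the 15 that exactdist prints.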