But we are not interested in the differences A1-B resp. A2-B but in the difference A1-A2, since these are the two "candidates".

hgm wrote: That is a bit more complex. There are two independent measurements there, with results D1 (for A1-B) and D2 (for A2-B) for the rating difference, and errors E1 and E2 in them. That means the program will assign the ratings:
A1 = D1 - (D1+D2)/3 = 2/3*D1 - 1/3*D2
A2 = D2 - (D1+D2)/3 = 2/3*D2 - 1/3*D1
B = 0 - (D1+D2)/3 = -1/3*D1 - 1/3*D2
So the error in the A1 rating would be the combination of 2/3*E1 and 1/3*E2, which (because these are independent errors) gives sqrt(4/9*E1*E1 + 1/9*E2*E2).
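To make that quoted error propagation concrete, here is a minimal sketch; D1, D2, E1 and E2 are made-up placeholder values, not results from any real run:

```python
import math

# Placeholder measurements: rating differences A1-B and A2-B (Elo)
# and their standard errors. These numbers are purely illustrative.
D1, D2 = 20.0, 5.0
E1, E2 = 10.0, 10.0

# Ratings with the average of all three engines anchored at 0.
A1 = D1 - (D1 + D2) / 3    # = 2/3*D1 - 1/3*D2
A2 = D2 - (D1 + D2) / 3    # = 2/3*D2 - 1/3*D1
B  = 0.0 - (D1 + D2) / 3   # = -1/3*D1 - 1/3*D2

# Independent errors combine in quadrature, weighted by the coefficients.
err_A1 = math.sqrt((2 / 3 * E1) ** 2 + (1 / 3 * E2) ** 2)
err_A2 = math.sqrt((2 / 3 * E2) ** 2 + (1 / 3 * E1) ** 2)

print(f"A1 = {A1:+.1f} +/- {err_A1:.1f}")
print(f"A2 = {A2:+.1f} +/- {err_A2:.1f}")
print(f"B  = {B:+.1f}")
```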
My question was this: you confirmed that in the self-play case the error of the difference A1-A2 is 2x the error of the rating of A1 resp. A2, due to the "100% anti-correlation". What is the error of the difference A1-A2 in the gauntlet case if A1 and A2 have each played twice the number of games compared to the self-play case? (In my example I had 1000 games of self-play vs. 2000+2000 games in the gauntlet case.)
After all that has been posted here it should be sqrt(error(A1_rating)^2 + error(A2_rating)^2), and if we further assume that both ratings have the same error margin, it becomes sqrt(2) * error(A1_rating). Therefore, to get the same accuracy in the error of the rating difference A1-A2 with a gauntlet as with self-play, we would need 2x the number of games, not 4x.
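In code form, the two combination rules I am comparing are simply the following (a minimal sketch; the error values are placeholders, and whether the independence assumption behind the second rule is legitimate here is exactly my question):

```python
import math

def err_diff_selfplay(err_rating):
    # 100% anti-correlated ratings: the errors add linearly.
    return 2 * err_rating

def err_diff_gauntlet(err_a1, err_a2):
    # Ratings treated as independent: the errors add in quadrature.
    return math.sqrt(err_a1 ** 2 + err_a2 ** 2)

print(err_diff_selfplay(10.0))        # 20.0
print(err_diff_gauntlet(10.0, 10.0))  # ~14.1 = sqrt(2) * 10.0
```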
Right or wrong?
Sven
