ELOStat, Ordo, and Bayeselo - Part II

Post by **Adam Hair** » Sat Jul 14, 2012 4:05 am

Laskos wrote:
Adam Hair wrote:
After setting the scale to 1, the rating compression I found in the Bayeselo results went away.

Here are the tables again, with the Bayeselo (computed values) results corrected:
IPON                        Estimated Denominator   95% Confidence Interval
ELOStat                     363.87                  (351.30, 376.43)
Ordo                        397.86                  (384.78, 410.94)
Ordo with offset (38.7)     399.64                  (394.63, 404.65)
Bayeselo default            320.46                  (315.47, 325.45)
Bayeselo adjusted           396.09                  (390.13, 402.04)
Bayeselo adjusted parameters (advantage; drawelo; prior; scale): (50.3872; 167.083; 0.1; 1)

CCRL 40/4                   Estimated Denominator   95% Confidence Interval
ELOStat                     374.63                  (366.66, 382.61)
Ordo                        396.61                  (387.59, 405.63)
Ordo with offset (26.35)    399.08                  (393.79, 404.37)
Bayeselo default            361.44                  (355.42, 367.46)
Bayeselo adjusted           400.22                  (392.69, 407.75)
Bayeselo adjusted parameters (advantage; drawelo; prior; scale): (32.0745; 118.751; 0.1; 1)

ChessWar                    Estimated Denominator   95% Confidence Interval
ELOStat                     300.45                  (289.79, 311.12)
Ordo                        372.58                  (359.00, 386.16)
Ordo with offset (28.34)    376.71                  (364.29, 389.14)
Bayeselo default            304.94                  (292.27, 317.61)
Bayeselo adjusted           374.35                  (359.74, 388.96)
Bayeselo adjusted parameters (advantage; drawelo; prior; scale): (33.2002; 104.259; 0.1; 1)

The reports and graphs (with the Bayeselo results corrected) can be found at http://www.mediafire.com/?io6p0m4uaupt1he
Very good, Adam, thanks. I hope that from now on CCRL (and others) will use either Ordo or the adjusted Bayeselo. I didn't know that Bayeselo has a "scale" parameter that fixes this. Had somebody revealed it earlier, the endless discussions about Bayeselo's rating compression (a real compression, denied by many with some very complicated arguments) would have been shorter.
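To make the numbers in the tables concrete: a fitted denominator D means the observed scores follow E = 1/(1 + 10^(-d/D)) instead of the nominal D = 400, so rating differences on that list are shrunk by roughly D/400. The sketch below is plain Python with hypothetical helper names; it is not how Bayeselo's scale setting works internally. It computes that factor and stretches a list of ratings back onto the 400 scale around their mean, assuming the list is well described by a single logistic.

```python
def expected_score(diff, denominator=400.0):
    """Logistic expected score for a rating difference `diff` with denominator D."""
    return 1.0 / (1.0 + 10.0 ** (-diff / denominator))


def compression_factor(fitted_denominator, nominal=400.0):
    """Factor by which rating differences are shrunk relative to the nominal scale."""
    return fitted_denominator / nominal


def rescale(ratings, fitted_denominator, nominal=400.0):
    """Stretch ratings about their mean so the fitted denominator becomes `nominal`.

    Hypothetical post-processing step: assumes one logistic with denominator
    `fitted_denominator` describes the whole list.
    """
    mean = sum(ratings) / len(ratings)
    k = nominal / fitted_denominator
    return [mean + k * (r - mean) for r in ratings]


if __name__ == "__main__":
    # Bayeselo default on IPON: D = 320.46, so differences are about 80% of nominal.
    print(round(compression_factor(320.46), 3))
    low, high = rescale([0.0, 100.0], 320.46)
    print(round(high - low, 1))
```

With D = 320.46, a 100-point gap on that list corresponds to roughly 125 points on a 400-denominator scale, and both give the same expected score by construction.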
There is a problem with the ChessWar results: even the Ordo (offset) and Bayeselo (adjusted) denominators stay around 375, and their 95% intervals exclude 400. It may have two sources. First, as you noted, the engines are sparsely connected. Second, and more intriguing, ChessWar spans a larger range of strength, so any deviation from the logistic curve in the tails may be more visible.
Could you please do the following? For the ChessWar results, plot the points for Ordo (with offset) and Bayeselo (adjusted) against the logistic curve with a denominator of exactly 400. I want to see whether they match in smaller regions around zero Elo difference. Then, if you can, take one or more subsets of the ChessWar results covering a _smaller_ range of strength (comparable to IPON, say) and estimate the denominator for Ordo (offset) and Bayeselo (adjusted). The reason I somewhat suspect that the curve is not exactly logistic in the tails is that I ran some very small experiments with engines spanning a wide range, and the relationships between scores that are specific to the logistic broke down at the tails. My sample was too small to be conclusive, though.
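One relationship specific to the logistic model, and presumably the kind meant here, is that win odds multiply along a chain: if E/(1 - E) = 10^(d/D), then the odds of A against C equal the odds of A against B times the odds of B against C, whatever the value of D. A minimal sketch of that check follows; the function names are illustrative and not taken from any rating tool.

```python
import math


def odds(score):
    """Win odds E / (1 - E) for an expected score strictly between 0 and 1."""
    return score / (1.0 - score)


def predicted_score_ac(score_ab, score_bc):
    """Score of A vs C implied by A vs B and B vs C under any logistic model."""
    o = odds(score_ab) * odds(score_bc)
    return o / (1.0 + o)


def implied_diff(score, denominator=400.0):
    """Rating difference implied by a score on a logistic scale with denominator D."""
    return denominator * math.log10(odds(score))


if __name__ == "__main__":
    # A scores 75% against B, and B scores 75% against C.
    print(round(predicted_score_ac(0.75, 0.75), 6))
    # The implied differences add: twice the 75% gap equals the 90% gap.
    print(round(implied_diff(0.75), 1), round(implied_diff(0.9), 1))
```

Comparing an observed A-versus-C score with `predicted_score_ac` for engines far apart in strength is one way to probe the tails. With only a handful of games, sampling noise alone can produce sizable gaps.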

Thanks, and thanks to Miguel for Ordo, which confirmed my suspicions about Bayeselo (default) and ELOStat. I hope Remi sets the Bayeselo defaults to the adjusted values.

Kai

I will see what I can do for you, Kai. I did try a smaller subset (centered on 0) of the ChessWar data, but the denominator did not change much. Possibly that is because the regressions are weighted: most of the points come from matches where the Elo difference is not too large, so those points already dominated the regression. But I will spend some more time on this. Perhaps it will become clear why the ChessWar ratings diverge from the logistic model.
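The posts do not say exactly how the denominators were estimated, so the following is only a sketch under an assumed setup: each data point is a pairing with rating difference d, observed score s and n games, and D is chosen to minimise the game-weighted squared error between s and 1/(1 + 10^(-d/D)). Because the weights are game counts, pairings with many games (usually the closer matches) dominate the fit, which is the effect described above.

```python
import math


def expected_score(diff, denominator):
    """Logistic expected score for rating difference `diff` and denominator D."""
    return 1.0 / (1.0 + 10.0 ** (-diff / denominator))


def weighted_sse(denominator, diffs, scores, games):
    """Sum over pairings of games * (observed - predicted)^2."""
    return sum(n * (s - expected_score(d, denominator)) ** 2
               for d, s, n in zip(diffs, scores, games))


def fit_denominator(diffs, scores, games, lo=100.0, hi=1000.0, tol=1e-6):
    """Golden-section search for the D minimising the weighted SSE on [lo, hi].

    Assumes the error is unimodal in D on that interval.
    """
    def f(x):
        return weighted_sse(x, diffs, scores, games)

    invphi = (math.sqrt(5.0) - 1.0) / 2.0  # 1/phi, about 0.618
    a, b = lo, hi
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:          # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else:                # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


if __name__ == "__main__":
    # Synthetic, noise-free data generated with D = 400 (not real list data).
    diffs = [-300.0, -150.0, -50.0, 50.0, 150.0, 300.0]
    games = [10, 40, 200, 200, 40, 10]
    scores = [expected_score(x, 400.0) for x in diffs]
    print(round(fit_denominator(diffs, scores, games), 2))
```

On real data the observed scores are noisy and the fitted D comes with a confidence interval like those in the tables; the sketch only shows the mechanics of the weighting.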

Post by **Laskos** » Sun Jul 15, 2012 6:12 am

If I understand correctly, Bayeselo compresses the ratings by default because mm is not set to 1 1, and even when mm is set to 1 1, it still compresses them because scale is not set to 1. I don't see why one should worry so much about small numbers of games when chess-specific knowledge of the results might be more useful.
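For reference, Rémi Coulom's description of the Bayeselo model uses f(x) = 1/(1 + 10^(x/400)), P(white wins) = f(eloBlack - eloWhite - eloAdvantage + eloDraw), P(black wins) = f(eloWhite - eloBlack + eloAdvantage + eloDraw), and P(draw) = 1 - P(white wins) - P(black wins). The sketch below implements those formulas (the function names are mine) and compares the resulting expected score with a plain 400-denominator logistic. With drawelo > 0 the curve is flatter, so ratings from this model need not sit on the same scale as a plain D = 400 logistic, which is the kind of mismatch being measured in this thread. The sketch does not model the scale setting or how Bayeselo computes its default, and the drawelo of 150 is purely illustrative.

```python
def f(delta):
    """Bayeselo's logistic: 1 / (1 + 10^(delta/400))."""
    return 1.0 / (1.0 + 10.0 ** (delta / 400.0))


def outcome_probs(elo_white, elo_black, advantage=0.0, drawelo=0.0):
    """(P(white wins), P(draw), P(black wins)) under the published Bayeselo model."""
    p_white = f(elo_black - elo_white - advantage + drawelo)
    p_black = f(elo_white - elo_black + advantage + drawelo)
    return p_white, 1.0 - p_white - p_black, p_black


def expected_white_score(diff, advantage=0.0, drawelo=0.0):
    """Expected score of White when White is rated `diff` points above Black."""
    p_white, p_draw, _ = outcome_probs(diff, 0.0, advantage, drawelo)
    return p_white + 0.5 * p_draw


def plain_logistic(diff, denominator=400.0):
    """Plain logistic expected score with denominator D and no draw model."""
    return 1.0 / (1.0 + 10.0 ** (-diff / denominator))


if __name__ == "__main__":
    for drawelo in (0.0, 150.0):  # 150 is an illustrative value, not a fitted one
        print(drawelo,
              round(expected_white_score(200.0, drawelo=drawelo), 4),
              round(plain_logistic(200.0), 4))
```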
The idea that the curve is not exactly logistic but tends toward a Gaussian is weakly supported by the "Absolute Error vs Elo Difference" plots in your PDF files. In the ChessWar case, the tail errors seem to be point-symmetric (centro-symmetric) about zero difference. The smaller-range IPON and CCRL errors do not show this.
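To see what "tends toward a Gaussian" would look like, one can compare the logistic expected score with a normal-CDF curve whose slope at zero difference is matched to it: the logistic slope there is ln(10)/(4D) and the normal one is 1/(σ√(2π)). Matching the slopes is an assumption made for the comparison. Near zero the two curves nearly coincide; in the tails the Gaussian approaches 0 and 1 faster. Both curves satisfy E(-d) = 1 - E(d), so fitting one to data generated by the other leaves signed errors that are point-symmetric about zero, consistent with the pattern described above.

```python
import math

LN10 = math.log(10.0)


def logistic_score(diff, denominator=400.0):
    """Logistic expected score 1 / (1 + 10^(-d/D))."""
    return 1.0 / (1.0 + 10.0 ** (-diff / denominator))


def matched_sigma(denominator=400.0):
    """Sigma giving the normal CDF the same slope at d = 0 as the logistic."""
    # logistic slope at 0: ln(10) / (4 D); normal CDF slope at 0: 1 / (sigma sqrt(2 pi))
    return 4.0 * denominator / (LN10 * math.sqrt(2.0 * math.pi))


def gaussian_score(diff, sigma):
    """Normal-CDF expected score Phi(d / sigma)."""
    return 0.5 * (1.0 + math.erf(diff / (sigma * math.sqrt(2.0))))


if __name__ == "__main__":
    sigma = matched_sigma()  # about 277 Elo for D = 400
    for d in range(0, 801, 100):
        lg, gs = logistic_score(d), gaussian_score(d, sigma)
        print(f"{d:4d}  logistic {lg:.4f}  gaussian {gs:.4f}  diff {gs - lg:+.4f}")
```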

Kai