I will see what I can do for you, Kai. I did try a smaller subset (centered on 0) from the ChessWar data, but the denominator did not change much. Possibly that is because the regressions are weighted, and the majority of the points are from matches where the Elo difference is not too large. Those points already had much influence on the regression. But I will spend some more time on this. Possibly it will be seen why the ChessWar ratings diverge from the logistic model.
Laskos wrote:
Very good Adam, thanks. I hope from now on CCRL (and others) will use either Ordo or the adjusted Bayeselo. I didn't know that Bayeselo has a "scale" parameter that fixes it. In fact, if somebody had revealed it earlier, the endless discussions about the rating compression of Bayeselo (a true compression, denied by many with some very complicated arguments) would have been shorter.
Adam Hair wrote:
After setting the scale to 1, the ratings compression I found for the Bayeselo results went away.
Here are the tables again, with the Bayeselo (computed values) results corrected:
IPON                      Estimated Denominator   95% Confidence Interval
ELOStat                   363.87                  (351.30, 376.43)
Ordo                      397.86                  (384.78, 410.94)
Ordo with offset(38.7)    399.64                  (394.63, 404.65)
Bayeselo default          320.46                  (315.47, 325.45)
Bayeselo adjusted         396.09                  (390.13, 402.04)   (50.3872; 167.083; 0.1; 1)
CCRL 40/4                 Estimated Denominator   95% Confidence Interval
Elostat                   374.63                  (366.66, 382.61)
Ordo                      396.61                  (387.59, 405.63)
Ordo with offset(26.35)   399.08                  (393.79, 404.37)
Bayeselo default          361.44                  (355.42, 367.46)
Bayeselo adjusted         400.22                  (392.69, 407.75)   (32.0745; 118.751; 0.1; 1)
The reports and graphs (with the Bayeselo results corrected) can be found at http://www.mediafire.com/?io6p0m4uaupt1he
ChessWar                  Estimated Denominator   95% Confidence Interval
Elostat                   300.45                  (289.79, 311.12)
Ordo                      372.58                  (359.00, 386.16)
Ordo with offset(28.34)   376.71                  (364.29, 389.14)
Bayeselo default          304.94                  (292.27, 317.61)
Bayeselo adjusted         374.35                  (359.74, 388.96)   (33.2002; 104.259; 0.1; 1)
There is a problem with ChessWar results. It may come from two sources: first, as you noted, the engines are sparsely connected. Second, more intriguing, on a larger range of strength in ChessWar, the deviation from the Logistic in the tails may be more visible.
Can you please do the following: for the ChessWar results, plot the dots for Ordo (with offset) and Bayeselo (adjusted) against the curve with Denominator exactly 400? I want to see if there is a match on smaller regions around 0 diff. Then, if you can, with one or more subsets of the ChessWar results on a _smaller_ range (comparable, say, to IPON), try to get estimates of the Denominator for Ordo (offset) and Bayeselo (adjusted). The reason I suspect a bit that the curve in the tails is not exactly logistic is that I performed some very tiny experiments with engines on a wide range, and the relationships between the scores that are specific to the logistic were broken at the tails. My sample was too tiny.
Thanks, and thanks to Miguel for Ordo which confirmed my suspicions about Bayeselo (default) and EloStat. I hope Remi sets the Bayeselo defaults to the adjusted values.
Kai
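The request above (compare against a denominator of exactly 400, then re-estimate the denominator on a narrower range) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the regression Adam actually ran: the thread does not describe his procedure beyond saying it is weighted, so the fit below (weighted least squares through the origin on log10(s / (1 - s)) = diff / D, with games played as weights) is a stand-in, and the names expected_score, estimate_denominator, max_abs_diff and compression_factor are hypothetical.

```python
import math


def expected_score(diff, denominator=400.0):
    """Logistic expected score for a rating difference `diff`."""
    return 1.0 / (1.0 + 10.0 ** (-diff / denominator))


def estimate_denominator(points, max_abs_diff=None):
    """Estimate D in score = 1 / (1 + 10 ** (-diff / D)).

    points: iterable of (rating_diff, score_fraction, games).
    Assumption: games played is used as the regression weight.
    max_abs_diff: optional cut that keeps only points near 0 diff.
    """
    num = 0.0  # sum of w * diff * y
    den = 0.0  # sum of w * diff ** 2
    for diff, score, games in points:
        if max_abs_diff is not None and abs(diff) > max_abs_diff:
            continue
        if not 0.0 < score < 1.0:
            continue  # the logit is undefined for 0% or 100% scores
        y = math.log10(score / (1.0 - score))
        num += games * diff * y
        den += games * diff * diff
    if num <= 0.0:
        raise ValueError("not enough informative points to estimate D")
    return den / num  # slope is num / den, and D = 1 / slope


def compression_factor(fitted_denominator, nominal=400.0):
    """Ratio by which rating differences are shrunk relative to a 400 scale."""
    return fitted_denominator / nominal


# Synthetic points generated from an exact 400 curve (not real match data).
synthetic = [(d, expected_score(d), 100) for d in range(-600, 601, 50)]
print(round(estimate_denominator(synthetic), 2))
print(round(estimate_denominator(synthetic, max_abs_diff=200), 2))
print(round(compression_factor(320.46), 3))  # IPON, Bayeselo default
```

On data that really follow the logistic curve, the full-range and restricted-range estimates agree. If the tails deviate from the logistic, the two estimates would be expected to drift apart, which is what the subset experiment is meant to detect.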
ELOStat, Ordo, and Bayeselo - Part II
Re: ELOStat, Ordo, and Bayeselo - Part II
If I understood, Bayeselo compresses the ratings because mm is not set to 11, and when mm is set to 11, it compresses the ratings because scale is not set to 1. I don't know why one should bother so much about a small number of games, when using chess-specific knowledge of results might be useful.
Adam Hair wrote:
I will see what I can do for you, Kai. I did try a smaller subset (centered on 0) from the ChessWar data, but the denominator did not change much. Possibly that is because the regressions are weighted, and the majority of the points are from matches where the Elo difference is not too large. Those points already had much influence on the regression. But I will spend some more time on this. Possibly it will be seen why the ChessWar ratings diverge from the logistic model.
The idea that the curve is not exactly logistic, but tends towards a Gaussian, shows up a bit in the "Absolute Error vs Elo Difference" plots in your PDF files. In the ChessWar case, the tail errors seem to be centro-symmetric. The smaller-range IPON and CCRL errors do not show this.
Kai
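To make the logistic-versus-Gaussian point concrete, here is a small Python sketch that tabulates both expected-score curves. The Gaussian width is an assumption that does not come from this thread: it uses the classical normal Elo model with a per-player standard deviation of 200, so the difference has sigma = 200 * sqrt(2). The function names are hypothetical.

```python
import math


def logistic_expected(diff, denominator=400.0):
    """Expected score under the logistic model with the given denominator."""
    return 1.0 / (1.0 + 10.0 ** (-diff / denominator))


def gaussian_expected(diff, sigma=200.0 * math.sqrt(2.0)):
    """Expected score under a normal model (normal CDF of diff / sigma).

    Assumption: sigma = 200 * sqrt(2), i.e. each player's performance
    has a standard deviation of 200 Elo.
    """
    return 0.5 * (1.0 + math.erf(diff / (sigma * math.sqrt(2.0))))


# The two curves agree at 0 and separate as the rating difference grows.
for diff in (0, 100, 200, 400, 600, 800):
    lo = logistic_expected(diff)
    ga = gaussian_expected(diff)
    print(f"{diff:4d}  logistic={lo:.4f}  gaussian={ga:.4f}  gap={ga - lo:+.4f}")
```

Because both curves satisfy E(-d) = 1 - E(d), the gap between them changes sign with d, so it is point-symmetric about the origin. If the real results lean Gaussian in the tails, a logistic fit would leave errors with that kind of symmetry, and they would only become visible on a wide rating range such as ChessWar's.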