Elo Diff is the rating difference between Houdini 2.0c 6 CPU and Numpty Omega.
The equation for the scale is: scale = (4*10^(drawElo/400))/(1+10^(drawElo/400))^2
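As a quick check, the scale equation above can be evaluated directly for the drawElo values reported in this post. This is only a minimal sketch; the function name `scale_from_draw_elo` is my own and is not part of any rating tool.

```python
def scale_from_draw_elo(draw_elo: float) -> float:
    """Evaluate scale = 4*x / (1 + x)^2 with x = 10^(drawElo/400)."""
    x = 10 ** (draw_elo / 400)
    return 4 * x / (1 + x) ** 2


# drawElo values taken from the runs reported in this post
for draw_elo in (97.3, 92.7639, 117.81):
    print(f"drawElo = {draw_elo:<8}  scale = {scale_from_draw_elo(draw_elo):.6f}")
```

The printed values should agree with the scale figures listed for each run to about five decimal places, and a drawElo of 0 gives a scale of exactly 1.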
Settings           Elo Diff   drawElo   scale
PURE LIST
default            1969       97.3      .925497
mm 1 1             1956       92.7639   .931969
mm 1 1, scale 1    2099       92.7639   1
COMPLETE LIST
default            1980       97.3      .925497
mm 1 1             2034       117.81    .893294
mm 1 1, scale 1    2277       117.81    1

The pure list and the complete list were definitely comparable with the default values, because the White advantage, drawElo, and (by default calculation) scale values were the same for both.
When 'mm 1 1' alone is used, we start to see some differences. The average Elo difference between opponents is greater in the pure database than in the complete one, leading to a lower draw rate (and a lower drawElo value). The Elo difference between Numpty and Houdini actually decreases somewhat compared with the pure list at default values. The scale increases, but the lower drawElo value causes a lower Elo difference estimate for any particular result. Meanwhile, the difference between Numpty and Houdini on the complete list increases due to the higher drawElo value.
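The link between Elo difference and draw rate can be illustrated with a small sketch. It assumes the usual logistic model with a drawElo term (win probability f(diff - drawElo), loss probability f(-diff - drawElo), draw as the remainder) and sets the White advantage to 0 for simplicity. Both the model form and the function names are assumptions on my part, not something taken from the runs above.

```python
def logistic(delta: float) -> float:
    """Standard Elo expectation curve: 1 / (1 + 10^(-delta/400))."""
    return 1 / (1 + 10 ** (-delta / 400))


def draw_probability(elo_diff: float, draw_elo: float, advantage: float = 0.0) -> float:
    """Draw probability under the assumed logistic-with-drawElo model."""
    p_win = logistic(elo_diff + advantage - draw_elo)
    p_loss = logistic(-elo_diff - advantage - draw_elo)
    return 1 - p_win - p_loss


# For a fixed drawElo, a larger rating gap leaves less room for draws.
for diff in (0, 200, 400):
    print(f"Elo diff {diff:>3}: P(draw) = {draw_probability(diff, 97.3):.3f}")
```

Under these assumptions the draw probability falls as the gap between opponents grows, which fits the observation that a database with larger average Elo differences shows fewer draws. It does not by itself show how the fitted drawElo value responds.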
When scale is set to 1, the Elo difference between the reference engines increases for both lists.
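The size of that increase looks consistent with the scale acting as a simple multiplier on the rating differences. The check below is only a sketch under that assumption, using the figures reported above; the helper name `rescaled_diff` is hypothetical.

```python
def rescaled_diff(unscaled_diff: float, scale: float) -> float:
    """Assumed relation: scaled Elo difference = unscaled difference * scale."""
    return unscaled_diff * scale


# (Elo diff with scale 1, computed scale, Elo diff reported with 'mm 1 1' alone)
runs = {
    "pure": (2099, 0.931969, 1956),
    "complete": (2277, 0.893294, 2034),
}

for name, (unscaled, scale, reported) in runs.items():
    estimate = rescaled_diff(unscaled, scale)
    print(f"{name}: {unscaled} * {scale} = {estimate:.1f} (reported: {reported})")
```

In both cases the product lands within a point of the 'mm 1 1' figure, so setting the scale to 1 appears simply to remove this compression factor.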
I sometimes forget that I have previously thought over certain things. I remember now that I decided to worry a little less about the effects of setting the scale to 1, because using 'mm 1 1' does make comparisons between different types of databases less reasonable.
The CCRL 40/4 database (complete or pure) has different characteristics from IPON. It makes complete sense to me to use the most accurate settings to compute the IPON ratings, because it is a more compact database. The IPON database is also more homogeneous than the CCRL 40/4 list: the characteristics of its higher Elo games and lower Elo games are not too different. For the CCRL 40/4, by contrast, the average draw rate does not reflect the draw rates at the top or bottom of the list very well. Thus, I do not think the drawElo value computed from the 40/4 games necessarily gives much more accurate ratings. It surely must be more accurate than using the default values, but given the more heterogeneous nature of the 40/4 database, maybe not by very much.
I have been striving to make the CCRL ratings more accurate, but maybe any improvement that has resulted from my suggestions is not worth it.