Normalized ELO, as described in http://hardy.uhasselt.be/Toga/normalized_elo.pdf, has nice statistical properties; for example, its inverse square gives (up to a constant factor) the number of games needed to reach the same statistical significance of the result. But one condition for it to be consistent in rating schemes is additivity in the case of several engines, and this has to be checked:
For engines 1, 2 and 3:
Norm_ELO(2,1) + Norm_ELO(3,2) = Norm_ELO(3,1)
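A minimal Python sketch of this additivity check, using three engines from the data further down (R1, T1, R2 as engines 1, 2, 3). It assumes the definition nElo = (800/ln 10) * (mu - 1/2) / sigma, where mu is the mean score per game and sigma the standard deviation of a single game's score; that is my reading of the linked paper, so check it there. The helpers score_mean_var, normalized_elo and games_needed are hypothetical names, not from any library.

```python
import math

# ASSUMED definition (check against the linked paper):
# nElo = (800 / ln 10) * (mu - 1/2) / sigma
NELO_SCALE = 800 / math.log(10)


def score_mean_var(wins, draws, losses):
    """Mean score per game and per-game variance (game scores are 1, 1/2, 0)."""
    n = wins + draws + losses
    mu = (wins + 0.5 * draws) / n
    var = (wins * (1 - mu) ** 2 + draws * (0.5 - mu) ** 2 + losses * mu ** 2) / n
    return mu, var


def normalized_elo(wins, draws, losses):
    """Normalized ELO of the first side; undefined if every game had the same result."""
    mu, var = score_mean_var(wins, draws, losses)
    return NELO_SCALE * (mu - 0.5) / math.sqrt(var)


def games_needed(wins, draws, losses, z=2.0):
    """Games for a result z standard errors away from 50%.

    The per-game t-value is t = (mu - 1/2) / sigma = nElo / NELO_SCALE,
    and after N games z = t * sqrt(N), so N = (z / t) ** 2.
    """
    t = normalized_elo(wins, draws, losses) / NELO_SCALE
    return (z / t) ** 2


# (W, D, L) of the stronger engine, copied from the table: 1 = R1, 2 = T1, 3 = R2.
t1_vs_r1 = (4935, 993, 1072)
r2_vs_t1 = (6302, 332, 366)
r2_vs_r1 = (6852, 83, 65)

chained = normalized_elo(*t1_vs_r1) + normalized_elo(*r2_vs_t1)
direct = normalized_elo(*r2_vs_r1)
print(f"Norm_ELO(2,1) + Norm_ELO(3,2) = {chained:.1f}")
print(f"Norm_ELO(3,1)                 = {direct:.1f}")
print(f"games for 2 sigma, T1 vs R1:    {games_needed(*t1_vs_r1):.1f}")
```

If the printed sum and the direct value differ substantially, the normalized scale is not additive for these engines.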
In this thread I used a database to check whether the Logistic ELO or the Gaussian ELO model is more adequate for engines:
http://www.talkchess.com/forum/viewtopic.php?t=60791
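The two models can be compared through their expected-score curves. A small sketch, with the Logistic model as 1/(1 + 10^(-D/400)) and a Gaussian model whose scale is an assumption: a standard deviation of 200*sqrt(2) for the difference, as in Elo's original normal model. Other Gaussian scalings are possible; the function names are hypothetical.

```python
import math
from statistics import NormalDist

# ASSUMPTION: Gaussian scale from Elo's original normal model
# (sigma = 200 per player, so 200 * sqrt(2) for the difference of two players).
GAUSS_SIGMA = 200 * math.sqrt(2)


def logistic_expected(diff):
    """Expected score of the stronger side for an ELO difference, logistic model."""
    return 1 / (1 + 10 ** (-diff / 400))


def gaussian_expected(diff, sigma=GAUSS_SIGMA):
    """Expected score of the stronger side for an ELO difference, Gaussian model."""
    return NormalDist(0, sigma).cdf(diff)


# Print the weaker side's expected score, where the models differ most.
for diff in (200, 400, 800, 1400):
    weak_log = 1 - logistic_expected(diff)
    weak_gauss = 1 - gaussian_expected(diff)
    print(f"{diff:5d}  logistic {weak_log:.2e}  gaussian {weak_gauss:.2e}")
```

Around 200 ELO the two curves nearly coincide; at large differences the weaker side's expected score under the two models differs by orders of magnitude, which is why large ELO spans are needed to separate them.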
Since I will use the same database (plus one more from this past weekend), I repost the OP:
I had different engines (Stockfish, Texel, Andscacs, etc.) play a massive number of games, 105,000 in total, in a round-robin at fixed nodes, for accuracy. The engines were spaced roughly 200 ELO points apart, so that each individual interval between them is almost linear in ELO versus score and independent of the ELO model. The largest total difference between engines was of the order of 1400 ELO points; I needed large differences because the ELO models differ substantially only at large ELO differences.
For each individual match I computed the total Logistic ELO difference, which spans a large ELO interval; this is the horizontal axis. The consistent ELO is the sum of the small differences between neighbouring engines, cumulated to give the total difference. If the Logistic model is consistent, these two should be equal, and the diagonal from (0,0) to (1400,1400) would be the fit. If the Gaussian or another model is more consistent, the dots should deviate from the diagonal.
They do not deviate very much. The Gaussian model seems ruled out, and the Logistic ELO model for computer chess engines seems to hold up well in this test. My earlier results were mixed because of fewer data points and fewer games per data point.
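The consistency test described above can be sketched as follows, with the pairwise results from the table below hard-coded from the stronger engine's side. For every non-adjacent pair it computes the direct ELO difference of that match and the sum of the adjacent differences between the two engines (adjacent pairs agree trivially, so they are skipped); under a consistent model the two numbers should agree. The Gaussian variant uses the same assumed 200*sqrt(2) scale and is only there for comparison; all function names are hypothetical.

```python
import math
from statistics import NormalDist

ORDER = ["SF2", "T2", "R2", "T1", "R1", "Ha1"]  # strongest first, as in the table
# (stronger, weaker) -> (wins, draws, losses) of the stronger engine, from the table
RESULTS = {
    ("SF2", "T2"): (4574, 969, 1457), ("SF2", "R2"): (6625, 248, 127),
    ("SF2", "T1"): (6950, 44, 6), ("SF2", "R1"): (6989, 10, 1),
    ("SF2", "Ha1"): (6996, 4, 0), ("T2", "R2"): (6355, 308, 337),
    ("T2", "T1"): (6968, 29, 3), ("T2", "R1"): (6989, 9, 2),
    ("T2", "Ha1"): (6991, 8, 1), ("R2", "T1"): (6302, 332, 366),
    ("R2", "R1"): (6852, 83, 65), ("R2", "Ha1"): (6910, 45, 45),
    ("T1", "R1"): (4935, 993, 1072), ("T1", "Ha1"): (5750, 554, 696),
    ("R1", "Ha1"): (4527, 571, 1902),
}


def score(wins, draws, losses):
    mu = (wins + 0.5 * draws) / (wins + draws + losses)
    if not 0 < mu < 1:
        raise ValueError("ELO difference is infinite for a 0% or 100% score")
    return mu


def logistic_elo(wins, draws, losses):
    mu = score(wins, draws, losses)
    return 400 * math.log10(mu / (1 - mu))


def gaussian_elo(wins, draws, losses, sigma=200 * math.sqrt(2)):
    # ASSUMED scale: Elo's original normal model, 200 * sqrt(2) for the difference.
    return NormalDist(0, sigma).inv_cdf(score(wins, draws, losses))


def consistency_points(elo_fn, order=ORDER, results=RESULTS):
    """(pair, direct difference, sum of adjacent differences) for non-adjacent pairs."""
    adjacent = [elo_fn(*results[(a, b)]) for a, b in zip(order, order[1:])]
    points = []
    for i in range(len(order)):
        for j in range(i + 2, len(order)):
            direct = elo_fn(*results[(order[i], order[j])])
            points.append(((order[i], order[j]), direct, sum(adjacent[i:j])))
    return points


for name, fn in (("logistic", logistic_elo), ("gaussian", gaussian_elo)):
    for (a, b), direct, cumulative in consistency_points(fn):
        print(f"{name:8s} {a:>3s}-{b:<3s} direct {direct:7.1f}  cumulative {cumulative:7.1f}")
```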
The data:
Individual statistics:

1 SF2 : 2381 35000 (+32134,=1275,-1591), 93.6 %
    T1 : 7000 (+6950,= 44,- 6), 99.6 %
    Ha1 : 7000 (+6996,= 4,- 0), 100.0 %
    T2 : 7000 (+4574,=969,-1457), 72.3 %
    R2 : 7000 (+6625,=248,-127), 96.4 %
    R1 : 7000 (+6989,= 10,- 1), 99.9 %

2 T2 : 2232 35000 (+28760,=1323,-4917), 84.1 %
    SF2 : 7000 (+1457,=969,-4574), 27.7 %
    T1 : 7000 (+6968,= 29,- 3), 99.8 %
    Ha1 : 7000 (+6991,= 8,- 1), 99.9 %
    R2 : 7000 (+6355,=308,-337), 93.0 %
    R1 : 7000 (+6989,= 9,- 2), 99.9 %

3 R2 : 2051 35000 (+20528,=1016,-13456), 60.1 %
    SF2 : 7000 (+127,=248,-6625), 3.6 %
    T1 : 7000 (+6302,=332,-366), 92.4 %
    Ha1 : 7000 (+6910,= 45,- 45), 99.0 %
    T2 : 7000 (+337,=308,-6355), 7.0 %
    R1 : 7000 (+6852,= 83,- 65), 98.5 %

4 T1 : 1898 35000 (+11060,=1952,-21988), 34.4 %
    SF2 : 7000 (+ 6,= 44,-6950), 0.4 %
    Ha1 : 7000 (+5750,=554,-696), 86.1 %
    T2 : 7000 (+ 3,= 29,-6968), 0.2 %
    R2 : 7000 (+366,=332,-6302), 7.6 %
    R1 : 7000 (+4935,=993,-1072), 77.6 %

5 R1 : 1778 35000 (+5667,=1666,-27667), 18.6 %
    SF2 : 7000 (+ 1,= 10,-6989), 0.1 %
    T1 : 7000 (+1072,=993,-4935), 22.4 %
    Ha1 : 7000 (+4527,=571,-1902), 68.8 %
    T2 : 7000 (+ 2,= 9,-6989), 0.1 %
    R2 : 7000 (+ 65,= 83,-6852), 1.5 %

6 Ha1 : 1661 35000 (+2644,=1182,-31174), 9.2 %
    SF2 : 7000 (+ 0,= 4,-6996), 0.0 %
    T1 : 7000 (+696,=554,-5750), 13.9 %
    T2 : 7000 (+ 1,= 8,-6991), 0.1 %
    R2 : 7000 (+ 45,= 45,-6910), 1.0 %
    R1 : 7000 (+1902,=571,-4527), 31.2 %
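To reuse these numbers, a small parser for this output can be sketched as below. The regular expressions are written against the lines as posted here, not against any documented format of the rating tool that produced them, and parse_individual_statistics and check_mirrored are hypothetical names.

```python
import re

# Patterns for the two kinds of lines in the table above (ASSUMED layout, as posted).
HEADER = re.compile(
    r"^\s*(\d+)\s+(\S+)\s*:\s*(\d+)\s+(\d+)\s+"
    r"\(\+\s*(\d+),=\s*(\d+),-\s*(\d+)\),\s*([\d.]+)\s*%")
OPPONENT = re.compile(
    r"^\s*(\S+)\s*:\s*(\d+)\s+\(\+\s*(\d+),=\s*(\d+),-\s*(\d+)\),\s*([\d.]+)\s*%")


def parse_individual_statistics(text):
    """Return ({engine: rating}, {(engine, opponent): (wins, draws, losses)})."""
    ratings, results = {}, {}
    current = None  # engine whose block we are inside
    for line in text.splitlines():
        m = HEADER.match(line)
        if m:
            current = m.group(2)
            ratings[current] = int(m.group(3))
            continue
        m = OPPONENT.match(line)
        if m and current is not None:
            wins, draws, losses = (int(m.group(k)) for k in (3, 4, 5))
            results[(current, m.group(1))] = (wins, draws, losses)
    return ratings, results


def check_mirrored(results):
    """Each pairing should appear twice, with wins and losses swapped."""
    return all(results.get((b, a)) == (l, d, w)
               for (a, b), (w, d, l) in results.items())
```

Applied to the full table, check_mirrored should hold, and the game counts add up to 6 * 35,000 / 2 = 105,000.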

