Daniel Shawul wrote:
Note that the right way to test the possible models is to find the model that gives the smaller error function when predicting results.
Basically, you have a rating r that you calculate from the games played before the game in question, and you also have a function for the expected result, that is
expected_result(rating_white,rating_black)
You calculate an error function that is the sum of the squares of the differences between expected_result and observed_result, and the best model is the one that gives the smallest value of the error function over many games.
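The error function described above can be sketched as follows. This is only an illustration: the logistic Elo curve used for expected_result and the (rating_white, rating_black, observed_result) game format are assumptions, not something specified in the post. Any candidate model can be passed in as `predict`.

```python
def expected_result(rating_white, rating_black):
    """Expected score for white (0..1).

    Assumption: the standard logistic Elo curve; the post does not say
    which expectation function is meant.
    """
    return 1.0 / (1.0 + 10 ** ((rating_black - rating_white) / 400))


def squared_error(games, predict=expected_result):
    """Sum of squared differences between predicted and observed results.

    `games` is a hypothetical list of (rating_white, rating_black, observed)
    tuples, where observed is 1.0 (white win), 0.5 (draw) or 0.0 (loss) and
    the ratings are computed only from games played before this one.
    """
    return sum((predict(w, b) - observed) ** 2 for w, b, observed in games)


if __name__ == "__main__":
    sample = [(2400, 2400, 1.0), (2400, 2400, 0.5)]
    print(squared_error(sample))
```

The model with the smaller total over the same set of games would be preferred under this criterion.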
This is exactly what is done in the paper, namely cross-validation tests, but I am sure there are better Bayesian model selection approaches. Knowing how hard it is to prove which model is best, it baffles me that some here 'clearly' see how the Davidson model is better.
That has been my qualm: you can't really know until you run the test with all draw models. We did the standard cross-10, i.e. partition the data set into 10 groups, train on 9 of them and test on the remaining one. DV showed a better match there and also for cross-2 and cross-4, on the CCRL/CEGT blitz and standard time-control ratings, for a total of 4.
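The cross-10 procedure described above (split into 10 groups, train on 9, test on 1) can be sketched roughly like this. The function names and the `fit`/`score` callables are hypothetical placeholders for whichever draw model and error function are being compared; this is not the code used for the paper.

```python
import random


def k_fold_splits(n_items, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)       # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]  # k roughly equal groups
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(games, fit, score, k=10, seed=0):
    """Total held-out error: fit on k-1 groups, score on the remaining one.

    `fit(train_games)` returns a model and `score(model, test_games)` returns
    its error on unseen games; both are assumptions supplied by the caller.
    """
    total = 0.0
    for train_idx, test_idx in k_fold_splits(len(games), k, seed):
        model = fit([games[i] for i in train_idx])
        total += score(model, [games[i] for i in test_idx])
    return total
```

Running cross_validate once per draw model on the same games and the same seed gives comparable held-out errors; cross-2 and cross-4 correspond to k=2 and k=4.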
The models are called 'draw models' but they do have a parameter for home advantage (white/black). It may also be possible to add other modifiers, like game length, to change ratings, so that quick wins give you higher ratings. The point system is the same for all; only the ratings differ. When you say 1W+3D, you get 62.5%, and then you add other necessary values: the draw ratio is 3/4=75%, the game length for the win is 30, the win is with black against a 2300 Elo but all draws are against a 2400 Elo... etc. So all of these will contribute to the predicted rating (strength) of the player, and the different 'draw' models obviously assign different strengths.
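To make the 1W+3D arithmetic concrete, and to show how one draw model turns ratings into win/draw/loss probabilities, here is a rough sketch. The Davidson form below is the textbook one; the conversion from Elo to strength (10^(r/400)), the draw parameter `nu` and the way `white_advantage` is added are assumptions for illustration, not the parameterization used in the paper.

```python
import math


def score_and_draw_ratio(wins, draws, losses):
    """Return (score fraction, draw ratio) for a set of results."""
    games = wins + draws + losses
    return (wins + 0.5 * draws) / games, draws / games


def davidson_probabilities(rating_white, rating_black, nu=1.0, white_advantage=0.0):
    """Return (P(white wins), P(draw), P(black wins)) under a Davidson model.

    Assumptions: strength = 10 ** (rating / 400), `nu` controls how likely
    draws are, and the home advantage is a fixed Elo bonus for white.
    """
    g_white = 10 ** ((rating_white + white_advantage) / 400)
    g_black = 10 ** (rating_black / 400)
    draw_term = nu * math.sqrt(g_white * g_black)
    total = g_white + g_black + draw_term
    return g_white / total, draw_term / total, g_black / total


if __name__ == "__main__":
    print(score_and_draw_ratio(1, 3, 0))       # the 1W+3D example
    print(davidson_probabilities(2300, 2400))  # white 2300 vs black 2400
```

Two players with the same 62.5% score but different draw ratios, colors and opposition would get different fitted strengths, which is the point made above.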
1) For chess program ratings:
Using game length may increase the rating of programs that never resign (or, if the interface adjudicates games based on evaluations, it is going to increase the rating of programs that never show a very bad evaluation).
If you want to use the PGN of the games and not only the results, then it is better to use computer analysis of the games to calculate ratings,
so that both players can earn rating points if they played better than their rating, and both players can lose rating points if they played worse than their rating, based on the computer analysis.
Note that I do not like this idea, because there is a problem in calculating the rating of the strong programs in this way.
For example, if you use Houdini to analyze the games of Houdini, it may increase Houdini's rating, and if you want accurate results from computer analysis you may need significantly more time to analyze the games
than the time that was used to play them.
2) For human chess ratings:
I am against all these ideas in human-human games because they encourage cheating.
Two players can simply prepare their game at home and earn rating points from their draw if you use computer analysis to calculate ratings.
I am also against the idea that 2 draws do not count the same as a win and a loss for the rating or ranking of humans, because I think this idea also encourages cheating (if a win plus a loss is not equal to 2 draws, then players of equal strength have a motive to fix their results before the game so that they get more from their expected 50% result).