Why is it so hard to make a fair comparison between Ordo and Bayeselo (the state of the art) and list the improvements, rather than leaving testers in endless confusion ('scale 1') in the hope that people choose other tools? There is a name for such intellectual dishonesty: ignoring other people's work entirely and claiming your work started from scratch. People would like to know whether there are improvements over the state of the art, not whether someone has reinvented the wheel yet again.

I am going to speak up here. Not because I am trying to be in opposition to you, nor to promote one ratings program over another (I use all three, depending on the situation). In fact, neither author has tried to promote his program over the other.
Now you are replying to an old post??? I am not the one who has a webpage promoting Ordo, is effectively a co-author, and makes unfair comparisons. Someone like me is going to speak up when you base your conclusions on things you introduced yourself (scale=1).

As far as I can tell, the only person who has treated this as a popularity contest is you. If some person, rightly or wrongly, says there is some problem with Bayeselo, you go out of your way to denigrate Ordo in the act of defending Bayeselo. If this is not a popularity contest, then we should be able to talk about the strengths and weaknesses of both tools in a rational manner, without the need to resort to rhetoric. The best way to judge between the two tools is to check their predictive power. I started on this a few weeks ago, but I have been more interested in studying material imbalances and parameter tuning lately. If necessary, I can start back on it. I can tell you that both tools do well with well-connected sets of games, as any decent rating tool should.
Ordo doesn't have draw models.
Ordo doesn't have a White-advantage parameter (at least it didn't before it picked that up from Bayeselo).
Ordo doesn't have LOS (likelihood of superiority).
Ordo doesn't have error bars.
Ordo doesn't have a lot of other features.
Ordo has inferior Elo algorithms.
Now why would it be any better if it has inferior algorithms?? That boggles my mind. If you think it is better, list what makes it better, like I did for the opposite.
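For readers trying to follow the substance rather than the heat: the draw model, White advantage, and LOS listed above are concrete, published quantities. Below is a minimal Python sketch of the win/draw/loss model documented on the Bayeselo page (with eloAdvantage and eloDraw at their documented defaults) plus the usual normal-approximation LOS formula. This is an illustration of the formulas, not Bayeselo's actual code.

```python
import math

# Sketch of the win/draw/loss model documented for Bayeselo (not its actual
# code). eloAdvantage (first-move advantage) and eloDraw are the extra
# parameters debated in this thread; the values are the documented defaults.
ELO_ADVANTAGE = 32.8
ELO_DRAW = 97.3

def f(delta):
    """Logistic curve on the Elo scale."""
    return 1.0 / (1.0 + 10.0 ** (delta / 400.0))

def wdl_probabilities(elo_white, elo_black,
                      advantage=ELO_ADVANTAGE, draw_elo=ELO_DRAW):
    """Win/draw/loss probabilities for White under the Bayeselo model."""
    p_white_win = f(elo_black - elo_white - advantage + draw_elo)
    p_black_win = f(elo_white - elo_black + advantage + draw_elo)
    return p_white_win, 1.0 - p_white_win - p_black_win, p_black_win

def los(wins, losses):
    """Likelihood of superiority: P(A is stronger than B), using the common
    normal approximation that ignores draws."""
    return 0.5 * (1.0 + math.erf((wins - losses)
                                 / math.sqrt(2.0 * (wins + losses))))

# Two equal engines still produce a realistic draw rate under this model:
print(wdl_probabilities(2600, 2600))  # roughly (0.41, 0.27, 0.32)
print(los(60, 40))                    # ~0.977: A is very likely stronger
```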
ROFL, you are still pushing scale=1. The best thing for you would be to ask the author and educate yourself. Remi never said to use that, and in fact explained to you why your comparison was bullshit, but you are still pushing it after all, since it suits your purpose.

Getting to Larry's question, the extra parameters of Bayeselo are like a double-edged sword. Using the default values keeps the scale of ratings from different databases nearly the same, which allows comparison of ratings across databases. But that throws out much of the extra information that Bayeselo's refinements can wring from a database. In other words, using the default values can make the ratings less accurate than those from the simpler logistic model. Using the estimated parameter values may make the ratings more accurate, but then they become less useful (not comparable to ratings from other databases, due to the dependency on the draw rate).
The best thing to do (if applicable) is to combine all of the games together, use the estimated values for the White advantage and drawElo, and let scale=1.
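To make the scale dependence concrete, here is a small numeric sketch using the same model formulas as above (with the White advantage set to zero for simplicity): the rating gap that corresponds to a fixed 60% score stretches as drawElo grows, so ratings fitted with different drawElo values do not live on the same scale.

```python
def f(delta):
    return 1.0 / (1.0 + 10.0 ** (delta / 400.0))

def expected_score(delta, draw_elo):
    """Expected score at rating lead `delta` under the Bayeselo draw model
    (White advantage set to 0 here for simplicity)."""
    p_win = f(-delta + draw_elo)
    p_loss = f(delta + draw_elo)
    return p_win + 0.5 * (1.0 - p_win - p_loss)

def delta_for_score(target, draw_elo, lo=0.0, hi=1000.0):
    """Invert expected_score by bisection: which rating gap yields `target`?"""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if expected_score(mid, draw_elo) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# The same 60% score maps to quite different rating gaps depending on drawElo:
print(delta_for_score(0.60, draw_elo=0.0))    # ~70 Elo (pure logistic model)
print(delta_for_score(0.60, draw_elo=200.0))  # ~96 Elo (high draw rate)
```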
However, if the draw rate varies across the database, it is not clear that any single drawElo value, whether the default or one estimated from the database, produces more accurate ratings than the simpler logistic model without draws. The only way to be certain (as far as I know) is to check the predictive power of the ratings via cross-validation on the particular database.
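As a minimal sketch of what such a cross-validation check could look like, assuming games are already parsed into (player_a, player_b, score) tuples: the fitting routine below is the plain logistic model with a draw scored as half a point, fit by simple gradient ascent. The helper names (fit_ratings, predictive_log_loss, cross_validate) are made up for illustration and are not taken from either tool.

```python
import math
import random

def expected(r_a, r_b):
    """Expected score of player a in the plain logistic (no-draw) Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

def fit_ratings(games, players, iters=2000, lr=20.0):
    """Fit ratings by gradient ascent on the log-likelihood, with a draw
    scored as half a win (the 'simpler logistic model without draws')."""
    ratings = {p: 0.0 for p in players}
    for _ in range(iters):
        grad = {p: 0.0 for p in players}
        for a, b, s in games:  # s = 1 / 0.5 / 0 from a's point of view
            e = expected(ratings[a], ratings[b])
            grad[a] += s - e
            grad[b] -= s - e
        # The constant ln(10)/400 of the true gradient is absorbed into lr.
        for p in players:
            ratings[p] += lr * grad[p] / max(1, len(games))
        mean = sum(ratings.values()) / len(ratings)
        for p in players:  # anchor the scale: mean rating = 0
            ratings[p] -= mean
    return ratings

def predictive_log_loss(ratings, games):
    """Average negative log-likelihood of held-out games; lower is better."""
    total = 0.0
    for a, b, s in games:
        e = expected(ratings[a], ratings[b])
        e = min(max(e, 1e-9), 1.0 - 1e-9)
        total -= s * math.log(e) + (1 - s) * math.log(1 - e)
    return total / len(games)

def cross_validate(games, players, folds=5):
    """k-fold split: fit on the training games, score the held-out games."""
    random.shuffle(games)
    n = len(games) // folds
    losses = []
    for k in range(folds):
        test = games[k * n:(k + 1) * n]
        train = games[:k * n] + games[(k + 1) * n:]
        ratings = fit_ratings(train, players)
        losses.append(predictive_log_loss(ratings, test))
    return sum(losses) / folds
```

Feeding the same train/test splits through Ordo's and Bayeselo's own ratings and comparing the held-out log-loss would give exactly the apples-to-apples predictive comparison suggested above.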
Nope, they aren't even comparable. This is another promotion that leaves the users in doubt. From an algorithm or feature point of view there is no comparison at all.

In the end, I think that the best rating tool may depend on the database and on the purpose of the ratings.