Why is it so hard to make a fair comparison between Ordo and Bayeselo (the state of the art) and list the improvements, rather than leaving testers in endless confusion ('scale 1') in the hope that people choose other tools? There is a name for such intellectual dishonesty: ignoring other people's work entirely and claiming your work started from scratch. People would like to know whether there are improvements over the state of the art, not whether someone has reinvented the wheel yet again.

I am going to speak up here. Not because I am trying to be in opposition to you, nor to promote one ratings program over another (I use all three, depending on the situation). In fact, neither author has tried to promote his program over the other.
Now you are replying to an old post??? I am not the one who has a webpage promoting Ordo, is effectively a co-author, and makes unfair comparisons. Someone like me is going to speak up when you base your conclusions on things you introduced yourself (scale=1).

As far as I can tell, the only person who has treated this as a popularity contest is you. If some person, rightly or wrongly, says there is some problem with Bayeselo, you go out of your way to denigrate Ordo in the act of defending Bayeselo. If this is not a popularity contest, then we should be able to talk about the strengths and weaknesses of both tools in a rational manner, without the need to resort to rhetoric. The best way to judge between the two tools is to check their predictive power. I started on this a few weeks ago, but I have been more interested in studying material imbalances and parameter tuning lately. If necessary, I can start back on it. I can tell you that both tools do well with well-connected sets of games, as any decent rating tool should.
Ordo doesn't have draw models.
Ordo doesn't have a White-advantage parameter (at least it didn't before it picked that up from Bayeselo).
Ordo doesn't have LOS (likelihood of superiority).
Ordo doesn't have error bars.
Ordo doesn't have a lot of other features.
Ordo has inferior Elo algorithms.
Now why would it be any better if it has inferior algorithms?? That boggles my mind. If you think it is better, list what makes it better, like I did for the opposite.
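For readers trying to follow the substance rather than the heat: the draw model, White advantage, and LOS listed above are concrete, published quantities. Below is a minimal Python sketch of the win/draw/loss model documented on the Bayeselo page (with eloAdvantage and eloDraw at their documented defaults) plus the usual normal-approximation LOS formula. This is an illustration of the formulas, not Bayeselo's actual code.

```python
import math

# Sketch of the win/draw/loss model documented for Bayeselo (not its actual
# code). eloAdvantage (first-move advantage) and eloDraw are the extra
# parameters debated in this thread; the values are the documented defaults.
ELO_ADVANTAGE = 32.8
ELO_DRAW = 97.3

def f(delta):
    """Logistic curve on the Elo scale."""
    return 1.0 / (1.0 + 10.0 ** (delta / 400.0))

def wdl_probabilities(elo_white, elo_black,
                      advantage=ELO_ADVANTAGE, draw_elo=ELO_DRAW):
    """Win/draw/loss probabilities for White under the Bayeselo model."""
    p_white_win = f(elo_black - elo_white - advantage + draw_elo)
    p_black_win = f(elo_white - elo_black + advantage + draw_elo)
    return p_white_win, 1.0 - p_white_win - p_black_win, p_black_win

def los(wins, losses):
    """Likelihood of superiority: P(A is stronger than B), using the common
    normal approximation that ignores draws."""
    return 0.5 * (1.0 + math.erf((wins - losses)
                                 / math.sqrt(2.0 * (wins + losses))))

# Two equal engines still produce a realistic draw rate under this model:
print(wdl_probabilities(2600, 2600))  # roughly (0.41, 0.27, 0.32)
print(los(60, 40))                    # ~0.977: A is very likely stronger
```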
ROFL, you are still pushing scale=1. The best thing for you would be to ask the author and educate yourself. Remi never said to use that, and in fact explained to you why your comparison was bullshit, but you are still pushing it after all, since it suits your purpose.

Getting to Larry's question, the extra parameters of Bayeselo are like a double-edged sword. Using the default values keeps the scale of ratings from different databases nearly the same, which allows comparison of ratings across databases. But that throws out much of the extra information that Bayeselo's refinements can wring from a database. In other words, using the default values can make the ratings less accurate than those from the simpler logistic model. Using the estimated parameter values may make the ratings more accurate, but then they become less useful (not comparable to ratings from other databases, due to the dependency on the draw rate).
The best thing to do (if applicable) is to combine all of the games together, use the estimated values for the White advantage and drawElo, and let scale=1.
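To make the scale dependence concrete, here is a small numeric sketch using the same model formulas as above (with the White advantage set to zero for simplicity): the rating gap that corresponds to a fixed 60% score stretches as drawElo grows, so ratings fitted with different drawElo values do not live on the same scale.

```python
def f(delta):
    return 1.0 / (1.0 + 10.0 ** (delta / 400.0))

def expected_score(delta, draw_elo):
    """Expected score at rating lead `delta` under the Bayeselo draw model
    (White advantage set to 0 here for simplicity)."""
    p_win = f(-delta + draw_elo)
    p_loss = f(delta + draw_elo)
    return p_win + 0.5 * (1.0 - p_win - p_loss)

def delta_for_score(target, draw_elo, lo=0.0, hi=1000.0):
    """Invert expected_score by bisection: which rating gap yields `target`?"""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if expected_score(mid, draw_elo) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# The same 60% score maps to quite different rating gaps depending on drawElo:
print(delta_for_score(0.60, draw_elo=0.0))    # ~70 Elo (pure logistic model)
print(delta_for_score(0.60, draw_elo=200.0))  # ~96 Elo (high draw rate)
```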
However, if the draw rate varies across the database, it is not clear that any single drawElo value, whether the default or one estimated from the database, produces more accurate ratings than the simpler logistic model without draws. The only way to be certain (as far as I know) is to check the predictive power of the ratings via cross-validation on the particular database.
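As a minimal sketch of what such a cross-validation check could look like, assuming games are already parsed into (player_a, player_b, score) tuples: the fitting routine below is the plain logistic model with a draw scored as half a point, fit by simple gradient ascent. The helper names (fit_ratings, predictive_log_loss, cross_validate) are made up for illustration and are not taken from either tool.

```python
import math
import random

def expected(r_a, r_b):
    """Expected score of player a in the plain logistic (no-draw) Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

def fit_ratings(games, players, iters=2000, lr=20.0):
    """Fit ratings by gradient ascent on the log-likelihood, with a draw
    scored as half a win (the 'simpler logistic model without draws')."""
    ratings = {p: 0.0 for p in players}
    for _ in range(iters):
        grad = {p: 0.0 for p in players}
        for a, b, s in games:  # s = 1 / 0.5 / 0 from a's point of view
            e = expected(ratings[a], ratings[b])
            grad[a] += s - e
            grad[b] -= s - e
        # The constant ln(10)/400 of the true gradient is absorbed into lr.
        for p in players:
            ratings[p] += lr * grad[p] / max(1, len(games))
        mean = sum(ratings.values()) / len(ratings)
        for p in players:  # anchor the scale: mean rating = 0
            ratings[p] -= mean
    return ratings

def predictive_log_loss(ratings, games):
    """Average negative log-likelihood of held-out games; lower is better."""
    total = 0.0
    for a, b, s in games:
        e = expected(ratings[a], ratings[b])
        e = min(max(e, 1e-9), 1.0 - 1e-9)
        total -= s * math.log(e) + (1 - s) * math.log(1 - e)
    return total / len(games)

def cross_validate(games, players, folds=5):
    """k-fold split: fit on the training games, score the held-out games."""
    random.shuffle(games)
    n = len(games) // folds
    losses = []
    for k in range(folds):
        test = games[k * n:(k + 1) * n]
        train = games[:k * n] + games[(k + 1) * n:]
        ratings = fit_ratings(train, players)
        losses.append(predictive_log_loss(ratings, test))
    return sum(losses) / folds
```

Feeding the same train/test splits through Ordo's and Bayeselo's own ratings and comparing the held-out log-loss would give exactly the apples-to-apples predictive comparison suggested above.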
Nope, they aren't even comparable. This is another promotion that leaves the users in doubt. From an algorithm or feature point of view there is no comparison at all.

In the end, I think that the best rating tool may depend on the database and on the purpose of the ratings.