BayesianElo or Ordo ?

Rebel
Posts: 7231
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

BayesianElo or Ordo ?

Post by Rebel »

I also created an elo progression by year calculation from CEGT 40/20 games and made a comparison with CCRL.

1. CCRL uses BayesElo while CEGT uses Ordo for elo calculation and the differences are quite remarkable.

2. According to CCRL the Elo progress since 2006 is 585 Elo, while for CEGT it is 778 Elo.

3. In 2006 CCRL starts 114 Elo higher than CEGT, but as of 2021 the CEGT rating is 79 Elo higher.

4. Browsing through the years we see an almost fixed pattern of CEGT ratings scoring higher, peaking in 2020 with a difference of 87 Elo, the year of the NNUE revolution that started with Stockfish 12 (see the red-marked year).

5. GM Larry Kaufman hinted this is probably due to BayesElo double counting draws.

6. We found a negative article about BayesElo. We don't have the knowledge to judge it, but it's open for discussion.

http://rebel13.nl/misc/stats.html
90% of coding is debugging, the other 10% is writing bugs.
Modern Times
Posts: 3610
Joined: Thu Jun 07, 2012 11:02 pm

Re: BayesianElo or Ordo ?

Post by Modern Times »

CCRL did reduce its ratings by 100 Elo a few years back because we felt they were too high.

Bayeselo is my preference. Some say that it compresses ratings, I'd turn that around and say that Ordo expands them. I don't think one is better than the other, they both have sound statistical grounding, but they work differently.

Daniel Shawul had this to say about the two last year:



forum3/viewtopic.php?f=7&t=73761&p=8413 ... lo#p841358

by Daniel Shawul » Sat Apr 25, 2020 2:08 pm

Why are you using Ordo anyway? It clearly has inferior algorithms compared to bayeselo, which is based on a Bayesian approach.
Here https://www.remi-coulom.fr/Bayesian-Elo/ Remi discusses some of the advantages over the prior algorithm, EloStat.
For the calculation of accurate standard deviations, there is an option to calculate the covariance matrix (a bit slower), which is not the default.
Ordo probably uses Monte Carlo sampling of some sort for that, but in bayeselo you find better theory and algorithms.

Bayeselo does take the home field advantage (color) into consideration but does not take the draw ratio into consideration.
It was later extended to take that into consideration using the Davidson model, which turned out to be the best of three draw models.
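
For readers unfamiliar with the Davidson model mentioned in the quote, here is a minimal sketch of its textbook form, in which the draw weight is proportional to the geometric mean of the two players' strengths. This is not bayeselo's actual code: the function names, the parameter name `nu`, and the omission of any colour advantage term are assumptions made for illustration.

```python
def davidson_probs(rating_a: float, rating_b: float, nu: float):
    """Return (P(A wins), P(draw), P(B wins)) under a Davidson-style model.

    Strengths are gamma = 10 ** (rating / 400). Dividing every weight by
    sqrt(gamma_a * gamma_b) leaves the probabilities unchanged, so only the
    rating difference is needed.
    nu >= 0 is a hypothetical name for the draw parameter (0 means no draws).
    """
    diff = rating_a - rating_b
    win_weight = 10.0 ** (diff / 800.0)    # sqrt(gamma_a / gamma_b)
    loss_weight = 10.0 ** (-diff / 800.0)  # sqrt(gamma_b / gamma_a)
    draw_weight = nu
    total = win_weight + draw_weight + loss_weight
    return win_weight / total, draw_weight / total, loss_weight / total


def davidson_expected_score(rating_a: float, rating_b: float, nu: float) -> float:
    """Expected score of A, counting a draw as half a point."""
    p_win, p_draw, _ = davidson_probs(rating_a, rating_b, nu)
    return p_win + 0.5 * p_draw


if __name__ == "__main__":
    # The same 100-point gap maps to different expected scores as nu changes.
    for nu in (0.0, 1.0, 2.0, 4.0):
        print(f"nu={nu}: expected score {davidson_expected_score(100, 0, nu):.3f}")
```

With `nu = 0` the model reduces to the plain logistic Elo curve. As `nu` grows, the same rating gap corresponds to an expected score closer to 50%, which is one way to see why a rating scale fitted with a draw model depends on the draw rate.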
lkaufman
Posts: 6078
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: BayesianElo or Ordo ?

Post by lkaufman »

Modern Times wrote: Sat Oct 16, 2021 7:09 pm CCRL did reduce its ratings by 100 Elo a few years back because we felt they were too high.

Bayeselo is my preference. Some say that it compresses ratings, I'd turn that around and say that Ordo expands them. I don't think one is better than the other, they both have sound statistical grounding, but they work differently.
I strongly disagree with the sentence about compressing/expanding. Human chess uses the Elo rating system. It defines a rating difference for a given percentage score. Ordo uses that same Elo system; it will always show ratings for a match that are consistent with the Elo system. It is simply a correct implementation of what EloStat tried to do. EloStat averaged opponents' ratings, which is just wrong; Ordo corrected that. BayesElo is a different rating system, with different assumptions and a smaller spread which depends on draw rates. If we were starting out with no rating system at all, perhaps BayesElo could be argued to be superior mathematically, I don't have an opinion on that, but we have the Elo system used for all human ratings, and using a different system with "elo" in the name doesn't make it comparable to normal Elo. Ordo is 100% true Elo, no "expansion". Compatibility with existing rating systems is much more important than some mathematical arguments as to why one system or the other might be slightly better at predicting results or easier to use for other calculations.
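
To make "a rating difference for a given percentage score" concrete, here is a minimal sketch that assumes the commonly used logistic form of the Elo curve with a 400-point scale. The function names are illustrative and are not taken from Ordo or any other tool.

```python
import math


def elo_expected_score(rating_diff: float) -> float:
    """Expected score (0..1) of a player rated `rating_diff` above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


def elo_rating_diff(score: float) -> float:
    """Inverse mapping: rating difference implied by a score fraction."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


if __name__ == "__main__":
    # Rating gaps quoted elsewhere in this thread.
    for diff in (43, 52, 88, 95):
        print(f"+{diff} Elo -> expected score {elo_expected_score(diff):.3f}")
```

Under this mapping a head-to-head match result fixes the rating difference directly, with no dependence on how many of the games were drawn.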
Komodo rules!
Rebel
Posts: 7231
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: BayesianElo or Ordo ?

Post by Rebel »

Modern Times wrote: Sat Oct 16, 2021 7:09 pm CCRL did reduce its ratings by 100 Elo a few years back because we felt they were too high.

Bayeselo is my preference. Some say that it compresses ratings, I'd turn that around and say that Ordo expands them. I don't think one is better than the other, they both have sound statistical grounding, but they work differently.
CCRL 40/15
SF12 - 3476
SF11 - 3433
Only +43

CEGT 40/20
SF12 - 3530
SF11 - 3435
+95

How do you explain ?
CMCanavessi
Posts: 1142
Joined: Thu Dec 28, 2017 4:06 pm
Location: Argentina

Re: BayesianElo or Ordo ?

Post by CMCanavessi »

What about Glicko, Glicko-2, Trueskill, etc? I can't even find tools to parse a .pgn file with those.
Follow my tournament and some Leela gauntlets live at http://twitch.tv/ccls
Modern Times
Posts: 3610
Joined: Thu Jun 07, 2012 11:02 pm

Re: BayesianElo or Ordo ?

Post by Modern Times »

Anyone can download the CCRL databases and run them through Ordo, Elostat, or any other ratings algorithm they may prefer.
Modern Times
Posts: 3610
Joined: Thu Jun 07, 2012 11:02 pm

Re: BayesianElo or Ordo ?

Post by Modern Times »

Rebel wrote: Sat Oct 16, 2021 8:55 pm
CCRL 40/15
SF12 - 3476
SF11 - 3433
Only +43

CEGT 40/20
SF12 - 3530
SF11 - 3435
+95

How do you explain ?
Explain what ? Run the 40/15 database through Ordo and the ratings diff is +52 Elo (rather than +43). Not a lot different. So the ratings tool isn't making much difference. It is what it is.
lkaufman
Posts: 6078
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: BayesianElo or Ordo ?

Post by lkaufman »

CMCanavessi wrote: Sat Oct 16, 2021 9:06 pm What about Glicko, Glicko-2, Trueskill, etc? I can't even find tools to parse a .pgn file with those.
I believe that the Glicko systems were designed to address the issue of outdated human ratings; human ratings become less reliable with age. This isn't an issue with engines, so Glicko is not relevant for them. I don't know anything about Trueskill.
Raphexon
Posts: 476
Joined: Sun Mar 17, 2019 12:00 pm
Full name: Henk Drost

Re: BayesianElo or Ordo ?

Post by Raphexon »

Modern Times wrote: Sat Oct 16, 2021 7:09 pm CCRL did reduce its ratings by 100 Elo a few years back because we felt they were too high.

Bayeselo is my preference. Some say that it compresses ratings, I'd turn that around and say that Ordo expands them. I don't think one is better than the other, they both have sound statistical grounding, but they work differently.

Daniel Shawul had this to say about the two last year:



forum3/viewtopic.php?f=7&t=73761&p=8413 ... lo#p841358

by Daniel Shawul » Sat Apr 25, 2020 2:08 pm

Why are you using Ordo anyway? It clearly has inferior algorithms compared to bayeselo, which is based on a Bayesian approach.
Here https://www.remi-coulom.fr/Bayesian-Elo/ Remi discusses some of the advantages over the prior algorithm, EloStat.
For the calculation of accurate standard deviations, there is an option to calculate the covariance matrix (a bit slower), which is not the default.
Ordo probably uses Monte Carlo sampling of some sort for that, but in bayeselo you find better theory and algorithms.

Bayeselo does take the home field advantage (color) into consideration but does not take the draw ratio into consideration.
It was later extended to take that into consideration using the Davidson model, which turned out to be the best of three draw models.
Bayesianelo is nice when a game has no or few draws.
Remi has a lot of history inside the computer Go community, and it shows...

Ordo is nicer for (modern computer) chess.
Rebel
Posts: 7231
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: BayesianElo or Ordo ?

Post by Rebel »

Modern Times wrote: Sat Oct 16, 2021 10:30 pm
Rebel wrote: Sat Oct 16, 2021 8:55 pm
CCRL 40/15
SF12 - 3476
SF11 - 3433
Only +43

CEGT 40/20
SF12 - 3530
SF11 - 3435
+95

How do you explain ?
Explain what ? Run the 40/15 database through Ordo and the ratings diff is +52 Elo (rather than +43). Not a lot different. So the ratings tool isn't making much difference. It is what it is.
I did as well:

   # PLAYER                                     :  RATING  ERROR  POINTS  PLAYED   (%)  CFS(%)    W     D    L  D(%)  OppAvg
  17 Stockfish 12 64-bit                        :  3521.0   17.9   479.0     738    65      57  234   490   14    66  3405.2
  40 Stockfish 11 64-bit                        :  3433.0    6.3   304.0     506    60      57  119   370   17    73  3379.6
3521 - 3433 = +88
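
As a quick sanity check on the two rows above, the following sketch recomputes the PLAYED, POINTS, (%) and D(%) columns from the W/D/L counts and takes the difference between the two RATING values. The numbers are copied from the table; the variable and function names are illustrative only.

```python
# Figures copied from the Ordo output above: (rating, wins, draws, losses).
rows = {
    "Stockfish 12 64-bit": (3521.0, 234, 490, 14),
    "Stockfish 11 64-bit": (3433.0, 119, 370, 17),
}


def summarize(wins: int, draws: int, losses: int):
    """Return (games played, points, score %, draw %) from W/D/L counts."""
    played = wins + draws + losses
    points = wins + 0.5 * draws  # a draw counts as half a point
    return played, points, 100.0 * points / played, 100.0 * draws / played


for name, (rating, wins, draws, losses) in rows.items():
    played, points, score_pct, draw_pct = summarize(wins, draws, losses)
    print(f"{name}: played={played} points={points} "
          f"score={score_pct:.1f}% draws={draw_pct:.1f}%")

# Gap between the two Ordo ratings.
gap = rows["Stockfish 12 64-bit"][0] - rows["Stockfish 11 64-bit"][0]
print(f"rating gap: {gap:+.1f}")
```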