I also created an Elo-progression-by-year calculation from CEGT 40/20 games and compared it with CCRL.
1. CCRL uses BayesElo while CEGT uses Ordo for Elo calculation, and the differences are quite remarkable.
2. According to CCRL the Elo progress since 2006 is 585 Elo, while according to CEGT it is 778 Elo.
3. In 2006 CCRL starts 114 Elo higher than CEGT, but in 2021 the CEGT rating is 79 Elo higher.
4. Browsing through the years we see an almost fixed pattern of CEGT ratings coming out higher, peaking in 2020 with a difference of 87 Elo, the year of the NNUE revolution that started with Stockfish 12; see the red-marked year.
5. GM Larry Kaufman hinted this is probably due to BayesElo double-counting draws.
6. We found a negative article about BayesElo. We don't have the knowledge to judge it, but it's open for discussion.
http://rebel13.nl/misc/stats.html
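The figures in points 2 and 3 can be cross-checked with simple arithmetic. This is only a small sketch, assuming both progress numbers cover the same 2006 to 2021 span; the variable names are illustrative and not taken from the linked page.

```python
# Figures quoted in points 2 and 3 above (Elo points).
ccrl_progress = 585    # CCRL gain since 2006
cegt_progress = 778    # CEGT gain since 2006
ccrl_lead_2006 = 114   # CCRL minus CEGT at the 2006 starting point

# How much more CEGT gained than CCRL over the period.
extra_cegt_gain = cegt_progress - ccrl_progress

# CEGT minus CCRL in 2021: the extra gain minus CCRL's initial lead.
cegt_lead_2021 = extra_cegt_gain - ccrl_lead_2006

print(extra_cegt_gain, cegt_lead_2021)  # prints: 193 79
```

The result agrees with the 79 Elo quoted for 2021, so the three numbers are at least consistent with each other.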
BayesianElo or Ordo ?
Moderators: hgm, chrisw, Rebel
-
- Posts: 7231
- Joined: Thu Aug 18, 2011 12:04 pm
- Full name: Ed Schröder
BayesianElo or Ordo ?
90% of coding is debugging, the other 10% is writing bugs.
-
- Posts: 3610
- Joined: Thu Jun 07, 2012 11:02 pm
Re: BayesianElo or Ordo ?
CCRL did reduce its ratings by 100 Elo a few years back because we felt they were too high.
Bayeselo is my preference. Some say that it compresses ratings, I'd turn that around and say that Ordo expands them. I don't think one is better than the other, they both have sound statistical grounding, but they work differently.
Daniel Shawul had this to say about the two last year:
forum3/viewtopic.php?f=7&t=73761&p=8413 ... lo#p841358
by Daniel Shawul » Sat Apr 25, 2020 2:08 pm
Why are you using Ordo anyway? Clearly it has inferior algorithms compared to Bayeselo, which is based on a Bayesian approach.
Here https://www.remi-coulom.fr/Bayesian-Elo/ Remi discusses some of the advantages over the prior algorithm, EloStat.
For the calculation of accurate standard deviations, there is an option to calculate the covariance matrix (a bit slower), which is not the default.
Ordo probably uses Monte Carlo sampling of some sort for that, but in Bayeselo you find better theory and algorithms.
Bayeselo does take the home-field advantage (color) into consideration but does not take the draw ratio into consideration.
It was later extended to take that into consideration using the Davidson model, which turned out to be the best of three draw models.
-
- Posts: 6078
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
Re: BayesianElo or Ordo ?
Modern Times wrote: ↑Sat Oct 16, 2021 7:09 pm
CCRL did reduce its ratings by 100 Elo a few years back because we felt they were too high.
Bayeselo is my preference. Some say that it compresses ratings, I'd turn that around and say that Ordo expands them. I don't think one is better than the other, they both have sound statistical grounding, but they work differently.

I strongly disagree with the sentence about compressing/expanding. Human chess uses the Elo rating system. It defines a rating difference for a given percentage score. Ordo uses that same Elo system; it will always show ratings for a match that are consistent with the Elo system. It is simply a correct implementation of what EloStat tried to do. EloStat averaged opponents' ratings, which is just wrong; Ordo corrected that. BayesElo is a different rating system, with different assumptions and a smaller spread which depends on draw rates. If we were starting out with no rating system at all, perhaps BayesElo could be argued to be superior mathematically, I don't have an opinion on that, but we have the Elo system used for all human ratings, and using a different system with "elo" in the name doesn't make it comparable to normal Elo. Ordo is 100% true Elo, no "expansion". Compatibility with existing rating systems is much more important than some mathematical arguments as to why one system or the other might be slightly better at predicting results or easier to use for other calculations.
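A minimal sketch of the relation referred to in this post, that a rating difference corresponds to a given percentage score. It assumes the common logistic form of the Elo curve with a 400-point scale; the function names are made up for illustration, and nothing here is taken from the Ordo or BayesElo source code.

```python
import math


def expected_score(diff):
    """Expected score (0..1) for a player rated `diff` Elo above the opponent.

    Assumes the logistic Elo curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def elo_diff(score):
    """Rating difference implied by a fractional score strictly between 0 and 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


# Round trip: difference -> expected score -> difference.
for diff in (0, 100, 200):
    score = expected_score(diff)
    print(diff, round(score, 3), round(elo_diff(score), 1))
```

Under this assumption the mapping is one-to-one, so a match score fixes the rating difference regardless of how many of the points came from draws.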
Komodo rules!
-
- Posts: 7231
- Joined: Thu Aug 18, 2011 12:04 pm
- Full name: Ed Schröder
Re: BayesianElo or Ordo ?
Modern Times wrote: ↑Sat Oct 16, 2021 7:09 pm
CCRL did reduce its ratings by 100 Elo a few years back because we felt they were too high.
Bayeselo is my preference. Some say that it compresses ratings, I'd turn that around and say that Ordo expands them. I don't think one is better than the other, they both have sound statistical grounding, but they work differently.

CCRL 40/15
SF12 - 3476
SF11 - 3433
Only +43
CEGT 40/20
SF12 - 3530
SF11 - 3435
+95
How do you explain ?
90% of coding is debugging, the other 10% is writing bugs.
-
- Posts: 1142
- Joined: Thu Dec 28, 2017 4:06 pm
- Location: Argentina
Re: BayesianElo or Ordo ?
What about Glicko, Glicko-2, Trueskill, etc? I can't even find tools to parse a .pgn file with those.
Follow my tournament and some Leela gauntlets live at http://twitch.tv/ccls
-
- Posts: 3610
- Joined: Thu Jun 07, 2012 11:02 pm
Re: BayesianElo or Ordo ?
Anyone can download the CCRL databases and run them through Ordo, Elostat, or any other ratings algorithm they may prefer.
-
- Posts: 3610
- Joined: Thu Jun 07, 2012 11:02 pm
Re: BayesianElo or Ordo ?
Explain what ? Run the 40/15 database through Ordo and the ratings diff is +52 Elo (rather than +43). Not a lot different. So the ratings tool isn't making much difference. It is what it is.
-
- Posts: 6078
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
Re: BayesianElo or Ordo ?
CMCanavessi wrote: ↑Sat Oct 16, 2021 9:06 pm
What about Glicko, Glicko-2, Trueskill, etc? I can't even find tools to parse a .pgn file with those.

I believe that the Glicko systems were designed to address the issue of outdated human ratings; human ratings become less reliable with age. This isn't an issue with engines, so Glicko is not relevant for them. I don't know anything about Trueskill.
Komodo rules!
-
- Posts: 476
- Joined: Sun Mar 17, 2019 12:00 pm
- Full name: Henk Drost
Re: BayesianElo or Ordo ?
Modern Times wrote: ↑Sat Oct 16, 2021 7:09 pm
CCRL did reduce its ratings by 100 Elo a few years back because we felt they were too high.
Bayeselo is my preference. Some say that it compresses ratings, I'd turn that around and say that Ordo expands them. I don't think one is better than the other, they both have sound statistical grounding, but they work differently.
Daniel Shawul had this to say about the two last year:
forum3/viewtopic.php?f=7&t=73761&p=8413 ... lo#p841358
by Daniel Shawul » Sat Apr 25, 2020 2:08 pm
Why are you using Ordo anyway? Clearly it has inferior algorithms compared to Bayeselo, which is based on a Bayesian approach.
Here https://www.remi-coulom.fr/Bayesian-Elo/ Remi discusses some of the advantages over the prior algorithm, EloStat.
For the calculation of accurate standard deviations, there is an option to calculate the covariance matrix (a bit slower), which is not the default.
Ordo probably uses Monte Carlo sampling of some sort for that, but in Bayeselo you find better theory and algorithms.
Bayeselo does take the home-field advantage (color) into consideration but does not take the draw ratio into consideration.
It was later extended to take that into consideration using the Davidson model, which turned out to be the best of three draw models.

Bayesianelo is nice when a game has no or few draws.
Remi has a lot of history inside the computer Go community, and it shows...
Ordo is nicer for (modern computer) chess.
-
- Posts: 7231
- Joined: Thu Aug 18, 2011 12:04 pm
- Full name: Ed Schröder
Re: BayesianElo or Ordo ?
Modern Times wrote: ↑Sat Oct 16, 2021 10:30 pm
Explain what ? Run the 40/15 database through Ordo and the ratings diff is +52 Elo (rather than +43). Not a lot different. So the ratings tool isn't making much difference. It is what it is.

I did as well:
Code:
#  PLAYER               :  RATING  ERROR  POINTS  PLAYED   (%)  CFS(%)    W    D    L  D(%)  OppAvg
17 Stockfish 12 64-bit  :  3521.0   17.9   479.0     738    65      57  234  490   14    66  3405.2
40 Stockfish 11 64-bit  :  3433.0    6.3   304.0     506    60      57  119  370   17    73  3379.6
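For readers who want to check how the columns in the Ordo output relate to each other, here is a small sketch that recomputes games played, points, score percentage and draw percentage from the W/D/L counts, plus the rating gap between the two rows. The numbers are copied from the output above; the helper name `summarize` is hypothetical and is not part of Ordo.

```python
# (player, rating, wins, draws, losses) copied from the Ordo output above.
rows = [
    ("Stockfish 12 64-bit", 3521.0, 234, 490, 14),
    ("Stockfish 11 64-bit", 3433.0, 119, 370, 17),
]


def summarize(wins, draws, losses):
    """Return (games played, points, score %, draw %) for a W/D/L record."""
    played = wins + draws + losses
    points = wins + 0.5 * draws  # a draw counts as half a point
    return played, points, 100.0 * points / played, 100.0 * draws / played


for name, rating, w, d, l in rows:
    played, points, score_pct, draw_pct = summarize(w, d, l)
    print(f"{name}: played={played} points={points} "
          f"score={score_pct:.1f}% draws={draw_pct:.1f}%")

# Rating gap between Stockfish 12 and Stockfish 11 in this Ordo run.
gap = rows[0][1] - rows[1][1]
print(f"Rating gap: {gap:.1f} Elo")
```

The recomputed values match the PLAYED, POINTS, (%) and D(%) columns after rounding, and the gap in this run comes out at 88 Elo.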
90% of coding is debugging, the other 10% is writing bugs.