CCRL rating list: dropped by 120 Elo?
Post by mvanthoor » Thu Jan 18, 2024 8:10 pm
Hi,
Did I miss something? Has the Blitz 2'1 list been recalibrated because of the discussion about BayesElo/Ordo and how CCRL computes ratings? Many engines in the middle to lower end of the list seem to have dropped by about 120 points. (Recently they suddenly rose by 70 points or so when games played at old time controls were removed, but that rise has more than been undone now.)
I don't really mind, of course; the list is only a number. I'm just wondering why some engines seem to have dropped while others haven't. Edit: some engines in the top part of the list seem to have gained rating. Has the spread between the lowest and highest rating become bigger, and is it distributed differently?
I'm a bit confused about what happened.
-
lkaufman
- Posts: 6297
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
- Full name: Larry Kaufman
Re: CCRL rating list: dropped by 120 Elo?
mvanthoor wrote: ↑Thu Jan 18, 2024 8:10 pm
Has the Blitz 2'1 list been recalibrated because of the discussion about BayesElo/Ordo and how CCRL computes ratings? [...]
This is exactly what I expected would happen if the proposed parameter changes were implemented. They effectively expand the scale by using a more realistic drawelo parameter value, so that overall the range becomes comparable to normal Elo, although the top will still be compressed and the bottom expanded relative to standard Elo. I don't know for sure that they have done this, but it looks that way. If so, presumably it was also done for the Rapid ratings? I think it is a desirable change; overall, the numbers will be more in line with how Elo is calculated for humans.
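The scale-expansion point above can be illustrated with a minimal Python sketch of the win/draw/loss model BayesElo is usually described as using: with rating difference Δ, White advantage `adv` and draw parameter `d`, P(win) = f(Δ + adv − d), P(loss) = f(−Δ − adv − d), and P(draw) is the remainder, where f(x) = 1/(1 + 10^(−x/400)). The helper names are hypothetical, colour is ignored (adv = 0), and 97.3 is just an example value (I believe it is close to BayesElo's default drawelo, but treat that as an assumption). BayesElo itself fits ratings by maximising the likelihood of the results, so this only shows how `d` changes the rating gap at which a given expected score is reached.

```python
# Sketch of a BayesElo-style win/draw/loss model (assumed form; helper names are hypothetical).

def f(x: float) -> float:
    """Logistic curve on the Elo scale: 1 / (1 + 10^(-x/400))."""
    return 1.0 / (1.0 + 10.0 ** (-x / 400.0))


def wdl(delta: float, draw_elo: float, adv: float = 0.0) -> tuple[float, float, float]:
    """Win/draw/loss probabilities for the first player.

    delta:    rating difference (first player minus opponent)
    draw_elo: the 'drawelo' parameter; larger values mean more draws
    adv:      White advantage (0 here, i.e. colour is ignored)
    """
    p_win = f(delta + adv - draw_elo)
    p_loss = f(-delta - adv - draw_elo)
    return p_win, 1.0 - p_win - p_loss, p_loss


def expected_score(delta: float, draw_elo: float) -> float:
    """Points per game: a win counts 1, a draw 1/2."""
    p_win, p_draw, _ = wdl(delta, draw_elo)
    return p_win + 0.5 * p_draw


def delta_for_score(score: float, draw_elo: float) -> float:
    """Rating difference at which the model's expected score equals `score`.

    expected_score is increasing in delta, so plain bisection works.
    """
    lo, hi = -2000.0, 2000.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if expected_score(mid, draw_elo) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # The same 64% score maps to a larger rating gap as drawelo grows.
    for d in (0.0, 97.3, 200.0):
        print(f"drawelo={d:6.1f}  delta for 64%: {delta_for_score(0.64, d):6.1f}")
```

With drawelo = 0 there are no draws and the model reduces to the plain Elo curve (about 100 Elo for 64%); with the two larger values the same 64% needs a gap of roughly 108 and 135 Elo, i.e. a larger drawelo stretches the scale.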
Komodo rules!
-
mvanthoor
- Posts: 1784
- Joined: Wed Jul 03, 2019 4:42 pm
- Location: Netherlands
- Full name: Marcel Vanthoor
Re: CCRL rating list: dropped by 120 Elo?
lkaufman wrote: ↑Thu Jan 18, 2024 9:13 pm
They effectively expand the scale by using a more realistic drawelo parameter value, so that overall the range becomes comparable to normal Elo [...]
Thanks. That would explain it. What is a drawelo parameter? AFAIK, normal Elo calculation doesn't take into account whether points come from draws or wins; in the end, it's all calculated from the expected score. (An example: scoring about 64% of the points against an opponent makes you about 100 Elo stronger than that opponent.)
Sorry if I'm asking stupid questions. I'm not well versed in the finer points of calculating Elo values. The only thing I know now is that I'll have to readjust all my expectations for my current engine version: it would now be around 2050 instead of the expected 2170.
I don't know if reaching 3000 Elo is actually more difficult now. Many engines in the 2900-3000 range haven't dropped much, if at all, but the engines in the sub-2400 range have lost 100+ points. My sample isn't big enough, but I almost dare say that the weaker the engine was, the more points it lost. That would mean you now rise faster through the ranks, if the top engine stayed at roughly the same rating. (It does seem Stockfish gained something like 50 points, IIRC.)
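For reference, the standard logistic Elo relation between rating difference and expected score (the curve used for human ratings) fits in a few lines of Python; the function names are illustrative. Wins and draws only enter through the score here, so two draws count exactly like one win plus one loss. Under this formula a 100 Elo edge corresponds to an expected score of about 64%, while a 70% score corresponds to roughly 147 Elo.

```python
import math


def elo_expected_score(diff: float) -> float:
    """Expected score of the stronger side for a rating difference `diff`."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def elo_diff_from_score(score: float) -> float:
    """Inverse: rating difference implied by a score fraction (0 < score < 1)."""
    return 400.0 * math.log10(score / (1.0 - score))


if __name__ == "__main__":
    print(f"{elo_expected_score(100):.3f}")    # 0.640
    print(f"{elo_diff_from_score(0.70):.1f}")  # 147.2
```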
-
Graham Banks
- Posts: 45571
- Joined: Sun Feb 26, 2006 10:52 am
- Location: Auckland, NZ
Re: CCRL rating list: dropped by 120 Elo?
Re: CCRL 40/15, 2m1s and FRC 40/2 lists updated (13-01-2023)
Post by Modern Times » Sun Jan 14, 2024 2:30 pm
All lists have been updated during the course of the week with the Bayeselo parameter "mm 1 1"
"mm 1 1" makes Bayeselo compute White advantage and drawElo (or eloDraw) from the databases themselves and use those values in computing the ratings, instead of the default values in the bayeselo code.
The result is an expansion of the ratings spread from the bottom engine to the top engine. The spread is now similar to Ordo.
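As a rough illustration of what computing White advantage and drawElo "from the databases themselves" means, the sketch below inverts the same win/draw/loss model for the simplest case: it pretends every game was between equally rated engines, so only White's advantage `adv` and the draw parameter `d` remain. With white-win fraction w and black-win fraction l, solving f(adv − d) = w and f(−adv − d) = l gives adv = 200·log10(w(1−l)/(l(1−w))) and d = 200·log10((1−w)(1−l)/(w·l)). This is a hypothetical shortcut, not what bayeselo's `mm` command does (as far as I know, `mm` fits these parameters jointly with the ratings by maximum likelihood), and the game counts are made up.

```python
import math


def draw_params(white_wins: int, draws: int, black_wins: int) -> tuple[float, float]:
    """Estimate (white_advantage, draw_elo) from aggregate results.

    Assumes every game was played between equally rated engines and solves
        f(adv - d) = w   and   f(-adv - d) = l
    for the BayesElo-style model, where f(x) = 1 / (1 + 10^(-x/400)),
    w is the white-win fraction and l the black-win fraction.
    """
    if white_wins <= 0 or black_wins <= 0:
        raise ValueError("need at least one win for each colour")
    n = white_wins + draws + black_wins
    w, l = white_wins / n, black_wins / n
    # Convert each colour's win fraction back to the Elo scale.
    x = 400.0 * math.log10(w / (1.0 - w))  # equals adv - d
    y = 400.0 * math.log10(l / (1.0 - l))  # equals -adv - d
    return (x - y) / 2.0, -(x + y) / 2.0


if __name__ == "__main__":
    # Made-up counts for a draw-heavy engine database.
    adv, d = draw_params(white_wins=300, draws=550, black_wins=150)
    print(f"white advantage ~ {adv:.1f} Elo, drawelo ~ {d:.1f} Elo")
```

With these invented counts the estimates come out at roughly 77 Elo for White's advantage and 224 Elo for drawelo; adding more draws at the same win ratio pushes the drawelo estimate up.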
gbanksnz at gmail.com
-
lkaufman
Re: CCRL rating list: dropped by 120 Elo?
mvanthoor wrote: ↑Thu Jan 18, 2024 9:55 pm
What is a drawelo parameter? AFAIK, normal Elo calculation doesn't take into account whether points come from draws or wins [...]
The drawelo parameter is a measure of the Elo value of draw odds, though I'm not sure exactly how it is defined or calculated. You are correct that in the Elo system this would have no consequence: two draws are worth the same as one win and one loss. But with BayesElo this is not true; I remember reading here (in a post by HGM, I believe) that with BayesElo, one draw is worth as much as one win plus one loss, whereas in Elo one win plus one loss equals two draws. It's as if draws count double compared to decisive games, if I understand it properly. There are some claims that this is a sounder model for a rating system than Elo, but it is clearly different. At the very high end, where the top engines draw almost all their games, counting draws double will of course compress the ratings, perhaps by a factor approaching 2. This is oversimplified, but I think it's a good way to understand BayesElo. The change they just made doesn't directly address this; it just expands the whole scale to make the total range similar to normal Elo. Based on your comments, the ratings have moved a bit more than 10% further away from 3000. Maybe someone can calculate the precise spread (it's probably not quite linear, but perhaps close enough for practical purposes).
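A back-of-the-envelope version of that calculation can use mvanthoor's single data point (2170 expected before, about 2050 now), assuming a purely linear stretch about a fixed pivot of 3000. Both the pivot and the linearity are simplifying assumptions, one engine is far too small a sample, and the helper names are hypothetical.

```python
def stretch_factor(old: float, new: float, pivot: float = 3000.0) -> float:
    """Factor k such that new = pivot + k * (old - pivot)."""
    return (pivot - new) / (pivot - old)


def rescale(old: float, k: float, pivot: float = 3000.0) -> float:
    """Apply a linear stretch of factor k about the pivot rating."""
    return pivot + k * (old - pivot)


if __name__ == "__main__":
    k = stretch_factor(2170.0, 2050.0)
    print(f"stretch factor ~ {k:.3f}")  # 1.145, i.e. about 14% further from 3000
    for old in (2900.0, 2600.0, 2400.0, 2170.0):
        print(f"{old:.0f} -> {rescale(old, k):.0f}")
```

A single pivot cannot capture gains at the top (Stockfish reportedly went up by something like 50 points), and the real mapping is probably not quite linear, so this is only a first approximation.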
Komodo rules!