CCRL rating list: dropped by 120 Elo?

mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

CCRL rating list: dropped by 120 Elo?

Post by mvanthoor »

Hi,

Did I miss something? Has the Blitz 2'1 list been recalibrated because of the discussion about bayesElo/Ordo and how CCRL computes ratings? It seems many engines in the middle to lower end of the list have dropped by about 120 points. (Recently they suddenly rose by 70 points or so, by removing games that were played at old time controls, but that rise has more than been undone now.)

I don't really mind of course; the list is only a number. I'm just wondering why some engines seem to have dropped, while some others haven't. Edit: some engines in the top part of the list seem to have gained rating. Has the spread between the lowest and highest rating become bigger and divided differently?

I'm a bit confused as to what happened.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
lkaufman
Posts: 6297
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: CCRL rating list: dropped by 120 Elo?

Post by lkaufman »

mvanthoor wrote: Thu Jan 18, 2024 8:10 pm Hi,

Did I miss something? Has the Blitz 2'1 list been recalibrated because of the discussion about bayesElo/Ordo and how CCRL computes ratings? It seems many engines in the middle to lower end of the list have dropped by about 120 points. (Recently they suddenly rose by 70 points or so, by removing games that were played at old time controls, but that rise has more than been undone now.)

I don't really mind of course; the list is only a number. I'm just wondering why some engines seem to have dropped, while some others haven't. Edit: some engines in the top part of the list seem to have gained rating. Has the spread between the lowest and highest rating become bigger and divided differently?

I'm a bit confused as to what happened.
This is exactly what I expected would happen if the proposed parameter changes were implemented. They effectively expand the scale by using a more realistic drawelo parameter value, so that overall the range becomes comparable to normal Elo, although the top will still be compressed while the bottom will be expanded relative to standard Elo. I don't know that they have done this, but it looks that way. If so, presumably this was also done for the Rapid ratings? I think it is a desirable change; the numbers overall will be more in line with how Elo is calculated for humans.
Komodo rules!
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: CCRL rating list: dropped by 120 Elo?

Post by mvanthoor »

lkaufman wrote: Thu Jan 18, 2024 9:13 pm This is exactly what I expected would happen if the proposed parameter changes were implemented. They effectively expand the scale by using a more realistic drawelo parameter value, so that overall the range becomes comparable to normal Elo, although the top will still be compressed while the bottom will be expanded relative to standard Elo. I don't know that they have done this, but it looks that way. If so, presumably this was also done for the Rapid ratings? I think it is a desirable change; the numbers overall will be more in line with how Elo is calculated for humans.
Thanks. That would explain it. What is a drawelo parameter? AFAIK, normal Elo calculation doesn't take into account whether points come from draws or wins; in the end, it's all calculated with regard to expected winning chances. (An example would be: scoring about 64% of the points against an opponent makes you roughly 100 Elo stronger than that opponent.)
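To make the score-to-rating relationship concrete, here is a minimal Python sketch assuming the usual logistic Elo curve with a 400-point scale. The function names are made up for illustration and are not taken from any rating tool.

```python
import math


def expected_score(diff: float) -> float:
    """Expected score for a player rated `diff` Elo above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def elo_diff_from_score(score: float) -> float:
    """Rating difference implied by a score fraction (inverse of expected_score)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


# Only the total score matters here: 64 wins and 36 losses give the same
# result as 28 wins, 72 draws and 0 losses, because both score 64%.
for s in (0.55, 0.64, 0.70):
    print(f"{s:.0%} score -> {elo_diff_from_score(s):+.1f} Elo")
```

Under this curve a 64% score corresponds to roughly 100 Elo and a 70% score to roughly 147 Elo, regardless of how many of the points came from draws.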

Sorry if I'm asking stupid questions. I'm not well versed in the finer points of calculating Elo values. The only thing I know now, is that I'll have to readjust all my expectations for my current engine version. It would now be around 2050, instead of the expected 2170.

I don't know if reaching 3000 Elo is actually more difficult now. Many engines around the 2900-3000 range haven't dropped that much, if at all, but the engines in the sub-2400 range have lost 100+ points. My sample isn't big enough, but I almost dare say that the weaker the engine was, the more points it lost. That would mean you now rise faster through the ranks, if the top engine stayed at roughly the same rating. (It does seem Stockfish gained something like 50 points, IIRC.)
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
Graham Banks
Posts: 45571
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: CCRL rating list: dropped by 120 Elo?

Post by Graham Banks »

Re: CCRL 40/15, 2m1s and FRC 40/2 lists updated (13-01-2023)
Post by Modern Times » Sun Jan 14, 2024 2:30 pm

All lists have been updated during the course of the week with the Bayeselo parameter "mm 1 1"

"mm 1 1" makes Bayeselo compute White advantage and drawElo (or eloDraw) from the databases themselves and use those values in computing the ratings, instead of the default values in the bayeselo code.

The result is an expansion of the ratings spread from the bottom engine to the top engine. The spread is now similar to Ordo.
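As a rough illustration of what those two parameters do, here is a minimal Python sketch of the win/draw/loss model that BayesElo is generally described as using. The function names are invented for this example, and the default values 32.8 and 97.3 are what I recall as the built-in defaults, so treat them as assumptions; with "mm 1 1" both values are fitted from the games instead.

```python
def logistic(x: float) -> float:
    """Logistic curve on the usual 400-point Elo scale."""
    return 1.0 / (1.0 + 10.0 ** (-x / 400.0))


def bayeselo_probs(delta: float, advantage: float = 32.8, draw_elo: float = 97.3):
    """Win/draw/loss probabilities for White.

    delta     -- White's rating minus Black's rating (BayesElo scale)
    advantage -- Elo value of having White (assumed default)
    draw_elo  -- the drawElo / eloDraw parameter (assumed default)
    """
    p_win = logistic(delta + advantage - draw_elo)
    p_loss = logistic(-delta - advantage - draw_elo)
    p_draw = 1.0 - p_win - p_loss
    return p_win, p_draw, p_loss


# For two equal engines, a larger draw_elo means a higher predicted draw rate.
for draw_elo in (97.3, 200.0, 300.0):
    w, d, l = bayeselo_probs(0.0, advantage=0.0, draw_elo=draw_elo)
    print(f"draw_elo={draw_elo:5.1f}: win={w:.3f} draw={d:.3f} loss={l:.3f}")
```

If the fitted draw_elo differs from the default, the same game results map to different rating gaps, which is one way to picture why the spread of the list changed.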
gbanksnz at gmail.com
lkaufman
Posts: 6297
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: CCRL rating list: dropped by 120 Elo?

Post by lkaufman »

mvanthoor wrote: Thu Jan 18, 2024 9:55 pm
lkaufman wrote: Thu Jan 18, 2024 9:13 pm This is exactly what I expected would happen if the proposed parameter changes were implemented. They effectively expand the scale by using a more realistic drawelo parameter value, so that overall the range becomes comparable to normal Elo, although the top will still be compressed while the bottom will be expanded relative to standard Elo. I don't know that they have done this, but it looks that way. If so, presumably this was also done for the Rapid ratings? I think it is a desirable change; the numbers overall will be more in line with how Elo is calculated for humans.
Thanks. That would explain it. What is a drawelo parameter? AFAIK, normal Elo calculation doesn't take into account whether points come from draws or wins; in the end, it's all calculated with regard to expected winning chances. (An example would be: scoring about 64% of the points against an opponent makes you roughly 100 Elo stronger than that opponent.)

Sorry if I'm asking stupid questions. I'm not well versed in the finer points of calculating Elo values. The only thing I know now, is that I'll have to readjust all my expectations for my current engine version. It would now be around 2050, instead of the expected 2170.

I don't know if reaching 3000 Elo is actually more difficult now. Many engines around the 2900-3000 range haven't dropped that much, if at all, but the engines in the sub-2400 range have lost 100+ points. My sample isn't big enough, but I almost dare say that the weaker the engine was, the more points it lost. That would mean you now rise faster through the ranks, if the top engine stayed at roughly the same rating. (It does seem Stockfish gained something like 50 points, IIRC.)
The drawelo parameter is a measure of the Elo value of draw odds, though I'm not sure exactly how it is defined or calculated. You are correct that in the Elo system this would have no consequence: two draws = one win and one loss. But with BayesElo this is not true; I remember reading here (in a post by HGM, I believe) that with BayesElo, one draw is like one win and one loss, rather than two draws being like one win and one loss. It's as if draws count double compared to decisive games, if I understand it properly. There are some claims that this is a sounder model for a rating system than Elo, but it is clearly different.

At the very high end, where the top engines draw almost all their games, counting draws double will of course compress the ratings, perhaps by a factor approaching 2. This is oversimplified, but I think it's a good way to understand BayesElo.

The change they just made doesn't directly address this; it just expands the whole scale to make the total range similar to normal Elo. Based on your comments, ratings seem to have moved away from a pivot around 3000 (or a bit higher), with the distance from that pivot growing by somewhat over 10%. Maybe someone can calculate the precise spread (it's probably not quite linear, but perhaps close enough for practical purposes).
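For anyone who wants to check the "one draw is like one win and one loss" idea numerically, here is a small Python sketch. It assumes the commonly described BayesElo model with a fixed drawelo and no White advantage; the helper names and the sample game counts are made up for illustration, and this is not BayesElo's actual code.

```python
import math


def logistic(x: float) -> float:
    return 1.0 / (1.0 + 10.0 ** (-x / 400.0))


def probs(delta: float, draw_elo: float):
    """Win/draw/loss probabilities, ignoring White advantage (assumption)."""
    p_win = logistic(delta - draw_elo)
    p_loss = logistic(-delta - draw_elo)
    return p_win, 1.0 - p_win - p_loss, p_loss


def log_likelihood(delta, wins, draws, losses, draw_elo):
    p_win, p_draw, p_loss = probs(delta, draw_elo)
    return (wins * math.log(p_win) + draws * math.log(p_draw)
            + losses * math.log(p_loss))


def ml_delta(wins, draws, losses, draw_elo, lo=-1500.0, hi=1500.0):
    """Maximum-likelihood rating gap via ternary search (log-likelihood is concave)."""
    for _ in range(200):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if (log_likelihood(m1, wins, draws, losses, draw_elo)
                < log_likelihood(m2, wins, draws, losses, draw_elo)):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0


draw_elo = 97.3  # assumed value, for illustration only
theta = 10.0 ** (draw_elo / 400.0)

# The draw probability is proportional to P(win) * P(loss) ...
p_win, p_draw, p_loss = probs(120.0, draw_elo)
print(p_draw, (theta ** 2 - 1.0) * p_win * p_loss)

# ... so 20 wins, 60 draws, 5 losses fit the same rating gap as
# 80 wins, 0 draws, 65 losses (each draw replaced by a win plus a loss).
print(ml_delta(20, 60, 5, draw_elo), ml_delta(80, 0, 65, draw_elo))
```

Both printed pairs should agree closely, which is the sense in which a draw carries the weight of two decisive games in this model.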
Komodo rules!