Lichess Blitz rating to FIDE rating

lkaufman
Posts: 6287
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Lichess Blitz rating to FIDE rating

Post by lkaufman »

Ferdy wrote: Fri Dec 24, 2021 10:09 am
lkaufman wrote: Fri Dec 24, 2021 1:33 am
carldaman wrote: Fri Dec 24, 2021 1:01 am
MonteCarlo wrote: Thu Dec 23, 2021 11:52 pm Per https://talkchess.com/forum3/viewtopic.php?f=2&t=73564, a maybe-conceptually-inspired-by-Rodent private engine.

Not likely to be a derivative of SF, but not a lot of detail to go on.

I wouldn't read too much into its lichess rating. Just an artefact of its opponents. It's +1839 =11 -11 in blitz, with at least one of the losses coming against a TOS-violating opponent.

If it played the same people Carlsen played, it would be much higher rated.

Cheers!
Nezh is a private engine, and I can't add much more than what's been stated, since I'm only its tester. It predates NN/NNUE and only uses HCE.

As I mentioned before, its rating would continue to climb if it kept playing more and more decent strength humans.
That has been the undeniable trend and 2900 Elo on lichess does not reflect its ultimate potential, which is probably to be on par or slightly better than the human world champ. That's what its strength projects to be based on private tests against other engines, which point to somewhere around 2800-2900 CCRL.
OK, now we're getting somewhere. Let's call it 2850 CCRL based on your statement. Is there a way to determine its score only against titled players (excluding TOS violators if any) and their mean LiChess blitz rating? Then we can compute an approximate performance rating for it and see the difference from the CCRL estimate. Perhaps it won't be much more than the 206 elo calculated for MiniHuman.
Here is the data of its opponents.

                            site               oppname pgntitle  pgnrating  TOS  blitzrating  score  Nezh-BOT_score
0   https://lichess.org/oIYFQfAs          dmitrijiiiGM       CM       2657    0         2727    0.0             1.0
1   https://lichess.org/HMPDu49k          dmitrijiiiGM       CM       2656    0         2727    0.5             0.5
2   https://lichess.org/Bw0A7t8l                CXCX83       GM       2653    0         2708    0.0             1.0
3   https://lichess.org/phaor9Ki                CXCX83       GM       2654    0         2708    0.0             1.0
4   https://lichess.org/NgzHhi7C                CXCX83       GM       2655    0         2708    0.0             1.0
5   https://lichess.org/JTH4qCHh                CXCX83       GM       2656    0         2708    0.0             1.0
6   https://lichess.org/J1vRvjKi               DanPach       FM       2488    0         2592    0.0             1.0
7   https://lichess.org/jLOpgANG               DanPach       FM       2488    0         2592    0.0             1.0
8   https://lichess.org/cPEEwdDR               DanPach       FM       2489    0         2592    0.0             1.0
9   https://lichess.org/3ILuIVsH               DanPach       FM       2489    0         2592    0.0             1.0
10  https://lichess.org/6pNUtbRz               DanPach       FM       2490    0         2592    0.0             1.0
11  https://lichess.org/CDZ1vD3w             JuleVerne       FM       2610    0         2620    0.0             1.0
12  https://lichess.org/VKSXZcFl  HorrendousBrilliancy       GM       2635    0         2929    0.0             1.0
13  https://lichess.org/FDZhQKDZ             justantan       GM       2642    0         2850    0.0             1.0
14  https://lichess.org/KuWdI0F9      juancruzariasTDF       CM       2530    0         2707    0.0             1.0
15  https://lichess.org/OwWgh9ie           Tahaned2015       IM       2386    0         2607    0.0             1.0
16  https://lichess.org/WUewJm1l           jeffforever       FM       2474    0         2617    0.0             1.0
17  https://lichess.org/KvBwiY2T                gmluke       GM       2677    0         2737    0.0             1.0
18  https://lichess.org/CLzKn17H             rickyrich       NM       2339    0         2313    0.0             1.0
19  https://lichess.org/eSiPhCGg             rickyrich       NM       2295    0         2313    0.0             1.0
The pgnrating column is the Lichess blitz rating of Nezh-BOT's opponent at the time of the encounter.
The blitzrating column is the current Lichess blitz rating of Nezh-BOT's opponent.

I tried to calculate its performance using the pgnrating column, since those were the ratings at the time of the actual encounters. Take the mean and convert it to a FIDE rating. Then take the score rate and use the FIDE table to get the rating difference for that score rate. The FIDE performance is opp_rating + rating diff; convert that back to a Lichess blitz rating to get its blitz rating based on those titled opponents.

Nezh-BOT opp mean lichess blitz rating: 2548 (using pgnrating column)
Nezh-BOT score rate: 0.975

FIDE mean rating conversion: 2308 from 2548 (using fide=181 + 0.83458xlichess)
FIDE rating table 8.1a, 0.975 is around rating diff = 538

Nezh-BOT FIDE perf rating = 2308 + 538 = 2846
Nezh-BOT Lichess Blitz rating conversion: 3193 (using lichess = (fide - 181) / 0.83458)

Nezh-BOT's Lichess blitz rating is around 3193, based on titled, non-bot, non-TOS-violating opponents.
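
The steps above can be sketched in Python. This is only an illustration of the arithmetic, not the script that produced the numbers: the conversion formula and the 538-point difference (read off FIDE table 8.1a) are taken from the post as given, while the function and variable names are invented for the example.

```python
# Sketch only: reproduces the arithmetic quoted in the post above.
# The conversion formula and the 538-point rating difference come from the post;
# every function and variable name here is made up for illustration.

# pgnrating column: opponent's Lichess blitz rating at the time of each game
PGN_RATINGS = [2657, 2656, 2653, 2654, 2655, 2656, 2488, 2488, 2489, 2489,
               2490, 2610, 2635, 2642, 2530, 2386, 2474, 2677, 2339, 2295]

# Nezh-BOT_score column: one draw (row 1), wins everywhere else
BOT_SCORES = [1.0, 0.5] + [1.0] * 18


def lichess_to_fide(lichess):
    """Linear conversion quoted in the thread."""
    return 181 + 0.83458 * lichess


def fide_to_lichess(fide):
    """Inverse of the conversion above."""
    return (fide - 181) / 0.83458


def performance(opp_ratings, scores, rating_diff):
    """Follow the steps in the post, rounding at each step as the post does.

    rating_diff is the value looked up by hand in the FIDE table for the
    observed score rate; it is an input here, not computed.
    """
    mean_opp = round(sum(opp_ratings) / len(opp_ratings))
    score_rate = sum(scores) / len(scores)
    fide_mean = round(lichess_to_fide(mean_opp))
    fide_perf = fide_mean + rating_diff
    lichess_perf = round(fide_to_lichess(fide_perf))
    return {
        "mean_opp": mean_opp,
        "score_rate": score_rate,
        "fide_mean": fide_mean,
        "fide_perf": fide_perf,
        "lichess_perf": lichess_perf,
    }


result = performance(PGN_RATINGS, BOT_SCORES, rating_diff=538)
print(result)
```

Run on the table above, it gives the same figures as the post: 2548, 0.975, 2308, 2846 and 3193.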
Nice! So the estimated FIDE rating of 2846 is almost exactly the middle of the 2800 to 2900 estimated CCRL rating, although of course with only one draw given up the margin of error is huge. Similarly the estimated FIDE rating for MiniHuman based on your formula is close to the CCRL blitz rating we measured for it (now 2233). It seems that CCRL blitz ratings are not so far off from FIDE ratings for blitz games between humans and engines. Do you know at what time control these blitz games were played on LiChess (could be anywhere from 3-0 to 5-3 I think)? Also, are there any engines on your list with Lichess blitz ratings that are standard A/B engines not using nets which have ratings based solely (or almost solely) on playing humans and many opponents within a hundred elo or so?
Komodo rules!
lkaufman

Re: Lichess Blitz rating to FIDE rating

Post by lkaufman »

lkaufman wrote: Fri Dec 24, 2021 12:36 am
lkaufman wrote: Wed Dec 22, 2021 11:45 pm
Ferdy wrote: Wed Dec 22, 2021 7:48 am
Ferdy wrote: Wed Dec 22, 2021 2:57 am
lkaufman wrote: Tue Dec 21, 2021 5:32 pm This is interesting and quite reasonable; the roughly 6 to 5 spread in ratings from FIDE to Lichess blitz is about what I would expect due to the smaller percentage of draws in blitz. Is there any data that would allow you to compare CCRL blitz ratings with Lichess blitz ratings? Due to the compression of CCRL ratings by BayesElo and the expansion of LiChess blitz ratings, I would expect that the relationship would be fairly close to linear.
There are maia bots on Lichess; maia1, for example, has a blitz rating of around 1478. It is just lc0 plus the maia NN file maia-1100.pb.gz. It can be run with a nodes=1 limit, and you may use older versions such as Lc0 v0.26. It is better to use the CPU version to generate more games in a shorter time if the CPU has more threads.

maia9 has a blitz rating of around 1770 using the maia-1900.pb.gz NN. This can easily be compared with CCRL engines around this strength level.

There are other Lichess bots, but reproducing them for CCRL on our PC can be a challenge. maia is easier. I think the authors of those bots can run them under CCRL settings.

There is MiniHuman by dkappe with a higher blitz rating. Reading the description, I think this can easily be adapted for CCRL. It has a limit of around 2500 nodes.
Mean Girl 8 (32x4) -- the most fun leela-style network -- looking at ~2500 nodes on a Raspberry Pi 3. Will play casual and rated Blitz, Rapid and Classical with at least 3 sec increment in both standard and chess960. Will move almost instantly. Designed to be a reasonable sparring partner. Uses a gambit book for extra entertainment value.
Profile of the MiniHuman BOT. In rated blitz it has played only a single BOT, for 7 games. There are 24 opponents who violated the Lichess TOS. Lichess has a mechanism to refund rating lost when an opponent violates the TOS, so the rating is not affected in this case.

TOS = Terms of Service.

profile of MiniHuman:
   userid  bgames  blitzrating  tos title
MiniHuman    1297         2433    0   BOT

number of unique opp names: 141

unique opp names who violates TOS: 24

unique opp names who do not violates TOS: 117

name of bot opp:
    userid  bgames  blitzrating  tos title
Leela1Node     459         2143    0   BOT

number of games played when opp is a bot: 7
opp with rating 2000 and above that are not TOS violators:

                 userid  bgames  blitzrating  tos     title
0   Kudritsky_Maksim_04    9516         2523    0  no_title
1               itay121    2692         2467    0  no_title
2            Maharlikan     582         2344    0  no_title
3          DOLPHIN_2012    3251         2332    0  no_title
4            m_kastriot    7733         2294    0  no_title
5             Thxultra2   10244         2291    0  no_title
6            schpringer    8563         2280    0  no_title
7            Karagialis    4374         2261    0  no_title
8             Michalsos    5501         2259    0  no_title
9          Wildcard1659    5152         2249    0  no_title
10       the_providence     202         2244    0  no_title
11             skoyen99     805         2243    0        FM
12         Dingdongking    1858         2231    0  no_title
13       completemagnet     301         2226    0  no_title
14        Mai-San_skida      21         2207    0  no_title
15         DeltaZero_99    3201         2204    0  no_title
16           EarlyLight      64         2204    0  no_title
17             kapibarr    7075         2176    0  no_title
18              rutvik3    1187         2176    0  no_title
19           Leela1Node     459         2143    0       BOT
20          never__quit    9388         2120    0  no_title
21          Soni_Atharv    3375         2119    0  no_title
22       MichaelLambert    4017         2109    0  no_title
23             petermac    5861         2107    0  no_title
24           PiecePeace    1451         2103    0  no_title
25      MasterofUnknown       3         2072    0  no_title
26        checkmatetrix    1513         2060    0  no_title
27              Nahia12    1521         2054    0  no_title
28          DanTheMan82    1216         2053    0  no_title
29              Plaskad     112         2035    0  no_title
30               Speed1   28916         2028    0  no_title
31              Vkiller    2401         2026    0  no_title
32             JanHudak    1631         2025    0  no_title
33        Quick_chess65     380         2024    0  no_title
34          sirprimal11    6300         2021    0  no_title
35                Noyar     827         2016    0  no_title
36             Creignor     367         2004    0  no_title
The ratings in the list are the current ratings; the ratings at the time of the actual games may not be the same.

So I guess this is a good candidate to connect to CCRL Blitz, close to this level of play (Lichess Blitz 2400).
Mean Girl net is at https://github.com/dkappe/leela-chess-w ... -style-net. Then get Lc0 to run it at around 2500 nodes per move according to its profile in Lichess. I am not sure which Lc0 version to use, maybe from year 2019 or 2020.
We're testing it now vs ccrl engines at ccrl blitz tc. dkappe says the correct number of nodes is 3000, not 2500.
So far MiniHuman is at 2227 under CCRL Blitz conditions, just 206 elo below its lichess blitz rating. This is rather surprising to me, since lichess blitz ratings are a class or so above FIDE ratings at this level; in fact 2433 lichess converts to 2212 FIDE by the formula given in this thread. Pretty much a perfect match between FIDE and CCRL blitz for this one engine, allowing for MOE. So assuming that there aren't a lot of cheaters who avoided detection in this sample, the engines may be weaker in blitz than I thought they were around this level; perhaps I overestimated the "contraction" factor going from engine ratings to human ratings? Maybe the relationship between ccrl and lichess is not so far from linear. I think we need some data around the 1500 level or so.
MiniHuman ended up at 2233 under CCRL Blitz conditions, exactly 200 below its lichess blitz rating. However I overlooked one point; looking at the LiChess games, it seems that most of the rated games against strong human opponents have been "slow blitz", with 5' + 3" being very common and perhaps close to average (some games 3' + 3", but some 5' + 4" or 3' + 7" and such). But a bot with fixed number of nodes won't play any better at 5' + 3" than it would at 2' + 1". I suppose the CCRL Blitz rating at 5' + 3" would be about a hundred elo lower. So perhaps we should say that CCRL + 300 is a fair estimate for the lichess blitz rating needed by a human to have an even chance at 5' + 3" blitz, at least in the 2100 CCRL ballpark. Even that seems a bit hard for me to believe, I would think ccrl + 400 seems more realistic, but this is just one datapoint. I think it would be better for this purpose to use a normal engine (not one with a net) that uses the allotted time; any candidates with enough human data?
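
The offset arithmetic in this post can be written out as a short sketch. The ratings and the conversion formula are the ones quoted in the thread; the 100-point slow-blitz penalty is the rough guess made above, not a measured value, and the function names are invented for the example.

```python
# Sketch only: restates the arithmetic from the post above.
# Names are illustrative; the 100 Elo penalty is the post's guess, not a measurement.

LICHESS_BLITZ = 2433       # MiniHuman's Lichess blitz rating
CCRL_BLITZ = 2233          # MiniHuman's final rating under CCRL Blitz conditions
SLOW_BLITZ_PENALTY = 100   # guessed loss for a fixed-node bot at 5' + 3"


def lichess_to_fide(lichess):
    """Linear conversion quoted earlier in the thread."""
    return 181 + 0.83458 * lichess


def lichess_minus_ccrl(lichess, ccrl, tc_penalty=0):
    """Gap between the Lichess rating and the CCRL rating.

    tc_penalty lowers the CCRL figure to reflect the slower time control.
    """
    return lichess - (ccrl - tc_penalty)


raw_gap = lichess_minus_ccrl(LICHESS_BLITZ, CCRL_BLITZ)
adjusted_gap = lichess_minus_ccrl(LICHESS_BLITZ, CCRL_BLITZ, SLOW_BLITZ_PENALTY)
fide_equivalent = round(lichess_to_fide(LICHESS_BLITZ))

print(raw_gap, adjusted_gap, fide_equivalent)
```

The three values correspond to the "exactly 200", "CCRL + 300" and "2212 FIDE" figures mentioned in the posts.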
Raphexon
Posts: 476
Joined: Sun Mar 17, 2019 12:00 pm
Full name: Henk Drost

Re: Lichess Blitz rating to FIDE rating

Post by Raphexon »

lkaufman wrote: Fri Dec 24, 2021 8:14 pm
MiniHuman ended up at 2233 under CCRL Blitz conditions, exactly 200 below its lichess blitz rating. However I overlooked one point; looking at the LiChess games, it seems that most of the rated games against strong human opponents have been "slow blitz", with 5' + 3" being very common and perhaps close to average (some games 3' + 3", but some 5' + 4" or 3' + 7" and such). But a bot with fixed number of nodes won't play any better at 5' + 3" than it would at 2' + 1". I suppose the CCRL Blitz rating at 5' + 3" would be about a hundred elo lower. So perhaps we should say that CCRL + 300 is a fair estimate for the lichess blitz rating needed by a human to have an even chance at 5' + 3" blitz, at least in the 2100 CCRL ballpark. Even that seems a bit hard for me to believe, I would think ccrl + 400 seems more realistic, but this is just one datapoint. I think it would be better for this purpose to use a normal engine (not one with a net) that uses the allotted time; any candidates with enough human data?
simpleEval probably has the most human data, even if it's primarily at bullet (and with a few cheaters here and there).
My 2nd post also has a nice string of titled players, a large sample size, and games longer than bullet.

https://lichess.org/V9jNeEMG
27-1 for Simple vs a GM, though some games are hyperbullet (30 seconds).

https://lichess.org/VDu6zbRB/black
143-5 vs an IM. But a lot / most at hyperbullet.

https://lichess.org/N4Rikz1R
Another strong player, but a lot at hyperbullet again.

https://lichess.org/@/simpleEval
I know it uses a very shallow (and weak) opening book to add variety, but no idea which one specifically. Runs on a Pi4B.

Very "fun" opponent. You're going to get an advantage which you'll likely blunder.
Its positional understanding is horrible, its tactics are bonkers.
Last edited by Raphexon on Fri Dec 24, 2021 8:57 pm, edited 1 time in total.
Raphexon

Re: Lichess Blitz rating to FIDE rating

Post by Raphexon »

Some more results:

https://lichess.org/EbBJ0Iux/black

https://lichess.org/tc50EqKV/black (results suspicious imo)

https://lichess.org/0vxkgPmi/black
A NM who fought a lot of rapid games vs simple. 32.5-2.5 for Simple.

https://lichess.org/S8I7N5qX/black
Another huge sample size of bullet games vs a strongish player

https://lichess.org/mNKL56St/black
2 games at 3 min vs an IM

https://lichess.org/uH5t06ee
18-0 at rapid (10m+5s) vs a strongish player.

https://lichess.org/4rCsfcV2/black
Nice sample size at longer TC vs an ok player.

https://lichess.org/xvPszRqH/black
Another good sample size vs a strong player.

https://lichess.org/oWM0NHDj
Strong blitz player. 9.5-0.5 for Simple.

https://lichess.org/GcsrPB8X/black
Another GM getting crushed by Simple. All games at bullet (1m+0) though.

https://lichess.org/PxtepTYR/black
A few games at 3m+2s vs an FM

https://lichess.org/GAI2Vwkb/black
2 games at 3m+2s from a NM

https://lichess.org/tN2sQnuK/black
Strong player at blitz, large sample at 3m+0.

https://lichess.org/A5hcQopF
Simple going 14-0 at 3m+0 vs a NM


So a lot of games. Includes titled players. From Bullet to Rapid.
Cornfed
Posts: 511
Joined: Sun Apr 26, 2020 11:40 pm
Full name: Brian D. Smith

Re: Lichess Blitz rating to FIDE rating

Post by Cornfed »

amanjpro wrote: Wed Dec 22, 2021 4:11 am Lichess bots are highly underrated. Thanks to all the SF bots that are running on the site and hammer the self-made bots.

But looking at Zahak for example even when it was around CCRL 2100 and running on a raspberry pi 3 (32 bit) it still could beat all human players that it has played.
I am guessing Larry is trying to pull in a lot of data points to try to back into reasonable approximations for human elo with Dragon?

This would seem perhaps a bit time consuming, but I wonder if he could find a good approximation of his current strength at various time controls, play x games against Dragon under a few time controls, and use that for approximation...then attempt to extrapolate by lowering Dragon a couple of hundred points, playing the same number of games and seeing if the results fit somewhat.

Or simply find one weaker engine most agree has a pretty accurate rating under a couple of TC's and how it already corresponds to human elo, let Dragon play against it and adjust based on the result?

At least those are 'controlled experiments' as opposed to all the issues with trying to pull data from websites where...frankly, people don't always play close to their optimal (lack of sleep, playing against 'students', games to pass the time while waiting for dinner to cook, etc)...differences often exacerbated by playing against engines. Engines don't normally miss simple tactics while human vs engine (or human) play would be littered with them so I would think that would be where most of the volatility would come into play...seems like that would be hard to program for.
lkaufman

Re: Lichess Blitz rating to FIDE rating

Post by lkaufman »

Raphexon wrote: Fri Dec 24, 2021 8:40 pm
https://lichess.org/@/simpleEval
I know it uses a very shallow (and weak) opening book to add variety, but no idea which one specifically. Runs on a Pi4B.

Very "fun" opponent. You're going to get an advantage which you'll likely blunder.
Its positional understanding is horrible, its tactics are bonkers.
Since I'm unfamiliar with "simple" and also unfamiliar with the "Pi4b", can you point to any data that would estimate its rating on that hardware when playing against engines with known CCRL or CEGT ratings? I'm looking for an engine with both a reliable engine vs. engine rating and a reliable engine vs. humans Lichess rating.
lkaufman

Re: Lichess Blitz rating to FIDE rating

Post by lkaufman »

Cornfed wrote: Fri Dec 24, 2021 9:23 pm
I am guessing Larry is trying to pull in a lot of data points to try to back into reasonable approximations for human elo with Dragon?

Yes, you understand my goal. Yes, I do try to estimate dragon level ratings by playing them myself, but I don't want to rely too much on just my personal experience. I have a 2300 blitz rating on Lichess after only 25 games, perhaps it will rise with more experience there. I totally like your quote "Or simply find one weaker engine most agree has a pretty accurate rating under a couple of TC's and how it already corresponds to human elo, let Dragon play against it and adjust based on the result?". But so far no one has named such an engine. Do you have any suggestions?
Raphexon

Re: Lichess Blitz rating to FIDE rating

Post by Raphexon »

lkaufman wrote: Fri Dec 24, 2021 9:58 pm
Since I'm unfamiliar with "simple" and also unfamiliar with the "Pi4b", can you point to any data that would estimate its rating on that hardware when playing against engines with known CCRL or CEGT ratings? I'm looking for an engine with both a reliable engine vs. engine rating and a reliable engine vs. humans Lichess rating.
simpleEval only uses material evaluation. (P=1, N=3, B=3, R=5, Q=9)
And a small random component (which is very important)
Besides that it's mostly Stockfish. You can play it yourself, you will get an advantage. You'll likely not convert.
https://github.com/vondele/Stockfish/tree/simpleEval

The Pi4B is the hardware it runs on at Lichess: a Raspberry Pi4B.
I think per core it's about 4x as slow as the i7-4770K CCRL uses as a baseline.

From my testing it's about 50-100 elo weaker than Glaurung 2.2. I also tested it vs Cheese with similar results.
I'd say in the ballpark of 2700 CCRL 40/4.

Taking into account the slower hardware it's using on Lichess I'd expect it to be roughly 2500 CCRL 40/4.
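
The hardware adjustment above can be expressed as a simple rule of thumb. The post does not state a per-doubling figure; the 100 Elo per speed doubling used below is only the value implied by going from 2700 to 2500 on hardware that is 4x slower, and should be treated as an assumption. The function name is invented for the example.

```python
import math

# Sketch only. ELO_PER_DOUBLING is an assumption inferred from the post's own
# numbers (2700 -> 2500 for a 4x slowdown), not a measured constant.
ELO_PER_DOUBLING = 100.0


def adjust_for_hardware(ccrl_rating, slowdown, elo_per_doubling=ELO_PER_DOUBLING):
    """Estimate the rating on hardware that is `slowdown` times slower.

    Each halving of speed is assumed to cost a fixed number of Elo points.
    """
    return ccrl_rating - elo_per_doubling * math.log2(slowdown)


estimate = adjust_for_hardware(2700, 4)
print(estimate)
```

A different per-doubling value would shift the estimate accordingly, so the 2500 figure is only as good as that assumption.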
lkaufman

Re: Lichess Blitz rating to FIDE rating

Post by lkaufman »

Raphexon wrote: Fri Dec 24, 2021 10:15 pm
simpleEval only uses material evaluation. (P=1, N=3, B=3, R=5, Q=9)
And a small random component (which is very important)
Besides that it's mostly Stockfish. You can play it yourself, you will get an advantage. You'll likely not convert.
https://github.com/vondele/Stockfish/tree/simpleEval

The Pi4B is the hardware it runs on at Lichess: a Raspberry Pi4B.
I think per core it's about 4x as slow as the i7-4770K CCRL uses as a baseline.

From my testing it's about 50-100 elo weaker than Glaurung 2.2. I also tested it vs Cheese with similar results.
I'd say in the ballpark of 2700 CCRL 40/4.

Taking into account the slower hardware it's using on Lichess I'd expect it to be roughly 2500 CCRL 40/4.
The results on Lichess suggest a blitz rating there of over 3000 and a rapid rating not far below 3000. These are higher numbers than I would expect for a CCRL 2500 engine. It seems that chess knowledge helps more vs engines than vs humans, which seems counterintuitive.
Cornfed

Re: Lichess Blitz rating to FIDE rating

Post by Cornfed »

lkaufman wrote: Fri Dec 24, 2021 10:04 pm
Yes, you understand my goal. Yes, I do try to estimate dragon level ratings by playing them myself, but I don't want to rely too much on just my personal experience. I have a 2300 blitz rating on Lichess after only 25 games, perhaps it will rise with more experience there. I totally like your quote "Or simply find one weaker engine most agree has a pretty accurate rating under a couple of TC's and how it already corresponds to human elo, let Dragon play against it and adjust based on the result?". But so far no one has named such an engine. Do you have any suggestions?
No, but...

It's almost Christmas and my mind is elsewhere, but don't your answers pretty much depend on 2 things:

1. A single established correlation between some version of Dragon and a good human player, and
2. Your ability to adjust Dragon's playing strength with suitable tweaks?

Example: establish a series of tweaks so that you have a version of Dragon which scores roughly 50% against a 2500 elo human. A baseline. The data is already out there for the expected percentage your average 2500 human is likely to score in standard OTB chess against your average human rated 2700, 2400, 2200, etc.

You then run a series of matches between this ‘2500’ elo version of Dragon and various other tweaked versions of Dragon. Then you comb through the results to find versions which score at roughly the same rate against your 2500 Dragon as various human elo would against your average 2500 human.
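
The matching step described above could be sketched roughly as follows. Two things here are assumptions and not part of the proposal: the standard logistic Elo expectation is used as a stand-in for the empirical human-vs-human score data, and the version names and match scores are hypothetical placeholders, not results.

```python
# Sketch only. The logistic Elo formula stands in for real human score data,
# and the Dragon version names and scores below are hypothetical placeholders.

def expected_score(rating, opponent):
    """Expected score of `rating` against `opponent` under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def closest_version(match_scores, target_rating, baseline_rating=2500):
    """Pick the tweaked version whose score against the baseline engine is
    closest to what a human of target_rating would be expected to score
    against a human of baseline_rating.

    match_scores maps a version name to its score rate vs the baseline engine.
    """
    target = expected_score(target_rating, baseline_rating)
    return min(match_scores, key=lambda name: abs(match_scores[name] - target))


# Hypothetical score rates of three tweaked versions against the "2500" version.
placeholder_scores = {"dragon_a": 0.76, "dragon_b": 0.36, "dragon_c": 0.15}

for human_rating in (2700, 2400, 2200):
    print(human_rating, closest_version(placeholder_scores, human_rating))
```

Whether engine-vs-engine score rates map onto human-vs-human ones in this way is exactly the open question in this thread, so the sketch only shows the bookkeeping.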