CCRL scaling versus human player

xr_a_y · Post by **xr_a_y** » Thu Jun 20, 2019 9:52 am

I guess this is already a well documented subject but let me ask it my way.

I wonder whether some kind, (very) good chess player would agree to play a few games against mid-range engines (2200-2700 CCRL Elo) at the CCRL 40/4 time control, so that CCRL ratings could be rescaled to FIDE ratings.

pedrox · Post by **pedrox** » Thu Jun 20, 2019 11:58 am

There is a website that lists the Elo ratings of the chess computers of the 80s and 90s. Those machines played in tournaments against humans far more often than current engines do, so this list is assumed to have Elo values close to FIDE Elo. The old SSDF lists also seem to give the most realistic Elo values.

https://www.schach-computer.info/wiki/i ... -Elo-Liste

I played against one of these machines, the Mephisto Roma32, which has an Elo of 2075 on the wiki list but only about 1575 on CCRL (many of these machines can nowadays be played as UCI engines): a difference of 500 points! The difference is not linear; at roughly 700 Elo the two lists are about equal.

My Elo list: https://sites.google.com/site/motoresde ... lo-compleo
Look for Mephisto Roma32 14 MHz; Roma32 plays there as Darky.

In my engine, when playing at a reduced Elo level, I distinguish whether it plays against another engine or against a chess machine or a human. Against humans I make it another 100 Elo points weaker than against machines, so that they don't complain.
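A minimal sketch of that opponent-dependent setting, assuming a hypothetical `effective_elo` helper. The 100-point human handicap comes from the post; the engine-to-machine calibration shift (`machine_offset`) is an assumption, since the post gives no value for it:

```python
from enum import Enum


class Opponent(Enum):
    ENGINE = "engine"
    MACHINE = "machine"  # dedicated chess computer from the 80s-90s
    HUMAN = "human"


def effective_elo(requested: int, opponent: Opponent, machine_offset: int = 0) -> int:
    """Return the internal strength level to emulate for a requested Elo.

    machine_offset is a hypothetical calibration shift between the engine-vs-engine
    scale and the machine/human scale; its value is not given in the post.
    Humans get a further 100-point handicap on top of the machine setting.
    """
    if opponent is Opponent.ENGINE:
        return requested
    level = requested + machine_offset
    if opponent is Opponent.HUMAN:
        level -= 100  # "another 100 points of Elo easier" than against machines
    return level
```

For example, `effective_elo(2000, Opponent.HUMAN)` gives 1900, while the same request against an engine stays at 2000.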

Surely Larry Kaufman will be able to give good information.

xr_a_y · Post by **xr_a_y** » Thu Jun 20, 2019 6:06 pm

I'd really love to know a FIDE-like rating for Minic (Danasah is at more or less the same level). I put Minic on Lichess, but mostly engines play against it, and most often only stronger engines, probably trying to grab Elo point by point... So Minic is "only" 2200 on Lichess. But no human player has ever won against it there.

My plan is to work on a "level" functionality, and I'd like to offer a good scaling...

Laskos · Post by **Laskos** » Thu Jun 20, 2019 8:30 pm

xr_a_y wrote: ↑Thu Jun 20, 2019 9:52 am I guess this is already a well documented subject but let me ask it my way.

I wonder whether some kind, (very) good chess player would agree to play a few games against mid-range engines (2200-2700 CCRL Elo) at the CCRL 40/4 time control, so that CCRL ratings could be rescaled to FIDE ratings.

If you only have ratings from engine-vs-engine games and want an estimate of the human FIDE rating at tournament time control, and your engine scales normally to longer time controls, do roughly the following:

Say you get a computer rating of 2300 on CCRL 40/4. The human FIDE rating at tournament time control is then roughly 2800 - (2800 - 2300)*0.7 ~ 2450 FIDE Elo. Add 100-150 Elo points (so 2550-2600 FIDE Elo) for blitz ratings, because engines are especially strong against humans at blitz. That factor of 0.7 is the "compression factor" of engine ratings when playing against humans.
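The rule of thumb above as a small function. The name `ccrl_to_fide` is hypothetical; the 2800 anchor and 0.7 factor follow the post, and the blitz bonus is left to the caller (100-150 per the post). It is a rough estimate, not a calibrated conversion:

```python
def ccrl_to_fide(ccrl: float, factor: float = 0.7, anchor: float = 2800.0,
                 blitz_bonus: float = 0.0) -> float:
    """Rough FIDE estimate from a CCRL 40/4 rating.

    The engine rating is compressed toward `anchor` by `factor`; pass
    blitz_bonus in the 100-150 range for a blitz rather than classical estimate.
    """
    return anchor - (anchor - ccrl) * factor + blitz_bonus
```

With the defaults, `ccrl_to_fide(2300)` reproduces the 2450 above, and `ccrl_to_fide(2300, blitz_bonus=150)` the upper blitz figure of 2600. Ratings above the anchor are compressed the same way, toward 2800 from above.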

xr_a_y · Post by **xr_a_y** » Thu Jun 20, 2019 9:12 pm

So if I understand you correctly, an engine at 2800 on CCRL 40/4 corresponds to a human at 2800 FIDE.

Laskos · Post by **Laskos** » Thu Jun 20, 2019 10:04 pm

xr_a_y wrote: ↑Thu Jun 20, 2019 9:12 pm
So if I understand you correctly, an engine at 2800 on CCRL 40/4 corresponds to a human at 2800 FIDE.

Yes, roughly.

jorose · Post by **jorose** » Thu Jun 20, 2019 10:46 pm

I feel it is conservative to rate 2800 CCRL as only 2800 FIDE. That being said, there are a lot of factors that complicate matters.

Humans memorize openings, which means they play far above their rating in the opening, until they are out of book. I don't know whether your context lets you use opening books, but to smooth things out I would consider adding one. The book could be built from human games at the specified rating.
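A minimal sketch of such a rating-band book, assuming games are already parsed into (white Elo, black Elo, SAN move list) tuples. `build_book` and `book_move` are hypothetical names; a real engine would key positions by a position hash rather than by move sequence, so that transpositions share entries:

```python
import random
from collections import Counter, defaultdict


def build_book(games, lo, hi, max_plies=12):
    """Count moves played from each move sequence, using only games where
    both players are rated within [lo, hi]."""
    book = defaultdict(Counter)
    for white_elo, black_elo, moves in games:
        if not (lo <= white_elo <= hi and lo <= black_elo <= hi):
            continue
        for ply, move in enumerate(moves[:max_plies]):
            book[tuple(moves[:ply])][move] += 1
    return book


def book_move(book, history, rng):
    """Pick a book move weighted by how often humans played it, or None if out of book."""
    counts = book.get(tuple(history))
    if not counts:
        return None  # out of book: hand over to the search
    moves = list(counts)
    weights = [counts[m] for m in moves]
    return rng.choices(moves, weights=weights, k=1)[0]
```

Weighting by frequency makes the engine leave book about where players of that rating tend to, rather than following the main line as deep as a master would.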

Engine strength is hardware dependent. You may want to find a way to normalize engine strength across different hardware. I would bet on Carlsen against Minic on a Raspberry Pi, but on Minic against Carlsen on TCEC hardware.
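One possible way to normalize is to scale the engine's think time by the ratio of benchmark speeds, or to give it a fixed node budget, so that it searches about as many nodes as it would on a reference machine. A sketch, assuming both nodes-per-second figures come from the same fixed benchmark and that single-threaded node count is what mostly determines strength; the function names are hypothetical:

```python
def normalized_time_ms(base_time_ms: float, reference_nps: float, local_nps: float) -> float:
    """Scale a think time so the search visits roughly as many nodes as it would
    on the reference machine (both nps values assumed to come from the same benchmark)."""
    if reference_nps <= 0 or local_nps <= 0:
        raise ValueError("nodes-per-second figures must be positive")
    return base_time_ms * reference_nps / local_nps


def node_budget(base_time_ms: float, reference_nps: float) -> int:
    """Alternative: a fixed node limit equal to what the reference machine would search."""
    return int(reference_nps * base_time_ms / 1000)
```

A machine four times faster than the reference would then get 250 ms where the reference gets 1000 ms; a slow board such as a Raspberry Pi would get proportionally more time. A node budget avoids timing noise entirely, at the cost of no longer respecting the wall clock.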

xr_a_y · Post by **xr_a_y** » Fri Jun 21, 2019 6:46 am

Yes, I think I'll play with the following things to define levels:
- fixed-depth search
- enabling or disabling some eval features
- enabling or disabling pruning
- not always playing the best move (multi-PV search)
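A minimal sketch of how those knobs could be bundled into a level and how the multi-PV choice might work. The `Level` fields and `pick_move` are hypothetical, not Minic's actual code, and picking uniformly among moves within a centipawn margin of the best is just one simple option:

```python
import random
from dataclasses import dataclass


@dataclass
class Level:
    max_depth: int          # fixed-depth search limit
    use_eval_extras: bool   # toggle for optional evaluation terms
    use_pruning: bool       # toggle for forward pruning
    multipv_margin_cp: int  # accept any move within this many centipawns of the best


def pick_move(candidates, level, rng):
    """candidates: list of (move, score_cp) pairs from a multi-PV search, in any order."""
    best = max(score for _, score in candidates)
    playable = [move for move, score in candidates
                if best - score <= level.multipv_margin_cp]
    return rng.choice(playable)
```

With a margin of 0 the engine always plays its top move; widening the margin at lower levels lets it choose among weaker but still plausible moves, which tends to look more human than random blunders.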

I'll post a table of results soon.

lkaufman · Post by **lkaufman** » Fri Jun 21, 2019 8:32 pm

jorose wrote: ↑Thu Jun 20, 2019 10:46 pm I feel it is conservative to rate 2800 CCRL as only 2800 FIDE. That being said, there are a lot of factors that complicate matters.

Humans memorize openings, which means they play far above their rating in the opening, until they are out of book. I don't know whether your context lets you use opening books, but to smooth things out I would consider adding one. The book could be built from human games at the specified rating.

Engine strength is hardware dependent. You may want to find a way to normalize engine strength across different hardware. I would bet on Carlsen against Minic on a Raspberry Pi, but on Minic against Carlsen on TCEC hardware.

The CCRL list is indeed conservative by FIDE standards. Let's assume the engine gets a good book which tries to leave theory early when that doesn't cost too much. Assume also the hardware specified as standard by CCRL, which is way below current PC speeds. Note that the number of threads is not an issue, as they rate 1 and 4 threads separately. Let's use CCRL 40/40, as it is obviously more relevant for tournament chess than 40/4. In 1998 Junior 5 achieved a 2700 FIDE performance in 9 games vs. top players, and while it is too old to even appear on CCRL, by extrapolation it would surely not be higher than 2600 on this list. But the hardware in 1998 was way below even the modest hardware specified by CCRL. The same conclusion would be reached by looking at the various matches from around 2003 or the Kramnik vs. Fritz match of 2006: even a 2700 engine on CCRL would get a FIDE rating above 2800 under the above assumptions. Note that Deep Fritz 10, which beat Kramnik in that match, is at 2830 on the list, but it gave Kramnik several handicaps, such as letting him see the engine's book during the game (!), giving him a copy of the engine to practice against for months, limiting tablebases to 5-man, etc. On the CCRL specified hardware, with no special conditions like those, I believe that Fritz 10 on 4 CPUs would easily defeat Carlsen in a match today.
One other point: CCRL uses Bayeselo which contracts the ratings considerably from normal elo, so although I totally agree with Kai about scaling engine results down to 70%, this is really correct just for CEGT. For CCRL much of the contraction is already done by bayeselo, so maybe 85% or so might be the right figure for CCRL 40/40. Of course blitz results are more spread out, so maybe 70% is actually about right for CCRL 40/4.
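A sketch of choosing the compression factor by rating list, using the figures from the thread (70% for CEGT and CCRL 40/4, about 85% for CCRL 40/40). The dictionary and `fide_estimate` are hypothetical; the 2800 anchor is Laskos's rule of thumb and is kept as a parameter, since the argument above suggests it may be too conservative for CCRL:

```python
# Compression factors as suggested in the thread; rough judgments, not fitted values.
COMPRESSION = {
    "CEGT": 0.70,
    "CCRL 40/4": 0.70,
    "CCRL 40/40": 0.85,
}


def fide_estimate(rating: float, rating_list: str, anchor: float = 2800.0) -> float:
    """Compress an engine-list rating toward `anchor` by that list's factor."""
    factor = COMPRESSION[rating_list]
    return anchor - (anchor - rating) * factor
```

A 2700 CCRL 40/40 engine comes out at about 2715 with the default anchor; reflecting the hardware and book points above would mean raising `anchor`, not changing the factor.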

xr_a_y · Post by **xr_a_y** » Fri Jun 21, 2019 9:00 pm

Thanks a lot for this input.

Let me go back to my initial idea anyway.
Would it be possible to ask some human masters to officially play against mid-range engines?
I guess some influential people here know human masters, or that some members are masters themselves. It wouldn't be technically difficult to organize, on Lichess for example.
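If such games were played, one simple way to turn them into a FIDE-like number is a performance rating: the rating at which the Elo expected score against the actual opponents equals the score obtained. A sketch using the logistic Elo curve solved by bisection; `performance_rating` is a hypothetical name, and FIDE's own performance calculation uses a lookup table, so results may differ slightly:

```python
def performance_rating(opponent_ratings, score: float) -> float:
    """Rating whose logistic Elo expected score against these opponents equals `score`.

    opponent_ratings: opponent rating for each game played.
    score: points obtained (win = 1, draw = 0.5, loss = 0).
    """
    n = len(opponent_ratings)
    if not 0 < score < n:
        raise ValueError("score must be strictly between 0 and the number of games")

    def expected(rating: float) -> float:
        # Standard Elo expectation, summed over all games.
        return sum(1.0 / (1.0 + 10 ** ((opp - rating) / 400.0)) for opp in opponent_ratings)

    lo, hi = -2000.0, 6000.0  # bracket wide enough for any realistic result
    for _ in range(100):  # expected() increases with rating, so bisection converges
        mid = (lo + hi) / 2
        if expected(mid) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For instance, 3 points from 4 games against 2400-rated masters gives a performance of about 2591. A perfect or zero score has no finite performance, which is why the function rejects it.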
