which is more realistic in human terms, ccrl or cegt?

lkaufman · Post by **lkaufman** » Thu Jul 23, 2020 3:49 am

The CCRL 40/15 list and the CEGT "40/20" (maybe 40/8 or so on modern hardware) lists pretty much agree around the 3500 level, but for engines in the range of strong human amateur players, say 2100-2400 FIDE or so, the CCRL ratings for most engines are far higher than the CEGT ratings, by maybe 200 to 250 points or so. This may be due to using BayesElo vs Ordo, but whatever the reason, I'm simply asking strong human players, or people who have observed games by strong human players against such engines, to express their opinion as to which list is closer to human FIDE ratings in that rating range, for games played at fairly slow "rapid" time controls on a typical modern 3 GHz machine. I suspect that the truth is somewhere in between the two lists, but I have very limited data to go by. It would be very nice to know just what level of engine is actually a good match for, say, a 2300 FIDE player.

carldaman · Post by **carldaman** » Thu Jul 23, 2020 3:57 am

I think on average even the CCRL ratings are a little deflated in the 2100-2400 range, which makes CEGT too deflated.

I know some strong players who have opined that Romichess, for example, is FIDE master strength, at least - especially with learning enabled. You could try it on for size yourself, Larry.

lkaufman · Post by **lkaufman** » Thu Jul 23, 2020 5:14 am

But the CCRL ratings for Romichess (different versions) are mostly around 2400, which is IM standard; FM standard is 2300. So this would contradict what you are saying. We certainly don't want to consider programs with learning for this, since their ratings on the lists were presumably obtained with learning turned off. I played a couple of fast rapid games with Gaviota 0.80 (an old version, but that should be irrelevant to the accuracy of its rating), and I just can't believe it would earn the GM title vs. humans; its rating on CCRL is 2535. I don't doubt that later versions were GM strength, but not this one. I'll play other such engines as time permits, but due to my age I'm not a typical player for a given rating. I have the feeling that a typical 2200 or 2300 FIDE player now is a much stronger player than such a player was thirty years ago, but I can't really prove it. I don't think engines from 1990 would perform nearly as well today on the same hardware; everyone just knows so much more about chess now and has so much more practice, as well as knowing how to play vs engines.

carldaman · Post by **carldaman** » Thu Jul 23, 2020 5:57 am

I was mentioning others' impressions of Romichess, with an assessment of FM level at the very least, as a lower bound. Quite possibly it's stronger, maybe IM or weak GM, but it would be nice if that could be verified.

Anyway, one can certainly play Romi with learning turned off, but then you might run into some determinism issues.

Dr.Wael Deeb · Post by **Dr.Wael Deeb** » Thu Jul 23, 2020 8:06 am

I'll take on Romi, with learning turned on, when I reach it on the CCRL rating list, although I doubt it will benefit from this feature as it will be only 2 games ....

Cheers,
Dr.D

Modern Times · Post by **Modern Times** » Thu Jul 23, 2020 9:40 am

lkaufman wrote: ↑Thu Jul 23, 2020 3:49 am The CCRL 40/15 list and the CEGT "40/20" (maybe 40/8 or so on modern hardware)

My recollection is this: the CCRL and CEGT 40/40 lists were originally benchmarked on similar Athlon X2 hardware. CEGT renamed theirs to 40/20 a while back to take account of hardware improvements, and we recently renamed ours to 40/15 based on an Intel i7-4770K. So I think they are still broadly similar.

jdart · Post by **jdart** » Thu Jul 23, 2020 10:33 am

I occasionally get asked what the rating of my engine is in human terms. My standard answer has always been that there really is no accurate answer I can give. If you put one of these engines in a series of FIDE rated tournaments against a variety of human players, you'd get it to converge to a proper FIDE rating. But there is really no reason to expect a good correlation between the CEGT/CCRL list and FIDE ratings, among other reasons because they are completely different rating pools.

--Jon

Rebel · Post by **Rebel** » Thu Jul 23, 2020 12:12 pm

Totally agree. For example, Rebel Century in 2001 (on a poor Athlon) played a 4-game match at tournament time control (40 moves in 2 hours) against the number 10 on the FIDE rating list at that time, Loek van Wely, rated 2714. 2 wins, 2 losses, 2-2. And yet:

Rebel Century CEGT elo 2379
Rebel Century CCRL elo 2543

The links are:
https://en.wikipedia.org/wiki/Loek_van_Wely
http://rebel13.nl/dos/rebel%20century%204.html

And that's just one example; Junior-Kasparov and Fritz-Kramnik around the same time are others.
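
To put a number on that gap, here is a rough sketch of the arithmetic, assuming the standard logistic Elo formula (FIDE itself uses a lookup table, so this is only an approximation). The function names are made up for illustration, and a 4-game match is of course far too small a sample to pin down a rating.

```python
import math


def expected_score(rating: float, opponent: float) -> float:
    """Expected score per game under the logistic Elo model (an assumption)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def performance_rating(opponent: float, score: float, games: int) -> float:
    """Rating at which the expected score equals the achieved score."""
    p = score / games
    if not 0.0 < p < 1.0:
        raise ValueError("undefined for a 0% or 100% score")
    return opponent + 400.0 * math.log10(p / (1.0 - p))


VAN_WELY = 2714  # FIDE rating quoted above
LIST_RATINGS = {"CEGT": 2379, "CCRL": 2543}  # Rebel Century, as quoted above

# 2 points out of 4 games is a 50% score, so the performance equals 2714.
perf = performance_rating(VAN_WELY, 2.0, 4)
print(f"match performance: {perf:.0f}")

for name, rating in LIST_RATINGS.items():
    gap = perf - rating
    exp = expected_score(rating, VAN_WELY)
    print(f"{name}: list rating {rating}, gap {gap:.0f}, "
          f"expected score per game at list rating {exp:.2f}")
```

Under these assumptions the match performance sits 335 points above the CEGT figure and 171 above the CCRL figure.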

Vinvin · Post by **Vinvin** » Thu Jul 23, 2020 1:16 pm

At the top, the ratings are about synchronized (3500 for SF 11 4 CPUs).
But the further down the list you go, the bigger the difference is.

And to compare with humans, I'm taking the example of Fritz, which played a lot of games at the highest level.

Kramnik vs Deep Fritz, Bahrain, October 2002 (4 - 4)
Kasparov versus X3D Fritz 2003 (2-2)
Fritz 8 in Bilbao (2004) vs Ponomariov, Karjakin and Topalov (3.5/4)
Fritz 9 in Bilbao (2005) vs Kasimdzhanov, Ponomariov and Khalifman (2/4)
Kramnik versus Deep Fritz, Bonn December 2006 (2 - 4)

CCRL

Deep Fritz 10 4CPU    2830
Fritz 10              2778
Fritz 9               2742
Fritz 8 Bilbao        2700

CEGT

Deep Fritz 10 4CPU    2659
Fritz 10              2622
Fritz 9               2576
Deep Fritz 8 2CPU     2562
Fritz in Bahrain      2524
Fritz 8 Bilbao        2506
Deep Fritz 8 1CPU     2489
Fritz 6               2358

The CCRL ratings are way more realistic than CEGT ones.
Moreover, if you consider the time control (40 moves in 15 minutes), the rating for Deep Fritz 10 4CPU is probably close to 3000 compared to FIDE.
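
For anyone who wants the gap between the two lists spelled out, here is a small sketch that just subtracts the figures quoted above for the versions appearing on both lists. The variable names are only illustrative, and nothing here says which list is right; it only measures how far apart they are.

```python
# Ratings exactly as quoted in the two lists above.
ccrl = {
    "Deep Fritz 10 4CPU": 2830,
    "Fritz 10": 2778,
    "Fritz 9": 2742,
    "Fritz 8 Bilbao": 2700,
}
cegt = {
    "Deep Fritz 10 4CPU": 2659,
    "Fritz 10": 2622,
    "Fritz 9": 2576,
    "Deep Fritz 8 2CPU": 2562,
    "Fritz in Bahrain": 2524,
    "Fritz 8 Bilbao": 2506,
    "Deep Fritz 8 1CPU": 2489,
    "Fritz 6": 2358,
}

# CCRL minus CEGT for every version present in both lists.
gaps = {name: ccrl[name] - cegt[name] for name in ccrl if name in cegt}
mean_gap = sum(gaps.values()) / len(gaps)

for name, gap in gaps.items():
    print(f"{name:<20} {gap:+d}")
print(f"{'mean gap':<20} {mean_gap:+.2f}")
```

For these four versions the gap runs from 156 to 194 points, about 172 on average.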

Dr.Wael Deeb · Post by **Dr.Wael Deeb** » Thu Jul 23, 2020 3:06 pm

You are probably right ....

I am an unrated, self-taught chess player, but nevertheless I have discovered, and am still discovering, an obvious distortion in the ratings of the chess engines in the lower sections of the CCRL rating list ....

But in general, CCRL is reasonably realistic, as far as realism can be achieved when there are no humans involved ....

Cheers,
Dr.D
