which is more realistic in human terms, ccrl or cegt?

Dr.Wael Deeb
Posts: 9773
Joined: Wed Mar 08, 2006 8:44 pm
Location: Amman,Jordan

Re: which is more realistic in human terms, ccrl or cegt?

Post by Dr.Wael Deeb »

lkaufman wrote: Fri Jul 24, 2020 6:27 am
Dr.Wael Deeb wrote: Fri Jul 24, 2020 12:39 am
lkaufman wrote: Thu Jul 23, 2020 4:46 pm
Dr.Wael Deeb wrote: Thu Jul 23, 2020 8:06 am
carldaman wrote: Thu Jul 23, 2020 5:57 am
lkaufman wrote: Thu Jul 23, 2020 5:14 am
carldaman wrote: Thu Jul 23, 2020 3:57 am I think on average even the CCRL ratings are a little deflated in the 2100-2400 range, which makes CEGT too deflated.

I know some strong players that have opined that Romichess, for example, is FIDE master strength, at least - especially with learning enabled. You could try it on for size yourself, Larry. :)
But the CCRL ratings for Romichess (different versions) are mostly around 2400, which is IM standard; FM standard is 2300. So this would contradict what you are saying. We certainly don't want to consider programs with learning for this, since their ratings on the lists presumably were obtained with learning turned off. I played a couple of fast rapid games with Gaviota 0.80 (an old version, but that should be irrelevant as to the accuracy of its rating), and I just can't believe it would earn the GM title vs. humans; its rating on CCRL is 2535. I don't doubt that later versions were GM strength, but not this one. I'll play other such engines as time permits, but due to my age I'm not a typical player for a given rating. I have the feeling that a typical 2200 or 2300 FIDE player now is a much stronger player than such a player was thirty years ago, but I can't really prove it. I don't think engines from 1990 would perform nearly as well today on the same hardware; everyone just knows so much more about chess now and has so much more practice, as well as knowing how to play vs engines.
Hi Larry,



I was mentioning others' impressions about Romichess, with an assessment of FM level at the very least, as a lower boundary. Quite possibly it's stronger, maybe IM, or weak GM, but it would be nice if that could be verified.

Anyway, one can certainly play Romi with learning turned off, but then you might run into some determinism issues.
I'll take on Romi with learning turned on when I reach it on the CCRL rating list, although I doubt it will benefit from this feature, as it will be only 2 games ....

Cheers,
Dr.D
Unless you are actually of grandmaster (FIDE 2500+) strength, which I don't think you would claim to be, your results vs 2400 CCRL rated engines suggest that they are substantially overrated. Yet, an examination of the results of top players from the first few years of the current century suggests that engines not too far above this level are underrated. Something is very weird here. I'm trying to figure out just what it is. I know that the correlation between engine ratings and ratings vs. humans might not be super-high for individual engines, but I'm talking about general statements, such as what would be the average performance vs. humans of engines around any given CCRL (or CEGT) rating. How can you be beating 2400 CCRL rated engines (in general), while engines not too far above that were holding even with Kasparov and Kramnik long ago? What are we missing?
Hi Larry,

The most logical answer to this is that these are not 2400 chess engines when measured against the human Elo rating pool ....
These chess engines are in no way close to the chess engines that played Kasparov or Kramnik ....

Maybe Kasparov and Kramnik tried to overcomplicate the position and got tactically blown out, just like in my two games against the CT800 1.40 chess engine, where I got crushed like a bug tactically ....

If I recall correctly, Kasparov accused IBM, claiming that he was actually playing against a team of grandmasters, and asked to see Deep Blue's thinking lines and log files ....

Anyway, let's see what happens next, but one thing is for sure:
Unlike the top FIDE-rated human chess players, I am not afraid to play against chess engines ....

Cheers,
Dr.D
Kasparov vs IBM is not relevant; I'm talking about Kasparov and Kramnik vs. commercial chess engines from 2001 to 2006. Kasparov and Kramnik were no dummies; they were playing for huge sums of money, with months to train and the best advice available on how to optimize their chances. But I can agree with your statement "These chess engines are in no way close to the chess engines that played Kasparov or Kramnik ....". Most of those engines are around 2700 on the CCRL list (except Fritz 10, which beat Kramnik 4-2 and is rated higher), about 300 elo above the ones you are playing. I think you are saying that the difference is really more than 300 elo; it certainly must be more unless you are actually of GM level. But the conventional thinking is that engine rating lists OVERSTATE elo differences in human terms, the opposite of what we are seeing here. Kai Laskos has studied the question and estimated that engine rating differences need to be reduced by a third or more to predict human ratings, and I have pretty much said the same, only I've generally said a quarter. But I think that the evidence for this is based mostly on human vs. engine games played a long time ago. I can still remember when FIDE had a MINIMUM rating of 2200 (!) (for a while they reduced the minimum for women only to 2000!). So a FIDE 2200 was a weak player who just got lucky once or twice, or perhaps got the rating by some arbitrary rule based on playing in an Olympiad or Zonal. The difference between 2200 and 2700 was just enormous. Now you have to play quite well to get a 2200 rating; the rating differences among humans are much expanded. So my own theory is that the rating differences on the engine lists are now fully valid vs. humans, even if they were not valid twenty or thirty years ago.
And since the CCRL ratings are contracted due to using BayesElo, the real difference between those 2400 CCRL engines and the ones the Ks played is actually more than 300 elo, maybe 350 or so (CEGT differences should be accurate, even if the level is not). We have no way to tell whether ratings like 3500 for Stockfish or Lc0 would be valid vs. humans; obviously the engines would need special code and opening books to avoid allowing easy draws when they have Black, and it would be too expensive to get a top GM to play a serious hundred-game match to find out whether the human could score more than a couple of draws in a hundred games. But I do have a very clear idea of the level of Rybka 2.3.2a and Rybka 3 in 2007 and 2008, as we played many matches under a wide variety of conditions vs. GMs, so if we can clearly establish the level of engines rated at, say, 2400 on the CCRL list in human terms, we can tell whether the "spread" is too low, too high, or just right. I much appreciate that you (Dr. Deeb) are playing these games, which opened my eyes to the weakness of such "2400" engines, but since you have no rating yourself you can't really do more than put an upper limit on their level. We need games by players of your level with known, established ratings.
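For readers who want to check the arithmetic behind "reduce engine rating differences by a quarter or a third", here is a minimal sketch. It assumes the standard logistic Elo expectation formula; the function names and the 300 elo example gap are only illustrative, and the shrink factors are simply the rough figures mentioned in the post above, not measured values.

```python
def expected_score(rating_diff: float) -> float:
    """Standard logistic Elo expectation for the side that is rating_diff ahead."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


def human_scale_diff(engine_diff: float, shrink: float) -> float:
    """Shrink an engine-list rating difference by a fraction (hypothetical helper)."""
    return engine_diff * (1.0 - shrink)


if __name__ == "__main__":
    engine_diff = 300.0  # illustrative gap, e.g. ~2700 vs ~2400 on an engine list
    for label, shrink in (("none", 0.0), ("a quarter", 0.25), ("a third", 1.0 / 3.0)):
        diff = human_scale_diff(engine_diff, shrink)
        print(f"shrink by {label:>9}: {diff:5.0f} elo, "
              f"expected score {expected_score(diff):.3f}")
```

Running it shows how much the predicted score of the stronger side changes depending on which shrink factor one believes; it says nothing about which factor is right.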
Thanks Larry for your interest and your valuable comments ....

I will continue to work my way through the CCRL rating list to shed some light on these chess engines and their relationship with humans ....

Cheers,
Dr.D
No one can hit as hard as life. But it ain’t about how hard you can hit. It’s about how hard you can get hit and keep moving forward. How much you can take and keep moving forward…
Dr.Wael Deeb
Posts: 9773
Joined: Wed Mar 08, 2006 8:44 pm
Location: Amman,Jordan

Re: which is more realistic in human terms, ccrl or cegt?

Post by Dr.Wael Deeb »

carldaman wrote: Fri Jul 24, 2020 7:53 am
Ovyron wrote: Fri Jul 24, 2020 2:03 am
I also suspect Wael will find an engine he may not be able to beat, or draw against, but it will not necessarily be highly rated, because it'd have weaknesses other engines have learned to exploit.
He already did, it's called CT800 1.40 which has a good anti-human style. It is rated 2400 just like the other engines Dr. Deeb beat, but he wasn't able to survive vs CT800. It is all about 'style-clash'.
Indeed ....

The interesting thing is that this particular engine beat all the engines I played till now and is topping the list for now ....

Can we assume that its tactical abilities and aggressiveness pay off against the other chess engines as well?

Cheers,
Dr.D
Dr.Wael Deeb
Posts: 9773
Joined: Wed Mar 08, 2006 8:44 pm
Location: Amman,Jordan

Re: which is more realistic in human terms, ccrl or cegt?

Post by Dr.Wael Deeb »

carldaman wrote: Fri Jul 24, 2020 9:54 am The fact that engines have to be tested using openings not of their own choosing would suggest that all rating lists are somewhat compressed, although I'd expect this effect to be more pronounced for stronger engines than weaker ones.
Yes, to a certain extent you are correct ....
carldaman
Posts: 2283
Joined: Sat Jun 02, 2012 2:13 am

Re: which is more realistic in human terms, ccrl or cegt?

Post by carldaman »

Dr.Wael Deeb wrote: Fri Jul 24, 2020 12:51 pm
carldaman wrote: Fri Jul 24, 2020 7:53 am
Ovyron wrote: Fri Jul 24, 2020 2:03 am
I also suspect Wael will find an engine he may not be able to beat, or draw against, but it will not necessarily be highly rated, because it'd have weaknesses other engines have learned to exploit.
He already did, it's called CT800 1.40 which has a good anti-human style. It is rated 2400 just like the other engines Dr. Deeb beat, but he wasn't able to survive vs CT800. It is all about 'style-clash'.
Indeed ....

The interesting thing is that this particular engine beat all the engines I played till now and is topping the list for now ....

Can we assume that its tactical abilities and aggressiveness pay off against the other chess engines as well?

Cheers,
Dr.D
It makes one wonder why CT800 is not rated much higher than 2400 on CCRL. Afaik, CCRL tests engines mostly against opponents within +/-200 Elo, so it must be struggling somewhere, which would explain the relatively low rating.
Dr.Wael Deeb
Posts: 9773
Joined: Wed Mar 08, 2006 8:44 pm
Location: Amman,Jordan

Re: which is more realistic in human terms, ccrl or cegt?

Post by Dr.Wael Deeb »

carldaman wrote: Fri Jul 24, 2020 1:17 pm
Dr.Wael Deeb wrote: Fri Jul 24, 2020 12:51 pm
carldaman wrote: Fri Jul 24, 2020 7:53 am
Ovyron wrote: Fri Jul 24, 2020 2:03 am
I also suspect Wael will find an engine he may not be able to beat, or draw against, but it will not necessarily be highly rated, because it'd have weaknesses other engines have learned to exploit.
He already did, it's called CT800 1.40 which has a good anti-human style. It is rated 2400 just like the other engines Dr. Deeb beat, but he wasn't able to survive vs CT800. It is all about 'style-clash'.
Indeed ....

The interesting thing is that this particular engine beat all the engines I played till now and is topping the list for now ....

Can we assume that its tactical abilities and aggressiveness pay off against the other chess engines as well?

Cheers,
Dr.D
It makes one wonder why CT800 is not rated much higher than 2400 on CCRL. Afaik, CCRL tests engines mostly against opponents within +/-200 Elo, so it must be struggling somewhere, which would explain the relatively low rating.
Correct, and that's why I am building a parallel rating list, just for fun, by playing the chess engines against each other. I am sure there are chess engines on the list that are far underrated or overrated, but I assume their number won't be that big ....

Cheers,
Dr.D
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: which is more realistic in human terms, ccrl or cegt?

Post by Ovyron »

Dr.Wael Deeb wrote: Fri Jul 24, 2020 12:51 pm
carldaman wrote: Fri Jul 24, 2020 7:53 am
Ovyron wrote: Fri Jul 24, 2020 2:03 am
I also suspect Wael will find an engine he may not be able to beat, or draw against, but it will not necessarily be highly rated, because it'd have weaknesses other engines have learned to exploit.
He already did, it's called CT800 1.40 which has a good anti-human style. It is rated 2400 just like the other engines Dr. Deeb beat, but he wasn't able to survive vs CT800. It is all about 'style-clash'.
Indeed ....

The interesting thing is that this particular engine beat all the engines I played till now and is topping the list for now ....

Can we assume that its tactical abilities and aggressiveness pay off against the other chess engines as well?

Cheers,
Dr.D
This confirms my suspicions: the CCRL ratings can't be translated to human ratings because they don't measure any style that is good against humans, just weaknesses that engines exploit in each other; a human without those weaknesses will not suffer from them.

And probably Komodo/Stockfish/Leela/NNUE aren't the strongest chess entities. They don't play stronger chess. All they do is exploit each other's weaknesses the best.

The 700 elo advantage over 2800 rated engines could be a bunch of smoke; maybe if tested against humans no engine can reach 3000 elo. Or maybe they can, who knows?
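To put a 700 elo gap in perspective, here is a small sketch of what such a difference would mean in game results under the standard logistic Elo model. That model is an assumption here (rating lists fit their numbers with their own tools), and the function names are made up for illustration.

```python
import math


def expected_score(rating_diff: float) -> float:
    """Standard logistic Elo expectation for the side that is rating_diff ahead."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


def diff_from_score(score: float) -> float:
    """Inverse of expected_score: the rating gap implied by a match score in (0, 1)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


if __name__ == "__main__":
    gap = 700.0  # the gap discussed above, e.g. ~3500 vs ~2800
    score = expected_score(gap)
    print(f"{gap:.0f} elo gap: stronger side expected to score {score:.4f} per game")
    print(f"weaker side expected points in 100 games: {100.0 * (1.0 - score):.2f}")
    print(f"gap implied by a 75% score: {diff_from_score(0.75):.0f} elo")
```

The inverse function is the useful part for anyone playing test matches: a match score against an opponent of known rating can be turned into an implied rating difference, though with only a handful of games the error margin is very large.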
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: which is more realistic in human terms, ccrl or cegt?

Post by lkaufman »

Dr.Wael Deeb wrote: Fri Jul 24, 2020 2:55 pm
carldaman wrote: Fri Jul 24, 2020 1:17 pm
Dr.Wael Deeb wrote: Fri Jul 24, 2020 12:51 pm
carldaman wrote: Fri Jul 24, 2020 7:53 am
Ovyron wrote: Fri Jul 24, 2020 2:03 am
I also suspect Wael will find an engine he may not be able to beat, or draw against, but it will not necessarily be highly rated, because it'd have weaknesses other engines have learned to exploit.
He already did, it's called CT800 1.40 which has a good anti-human style. It is rated 2400 just like the other engines Dr. Deeb beat, but he wasn't able to survive vs CT800. It is all about 'style-clash'.
Indeed ....

The interesting thing is that this particular engine beat all the engines I played till now and is topping the list for now ....

Can we assume that its tactical abilities and aggressiveness pay off against the other chess engines as well?

Cheers,
Dr.D
It makes one wonder why CT800 is not rated much higher than 2400 on CCRL. Afaik, CCRL tests engines mostly against opponents within +/-200 Elo, so it must be struggling somewhere, which would explain the relatively low rating.
Correct, and that's why I am building a parallel rating list, just for fun, by playing the chess engines against each other. I am sure there are chess engines on the list that are far underrated or overrated, but I assume their number won't be that big ....

Cheers,
Dr.D
The problem is that you can only construct a relative rating list, since you don't have a rating and you don't believe the ratings of the engines on the CCRL list. What would be really interesting is if you could somehow get a copy of the ancient Fritz 7 engine that split 4 to 4 with Kramnik in 2002. Probably the four processor hardware of the time was roughly comparable to one core on your computer, so if you played it that way you would be playing pretty much the same engine and hardware that performed around 2800 in 2002. Of course I would expect you to lose; the question would be whether you could even come close to a draw in any game or would have no chance. That would tell us a lot. You could pair it vs the other engines too. CCRL doesn't go below Fritz 8, but the gains were about 40 elo per version in those days, so CCRL 2660 would be a pretty good estimate.
Komodo rules!
Dr.Wael Deeb
Posts: 9773
Joined: Wed Mar 08, 2006 8:44 pm
Location: Amman,Jordan

Re: which is more realistic in human terms, ccrl or cegt?

Post by Dr.Wael Deeb »

Ovyron wrote: Fri Jul 24, 2020 7:22 pm
Dr.Wael Deeb wrote: Fri Jul 24, 2020 12:51 pm
carldaman wrote: Fri Jul 24, 2020 7:53 am
Ovyron wrote: Fri Jul 24, 2020 2:03 am
I also suspect Wael will find an engine he may not be able to beat, or draw against, but it will not necessarily be highly rated, because it'd have weaknesses other engines have learned to exploit.
He already did, it's called CT800 1.40 which has a good anti-human style. It is rated 2400 just like the other engines Dr. Deeb beat, but he wasn't able to survive vs CT800. It is all about 'style-clash'.
Indeed ....

The interesting thing is that this particular engine beat all the engines I played till now and is topping the list for now ....

Can we assume that its tactical abilities and aggressiveness pay off against the other chess engines as well?

Cheers,
Dr.D
This confirms my suspicions: the CCRL ratings can't be translated to human ratings because they don't measure any style that is good against humans, just weaknesses that engines exploit in each other; a human without those weaknesses will not suffer from them.

Bingo, right to the point; a clear example from my experience thus far is Simplex 0.9.8 x64 ....

And probably Komodo/Stockfish/Leela/NNUE aren't the strongest chess entities. They don't play stronger chess. All they do is exploit each other's weaknesses the best.

Probably they are, but unless they are tested in a competitive human rating pool we will never know for sure ....

The 700 elo advantage over 2800 rated engines could be a bunch of smoke; maybe if tested against humans no engine can reach 3000 elo. Or maybe they can, who knows?

As I mentioned above, we will never know for sure, but I am happy about one thing: when I started this project I didn't realize that it would open a door to a whole aspect of computer chess that is almost entirely unexplored and unobserved, namely the engine-vs-engine rating lists ....

Cheers,
Dr.D
Dr.Wael Deeb
Posts: 9773
Joined: Wed Mar 08, 2006 8:44 pm
Location: Amman,Jordan

Re: which is more realistic in human terms, ccrl or cegt?

Post by Dr.Wael Deeb »

lkaufman wrote: Fri Jul 24, 2020 7:30 pm
Dr.Wael Deeb wrote: Fri Jul 24, 2020 2:55 pm
carldaman wrote: Fri Jul 24, 2020 1:17 pm
Dr.Wael Deeb wrote: Fri Jul 24, 2020 12:51 pm
carldaman wrote: Fri Jul 24, 2020 7:53 am
Ovyron wrote: Fri Jul 24, 2020 2:03 am
I also suspect Wael will find an engine he may not be able to beat, or draw against, but it will not necessarily be highly rated, because it'd have weaknesses other engines have learned to exploit.
He already did, it's called CT800 1.40 which has a good anti-human style. It is rated 2400 just like the other engines Dr. Deeb beat, but he wasn't able to survive vs CT800. It is all about 'style-clash'.
Indeed ....

The interesting thing is that this particular engine beat all the engines I played till now and is topping the list for now ....

Can we assume that its tactical abilities and aggressiveness pay off against the other chess engines as well?

Cheers,
Dr.D
It makes one wonder why CT800 is not rated much higher than 2400 on CCRL. Afaik, CCRL tests engines mostly against opponents within +/-200 Elo, so it must be struggling somewhere, which would explain the relatively low rating.
Correct, and that's why I am building a parallel rating list, just for fun, by playing the chess engines against each other. I am sure there are chess engines on the list that are far underrated or overrated, but I assume their number won't be that big ....

Cheers,
Dr.D
The problem is that you can only construct a relative rating list, since you don't have a rating and you don't believe the ratings of the engines on the CCRL list. What would be really interesting is if you could somehow get a copy of the ancient Fritz 7 engine that split 4 to 4 with Kramnik in 2002. Probably the four processor hardware of the time was roughly comparable to one core on your computer, so if you played it that way you would be playing pretty much the same engine and hardware that performed around 2800 in 2002. Of course I would expect you to lose; the question would be whether you could even come close to a draw in any game or would have no chance. That would tell us a lot. You could pair it vs the other engines too. CCRL doesn't go below Fritz 8, but the gains were about 40 elo per version in those days, so CCRL 2660 would be a pretty good estimate.
A brilliant idea, I have to admit ....

I own all the Fritz versions, each purchased when it was released, including the latest one, Fritz 17 ....

The problem is that I have to install it with its own GUI, as the later versions of the Fritz interface don't support the .eng file format ....

Cheers,
Dr.D