which is more realistic in human terms, ccrl or cegt?

lkaufman
Posts: 6284
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: which is more realistic in human terms, ccrl or cegt?

Post by lkaufman »

Dr.Wael Deeb wrote: Thu Jul 23, 2020 8:06 am
carldaman wrote: Thu Jul 23, 2020 5:57 am
lkaufman wrote: Thu Jul 23, 2020 5:14 am
carldaman wrote: Thu Jul 23, 2020 3:57 am I think on average even the CCRL ratings are a little deflated in the 2100-2400 range, which makes CEGT too deflated.

I know some strong players who have opined that Romichess, for example, is FIDE master strength, at least - especially with learning enabled. You could try it on for size yourself, Larry. :)
But the CCRL ratings for Romichess (different versions) are mostly around 2400, which is IM standard; FM standard is 2300. So this would contradict what you are saying. We certainly don't want to consider programs with learning for this, since their ratings on the lists presumably turn off learning. I played a couple of fast rapid games with Gaviota 0.80 (an old version, but that should be irrelevant as to the accuracy of its rating), and I just can't believe it would earn the GM title vs. humans; its rating on CCRL is 2535. I don't doubt that later versions were GM strength, but not this one. I'll play other such engines as time permits, but due to my age I'm not a typical player for a given rating. I have the feeling that a typical 2200 or 2300 FIDE player now is a much stronger player than such a player was thirty years ago, but I can't really prove it. I don't think engines from 1990 would perform nearly as well today on the same hardware; everyone just knows so much more about chess now and has so much more practice, as well as knowing how to play vs engines.
I was mentioning others' impressions about Romichess, with an assessment of FM level at the very least, as a lower boundary. Quite possibly it's stronger, maybe IM, or weak GM, but it would be nice if that could be verified.

Anyway, one can certainly play Romi with learning turned off, but then you might run into some determinism issues.
I'll take on Romi when I reach it on the CCRL rating list with the learning turned on although I doubt it will benefit from this feature as it will be only 2 games ....

Cheers,
Dr.D
Unless you are actually of grandmaster (FIDE 2500+) strength, which I don't think you would claim to be, your results vs 2400 CCRL rated engines suggest that they are substantially overrated. Yet, an examination of the results of top players from the first few years of the current century suggests that engines not too far above this level are underrated. Something is very weird here. I'm trying to figure out just what it is. I know that the correlation between engine ratings and ratings vs. humans might not be super-high for individual engines, but I'm talking about general statements, such as what would be the average performance vs. humans of engines around any given CCRL (or CEGT) rating. How can you be beating 2400 CCRL rated engines (in general), while engines not too far above that were holding even with Kasparov and Kramnik long ago? What are we missing?
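
To put numbers on this kind of comparison, here is a minimal Python sketch of the standard Elo expected-score formula and its inverse. The function names are made up for illustration, and the sketch assumes both ratings sit on the same scale, which is exactly what is in doubt in this thread.

```python
import math


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score (win = 1, draw = 0.5) for A against B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def implied_rating_gap(score: float) -> float:
    """Inverse of expected_score: the gap (A minus B) implied by a score fraction."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


if __name__ == "__main__":
    # A 2500 player against a 2400 opponent, both on the same scale.
    print(round(expected_score(2500, 2400), 3))
    # Gap implied by scoring 75% over a long series of games.
    print(round(implied_rating_gap(0.75)))
```

On a common scale, a 100-point edge is worth roughly a 64% score, so a player who keeps winning against "2400" engines is either well above 2400 or the engines are not 2400 in human terms.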
Komodo rules!

Post by lkaufman »

Modern Times wrote: Thu Jul 23, 2020 9:40 am
lkaufman wrote: Thu Jul 23, 2020 3:49 am The CCRL 40/15 list and the CEGT "40/20" (maybe 40/8 or so on modern hardware)
My recollection is this: The CCRL and CEGT 40/40 lists were originally benched on similar Athlon X2 hardware. They re-named theirs a while back to 40/20 to take account of hardware improvements, and we recently renamed ours to 40/15 based on an Intel i7-4770k. So I think they are broadly still similar.
That doesn't sound right to me. As far back as I can remember (which should be 2007, when I joined Rybka), CEGT was already 40/20, so they should be running their games for that list at something close to 40/8 on i7 machines now. Can someone from CEGT please clarify the actual time control used on i7 machines now for the 40/20 list?
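
The 40/20 versus "40/8" point is just a division by a hardware speed-up factor. A minimal sketch, where the function name and the 2.5x speed-up are illustrative assumptions rather than figures published by CCRL or CEGT:

```python
def equivalent_minutes(reference_minutes: float, speedup: float) -> float:
    """Minutes per period on faster hardware that give roughly the same
    amount of computation as reference_minutes on the reference machine."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return reference_minutes / speedup


if __name__ == "__main__":
    # Assumed 2.5x speed-up: 40 moves in 20 minutes becomes about 40 moves in 8.
    print(equivalent_minutes(20, 2.5))
```

By the same arithmetic, renaming a 40/40 list to 40/15 corresponds to a speed-up factor of about 2.7.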
Dr.Wael Deeb
Posts: 9773
Joined: Wed Mar 08, 2006 8:44 pm
Location: Amman,Jordan

Post by Dr.Wael Deeb »

lkaufman wrote: Thu Jul 23, 2020 4:46 pm
Unless you are actually of grandmaster (FIDE 2500+) strength, which I don't think you would claim to be, your results vs 2400 CCRL rated engines suggest that they are substantially overrated. Yet, an examination of the results of top players from the first few years of the current century suggests that engines not too far above this level are underrated. Something is very weird here. I'm trying to figure out just what it is. I know that the correlation between engine ratings and ratings vs. humans might not be super-high for individual engines, but I'm talking about general statements, such as what would be the average performance vs. humans of engines around any given CCRL (or CEGT) rating. How can you be beating 2400 CCRL rated engines (in general), while engines not too far above that were holding even with Kasparov and Kramnik long ago? What are we missing?
Hi Larry,

At some point, I was certain that you or someone else would ask this question ....

The most logical answer is that these are not 2400 chess engines when measured against the human Elo rating pool ....
These chess engines are in no way close to the chess engines that played Kasparov or Kramnik ....

Don't forget that I read a lot of chess material on how to be a better attacker, how to defend your position, and how to coordinate the pieces on the board; hell, I can go all night long ....

Add to this that I concentrate my efforts on certain opening systems, like the Nimzowitsch-Larsen Attack, the Old Indian Defense and the Caro-Kann, and on facing the Sicilian Defense with the Alapin Variation or even 2. b3 ....

For example, take a look at this game against KnightDreamer, which is supposed to be a strong 2400 Elo chess engine:

[pgn][Event "Et Mortuus Est Rex"]
[Site "?"]
[Date "2020.07.14"]
[Round "?"]
[White "Dr.Deeb"]
[Black "KnightDreamer 3.3"]
[Result "1-0"]
[ECO "A04"]
[WhiteElo "2042"]
[BlackElo "2402"]
[Annotator "SF NNUE halfkp-256 090720 x64 bmi2 (40m)"]
[PlyCount "68"]
[EventDate "2020.??.??"]

{A04: Unusual lines after 1 Nf3 and King's Indian Attack} 1. b3 f5 2. Bb2 e6 3.
Nf3 Nf6 4. g3 b6 5. Bg2 Bb7 6. O-O Nc6 7. d3 {last book move} d5 (7... Qe7 $5
$11 {should be considered}) 8. c4 $16 Qd7 9. cxd5 exd5 10. a3 O-O-O 11. e3 Be7
(11... d4 12. Nxd4 h5 13. Nd2 $16) 12. b4 a6 13. Nbd2 Rhg8 (13... Kb8 14. Rc1
$16) 14. Nb3 $18 Ng4 15. Nbd4 Nxd4 16. Nxd4 Rde8 (16... g5 $142 $5 17. b5 Ne5
$16) 17. b5 $18 axb5 (17... a5 18. Qb3 Bf6 19. Rfc1 Bxd4 20. Bxd4 $18 (20. exd4
$2 f4 $17)) 18. Qb3 c6 (18... g5 {does not win a prize} 19. Nxb5 Rg6 20. h3 $18
) 19. Rfc1 (19. a4 $142 {makes it even easier for White} b4 20. a5 g5 $18)
19... Kb8 (19... Bc5 {no good, but what else?} 20. a4 bxa4 21. Qxa4 Kc7 22.
Rxc5 bxc5 23. Qa5+ Kd6 $18) 20. a4 b4 (20... Rc8 {doesn't get the bull off the
ice} 21. axb5 c5 22. Nc6+ Rxc6 23. Bxd5 $18) 21. a5 bxa5 (21... Bc5 {is no
salvation} 22. axb6 Qd6 23. Qa4 Bxd4 24. Bxd4 $18) 22. Rxa5 c5 (22... Rd8 {
cannot change destiny} 23. h3 Ne5 24. Nc2 $18) 23. Ne2 (23. Rcxc5 $142 {
secures the win} Bxc5 24. Rxc5 Re5 25. Qxb4 Rc8 $18) 23... Rc8 24. h3 Nf6 (
24... Bd8 {a fruitless try to alter the course of the game} 25. Raxc5 (25. hxg4
$6 {is a useless try} Bxa5 26. Nf4 Rge8 $14) 25... Rxc5 26. Rxc5 $18) 25. Be5+
Bd6 26. Bxf6 gxf6 27. Bxd5 Rg7 (27... Bxg3 {is not much help} 28. Bxg8 Bh4 29.
Bf7 $18) 28. d4 (28. Rca1 $142 {and White can celebrate victory} f4 29. Ra8+
Bxa8 30. Rxa8+ Kc7 31. Ra7+ Kd8 32. Rxd7+ Rxd7 33. exf4 Rb8 $18) 28... Bxg3 (
28... f4 {does not improve anything} 29. Rca1 fxg3 30. Ra8+ Kc7 31. R1a7 gxf2+
32. Kxf2 Qf5+ 33. Nf4 Rxa8 34. Rxb7+ Kd8 35. Rxg7 $18) 29. Bxb7 Bc7+ 30. Ng3
Bxa5 31. Bxc8 Qxc8 32. Rxc5 Qd8 33. Rxf5 Rg5 (33... Qb6 {doesn't get the cat
off the tree} 34. Rd5 Rg5 35. Rxg5 fxg5 36. Ne4 $18) 34. Rxg5 fxg5 1-0
[/pgn]

I just applied a general approach to attacking the black king, and the chess engine failed to defend itself ....

Maybe Kasparov and Kramnik tried to overcomplicate the position and got tactically blown out, just like in my two games against the CT800 1.40 chess engine, where I got crushed like a bug tactically ....

If I recall correctly, Kasparov accused IBM of actually having him play against a team of grandmasters, and asked to see Deep Blue's thinking lines and log files ....

Anyway, let's see what happens next, but one thing is for sure:
Unlike the top FIDE-rated human chess player, I am not afraid to play against chess engines ....

Cheers,
Dr.D
No one can hit as hard as life. But it ain’t about how hard you can hit. It’s about how hard you can get hit and keep moving forward. How much you can take and keep moving forward….
Ovyron
Posts: 4562
Joined: Tue Jul 03, 2007 4:30 am

Post by Ovyron »

lkaufman wrote: Thu Jul 23, 2020 4:46 pm What are we missing?
That a big part of what makes these engines highly rated is that they're really good at exploiting weaker engines' weaknesses, but this doesn't help against humans. So someone like Wael comes along without those weaknesses and outperforms them.

Style is more important than strength when it comes to playing against humans. It's possible that the very best engine against humans (from the opening position, no handicap) would be one rated at around 3000 CCRL, due to style. Engines above this rating would not do better against humans, because all those extra 500 Elo do is exploit other engines' weaknesses, and those tricks are irrelevant against humans.

I also suspect Wael will find an engine he may not be able to beat, or draw against, but it will not necessarily be highly rated, because it'd have weaknesses other engines have learned to exploit.

Post by lkaufman »

Dr.Wael Deeb wrote: Fri Jul 24, 2020 12:39 am
Hi Larry,

The most logical answer is that these are not 2400 chess engines when measured against the human Elo rating pool ....
These chess engines are in no way close to the chess engines that played Kasparov or Kramnik ....

Maybe Kasparov and Kramnik tried to overcomplicate the position and got tactically blown out, just like in my two games against the CT800 1.40 chess engine, where I got crushed like a bug tactically ....

If I recall correctly, Kasparov accused IBM of actually having him play against a team of grandmasters, and asked to see Deep Blue's thinking lines and log files ....

Anyway, let's see what happens next, but one thing is for sure:
Unlike the top FIDE-rated human chess player, I am not afraid to play against chess engines ....

Cheers,
Dr.D
Kasparov vs IBM is not relevant; I'm talking about Kasparov and Kramnik vs. commercial chess engines from 2001 to 2006. Kasparov and Kramnik were no dummies; they were playing for huge sums of money, with months to train and the best advice available on how to optimize their chances. But I can agree with your statement "These chess engines are in no way close to the chess engines that played Kasparov or Kramnik ....". Most of those engines are around 2700 on the CCRL list (except Fritz 10, which beat Kramnik 4-2 and is rated higher), about 300 elo above the ones you are playing. I think you are saying that the difference is really more than 300 elo; it certainly must be more unless you are actually of GM level. But the conventional thinking is that engine rating lists OVERSTATE elo differences in human terms, the opposite of what we are seeing here. Kai Laskos has studied the question and estimated that engine rating differences need to be reduced by a third or more to predict human ratings, and I have pretty much said the same, only I've generally said a quarter. But I think that the evidence for this is based mostly on human vs. engine games played a long time ago. I can still remember when FIDE had a MINIMUM rating of 2200 (!) (for a while they reduced the minimum for women only to 2000!). So a FIDE 2200 was a weak player who just got lucky once or twice, or perhaps got the rating by some arbitrary rule based on playing in an Olympiad or Zonal. The difference between 2200 and 2700 was just enormous. Now you have to play quite well to get a 2200 rating; the rating differences among humans are much expanded. So my own theory is that the rating differences on the engine lists are now fully valid vs. humans, even if they were not valid twenty or thirty years ago.
And since the CCRL ratings are contracted due to using BayesElo, the real difference between those 2400 CCRL engines and the ones the Ks played is actually more than 300 elo, maybe 350 or so (CEGT differences should be accurate, even if the level is not). We have no way to tell whether ratings like 3500 for Stockfish or Lc0 would be valid vs. humans; obviously the engines would need special code and opening books to avoid allowing easy draws when they have Black, and it would be too expensive to get a top GM to play a serious hundred-game match to find out whether the human could score more than a couple of draws in a hundred games. But I do have a very clear idea of the level of Rybka 2.3.2a and Rybka 3 in 2007 and 2008, as we played many matches under a wide variety of conditions vs. GMs, so if we can clearly establish the level of engines rated at, say, 2400 on the CCRL list in human terms, we can tell whether the "spread" is too low, too high, or just right. I much appreciate that you (Dr. Deeb) are playing these games, which opened my eyes to the weakness of such "2400" engines, but since you have no rating yourself you can't really do more than put an upper limit on their level. We need games by players of your level with known, established ratings.
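
The "reduce engine rating differences by a third or a quarter" rule of thumb mentioned above is easy to express in code. A minimal Python sketch with hypothetical function names; the reduction factors are the rough estimates quoted in the post, not fitted values:

```python
def scaled_gap(engine_gap: float, reduction: float) -> float:
    """Shrink an engine-list rating gap by a fraction (0 <= reduction < 1)."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return engine_gap * (1.0 - reduction)


def expected_score_from_gap(gap: float) -> float:
    """Standard Elo expected score for the side that is `gap` points higher."""
    return 1.0 / (1.0 + 10.0 ** (-gap / 400.0))


if __name__ == "__main__":
    # A 300-point gap on an engine list under three assumptions.
    for label, reduction in (("none", 0.0), ("a quarter", 0.25), ("a third", 1 / 3)):
        gap = scaled_gap(300, reduction)
        score = expected_score_from_gap(gap)
        print(f"reduction {label}: gap {gap:.0f}, stronger side scores {score:.2f}")
```

With these inputs a 300-point list gap shrinks to 225 points under the "quarter" rule and to 200 under the "third" rule. The theory in the post amounts to a reduction of zero, or even a negative one once the BayesElo contraction is taken into account, which this sketch does not model.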

Post by Ovyron »

lkaufman wrote: Fri Jul 24, 2020 6:27 am it would be too expensive to get a top GM to play a hundred game serious match to find out whether the human could score more than a couple draws in a hundred games.
Maybe we can switch the question to: how many games would the strongest human willing to play for free need in order to get 1 draw against engines? This way we don't need any sponsor, because the question itself only concerns those who don't charge for their services.
carldaman
Posts: 2287
Joined: Sat Jun 02, 2012 2:13 am

Post by carldaman »

Ovyron wrote: Fri Jul 24, 2020 2:03 am
I also suspect Wael will find an engine he may not be able to beat, or draw against, but it will not necessarily be highly rated, because it'd have weaknesses other engines have learned to exploit.
He already did: it's called CT800 1.40, which has a good anti-human style. It is rated 2400, just like the other engines Dr. Deeb beat, but he wasn't able to survive vs CT800. It is all about 'style-clash'.
Modern Times
Posts: 3806
Joined: Thu Jun 07, 2012 11:02 pm

Post by Modern Times »

lkaufman wrote: Fri Jul 24, 2020 6:27 am And since the CCRL ratings are contracted due to using BayesElo,
Or you could say that the lists that use Ordo are expanded compared to BayesElo. The two algorithms are different, yes, and the maths is out of my league, but I don't see one as better than the other. You just make a choice. I recall, for example, Daniel Shawul preferring BayesElo.

Post by carldaman »

The fact that engines have to be tested using openings not of their own choosing would suggest that all rating lists are somewhat compressed, although I'd expect this effect to be more pronounced for stronger engines than for weaker ones.

Post by Dr.Wael Deeb »

Ovyron wrote: Fri Jul 24, 2020 2:03 am
lkaufman wrote: Thu Jul 23, 2020 4:46 pm What are we missing?
That a big part of what makes these engines highly rated is that they're really good at exploiting weaker engines' weaknesses, but this doesn't help against humans. So someone like Wael comes along without those weaknesses and outperforms them.

Style is more important than strength when it comes to playing against humans. It's possible that the very best engine against humans (from the opening position, no handicap) would be one rated at around 3000 CCRL, due to style. Engines above this rating would not do better against humans, because all those extra 500 Elo do is exploit other engines' weaknesses, and those tricks are irrelevant against humans.

I also suspect Wael will find an engine he may not be able to beat, or draw against, but it will not necessarily be highly rated, because it'd have weaknesses other engines have learned to exploit.
Well explained ....

CT800 1.40 was able to beat me in the two games I played against it ....

It opens up the position and keeps mining for tactical shots until you're forced to make a weak move, even with the time-odds advantage you have, and then it's a thank-you-for-coming situation for the human ....

And yes, a lot of the 2400 Elo rated engines on the CCRL list are getting their high ratings simply by playing much weaker chess engines across a wide range of Elo differences, while the weak engines sink down the list ....

Cheers,
Dr.D