Carlsen vs. CCRL 2850 engines in Rapid?

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

amanjpro
Posts: 883
Joined: Sat Mar 13, 2021 1:47 am
Full name: Amanj Sherwany

Re: Carlsen vs. CCRL 2850 engines in Rapid?

Post by amanjpro »

lkaufman wrote: Sun Aug 22, 2021 7:55 pm
Cornfed wrote: Sun Aug 22, 2021 6:21 pm Magnus is a modern Lasker in a way - more often than some other players, when he wants to play for a win, he plays the move most likely to cause his opponent problems, not always the 'objectively best' move.
I agree with the statement, but how does it affect the predicted result of this topic? Since 2850 engines/levels have various weaknesses, I would expect Carlsen to be the best at figuring out how to modify his play to set problems for those crippled or (relatively) weak engines, just as he does for human opponents.
I believe what he means is that we cannot just take his games, analyze them with SF and score them, and I agree. That approach strips away the whole human aspect of the game (understanding the psychology and weaknesses of the opponent, and playing on them).
lkaufman
Posts: 6284
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Carlsen vs. CCRL 2850 engines in Rapid?

Post by lkaufman »

amanjpro wrote: Sun Aug 22, 2021 8:04 pm
I believe what he means is that we cannot just take his games, analyze them with SF and score them, and I agree. That approach strips away the whole human aspect of the game (understanding the psychology and weaknesses of the opponent, and playing on them).
Good point; in general, comparing human moves with engine moves in this way will always favor the engine, since almost all human players consider factors other than the position in their decisions, such as the opponent, the tournament situation, and clock times.
I've run more games of Skill level 22 against CCRL engines at Rapid, and its rating dropped to 2293. Based on the 8 games by Jorge Sammour, this implies adding 165 to convert from CCRL Rapid to FIDE for Rapid play against humans. This feels more reasonable to me than earlier estimates. The 46 Elo figure from Ferdy's analysis sounds far too low, perhaps for the reason discussed in this thread or because of something in the methodology.
I also took a look at the SSDF rating list, which I think is a pretty good estimator of how engines would rate at 2 hours + 30" against FIDE-rated humans. The CCRL ratings are roughly 80 Elo lower on comparable hardware in the GM range (this requires a judgment call, as the lists use different hardware). If that is correct, then adding 165 to CCRL ratings to estimate results against humans at 15' + 10" seems reasonable, since engines perform noticeably better against humans at Rapid (perhaps nearly 100 Elo better). So for now I'll go with +165 Elo as the conversion, at least in the human GM range.
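To see what the +165 conversion would imply for the question in this thread, here is a small sketch. It assumes the logistic Elo expectation 1 / (1 + 10^(-D/400)) (FIDE's own tables are based on a slightly different curve) and takes Carlsen's Rapid rating as 2850, per the opening post; the function and constant names are illustrative only.

```python
def expected_score(diff: float) -> float:
    """Expected score for a player `diff` points above the opponent.

    Assumes the logistic Elo curve; FIDE's tables differ slightly."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


# Offset proposed in the post above, for the human GM range only:
# estimated FIDE Rapid vs. humans ~= CCRL Rapid (40/15) + 165.
CCRL_RAPID_TO_FIDE_OFFSET = 165


def ccrl_rapid_to_fide_rapid(ccrl_rapid: float) -> float:
    return ccrl_rapid + CCRL_RAPID_TO_FIDE_OFFSET


if __name__ == "__main__":
    carlsen = 2850                                 # FIDE Rapid, per the opening post
    engine = ccrl_rapid_to_fide_rapid(2850)        # CCRL Rapid 2850 engine -> 3015
    score = expected_score(carlsen - engine)       # Carlsen is 165 points below
    print(f"Carlsen expected score: {score:.2f}")  # 0.28
```

Under this conversion a CCRL Rapid 2850 engine is treated like a 3015 player, so Carlsen would be expected to score a bit under 30%.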
Komodo rules!
Cornfed
Posts: 511
Joined: Sun Apr 26, 2020 11:40 pm
Full name: Brian D. Smith

Re: Carlsen vs. CCRL 2850 engines in Rapid?

Post by Cornfed »

amanjpro wrote: Sun Aug 22, 2021 8:04 pm
I believe what he means, is.....)
Bingo! Sorry, I was headed out for my afternoon walk/run and did not make myself terribly clear. :oops:
Ferdy
Posts: 4853
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Carlsen vs. CCRL 2850 engines in Rapid?

Post by Ferdy »

Ferdy wrote: Sun Aug 22, 2021 5:43 am
lkaufman wrote: Sun Aug 22, 2021 3:21 am
Ferdy wrote: Sun Aug 22, 2021 1:01 am
lkaufman wrote: Mon Aug 16, 2021 8:21 pm How would Magnus Carlsen (FIDE Classical and Rapid ratings both close to 2850) perform in a serious Rapid (15' + 10") match with engines rated around 2850 on the CCRL Rapid (40/15) rating list (assuming they used a good but varied modern opening book), and ran on the reference i7 computer used by that list? Some example engines running on just one thread include Deep Shredder 11, Stockfish 1.4 (not 14!), Minic 1.39, Fritz 11, Toga II 3.0, Naum 3, Gaviota 1.0, and Arasan 18.0. I realize that there is probably no data on this exactly, opinions must be based on results of engines against other top players with a bit of extrapolation. We know that the rating lists spread out the ratings relative to human scale, but I'm trying to determine at least whether this list is accurate relative to FIDE ratings of the best human player. What do you think?
Here is one method to estimate Magnus's performance.
1. Take positions from the rapid games where Magnus is to move.
2. Evaluate those positions with a strong engine like SF.
3. If the engine's best move differs from Magnus's move, score Magnus's move with SF as well and convert the score difference into a rating difference. If the moves are the same, the error and the rating difference are both zero.
4. Do the same for a CCRL 2850 engine, say engine1: let it analyze all of Magnus's positions and use SF to get its rating differences.

Magnus begins with a start rating. Whenever SF finds an error, get the rating difference and update Magnus's rating for that position.
Example:
start rating = 3000
pos: 1, Magnus move: Qg5, Magnus move score: -158, SF move: 0-0, SF move score: -136, error: -136 - (-158) or 22, rating diff: 10
Magnus perf rating: 3000-10 or 2990
pos: 2, Magnus and SF moves are the same,
Magnus perf rating: 2990 (just take the last perf rating since rating diff is zero)
pos: 3 ...
...

After each game, take Magnus's average perf rating.
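The bookkeeping in steps 1-4 and the example above can be sketched as follows. This is a minimal sketch, not Ferdy's actual code: the post does not say how a centipawn error is turned into a rating difference (22 cp became 10 Elo in the example), so the functions take the rating differences as inputs, and all names are hypothetical.

```python
from statistics import mean


def move_error(best_cp: int, played_cp: int) -> int:
    """Centipawn error: SF's score for its best move minus SF's score for the move played.

    Zero when the scores are equal (the move matched SF); also clamped at zero
    if SF happens to score the played move higher."""
    return max(0, best_cp - played_cp)


def perf_history(start_rating: float, rating_diffs: list[float]) -> list[float]:
    """Running performance rating, one value per position.

    rating_diffs[i] is the rating difference charged for position i
    (0 when the move matched SF). The cp -> rating conversion is NOT
    given in the post, so it is an input here."""
    perf = start_rating
    history = []
    for diff in rating_diffs:
        perf -= diff
        history.append(perf)
    return history


def game_average(start_rating: float, rating_diffs: list[float]) -> float:
    """Average of the running performance ratings over one game."""
    return mean(perf_history(start_rating, rating_diffs))


if __name__ == "__main__":
    print(move_error(-136, -158))       # 22, as in the example
    print(perf_history(3000, [10, 0]))  # [2990, 2990]
    print(game_average(3000, [10, 0]))  # 2990
```

Because every update is a subtraction from a shared start rating, raising or lowering the start rating shifts each running value, and therefore each player's average, by the same amount. Only the difference between players carries information.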

Do the same for engine1. Since engine1 has no move of its own in these games, let it analyze Magnus's positions, compare its moves with SF as well, and calculate the rating differences and finally the average rating.

You can now compare the average rating of Magnus and engine1 from the positions of Magnus.

I took 6 games from the Skilling Open prelims in which Magnus played: 2 wins, 2 losses and 2 draws. I used Cheese 2.2, which is around CCRL Rapid 2850, as engine1. engine1 is set to analyze at 20 s/pos, roughly TC 40/15 on CCRL hardware, single core.

The main engine is SF14, set at 3496 (CCRL 40/15). Magnus and Cheese both start at 3496 and update their perf ratings move by move.


              name  games  rating
            Cheese      6    3341
   Carlsen, Magnus      6    3295
Calculate the expected scores given the rating difference of 3341 - 3295 = 46.


Expected scores:
   name  expscore
 Cheese     0.566
Carlsen     0.434
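The expected scores above match the logistic Elo formula E = 1 / (1 + 10^(-D/400)). Assuming that is the formula used, a quick check:

```python
def expected_score(diff: float) -> float:
    """Expected score for the side that is `diff` points stronger (assumed logistic Elo form)."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


diff = 3341 - 3295                 # Cheese minus Carlsen = 46
cheese = expected_score(diff)
carlsen = 1.0 - cheese
print(f"Cheese {cheese:.3f}, Carlsen {carlsen:.3f}")  # Cheese 0.566, Carlsen 0.434
```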


scores in 12 games:
   name  score
 Cheese    7.0
Carlsen    5.0
Sample log:


game: 4
pos: r1bq1rk1/pp1n1ppp/2pbpn2/3p4/2PP4/2N1PN2/PPQ1BPPP/R1B2RK1 b - - 6 8
main engine: Stockfish 14, bm: dxc4, score: -25, depth: 26
player: Carlsen, Magnus, bm: b6, score: -51
test engine: Cheese 2.2 64 bits, bm: e5, score: -53
test engine error: 28, player error: 26
Cheese 2.2 64 bits perf: 3484, Carlsen, Magnus perf: 3485

pos: r1bq1rk1/p2n1ppp/1ppbpn2/3p4/2PPP3/2N2N2/PPQ1BPPP/R1B2RK1 b - - 0 9
main engine: Stockfish 14, bm: Nxe4, score: -44, depth: 32
player: Carlsen, Magnus, bm: Nxe4, score: -44
test engine: Cheese 2.2 64 bits, bm: dxe4, score: -54
test engine error: 10, player error: 0
Cheese 2.2 64 bits perf: 3479, Carlsen, Magnus perf: 3485

pos: r1bq1rk1/p2n1ppp/1ppbp3/3p4/2PPN3/5N2/PPQ1BPPP/R1B2RK1 b - - 0 10
main engine: Stockfish 14, bm: dxe4, score: -46, depth: 33
player: Carlsen, Magnus, bm: dxe4, score: -46
test engine: Cheese 2.2 64 bits, bm: dxe4, score: -46
test engine error: 0, player error: 0
Cheese 2.2 64 bits perf: 3479, Carlsen, Magnus perf: 3485
This sounds promising, but the estimated ratings of 3295 and 3341 for 2850 players are ridiculous, which makes the difference suspect as well. What is wrong here, or do I misinterpret something?
The method measures the strength difference between players (46 in this case). One may use a different start rating, but the rating difference stays the same. Just ignore the resulting 3295 and 3341; those figures change if the start rating of 3496 is changed.
One application of this method is finding the rating difference between players through move-by-move engine analysis (not game results), with low move numbers given more weight than high ones. The example below uses only depth 12 (+4 in the opening and ending); higher is better.


minimum move: 8, score range: +/-1500
Skilling Open 2020 rating list according to sf14 @analysis of depth 12 set at 2800
 rank                    name  games  points  rating
    1        Nakamura, Hikaru     15     9.0    2699
    2         Carlsen, Magnus     15     9.0    2684
    3             Ding, Liren     15     7.5    2680
    4 Vachier-Lagrave, Maxime     15     8.0    2669
    5              So, Wesley     15     8.5    2666
    6          Aronian, Levon     15     8.5    2658
    7             Giri, Anish     15     8.0    2638
    8          Le, Quang Liem     15     8.0    2637
    9          Svidler, Peter     15     6.0    2636
   10       Radjabov, Teimour     15     8.0    2634
   11     Duda, Jan-Krzysztof     15     4.5    2627
   12 Vidit, Santosh Gujrathi     15     6.5    2617
   13       Firouzja, Alireza     15     8.0    2605
   14        Karjakin, Sergey     15     5.5    2592
   15     Nepomniachtchi, Ian     15     8.5    2586
   16   Anton Guijarro, David     15     6.5    2583
The start rating of 2800 is not that important; we could start at another anchor rating. What matters is the rating difference.

Here Hikaru is better than Magnus by 2699-2684 or 15 Elo.
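A sketch of how a list like the one above could be produced, under stated assumptions: positions before move 8 or with an evaluation outside ±1500 cp are skipped (my reading of the log header), and earlier moves are weighted more heavily. The 1/move_number weight is a hypothetical placeholder, since the post does not give the actual weighting, and this simple weighted average of per-move rating losses stands in for the running perf update described earlier.

```python
from dataclasses import dataclass

MIN_MOVE = 8        # "minimum move: 8" in the log header
SCORE_RANGE = 1500  # "score range: +/-1500", read here as a limit on |eval| in cp (assumption)


@dataclass
class Position:
    move_number: int
    eval_cp: int        # main-engine eval of the position (assumed meaning)
    rating_diff: float  # rating points charged for the move played (0 if it matched)


def weight(move_number: int) -> float:
    # HYPOTHETICAL weighting: earlier moves count more. Not Ferdy's actual formula.
    return 1.0 / move_number


def weighted_loss(positions: list[Position]) -> float:
    """Weighted average rating loss over the positions that pass the filters."""
    kept = [p for p in positions
            if p.move_number >= MIN_MOVE and abs(p.eval_cp) <= SCORE_RANGE]
    if not kept:
        return 0.0
    total_weight = sum(weight(p.move_number) for p in kept)
    return sum(weight(p.move_number) * p.rating_diff for p in kept) / total_weight


def list_rating(anchor: float, positions: list[Position]) -> float:
    """Anchor (2800 in the list above) minus the weighted average loss."""
    return anchor - weighted_loss(positions)
```

Whatever weighting is used, the anchor enters only as a constant offset, so changing it moves every player by the same amount.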

The Magnus vs. Cheese comparison is only one-sided: we let Cheese analyze the positions from Magnus's games. The missing data is the other direction; we should also have let Magnus evaluate the positions from Cheese's games.

A nice application of this method is comparing the strength of Fischer and Karpov according to, say, SF14 or another engine. Select the games in which Fischer and Karpov faced a common opponent, Fischer-Spassky and Karpov-Spassky for example. Then analyze them with the engine: no prejudice, just pure engine evaluation.
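Selecting the games for a common-opponent comparison is a simple set intersection. A minimal sketch with hypothetical helper names; games are represented only as (white, black) name pairs, and the toy list is for illustration, not a real game database.

```python
Game = tuple[str, str]  # (white, black)


def opponents(games: list[Game], player: str) -> set[str]:
    """Everyone `player` faced, with either colour."""
    opps = set()
    for white, black in games:
        if white == player:
            opps.add(black)
        elif black == player:
            opps.add(white)
    return opps


def games_vs_common(games: list[Game], a: str, b: str) -> dict[str, list[Game]]:
    """For players a and b, keep only their games against opponents both of them faced."""
    common = opponents(games, a) & opponents(games, b)
    return {
        p: [g for g in games
            if (g[0] == p and g[1] in common) or (g[1] == p and g[0] in common)]
        for p in (a, b)
    }


if __name__ == "__main__":
    toy = [("Fischer", "Spassky"), ("Spassky", "Karpov"),
           ("Fischer", "Petrosian"), ("Karpov", "Kasparov")]  # illustrative only
    print(games_vs_common(toy, "Fischer", "Karpov"))
```

The selected games would then go through the same engine analysis as above, so both players are measured against the same opposition.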
lkaufman

Re: Carlsen vs. CCRL 2850 engines in Rapid?

Post by lkaufman »

Ferdy wrote: Mon Aug 23, 2021 5:24 am
A nice application of this method is comparing the strength of Fischer and Karpov according to, say, SF14 or another engine. Select the games in which Fischer and Karpov faced a common opponent, Fischer-Spassky and Karpov-Spassky for example. Then analyze them with the engine: no prejudice, just pure engine evaluation.
Yes, that's a very nice idea to compare games against the same opponents (or against each other). Most past attempts to compare players by the size of their errors (relative to a strong engine) ignored the opposition, which was a serious flaw. There is still a problem in that the player who plays more boring openings will tend to show smaller errors, but a common opponent at least minimizes this, as the opponent has some say in whether the opening will be boring or complex. The above example is pretty impressive: only one player in the top eight by this method scored below 8, and only one in the bottom eight scored above 8! The results suggest that Nepo will have no chance against Carlsen! But as you indicate, the method is not useful for our purpose here, as the engine and the humans lack common opponents. It would be very interesting to compare Fischer with Karpov by this method (as you suggest), as well as Kasparov with Carlsen, Botvinnik with Alekhine, Morphy with Steinitz, and other pairs of top players fairly close in time who never played matches against each other. I suppose you could even compare players of widely different eras by working your way back (Carlsen to Kasparov, Kasparov to Karpov, Karpov to Fischer, Fischer to Botvinnik, ... back to Morphy).
I do wonder why Stockfish was only allowed to search to depth 12 for the above analysis. Of course SF at depth 12 is stronger than the Rapid play of these top human players in general, but in some positions (especially in the endgame) the humans will find better moves than SF at depth 12, so I would think a higher depth would be better. There is no point in going super deep, though, since beyond maybe 20 plies or so the human will almost never see more than Stockfish.
Ferdy

Re: Carlsen vs. CCRL 2850 engines in Rapid?

Post by Ferdy »

lkaufman wrote: Mon Aug 23, 2021 6:01 am I do wonder why Stockfish was only allowed to search to depth 12 for the above analysis. Of course SF at depth 12 is stronger than the Rapid play of these top human players in general, but in some positions (especially in the endgame) the humans will find better moves than SF at depth 12, so I would think a higher depth would be better. There is no point in going super deep, though, since beyond maybe 20 plies or so the human will almost never see more than Stockfish.
That depth 12 is only a demo. Higher depth is always better of course.
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Carlsen vs. CCRL 2850 engines in Rapid?

Post by mvanthoor »

lkaufman wrote: Mon Aug 16, 2021 8:21 pm How would Magnus Carlsen (FIDE Classical and Rapid ratings both close to 2850) perform in a serious Rapid (15' + 10") match with engines rated around 2850 on the CCRL Rapid (40/15) rating list (assuming they used a good but varied modern opening book), and ran on the reference i7 computer used by that list? Some example engines running on just one thread include Deep Shredder 11, Stockfish 1.4 (not 14!), Minic 1.39, Fritz 11, Toga II 3.0, Naum 3, Gaviota 1.0, and Arasan 18.0. I realize that there is probably no data on this exactly, opinions must be based on results of engines against other top players with a bit of extrapolation. We know that the rating lists spread out the ratings relative to human scale, but I'm trying to determine at least whether this list is accurate relative to FIDE ratings of the best human player. What do you think?
Personally, I think it would be great if we could have some classical and rapid tournaments between engines in the 2650-2850 range and grandmasters in the same range. It would be even better to have tournaments across the whole range, starting at 1500 Elo. That would give us the opportunity to calibrate CCRL to the FIDE list and give the engines "real" FIDE ratings.

The games could be unrated for the players, because it is not yet known whether a 2700 CCRL engine is actually a good match for a 2700 FIDE grandmaster.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
lkaufman

Re: Carlsen vs. CCRL 2850 engines in Rapid?

Post by lkaufman »

mvanthoor wrote: Sat Sep 11, 2021 1:04 am
Personally, I think it would be great if we could have some classical and rapid tournaments between engines in the 2650-2850 range and grandmasters in the same range. It would be even better to have tournaments across the whole range, starting at 1500 Elo. That would give us the opportunity to calibrate CCRL to the FIDE list and give the engines "real" FIDE ratings.

The games could be unrated for the players, because it is not yet known whether a 2700 CCRL engine is actually a good match for a 2700 FIDE grandmaster.
That would be great, but who would sponsor it? Those engines are almost all free, so their authors have no commercial incentive to sponsor it. Of course, if FIDE (or a national organization) wanted to rate the events for the humans, the engines would count based on their performance rating in the event, not on their CCRL ratings, which are way below "parity" with FIDE. Based on my analysis of the data supplied here recently, I conclude that the CCRL Rapid ratings of engines in the human master/GM range roughly indicate the ratings they would get if they played (on the reference hardware) at Rapid (15' + 10") time control, no Ponder, while the human played at a standard classical time control like two hours plus 30" increment.
AdminX
Posts: 6384
Joined: Mon Mar 13, 2006 2:34 pm
Location: Acworth, GA

Re: Carlsen vs. CCRL 2850 engines in Rapid?

Post by AdminX »

lkaufman wrote: Sat Sep 11, 2021 2:18 am
That would be great, but who would sponsor it? Those engines are almost all free, so their authors have no commercial incentive to sponsor it. Of course, if FIDE (or a national organization) wanted to rate the events for the humans, the engines would count based on their performance rating in the event, not on their CCRL ratings, which are way below "parity" with FIDE. Based on my analysis of the data supplied here recently, I conclude that the CCRL Rapid ratings of engines in the human master/GM range roughly indicate the ratings they would get if they played (on the reference hardware) at Rapid (15' + 10") time control, no Ponder, while the human played at a standard classical time control like two hours plus 30" increment.
Could the funding be crowdsourced?
"Good decisions come from experience, and experience comes from bad decisions."
__________________________________________________________________
Ted Summers
mvanthoor

Re: Carlsen vs. CCRL 2850 engines in Rapid?

Post by mvanthoor »

lkaufman wrote: Sat Sep 11, 2021 2:18 am That would be great, but who would sponsor it? Those engines are almost all free, so their authors have no commercial incentive to sponsor it. Of course, if FIDE (or a national organization) wanted to rate the events for the humans, the engines would count based on their performance rating in the event, not on their CCRL ratings, which are way below "parity" with FIDE. Based on my analysis of the data supplied here recently, I conclude that the CCRL Rapid ratings of engines in the human master/GM range roughly indicate the ratings they would get if they played (on the reference hardware) at Rapid (15' + 10") time control, no Ponder, while the human played at a standard classical time control like two hours plus 30" increment.
Is there some way to correlate the CCRL ratings with the FIDE list using the old games from the late '90s to the mid-2000s?

I remember that someone once posted a formula after some analysis:

FIDE = CCRL * 0.7 + 840

This would make 2800 FIDE equal to 2800 CCRL. I assume that this is not a coincidence. This formula would put the current version of my own engine, which is CCRL 1865, at 2145. If this is even roughly in the ballpark, it's no wonder I don't stand a chance of winning against it and only draw occasionally. It's rated at least 145 points higher than I am (or, more correctly, than I ever was, in my teens; nowadays I don't play tournaments, so I don't have a rating).
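The quoted formula is easy to check. Any linear map FIDE = a * CCRL + b leaves exactly one rating unchanged, x = b / (1 - a), which here is 840 / 0.3 = 2800, so the 2800 = 2800 match follows directly from the two constants. The slope of 0.7 also means that 100 CCRL points correspond to 70 FIDE points, in line with the observation earlier in the thread that the rating lists spread out ratings relative to the human scale. A minimal sketch (names are illustrative; where the formula came from is not known):

```python
A, B = 0.7, 840.0  # FIDE = CCRL * 0.7 + 840, as quoted in the post (origin unknown)


def ccrl_to_fide(ccrl: float) -> float:
    return A * ccrl + B


def fixed_point() -> float:
    """Rating left unchanged by the map: x = A*x + B  =>  x = B / (1 - A)."""
    return B / (1.0 - A)


if __name__ == "__main__":
    print(f"{fixed_point():.0f}")                            # 2800
    print(f"{ccrl_to_fide(1865):.1f}")                       # 2145.5
    print(f"{ccrl_to_fide(2850):.0f}")                       # 2835
    print(f"{ccrl_to_fide(2900) - ccrl_to_fide(2800):.0f}")  # 70
```

By this formula a CCRL 2850 engine would map to about 2835 FIDE, a much smaller adjustment than the +165 Rapid offset proposed earlier in the thread.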

I'd love to have a CCRL list calibrated to the FIDE list by human-engine tournaments. In the '80s and '90s we had the Aegon tournaments, but only the very strongest dedicated computers were somewhat interesting.