REAL ENGINES ELO COMPARED TO HUMANS?

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

towforce
Posts: 12514
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK
Full name: Graham Laight

Re: REAL ENGINES ELO COMPARED TO HUMANS?

Post by towforce »

Chess computers have been improving for a long time. They first beat a GM at blitz in 1977, when Michael Stean famously called Chess 4.6 a "bloody iron monster" after it caught him in a tactical trap. Twenty years later, Deeper Blue beat Garry Kasparov under tournament conditions.

Given that they have continued to improve throughout their entire history, 3600 Elo seems reasonable to me.
Human chess is partly about tactics and strategy, but mostly about memory
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: REAL ENGINES ELO COMPARED TO HUMANS?

Post by mvanthoor »

And before anyone asks if chess is dead because of the enormous strength of the engines... no, it isn't.

Most "normal" people (below 2100 Elo) will never hit the point where opening preparation and memorization are necessary to become significantly stronger (except, to some extent, in club play or something), and the strength of a chess engine doesn't mean that there's no reason for humans to still play.

There's no reason to stop playing chess now that computers can play it better. Should you stop your weight-lifting training because a fork-lift exists?
Author of Rustic, an engine written in Rust.
mehmet123
Posts: 686
Joined: Sun Jan 26, 2020 10:38 pm
Location: Turkey
Full name: Mehmet Karaman

Re: REAL ENGINES ELO COMPARED TO HUMANS?

Post by mehmet123 »

Let's look at the results of Pocket Fritz 4. Its performance rating was 2898 Elo at the Copa Mercosur in Buenos Aires, Argentina, in 2009.
This was the last engine to compete in a human tournament. The engine inside Pocket Fritz 4 was Hiarcs 13.1.
In that tournament Hiarcs searched about 20 kN/s. On my notebook, Hiarcs 13 searches about 20x more positions per second on one core than Pocket Fritz 4 did.
The difference between Stockfish 13 and Hiarcs 13.1 is 828 Elo according to the CEGT 40/20 rating list.
https://en.chessbase.com/post/breakthro ... enos-aires

I think Rybka 1.2 (released in June 2006) was the first engine to reach 3000 human Elo.
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: REAL ENGINES ELO COMPARED TO HUMANS?

Post by MikeB »

supersharp77 wrote: Tue May 25, 2021 9:46 pm 'Alex Chess' has Posed an Extremely Important Question Indeed....
AlexChess wrote: Tue May 25, 2021 2:01 pm REAL ENGINES ELO COMPARED TO HUMANS?

I asked this question of Crafty's legendary programmer, Dr. Bob Hyatt, 20 years ago :) https://www.chessprogramming.org/Robert_Hyatt , when the best engines were around 2400 Elo and GMs beat them easily. Now GM Hikaru Nakamura has asked for 2 more pawns to accept a blitz match against Dragon 1.0 by Komodo https://www.chess.com/news/view/hikaru- ... ess-engine . Personally, I think that the 3600 Elo rating calculated by some well-known rating lists (for Stockfish 13-dev) is an optimistic evaluation, even on a Ryzen 5900X with 2 or more Nvidia GeForce RTX 3060 graphics cards. I think that 3200 Elo at blitz is more realistic for top engines, and 3000 Elo in long games.

What is your opinion?

Regards, AlexChess
A very important question. Based on the performance Elos in the engine tourneys I have run over the years (mostly blitz, 5 min + 5 sec etc.), a 3200 performance rating is a fantastic score indeed, especially with such a crowded field of NNUE chess engines (lots of draws at the higher levels). A perfect score in such a tourney would produce an Elo "off the charts" (4000+?).
But that has rarely if ever happened in the tons of engine tourneys I have observed and created. Still, you never know, and once I set these tourneys up, strange things always seem to happen, even among the best engines.
I agree with AlexChess: 3200 (for blitz) is exceptional... :) :wink:
Many people (and I am one of them) believe that Stockfish's Limit Strength setting at 2850 is probably pretty close to a human blitz rating of 2850 at 3 min + 2 sec increment. Full-strength SF will outperform SF 2850 by 700 Elo or so, at least at 3 min + 2 sec. Now, some of that difference is a testing artifact: when testing similar or identical engines, the engine that sees more will win by more than the true difference. By how much? Who knows, but even if you assume 100 Elo, you are still talking about Stockfish on a single core being probably at least 3400 human blitz. I would tend to agree that standard chess would drop that by 200 Elo or so, but the real answer is that we don't have the data, so who knows. Makes for good conversation anyway. Every rating scheme I have seen tends to inflate ratings over time when stronger engines (or human players) keep coming in. Most of the rating agencies will see that and try to correct for it at some point.

Editorial comment: I define "close" as within 100 Elo or so either way in this context.
Vinvin
Posts: 5298
Joined: Thu Mar 09, 2006 9:40 am
Full name: Vincent Lejeune

Re: REAL ENGINES ELO COMPARED TO HUMANS?

Post by Vinvin »

Matches between humans and microcomputers stopped 15 years ago, when it became clear that microcomputers had become stronger than the best humans.

Some milestones from https://www.chessprogramming.org/Tourna ... nd_Matches :
GM John van der Wiel vs REBEL 2001 (2½ - 3½)
GM Loek van Wely vs REBEL, Maastricht February 2002 (2 - 2)
Christiansen vs Chessmaster 9000 September 2002 (1½ - 2½)
Kramnik vs Deep Fritz, Bahrain, October 2002 (4 - 4)
Bareev vs HIARCS 2003 (2 - 2)
Kasparov vs Deep Junior 2003 (3 - 3)
Kasparov vs X3D Fritz 2003 (2 - 2)
Kramnik versus Deep Fritz, Bonn December 2006 (2 - 4)

+ Man vs Machine Team Championship 2004 and 2005 : top GMs vs Junior and Fritz : 6 - 10
After 2006, only handicap games have been played.

So, how would you know the real strength of the current engines?
Something like 100 games on slow hardware for Stockfish, Dragon and Lc0 against GMs rated over 2600? That would cost some money...

There were some interesting experiments here: http://talkchess.com/forum3/viewtopic.p ... 66#p827666
jdart
Posts: 4406
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: REAL ENGINES ELO COMPARED TO HUMANS?

Post by jdart »

Arasan was matched against a 2500-level GM on one of the chess servers not too long ago. He didn't win a single game. I don't even think he got a draw over a couple of dozen games.

So that's at least a 2700+ level performance.
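
As a rough sanity check on that estimate, here is a small sketch using the standard Elo expected-score formula. The 24-game count stands in for "a couple of dozen" and is an assumption, as are the function name and the list of engine ratings tried.

```python
def expected_score(rating_gap: float) -> float:
    """Expected score per game for the player rated `rating_gap` Elo below the opponent."""
    return 1.0 / (1.0 + 10.0 ** (rating_gap / 400.0))


GAMES = 24        # assumption: "a couple of dozen" games
GM_RATING = 2500  # the GM's approximate level, as stated in the post

# How many points would the GM expect against engines of various strengths?
for engine_rating in (2500, 2600, 2700, 2800):
    gap = engine_rating - GM_RATING
    points = GAMES * expected_score(gap)
    print(f"engine at {engine_rating}: GM expected to score about {points:.1f} / {GAMES}")
```

Even against a 2700 opponent, a 2500 player would still expect to pick up several points over that many games, so a shutout is consistent with a gap of 200 Elo or more.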

I don't know if Stockfish would be at the level of its 3500+ CEGT rating if put into the human rating pool, but it could be. Anyway it would be well above the top human GMs.

--Jon
lkaufman
Posts: 6259
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: REAL ENGINES ELO COMPARED TO HUMANS?

Post by lkaufman »

This is a question I've been very interested in for years, both personally and because we would like to be able to assign realistic human ratings for the Skill levels of Komodo Dragon at various time controls. I have a lot of data and some fairly clear conclusions about this, although I certainly don't claim to have all the answers and there is still plenty to learn from new tests. The post doesn't define "Elo" here, but I'll assume that what is meant is what FIDE rating the engines would earn against humans in the three categories rated by FIDE (standard, Rapid, and blitz). Here are some key points:
1. Engine vs engine rating lists exaggerate rating differences in human terms. Estimates by Kai Laskos, with which I agree, are that lists that use standard Elo (Ordo) such as CEGT should have rating differences multiplied by about 2/3 for human equivalence; those that use BayesElo (CCRL) should multiply differences by about 3/4 as the ratings are already contracted somewhat by BayesElo.
2. The level of the CCRL Rapid list is about a hundred elo lower than what these engines would earn running on the reference i7 hardware against comparable humans at 40 moves in two hours. That's based on the engines that actually played about evenly with top humans nearly two decades ago. Just look up results of these engines back then running on four cores, and compare with the single thread ratings on that CCRL list, since it is roughly fair to say that one core on the reference i7 is comparable to four cores on nearly twenty year old hardware. To be precise, I'm saying that 2700 on that list would get a 2800 FIDE rating today at 40/2 hours. Actually the old results suggest adding somewhat more than a hundred elo, but since today's players are more familiar with playing against computers I think adding just the hundred is a fair compromise. Now if we apply the 3/4 rule from step 1, we get that Stockfish 13 on one thread at 3508 would get a FIDE rating of 3306, and on four cores its 3546 would translate to 3335. Presumably on the very best hardware available, say 128 cores, it would be about 3400 FIDE. These numbers look and feel about right to me. When I worked on Rybka in 2008, after many matches under widely varying conditions with humans I concluded that Rybka 3 on my Octal (about like a quad i7) would get a FIDE rating of about 3000. The CCRL rating for this is 3118, which would be 3014 FIDE by my formula, extremely good agreement.
3. Rapid ratings for engines would of course be higher in human terms than Standard ratings; it is very clear that normal engines don't gain as much elo with more thinking time as humans do, at least up to 4 minutes or so per move average, roughly the upper limit beyond which humans feel fatigue if the game isn't adjourned. Based on all the data I've seen over three decades, I would estimate that the relative advantage of engines over humans at Rapid (15' + 10") vs. Standard (40 moves in 2 hours or equivalent) is in the 100 to 150 elo range, maybe 100 on human scale, 150 on Engine vs Engine Elo scale. So we just add 100 to my figures from paragraph 2 to get estimated engine ratings, so Stockfish 13 would be a bit over 3400 human Rapid on one thread, 3500 on monster hardware. We do have one recent datapoint here. Komodo 14 on 16 cores (3414 CCRL Rapid on 4 cores, perhaps near 3500 on 16) played ten games of ten minute chess with Hikaru Nakamura (FIDE Rapid 2829) and won them all. The 3500 CCRL est. rating predicts 3300 standard and 3400 Rapid in human terms for Komodo 14 on 16 cores. Now ten minute chess is on the edge between blitz and Rapid (chess.com calls it Rapid, FIDE would say Blitz), so it's not a perfect test, but anyway the result does not contradict the 3400 estimated Rapid rating. Also, the recent 6.5 to 1.5 win for Komodo Dragon (32 cores) giving two "big" (non-edge) pawns to Nakamura in Rapid (15' + 10", not blitz) actually suggests a higher rating than 3400, since each pawn is about 250 elo at this level.
4. Blitz ratings would be higher still, as humans perform much worse against the same engines in blitz than in Rapid. Here we should use the CCRL blitz list rather than the Rapid list, with 2700 now deemed to equal 3000 FIDE blitz (adding another 100 to the Rapid formula, a very conservative figure). On the CCRL blitz list Stockfish 13 on 8 cores/threads is 3720, which if we subtract 1020/4 and add 300 gives us an estimated FIDE Blitz rating of 3765, compared to the top human, Nakamura, at 2900. This gap implies that Nakamura would make one or two draws in a hundred games. Of course for Stockfish to actually achieve this score it would have to have an opening book that avoided obvious draws as Black, but that is easy to do if you settle for slightly worse but complex positions. I don't know of any long blitz matches between top engines and the top humans on even terms except the above 10 to 0 score for Komodo 14 vs Nakamura, perhaps someone else has seen other similar matches? But I do have significant data on the Komodo Skill levels and personalities playing blitz with Nakamura and other GMs, as well as estimated CCRL blitz ratings for those levels, and the results suggest that my formula may be too generous to the humans. Nakamura's results vs. the bots indicated that he would only be even with bots in the 2200 CCRL blitz neighborhood, maybe 2300 if these were serious matches instead of just shows. 2300 CCRL blitz translates to 2700 human blitz by the above formula, so with Nakamura at 2900 this may indeed be underrating the engines. Here it would be very useful if we have any blitz data on the servers against GMs by CCRL-rated engines which are actually close in strength with the human opponents they played. Can anyone supply results of any such matches?
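
The blitz arithmetic in point 4 can be sketched in a few lines. This is only an illustration of the formula as stated there (anchor at 2700, CCRL differences scaled by 3/4, plus 300 for blitz); the function and constant names are made up for the example, and the win expectancy uses the standard Elo logistic formula.

```python
ANCHOR = 2700       # CCRL rating used as the reference point in the post
SCALE = 3 / 4       # contraction applied to CCRL (BayesElo) rating differences
BLITZ_OFFSET = 300  # 2700 CCRL blitz is deemed equal to 3000 FIDE blitz


def ccrl_blitz_to_fide_blitz(ccrl: float) -> float:
    """Estimated FIDE blitz rating for a CCRL blitz rating (hypothetical helper)."""
    return ANCHOR + BLITZ_OFFSET + SCALE * (ccrl - ANCHOR)


def expected_score(own: float, opponent: float) -> float:
    """Standard Elo expected score per game for `own` against `opponent`."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - own) / 400.0))


stockfish = ccrl_blitz_to_fide_blitz(3720)  # Stockfish 13, 8 threads, CCRL blitz
bot = ccrl_blitz_to_fide_blitz(2300)        # the 2300 CCRL blitz bot level

# Points a 2900-rated human would expect from 100 games against Stockfish.
human_points = 100 * expected_score(2900, stockfish)

print(f"Stockfish 13 estimated FIDE blitz: {stockfish:.0f}")
print(f"2300 CCRL blitz in human terms:    {bot:.0f}")
print(f"Expected human points in 100 games: {human_points:.2f}")
```

Under these assumptions the human expects well under one point per hundred games, which is where the "one or two draws" figure comes from.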
Komodo rules!
AlexChess
Posts: 1562
Joined: Sat Feb 06, 2021 8:06 am
Full name: Alex Morales

Re: REAL ENGINES ELO COMPARED TO HUMANS?

Post by AlexChess »

mvanthoor wrote: Tue May 25, 2021 10:51 pm And before anyone asks if chess is dead because of the enormous strength of the engines... no, it isn't.

Most "normal" people (below 2100 Elo) will never hit the point where opening preparation and memorization are necessary to become significantly stronger (except, to some extent, in club play or something), and the strength of a chess engine doesn't mean that there's no reason for humans to still play.

There's no reason to stop playing chess now that computers can play it better. Should you stop your weight-lifting training because a fork-lift exists?
Computer chess will always be useful for improving human play, and "Kaissa" will never die :)
Chess engines and dedicated chess computers fan since 1981 :D macOS Sequoia 16GB-512GB, Windows 11 & Ubuntu ARM64.
ProteusSF Dev Forum
AlexChess
Posts: 1562
Joined: Sat Feb 06, 2021 8:06 am
Full name: Alex Morales

Re: REAL ENGINES ELO COMPARED TO HUMANS?

Post by AlexChess »

towforce wrote: Tue May 25, 2021 10:44 pm Chess computers have been improving for a long time. They first beat a GM at blitz in 1977, when Michael Stean famously called Chess 4.6 a "bloody iron monster" after it caught him in a tactical trap. Twenty years later, Deeper Blue beat Garry Kasparov under tournament conditions.

Given that they have continued to improve throughout their entire history, 3600 Elo seems reasonable to me.
OK, but they must beat GMs regularly, and in long time-control events with more games. It would be nice if Magnus would accept a true match like Kasparov - Karpov 1987 :)
jr66
Posts: 47
Joined: Sun May 23, 2021 6:04 pm
Full name: Jacques Ress

Re: REAL ENGINES ELO COMPARED TO HUMANS?

Post by jr66 »

When you talk about engine ratings, what do you conclude from, for example, these CCRL results?
https://ccrl.chessdom.com/ccrl/4040/cgi ... 4-bit_4CPU
They confirm that FF2 is an SF 13 clone and that Dragon is perhaps less strong, but what else?
Do Carlsen and Caruana, for example, often play non-GM players in tournaments?
Just to say, I really don't understand these engine rating lists, sorry...
IM ICCF player