How close to Human FIDE rating are the engine ratings?


Chessqueen
Posts: 5588
Joined: Wed Sep 05, 2018 2:16 am
Location: Moving
Full name: Jorge Picado

How close to Human FIDE rating are the engine ratings?

Post by Chessqueen »

Has anybody compared engine ratings to human GM ratings? For instance, if you pick an engine rated 2010, does it really beat a human rated FIDE 1975? Or how does an engine rated 2750 do when pitted against a human GM rated around 2750, etc.?

PS: And of course this would mean using the same computer that was used to test and assign those engine ratings. What we have done wrong for a long, long time is to give engine X a specific rating and then arrange a match like Kramnik vs Deep Fritz or Vishy vs Rebel 10 on the most powerful computer available at the time, instead of using the same computer on which the engine was rated, so that we could make a comparison.
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: How close to Human FIDE rating are the engine ratings?

Post by Ovyron »

Chessqueen wrote: Fri Jun 19, 2020 5:15 am For instance, if you pick an engine rated 2010, does it really beat a human rated FIDE 1975? Or how does an engine rated 2750 do when pitted against a human GM rated around 2750, etc.?
Ratings are assigned arbitrarily within their own pools, so you can't really compare two different pools to make predictions. For example, a CCRL engine at the top has a rating of 3500, but this number means nothing by itself; it only lets you predict performance against nearby opponents. (Elo breaks down at around 400 points of difference, so who knows what performance a 3500 Elo engine would have against a 3100 Elo engine. The 3500 engine would win most of its games, but you wouldn't know when the 3100 one gets a draw, or how often.)

The list goes all the way down to the last engine at around 1400 Elo. When I say "Elo means nothing", I mean that all of this would work exactly the same if the top engine showed 2500 and the very bottom one 400, or if the top was 4500 and the bottom 2400. The only thing that matters is the Elo difference between engines in this pool.

So they could have started with the weakest one at 0, which would put the top one at 2100. Then your question for that list would be: "for instance, if you pick an engine rated 610, does it really beat a human rated FIDE 1975, or how does an engine rated 1350 do when pitted against a human GM rated around 2750, etc.?" See how the numbers no longer match?
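To make the "only the difference matters" point concrete, here is a minimal Python sketch assuming the standard logistic Elo expectation formula (individual rating lists such as CCRL may fit their ratings with their own tools and models). The pool values and names are illustrative only; shifting every rating by 1400 leaves every predicted score unchanged, and it reproduces the 610 / 1350 / 2100 numbers above.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Logistic Elo expectation (0..1) for player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Illustrative pool from the post: top engine at 3500, bottom at about 1400.
original = {"top": 3500.0, "gm_level": 2750.0, "club_level": 2010.0, "bottom": 1400.0}
# The same pool re-anchored so the weakest engine sits at 0.
shifted = {name: rating - 1400.0 for name, rating in original.items()}

pairs = [("top", "gm_level"), ("gm_level", "club_level"), ("club_level", "bottom")]
for a, b in pairs:
    # Only the rating difference enters the formula, so both numbers are identical.
    print(f"{a} vs {b}: {expected_score(original[a], original[b]):.3f} "
          f"vs {expected_score(shifted[a], shifted[b]):.3f}")

print(shifted["club_level"], shifted["gm_level"], shifted["top"])  # 610.0 1350.0 2100.0

# A 400-point gap already means roughly a 91% expected score for the stronger side.
print(round(expected_score(3500.0, 3100.0), 3))  # 0.909
```

The formula says nothing about whether a human and an engine with the same number belong to the same scale; that is exactly the cross-pool question being discussed.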

Generally, the pools are close enough that some engine's pool rating matches its performance against humans. Above this engine the ratings "collapse": engines rated higher will not perform against humans as well as their pool rating suggests, and the higher the rating, the bigger the collapse.

Below this engine the ratings "expand": engines rated lower will perform better against humans than their pool rating suggests.

This is related to human blunders. When an engine blunders, it may still be able to defend the position; when a human blunders, the engine punishes it perfectly and the human can't recover.

That's why the engine's playing style matters more than its rating in the Elo pools. An engine can aim for very tricky positions where it's very hard for a human to avoid blundering, and such an engine will perform much better against humans than one that plays normally, even though all this trickery is useless against other engines and such a setting may lose hundreds of Elo.

This was the case with the Thinker engine and its Active setting, which was rated much lower than the default but was the best against humans. Curiously, the default setting became known as "Inert", even though the engine's playing style was the complete opposite: it played with a crazy, dauntless style that would eclipse any opponent.

To answer your question: comparisons have been made, but only with very limited data from games between humans and engines. The severity of the rating collapse isn't known, but even in a worst case where a 3500 Elo engine loses 500 rating points to it, it's still going to perform like a 3000 Elo entity against humans.

The main obstacles in all this:
  • The number of games required. You could get a 2600 Elo human to play a 2600 Elo engine for 20 games and get some result, but 20 games are statistically insignificant, so the result could point to the opposite of the truth.
  • Strong humans are pricey. You'd need a big sponsorship to have them play enough games for statistical significance.
My suggestion: find the strongest human who agrees to play for free. Match them against an engine rated where you'd expect a score of about 50%. Play 1000 games. See who underperforms. That'd be a start.
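A quick way to see why 20 games say so little and 1000 games say a lot is to put an error bar on the Elo implied by a match score. The Python sketch below uses the per-game variance of the result (1, 0.5 or 0) with a normal approximation, which is a common but rough method; the match results are made up for illustration and are not from any real human-engine match.

```python
import math


def elo_from_score(score: float) -> float:
    """Logistic Elo difference implied by a score fraction strictly between 0 and 1."""
    return 400.0 * math.log10(score / (1.0 - score))


def elo_with_margin(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, lower, upper) for one side of a match.

    Uses the per-game variance of the result (1, 0.5 or 0) and a normal
    approximation for the mean score. Needs a mixed result (not all wins or all losses).
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * score ** 2) / n
    half_width = z * math.sqrt(var / n)
    eps = 1e-9  # keep the interval inside (0, 1) so the logarithm stays defined
    lower = max(score - half_width, eps)
    upper = min(score + half_width, 1.0 - eps)
    return elo_from_score(score), elo_from_score(lower), elo_from_score(upper)


# Hypothetical even matches with the same proportions: 20 games vs 1000 games.
for w, d, l in [(6, 8, 6), (300, 400, 300)]:
    elo, lo, hi = elo_with_margin(w, d, l)
    print(f"{w + d + l} games: {elo:+.1f} Elo, 95% interval [{lo:+.1f}, {hi:+.1f}]")
```

With these made-up results the 20-game interval is roughly ±120 Elo, while 1000 games narrow it to roughly ±17 Elo.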
M ANSARI
Posts: 3707
Joined: Thu Mar 16, 2006 7:10 pm

Re: How close to Human FIDE rating are the engine ratings?

Post by M ANSARI »

I watched Hikaru try that with Komodo at rapid time controls and it was really no contest. I also watched MVL try it with SF, I think, and again it was not even close, especially now with Lc0. Lc0 removes the last angle a human could try: removing tactics from a position, closing it up, and using long-term maneuvering behind a pawn chain to fight back. I think slower time controls no longer favor the human either, as there is less chance for the engine to blunder. I guess an engine might fall for some long, sharp line that the human prepared with computer assistance and that gives him an overwhelming position, but those cases are very uncommon. I did see Penguin win a game where Lc0 somehow blundered and hung a full queen, but that was at bullet time controls, and for the one game he won he lost maybe 20 in a row. This is becoming more like comparing a race car to a human sprinter: sure, the car can blow a tire or its engine and lose the race, but that is not the norm. I would say that SF or Lc0 on reasonable hardware is probably about 400 to 600 Elo stronger than the strongest human, and that is being conservative.
MonteCarlo
Posts: 188
Joined: Sun Dec 25, 2016 4:59 pm

Re: How close to Human FIDE rating are the engine ratings?

Post by MonteCarlo »

Unless there was some other Tang-Leela contest of which I'm unaware, the game that Tang won against Leela was VERY early in the project. The net he played against wasn't even close to the strength of the current nets (for reference, the game was played in April 2018; Leela fell for a one move discovery, to which it was especially prone at the time).
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: How close to Human FIDE rating are the engine ratings?

Post by Ovyron »

M ANSARI wrote: Fri Jun 19, 2020 9:35 am I watched Hikaru try that with Komodo at rapid time controls and it was really no contest. I also watched MVL try that with SF I think, and again they are just not even close
What I'm proposing is playing against weaker software. I think RomiChess P3n would be a fair candidate against a human. We could measure the human's performance against it and calculate the Elo difference between the human and the engine pool.
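The calculation described above can be sketched as a performance rating: find the pool rating at which the human's score would be the expected score against the engine, then compare that with the human's FIDE rating. This is a minimal Python sketch assuming the logistic Elo formula and a single opponent; all numbers are hypothetical (RomiChess P3n's actual pool rating is not given here), and, given the collapse/expand effect described earlier in the thread, the offset would only hold near that strength.

```python
import math


def performance_rating(opponent_rating: float, score: float) -> float:
    """Rating at which `score` is the logistic Elo expectation against one opponent."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return opponent_rating + 400.0 * math.log10(score / (1.0 - score))


def pool_offset(human_fide: float, engine_pool_rating: float,
                points: float, games: int) -> float:
    """Points by which the engine pool's scale sits above FIDE near this strength.

    Positive means pool numbers read higher than FIDE numbers at this level.
    """
    human_in_pool_units = performance_rating(engine_pool_rating, points / games)
    return human_in_pool_units - human_fide


# Hypothetical: a 2300 FIDE player scores 400/1000 against an engine rated 2400 in its pool.
offset = pool_offset(2300.0, 2400.0, 400.0, 1000)
print(f"pool scale is about {offset:+.1f} points relative to FIDE here")
```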
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: Hungary

Re: How close to Human FIDE rating are the engine ratings?

Post by corres »

Chessqueen wrote: Fri Jun 19, 2020 5:15 am Has anybody compared engine ratings to human GM ratings? For instance, if you pick an engine rated 2010, does it really beat a human rated FIDE 1975? Or how does an engine rated 2750 do when pitted against a human GM rated around 2750, etc.?
PS: And of course this would mean using the same computer that was used to test and assign those engine ratings. What we have done wrong for a long, long time is to give engine X a specific rating and then arrange a match like Kramnik vs Deep Fritz or Vishy vs Rebel 10 on the most powerful computer available at the time, instead of using the same computer on which the engine was rated, so that we could make a comparison.
The 1990s were the last time chess engines (mainly chess machines) took part in human FIDE competitions. At that time the SSDF rating list reflected the Elo of chess engines and chess machines relatively well.
To achieve this, the SSDF testers corrected the base Elo value they used.
Games between FIDE GMs and chess engines/machines were mainly premiere events consisting of only a few games.
The main aim was to advertise a new engine, typically ChessBase's Fritz products and chess machines (like the Mephistos) from the Hegener-Glaser firm. Kasparov's matches against Deep Thought and Deep Blue also belonged to this series, but in those cases Kasparov was the advertising face of IBM's supercomputer.
Nowadays, games between famous GMs and Komodo are used to promote Komodo.
On the Wiki you can find an SSDF list with the leaders of each year.
Some examples:
1985-1991: Mephistos, Elo 1827-2127 (Hegener-Glaser chess machines, 12-36 MHz, 16/32-bit)
1992: Chess Machine Schroeder 3.0, Elo 2174 (PC 486/30 MHz!)
1993: Mephisto Genius 2.0, Elo 2235 (PC 486/Pentium, 50-66 MHz)
1996: Rebel 8.0, Elo 2337 (Pentium 90 MHz)
1997: HIARCS 6.0 (P200 MMX)
1998: Fritz 5.32, Elo 2460 (P200 MMX, PC with Windows)
...
...
2003: Shredder 7.04 UCI(!), Elo 2791 (Athlon 1200 MHz, the default SSDF PC at that time)
...
...
2006: Rybka 1.2, Elo 2902 (Athlon 1200 MHz; the first engine-GM!)
and so on.

Around the year 2000, human GMs protested against the participation of chess engines and chess machines in FIDE-rated competitions, and FIDE deleted their results from all FIDE Elo lists.
mehmet123
Posts: 671
Joined: Sun Jan 26, 2020 10:38 pm
Location: Turkey
Full name: Mehmet Karaman

Re: How close to Human FIDE rating are the engine ratings?

Post by mehmet123 »

Pocket Fritz was the last engine to play in a human tournament.
Its Elo performance was 2938 at the 2009 Mercosur Cup. The engine inside Pocket Fritz is Hiarcs 13, which ran at only 20 kn/s.

https://en.chessbase.com/post/breakthro ... s-aires/17

My calculations indicate that Stockfish 11 at 20 kn/s is 3035 human Elo, Stockfish 11 at 10 kn/s is 2950 Elo, and Stockfish 11 at 1 kn/s is 2612 Elo.
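The three estimates above imply a particular Elo gain per doubling of search speed. This Python sketch is only arithmetic on those quoted figures; it does not check whether the estimates themselves are right (the low end is disputed further down the thread). It comes out at about 102 Elo per doubling between 1 and 10 kn/s and 85 Elo per doubling between 10 and 20 kn/s.

```python
import math

# Figures quoted above: Stockfish 11 search speed (kn/s) -> estimated human Elo.
estimates = {1.0: 2612.0, 10.0: 2950.0, 20.0: 3035.0}


def elo_per_doubling(slow: float, fast: float) -> float:
    """Average Elo gained per doubling of search speed between two estimates."""
    doublings = math.log2(fast / slow)
    return (estimates[fast] - estimates[slow]) / doublings


print(f"1 -> 10 kn/s:  {elo_per_doubling(1.0, 10.0):.1f} Elo per doubling")
print(f"10 -> 20 kn/s: {elo_per_doubling(10.0, 20.0):.1f} Elo per doubling")
```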

http://talkchess.com/forum3/viewtopic.php?f=6&t=72485
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: How close to Human FIDE rating are the engine ratings?

Post by Laskos »

mehmet123 wrote: Fri Jun 19, 2020 3:08 pm
My calculations indicate that Stockfish 11 at 20 kn/s is 3035 human Elo, Stockfish 11 at 10 kn/s is 2950 Elo, and Stockfish 11 at 1 kn/s is 2612 Elo.

http://talkchess.com/forum3/viewtopic.php?f=6&t=72485
Very reasonable estimation.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: How close to Human FIDE rating are the engine ratings?

Post by lkaufman »

Laskos wrote: Fri Jun 19, 2020 3:41 pm
mehmet123 wrote: Fri Jun 19, 2020 3:08 pm
My calculations indicate that Stockfish 11 at 20 kn/s is 3035 human Elo, Stockfish 11 at 10 kn/s is 2950 Elo, and Stockfish 11 at 1 kn/s is 2612 Elo.

http://talkchess.com/forum3/viewtopic.php?f=6&t=72485
Very reasonable estimation.
I think these are extrapolations not based on data at the low end. Stockfish is very weak at anything like 1 kn/s, much weaker than Komodo at the same speed, and certainly no match for a 2600 GM. The 20 kn/s figure might be pretty accurate; somewhere around this speed Komodo and Stockfish are pretty well matched. I think it would be a very interesting and close competition to see which engine needs fewer milliseconds of movetime to win a match against a top GM playing at some reasonable time control like 15' + 10". Presumably the engines would use only one thread, as I'm not sure they benefit from MP at such fast levels.
Komodo rules!
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: How close to Human FIDE rating are the engine ratings?

Post by Laskos »

lkaufman wrote: Fri Jun 19, 2020 5:28 pm
Laskos wrote: Fri Jun 19, 2020 3:41 pm
mehmet123 wrote: Fri Jun 19, 2020 3:08 pm
My calculations indicate that Stockfish 11 at 20 kn/s is 3035 human Elo, Stockfish 11 at 10 kn/s is 2950 Elo, and Stockfish 11 at 1 kn/s is 2612 Elo.

http://talkchess.com/forum3/viewtopic.php?f=6&t=72485
Very reasonable estimation.
I think these are extrapolations not based on data at the low end. Stockfish is very weak at anything like 1 kn/s, much weaker than Komodo at the same speed, and certainly no match for a 2600 GM. The 20 kn/s figure might be pretty accurate; somewhere around this speed Komodo and Stockfish are pretty well matched. I think it would be a very interesting and close competition to see which engine needs fewer milliseconds of movetime to win a match against a top GM playing at some reasonable time control like 15' + 10". Presumably the engines would use only one thread, as I'm not sure they benefit from MP at such fast levels.
1 kn/s is about 200k nodes per move at tournament time control, and those were the conditions I had in mind for a FIDE rating. Are you sure SF is much weaker than Komodo at 200k nodes per move? That is about 0.1 s on an i7 core. Some of my experiments use time controls not far from 0.1 s/move, and I am not seeing SF weak at all. I did see SF being very weak at depths 5-9, but 200k nodes gives something like this:

info depth 15 seldepth 18 multipv 1 score cp 80 nodes 200068 nps 1626569 tbhits 0 time 123 pv e2e4 e7e5
bestmove e2e4 ponder e7e5
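The numbers in that UCI output hang together: nps is simply nodes divided by the reported time (in milliseconds), and the same 200k-node budget at 1 kn/s works out to about 200 seconds per move, which is the arithmetic behind "1 kn/s is about 200k nodes per move at tournament time control". The `parse_info` helper below is a hypothetical sketch written for this one line, not a complete UCI parser.

```python
def parse_info(line: str) -> dict:
    """Parse the fields of a UCI 'info' line like the one above.

    Sketch only: handles 'score <type> <value>', a trailing 'pv', and
    single-integer fields; bounds, 'string' and other keys are not handled.
    """
    tokens = line.split()
    if not tokens or tokens[0] != "info":
        raise ValueError("not a UCI info line")
    fields: dict = {}
    i = 1
    while i < len(tokens):
        key = tokens[i]
        if key == "pv":
            fields["pv"] = tokens[i + 1:]  # the rest of the line is the move list
            break
        if key == "score":
            fields["score"] = (tokens[i + 1], int(tokens[i + 2]))
            i += 3
        else:
            fields[key] = int(tokens[i + 1])
            i += 2
    return fields


line = ("info depth 15 seldepth 18 multipv 1 score cp 80 nodes 200068 "
        "nps 1626569 tbhits 0 time 123 pv e2e4 e7e5")
info = parse_info(line)

# nps is nodes per second of the reported time (ms): 200068 nodes / 0.123 s.
print(info["nodes"] * 1000 // info["time"], info["nps"])  # 1626569 1626569

# The same 200k-node budget at 1 kn/s needs about 200 seconds per move.
print(200_000 / 1_000)  # 200.0
```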

Several years ago I derived that Komodo at 1 million nodes per move is about 2700 FIDE at tournament time control. For today's SF, 200k nodes per move being about 2600 FIDE seems reasonable to me.

EDIT:

Look at part of a recent experiment with SF. The snippet shows a doubling of time from a 0.1s + 0.001s time control. On Windows 10 and with SF it can be done. And the gain is very large, reasonably so:

Ultra-STC

0.2s+0.002s vs 0.1s+0.001s

Score of SF_dev2n vs SF_dev1n: 3172 - 416 - 412 [0.845] 4000
Elo difference: 293.9 +/- 13.3, LOS: 100.0 %, DrawRatio: 10.3 %
Finished match

Logistic Elo: 293.9
Gaussian Elo: 286.6
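The headline numbers of that match can be reproduced with the usual formulas. This Python sketch assumes the result line is wins - losses - draws and that the tool (the output looks like cutechess-cli's) uses a normal approximation on the per-game score variance; its exact code may differ, and the "Gaussian Elo" line comes from a different model that is not reproduced here. With these assumptions it lands at about 293.9 +/- 13.3 Elo, LOS 100.0 % and DrawRatio 10.3 %, matching the log.

```python
import math


def elo(score: float) -> float:
    """Logistic Elo difference for a score fraction strictly between 0 and 1."""
    return 400.0 * math.log10(score / (1.0 - score))


def match_stats(wins: int, losses: int, draws: int, z: float = 1.95996) -> dict:
    """Summary numbers for a match, from the first engine's point of view."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * score ** 2) / n
    half_width = z * math.sqrt(var / n)
    # Half the width of the 95% interval, converted to Elo.
    margin = (elo(score + half_width) - elo(score - half_width)) / 2.0
    # Likelihood of superiority from wins and losses only (normal approximation).
    los = 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))
    return {"score": score, "elo": elo(score), "margin": margin,
            "los": los, "draw_ratio": draws / n}


# "Score of SF_dev2n vs SF_dev1n: 3172 - 416 - 412", read as wins - losses - draws.
stats = match_stats(wins=3172, losses=416, draws=412)
print(f"score {stats['score']:.4f}, Elo {stats['elo']:.1f} +/- {stats['margin']:.1f}, "
      f"LOS {100 * stats['los']:.1f} %, DrawRatio {100 * stats['draw_ratio']:.1f} %")
```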