elo ratings in stockfish
Moderator: Ras
-
lkaufman
- Posts: 6280
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
- Full name: Larry Kaufman
elo ratings in stockfish
What has been the usual experience of human players playing rapid games against Stockfish set to the same Elo level as the human (or to the skill level closest to matching their rating)? Is it similar at all levels, or do human players with low ratings do better or worse than those with high ratings against the appropriate level? I know that the SF Elos are tied to CCRL, so logically, since engine ratings are known to overstate rating differences in human terms, one would expect that weak human players would score worse than strong ones against levels with ratings equal to their own. But is this actually what people are observing in practice?
Komodo rules!
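The reasoning in the question can be made concrete with the standard Elo expected-score formula. The sketch below is only an illustration: the function names, the anchor rating of 2500, and the compression factor of 0.8 are hypothetical values chosen for the example, not numbers from this thread or from Stockfish.

```python
def expected_score(player_elo: float, opponent_elo: float) -> float:
    """Standard Elo expected score for the player (win = 1, draw = 0.5)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_elo - player_elo) / 400.0))


def human_scale_rating(engine_elo: float,
                       anchor_elo: float = 2500.0,
                       compression: float = 0.8) -> float:
    """Hypothetical mapping from an engine-list rating to a human-scale rating.

    ASSUMPTION: engine-list rating differences shrink by `compression` in
    human terms, and the two scales agree at `anchor_elo`. Both numbers are
    made up for illustration.
    """
    return anchor_elo + compression * (engine_elo - anchor_elo)


# A 2500 human against the level labelled 2500, and a 1500 human against
# the level labelled 1500, under the hypothetical compression above.
strong = expected_score(2500, human_scale_rating(2500))
weak = expected_score(1500, human_scale_rating(1500))
print(f"2500 human vs level 2500: {strong:.3f}")
print(f"1500 human vs level 1500: {weak:.3f}")
```

Under these assumed numbers the level labelled 1500 would really play at about 1700 in human terms, so the 1500 human is expected to score well under 50%, which is the effect the question asks about.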
-
Frank Quisinsky
- Posts: 7208
- Joined: Wed Nov 18, 2009 7:16 pm
- Location: Gutweiler, Germany
- Full name: Frank Quisinsky
Re: elo ratings in stockfish
Hi Larry,
I think Stockfish is too strong for any Elo comparison to humans.
Two months ago a GM told me (he trained his tactical skills against Shredder 12 and used its endgame knowledge for his own improvement, on a 3.7 GHz i7 with 1 core) that Shredder 12, under the conditions I am using for my FCP Tourney-2022, plays at a bit more than 2700 Elo. And Shredder 13 is 350 Elo stronger. That means that the ratings for FCP Tourney-2022 are 75 Elo too high.
If so ..
The rating of the current Stockfish version in games longer than 1h should be around 3400 Elo.
Very interesting idea, because Shredder 12 has tactical holes and is strong in endgames.
Of course the GM used Shredder 12 as a playing partner and other engines for analysis.
The playing levels of Wasp are great.
Very close to the reality of playing strength.
Best
Frank
PS: For my new rating list, which I will start in January 2022, I will change that.
Under the conditions I am using (1:20h per game on 4.4 GHz, 1 core, ponder = off), I will set the Elo of Xiphos to 3100. Xiphos is great for adjusting the Elo of all the others: the engine is strong in endgames, has strong tactical skills, and uses no neural network. The deal with NN is that the already very strong transition into the endgame becomes even stronger. In my opinion Xiphos is the optimal engine for calibrating ratings. Xiphos produces a very human-like style. Xiphos plays much more aggressively with the white pieces than with the black pieces; most humans do the same.
-
lkaufman
Re: elo ratings in stockfish
Frank Quisinsky wrote: ↑Sun Dec 19, 2021 11:08 am
Hi Larry,
I think Stockfish is too strong for any Elo comparison to humans.
Two months ago a GM told me (he trained his tactical skills against Shredder 12 and used its endgame knowledge for his own improvement, on a 3.7 GHz i7 with 1 core) that Shredder 12, under the conditions I am using for my FCP Tourney-2022, plays at a bit more than 2700 Elo. And Shredder 13 is 350 Elo stronger. That means that the ratings for FCP Tourney-2022 are 75 Elo too high.
If so ..
The rating of the current Stockfish version in games longer than 1h should be around 3400 Elo.
Very interesting idea, because Shredder 12 has tactical holes and is strong in endgames.
Of course the GM used Shredder 12 as a playing partner and other engines for analysis.
The playing levels of Wasp are great.
Very close to the reality of playing strength.
Best
Frank
PS: For my new rating list, which I will start in January 2022, I will change that.
Under the conditions I am using (1:20h per game on 4.4 GHz, 1 core, ponder = off), I will set the Elo of Xiphos to 3100. Xiphos is great for adjusting the Elo of all the others: the engine is strong in endgames, has strong tactical skills, and uses no neural network. The deal with NN is that the already very strong transition into the endgame becomes even stronger. In my opinion Xiphos is the optimal engine for calibrating ratings. Xiphos produces a very human-like style. Xiphos plays much more aggressively with the white pieces than with the black pieces; most humans do the same.
Hi Frank,
Thanks for your comments, although I was not asking about the Elo of full-strength Stockfish, but rather the human Elo of Stockfish when set to use UCI_Elo = 2200 (for example) against a 2200 human, and how the results might differ at different human levels. But regarding your comment about Shredder 12 being just a bit over 2700 human Elo in games of one hour or more, that is very hard to believe. On the SSDF rating list, Shredder 12 is 3104 on an old, slow quad, and 2983 on a medieval single-core Athlon at 1.2 GHz. Presumably on one thread of a modern i7 it would be something like 3075 on their list. Now I think their list was a bit too high (I estimated 65 Elo) based on results of top players two decades ago, so maybe 3000 was about right, and tests of move-matching show that Elo ratings for a given level of play have dropped about 100 points in those twenty years, so perhaps 2900 would be about right. But 2700? Maybe for someone who trained specifically against that engine for many games, but that's not the way to measure a fair rating for engines in general.
Komodo rules!
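For readers who want to reproduce the setting mentioned above: in Stockfish the limited-strength mode is controlled by the UCI options `UCI_LimitStrength` and `UCI_Elo`. The sketch below only builds the command strings a GUI or script would send to the engine; the helper name `limit_strength_commands` is hypothetical, and the permitted `UCI_Elo` range differs between Stockfish versions (the engine lists it in its reply to `uci`), so no range is checked here.

```python
def limit_strength_commands(target_elo: int) -> list[str]:
    """Return the UCI commands that ask an engine to play at `target_elo`.

    ASSUMPTION: the engine supports the UCI_LimitStrength and UCI_Elo
    options, as Stockfish does; other engines may use different names.
    """
    return [
        "uci",                                          # engine replies with its option list
        "setoption name UCI_LimitStrength value true",  # enable strength limiting
        f"setoption name UCI_Elo value {target_elo}",   # requested rating level
        "isready",                                      # wait until options are applied
    ]


# Example: the 2200 level used as an example in the post above.
for command in limit_strength_commands(2200):
    print(command)
```

These lines can be typed into the engine's console or written to its standard input by a script before starting a game.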
-
Frank Quisinsky
Re: elo ratings in stockfish
Hi Larry,
I misunderstood your message, sorry!
GM Meyer thought (on a 2 GHz notebook with 2 cores, around the year 2010) that Shredder 12 was no more than 2800 Elo. I think that chess programs of that time are overrated because Shredder / Rybka have strong endgames and only strong chess players saw all the tactical holes with many pieces on the board. I am very sure that Magnus would win against Shredder 12 on 4 GHz / 1 core with a very clear score. Today I think that 3050 Elo for Shredder 13 is much more realistic than the 3125 in my tourney.
Based on such information I sent John the information for the Elo calculation of the Wasp levels. For example, 2700 Elo as the maximum Elo level for Wasp 4.5 on DGT-Pi is very realistic.
Two months ago I made a new calculation with human games (I have been collecting the games for more than 35 years). Here I get a result for Shredder 12 (but on different hardware; often I have no information in my database about the hardware, and often none about the time control) of 2728 Elo.
I am very sure that all the ratings we can find for chess programs are clearly too high.
The reality for Shredder 13 at longer time controls should be 3050 - 3100 Elo, or 350 less for Shredder 12.
Best
Frank
PS: Stockfish is not the best engine for reducing Elo strength. Stockfish does not start its analysis at depth 1. Wasp does, and can play wonderful games vs. Super Conny at 1600 Elo (for example). 35 nodes per second is enough. That is the reason I am very happy that Wasp is running in my DGT Centaur with the real Elo levels and the human-like style Wasp produces. Stockfish is not the right program here for a fun chess computer, in my humble opinion.
-
lkaufman
Re: elo ratings in stockfish
Frank Quisinsky wrote: ↑Sun Dec 19, 2021 6:22 pm
Hi Larry,
I misunderstood your message, sorry!
GM Meyer thought (on a 2 GHz notebook with 2 cores, around the year 2010) that Shredder 12 was no more than 2800 Elo. I think that chess programs of that time are overrated because Shredder / Rybka have strong endgames and only strong chess players saw all the tactical holes with many pieces on the board. I am very sure that Magnus would win against Shredder 12 on 4 GHz / 1 core with a very clear score. Today I think that 3050 Elo for Shredder 13 is much more realistic than the 3125 in my tourney.
Based on such information I sent John the information for the Elo calculation of the Wasp levels. For example, 2700 Elo as the maximum Elo level for Wasp 4.5 on DGT-Pi is very realistic.
Two months ago I made a new calculation with human games (I have been collecting the games for more than 35 years). Here I get a result for Shredder 12 (but on different hardware; often I have no information in my database about the hardware, and often none about the time control) of 2728 Elo.
I am very sure that all the ratings we can find for chess programs are clearly too high.
The reality for Shredder 13 at longer time controls should be 3050 - 3100 Elo, or 350 less for Shredder 12.
Best
Frank
PS: Stockfish is not the best engine for reducing Elo strength. Stockfish does not start its analysis at depth 1. Wasp does, and can play wonderful games vs. Super Conny at 1600 Elo (for example). 35 nodes per second is enough. That is the reason I am very happy that Wasp is running in my DGT Centaur with the real Elo levels and the human-like style Wasp produces. Stockfish is not the right program here for a fun chess computer, in my humble opinion.
Thanks, that is useful information. But how can you reconcile this with the fact that Kasparov and Kramnik, both around 2800 at the time, played several drawn matches against the top engines in the years 2001 to 2004, on hardware inferior to one core of a good i7 today? Those engines were rated 200 or more Elo below Shredder 12 (on SSDF). Remember, they were playing for enormous prize funds, with months to prepare and a huge incentive to do so. So even allowing for ratings today being 100 Elo lower (for the same level) than 20 years ago, shouldn't Shredder 12 be at least 2900 now on one thread of an i7? I don't doubt your statement about the 2728 result for Shredder 12 vs. humans; even if some of those games were played on one thread of fairly old hardware, the average hardware probably wasn't too far below one thread of an i7. So how can you reconcile these two highly contradictory facts? Perhaps it is partly due to opening books; maybe home users play against variety opening books (random openings), whereas Kasparov and Kramnik played against highly optimized opening books? That's the only explanation that I can think of.
Regarding the reduced Elo levels, I'm not concerned about whether Stockfish is a good opponent for this purpose; I'm just trying to figure out whether basing the Elo levels on results with CCRL engines causes systematic major errors when comparing to human ratings. In other words, is the spread between Stockfish Elo 2500 and Stockfish Elo 1500 accurate in human terms?
Komodo rules!
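One way to test the spread question empirically is to convert human scores against each level into an implied human-scale rating for that level and compare the resulting spread with the nominal one. The sketch below assumes the standard Elo formula; the function names are hypothetical and the two scores are invented placeholders, not observed results.

```python
import math


def elo_diff_from_score(score: float) -> float:
    """Rating difference implied by a score fraction (requires 0 < score < 1)."""
    return 400.0 * math.log10(score / (1.0 - score))


def implied_level_rating(human_elo: float, human_score: float) -> float:
    """Human-scale rating of an engine level, from a human's score against it."""
    return human_elo - elo_diff_from_score(human_score)


# HYPOTHETICAL data: (human rating, score fraction against the level whose
# nominal Elo equals that human rating). Replace with real results.
low = implied_level_rating(1500.0, 0.40)   # 1500 human scores 40% vs level 1500
high = implied_level_rating(2500.0, 0.50)  # 2500 human scores 50% vs level 2500

nominal_spread = 2500.0 - 1500.0
spread_ratio = (high - low) / nominal_spread
print(f"implied ratings: {low:.0f} and {high:.0f}, spread ratio {spread_ratio:.2f}")
```

A ratio near 1 would mean the nominal spread holds in human terms; a ratio clearly below 1 would mean the levels are closer together for humans than their labels suggest. Reliable conclusions would need many games per level.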
-
Frank Quisinsky
Re: elo ratings in stockfish
Hi Larry,
that is exactly my opinion.
Do you know what Magnus would do with Rybka or Shredder if he looked at all the short lost games we have collected from these engines? Both programs were number 1 in the past, and today we know that both programs have massive problems with many pieces on the board and with king safety.
Nobody needs to be an expert to hold the opinion that Magnus, half asleep and without looking at the board, would demolish / smash Rybka and Shredder 12 if he knew their weaknesses. And if he has no luck with aggression in games vs. such programs, Magnus can try to hold the draw in the endgame. I am very sure Magnus would score around 70% in a match vs. Rybka on 4 GHz & 1 core or Shredder 12 on 4 GHz & 1 core.
In the past all of us were thinking ... wow, the number 1 in computer chess, but times have changed. We have so much information about all the programs, and strong database systems. I think that is the reason for the point you raised. I know about Kasparov's results ... in one game he lost the queen in one move, and such things.
Stockfish at 3400 Elo (longer time controls) cannot be beaten by humans today.
Maybe Carlsen would need hundreds of games. That's my personal opinion after all I have read about it.
Best
Frank
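As a rough cross-check of the 70% estimate above, the standard Elo expected-score formula can be inverted to get the rating difference a given score implies. This is only a worked calculation under the usual Elo model; the symbols $s$, $d$ and $R_{\text{human}}$ are introduced here for illustration.

```latex
% Score fraction s and rating difference d under the standard Elo model
s = \frac{1}{1 + 10^{-d/400}}
\quad\Longleftrightarrow\quad
d = 400 \log_{10}\!\frac{s}{1 - s}

% For s = 0.70:
d = 400 \log_{10}\!\frac{0.70}{0.30} \approx 400 \times 0.368 \approx 147

% With the engine placed at 2700 to 2725 on the human scale:
R_{\text{human}} \approx 2700 + 147 = 2847
\quad\text{to}\quad
R_{\text{human}} \approx 2725 + 147 = 2872
```

So a 70% score against an engine placed at 2700 to 2725 corresponds to a human rating of roughly 2847 to 2872 on the same scale.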
-
Frank Quisinsky
Re: elo ratings in stockfish
Hi Larry,
if you look at one-core analyses of Ivanchuk's games with Shredder 12 ...
I am very, very sure that you, as a strong chess player, would form the same opinion: that Shredder 12 was never more than 2700-2725 Elo. I do that with the games of my favorite player Ivanchuk, and often I think ... 2600 for Shredder is too high, but Shredder has a very strong endgame.
Enough ...
Have a nice Christmas with your family.
Best
Frank
-
lkaufman
Re: elo ratings in stockfish
Frank Quisinsky wrote: ↑Sun Dec 19, 2021 6:22 pm
Two months ago I made a new calculation with human games (I have been collecting the games for more than 35 years). Here I get a result for Shredder 12 (but on different hardware; often I have no information in my database about the hardware, and often none about the time control) of 2728 Elo.
I wondered if you have similar information about human results for engines other than Shredder 12, perhaps much weaker ones, maybe in the 1800 to 2200 human range for example? Maybe even your own personal results? The precise hardware isn't so important, but of course we shouldn't mix results on normal PCs with results on cellphones or other devices that are much slower than a single thread of an i7.
Komodo rules!