Is Komodo UCI_ELO Settings too Weak or too Strong ?

Chessqueen
Posts: 5685
Joined: Wed Sep 05, 2018 2:16 am
Location: Moving
Full name: Jorge Picado

Is Komodo UCI_ELO Settings too Weak or too Strong ?

Post by Chessqueen »

You do NOT have to post any games or provide your rating here, but most of you who have purchased Komodo Dragon 2.6 have presumably already pitted yourselves against a UCI_ELO setting suitable for or equal to your rating. Simply answer whether it is too weak, just about right, or too strong, so Mr. Kaufman gets an idea of whether he has to make a parameter adjustment or leave it as it is. Your answer is needed. If you know how to create a poll, please attach it here.
Chessqueen

Re: Is Komodo UCI_ELO Settings too Weak or too Strong ?

Post by Chessqueen »

I have noticed that most people here only purchase or download engines to pit them against each other and do NOT benefit from playing against them at their own level.
lkaufman
Posts: 6279
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Is Komodo UCI_ELO Settings too Weak or too Strong ?

Post by lkaufman »

Well, I now have one pretty solid data point in the mid-amateur range. The engine Safrad 2.2.40, CCRL blitz rating 1008, has played a lot of games on LiChess against human players. I looked at all of its rated games against humans rated over 1400 (actual range 1470 to 1840) at "slow blitz" time controls (from 5' + 2" to 5' + 4" or equivalent, with 5' + 3" being the most typical tc). There were 104 games, 70 wins, 25 losses, 10 draws vs. an average blitz rating of 1683, giving a performance rating of 1848. According to the transformation formula in another thread here, this equates to 1723 FIDE. So based on this data, that engine, rated 1008 on the CCRL blitz list, would be evenly matched at 5' + 3" with a typical human with a FIDE rating of 1723. Perhaps more of a disparity than we might have guessed. The closest Dragon 2.6 setting (trying only multiples of 100) was 1500 at this time control, which beat Safrad by 18 Elo in 500 games at 5' + 3". So this suggests that a setting of 1482 Elo on Dragon would be even with a 1723 FIDE opponent in slow blitz. However, we targeted 15' + 10" Rapid, not slow blitz, and surely a human would play better with triple the time, though probably not this much better. So it seems we made Dragon a bit too strong around the 1500 Elo setting.
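For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes the performance rating comes from the standard logistic Elo formula; the post does not say which method was actually used. The posted counts (70 wins, 25 losses, 10 draws) add up to 105 games rather than 104, so the sketch computes both the posted counts and an assumed 70/24/10 split. The 24-loss figure is purely an assumption chosen to match the stated total of 104 games. The function names are illustrative only.

```python
import math


def elo_diff_from_score(score: float) -> float:
    """Rating gap implied by a fractional score under the logistic Elo model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


def performance_rating(avg_opponent: float, wins: int, draws: int, losses: int) -> float:
    """Average opponent rating plus the gap implied by the overall score."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    return avg_opponent + elo_diff_from_score(score)


# Counts exactly as posted (70 + 25 + 10 = 105 games).
as_posted = performance_rating(1683, wins=70, draws=10, losses=25)

# ASSUMPTION: 24 losses, so that the total matches the stated 104 games.
assumed = performance_rating(1683, wins=70, draws=10, losses=24)

# Dragon at the 1500 setting finished 18 Elo ahead of Safrad, so the setting
# that would be level with Safrad (and hence with ~1723 FIDE) is 18 lower.
equivalent_setting = 1500 - 18

print(round(as_posted), round(assumed), equivalent_setting)
```

Under this formula the assumed 70/24/10 split reproduces the quoted 1848, while the counts as posted give a slightly lower figure. The later conversion to 1723 FIDE uses a formula from another thread and is not reproduced here.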
Now to complete the picture, I ran all Elo settings (multiples of 100 only) in the human range (well, 600 through 2800) against a variety of CCRL-rated engines at 5' + 3", 500-game matches, and generated ratings from that. The first point is that the ratings of the various engines were not as spread out as in CCRL (or CEGT), maybe about 30% less spread. This is roughly the estimate for human Elo contraction, so I have good reason to believe that these ratings for the CCRL engines should be realistic in human terms relative to one another. But the conclusion regarding the Dragon levels is that all the settings in the 900 to 1600 range are probably too difficult for similarly rated humans, while below 900 and above 1600 things gradually get more in line with human ratings. Around the 1900 level I think the settings may be too easy for similarly rated humans, and maybe at some rating in the grandmaster range they are about right again. I'm playing games with the levels myself to get a clearer picture, but the main issue is the difference between 5' + 3" games and 15' + 10" games. It would have been a lot easier to target "slow blitz", as there is so much more data available at that time control. Maybe I'll post my humanized slow blitz rating list, with Safrad 2.2 set to 1723 as the reference engine. It's only a sampling of engines, not even including top ones as they are well beyond the human range, but the idea is to see if the list looks credible for humans playing slow blitz with them.
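To illustrate the two ideas in the paragraph above, here is a rough sketch under stated assumptions. It treats the "about 30% less spread" as a simple linear contraction of rating gaps, and it pins a list of relative ratings to a reference engine by adding a constant offset. Both are simplifications for illustration, not the actual procedure used to build the list. "Engine A" and its 150-point gap are hypothetical placeholders, not measured results.

```python
def contract_gap(ccrl_gap: float, factor: float = 0.7) -> float:
    """Shrink a CCRL rating gap, ASSUMING a linear ~30% contraction."""
    return ccrl_gap * factor


def anchor_ratings(relative: dict, reference: str, target: float) -> dict:
    """Shift every rating by one constant so the reference engine lands on target."""
    offset = target - relative[reference]
    return {name: round(rating + offset) for name, rating in relative.items()}


# HYPOTHETICAL relative ratings from a set of matches (not real data).
relative = {"Safrad 2.2": 0.0, "Engine A": 150.0}

# Pin Safrad 2.2 to 1723, the reference value mentioned in the post.
humanized = anchor_ratings(relative, reference="Safrad 2.2", target=1723)

print(humanized)
print(contract_gap(100.0))
```

Anchoring only moves the whole list up or down. The gaps between engines still come from the match results themselves.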
Komodo rules!
Chessqueen

Re: Is Komodo UCI_ELO Settings too Weak or too Strong ?

Post by Chessqueen »

You should ask LiChess or Chess.com to allow you to run Komodo Dragon 2.6 there once you have adjusted the levels to the point where you feel comfortable that they correspond to human ratings at a T/C of 5' + 3", since most chess players rated from 1000 to 1800 on LiChess and Chess.com play at 5' + 3" and NOT at a T/C of 15 minutes with a 10-second bonus.