
Humanized Engine Rating List

Posted: Sat Jun 12, 2021 9:24 pm
by lkaufman
These are the ratings that various engines and Skill levels would be expected to obtain at 3' + 2" Blitz and at 15' + 10" Rapid on one thread of a modern i7 computer against humans with FIDE Blitz/Rapid ratings in the 2200 to 2900 range, as appropriate. The computers are assumed to have only an 8-move-deep variety opening book. The relative ratings are based only on 200-game matches against Lc0, cpu version 27, network 69200, using a three-move book of popular openings.
This large network gets only around ten nodes per second on a typical i7, making it a good proxy for a top human player in that both will make occasional tactical blunders in fast games. This leads to a much smaller range of ratings than playing the standard (alpha-beta) engines against each other, presumably because the Monte-Carlo search and neural net of Lc0 are so different from standard engines that just adding one more ply doesn't give the same elo gain as in a direct match. The contraction of the rating spread roughly matches earlier estimates, notably by Kai Laskos, of how much standard engine vs. engine lists need to be contracted to simulate human ratings.
Lc0 on CPU performs much better in Rapid than in Blitz against these standard engines, by roughly 160 elo, which is about the amount by which humans benefit against standard engines from the longer time limit, so it is likely that the Rapid rating of Lc0 cpu is very close to the Blitz rating it would earn against human opposition. Therefore I'm assigning the same initial rating to Lc0 cpu for both Rapid and Blitz, which results in Rapid ratings for standard engines that are nearly a class (160 elo) lower than their Blitz ratings, consistent with human results against engines. To set the level of the two lists, I picked a rating of 3100 for Lc0 cpu because it results in ratings that are pretty consistent with human results against those engines that have played Rapid or Blitz with humans.
This can be adjusted as we get more engine vs. human data. To reduce the large error bars, I may add games against a different network. To rate weak engines that lose nearly every game to Lc0 cpu, I can rate them against the same net set to look at just a single node (setting max nps to 0.001 accomplishes this). Many engines still need Rapid tests; all have Blitz ratings listed.
Comments welcome, particularly on whether the level of the lists is about right, too high, or too low, supported by data.

Engine: Blitz (3' + 2") : Rapid (15' + 10")

Stockfish 13: 3594 : 3412
KomodoDragon 2: 3594 : 3412
KomodoDragon 2 MCTS: 3502
Stockfish 11: 3450 : 3322
Komodo 14.1: 3391 : 3227
Stockfish 9: 3372 : 3253
Stockfish 13 elo2850: 3239 : 3056
Critter 1.6a: 3233
Wasp 4.5: 3231 : 3086
Wasp 4: 3225
Gull 3: 3221 : 3084
Fritz 15: 3195
DeepRybka 4: 3154 : 3020
Wasp 3: 3123
Wasp 2: 3105 : 2938
KomodoDragon 2 Sk 24: 3102 : 2862
Lc0cpu v27 net 69200: 3100 : 3100
Rybka 2.3.2a: 3095 : 2953
Stockfish 13 elo2500: 3046
Rybka 1: 3031
Komodo 14.1 Skill 24: 3019 : 2776
Fruit 2.2.1: 2987 : 2777
KomodoDragon 2 Sk 23: 2971 : 2737
Benjamin 1.0: 2955
Komodo 14.1 Skill 23: 2907 : 2569
KomodoDragon 2 Sk 22: 2885 : 2559
Komodo 14.1 Skill 22: 2799
Zahak 3.0: 2750 : 2536
Pawny 0.2: 2728
Baislicka 1.0: 2703
KomodoDragon 2 Sk 21: 2670
Komodo 14.1 Skill 21: 2670
BikJump 2.01: 2630
KomodoDragon 2 Sk 20: 2588
Komodo 14.1 Skill 20: 2559
Zahak 1.0.0: 2548
Snowy 0.2: 2536 : 2180
Stockfish 13 elo2000: 2536
Komodo 14.1 Skill 19: 2510
Pigeon-1.5.1: 2481
CDrill1800-32b: 2464
KomodoDragon 2 Sk 19: 2464
KomodoDragon 2 Sk 18: 2251
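
As a rough illustration of the anchoring described above, the sketch below turns a score from a 200-game match against Lc0 cpu into a rating by inverting the standard logistic Elo formula and adding the result to the 3100 anchor. The post does not say which rating tool produced the listed numbers, so the function names, the example score, and the simple error estimate (which treats every game as a win or a loss) are assumptions for illustration only.

```python
import math

LC0_ANCHOR = 3100  # rating assigned to Lc0 cpu in the post


def elo_diff_from_score(score: float) -> float:
    """Invert the logistic Elo formula: score fraction -> Elo difference."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def anchored_rating(points: float, games: int = 200,
                    anchor: float = LC0_ANCHOR) -> float:
    """Rating implied by scoring `points` out of `games` against the anchor."""
    return anchor + elo_diff_from_score(points / games)


def rough_interval(points: float, games: int = 200,
                   anchor: float = LC0_ANCHOR, z: float = 1.96):
    """Crude interval around the rating (assumption: no draws, so it is
    wider than a draw-aware estimate would be)."""
    s = points / games
    se = math.sqrt(s * (1.0 - s) / games)
    low = max(s - z * se, 1e-6)
    high = min(s + z * se, 1.0 - 1e-6)
    return anchor + elo_diff_from_score(low), anchor + elo_diff_from_score(high)


# Hypothetical example: an engine scoring 120/200 against Lc0 cpu.
print(round(anchored_rating(120)), [round(x) for x in rough_interval(120)])
```

Even this crude interval shows why 200 games per engine leaves large error bars, which is the reason given above for possibly adding games against a second network.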

Re: Humanized Engine Rating List

Posted: Sat Jun 12, 2021 9:47 pm
by Uri Blass
I wonder how you got the 160 elo figure for how much better humans do at 15'+10" than at 3'+2".

I do not believe that Lc0 emulates humans well.
I suspect that Lc0 is basically better in the opening, whereas humans are better in the endgame (I have not used Lc0 recently, but that was my impression in the past).

Re: Humanized Engine Rating List

Posted: Sat Jun 12, 2021 10:03 pm
by lkaufman
The 160 figure is calculated from the Lc0 data; for humans there is no precise comparison of blitz to Rapid, but it is well known that humans do substantially better against engines with more time. "Substantially" isn't a number, but based on my experience with engines playing strong humans over more than thirty years it is surely more than 100 elo (for Rapid vs Blitz) and probably below 200 elo, so at least 160 isn't way off the mark. Engines reached GM level in blitz around 1990, and took another three or four years of hardware plus software advance to do the same in Rapid. This seems roughly consistent with the 160 elo gap.
I agree that Lc0 isn't a great human emulator for the reason you mention, but using it for this is much better than using standard engines, where seeing one ply deeper is critical and where the engines never make shallow blunders. It should at least make the scale of the list correct, even if it might favor or disfavor particular engines a bit. The low NPS of the cpu version makes even the opening play somewhat dubious, because it misses some simple tactics.

Re: Humanized Engine Rating List

Posted: Sun Jun 13, 2021 3:24 pm
by Chessqueen
Now I have a question: when a human plays an engine like KomodoDragon 2 MCTS at 3'+2" versus 15'+10", what Elo gain does the human get from the increase in time? For instance, if IM Andras Toth plays KomodoDragon 2 MCTS at 3'+2" and then at 15'+10", what score do you expect out of 6 games?

Re: Humanized Engine Rating List

Posted: Sun Jun 13, 2021 9:26 pm
by lkaufman
Well, I estimated above that the human will perform about 160 elo better in Rapid than in Blitz against the same engine. Presumably the human improves by something like 260 while the engine gains maybe 100. But with knight odds the extra time won't help the engine much, so presumably the increased performance by the human will be something close to 260 rather than 160 elo, I would expect.
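
To see what a gain of 160 or 260 elo would mean over a short match, the standard logistic Elo expectation can be applied. The sketch below is only an illustration: the baseline Blitz gap between the human and the engine is a made-up number, since no rating for the odds match is given in this thread, and the function names are hypothetical.

```python
def expected_score(elo_diff: float) -> float:
    """Standard logistic Elo expectation for a player who is
    `elo_diff` points stronger (negative if weaker)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def expected_points(elo_diff: float, games: int = 6) -> float:
    """Expected total points over a match of `games` games."""
    return games * expected_score(elo_diff)


# Assumption for illustration only: the human performs 200 elo below
# the engine at Blitz under the match conditions.
blitz_diff = -200.0

print(f"Blitz: {expected_points(blitz_diff):.2f} / 6")
for gain in (160.0, 260.0):
    rapid_diff = blitz_diff + gain
    print(f"Rapid with +{gain:.0f} elo: {expected_points(rapid_diff):.2f} / 6")
```

With only 6 games, the actual result can easily differ from these expectations by a point or more in either direction.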

Re: Humanized Engine Rating List

Posted: Fri May 13, 2022 5:14 pm
by Fritz 0
Amazing! The 111 Elo difference on the list between Komodo 14.1 Skill 21 (2670) and Skill 20 (2559) is almost exactly what I'm getting from their 5000+ game matches against Dragon 3 Elo 2300. Komodo level 21 is at +59, while level 20 is at -49. It seems that Dragon 3 is as good a simulation of a human as Leela.

Re: Humanized Engine Rating List

Posted: Fri May 13, 2022 5:46 pm
by lkaufman
So if you trust this list, Dragon 3 Elo 2300 would play at the same level as a human playing blitz rated about 2610. Although those Komodo levels don't have Rapid ratings on the list, other Komodo Skill levels in the 22 to 23 range average nearly 300 elo lower in Rapid, so this suggests about 2310 Elo for Dragon 3 Elo 2300 playing Rapid, an extremely close match!
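
The arithmetic behind these two estimates can be checked with a few lines. This is a sketch under two assumptions: that "Komodo level 21" and "level 20" refer to the Komodo 14.1 Skill 21 and Skill 20 entries on the list (their gap is the 111 elo mentioned), and that the "22 to 23 range" average uses the three Skill 22 and 23 entries that have both a Blitz and a Rapid rating.

```python
# Blitz ratings from the list for the two Komodo 14.1 Skill levels.
blitz = {"Komodo 14.1 Skill 21": 2670, "Komodo 14.1 Skill 20": 2559}

# Results reported against Dragon 3 Elo 2300 (elo relative to Dragon 3).
offset_vs_dragon3 = {"Komodo 14.1 Skill 21": 59, "Komodo 14.1 Skill 20": -49}

# Each level gives one estimate of Dragon 3's Blitz rating on this list.
estimates = [blitz[name] - offset_vs_dragon3[name] for name in blitz]
dragon3_blitz = sum(estimates) / len(estimates)

# (Blitz, Rapid) pairs for the Skill 22 and 23 entries that list both.
pairs = {
    "Komodo 14.1 Skill 23": (2907, 2569),
    "KomodoDragon 2 Sk 23": (2971, 2737),
    "KomodoDragon 2 Sk 22": (2885, 2559),
}
gaps = [b - r for b, r in pairs.values()]
mean_gap = sum(gaps) / len(gaps)

dragon3_rapid = dragon3_blitz - mean_gap

print(estimates, dragon3_blitz)      # two Blitz estimates and their mean
print(gaps, round(mean_gap, 1))      # Blitz minus Rapid gaps and their mean
print(round(dragon3_rapid))          # implied Rapid rating
```

The two Blitz estimates come out at 2611 and 2608, and the three Blitz-to-Rapid gaps average just under 300, which gives a Rapid figure of about 2310.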

Re: Humanized Engine Rating List

Posted: Fri May 13, 2022 6:14 pm
by Fritz 0
I believe this is correct, since I think that humans are much stronger in rapid than in blitz, probably by nearly 300 Elo. That gap is almost certainly bigger than the one between rapid and classical, which is maybe 200 Elo.

Re: Humanized Engine Rating List

Posted: Fri May 13, 2022 7:35 pm
by Marcus9
Very interesting! The results seem very consistent with what I would have expected, so I have nothing to add. Thank you Larry!

Re: Humanized Engine Rating List

Posted: Fri May 13, 2022 11:04 pm
by Fritz 0
Yes, these results for Dragon 3 Elo 2300, Komodo level 21 and Komodo level 20 seem about right to me; I am competitive with them at classical, so I could test them myself. I have not tested the other engines (or levels), but in general the values seem correct for the higher-rated ones and maybe too high for the lower ones. I suppose that is because the rating difference between them and Leela is too big to be measured reliably.