Mediocre engines (like mine) v GMs


JVMerlino
Posts: 1357
Joined: Wed Mar 08, 2006 10:15 pm
Location: San Francisco, California

Mediocre engines (like mine) v GMs

Post by JVMerlino »

It's not often that I get to see Myrddin (CCRL rating 2390 on one CPU) play against strong humans. Thankfully, ICC gave me a free membership for a couple of months. There have been five games against humans in the GM category, and Myrddin easily won all of them. I still find this amazing, but perhaps that's simply because I don't have as much experience watching Myrddin against humans.

What surprised me most was how quickly the games were decided. The time control was 3' for all of them. The longest game was 44 moves before the human resigned, the shortest was 24 moves. On average, the games were "over" (Myrddin feels it is up by a piece) by move 31. Additionally, the GMs' ratings were always higher than Myrddin's, by 50 to 200 points.

So, to anybody who has seen a lot of games, does this surprise you? Did Myrddin get lucky? Or are GMs now outmatched even by the average amateur engine?

jm
Last edited by JVMerlino on Sun Sep 02, 2018 7:20 pm, edited 1 time in total.
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Mediocre engines (like mine) v GMs

Post by jdart »

I have not seen titled players match Arasan on the chess servers for quite some time, but I remember an IM played some games quite a few years back and I don't think he won any. But I do remember an FM back in the early days that got it into a terribly cramped position with almost zero mobility and I think he won. That was a long time ago though.

The thing is, these titled humans are very good players. But if you run most human games, even high-level ones, through an engine for analysis, you will see inaccuracies, or at least what the machine thinks are inaccuracies (and usually it is right). Many games will contain serious errors. The engine never gets tired, loses concentration, or overlooks shallow tactics. So it will exploit these lapses in judgement or vision, without mercy. Then on top of that you have engines nowadays that can find very deep plans and tactics and play perfectly in the endgame, with tablebases. That makes them very hard to beat.

--Jon
JVMerlino
Posts: 1357
Joined: Wed Mar 08, 2006 10:15 pm
Location: San Francisco, California

Re: Mediocre engines (like mine) v GMs

Post by JVMerlino »

FWIW, this was the most interesting (i.e. closest) game:

[pgn][Event "ICS rated blitz match"] [Site "chessclub.com"] [Date "2018.08.31"] [Round "-"] [White "Atlus"] [Black "MyrddinComp"] [Result "0-1"] [WhiteElo "2511"] [BlackElo "2372"] [TimeControl "180"] [Annotator "19... +0.79"] 1. e4 e5 2. Nf3 Nc6 3. d4 exd4 4. Nxd4 Nf6 5. Nxc6 bxc6 6. e5 Qe7 7. Qe2 Nd5 8. c4 Ba6 9. b3 g5 10. g3 Bg7 11. Bb2 O-O-O 12. Bg2 Rhe8 13. O-O Bxe5 14. Qxe5 Qxe5 15. Bxe5 Rxe5 16. cxd5 Bxf1 17. Kxf1 cxd5 18. Nc3 c6 19. Bf3 d6 {+0.79/14 4} 20. Rd1 Kb7 {+0.74/15 12} 21. Rd4 Rde8 {+0.81/13 4} 22. Ne2 h6 {+0.86/12 4} 23. b4 Kb6 {+0.96/14 4} 24. a4 f5 {+0.86/15 9} 25. h3 a6 {+0.79/14 4} 26. Rd1 g4 {+0.97/16 4} 27. hxg4 fxg4 {+0.95/18 6} 28. Bxg4 Re4 {+0.92/17 5} 29. Bh5 R8e5 {+1.15/16 3} 30. Nf4 Rxf4 {+1.73/17 3} 31.gxf4 Rxh5 {+1.65/18 15} 32. Kg2 Rf5 {+1.72/18 2.3} 33. Kg3 h5 {+1.83/17 2.0} 34. Kf3 a5 {+1.90/19 13} 35. bxa5+ Kxa5 {+1.99/18 1.7} 36. Rg1 Rf7 {+2.05/17 4} 37. Rg5 Rh7 {+2.05/18 1.7} 38. f5 Kxa4 {+2.29/16 1.6} 39. Kg2 Kb4 {+2.58/16 1.6} 40. f6 Rf7 {+3.02/17 1.6} 41. Rf5 Kc3 {+3.27/16 3} 42. Kg3 d4 {+4.25/16 1.5} 43. Kh4 d3 {+5.91/14 1.6} 44. Kxh5 d2 {+6.72/14 1.5} 45. Kg6 Rxf6+ {+6.99/14 2.0} {Atlus resigns} 0-1 [/pgn]
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Mediocre engines (like mine) v GMs

Post by lkaufman »

JVMerlino wrote: Sun Sep 02, 2018 6:57 pm It's not often that I get to see Myrddin (CCRL rating 2390 on one CPU) play against strong humans. Thankfully, ICC gave me a free membership for a couple of months. There have been five games against humans in the GM category, and Myrddin easily won all of them. I still find this amazing, but perhaps that's simply because I don't have as much experience watching Myrddin against humans.

What surprised me most was how quickly the games were decided. The time control was 3' for all of them. The longest game was 44 moves before the human resigned, the shortest was 24 moves. On average, the games were "over" (Myrddin feels it is up by a piece) by move 31. Additionally, the GMs' ratings were always higher than Myrddin's, by 50 to 200 points.

So, to anybody who has seen a lot of games, does this surprise you? Did Myrddin get lucky? Or are GMs now outmatched even by the average amateur engine?

jm
The explanation is simple. The CCRL lists, even the blitz list, are calibrated to more or less match some level of human play in 40-moves-in-two-hours games. But blitz weakens human play by more than twice as much as it weakens engine play. If the CCRL blitz list were calibrated based on human play at blitz level, the ratings would be quite a bit higher. Also, computer vs computer play spreads out the ratings, so even if they are correct at the 2800 level, the 2400 engines will be underrated and the 3200 engines overrated, maybe by something like a 4 to 3 ratio. When I play against different Komodo levels on chess.com, if I play at levels totalling around 3' I am only around level 11, but at levels near 15' total I am around level 17. So my play goes up several hundred Elo from 3' to 15', while an engine might only gain 150 or so.
Komodo rules!
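To make the "4 to 3 ratio" remark concrete, here is a minimal Python sketch of one possible reading of it: rating gaps on the engine list are assumed to be stretched by 4/3 relative to a human scale, and the two scales are assumed to agree at 2800. The function name, the linear form, and the default parameters are illustrative assumptions, not a formula from CCRL or from the post.

```python
def human_scale_estimate(engine_rating, anchor=2800.0, ratio=3.0 / 4.0):
    """Compress an engine-list rating toward an anchor rating.

    Assumption (illustrative only): gaps measured in engine-vs-engine
    play are 4/3 as wide as the corresponding gaps on a human scale,
    and both scales agree at `anchor`.
    """
    return anchor + (engine_rating - anchor) * ratio


# Show how the hypothetical rescaling moves a few list ratings.
for list_rating in (2400, 2800, 3200):
    print(list_rating, "->", human_scale_estimate(list_rating))
```

Under these assumptions a 2400 engine would land about 100 points higher and a 3200 engine about 100 points lower, before any separate adjustment for blitz.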
JJJ
Posts: 1346
Joined: Sat Apr 19, 2014 1:47 pm

Re: Mediocre engines (like mine) v GMs

Post by JJJ »

lkaufman wrote: Sun Sep 02, 2018 7:33 pm
JVMerlino wrote: Sun Sep 02, 2018 6:57 pm It's not often that I get to see Myrddin (CCRL rating 2390 on one CPU) play against strong humans. Thankfully, ICC gave me a free membership for a couple of months. There have been five games against humans in the GM category, and Myrddin easily won all of them. I still find this amazing, but perhaps that's simply because I don't have as much experience watching Myrddin against humans.

What surprised me most was how quickly the games were decided. The time control was 3' for all of them. The longest game was 44 moves before the human resigned, the shortest was 24 moves. On average, the games were "over" (Myrddin feels it is up by a piece) by move 31. Additionally, the GMs' ratings were always higher than Myrddin's, by 50 to 200 points.

So, to anybody who has seen a lot of games, does this surprise you? Did Myrddin get lucky? Or are GMs now outmatched even by the average amateur engine?

jm
The explanation is simple. The CCRL lists, even the blitz list, are calibrated to more or less match some level of human play in 40-moves-in-two-hours games. But blitz weakens human play by more than twice as much as it weakens engine play. If the CCRL blitz list were calibrated based on human play at blitz level, the ratings would be quite a bit higher. Also, computer vs computer play spreads out the ratings, so even if they are correct at the 2800 level, the 2400 engines will be underrated and the 3200 engines overrated, maybe by something like a 4 to 3 ratio. When I play against different Komodo levels on chess.com, if I play at levels totalling around 3' I am only around level 11, but at levels near 15' total I am around level 17. So my play goes up several hundred Elo from 3' to 15', while an engine might only gain 150 or so.
So you might try some showmatch at bullet as a bonus game with knight or rook handicap.
JVMerlino
Posts: 1357
Joined: Wed Mar 08, 2006 10:15 pm
Location: San Francisco, California

Re: Mediocre engines (like mine) v GMs

Post by JVMerlino »

lkaufman wrote: Sun Sep 02, 2018 7:33 pm
JVMerlino wrote: Sun Sep 02, 2018 6:57 pm It's not often that I get to see Myrddin (CCRL rating 2390 on one CPU) play against strong humans. Thankfully, ICC gave me a free membership for a couple of months. There have been five games against humans in the GM category, and Myrddin easily won all of them. I still find this amazing, but perhaps that's simply because I don't have as much experience watching Myrddin against humans.

What surprised me most was how quickly the games were decided. The time control was 3' for all of them. The longest game was 44 moves before the human resigned, the shortest was 24 moves. On average, the games were "over" (Myrddin feels it is up by a piece) by move 31. Additionally, the GMs' ratings were always higher than Myrddin's, by 50 to 200 points.

So, to anybody who has seen a lot of games, does this surprise you? Did Myrddin get lucky? Or are GMs now outmatched even by the average amateur engine?

jm
The explanation is simple. The CCRL lists, even the blitz list, are calibrated to more or less match some level of human play in 40-moves-in-two-hours games. But blitz weakens human play by more than twice as much as it weakens engine play. If the CCRL blitz list were calibrated based on human play at blitz level, the ratings would be quite a bit higher. Also, computer vs computer play spreads out the ratings, so even if they are correct at the 2800 level, the 2400 engines will be underrated and the 3200 engines overrated, maybe by something like a 4 to 3 ratio. When I play against different Komodo levels on chess.com, if I play at levels totalling around 3' I am only around level 11, but at levels near 15' total I am around level 17. So my play goes up several hundred Elo from 3' to 15', while an engine might only gain 150 or so.
This makes sense, although I still wonder why the ICC ratings, which are calculated from a massive pool of games, don't seem to apply as reliably to human vs computer games. Remember that the GMs' blitz ratings were higher than Myrddin's, which SHOULD mean that they are all better than Myrddin (on ICC) at blitz, therefore making the +5 =0 -0 result very unlikely. The game shown above was played by a GM with more than 3000 blitz games on ICC, and a rating 140 points higher than Myrddin's. Myrddin has 7500+ blitz games on ICC. That should all be more than enough to give good confidence in the relationship between any two ratings. Or perhaps not if it's human vs. computer ratings?
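For a rough sense of how unlikely a +5 =0 -0 result would be if the ratings were taken at face value, here is a small Python sketch. It assumes the standard logistic Elo expectation formula, which may not be exactly what ICC uses, and it assumes the five games are independent. Because a player's expected score counts draws as half points, the win probability cannot exceed the expected score, so raising the expected score to the fifth power gives an upper bound on the chance of a sweep. The function names are illustrative.

```python
def expected_score(rating_diff):
    """Standard logistic Elo expected score for a player whose rating
    is `rating_diff` points above the opponent's (negative if below)."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


def sweep_upper_bound(rating_diff, games=5):
    """Upper bound on the probability of winning every game, assuming
    independent games and P(win) <= expected score."""
    return expected_score(rating_diff) ** games


# Rating deficits taken from the thread: 50 to 200 points, 140 in the posted game.
for deficit in (50, 140, 200):
    e = expected_score(-deficit)
    print(f"deficit {deficit}: expected score {e:.3f}, "
          f"5-0 sweep at most {sweep_upper_bound(-deficit):.4f}")
```

At a 140-point deficit the expected score is about 0.31, so a five-game sweep would be well under a 1% event if the ratings described these particular pairings accurately.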
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Mediocre engines (like mine) v GMs

Post by lkaufman »

JVMerlino wrote: Mon Sep 03, 2018 1:03 am
lkaufman wrote: Sun Sep 02, 2018 7:33 pm
JVMerlino wrote: Sun Sep 02, 2018 6:57 pm
So, to anybody who has seen a lot of games, does this surprise you? Did Myrddin get lucky? Or are GMs now outmatched even by the average amateur engine?

jm
The explanation is simple. The CCRL lists, even the blitz list, are calibrated to more or less match some level of human play in 40-moves-in-two-hours games. But blitz weakens human play by more than twice as much as it weakens engine play. If the CCRL blitz list were calibrated based on human play at blitz level, the ratings would be quite a bit higher. Also, computer vs computer play spreads out the ratings, so even if they are correct at the 2800 level, the 2400 engines will be underrated and the 3200 engines overrated, maybe by something like a 4 to 3 ratio. When I play against different Komodo levels on chess.com, if I play at levels totalling around 3' I am only around level 11, but at levels near 15' total I am around level 17. So my play goes up several hundred Elo from 3' to 15', while an engine might only gain 150 or so.
This makes sense, although I still wonder why the ICC ratings, which are calculated from a massive pool of games, don't seem to apply as reliably to human vs computer games. Remember that the GMs' blitz ratings were higher than Myrddin's, which SHOULD mean that they are all better than Myrddin (on ICC) at blitz, therefore making the +5 =0 -0 result very unlikely. The game shown above was played by a GM with more than 3000 blitz games on ICC, and a rating 140 points higher than Myrddin's. Myrddin has 7500+ blitz games on ICC. That should all be more than enough to give good confidence in the relationship between any two ratings. Or perhaps not if it's human vs. computer ratings?
Most likely the Myrddin ICC blitz rating was spread over games totalling from 3 to 14 minutes (counting 40 x increment), which is the range for "blitz". Any human player who knows anything about chess computers would play at a level totalling at least ten minutes if he wants to raise his rating, so I imagine the average total time limit was maybe 7 minutes or so. If everyone played game/3', the Myrddin ICC rating might be much higher. Another factor is that you probably allowed challenges from anyone, so players with nothing to lose, or players who were underrated, might play it often. The 3000-rated players probably only accept challenges when they think it's at least a fair deal rating-wise.
Komodo rules!
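The "total time" used above (base time plus 40 times the increment) can be sketched in Python as follows. The 3 to 14 minute "blitz" range is taken from the post; the function names and the inclusive bounds are assumptions for illustration and not ICC's official definition.

```python
def total_minutes(base_minutes, increment_seconds, moves=40):
    """Estimated total time per player: base time plus the increment
    accumulated over an assumed 40 moves."""
    return base_minutes + moves * increment_seconds / 60.0


def is_blitz(base_minutes, increment_seconds):
    """True if the estimated total falls in the 3 to 14 minute range
    described in the post (bounds assumed inclusive)."""
    return 3.0 <= total_minutes(base_minutes, increment_seconds) <= 14.0


# A few time controls: game/3', 5' + 12", and the 1' + 1" mentioned in the thread.
for base, inc in ((3, 0), (5, 12), (1, 1)):
    print(f"{base}' + {inc}\": total {total_minutes(base, inc):.2f} min, "
          f"blitz={is_blitz(base, inc)}")
```

By this estimate game/3' sits at the very bottom of the range, while something like 5' + 12" counts as the same category at more than four times the thinking time.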
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Mediocre engines (like mine) v GMs

Post by lkaufman »

JJJ wrote: Sun Sep 02, 2018 10:35 pm So you might try some showmatch at bullet as a bonus game with knight or rook handicap.
I suppose you mean against a top GM like Nakamura or MVL. At pure one minute, Komodo would just win a lot of lost positions on time, but at 1' + 1", knight odds would probably be a fair match and not be decided by time forfeits. I suppose we'll try it at some point.
Komodo rules!
JJJ
Posts: 1346
Joined: Sat Apr 19, 2014 1:47 pm

Re: Mediocre engines (like mine) v GMs

Post by JJJ »

You might propose that to "penguin" (Andrew Tang); he played many games against Leela in a show match.
whereagles
Posts: 565
Joined: Thu Nov 13, 2014 12:03 pm

Re: Mediocre engines (like mine) v GMs

Post by whereagles »

lkaufman wrote: Mon Sep 03, 2018 5:26 am I suppose you mean against a top GM like Nakamura or MVL. At pure one minute, Komodo would just win a lot of lost positions on time, but at 1' + 1", knight odds would probably be a fair match and not be decided by time forfeits. I suppose we'll try it at some point.
I've seen a live stream of Wesley So vs Stockfish at 1'+1" with knight odds. SF won like 90% :)