What engine breaks even with GMs in blitz?

Raphexon
Posts: 476
Joined: Sun Mar 17, 2019 12:00 pm
Full name: Henk Drost

Re: What engine breaks even with GMs in blitz?

Post by Raphexon »

lkaufman wrote: Thu Apr 11, 2019 5:23 am
Raphexon wrote: Wed Apr 10, 2019 8:11 pm
lkaufman wrote: Wed Apr 10, 2019 5:42 pm
Raphexon wrote: Wed Apr 10, 2019 8:03 am
lkaufman wrote: Wed Apr 10, 2019 6:41 am
mwyoung wrote: Tue Apr 09, 2019 2:59 pm
mwyoung wrote: Tue Apr 09, 2019 8:05 am
lkaufman wrote: Tue Apr 09, 2019 7:21 am
mwyoung wrote: Tue Apr 09, 2019 6:59 am
mwyoung wrote: Tue Apr 09, 2019 6:38 am
lkaufman wrote: Sun Apr 07, 2019 6:32 pm In blitz (let's say 3' + 2" or as close to this as possible), the top engines today are far beyond human level. But how far down the list do we have to go to find engines (and specified hardware) that score evenly against GMs, preferably ones with known identities and ratings? I'm sure there is plenty of data to answer this question as countless games have been played online over the years, but does anyone actually have some data, such as "Engine xyz on one thread scored 50% against GMs averaging 2600 FIDE" for example? The question I'd like to answer is: How much would we have to add to CCRL blitz ratings to estimate the FIDE blitz rating of a human GM who would score 50% against it at 3' + 2"?
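The question reduces to a small calculation. Below is a minimal sketch that assumes the logistic Elo model and treats the GMs' average FIDE blitz rating as the opponent rating, which simplifies how performance ratings are usually computed. The function names and the 2100 CCRL input are hypothetical and only show the arithmetic:

```python
import math


def elo_diff_from_score(score: float) -> float:
    """Elo difference implied by a score fraction under the logistic Elo model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def ccrl_to_fide_offset(engine_ccrl: float, gm_avg_fide: float, engine_score: float) -> float:
    """Amount to add to an engine's CCRL blitz rating to get its FIDE blitz performance.

    Approximates the engine's performance as the GMs' average FIDE blitz
    rating plus the Elo difference implied by its score against them.
    """
    fide_performance = gm_avg_fide + elo_diff_from_score(engine_score)
    return fide_performance - engine_ccrl


# Hypothetical inputs: an engine rated 2100 CCRL blitz that scores 50%
# against GMs averaging 2600 FIDE blitz.
print(round(ccrl_to_fide_offset(2100, 2600, 0.5)))  # 500
```

With real data you would plug in the engine's actual CCRL blitz rating, the GMs' average, and the engine's score. Scores close to 0% or 100% make the estimate very unstable.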
Hello Larry,

It looks like you are going to have to go down to the bottom of the list.

Here is a news report from 1994 about how Fritz 2 won against all the world's best players in 5-minute blitz games.

https://www.independent.co.uk/arts-ente ... 38085.html

At the time of the news report (1994), the best processor was the Pentium.

And the report says: "When Intel sponsored the World Chess Express Challenge in Munich last Friday, they could never have hoped for such a good advertisement for their high-speed Pentium processor. It turned a good computer - Fritz 2 - into a world beater."

March 1994:
Intel introduces and ships faster Pentium chips, based on 0.6 micron BiCMOS manufacturing. The processor now includes clock-doubling of 1.5 or 2 times the external clock rate, allowing processor speeds of up to 100 MHz on a 50-66 MHz system bus. The processor also includes power management capabilities to allow stopping and restarting the processor. Code-name during development was P54C. The 60/90 MHz Pentium 735 processor is rated at 149.8 MIPS, and is priced at US$849 in 1000 unit quantities. The 66/100 MHz Pentium 815 processor is rated at 166.3 MIPS, and is priced at US$995 in 1000 unit quantities. [205.98] [265] [62] [550.29] [551.168,259] [557.134] [584.43] [689.115] [276]
I found two games of Fritz 2 from 1992, in which it scored 1-1 against GM Kasparov. Fritz could have been playing on a 386 or 486 processor in 1992; my guess would be the 486.

[pgn] [Event "Koln (5')"] [Site "Koln (5')"] [Date "1992.??.??"] [EventDate "?"] [Round "?"] [Result "0-1"] [White "Fritz (Computer)"] [Black "Garry Kasparov"] [ECO "B30"] [WhiteElo "?"] [BlackElo "?"] [PlyCount "77"] 1.e4 c5 2.Nf3 Nc6 3.Nc3 g6 4.d4 cxd4 5.Nxd4 Bg7 6.Be3 Nf6 7.Nxc6 bxc6 8.e5 Ng8 9.f4 Nh6 10.Qd2 O-O 11.O-O-O d6 12.exd6 exd6 13.Qxd6 Qxd6 14.Rxd6 Nf5 15.Rd3 Ba6 16.Bc5 Bxd3 17.Bxf8 Bxf1 18.Bxg7 Bxg2+ 19.Rg1 Kxg7 20.Rxg2 Rb8 21.Re2 Rh8 22.b3 h5 23.Kb2 h4 24.h3 Rd8 25.Ne4 Ng3 26.Nxg3 hxg3 27.Rg2 Rd4 28.Rxg3 Rxf4 29.Rc3 Rh4 30.Rxc6 Rxh3 31.Ra6 g5 32.Rxa7 g4 33.a4 g3 34.Ra5 Rh6 35.Rg5+ Rg6 36.Rxg3 Rxg3 37.c4 f5 38.b4 f4 39.Kc2 0-1[/pgn]

[pgn][Event "Koln (5')"] [Site "Koln (5')"] [Date "1992.??.??"] [EventDate "?"] [Round "?"] [Result "1-0"] [White "Fritz (Computer)"] [Black "Garry Kasparov"] [ECO "A07"] [WhiteElo "?"] [BlackElo "?"] [PlyCount "87"] 1.e4 c5 2.d3 Nc6 3.g3 g6 4.Bg2 Bg7 5.Nf3 d6 6.O-O e5 7.Bg5 f6 8.Be3 Nge7 9.a3 O-O 10.Nc3 Kh8 11.b4 b6 12.Rb1 Be6 13.b5 Nd4 14.a4 f5 15.Ng5 Bg8 16.exf5 Nexf5 17.Bxa8 Qxa8 18.Bxd4 Nxd4 19.Ne2 Nf5 20.c3 d5 21.Re1 h6 22.Nf3 g5 23.d4 e4 24.Ne5 Qe8 25.Nc6 Nd6 26.Nc1 Qd7 27.Nb3 Qh3 28.Qe2 Be6 29.dxc5 bxc5 30.Nxc5 Bg4 31.Qf1 Qh5 32.Rb3 Nc4 33.Nxe4 dxe4 34.Qxc4 Rxf2 35.h4 gxh4 36.Kxf2 hxg3 37.Ke3 Qg5+ 38.Kxe4 Qf5+ 39.Ke3 Qf3+ 40.Kd2 g2 41.Re8+ Bf8 42.Qd4+ Kh7 43.Rb1 Bg7 44.Re7 1-0 [/pgn]
I didn't remember these two events, but I suppose it makes sense that Fritz 2 would perform at maybe 2750 or so at blitz overall, mostly on a Pentium, because Rexchess performed in the 2500s around 1990 on a 486, and Fritz 2 was later and stronger. Considering the hardware advancement since the Pentium, I suppose the estimate of adding 500 to CCRL blitz ratings for FIDE blitz rating equivalence should be revised upward quite a bit. I don't know if I even have any engine weak enough to play at the same level on my 5 GHz i7 as Fritz 2 did on a Pentium! Well, there are always the handicapped levels of Komodo; one of them must be suitable. But I'd need a weak enough engine to run it against to determine which level that would be! Any suggestions for engines at that level that are easy to download and problem-free?
I have tested all of these in the past, and they have worked. And they are the right vintage...
Put your laptop in power-saving mode and/or use less time.

http://rebel13.nl/windows/rebel's%20with%20uci.html


[Image: MGP 1993.jpg]
I looked up the ratings for Fritz 2 and Gideon Pro in the 1993 computer chess reports. Gideon Pro was rated about 100 Elo better than Fritz 2, tested on a 486 with 4 MB of HT (hash tables).
I picked Pawny 0.2 x64 as my current-decade substitute for Fritz 2. It is rated 2385 CCRL blitz. The list doesn't go back as far as Fritz 2, but from the other Fritz versions and extrapolating backwards I would guess it is a pretty fair match on a modern computer. So presumably, even running at 0.1 GHz (instead of my nearly 5 GHz), it would be at about the level of the Pentium that performed around 2750 in blitz with Fritz 2. That is without even considering that an i7 should be much better than a Pentium at the same clock speed (can anyone estimate that?). Komodo level 19 is losing to Pawny at 3' + 2" at full speed, but only by 104 Elo after 24 games. That means it should be roughly the level Fritz 2 would have achieved with a 25-to-1 speedup over what it had in 1994! It would then crush even Carlsen at blitz, but that doesn't seem right, as it has had only mixed results against Naka and MVL at "slow blitz". Something seems wrong here; I'm not sure what. Anyway, Lc0 11248 totally crushes Komodo level 19 while giving it knight odds, even though Komodo does know to exchange major pieces when up a knight.
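To see how much a 104 Elo margin after 24 games can be trusted, here is a rough sketch. It converts the margin to an expected score, puts a normal-approximation 95% interval around that score, and converts both ends back to Elo. It ignores draws, which would narrow the real interval. The function names are hypothetical:

```python
import math


def expected_score(elo_diff: float) -> float:
    """Expected score for a given Elo difference (logistic Elo model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_from_score(score: float) -> float:
    """Inverse of expected_score."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_interval(elo_diff: float, games: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval in Elo after a given number of games."""
    s = expected_score(elo_diff)
    # Treats every game as win/loss; draws reduce the true variance.
    se = math.sqrt(s * (1.0 - s) / games)
    lo = max(s - z * se, 1e-6)
    hi = min(s + z * se, 1.0 - 1e-6)
    return elo_from_score(lo), elo_from_score(hi)


low, high = elo_interval(-104, 24)
print(f"{low:.0f} to {high:.0f} Elo")
```

With these inputs the interval runs from roughly -280 to about +30 Elo. In other words, 24 games cannot establish that Komodo level 19 is actually weaker than Pawny.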
Somebody benchmarked Stockfish 6 on old hardware (newer versions don't run on it anymore).
So if you have SF6, you can compare results.

http://www.talkchess.com/forum3/viewtopic.php?t=63857

"Intel Pentium I 75Mhz 6200nps/7400nps External cache 256kb COMP=i586 (Command bench=7465nps)"

A modern i7 should be at least 8 times as fast at the same clock speed (a conservative estimate).
A more liberal estimate of mine is that a modern i7 would be around 15 times as fast at 0.1 GHz (single core).
Thanks. So if we use 10x as a compromise estimate, my laptop is about 500 times faster than the hardware on which Fritz 2 placed ahead of everyone but Kasparov in blitz? But this is crazy. It would imply that Fritz 2, or a similarly rated engine like Pawny 0.2 x64, running on my laptop would easily win 100% of its blitz games against Magnus Carlsen. Does anyone believe that? What is wrong here?
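The "about 500 times" figure is the clock ratio (5 GHz versus 0.1 GHz) multiplied by the assumed per-clock advantage. The sketch below reproduces that arithmetic for the three per-clock factors mentioned in the thread and expresses each result as doublings of speed. The function name is hypothetical, and the sketch says nothing about how much Elo each doubling is worth:

```python
import math


def estimated_speedup(new_ghz: float, old_ghz: float, per_clock_factor: float) -> float:
    """Clock-speed ratio multiplied by an assumed per-clock (IPC) advantage."""
    return (new_ghz / old_ghz) * per_clock_factor


# Per-clock factors from the thread: 8 (conservative), 10 (compromise), 15 (liberal).
for factor in (8, 10, 15):
    speedup = estimated_speedup(5.0, 0.1, factor)
    doublings = math.log2(speedup)
    print(f"per-clock x{factor}: about {speedup:.0f}x faster, {doublings:.1f} doublings")
```

All three estimates land at roughly 9 doublings of speed. How many Elo each doubling yields at blitz for an engine of that strength is exactly the open question in this discussion.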
Pawny maybe scales really badly with extra nodes/time.
And I assume Fritz used an opening book, which should help somewhat.

Either way, I tested Igel against the Play Magnus app* (age 28), with Igel searching to depth 15.
Igel trivially beat Magnus as Black. But I also realized that depth 15 for Igel isn't really blitz.
Maybe I'll try again tomorrow and limit Igel's nodes per move instead of setting a fixed depth.

*Uses a modified Glaurung set to a specific strength, plus an opening book, to mimic Magnus.

But I can believe it.
Engines have a massive advantage in Blitz.
Pawny is rated close to what I estimate Fritz 2 would be on CCRL reference hardware, not on an ancient Pentium. As for opening book I was assuming the modern engine would also use one, which would presumably be much better than any book from 1994.
I just assumed that something called "Play Magnus" was just a gimmick; is there solid reason to believe that it has actually proven itself equal to Magnus at some level or levels and if so what time limit was he playing under? Of course it also matters whether he actually had incentive to win if such games were played. Presumably it's best to set an opposing engine to time plus inc rather than nodes or depth, since that is the normal standard for chess these days.
The score for Lc0 11248 (on 2080) vs. Pawny 0.2 (on 5 GHz i7) at 3' + 2" giving knight odds was 15.5 to 4.5 for Lc0. So Lc0 totally crushes an engine at knight odds which it would seem from the discussion should overwhelm even Magnus at 3' + 2".
The Play Magnus app is mostly a gimmick, but strong enough that Magnus could lose to his 14-year-old "self."
The app plays blitz; there are a few videos of Magnus playing it at blitz.

Either way, I do really think Pawny 0.2 on a strong modern CPU should be enough to crush Magnus.
But maybe specific anti-computer tactics lead to inconsistencies.
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: What engine breaks even with GMs in blitz?

Post by Ferdy »

Ferdy wrote: Thu Apr 11, 2019 4:20 am

sim version 3
------ Human GM 2600 (time: 5000 ms scale: 1.0) ------
 56.52  Monolith 0.4 2427 (time: 5000 ms scale: 1.0)
 55.67  Hermann 2.8 2349 (time: 5000 ms scale: 1.0)

Update 1:

sim version 3
------ Human GM 2600 (time: 5000 ms scale: 1.0) ------
 56.52  Monolith 0.4 2427 (time: 5000 ms scale: 1.0)
 56.30  Fridolin_3.00 2482 (time: 5000 ms scale: 1.0)
 55.95  Ethereal 8.16 2459 (time: 5000 ms scale: 1.0)
 55.94  Floyd 0.9 2410 (time: 5000 ms scale: 1.0)
 55.67  Hermann 2.8 2349 (time: 5000 ms scale: 1.0)
 55.43  Maverick 1.5 2379 (time: 5000 ms scale: 1.0)
 45.64  Giraffe 20150908 2243 (time: 5000 ms scale: 1.0)

On schedule:

Shield 2.1, 2560
Marvin 2.0.0, 2324
Rhetoric 1.4, 2541
Nebula 2.0, 2475
Zurichess Geneva, 2317
Lc0 v0.21.1 w11258-120x9-se (backend=blas), CEGT rating ????
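The percentages in these sim tables appear to be the share of test positions (5000 ms per position) in which each engine picked the same move as the reference, which here is the human GM's move. If that reading is right, the core comparison reduces to something like the hypothetical sketch below. It is not the sim tool's actual code, and the position ids and moves are toy data:

```python
def move_similarity(reference: dict[str, str], engine: dict[str, str]) -> float:
    """Percentage of shared positions where the engine chose the reference move."""
    shared = [pos for pos in reference if pos in engine]
    if not shared:
        raise ValueError("no positions in common")
    matches = sum(1 for pos in shared if reference[pos] == engine[pos])
    return 100.0 * matches / len(shared)


# Toy data: position ids stand in for FEN strings, moves are in UCI notation.
human_gm = {"p1": "e2e4", "p2": "g1f3", "p3": "d2d4", "p4": "c2c4"}
engine_x = {"p1": "e2e4", "p2": "b1c3", "p3": "d2d4", "p4": "c2c4"}
print(move_similarity(human_gm, engine_x))  # 75.0
```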
voffka
Posts: 288
Joined: Sat Jun 30, 2018 10:58 pm
Location: Ukraine
Full name: Volodymyr Shcherbyna

Re: What engine breaks even with GMs in blitz?

Post by voffka »

Hello Raphexon,
Raphexon wrote: Wed Apr 10, 2019 8:11 pm
[...]

Either way I tested Igel vs the Play Magnus app* (age 28) with Igel search till depth 15.
Igel trivially beat Magnus as black. But I also realized depth 15 for Igel isn't very blitz.
Maybe I'll try again tomorrow and just limit nodes per move for Igel instead of a set depth.

[...]
Thanks for trying Igel ;) Yes, the search is mostly untouched in Igel (versions 0.8 to 1.4.2) and remains almost the same as in GreKo 2018.01. I mostly concentrate on evaluation changes. But soon I will focus on search as well, because I feel uncomfortable that in most tournaments it searches so shallowly compared to its opponents.
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: What engine breaks even with GMs in blitz?

Post by Ferdy »

Ferdy wrote: Thu Apr 11, 2019 8:29 am
[...]

Update 2:
The rating after the engine version is now based on the CCRL 40/4 list as archived by the Wayback Machine (Nov 2018), which is probably not too far from the current CCRL 40/4 list.

sim version 3
------ Human GM 2600 (time: 5000 ms scale: 1.0) ------
 56.52  Monolith 0.4 2601 (time: 5000 ms scale: 1.0)
 56.30  Fridolin_3.00 2655 (time: 5000 ms scale: 1.0)
 55.95  Ethereal 8.16 2630 (time: 5000 ms scale: 1.0)
 55.94  Floyd 0.9 2584 (time: 5000 ms scale: 1.0)
 55.67  Hermann 2.8 2531 (time: 5000 ms scale: 1.0)
 55.43  Maverick 1.5 2567 (time: 5000 ms scale: 1.0)
 55.33  zurichess geneva 2496 (time: 5000 ms scale: 1.0)
 45.64  Giraffe 20150908 2410 (time: 5000 ms scale: 1.0)

Missing CCRL ratings, such as Fridolin's, are estimated via interpolation using this CEGT-to-CCRL fit:

CCRL = (1.056 x CEGT) + 33.859
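The linear fit can be applied directly. Here is a minimal sketch (the function name is hypothetical) that reproduces the estimated Fridolin figure in Update 2 from its CEGT rating in Update 1:

```python
def cegt_to_ccrl(cegt: float) -> float:
    """Estimate a CCRL 40/4 rating from a CEGT rating using the fit above."""
    return 1.056 * cegt + 33.859


# Fridolin 3.00: CEGT 2482 -> estimated CCRL 2655, the value listed in Update 2.
print(round(cegt_to_ccrl(2482)))  # 2655
```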
Rebel
Posts: 6997
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: What engine breaks even with GMs in blitz?

Post by Rebel »

mwyoung wrote: Tue Apr 09, 2019 6:59 am I found 2 games of Fritz 2 from 1992. Scoring 1-1 Playing GM Kasparov. Fritz could be playing on a 386 or 486 processor in 1992. My guess would be the 486.

[...]
Don't forget how these games were played: this was 1992, PCs were relatively new, and Kaspy was put behind the screen, not yet very experienced with the mouse. Go figure.
90% of coding is debugging, the other 10% is writing bugs.
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: What engine breaks even with GMs in blitz?

Post by Ferdy »

Update 4:
Rating after engine name is based on CCRL 40/4

sim version 3
------ Human GM 2600 (time: 5000 ms scale: 1.0) ------
 63.24  Lc0 v0.21.1 w11258-120x9-se blas ? (time: 5000 ms scale: 1.0)
 56.69  Shield 2.1 2734 (time: 5000 ms scale: 1.0)
 56.52  Monolith 0.4 2601 (time: 5000 ms scale: 1.0)
 56.30  Fridolin_3.00 2655 (time: 5000 ms scale: 1.0)
 56.29  Rhetoric 1.4 2718 (time: 5000 ms scale: 1.0)
 56.06  Marvin 2.0.0 2494 (time: 5000 ms scale: 1.0)
 56.05  Nebula 2.0 2655 (time: 5000 ms scale: 1.0)
 55.95  Ethereal 8.16 2630 (time: 5000 ms scale: 1.0)
 55.94  Floyd 0.9 2584 (time: 5000 ms scale: 1.0)
 55.67  Hermann 2.8 2531 (time: 5000 ms scale: 1.0)
 55.43  Maverick 1.5 2567 (time: 5000 ms scale: 1.0)
 55.33  zurichess geneva 2496 (time: 5000 ms scale: 1.0)
 45.64  Giraffe 20150908 2410 (time: 5000 ms scale: 1.0)

On schedule:

CDrill 1800 1774
Monarch 1.7 2057
Nemo 1.0.1 2858

Lc0 blas on my PC (Intel i7-2600K 3.4 GHz) at 40/3m TC is weak tactically but strong positionally. It has the highest similarity with Human GM so far. This encourages me to try the weaker engines CDrill 1800 and Monarch 1.7.
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: What engine breaks even with GMs in blitz?

Post by Ferdy »

Ferdy wrote: Thu Apr 11, 2019 6:59 pm
[...]
Update 5:

sim version 3
------ Human GM 2600 (time: 5000 ms scale: 1.0) ------
 63.24  Lc0 v0.21.1 w11258-120x9-se blas ? (time: 5000 ms scale: 1.0)
 57.10  Nemo 1.01 2858 (time: 5000 ms scale: 1.0)
 56.69  Shield 2.1 2734 (time: 5000 ms scale: 1.0)
 56.52  Monolith 0.4 2601 (time: 5000 ms scale: 1.0)
 56.30  Fridolin_3.00 2655 (time: 5000 ms scale: 1.0)
 56.29  Rhetoric 1.4 2718 (time: 5000 ms scale: 1.0)
 56.06  Marvin 2.0.0 2494 (time: 5000 ms scale: 1.0)
 56.05  Nebula 2.0 2655 (time: 5000 ms scale: 1.0)
 55.95  Ethereal 8.16 2630 (time: 5000 ms scale: 1.0)
 55.94  Floyd 0.9 2584 (time: 5000 ms scale: 1.0)
 55.67  Hermann 2.8 2531 (time: 5000 ms scale: 1.0)
 55.43  Maverick 1.5 2567 (time: 5000 ms scale: 1.0)
 55.33  zurichess geneva 2496 (time: 5000 ms scale: 1.0)
 55.05  Stockfish 10 3454 (time: 1000 ms scale: 1.0)
 52.73  Monarch 1.7 2057 (time: 5000 ms scale: 1.0)
 49.53  CDrill 1800 1774 (time: 5000 ms scale: 1.0)
 45.64  Giraffe 20150908 2410 (time: 5000 ms scale: 1.0)
Added CDrill 5s, Nemo 5s, Monarch 5s and Stockfish 1s.

On schedule:
Mephisto Amsterdam
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: What engine breaks even with GMs in blitz?

Post by Ferdy »

Ferdy wrote: Fri Apr 12, 2019 7:13 am
[...]

Update 6:

sim version 3
------ Human GM 2600 (time: 5000 ms scale: 1.0) ------
 63.24  Lc0 v0.21.1 w11258-120x9-se blas ? (time: 5000 ms scale: 1.0)
 57.10  Nemo 1.01 2858 (time: 5000 ms scale: 1.0)
 56.69  Wasp 3.60 UCI_Elo 2600 (time: 5000 ms scale: 1.0)
 56.69  Shield 2.1 2734 (time: 5000 ms scale: 1.0)
 56.52  Monolith 0.4 2601 (time: 5000 ms scale: 1.0)
 56.30  Fridolin_3.00 2655 (time: 5000 ms scale: 1.0)
 56.29  Rhetoric 1.4 2718 (time: 5000 ms scale: 1.0)
 56.06  Marvin 2.0.0 2494 (time: 5000 ms scale: 1.0)
 56.05  Nebula 2.0 2655 (time: 5000 ms scale: 1.0)
 55.95  Ethereal 8.16 2630 (time: 5000 ms scale: 1.0)
 55.94  Floyd 0.9 2584 (time: 5000 ms scale: 1.0)
 55.67  Hermann 2.8 2531 (time: 5000 ms scale: 1.0)
 55.43  Maverick 1.5 2567 (time: 5000 ms scale: 1.0)
 55.33  zurichess geneva 2496 (time: 5000 ms scale: 1.0)
 55.05  Stockfish 10 3454 (time: 1000 ms scale: 1.0)
 52.73  Monarch 1.7 2057 (time: 5000 ms scale: 1.0)
 51.69  Mephisto Amsterdam ? (time: 5000 ms scale: 1.0)
 49.53  CDrill 1800 1774 (time: 5000 ms scale: 1.0)
 45.64  Giraffe 20150908 2410 (time: 5000 ms scale: 1.0)
Added Mephisto and Wasp 3.60 UCI_Elo 2600
Jesse Gersenson
Posts: 593
Joined: Sat Aug 20, 2011 9:43 am

Re: What engine breaks even with GMs in blitz?

Post by Jesse Gersenson »

Sedat's benchmarks include entries from my 75 MHz 486, 150 MHz Pentium I, 366 MHz Pentium II, and 1133 MHz Pentium III:
https://sites.google.com/site/computers ... benchmarks
https://sites.google.com/site/computers ... benchmarks

http://www.rybkaforum.net/cgi-bin/rybka ... ?tid=25884
"Chess Tiger 14 performanced 2788 elo at Mercosur Cup 2001 (Pentium 3 866 Mhz, ~ 415 Fritz Benchmark)
Chess Tiger 14 performanced 2805 elo at Najdorf Memorial Open 2001 (Pentium 3 866 Mhz, ~ 415 Fritz Benchmark)"

If you want to run any tests, my Pentium II and III are easy to run tests on, they run modern Linux distributions. I'm not sure what state the Pentium I is in, and my two 486's are both major hassles to use.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: What engine breaks even with GMs in blitz?

Post by Laskos »

lkaufman wrote: Wed Apr 10, 2019 12:04 am
Laskos wrote: Tue Apr 09, 2019 10:48 pm
lkaufman wrote: Tue Apr 09, 2019 8:28 pm
lkaufman wrote: Tue Apr 09, 2019 6:48 pm
I ran the same test as you did overnight, except that I ran at the actual 45' + 15" level under discussion instead of your 15' + 5" level. So far I have five wins for Arasan 14.3, no wins for Lc0 11248, one draw, and the current game pretty clearly a draw so call it two draws. My 2080 is about 20% faster than your 2070 we determined, but perhaps my 4.9 GHz i7 is also a bit faster than yours? Anyway, it seems that tripling the time limit made a big difference, 6-1 instead of 6-4.
Much to my surprise, Lc0 won that "drawn" game making the score 1.5 -5.5, not 1-6. Lc0 had a lone queen against bishop, knight, and three pawns, and so I assumed (and the evals indicated) that Lc0 would seek perpetual check. But somehow it picked up all three of the pawns, one by one over many moves, and won the queen vs. two minors endgame (no TBs used).
Might start looking similar to my result, although I expect Arasan to perform worse at 45' + 15'' than at 15' + 5'' (and your result will probably show that). That endgame you describe seems a bit funny.

I am getting quite interesting results with ThothFish, an SF derivative that can be adjusted to like or dislike swapping pieces to a desired degree. I am playing with some parameters at fast TC. I got a "weak" (small number of nodes) ThothFish that likes swapping pieces very much and heavily outperforms the regular "weak" (small number of nodes) SF, both being a Knight up against a "strong" (and handicapped) Lc0 11248. Adjusted in this way, SF can probably also model a human to some degree.

So, Arasan results are sure not the final word.
Well, Lc0 won the last (8th) game vs. Arasan, so the final score was 2.5 to 5.5 for Lc0 giving knight odds at 45' + 15", pretty much what we would expect based on your result at 15' + 5". The ThothFish test sounds interesting; I wonder how strong the incentive to exchange (especially queens) should be for optimum results at knight odds. I suppose though that it's not perfect in that it might still try to trade even if it loses back the piece. For example if it will pay a pawn to trade queens, that might be okay when up a full piece but is certainly not ok when up a piece for two pawns.
I performed this longish test (2 days) using ThothFish. First, at a fast time control, I checked which settings best convert the Knight advantage. At a fast, unbalanced TC, with all "Exchange" factors set to 200, I got these results for ThothFish compared to SF_dev:
59/100 conversions for ThothFish
41/100 conversions for SF_dev
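A standard two-proportion z-test can check whether 59/100 versus 41/100 conversions is more than noise. Here is a minimal sketch, assuming the two sets of 100 games are independent samples (the function name is hypothetical):

```python
import math


def two_proportion_z(successes_a: int, n_a: int, successes_b: int, n_b: int) -> tuple[float, float]:
    """Pooled two-proportion z statistic and its two-sided p-value."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided


z, p = two_proportion_z(59, 100, 41, 100)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

This gives z of about 2.5 and p of about 0.01. If both engines played from the same 100 start positions, the samples are paired, and McNemar's test on the positions where the two engines' outcomes differ would be the better choice.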

Now, using the settings I found to work best (=200), I played real games with TC = 45' + 15'' for Lc0 11248. The opponent standing in for the human is "ThothFish200" at 1 million nodes per move. I worked this out 1-2 years ago and repeated my reasoning recently. I am pretty confident that SF_dev at about 1 million nodes per move is a 2750 - 2850 FIDE Elo opponent for a human at a 45' + 15'' time control. That is about 0.15 s/move on my 4-core PC, or a game in 10 seconds. I will not repeat the reasoning, as it is long and not illuminating, but I am quite confident of it.
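The node and time figures translate as shown in this back-of-the-envelope arithmetic. The only added assumption is that "a game in 10 seconds" means 10 seconds per side:

```python
nodes_per_move = 1_000_000
seconds_per_move = 0.15  # figure quoted for the 4-core PC above

nodes_per_second = nodes_per_move / seconds_per_move
# Assumption: "a game in 10 seconds" means 10 seconds per side.
moves_per_side = 10.0 / seconds_per_move

print(f"{nodes_per_second / 1e6:.1f} Mnps, about {moves_per_side:.0f} moves per side")
```

That is roughly 6.7 million nodes per second and about 67 moves per side, a typical game length.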

Real games:
Lc0 11248 at 45' + 15'' at Knight odds versus SF_dev at 1 million nodes/move:
+17 =3 -0
for SF_dev (mimicking a human). So it seems that Lc0 could draw 3/20 games against a top GM (2800 +/- 50 FIDE) at Knight odds.

But here comes ThothFish, which knows better how to convert a Knight advantage (but is no stronger than SF_dev in normal play, actually a bit weaker), and the result is different:
Lc0 11248 at 45' + 15'' at Knight odds versus ThothFish at 1 million nodes/move:
+20 =0 -0
for ThothFish (mimicking a human). It seems Lc0 cannot draw or win any game out of 20 against a top GM at Knight odds.
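A 20-0 result maps to an infinite Elo difference under the usual formula, so a bound is more informative than a point estimate. Here is a rough sketch that treats each game as a win/loss trial (exact here, since there were no draws) and uses the Wilson score interval at about 95% confidence. The function names are hypothetical:

```python
import math


def wilson_lower_bound(wins: int, games: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a win proportion."""
    p = wins / games
    denom = 1.0 + z * z / games
    centre = p + z * z / (2.0 * games)
    margin = z * math.sqrt(p * (1.0 - p) / games + z * z / (4.0 * games * games))
    return (centre - margin) / denom


def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction (logistic Elo model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


lower = wilson_lower_bound(20, 20)
print(f"score >= {lower:.3f}, i.e. at least {elo_from_score(lower):.0f} Elo")
```

For 20 wins in 20 games, this gives a lower bound of roughly 0.84 on the score. That corresponds to about +287 Elo for ThothFish at 1 million nodes/move over Lc0 11248 giving Knight odds under these conditions.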


I am not sure how well ThothFish at 1 million nodes/move mimics a human; it probably blunders tactically less than a human does. We probably got pretty good results at Knight odds against Arasan mimicking a human because Arasan is worse than SF at converting a Knight advantage, and MUCH worse than ThothFish. So it seems Knight odds against a very top GM at 45' + 15'' are possible only if the top GM blunders pretty badly. That's not exactly what I had hoped for. Maybe it could be fun against Lc0 playing very aggressively and forcing the human to blunder, but I guess the top GM playing would be annoyed by Lc0's play.