Nakamura vs Stockfish, public match 8/23

Who will win the four-game match?

Nakamura: 5 votes (7%)
Stockfish: 55 votes (82%)
Tie: 7 votes (10%)

Total votes: 67

lkaufman
Posts: 6259
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Nakamura vs Stockfish, public match 8/23

Post by lkaufman »

There is one important point that I think everyone commenting here has overlooked. Engine vs. engine rating lists (except maybe SSDF) are almost all done using some set of opening positions that are fairly representative of human play (maybe GM play, maybe not). But if a GM is playing an engine (without a sufficient handicap) and getting paid per point scored, he will presumably play for a draw in every game and choose the most drawish move possible at every point in the opening. Of course the engine bookmaker can do the opposite, but it is much easier to head for a draw than to avoid one without making clearly inferior moves. Probably the GM will rarely achieve a draw with Black, but with White it might be possible to achieve drawish positions most of the time. So even if the rating difference based on engine vs engine testing is 800 elo, meaning a 1% expected score for the GM, I think it is highly likely that he will be able to draw far more than 2 out of 100 games, of which he will have fifty chances with White.
The only actual data we have on this is the White plus draw odds handicap match of Rybka (I think close to Rybka 3) vs. GM Joel Benjamin. He achieved just two draws in eight games. Assuming that he would have achieved one draw in another eight games as Black, his score would have been a bit under 10%, implying a rating gap of 400. These games were played on a quad (could have been 8 core, although I think not) with a very well optimized book for the purpose of avoiding draws, and some engine code with that goal as well. I think Joel was 2570 at the time, so a good estimate for Rybka 3 on a quad might be 2970.
In general over the years I've found that reducing engine vs. engine elo differences by 25% gives good predictions of results vs. humans.
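
The arithmetic in this post can be checked with a small sketch. It assumes the standard logistic Elo expectation formula; the function names are illustrative only and the figures are the ones quoted above (an 800 Elo gap, and three draws in sixteen games for the hypothetical full Benjamin match).

```python
import math


def expected_score(elo_gap: float) -> float:
    """Expected score of the weaker side, standard logistic Elo formula."""
    return 1.0 / (1.0 + 10.0 ** (elo_gap / 400.0))


def implied_gap(score: float) -> float:
    """Elo gap implied by a score fraction strictly between 0 and 1."""
    return 400.0 * math.log10((1.0 - score) / score)


# An 800 Elo gap gives the weaker side roughly a 1% expected score.
print(f"{expected_score(800):.4f}")

# Two actual draws plus one assumed draw in sixteen games: 1.5 / 16.
benjamin_score = 1.5 / 16
print(round(implied_gap(benjamin_score)))
```

Under these assumptions the 16-game score of about 9.4% corresponds to a gap a little under 400 Elo, which is consistent with the 2570 + 400 estimate.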
syzygy
Posts: 5784
Joined: Tue Feb 28, 2012 11:56 pm

Re: Nakamura vs Stockfish, public match 8/23

Post by syzygy »

lkaufman wrote:There is one important point that I think everyone commenting here has overlooked.
At least Uri mentioned this as well (and I agree with it), but for the subthread hardware vs software improvement it does not make so much difference. In fact, I would be surprised if a 100 machine-machine Elo improvement due to software improvements (i.e. better search, better eval) is not worth more versus humans than a 100 machine-machine Elo improvement due to hardware improvements (i.e. more nodes per second).
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Nakamura vs Stockfish, public match 8/23

Post by Laskos »

lkaufman wrote:There is one important point that I think everyone commenting here has overlooked. Engine vs. engine rating lists (except maybe SSDF) are almost all done using some set of opening positions that are fairly representative of human play (maybe GM play, maybe not). But if a GM is playing an engine (without a sufficient handicap) and getting paid per point scored, he will presumably play for a draw in every game and choose the most drawish move possible at every point in the opening. Of course the engine bookmaker can do the opposite, but it is much easier to head for a draw than to avoid one without making clearly inferior moves. Probably the GM will rarely achieve a draw with Black, but with White it might be possible to achieve drawish positions most of the time. So even if the rating difference based on engine vs engine testing is 800 elo, meaning a 1% expected score for the GM, I think it is highly likely that he will be able to draw far more than 2 out of 100 games, of which he will have fifty chances with White.
The only actual data we have on this is the White plus draw odds handicap match of Rybka (I think close to Rybka 3) vs. GM Joel Benjamin. He achieved just two draws in eight games. Assuming that he would have achieved one draw in another eight games as Black, his score would have been a bit under 10%, implying a rating gap of 400. These games were played on a quad (could have been 8 core, although I think not) with a very well optimized book for the purpose of avoiding draws, and some engine code with that goal as well. I think Joel was 2570 at the time, so a good estimate for Rybka 3 on a quad might be 2970.
In general over the years I've found that reducing engine vs. engine elo differences by 25% gives good predictions of results vs. humans.
That's correct. I was only talking about computer ratings, where the hardware and software contributions are measured in the same points. Porting this to human ratings is a completely different matter, and the conversion will probably apply regardless of whether the gain comes from hardware or software. Basically, one would rescale (compress) computer rating differences by a factor >1 to get human ratings. And estimates of 3200 FIDE rather than 3500 FIDE for the top dogs are probably more or less correct.
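
A minimal sketch of that rescaling, using the 25% reduction from the quoted post. The anchor rating (the point where the computer and human scales are taken to agree) is not given anywhere in this thread; the value 2300 below is purely an assumption, picked because it makes the arithmetic reproduce the 3500 to 3200 figure.

```python
def compress_rating(engine_rating: float, anchor: float, reduction: float = 0.25) -> float:
    """Shrink the gap between an engine-list rating and an anchor rating.

    reduction=0.25 multiplies the gap by 0.75, which is the same as
    dividing it by a factor of 4/3 (a factor > 1).
    """
    gap = engine_rating - anchor
    return anchor + gap * (1.0 - reduction)


# Hypothetical anchor of 2300: a 1200-point gap shrinks to 900 points.
print(compress_rating(3500, anchor=2300))
```

A different anchor gives a different human-scale estimate, so the result is only as good as that assumption.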
lkaufman
Posts: 6259
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Nakamura vs Stockfish, public match 8/23

Post by lkaufman »

syzygy wrote:
lkaufman wrote:There is one important point that I think everyone commenting here has overlooked.
At least Uri mentioned this as well (and I agree with it), but for the subthread hardware vs software improvement it does not make so much difference. In fact, I would be surprised if a 100 machine-machine Elo improvement due to software improvements (i.e. better search, better eval) is not worth more versus humans than a 100 machine-machine Elo improvement due to hardware improvements (i.e. more nodes per second).
Yes, I quite agree with everything you say here. Gains due to improved evaluation ought to be worth more against humans than the same gain due to improved hardware speed. But they will still be less than predicted by engine vs. engine games with varied opening books.
Sedat Canbaz
Posts: 3018
Joined: Thu Mar 09, 2006 11:58 am
Location: Antalya/Turkey

Re: Nakamura vs Stockfish, public match 8/23

Post by Sedat Canbaz »

Just my 2 cents more on this issue:

I expect Rybka 3 + Perfect 16 book on my i7 980X machine to be approx. 3300 Elo.
In other words: Rybka 3 is at least 500 Elo stronger than GMs of 2800 Elo.

And I can't wait to see someone prove that I am wrong! :)

What that means: no more comments please, let's just play )))

You know my address...I am always ready for such a challenge!!!
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Nakamura vs Stockfish, public match 8/23

Post by Milos »

lkaufman wrote:There is one important point that I think everyone commenting here has overlooked. Engine vs. engine rating lists (except maybe SSDF) are almost all done using some set of opening positions that are fairly representative of human play (maybe GM play, maybe not). But if a GM is playing an engine (without a sufficient handicap) and getting paid per point scored, he will presumably play for a draw in every game and choose the most drawish move possible at every point in the opening. Of course the engine bookmaker can do the opposite, but it is much easier to head for a draw than to avoid one without making clearly inferior moves.
This is where you are totally wrong. There are no clearly inferior moves when you are playing an opponent who is 500 Elo weaker. It is enough for an engine to force an unknown, complicated open game even at an eval of -100cp, and there are way too many such lines for a human to prepare them all, even if you limit the eval drop to -50cp. And if you allow the engine to use a large enough book, the human is lost, period.
If you put any human, even Carlsen, into an unknown opening, even with a 100cp advantage, his chance of drawing becomes virtually zero, because humans without opening theory lose much more than machines do.
lkaufman
Posts: 6259
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Nakamura vs Stockfish, public match 8/23

Post by lkaufman »

Interesting points. But no one (as far as I know) has written an engine that aims for open, complex positions even at the cost of 50 cp or so. Maybe if someone did so you might be right, but I am talking about actual engines of today. Just having a huge book can't ensure getting a double-edged position when you are Black; every interesting defense to mainline openings allows some way for White to simplify and reach a drawish position if that is his goal. Maybe someone could create a book that would avoid drawish positions as Black at the cost of maybe 50 cp given correct play, but here too I don't think such a book actually exists today.
Sedat Canbaz
Posts: 3018
Joined: Thu Mar 09, 2006 11:58 am
Location: Antalya/Turkey

Re: Nakamura vs Stockfish, public match 8/23

Post by Sedat Canbaz »

lkaufman wrote:There is one important point that I think everyone commenting here has overlooked.
Larry,

There are more important points that I think some chess friends have missed or overlooked:
1) My published GMs vs Engine rating list (games source: Ed Schröder)
2) SCCT Book Tournaments (the participants are very strong Book Makers)
3) My challenges (I am already ready to organize one without a prize :) )
etc...

Btw, I again suggest checking the links below (they offer serious evidence that nowadays the Top 5 engines are expected to be at least 500 Elo stronger than GMs):
http://www.talkchess.com/forum/viewtopi ... ight=sedat
http://www.talkchess.com/forum/viewtopi ... ight=sedat
http://www.talkchess.com/forum/viewtopi ... ight=sedat
http://www.rebel.nl/resu.htm
http://www.hiarcs.com/Games/Mercosur2009/mercosur09.htm

[Three images, not preserved]

lkaufman wrote: I think Joel was 2570 at the time, so a good estimate for Rybka 3 on a quad might be 2970.
Rybka at 2970 Elo? Come on?! )) Is there any proof??
As far as I know, we need many hundreds of games per player to state an Elo rating.

And on which quad? And with which opening book??

Note that quads differ a lot:


kN/s  Cores  EXE   Processors              Speed      Hardware Users
8434  4      x64   Intel Core i5-4670K     @4.50GHz   Dark Wizzie
473   4      w32   Lenovo A3000 Quad Core  1.20GHz    Souvik Chakraborty

For full list:
http://www.sedatcanbaz.com/chess/?page_id=19

The expected hardware Elo difference is approx. 250 Elo.

Note also that, besides the importance of hardware speed, the opening book plays another BIG role too:


Rank Name                                   Elo    +    -  games score oppo. draws

  1 SedatChess, Rybka 3                    3368   11   11  2727   69%  3243   45% 
350 Offer koning, Rybka 3                  3080   27   28   457   31%  3203   35% 
For full standings:
http://www.sedatcanbaz.com/chess/?page_id=473

The opening book Elo difference is approx. 288 Elo (3368 - 3080).
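
As a rough check of that gap, the sketch below subtracts the two listed ratings and combines their error margins. Combining the margins in quadrature assumes the two estimates are independent, which is not strictly true for two entries in the same rating pool, so the result is only indicative. The dictionary keys and function name are illustrative.

```python
import math

# Figures copied from the two rows of the standings above.
sedat_book = {"elo": 3368, "plus": 11, "minus": 11}
other_book = {"elo": 3080, "plus": 27, "minus": 28}


def gap_with_margin(a: dict, b: dict) -> tuple[int, float]:
    """Rating gap and a rough combined margin (independence assumed)."""
    gap = a["elo"] - b["elo"]
    # Use the larger of the +/- bounds for each entry, then add in quadrature.
    margin = math.hypot(max(a["plus"], a["minus"]), max(b["plus"], b["minus"]))
    return gap, margin


gap, margin = gap_with_margin(sedat_book, other_book)
print(f"gap = {gap} Elo, combined margin about +/-{margin:.0f} Elo")
```

Even with the wider margin of the 457-game entry, the gap stays far larger than the combined uncertainty under these assumptions.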


Hope this helps this time!!
Sedat Canbaz
Posts: 3018
Joined: Thu Mar 09, 2006 11:58 am
Location: Antalya/Turkey

Re: Nakamura vs Stockfish, public match 8/23

Post by Sedat Canbaz »

Note: Shredder covers several old versions (as far as I remember: Shredder 6, Shredder 7, Shredder 8, Shredder 9).

The latest Shredder 12 would probably perform much better against GMs.

For example, CEGT 40/4:
Shredder 12 x64 1CPU 2800 Elo
Shredder 9.1 1CPU 2539 Elo

Elo Difference: 261 Elo

And unfortunately CEGT rated Shredder 9.1 around 2500 Elo.
Are these values true in reality? I mean, is Shredder 9.1 really weaker than top GMs???

But if we look at the crosstable, we see 260 Elo in favor of the old versions of Shredder.

Really...there are a lot of wrong things in our life...sad, but true!
Uri Blass
Posts: 10915
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Nakamura vs Stockfish, public match 8/23

Post by Uri Blass »

Sedat Canbaz wrote:Note: Shredder covers several old versions (as far as I remember: Shredder 6, Shredder 7, Shredder 8, Shredder 9).

The latest Shredder 12 would probably perform much better against GMs.

For example, CEGT 40/4:
Shredder 12 x64 1CPU 2800 Elo
Shredder 9.1 1CPU 2539 Elo

Elo Difference: 261 Elo

And unfortunately CEGT rated Shredder 9.1 around 2500 Elo.
Are these values true in reality? I mean, is Shredder 9.1 really weaker than top GMs???

But if we look at the crosstable, we see 260 Elo in favor of the old versions of Shredder.

Really...there are a lot of wrong things in our life...sad, but true!
A computer rating list is not supposed to give ratings against humans, and it is impossible for such lists to do so, because rating differences between engines are not equivalent to rating differences against humans.

If program A is 800 Elo stronger than program B on a computer rating list, then it is probably clearly less than 800 Elo stronger against humans.