Nakamura vs Stockfish, public match 8/23

lkaufman · Post by **lkaufman** » Thu Aug 28, 2014 12:09 am

There is one important point that I think everyone commenting here has overlooked. Engine vs. engine rating lists (except maybe SSDF) are almost all done using some set of opening positions that are fairly representative of human play (maybe GM play, maybe not). But if a GM is playing an engine (without a sufficient handicap) and getting paid per point scored, he will presumably play for a draw in every game and choose the most drawish move possible at every point in the opening. Of course the engine bookmaker can do the opposite, but it is much easier to head for a draw than to avoid one without making clearly inferior moves. Probably the GM will rarely achieve a draw with Black, but with White it might be possible to achieve drawish positions most of the time. So even if the rating difference based on engine vs engine testing is 800 elo, meaning a 1% expected score for the GM, I think it is highly likely that he will be able to draw far more than 2 out of 100 games, of which he will have fifty chances with White.
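The "800 Elo means about a 1% expected score" step uses the standard Elo expectation. Here is a minimal sketch, assuming the common logistic form of the Elo curve. That is the form most engine rating tools use; FIDE's own conversion table is close but not identical.

```python
def expected_score(elo_diff: float) -> float:
    """Expected score of the weaker side when the opponent is elo_diff points stronger
    (logistic Elo model, 400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** (elo_diff / 400.0))

if __name__ == "__main__":
    p = expected_score(800)
    print(f"800 Elo gap: {p:.2%} expected score")        # 0.99%
    print(f"Expected points in 100 games: {100 * p:.1f}")
```

Two draws in 100 games would already be a full point, roughly double what the engine-vs-engine gap predicts.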
The only actual data we have on this is the White plus draw odds handicap match of Rybka (I think close to Rybka 3) vs. GM Joel Benjamin. He achieved just two draws in eight games. Assuming that he would have achieved one draw in another eight games as Black, his score would have been a bit under 10%, implying a rating gap of 400. These games were played on a quad (could have been 8 core, although I think not) with a very well optimized book for the purpose of avoiding draws, and some engine code with that goal as well. I think Joel was 2570 at the time, so a good estimate for Rybka 3 on a quad might be 2970.
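The arithmetic behind "a bit under 10%, implying a rating gap of 400" can be checked with a short sketch. It assumes the same logistic Elo curve and counts draws as half points, ignoring the draw-odds handicap, as the post does. The extra Black draw is the post's own assumption.

```python
import math

def elo_gap(score: float) -> float:
    """Rating gap implied by a score fraction (0 < score < 1) under the logistic Elo model."""
    return 400.0 * math.log10((1.0 - score) / score)

# Figures from the post: 2 draws in 8 games as White, plus an assumed
# 1 draw in another 8 games as Black, and no wins for the GM.
draws, games = 2 + 1, 8 + 8
score = 0.5 * draws / games           # 1.5 / 16 = 0.09375
gap = elo_gap(score)
benjamin = 2570                       # Joel Benjamin's rating as recalled in the post
print(f"score {score:.1%}, gap {gap:.0f} Elo, Rybka estimate {benjamin + gap:.0f}")
```

This gives a gap of roughly 394 Elo, consistent with the rounded 400 and the 2970 estimate.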
In general over the years I've found that reducing engine vs. engine elo differences by 25% gives good predictions of results vs. humans.
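The 25% rule of thumb applied to the 800 Elo example can be sketched like this. The function names are hypothetical, and the logistic Elo curve is again an assumption.

```python
def expected_score(elo_diff: float) -> float:
    """Expected score of the weaker side when the opponent is elo_diff points stronger."""
    return 1.0 / (1.0 + 10.0 ** (elo_diff / 400.0))

def human_gap(engine_gap: float, reduction: float = 0.25) -> float:
    """Shrink an engine-vs-engine Elo gap by a fixed fraction (the 25% rule of thumb)."""
    return engine_gap * (1.0 - reduction)

if __name__ == "__main__":
    engine = 800
    human = human_gap(engine)
    print(f"engine gap {engine} -> human gap {human:.0f}")
    print(f"GM expected score: {expected_score(engine):.1%} -> {expected_score(human):.1%}")
```

An 800 Elo engine-vs-engine gap becomes about 600 Elo against humans. That moves the GM's expected score from roughly 1% to roughly 3%.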

syzygy · Post by **syzygy** » Thu Aug 28, 2014 12:33 am

lkaufman wrote:There is one important point that I think everyone commenting here has overlooked.

At least Uri mentioned this as well (and I agree with it), but for the hardware vs. software improvement subthread it does not make much difference. In fact, I would be surprised if a 100 Elo machine-machine improvement due to software (i.e. better search, better eval) were not worth more against humans than a 100 Elo machine-machine improvement due to hardware (i.e. more nodes per second).

Laskos · Post by **Laskos** » Thu Aug 28, 2014 12:47 am

lkaufman wrote:There is one important point that I think everyone commenting here has overlooked.

That's correct. I was only talking about computer ratings, where the hardware/software split is measured on the same scale. Porting it to human ratings is a completely different matter, and the conversion will probably apply independently of whether the gain came from hardware or software. Basically, one would rescale (compress) computer ratings by a factor >1 to get human ratings. And estimates of 3200 FIDE rather than 3500 FIDE for the top engines are probably more or less correct.

lkaufman · Post by **lkaufman** » Thu Aug 28, 2014 1:24 am

syzygy wrote:
lkaufman wrote:There is one important point that I think everyone commenting here has overlooked.
At least Uri mentioned this as well (and I agree with it), but for the subthread hardware vs software improvement it does not make so much difference. In fact, I would be surprised if a 100 machine-machine Elo improvement due to software improvements (i.e. better search, better eval) is not worth more versus humans than a 100 machine-machine Elo improvement due to hardware improvements (i.e. more nodes per second).

Yes, I quite agree with everything you say here. Gains due to improved evaluation ought to be worth more against humans than the same gain due to improved hardware speed. But they will still be less than predicted by engine vs. engine games with varied opening books.

Sedat Canbaz · Post by **Sedat Canbaz** » Thu Aug 28, 2014 1:26 am

Just my 2 cents more on this issue:

I expect Rybka 3 + Perfect 16 book on my i7 980X machine to be approx. 3300 Elo.
In other words, Rybka 3 is at least 500 Elo stronger than 2800 Elo GMs.

And I can't wait to see someone prove that I am wrong!

What that means: no more comments please, let's just play )))

You know my address... I am ready for such a challenge any time!!!

Milos · Post by **Milos** » Thu Aug 28, 2014 1:41 am

lkaufman wrote:There is one important point that I think everyone commenting here has overlooked. Engine vs. engine rating lists (except maybe SSDF) are almost all done using some set of opening positions that are fairly representative of human play (maybe GM play, maybe not). But if a GM is playing an engine (without a sufficient handicap) and getting paid per point scored, he will presumably play for a draw in every game and choose the most drawish move possible at every point in the opening. Of course the engine bookmaker can do the opposite, but it is much easier to head for a draw than to avoid one without making clearly inferior moves.

This is where you are totally wrong. There are no clearly inferior moves when you are playing an opponent 500 Elo weaker. It is enough for an engine to force an unfamiliar, complicated open game even at an eval of -100cp, and there are far too many such lines for a human to prepare, even if you limit the cost to -50cp. And if you allow the engine to use a large enough book, the human is lost, period.
If you let any human, even Carlsen, into an unfamiliar opening, his chance of drawing becomes virtually zero even with a 100cp advantage, because humans without opening theory lose much more than machines do.

lkaufman · Post by **lkaufman** » Thu Aug 28, 2014 2:30 am

Interesting points. But no one (as far as I know) has written an engine that aims for open, complex positions even at the cost of 50 cp or so. Maybe if someone did so you would be right, but I am talking about actual engines of today. Just having a huge book can't ensure getting a double-edged position when you are Black; every interesting defense to mainline openings allows White some way to simplify and reach a drawish position if that is his goal. Maybe someone could create a book that avoids drawish positions as Black at a cost of maybe 50 cp given correct play, but here too I don't think such a book actually exists today.
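The idea Milos and Larry are debating, giving up up to about 50 cp of evaluation to steer into a double-edged position, can be sketched as a simple move-selection rule. Everything here is hypothetical: the `sharpness` score stands in for whatever "non-drawishness" measure a book maker or engine might use, and the candidate moves and evals are invented purely for illustration. As Larry notes, no existing engine or book is claimed to work this way.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Candidate:
    move: str
    eval_cp: int      # engine eval in centipawns, from the side to move's point of view
    sharpness: float  # hypothetical "non-drawishness" score; higher = sharper

def pick_anti_draw_move(cands: List[Candidate], margin_cp: int = 50) -> Candidate:
    """Among moves whose eval is within margin_cp of the best move, pick the sharpest."""
    best = max(c.eval_cp for c in cands)
    playable = [c for c in cands if best - c.eval_cp <= margin_cp]
    return max(playable, key=lambda c: c.sharpness)

if __name__ == "__main__":
    # Invented numbers for a Black move choice, purely for illustration.
    cands = [
        Candidate("e5", -15, 0.2),
        Candidate("Nf6", -20, 0.3),
        Candidate("c5", -45, 0.9),
        Candidate("g5", -120, 1.0),
    ]
    print(pick_anti_draw_move(cands).move)               # c5
    print(pick_anti_draw_move(cands, margin_cp=0).move)  # e5
```

The `margin_cp` parameter is the "cost" in the discussion: at 0 the rule plays the engine's top choice, while at 50 it accepts a slightly worse but sharper line and still rejects clearly inferior moves such as the -120 cp option.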

Sedat Canbaz · Post by **Sedat Canbaz** » Thu Aug 28, 2014 10:27 am

lkaufman wrote:There is one important point that I think everyone commenting here has overlooked.

Larry,

There are more important points that I think some chess friends have failed to check, or where they can't see the reality:
1) My published GMs vs Engines rating list (games source: Ed Schröder)
2) SCCT Book Tournaments (the participants are very strong book makers)
3) My challenges (I am ready to organize them even without a prize)
etc. etc...

Btw, I again suggest checking the links below (they give serious evidence that today's top 5 engines are at least 500 Elo stronger than GMs):
http://www.talkchess.com/forum/viewtopi ... ight=sedat
http://www.talkchess.com/forum/viewtopi ... ight=sedat
http://www.talkchess.com/forum/viewtopi ... ight=sedat
http://www.rebel.nl/resu.htm
http://www.hiarcs.com/Games/Mercosur2009/mercosur09.htm


lkaufman wrote: I think Joel was 2570 at the time, so a good estimate for Rybka 3 on a quad might be 2970.

Rybka 2970 Elo? Come on?! )) Is there any proof??
As far as I know, we need many hundreds of games per player to establish an Elo rating.

And on which quad? And with which opening book??)

Note that there are quads as different as these:

kN/s  Cores  EXE  Processor               Speed     Hardware User
8434  4      x64  Intel Core i5-4670K     @4.50GHz  Dark Wizzie
 473  4      w32  Lenovo A3000 Quad Core  @1.20GHz  Souvik Chakraborty

For the full list:
http://www.sedatcanbaz.com/chess/?page_id=19

The hardware Elo difference between these two is expected to be approx. 250 Elo.
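The post doesn't say how the 250 Elo figure was derived. One common rule of thumb assumes a roughly constant Elo gain per doubling of speed. The sketch below uses an assumed 60 Elo per doubling, which is not a number from the post. Real values depend on the engine and time control and tend to shrink at longer time controls.

```python
import math

def speed_elo(fast_knps: float, slow_knps: float, elo_per_doubling: float = 60.0) -> float:
    """Elo gain from a speed-up, assuming a constant gain per doubling of speed
    (elo_per_doubling is an assumed parameter, not a measured one)."""
    return elo_per_doubling * math.log2(fast_knps / slow_knps)

if __name__ == "__main__":
    ratio = 8434 / 473
    print(f"speed ratio {ratio:.1f}x, about {speed_elo(8434, 473):.0f} Elo")
```

With these assumptions, the ~17.8x speed ratio (just over four doublings) lands at about 250 Elo, in line with the estimate above.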

Note also that besides hardware speed, the opening book plays another BIG role too:

Rank  Name                   Elo    +   -   Games  Score  Oppo.  Draws
   1  SedatChess, Rybka 3    3368  11  11   2727   69%    3243   45%
 350  Offer koning, Rybka 3  3080  27  28    457   31%    3203   35%

For the full standings:
http://www.sedatcanbaz.com/chess/?page_id=473

The opening book Elo difference is approx. 290 Elo (3368 vs. 3080).

Hope this helps this time....!!

Sedat Canbaz · Post by **Sedat Canbaz** » Thu Aug 28, 2014 11:03 am

Note: the Shredder entries include different old versions (as far as I remember Shredder 6, Shredder 7, Shredder 8 and Shredder 9).

The latest Shredder 12 would probably perform much better against GMs.

For example, CEGT 40/4
Shredder 12 x64 1CPU 2800 Elo
Shredder 9.1 1CPU 2539 Elo

Elo Difference: 261 Elo

And unfortunately CEGT rates Shredder 9.1 at around 2500 Elo.
Are these values true in reality? I mean, is Shredder 9.1 weaker than top GMs???

But if we look at the crosstable, we see 260 Elo in favor of the old versions of Shredder.

Really... there are a lot of wrong things in our life... sad, but true!

Uri Blass · Post by **Uri Blass** » Thu Aug 28, 2014 11:31 am

Sedat Canbaz wrote:And unfortunately CEGT rated Shredder 9.1 around 2500 Elo
Is this values are true in reality, I mean Shredder 9.1 is weaker than Top GMs ???

Computer rating lists are not supposed to give ratings against humans, and it is impossible for them to do so because rating differences do not carry over one-to-one.

If program A is 800 Elo stronger than program B in a computer rating list, then it is probably clearly less than 800 Elo stronger against humans.
