There is one important point that I think everyone commenting here has overlooked. Engine vs. engine rating lists (except maybe SSDF) are almost all done using some set of opening positions that are fairly representative of human play (maybe GM play, maybe not). But if a GM is playing an engine (without a sufficient handicap) and getting paid per point scored, he will presumably play for a draw in every game and choose the most drawish move possible at every point in the opening. Of course the engine bookmaker can do the opposite, but it is much easier to head for a draw than to avoid one without making clearly inferior moves. Probably the GM will rarely achieve a draw with Black, but with White it might be possible to achieve drawish positions most of the time. So even if the rating difference based on engine vs engine testing is 800 elo, meaning a 1% expected score for the GM, I think it is highly likely that he will be able to draw far more than 2 out of 100 games, of which he will have fifty chances with White.
The only actual data we have on this is the White plus draw odds handicap match of Rybka (I think close to Rybka 3) vs. GM Joel Benjamin. He achieved just two draws in eight games. Assuming that he would have achieved one draw in another eight games as Black, his score would have been a bit under 10%, implying a rating gap of 400. These games were played on a quad (could have been 8 core, although I think not) with a very well optimized book for the purpose of avoiding draws, and some engine code with that goal as well. I think Joel was 2570 at the time, so a good estimate for Rybka 3 on a quad might be 2970.
In general over the years I've found that reducing engine vs. engine elo differences by 25% gives good predictions of results vs. humans.
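For anyone who wants to check the arithmetic above, here is a minimal sketch using the standard logistic Elo formula. The function names are just illustrative, and counting the Benjamin match as 3 draws out of 16 games (half a point each) follows the assumption made in the post, not an official result.

```python
import math

def expected_score(elo_gap: float) -> float:
    """Expected score of the lower-rated side facing an opponent elo_gap points higher."""
    return 1.0 / (1.0 + 10.0 ** (elo_gap / 400.0))

def gap_from_score(score: float) -> float:
    """Elo gap implied by the lower-rated side's score fraction (0 < score < 1)."""
    return 400.0 * math.log10((1.0 - score) / score)

# An 800 Elo gap gives the weaker side roughly a 1% expected score.
gm_score_800 = expected_score(800)

# Benjamin: 2 draws in the 8 games played, plus an ASSUMED 1 draw in 8 more games as Black.
benjamin_points = 0.5 * (2 + 1)          # each draw counts half a point
benjamin_score = benjamin_points / 16    # a bit under 10%
implied_gap = gap_from_score(benjamin_score)

# Joel's rating is quoted from memory in the post ("I think 2570").
rybka_estimate = 2570 + implied_gap

print(f"GM expected score at 800 Elo gap: {gm_score_800:.4f}")
print(f"Benjamin score: {benjamin_score:.4f}, implied gap: {implied_gap:.0f}")
print(f"Rybka 3 estimate: {rybka_estimate:.0f}")
```

The implied gap comes out slightly under 400, which is where the rounded figures of 400 and 2970 come from.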
Nakamura vs Stockfish, public match 8/23
Moderator: Ras
-
lkaufman
- Posts: 6259
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
- Full name: Larry Kaufman
-
syzygy
- Posts: 5784
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Nakamura vs Stockfish, public match 8/23
lkaufman wrote: There is one important point that I think everyone commenting here has overlooked.

At least Uri mentioned this as well (and I agree with it), but for the subthread hardware vs software improvement it does not make so much difference. In fact, I would be surprised if a 100 machine-machine Elo improvement due to software improvements (i.e. better search, better eval) is not worth more versus humans than a 100 machine-machine Elo improvement due to hardware improvements (i.e. more nodes per second).
-
Laskos
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Nakamura vs Stockfish, public match 8/23
lkaufman wrote: There is one important point that I think everyone commenting here has overlooked. Engine vs. engine rating lists (except maybe SSDF) are almost all done using some set of opening positions that are fairly representative of human play (maybe GM play, maybe not). But if a GM is playing an engine (without a sufficient handicap) and getting paid per point scored, he will presumably play for a draw in every game and choose the most drawish move possible at every point in the opening. Of course the engine bookmaker can do the opposite, but it is much easier to head for a draw than to avoid one without making clearly inferior moves. Probably the GM will rarely achieve a draw with Black, but with White it might be possible to achieve drawish positions most of the time. So even if the rating difference based on engine vs engine testing is 800 elo, meaning a 1% expected score for the GM, I think it is highly likely that he will be able to draw far more than 2 out of 100 games, of which he will have fifty chances with White.
The only actual data we have on this is the White plus draw odds handicap match of Rybka (I think close to Rybka 3) vs. GM Joel Benjamin. He achieved just two draws in eight games. Assuming that he would have achieved one draw in another eight games as Black, his score would have been a bit under 10%, implying a rating gap of 400. These games were played on a quad (could have been 8 core, although I think not) with a very well optimized book for the purpose of avoiding draws, and some engine code with that goal as well. I think Joel was 2570 at the time, so a good estimate for Rybka 3 on a quad might be 2970.
In general over the years I've found that reducing engine vs. engine elo differences by 25% gives good predictions of results vs. humans.

That's correct. I was only talking about computer ratings, and there the hardware and software contributions are measured in the same points. Porting this to human ratings is a completely different matter, and the conversion will probably apply regardless of whether the gain comes from hardware or software. Basically, one would rescale (compress) computer ratings by a factor >1 to get human ratings. And estimates of 3200 FIDE instead of 3500 FIDE for the top dogs are probably more or less correct.
-
lkaufman
- Posts: 6259
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
- Full name: Larry Kaufman
Re: Nakamura vs Stockfish, public match 8/23
syzygy wrote:
lkaufman wrote: There is one important point that I think everyone commenting here has overlooked.
At least Uri mentioned this as well (and I agree with it), but for the subthread hardware vs software improvement it does not make so much difference. In fact, I would be surprised if a 100 machine-machine Elo improvement due to software improvements (i.e. better search, better eval) is not worth more versus humans than a 100 machine-machine Elo improvement due to hardware improvements (i.e. more nodes per second).

Yes, I quite agree with everything you say here. Gains due to improved evaluation ought to be worth more against humans than the same gain due to improved hardware speed. But they will still be less than predicted by engine vs. engine games with varied opening books.
-
Sedat Canbaz
- Posts: 3018
- Joined: Thu Mar 09, 2006 11:58 am
- Location: Antalya/Turkey
Re: Nakamura vs Stockfish, public match 8/23
Just my 2 cents more on this issue.
I expect Rybka 3 + Perfect 16 book on my i7 980X machine to be approx. 3300 Elo.
In other words: Rybka 3 is at least 500 Elo stronger than GMs of 2800 Elo.
And I have no patience for waiting for some person X to prove that I am wrong!
What does that mean? No more comments please, let's just play )))
You know my address... I am ready at any time for such a challenge!!!
-
Milos
- Posts: 4190
- Joined: Wed Nov 25, 2009 1:47 am
Re: Nakamura vs Stockfish, public match 8/23
lkaufman wrote: There is one important point that I think everyone commenting here has overlooked. Engine vs. engine rating lists (except maybe SSDF) are almost all done using some set of opening positions that are fairly representative of human play (maybe GM play, maybe not). But if a GM is playing an engine (without a sufficient handicap) and getting paid per point scored, he will presumably play for a draw in every game and choose the most drawish move possible at every point in the opening. Of course the engine bookmaker can do the opposite, but it is much easier to head for a draw than to avoid one without making clearly inferior moves.

This is where you are totally wrong. There are no clearly inferior moves when you are playing a 500 Elo weaker opponent. It is enough for an engine to force an unknown, complicated open game even with an eval of -100cp, and there are way too many such lines for a human to prepare for, even if you only let the eval drop to -50cp. And if you allow the engine to use a large enough book, the human is lost, period.
If you let any human, even Carlsen, into an unknown opening, even with a 100cp advantage, his chance of drawing becomes virtually zero, because humans without opening theory lose much more than machines do.
-
lkaufman
- Posts: 6259
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
- Full name: Larry Kaufman
Re: Nakamura vs Stockfish, public match 8/23
Interesting points. But no one (as far as I know) has written an engine that aims for open, complex positions even at the cost of 50 cp or so. Maybe if someone did so you might be right, but I am talking about actual engines of today. Just having a huge book can't ensure getting a double-edged position when you are Black; every interesting defense to mainline openings allows some way for White to simplify and reach a drawish position if that is his goal. Maybe someone could create a book that would avoid drawish positions as Black at the cost of maybe 50 cp given correct play, but here too I don't think such a book actually exists today.
-
Sedat Canbaz
- Posts: 3018
- Joined: Thu Mar 09, 2006 11:58 am
- Location: Antalya/Turkey
Re: Nakamura vs Stockfish, public match 8/23
lkaufman wrote: There is one important point that I think everyone commenting here has overlooked.

Larry,
There are more important points that I think some chess friends failed to check, or whose reality they can't see:
1) My published GMs vs Engine rating list (games source: Ed Schröder)
2) SCCT Book Tournaments (the participants are very strong book makers)
3) My challenges (I am already prepared to organize one without a prize)
etc.
Btw, I again suggest checking the links below (they contain serious evidence that nowadays the top 5 engines are expected to be at least 500 Elo stronger than GMs):
http://www.talkchess.com/forum/viewtopi ... ight=sedat
http://www.talkchess.com/forum/viewtopi ... ight=sedat
http://www.talkchess.com/forum/viewtopi ... ight=sedat
http://www.rebel.nl/resu.htm
http://www.hiarcs.com/Games/Mercosur2009/mercosur09.htm
lkaufman wrote: I think Joel was 2570 at the time, so a good estimate for Rybka 3 on a quad might be 2970.

Rybka 2970 Elo? Come on?! )) Is there any proof??
As far as I know, we need many hundreds of games per player to state an Elo.
And on which quad? And with which opening book??)
Note that quads differ widely:
kN/s  Cores  EXE  Processors              Speed     Hardware Users
8434  4      x64  Intel Core i5-4670K     @4.50GHz  Dark Wizzie
 473  4      w32  Lenovo A3000 Quad Core  1.20GHz   Souvik Chakraborty
http://www.sedatcanbaz.com/chess/?page_id=19
The expected Elo difference from hardware alone is approx. 250 Elo.
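As a rough sanity check on that 250 Elo figure, one can convert the two kN/s values from the table into speed doublings and see what Elo per doubling the estimate implies. This is only a sketch: treating Elo gain as linear in the number of doublings is a common rule of thumb and an assumption here, not something the benchmark page states.

```python
import math

# Speeds taken from the two rows of the benchmark table above (kN/s).
fast_knps = 8434   # Intel Core i5-4670K @4.50GHz, x64
slow_knps = 473    # Lenovo A3000 Quad Core 1.20GHz, w32

# The hardware gap quoted in the post.
quoted_gap_elo = 250

speed_ratio = fast_knps / slow_knps          # how many times faster
doublings = math.log2(speed_ratio)           # number of speed doublings
elo_per_doubling = quoted_gap_elo / doublings  # ASSUMES Elo is linear in doublings

print(f"speed ratio: {speed_ratio:.1f}x")
print(f"doublings: {doublings:.2f}")
print(f"implied Elo per doubling: {elo_per_doubling:.0f}")
```

So the quoted 250 Elo corresponds to a bit more than four doublings at about 60 Elo each, under that assumption.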
Note also that, apart from the importance of hardware speed, the opening book plays another BIG role too:
Rank  Name                   Elo   +   -   games  score  oppo.  draws
   1  SedatChess, Rybka 3    3368  11  11  2727   69%    3243   45%
 350  Offer koning, Rybka 3  3080  27  28   457   31%    3203   35%
http://www.sedatcanbaz.com/chess/?page_id=473
The Elo difference from the opening book is approx. 280 Elo.
Hope this helps this time!!
-
Sedat Canbaz
- Posts: 3018
- Joined: Thu Mar 09, 2006 11:58 am
- Location: Antalya/Turkey
Re: Nakamura vs Stockfish, public match 8/23
Note: Shredder includes different old versions (as far as I remember: Shredder 6, Shredder 7, Shredder 8, Shredder 9).
The latest Shredder 12 would probably be expected to perform much better against GMs.
For example, CEGT 40/4:
Shredder 12 x64 1CPU 2800 Elo
Shredder 9.1 1CPU 2539 Elo
Elo difference: 261 Elo
And unfortunately CEGT rates Shredder 9.1 at around 2500 Elo.
Are these values true in reality? I mean, is Shredder 9.1 weaker than top GMs???
But if we look at the crosstable, we see 260 Elo in favor of the old versions of Shredder.
Really... there are a lot of wrong things in our life... sad, but true!
-
Uri Blass
- Posts: 10915
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Nakamura vs Stockfish, public match 8/23
Sedat Canbaz wrote: Note: Shredder includes different old versions (as far I remember Shredder 6, Shredder 7, Shredder 8, Shredder 9)
Probably the latest Shredder 12 is expecting to be performed much better against GMs
For example, CEGT 40/4
Shredder 12 x64 1CPU 2800 Elo
Shredder 9.1 1CPU 2539 Elo
Elo Difference: 261 Elo
And unfortunately CEGT rated Shredder 9.1 around 2500 Elo
Is this values are true in reality, I mean Shredder 9.1 is weaker than Top GMs ???
But if we look at the crosstable, we see 260 Elo in favor for the old versions of Shredder
Really...there are a lot of wrong things in our life...sad, but true !

A computer rating list is not supposed to give ratings against humans, and it is impossible for rating lists to do so, because a rating difference between engines is not equivalent to the same difference against humans.
If program A is 800 Elo stronger than program B on a computer rating list, then it is probably clearly less than 800 Elo stronger against humans.
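To put a number on "clearly less than 800", here is a small sketch that applies the 25% reduction suggested earlier in this thread to an 800 Elo engine-list gap and compares expected scores with the standard logistic Elo formula. The 0.75 factor is that poster's rule of thumb, not a measured constant, and the names below are only illustrative.

```python
def expected_score(elo_gap: float) -> float:
    """Expected score of the lower-rated side facing an opponent elo_gap points higher."""
    return 1.0 / (1.0 + 10.0 ** (elo_gap / 400.0))

# ASSUMPTION: compress engine-vs-engine gaps by 25% to predict results vs. humans.
COMPRESSION = 0.75

engine_list_gap = 800
human_scale_gap = engine_list_gap * COMPRESSION

score_raw = expected_score(engine_list_gap)         # taking the list gap at face value
score_compressed = expected_score(human_scale_gap)  # after the 25% reduction

print(f"gap on engine list: {engine_list_gap}, compressed gap: {human_scale_gap:.0f}")
print(f"expected score at face value: {score_raw:.3f}")
print(f"expected score after compression: {score_compressed:.3f}")
```

Under that rule the weaker side's expected score roughly triples, from about 1% to about 3%.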