MYG

Discussion of computer chess matches and engine tournaments.

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: MYG

Post by Laskos »

JJJ wrote: I'm just hoping it's not Stockfish dev.
This program has scored 54% against Stockfish 8 so far, like Stockfish dev did in the regression test.
I also hope so, though the performance and shapes look similar to a good Stockfish dev. I can already predict that the final performance in Ordo rating after the full test (3520 games) will be in the range [3325, 3355] with 95% confidence, so this is most probably the new leader on IPON. After some 1500 games, a more involved weighted fit gives:

Image

This shape is in general indicative of Shredder and Stockfish. But against the top 3, Houdini seems to fit better. Hard to say. Also, if the score of X versus Stockfish 8 is 51 to 45, and if the draw rate is close to 70% (that's the draw rate between top opponents of similar strength on IPON), then the result is probably close to +17 -11 =68, which is not bad at all even against Stockfish 8. But again, that is close to Stockfish dev.
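As a rough check of that arithmetic, here is a minimal Python sketch that splits a match score into wins, draws and losses for an assumed draw rate. The function name `wdl_from_score` and the rounding rule (take the draw count closest to the target that still leaves a whole number of wins) are assumptions made for illustration, not something IPON or Ordo computes.

```python
def wdl_from_score(points, games, draw_rate):
    """Estimate (wins, draws, losses) from total points and an assumed draw rate."""
    target_draws = draw_rate * games
    twice_points = round(2 * points)  # work in half-points to stay in integers
    candidates = []
    for draws in range(games + 1):
        # wins = points - draws / 2 must be a non-negative whole number
        if draws > twice_points or (twice_points - draws) % 2:
            continue
        wins = (twice_points - draws) // 2
        losses = games - wins - draws
        if losses >= 0:
            candidates.append((abs(draws - target_draws), wins, draws, losses))
    if not candidates:
        raise ValueError("no whole-number result matches these inputs")
    _, wins, draws, losses = min(candidates)
    return wins, draws, losses


# 51 to 45 over 96 games with roughly 70% draws
print(wdl_from_score(51, 96, 0.70))  # (17, 68, 11)
```

With 96 games and a 70% draw rate the target is about 67 draws; 68 is the nearest count that keeps the wins a whole number, which gives the +17 -11 =68 quoted above.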

Re: MYG

Post by Laskos »

If you look at the slope of Komodo with its positive Contempt, it has the opposite trend. From an older post of mine, FGRL rating list at LTC:

Image

Stockfish had a slope similar to the one shown on IPON by this engine X, and IIRC Shredder did too. Houdini, IIRC, had almost no clear slope at all.
JJJ
Posts: 1346
Joined: Sat Apr 19, 2014 1:47 pm

Re: MYG

Post by JJJ »

Stockfish dev would probably be a little weaker, because most engines increase their strength at medium time controls.

If it is Shredder, that is good, because we're gonna have 4 top engines of about the same strength. But I don't think it is. I don't see Shredder being above Komodo after losing to it at WCCC.

On the other hand, Houdini is coming this month, and last time it made big progress in a short time.

Re: MYG

Post by JJJ »

Funny thing is, I remember you making a graph of the future Elo of the engines, with Stockfish remaining the best and the others trying to catch it.

And now Stockfish might lose its first place, and maybe its second, if the rates of progress of Komodo, Houdini and Stockfish remain the same. Nice race anyway!

Edit: I'm not sure this engine would beat Stockfish dev in a direct encounter.

Re: MYG

Post by Laskos »

JJJ wrote:
Edit: I'm not sure this engine would beat Stockfish dev in a direct encounter.
Yes, if the score of this engine against Stockfish 8 is something like +17 -11 =68, then Stockfish dev is performing at least as well against Stockfish 8. Direct matches between Stockfish versions are very drawish, and in general Stockfish has some sort of "draw bug": it often draws needlessly against weaker engines, so its rating is deflated. I think using positive Contempt for rating lists would significantly increase Stockfish's rating on these lists.

Re: MYG

Post by JJJ »

I think there were some tests, with nothing really conclusive.

Re: MYG

Post by Laskos »

JJJ wrote: I think there were some tests, with nothing really conclusive.
I mean in rating lists, where almost all engines are weaker or much weaker. Not Fishtest self-games. In self-games Contempt surely cannot help.
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: MYG

Post by IWB »

Laskos wrote:... I think the use of positive Contempt for rating lists would significantly increase Stockfish rating on these lists.
I consider that a myth repeated too often.

I played a full set of games against all opponents with SF 7 on the 18th of June 2016, and it (version 7 at that time) gained 4 Elo with a Contempt of 20 (which was the preferred contempt in discussions at that date). 4 Elo is less than half of one SD in my list, so basically it was noise, and I might get the same just by repeating the normal SF games ...
You can find the full information on my main page under that date (and under 20.06.16 the games of Komodo 10 without contempt).

Regards
Ingo

Re: MYG

Post by JJJ »

And I think the draw weakness should be fixed without using contempt.

Re: MYG

Post by IWB »

After about 50% of the games, the score against SF is +23 =70 -19.
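To put that partial result in Elo terms, a small sketch using the standard logistic Elo formula follows. It reads the score as 23 wins, 70 draws and 19 losses; the helper name `elo_diff` is made up for this example, and the figure it prints says nothing about the error margin of a sample this small.

```python
import math


def elo_diff(wins, draws, losses):
    """Elo difference implied by a match result under the logistic model."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


wins, draws, losses = 23, 70, 19
score_pct = 100.0 * (wins + 0.5 * draws) / (wins + draws + losses)
print(f"{score_pct:.1f}% -> {elo_diff(wins, draws, losses):+.1f} Elo")
# 51.8% -> +12.4 Elo
```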

Because you made some well-educated guesses, the rating with a proper Ordo calculation would be:

   # Engine                       :    Elo  Error  Score%   Draw%  OppAvg    CFS%        Points    Wins   Draws  Losses   Games
   1 NEW                          :   3353     16   80.5%    31.8    3080     100        1435.5    1152     567      65    1784
   2 Komodo 11.2.2                :   3319     11   78.3%    35.8    3073      90        2671.5    2060    1223     130    3413
   3 Stockfish 8                  :   3310      9   77.5%    39.6    3069      91        4861.5    3621    2481     170    6272
   4 Komodo 11.01                 :   3302     11   79.4%    34.3    3049      97        3317.0    2601    1432     147    4180
   5 Komodo 10.4                  :   3289     11   78.0%    36.2    3048      69        2745.5    2108    1275     137    3520
   6 Houdini 5.01                 :   3285      9   74.8%    39.9    3074     100        4358.5    3195    2327     308    5830
Huge error margin of course.
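The count columns of this table can be cross-checked with a few lines of Python. The sketch below assumes the last five columns are points, wins, draws, losses and games, and that the two percentage-like columns are score and draw rate; those readings are inferred from the numbers, not stated in the post, and the name `row_is_consistent` is only illustrative.

```python
# (name, score %, draw %, points, wins, draws, losses, games), copied from the table.
# The column meanings are an assumption; the checks below test that reading.
ROWS = [
    ("NEW",           80.5, 31.8, 1435.5, 1152,  567,  65, 1784),
    ("Komodo 11.2.2", 78.3, 35.8, 2671.5, 2060, 1223, 130, 3413),
    ("Stockfish 8",   77.5, 39.6, 4861.5, 3621, 2481, 170, 6272),
    ("Komodo 11.01",  79.4, 34.3, 3317.0, 2601, 1432, 147, 4180),
    ("Komodo 10.4",   78.0, 36.2, 2745.5, 2108, 1275, 137, 3520),
    ("Houdini 5.01",  74.8, 39.9, 4358.5, 3195, 2327, 308, 5830),
]


def row_is_consistent(row, tol=0.05):
    """True if counts, points and percentages of one row agree with each other."""
    _, score_pct, draw_pct, points, wins, draws, losses, games = row
    return (
        wins + draws + losses == games
        and wins + 0.5 * draws == points
        and abs(100.0 * points / games - score_pct) <= tol
        and abs(100.0 * draws / games - draw_pct) <= tol
    )


inconsistent = [row[0] for row in ROWS if not row_is_consistent(row)]
print(inconsistent)  # []
```

An empty list means every row agrees with that reading of the columns. The 1784 games for NEW are also roughly half of the 3520 games of a full run.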

Ingo