MYG

Laskos · Post by **Laskos** » Sat Sep 09, 2017 10:06 am

JJJ wrote: I'm just hoping it's not Stockfish dev.
So far this program scores 54% against Stockfish 8, like Stockfish dev did in the regression test.

I also hope so, the performance and shapes look similar to a good Stockfish dev. I can already predict that the final performance in Ordo rating after the full test (3520 games) will be in the range [3325, 3355] with 95% confidence, so this is most probably the new leader on IPON. After some 1500 games, a more involved weighted fit gives:

Which in general is indicative of Shredder and Stockfish. But against top 3, Houdini seems to fit better. Hard to say. Also, if the score of X versus Stockfish 8 is 51 to 45, and if the draw rate is close to 70% (that's the draw rate between close in strength top opponents on IPON), then the score is probably close to +17 -11 =68, not that bad even against Stockfish 8. But again, close to Stockfish dev.
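
The arithmetic behind the +17 -11 =68 estimate can be sketched as follows. This is only an illustration: the function name `wdl_from_score` is made up here, and it assumes that the 51 to 45 score is in game points (win 1, draw 0.5) and that the draw count is the value closest to 70% of the games that still leaves whole numbers of wins and losses.

```python
def wdl_from_score(points_a, points_b, draw_rate):
    """Reconstruct (wins, losses, draws) for side A from match points.

    Hypothetical helper: assumes win = 1 point, draw = 0.5 point, and
    picks the draw count nearest to draw_rate * games that keeps the
    win and loss counts integral.
    """
    games = round(points_a + points_b)
    target_draws = draw_rate * games
    twice_a = round(2 * points_a)        # equals 2 * wins + draws

    draws = round(target_draws)
    # draws must have the same parity as twice_a, otherwise wins is not whole
    if (twice_a - draws) % 2:
        draws = draws + 1 if target_draws > draws else draws - 1

    wins = (twice_a - draws) // 2
    losses = games - wins - draws
    return wins, losses, draws


if __name__ == "__main__":
    wins, losses, draws = wdl_from_score(51, 45, 0.70)
    print(f"+{wins} -{losses} ={draws}")
```

With 96 games, 70% would be 67.2 draws; 68 is the nearest count that works out, which leaves 17 wins and 11 losses.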

Laskos · Post by **Laskos** » Sat Sep 09, 2017 10:42 am

Laskos wrote:
JJJ wrote: I'm just hoping it's not Stockfish dev.
So far this program scores 54% against Stockfish 8, like Stockfish dev did in the regression test.
I also hope so, the performance and shapes look similar to a good Stockfish dev. I can already predict that the final performance in Ordo rating after the full test (3520 games) will be in the range [3325, 3355] with 95% confidence, so this is most probably the new leader on IPON. After some 1500 games, a more involved weighted fit gives:

Which in general is indicative of Shredder and Stockfish. But against top 3, Houdini seems to fit better. Hard to say. Also, if the score of X versus Stockfish 8 is 51 to 45, and if the draw rate is close to 70% (that's the draw rate between close in strength top opponents on IPON), then the score is probably close to +17 -11 =68, not that bad even against Stockfish 8. But again, close to Stockfish dev.

If you look at the slope of Komodo with its positive Contempt, it has the opposite trend. From an older post of mine, FGRL rating list at LTC:

Stockfish had a slope similar to the one shown on IPON for this engine X, and IIRC Shredder did too. Houdini, IIRC, had almost no clear slope at all.

JJJ · Post by **JJJ** » Sat Sep 09, 2017 10:54 am

Stockfish dev would probably be a little weaker, because most engines increase their strength at mid time control.

If it is Shredder, that's good, because we're gonna have 4 top engines of about the same strength. But I don't think it is. I don't see Shredder being above Komodo after losing to it at the WCCC.

On the other hand, Houdini is coming this month, and last time it made big progress in a short time.

JJJ · Post by **JJJ** » Sat Sep 09, 2017 10:56 am

Funny thing is, I remember you making a graph of the future Elo of the engines, with Stockfish remaining the best and the others trying to catch it.

And now Stockfish might lose its first place, and maybe its second, if the rates of progress of Komodo, Houdini and Stockfish remain the same. Nice race anyway!

Edit: I'm not sure this engine would beat Stockfish dev in a direct encounter.

Laskos · Post by **Laskos** » Sat Sep 09, 2017 11:06 am

JJJ wrote:
Edit: I'm not sure this engine would beat Stockfish dev in a direct encounter.

Yes, if the score of this engine against Stockfish 8 is something like +17 -11 =68, Stockfish dev is performing at least as well against Stockfish 8. The direct matches between Stockfishes are very drawish, and generally Stockfish has some sort of "draw bug": it often draws needlessly against weaker engines, so its rating is deflated. I think the use of positive Contempt for rating lists would significantly increase Stockfish's rating on these lists.

JJJ · Post by **JJJ** » Sat Sep 09, 2017 11:17 am

I think there was some test and nothing really conclusive.

Laskos · Post by **Laskos** » Sat Sep 09, 2017 11:40 am

JJJ wrote:I think there was some test and nothing really conclusive.

I mean in rating lists, where almost all engines are weaker or much weaker. Not Fishtest self-games. In self-games Contempt surely cannot help.

IWB · Post by **IWB** » Sat Sep 09, 2017 12:01 pm

Laskos wrote:... I think the use of positive Contempt for rating lists would significantly increase Stockfish rating on these lists.

I consider that a myth repeated too often.

I played a full set of games against all opponents with SF 7 on the 18th of June 2016, and it (Version 7 at that time) gained 4 Elo with a Contempt of 20 (which was the preferred contempt in discussions at that date). 4 Elo is less than half of one SD in my list, so basically it was noise, and I might get the same by just repeating the normal SF games ...
You can find the full information on my main page under that date (and, under 20.06.16, the games of Komodo 10 without contempt).
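
To get a feel for why 4 Elo sits inside the noise, here is a rough sketch of how many games it takes to separate such a gap from zero. Everything in it is an assumption made for illustration: the helper names, the standard logistic Elo curve, a head-to-head match between two near-equal engines, the 70% draw rate mentioned earlier in the thread, and a two-standard-error threshold. A rating-list run against a pool of mostly weaker opponents behaves differently, so treat the result as an order of magnitude only.

```python
import math


def expected_score(elo_diff):
    """Expected score fraction for a given Elo advantage (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def games_to_resolve(elo_diff, draw_rate, z=2.0):
    """Games needed so the score edge equals z standard errors.

    Hypothetical helper: assumes independent games with outcomes
    1, 0.5 and 0, and a fixed draw rate.
    """
    score = expected_score(elo_diff)
    edge = score - 0.5
    win_rate = score - draw_rate / 2.0
    # per-game variance of the score: E[x^2] - E[x]^2
    variance = win_rate + 0.25 * draw_rate - score * score
    return math.ceil((z * math.sqrt(variance) / edge) ** 2)


if __name__ == "__main__":
    print(games_to_resolve(4, 0.70))
```

Under these assumptions the answer comes out at several thousand games, which is more than a typical rating-list run for one engine.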

Regards
Ingo

JJJ · Post by **JJJ** » Sat Sep 09, 2017 12:09 pm

And I think the draw weakness should be removed by not using contempt.

IWB · Post by **IWB** » Sat Sep 09, 2017 12:16 pm

After about 50% of the games the score against SF is +23 =70 -19
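
For reference, a head-to-head tally like this can be turned into a score percentage and an Elo difference with a rough error bar. The sketch below reads the tally as 23 wins, 70 draws and 19 losses; the function names are made up for illustration, and it uses the standard logistic Elo formula with a normal approximation for the 95% interval, which is not how Ordo computes the pool ratings.

```python
import math


def elo_from_score(score):
    """Elo difference implied by a score fraction (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_summary(wins, draws, losses, z=1.96):
    """Return (score, elo, elo_low, elo_high) for one head-to-head match.

    Hypothetical helper: the interval is a normal approximation on the
    score fraction, mapped through the Elo formula.
    """
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # per-game variance of the outcomes 1, 0.5, 0 around the mean score
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / games
    std_err = math.sqrt(variance / games)
    low, high = score - z * std_err, score + z * std_err
    return score, elo_from_score(score), elo_from_score(low), elo_from_score(high)


if __name__ == "__main__":
    score, elo, elo_low, elo_high = match_summary(23, 70, 19)
    print(f"score {score:.1%}, Elo {elo:+.0f} (95%: {elo_low:+.0f} to {elo_high:+.0f})")
```

After 112 games the interval still straddles zero, so the direct score alone does not show which of the two engines is stronger.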

Because you made some well-educated guesses, the rating with a proper Ordo calculation would be:

   1 NEW                          :   3353     16   80.5%    31.8    3080     100        1435.5    1152     567      65    1784
   2 Komodo 11.2.2                :   3319     11   78.3%    35.8    3073      90        2671.5    2060    1223     130    3413
   3 Stockfish 8                  :   3310      9   77.5%    39.6    3069      91        4861.5    3621    2481     170    6272
   4 Komodo 11.01                 :   3302     11   79.4%    34.3    3049      97        3317.0    2601    1432     147    4180
   5 Komodo 10.4                  :   3289     11   78.0%    36.2    3048      69        2745.5    2108    1275     137    3520
   6 Houdini 5.01                 :   3285      9   74.8%    39.9    3074     100        4358.5    3195    2327     308    5830

Huge error margin of course.

Ingo
