Positional ratinglist updated: Naum 3 improved by 82 ELO!

Yarget

Post by Yarget »

Hello everyone!

As some of you might know, I have recently started to compile 2 rating lists based on fixed opening positions. One is based on 10 positional and mostly closed positions, while the other is based on 10 gambits (including some very sharp ones). For further details look here:

http://64.68.157.89/forum/viewtopic.php?t=18891

A couple of days ago I presented the Gambit test results for Naum 3, in which the new Naum version proved to be 67 ELO stronger than its predecessor. Look here for details:

http://64.68.157.89/forum/viewtopic.php?t=19653

I have now completed the Positional test games for Naum 3. Before presenting the results and the updated rating list, let's have a look at the latest Positional rating list:

    Program                          Elo    +   -   Games   Score   Av.Op.  Draws

  1 Rybka 2.3.2a mp 32-bit         : 2942   41  40   220    70.9 %   2787   30.0 %
  2 Toga II 1.4 beta5c             : 2855   38  37   220    58.6 %   2795   34.5 %
  3 Deep Shredder 11 UCI           : 2835   38  38   220    55.5 %   2796   32.7 %
  4 Deep Fritz 10                  : 2832   38  38   220    55.0 %   2797   30.9 %
  5 Zap!Chess Zanzibar             : 2807   37  37   220    51.1 %   2799   35.0 %
  6 LoopMP 11A.32                  : 2790   37  37   220    48.4 %   2801   36.8 %
  7 SpikeMP 1.2 Turin              : 2790   37  37   220    48.4 %   2801   34.1 %
  8 Deep Junior 10.1               : 2787   42  42   220    48.0 %   2801   18.6 %
  9 HIARCS 11.1 MP UCI             : 2781   37  37   220    47.0 %   2801   35.0 %
 10 Naum 2.2                       : 2774   35  36   220    45.9 %   2802   40.9 %
 11 Glaurung 2.0.1                 : 2709   39  39   220    36.1 %   2808   30.5 %
 12 bright-0.2c                    : 2701   39  40   220    35.0 %   2809   30.0 %

As you can see, Naum 2.2 had a rather difficult time in the positional games. It was simply drawing too many games (40.9%!) to achieve a strong result. Here are the results for Naum 3 in the Positional games:

2 Naum 3                    : 2856  220 (+ 92,= 74,- 54), 58.6 %

Rybka 2.3.2a mp 32-bit        :  20 (+  4,= 12,-  4), 50.0 %
Deep Shredder 11 UCI          :  20 (+  7,=  4,-  9), 45.0 %
Deep Junior 10.1              :  20 (+  8,=  7,-  5), 57.5 %
HIARCS 11.1 MP UCI            :  20 (+ 11,=  7,-  2), 72.5 %
Deep Fritz 10                 :  20 (+  5,=  7,-  8), 42.5 %
LoopMP 11A.32                 :  20 (+ 12,=  4,-  4), 70.0 %
SpikeMP 1.2 Turin             :  20 (+  7,=  6,-  7), 50.0 %
Glaurung 2.0.1                :  20 (+ 13,=  6,-  1), 80.0 %
Zap!Chess Zanzibar            :  20 (+  8,=  9,-  3), 62.5 %
Toga II 1.4 beta5c            :  20 (+  7,=  7,-  6), 52.5 %
bright-0.2c                   :  20 (+ 10,=  5,-  5), 62.5 %
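
The score percentages in the results above follow directly from the win/draw/loss counts, with a draw counting as half a point. A minimal sketch of that arithmetic (the helper name `score_and_draw_rate` is made up for illustration and is not part of any rating tool):

```python
# Minimal sketch: reproduce score and draw percentages from win/draw/loss counts.
# `score_and_draw_rate` is a hypothetical helper, not part of any rating program.

def score_and_draw_rate(wins, draws, losses):
    """Return (score %, draw %); a draw counts as half a point."""
    games = wins + draws + losses
    score = 100.0 * (wins + 0.5 * draws) / games
    draw_rate = 100.0 * draws / games
    return score, draw_rate

# Naum 3 totals in the Positional test: +92 =74 -54
total_score, total_draws = score_and_draw_rate(92, 74, 54)
print(f"Naum 3 overall: {total_score:.1f} % score, {total_draws:.1f} % draws")

# Match against Rybka 2.3.2a: +4 =12 -4
rybka_score, rybka_draws = score_and_draw_rate(4, 12, 4)
print(f"vs Rybka: {rybka_score:.1f} % score, {rybka_draws:.1f} % draws")
```

The overall figures come out at 58.6 % score and 33.6 % draws, matching the Naum 3 totals line.
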

Pay attention to the first result: yes, Naum 3 managed to draw the match against Rybka!! Over the last 2-3 years I have run many test games with Rybka (PEJ-Ratinglist, CSS SMP Ratinglist and now these tests), and it has won every single match, no matter the conditions or which Rybka version I was testing. This is indeed an amazing performance by Naum, and it indicates that a lot of quality is hidden in this engine. Having said that, Naum performed less well in some of the other matches (defeats against Fritz and Shredder, and only 10-10 against Spike). All in all Naum 3 put in a good performance, as the following updated Positional rating list shows:

    Program                          Elo    +   -   Games   Score   Av.Op.  Draws

  1 Rybka 2.3.2a mp 32-bit         : 2928   40  39   220    69.1 %   2788   31.8 %
  2 Naum 3                         : 2856   38  38   220    58.6 %   2795   33.6 %
  3 Toga II 1.4 beta5c             : 2848   37  37   220    57.5 %   2796   35.0 %
  4 Deep Shredder 11 UCI           : 2841   39  39   220    56.4 %   2796   30.0 %
  5 Deep Fritz 10                  : 2832   39  39   220    55.0 %   2797   30.0 %
  6 SpikeMP 1.2 Turin              : 2788   38  38   220    48.2 %   2801   32.7 %
  7 Zap!Chess Zanzibar             : 2787   37  37   220    48.0 %   2801   36.8 %
  8 LoopMP 11A.32                  : 2783   37  37   220    47.3 %   2802   34.5 %
  9 Deep Junior 10.1               : 2781   42  42   220    47.0 %   2802   18.6 %
 10 HIARCS 11.1 MP UCI             : 2767   37  37   220    44.8 %   2803   35.0 %
 11 bright-0.2c                    : 2700   40  41   220    34.8 %   2809   27.7 %
 12 Glaurung 2.0.1                 : 2690   40  40   220    33.4 %   2810   28.6 %

The exact difference in playing strength between Naum 2.2 and Naum 3 in the Positional games is therefore 82 rating points (2856 vs. 2774)! This is indeed a strong and substantial improvement achieved by Alex Naumov, although the original aim (a 100 ELO improvement) wasn't confirmed under these test conditions. Rybka and Vas are still the dominant number 1, but Naum (at least under these test conditions) has passed engines like Shredder, Toga and Fritz, if only by a few rating points. Besides the obvious improvement in playing strength, it's worth mentioning that the draw frequency of Naum 3 has decreased a lot compared to Naum 2.2 (from 40.9% to 33.6%). It's my impression that Naum 3 plays more aggressively than Naum 2.2, and I consider the substantial decrease in draws another expression of this.
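
As a rough cross-check of the 82-point figure, the ratings can be approximated from the Score and Av.Op. columns with the standard logistic Elo formula. This is only a sketch under that assumption: the program that actually produced the lists is not named in this post and may use a different model, and `performance_rating` is a hypothetical helper.

```python
import math

def performance_rating(score_pct, avg_opponent):
    """Rating implied by a percentage score against a field.

    Assumption: the logistic Elo model, where a rating difference d
    gives an expected score of 1 / (1 + 10 ** (-d / 400)).
    """
    s = score_pct / 100.0
    return avg_opponent + 400.0 * math.log10(s / (1.0 - s))

# Score and Av.Op. taken from the two Positional lists
naum3 = performance_rating(58.6, 2795)   # listed as 2856
naum22 = performance_rating(45.9, 2802)  # listed as 2774

print(f"Naum 3:   {naum3:.0f}")
print(f"Naum 2.2: {naum22:.0f}")
print(f"Gain:     {naum3 - naum22:.0f}")
```

Both approximations land within a point or two of the listed ratings, so the gain comes out very close to 82 as well.
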

As a result of the completed Naum 3 test I have updated the "allround" rating list (the 2 lists combined):

    Program                          Elo    +   -   Games   Score   Av.Op.  Draws

  1 Rybka 2.3.2a mp 32-bit         : 2942   29  29   440    71.0 %   2787   28.9 %
  2 Naum 3                         : 2872   28  28   440    61.1 %   2793   29.5 %
  3 Toga II 1.4 beta5c             : 2854   27  27   440    58.4 %   2795   32.7 %
  4 Deep Shredder 11 UCI           : 2844   28  28   440    56.9 %   2796   28.0 %
  5 Deep Fritz 10                  : 2826   28  28   440    54.2 %   2797   25.7 %
  6 LoopMP 11A.32                  : 2796   27  27   440    49.4 %   2800   31.6 %
  7 HIARCS 11.1 MP UCI             : 2795   27  27   440    49.3 %   2800   30.9 %
  8 Zap!Chess Zanzibar             : 2782   27  27   440    47.3 %   2801   32.3 %
  9 Deep Junior 10.1               : 2735   30  30   440    40.0 %   2806   17.7 %
 10 SpikeMP 1.2 Turin              : 2734   28  28   440    39.9 %   2806   28.0 %
 11 Glaurung 2.0.1                 : 2718   28  28   440    37.5 %   2807   26.8 %
 12 bright-0.2c                    : 2700   29  29   440    34.9 %   2809   23.9 %

Not surprisingly, Naum 3 is second in this rating list and exactly 75 ELO stronger than Naum 2.2. The gap up to Rybka is now 70 ELO, so if Alex could make another improvement of about 75 ELO tomorrow, Naum would equal Rybka's playing strength :)

Although Naum 3 is very strong and much improved over Naum 2.2, I have the feeling that Deep Fritz 11 is going to be even stronger and will perhaps reach Rybka. However, Rybka 3 is not far away and is apparently a clear improvement on the current version 2.3.2a. We'll see.

Regards
Per
Spock

Re: Positional ratinglist updated: Naum 3 improved by 82 ELO

Post by Spock »

Thanks for the update Per.

Under most conditions I think Naum 3 is probably in the range of 60-85 ELO stronger. I got +60 ELO for FRC blitz on a single CPU, and that, I think, is a worst-case scenario for Naum. Naum prefers normal chess, I think; give it more time and more CPUs and it starts to shine.
Erik Roggenburg

Re: Positional ratinglist updated: Naum 3 improved by 82 ELO

Post by Erik Roggenburg »

Looks like Naum 3 is about $1 per Elo point gained since Naum 2.2. :D

Not too shabby.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Positional ratinglist updated: Naum 3 improved by 82 ELO

Post by Laskos »

I put both of your results, Positional and Gambit, in one post.

Positional rating list:

    Program                          Elo    +   -   Games   Score   Av.Op.  Draws

  1 Rybka 2.3.2a mp 32-bit         : 2928   40  39   220    69.1 %   2788   31.8 %
  2 Naum 3                         : 2856   38  38   220    58.6 %   2795   33.6 %
  3 Toga II 1.4 beta5c             : 2848   37  37   220    57.5 %   2796   35.0 %
  4 Deep Shredder 11 UCI           : 2841   39  39   220    56.4 %   2796   30.0 %
  5 Deep Fritz 10                  : 2832   39  39   220    55.0 %   2797   30.0 %
  6 SpikeMP 1.2 Turin              : 2788   38  38   220    48.2 %   2801   32.7 %
  7 Zap!Chess Zanzibar             : 2787   37  37   220    48.0 %   2801   36.8 %
  8 LoopMP 11A.32                  : 2783   37  37   220    47.3 %   2802   34.5 %
  9 Deep Junior 10.1               : 2781   42  42   220    47.0 %   2802   18.6 %
 10 HIARCS 11.1 MP UCI             : 2767   37  37   220    44.8 %   2803   35.0 %
 11 bright-0.2c                    : 2700   40  41   220    34.8 %   2809   27.7 %
 12 Glaurung 2.0.1                 : 2690   40  40   220    33.4 %   2810   28.6 %



Gambit rating list:

    Program                          Elo    +   -   Games   Score   Av.Op.  Draws

  1 Rybka 2.3.2a mp 32-bit         : 2958   43  42   220    73.0 %   2786   25.9 %
  2 Naum 3                         : 2889   41  41   220    63.6 %   2792   25.5 %
  3 Toga II 1.4 beta5c             : 2860   39  39   220    59.3 %   2794   30.5 %
  4 Deep Shredder 11 UCI           : 2848   40  40   220    57.5 %   2796   25.9 %
  5 HIARCS 11.1 MP UCI             : 2825   40  39   220    53.9 %   2798   26.8 %
  6 Deep Fritz 10                  : 2822   41  41   220    53.4 %   2798   21.4 %
  7 LoopMP 11A.32                  : 2810   39  39   220    51.6 %   2799   28.6 %
  8 Zap!Chess Zanzibar             : 2778   39  39   220    46.6 %   2802   27.7 %
  9 Glaurung 2.0.1                 : 2746   40  40   220    41.6 %   2805   25.0 %
 10 bright-0.2c                    : 2701   42  43   220    35.0 %   2809   20.0 %
 11 Deep Junior 10.1               : 2687   44  45   220    33.0 %   2810   16.8 %
 12 SpikeMP 1.2 Turin              : 2677   42  43   220    31.6 %   2811   23.2 %


Positional minus Gambit (Elo difference):

Rybka -30
Naum3 -33
Toga -12
Shredder -7
Fritz +10
Spike +111
Zappa +9
Loop -27
Junior +94
Hiarcs -58
bright -1
Glaurung -56



The results for Spike and Junior are certainly significant. For Hiarcs and Glaurung we can be almost certain, and for Naum and Rybka we can suspect, that they prefer Gambit games.
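
One rough way to judge which of these differences stand out is to compare each one with the combined error margin of the two lists. The sketch below assumes that the + and - columns are roughly 95 % margins and that the two lists are statistically independent; neither is stated in the thread, and `combined_margin` is a hypothetical helper.

```python
import math

# name: (Positional Elo, +, -, Gambit Elo, +, -), copied from the two lists above
ratings = {
    "Rybka":    (2928, 40, 39, 2958, 43, 42),
    "Naum 3":   (2856, 38, 38, 2889, 41, 41),
    "Toga":     (2848, 37, 37, 2860, 39, 39),
    "Shredder": (2841, 39, 39, 2848, 40, 40),
    "Fritz":    (2832, 39, 39, 2822, 41, 41),
    "Spike":    (2788, 38, 38, 2677, 42, 43),
    "Zappa":    (2787, 37, 37, 2778, 39, 39),
    "Loop":     (2783, 37, 37, 2810, 39, 39),
    "Junior":   (2781, 42, 42, 2687, 44, 45),
    "Hiarcs":   (2767, 37, 37, 2825, 40, 39),
    "bright":   (2700, 40, 41, 2701, 42, 43),
    "Glaurung": (2690, 40, 40, 2746, 40, 40),
}

def combined_margin(pos_plus, pos_minus, gam_plus, gam_minus):
    """Root-sum-square of the two averaged error margins.

    Assumption: the margins are ~95 % intervals from independent samples.
    """
    pos_err = (pos_plus + pos_minus) / 2.0
    gam_err = (gam_plus + gam_minus) / 2.0
    return math.hypot(pos_err, gam_err)

diffs = {}
significant = []
for name, (pos, pp, pm, gam, gp, gm) in ratings.items():
    diffs[name] = pos - gam
    margin = combined_margin(pp, pm, gp, gm)
    if abs(diffs[name]) > margin:
        significant.append(name)
    print(f"{name:<9} {diffs[name]:+5d}  (combined margin about {margin:.0f})")

print("Outside the combined margin:", ", ".join(significant))
```

With 220 games per engine per list the combined margins are large (over 50 ELO), so only Spike and Junior clear them comfortably, while Hiarcs and Glaurung sit close to the threshold.
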
Kai
Yarget

Re: Positional ratinglist updated: Naum 3 improved by 82 ELO

Post by Yarget »

Hello Ray, Erik and Kai!

Thanks for your comments.

To you Ray: yeah, I think you are right that Naum's improvement is in the range of 60-85 ELO points. However, I won't be surprised if Heinz at CEGT 40/120 on a Quad can squeeze some more ELO points out of Naum. Long TCs and big hardware should favour Naum.

To you Erik: Have you started testing Naum 3? I always follow your rating list with great interest.

To you Kai: Thanks for updating the "sensitive" list. Yes, Spike and Junior remain very sensitive/unsensitive while Bright really is to be considered a great all-round engine.

Regards
Per
Yarget

Re: Positional ratinglist updated: Naum 3 improved by 82 ELO

Post by Yarget »

"sensitive/unsensitive"
Oops, I meant sensitive/unbalanced :)
Erik Roggenburg

Re: Positional ratinglist updated: Naum 3 improved by 82 ELO

Post by Erik Roggenburg »

Yarget wrote: Hello Ray, Erik and Kai!

Thanks for your comments.

To you Ray: yeah, I think you are right that Naum's improvement is in the range of 60-85 ELO points. However, I won't be surprised if Heinz at CEGT 40/120 on a Quad can squeeze some more ELO points out of Naum. Long TCs and big hardware should favour Naum.

To you Erik: Have you started testing Naum 3? I always follow your rating list with great interest.

To you Kai: Thanks for updating the "sensitive" list. Yes, Spike and Junior remain very sensitive/unsensitive while Bright really is to be considered a great all-round engine.

Regards
Per

Actually, I've been busy using Naum 3 as an analysis partner for some middlegame positions in a line of the Ruy Lopez. I haven't had a chance to test Naum 3 yet, but I'd like to, for a frame of reference.

-Erik
Uri Blass
Posts: 10927
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Positional ratinglist updated: Naum 3 improved by 82 ELO

Post by Uri Blass »

Yarget wrote: Hello Ray, Erik and Kai!

Thanks for your comments.

To you Ray: yeah, I think you are right that Naum's improvement is in the range of 60-85 ELO points. However, I won't be surprised if Heinz at CEGT 40/120 on a Quad can squeeze some more ELO points out of Naum. Long TCs and big hardware should favour Naum.

To you Erik: Have you started testing Naum 3? I always follow your rating list with great interest.

To you Kai: Thanks for updating the "sensitive" list. Yes, Spike and Junior remain very sensitive/unsensitive while Bright really is to be considered a great all-round engine.

Regards
Per

I see no reason to expect Naum 3 to gain more from long time controls.
Based on everything I have seen, the ranking of the top programs is almost the same at all time controls.

Judging by the rating lists, Naum 2.2 already seems to gain slightly more from long time controls, so I see no reason to expect Naum 3 to gain more than Naum 2.2 does.

Uri
Spock

Re: Positional ratinglist updated: Naum 3 improved by 82 ELO

Post by Spock »

Agreed, I doubt that the 40/120 testing would show anything more than +85.
Yarget

Re: Positional ratinglist updated: Naum 3 improved by 82 ELO

Post by Yarget »

I agree that the chance of Naum 3 exceeding +85 ELO over Naum 2.2 in CEGT Quad 40/120 is less than 50%. However, as Heinz van Kempen wrote recently, "especially what Naum concerns all depends on time and amount of cores". The following statistics show that Naum is one of the engines that benefits most from longer TCs:

http://www.husvankempen.de/nunn/Replay/ ... arison.htm

Let's wait and see the CEGT 40/120 Quad results.

Regards
Per