Gambit ratinglist updated: Naum 3 improved by 67 ELO

Yarget · Post by **Yarget** » Sat Feb 16, 2008 12:04 am

Hello everyone!

As some of you might know, I have recently started two rating lists based on fixed opening positions. One is based on 10 positional, mostly closed positions, while the other is based on 10 gambits (including some very sharp ones). For further details, look here:

http://64.68.157.89/forum/viewtopic.php?t=18891

I have now completed the gambit tests for the new Naum 3. Before presenting the results and the updated rating list, let's have a look at the gambit rating list as it stood before Naum 3 was added:

    Program                          Elo    +   -   Games   Score   Av.Op.  Draws 

  1 Rybka 2.3.2a mp 32-bit         : 2956   42  42   220    72.7 %   2786   27.3 % 
  2 Toga II 1.4 beta5c             : 2869   39  39   220    60.7 %   2794   29.5 % 
  3 Deep Shredder 11 UCI           : 2852   40  40   220    58.2 %   2795   25.5 % 
  4 Deep Fritz 10                  : 2833   41  41   220    55.2 %   2797   20.5 % 
  5 HIARCS 11.1 MP UCI             : 2829   39  39   220    54.5 %   2797   29.1 % 
  6 Naum 2.2                       : 2822   39  38   220    53.4 %   2798   30.5 % 
  7 LoopMP 11A.32                  : 2811   39  39   220    51.8 %   2799   28.2 % 
  8 Zap!Chess Zanzibar             : 2794   39  39   220    49.1 %   2800   27.3 % 
  9 Glaurung 2.0.1                 : 2749   40  40   220    42.0 %   2805   25.0 % 
 10 bright-0.2c                    : 2715   41  42   220    37.0 %   2808   23.2 % 
 11 Deep Junior 10.1               : 2690   44  44   220    33.4 %   2810   16.8 % 
 12 SpikeMP 1.2 Turin              : 2679   42  42   220    31.8 %   2811   24.5 %

Naum 2.2 did quite well in the gambit games. Here are the results for Naum 3 in the same games:

2 Naum 3                    : 2889  220 (+112,= 56,- 52), 63.6 %

Rybka 2.3.2a mp 32-bit        :  20 (+  3,=  4,- 13), 25.0 %
Deep Shredder 11 UCI          :  20 (+  6,=  9,-  5), 52.5 %
Deep Junior 10.1              :  20 (+ 13,=  2,-  5), 70.0 %
Deep Fritz 10                 :  20 (+ 10,=  7,-  3), 67.5 %
HIARCS 11.1 MP UCI            :  20 (+ 10,=  3,-  7), 57.5 %
Glaurung 2.0.1                :  20 (+ 11,=  6,-  3), 70.0 %
LoopMP 11A.32                 :  20 (+  9,=  6,-  5), 60.0 %
SpikeMP 1.2 Turin             :  20 (+ 15,=  3,-  2), 82.5 %
Zap!Chess Zanzibar            :  20 (+ 11,=  7,-  2), 72.5 %
Toga II 1.4 beta5c            :  20 (+ 10,=  5,-  5), 62.5 %
bright-0.2c                   :  20 (+ 14,=  4,-  2), 80.0 %
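
The percentages in the list above follow from the win/draw/loss counts, with a draw counted as half a point. A minimal Python sketch for checking them (the function name `score_percent` is only illustrative, it is not part of any test GUI):

```python
def score_percent(wins: int, draws: int, losses: int) -> float:
    """Score in percent, counting a win as 1 point and a draw as 0.5."""
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no games played")
    return 100.0 * (wins + 0.5 * draws) / games

# Naum 3 overall, from the list above: +112 =56 -52
print(f"{score_percent(112, 56, 52):.1f} %")   # 63.6 %
# Naum 3 against Rybka 2.3.2a: +3 =4 -13
print(f"{score_percent(3, 4, 13):.1f} %")      # 25.0 %
```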

Comparing the individual match results of Naum 3 and its predecessor, Naum 3 has improved every one of them with a single exception: Naum 2.2 lost "only" 5½-14½ to Rybka, while the new version lost 5-15! Having said that, Naum 3 put in a strong performance overall. I've watched many of the games, and my impression is that the new version represents a big step forward for Naum. Here is the updated gambit rating list:

     Program                          Elo    +   -   Games   Score   Av.Op.  Draws

  1 Rybka 2.3.2a mp 32-bit         : 2958   43  42   220    73.0 %   2786   25.9 %
  2 Naum 3                         : 2889   41  41   220    63.6 %   2792   25.5 %
  3 Toga II 1.4 beta5c             : 2860   39  39   220    59.3 %   2794   30.5 %
  4 Deep Shredder 11 UCI           : 2848   40  40   220    57.5 %   2796   25.9 %
  5 HIARCS 11.1 MP UCI             : 2825   40  39   220    53.9 %   2798   26.8 %
  6 Deep Fritz 10                  : 2822   41  41   220    53.4 %   2798   21.4 %
  7 LoopMP 11A.32                  : 2810   39  39   220    51.6 %   2799   28.6 %
  8 Zap!Chess Zanzibar             : 2778   39  39   220    46.6 %   2802   27.7 %
  9 Glaurung 2.0.1                 : 2746   40  40   220    41.6 %   2805   25.0 %
 10 bright-0.2c                    : 2701   42  43   220    35.0 %   2809   20.0 %
 11 Deep Junior 10.1               : 2687   44  45   220    33.0 %   2810   16.8 %
 12 SpikeMP 1.2 Turin              : 2677   42  43   220    31.6 %   2811   23.2 %

The measured difference in playing strength between Naum 2.2 and Naum 3 in the gambit games is therefore 67 rating points (2889 vs. 2822, each with an error margin of roughly ±40)! This is a strong and substantial improvement by Alex Naumov, although the original aim (a 100 Elo improvement) wasn't confirmed under these test conditions. Rybka and Vas are still the dominant number 1, but Naum (at least under these test conditions) has passed engines like Shredder, Toga, Fritz and Hiarcs. Besides the improvement in playing strength, two things are worth mentioning about Naum 3 in the gambit tests:

1) It's my clear impression that Naum 3 plays a more aggressive style than Naum 2.2. The difference in draw rates seems to support that impression: Naum 2.2 drew 30.5% of its games, while Naum 3 drew only 25.5%.

2) Another argument for a more aggressive playing style (and for improved strength in sharp, unbalanced positions) is that Naum 3 performed extremely well in what are, in my opinion, the 2 sharpest gambit openings in my tests: the Danish (or Nordic) Gambit, in which Naum 3 scored 70.45%, and the Latvian Gambit, in which it scored an incredible 79.55%!! The latter result is especially amazing, as many engines score close to 50% in this opening (many wins with White and many losses with Black).
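
As a cross-check on the 67-point figure quoted above, the ratings can be approximated from the Score and Av.Op. columns with the standard logistic Elo formula. This is only a sketch: the thread does not say which rating program produced the lists, so both the formula and the function name `performance_rating` are assumptions, not the tool's actual method.

```python
import math

def performance_rating(avg_opponent: float, score: float) -> float:
    """Logistic Elo estimate: average opponent rating + 400 * log10(p / (1 - p)).

    `score` is the score as a fraction (0.636 for 63.6 %).
    """
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return avg_opponent + 400.0 * math.log10(score / (1.0 - score))

# Score and Av.Op. values taken from the two rating lists above.
naum_22 = performance_rating(2798, 0.534)  # Naum 2.2, old list
naum_3 = performance_rating(2792, 0.636)   # Naum 3, updated list
print(round(naum_22), round(naum_3), round(naum_3 - naum_22))
```

Under this assumption the estimates land within a point or two of the listed 2822 and 2889, so the 67-point gap is consistent with the score percentages.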

I will now begin the positional tests for Naum 3. Perhaps a bit surprisingly, Naum 2.2 had a rather difficult time in these tests. Let's see if the same will be true for Naum 3.

Regards
Per

geots · Post by **geots** » Sat Feb 16, 2008 12:19 am

Excellent work! Thanks, Per.

Regards,

Yarget · Post by **Yarget** » Sat Feb 16, 2008 1:27 pm

Thanks for your interest, George. You should know that I follow your Naum 3 test results closely. As I reported yesterday, Naum 3 performed very well in what are, in my opinion, the 2 sharpest gambits included in the gambit tests. I have taken a closer look at the results in the Danish (or Nordic) Gambit, in which White sacrifices 2 pawns (1. e4 e5 2. d4 exd4 3. c3 dxc3 4. Bc4 cxb2). Based on these results I have made a small rating list in which only the results from the Danish Gambit are included:

    Program                          Elo    +   -   Games   Score   Av.Op.  Draws

  1 Naum 3                         : 2945  160 149    22    70.5 %   2794   13.6 %
  2 Zap!Chess Zanzibar             : 2928  150 142    22    68.2 %   2796   18.2 %
  3 Deep Fritz 10                  : 2912  154 145    22    65.9 %   2797   13.6 %
  4 Toga II 1.4 beta5c             : 2896  135 130    22    63.6 %   2799   27.3 %
  5 Rybka 2.3.2a mp 32-bit         : 2881  128 125    22    61.4 %   2800   31.8 %
  6 LoopMP 11A.32                  : 2851  145 142    22    56.8 %   2803   13.6 %
  7 Deep Shredder 11 UCI           : 2807  120 120    22    50.0 %   2807   36.4 %
  8 HIARCS 11.1 MP UCI             : 2778  129 131    22    45.5 %   2809   27.3 %
  9 Glaurung 2.0.1                 : 2763  134 136    22    43.2 %   2811   22.7 %
 10 bright-0.2c                    : 2748  138 142    22    40.9 %   2812   18.2 %
 11 SpikeMP 1.2 Turin              : 2612  152 165    22    22.7 %   2825   18.2 %
 12 Deep Junior 10.1               : 2480  214 243    22    11.4 %   2837    4.5 %

Needless to say, with only 22 games per engine the statistical basis is very weak. However, my main point, that Naum 3 likes to play the sharp gambits (its score in the Latvian Gambit was an incredible 79.55%), should be clear enough.
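
To put a rough number on how weak 22 games are, here is a minimal sketch that estimates an interval for the Elo difference against the field from the win/draw/loss counts. It assumes the logistic Elo model and a normal approximation at about 95 % confidence; the rating program behind the list is not named in this thread, so its +/- columns will not match exactly. The function names are illustrative only.

```python
import math

def elo_diff(score: float) -> float:
    """Elo difference implied by a score fraction under the logistic model."""
    p = min(max(score, 0.001), 0.999)  # clamp so the logarithm stays defined
    return 400.0 * math.log10(p / (1.0 - p))

def elo_interval(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (low, mid, high) Elo difference against the field.

    Normal approximation on the per-game results (1, 0.5, 0);
    z = 1.96 corresponds to roughly 95 % confidence.
    """
    n = wins + draws + losses
    if n == 0:
        raise ValueError("no games played")
    mean = (wins + 0.5 * draws) / n
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * (0.0 - mean) ** 2) / n
    half_width = z * math.sqrt(var / n)
    return elo_diff(mean - half_width), elo_diff(mean), elo_diff(mean + half_width)

# Naum 3 in the Danish Gambit: 70.5 % with 13.6 % draws over 22 games,
# which works out to +14 =3 -5 (derived from the list, not stated in it).
low, mid, high = elo_interval(14, 3, 5)
print(f"{mid:+.0f} Elo vs. the field, plausible range {low:+.0f} to {high:+.0f}")
```

The resulting range spans well over 200 Elo points, the same order of magnitude as the +/- columns in the list above.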

I have now started the positional test games for Naum 3, beginning with the match against Zap!Chess Zanzibar 2CPU. Normally I use "Run the gauntlet" for my tests, but Zap is an exception due to occasional load and unload problems (I can avoid the load and unload process by running a "Match" between two engines). Naum 2.2 had a hard match against Zap in the positional games:

Naum 2.2 2CPU - Zap!Chess Zanzibar 2CPU 5½-14½

Naum 3 performed much better:

Naum 3 2CPU - Zap!Chess Zanzibar 2CPU 12½-7½ (Rybka won this match 14-6)

Needless to say, 20 games are "nothing", but it's still a remarkable difference. Now the rest of the matches can start (run the gauntlet). Let's see if Naum can keep up this high level.

Regards
Per

geots · Post by **geots** » Sat Feb 16, 2008 4:18 pm

Per, this is some extremely interesting stuff you have here. And a different take from the usual "I beat you, you beat me" that seems to be the norm. There is a wealth of info here. Thanks so much.

Best,

George

Uri Blass · Post by **Uri Blass** » Sat Feb 16, 2008 4:44 pm

Thanks.
It is nice to see a rating list in which Rybka is not number 1.

It may be interesting if people played more games from this opening at different time controls.

I do not like seeing Rybka as number 1 in all rating lists when it is obvious that Rybka still has weaknesses.

I think it would be good if rating lists could tell people which engine to buy besides Rybka, and without an anti-Rybka rating list they clearly fail in this task.

People who use Rybka may prefer to buy an engine that is 150 Elo weaker than Rybka (CCRL, CEGT) but has an advantage over Rybka in some openings, rather than an engine that is 100 Elo weaker than Rybka and has no advantage in any opening.

Uri

Yarget · Post by **Yarget** » Sat Feb 16, 2008 9:54 pm

True, Uri, it's a bit unusual to see a rating list in which Rybka is not leading. To be sure, Rybka does lead the gambit rating list I'm running, but not if you select only the Danish/Nordic Gambit. In this and other forums there has been some discussion about whether Naum 3 is worth the money. Although Rybka is still leading almost every rating list, I think Naum 3 is a good buy, and its performance in the Danish Gambit is an expression of this. From a general point of view Rybka is number 1, but there are special areas in which other engines are stronger.

Regards
Per
