Ratinglist based on positional openingpositions

Yarget

Re: Ratinglist based on positional openingpositions

Post by Yarget »

I'm glad that my tests have prompted several comments. Naturally I'm doing these tests because it's fun and interesting (like most testers, I guess), but if there seems to be some interest in your tests then it's even more interesting.

Now that I have 12 engines in my rating lists, it might be interesting to reflect on some of the results so far. Earlier in this thread I presented a list which shows how "sensitive" (or unbalanced) the engines are. The higher the number, the more "sensitive" the engine is to the type of test games selected, and vice versa. Starting with the most "sensitive" engines, the list looks like this:

1. SpikeMP 1.2 Turin 111 ratingpoints
2. Deep Junior 10.1 97 ratingpoints
3-4. Hiarcs 11.1 MP & Naum 2.2 each 48 ratingpoints
5. Glaurung 2.0.1 40 ratingpoints
6. LoopMP 11A.32 21 ratingpoints
7. Deep Shredder 11 17 ratingpoints
8-10. Toga II 1.4 beta5c, Rybka 2.3.2a mp & Bright 0.2c each 14 ratingpoints
11. Zap!Chess Zanzibar 13 ratingpoints
12. Deep Fritz 10 1 ratingpoint

In other words: for Fritz it doesn't matter at all whether it plays the gambits or the positional games while engines like Junior and Spike are very "sensitive" or unbalanced engines.
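
A minimal sketch of how the "sensitivity" number can be read, assuming it is simply the absolute difference between an engine's gambit rating and its positional rating. That assumption fits the Naum 2.2 figures quoted in this thread (2822 and 2774, a gap of 48 points); the function name is only illustrative.

```python
def sensitivity(gambit_elo: int, positional_elo: int) -> int:
    """Absolute gap between the two ratings (assumed definition).

    A value of 0 would mean the engine performs identically in both lists.
    """
    return abs(gambit_elo - positional_elo)


# Naum 2.2's two ratings as quoted in this thread.
naum_gap = sensitivity(gambit_elo=2822, positional_elo=2774)
print(naum_gap)  # 48
```

Because the gap is taken as an absolute value, the number alone does not say which of the two lists an engine does better in.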

To me the biggest surprises so far are the results of Naum and Spike. I have always considered Naum a solid, strong and indeed positional engine, and therefore I expected this fine engine to do well in the positional games and less well in the gambit games. Well, the results show the opposite: Naum 2.2 scored 2822 in the gambit games but only 2774 in the positional games! Statistics has taught us to be careful about drawing big, firm conclusions, but I still find these results amazing. Earlier in this thread Marek contributed a very interesting explanation for this:
Getting engines to play with a positional book ought to be thought of as a test of their dynamism in quiet positions; whereas getting engines to play with a gambit book is a test of their soundness in wild positions.

In other words, the Gambit Rating could be renamed the Soundness Rating, while the Positional Rating could be renamed the Dynamism Rating. An engine that is further down the Soundness Rating List is probably playing too wildly; an engine that is further down the Dynamism Rating List is probably playing too quietly. An engine that is equally strong in both lists would be a balanced engine.


In a way this is "turning things upside down", but the more I reflect on it, the more likely I find it. Naum is doing fine in the gambits because these openings need a sound and "clever" approach (the opening is already wild), and obviously that works for Naum. By contrast, in the positional games a wilder, more dynamic engine is needed to unbalance the games, and here Naum is doing less well. Take a look at Naum's draw frequency in the positional games: it's over 40%! Naum is simply drawing too many of the positional games to achieve a good overall performance. This interpretation could also explain why positional engines like Rybka and Shredder are doing better in the gambit games than in the positional games (check the rating lists; I know the rating difference is smaller, but still...), and it would certainly explain why the ambitious Junior (look at its low draw frequency) is doing so well in the positional games and so poorly in the gambits. Again, this interpretation is "turning things upside down", but it can explain some of the "unexpected" results: Junior, Naum, Shredder, Rybka and also Spike, if you consider Spike to be a dynamic, sharp engine. Admittedly, Spike's high draw frequency in the positional games (in contrast to Junior's) does not support that view of Spike.

I'm looking forward to testing Naum 3. If the above interpretation is correct, and if Naum 3 turns out to be a solid, strong, positional engine that is about 100 Elo stronger, then it will make a terrific performance in the gambit test! We'll see.

Regards
Per
Yarget

Re: Ratinglist based on positional openingpositions

Post by Yarget »

After a small delay I have started testing Naum 3. I am very excited to see the difference (small or big) between Naum 2.2 and Naum 3 under my rather special test conditions. I doubt that Naum 3 is optimized for my test conditions, but a general improvement in playing strength should of course also result in better performances in my tests. One thing is for sure: Naum 3 has made a terrific start. I have started with the gambit games and with the match against Zap!Chess Zanzibar 2CPU. Normally I use "Run the gauntlet" for my tests, but Zap is an exception because of occasional load and unload problems (I can avoid the load and unload process by running a "Match" between two engines). Although Naum 2.2 is ranked higher than Zap! in the gambit rating list, it had a hard match against Zap in the gambit games:

Naum 2.2 2CPU - Zap!Chess Zanzibar 2CPU 9-11

Naum 3 performed much better:

Naum 3 2CPU - Zap!Chess Zanzibar 2CPU 14½-5½ (Rybka won this match 14-6)

Needless to say, 20 games is "nothing", but it's still a remarkable difference. Now the rest of the matches can start (run the gauntlet). Let's see if Naum can keep up this high level.
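
To put a rough number on how little 20 games can say, here is a small sketch that computes an approximate 95% range for the true score behind each of the two match results above. It assumes a simple normal approximation with variance p(1-p)/n, which ignores draws; since draws only lower the per-game variance, the ranges err on the wide side. The function name is illustrative, and the approximation is crude for so few games.

```python
import math


def score_interval(points: float, games: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate interval for the true score fraction of a match.

    Uses p*(1-p)/n as the variance of the mean score (win/loss model,
    draws ignored), so the interval is conservative when draws occur.
    """
    p = points / games
    half_width = z * math.sqrt(p * (1 - p) / games)
    return max(0.0, p - half_width), min(1.0, p + half_width)


naum22 = score_interval(9.0, 20)    # Naum 2.2 - Zap: 9-11
naum3 = score_interval(14.5, 20)    # Naum 3 - Zap: 14.5-5.5

print(f"Naum 2.2: {naum22[0]:.2f} to {naum22[1]:.2f}")
print(f"Naum 3:   {naum3[0]:.2f} to {naum3[1]:.2f}")
# True here: the two ranges overlap, so 20 games per match cannot
# separate the two versions with any confidence.
print("ranges overlap:", naum3[0] < naum22[1])
```

Under these assumptions the two ranges share a common stretch, which is why the full set of matches is needed before drawing conclusions.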

Regards
Per
Werner

Re: Ratinglist based on positional openingpositions

Post by Werner »

Hi Per,
very nice to read some positive news. I wish you much success.
At the moment I am running Naum 3 x64 2CPU - Deep Shredder 11 4CPU here (Deep Shredder 11 4CPU has the same rating as Zappa II with 2 CPUs):
Naum leads 10 - 9 !

regards
Werner
Yarget

Re: Ratinglist based on positional openingpositions

Post by Yarget »

Thanks, Werner, and I wish you good luck with your own test games. You should know that I follow your/CEGT Naum 3 results very closely. True, the opinions on Naum 3 so far have been divided, but at least under my test conditions it has made a very strong start. I followed many of the games between Naum 3 and Zap "live" this afternoon, and several times Zap was simply outplayed in 30 or 40 moves. Compared to version 2.2, Naum 3 seems to play more aggressively. Is that also your impression?

Best regards
Per
Yarget

Re: Ratinglist based on positional openingpositions

Post by Yarget »

Hello everyone!

I have now finished half of the gambit games for Naum 3 and it's time for an update. This is the latest gambit rating list I have presented:

    Program                          Elo    +   -   Games   Score   Av.Op.  Draws

  1 Rybka 2.3.2a mp 32-bit         : 2956   42  42   220    72.7 %   2786   27.3 %
  2 Toga II 1.4 beta5c             : 2869   39  39   220    60.7 %   2794   29.5 %
  3 Deep Shredder 11 UCI           : 2852   40  40   220    58.2 %   2795   25.5 %
  4 Deep Fritz 10                  : 2833   41  41   220    55.2 %   2797   20.5 %
  5 HIARCS 11.1 MP UCI             : 2829   39  39   220    54.5 %   2797   29.1 %
  6 Naum 2.2                       : 2822   39  38   220    53.4 %   2798   30.5 %
  7 LoopMP 11A.32                  : 2811   39  39   220    51.8 %   2799   28.2 %
  8 Zap!Chess Zanzibar             : 2794   39  39   220    49.1 %   2800   27.3 %
  9 Glaurung 2.0.1                 : 2749   40  40   220    42.0 %   2805   25.0 %
 10 bright-0.2c                    : 2715   41  42   220    37.0 %   2808   23.2 %
 11 Deep Junior 10.1               : 2690   44  44   220    33.4 %   2810   16.8 %
 12 SpikeMP 1.2 Turin              : 2679   42  42   220    31.8 %   2811   24.5 %

Naum 2.2 did quite well in the gambits, but Naum 3 is, as expected, doing much better. At this moment (after 110 games) it has a total score of 64.55%. If Naum 3 ends with a score of around 65%, it will hold a secure second place in the rating list, clearly ahead of Toga and Shredder but also clearly behind Rybka. At this moment Naum is leading all its matches with one exception (Rybka is leading theirs 7-3). I am watching a lot of the games, and my impression is that the latest Naum version plays more aggressively than its predecessor. The draw frequency may indicate that too: 30.5% for Naum 2.2 (as you can see in the rating list), while the new version currently stands at 25.45%.
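
As a rough cross-check of these percentages, the sketch below converts a score fraction into an Elo difference with the standard logistic Elo formula and adds it to the average opponent rating. Which rating tool produced the list is not stated here, so this is only a consistency check under that assumption, not a description of how the list was computed; the function names are illustrative. Naum 3's average opposition is not given, so only its Elo difference and its win/draw/loss split are derived.

```python
import math


def elo_diff(score: float) -> float:
    """Elo difference implied by a score fraction (logistic Elo model)."""
    return 400 * math.log10(score / (1 - score))


def performance(avg_opponent: float, score: float) -> float:
    """Average opponent rating plus the implied Elo difference."""
    return avg_opponent + elo_diff(score)


# Three rows from the list above: (Av.Op., Score, listed Elo).
rows = {
    "Rybka 2.3.2a mp": (2786, 0.727, 2956),
    "Naum 2.2": (2798, 0.534, 2822),
    "SpikeMP 1.2 Turin": (2811, 0.318, 2679),
}
for name, (av_op, score, listed) in rows.items():
    print(f"{name}: computed {performance(av_op, score):.1f}, listed {listed}")

# Naum 3 after 110 games: 64.55% score and 25.45% draws.
games = 110
points = round(0.6455 * games * 2) / 2   # 71.0 points (nearest half point)
draws = round(0.2545 * games)            # 28 draws
wins = int(points - draws / 2)           # 57 wins
losses = games - wins - draws            # 25 losses
print(points, wins, draws, losses)
print(f"Elo difference to average opposition: {elo_diff(points / games):+.0f}")
```

For the three rows checked, the computed values land within about a point of the listed ratings, which is within the rounding of the published score and Av.Op. columns.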

Regards
Per