Now that I have 12 engines in my rating lists, it might be interesting to reflect on some of the results so far. Earlier in this thread I presented a list showing how "sensitive" (or balanced) the engines are. The higher the number, the more "sensitive" the engine is to the type of test games selected, and vice versa. Starting with the most "sensitive" engines, the list looks like this:
1. SpikeMP 1.2 Turin 111 rating points
2. Deep Junior 10.1 97 rating points
3-4. Hiarcs 11.1 MP & Naum 2.2 each 48 rating points
5. Glaurung 2.0.1 40 rating points
6. LoopMP 11A.32 21 rating points
7. Deep Shredder 11 17 rating points
8-10. Toga II 1.4 beta5c, Rybka 2.3.2a mp & Bright 0.2c each 14 rating points
11. Zap!Chess Zanzibar 13 rating points
12. Deep Fritz 10 1 rating point
In other words: for Fritz it makes no difference at all whether it plays the gambits or the positional games, while engines like Junior and Spike are very "sensitive" or unbalanced.
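For readers who want to reproduce a list like the one above from their own data, here is a minimal sketch. It assumes that an engine's "sensitivity" is simply the absolute gap between its gambit rating and its positional rating (this fits Naum 2.2: 2822 in the gambit games versus 2774 in the positional games gives exactly the 48 points listed). The function names are hypothetical, and only Naum's two ratings are taken from this post; the other engines' values are entered directly as the published sensitivity figures, since their per-list ratings are not given here. Engines with equal values share a place, as in "3-4" and "8-10".

```python
from itertools import groupby


def sensitivity(gambit_elo: int, positional_elo: int) -> int:
    """Assumed definition: absolute gap between the gambit and positional ratings."""
    return abs(gambit_elo - positional_elo)


def rank_with_ties(scores: dict[str, int]) -> list[tuple[str, list[str], int]]:
    """Sort from most to least sensitive; equal values share a place label like '3-4'."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    result = []
    place = 1
    for value, group in groupby(ordered, key=lambda kv: kv[1]):
        names = [name for name, _ in group]
        last = place + len(names) - 1
        label = str(place) if last == place else f"{place}-{last}"
        result.append((label, names, value))
        place = last + 1
    return result


# Sensitivity values from the list above; Naum's is recomputed from its two ratings.
SENSITIVITY = {
    "SpikeMP 1.2 Turin": 111,
    "Deep Junior 10.1": 97,
    "Hiarcs 11.1 MP": 48,
    "Naum 2.2": sensitivity(2822, 2774),
    "Glaurung 2.0.1": 40,
    "LoopMP 11A.32": 21,
    "Deep Shredder 11": 17,
    "Toga II 1.4 beta5c": 14,
    "Rybka 2.3.2a mp": 14,
    "Bright 0.2c": 14,
    "Zap!Chess Zanzibar": 13,
    "Deep Fritz 10": 1,
}

if __name__ == "__main__":
    for label, names, value in rank_with_ties(SENSITIVITY):
        print(f"{label}. {' & '.join(names)}: {value} rating points")
```

Because Python's sort is stable, tied engines keep the order in which they were entered.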
To me the biggest surprises so far are the results of Naum and Spike. I have always considered Naum a solid, strong and indeed positional engine, and therefore I expected this fine engine to do well in the positional games and less well in the gambit games. Well, the results show the opposite: Naum 2.2 has 2822 in the gambit games but only 2774 in the positional games! Statistics have taught us to be careful about drawing big, firm conclusions, but I still find these results amazing. Earlier in this thread Marek contributed a very interesting explanation for this:
Getting engines to play with a positional book ought to be thought of as a test of their dynamism in quiet positions; whereas getting engines to play with a gambit book is a test of their soundness in wild positions.
In other words, the Gambit Rating could be renamed the Soundness Rating, while the Positional Rating could be renamed the Dynamism Rating. An engine that is further down the Soundness Rating List is probably playing too wildly; an engine that is further down the Dynamism Rating List is probably playing too quietly. An engine that is equally strong in both lists would be a balanced engine.
In a way this is "turning things upside down", but the more I reflect on it, the more likely I find it. Naum does well in the gambits because these openings require a sound and "clever" approach (the opening is already wild), and obviously that works for Naum. Conversely, in the positional games a wilder, more dynamic engine is needed to unbalance the games, and here Naum does less well. Take a look at Naum's draw frequency in the positional games: it's over 40%! Naum is simply drawing too many of the positional games to achieve a good overall performance.

This interpretation could also explain why positional engines like Rybka and Shredder do better in the gambit games than in the positional games (check the rating lists; I know the rating difference is smaller, but still...), and it certainly explains why the ambitious Junior (look at its low draw frequency) does so well in the positional games and less well in the gambits. Again, this interpretation "turns things upside down", but it can explain some of the "unexpected" results: Junior, Naum, Shredder, Rybka, and also Spike if you consider Spike to be a dynamic, sharp engine. Spike's high draw frequency in the positional games (unlike Junior's) does not support that last view, though.
I'm looking forward to testing Naum 3. If the interpretation above is correct, and if Naum 3 turns out to be a solid, strong, positional engine that is about 100 Elo stronger, then it will put up a terrific performance in the gambit test! We'll see.
Regards
Per