As some of you might remember, I used to run the MP tests for the former CSS rating list. That rating list was based on fixed opening positions, and engines were not allowed to use any kind of opening book. I still remember that Deep Junior 10 in particular performed extremely well in certain closed openings such as the English (opening position after 1. c2-c4 c7-c5 2. Nb1-c3 Nb8-c6 3. g2-g3 g7-g6 4. Bf1-g2 Bf8-g7 5. e2-e4 e7-e5), while performing less well in other, more often "open", openings. This gave me the idea for the current project, which I started a couple of weeks ago.
I have selected 10 fixed opening positions that I would describe as positional. Most of them are very closed (the above-mentioned English, the Benoni, the Stonewall and the closed King's Indian, to mention some of them), and common to all 10 is that sharp, tactical play is not "just around the corner". They are more about "long" knight manoeuvres, pushing the pawns at the right moment after careful preparation, building on small advantages and so on. Needless to say, tactics and combinations can and very often will occur in these games, but again: they are not likely to happen before the middlegame or, more often, the late middlegame.
In contrast, I have also selected 10 fixed opening positions that consist of (very often sharp) gambits such as the King's Gambit, Nordic (Danish) Gambit, Morra Gambit and Blackmar-Diemer Gambit, to mention a few. The aim of all these tests is to determine which engines "prefer" the sharp gambit openings, which "prefer" the closed, positional ones, and which don't mind either way. It should be emphasized that these tests won't result in firm conclusions stating that Engine X is a positional engine or the opposite. Reaching such conclusions requires more opening positions, more games etc., much more than one person can do on one PC. However, these tests might give some indication of the preferred type of position for a number of engines. Here are the exact test conditions:
Windows XP Pro 32 bit
(Deep) Fritz 10 GUI
AMD Athlon 64 X2 4200
128 MB hash tables for each engine
3-4-5 Tablebases (32 MB cache)
Ponder: OFF
Time control: 40/4 repeating (4 minutes for every 40 moves)
Books: No books allowed; engines play on their own from the starting point of each opening position
Games: Each engine plays each opening position against every opponent with both white and black, so each engine match consists of 20 games
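As a quick sanity check on the size of this schedule, here is a minimal Python sketch. The function name `schedule_size` and its parameters are my own illustration and not part of any testing tool; it only assumes a full round robin in which every pairing plays each opening position with both colours.

```python
from itertools import combinations


def schedule_size(engines, positions=10, colours=2):
    """Return (games per match, games per engine, total games) for a full round robin."""
    # Each opening position is played once with white and once with black.
    games_per_match = positions * colours
    # Every engine meets every other engine in exactly one match.
    pairings = list(combinations(engines, 2))
    games_per_engine = (len(engines) - 1) * games_per_match
    total_games = len(pairings) * games_per_match
    return games_per_match, games_per_engine, total_games


if __name__ == "__main__":
    # Placeholder names; only the number of engines matters here.
    engines = [f"Engine {i}" for i in range(1, 11)]
    print(schedule_size(engines))
```

For 10 engines and 10 positions this gives 20 games per match, 180 games per engine and 900 games in total.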
I have just finished the first 900 games in the positional test, meaning that 10 engines have played 180 games each. This is how the first positional rating list looks (average rating 2800):
#  Program                   Elo   +   -  Games   Score Av.Op.   Draws
1  Rybka 2.3.2a mp 32-bit : 2928  44  43    180  69.4 %   2785  32.2 %
2  Deep Shredder 11 UCI   : 2833  42  42    180  55.3 %   2796  33.9 %
3  Deep Fritz 10          : 2831  43  42    180  55.0 %   2796  31.1 %
4  Zap!Chess Zanzibar     : 2798  42  42    180  49.7 %   2800  32.8 %
5  Deep Junior 10.1       : 2793  46  46    180  48.9 %   2800  20.0 %
6  LoopMP 11A.32          : 2784  40  40    180  47.5 %   2801  37.2 %
7  HIARCS 11.1 MP UCI     : 2780  42  42    180  46.9 %   2802  32.8 %
8  SpikeMP 1.2 Turin      : 2780  42  42    180  46.9 %   2802  32.8 %
9  Naum 2.2               : 2768  39  39    180  45.0 %   2803  41.1 %
10 Glaurung 2.0.1         : 2705  43  44    180  35.3 %   2810  30.6 %

Rating difference compared to the CEGT reference list:
1. Deep Junior 10.1 +70.33 rating points
2. Deep Fritz 10 +49.22 rating points
3. Rybka 2.3.2a mp +30.33 rating points
4. Zap!Chess Zanzibar 2CPU +17.00 rating points
5. SpikeMP 1.2 Turin +8.11 rating points
6. LoopMP 11A.32 +1.44 rating points
7. Deep Shredder 11 UCI -18.56 rating points
8. Naum 2.2 2CPU -36.33 rating points
9. Hiarcs 11.1 MP -57.44 rating points
10. Glaurung 2.0.1 2CPU -64.11 rating points
In other words: by competing in my tests, Deep Junior 10.1 has gained approx. 70 rating points compared to the CEGT reference list, Deep Fritz 10 approx. 49 points, and so on. Considering what I wrote at the beginning, it's hardly a surprise that Deep Junior 10.1 has gained 70 rating points when the games start from what is very often a closed, positional position. Junior is an extreme engine in many ways. More surprising is the performance of Deep Fritz 10 (losing only 9½-10½ against Rybka!) and perhaps also of Rybka. I certainly didn't expect Hiarcs to end up more than 50 rating points below its CEGT rating, and the 5-15 defeat against Deep Junior 10.1 was especially painful.
I have just started the tests of the gambit openings. When these tests are done I'll make a new rating list and then compare the two lists. If anyone is interested in the games, send me a PM and I'll send you a PGN file with the games.
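For anyone who wants to check how the Score and Av.Op. columns of the positional rating list relate to the Elo column, here is a small Python sketch. It assumes the standard logistic Elo formula (performance = average opponent + 400 * log10(p / (1 - p))); the tool that produced the list is not named here, so treat the result as an approximation. The function name `performance_rating` is my own.

```python
import math


def performance_rating(score_fraction, avg_opponent):
    """Estimate a performance rating from a score fraction (0 < p < 1).

    Assumes the standard logistic Elo curve; this may differ slightly
    from whatever the original rating tool does internally.
    """
    if not 0.0 < score_fraction < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return avg_opponent + 400.0 * math.log10(score_fraction / (1.0 - score_fraction))


if __name__ == "__main__":
    # Score and Av.Op. values copied from the positional rating list.
    rows = [
        ("Rybka 2.3.2a mp 32-bit", 0.694, 2785),
        ("Deep Junior 10.1", 0.489, 2800),
        ("Glaurung 2.0.1", 0.353, 2810),
    ]
    for name, score, avg_opp in rows:
        print(f"{name}: {performance_rating(score, avg_opp):.0f}")
```

Under that assumption the three sample rows come out within a point or two of the listed Elo values, which suggests the list is essentially a performance rating against the average opponent.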
Best regards
Per