GGT2/Total Scores, rnd 1-36/Firebird leads, Rybka stagnates
GGT2 Total Scores Round 1-36; 18 Gambits; Opening 21-38; Eco C02-C58; Nimzo, Danish, Calabrese, Lewis, Wing, Urusov, Falkbeer, Charousek, WildMuzio, Hanstein, Allgaier, Elephant, Latvian, Belgrade, Halloween, Fegatello, 2x Two Knights.
new: Belgrade, Halloween, Italian, Fegatello, Two Knights
                Fire       Stock      Naum       DRybka     DFritz     DShred     ZapMex     Points
--------------------------------------------------------------------------------------------------
FireBird 1.2    *****      18.5-17.5  21.5-14.5  19.0-17.0  25.0-11.0  25.0-11.0  25.5-10.5  134.5
Stockfish1.7.1  17.5-18.5  *****      20.0-16.0  17.0-19.0  21.5-14.5  24.0-12.0  28.5- 7.5  128.5
Naum4.2         14.5-21.5  16.0-20.0  *****      19.5-16.5  24.0-12.0  23.5-12.5  24.0-12.0  121.5
Deep Rybka3     17.0-19.0  19.0-17.0  16.5-19.5  *****      22.5-13.5  21.5-14.5  23.0-13.0  119.5
DeepFritz12     11.0-25.0  14.5-21.5  12.0-24.0  13.5-22.5  *****      20.0-16.0  17.0-19.0   88.0
DeepShredder12  11.0-25.0  12.0-24.0  12.5-23.5  14.5-21.5  16.0-20.0  *****      21.0-15.0   87.0
ZappaMexico II  10.5-25.5   7.5-28.5  12.0-24.0  13.0-23.0  19.0-17.0  15.0-21.0  *****       77.0
--------------------------------------------------------------------------------------------------
total 756 games
Twelve more rounds have not changed the character of this tournament.
1 Significant gap between the best four and the last three
2 The upper group is mainly characterized by
- constant leadership of FireBird
- Rybka not reaching its usual ranking
- Naum scoring badly against FireBird
- balanced individual matches between FireBird, Stockfish and Rybka
3 The traditional head-to-head race between Fritz and Shredder
GGT2 Total Performance Round 1-36
                 Win  Draw  Loss  Points  Perform  Games
-------------------------------------------------------
FireBird 1.2      82   105    29   134.5    62%     216
Stockfish 1.7.1   84    89    43   128.5    59%     216
Naum 4.2          79    85    52   121.5    56%     216
Deep Rybka 3      70    99    47   119.5    55%     216
DeepFritz 12      47    82    87    88.0    40%     216
DeepShredder 12   44    86    86    87.0    40%     216
ZappaMexico II    37    80    99    77.0    35%     216
-------------------------------------------------------
total 756 games
FireBird wins only three games more than third-placed Naum, but it loses far less often than its opponents (only 13% of its games).
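For anyone who wants to re-check the points, the percentages and the 13% loss rate, here is a small Python sketch. The win/draw/loss figures are copied from the performance table; the function name `summarize` is my own choice, nothing official.

```python
# (wins, draws, losses) per engine, copied from the performance table
results = {
    "FireBird 1.2":    (82, 105, 29),
    "Stockfish 1.7.1": (84, 89, 43),
    "Naum 4.2":        (79, 85, 52),
    "Deep Rybka 3":    (70, 99, 47),
    "DeepFritz 12":    (47, 82, 87),
    "DeepShredder 12": (44, 86, 86),
    "ZappaMexico II":  (37, 80, 99),
}

def summarize(wins, draws, losses):
    """Return (games, points, score fraction, loss fraction) for one engine."""
    games = wins + draws + losses
    points = wins + 0.5 * draws          # a draw counts half a point
    return games, points, points / games, losses / games

for name, wdl in results.items():
    games, points, score, loss_rate = summarize(*wdl)
    print(f"{name:16s} {points:6.1f} / {games}  score {score:5.1%}  losses {loss_rate:5.1%}")
```

The table itself seems to cut the percentages down to whole numbers instead of rounding them, so 59.5% shows up there as 59%.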
Played games GGT2: 18 gambits = 36 rounds
                            Games
1 engine/round                  6
1 engine/gambit                12   (double round, switched colours)
1 engine pair/18 gambits       36   (e.g. FireBird against Rybka)
1 gambit                       42   (7x6)
1 engine/18 gambits           216   (12x18)
18 gambits                    756   (42x18)
There are still 12 gambits to play. In the end there will be 30 gambits, 60 rounds, 1260 games, 360 games per engine and 60 games per engine pair. At the end of the tournament the error margins will be about +/- 27 Elo points. This will probably be sufficient to distinguish 3-4 significant engine ranks.
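The game counts follow directly from the format (7 engines, every pairing played twice per gambit with switched colours). A minimal Python sketch of that arithmetic is below. The last part is only my own rough projection: it assumes the error margin shrinks with the square root of the number of games and takes 36 Elo as a typical current margin; EloStat itself may calculate this differently.

```python
import math

ENGINES = 7

def game_counts(gambits, engines=ENGINES):
    """Game counts for a double round robin played once per gambit."""
    pairs = engines * (engines - 1) // 2          # 21 pairings
    games_per_gambit = 2 * pairs                  # both colours: 42 games
    return {
        "rounds": 2 * gambits,
        "games_total": games_per_gambit * gambits,
        "games_per_engine": 2 * (engines - 1) * gambits,
        "games_per_pair": 2 * gambits,
    }

now = game_counts(18)     # state after round 36
final = game_counts(30)   # planned end of the tournament

# Assumption: margin ~ 1/sqrt(games); 36 Elo is a typical margin at 216 games.
current_margin = 36
projected_margin = current_margin * math.sqrt(
    now["games_per_engine"] / final["games_per_engine"]
)

print(now)
print(final)
print(f"projected margin: +/- {projected_margin:.0f} Elo")
```

With these assumptions the projection comes out at roughly 28 Elo, in the same region as the estimate of about 27 given above.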
GGT2 Total Elo-Ranking Round 1-36 with CEGT-Calibration
   Program                      GGT                          CEGT
                                Elo    +   -  Games  Score   Elo    +   -
 1 FireBird 1.2 x64         :   3187   34  33   216  62.3 %  0000
 2 Stockfish 1.7.1 x64      :   3169   36  36   216  59.5 %  3159  11  11
 3 Naum 4.2 x64             :   3150   36  36   216  56.2 %  3138  14  14
 4 Deep Rybka 3 x64         :   3144   34  34   216  55.3 %  3181  10  10
 5 Deep Fritz 12            :   3056   37  37   216  40.7 %  3054  14  14
 6 Deep Shredder 12 x64     :   3054   36  36   216  40.3 %  3063   9   9
 7 Zappa Mexico II x64      :   3024   37  38   216  35.6 %  3018   9   9
------------------------------------------------------------------------------
756 games; Starting Value EloStat: 3112; List: CEGT 40/20, 4 threads, June 2010
I calibrated against as many engines as possible and finally arrived at a starting Elo of 3112, which fits the CEGT ratings excellently, with the exception of Rybka. According to the table above, the ratings of Stockfish, Naum, Fritz, Shredder and Zappa show no significant difference between CEGT and GGT at the 5% error level! Please note that this holds even against the extremely narrow and demanding CEGT error bars of 9 to 14 Elo points!
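One simple way to check such a statement is to compare the rating difference with the combined error margin of both lists. The Python sketch below does that with the numbers from the CEGT table. It is not necessarily the test used for this post: it assumes that both margins are 95% intervals, that the two lists are independent, and that the margins may be combined as a root sum of squares. For Zappa the larger of the two GGT margins (38) is used.

```python
import math

# (GGT Elo, GGT margin, CEGT Elo, CEGT margin), copied from the table
ratings = {
    "Stockfish 1.7.1":  (3169, 36, 3159, 11),
    "Naum 4.2":         (3150, 36, 3138, 14),
    "Deep Rybka 3":     (3144, 34, 3181, 10),
    "Deep Fritz 12":    (3056, 37, 3054, 14),
    "Deep Shredder 12": (3054, 36, 3063, 9),
    "Zappa Mexico II":  (3024, 38, 3018, 9),
}

def differs_significantly(elo_a, margin_a, elo_b, margin_b):
    """True if the Elo gap exceeds the combined (root-sum-square) margin."""
    combined_margin = math.hypot(margin_a, margin_b)
    return abs(elo_a - elo_b) > combined_margin

significant = [name for name, row in ratings.items() if differs_significantly(*row)]
print(significant)
```

Under these assumptions only Rybka falls outside the combined margin (a gap of 37 Elo against a margin of about 35).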
Of course you could get a better fit for Rybka by adjusting the starting Elo.
Calibration is nothing more than a simple linear transformation which changes neither the ranks nor the Elo distances between the engines.
GGT2 Total Elo-Ranking Round 1-36 with CCRL-Calibration
   Program                      GGT                          CCRL
                                Elo    +   -  Games  Score   Elo    +   -
 1 FireBird 1.2 x64         :   3245   34  33   216  62.3 %  0000
 2 Stockfish 1.7.1 x64      :   3227   36  36   216  59.5 %  3221  24  24
 3 Naum 4.2 x64             :   3208   36  36   216  56.2 %  3184  28  28
 4 Deep Rybka 3 x64         :   3202   34  34   216  55.3 %  3232  23  22
 5 Deep Fritz 12            :   3114   37  37   216  40.7 %  3087  40  40
 6 Deep Shredder 12 x64     :   3112   36  36   216  40.3 %  3131  19  19
 7 Zappa Mexico II x64      :   3082   37  38   216  35.6 %  3074  13  13
------------------------------------------------------------------------------
756 games; Starting Value EloStat: 3170; List: CCRL 40/40, 4 threads, June 2010
CCRL is calibrated significantly higher than CEGT. The top MP engines score far above human ratings. Is this really the true human/engine ratio, or should the superiority of the programs be corrected downwards? The subject was discussed again recently in this forum.
http://talkchess.com/forum/viewtopic.php?t=35125
Starting with 3170 Elo you get the table above. The correlation between CCRL and GGT is very strong too.
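Changing the starting Elo from 3112 to 3170 simply moves every GGT rating up by the same 58 points. Here is a minimal Python sketch of that shift, using the GGT ratings on the CEGT scale as input; the function name `recalibrate` is my own.

```python
# GGT ratings calibrated with starting Elo 3112 (CEGT scale)
ggt_cegt_scale = {
    "FireBird 1.2":     3187,
    "Stockfish 1.7.1":  3169,
    "Naum 4.2":         3150,
    "Deep Rybka 3":     3144,
    "Deep Fritz 12":    3056,
    "Deep Shredder 12": 3054,
    "Zappa Mexico II":  3024,
}

def recalibrate(ratings, old_start, new_start):
    """Shift all ratings by the difference between two starting values."""
    shift = new_start - old_start
    return {name: elo + shift for name, elo in ratings.items()}

ggt_ccrl_scale = recalibrate(ggt_cegt_scale, old_start=3112, new_start=3170)

for name, elo in ggt_ccrl_scale.items():
    print(f"{name:17s} {elo}")
```

Ranks and Elo distances are untouched by the shift, and the average of the seven ratings equals the chosen starting value in both versions.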
PMCC = Pearson product-moment correlation coefficient
Of course you can calculate a correlation coefficient to express the strength of the relationship between the rating lists. The value +1 stands for the highest positive correlation (for instance, CCRL correlated with itself) and 0 stands for no linear relationship at all. You may be about to see the first member-made correlation calculation in this forum, or at least the first between current rating lists!

You get the following coefficients for 6 of the engines (without FireBird)
GGT/CEGT= 0.96
GGT/CCRL= 0.94
CEGT/CCRL= 0.98
up to 0.2 => very low; up to 0.5 => low; up to 0.7 => middle; up to 0.9 => strong; above 0.9 => very strong
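The coefficients can be reproduced with a few lines of Python. This is a plain textbook implementation of the Pearson coefficient, not necessarily the tool used for this post. The ratings are those of the six engines other than FireBird, taken from the two calibration tables. It does not matter which GGT calibration is used, because a constant shift does not change the coefficient.

```python
import math

# Order: Stockfish, Naum, Rybka, Fritz, Shredder, Zappa
ggt  = [3169, 3150, 3144, 3056, 3054, 3024]
cegt = [3159, 3138, 3181, 3054, 3063, 3018]
ccrl = [3221, 3184, 3232, 3087, 3131, 3074]

def pearson(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

print(f"GGT/CEGT  = {pearson(ggt, cegt):.2f}")
print(f"GGT/CCRL  = {pearson(ggt, ccrl):.2f}")
print(f"CEGT/CCRL = {pearson(cegt, ccrl):.2f}")
```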
Despite the small sample (only 6 engines = 4 degrees of freedom), the coefficients are highly significant at the 1% error level. To put it in statistically correct terms:
For the 6 selected engines, there is a very strong linear correlation between the three rating lists. That was to be expected in view of the close agreement of the calibrated Elo rankings.
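The significance statement can be checked with the usual t-test for a correlation coefficient, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The Python sketch below assumes a two-sided test and uses the critical value 4.604 for 4 degrees of freedom at the 1% level, taken from a standard t-table. Whether exactly this test was used for the post is my assumption.

```python
import math

# Two-sided 1% critical value of Student's t with 4 degrees of freedom
# (value from a standard t-table)
T_CRIT_1PCT_DF4 = 4.604

def t_statistic(r, n):
    """t value for testing a Pearson coefficient r from n pairs against zero."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

coefficients = {"GGT/CEGT": 0.96, "GGT/CCRL": 0.94, "CEGT/CCRL": 0.98}

for label, r in coefficients.items():
    t = t_statistic(r, n=6)
    verdict = "significant" if t > T_CRIT_1PCT_DF4 else "not significant"
    print(f"{label:10s} r = {r:.2f}  t = {t:5.2f}  {verdict} at 1%")
```

Even the smallest coefficient (0.94) gives a t value of about 5.5, which is above the critical value.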
PGN-Link:
http://www.file-upload.net/download-265 ... 8.pgn.html
Next:
GGT2 21-60. Tournament finished.
Book
50 gambit starting positions. GGT1:Eco00 - B44. GGT2:EcoC02 - E60
Test conditions
Time Control: tournament level 40/20', 20/10', 10'+12''
System: Intel Core i7 920, oc 3600-3800 MHz, 6 GB DDR3 RAM. Vista 64
Hyperthreading off, Turbo Mode off.
Engine parameters: 3 threads. Ponder off. 1.2 GB Hash.
EGTB 3,4,5: Nalimov, TotalBases, sometimes TripleBases. Stockfish doesn't use EGTB. Bitbases are not needed. FireBird's TotalBases and RAM-resident TripleBases don't always work properly.
Fritz12-GUI: draw late, resign late/never.