CyclonexTremeII : Thinker 5.4Di / Rybka 2.2n2 finished

Post by **rainhaus** » Fri Oct 02, 2009 8:31 pm

1. Results
Cyclone xTremeII : Thinker 5.4 Di
total score: 24.5 : 45.5 (35% : 65%)
lost/won/drawn by Cyclone: -33 / +12 / =25
Data Analysis by EloStat
Hypothetical starting Elo 2920
Cyclone Xtreme II : Thinker 5.4 Di
2866 Elo : 2974 Elo /+67/-68 confidence interval, 5% error level

Another disaster for Cyclone xTreme II (3), and a very significant one. Calibrated against Thinker's CEGT rating (4CPU, 2973 Elo), Cyclone sags below 2900 Elo.

Cyclone xTremeII : Rybka 2.2n2
total score: 31 : 39 (44% : 56%)
lost/won/drawn by Cyclone: -23 / +15 / =32
Data Analysis by EloStat
Hypothetical starting Elo 2960
Cyclone Xtreme II : Rybka 2.2n2
2940 Elo : 2980 Elo /+60/-61 confidence interval, 5% error level

Scoring 2940 Elo against the free Rybka 2.2n2 is a respectable result.
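The Elo pairs above can be reproduced with a few lines of arithmetic. The sketch below assumes the standard logistic Elo formula and places both engines symmetrically around the hypothetical starting Elo; the function names are made up for illustration, and it is not a description of how EloStat works internally. It covers only the point estimates, not the confidence intervals.

```python
import math

def elo_difference(score_fraction):
    """Elo difference implied by a score fraction (logistic Elo model)."""
    return -400.0 * math.log10(1.0 / score_fraction - 1.0)

def split_ratings(wins, draws, losses, starting_elo):
    """Place both engines symmetrically around a hypothetical starting Elo.

    Returns (rating of the engine whose wins/draws/losses are given,
             rating of its opponent).
    """
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    diff = elo_difference(score)
    return starting_elo + diff / 2.0, starting_elo - diff / 2.0

# Results from Cyclone's point of view: wins, draws, losses, starting Elo
thinker_match = split_ratings(12, 25, 33, 2920)
rybka_match = split_ratings(15, 32, 23, 2960)

print([round(r) for r in thinker_match])  # [2866, 2974]
print([round(r) for r in rybka_match])    # [2940, 2980]
```

A 35% score corresponds to a gap of roughly 108 Elo and a 44% score to roughly 40 Elo, which is where the two pairs come from.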

2. RBx(a,f,c)
RBx is an index and means "related to the best". It is an experimental index which calibrates the Elo rating of any engine against the average Elo rating of the best engines. The suffix a, f, c means all, free or commercial. The selection of the best is an individual one, but it could be standardized at any time. To obtain a statistical value with error margins, you can enter the raw data into EloStat as usual (mode: single competition).
Input:
1. average opponents' CEGT Elo = 2987 (Stockfish 1.4, Thinker 5.4Di and Rybka 2.2n2)
2. lost=85, won=39, drawn=86 by Cyclone xTreme II, 210 games
Output:
RB(f) Cyclone Xtreme II(03) = 2910 Elo (2873-2946 confidence interval)
(Regarding the wide error margins: don't forget that the closer a value is to the middle of the interval, the more probable it is!)
The index is strict and grim, and of course it is significantly lower than the normal rating within a rating list. I'll use it to get an additional reference value that doesn't include the weaker engines. More about its practical validity somewhat later. I'll try a few RB lists, and then you can see how it correlates with the ranking in the public lists. Maybe it won't be as useful as I now think it is.
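For anyone who wants to check the RB value by hand, here is a minimal sketch. It treats the 210 games as one match against a single opponent rated at the 2987 average and uses a plain normal approximation for the 95% interval. That approximation is an assumption chosen for illustration and not necessarily what EloStat does; the function name is hypothetical.

```python
import math

def performance_with_interval(wins, draws, losses, avg_opponent_elo, z=1.96):
    """Performance rating against an average opponent, with a rough interval.

    The interval is a normal approximation on the score fraction
    (assumption for illustration, not EloStat's documented method).
    """
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games

    # Per-game variance of the result (1 = win, 0.5 = draw, 0 = loss)
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * score ** 2) / games
    margin = z * math.sqrt(variance / games)

    def to_elo(p):
        # Logistic Elo model, shifted by the opponents' average rating
        return avg_opponent_elo - 400.0 * math.log10(1.0 / p - 1.0)

    return to_elo(score), to_elo(score - margin), to_elo(score + margin)

# Input from above: won=39, drawn=86, lost=85 against an average of 2987
rating, low, high = performance_with_interval(39, 86, 85, 2987)
print(round(rating), round(low), round(high))
```

The central value rounds to the 2910 given above, and the bounds land close to the posted 2873-2946 interval. The interval is not symmetric in Elo because the score-to-Elo curve gets steeper away from 50%.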

System: Core i7 920, overclocked to 3600-3800 MHz, 6 GB RAM. Windows Vista 64-bit. Arena Mark 33.73. Fritz11 Mark: 22.61/10.852.000 n/sec.
In some published benchmarks you may read Fritz Marks that are significantly higher, between 13.000.000 and 14.000.000 n/sec, for the Core i7 Extreme 965 or for overclocked 940 or 920 models. These are the scores of the standard configuration with Hyper-Threading activated. In this mode my overclocked Core i7 920 also reaches 13.414.000 n/sec, but the Arena Mark changes only minimally. So it seems that the Fritz Mark can be fooled by the 4 virtual processors, while the Arena Mark is not. Which engines really profit from HT? This topic has already been discussed in the forums. BTW, when Bob Hyatt was discussing the Core i7 topic I understood only half of his detailed explanations, but I had the distinct feeling he was right :) To be on the safe side I've deactivated the standard management features of the Core i7 architecture. So:
Hyper-Threading off, SpeedStep off, Turbo Boost off
GUI=Arena 2.01. Hash=1024, time control=40/10 min, ponder=off. threads=3
Nalimov 3,4,5 on, BitBases 3,4,5 on (not used by Cyclone).
Book: none. 35 own starting positions (ECO mainlines), each played twice with colours switched.

More interpretation of the results will follow in "Annals of testing an elusive engine". In progress.

Rainer Neuhäusler