The April 11th update of the CCRL Rating Lists and Statistics is now available for viewing at:
http://www.computerchess.org.uk/ccrl/4040/
The list gets updated periodically during the week and these updates can be viewed here:
http://www.computerchess.org.uk/ccrl/4040.live/
Please be aware that no game downloads are available from this live link.
The links to the various rating lists can be found just beneath the default Best Versions list.
For example, there is a 32-bit Single CPU list.
Our standard testing is at 40 moves in 40 minutes repeating, while our current blitz testing is at both 40 moves in 4 minutes repeating and 40 moves in 12 minutes repeating. All time controls are adjusted to the AMD64 X2 4600+ (2.4GHz).
Currently active testers in our team are:
Graham Banks, Ray Banks, Shaun Brewer, Kirill Kryukov, Dom Leste, Tom Logan, Andreas Schwartmann, Charles Smith, George Speight, Chris Taylor, Chuck Wilson, Gabor Szots and Martin Thoresen.
40/40 Notes
There are currently 114,872 games in our 40/40 database.
Many engines on our list have few games and in many cases their ratings are likely to fluctuate (markedly for some) until a lot more games are played. Therefore no conclusions should be drawn about their strength yet.
To illustrate this point, when an engine has 200 games played, the error margin is still approximately ±40 ELO. After 500 games it is ±25 ELO, after 1000 games ±17 ELO, and even after 2000 games there is a ±13 ELO error margin!
This of course highlights the importance of looking at other rating lists that are also available in order to draw comparisons and get a more accurate overall picture.
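The error margins quoted above can be roughly reproduced with a back-of-the-envelope calculation. The sketch below is only an illustration, not the method CCRL uses: it assumes evenly matched opponents (expected score 50%), a 95% confidence level, and a hypothetical draw rate of 32% chosen for the example. The function name `elo_error_margin` is made up for this sketch.

```python
import math


def elo_error_margin(games, draw_rate=0.32, z=1.96):
    """Approximate error margin (in Elo) after `games` games.

    Assumes an expected score of 0.5 and a fixed draw rate.
    Both are illustrative assumptions, not CCRL's actual method.
    """
    # Variance of a single game's score (1, 0.5 or 0) when wins and
    # losses are equally likely and draws occur with `draw_rate`.
    variance = 0.25 * (1.0 - draw_rate)
    std_error = math.sqrt(variance / games)
    # Slope of the logistic Elo curve at a 50% score:
    # d(Elo)/d(score) = 400 / (ln(10) * p * (1 - p)) with p = 0.5.
    elo_per_score = 400.0 / (math.log(10) * 0.25)
    return z * elo_per_score * std_error


for n in (200, 500, 1000, 2000):
    print(n, "games: about +/-", round(elo_error_margin(n)), "Elo")
```

The margin shrinks with the square root of the number of games, so quadrupling the games only halves the error. That is why even 2000 games still leave a noticeable margin.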
4CPU 64-bit Engines
Hiarcs 12 was again the focus of our testing this week.
After 400+ games Hiarcs 12 Sharpen PV=On (recommended for 40/40 time controls and longer) continues to lag behind Rybka 2.3.2a, Zappa Mexico II, Naum 3 and Deep Shredder 11, just ahead of Deep Fritz 10.1 and Toga II 1.4 beta5c.
Hiarcs 12 Sharpen PV=Off (default setting) also remains slightly ahead of the Sharpen PV=On setting.
Loop M1-T is next in the pecking order, ahead of the evenly matched pair of Glaurung 2.0.1 and Bright 0.3a.
Deep Junior 10, Deep Sjeng 2.7 and Scorpio 2.0 are the other well tested engines in this category.
2CPU Engines
With the emphasis of our multi-cpu testing on 4CPU as opposed to 2CPU, there are gaps in this category and some of the engines also require further games.
Testing of Hiarcs 12 in this category will continue in earnest once we've completed our 4CPU testing.
Rybka 2.3.2a holds top spot in this category also, with a 50+ ELO lead.
Naum 3 currently holds the edge over Zappa Mexico for second spot.
Deep Shredder is next, comfortably ahead of Deep Fritz 10 and Loop M1-T.
The current ratings of both Hiarcs 12 and Toga II 1.4 beta5c mean little as they require more games.
Deep Fritz 10.1 hasn't been tested in this category, but is likely to be better than Deep Fritz 10 as demonstrated quite clearly in the 4CPU ratings.
Glaurung 2.0.1, Bright 0.3a and Deep Junior 10 are closely grouped and have a sizeable advantage over Chessmaster 11.
Pharaon 3.5.1 is the only other engine with a reasonable number of games.
Single CPU Engines
Rybka 2.3.2a has an impressive 100 ELO lead over the closely grouped Deep Shredder 11, Naum 3, Zappa Mexico II and Fritz 11.
Deep Shredder 11 1CPU is 64-bit as opposed to Shredder 11 which can only be run as a 32-bit engine.
Toga II 3.1.2SE (the highest rated of the Togas tested in this category) currently holds a small edge over Hiarcs 12 Sharpen PV=On.
Next come Loop 13.6 and Fruit 2.3.1, with a comfortable gap between them and the closely grouped Bright 0.3a, Deep Sjeng 2.7, Spike 1.2 Turin and Glaurung 2.0.1.
Thinker 5.1c Passive has only 31 games and its current rating means little.
Junior 10.1 is sandwiched between the group just mentioned and the group of engines below that includes Ktulu 8.0, Chess Tiger 2007.1, SmarThink 1.00 and Frenzee Feb08.
Chessmaster 11, Scorpio 2.0, Alaric 707, Movei 00.8.438 (10 10 10) and Booot 4.14.0 comprise the next group of engines ahead of E.T Chess 13.01.08, SlowChess Blitz WV2.1, Ruffian 2.1.0, WildCat 8 and Delfi 5.2.
The current rating of GarboChess 2.11 should be ignored due to its small handful of games.
For Chessmaster enthusiasts, the testing of various settings to see how much gain can be made over the default settings is in full swing and can be followed here:
http://kirr.homeunix.org/chess/discussi ... f=7&t=3054
A Chessmaster tournament will be run soon, pitting the best of the bunch against each other.
Free Single CPU Engines
Toga II 3.1.2SE (the latest version of Toga that we've so far tested at 40/40) has possibly overtaken Rybka 1.0 as the top free engine, but it is very close.
Fruit 2.3.1 (the strongest publicly available version at present) comes in third, ahead of Bright 0.3a, Spike 1.2 Turin and Glaurung 2.0.1.
Thinker 5.1c Passive requires many more games before we can tell exactly where it stands in relation to these engines.
Naum 2.0 and Frenzee Feb08 are 40+ ELO further back.
Although the new Twisted Logic 20080404x has made a promising start, its current rating is hugely inflated and should drop as more games are played.
Scorpio 2.0, Alaric 707, Movei 00.8.438 (10 10 10) and Booot 4.14.0 come in next, ahead of E.T Chess 13.01.08, SlowChess Blitz WV2.1, WildCat 8, Zappa 1.1 and Delfi 5.2.
GarboChess 2.11 and Sloppy 0.2.0 could well be up with the latter group of engines, but nothing definite will be known until further extensive testing is completed.
We test a very extensive range of amateur engines (currently ranging down to the 2000 ELO level) through a range of tournaments, all of which can be followed in our public forum.
Our aim is of course to ensure that all engines lower on our lists get 200+ games.
Blitz Notes
There are currently 265,228 games in our 40/4 database.
The latest ratings can be found at one of the following links:
http://computerchess.org.uk/ccrl/404/
http://computerchess.org.uk/ccrl/404.live/
An enormous amount of work goes into the blitz list and it is well worth a visit.
Hiarcs 12 testing has started here also and early indications are that it performs comparatively better at blitz than at longer time controls, battling it out with Zappa Mexico as the third strongest engine behind Rybka 2.3.2a and Naum 3.
Of special interest to some will be the best free 1CPU engines list which is being constructed through a systematic testing approach as mentioned here:
http://kirr.homeunix.org/chess/discussi ... f=7&t=3271
FRC Notes
No news this week.
There are currently 26,100 games in the FRC 40/4 database.
Ray tests only those engines that can play FRC through the Shredder Classic GUI.
If engine authors have a new and stable version of their engine that will run under this GUI, they should contact Ray if they wish to see it tested.
Hiarcs 12 comes in third amongst the available engines behind Shredder 11 and Naum 3 (remembering of course that the top engine, Rybka 2.3.2 FRC, has remained private).
For FRC the best list to look at is the pure list.
http://www.computerchess.org.uk/ccrl/404FRC/
Stats/Presentation Notes
The LOS (likelihood of superiority) stats to the right hand side of each rating list tell you the likelihood, in percentage terms, of each engine being superior to the engine directly below it.
A list of games played this week per engine can be found in the update thread in the CCRL public forum.
All games are available for download by engine, by month or by ECO code.
ELO ratings are now saved in all game databases for those engines that have 200 games or more.
Clicking on an engine name will give details as to opponents played plus homepage links where applicable.
Custom lists of engines can be selected for comparison.
An openings report page lists the number of games played by ECO codes with draw percentage and White win percentage. Clicking on a column heading will sort the list by that column.
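For readers curious about what an LOS figure means, here is a minimal sketch of a commonly used normal approximation for a single head-to-head match. It is not how the rating-list figures are produced, since those come from a rating model fitted over all games. The function name `los` and the win/loss counts are invented for the example, and draws are ignored because they do not favour either side.

```python
import math


def los(wins, losses):
    """Likelihood of superiority from decisive games only (draws ignored).

    Normal approximation for one head-to-head match; an illustrative
    simplification, not the rating-list calculation.
    """
    decisive = wins + losses
    if decisive == 0:
        return 0.5  # no decisive games: no evidence either way
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * decisive)))


# Hypothetical results, chosen only for illustration.
print(f"{los(60, 40):.1%}")
print(f"{los(120, 60):.1%}")
```

With a large enough lead in decisive games the value gets so close to 1 that it displays as 100% after rounding, which is different from absolute certainty.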
CCRL update (11th April 2008)
gbanksnz at gmail.com
Re: CCRL update (11th April 2008)
"The LOS (likelihood of superiority) stats to the right hand side of each rating list tell you the likelihood in percentage terms of each engine being superior to the engine directly below them."
How is it calculated? Does it mean that we are 100% sure that Rybka is better than Zappa Mexico II (and the same for Naum 3 and Deep Shredder 11)?
Re: CCRL update (11th April 2008)
Golem wrote: How is it calculated? Does it mean that we are 100% sure that Rybka is better than Zappa Mexico II (same thing with Naum 3 and Deep Shredder 11)?
The calculations are a bit complicated, I'm afraid, but I think CCRL uses Bayeselo to do the task (http://remi.coulom.free.fr/Bayesian-Elo/).
For your second question: yes, we are 100% sure (over 99.5%, so we round up) that Rybka is stronger than Zappa Mexico. Some technical conditions must be respected for the statement to be true; they may be mentioned in the documentation if you want to verify them.
Re: CCRL update (11th April 2008)
Marc MP wrote: The calculations are a bit complicated I'm afraid but I think CCRL uses Bayeselo to do the task (http://remi.coulom.free.fr/Bayesian-Elo/).
I think you're right, but Kirill or Shaun are the best ones to confirm. I'm just into the testing side of things.
Re: CCRL update (11th April 2008)
Graham Banks wrote: I think you're right, but Kirill or Shaun are the best to confirm. I'm just into the testing side of things.
I found the reference, Graham. It is on the "Thanks" page: http://www.computerchess.org.uk/ccrl/4040/thanks.html
Toward the end, we can read:
"We want to thank Remi Coulom for creating a wonderful tool Bayeselo. We use it to compute our rating lists, LOS figures and performances."
Also:
"Kirill Kryukov and Shaun Brewer automated statistical analysis and designed the web-site, with lot of useful feedback from other team members."
Re: CCRL update (11th April 2008)
Marc MP wrote: The calculations are a bit complicated I'm afraid but I think CCRL uses Bayeselo to do the task (http://remi.coulom.free.fr/Bayesian-Elo/). For your second question, yes we are 100% (over 99.5% so we round up) sure that Rybka is stronger than Zappa Mexico. There are some technical details that must be respected for the statement being true, maybe they are mentioned in the documentation if you are interested to verify them.
Thank you for the link and the explanation. I'll read all of this when I have some time.