The latest updates of the CCRL Rating Lists and Statistics are available for viewing at:
http://www.computerchess.org.uk/ccrl/4040/ (40/40)
http://computerchess.org.uk/ccrl/404/ (40/4)
The live link to the 40/4 list given below is currently the most up to date for that list.
The lists are sometimes updated during the week, and these updates can be viewed here:
http://www.computerchess.org.uk/ccrl/4040.live/ (40/40)
http://computerchess.org.uk/ccrl/404.live/ (40/4)
However, no game downloads are available from these live links.
The links to the various rating lists can be found just beneath the default Best Versions list.
For example, there is a 32-bit Single CPU list.
Our 40 moves in 40 minutes (repeating) and 40 moves in 4 minutes (repeating) time controls are both adjusted to the speed of an AMD64 X2 4600+ (2.4GHz).
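The post does not say how the adjustment to the reference machine is done. As a rough illustration only, here is a minimal Python sketch that assumes the time is scaled in proportion to a benchmark speed ratio. The function name and benchmark figures are hypothetical, not CCRL's actual procedure.

```python
def adjusted_minutes(base_minutes: float, tester_speed: float, reference_speed: float) -> float:
    """Scale a base time control to a tester's machine.

    Assumption (hypothetical): the adjustment is proportional to a benchmark
    speed ratio, so a machine that is k times faster than the reference gets
    1/k of the base time.
    """
    if tester_speed <= 0 or reference_speed <= 0:
        raise ValueError("benchmark speeds must be positive")
    return base_minutes * reference_speed / tester_speed


# Made-up benchmark figures: the tester's machine is 1.5x faster than the
# reference AMD64 X2 4600+.
REFERENCE_SPEED = 1_000_000
TESTER_SPEED = 1_500_000

for base in (40, 4):
    minutes = adjusted_minutes(base, TESTER_SPEED, REFERENCE_SPEED)
    print(f"40/{base} on the reference machine -> 40 moves in {minutes:.2f} minutes")
```

Under the same assumption, a machine half as fast as the reference would get twice the base time.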
Currently active testers are:
Graham Banks, Ray Banks, Shaun Brewer, Kirill Kryukov, Dom Leste, Tom Logan, Wassim Saeed, Charles Smith, George Speight and Gabor Szots.
Currently inactive testers are:
Sarah Bird, Andreas Schwartmann, Chris Taylor, Martin Thoresen and Chuck Wilson.
Be aware that in the early stages of testing, an engine's rating can often fluctuate a lot.
It is strongly advised to also look at the many other rating lists available in order to get a more accurate overall picture of an engine's rating relative to others.
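To see why early ratings move around so much, here is a minimal Python sketch of an approximate error margin for a performance estimate. It uses a simple normal approximation with made-up results. The CCRL lists themselves are computed with Bayeselo, so this only illustrates how the uncertainty shrinks as games accumulate.

```python
import math


def elo_from_score(p: float) -> float:
    """Elo difference implied by an average score p (0 < p < 1), logistic model."""
    return 400.0 * math.log10(p / (1.0 - p))


def elo_margin(wins: int, draws: int, losses: int, z: float = 1.96) -> float:
    """Approximate +/- Elo margin (about 95% for z=1.96) around a performance.

    Normal approximation: standard error of the mean game score, converted to
    Elo via the slope of the logistic curve at the observed score.
    """
    n = wins + draws + losses
    if n == 0:
        raise ValueError("no games")
    p = (wins + 0.5 * draws) / n
    if not 0.0 < p < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    # Variance of one game's score (1, 0.5 or 0): E[X^2] - p^2
    var = (wins + 0.25 * draws) / n - p * p
    se = math.sqrt(var / n)
    # Derivative of 400*log10(p/(1-p)) with respect to p
    slope = 400.0 / (math.log(10.0) * p * (1.0 - p))
    return z * se * slope


# Same 55% score (hypothetical results) after 40 and after 400 games.
for wins, draws, losses in [(14, 16, 10), (140, 160, 100)]:
    n = wins + draws + losses
    p = (wins + 0.5 * draws) / n
    print(f"{n:>3} games: {elo_from_score(p):+.0f} Elo, +/- {elo_margin(wins, draws, losses):.0f}")
```

With ten times as many games at the same score, the margin shrinks by a factor of about 3.2 (the square root of 10).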
40/40 Notes
There are currently just under 150,000 games in the 40/40 database.
4CPU 64-bit Engines
Rybka 3 is almost 150 elo clear at the top, ahead of both Naum 3.1 and Zappa Mexico II which are very even in strength.
50+ elo further back is a closely bunched group that includes Deep Shredder 11, Deep Sjeng 3.0, Toga II 1.4.1SE, Hiarcs 12, Bright 0.4a (private) and Deep Fritz 10.1. We have not tested Deep Sjeng WC2008 in this category yet.
Glaurung 2.1 and Loop M1-T are next in the pecking order.
The relative ratings of the 2CPU engines that have been well tested are pretty much the same as their 4CPU counterparts.
Single CPU Engines
Rybka 3 is roughly 190 elo ahead of other engines in this category. Although more games are still needed, it seems apparent that the default settings are better than both the dynamic and human settings.
Naum 3.1, Zappa Mexico II and Fritz 11 are all pretty close in strength. It is expected that Deep Sjeng WC2008 will join them once we've tested it more extensively.
There is a small margin back to Shredder 11 and Toga II 1.4.1SE.
Hiarcs 12 is further back still, ahead of the group that includes Bright 0.4a (private), Fruit 2.3.1, Loop 13.6, Glaurung 2.1, Cyclone 1.0 and Thinker 5.1e Passive.
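To put a gap such as "roughly 190 elo" into perspective, the standard logistic Elo formula converts a rating difference into an expected score. This Python sketch is a simplification: Bayeselo's model also accounts for White's advantage and draws, so the numbers are only indicative.

```python
def expected_score(elo_diff: float) -> float:
    """Expected score of the higher-rated side under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


for gap in (50, 150, 190, 200):
    print(f"{gap:>3} elo gap -> expected score {expected_score(gap):.1%}")
```

Under this model, a 190 elo gap corresponds to the stronger engine scoring about three points out of four.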
Free Single CPU Engines
Rybka 2.2 heads the field with a 50+ elo gap back to Toga II 1.4.1SE.
There is a similar gap back to Fruit 2.3.1, Glaurung 2.1, Cyclone 1.0 and Thinker 5.2e Passive. Spike 1.2 Turin and Bright 0.3a are a further 20+ elo behind, but clearly stronger than the next group that includes Frenzee Feb08, Twisted Logic 20080620 and Delfi 5.4.
CCRL tests a wide range of free engines, ranging right down to the 1900 elo level. The intention is to get well over 200 games for each of these engines. This rating list is certainly our most extensive one.
Recently released engines that seem to have made big strides are Twisted Logic, Cyrano, DanaSah, Rotor, Pupsi and NanoSzachy.
Blitz Notes
An enormous amount of work goes into the blitz list, and with over 350,000 games in the database, it is well worth a visit.
Of special interest to some will be the best free 1CPU engines list, which is being constructed through a systematic testing approach, as mentioned here:
http://kirill-kryukov.com/chess/discuss ... f=7&t=3271
FRC Notes
Ray tests only those engines that can play FRC through the Shredder Classic GUI.
If engine authors have a new and stable version of their engine that will run under this GUI, they should contact Ray if they wish to see it tested.
Rybka 3 has a massive 200 elo lead over the closely grouped Shredder 11, Naum 3.1 and Deep Sjeng 3.0.
Hiarcs Paderborn 2007 in fifth spot is well ahead of Fruit 051103 and Loop 10.32f (the most recent Loop version that could play FRC).
For FRC, the best list to look at is the pure list.
http://www.computerchess.org.uk/ccrl/404FRC/
Stats/Presentation Notes
The LOS (likelihood of superiority) stats on the right-hand side of each rating list give, as a percentage, the likelihood that each engine is superior to the engine directly below it.
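A common closed-form approximation for the LOS of one engine over another, from their head-to-head wins W and losses L, is LOS = 0.5 * (1 + erf((W - L) / sqrt(2(W + L)))). The CCRL figures are produced by Bayeselo from the whole game database, so this Python sketch with made-up counts only illustrates the idea and is not how the lists are computed.

```python
import math


def los(wins: int, losses: int) -> float:
    """Likelihood of superiority from a head-to-head result (normal approximation).

    Draws do not enter this formula. Returns 0.5 when there are no decisive games.
    """
    decisive = wins + losses
    if decisive == 0:
        return 0.5
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * decisive)))


# Hypothetical match: 60 wins, 40 losses (draws ignored by this formula).
print(f"LOS: {los(60, 40):.1%}")
```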
All games are available for download by engine, by month or by ECO code.
Elo ratings are now saved in all game databases for those engines that have 200 games or more.
Clicking on an engine name will give details as to opponents played plus homepage links where applicable.
Custom lists of engines can be selected for comparison.
An openings report page lists the number of games played by ECO codes with draw percentage and White win percentage. Clicking on a column heading will sort the list by that column.
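The openings report described above boils down to grouping results by ECO code. Here is a minimal Python sketch of that aggregation and of sorting by a chosen column. The function names, column keys and sample games are hypothetical; the site's own implementation is not described in the post.

```python
from collections import defaultdict

# Hypothetical column keys, mapped to positions in each report row.
COLUMNS = {"eco": 0, "games": 1, "draw_pct": 2, "white_win_pct": 3}


def openings_report(games):
    """Summarise (eco, result) pairs by ECO code.

    result uses PGN notation: "1-0", "0-1" or "1/2-1/2".
    Returns rows of (eco, games, draw_pct, white_win_pct).
    """
    counts = defaultdict(lambda: {"games": 0, "draws": 0, "white_wins": 0})
    for eco, result in games:
        c = counts[eco]
        c["games"] += 1
        if result == "1/2-1/2":
            c["draws"] += 1
        elif result == "1-0":
            c["white_wins"] += 1
    rows = []
    for eco, c in counts.items():
        n = c["games"]
        rows.append((eco, n, 100.0 * c["draws"] / n, 100.0 * c["white_wins"] / n))
    return rows


def sort_report(rows, column, descending=True):
    """Sort rows by one column, like clicking a column heading."""
    return sorted(rows, key=lambda r: r[COLUMNS[column]], reverse=descending)


sample = [("B90", "1-0"), ("B90", "1/2-1/2"), ("C42", "1/2-1/2"),
          ("C42", "1/2-1/2"), ("E97", "0-1")]
for row in sort_report(openings_report(sample), "draw_pct"):
    print("%s  %3d games  draws %5.1f%%  White wins %5.1f%%" % row)
```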
CCRL update (31st October 2008)
Moderator: Ras
-
Graham Banks
- Posts: 45323
- Joined: Sun Feb 26, 2006 10:52 am
- Location: Auckland, NZ
CCRL update (31st October 2008)
gbanksnz at gmail.com
-
IWB
- Posts: 1539
- Joined: Thu Mar 09, 2006 2:02 pm
Re: CCRL update (31st October 2008)
Hello Graham
Just a question for understanding:
Have a look here:
http://www.computerchess.org.uk/ccrl/40 ... ons_only=1
and here:
http://www.computerchess.org.uk/ccrl/40 ... e_cpu.html
One list is "Single CPU engines", the other is "Pure single CPU engines", but they have different rankings (positions 5, 6, 7 and maybe others).
Unfortunately I have to say that every time I look at your list I find some "flaws". As soon as you look into the details, it becomes obvious that your concept of different testers + different hardware + different times + different opponents + different numbers of games for each entry seems "suspicious".
Sorry, but please have a look at all of your basic test conditions. This really puzzles me, as I am trying to find a good and reliable testing method for engines.
Nevertheless, I have a very high respect for the amount of work you are investing. Keep up the good work (but check your methods).
Ingo Bauer
-
Graham Banks
Re: CCRL update (31st October 2008)
Hi Ingo,
the "pure" lists only include games played amongst the engines on those lists.
Cheers, Graham.
-
IWB
Re: CCRL update (31st October 2008)
Hello
That doesn't really help. I see different numbers of games in the lists, but does this mean that the Single CPU list includes games against engines which are NOT in the list? I did not check this, but whom do you expect to understand that?
Now completely puzzled
Ingo
-
Graham Banks
Re: CCRL update (31st October 2008)
From the notes at the top of each pure list:
"Pure" list removes rating distortion
"Pure" list is computed to remove the distortion that may affect the main rating list. Distortion appears when several versions or settings of the same engine are included together in the testing study. Suppose you have engine A and several versions of engine B: B1, B2, B3. Suppose also that A is particularly strong against every version of B, which often happens in real testing because of some characteristics of those engines. In such a case, A will get a higher rating than in a study where only one version of B is present. The same thing may happen when A is weak against B, giving A a lower rating.
To remove that distortion, a separate game database is constructed from games played only by the best version of each engine "family". To save some space and time, the pure database has all moves stripped out; it contains PGN headers and results only. The "Pure list" is then computed from that "pure" database using Bayeselo.
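The filtering step in these notes can be sketched in a few lines of Python. Everything here is hypothetical: the `Game` record, the `family` and `best` mappings and the engine names stand in for whatever PGN tooling is actually used. The average score is only a crude stand-in to show the weighting effect; it is not how Bayeselo computes ratings.

```python
from typing import Dict, List, NamedTuple


class Game(NamedTuple):
    white: str
    black: str
    result: str  # "1-0", "0-1" or "1/2-1/2"
    moves: str   # movetext; dropped in the pure database


def build_pure_db(games: List[Game], family: Dict[str, str], best: Dict[str, str]) -> List[Game]:
    """Keep only games where both players are the best version of their family,
    and strip the moves (headers and result only)."""
    def is_best(engine: str) -> bool:
        return best[family[engine]] == engine
    return [g._replace(moves="") for g in games if is_best(g.white) and is_best(g.black)]


def score(games: List[Game], engine: str) -> float:
    """Average score of `engine` over the games it played."""
    points = {"1-0": (1.0, 0.0), "0-1": (0.0, 1.0), "1/2-1/2": (0.5, 0.5)}
    total, n = 0.0, 0
    for g in games:
        if engine in (g.white, g.black):
            w, b = points[g.result]
            total += w if g.white == engine else b
            n += 1
    if n == 0:
        raise ValueError(f"{engine} played no games")
    return total / n


# Made-up example: A beats every version of B and draws with C.
games = [
    Game("A", "B1", "1-0", "placeholder movetext"),
    Game("A", "B2", "1-0", "placeholder movetext"),
    Game("A", "B3", "1-0", "placeholder movetext"),
    Game("A", "C", "1/2-1/2", "placeholder movetext"),
    Game("C", "A", "1/2-1/2", "placeholder movetext"),
]
family = {"A": "A", "B1": "B", "B2": "B", "B3": "B", "C": "C"}
best = {"A": "A", "B": "B3", "C": "C"}

pure = build_pure_db(games, family, best)
print(f"A's score, all games: {score(games, 'A'):.0%}")
print(f"A's score, pure db:   {score(pure, 'A'):.0%}")
```

With all versions of B included, the B games make up most of A's record and push its score up; the pure database keeps one game against B3 and the two games against C.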
-
IWB
Re: CCRL update (31st October 2008)
Hello Graham
This is the top of the pure list as I see it:
-------------------------------------------------------------------------------
CCRL 40/40
Downloads and Statistics
October 31, 2008
Testing summary:
Total: 147'043 games
played by 531 programs
21374 CPU days (X2 4600+)
White wins: 53'156 (36.1%)
Black wins: 39'891 (27.1%)
Draws: 53'996 (36.7%)
White score: 54.5%
Pure list for single-CPU engines
Pure database download
To save space, the pure database has all moves stripped out; it contains PGN headers and results only. This pure database is useful only for rating calculation or similar analysis; it does not contain the actual games, only the results.
Download pure database, 7'620 games: 0.02 MB
CCRL 40/40 Rating List -- Pure single-CPU engines
------------------------------------------------------------------------------------------------------------------
At first check, there is nothing there like what you described. But OK, I get the principle now!
One question:
Suppose engine A is very good against every version of engine B, while engine C is very bad against every version of engine B, and engine A is very bad against all versions of engine C. Now your pure list has a problem, because you only count the "best" versions, while it might be that engine A is very good against anything other than the lower versions of C.
Yes, this is a constructed case, but valid anyhow!
I have to think about this myself!
Again: I know about the amount of work you are doing and I appreciate that!
Bye
Ingo
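The percentages in the testing summary quoted above follow directly from the raw result counts, with White's score counting each draw as half a point. This small Python sketch (the function name is hypothetical) recomputes them.

```python
def summarise(white_wins: int, black_wins: int, draws: int) -> dict:
    """Recompute the testing-summary percentages from raw result counts."""
    total = white_wins + black_wins + draws
    return {
        "total": total,
        "white_wins_pct": 100.0 * white_wins / total,
        "black_wins_pct": 100.0 * black_wins / total,
        "draws_pct": 100.0 * draws / total,
        # White's score counts each draw as half a point.
        "white_score_pct": 100.0 * (white_wins + 0.5 * draws) / total,
    }


summary = summarise(53_156, 39_891, 53_996)
print("Total games:", summary["total"])
for key in ("white_wins_pct", "black_wins_pct", "draws_pct", "white_score_pct"):
    print(f"{key}: {summary[key]:.1f}%")
```

The three counts add up to the quoted total of 147'043 games, and the rounded percentages match the summary (36.1%, 27.1%, 36.7% and a White score of 54.5%).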
-
Graham Banks
Re: CCRL update (31st October 2008)
Hi Ingo,
Kirill is more the statistician than I am, so I'll get him to answer your questions.
We don't mind people asking questions or posting constructive criticism about what we do.
We fully realise that our rating lists will have some anomalies. All rating lists do, which is why we recommend that all rating lists should be looked at to provide an overall picture of engines' strength relative to each other.
Regards, Graham.