CCRL update (8th September 2007)

Discussion of computer chess matches and engine tournaments.

Moderator: Ras

Graham Banks
Posts: 44581
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

CCRL update (8th September 2007)

Post by Graham Banks »

The September 8th update of the CCRL Rating Lists and Statistics is now available for viewing at:
http://www.computerchess.org.uk/ccrl/4040/

The links to the various rating lists can be found just beneath the default Best Versions list.
For example there is a 32-bit Single CPU list.

Our standard testing is at 40 moves in 40 minutes repeating while our current blitz testing is at both 40 moves in 4 minutes repeating and 40 moves in 12 minutes repeating, all adjusted to the AMD64 X2 4600+ (2.4GHz).

Currently active testers in our team are:
Graham Banks, Ray Banks, Shaun Brewer, Kirill Kryukov, Dom Leste, Tom Logan, Andreas Schwartmann, Charles Smith, George Speight, Chris Taylor, Chuck Wilson, Gabor Szots and Martin Thoresen.

A big thanks to all testers as usual for their efforts this week.


40/40 Notes

There are currently 73,174 games in our 40/40 database.

Many engines on our list have few games and in many cases their ratings are likely to fluctuate (markedly for some) until a lot more games are played. Therefore no conclusions should be drawn about their strength yet.
To illustrate this point, when an engine has 200 games played, the error margin is still approximately ±40 ELO; after 500 games it is ±25 ELO, after 1000 games ±17 ELO, and even after 2000 games there is a ±13 ELO error margin!
This of course highlights the importance of looking at other rating lists that are also available in order to draw comparisons and get a more accurate overall picture.
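The way the error margin shrinks with the number of games can be sketched with a simple normal approximation. This is only an illustration, not how the CCRL lists are actually computed: the function name `elo_margin`, the 40% draw rate, the 50% score and the 95% confidence level (z = 1.96) are all assumptions made for the example.

```python
import math

def elo_margin(games, draw_rate=0.4, z=1.96):
    """Approximate +/- Elo error margin after `games` games.

    Assumes (hypothetically) an engine scoring 50% against equal
    opposition with the given draw rate, and a normal approximation.
    """
    score = 0.5
    win_rate = score - draw_rate / 2
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    variance = win_rate * 1.0 + draw_rate * 0.25 - score ** 2
    # Standard error of the average score over all games.
    std_error = math.sqrt(variance / games)
    # Slope of the logistic Elo curve: Elo points per unit of score.
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return z * slope * std_error

for n in (200, 500, 1000, 2000):
    print(n, round(elo_margin(n)))
```

Under these assumptions the margins come out in the same ballpark as the figures quoted above, and the margin halves only when the number of games is quadrupled, which is why ratings based on a few games move around so much.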


Multi CPU Engines

Rybka 2.3.2a 64-bit 4CPU predictably rules the roost.

Zap!Chess Zanzibar 64-bit 4CPU is clearly number 2 ahead of Naum 2.2 64-bit 4CPU and Hiarcs 11.1 4CPU.
Hiarcs 11.2 4CPU still requires more games.

Loop M1-T 64-bit 4CPU is next in the pecking order, ahead of the oldies - Deep Shredder 10 64-bit 4CPU, Deep Fritz 10 4CPU and Deep Junior 10 4CPU.

Glaurung 2 epsilon/5 64-bit 2CPU is the strongest free engine on this list.


Single CPU Engines

Rybka 2.3.2a leads the ratings here as well, although by a slightly larger margin.

Toga II 1.3.1, Loop M1-T, Zap!Chess Zanzibar, Hiarcs 11.1 and Naum 2.2 are all very close in strength. There is no clear second-strongest engine at this stage.
Even though Deep Sjeng 2.7 has only a handful of games at this stage, watch out for it to be up with this group. It seems to be a nice improvement over Deep Sjeng 2.5.

Fritz 10, Fruit 051103, Fruit 2.3.1, Shredder 10 and Strelka 1.8 are the next group of engines, just a little further back, but the margin is almost negligible (20 ELO).

Spike 1.2 Turin and Junior 10 are no longer up there with the top engines, if you can base that on a 50 ELO point difference.

Likewise there is a similar gap in strength back to Ktulu 8.0, SmarThink 1.00, Glaurung 2 epsilon/5 and Chess Tiger 2007.1.


Free Engines

Rybka 1.0 retains its crown as the top free engine ahead of Toga II 1.3.1.

Fruit 051103 and Fruit 2.3.1 both need many more games before we can state with any certainty which is the stronger of the two.
Whether or not they can stay ahead of Strelka 1.8 remains to be seen.

Spike 1.2 Turin is 20 ELO further back.

Glaurung 2 epsilon/5 and Naum 2.0 are next, ahead of Alaric 707, Scorpio 1.91, Delfi 5.1 and SlowChess Blitz WV2.1.

Zappa 1.1, Movei 0.08.438, WildCat 7, Pro Deo 1.2 and List 512 are further back.
The latest Movei is a good improvement over previous versions.

As we make our way down the list, it should be noted that the most recent versions of Booot, Hamsters, DanaSah and Natwarlal seem to have made good gains over previous versions.
Others to keep an eye on as they get more games are the latest versions of BugChess2 and Popochin.

We test a very extensive range of amateur engines through our Amateur Championship divisions (32-bit 1CPU) plus other tournaments, all of which can be followed in our public forum.

Our aim is of course to ensure that all engines lower on our lists get at least 200 games.


Blitz Notes

There are currently 164,174 games in our 40/4 database.

The 40/4 update is usually done separately to our 40/40 update.
There has been no update this week because Shaun is on vacation.
http://computerchess.org.uk/ccrl/404/


FRC Notes

There are currently 19,400 games in the FRC 40/4 database.

Ray tests only those engines that can play FRC through the Shredder Classic GUI.
If engine authors have a new and stable version of their engine that will run under this GUI, they should contact Ray if they wish to see it tested.

Although Rybka 2.3.2 FRC tops the list, it is a private engine, so Hiarcs 11.1 is still the best available FRC engine.

Ray has recently tested Rybka 2.3.2, Hiarcs 11.2, Naum 2.2, Fruit 2.3, Fruit 051103, Hamsters 0.4, Hermann 2.0 and Movei 0.08.438.
All are now included in the ratings.

Deep Sjeng 2.7 is currently being tested, and with testing almost complete, looks likely to be a 70+ ELO improvement over the previous version.
The FRC list will be updated during this weekend.

For FRC the best list to look at is the pure list.
http://www.computerchess.org.uk/ccrl/404FRC/


Stats/Presentation Notes

The LOS stats to the right hand side of each rating list are "likelihood of superiority" stats. They tell you the likelihood in percentage terms of each engine being superior to the engine directly below them.
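As a rough illustration of what an LOS figure means, it can be approximated from two ratings and their error margins. This sketch is an assumption-laden simplification, not the calculation behind the CCRL lists: it treats the two rating errors as independent and normally distributed, reads each ± margin as a 95% interval, and the function name `los_percent` is invented for the example.

```python
import math

def los_percent(rating_a, margin_a, rating_b, margin_b, z=1.96):
    """Approximate likelihood (in %) that engine A is stronger than B.

    Each margin is read as a symmetric 95% error bar, so the standard
    deviation is margin / z. Errors are assumed independent and normal.
    """
    sigma_a = margin_a / z
    sigma_b = margin_b / z
    sigma_diff = math.sqrt(sigma_a ** 2 + sigma_b ** 2)
    x = (rating_a - rating_b) / sigma_diff
    # Standard normal cumulative distribution via the error function.
    return 100.0 * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Hypothetical engines: equal ratings, then a 50 point gap.
print(los_percent(3000, 20, 3000, 20))
print(los_percent(3050, 20, 3000, 20))
```

Two engines with identical ratings come out at 50%, while a gap that is large compared with the error margins pushes the figure towards 100%.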

A list of games played this week per engine can be found in the update thread in the CCRL public forum, accessible through the link given at the top of this post.

All games are available for download through the link given at the top of this post. They can be downloaded by engine or by month.
ELO ratings are now saved in all game databases for those engines that have 200 games or more.

Clicking on an engine name will give details as to opponents played plus homepage links where applicable.
New to the web site:
Clicking on an engine name now also gives you a ratings history graph for that engine over time (a bit further down the page). The green line is the actual rating. The red lines are the upper and lower error bars, and the blue line represents the number of games. We think this is really cool :-)

Custom lists of engines can be selected for comparison.

An openings report page (link at bottom of index page) lists the number of games played by ECO codes with draw percentage and White win percentage. Clicking on a column heading will sort the list by that column.
Games can now be downloaded by ECO code.
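For anyone who wants to build a similar openings summary from downloaded games, the tally can be sketched as below. The input format (a list of ECO code and result pairs) and the names `openings_report` and `sort_report` are assumptions for the example; this is not the code behind the web site.

```python
from collections import defaultdict

def openings_report(games):
    """Summarise (eco_code, result) pairs per ECO code.

    `result` uses PGN result strings: "1-0", "0-1" or "1/2-1/2".
    """
    tally = defaultdict(lambda: {"games": 0, "draws": 0, "white_wins": 0})
    for eco, result in games:
        row = tally[eco]
        row["games"] += 1
        if result == "1/2-1/2":
            row["draws"] += 1
        elif result == "1-0":
            row["white_wins"] += 1
    report = []
    for eco, row in tally.items():
        n = row["games"]
        report.append({
            "eco": eco,
            "games": n,
            "draw_pct": 100.0 * row["draws"] / n,
            "white_win_pct": 100.0 * row["white_wins"] / n,
        })
    return report

def sort_report(report, column, descending=True):
    """Sort the report by one column, like clicking a column heading."""
    return sorted(report, key=lambda r: r[column], reverse=descending)

# Made-up sample data, for illustration only.
sample = [("B90", "1-0"), ("B90", "1/2-1/2"), ("C42", "1/2-1/2"),
          ("B90", "0-1"), ("B90", "1-0")]
for row in sort_report(openings_report(sample), "games"):
    print(row)
```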
gbanksnz at gmail.com
Dirt
Posts: 2851
Joined: Wed Mar 08, 2006 10:01 pm
Location: Irvine, CA, USA

Re: CCRL update (8th September 2007)

Post by Dirt »

Graham Banks wrote:New to the web site
Clicking on an engine name now also gives you a ratings history graph for that engine over time (a bit further down the page). The green line is the actual rating. The red lines are the upper and lower error bars, and the blue line represents the number of games. We think this is really cool :-)
Very nice indeed, although the transparent background makes the graphs look horrible with my dark background choice. It took me a moment to realize the blue line (number of games) scales from zero to one hundred percent.

While you've explained the graph, an explanation on the page would be nice. Even just changing the title to "Daily rating, confidence range, and total games" would do.
Graham Banks
Posts: 44581
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: CCRL update (8th September 2007)

Post by Graham Banks »

Dirt wrote:
Graham Banks wrote:New to the web site
Clicking on an engine name now also gives you a ratings history graph for that engine over time (a bit further down the page). The green line is the actual rating. The red lines are the upper and lower error bars, and the blue line represents the number of games. We think this is really cool :-)
Very nice indeed, although the transparent background makes the graphs look horrible with my dark background choice. It took me a moment to realize the blue line (number of games) scales from zero to one hundred percent.

While you've explained the graph, an explanation on the page would be nice. Even just changing the title to "Daily rating, confidence range, and total games" would do.
Hi Greg,

I'll draw your request to Kirill's attention. 8-)

Regards, Graham.
gbanksnz at gmail.com
Ovyron
Posts: 4562
Joined: Tue Jul 03, 2007 4:30 am

Re: CCRL update (8th September 2007)

Post by Ovyron »

Graham Banks wrote:the blue line represents the number of games.
I am confused. On the main page I read that (for example) Rybka 2.3.2a 64bit 4CPU has 507 games; however, the blue line in the graph points to the number 3220, which would mean 3220 games. Maybe I'm the only one confused by this?

EDIT - I think showing the games as a percentage is a bit meaningless; showing the actual number of games over time would be more meaningful.
Your beliefs create your reality, so be careful what you wish for.
Werner
Posts: 2991
Joined: Wed Mar 08, 2006 10:09 pm
Location: Germany
Full name: Werner Schüle

Re: CCRL update (8th September 2007)

Post by Werner »

Hi Graham,
I think you have very nice graphics and statistics on your site!

The new Deep Sjeng scores really well. But I am afraid it cannot hold its present ranking near Hiarcs 11.1 4CPU :D :D

4 Hiarcs 11.1 4CPU 2985 +22 −22 54.9% −32.1 43.2% 669 50.2%
Deep Sjeng 2.7 1CPU 2984 +270 −206 83.3% −185.0 33.3% 6

(of course, not very many games yet...)
Werner
Graham Banks
Posts: 44581
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: CCRL update (8th September 2007)

Post by Graham Banks »

Werner wrote:Hi Graham,
I think you have very nice graphics and statistics on your site!

The new Deep Sjeng scores really well. But I am afraid it cannot hold its present ranking near Hiarcs 11.1 4CPU :D :D

4 Hiarcs 11.1 4CPU 2985 +22 −22 54.9% −32.1 43.2% 669 50.2%
Deep Sjeng 2.7 1CPU 2984 +270 −206 83.3% −185.0 33.3% 6

(of course, not very many games yet...)
Hello Werner,

Deep Sjeng's rating will come down for sure! :wink:

According to 1000+ games in Ray's FRC testing, Deep Sjeng 2.7 is 80 ELO ahead of Deep Sjeng 2.5.

I suspect that at 40/40 it's more likely to be about a 50 point difference at best. Ready to eat humble pie again if I'm wrong! :P :lol:

Regards, Graham.
gbanksnz at gmail.com
Kirill Kryukov
Posts: 518
Joined: Sun Mar 19, 2006 4:12 am
Full name: Kirill Kryukov

Re: CCRL update (8th September 2007)

Post by Kirill Kryukov »

Greg Simpson wrote:Very nice indeed, although the transparent background makes the graphs look horrible with my dark background choice. It took me a moment to realize the blue line (number of games) scales from zero to one hundred percent.
You know, it's like coming into a shop wearing dark glasses and complaining that everything looks dark. Our web pages specify a background color. If you override it with a color of your own, then who is to blame if the result is unreadable? BTW, we use colors to communicate information too, not only for decoration, so when you change the colors you may also be changing the meaning of those pages.
Greg Simpson wrote:While you've explained the graph, an explanation on the page would be nice. Even just changing the title to "Daily rating, confidence range, and total games" would do.
Yes we are planning to add the explanations too. Thanks for comments!
Kirill Kryukov
Posts: 518
Joined: Sun Mar 19, 2006 4:12 am
Full name: Kirill Kryukov

Re: CCRL update (8th September 2007)

Post by Kirill Kryukov »

Ulysses P. wrote:
Graham Banks wrote:the blue line represents the number of games.
I am confused. On the main page I read that (for example) Rybka 2.3.2a 64bit 4CPU has 507 games; however, the blue line in the graph points to the number 3220, which would mean 3220 games. Maybe I'm the only one confused by this?
It seems you are not the only one. We will add an explanation in future updates. :-)
Ulysses P. wrote:EDIT - I think showing the games as a percentage is a bit meaningless; showing the actual number of games over time would be more meaningful.
Sorry, I'm not sure what you mean here. Will the resulting graph be different from our current ones, or only the axis markings?
Dirt
Posts: 2851
Joined: Wed Mar 08, 2006 10:01 pm
Location: Irvine, CA, USA

Re: CCRL update (8th September 2007)

Post by Dirt »

Kirill Kryukov wrote:You know, it's like coming into a shop wearing dark glasses and complaining that everything looks dark. Our web pages specify a background color. If you override it with a color of your own, then who is to blame if the result is unreadable? BTW, we use colors to communicate information too, not only for decoration, so when you change the colors you may also be changing the meaning of those pages.
I didn't ask you to change that. As I find bright backgrounds intolerable, I don't allow them, and I accept the occasional unreadable graphic. Since my preferences are unusual (but not unique: Microsoft provides a high contrast color scheme for a reason), I don't expect people to change things to accommodate me. On the other hand, I don't see what you gain by using a transparent PNG file when, for legibility, the background pretty much has to be a particular color, although there could easily be some issue that I am unaware of.
Kirill Kryukov
Posts: 518
Joined: Sun Mar 19, 2006 4:12 am
Full name: Kirill Kryukov

Re: CCRL update (8th September 2007)

Post by Kirill Kryukov »

Dirt wrote:I didn't ask you to change that. As I find bright backgrounds intolerable, I don't allow them, and I accept the occasional unreadable graphic. Since my preferences are unusual (but not unique: Microsoft provides a high contrast color scheme for a reason), I don't expect people to change things to accommodate me. On the other hand, I don't see what you gain by using a transparent PNG file when, for legibility, the background pretty much has to be a particular color, although there could easily be some issue that I am unaware of.
Actually I was not aware that we use PNG with a transparent background. :-) I agree that it does not make much sense. I will see whether I can change it to a normal opaque background, though it is pretty low on the priority list; I hope you'll understand. Thanks for your feedback!