CCRL update (25th August 2007)

Graham Banks
Posts: 41455
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Post by Graham Banks »

The August 25th update of the CCRL Rating Lists and Statistics is now available for viewing at:
http://www.computerchess.org.uk/ccrl/4040/

The links to the various rating lists can be found just beneath the default Best Versions list.
For example there is a 32-bit Single CPU list.

Our standard testing is at 40 moves in 40 minutes repeating while our current blitz testing is at both 40 moves in 4 minutes repeating and 40 moves in 12 minutes repeating, all adjusted to the AMD64 X2 4600+ (2.4GHz).

Currently active testers in our team are:
Graham Banks, Ray Banks, Shaun Brewer, Kirill Kryukov, Dom Leste, Tom Logan, Andreas Schwartmann, Charles Smith, George Speight, Chris Taylor, Chuck Wilson, Gabor Szots and Martin Thoresen.

A big thanks to all testers as usual for their efforts this week.


40/40 Notes

There are currently 71,846 games in our 40/40 database.

Many engines on our list have few games and in many cases their ratings are likely to fluctuate (markedly for some) until a lot more games are played. Therefore no conclusions should be drawn about their strength yet.
To illustrate this point, when an engine has played 200 games, the error margin is still approximately ±40 ELO; after 500 games it is ±25 ELO, after 1000 games ±17 ELO, and even after 2000 games there is a ±13 ELO error margin!
This of course highlights the importance of looking at other rating lists that are also available in order to draw comparisons and get a more accurate overall picture.
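
The error margins quoted above shrink roughly with the square root of the number of games. Here is a minimal sketch of that relationship. It assumes an engine scoring close to 50%, a per-game score standard deviation of about 0.4 (an assumed value, not a CCRL figure) and a 95% interval. The helper `elo_error_margin` is hypothetical, and this is not how Bayeselo derives its intervals.

```python
import math


def elo_error_margin(games, score_sd=0.4, z=1.96):
    """Approximate +/- Elo margin for an engine scoring near 50%.

    score_sd is an assumed per-game standard deviation of the score
    (0 = loss, 0.5 = draw, 1 = win); z=1.96 gives a 95% interval.
    """
    # Standard error of the average score after `games` games.
    score_se = score_sd / math.sqrt(games)
    # Slope of the logistic Elo curve at a 50% score:
    # d(Elo)/d(score) = 400 / (ln(10) * p * (1 - p)) with p = 0.5.
    elo_per_score = 400.0 / (math.log(10) * 0.25)
    return z * score_se * elo_per_score


for n in (200, 500, 1000, 2000):
    print(f"{n:5d} games: about +/-{elo_error_margin(n):.0f} Elo")
```

Under these assumptions the results land within a point or two of the margins quoted above. Halving the margin takes about four times as many games.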


Multi CPU Engines

Although Rybka 2.3.2a 64-bit 4CPU seems to be only a small improvement over Rybka 2.2 64-bit 4CPU, the gap is greater on 2CPU.

Zap!Chess Zanzibar 64-bit 4CPU is clearly number 2 ahead of Naum 2.2 64-bit 4CPU and Hiarcs 11.1 4CPU.
Although Hiarcs 11.2 4CPU has few games so far, results in our various lists and others seem to indicate that it is a little weaker than Hiarcs 11.1.

The current ratings for Loop M1-T 64-bit 4CPU and 2CPU suggest that there is little gain from the extra two CPUs.

Deep Shredder 10 64-bit 4CPU, Deep Fritz 10 4CPU and Deep Junior 10 4CPU are off the pace.

Glaurung 2 epsilon/5 64-bit 2CPU is the strongest free engine on this list.


Single CPU Engines

Rybka 2.3.2a leads the ratings here as well, although by a slightly larger margin.
It also looks like the 64-bit version could make more difference to strength than with previous versions.

Newly released Toga II 1.3.1 is battling it out for second spot with Zap!Chess Zanzibar!
Loop M1-T could well be a threat to both as it gets more games under its belt.
Toga II 1.3.1 looks to be stronger than Toga II 1.3.4 at this time control.

It now appears likely that Hiarcs 11.2 is not an improvement over Hiarcs 11.1.

Naum 2.2 has made a 40 ELO gain over the previous version and looks to be similar in strength to the well tested Fritz 10.
Strelka 1.8 and Fruit 051103 are with them at present, but need many more games.
Fruit 051103 certainly seems to have an edge over Fruit 2.3.1, although the latter does play rather attractively.

Shredder 10 is slightly behind the aforementioned group.

Spike 1.2 Turin, Junior 10 and Deep Sjeng 2.5 are the next group of engines and seem to be very even in strength.
Junior 10.1 is weaker than Junior 10 according to our testing.

SmarThink 1.00, Ktulu 8.0, Glaurung 2 epsilon/5 and Chess Tiger 2007.1 are further adrift.


Free Engines

Rybka 1.0 retains its crown as the top free engine ahead of Strelka 1.8 and the newly released Togas and Fruits.

Spike 1.2 Turin has been left behind a little by some of the new releases.

Glaurung 2 epsilon/5 and Naum 2.0 are next, ahead of Alaric 707, Scorpio 1.91, Delfi 5.1 and SlowChess Blitz WV2.1.

Zappa 1.1, WildCat 7, Pro Deo 1.2 and List 512 are further back.

As we make our way down the list, it should be noted that the most recent versions of Booot, DanaSah, Hermann, Natwarlal and Feuerstein seem to have made good gains over previous versions.
Others to keep an eye on as they get more games are the latest versions of Hamsters, BugChess2, Alfil, Popochin, NanoSzachy and GreKo.

We test a very extensive range of amateur engines through our Amateur Championship divisions (32-bit 1CPU) plus other tournaments, all of which can be followed in our public forum.

Our aim is of course to ensure that all engines lower on our lists get at least 200 games.


Blitz Notes

There are currently 161,294 games in our 40/4 database.

The 40/4 update is usually done separately to our 40/40 update. The most recent update can always be viewed here:
http://computerchess.org.uk/ccrl/404.live/


FRC Notes

There are currently 17,600 games in the FRC 40/4 database.

Ray tests only those engines that can play FRC through the Shredder Classic GUI.
If engine authors have a new and stable version of their engine that will run under this GUI, they should contact Ray if they wish to see it tested.

Ray has recently tested Rybka 2.3.2, Hiarcs 11.2, Naum 2.2, Fruit 2.3, Fruit 051103, Hamsters 0.4, Hermann 2.0 and Movei 0.08.438.
All are now included in the ratings.

The improvement in strength of Hamsters 0.4 and Movei 0.08.438 is particularly noteworthy.

Rybka 2.3.2 (private) has predictably taken over top spot, but Hiarcs 11.1 is still the best available FRC engine.

For FRC the best list to look at is the pure list.
http://www.computerchess.org.uk/ccrl/404FRC/


Stats/Presentation Notes

The LOS stats on the right-hand side of each rating list are "likelihood of superiority" stats. They give the likelihood, as a percentage, that each engine is superior to the engine directly below it.

A list of games played this week per engine can be found in the update thread in the CCRL public forum, accessible through the link given at the top of this post.
Please note that our forum has been moved and is now much quicker to load and more readily accessible. The link given will redirect you automatically.

All games are available for download through the link given at the top of this post. They can be downloaded by engine or by month.
ELO ratings are now saved in all game databases for those engines that have 200 games or more.

Clicking on an engine name will give details as to opponents played plus homepage links where applicable.

Custom lists of engines can be selected for comparison.

An openings report page (link at bottom of index page) lists the number of games played by ECO codes with draw percentage and White win percentage. Clicking on a column heading will sort the list by that column.
Games can now be downloaded by ECO code.
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: CCRL update (25th August 2007)

Post by IWB »

Hello Graham

I have a few questions because I do not get some things:

In your 32-bit list Fruit 05/11/03 is ranked 7th while S10 is 8th. Both have the same Elo rating; Fruit has a deviation of +/- 59 and Shredder of +/- 10. According to your explanation of LOS, Fruit has an LOS of 49.7% over Shredder. Besides the fact that, to the best of my knowledge, Fruit 05/11/03 is really better than Shredder by at least 25 Elo, I do not understand how your list came to that conclusion. According to the data I see, it should be the other way around (at the moment!).

How do you calculate a LOS? Shredder and Fruit have the same rating with different deviations. They both sit in the middle of their range. Shouldn't the LOS be 50% and not 49.7%?

Additionally, I see something similar between Hiarcs X54 64 bit and Spike 1.2 Turin - Rank 9 and 10.

Maybe I got something completely wrong.

Thx and bye
Ingo
Graham Banks
Posts: 41455
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: CCRL update (25th August 2007)

Post by Graham Banks »

Hi Ingo,

I'll get Kirill to answer this one! :P
However, I do note that Fruit 051103 only has 86 games so far, so no definite conclusions can be made yet.

Regards, Graham.
gbanksnz at gmail.com
Kirill Kryukov
Posts: 492
Joined: Sun Mar 19, 2006 4:12 am

Re: CCRL update (25th August 2007)

Post by Kirill Kryukov »

Hello, Ingo. I guess you are talking about this rating list. As you can see there, "Fruit 05/11/03" has no rank (this can happen if an engine has too few games, or if another version of it is rated higher in the same list). Its "Rank" column is empty, so it has no rank. Shredder 10 is ranked #8.

I think we currently don't care about the order of engines with identical ratings. As you said, the LOS value suggests that Shredder 10 is slightly stronger than Fruit 05/11/03, but the ratings have to be integers, so they can be identical due to rounding.
IWB wrote: Besides the fact that, to the best of my knowledge, Fruit 05/11/03 is really better than Shredder by at least 25 Elo, I do not understand how your list came to that conclusion.
We compute ratings based on the database of games those engines played under our conditions. Sorry, but we can't incorporate "your knowledge" into our rating estimation.
IWB wrote: How do you calculate a LOS? Shredder and Fruit have the same rating with different deviations. They both sit in the middle of their range. Shouldn't the LOS be 50% and not 49.7%?
Ratings and LOS values are computed with the Bayeselo program. Note that the ratings we see are always rounded to integers, which is why the ratings can look identical while the LOS is not 50.0%.
IWB wrote: Additionally, I see something similar between Hiarcs X54 64 bit and Spike 1.2 Turin - Rank 9 and 10.
Maybe Bayeselo can be modified to output fractional ratings; then this problem could be solved easily. At the moment the order of engines with the same rating is random in our list. (Or maybe it is not random, I don't remember, but at least it is not related to any strength estimation.) So no conclusions should be drawn from the order of engines with the same rating in our list.
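
To make the rounding point concrete, here is a rough sketch of how two engines can show the same integer rating while the LOS is slightly off 50%. It is only a normal approximation that treats the two rating estimates as independent and reads the +/- figures as 95% margins. Bayeselo's own LOS calculation is more involved, so this is not its method. The function `los` and the unrounded ratings below are invented for illustration.

```python
import math


def los(rating_a, margin_a, rating_b, margin_b, z=1.96):
    """Likelihood (0..1) that A is stronger than B, normal approximation.

    margin_a and margin_b are +/- margins assumed to be 95% intervals,
    and the two rating estimates are assumed to be independent.
    """
    sd_a = margin_a / z
    sd_b = margin_b / z
    # Standard deviation of the rating difference A - B.
    sd_diff = math.hypot(sd_a, sd_b)
    # Normal CDF evaluated at (difference / sd_diff).
    x = (rating_a - rating_b) / (sd_diff * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(x))


# Invented unrounded ratings that both display as 2860.
fruit, shredder = 2859.9, 2860.1
print(round(fruit), round(shredder))
print(f"LOS of Fruit over Shredder: {100 * los(fruit, 59, shredder, 10):.1f}%")
```

With margins of +/- 59 and +/- 10, a gap of a fraction of an Elo point is enough to move the LOS a few tenths of a percent below 50%, even though both ratings round to the same integer.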

Best,
Kirill
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: CCRL update (25th August 2007)

Post by IWB »

Hello Kirill
Kirill Kryukov wrote: ... As you can see there, "Fruit 05/11/03" has no rank there (this can happen if it has too few games, or if another version of it is rated higher in the same list). It has empty space in the "Rank" column, so it has no rank.
Ahh ok, I did not think about that. I interpreted it, as is usually done in ranking lists, as identical positions (which would not make sense in that case either, as the engines have different ratings).
Kirill Kryukov wrote: We compute ratings based on the database of games those engines played under our conditions. Sorry, but we can't incorporate "your knowledge" into our rating estimation.
That was not what I meant or asked for - I only have to be careful when I compare something with Shredder; I easily get "beaten" by others when doing this! :wink:

Thanks for your explanations, now it makes much more sense to me!

Bye and have a nice weekend
Ingo
Kirill Kryukov
Posts: 492
Joined: Sun Mar 19, 2006 4:12 am

Re: CCRL update (25th August 2007)

Post by Kirill Kryukov »

IWB wrote: Hello Kirill
Kirill Kryukov wrote: ... As you can see there, "Fruit 05/11/03" has no rank there (this can happen if it has too few games, or if another version of it is rated higher in the same list). It has empty space in the "Rank" column, so it has no rank.
Ahh ok, I did not think about that. I interpreted it, as is usually done in ranking lists, as identical positions (which would not make sense in that case either, as the engines have different ratings).
In cases where two ranked engines have identical rating, we show their ranks as "24‑25" for both ("24‑25" is an example from the same list). :)
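
A small sketch of that display rule, for anyone curious. The function `rank_labels` is hypothetical, not our actual list-generation code, and it assumes that a tie among more than two engines is shown the same way (first to last rank of the tied group). Unranked engines would be left out of the input.

```python
def rank_labels(ratings):
    """Return display ranks for rounded ratings sorted highest first.

    Engines with the same rating share a label such as "2-3".
    """
    labels = []
    i = 0
    while i < len(ratings):
        j = i
        # Extend j to the last engine sharing the rating at position i.
        while j + 1 < len(ratings) and ratings[j + 1] == ratings[i]:
            j += 1
        label = str(i + 1) if i == j else f"{i + 1}-{j + 1}"
        labels.extend([label] * (j - i + 1))
        i = j + 1
    return labels


print(rank_labels([2900, 2880, 2880, 2850]))
# ['1', '2-3', '2-3', '4']
```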

Best to you,
Kirill
Norm Pollock
Posts: 1056
Joined: Thu Mar 09, 2006 4:15 pm
Location: Long Island, NY, USA

Re: CCRL update (25th August 2007)

Post by Norm Pollock »

Hi Graham & Kirill,

In the 40/40 group, I wonder why testing was halted for Naum 2.2 64-bit 2CPU. It has only played 31 games. And based on that, its rating is actually higher (3013 to 3004) than the 4CPU version which has played 266 games.

So by your rules, because of insufficient games, the 4CPU version is considered to be the "best" version. But how about giving the 2CPU version a fair shot at being the "best" version? Btw, the 2CPU version has a 56.4% LOS over the 4CPU version.

-Norm
Graham Banks
Posts: 41455
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: CCRL update (25th August 2007)

Post by Graham Banks »

Hi Norm,

our 2CPU testing needs a lot more input for sure.
We really need another tester or two who're prepared to do either 64-bit 2CPU or 32-bit 2CPU testing or both.
Trouble is that genuine testers are in short supply whereas bogus ones aren't. :(
If there are one or two genuine testers interested in applying to join CCRL, please drop me a line.

Regards, Graham.
gbanksnz at gmail.com