CCRL update (4th May 2007)

Graham Banks
Posts: 34850
Joined: Sun Feb 26, 2006 9:52 am
Location: Auckland, NZ

CCRL update (4th May 2007)

Post by Graham Banks » Fri May 04, 2007 10:23 pm

The May 4th update of the CCRL Rating Lists and Statistics is now available for viewing at:
http://www.computerchess.org.uk/ccrl/4040/

The links to the various rating lists can be found just beneath the default Best Versions list.
For example, there is a 32-bit Single CPU list.

Our standard testing is at 40 moves in 40 minutes repeating while our current blitz testing is at both 40 moves in 4 minutes repeating and 40 moves in 12 minutes repeating, all adjusted to the AMD64 X2 4600+ (2.4GHz).
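One way to read "adjusted to the AMD64 X2 4600+" is proportional clock scaling. Each tester benchmarks their machine against the reference machine, then shortens or lengthens the clock so that engines get roughly the same amount of search per game. The sketch below assumes that reading. The function name is hypothetical, the speed ratio of 1.25 is a made-up example, and the benchmark used to measure it is not specified here.

```python
# Minimal sketch of proportional time-control scaling. This is an assumption
# about how "adjusted to the AMD64 X2 4600+" works, not a documented procedure.

def adjusted_minutes(reference_minutes: float, speed_ratio: float) -> float:
    """Clock time on the tester's machine that matches `reference_minutes`
    on the reference machine.

    speed_ratio: tester speed / reference speed, e.g. from a benchmark
    (hypothetical input; > 1.0 means the tester's machine is faster).
    """
    if speed_ratio <= 0:
        raise ValueError("speed_ratio must be positive")
    return reference_minutes / speed_ratio


if __name__ == "__main__":
    # Hypothetical tester machine measured at 1.25x the reference speed.
    for minutes in (40, 12, 4):
        print(f"40/{minutes} on the reference -> "
              f"40/{adjusted_minutes(minutes, 1.25):.1f} on this machine")
```

Only the clock changes. The 40-move session length stays fixed. The sketch treats equal search effort as equal playing strength. That is a simplification, because engines do not all scale alike across different hardware.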

Currently active testers in our team are:
Graham Banks, Ray Banks, Shaun Brewer, Kirill Kryukov, Dom Leste, Tom Logan, Andreas Schwartmann, Charles Smith, George Speight, Chris Taylor, Chuck Wilson and Gabor Szots.

We remain on the lookout for a few more testers to help, so please contact one of us if you're interested.
You do not need to own any commercial engines as amateurs are also a big part of our testing.
An extra tester who could carry out 2CPU 64-bit and 32-bit testing at the 40/40 time control would be a valuable asset to our team.


40/40 Notes

762 games were added to our 40/40 database this week, making a total of 57,475 games.
A big thanks to all testers as usual.

Many engines on our list have few games, and in many cases their ratings are likely to fluctuate (markedly for some) until many more games are played. Therefore, no conclusions should be drawn about their strength yet.
To illustrate this point, when an engine has played 200 games, the error margin is still approximately +/-40 ELO; after 500 games it is +/-25 ELO, after 1000 games +/-17 ELO, and even after 2000 games there is a +/-13 ELO error margin!
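The margins quoted above shrink roughly in proportion to 1/√N, where N is the number of games. The sketch below assumes that simple scaling, anchored at +/-40 ELO for 200 games. The function name is hypothetical. The list's actual confidence intervals are computed by the rating program itself, not by this formula.

```python
import math


def approx_margin(games: int, ref_games: int = 200, ref_margin: float = 40.0) -> float:
    """Approximate ELO error margin after `games` games, assuming the margin
    scales as 1/sqrt(games) from a reference point (200 games -> +/-40)."""
    if games <= 0:
        raise ValueError("games must be positive")
    return ref_margin * math.sqrt(ref_games / games)


if __name__ == "__main__":
    for n in (200, 500, 1000, 2000):
        print(f"{n:5d} games: +/-{approx_margin(n):.0f} ELO")
```

This reproduces the quoted figures to within a point. It gives 25, 18 and 13. It also makes the practical consequence explicit: halving the margin takes about four times as many games.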
This of course highlights the importance of looking at other rating lists that are also available in order to draw comparisons and get a more accurate overall picture.

We are not testing Rybka 2.3 at 40/40, preferring to wait for the bugfixed Rybka 2.3.2 to be released.


4CPU Engines
3105 - Rybka 2.2 64-bit 4CPU
3044 - Zap!Chess Zanzibar 64-bit 4CPU
2978 - Hiarcs 11.1 4CPU
2955 - Naum 2.1 64-bit 4CPU
2946 - Loop 13.6 64-bit 4CPU (only 139 games so far though)
2930 - Deep Shredder 10 64-bit 4CPU
2925 - Deep Fritz 10 4CPU
2920 - Deep Junior 10 4CPU
2860 - Glaurung 1.2.1 64-bit 4CPU


2CPU Engines
3079 - Rybka 2.1 64-bit 2CPU
3048 - Rybka 2.2 64-bit 2CPU
3029 - Zap!Chess Zanzibar 64-bit 2CPU (only 78 games so far though)
2931 - Hiarcs 11.1 2CPU
2927 - Deep Fritz 10 2CPU (only 64 games so far though)
2926 - LoopMP 12.32 2CPU
2916 - Loop 13.5 64-bit 2CPU
2908 - Naum 2.1 64-bit 2CPU
2900 - Deep Junior 10 2CPU
2890 - Deep Shredder 10 64-bit 2CPU
2816 - Glaurung 1.2.1 Avalanche 64-bit 2CPU
2740 - Deep Frenzee 3.0 64-bit 2CPU (only 56 games so far though)
2739 - Pharaon 3.5.1 32-bit 2CPU


Single CPU Engines
2999 - Rybka 2.2 64-bit
2935 - Loop 13.6 32-bit (good start, but only a handful of games)
2908 - Hiarcs 11.1
2905 - Zap!Chess Zanzibar 64-bit
2880 - Fritz 10
2872 - Shredder 10
2863 - Toga II 1.2.1a (not testing Toga II 1.3x4 yet)
2852 - Deep Sjeng 2.5 1CPU (only 20 games, so don't read too much into it)
2850 - Spike 1.2 Turin
2846 - Naum 2.1 32-bit (64-bit not tested)
2845 - Junior 10
2840 - Fruit 2.2.1
2821 - Junior 10.1 (more games still required though)
2804 - Ktulu 8.0
2797 - SmarThink 1.00 64-bit (only 85 games though)
2791 - Chess Tiger 2007
2788 - Scorpio 1.84
2776 - Glaurung 1.2.1 Avalanche 32-bit
2769 - Glaurung 1.2.1 64-bit
2769 - Chess Tiger 2007.1 (more games still required though)
2763 - CM9000 Enforcer
2762 - CM10th Xperience
2759 - Alaric 704 (only 58 games so far though)
2759 - Scorpio 1.91
2742 - Slow Chess Blitz WV2.1
2740 - Petir 4.39 (only 30 games so far though)
2734 - CM10th Default
2731 - Ruffian 2.1.0
2728 - Delfi 5.1
2727 - Pro Deo 1.2
2725 - WildCat 7
2724 - Gandalf 6


CCRL Amateur Championship (32-bit 1CPU):
4th Season Division 1 - Spike claimed the crown. The Baron, Movei and Pseudo will be relegated.

4th Season Division 2 - after 9 rounds, Trace, Pharaon, Ufim, Little Goliath and Alaric all lie within a point of each other at the top. Comet, Amyan, Matacz and Gaia are involved in a relegation battle at the other end.

4th Season Division 3 - at the halfway stage, Queen, Djinn, Gosu, Patzer and AnMon are currently vying for the top three promotion spots. Tytan, Thor's Hammer and Typhoon are struggling to beat the drop.

4th Season Division 4 - after the first 5 rounds, Horizon and Deuterium hold a narrow lead over Popochin, AliChess and Aice. Sage, GreKo and NanoSzachy are finding life tough by contrast.

4th Season Division 5 - will start after the higher divisions are completed. The final field will be posted once a Qualifier has been run for the final three spots.


We have a number of tournaments in progress and most of these can be followed in our public forum.


Blitz Notes

The 40/4 list is updated separately from the 40/40 list, and the latest update can be viewed here:
http://computerchess.org.uk/ccrl/404/


Multi-CPU Engines (both 4CPU and 2CPU)
3078 - Rybka 2.3.1 64-bit 2CPU
3021 - Zap!Chess Zanzibar 64-bit 4CPU
2993 - Hiarcs 11.1 4CPU
2946 - Deep Shredder 10 64-bit 4CPU
2944 - Deep Fritz 10 4CPU
2938 - Naum 2.1 64-bit 4CPU
2925 - LoopMP 12.32 2CPU
2918 - Loop 13.5 64-bit 2CPU
2916 - Deep Junior 10 4CPU
2860 - Glaurung 1.2.1 32-bit 4CPU
2754 - Pharaon 3.5.1 2CPU
2736 - Deep Frenzee 3.0 64-bit 2CPU


Single CPU Engines
3025 - Rybka 2.2n 64-bit
2916 - Hiarcs 11.1
2910 - Loop 10.32f
2872 - Toga II 1.3x4
2869 - Fritz 10
2869 - Loop 13.5
2855 - Shredder 10
2854 - Naum 2.1 64-bit
2850 - Fruit 2.2.1
2848 - Zap!Chess Zanzibar
2843 - Junior 10
2828 - Spike 1.2 Turin
2826 - Chess Tiger 2007.1
2822 - Ktulu 8.0
2761 - CM10th Paralyse
2757 - Glaurung 1.2.1 32-bit (higher than 64-bit!)
2752 - Scorpio 1.91
2741 - Bright 0.1d
2722 - Delfi 5.1
2717 - Pro Deo 1.2
2714 - Slow Chess Blitz WV2.1
2713 - Alaric 704


FRC Notes

Ray tests only those engines that can play FRC through the Shredder Classic GUI.
For FRC the best list to look at is the pure list, and the ratings there are:

2921 - Hiarcs 11.1
2905 - Shredder 10
2893 - Loop 10.32f
2859 - Spike 1.2 Turin
2858 - Fruit 2.2.1
2811 - Naum 2.1
2781 - Glaurung 1.2.1
2673 - Pharaon 3.5.1
2619 - Ufim 8.02
2607 - Movei 0.08.383
2478 - Hermann 1.9
2390 - Aice 0.99.2
2363 - Hamsters 0.2
2362 - Ayito 0.2.994


Stats/Presentation Notes

The LOS stats to the right-hand side of each rating list are "likelihood of superiority" stats. They give the likelihood, as a percentage, that each engine is superior to the engine directly below it.
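Consider two engines that have played each other. A commonly used approximation computes LOS from their head-to-head wins and losses using the normal distribution, and it ignores draws. It is shown here only as an illustration. The LOS figures on the list come from the rating program's own calculation over all games, which may differ. The function name and the 30-20 result below are hypothetical.

```python
import math


def likelihood_of_superiority(wins: int, losses: int) -> float:
    """Approximate probability (0..1) that engine A is stronger than engine B,
    from A's wins and losses against B. Draws are ignored in this approximation."""
    decisive = wins + losses
    if decisive == 0:
        return 0.5  # no decisive games: no evidence either way
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * decisive)))


if __name__ == "__main__":
    # Hypothetical head-to-head result: 30 wins, 20 losses (draws not counted).
    print(f"LOS = {likelihood_of_superiority(30, 20):.1%}")  # prints LOS = 92.1%
```

A value near 50% means the data cannot separate the two engines. The value approaches 100% only as the win-loss surplus grows relative to the square root of the number of decisive games.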

All games are also available for download through the link given at the top of this post. They can be downloaded by engine or by month.
ELO ratings are now saved in all game databases for those engines that have 150 games or more.

A list of games played this week per engine can be found in the update thread in the CCRL public forum, accessible through the link given at the top of this post.

Tony Thomas

Re: CCRL update (4th May 2007)

Post by Tony Thomas » Sat May 05, 2007 4:00 am

Is there something wrong with my eyes? I do not see any versions of Sjeng on your list.

Uri Blass
Posts: 8921
Joined: Wed Mar 08, 2006 11:37 pm
Location: Tel-Aviv Israel

Re: CCRL update (4th May 2007)

Post by Uri Blass » Sat May 05, 2007 4:55 am

Tony Thomas wrote: Is there something wrong with my eyes? I do not see any versions of Sjeng on your list.

It simply did not play enough games (only 20 games), and older versions of Sjeng were not tested.

Deep Sjeng 2.5 1CPU 2852 +128 −120 65.0% −91.6 40.0% 20

Tony Thomas

Re: CCRL update (4th May 2007)

Post by Tony Thomas » Sat May 05, 2007 5:01 am

Uri Blass wrote:
Tony Thomas wrote: Is there something wrong with my eyes? I do not see any versions of Sjeng on your list.

It simply did not play enough games (only 20 games), and older versions of Sjeng were not tested.

Deep Sjeng 2.5 1CPU 2852 +128 −120 65.0% −91.6 40.0% 20
Exactly my point. I never thought they would not care to test an engine like Sjeng 1.6.

Spock

Re: CCRL update (4th May 2007)

Post by Spock » Sat May 05, 2007 5:14 am

Tony Thomas wrote: Exactly my point. I never thought they would not care to test an engine like Sjeng 1.6.
We can't test everything :)

There was certainly no conscious decision not to test it; it just got forgotten, I think.

Tony Thomas

Re: CCRL update (4th May 2007)

Post by Tony Thomas » Sat May 05, 2007 5:24 am

Spock wrote:
Tony Thomas wrote: Exactly my point. I never thought they would not care to test an engine like Sjeng 1.6.
We can't test everything :)

There was certainly no conscious decision not to test it; it just got forgotten, I think.
Yes, to forget is human. Now download it and start spitting out some 2CPU or 4CPU games, Ray.

Spock

Re: CCRL update (4th May 2007)

Post by Spock » Sat May 05, 2007 5:45 am

Tony Thomas wrote:
Yes, to forget is human. Now download it and start spitting out some 2CPU or 4CPU games, Ray.
I have bought Deep Sjeng 2.5

Tom and I should finish Quad Loop 13.6 in a week or so, and I've been thinking about whether or not to do Quad Sjeng after that. I did start a match against Quad Glaurung and the early results weren't promising, but that happens. I'll certainly do 2CPU or 4CPU; I'm just not sure which yet.

IWB
Posts: 1539
Joined: Thu Mar 09, 2006 1:02 pm

Re: CCRL update (4th May 2007)

Post by IWB » Sat May 05, 2007 8:22 pm

Hello CCRL Team

My questions are as follows:

>all adjusted to the AMD64 X2 4600+ (2.4GHz).

What does that mean?
What are the REAL time controls for each tester that you are throwing into one pot?
Have you ever made separate rating lists for these different time controls to check whether the lists are identical?
Do you assume that engines behave identically on faster hardware at a correspondingly shorter time control, or are you basing this on hard facts?

I'm just interested, because I have some doubts about this "into one pot" approach. I can't prove it; it is a gut feeling.

Nevertheless I have a lot of respect for the amount of work you are investing!

Bye
Ingo

Uri Blass
Posts: 8921
Joined: Wed Mar 08, 2006 11:37 pm
Location: Tel-Aviv Israel

Re: CCRL update (4th May 2007)

Post by Uri Blass » Sat May 05, 2007 8:43 pm

IWB wrote:Hello CCRL Team

My questions are as follows:

>all adjusted to the AMD64 X2 4600+ (2.4GHz).

What does that mean?
What are the REAL time controls for each tester that you are throwing into one pot?
Have you ever made separate rating lists for these different time controls to check whether the lists are identical?
Do you assume that engines behave identically on faster hardware at a correspondingly shorter time control, or are you basing this on hard facts?

I'm just interested, because I have some doubts about this "into one pot" approach. I can't prove it; it is a gut feeling.

Nevertheless I have a lot of respect for the amount of work you are investing!

Bye
Ingo
I am sure the lists will not be identical, and one reason is that the relative speed of different programs varies with the hardware.

There are other reasons as well; for example, engines' time management may differ at different time controls.

There is no assumption that separate lists would be identical, but if they made separate lists, there might not be enough games for each.

Note that hardware is not the only difference: there may also be differences in books, because not all testers use the same generic book.

For conditions, see
http://kd.lab.nig.ac.jp/chess/discussio ... php?t=1486

Uri

IWB
Posts: 1539
Joined: Thu Mar 09, 2006 1:02 pm

Re: CCRL update (4th May 2007)

Post by IWB » Sat May 05, 2007 9:17 pm

Uri Blass wrote:
I am sure the lists will not be identical, and one reason is that the relative speed of different programs varies with the hardware.

http://kd.lab.nig.ac.jp/chess/discussio ... php?t=1486

Uri
So in short:

Different books, different hardware, different time controls, different hash sizes (1 or 2 CPUs), different CPU brands, different tablebase sets (4- or 5-piece). The exact conditions under which each game was played are not known, right?

All of this is combined into one rating list.

Hmm - I have to think about what this is worth!?

Thx Uri and bye
Ingo
