CCRL update (1st September 2007)

Graham Banks
Posts: 35089
Joined: Sun Feb 26, 2006 9:52 am
Location: Auckland, NZ

CCRL update (1st September 2007)

Post by Graham Banks » Sat Sep 01, 2007 7:22 am

The September 1st update of the CCRL Rating Lists and Statistics is now available for viewing at:
http://www.computerchess.org.uk/ccrl/4040/

The links to the various rating lists can be found just beneath the default Best Versions list.
For example, there is a 32-bit Single CPU list.

Our standard testing is at 40 moves in 40 minutes (repeating), while our current blitz testing is at both 40 moves in 4 minutes (repeating) and 40 moves in 12 minutes (repeating), all adjusted to an AMD64 X2 4600+ (2.4 GHz).
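One way to picture "adjusted to the AMD64 X2 4600+" is to scale the clock by each tester's relative hardware speed. The sketch below assumes the speed ratio comes from some benchmark run on both machines (the ratio is a hypothetical input; the thread does not describe CCRL's actual benchmark or procedure) and that time is scaled inversely with speed.

```python
def adjusted_minutes(base_minutes: float, speed_ratio: float) -> float:
    """Scale a time control to a tester's hardware (illustrative sketch).

    speed_ratio = tester machine speed / reference machine speed, taken from
    a hypothetical benchmark. A slower machine (ratio < 1) gets proportionally
    more time, aiming for a similar amount of search per move as on the
    reference machine.
    """
    if speed_ratio <= 0:
        raise ValueError("speed_ratio must be positive")
    return base_minutes / speed_ratio


# Hypothetical example: a machine that benchmarks at 80% of the reference speed.
for base in (40, 12, 4):
    print(f"40/{base} on reference hardware -> 40/{adjusted_minutes(base, 0.8):.1f}")
```

Scaling inversely with speed is the simplest assumption; real engines do not always speed up perfectly linearly with hardware.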

Currently active testers in our team are:
Graham Banks, Ray Banks, Shaun Brewer, Kirill Kryukov, Dom Leste, Tom Logan, Andreas Schwartmann, Charles Smith, George Speight, Chris Taylor, Chuck Wilson, Gabor Szots and Martin Thoresen.

A big thanks to all testers as usual for their efforts this week.


40/40 Notes

There are currently 72,490 games in our 40/40 database.

Many engines on our list have few games, and in many cases their ratings are likely to fluctuate (markedly for some) until a lot more games are played. Therefore, no conclusions should be drawn about their strength yet.
To illustrate this point, when an engine has played 200 games the error margin is still approximately ±40 ELO; after 500 games it is ±25 ELO, after 1000 games ±17 ELO, and even after 2000 games there is a ±13 ELO error margin!
This, of course, highlights the importance of also looking at the other available rating lists in order to draw comparisons and get a more accurate overall picture.
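The way these margins shrink with more games can be reproduced with a rough normal approximation. This is a sketch, not CCRL's method: it assumes an average score near 50%, a draw rate of 36.4% (the 40/40 database figure quoted later in this thread), and a 95% interval.

```python
import math


def elo_error_margin(games: int, draw_rate: float = 0.364, z: float = 1.96) -> float:
    """Approximate 95% error margin (in Elo) after `games` games.

    Rough normal approximation assuming an average score of 50% and a fixed
    draw rate. Not how CCRL computes its margins; it only shows why the
    margin shrinks roughly like 1/sqrt(games).
    """
    p = 0.5  # assumed average score
    win_rate = (1.0 - draw_rate) / 2.0
    # Per-game score variance (win = 1, draw = 0.5, loss = 0): E[x^2] - p^2.
    variance = win_rate * 1.0 + draw_rate * 0.25 - p * p
    score_se = math.sqrt(variance / games)
    # Slope of the logistic Elo curve at score p, in Elo per unit of score.
    elo_per_score = 400.0 / (math.log(10) * p * (1.0 - p))
    return z * score_se * elo_per_score


for n in (200, 500, 1000, 2000):
    print(f"{n:5d} games: +/-{elo_error_margin(n):.0f} Elo")
```

This prints margins of roughly ±38, ±24, ±17 and ±12 Elo, close to the figures above.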


Multi CPU Engines

Although Rybka 2.3.2a 64-bit 4CPU seems to be only a small improvement over Rybka 2.2 64-bit 4CPU, the gap is greater on 2CPU.

Zap!Chess Zanzibar 64-bit 4CPU is clearly number 2 ahead of Naum 2.2 64-bit 4CPU and Hiarcs 11.1 4CPU.
Hiarcs 11.2 4CPU still requires more games.

The current ratings for Loop M1-T 64-bit 4CPU and 2CPU suggest that there is little gain from the extra two CPUs.

Deep Shredder 10 64-bit 4CPU, Deep Fritz 10 4CPU and Deep Junior 10 4CPU are off the pace.

Glaurung 2 epsilon/5 64-bit 2CPU is the strongest free engine on this list.


Single CPU Engines

Rybka 2.3.2a leads the ratings here as well, although by a slightly larger margin.
It also looks as though the 64-bit version may make more of a difference to strength than it did with previous versions.

Newly released Toga II 1.3.1 is battling it out for second spot with Zap!Chess Zanzibar!
Loop M1-T could well be a threat to both as it gets more games under its belt.
Toga II 1.3.1 looks to be stronger than Toga II 1.3.4 at this time control.

It now appears likely that Hiarcs 11.2 is not an improvement over Hiarcs 11.1.

Naum 2.2 has made a 40 ELO gain over the previous version and looks to be similar in strength to the well tested Fritz 10.
Strelka 1.8 is with them at present, but needs many more games before its strength becomes clearer.

Both Fruit 2.3.1 and Fruit 051103 lie between the aforementioned group and Shredder 10. Both Fruits also require further games.

Spike 1.2 Turin, Junior 10 and Deep Sjeng 2.5 are the next group of engines and seem to be very even in strength.
Junior 10.1 is weaker than Junior 10 according to our testing.

Ktulu 8.0, SmarThink 1.00, Glaurung 2 epsilon/5 and Chess Tiger 2007.1 are further back.


Free Engines

Rybka 1.0 retains its crown as the top free engine ahead of Strelka 1.8 and the newly released Togas and Fruits.

Spike 1.2 Turin has been left behind a little by some of the new releases.

Glaurung 2 epsilon/5 and Naum 2.0 are next, ahead of Alaric 707, Scorpio 1.91, Delfi 5.1 and SlowChess Blitz WV2.1.

Colossus 2007c needs further games before any statements can be made about its strength.

Zappa 1.1, WildCat 7, Pro Deo 1.2 and List 512 are further back.

As we make our way down the list, it should be noted that the most recent versions of Booot, DanaSah and Natwarlal seem to have made good gains over previous versions.
Others to keep an eye on as they get more games are the latest versions of Movei, Hamsters, BugChess2 and Popochin.

We test a very extensive range of amateur engines through our Amateur Championship divisions (32-bit 1CPU) plus other tournaments, all of which can be followed in our public forum.

Our aim, of course, is to ensure that all engines lower on our lists get at least 200 games.


Blitz Notes

There are currently 164,174 games in our 40/4 database.

The 40/4 update is usually done separately from our 40/40 update. The most recent update can always be viewed here:
http://computerchess.org.uk/ccrl/404.live/


FRC Notes

There are currently 19,400 games in the FRC 40/4 database.

Ray tests only those engines that can play FRC through the Shredder Classic GUI.
If engine authors have a new and stable version of their engine that will run under this GUI, they should contact Ray if they wish to see it tested.

Ray has recently tested Rybka 2.3.2, Hiarcs 11.2, Naum 2.2, Fruit 2.3, Fruit 051103, Hamsters 0.4, Hermann 2.0 and Movei 0.08.438.
All are now included in the ratings.

The improvement in strength of Hamsters 0.4 and Movei 0.08.438 is particularly noteworthy.

Rybka 2.3.2 (private) has predictably taken over top spot, but Hiarcs 11.1 is still the best available FRC engine.

For FRC, the best list to look at is the pure list.
http://www.computerchess.org.uk/ccrl/404FRC/


Stats/Presentation Notes

The LOS stats on the right-hand side of each rating list are "likelihood of superiority" stats. They give the likelihood, as a percentage, that each engine is superior to the engine directly below it.

A list of games played this week per engine can be found in the update thread in the CCRL public forum, accessible through the link given at the top of this post.
Please note that our forum has been moved and is now much quicker to load and more readily accessible. The link given will redirect you automatically.

All games are available for download through the link given at the top of this post. They can be downloaded by engine or by month.
ELO ratings are now saved in all game databases for those engines that have 200 games or more.

Clicking on an engine name will give details as to opponents played plus homepage links where applicable.

Custom lists of engines can be selected for comparison.

An openings report page (link at bottom of index page) lists the number of games played by ECO codes with draw percentage and White win percentage. Clicking on a column heading will sort the list by that column.
Games can now be downloaded by ECO code.
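The openings report described above amounts to grouping games by ECO code and counting results. Here is a minimal sketch, assuming the (ECO code, result) pairs have already been read from PGN headers. The input format and sample data are hypothetical, and this is not the site's actual code.

```python
from collections import defaultdict


def openings_report(games):
    """Summarise (eco_code, pgn_result) pairs per ECO code.

    Returns {eco: (games, draw_pct, white_win_pct)} in ECO order.
    Results use PGN notation: "1-0", "0-1", "1/2-1/2"; any other result
    (e.g. "*") counts as a game but as neither a draw nor a White win.
    """
    counts = defaultdict(lambda: {"games": 0, "draws": 0, "white_wins": 0})
    for eco, result in games:
        c = counts[eco]
        c["games"] += 1
        if result == "1/2-1/2":
            c["draws"] += 1
        elif result == "1-0":
            c["white_wins"] += 1
    report = {}
    for eco in sorted(counts):
        c = counts[eco]
        n = c["games"]
        report[eco] = (n, 100.0 * c["draws"] / n, 100.0 * c["white_wins"] / n)
    return report


# Hypothetical sample data, not taken from the CCRL database.
sample = [("B90", "1-0"), ("B90", "1/2-1/2"), ("C42", "1/2-1/2"), ("B90", "0-1")]
for eco, (n, draw_pct, white_pct) in openings_report(sample).items():
    print(f"{eco}: {n} games, {draw_pct:.1f}% draws, {white_pct:.1f}% White wins")
```

Sorting by a column, as the report page allows, is then just `sorted(report.items(), key=lambda kv: kv[1][1], reverse=True)` for the draw percentage.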
gbanksnz at gmail.com

Uri Blass
Posts: 8948
Joined: Wed Mar 08, 2006 11:37 pm
Location: Tel-Aviv Israel

Re: CCRL update (1st September 2007)

Post by Uri Blass » Sat Sep 01, 2007 7:32 am

It is not correct that the most recent update can be viewed at
http://computerchess.org.uk/ccrl/404.live/

This list is from 21.8, whereas the following link is from 25.8:

http://computerchess.org.uk/ccrl/404/

Graham Banks
Posts: 35089
Joined: Sun Feb 26, 2006 9:52 am
Location: Auckland, NZ

Re: CCRL update (1st September 2007)

Post by Graham Banks » Sat Sep 01, 2007 7:35 am

Uri Blass wrote:
It is not correct that the most recent update can be viewed at
http://computerchess.org.uk/ccrl/404.live/

This list is from 21.8, whereas the following link is from 25.8:

http://computerchess.org.uk/ccrl/404/

The live link will only kick in once Ray runs the first live update for the week.
Try again early in the week.

Regards, Graham.
gbanksnz at gmail.com

Norm Pollock
Posts: 1031
Joined: Thu Mar 09, 2006 3:15 pm
Location: Long Island, NY, USA

Re: CCRL update (1st September 2007)

Post by Norm Pollock » Sat Sep 01, 2007 4:01 pm

Hi Graham,

I decided to do a "draw-rate" analysis of current 40/40 CCRL games between Strelka versions and versions of Rybka, Toga and Fruit. I recall reading that Strelka may be based on those engines.

I noticed that CCRL considers Strelka games v Rybka, Toga and Fruit to be "inter-family" games. But does the "draw-rate" indicate "inter-family" or "intra-family"?

The "draw rates" between the "inter-family" games and the "intra-family" games in CCRL show a clear difference:
Based on 1,555 "intra-family" games, the "intra-family" draw rate is 58.7%.
Based on 70,935 "inter-family" games, the "inter-family" draw rate is 35.9%.
Based on all 72,490 games in CCRL 40/40, the draw rate is 36.4%.

There are 26 such games for Strelka 1.0b.
Game Results:
Strelka 1.0b v Fruit 2.1 : 0 wins, 1 draw, 1 loss
Strelka 1.0b v Toga II 1.2.1a: 0 wins, 1 draw, 1 loss
Strelka 1.0b v Rybka 2.2 32-bit: 0 wins, 1 draw, 1 loss
Strelka 1.0b v Rybka 2.3.2a 32-bit: 0 wins, 5 draws, 15 losses
total:
Strelka 1.0b: 0 wins, 8 draws, 18 losses
31% draw rate (inter-family draw rate is 35.9%)

There are 19 such games for Strelka 1.8:
Game Results:
Strelka 1.8 v Fruit 051103 : 1 win, 0 draws, 1 loss
Strelka 1.8 v Fruit 2.3.1 : 2 wins, 2 draws, 2 losses
Strelka 1.8 v Toga II 1.3.1: 0 wins, 1 draw, 0 losses
Strelka 1.8 v Toga II 1.3.4: 1 win, 5 draws, 0 losses
Strelka 1.8 v Rybka 2.3.2a 32-bit: 0 wins, 3 draws, 1 loss
total:
Strelka 1.8: 4 wins, 11 draws, 4 losses
58% draw rate (intra-family draw rate is 58.7%).

Interesting, I think, but again, I cannot draw any conclusions because there are a mere 45 games.
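The totals and draw rates above can be checked with a few lines of code. The per-opponent results are copied from the tables above. The last line also confirms that the overall 36.4% is the game-weighted average of the intra-family and inter-family rates.

```python
def tally(results):
    """Sum (wins, draws, losses) tuples and return totals plus the draw rate."""
    wins = sum(w for w, _, _ in results)
    draws = sum(d for _, d, _ in results)
    losses = sum(loss for _, _, loss in results)
    games = wins + draws + losses
    return wins, draws, losses, draws / games


# Strelka results against Fruit/Toga/Rybka versions in the 40/40 database (from the post).
strelka_10b = [(0, 1, 1), (0, 1, 1), (0, 1, 1), (0, 5, 15)]
strelka_18 = [(1, 0, 1), (2, 2, 2), (0, 1, 0), (1, 5, 0), (0, 3, 1)]
for name, res in (("Strelka 1.0b", strelka_10b), ("Strelka 1.8", strelka_18)):
    w, d, loss, rate = tally(res)
    print(f"{name}: +{w} ={d} -{loss}, draw rate {rate:.0%}")

# Overall draw rate as a game-weighted average of intra- and inter-family rates.
combined = (1555 * 0.587 + 70935 * 0.359) / (1555 + 70935)
print(f"combined draw rate {combined:.1%}")
```

This prints draw rates of 31% and 58% and a combined rate of 36.4%, matching the figures in the post.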

-Norm

Andrew
Posts: 176
Joined: Wed Mar 08, 2006 11:51 pm
Location: Australia

Re: CCRL update (1st September 2007)

Post by Andrew » Sun Sep 02, 2007 12:37 am

Hi Graham, I'm interested in doing some testing in the future. Do your testers all use Arena, all use the ChessBase interface, or is it a mixture?

I'm using a 1.9 GHz machine and won't be testing on it all the time, but hopefully I can run some interesting games with the newer Hiarcs and Rybka versions and a mix of older ones as well!

Andrew

Graham Banks
Posts: 35089
Joined: Sun Feb 26, 2006 9:52 am
Location: Auckland, NZ

Re: CCRL update (1st September 2007)

Post by Graham Banks » Sun Sep 02, 2007 1:00 am

Andrew wrote:
Hi Graham, I'm interested in doing some testing in the future. Do your testers all use Arena, all use the ChessBase interface, or is it a mixture?

I'm using a 1.9 GHz machine and won't be testing on it all the time, but hopefully I can run some interesting games with the newer Hiarcs and Rybka versions and a mix of older ones as well!

Andrew

Hi Andrew,

the CCRL testers use a variety of different GUIs.
For example, I currently use the Fritz 10, Shredder and Arena 1.1 GUIs.

I wasn't sure whether you were expressing an interest in doing CCRL testing or your own testing. If it's the former, feel free to contact me via email.

Regards, Graham.
gbanksnz at gmail.com

Andrew
Posts: 176
Joined: Wed Mar 08, 2006 11:51 pm
Location: Australia

Re: CCRL update (1st September 2007)

Post by Andrew » Sun Sep 02, 2007 4:08 am

Definitely a bit of the former. I've just sent you a message.

Andrew

Kirill Kryukov
Posts: 492
Joined: Sun Mar 19, 2006 3:12 am

Re: CCRL update (1st September 2007)

Post by Kirill Kryukov » Sun Sep 02, 2007 4:50 am

Hi Norm!
Norm Pollock wrote:
The "draw rates" between the "inter-family" games and the "intra-family" games in CCRL show a clear difference:
Based on 1,555 "intra-family" games, the "intra-family" draw rate is 58.7%.
Based on 70,935 "inter-family" games, the "inter-family" draw rate is 35.9%.
Based on all 72,490 games in CCRL 40/40, the draw rate is 36.4%.

Interesting! :-)
Norm Pollock wrote:
There are 26 such games for Strelka 1.0b.
Game Results:
Strelka 1.0b v Fruit 2.1 : 0 wins, 1 draw, 1 loss
Strelka 1.0b v Toga II 1.2.1a: 0 wins, 1 draw, 1 loss
Strelka 1.0b v Rybka 2.2 32-bit: 0 wins, 1 draw, 1 loss
Strelka 1.0b v Rybka 2.3.2a 32-bit: 0 wins, 5 draws, 15 losses
total:
Strelka 1.0b: 0 wins, 8 draws, 18 losses
31% draw rate (inter-family draw rate is 35.9%)

There are 19 such games for Strelka 1.8:
Game Results:
Strelka 1.8 v Fruit 051103 : 1 win, 0 draws, 1 loss
Strelka 1.8 v Fruit 2.3.1 : 2 wins, 2 draws, 2 losses
Strelka 1.8 v Toga II 1.3.1: 0 wins, 1 draw, 0 losses
Strelka 1.8 v Toga II 1.3.4: 1 win, 5 draws, 0 losses
Strelka 1.8 v Rybka 2.3.2a 32-bit: 0 wins, 3 draws, 1 loss
total:
Strelka 1.8: 4 wins, 11 draws, 4 losses
58% draw rate (intra-family draw rate is 58.7%).

Interesting, I think, but again, I cannot draw any conclusions because there are a mere 45 games.

I agree.

You may be curious to do a similar study with our 40/4 database. Strelka 1.8 has more games there.

BTW, do you know about the "ponder hit" measure, which we compute from our game database and show on the website? Using ponder hit, we can detect and discuss engine-engine similarities in a much more direct way.
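Roughly speaking, a ponder hit occurs when the move an engine expected its opponent to play (its ponder move) is the move the opponent actually played. Here is a minimal sketch of the percentage, assuming the (predicted, actual) move pairs have already been extracted from the game records. The input format is hypothetical, and the thread does not describe how CCRL extracts or filters these pairs.

```python
def ponder_hit_rate(pairs):
    """Fraction of moves where the predicted reply matched the move played.

    `pairs` holds (predicted_move, actual_move) tuples in one consistent
    notation, e.g. UCI long algebraic ("e2e4"). Moves with no prediction
    (None) are skipped rather than counted as misses.
    """
    scored = [(p, a) for p, a in pairs if p is not None]
    if not scored:
        return None
    hits = sum(1 for p, a in scored if p == a)
    return hits / len(scored)


# Hypothetical data: 3 of the 4 predicted replies were actually played.
pairs = [("e7e5", "e7e5"), ("g8f6", "g8f6"), ("b8c6", "d7d6"), ("f8e7", "f8e7"), (None, "e8g8")]
print(f"ponder hit: {ponder_hit_rate(pairs):.0%}")  # prints "ponder hit: 75%"
```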

Best,
Kirill

Norm Pollock
Posts: 1031
Joined: Thu Mar 09, 2006 3:15 pm
Location: Long Island, NY, USA

Re: CCRL update (1st September 2007)

Post by Norm Pollock » Mon Sep 03, 2007 2:38 am

Hi Kirill,

"Ponder hits" and "draw analysis" both attempt to determine whether 2 engines think alike. Neither is definitive proof, as each can contain inaccuracies. But perhaps together they are more persuasive than either alone.

"Ponder hits", as I understand it, can be inaccurate because the engines have different amounts of time to decide on a move in the same position. Suppose engine A and engine B think alike. Engine A can devote only a fraction of its move time (say 1/4) to analyzing the move it finally makes, because it also has to analyze all the other possible moves. Engine B, however, has a full move time to decide on its response. With 4 times as much time to analyze, engine B could find a better move than the one engine A predicted for it (no ponder hit), even though, by assumption, the two engines think alike.

And two engines that do not think alike could agree on the same move (ponder hit) based on differing analyses.

Likewise, "draw analysis" can have inaccuracies as well.

As for the 40/4 analysis of Strelka 1.0b and Strelka 1.8, here are the numerical results of my draw analysis.


c404inter draw rate 29.8% based on 161748 games
c404intra draw rate 50.1% based on 2426 games
c404 (all) draw rate 30.1% based on 164174 games

Strelka 1.0b 32-bit v Rybka 1.0 Beta 32-bit: 9 wins, 32 draws, 23 losses

Strelka 1.0b 32-bit v Rybka 1.0 Beta 64-bit: 4 wins, 21 draws, 39 losses

Strelka 1.0b 32-bit v Rybka 1.1 32-bit: 8 wins, 24 draws, 32 losses

Strelka 1.0b 32-bit v Rybka 1.2f 32-bit: 6 wins, 24 draws, 34 losses

Strelka 1.0b 32-bit v Toga 1.2.1a 32-bit: 13 wins, 10 draws, 9 losses

total 288 games
Strelka 1.0b 32-bit: 40 wins, 111 draws, 137 losses
draw rate 38.5% (inter-family draw rate 29.8%, intra-family draw rate 50.1%)

---------------------------------

Strelka 1.8 32-bit v Fruit 2.3 4-men-egbb: 8 wins, 15 draws, 9 losses

Strelka 1.8 32-bit v Rybka Beta 32-bit: 10 wins, 33 draws, 21 losses

Strelka 1.8 32-bit v Rybka Beta 64-bit: 10 wins, 26 draws, 28 losses

Strelka 1.8 32-bit v Rybka 1.1 32-bit: 7 wins, 27 draws, 30 losses

Strelka 1.8 32-bit v Rybka 1.2f 32-bit: 5 wins, 30 draws, 28 losses

Strelka 1.8 32-bit v Toga II 1.2.1a 32-bit: 10 wins, 12 draws, 10 losses

total 319 games
Strelka 1.8 32-bit: 50 wins, 143 draws, 126 losses
draw rate 44.8% (inter-family draw rate 29.8%, intra-family draw rate 50.1%)
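To judge whether a draw rate really differs from a baseline, a simple binomial z-score can be used. This sketch treats every game as an independent draw/no-draw trial and uses the 40/4 inter-family and intra-family rates above as baselines. It ignores other factors, such as the strength gap between the engines, that also affect how often games are drawn.

```python
import math


def draw_rate_z(draws: int, games: int, baseline: float) -> float:
    """z-score of an observed draw count against a baseline draw rate.

    Binomial model: each game is an independent trial that is drawn with
    probability `baseline` under the null hypothesis.
    """
    observed = draws / games
    standard_error = math.sqrt(baseline * (1.0 - baseline) / games)
    return (observed - baseline) / standard_error


INTER, INTRA = 0.298, 0.501  # 40/4 baselines quoted above
for name, draws, games in (("Strelka 1.0b", 111, 288), ("Strelka 1.8", 143, 319)):
    z_inter = draw_rate_z(draws, games, INTER)
    z_intra = draw_rate_z(draws, games, INTRA)
    print(f"{name}: z vs inter-family {z_inter:+.1f}, z vs intra-family {z_intra:+.1f}")
```

Under this model, a |z| well above 2 suggests that chance alone is an unlikely explanation for the difference. It says nothing about why the draw rates differ.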
cheers,
Norm

Kirill Kryukov
Posts: 492
Joined: Sun Mar 19, 2006 3:12 am

Re: CCRL update (1st September 2007)

Post by Kirill Kryukov » Tue Sep 04, 2007 4:07 am

Norm Pollock wrote:
Hi Kirill,

"Ponder hits" and "draw analysis" both attempt to determine whether 2 engines think alike. Neither is definitive proof, as each can contain inaccuracies. But perhaps together they are more persuasive than either alone.

"Ponder hits", as I understand it, can be inaccurate because the engines have different amounts of time to decide on a move in the same position. Suppose engine A and engine B think alike. Engine A can devote only a fraction of its move time (say 1/4) to analyzing the move it finally makes, because it also has to analyze all the other possible moves. Engine B, however, has a full move time to decide on its response. With 4 times as much time to analyze, engine B could find a better move than the one engine A predicted for it (no ponder hit), even though, by assumption, the two engines think alike.

And two engines that do not think alike could agree on the same move (ponder hit) based on differing analyses.

Likewise, "draw analysis" can have inaccuracies as well.

Hi Norm. I agree with your general point that both draw rate and ponder hit can give information about engine-engine similarities, and that both have their inaccuracies. However, I'd like to point out that ponder hit has much more resolution (at least 10 times more) for detecting engine-engine similarities. Yes, both are subject to statistical error, but with ponder hit you get one measurement per move, while with draw rate you get one measurement per game.

Suppose two engines have played 20 games. Whatever the draw rate is, you can't say anything about their similarity based on it. However, 20 games can already provide good ponder hit statistics, based on about a thousand moves.
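The resolution argument can be made concrete by comparing the 95% uncertainty of a proportion measured over 20 games with one measured over about 1000 moves. The true values of 36% draws and 50% ponder hit are hypothetical. Moves within a game are not independent, so the real ponder-hit uncertainty is larger than this optimistic estimate.

```python
import math


def half_width_95(p: float, n: int) -> float:
    """Approximate 95% half-width of a proportion p measured over n independent trials."""
    return 1.96 * math.sqrt(p * (1.0 - p) / n)


games, moves = 20, 1000  # figures from the paragraph above
draw_rate, ponder_hit = 0.36, 0.50  # hypothetical true values for illustration
print(f"draw rate over {games} games: +/-{half_width_95(draw_rate, games):.1%}")
print(f"ponder hit over {moves} moves: +/-{half_width_95(ponder_hit, moves):.1%}")
```

The draw rate comes out at about ±21 percentage points and the ponder hit at about ±3, which is why 20 games say little through draws but already a fair amount through ponder hits.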

Yes, ponder hit has its own inaccuracies, and the engine making a move always has more time to think than the engine that was trying to predict that move. This is why you never observe 100% ponder hit, even when two identical engines play each other. In our experience, identical engines show a ponder hit of about 70% - 75%. Still, ponder hit has proved to be a reliable measure: it is much lower for unrelated engines.

Note that draw rate has its own flaws. Ponder hit is directly related to engine-engine similarity, whereas the connection between draw rate and engine-engine similarity is much less direct. Many draws can occur in games between independent engines, and it is not certain that a high draw rate isn't simply a result of engine styles.

Also, as I said, many more games are needed to really make use of draw rate for engine-engine similarity searches. This limits draw-rate analysis to comparisons between engine families, like what you did with Strelka. Ponder hit allows us to evaluate engine-engine similarity in practice for any particular pair of engines.

When two engines are related but have very different strengths, it is hard to detect this using draw rate (there will be few draws), whereas ponder hit can still detect such similarities to some extent.
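The point about strength differences can be illustrated with a draw model in the style of Rémi Coulom's BayesElo. Each side's win probability is the logistic Elo curve shifted by a "draw Elo" parameter, and the remaining probability is a draw. White's first-move advantage is ignored here, and draw_elo = 200 is a hypothetical value, not one fitted to CCRL data.

```python
def logistic(delta: float) -> float:
    """Elo logistic curve: expected score for a rating advantage `delta`."""
    return 1.0 / (1.0 + 10.0 ** (-delta / 400.0))


def outcome_probs(elo_gap: float, draw_elo: float = 200.0):
    """(win, draw, loss) probabilities for the stronger side.

    BayesElo-style draw model without a White advantage term;
    draw_elo = 200 is a hypothetical, unfitted value.
    """
    win = logistic(elo_gap - draw_elo)
    loss = logistic(-elo_gap - draw_elo)
    return win, 1.0 - win - loss, loss


for gap in (0, 100, 200, 300):
    w, d, loss = outcome_probs(gap)
    print(f"gap {gap:3d} Elo: draw probability {d:.0%}")
```

Under this model, the draw probability falls from about 52% for equal engines to about 31% at a 300 Elo gap. A related but much weaker engine would therefore draw less often even if it "thinks alike".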

Norm Pollock wrote:
As for the 40/4 analysis of Strelka 1.0b and Strelka 1.8, here are the numerical results of my draw analysis.


c404inter draw rate 29.8% based on 161748 games
c404intra draw rate 50.1% based on 2426 games
c404 (all) draw rate 30.1% based on 164174 games

Strelka 1.0b 32-bit v Rybka 1.0 Beta 32-bit: 9 wins, 32 draws, 23 losses

Strelka 1.0b 32-bit v Rybka 1.0 Beta 64-bit: 4 wins, 21 draws, 39 losses

Strelka 1.0b 32-bit v Rybka 1.1 32-bit: 8 wins, 24 draws, 32 losses

Strelka 1.0b 32-bit v Rybka 1.2f 32-bit: 6 wins, 24 draws, 34 losses

Strelka 1.0b 32-bit v Toga 1.2.1a 32-bit: 13 wins, 10 draws, 9 losses

total 288 games
Strelka 1.0b 32-bit: 40 wins, 111 draws, 137 losses
draw rate 38.5% (inter-family draw rate 29.8%, intra-family draw rate 50.1%)

---------------------------------

Strelka 1.8 32-bit v Fruit 2.3 4-men-egbb: 8 wins, 15 draws, 9 losses

Strelka 1.8 32-bit v Rybka Beta 32-bit: 10 wins, 33 draws, 21 losses

Strelka 1.8 32-bit v Rybka Beta 64-bit: 10 wins, 26 draws, 28 losses

Strelka 1.8 32-bit v Rybka 1.1 32-bit: 7 wins, 27 draws, 30 losses

Strelka 1.8 32-bit v Rybka 1.2f 32-bit: 5 wins, 30 draws, 28 losses

Strelka 1.8 32-bit v Toga II 1.2.1a 32-bit: 10 wins, 12 draws, 10 losses

total 319 games
Strelka 1.8 32-bit: 50 wins, 143 draws, 126 losses
draw rate 44.8% (inter-family draw rate 29.8%, intra-family draw rate 50.1%)
cheers,
Norm

Thanks for counting. I see that the Strelka vs. [Fruit/Toga/Rybka] draw rate lies somewhere between the inter-family and intra-family draw rates. Can anything be concluded from this?
