CCRL 40/4 lists updated (28th July 2012)

Discussion of computer chess matches and engine tournaments.

Moderators: hgm, Rebel, chrisw

Graham Banks
Posts: 41423
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

CCRL 40/4 lists updated (28th July 2012)

Post by Graham Banks »

The latest CCRL Rating Lists and Statistics are available for viewing from the following links:
http://computerchess.org.uk/ccrl/4040/ (40/40)
http://www.computerchess.org.uk/ccrl/404/ (40/4)
http://www.computerchess.org.uk/ccrl/404FRC/ (FRC 40/4)

Please note that the three lists are updated separately from each other. The 40/40 and 40/4 lists are each updated once every two weeks, on alternating weeks. The FRC list is updated whenever a new engine or engine version is being or has been tested.

The links to the various rating lists can be found just beneath the default Best Versions list (as in this screenshot). Specific 32-bit rating lists are denoted as such to the right of the default list in each category. The default lists contain the 64-bit engines.

[screenshot: rating list links beneath the Best Versions list]

Our 40 moves in 40 minutes (repeating) and 40 moves in 4 minutes (repeating) time controls are both adjusted to the AMD64 X2 4600+ (2.4GHz).

Be aware that in the early stages of testing, an engine's rating can often fluctuate a lot.
It is strongly advised to look at the many other rating lists available in order to get a more accurate overall picture of an engine's rating relative to others.

The LOS (likelihood of superiority) stats on the right-hand side of each rating list give the likelihood, in percentage terms, of each engine being superior to the engine directly below it.
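(For reference, the LOS figures on the list come out of the rating calculation over the whole pool of games. As a rough illustration of the concept only, here is the common normal-approximation formula for LOS from a single head-to-head match, sketched in Python; the function name is mine and the numbers are just an example, not CCRL data.)

from math import erf, sqrt

def los(wins, losses):
    # Likelihood of superiority from one match, ignoring draws
    # (normal approximation to the binomial).
    if wins + losses == 0:
        return 0.5
    return 0.5 * (1.0 + erf((wins - losses) / sqrt(2.0 * (wins + losses))))

print(round(los(30, 20), 3))  # 30 wins, 20 losses -> about 0.921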

All games are available for download by engine, by month or by ECO code. The download databases by month or ECO code are only updated monthly, but the total games database in its entirety is always available.
The current Elo ratings are saved in all game databases for those engines that have 200 games or more.

Clicking on an engine name will give details as to opponents played plus homepage links where applicable.

Custom lists of engines can be selected for comparison.

An openings report page lists the number of games played by ECO codes with draw percentage and White win percentage. Clicking on a column heading will sort the list by that column.
gbanksnz at gmail.com
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: CCRL 40/4 lists updated (28th July 2012)

Post by Adam Hair »

I should state that we made two changes in the parameters used in Bayeselo to compute the 40/4 ratings:

1) We now use the command 'mm 1 1' instead of 'mm'. This relates to the parameters (White) 'advantage' and 'drawelo' used in Bayeselo's modified logistic model for computing the Elo ratings. Without going into depth, 'mm' uses Bayeselo's default values, 32.8 for 'advantage' and 97.3 for 'drawelo', which were determined from the WBEC database. The command 'mm 1 1' makes Bayeselo compute and use the values associated with the 40/4 database (advantage=31.7828, drawelo=117.99). These parameters were built into Bayeselo for the purpose of extracting more information from a database. The additional information, in theory, allows more accurate estimation of the Elo ratings. To make full use of this ability, the values used for those parameters should come from CCRL data rather than from the default values.

2) We now use the command 'scale 1'. The 'scale' parameter was added to make Bayeselo's ratings resemble the ratings of the SSDF and those produced by ELOStat. This has the effect of compressing the Elo ratings compared to the Bayeselo rating model. In other words, a result of 60% should be equal to an Elo difference of ~45, according to the Bayeselo model and the CCRL 40/4 values for advantage and drawelo. However, the scale factor that Bayeselo would apply by default (the scale value depends on the drawelo value) changes the predicted Elo difference to ~40 Elo.

The total effect of the changes is that the order of the engines will be somewhat different (better reflecting the game results), and the CCRL 40/4 Elo ratings have spread out by approximately 15% (over the complete list). This accurately reflects what Bayeselo computes. A small sketch of the model is included below for reference.
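To make the above concrete, here is a minimal Python sketch of the model as I understand it from Bayeselo's documentation: the first function gives the win/draw/loss probabilities using the 40/4 values for 'advantage' and 'drawelo' quoted in point 1), and the second gives the expected score of the higher-rated engine for a given rating difference, averaged over both colours. The function names are mine; this is a sketch of the published model description, not Bayeselo's exact internals.

def win_draw_loss_probs(elo_white, elo_black, advantage=31.7828, drawelo=117.99):
    # Modified logistic model: White gets a fixed 'advantage' bonus and
    # 'drawelo' controls how wide the draw band is.
    delta = elo_white - elo_black + advantage
    p_white_win = 1.0 / (1.0 + 10.0 ** ((drawelo - delta) / 400.0))
    p_black_win = 1.0 / (1.0 + 10.0 ** ((drawelo + delta) / 400.0))
    return p_white_win, 1.0 - p_white_win - p_black_win, p_black_win

def expected_score(elo_diff, advantage=31.7828, drawelo=117.99):
    # Expected score of the engine rated elo_diff higher, averaged over colours.
    pw, pd, _ = win_draw_loss_probs(elo_diff, 0.0, advantage, drawelo)
    _, pd2, pb2 = win_draw_loss_probs(0.0, elo_diff, advantage, drawelo)
    return 0.5 * ((pw + 0.5 * pd) + (pb2 + 0.5 * pd2))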
Uri Blass
Posts: 10282
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: CCRL 40/4 lists updated (28th July 2012)

Post by Uri Blass »

I do not understand the reason for a pure list that is different from the best versions list that is based on all games.

I read in the comments that the pure list removes distortion that may occur from multiple versions of the same engine.

I do not understand why you think that having multiple versions of the same engine causes distortion in the rating list.

I think that a possible source of distortion (which may be the reason that top programs have worse ratings at long time control relative to blitz) may be games between opponents with more than a 100 Elo difference, and it may be interesting to have a list that does not include these games.

It is possible to improve from 55% at blitz to 56% at long time control against the same program, but it seems to me that from results of more than 80% the only direction is down, because there are openings that are impossible to win against good opponents, and I am not sure that even a perfect player could achieve 100% under CCRL 40/40 conditions.
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: CCRL 40/4 lists updated (28th July 2012)

Post by Adam Hair »

Uri Blass wrote:I do not understand the reason for a pure list that is different from the best versions list that is based on all games.

I read in the comments that the pure list removes distortion that may occur from multiple versions of the same engine.

I do not understand why you think that having multiple versions of the same engine causes distortion in the rating list.
I was not a member when it was decided to maintain pure lists in addition to the complete list. But the rationale seems sound. There appears to be a lack of transitivity amongst engines of similar strength. A winning a match against B and B winning a match against C does not mean that A will win a match against C (though I will admit I have not conducted matches with enough games to demonstrate this sort of phenomenon with any statistical significance). Also, some authors are more active than others. So, a particular engine may have played games against multiple versions of another engine, each time performing better (or worse) than would be expected against an engine of similar strength. The accumulated effect could boost (or lower) the engine's rating relative to the other engines.

However, I have not checked to see if this is actually true. I think that it is not possible to compare the pure list and the complete list directly. I did not have time before going to work to do any analysis, but I hope to do so when I return home.
I think that a possible source of distortion (which may be the reason that top programs have worse ratings at long time control relative to blitz) may be games between opponents with more than a 100 Elo difference, and it may be interesting to have a list that does not include these games.
I hope to also check this out when I am home.
It is possible to improve from 55% at blitz to 56% at long time control against the same program, but it seems to me that from results of more than 80% the only direction is down, because there are openings that are impossible to win against good opponents, and I am not sure that even a perfect player could achieve 100% under CCRL 40/40 conditions.
I agree that there are openings used that are virtually impossible to win against a good opponent. However, creating an opening set that lacks deterministic openings seems to be a difficult job. Also, I am not certain that a perfect player could achieve 100% against the current top engines anyway. It could be that a top engine could play close enough to optimal to force some draws. Perhaps a perfect player with complete knowledge of how its opponent selects moves could achieve 100%. I do not know.
Dan Honeycutt
Posts: 5258
Joined: Mon Feb 27, 2006 4:31 pm
Location: Atlanta, Georgia

Re: CCRL 40/4 lists updated (28th July 2012)

Post by Dan Honeycutt »

Thanks Graham and the rest of the CCRL crew for testing Cupcake. At no. 252 there is plenty of room for improvement :)

Best
Dan H.
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: CCRL 40/4 lists updated (28th July 2012)

Post by Adam Hair »

Dan Honeycutt wrote:Thanks Graham and the rest of the CCRL crew for testing Cupcake. At no. 252 there is plenty of room for improvement :)

Best
Dan H.
It should be top 30 on the Also-Rans list. I just have to take the time to update the list.
Modern Times
Posts: 3546
Joined: Thu Jun 07, 2012 11:02 pm

Re: CCRL 40/4 lists updated (28th July 2012)

Post by Modern Times »

Uri Blass wrote:I do not understand the reason for a pure list that is different from the best versions list that is based on all games.
IPON has the same thing, but he doesn't call it that, and it covers only the top 20 engines. His is called the IPON-RRRL. So like CCRL, he has two lists calculated from different databases of games.
Uri Blass
Posts: 10282
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: CCRL 40/4 lists updated (28th July 2012)

Post by Uri Blass »

Adam Hair wrote:
I was not a member when it was decided to maintain pure lists in addition to the complete list. But the rationale seems sound. There appears to be a lack of transitivity amongst engines of similar strength.
I did not see proof of it, and all the data that I see in the FRC list does not suggest a lack of transitivity (usually the program with the higher rating wins, and after looking at many results I did not find a single case where A beat B in a 100-game match despite being 50 Elo or more weaker in rating, and I looked at the results of all programs rated 2570-3289).

Here are the only exceptions to the higher-rated program winning among these programs; the biggest difference is a 50-50 draw with a 43 Elo difference:

1)Stockfish 2.2.2 64-bit- Rybka 4 64-bit 50-50(12 elo difference)
2)Stockfish 2.0.1 64-bit-Critter1.01 64 bit 51-49(stockfish 20 elo weaker)
3)Stockfish 1.8 64-bit- Rybka 3 64-bit 50-50(stockfish 43 elo stronger)
4)Rybka 3 64-bit-Stockfish 1.7 64-bit 51-49(stockfish 25 elo stronger)
5)Chiron 1.1 64-bit-Shredder12 50.5-49.5(Chiron 34 elo weaker)
6)Hiarcs 13.2-Spike 1.4 Leiden 54-46 (hiarcs 21 elo weaker)
7) Deep Sjeng WC2008 64-bit-Shredder11 50.5-49.5(2 elo)
8)Deep Sjeng3 -Hiarcs 12 50-50(30 elo)
9)Hiarcs12-Naum3 52.5-47.5(hiarcs12 10 elo weaker)
10)Glaurung2.2 64 bit-Hiarcs12 50.5-49.5(glaurung 29 elo weaker)
11)Naum2.2 64 bit-Hiarcs11.1 50.5-49.5(naum 5 elo weaker)
12)Loop for chess960-Hiarcs11.1 50.5-49.5(loop 21 elo weaker)
13)Loop for chess960-Naum2.2 52-48(loop 16 elo weaker)
14)Fruit 051103-Shredder10 53-47(Fruit 10 elo weaker)
15)Fruit 051103-Hiarcs11.2 50.5(Fruit 1 elo weaker)
16)Loop for chess960-Hiarcs11.2 55(Loop 5 elo weaker)
17)Tornado 4.88 64-bit-Loop for chess960 53.5-46.5(Tornado 5 elo weaker)
18)Bright 0.4a-Tornado 4.88 64-bit 52-48(bright 28 elo weaker)
19)Spike1.2-Fruit2.2.1 51-49(Spike 6 elo weaker)
20)Naum2.1-Fruit2.2.1 51-49(Naum 38 elo weaker)
21)Tornado 4.4 64-bit-Spike1.2 53.5-46.5(tornado 23 elo weaker)
22) Glaurung 2.0.1 64-bit-Deep Sjeng 2.7 51.5-48.5(14 elo)
23) Deep Sjeng 2.5-Glaurung1.2.1 54.5-45.5(26 elo)
24)Frenzee Feb08 64-bit-Movei00.8.438 54-46 (2 elo)
25)Tornado 4.1 64-bit-Movei00.8.438 50.5-49.5(9 elo)
26)The Baron 2.23-Movei00.8.438 54-46(25 elo)
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: CCRL 40/4 lists updated (28th July 2012)

Post by Adam Hair »

Uri Blass wrote:
Adam Hair wrote:
I was not a member when it was decided to maintain pure lists in addition to the complete list. But the rationale seems sound. There appears to be a lack of transitivity amongst engines of similar strength.
I did not see proof of it, and all the data that I see in the FRC list does not suggest a lack of transitivity (usually the program with the higher rating wins, and after looking at many results I did not find a single case where A beat B in a 100-game match despite being 50 Elo or more weaker in rating, and I looked at the results of all programs rated 2570-3289).

Here are the only exceptions to the higher-rated program winning among these programs; the biggest difference is a 50-50 draw with a 43 Elo difference:

1)Stockfish 2.2.2 64-bit- Rybka 4 64-bit 50-50(12 elo difference)
2)Stockfish 2.0.1 64-bit-Critter1.01 64 bit 51-49(stockfish 20 elo weaker)
3)Stockfish 1.8 64-bit- Rybka 3 64-bit 50-50(stockfish 43 elo stronger)
4)Rybka 3 64-bit-Stockfish 1.7 64-bit 51-49(stockfish 25 elo stronger)
5)Chiron 1.1 64-bit-Shredder12 50.5-49.5(Chiron 34 elo weaker)
6)Hiarcs 13.2-Spike 1.4 Leiden 54-46 (hiarcs 21 elo weaker)
7) Deep Sjeng WC2008 64-bit-Shredder11 50.5-49.5(2 elo)
8)Deep Sjeng3 -Hiarcs 12 50-50(30 elo)
9)Hiarcs12-Naum3 52.5-47.5(hiarcs12 10 elo weaker)
10)Glaurung2.2 64 bit-Hiarcs12 50.5-49.5(glaurung 29 elo weaker)
11)Naum2.2 64 bit-Hiarcs11.1 50.5-49.5(naum 5 elo weaker)
12)Loop for chess960-Hiarcs11.1 50.5-49.5(loop 21 elo weaker)
13)Loop for chess960-Naum2.2 52-48(loop 16 elo weaker)
14)Fruit 051103-Shredder10 53-47(Fruit 10 elo weaker)
15)Fruit 051103-Hiarcs11.2 50.5(Fruit 1 elo weaker)
16)Loop for chess960-Hiarcs11.2 55(Loop 5 elo weaker)
17)Tornado 4.88 64-bit-Loop for chess960 53.5-46.5(Tornado 5 elo weaker)
18)Bright 0.4a-Tornado 4.88 64-bit 52-48(bright 28 elo weaker)
19)Spike1.2-Fruit2.2.1 51-49(Spike 6 elo weaker)
20)Naum2.1-Fruit2.2.1 51-49(Naum 38 elo weaker)
21)Tornado 4.4 64-bit-Spike1.2 53.5-46.5(tornado 23 elo weaker)
22) Glaurung 2.0.1 64-bit-Deep Sjeng 2.7 51.5-48.5(14 elo)
23) Deep Sjeng 2.5-Glaurung1.2.1 54.5-45.5(26 elo)
24)Frenzee Feb08 64-bit-Movei00.8.438 54-46 (2 elo)
25)Tornado 4.1 64-bit-Movei00.8.438 50.5-49.5(9 elo)
26)The Baron 2.23-Movei00.8.438 54-46(25 elo)

I am talking about a lack of transitivity among engines of similar strength. I do not mean that transitivity does not occur, but that it is not a general rule. For example, from the FRC list we have Hiarcs 11.1 beating Fruit 051103, Loop for Chess960 beating Hiarcs 11.1, and Fruit 051103 beating Loop for Chess960.

I understand why you looked at the FRC list. The individual matches are 100 games in length, while for the 40/4 list the matches are around 30 games in most cases. I have no idea about the 40/40 list. However, the FRC list involves far fewer engines that are farther apart (on average) in strength. The number of engines comparable in strength to a randomly chosen engine is lower than in the other lists, so there would be fewer chances for an intransitive set of engines to occur.

Also, there are fluctuations in performance against different engines. Engine A may be rated 50 Elo stronger than Engine B, but may only perform 30 Elo better against that engine, and it may perform better against a different engine than the ratings indicate. These fluctuations could cause a distortion if an engine played multiple versions of another engine, especially if those versions did not represent a large change in strength. As I said before, I have no proof at this moment of whether it is true or false, but it does seem plausible to me.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: CCRL 40/4 lists updated (28th July 2012)

Post by lkaufman »

Uri Blass wrote:I do not understand the reason for a pure list that is different from the best versions list that is based on all games.

I read in the comments that the pure list removes distortion that may occur from multiple versions of the same engine.

I do not understand why you think that having multiple versions of the same engine causes distortion in the rating list.

I think that a possible source of distortion (which may be the reason that top programs have worse ratings at long time control relative to blitz) may be games between opponents with more than a 100 Elo difference, and it may be interesting to have a list that does not include these games.

It is possible to improve from 55% at blitz to 56% at long time control against the same program, but it seems to me that from results of more than 80% the only direction is down, because there are openings that are impossible to win against good opponents, and I am not sure that even a perfect player could achieve 100% under CCRL 40/40 conditions.
Some time ago I did a study on this last question (whether ratings are distorted by mismatches), and I concluded that there was no significant effect; in other words, pairings 100, 200, 300, and 400 Elo apart all produced on average about what the formula predicts. The book might matter, though; if deep books are used, the problem you mention is more likely than if just five- or six-move books are used. With short books, you are very unlikely to reach an unloseable position from book.
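(For reference, assuming "the formula" above means the standard logistic Elo expectation, here is a quick Python sketch of the predicted scores at those rating gaps; the function name is mine.)

def elo_expected_score(diff):
    # Standard logistic Elo expectation for the higher-rated side.
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

for diff in (100, 200, 300, 400):
    print(diff, round(elo_expected_score(diff), 3))
# 100 -> 0.64, 200 -> 0.76, 300 -> 0.849, 400 -> 0.909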