CCRL 40/4 lists updated (28th July 2012)

Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: CCRL 40/4 lists updated (28th July 2012)

Post by Adam Hair »

Uri,

I have computed the ratings for three versions of the 40/40 list: the complete list, the complete list restricted to matches between opponents within 100 Elo of each other, and the pure list. I did the same for the 40/4 list. I do not have time to do any comparisons, but you can download the lists from my Mediafire folder. I computed all of the ratings with Bayeselo using 'mm 1 1' and 'scale 1'.
carldaman
Posts: 2287
Joined: Sat Jun 02, 2012 2:13 am

Re: CCRL 40/4 lists updated (28th July 2012)

Post by carldaman »

Uri Blass wrote: I think that a possible source of distortion (which may be the reason that top programs have worse ratings at long time control relative to blitz) may be games between opponents with more than 100 Elo difference, and it may be interesting to have a list that does not include these games.
Uri,
Not sure if this will address the point you were trying to make, but here's a thought your comment evoked -- there are engines, such as Chiron, that thrive on beating weaker engines (within 200 rating points, let's say), but then struggle against the Houdinis, Critters, etc. above them. Not playing those games against the weaker minnows would strip such an engine of rating points and make it appear weaker, at least until it starts playing against weaker opposition again.

The moral of the story is that playing opponents of varied strength can't be avoided if testing is to be fair and accurate.

Regards,
CL
Uri Blass
Posts: 10892
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: CCRL 40/4 lists updated (28th July 2012)

Post by Uri Blass »

Thanks for the special list restricted to games with no more than 100 Elo difference.
I will look at it and make another post.

Another comment that I have about the CCRL.

I tried to translate Elo differences into CCRL FRC results and found something strange.
http://www.computerchess.org.uk/ccrl/40 ... 1_1_64-bit
Chiron 1.1 64-bit - Deep Sjeng WC2008 64-bit 64.5-35.5
(50-21 and 29 draws)

rating difference 112
Chiron performance -9
Conclusion: 64.5-35.5 means 103 Elo difference (112 - 9)

http://www.computerchess.org.uk/ccrl/40 ... 1_6_64-bit

Critter 1.6 64-bit - Stockfish 2.2.2 64-bit 65.5-34.5 (44-13 and 43 draws)

rating difference 107
Critter performance -11

Conclusion: 65.5-34.5 means 96 Elo difference (107 - 11)

How is it possible that a higher score means a smaller Elo difference?
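
For comparison, the two scores can also be converted with the plain logistic Elo formula, which ignores draws and is not what Bayeselo computes. This is only a rough sketch; the function name `elo_from_score` is made up for the illustration.

```python
import math

def elo_from_score(score):
    """Elo difference implied by a score fraction under the plain logistic model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))

# Score fractions taken from the two 100-game matches quoted above.
for label, score in [("Chiron - Deep Sjeng", 0.645), ("Critter - Stockfish", 0.655)]:
    print(f"{label}: {score:.1%} -> {elo_from_score(score):.1f} Elo")
```

Under this formula 64.5% is roughly 104 Elo and 65.5% is roughly 111 Elo, so the ordering is the expected one. The reversed ordering only shows up in the Bayeselo-based figures.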
Uri Blass
Posts: 10892
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: CCRL 40/4 lists updated (28th July 2012)

Post by Uri Blass »

carldaman wrote:
Uri Blass wrote: I think that a possible source of distortion (which may be the reason that top programs have worse ratings at long time control relative to blitz) may be games between opponents with more than 100 Elo difference, and it may be interesting to have a list that does not include these games.
Uri,
Not sure if this will address the point you were trying to make, but here's a thought your comment evoked -- there are engines, such as Chiron, that thrive on beating weaker engines (within 200 rating points, let's say), but then struggle against the Houdinis, Critters, etc. above them. Not playing those games against the weaker minnows would strip such an engine of rating points and make it appear weaker, at least until it starts playing against weaker opposition again.

The moral of the story is that playing opponents of varied strength can't be avoided if testing is to be fair and accurate.

Regards,
CL
Note that I did not suggest avoiding opponents of varied strength, only opponents with more than 100 Elo difference, and in practice I do not see that you are right about Chiron.

http://www.computerchess.org.uk/ccrl/40 ... 1_1_64-bit

In the CCRL FRC list, Chiron clearly gained rating points from losing 63-37 against Rybka 4.

Note that there seems to be little difference between the lists. My original thought was to fix the problem of smaller rating differences in the 40/40 list, which may not be correct. But the problem seems to be fixed in spite of the list also including games between opponents with more than 200 Elo difference, and I find that the difference between the strong programs and the weak programs is bigger at long time control.
Modern Times
Posts: 3748
Joined: Thu Jun 07, 2012 11:02 pm

Re: CCRL 40/4 lists updated (28th July 2012)

Post by Modern Times »

The FRC list has not yet been recalculated with the new Bayeselo parameters, so comparing it with the regular 40/4 and 40/40 lists may show some differences for now.
Uri Blass
Posts: 10892
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: CCRL 40/4 lists updated (28th July 2012)

Post by Uri Blass »

Adam Hair wrote:
Uri Blass wrote:
Adam Hair wrote:
I was not a member when it was decided to maintain pure lists in addition to the complete list. But the rationale seems sound. There appears to be a lack of transitivity amongst engines of similar strength.
I have not seen a proof of it, and none of the data I see in the FRC list suggests a lack of transitivity. Usually the program with the higher rating wins, and after looking at many results I did not find a single case where A beat B in a 100-game match despite being rated 50 Elo or more lower. I looked at the results of all programs rated 2570-3289.

Here are the only exceptions to the higher-rated program winning among these programs; the largest rating gap is 43 Elo, in a 50-50 draw.

1) Stockfish 2.2.2 64-bit - Rybka 4 64-bit 50-50 (12 Elo difference)
2) Stockfish 2.0.1 64-bit - Critter 1.01 64-bit 51-49 (Stockfish 20 Elo weaker)
3) Stockfish 1.8 64-bit - Rybka 3 64-bit 50-50 (Stockfish 43 Elo stronger)
4) Rybka 3 64-bit - Stockfish 1.7 64-bit 51-49 (Stockfish 25 Elo stronger)
5) Chiron 1.1 64-bit - Shredder 12 50.5-49.5 (Chiron 34 Elo weaker)
6) Hiarcs 13.2 - Spike 1.4 Leiden 54-46 (Hiarcs 21 Elo weaker)
7) Deep Sjeng WC2008 64-bit - Shredder 11 50.5-49.5 (2 Elo)
8) Deep Sjeng 3 - Hiarcs 12 50-50 (30 Elo)
9) Hiarcs 12 - Naum 3 52.5-47.5 (Hiarcs 12 10 Elo weaker)
10) Glaurung 2.2 64-bit - Hiarcs 12 50.5-49.5 (Glaurung 29 Elo weaker)
11) Naum 2.2 64-bit - Hiarcs 11.1 50.5-49.5 (Naum 5 Elo weaker)
12) Loop for Chess960 - Hiarcs 11.1 50.5-49.5 (Loop 21 Elo weaker)
13) Loop for Chess960 - Naum 2.2 52-48 (Loop 16 Elo weaker)
14) Fruit 051103 - Shredder 10 53-47 (Fruit 10 Elo weaker)
15) Fruit 051103 - Hiarcs 11.2 50.5 (Fruit 1 Elo weaker)
16) Loop for Chess960 - Hiarcs 11.2 55 (Loop 5 Elo weaker)
17) Tornado 4.88 64-bit - Loop for Chess960 53.5-46.5 (Tornado 5 Elo weaker)
18) Bright 0.4a - Tornado 4.88 64-bit 52-48 (Bright 28 Elo weaker)
19) Spike 1.2 - Fruit 2.2.1 51-49 (Spike 6 Elo weaker)
20) Naum 2.1 - Fruit 2.2.1 51-49 (Naum 38 Elo weaker)
21) Tornado 4.4 64-bit - Spike 1.2 53.5-46.5 (Tornado 23 Elo weaker)
22) Glaurung 2.0.1 64-bit - Deep Sjeng 2.7 51.5-48.5 (14 Elo)
23) Deep Sjeng 2.5 - Glaurung 1.2.1 54.5-45.5 (26 Elo)
24) Frenzee Feb08 64-bit - Movei 00.8.438 54-46 (2 Elo)
25) Tornado 4.1 64-bit - Movei 00.8.438 50.5-49.5 (9 Elo)
26) The Baron 2.23 - Movei 00.8.438 54-46 (25 Elo)

I am talking about lack of transitivity among engines of similar strength. I do not mean that transitivity does not occur, but it is not a general rule. For example, from the FRC list we have Hiarcs 11.1 beat Fruit 051103, Loop for Chess960 beat Hiarcs 11.1, and Fruit 051103 beat Loop for Chess960.
These results are not significant:
Hiarcs 11.1 - Fruit 051103 57.5-42.5 (44-29 and 27 draws)
Fruit 051103 - Loop for Chess960 51.5-48.5 (26-23 and 51 draws)
Loop for Chess960 - Hiarcs 11.1 50.5-49.5 (38-37 and 25 draws)

Hiarcs 11.1 has the highest rating of these programs, and I suspect that with more games it would beat Loop, so we have not proved that there is a lack of transitivity.
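
A rough way to check the "not significant" claim is to compare each match score with 50% using a normal approximation. The sketch below assumes independent games and uses the per-game score variance computed from the win/loss/draw counts; the helper name `score_z` is made up for the illustration.

```python
import math

def score_z(wins, losses, draws):
    """z-score of the match score against 50%, using the per-game score variance."""
    n = wins + losses + draws
    mean = (wins + 0.5 * draws) / n
    # Variance of a single game's score (1, 0.5 or 0) around the match mean.
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * mean ** 2) / n
    return (mean - 0.5) / math.sqrt(var / n)

# (wins, losses, draws) from the first-named engine's point of view.
matches = {
    "Hiarcs 11.1 - Fruit 051103": (44, 29, 27),
    "Fruit 051103 - Loop for Chess960": (26, 23, 51),
    "Loop for Chess960 - Hiarcs 11.1": (38, 37, 25),
}
z_scores = {name: score_z(*wld) for name, wld in matches.items()}
for name, z in z_scores.items():
    print(f"{name}: z = {z:.2f}")
```

All three z-scores come out below 1.96 in absolute value, so under this approximation none of the three results differs from 50% at the usual 95% level.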
Modern Times
Posts: 3748
Joined: Thu Jun 07, 2012 11:02 pm

Re: CCRL 40/4 lists updated (28th July 2012)

Post by Modern Times »

Modern Times wrote: The FRC list has not yet been recalculated with the new Bayeselo parameters. So comparing that to the normal 404 and 4040 lists may show some differences currently.
The FRC list has now been updated, so all three lists use the same Bayeselo parameters.
Uri Blass
Posts: 10892
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: CCRL 40/4 lists updated (28th July 2012)

Post by Uri Blass »

I will ask the same question again.

I tried to translate Elo differences into CCRL FRC results and found something strange.

http://computerchess.org.uk/ccrl/404FRC ... 1_1_64-bit


Chiron 1.1 64-bit - Deep Sjeng WC2008 64-bit 64.5-35.5
(50-21 and 29 draws)

rating difference 119
Chiron performance -16

Conclusion: 64.5-35.5 means 103 Elo difference (119 - 16)

http://computerchess.org.uk/ccrl/404FRC ... 1_6_64-bit


Critter 1.6 64-bit - Stockfish 2.2.2 64-bit 65.5-34.5 (44-13 and 43 draws)

rating difference 114
Critter performance -18

Conclusion: 65.5-34.5 means 96 Elo difference (114 - 18)

How is it possible that a higher score means a smaller Elo difference?
Modern Times
Posts: 3748
Joined: Thu Jun 07, 2012 11:02 pm

Re: CCRL 40/4 lists updated (28th July 2012)

Post by Modern Times »

No idea; ask the Bayeselo author, since that is where the numbers come from. Possibly it is due to the "prior" assumption in Bayeselo. All too technical for me.
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: CCRL 40/4 lists updated (28th July 2012)

Post by Adam Hair »

Uri Blass wrote: I will ask the same question again.

I tried to translate Elo differences into CCRL FRC results and found something strange.

http://computerchess.org.uk/ccrl/404FRC ... 1_1_64-bit


Chiron 1.1 64-bit - Deep Sjeng WC2008 64-bit 64.5-35.5
(50-21 and 29 draws)

rating difference 119
Chiron performance -16

Conclusion: 64.5-35.5 means 103 Elo difference (119 - 16)

http://computerchess.org.uk/ccrl/404FRC ... 1_6_64-bit


Critter 1.6 64-bit - Stockfish 2.2.2 64-bit 65.5-34.5 (44-13 and 43 draws)

rating difference 114
Critter performance -18

Conclusion: 65.5-34.5 means 96 Elo difference (114 - 18)

How is it possible that a higher score means a smaller Elo difference?
I am not certain that I truly understand your question.

However, though the score is nearly the same in both cases (64.5 versus 65.5), the draw rate is higher in the second case (43 draws versus 29). In such situations, Bayeselo predicts a lower Elo difference for the case with the higher draw rate.
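
The effect can be sketched with a simplified version of the draw model as I understand it: P(win) = f(d - eloDraw), P(loss) = f(-d - eloDraw), and the draw probability is the remainder, where f is the logistic curve and d is the Elo difference. The code below is an illustration under stated assumptions, not Bayeselo itself: it treats each match in isolation, leaves out the colour advantage and the prior, and uses a fixed `ELO_DRAW` of 97.3 as an assumed value (with 'mm 1 1' Bayeselo estimates it from the data). The names `log_likelihood` and `best_diff` are made up for the example.

```python
import math

ELO_DRAW = 97.3  # assumed illustrative value, not taken from the CCRL data

def f(delta):
    """Logistic curve on the Elo scale."""
    return 1.0 / (1.0 + 10.0 ** (-delta / 400.0))

def log_likelihood(diff, wins, losses, draws, elo_draw=ELO_DRAW):
    """Log-likelihood of a win/loss/draw record for a given Elo difference."""
    p_win = f(diff - elo_draw)
    p_loss = f(-diff - elo_draw)
    p_draw = 1.0 - p_win - p_loss  # positive whenever elo_draw > 0
    return (wins * math.log(p_win)
            + losses * math.log(p_loss)
            + draws * math.log(p_draw))

def best_diff(wins, losses, draws, lo=-1000.0, hi=1000.0):
    """Ternary search for the Elo difference that maximises the likelihood."""
    for _ in range(200):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if log_likelihood(m1, wins, losses, draws) < log_likelihood(m2, wins, losses, draws):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0

chiron = best_diff(50, 21, 29)   # 64.5-35.5 with 29 draws
critter = best_diff(44, 13, 43)  # 65.5-34.5 with 43 draws
print(f"Chiron - Deep Sjeng: {chiron:.1f} Elo")
print(f"Critter - Stockfish: {critter:.1f} Elo")
```

With these assumptions the Chiron match comes out near 108 Elo and the Critter match near 104 Elo, even though Critter's score is a point higher. The absolute numbers will not match the CCRL figures, since those come from the whole database and include Bayeselo's prior.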