Uri,
I have computed the ratings for the 40/40 complete list, the complete list with all matches involving opponents within 100 Elo, and the pure list. I also did the same for the 40/4 list. I do not have time to do any comparisons, but you can download the lists from my Mediafire folder. I computed all of the ratings with Bayeselo using 'mm 1 1' and 'scale 1'.
CCRL 40/4 lists updated (28th July 2012)
-
- Posts: 3226
- Joined: Wed May 06, 2009 10:31 pm
- Location: Fuquay-Varina, North Carolina
-
- Posts: 2287
- Joined: Sat Jun 02, 2012 2:13 am
Re: CCRL 40/4 lists updated (28th July 2012)
Uri,
Uri Blass wrote: I think that a possible source of distortion (that may be the reason that top programs have worse rating at long time control relative to blitz) may be games between opponents with more than 100 elo difference and it may be interesting to have a list that does not have these games.
Not sure if this will address the point you were trying to make, but here's a thought your comment evoked -- there are engines, such as Chiron, that thrive on beating weaker engines (within 200 rating points difference, let's say), but then will struggle against the Houdinis, Critters, etc above it. Not playing those games against the weaker minnows would strip it of rating points and make it appear not as strong, at least until it starts playing against weaker opposition again.
The moral of the story is that playing opponents of varied strength can't be avoided if testing is to be fair and accurate.
Regards,
CL
-
- Posts: 10892
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: CCRL 40/4 lists updated (28th July 2012)
Thanks for the special list based on not more than 100 elo difference.
I will look at it and make another post
Another comment that I have about the CCRL.
I tried to translate elo difference to FRC CCRL results and I find something strange
http://www.computerchess.org.uk/ccrl/40 ... 1_1_64-bit
Chiron 1.1 64-bit-Deep Sjeng WC2008 64-bit 64.5-35.5
(50−21 and 29 draws)
rating difference 112
Chiron performance -9
Conclusion 64.5-35.5 means 103 elo difference
http://www.computerchess.org.uk/ccrl/40 ... 1_6_64-bit
Critter 1.6 64-bit- Stockfish 2.2.2 64-bit 65.5-34.5(44-13 and 43 draws)
rating difference 107
Critter performance -11
conclusion 65.5-34.5 means 96 elo difference
How is it possible that higher results mean less elo difference?
-
- Posts: 10892
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: CCRL 40/4 lists updated (28th July 2012)
Note that I did not suggest avoiding opponents of varied strength (only opponents with more than 100 Elo difference), and in practice I do not see that you are right about Chiron.
carldaman wrote:
Uri,
Uri Blass wrote: I think that a possible source of distortion (that may be the reason that top programs have worse rating at long time control relative to blitz) may be games between opponents with more than 100 elo difference and it may be interesting to have a list that does not have these games.
Not sure if this will address the point you were trying to make, but here's a thought your comment evoked -- there are engines, such as Chiron, that thrive on beating weaker engines (within 200 rating points difference, let's say), but then will struggle against the Houdinis, Critters, etc above it. Not playing those games against the weaker minnows would strip it of rating points and make it appear not as strong, at least until it starts playing against weaker opposition again.
The moral of the story is that playing opponents of varied strength can't be avoided if testing is to be fair and accurate.
Regards,
CL
http://www.computerchess.org.uk/ccrl/40 ... 1_1_64-bit
In the CCRL FRC list
Chiron clearly earned rating from losing 63-37 against Rybka 4.
Note that there seems to be little difference between the lists. My original thought was to fix the problem of smaller rating differences in the 40/40 list, which may not be correct. But it seems to be fixed in spite of also including games between opponents with more than 200 Elo difference, and I find that the difference between the strong programs and the weak programs is bigger at long time control.
-
- Posts: 3748
- Joined: Thu Jun 07, 2012 11:02 pm
Re: CCRL 40/4 lists updated (28th July 2012)
The FRC list has not yet been recalculated with the new Bayeselo parameters. So comparing it to the normal 40/4 and 40/40 lists may currently show some differences.
-
- Posts: 10892
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: CCRL 40/4 lists updated (28th July 2012)
Not significant results.
Adam Hair wrote:
Uri Blass wrote: I did not see a proof for it, and all the data that I see in the FRC list does not suggest lack of transitivity (usually the program with the bigger rating wins, and after looking at many results I did not find a single case where A beat B in a 100-game match in spite of being 50 Elo or more weaker in rating points; I looked at results of all programs with rating 2570-3289).
Adam Hair wrote: I was not a member when it was decided to maintain pure lists in addition to the complete list. But the rationale seems sound. There appears to be a lack of transitivity amongst engines of similar strength.
Here are the only exceptions to the better-rated program winning among these programs; the biggest difference is a 50-50 draw with a 43 Elo difference:
1) Stockfish 2.2.2 64-bit - Rybka 4 64-bit 50-50 (12 Elo difference)
2) Stockfish 2.0.1 64-bit - Critter 1.01 64-bit 51-49 (Stockfish 20 Elo weaker)
3) Stockfish 1.8 64-bit - Rybka 3 64-bit 50-50 (Stockfish 43 Elo stronger)
4) Rybka 3 64-bit - Stockfish 1.7 64-bit 51-49 (Stockfish 25 Elo stronger)
5) Chiron 1.1 64-bit - Shredder 12 50.5-49.5 (Chiron 34 Elo weaker)
6) Hiarcs 13.2 - Spike 1.4 Leiden 54-46 (Hiarcs 21 Elo weaker)
7) Deep Sjeng WC2008 64-bit - Shredder 11 50.5-49.5 (2 Elo)
8) Deep Sjeng 3 - Hiarcs 12 50-50 (30 Elo)
9) Hiarcs 12 - Naum 3 52.5-47.5 (Hiarcs 12 10 Elo weaker)
10) Glaurung 2.2 64-bit - Hiarcs 12 50.5-49.5 (Glaurung 29 Elo weaker)
11) Naum 2.2 64-bit - Hiarcs 11.1 50.5-49.5 (Naum 5 Elo weaker)
12) Loop for Chess960 - Hiarcs 11.1 50.5-49.5 (Loop 21 Elo weaker)
13) Loop for Chess960 - Naum 2.2 52-48 (Loop 16 Elo weaker)
14) Fruit 051103 - Shredder 10 53-47 (Fruit 10 Elo weaker)
15) Fruit 051103 - Hiarcs 11.2 50.5 (Fruit 1 Elo weaker)
16) Loop for Chess960 - Hiarcs 11.2 55 (Loop 5 Elo weaker)
17) Tornado 4.88 64-bit - Loop for Chess960 53.5-46.5 (Tornado 5 Elo weaker)
18) Bright 0.4a - Tornado 4.88 64-bit 52-48 (Bright 28 Elo weaker)
19) Spike 1.2 - Fruit 2.2.1 51-49 (Spike 6 Elo weaker)
20) Naum 2.1 - Fruit 2.2.1 51-49 (Naum 38 Elo weaker)
21) Tornado 4.4 64-bit - Spike 1.2 53.5-46.5 (Tornado 23 Elo weaker)
22) Glaurung 2.0.1 64-bit - Deep Sjeng 2.7 51.5-48.5 (14 Elo)
23) Deep Sjeng 2.5 - Glaurung 1.2.1 54.5-45.5 (26 Elo)
24) Frenzee Feb08 64-bit - Movei00.8.438 54-46 (2 Elo)
25) Tornado 4.1 64-bit - Movei00.8.438 50.5-49.5 (9 Elo)
26) The Baron 2.23 - Movei00.8.438 54-46 (25 Elo)
I am talking about lack of transitivity among engines of similar strength. I do not mean that transitivity does not occur, but it is not a general rule. For example, from the FRC list we have Hiarcs 11.1 beat Fruit 051103, Loop for Chess960 beat Hiarcs 11.1, and Fruit 051103 beat Loop for Chess960.
Hiarcs 11.1 - Fruit 051103 57.5-42.5 (44-29 and 27 draws)
Fruit 051103 - Loop for Chess960 51.5-48.5 (26-23 and 51 draws)
Loop for Chess960 - Hiarcs 11.1 50.5-49.5 (38-37 and 25 draws)
Hiarcs 11.1 has the highest rating of these programs, and I suspect that with more games it is going to beat Loop, so we did not prove that there is a lack of transitivity.
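One rough way to check whether any of these three 100-game results is significant is to compare the score's distance from 50% with its standard error. The sketch below is only an illustration: it assumes the games are independent and uses a normal approximation, and the function name `score_z` is made up for this example.

```python
import math

def score_z(wins, losses, draws):
    """Return (score, z): the match score and its distance from 50% in standard errors."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of a result that is worth 1, 0.5 or 0 points.
    var = (wins + 0.25 * draws) / n - score ** 2
    se = math.sqrt(var / n)
    return score, (score - 0.5) / se

# Win-loss-draw counts as quoted in the post above (first engine's point of view).
matches = {
    "Hiarcs 11.1 - Fruit 051103": (44, 29, 27),
    "Fruit 051103 - Loop for Chess960": (26, 23, 51),
    "Loop for Chess960 - Hiarcs 11.1": (38, 37, 25),
}
for name, (w, l, d) in matches.items():
    score, z = score_z(w, l, d)
    print(f"{name}: score {score:.3f}, z = {z:.2f}")
```

Under these assumptions, a z below about 1.96 is within the usual 95% noise band of an even match. Loop's 50.5-49.5 win over Hiarcs 11.1 comes out far below that threshold.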
-
- Posts: 3748
- Joined: Thu Jun 07, 2012 11:02 pm
Re: CCRL 40/4 lists updated (28th July 2012)
The FRC list has now been updated, so all 3 lists are using the same Bayeselo parameters.
-
- Posts: 10892
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: CCRL 40/4 lists updated (28th July 2012)
I will ask again the same question
I tried to translate elo difference to FRC CCRL results and I find something strange
http://computerchess.org.uk/ccrl/404FRC ... 1_1_64-bit
Chiron 1.1 64-bit-Deep Sjeng WC2008 64-bit 64.5-35.5
(50−21 and 29 draws)
rating difference 119
Chiron performance -16
Conclusion 64.5-35.5 means 103 elo difference
http://computerchess.org.uk/ccrl/404FRC ... 1_6_64-bit
Critter 1.6 64-bit- Stockfish 2.2.2 64-bit 65.5-34.5(44-13 and 43 draws)
rating difference 114
Critter performance -18
conclusion 65.5-34.5 means 96 elo difference
How is it possible that higher results mean less elo difference?
-
- Posts: 3748
- Joined: Thu Jun 07, 2012 11:02 pm
Re: CCRL 40/4 lists updated (28th July 2012)
No idea, ask the Bayeselo author; that is where the numbers come from. Possibly due to the "prior" assumption in Bayeselo. All too technical for me.
-
- Posts: 3226
- Joined: Wed May 06, 2009 10:31 pm
- Location: Fuquay-Varina, North Carolina
Re: CCRL 40/4 lists updated (28th July 2012)
I am not certain that I truly understand your question.
Uri Blass wrote: I will ask again the same question
I tried to translate elo difference to FRC CCRL results and I find something strange
http://computerchess.org.uk/ccrl/404FRC ... 1_1_64-bit
Chiron 1.1 64-bit-Deep Sjeng WC2008 64-bit 64.5-35.5
(50−21 and 29 draws)
rating difference 119
Chiron performance -16
Conclusion 64.5-35.5 means 103 elo difference
http://computerchess.org.uk/ccrl/404FRC ... 1_6_64-bit
Critter 1.6 64-bit- Stockfish 2.2.2 64-bit 65.5-34.5(44-13 and 43 draws)
rating difference 114
Critter performance -18
conclusion 65.5-34.5 means 96 elo difference
How is it possible that higher results mean less elo difference?
However, though the score is nearly the same in both cases, the draw rate is higher in the second case. In such situations, Bayeselo predicts a lower Elo difference for the case with the higher draw rate.
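The effect can be illustrated with a small sketch. It assumes the Bayeselo win/draw/loss model, P(win) = f(D - drawElo) and P(loss) = f(-D - drawElo) with f(x) = 1/(1 + 10^(-x/400)). It ignores the white advantage, the prior and the scale setting, and it fits each two-engine match in isolation with a fixed draw parameter. The value `DRAW_ELO = 100.0` and the function names are assumptions chosen for illustration, so the output will not reproduce the CCRL numbers.

```python
import math

DRAW_ELO = 100.0  # assumed draw parameter, not the value fitted for the CCRL lists

def logistic_elo(score):
    """Elo difference implied by a score fraction under the plain logistic formula."""
    return 400.0 * math.log10(score / (1.0 - score))

def bayeselo_like_diff(wins, losses, draws, draw_elo=DRAW_ELO):
    """Maximum-likelihood Elo difference for one match with a fixed draw_elo.

    With a = 10**(D/400) and k = 10**(draw_elo/400), maximising the
    likelihood reduces to the quadratic
    (losses + draws) * a**2 - (wins - losses) * k * a - (wins + draws) = 0.
    """
    k = 10.0 ** (draw_elo / 400.0)
    qa = losses + draws
    qb = -(wins - losses) * k
    qc = -(wins + draws)
    a = (-qb + math.sqrt(qb * qb - 4.0 * qa * qc)) / (2.0 * qa)  # positive root
    return 400.0 * math.log10(a)

# (wins, losses, draws) for the stronger engine, as quoted in the thread.
for name, (w, l, d) in {
    "Chiron 1.1 - Deep Sjeng WC2008": (50, 21, 29),
    "Critter 1.6 - Stockfish 2.2.2": (44, 13, 43),
}.items():
    score = (w + 0.5 * d) / (w + l + d)
    print(f"{name}: logistic {logistic_elo(score):.1f}, "
          f"fixed-draw fit {bayeselo_like_diff(w, l, d):.1f}")
```

With these assumptions the plain logistic formula ranks 65.5% above 64.5%, while the fixed-draw-parameter fit gives the draw-heavier Critter match the smaller difference, because each draw enters the likelihood like one win plus one loss.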