What rating list to trust?
Moderator: Ras
-
bigo
What rating list to trust?
Which list is more accurate, or is it even possible to know? I was looking for rating information on the new Deep Sjeng 2.7, and the CCRL list seems to rate Deep Sjeng 2.7 much better than CEGT does. I don't know which to trust. I downloaded the trial version of Deep Sjeng 2.7, and it seems like a very interesting engine with a nice playing style.
-
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: What rating list to trust?
bigo wrote: Which list is more accurate, or is it even possible to know? I was looking for rating information on the new Deep Sjeng 2.7, and the CCRL list seems to rate Deep Sjeng 2.7 much better than CEGT does. I don't know which to trust. I downloaded the trial version of Deep Sjeng 2.7, and it seems like a very interesting engine with a nice playing style.
The error bars are huge. Far larger than elostat suggests. No surprise there...
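To give a rough feel for how the error bars shrink with the number of games, here is a minimal Python sketch. It assumes the usual logistic Elo model and treats every game as an independent win/loss trial (draws ignored), so it only shows naive sampling error and none of the systematic effects. The function names and the game counts are illustrative assumptions, not anything elostat actually does.

```python
import math

def elo_from_score(p):
    """Elo difference implied by an expected score p (0 < p < 1), logistic model."""
    return -400.0 * math.log10(1.0 / p - 1.0)

def elo_interval(score, games, z=1.96):
    """Approximate 95% interval for the Elo difference after `games` games.

    Assumption: each game is an independent win/loss trial (draws ignored).
    """
    se = math.sqrt(score * (1.0 - score) / games)  # standard error of the score
    eps = 1e-6                                     # keep the score inside (0, 1)
    lo = min(max(score - z * se, eps), 1.0 - eps)
    hi = min(max(score + z * se, eps), 1.0 - eps)
    return elo_from_score(lo), elo_from_score(hi)

# Illustrative game counts for an evenly matched pair (score 0.5).
for games in (6, 100, 700, 1200):
    lo, hi = elo_interval(0.5, games)
    print(f"{games:5d} games: {lo:+7.1f} .. {hi:+7.1f} Elo")
```

Under these assumptions the interval after a handful of games spans hundreds of Elo points, while several hundred games bring it down to a few tens of points.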
-
Mike S.
- Posts: 1480
- Joined: Thu Mar 09, 2006 5:33 am
Re: What rating list to trust?
CCRL have only just started to test Deep Sjeng 2.7: just 6 games so far.
Of course the CEGT rating (for the 2 CPU version) is much more reliable, with more than 700 games.
Note also that each rating list has its own level; in other words, you cannot simply compare the Elo numbers directly. For example, the Rybka 2.2 32-bit single-CPU version has 2988 at CCRL and 2938 at CEGT, both after more than 1,000 test games. That indicates that the CCRL ratings are ~50 points higher than CEGT's; the numbers are simply bigger. The difference is certainly not exactly the same for each engine.
(I usually take a look at well known older engines in the 'neighborhood' for comparisons.)
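The "look at well-known engines in the neighborhood" approach can be sketched in Python as follows. This is only an illustration: `mean_offset` is a hypothetical helper, and the only data point is the Rybka 2.2 figure quoted above; in practice you would fill both dictionaries with several engines that appear on both lists.

```python
from statistics import mean

def mean_offset(list_a, list_b):
    """Average rating difference (a - b) over the engines present in both lists."""
    common = sorted(set(list_a) & set(list_b))
    if not common:
        raise ValueError("no engines in common")
    return mean(list_a[name] - list_b[name] for name in common)

# Only the figure quoted in this thread; add more common engines as needed.
ccrl = {"Rybka 2.2 32-bit 1CPU": 2988}
cegt = {"Rybka 2.2 32-bit 1CPU": 2938}

offset = mean_offset(ccrl, cegt)
print(f"CCRL - CEGT offset: {offset:+.0f} Elo")
```

With a single engine the offset is just that engine's difference; averaging over more engines in the same rating range gives a less noisy estimate.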
Regards, Mike
-
Spock
Re: What rating list to trust?
If you look at the CCRL FRC list, 1,200 games have been played there. The 40/40 and 40/4 lists will no doubt catch up in due course.
-
Graham Banks
- Posts: 44873
- Joined: Sun Feb 26, 2006 10:52 am
- Location: Auckland, NZ
Re: What rating list to trust?
Mike S. wrote: CCRL have just started to test Deep Sjeng 2.7: 6 games only, yet. Of course the CEGT rating (for the 2 CPU version) is much more reliable, with more than 700 games.
At this point in time, you're absolutely correct, although as Ray also pointed out, our FRC list has Deep Sjeng 2.7 with over 1000 games played.
gbanksnz at gmail.com
-
YL84
Re: What rating list to trust?
Hi,
I suggest believing the list that gives your preferred engine the best ranking.
Because not enough games have been played to give accurate playing levels.
My 2 cents,
Yves
-
Graham Banks
- Posts: 44873
- Joined: Sun Feb 26, 2006 10:52 am
- Location: Auckland, NZ
Re: What rating list to trust?
It's best to look at all rating lists available to get an overall idea.
gbanksnz at gmail.com
-
Tony Thomas
Re: What rating list to trust?
Graham Banks wrote: It's best to look at all rating lists available to get an overall idea.
Or you can just look at CCRL. You need to be pimping your own group, Graham.
No goat molesting regards
Tony
-
hgm
- Posts: 28410
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: What rating list to trust?
The following is from the rules for the upcoming Dutch Open:
Dutch Open Programmers Prize wrote: 6) The 'base rating' of a program will be based on its rating in the lists of WBEC, RWBC or CCRL as they are published on the internet the night before the start of the tournament. The RWBC and CCRL list will first be scaled to make them directly comparable to the WBEC ratings, according to the following formulae:
1.110 * RWBC - 228
1.226 * CCRL - 638
You can see that there is not only a systematic shift between the various rating lists (which can be expected if they use different calibration standards), but that there is even quite a large difference in scale! (This should not happen if they use the same rating model for win probability vs. rating difference, but apparently it does.)
The CCRL scale is 22% compressed to the WBEC scale!
Note also that there might be differences in individual engines caused by the fact that different testers use different CPUs. Some engines are much more sensitive than others to CPU micro-architecture, e.g. to the size of the L1 cache, and they rank significantly lower on a Pentium 4 (8KB L1) compared to an AMD (64KB L1).
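For anyone who wants to play with the two scaling formulae from the Dutch Open rules, here is a small Python sketch. The coefficients are taken directly from the quoted rule; the function names and the sample ratings are illustrative assumptions only.

```python
def rwbc_to_wbec(rating):
    """Scale an RWBC rating to the WBEC scale (formula from the quoted rule)."""
    return 1.110 * rating - 228

def ccrl_to_wbec(rating):
    """Scale a CCRL rating to the WBEC scale (formula from the quoted rule)."""
    return 1.226 * rating - 638

# Illustrative inputs: two CCRL ratings 100 points apart.
low, high = 2800, 2900
gap = ccrl_to_wbec(high) - ccrl_to_wbec(low)
print(f"100 CCRL points correspond to {gap:.1f} WBEC points")

# Rating at which the CCRL and WBEC numbers coincide: 1.226*x - 638 = x.
crossover = 638 / (1.226 - 1)
print(f"The two scales agree near {crossover:.0f}")
```

The slope is what expresses the difference in scale: any gap between two engines on CCRL is stretched by a factor of 1.226 when converted, while the constant term only shifts the level.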
-
YL84
Re: What rating list to trust?
Graham Banks wrote: It's best to look at all rating lists available to get an overall idea.
It is wise.
Maybe we should have a kind of absolute reference for ranking (like the units in the sciences), and every tester could state the level of his own reference relative to this absolute reference. Not sure it's feasible though. At the least, we should always give error bars when giving rankings...
Yves