What rating list to trust?

bigo

What rating list to trust?

Post by bigo »

Which list is more accurate, or is it even possible to know? I was looking for rating information on the new Deep Sjeng 2.7, and it seems the CCRL list rates Deep Sjeng 2.7 much higher than CEGT does. I don't know which to trust. I downloaded the trial version of Deep Sjeng 2.7; it seems like a very interesting engine with a nice playing style.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: What rating list to trust?

Post by bob »

bigo wrote: Which list is more accurate, or is it even possible to know? I was looking for rating information on the new Deep Sjeng 2.7, and it seems the CCRL list rates Deep Sjeng 2.7 much higher than CEGT does. I don't know which to trust. I downloaded the trial version of Deep Sjeng 2.7; it seems like a very interesting engine with a nice playing style.
The error bars are huge. Far larger than elostat suggests. No surprise there...
Mike S.
Posts: 1480
Joined: Thu Mar 09, 2006 5:33 am

Re: What rating list to trust?

Post by Mike S. »

CCRL have just started to test Deep Sjeng 2.7: only 6 games so far :mrgreen: Of course the CEGT rating (for the 2 CPU version) is much more reliable, with more than 700 games.
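
To put rough numbers on the 6-versus-700-games point, here is a minimal sketch using the usual logistic Elo model to estimate a 95% margin for a given number of games. The 50% score and 30% draw ratio are illustrative assumptions, not values taken from either list, and the function names are made up for this example. It treats the games as independent, so the real uncertainty may well be larger, as bob argues above.

```python
import math

def elo_from_score(score):
    """Elo difference implied by a score fraction (0 < score < 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_margin(games, score=0.5, draw_ratio=0.3, z=1.96):
    """Approximate 95% margin (in Elo) on the upper side of the estimate.

    score and draw_ratio are assumed values, not measured ones.
    """
    wins = score - draw_ratio / 2.0
    losses = 1.0 - wins - draw_ratio
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    variance = (wins * (1.0 - score) ** 2
                + draw_ratio * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2)
    std_error = math.sqrt(variance / games)
    upper = min(score + z * std_error, 0.999)  # keep the logarithm finite
    return elo_from_score(upper) - elo_from_score(score)

for n in (6, 700):
    print(f"{n:4d} games: about +/- {elo_margin(n):.0f} Elo")
```

Under these assumptions the margin after 6 games is well over 200 Elo, while 700 games narrows it to roughly 20 Elo.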

Note also that each rating list has a different overall level; in other words, you cannot simply compare the Elo numbers directly. For example, the Rybka 2.2 32-bit single version has 2988 at CCRL and 2938 at CEGT, both after more than 1,000 test games. That indicates that the CCRL ratings are ~50 points higher (the numbers are simply bigger) than CEGT's. The difference is certainly not exactly the same for each engine.
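
As a worked example of that comparison, here is a small sketch that uses the Rybka 2.2 figures as a single anchor to shift a CCRL number onto the CEGT level. The helper names and the 2800 input are hypothetical, and since the offset is not the same for every engine, the result is only a rough estimate.

```python
# Ratings quoted above for Rybka 2.2 32-bit single.
ANCHOR = {"CCRL": 2988, "CEGT": 2938}

def list_offset(anchor):
    """How many points higher the CCRL number is for the anchor engine."""
    return anchor["CCRL"] - anchor["CEGT"]

def ccrl_to_cegt_estimate(ccrl_rating, anchor=ANCHOR):
    """Shift a CCRL rating by the anchor offset (rough, one-engine estimate)."""
    return ccrl_rating - list_offset(anchor)

print(list_offset(ANCHOR))          # offset derived from the anchor engine
print(ccrl_to_cegt_estimate(2800))  # 2800 is an arbitrary example input
```

Averaging the offset over several well-tested engines near the rating of interest would be less sensitive to one engine's quirks than a single anchor.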

(I usually take a look at well known older engines in the 'neighborhood' for comparisons.)
Regards, Mike
Spock

Re: What rating list to trust?

Post by Spock »

If you look at the CCRL FRC list, Deep Sjeng 2.7 already has 1,200 games played there. The other lists (40/40 and 40/4) will no doubt catch up in due course.
Graham Banks
Posts: 44873
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: What rating list to trust?

Post by Graham Banks »

Mike S. wrote: CCRL have just started to test Deep Sjeng 2.7: only 6 games so far :mrgreen: Of course the CEGT rating (for the 2 CPU version) is much more reliable, with more than 700 games.
At this point in time, you're absolutely correct, although as Ray also pointed out, our FRC list has Deep Sjeng 2.7 with over 1000 games played.
gbanksnz at gmail.com
YL84

Re: What rating list to trust?

Post by YL84 »

Hi,
I suggest believing the list that gives your preferred engine the best ranking :wink: .
Because there are not enough games played to give accurate playing levels.
My 2 cents,
Yves
Graham Banks
Posts: 44873
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: What rating list to trust?

Post by Graham Banks »

It's best to look at all rating lists available to get an overall idea.
gbanksnz at gmail.com
Tony Thomas

Re: What rating list to trust?

Post by Tony Thomas »

Graham Banks wrote: It's best to look at all rating lists available to get an overall idea.
Or you can just look at CCRL. You need to be pimping your own group, Graham.
No goat molesting regards
Tony
hgm
Posts: 28410
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: What rating list to trust?

Post by hgm »

The following is from the rules for the upcoming Dutch Open:
Dutch Open Programmers Prize wrote: 6) The 'base rating' of a program will be based on its rating in the lists of WBEC, RWBC or CCRL as they are published on the internet the night before the start of the tournament. The RWBC and CCRL list will first be scaled to make them directly comparable to the WBEC ratings, according to the following formulae:
1.110 * RWBC - 228
1.226 * CCRL - 638
You can see that there is not only a systematic shift between the various rating lists (which can be expected if they use different calibration standards), but that there is even quite a large difference in scale! (This should not happen if they use the same rating model for win probability vs. rating difference, but apparently it does.)

The CCRL scale is compressed relative to the WBEC scale: CCRL rating differences have to be stretched by about 22% to match!
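
The two formulae can be written out directly. This is a minimal sketch: the function names are made up for illustration, only the coefficients come from the quoted rules, and the 2800 and 2900 inputs are arbitrary examples. It also shows how a rating difference is stretched and where the CCRL and WBEC numbers coincide.

```python
def rwbc_to_wbec(rwbc):
    """Scale an RWBC rating to the WBEC scale (coefficients from the rules)."""
    return 1.110 * rwbc - 228

def ccrl_to_wbec(ccrl):
    """Scale a CCRL rating to the WBEC scale (coefficients from the rules)."""
    return 1.226 * ccrl - 638

# A 100-point difference on the CCRL list, expressed in WBEC points.
gap = ccrl_to_wbec(2900) - ccrl_to_wbec(2800)

# Rating at which the CCRL number and its WBEC equivalent are equal:
# 1.226 * x - 638 = x  =>  x = 638 / 0.226
crossover = 638 / (1.226 - 1)

print(f"100 CCRL points -> {gap:.1f} WBEC points")
print(f"scales agree near {crossover:.0f}")
```

A 100-point gap on CCRL becomes about 122.6 points on the WBEC scale, and the two scales agree near 2823; above that, the WBEC equivalent is higher than the CCRL number, and below it, lower.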

Note also that there might be differences for individual engines caused by the fact that different testers use different CPUs. Some engines are much more sensitive than others to CPU micro-architecture, e.g. to the size of the L1 cache, and they rank significantly lower on a Pentium 4 (8KB L1) than on an AMD (64KB L1).
YL84

Re: What rating list to trust?

Post by YL84 »

Graham Banks wrote: It's best to look at all rating lists available to get an overall idea.
It is wise.
Maybe we should have a kind of absolute reference for ranking (like the units in the sciences), and every tester could state the level of their own reference relative to this absolute reference. Not sure it's feasible though. At the least, we should always give error bars when giving rankings...
Yves