What rating list to trust?

bigo · Post by **bigo** » Thu Sep 13, 2007 3:17 am

Which list is more accurate, or is this even possible to know? I was looking for some rating information on the new Deep Sjeng 2.7, and it seems the CCRL list has Deep Sjeng 2.7 much better than CEGT does. I don't know who to trust. I downloaded the trial version of Deep Sjeng 2.7, and it seems like a very interesting engine with a nice playing style.

bob · Post by **bob** » Thu Sep 13, 2007 5:24 am

bigo wrote:Which list is more accurate, or is this even possible to know? I was looking for some rating information on the new Deep Sjeng 2.7, and it seems the CCRL list has Deep Sjeng 2.7 much better than CEGT does. I don't know who to trust. I downloaded the trial version of Deep Sjeng 2.7, and it seems like a very interesting engine with a nice playing style.

The error bars are huge, far larger than EloStat suggests. No surprise there...

Mike S. · Post by **Mike S.** » Thu Sep 13, 2007 6:08 am

CCRL have just started to test Deep Sjeng 2.7: only 6 games so far.

Of course the CEGT rating (for the 2 CPU version) is much more reliable, with more than 700 games.
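To put "6 games vs. 700 games" into numbers, here is a rough sketch of a 95% error margin for a rating based on N games. It assumes independent games, the usual logistic Elo curve, and the normal approximation for the mean score; the function `elo_error_95` and its `draw_rate` parameter are illustrative names, and this is not necessarily how any rating list or EloStat computes its error bars.

```python
import math


def elo_error_95(n_games: int, score: float = 0.5, draw_rate: float = 0.0) -> float:
    """Approximate 95% error margin (in Elo) for a rating based on n_games.

    Illustrative assumptions, not any rating list's actual method:
    - games are independent,
    - Elo difference D and expected score p satisfy p = 1 / (1 + 10**(-D/400)),
    - the mean score is approximately normally distributed.
    """
    if n_games <= 0:
        raise ValueError("n_games must be positive")
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    # Per-game variance of a result in {1, 0.5, 0}: p(1 - p) - draw_rate / 4.
    variance_per_game = score * (1.0 - score) - draw_rate / 4.0
    if variance_per_game < 0.0:
        raise ValueError("draw_rate is inconsistent with score")
    sigma_score = math.sqrt(variance_per_game / n_games)
    # Slope of the logistic Elo curve: dD/dp = 400 / (ln(10) * p * (1 - p)).
    elo_per_score = 400.0 / (math.log(10.0) * score * (1.0 - score))
    # 1.96 standard deviations covers about 95% of a normal distribution.
    return 1.96 * elo_per_score * sigma_score


for games in (6, 700):
    print(games, "games: +/-", round(elo_error_95(games)), "Elo")
```

With no draws this gives roughly ±280 Elo for 6 games and roughly ±26 Elo for 700 games; draws shrink the margin somewhat. The sketch also ignores the uncertainty in the opponents' own ratings, so real error bars should be expected to be at least this wide.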

Note also that each rating list has its own overall level, so you cannot simply compare Elo numbers across lists. For example, the Rybka 2.2 32-bit single-CPU version has 2988 at CCRL and 2938 at CEGT, both after more than 1,000 test games. That indicates that CCRL ratings run about 50 points higher than CEGT's (the numbers are simply bigger). The difference is certainly not exactly the same for each engine.

(I usually take a look at well known older engines in the 'neighborhood' for comparisons.)
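Comparing engines that appear on both lists can be sketched as estimating one offset from all common engines. This assumes the two lists differ only by a constant shift (in general the scale can differ too). The median is used so that a single engine that behaves unusually on one tester's hardware does not dominate. `list_offset` is a hypothetical helper, and only the Rybka figures below come from this post; `EngineA` and `EngineB` are made-up placeholders, not real list data.

```python
from statistics import median


def list_offset(list_a: dict, list_b: dict) -> float:
    """Median rating difference (list_a minus list_b) over engines in both lists."""
    common = list_a.keys() & list_b.keys()
    if not common:
        raise ValueError("the two lists have no engines in common")
    return median(list_a[name] - list_b[name] for name in common)


# Rybka numbers are from the post; EngineA/EngineB are hypothetical placeholders.
ccrl = {"Rybka 2.2 32-bit": 2988, "EngineA": 2700, "EngineB": 2600}
cegt = {"Rybka 2.2 32-bit": 2938, "EngineA": 2660, "EngineB": 2540}

offset = list_offset(ccrl, cegt)
print("CCRL runs about", offset, "points above CEGT")
print("CCRL 2850 is roughly CEGT", 2850 - offset)
```

With more well-known engines in the "neighborhood" of the one you care about, the estimate becomes less sensitive to any single engine.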

Spock · Post by **Spock** » Thu Sep 13, 2007 7:26 am

If you look at the CCRL FRC list, Deep Sjeng 2.7 already has 1,200 games played there. The other 40/40 and 40/4 lists will no doubt catch up in due course.

Graham Banks · Post by **Graham Banks** » Thu Sep 13, 2007 8:24 am

Mike S. wrote:CCRL have just started to test Deep Sjeng 2.7: only 6 games so far. Of course the CEGT rating (for the 2 CPU version) is much more reliable, with more than 700 games.

At this point in time, you're absolutely correct, although as Ray also pointed out, our FRC list has Deep Sjeng 2.7 with over 1000 games played.

YL84 · Post by **YL84** » Sun Sep 30, 2007 5:44 pm

Hi,
I suggest believing whichever list gives your preferred engine the best ranking.
There are not enough games played yet to give accurate playing levels.
My 2 cents,
Yves

Graham Banks · Post by **Graham Banks** » Sun Sep 30, 2007 8:14 pm

It's best to look at all rating lists available to get an overall idea.

Tony Thomas · Post by **Tony Thomas** » Mon Oct 01, 2007 5:04 am

Graham Banks wrote:It's best to look at all rating lists available to get an overall idea.

Or you can just look at CCRL. You need to be pimping your own group, Graham.
No goat molesting regards
Tony

hgm · Post by **hgm** » Mon Oct 01, 2007 11:11 am

The following is from the rules for the upcoming Dutch Open:

Dutch Open Programmers Prize wrote: 6) The 'base rating' of a program will be based on its rating in the lists of WBEC, RWBC or CCRL as they are published on the internet the night before the start of the tournament. The RWBC and CCRL lists will first be scaled to make them directly comparable to the WBEC ratings, according to the following formulae:
1.110 * RWBC - 228
1.226 * CCRL - 638

You can see that there is not only a systematic shift between the various rating lists (which can be expected if they use different calibration standards), but that there even is a quite large difference in scale! (This should not happen if they use the same rating model for win-probability vs rating difference, but apparently it does).

The CCRL scale is compressed by a factor of 1.226 (roughly 22%) relative to the WBEC scale!
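The quoted formulae are plain affine maps, so applying them is a one-liner. The sketch below (the names `TO_WBEC` and `to_wbec` are made up for illustration) also shows why the slope matters more than the shift: rating *differences* on the CCRL list get multiplied by 1.226 when moved to the WBEC scale.

```python
# Coefficients from the Dutch Open Programmers Prize rule quoted above:
# WBEC-comparable rating = slope * list rating + intercept
TO_WBEC = {
    "RWBC": (1.110, -228.0),
    "CCRL": (1.226, -638.0),
}


def to_wbec(rating: float, source: str) -> float:
    """Map an RWBC or CCRL rating onto the WBEC scale using the quoted formulae."""
    slope, intercept = TO_WBEC[source]
    return slope * rating + intercept


# A 100-point gap on the CCRL list becomes about 123 points on the WBEC scale.
gap = to_wbec(2900, "CCRL") - to_wbec(2800, "CCRL")
print(round(gap, 1))
```

The intercept cancels out when you subtract two ratings from the same list, which is why only the slope shows up in the gap.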

Note also that there might be differences for individual engines caused by the fact that different testers use different CPUs. Some engines are much more sensitive than others to CPU micro-architecture, e.g. to the size of the L1 cache, and they rank significantly lower on a Pentium 4 (8KB L1) than on an AMD (64KB L1).

YL84 · Post by **YL84** » Mon Oct 01, 2007 7:56 pm

Graham Banks wrote:It's best to look at all rating lists available to get an overall idea.

It is wise.
Maybe we should have some kind of absolute reference for rankings (like the units in science), so that every tester could state how the level of his own reference compares to this absolute reference. Not sure it's feasible, though. At the very least we should always give error bars when publishing rankings...
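One way to make this idea concrete: if a tester shares enough engines with an agreed reference list, he could publish a slope and intercept that map his ratings onto it, much like the Dutch Open formulae. Here is a minimal sketch using ordinary least squares; `fit_affine` is a hypothetical helper, the data is synthetic, and we don't know how the Dutch Open coefficients were actually derived.

```python
def fit_affine(pairs):
    """Least-squares fit of reference = slope * tester + intercept.

    `pairs` holds (tester_rating, reference_rating) for engines rated on both
    lists. Ordinary least squares treats the tester's ratings as exact.
    """
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two common engines")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("tester ratings must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic data lying exactly on the CCRL-to-WBEC line from the Dutch Open rules.
pairs = [(r, 1.226 * r - 638) for r in (2500, 2700, 2900)]
slope, intercept = fit_affine(pairs)
print(f"reference = {slope:.3f} * tester + {intercept:.1f}")
```

Since both lists' ratings carry error bars, plain least squares tends to underestimate the slope (regression dilution); with real, noisy data a fit that treats both axes symmetrically would be more defensible.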
Yves
