Question to the members of the ranking lists..

rainhaus · Post by **rainhaus** » Mon Jun 28, 2010 1:08 pm

Thread Summary

Based on the answers the table looks now as follows.

List  Since     hum  Start    Last rating     Time     Clock  CPUs    Ponder  Book     32/64
                     Elo      Rybka3/32Bit/   control  GHz                             Bit
                              1 CPU
--------------------------------------------------------------------------------------------
SSDF  1984      y    ?        /               40/120   2.4    4****   yes     ?        64
CCRL  2005      y    pool     3098            40/40*   2.4    1,2,4   no      several  par
CEGT  2006      ?    2750**   3048            40/20*   2.0    1,2,4   no      several  par
IPON  2009/Dec  y    2800***  2848            5'+3''   3.0    1,2     yes     50 pos.  64
SWCR  2009/Dec  y    2800***  2851            40/10    2.8    1       yes     Shr12    par
--------------------------------------------------------------------------------------------
hum= human oriented Elo calibration
*    several time controls available
**   reference engine = Shredder 9.1;1 CPU
***  reference engine = Shredder 12; 1 CPU
**** lists old programs and chess computers, too; 4 CPUs since 2008
64=   only 64 Bit tested, if available
par=  most engines parallel 32 and 64 Bit tested
pool= calibration by more engines

Of course, a single table can only give a rough overview. If someone wants to work on it or add sub-tables, please do so and let us know. For example, the 'Clock' column could be seen as a placeholder and be replaced by benchmarks. Some processor lists with Fritz marks are circulating on the net.
The little inquiry showed that the included ranking lists are evidently oriented toward human results. (CEGT gets a question mark, because nobody seems to remember why the start Elo was set to 2750 for Shredder 9.1.) CCRL took over the SSDF calibration, while IPON and SWCR started a new calibration based on a recommendation of two German GMs. The SSDF calibration dates from 2000 and is based on a selection of 115 Man vs Machine games. Because there was no reaction from SSDF and no comment on the site, I assume this calibration is still the current state.
Using the values of Rybka3/32Bit/1Cpu, you can find significant differences between CCRL, CEGT and IPON/SWCR. In practical terms, IPON and SWCR still give Carlsen, Anand and Kramnik a chance against the top chess programs; how many cores and which hardware may be used is another question. Following the SSDF and CCRL calibration, the human chess elite would be fighting a losing battle against the top engines. Who is right?
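To put numbers on those differences, here is a small Python sketch that uses only the Rybka 3 (32-bit, 1 CPU) ratings from the table above. The variable names are illustrative, and SSDF is left out because the table has no value for it.

```python
# Rybka 3 / 32-bit / 1 CPU ratings as listed in the table above.
rybka3_ratings = {"CCRL": 3098, "CEGT": 3048, "IPON": 2848, "SWCR": 2851}

# Gap of every list relative to the lowest-rated one.
lowest = min(rybka3_ratings.values())
gaps = {name: rating - lowest for name, rating in rybka3_ratings.items()}

# Print the lists from the largest gap to the smallest.
for name, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {rybka3_ratings[name]} (+{gap} over the lowest list)")
```

The same engine is listed 250 points apart on CCRL and IPON, so the spread mostly reflects calibration and test conditions, not playing strength.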
In the context of human and artificial intelligence, the Elo system is a popular and controversial topic that has often been discussed in the forums. Not infrequently you find the statement that the scale was made for humans and does not correctly represent the ranking of chess programs.
GM, computer chess expert and multiple World Senior Chess Champion LARRY KAUFMAN honoured this thread with a few thoughts and experiences on that issue. The IPON/SWCR calibration seems too low to him and the CCRL/CEGT calibration too high. I'll try an interpretation, as far as I have understood his explanations. Larry points to the high similarity in how the programs play chess. Because the algorithms are so similar, an improvement in one program very quickly translates into better scores against the others. Humans play more individually and less consistently. Therefore the top players score less decisively than programs, and their matches often end much closer. The consequence is an inflated and overstated rating of the superior engines if only engines play against each other. IMO this is a very complex hypothesis which should be researched scientifically. If human games and engine games cannot be treated as one statistical population, then you have to develop another, or an adapted, scale for the ranking of chess programs. I think there are a lot of statistical methods to test the relevant parameters. As long as such a study is not available, try it with Larry's 25% rule.

Thanks for the mainly constructive answers. The threatening, virulent, thread destroying, all-around lurking clone discussion plague has remained limited, thankfully. Something more about calibration in the next GGT report.
Rainer

IWB · Post by **IWB** » Mon Jun 28, 2010 2:31 pm

Hello Rainer

Rainer Marian wrote: ...
Using the values of Rybka3/32Bit/1Cpu, you can find significant differences between CCRL, CEGT and IPON/SWCR. In practical terms, IPON and SWCR still give Carlsen, Anand and Kramnik a chance against the top chess programs....

One of the key sentences on my site is:

The Elo numbers shown here do not correlate to human ratings

and even though you were asking around and comparing the lists, you still can't resist thinking that my 2800 has something to do with human ratings. I think I can delete the sentence, as I have the feeling that it is completely useless. Maybe I should calibrate the last engine to zero to be VERY, VERY FAR OFF anything which looks human.

Bye
Ingo

alpha123 · Post by **alpha123** » Mon Jun 28, 2010 8:01 pm

kranium wrote:
Graham Banks wrote:We took the SSDF ratings list from 24 Nov 2006, chose a basket of 14 engines, then calibrated our rating list to those.

Cheers,
Graham.
you 'cloned' the SSDF results to start a new site?
nice!

hmm...Nov. 2006? - about the same time Rybka was cloning Fruit!
kinda ironic don't you think?

Oh calm down Norm.

FWIW, you're about a year off.... Rybka 1.0 beta was released in Dec. 2005 IIRC....

Peter

lkaufman · Post by **lkaufman** » Mon Jun 28, 2010 9:11 pm

IWB wrote:Hello Rainer

Rainer Marian wrote:
One of the key sentences on my site is:

The Elo numbers shown here do not correlate to human ratings

Bye
Ingo
I must disagree with this. All the evidence suggests that engine vs engine ratings do correlate very well with human ratings; it's just that the correlation involves multiplying the engine ratings by some factor less than one (I suggest 0.75 based on the SSDF experience) and then adding an appropriate constant. That certainly worked very well for the SSDF ratings over more than two decades. Perhaps you are not using the term "correlate" in the mathematical sense.
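A minimal Python sketch of that conversion follows. The factor 0.75 is the value suggested above; the function names and the `offset` parameter are hypothetical, and no value for the constant is given in this thread, so the caller has to supply it.

```python
def engine_to_human_scale(engine_rating: float, offset: float, factor: float = 0.75) -> float:
    """Rescale an engine-vs-engine rating: factor * rating + offset.

    `offset` stands for the "appropriate constant"; it is not specified in
    the thread, so any value passed here is an assumption of the caller.
    """
    return factor * engine_rating + offset


def scaled_gap(rating_a: float, rating_b: float, factor: float = 0.75) -> float:
    """Rating gap after rescaling; the offset cancels out of a difference."""
    return factor * (rating_a - rating_b)


# A 400-point gap on an engine list shrinks to 300 points after rescaling.
print(scaled_gap(3200, 2800))  # 300.0
```

Because the constant cancels in any difference, the factor alone decides how much a gap between two engines shrinks; the constant only fixes where the rescaled list sits relative to human ratings.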

Dann Corbit · Post by **Dann Corbit** » Mon Jun 28, 2010 9:34 pm

lkaufman wrote:
IWB wrote:Hello Rainer

Rainer Marian wrote:
One of the key sentences on my site is:

The Elo numbers shown here do not correlate to human ratings

Bye
Ingo
I must disagree with this. All the evidence suggests that engine vs engine ratings do correlate very well with human ratings; it's just that the correlation involves multiplying the engine ratings by some factor less than one (I suggest 0.75 based on the SSDF experience) and then adding an appropriate constant. That certainly worked very well for the SSDF ratings over more than two decades. Perhaps you are not using the term "correlate" in the mathematical sense.
What he meant was that the computer ratings are not equivalent to human ratings.

It is intuitively obvious that there will be a correlation between (for instance) CEGT strength and human strength.

bob · Post by **bob** » Mon Jun 28, 2010 10:00 pm

lkaufman wrote:
IWB wrote:Hello Rainer

Rainer Marian wrote:
One of the key sentences on my site is:

The Elo numbers shown here do not correlate to human ratings

Bye
Ingo
I must disagree with this. All the evidence suggests that engine vs engine ratings do correlate very well with human ratings; it's just that the correlation involves multiplying the engine ratings by some factor less than one (I suggest 0.75 based on the SSDF experience) and then adding an appropriate constant. That certainly worked very well for the SSDF ratings over more than two decades. Perhaps you are not using the term "correlate" in the mathematical sense.
I think he meant "are not comparable". That is, 3200 on the list does not mean that the program will win 15 of every 16 games against a 2800 human.
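For reference, here is a short Python sketch of what a rating gap means under the commonly used logistic Elo expectation formula, E = 1 / (1 + 10^(-d/400)). This is the standard textbook formula, not something taken from any of the rating lists discussed here; the function name is illustrative, and the 0.75 factor in the second call is the one suggested earlier in the thread.

```python
def expected_score(rating_diff: float) -> float:
    """Expected score (between 0 and 1) for the higher-rated side under
    the logistic Elo model, given the rating difference in points."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


# Raw 400-point gap, e.g. 3200 vs 2800 taken at face value.
print(f"{expected_score(400):.3f}")         # 0.909

# The same gap after scaling by the suggested factor of 0.75.
print(f"{expected_score(0.75 * 400):.3f}")  # 0.849
```

Under this formula a 400-point gap taken at face value corresponds to an expected score of about 91%, and to about 85% once the gap is scaled by 0.75.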

IWB · Post by **IWB** » Mon Jun 28, 2010 10:09 pm

Dann Corbit wrote:
What he meant was that the computer ratings are not equivalent to human ratings.

It is intuitively obvious that there will be a correlation between (for instance) CEGT strength and human strength.

At first I simply wanted to change my sentence, but then I looked up the meaning of 'correlate'.

The German Wiktionary is more detailed than the English one:

"Korrelation: "Beziehung zwischen zwei oder mehr Ereignissen, die in der Regel eine geordnete und nahe zeitliche Abfolge besitzen
Mittellateinisch correlatio = Wechselbeziehung, zu lateinisch con- = mit- und relatio = Beziehung"

The translation is easy:
RELATION between two or more events, which usually have an ordered and close temporal sequence.
Medieval Latin correlatio = interrelation, from Latin con- = with and relatio = relation.

With my sentence I simply negate any relation - which is fine under this definition!

For now I stick with my understanding of correlation - and with my sentence, as I do not see anything wrong with it.

The Elo numbers shown here do not correlate to human ratings.

Bye
Ingo

Dann Corbit · Post by **Dann Corbit** » Mon Jun 28, 2010 10:14 pm

IWB wrote:
Dann Corbit wrote:
What he meant was that the computer ratings are not equivalent to human ratings.

It is intuitively obvious that there will be a correlation between (for instance) CEGT strength and human strength.
At first I simply wanted to change my sentence, but then I looked up the meaning of 'correlate'.

The German Wiktionary is more detailed than the English one:

"Korrelation: "Beziehung zwischen zwei oder mehr Ereignissen, die in der Regel eine geordnete und nahe zeitliche Abfolge besitzen
Mittellateinisch correlatio = Wechselbeziehung, zu lateinisch con- = mit- und relatio = Beziehung"

The translation is easy:
RELATION between two or more events, which usually have an ordered and close temporal sequence.
Medieval Latin correlatio = interrelation, from Latin con- = with and relatio = relation.

With my sentence I simply negate any relation - which is fine under this definition!

For now I stick with my understanding of correlation - and with my sentence, as I do not see anything wrong with it.

The Elo numbers shown here do not correlate to human ratings.

Bye
Ingo

I guess that the tussle over words is due to the mathematical meaning:
http://en.wikipedia.org/wiki/Correlation_and_dependence

IWB · Post by **IWB** » Mon Jun 28, 2010 10:28 pm

Dann Corbit wrote:
I guess that the tussle over words is due to the mathematical meaning:
http://en.wikipedia.org/wiki/Correlation_and_dependence

My guess is that the ancient Romans thought less mathematically than you guys - maybe I am too old-school!

Bye
Ingo

lkaufman · Post by **lkaufman** » Mon Jun 28, 2010 10:44 pm

Regardless of the wording issue, the real question is whether one can use engine-engine rating lists to predict (with a reasonable margin of error) the ratings engines would have against top humans in competition. I say "yes, if the calculation is done correctly, but not just by looking at the raw number on the list or by just adding or subtracting a constant". I have the feeling that you (Ingo) would say "no, regardless of how the calculation is done". Please correct me if I am wrong.
