Question to the members of the ranking lists..

Discussion of computer chess matches and engine tournaments.

Moderators: hgm, Harvey Williamson, bob

rainhaus
Posts: 143
Joined: Sun Feb 01, 2009 6:26 pm

Question to the members of the ranking lists..

Post by rainhaus » Thu Jun 24, 2010 3:08 pm

Full headline: Question to the members of the ranking lists: SSDF, CCRL, CEGT, IPON, SWCR

Hi,
I'm currently checking the traditional ranking lists and two newer ones for a comparative study and for comparison with the rankings of the GGT. I've read the sites and the test conditions, but I couldn't find any comments on the theoretical or practical background for the start values.

On which considerations, goals, comparisons or test results were the rankings calibrated and the start values fixed? Thank you in advance for any clarifying note.
Here is a first table for orientation.

List  Since     Start  Last rating     Time      Clock  CPUs   Ponder  Book
                Elo    Rybka 3 32-bit, moves/    GHz
                       1 CPU           min
------------------------------------------------------------------------------
SSDF  1984      ?      /               40/120    2.4    4**    yes     ?
CCRL  2005      ?      3098            40/40*    2.4    1,2,4  no      several
CEGT  2006      2761   3048            40/20*    2.0    1,2,4  no      several
IPON  2009/Dec  2800   2848            5'+3''    3.0    1,2    yes     50 pos.
SWCR  2009/Dec  2655   2851            40/10     2.8    1      yes     Shr12
------------------------------------------------------------------------------
*  several time controls available
** also lists old programs and chess computers; 4 CPUs since 2008
Notes:
- 10 years ago, SSDF made some attempts to calibrate computer tournament results to human ratings. Tony Hedlund: ..in 2000 we took 115 suitable games from Chris Carson's collection of Man vs Machine games and made a new calibration. We had to lower the list with 100 points. Question: What is the current state of the SSDF calibration?
http://ssdf.bosjo.net/list.htm
- CCRL: I couldn't find any hints about calibration.
http://computerchess.org.uk/ccrl/4040/about.html
- CEGT gives a StartElo of 2761 without reference to an engine or to a calibration method. I assume the StartElo was not drawn by lot : )
http://www.husvankempen.de/nunn/rating.htm
- SWCR and IPON are very new. Both were released at about the same time. The authors apparently attach importance to comparability. The StartElo for both lists was fixed at Shredder 12 = 2800 Elo. The SWCR value has now been changed to Sjeng WC-2008 = 2655 Elo.
http://www.inwoba.de/
http://www.amateurschach.de/

Best
Rainer
Thread view
flat view
is a bad view
without thread view

Frank Quisinsky
Posts: 4851
Joined: Wed Nov 18, 2009 6:16 pm
Location: Trier, Germany

Re: Question to the members of the ranking lists..

Post by Frank Quisinsky » Thu Jun 24, 2010 5:13 pm

Hi Rainer,

today the new w32 SWCR rating list went online.

I changed back to Shredder = 2800 (start Elo).

I have a little problem here.
The new SWCR x64 list does not have enough games so far, so the ratings are not yet clear enough.

After the first 600 games Shredder 12 x64 has very bad results. That is the reason I switched from Shredder to Sjeng (start Elo).

Unfortunately, I can't yet compare my w32 list with the x64 list and say that engine X gains Y Elo with x64. I hope the problem will be solved once I have more games for the x64 rating list.

As of today I have switched from Sjeng back to Shredder. I think the problem will be solved with more games.

Your table is interesting.

1.
The w32 / x64 information is missing.
SWCR has a separate list for each.

2.
Clock / GHz.
Clock speeds are not comparable; it would be better to add a Fritz bench or Crafty bench result. I use 4x Quad Core Q9550 systems. The CEGT / CCRL CPU clocks refer to older / slower systems.

IPON and SWCR don't have the same time control.

In SWCR a game needs around 40 minutes (games are played out to mate, without resigning). In IPON a game needs around 16 minutes (with resigning).

3.
It would be interesting to add the resign information to your table.

I play without resigning for several reasons. The most important is to find mistakes in engines. Furthermore, I compile various mate statistics.

If you compare the results, counting ponder as 30% and allowing for the hardware factor, you get roughly the following equivalent time controls:

CCRL = 40 / 18, resign = on
CEGT = 40 / 8, resign = on
IPON = 40 / 5, resign = on
SWCR = 40 / 10, resign = off

If you have any questions feel free to write me.

4.
A "games available: yes / no" column could also be interesting for your table.

Best
Frank

To your question:
At the end of the year GM Jörg Hickl and I interviewed GM Georg Meyer (2675 Elo) at my home. We talked about computer chess too. Georg Meyer is the number 2 in Germany. In his opinion, and in the opinion of other GMs Georg knows, Rybka 3 on 2 cores, 32-bit, at around 1 minute per move should play at about 2900 Elo.

I spoke with Ingo about that, and that is how we got the idea of a Shredder 12 start Elo of 2800! Most GMs use 32-bit notebooks and run engines as kibitzers in ChessBase 10 for analysis.

Georg was speaking of roughly 2900.
I like computer chess!

Graham Banks
Posts: 32913
Joined: Sun Feb 26, 2006 9:52 am
Location: Auckland, NZ

Re: Question to the members of the ranking lists..

Post by Graham Banks » Thu Jun 24, 2010 7:56 pm

We took the SSDF ratings list from 24 Nov 2006, chose a basket of 14 engines, then calibrated our rating list to those.
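
One way to picture this kind of basket calibration is as a uniform shift: every rating in the new list is moved by the same amount so that the basket engines, on average, land on their reference values. The sketch below is only an illustration under that assumption; the post does not say how the calibration was actually computed, and the function names, engine names and numbers are all hypothetical.

```python
from statistics import mean


def calibration_offset(raw, reference):
    """Average difference (reference - raw) over engines present in both lists."""
    common = sorted(set(raw) & set(reference))
    if not common:
        raise ValueError("no engines in common between the two lists")
    return mean(reference[name] - raw[name] for name in common)


def calibrate(raw, reference):
    """Shift every raw rating by the same offset (assumed uniform shift)."""
    offset = calibration_offset(raw, reference)
    return {name: rating + offset for name, rating in raw.items()}


# Hypothetical data: raw ratings on an arbitrary scale, plus a reference basket.
raw_list = {"Engine A": 100, "Engine B": 0, "Engine C": -100, "Engine D": 50}
reference_basket = {"Engine A": 2850, "Engine B": 2760, "Engine C": 2640}

calibrated = calibrate(raw_list, reference_basket)
for name, rating in sorted(calibrated.items()):
    print(name, rating)
```

A uniform shift keeps all rating differences inside the new list unchanged; only the absolute level is borrowed from the reference list.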

Cheers,
Graham.
My email addresses:
gbanksnz at gmail.com
gbanksnz at yahoo.co.nz

kranium
Posts: 1824
Joined: Thu May 29, 2008 8:43 am

Re: Question to the members of the ranking lists..

Post by kranium » Thu Jun 24, 2010 10:51 pm

Graham Banks wrote:We took the SSDF ratings list from 24 Nov 2006, chose a basket of 14 engines, then calibrated our rating list to those.

Cheers,
Graham.
you 'cloned' the SSDF results to start a new site?
nice!

hmm...Nov. 2006? - about the same time Rybka was cloning Fruit!
kinda ironic don't you think?
Last edited by kranium on Thu Jun 24, 2010 10:54 pm, edited 1 time in total.

Frank Quisinsky
Posts: 4851
Joined: Wed Nov 18, 2009 6:16 pm
Location: Trier, Germany

Re: Question to the members of the ranking lists..

Post by Frank Quisinsky » Thu Jun 24, 2010 10:53 pm

Are you having a bad day again?
Your problem!
I like computer chess!

IWB
Posts: 1539
Joined: Thu Mar 09, 2006 1:02 pm

Re: Question to the members of the ranking lists..

Post by IWB » Fri Jun 25, 2010 5:07 am

Hello Rainer
Rainer Marian wrote: - SWCR and IPON are very new. Both were released at about the same time. The authors apparently attach importance to comparability. The StartElo for both lists was fixed at Shredder 12 = 2800 Elo. The SWCR value has now been changed to Sjeng WC-2008 = 2655 Elo.
http://www.inwoba.de/
http://www.amateurschach.de/
The history behind the IPON:

I have run this list for quite a few years but went public (after removing ~150000 beta games) a little less than a year ago. A few months later Frank sent out the first results for his list. At that time we both decided to unify the rating lists, and we approached the CEGT to find a new basis for all our lists. Unfortunately we did not find enough support among the "old school ponder off" guys. ;-)

Nevertheless Frank and I chose to have an identical starting point. The 2800 for Shredder was an easy solution because we both had the 32-bit version with quite a lot of games in our lists, and we did not want to have a single engine (at that time) with 3000 Elo in our lists, as this seemed unrealistic to us. (Of course the values are not comparable to human ratings, but people do compare ...)
Later Frank chose to change the starting point because of a few missing Elo (mainly due to a lack of games), but recently he decided to go back.

I am still open to finding a unified starting point for more engine lists, but I will not start the discussion again.

Bye
Ingo

Werner
Posts: 2392
Joined: Wed Mar 08, 2006 9:09 pm

Re: Question to the members of the ranking lists..

Post by Werner » Fri Jun 25, 2010 6:59 am

Hi Rainer,
as far as I know, the lists are based on
336 Shredder 9.1 2750 +6 -6 9594.
The start elo is 2762 at the moment, so Shredder has 2750.
This is an older agreement between rating lists of former times.
Times are adjusted to AMD 4200+ with 2.2 GHz.
Werner

lkaufman
Posts: 3647
Joined: Sun Jan 10, 2010 5:15 am
Location: Maryland USA

Re: Question to the members of the ranking lists..

Post by lkaufman » Fri Jun 25, 2010 3:15 pm

IWB wrote:Hello Rainer
"Nevertheless Frank and I chose to have an identical starting point. The 2800 for Shredder was an easy solution because we both had the 32-bit version with quite a lot of games in our lists, and we did not want to have a single engine (at that time) with 3000 Elo in our lists, as this seemed unrealistic to us. (Of course the values are not comparable to human ratings, but people do compare ...)
Bye
Ingo
"

Regarding the comparison of engine ratings with human ratings, I'd like to make two points. First, 2800 for Shredder 12 is obviously too low in human terms, as it is much stronger than any of the programs that played Kasparov or Kramnik successfully (drawn or won matches). But I agree that the top ratings on the CCRL/CEGT lists seem too high in human terms, as they imply very little chance for Anand to get even a draw against Deep Rybka 4. The explanation is clear from a study of the SSDF ratings over more than two decades. SSDF had to regularly reduce their whole list to avoid having inflated ratings at the top. The reason is simply that engine vs. engine testing overstates rating gains compared to human vs. engine games, probably because the more similar two entities are, the more certain it is that superiority will decide a game. If two players have totally different evals and search, a doubling of search speed for one will have limited benefit, but if they are otherwise identical it will be decisive. Anyway, a study of the SSDF ratings indicates that rating differences from engine vs. engine play need to be contracted to roughly 3/4 of their size to be comparable to human ratings. So to estimate a human rating for an engine, first decide at what level the list being used seems right (maybe 3000 for IPON, maybe somewhere in the 2700-2800 range for CCRL and CEGT), then move the rating of a given engine by 25% towards this number. This should produce a pretty accurate estimate of the rating the engine would get in human competition.
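
The rule of thumb above (move a rating 25% of the way towards the level at which the list "seems right") can be written out as a small sketch. The function name and the 2750 anchor for CCRL/CEGT are assumptions chosen for illustration from the 2700-2800 range mentioned in the post; the list ratings are the Rybka 3 32-bit 1 CPU values from the table in the first post.

```python
def human_estimate(list_rating, anchor, contraction=0.75):
    """Contract a list rating towards the anchor level.

    Keeping 75% of the distance to the anchor is the same as moving
    the rating 25% of the way towards it.
    """
    return anchor + contraction * (list_rating - anchor)


# (list rating of Rybka 3 32-bit 1 CPU, assumed anchor level for that list)
examples = {
    "CCRL": (3098, 2750),  # anchor picked from the 2700-2800 range (assumption)
    "CEGT": (3048, 2750),  # same assumed anchor
    "IPON": (2848, 3000),  # "maybe 3000 for IPON"
}

for list_name, (rating, anchor) in examples.items():
    print(list_name, human_estimate(rating, anchor))
# CCRL 3011.0
# CEGT 2973.5
# IPON 2886.0
```

Note that the estimate depends strongly on the chosen anchor, which the post itself only gives as a rough guess per list.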

Albert Silver
Posts: 2829
Joined: Wed Mar 08, 2006 8:57 pm
Location: Rio de Janeiro, Brazil

Re: Question to the members of the ranking lists..

Post by Albert Silver » Fri Jun 25, 2010 4:13 pm

kranium wrote:
Graham Banks wrote:We took the SSDF ratings list from 24 Nov 2006, chose a basket of 14 engines, then calibrated our rating list to those.

Cheers,
Graham.
you 'cloned' the SSDF results to start a new site?
nice!

hmm...Nov. 2006? - about the same time Rybka was cloning Fruit!
kinda ironic don't you think?
It is a pity you never have anything positive or constructive to post.

As to the date, I thought your contention was that Rybka Beta had cloned Fruit. I take it that you now think the cloning only took place in later versions?
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."

Guenther
Posts: 2959
Joined: Wed Oct 01, 2008 4:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: Question to the members of the ranking lists..

Post by Guenther » Fri Jun 25, 2010 5:08 pm

kranium wrote:
Graham Banks wrote:We took the SSDF ratings list from 24 Nov 2006, chose a basket of 14 engines, then calibrated our rating list to those.

Cheers,
Graham.
you 'cloned' the SSDF results to start a new site?
nice!

hmm...Nov. 2006? - about the same time Rybka was cloning Fruit!
kinda ironic don't you think?
as precise as all your messages, a year here a year there doesn't matter
much for you...
