The grand unified rating list

pedrox
Posts: 1056
Joined: Fri Mar 10, 2006 6:07 am
Location: Basque Country (Spain)

Re: The grand unified rating list

Post by pedrox »

2412 DanaSah 2.26/16                        2406   41   41   188   48%  2412   38%
2454 DanaSah 2.26                           2394   61   61   100   64%  2286   22%

4032 DanaSah 1.3.4                          1963  271  271     5   40%  2043    0%
4121 DanaSah 1.34                           1918  130  130    31   68%  1724    6% 
You can combine these entries and name them DanaSah 2.26 and DanaSah 1.3.4, respectively.
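
Merging entries like these amounts to mapping each variant name to one canonical name before the games are rated. A minimal sketch in Python, assuming a hand-maintained alias table; the `ALIASES` dictionary and the `canonical_name` helper are hypothetical names for illustration, not part of any existing rating tool:

```python
# Hypothetical alias table: variant name -> canonical name.
# The two pairs below are the ones named in this post.
ALIASES = {
    "DanaSah 2.26/16": "DanaSah 2.26",
    "DanaSah 1.34": "DanaSah 1.3.4",
}


def canonical_name(name: str) -> str:
    """Return the canonical engine name, or the cleaned name if no alias is known."""
    cleaned = " ".join(name.split())  # collapse stray whitespace
    return ALIASES.get(cleaned, cleaned)


print(canonical_name("DanaSah 2.26/16"))
print(canonical_name("DanaSah 1.34"))
```

After renaming, the ratings would have to be recomputed from the pooled games; simply averaging the two rows' ratings is not expected to give the same result.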

It's fun: the list contains versions that I had forgotten about.

On the other hand, I find that my best version with enough games may have been a private one, DanaSah 4.07; I will have to review the source code.

Pedro
Evert
Posts: 2929
Joined: Sat Jan 22, 2011 12:42 am
Location: NL

Re: The grand unified rating list

Post by Evert »

Very useful indeed!
One correction: there is no "Jazz 4.44"; it is the same version as "Jazz 444". I never bothered to give Jazz (or Sjaak) proper version numbers; the number associated with each version is just the SVN revision number corresponding to that release. This is confusing for many, something that hadn't occurred to me.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: The grand unified rating list

Post by lkaufman »

First of all, thanks for this nice work! I think it is a better overall rating list than any single list at this time. It tells me that we need 35 Elo over Komodo 4 to be number one on your single-core list. That won't be easy, but at least it is a realistic target. Maybe MP will show a different picture.
My only complaint is that by mixing the IPON blitz ratings with the other slower rating lists, you can get distortion, and also the mere fact that IPON can play more games due to the faster time limit gives the IPON ratings too much weight. I wonder if you would consider publishing an alternate list that had a minimum time limit of 40/20' or the equivalent? Then people could decide for themselves if they wanted the larger samples that including blitz games gives, or wanted the purity of only longer time control games. Or they could consider both lists and take the average.
Mark Mason
Posts: 175
Joined: Sun Apr 02, 2006 4:52 pm

Re: The grand unified rating list

Post by Mark Mason »

Great listing, and invaluable as a single-source lookup. Thank you!
JVMerlino
Posts: 1357
Joined: Wed Mar 08, 2006 10:15 pm
Location: San Francisco, California

Re: The grand unified rating list

Post by JVMerlino »

Excellent work, Vincent!

You can combine:

Myrddin 0.84g 64-bit
Myrddin 0.85 64-bit

and call them "Myrddin 0.85 64-bit", as they are the same version.

Thanks very much!

jm
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: The grand unified rating list

Post by jdart »

I think you should just remove engines that have error bars of more than 100 Elo. There is no sense in ranking these.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: The grand unified rating list

Post by bob »

I would second Jon's idea. For example, there is a "loop" in the list with 2 games. One simple way to cull would be requiring 100 games or something similar to make sure it is not just noise...
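
The two culling rules suggested here (error bars no larger than 100 Elo, and at least 100 games) can be sketched as a simple filter. This is only an illustration: the `Entry` class, the `cull` function and both default thresholds are assumptions, and the sample rows are the DanaSah entries quoted earlier in the thread, with the columns read as rating, plus, minus and games.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    rating: int
    plus: int   # upper error bar, in Elo
    minus: int  # lower error bar, in Elo
    games: int


def cull(entries, max_error=100, min_games=100):
    """Keep entries whose error bars and game counts pass both thresholds."""
    return [
        e for e in entries
        if max(e.plus, e.minus) <= max_error and e.games >= min_games
    ]


# Rows taken from the DanaSah excerpt posted earlier in this thread.
entries = [
    Entry("DanaSah 2.26/16", 2406, 41, 41, 188),
    Entry("DanaSah 2.26", 2394, 61, 61, 100),
    Entry("DanaSah 1.3.4", 1963, 271, 271, 5),
    Entry("DanaSah 1.34", 1918, 130, 130, 31),
]

kept = cull(entries)
for e in kept:
    print(e.name, e.rating, e.games)
```

With these thresholds only the two DanaSah 2.26 rows survive; the 5-game and 31-game entries are dropped.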
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: The grand unified rating list

Post by lkaufman »

I have another proposal, to address what I consider the most significant potential bias of the list. I'm referring to the fact that the list automatically underweights longer time limit games simply because fewer can be played. There are "universal" rating lists for human games (in Germany at least) based on the notion that some fairly slow time limit is considered standard, and that all faster games can still be rated, but only with a weighting in proportion to the time limit of "standard" games. So for example if game in one hour is standard, then game in five minutes can be rated, but each such game gets only 1/12 the weight of a standard game. In this manner, a tournament gets the same weight for a given amount of total playing time, regardless of whether they play 5 slow games or 60 blitz games in the same time. This seems fair to me, and counteracts the overweighting of blitz.
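
The arithmetic of this proposal can be sketched in a few lines of Python. The function name `game_weight` and the cap at 1 for games slower than the standard are assumptions made for the illustration; the 60-minute standard and the 5-minute blitz game are the figures from the example above.

```python
from fractions import Fraction

STANDARD_MINUTES = 60  # "game in one hour" taken as the standard


def game_weight(minutes, standard=STANDARD_MINUTES):
    """Weight of one game, proportional to its time limit.

    Capping the weight at 1 for games slower than the standard is an
    assumption of this sketch; the proposal only covers faster games.
    """
    return min(Fraction(1), Fraction(minutes, standard))


# Same total playing time, same total weight:
slow_total = 5 * game_weight(60)    # 5 games at game in 60 minutes
blitz_total = 60 * game_weight(5)   # 60 games at game in 5 minutes

print(game_weight(5))           # 1/12
print(slow_total, blitz_total)  # 5 5
```

Exact fractions are used so that the two totals compare equal without floating-point rounding.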

I think you could do the same here. Of course some judgments would need to be made, such as equating 40/20' with 20' + 10" increment, for example. Also unequal hardware must be allowed for, and testing with ponder on should count something like 1.3 times as much as with ponder off, based on a 30% ponder hit rate. So the person in charge of the list must make some tough calls, but they only need be made at the beginning, unless a new list is added or a list changes its conditions. Once the weighting is decided, it is no more work than the present system.
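
One way to picture the one-time "tough calls" is a small table of testing conditions per list, from which a per-game weight is derived. Everything here is a hypothetical sketch: the `ListConditions` fields, the `hardware_factor` correction, the 60-minute standard and the sample values are assumptions; only the 1.3 ponder factor comes from the paragraph above.

```python
from dataclasses import dataclass

PONDER_FACTOR = 1.3  # suggested above, based on a 30% ponder hit rate


@dataclass(frozen=True)
class ListConditions:
    name: str
    equivalent_minutes: float     # judgment call: time control expressed as minutes per game
    ponder: bool                  # True if the list tests with ponder on
    hardware_factor: float = 1.0  # judgment call: speed relative to a reference machine


def list_weight(cond, standard_minutes=60.0):
    """Per-game weight for a list, decided once when the list is added."""
    weight = cond.equivalent_minutes / standard_minutes
    weight *= cond.hardware_factor
    if cond.ponder:
        weight *= PONDER_FACTOR
    return weight


# Made-up conditions for two hypothetical lists, for illustration only.
list_a = ListConditions("List A", equivalent_minutes=30.0, ponder=False)
list_b = ListConditions("List B", equivalent_minutes=30.0, ponder=True)

print(list_weight(list_a))
print(list_weight(list_b))
```

The weights only need to be revisited when a list is added or changes its conditions, which matches the point that the ongoing work is no greater than under the present system.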

As a matter of disclosure, I should say that since I believe that the overweighting of blitz hurts Komodo relative to Houdini and Critter, I would benefit from anything that corrects for it. If, as some claim, there is no difference in scaling among these programs, then there should be no objection from them. But regardless, I think it is the right way to do it.
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: The grand unified rating list

Post by IWB »

Hello Vincent,

Just my 2 cents about this attempt.

A lot of work, and no doubt done for good reasons, but I started my list years ago and made it public later exactly because I was unhappy with the established lists. Now you throw everything in a pot and stir it. I have some doubt that the outcome is any good.
Even though I know that I can't stop you from doing it, I would rather not have the IPON games in that list; it simply makes no sense to mix ALL those different conditions.

Thx
Ingo
Graham Banks
Posts: 41432
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: The grand unified rating list

Post by Graham Banks »

IWB wrote:
Hello Vincent,

Just my 2 cents about this attempt.

A lot of work, and no doubt done for good reasons, but I started my list years ago and made it public later exactly because I was unhappy with the established lists. Now you throw everything in a pot and stir it. I have some doubt that the outcome is any good.
Even though I know that I can't stop you from doing it, I would rather not have the IPON games in that list; it simply makes no sense to mix ALL those different conditions.

Thx
Ingo

No harm in what he's doing, Ingo, and a lot of people seem to appreciate it.
IPON is still IPON regardless, so nothing to worry about here really.
We wouldn't like it if certain engine authors objected to particular rating lists including their engine. Just an analogy.
gbanksnz at gmail.com