What rating list to trust?

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

Dann Corbit
Posts: 12803
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: What rating list to trust?

Post by Dann Corbit »

Trust all of them.

Different results are not mutually exclusive and should not surprise us.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: What rating list to trust?

Post by bob »

Spock wrote:If you look at the CCRL FRC list, there are 1,200 games played. The other 40/40 and 40/4 lists will no doubt catch up in due course
For programs within 100 points of each other, 1000 games is not nearly enough.
Graham Banks
Posts: 44868
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: What rating list to trust?

Post by Graham Banks »

bob wrote:
Spock wrote:If you look at the CCRL FRC list, there are 1,200 games played. The other 40/40 and 40/4 lists will no doubt catch up in due course
For programs within 100 points of each other, 1000 games is not nearly enough.
Using the BayesElo system you get roughly the following:

When an engine has played 200 games, the error margin is still approximately ±40 Elo; after 500 games it is ±25 Elo, after 1000 games ±17 Elo, and even after 2000 games there is still a ±13 Elo error margin!
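The pattern of figures quoted above can be roughly reproduced from simple binomial statistics. The sketch below is an assumption-laden approximation, not BayesElo's actual computation: `elo_error_margin` is a hypothetical helper, the 40 % draw rate is an assumed typical engine-vs-engine figure, and a 50 % score is assumed so the logistic Elo curve can be linearized around it.

```python
import math

def elo_error_margin(games, draw_rate=0.4, score=0.5, z=1.96):
    """Approximate 95% error margin (in Elo) for a measured score.

    Assumes all games are independent. draw_rate=0.4 and score=0.5
    are illustrative assumptions, not figures from the thread.
    """
    wins = score - draw_rate / 2            # implied win fraction
    # Variance of a single game's score (win=1, draw=0.5, loss=0)
    var = wins + draw_rate / 4 - score ** 2
    se_score = math.sqrt(var / games)       # standard error of the mean score
    # Convert score error to Elo via the logistic curve's slope at `score`
    elo_per_score = 400 / (math.log(10) * score * (1 - score))
    return z * elo_per_score * se_score

for n in (200, 500, 1000, 2000):
    print(n, round(elo_error_margin(n), 1))
```

With these assumptions the margins come out near ±37, ±24, ±17 and ±12 Elo, close to the quoted ±40/±25/±17/±13 pattern; the exact values shift with the assumed draw rate.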
gbanksnz at gmail.com
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: What rating list to trust?

Post by bob »

Graham Banks wrote:
bob wrote:
Spock wrote:If you look at the CCRL FRC list, there are 1,200 games played. The other 40/40 and 40/4 lists will no doubt catch up in due course
For programs within 100 points of each other, 1000 games is not nearly enough.
Using the BayesElo system you get roughly the following:

When an engine has played 200 games, the error margin is still approximately ±40 Elo; after 500 games it is ±25 Elo, after 1000 games ±17 Elo, and even after 2000 games there is still a ±13 Elo error margin!
Right. However, real testing shows that the variance is more than that for 1000 games... When you factor in opening books, along with the inherent randomness caused by inaccurate timing provided by the PC real-time clock, the error bar is _far_ wider than what any of the *elo programs would have you believe.
Graham Banks
Posts: 44868
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: What rating list to trust?

Post by Graham Banks »

bob wrote:
Graham Banks wrote:
bob wrote:
Spock wrote:If you look at the CCRL FRC list, there are 1,200 games played. The other 40/40 and 40/4 lists will no doubt catch up in due course
For programs within 100 points of each other, 1000 games is not nearly enough.
Using the BayesElo system you get roughly the following:

When an engine has played 200 games, the error margin is still approximately ±40 Elo; after 500 games it is ±25 Elo, after 1000 games ±17 Elo, and even after 2000 games there is still a ±13 Elo error margin!
Right. However, real testing shows that the variance is more than that for 1000 games... When you factor in opening books, along with the inherent randomness caused by inaccurate timing provided by the PC real-time clock, the error bar is _far_ wider than what any of the *elo programs would have you believe.
Hi Bob,

this is why it's best to look at all the available rating lists. That way you can draw a fairly accurate picture of where a given engine stands.

Regards, Graham.
gbanksnz at gmail.com
hgm
Posts: 28409
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: What rating list to trust?

Post by hgm »

bob wrote:Right. However, real testing shows that the variance is more than that for 1000 games... When you factor in opening books, along with the inherent randomness caused by inaccurate timing provided by the PC real-time clock, the error bar is _far_ wider than what any of the *elo programs would have you believe.
Please be informed that this is complete bullshit. :P Every Elo program in existence assumes that all the games you feed it are totally independent, uncorrelated random events. And most testers in fact go to great lengths to make sure that they are, using external books to prevent duplicate games, etc.

So the error bars represent the statistical errors in an accurate and completely correct way.

Note, however, that the prior assumption made in BayesElo is not quite satisfied in a wide rating list. This leads to a systematic compression of the scale. This is not related to the number of games per engine, though.
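The claim that the error bars are statistically correct for independent games is easy to check by simulation. The sketch below is an illustrative Monte Carlo experiment (not from the thread): it assumes two equal engines with a 0.3/0.4/0.3 win/draw/loss split, measures the Elo estimate from many independent 1000-game matches, and compares the spread of those estimates with the textbook 1-sigma prediction.

```python
import math
import random
import statistics

random.seed(1)

def simulate_elo_estimate(games=1000, win=0.3, draw=0.4):
    """Play `games` independent games between two equal engines
    (assumed win/draw/loss = 0.3/0.4/0.3) and return the measured Elo."""
    score = 0.0
    for _ in range(games):
        r = random.random()
        if r < win:
            score += 1.0          # win
        elif r < win + draw:
            score += 0.5          # draw
    p = score / games
    return 400 * math.log10(p / (1 - p))  # logistic Elo from the score

estimates = [simulate_elo_estimate() for _ in range(2000)]
spread = statistics.stdev(estimates)
# Textbook 1-sigma prediction for these assumptions:
# ~695 Elo per unit of score at 50%, per-game score variance 0.15
theory = 695 * math.sqrt(0.15 / 1000)
print(round(spread, 1), round(theory, 1))
```

Both numbers land around 8.5 Elo (1 sigma, i.e. roughly a ±17 Elo 95% bar for 1000 games), which is the sense in which the quoted error bars are accurate for truly independent games.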
Uri Blass
Posts: 10973
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: What rating list to trust?

Post by Uri Blass »

bob wrote:
Graham Banks wrote:
bob wrote:
Spock wrote:If you look at the CCRL FRC list, there are 1,200 games played. The other 40/40 and 40/4 lists will no doubt catch up in due course
For programs within 100 points of each other, 1000 games is not nearly enough.
Using the BayesElo system you get roughly the following:

When an engine has played 200 games, the error margin is still approximately ±40 Elo; after 500 games it is ±25 Elo, after 1000 games ±17 Elo, and even after 2000 games there is still a ±13 Elo error margin!
Right. However, real testing shows that the variance is more than that for 1000 games... When you factor in opening books, along with the inherent randomness caused by inaccurate timing provided by the PC real-time clock, the error bar is _far_ wider than what any of the *elo programs would have you believe.
Real testing shows that the variance is clearly smaller than you think.
I found this simply by comparing different rating lists, because I have no time to run tests with thousands of games.

You can even use one rating list at 120/40 to predict the rating at 20/40 with an error that is always smaller than 35 Elo.


http://www.husvankempen.de/nunn/40_120_ ... liste.html

You have 29 programs in this list

You also have the same programs in the 20/40 rating list

http://www.husvankempen.de/nunn/40_40%2 ... liste.html

The biggest difference between the ratings on these two lists is 31 Elo.

Deep Junior 10 2CPU at 20/40: rating 2803 (+16/−16), 1177 games, 44.9 % score, 2839 average opponent, 34.8 % draws
Deep Junior 10 2CPU at 120/40: rating 2834 (+17/−17), 1050 games, 47.3 % score, 2852 average opponent, 34.1 % draws

Note that I was surprised by this small difference. It suggests that testing at a long time control is almost useless if the target is to get a rating (of course it is not useless if the target is to get better games), because in 29 out of 29 cases you can get the rating with an error smaller than 32 Elo by playing games only at 20/40.

This is surprising because common sense tells me to expect different ratings at different time controls, and it seems that this factor, together with luck when some of the programs have fewer than 1000 games, is not enough to produce even a 32 Elo difference.
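The observation above can be checked against the lists' own error bars: if the two lists independently measure the same underlying strength, the difference of two ratings with 95% margins e1 and e2 has a combined 95% margin of sqrt(e1² + e2²). The sketch below is an illustrative calculation (`combined_margin` is a hypothetical helper, and the ±16/±17 inputs are the figures quoted for Deep Junior 10 above).

```python
import math

def combined_margin(e1, e2):
    """95% margin for the *difference* of two independent rating
    estimates whose individual 95% margins are e1 and e2."""
    return math.sqrt(e1 ** 2 + e2 ** 2)

# Error bars of roughly ±16 and ±17 Elo, as quoted for
# Deep Junior 10 2CPU on the two lists:
print(round(combined_margin(16, 17), 1))
```

This comes out near ±23 Elo, so a worst-case gap of 31 Elo across 29 programs is broadly what statistical noise alone would produce, with little room left for a genuine time-control effect.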

Uri