Trust all of them.
Different results are not mutually exclusive and should not surprise us.
What rating list to trust?
-
Dann Corbit
- Posts: 12803
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
-
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: What rating list to trust?
Spock wrote: If you look at the CCRL FRC list, there are 1,200 games played. The other 40/40 and 40/4 lists will no doubt catch up in due course
For programs within 100 points of each other, 1000 games is not nearly enough.
-
Graham Banks
- Posts: 44868
- Joined: Sun Feb 26, 2006 10:52 am
- Location: Auckland, NZ
Re: What rating list to trust?
bob wrote: For programs within 100 points of each other, 1000 games is not nearly enough.
Using Bayes ELO system you get roughly the following:
When an engine has 200 games played, the error margin is still approximately +-40 ELO, after 500 games +-25 ELO, after 1000 games +-17 ELO and even after 2000 games there is a +-13 ELO error margin!
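For readers who want to see where margins of this size come from, here is a minimal sketch. It is not BayesElo's actual method: it assumes a 50% score, a 35% draw ratio, a normal approximation, and opponents whose ratings are known exactly. The function name and its parameters are made up for illustration.

```python
import math

def elo_error_margin(games, draw_ratio=0.35, score=0.5, z=1.96):
    """Approximate 95% Elo error margin after `games` games (illustrative only)."""
    # Fraction of wins implied by the overall score and the draw ratio.
    win_ratio = score - draw_ratio / 2
    # Per-game variance of the result, counting win=1, draw=0.5, loss=0.
    variance = win_ratio * 1.0 + draw_ratio * 0.25 - score ** 2
    std_err = math.sqrt(variance / games)
    # Slope of the logistic Elo curve at this score: Elo points per unit of score.
    slope = 400 / (math.log(10) * score * (1 - score))
    return z * std_err * slope

for n in (200, 500, 1000, 2000):
    print(n, round(elo_error_margin(n), 1))
```

Under these assumptions the margins come out close to the figures quoted above, and they shrink only with the square root of the number of games, so halving the margin takes four times as many games.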
gbanksnz at gmail.com
-
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: What rating list to trust?
Graham Banks wrote: Using Bayes ELO system you get roughly the following: When an engine has 200 games played, the error margin is still approximately +-40 ELO, after 500 games +-25 ELO, after 1000 games +-17 ELO and even after 2000 games there is a +-13 ELO error margin!
Right. However, real testing shows that the variance is more than that for 1000 games... When you factor in opening books, along with the inherent randomness caused by inaccurate timing provided by the PC real-time clock, the error bar is _far_ wider than what any of the *elo programs would have you believe.
-
Graham Banks
- Posts: 44868
- Joined: Sun Feb 26, 2006 10:52 am
- Location: Auckland, NZ
Re: What rating list to trust?
bob wrote: Right. However, real testing shows that the variance is more than that for 1000 games... When you factor in opening books, along with the inherent randomness caused by inaccurate timing provided by the PC real-time clock, the error bar is _far_ wider than what any of the *elo programs would have you believe.
Hi Bob,
this is why it's best to look at all rating lists available. This way you can draw a pretty accurate picture of where a given engine is at.
Regards, Graham.
gbanksnz at gmail.com
-
hgm
- Posts: 28409
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: What rating list to trust?
bob wrote: Right. However, real testing shows that the variance is more than that for 1000 games... When you factor in opening books, along with the inherent randomness caused by inaccurate timing provided by the PC real-time clock, the error bar is _far_ wider than what any of the *elo programs would have you believe.
Please be informed that this is complete bullshit.
So the error bars represent the statistical errors in an accurate and completely correct way.
Note, however, that the prior assumption made in BayesElo is not quite satisfied in a wide rating list. This leads to a systematic compression of the scale. This is not related to the number of games per engine, though.
-
Uri Blass
- Posts: 10973
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: What rating list to trust?
bob wrote: Right. However, real testing shows that the variance is more than that for 1000 games... When you factor in opening books, along with the inherent randomness caused by inaccurate timing provided by the PC real-time clock, the error bar is _far_ wider than what any of the *elo programs would have you believe.
Real testing shows that the variance is clearly smaller than what you think.
I concluded this simply by comparing different rating lists, because I have no time to run tests with thousands of games.
You can even use a rating list at 120/40 to predict ratings at 20/40 with an error that is always smaller than 35 Elo:
http://www.husvankempen.de/nunn/40_120_ ... liste.html
You have 29 programs in this list
You also have the same programs in the 20/40 rating list
http://www.husvankempen.de/nunn/40_40%2 ... liste.html
The biggest difference between the ratings in these two lists is 31 Elo:
Deep Junior 10 2CPU 2803 16 16 1177 44.9 % 2839 34.8 % at 20/40
Deep Junior 10 2CPU 2834 17 17 1050 47.3 % 2852 34.1 % at 120/40
Note that I was surprised by this small difference. It suggests that testing at long time controls is almost useless if the goal is to get ratings (of course it is not useless if the goal is to get better games), because in 29 out of 29 cases you can get the rating with an error smaller than 32 Elo by playing games only at 20/40.
Note that this is surprising because common sense tells me that we should expect different ratings at different time controls, yet this factor, together with luck (some of the programs have fewer than 1000 games), is not enough to produce even a 32 Elo difference.
Uri
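The list comparison described in the post above can be sketched as follows. This is only an illustration: the function name is hypothetical, and the sample data contains just the single Deep Junior 10 2CPU row quoted in the post, not the full 29-program lists.

```python
def largest_rating_gap(list_a, list_b):
    """Return (engine, gap) for the engine whose rating differs most between two lists."""
    # Only engines that appear in both lists can be compared.
    common = list_a.keys() & list_b.keys()
    if not common:
        raise ValueError("the two lists share no engines")
    gaps = {name: list_b[name] - list_a[name] for name in common}
    worst = max(gaps, key=lambda name: abs(gaps[name]))
    return worst, gaps[worst]

# Ratings copied from the two rows quoted in the post.
ratings_20_40 = {"Deep Junior 10 2CPU": 2803}
ratings_120_40 = {"Deep Junior 10 2CPU": 2834}

engine, gap = largest_rating_gap(ratings_20_40, ratings_120_40)
print(engine, gap)
```

Filling both dictionaries with all the engines common to the two lists would reproduce the "biggest difference" check for the whole set.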