Question how to calculate a better rating list

hirdelgird
Posts: 32
Joined: Sun Sep 09, 2007 12:43 am

Post by hirdelgird »

Hello,

Does anyone have experience with whether the gauging engine also needs to play more games to get a better rating list?

I think it would be better. My rating list is gauged with Fruit 2.10 at a start Elo of 2700, but because of newer versions it is no longer playing.

Thanks for your answers.
Marc MP

Re: Question how to calculate a better rating list

Post by Marc MP »

Gerhard.Schwager wrote:
Hello,

Does anyone have experience with whether the gauging engine also needs to play more games to get a better rating list?

I think it would be better. My rating list is gauged with Fruit 2.10 at a start Elo of 2700, but because of newer versions it is no longer playing.

Thanks for your answers.

I don't think it matters. Elo ratings are relative: only the Elo difference between engines counts, not the absolute value. If you set Fruit 2.1 as your gauging engine at 2750 Elo instead of 2700, every engine on your list will gain 50 Elo. The rating you assign to Fruit 2.1 does not affect the quality of the estimates produced by Bayeselo, only the level of the ratings.
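To illustrate the point about relative ratings, here is a minimal sketch. It assumes the standard logistic Elo formula (Bayeselo's own model is more detailed), and the engine names other than Fruit 2.1 and all rating values are made up for the example. Shifting every rating by the same 50 points leaves every expected score unchanged.

```python
def expected_score(r_a, r_b):
    """Expected score of A against B under the standard logistic Elo curve."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

# Hypothetical rating list, gauged with Fruit 2.1 at 2700.
ratings = {"Fruit 2.1": 2700.0, "Engine X": 2650.0, "Engine Y": 2580.0}

# The same list gauged with Fruit 2.1 at 2750 instead: everything moves by +50.
shifted = {name: r + 50.0 for name, r in ratings.items()}

for a in ratings:
    for b in ratings:
        if a < b:
            before = expected_score(ratings[a], ratings[b])
            after = expected_score(shifted[a], shifted[b])
            print(f"{a} vs {b}: {before:.4f} before, {after:.4f} after")
```

Both columns print the same numbers, because only the differences enter the formula.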

Of course, more games always lead to better estimates. As far as I can remember with Bayeselo, for a fixed number of games you can improve the estimates by doing two things:

1- Engines of similar strength should play each other: you learn more about engine X if it scores 5/10 against a 2600 Elo engine than if it scores 5/5 against a 1500 Elo engine and 0/5 against a 3000 Elo engine.

2- You should try to have each engine play roughly the same number of games. For example, a 10-round round-robin tournament between engines A, B and C is better than having A and B play each other 28 times and each play C only once (30 games in total in both cases). The former option leads to somewhat lower overall uncertainty about the rating estimates (but once again, there would be no gain in having the "gauging engine" play more games than the other two).
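Both points can be checked with a rough back-of-the-envelope calculation. The sketch below is not what Bayeselo does internally; it assumes the plain logistic Elo model without draws, a large-sample (Fisher information) approximation, three equally strong engines, and engine A as the fixed gauging engine. The function names are made up for the example.

```python
import math

K = math.log(10) / 400.0  # slope constant of the logistic Elo curve

def expected_score(diff):
    """Expected score for a rating advantage of `diff` Elo."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def game_information(diff):
    """Fisher information of one game about the rating difference (no draws)."""
    p = expected_score(diff)
    return K * K * p * (1.0 - p)

def rating_sds(n_ab, n_ac, n_bc, r_a=0.0, r_b=0.0, r_c=0.0):
    """Approximate standard errors of B and C, with A fixed as the gauge."""
    w_ab = n_ab * game_information(r_a - r_b)
    w_ac = n_ac * game_information(r_a - r_c)
    w_bc = n_bc * game_information(r_b - r_c)
    # 2x2 information matrix for (B, C), then invert it by hand.
    i_bb = w_ab + w_bc
    i_cc = w_ac + w_bc
    i_bc = -w_bc
    det = i_bb * i_cc - i_bc * i_bc
    return math.sqrt(i_cc / det), math.sqrt(i_bb / det)

# Point 1: an evenly matched game is more informative than a mismatch.
print("information at 0 Elo gap:   ", game_information(0.0))
print("information at 1100 Elo gap:", game_information(1100.0))

# Point 2: 30 games as a 10-round round robin vs. 28 A-B games plus 1 + 1 against C.
round_robin = rating_sds(10, 10, 10)
lopsided = rating_sds(28, 1, 1)
print("round robin SD of (B, C):", round_robin)
print("lopsided SD of (B, C):   ", lopsided)
```

In the lopsided schedule B ends up slightly better known, but C is known so poorly that the total uncertainty is clearly worse than in the round robin.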

Hope that helps,