> Sven Schüle wrote:
> What I learn from your statement is that rating only the games from a gauntlet of an engine A against some opponents may give results that are less reliable than rating these games together with games between the opponents.

> Rémi Coulom wrote:
> No, this is not what I meant. With the same amount of computing power, I believe that the opposite may be true, but I am not sure.

> Sven Schüle wrote:
> Now I'm lost again ... What do you mean by "same amount" here? When adding inter-opponent games, I would expect that each of these matches has the same number of games as each of the matches "A vs. opponent". So there is additional computing power necessary (e.g. for 5 opponents 10*80 games in addition to the other 5*80), but only once in the beginning, since these inter-opponent games can always be reused when another new version of A comes to testing.

Yes, inter-opponent games can be re-used, and improve the reliability of ratings. But games between the opponents and previous versions of your program can be re-used too. And I see no reason why they would improve the accuracy of opponent ratings less than inter-opponent games.
This means that the idea that merging the two sets of games into one single PGN file is better than two separate runs of bayeselo generalizes to more than two sets of games.
Rémi
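
For illustration, here is a minimal Python sketch of the two ideas discussed above: the game-count arithmetic from the quoted example (5 opponents, 80 games per match) and the merging of several sets of games into one PGN file before rating them together. The function names and the file names in the usage note are hypothetical, not part of bayeselo or of any tool mentioned in the thread. The sketch assumes that plain concatenation with a blank line between games is enough, and that the files use the Latin-1 encoding named in the PGN standard.

```python
from itertools import combinations
from pathlib import Path


def game_counts(n_opponents, games_per_match):
    """Count the games in a gauntlet and in a round-robin among the opponents."""
    # Every unordered pair of opponents plays one match.
    pairings = len(list(combinations(range(n_opponents), 2)))
    return {
        "gauntlet": n_opponents * games_per_match,
        "inter_opponent": pairings * games_per_match,
    }


def merge_pgn(sources, destination, encoding="latin-1"):
    """Concatenate several PGN files into one, separating them by a blank line.

    Returns the number of non-empty source files that were written.
    """
    written = 0
    with open(destination, "w", encoding=encoding) as out:
        for src in sources:
            text = Path(src).read_text(encoding=encoding).strip()
            if text:
                out.write(text + "\n\n")
                written += 1
    return written
```

With the numbers from the quoted post, `game_counts(5, 80)` gives 400 gauntlet games and 800 inter-opponent games. A call such as `merge_pgn(["new_version.pgn", "old_versions.pgn", "inter_opponent.pgn"], "all_games.pgn")` (hypothetical file names) would produce the single file that is then rated in one run.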