jhaglund wrote: The only thing I worry about is that I would like about 50K games to reach that +/-4 error bar pretty solidly. 40K is not quite there; it is usually +/-4 to 5. I am looking for new opponents, but I would also like to drop at least one from the current group, as it is now almost too low in rating to matter.
This is a good time for me to revamp opponents and positions, since we are starting a new version and changing all the Elo numbers won't hurt.
A test run is all...
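As a rough sanity check on those game counts, here is a back-of-envelope sketch of how a 95% Elo error bar scales with the number of games. The assumptions are mine, not from the post: roughly equal opponents, an assumed draw rate, independent games, and a normal approximation.

```python
import math

def elo_error_bar(n_games, draw_rate=0.35, z=1.96):
    """Rough 95% error bar (in Elo) for a match of n_games
    between roughly equal opponents.

    Assumptions (not from the post): per-game score variance is
    (1 - draw_rate) / 4, and Elo is roughly linear near a 50%
    score with slope 4 * 400 / ln(10) ~ 695 Elo per unit of
    score fraction.
    """
    per_game_sd = math.sqrt((1.0 - draw_rate) / 4.0)
    se_score = per_game_sd / math.sqrt(n_games)
    elo_per_score = 4.0 * 400.0 / math.log(10)  # dElo/dp at p = 0.5
    return z * se_score * elo_per_score

for n in (10_000, 40_000, 50_000):
    print(n, round(elo_error_bar(n), 2))
```

This simple i.i.d. model tends to come out tighter than what BayesElo actually reports, since a real opponent pool adds its own rating uncertainty, but the 1/sqrt(n) scaling is the point: going from 40K to 50K games only shrinks the bar by about 11%.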
I suggest playing several different time controls that add up to 50k games. I think the results would still be valid...
3000 games... white & black.
10k @ 10 sec + 1
10k @ 30 sec + 1
10k @ 1 min + 1
10k @ 3 min + 1
10k @ 5 min + 1
I don't know how many opponents per set.
Joshua
I don't think that will work. For example, when I try 10s + 0.1 or 20s + 0.2, I see almost no difference in the final results. That means I am essentially playing each position 4 times: twice as black, twice as white, against the same opponent. This was already known to be bad from the very early test results I saw on our cluster. We were playing from 40 positions, and could play a match 16 times without a single duplicate game. But the results were still wrong, because you have more games with duplicated outcomes, and BayesElo does not understand that.
Imagine what happens if you play just 80 games and then copy them multiple times into one big PGN file. Now you have (say) 80,000 games (if you duplicate the file 1,000 times), and BayesElo will report a wonderfully low error bar. It will be completely wrong, because you didn't actually run 80,000 independent trials. The 1,000 copies of each game are perfectly correlated, and BayesElo has no way to know that.