Modern Times wrote: So on 40/20, you have Komodo 9.3 x64 4CPU 19 Elo weaker than Komodo 9.2 x64 4CPU?
Look at the error bars...
Can't conclude anything about a 19 Elo difference with a +/- 16 Elo error bar...
Very true. That was the trouble with testing 9.3 - with a claimed +15 Elo, the testing groups would never be able to verify it with the number of games they usually play.
With more opponents and the same number of games, you can shrink the error bar enormously.
None of the calculation programs we have takes the number of opponents into account, and this is completely wrong.
That means ...
2,000 games vs. 40 opponents is more exact than
2,000 games vs. 10 opponents.
But the error bar comes out the same in every calculation program, no matter whether there are 1, 10, 50, or 100 opponents.
That can't be right!
Best
Frank
Unfortunately, statistics don't buy into that. # of games is all that matters, based purely on sampling theory. There is no way to shortchange the number of games without a corresponding loss of accuracy / increase in the error margin.
If all of the rating lists show similar results for an engine, that is probably the best overall guide.
Not sure what you mean. He mentioned fewer games against more opponents gave a more accurate rating. That's not how Elo and sampling theory work. To get a specific error bar, you have to play the right number of games. There is no way to replace N thousand games with N hundreds of games and get the same accuracy.
Everybody wants to cheat the statistical gods that control the error bar. Won't ever happen, however. In his example, 2,000 games gives a specific error bar, and the number of opponents doesn't have any effect on that. I.e., 10 games vs. one opponent or 1 game vs. each of 10 opponents, you get the same error bar.
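To put rough numbers on this, here is a minimal sketch of how the error bar depends on the game count. It is a textbook approximation, not what any particular rating tool computes: it assumes evenly matched engines, a 50% draw rate, independent games, and a 95% interval, and the names elo_margin and games_needed are made up for the illustration. The number of opponents does not appear anywhere in this simple model, only the number of games.

```python
import math

# Slope of the Elo curve at a 50% score: Elo points per unit of score fraction.
ELO_PER_SCORE_AT_50 = 1600 / math.log(10)


def elo_margin(games, draw_rate=0.5, z=1.96):
    """Approximate error margin in Elo after `games` games.

    Assumes equal-strength engines (wins and losses equally likely) and
    independent games; draw_rate and z are illustrative defaults.
    """
    # Per-game score is 1, 0.5 or 0; its standard deviation around 0.5.
    sd_per_game = 0.5 * math.sqrt(1.0 - draw_rate)
    std_error = sd_per_game / math.sqrt(games)
    return z * std_error * ELO_PER_SCORE_AT_50


def games_needed(margin_elo, draw_rate=0.5, z=1.96):
    """Games required to reach a given error margin under the same assumptions."""
    sd_per_game = 0.5 * math.sqrt(1.0 - draw_rate)
    return math.ceil((z * sd_per_game * ELO_PER_SCORE_AT_50 / margin_elo) ** 2)


if __name__ == "__main__":
    for n in (200, 2000, 20000):
        print(f"{n:>6} games: +/- {elo_margin(n):.1f} Elo")
    print("games for +/- 5 Elo:", games_needed(5.0))
```

The margin shrinks with the square root of the game count, so halving the error bar takes four times as many games, and cutting the games by a factor of ten widens it by roughly a factor of three.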