Edmund wrote:
Eelco de Groot wrote:
If it is really stronger, run more games and add them to the total PGN for Bayeselo, and the uncertainty range should come down. (...)
You would have to re-run the whole tournament as you may not alter the sample size depending on the current state.
E.g., let's say 100 games were run between two engines of equal strength.
In the end one engine is ahead by 11 wins (this would happen in <10% of all cases).
So this would suggest a LOS of 90%.
Let's now say you are not happy with the result (hoping for 95%) and want to run another 100 games.
For a LOS of 95% after 200 games you need to be 20 wins ahead.
So you only need another 9 wins in the next 100 games to reach your objective. The likelihood of this is 13.81%.
Let's now say you happen to test 10 changes (all of them of equal strength); 9 of them have LOS < 90% and 1 of them has LOS > 90%.
If you now use this condition, keep only the one > 90% and run another test on it, you only need to reach a LOS of 86.19% (100% - 13.81%) in that second test, when in fact you really wanted to achieve 95% (which is also what BayesElo would display).
regards,
Edmund
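The arithmetic in the quoted example can be roughly re-checked with a normal approximation. This is only a sketch: the draw rate of 26% is my own assumption (picked because it approximately reproduces the 11-win and 20-win thresholds quoted above), and the helper names lead_needed and chance_of_gaining are made up for the illustration. Because of those assumptions the result does not have to match the 13.81% exactly.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)

# Assumption: 26% of games are drawn. Not stated in the thread.
DRAW_RATE = 0.26


def lead_needed(los_target, games, draw_rate):
    """Net lead (wins minus losses) that corresponds to the target LOS.

    For equal engines the net lead after `games` games has mean 0 and
    variance games * (1 - draw_rate), since each decisive game adds +1 or -1.
    """
    return STD_NORMAL.inv_cdf(los_target) * sqrt(games * (1.0 - draw_rate))


def chance_of_gaining(net_wins, games, draw_rate):
    """Probability that equal engines gain at least `net_wins` in `games` games."""
    return 1.0 - STD_NORMAL.cdf(net_wins / sqrt(games * (1.0 - draw_rate)))


lead_100 = lead_needed(0.90, 100, DRAW_RATE)  # lead for LOS 90% after 100 games
lead_200 = lead_needed(0.95, 200, DRAW_RATE)  # lead for LOS 95% after 200 games

# Chance that pure luck supplies the missing wins in the second 100 games.
p_extend = chance_of_gaining(lead_200 - lead_100, 100, DRAW_RATE)

print(f"lead for LOS 90% after 100 games: {lead_100:.1f}")
print(f"lead for LOS 95% after 200 games: {lead_200:.1f}")
print(f"chance of closing the gap by luck: {p_extend:.1%}")
```

Under these assumptions the chance of closing the gap by luck comes out at roughly 15%, the same region as the figure in the quote and well above the 5% that a displayed LOS of 95% would suggest.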
Sure, the confidence intervals cannot be strictly valid anymore. I believe there was a thread about this very subject recently, though I did not read all of it. As I understand it, the results of the added games are no longer truly independent if the decision to play them depends on previous results. But there is also no reason why you would have to throw away the results of the first tournament; it is just as valid as a possible second one. Statistics is just a tool. I am sure it must be possible to devise some sort of test that handles such a progression of results statistically, as in a medical trial of a new treatment: if the results are very good, it is no longer ethical to withhold the advantages of the treatment from the patients receiving the placebo.
I'm sure somebody must have developed statistics for determining confidence intervals for cases where a test is terminated prematurely. Maybe these 'Double blind medical trial termination and progression statistics' exist only in the form of patented software from some powerful medical company, but I am sure they would have covered themselves by being able to provide the mathematical proof if needed; if the problem can be solved mathematically, I'm sure they would have done it. That would be yet another reason why patenting software is wrong, in my opinion. But this patented medical software is complete speculation on my part. Maybe the problem can at present only be simulated numerically, though I don't really think so; I'm no mathematician and don't know enough about the statistics.
I only came across your post just now, Edmund; sorry I missed it before. I have not done any research on the matter.
Regards, Eelco