OK, guys. This wasn't meant as a cry for help. I knew the answer before posting. I was hoping to point out this real-world scenario to all the "testers" who claim they know a new version is better.

CRoberson wrote:
Which version is better, A or B?
Score of A vs Telepath6.030: 364 - 144 - 262
A scores (364+262/2)/770 = 495.0/770 = 64.286%
~= +104 Elo
Margins are +/- 21 Elo
Score of B vs Telepath6.030: 271 - 133 - 195
B scores (271+195/2)/599 = 368.5/599 = 61.519%
~= +84 Elo
Margins are +/- 25 Elo
Vote in the poll and post your reasons.
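For anyone who wants to check the arithmetic in the quote, here is a minimal Python sketch. It assumes the usual logistic Elo formula, Elo = -400 * log10(1/score - 1), and a 95% normal-approximation interval on the per-game score; the function names are my own. Under these assumptions it gives +102.1 Elo for A and +81.5 for B, a couple of points below the quoted +104 and +84, with margins of roughly +/-20 and +/-23. The quoted figures may come from a slightly different estimator.

```python
import math


def elo_from_score(score):
    """Logistic Elo difference for an expected score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_summary(wins, losses, draws, z=1.96):
    """Return (score, elo, elo_low, elo_high) for a win-loss-draw record.

    The interval is a normal approximation on the mean per-game score
    (1 for a win, 0.5 for a draw, 0 for a loss), mapped to Elo at its endpoints.
    """
    n = wins + losses + draws
    score = (wins + draws / 2) / n
    # Per-game variance: E[x^2] - E[x]^2, with x in {1, 0.5, 0}
    mean_sq = (wins + 0.25 * draws) / n
    var = mean_sq - score ** 2
    half_width = z * math.sqrt(var / n)  # z-sigma half-width of the mean score
    return (score,
            elo_from_score(score),
            elo_from_score(score - half_width),
            elo_from_score(score + half_width))


# Records from the quote, as wins - losses - draws against Telepath6.030
for name, record in {"A": (364, 144, 262), "B": (271, 133, 195)}.items():
    score, elo, low, high = match_summary(*record)
    print(f"{name}: score {score:.3%}, Elo {elo:+.1f} "
          f"(95% range {low:+.1f} .. {high:+.1f})")
```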
So, let's look at Conkie's analysis. It was correct and simple: both programs are barely within each other's margins, so we can't tell whether they are different.
In the last year I have seen statements from others that if two results are barely within the margins, the top one is probably better. Well, that is not the way the math works. The math simply says: a value is either inside or outside the margins. If it is inside (anywhere inside), you can't claim one is better unless (as Miguel stated) you lower the margins. If the results are outside each other's margins, then you can make a statement. The problem here is a little more complicated than the classical statistical test I described. Instead of comparing one variable against a constant value, we are comparing two variables, so we have to account for the fact that each result has its own range and the two ranges overlap.
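One standard way to account for both ranges, assuming the two results are independent and roughly normally distributed, is to look at the difference A - B directly: its margin is the root-sum-of-squares of the two individual margins, which is wider than either one alone. This sketch uses the quoted numbers (+104 +/-21 and +84 +/-25); the helper name is hypothetical.

```python
import math


def difference_is_significant(elo_a, margin_a, elo_b, margin_b):
    """Compare two independent Elo estimates with margins at the same confidence.

    Assumes independent, roughly normal estimates, so the margin of the
    difference is the root-sum-of-squares of the individual margins.
    """
    diff = elo_a - elo_b
    margin_diff = math.hypot(margin_a, margin_b)  # sqrt(margin_a^2 + margin_b^2)
    return diff, margin_diff, abs(diff) > margin_diff


diff, margin, significant = difference_is_significant(104, 21, 84, 25)
print(f"A - B = {diff:+} Elo, margin of the difference +/- {margin:.1f}, "
      f"significant: {significant}")
```

Here the difference is +20 Elo against a margin of about +/-32.6, so the gap is well inside the noise.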
If I understood Lucas' statement correctly, he used LOS (likelihood of superiority) to decide that A is better than B. But Munoz also computed LOS and said it is unclear. So I don't get that.
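To put a number on LOS for this data: the familiar wins/losses LOS formula applies to a head-to-head match, but A and B only played Telepath. So this sketch instead treats each quoted margin as a 95% interval (standard error = margin / 1.96) and takes the normal CDF of the Elo difference divided by its combined standard error. That modelling choice is my assumption, not necessarily what Lucas or Munoz did, and the function name is hypothetical.

```python
import math


def los_from_elo(elo_a, margin_a, elo_b, margin_b, z=1.96):
    """Likelihood of superiority of A over B from two independent Elo estimates.

    Assumption: each margin is a symmetric z-sigma interval (z = 1.96 for 95%),
    so each standard error is margin / z, and the difference is normal.
    """
    se = math.hypot(margin_a / z, margin_b / z)  # standard error of A - B
    t = (elo_a - elo_b) / se
    # Standard normal CDF expressed with the error function
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


print(f"LOS(A > B) = {los_from_elo(104, 21, 84, 25):.1%}")
```

With the quoted figures this comes out a little under 90%. It leans toward A, but falls short of a strict threshold such as 95%, so which conclusion you draw depends on the threshold you pick.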
Well, here is the answer: A == B. I set up the first test to run 2400 games, and it crashed after 770 games. So I ran the same programs again, and it crashed after 599 games. So A and B are the same program.
The point for the testers is this: here is real-world data that looks like it came from two different programs, but it is really two noticeably different results from the same program, both still within the margins.
So, even when two programs really are different, data like this does not show that either one is stronger than the other.
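A quick simulation makes the same point. This hypothetical sketch assumes one fixed program whose true win/draw/loss rates equal the pooled results of both runs above (635 wins, 457 draws, 277 losses) and independent games. It plays simulated matches of 770 and 599 games and counts how often the two Elo estimates differ by 20 or more. The names and the pooling assumption are mine.

```python
import math
import random


def elo_from_score(score):
    """Logistic Elo difference for an expected score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def simulate_match(n_games, p_win, p_draw, rng):
    """Mean score of n_games independent games (win 1.0, draw 0.5, loss 0.0)."""
    results = rng.choices([1.0, 0.5, 0.0],
                          weights=[p_win, p_draw, 1.0 - p_win - p_draw],
                          k=n_games)
    return sum(results) / n_games


def fraction_with_gap(gap_elo, n1, n2, p_win, p_draw, trials, seed=1):
    """Fraction of simulated run pairs of the SAME program differing by gap_elo or more."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        e1 = elo_from_score(simulate_match(n1, p_win, p_draw, rng))
        e2 = elo_from_score(simulate_match(n2, p_win, p_draw, rng))
        if abs(e1 - e2) >= gap_elo:
            hits += 1
    return hits / trials


# Hypothetical "true" rates: pooled results of both runs against Telepath6.030
wins, losses, draws = 364 + 271, 144 + 133, 262 + 195
n = wins + losses + draws
frac = fraction_with_gap(20.0, 770, 599, wins / n, draws / n, trials=2000)
print(f"{frac:.1%} of simulated run pairs differ by 20+ Elo")
```

Under these assumptions a normal approximation puts this at roughly one run pair in five, so a split like A vs B is an ordinary outcome for a single program.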
