Hello!
ernest wrote:
geots wrote:
1 Houdini 2.0c x64 +121/-96/=187 53.00% 214.5/404
2 Engine 40x(2) +96/-121/=187 47.00% 189.5/404
I guess a lot could happen when you have 596 games remaining.
Here you can see the first effect of statistics.
All your previous runs gave a Houdini result of around 56%.
Now, since on Friday you had:
1 Houdini 2.0c x64 +81/-54/=118 56.00% 140.0/253
2 Engine 40x(2) +54/-81/=118 44.00% 113.0/253
the latest run of 151 games (Friday to Sunday) is, by difference:
1 Houdini 2.0c x64 +40/-42/=69 49.34% 74.5/151
49.3%! Nothing is wrong, just statistics!
I agree with Ernest about the statistics, and I also agree with George that anything can happen in the remaining 596 games. By the way, I am very interested in this match.
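For anyone who wants to recompute the "by difference" run quoted above, here is a small Python sketch that subtracts Friday's cumulative tally from the current one. The `Tally` class and its field names are made up for this illustration; only the win/loss/draw counts come from George's code boxes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tally:
    """Cumulative match result from Houdini's point of view (hypothetical helper)."""
    wins: int
    losses: int
    draws: int

    @property
    def games(self) -> int:
        return self.wins + self.losses + self.draws

    @property
    def points(self) -> float:
        # A win is worth 1 point, a draw half a point.
        return self.wins + 0.5 * self.draws

    @property
    def score_pct(self) -> float:
        return 100.0 * self.points / self.games

    def __sub__(self, earlier: "Tally") -> "Tally":
        # Games played since the earlier snapshot.
        return Tally(self.wins - earlier.wins,
                     self.losses - earlier.losses,
                     self.draws - earlier.draws)


current = Tally(wins=121, losses=96, draws=187)   # 404 games
friday = Tally(wins=81, losses=54, draws=118)     # 253 games
latest = current - friday

print(f"+{latest.wins}/-{latest.losses}/={latest.draws} "
      f"{latest.score_pct:.2f}% {latest.points}/{latest.games}")
```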
I refined my tiny programme Minimum_score_for_no_regression a little: the ugly parameter k is no longer required as input but is calculated internally. Since Fortran 95 does not have an erf function (or at least I am not aware of one), I had to approximate the definite integral of the probability density function of the normal distribution with the composite Simpson's rule, and then solve for k by regula falsi... a difficult trick just because of the lack of an erf function in Fortran 95! At least this trick works fine. I also added the calculated standard deviation to the output to give more information.
Taking the example of George's match:
Minimum_score_for_no_regression, ® 2012.
Calculation of the minimum score for no regression in a match between two engines:
Write down the number of games of the match (it must be a positive integer, up to 1073741823):
404
Write down the draw ratio (in percentage):
46.287128712871
Write down the confidence level (in percentage) between 75% and 99.9%:
95
Calculating...
Theoretical minimum score for no regression: 53.5564 %
Theoretical standard deviation in this case: 1.8145 %
Minimum number of won points for the engine in this match: 216.5 points.
Minimum Elo advantage, which is also the negative part of the error bar:
24.9827 Elo
End of the calculations.
Thanks for using Minimum_score_for_no_regression. Press Enter to exit.
With those 404 games, and using my imperfect model, Houdini cannot claim (with 95% confidence) that it is better than Engine 40x(2). It very likely is better, though, because Houdini's score (214.5) is very close to 216.5 out of 404. LOS tables are very useful here and, of course, I am not smart enough to even try to calculate them!
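The figures in the example can be cross-checked with a short Python sketch. The formulas are not given in this post, so everything below is an assumption that merely happens to be consistent with the printed output: the minimum score mu solves mu = 0.5 + k·sd, with sd = sqrt((mu·(1 − mu) − D/4)/n) for draw ratio D and n games; k is the two-sided normal quantile (taken here from Python's `statistics.NormalDist` instead of Simpson plus regula falsi); the minimum points are rounded up to the next half point; and the Elo figure uses the usual logistic formula on that rounded score. Solving the first equation for mu gives the closed form used in the code.

```python
import math
from statistics import NormalDist


def minimum_score(games, draw_ratio, confidence):
    """Assumed model: returns (score, sd, min_points, elo) as fractions/points/Elo."""
    # Two-sided quantile: about 1.96 for a 95% confidence level.
    k = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    # Closed-form solution of mu = 0.5 + k*sqrt((mu*(1 - mu) - D/4)/n):
    # the standard deviation at the solution is sqrt((1 - D) / (4*(n + k^2))).
    sd = math.sqrt((1.0 - draw_ratio) / (4.0 * (games + k * k)))
    score = 0.5 + k * sd
    # Points come in halves, so round up to the next half point.
    min_points = math.ceil(2.0 * score * games) / 2.0
    # Logistic Elo formula applied to the rounded-up score.
    p = min_points / games
    elo = 400.0 * math.log10(p / (1.0 - p))
    return score, sd, min_points, elo


score, sd, min_points, elo = minimum_score(games=404,
                                           draw_ratio=187 / 404,
                                           confidence=0.95)
print(f"Minimum score:      {100 * score:.4f} %")
print(f"Standard deviation: {100 * sd:.4f} %")
print(f"Minimum points:     {min_points}")
print(f"Minimum Elo:        {elo:.4f}")
```

Under these assumptions the sketch should agree with the programme output above (53.5564 %, 1.8145 %, 216.5 points, about 24.98 Elo), but that agreement does not prove the Fortran code works the same way.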
A question for George: are you generating those code boxes with scores using the Fritz 11 GUI? I see that the scores are shown as 56.00% - 44.00% or 53.00% - 47.00%, but those .00 are not correct (e.g. in the last update it should be more or less 53.09% - 46.91%), which makes me think that the Fritz 11 GUI (if you are using it to generate those code boxes) does not round to 0.01% but to 1%... just a guess. Please keep up the good work.
Regards from Spain.
Ajedrecista.