Hello!
geots wrote:
Houdini 2.0c x64 vs Rainbow Limited- beta 2
Houdini has been able to slightly increase his lead, and at some point in time Limited- beta 2 needs to make a run at him. "Holding his own" and "playing even with him" won't get the job done now. If "Limited" wants to have any chance at all- he is going to have to soon make a run at Houdini. He can't afford to get any further behind.
Intel i5 w/4TCs
Fritz 13 gui
1CPU/64bit
128MB hash
Bases=NONE
Ponder_Learning=OFF
Perfect 12.32 book w/12-move limit
5'+5"
Match=500 games
[after 245 games]
Houdini 2.0c x64         +23  +74/-58/=113  53.50%  130.5/245
Rainbow Limited- beta 2  -23  +58/-74/=113  46.70%  114.5/245
Close enough to call this the halfway mark. Hopefully Limited- beta 2 can begin to make this a close match again. It's not out of reach yet.
Back soon-
george
Beta 2 seems to be holding up a little better than beta 1, although they are playing at different time controls. Here are my results regarding error bars and LOS:
LOS_and_Elo_uncertainties_calculator, ® 2012.
----------------------------------------------------------------
Calculation of Elo uncertainties in a match between two engines:
----------------------------------------------------------------
(The input and output data is referred to the first engine).
Please write down non-negative integers.
Write down the number of wins:
74
Write down the number of loses:
58
Write down the number of draws:
113
Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:
3
(Only 1, 2 and 3-sigma confidence error bars are calculated, if possible).
***************************************
1-sigma confidence ~ 68.27% confidence.
2-sigma confidence ~ 95.45% confidence.
3-sigma confidence ~ 99.73% confidence.
***************************************
---------------------------------------
Elo interval for 1-sigma confidence:
Elo rating difference: 22.72 Elo
Lower rating difference: 6.46 Elo
Upper rating difference: 39.08 Elo
Lower bound uncertainty: -16.26 Elo
Upper bound uncertainty: 16.36 Elo
Average error: +/- 16.31 Elo
K = (average error)*[sqrt(n)] = 255.29
Elo interval: ] 6.46, 39.08[
---------------------------------------
Elo interval for 2-sigma confidence:
Elo rating difference: 22.72 Elo
Lower rating difference: -9.77 Elo
Upper rating difference: 55.62 Elo
Lower bound uncertainty: -32.49 Elo
Upper bound uncertainty: 32.89 Elo
Average error: +/- 32.69 Elo
K = (average error)*[sqrt(n)] = 511.72
Elo interval: ] -9.77, 55.62[
---------------------------------------
Elo interval for 3-sigma confidence:
Elo rating difference: 22.72 Elo
Lower rating difference: -26.04 Elo
Upper rating difference: 72.40 Elo
Lower bound uncertainty: -48.77 Elo
Upper bound uncertainty: 49.68 Elo
Average error: +/- 49.22 Elo
K = (average error)*[sqrt(n)] = 770.48
Elo interval: ] -26.04, 72.40[
---------------------------------------
Number of games of the match: 245
Score: 53.27 %
Elo rating difference: 22.72 Elo
Draw ratio: 46.12 %
**********************************************
1 sigma: 2.3354 % of the points of the match.
2 sigma: 4.6708 % of the points of the match.
3 sigma: 7.0063 % of the points of the match.
**********************************************
Error bars were calculated with two-sided tests; values are rounded up to 0.01 Elo, or 0.01 in the case of K.
-------------------------------------------------------------------
Calculation of likelihood of superiority (LOS) in a one-sided test:
-------------------------------------------------------------------
LOS: 91.90 %
This value of LOS is rounded up to 0.01%
End of the calculations. Approximated elapsed time: 57 ms.
Thanks for using LOS_and_Elo_uncertainties_calculator. Press Enter to exit.
After 245 games, Houdini leads by about +23 ± 33 Elo (at ~ 95.45% confidence, i.e. roughly 21 times out of 22) with a LOS of about 91.9%, which is not very significant IMHO. Anyway, I think that Houdini will win this match, which is no surprise at all.
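For anyone who wants to check these figures by hand, here is a minimal Python sketch. It is not the source of LOS_and_Elo_uncertainties_calculator; it assumes a plain normal approximation for the match score and the usual logistic Elo formula, and the names `elo_from_score` and `match_stats` are made up for this example. With +74 −58 =113 it gives values in line with the output above.

```python
import math
from statistics import NormalDist


def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction (logistic model)."""
    return 400.0 * math.log10(score / (1.0 - score))


def match_stats(wins: int, losses: int, draws: int, sigmas: float = 2.0) -> dict:
    """Score, error bars and LOS under a normal approximation (assumed model)."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) from the observed frequencies.
    variance = (wins + 0.25 * draws) / n - score ** 2
    sigma = math.sqrt(variance / n)  # standard deviation of the match score
    return {
        "score": score,
        "sigma": sigma,
        "elo": elo_from_score(score),
        # Two-sided interval: shift the score by +/- sigmas * sigma, then convert.
        "elo_low": elo_from_score(score - sigmas * sigma),
        "elo_high": elo_from_score(score + sigmas * sigma),
        # One-sided LOS: probability that the true score is above 50%.
        "los": NormalDist().cdf((score - 0.5) / sigma),
    }


stats = match_stats(wins=74, losses=58, draws=113)
print(f"Score: {100 * stats['score']:.2f} %")
print(f"Elo:   {stats['elo']:.2f} [{stats['elo_low']:.2f}, {stats['elo_high']:.2f}]")
print(f"LOS:   {100 * stats['los']:.2f} %")
```

The interval is not symmetric in Elo because the symmetric step is taken on the score and only then converted to Elo, which is why the lower and upper uncertainties differ slightly.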
With the model I use, the score Houdini needs in order to reach a LOS of 95% is:
Minimum_score_for_no_regression, ® 2012.
Calculation of the minimum score for no regression (i.e. negative Elo gain) in a match between two engines:
Write down the number of games of the match (it must be a positive integer, up to 1073741823):
245
Write down the draw ratio (in percentage):
46.1224489795
Write down the confidence level (in percentage) between 75% and 99.9%:
95
Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:
3
Theoretical minimum score for no regression: 53.8356 %
Theoretical standard deviation in this case: 3.8356 %
Minimum number of won points for the engine in this match: 132.0 points.
Minimum Elo advantage, which is also the negative part of the error bar:
26.9982 Elo
End of the calculations. Approximated elapsed time: 19 ms.
Thanks for using Minimum_score_for_no_regression. Press Enter to exit.
The score should be 132/245, which is very close to the actual 130.5/245. Running Minimum_score_for_no_regression again with 500 games and a draw ratio of 46%, Houdini will reach a LOS of 95% (in a one-sided test) with a score of 263.5/500 = 52.7% (this implies an advantage of ~ 19 Elo, with error bars of around ± 22 or ± 23 Elo at 95% confidence in a two-sided test). So it looks reasonable to say that Houdini is stronger than this beta 2, IMHO. Thanks for running this match!
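The minimum score can also be sketched in a few lines of Python. Again, this is not the code of Minimum_score_for_no_regression: it assumes a normal approximation in which the per-game variance is s(1 − s) − D/4 for score s and draw ratio D, and solves (s − 0.5) = z·σ for s. The function name `min_score_for_los` is hypothetical. Under these assumptions the results agree with the 132/245 and 263.5/500 quoted here.

```python
import math
from statistics import NormalDist


def min_score_for_los(games: int, draw_ratio: float, los: float = 0.95):
    """Smallest score whose one-sided LOS reaches `los` (assumed normal model).

    Returns (score fraction, points rounded up to the next half point).
    """
    z = NormalDist().inv_cdf(los)  # one-sided quantile, ~1.645 for 95%
    # Closed-form solution of (s - 0.5) = z * sqrt((s * (1 - s) - D / 4) / n).
    excess = z * math.sqrt((1.0 - draw_ratio) / (4.0 * (games + z * z)))
    score = 0.5 + excess
    # A match score can only change in half-point steps, so round up.
    points = math.ceil(2 * games * score) / 2
    return score, points


def elo_from_points(points: float, games: int) -> float:
    """Elo difference implied by `points` out of `games` (logistic model)."""
    return 400.0 * math.log10(points / (games - points))


score_245, points_245 = min_score_for_los(245, 113 / 245)
score_500, points_500 = min_score_for_los(500, 0.46)
print(f"245 games: {100 * score_245:.4f} % -> {points_245} points, "
      f"{elo_from_points(points_245, 245):.2f} Elo")
print(f"500 games: {100 * score_500:.4f} % -> {points_500} points, "
      f"{elo_from_points(points_500, 500):.2f} Elo")
```

The Elo value is computed from the rounded-up points rather than from the theoretical score, which appears to be what the program output does as well (132/245 corresponds to about 27 Elo).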
Regards from Spain.
Ajedrecista.