Hi George!
geots wrote:
Houdini 2.0c x64 vs Rainbow UNLimited
This match could get interesting fast if Rainbow gets a few wins in a row. Tho he still remains down- even by another game or 2- he has knocked the hell out of Houdini's elo lead. Eaten 13 points off the lead since the last update.
I know- but is the glass half-full or half-empty?!
Intel i5 w/4TCs
Fritz 13 gui
1CPU/64bit
128MB hash
Bases=NONE
Ponder_Learning=OFF
Perfect 12.32 book w/12-move limit
5'+5"
Match=500 games
[thru game 195]
Houdini 2.0c x64 +27 +59/-44/=92 53.80% 105.0/195
Rainbow UNLimited -27 +44/-59/=92 46.20% 90.0/195
Rainbow needs a mini-run of sorts. He has plenty of time- if he has the smarts.
And tomorrow-
george
Houdini is a hard nut to crack; that is no surprise.
I have made many improvements to my programme LOS_and_Elo_uncertainties_calculator. I have rewritten a large part of its code so that it accepts any confidence level between 65% and 99.9% (the input is later rounded up to 0.01%; these limits were chosen arbitrarily). This makes the programme more versatile than before, when it could only do its calculations for 1, 2 and 3-sigma confidence (~ 68.27%, ~ 95.45% and ~ 99.73% confidence). It now also calculates, in some cases, the LOS value in the way proposed by Rémi Coulom in the last equation of this post, which does not take draws into account. This calculation is very slow (compared with the normal-distribution approach, which does take draws into account) when the number of wins and losses is very high, so I do not calculate it if wins + losses > 16000 (for example, with wins + losses = 20000 I had problems, maybe overflows, and I did not want to deal with them). The two values differ less and less as the number of wins and losses increases (I suppose that this is due to the central limit theorem), so one value is enough in that case. If wins + losses < 16001, the two LOS values can be compared. It is quite possible that some bugs remain (introduced in this rewrite), although I have fixed many of them. Here are the results of my programme for this match (up to 195 games), from Rainbow's POV:
LOS_and_Elo_uncertainties_calculator, ® 2012.
----------------------------------------------------------------
Calculation of Elo uncertainties in a match between two engines:
----------------------------------------------------------------
(The input and output data is referred to the first engine).
Please write down non-negative integers.
Write down the number of wins (up to 1825361100):
44
Write down the number of loses (up to 1825361100):
59
Write down the number of draws (up to 2147483646):
92
Write down the confidence level (in percentage) between 65% and 99.9% (it will be rounded up to 0.01%):
95
Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:
3
---------------------------------------
Elo interval for 95.00 % confidence:
Elo rating difference: -26.78 Elo
Lower rating difference: -62.64 Elo
Upper rating difference: 8.52 Elo
Lower bound uncertainty: -35.86 Elo
Upper bound uncertainty: 35.30 Elo
Average error: +/- 35.58 Elo
K = (average error)*[sqrt(n)] = 496.82
Elo interval: ] -62.64, 8.52[
---------------------------------------
Number of games of the match: 195
Score: 46.15 %
Elo rating difference: -26.78 Elo
Draw ratio: 47.18 %
*********************************************************
Standard deviation: 5.0717 % of the points of the match.
*********************************************************
Error bars were calculated with two-sided tests; values are rounded up to 0.01 Elo, or 0.01 in the case of K.
-------------------------------------------------------------------
Calculation of likelihood of superiority (LOS) in a one-sided test:
-------------------------------------------------------------------
LOS (taking into account draws) is always calculated, if possible.
LOS (not taking into account draws) is only calculated if wins + loses < 16001.
LOS (average value) is calculated only when LOS (not taking into account draws) is calculated.
______________________________________________
LOS: 6.86 % (taking into account draws).
LOS: 7.05 % (not taking into account draws).
LOS: 6.95 % (average value).
______________________________________________
These values of LOS are rounded up to 0.01%
End of the calculations. Approximated elapsed time: 54 ms.
Thanks for using LOS_and_Elo_uncertainties_calculator. Press Enter to exit.
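As a rough illustration (this is not the code of LOS_and_Elo_uncertainties_calculator, only a Python sketch with hypothetical function names), the figures above can be approximately reproduced under two assumptions. First, the Elo interval is taken as score ± z·sigma converted to Elo with the logistic formula, where sigma is the standard error of the mean score with draws counted as half points. Second, the draw-free LOS is taken as the probability that the win rate exceeds 50% under a uniform prior, which is one common form of a draw-free LOS and may or may not be exactly the equation in the linked post. Exact integer arithmetic is used for that second value, so overflow is not an issue in Python.

```python
from math import comb, log10, sqrt
from statistics import NormalDist


def elo(score):
    """Logistic Elo difference for an expected score strictly between 0 and 1."""
    return 400.0 * log10(score / (1.0 - score))


def match_stats(wins, losses, draws, confidence=0.95):
    """Elo interval and LOS (normal model, draws included). Assumed model."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0 points), then standard error.
    variance = (wins + 0.25 * draws) / n - score ** 2
    sigma = sqrt(variance / n)
    # Two-sided quantile, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return {
        "elo": elo(score),
        "lower": elo(score - z * sigma),
        "upper": elo(score + z * sigma),
        # One-sided probability that the true score is above 50%.
        "los_draws": NormalDist().cdf((score - 0.5) / sigma),
    }


def los_no_draws(wins, losses):
    """P(win rate > 0.5) with a uniform prior, ignoring draws (assumed form).

    Equal to P(X <= wins) for X ~ Binomial(wins + losses + 1, 1/2).
    """
    total = wins + losses + 1
    return sum(comb(total, k) for k in range(wins + 1)) / 2 ** total


stats = match_stats(44, 59, 92)
print(f"Elo: {stats['elo']:.2f}  [{stats['lower']:.2f}, {stats['upper']:.2f}]")
print(f"LOS (draws):    {100 * stats['los_draws']:.2f} %")
print(f"LOS (no draws): {100 * los_no_draws(44, 59):.2f} %")
```

With +44/-59/=92 the normal-model part agrees with the printed Elo difference, interval and 6.86% LOS to the displayed precision, and the draw-free value should land close to the printed 7.05%. Under this model the 5.0717% shown as "Standard deviation" appears to correspond to z·sigma for 95% confidence rather than to sigma alone.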
Regarding my other programme, I have only made small cosmetic changes (such as using a LOS value rounded up to 0.01% and printing this rounded value, and renaming 'confidence' to 'LOS' in the code). In this match, Houdini needs to score at least 106 points out of 195 (with the model I use) to reach a minimum LOS of 95%:
Minimum_score_for_no_regression, ® 2012.
Calculation of the minimum score for no regression (i.e. negative Elo gain) in a match between two engines:
Write down the number of games of the match (it must be a positive integer, up to 1073741823):
195
Write down the draw ratio (in percentage):
47.1794871794871795
Write down the likelihood of superiority (in percentage) between 75% and 99.9% (LOS will be rounded up to 0.01%):
95
Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:
3
Theoretical minimum score for no regression: 54.2510 %
Theoretical standard deviation in this case: 4.2510 %
Minimum number of won points for the engine in this match: 106.0 points.
Minimum Elo advantage, which is also the negative part of the error bar:
30.3663 Elo (for a LOS value of 95.00 %).
End of the calculations. Approximated elapsed time: 20 ms.
Thanks for using Minimum_score_for_no_regression. Press Enter to exit.
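The numbers above are consistent with a simple model, sketched here in Python (again not the source of Minimum_score_for_no_regression; the function name and the rounding rule are assumptions). The minimum score is 0.5 + m, where m equals z times the standard error at that score: m = z·sqrt((s(1 − s) − D/4)/n) with s = 0.5 + m, draw ratio D and n games. Solving for m gives m = z·sqrt((1 − D)/(4(n + z²))). The score is then rounded up to the next half point (my assumption), and the Elo advantage is taken from that rounded score.

```python
from math import ceil, log10, sqrt
from statistics import NormalDist


def minimum_score(games, draw_ratio, los=0.95):
    """Minimum score, points and Elo gap for a given one-sided LOS (assumed model)."""
    z = NormalDist().inv_cdf(los)  # one-sided quantile, about 1.645 for 95%
    # Closed-form solution of margin = z * sqrt((s*(1-s) - D/4) / n), s = 0.5 + margin.
    margin = z * sqrt((1.0 - draw_ratio) / (4.0 * (games + z * z)))
    score = 0.5 + margin
    # Round the required points up to the next half point (assumption).
    points = ceil(2.0 * score * games) / 2.0
    rounded_score = points / games
    elo_gap = 400.0 * log10(rounded_score / (1.0 - rounded_score))
    return score, points, elo_gap


score, points, elo_gap = minimum_score(195, 92 / 195)
print(f"Minimum score: {100 * score:.4f} %")
print(f"Minimum points: {points:.1f}")
print(f"Minimum Elo advantage: {elo_gap:.4f} Elo")
```

For 195 games and a draw ratio of 92/195 this gives 54.2510%, 106.0 points and 30.3663 Elo, the same as the output above. Note that the Elo value only matches when it is computed from the rounded score 106/195, not from the theoretical 54.2510%.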
If you do not mind, and if I do not forget, I will upload my two programmes in the topic where you post the end of this match, so that everybody who wants to can use them at their own risk. These programmes would be more user-friendly if they had GUIs, but of course that is an impossible task for me.
Please keep up your good work and sorry for this long post.
Regards from Spain.
Ajedrecista.