Hello George:
geots wrote: Houdini 2.0c x64 vs Rainbow UNLimited
Even with what I knew, after the 1st half of the match I figured Houdini would stretch the lead to bigger numbers- as he always does. I got blindsided. I didn't have a clue it was coming- and neither, I would imagine, did Houdini.
Intel i5 w/4TCs
Fritz 13 gui
1CPU/64bit
128MB hash
Bases=NONE
Ponder_Learning=OFF
Perfect 12.32 book w/12-move limit
5'+5"
Match=500 games
[thru 360 games]
Code:
Houdini 2.0c x64    +14   +99/-86/=175   51.81%   186.5/360
Rainbow UNLimited   -14   +86/-99/=175   48.19%   173.5/360
It looks like Rainbow woke up and decided it was time to make a run at Houdini. No crystal ball- but I would be just a wee bit surprised if the run was over. It's just that right now he is playing so much better chess than Houdini. He is giving "getting outplayed" a whole new meaning.
He took the 20 game Houdini lead and closed it to 13 games.
He took Houdini's Elo difference and cut it right down the middle- from +28 to +14.
With 140 games remaining- and the match running as we speak- I hope you don't mind if I run back and check the games.
george
Life surprises you. Here are my error bars and LOS for this match, up to 360 games, from Rainbow's point of view:
Code:
LOS_and_Elo_uncertainties_calculator, ® 2012.
----------------------------------------------------------------
Calculation of Elo uncertainties in a match between two engines:
----------------------------------------------------------------
(The input and output data refer to the first engine).
Please write down non-negative integers.
Write down the number of wins (up to 1825361100):
86
Write down the number of losses (up to 1825361100):
99
Write down the number of draws (up to 2147483646):
175
Write down the confidence level (in percentage) between 65% and 99.9% (it will be rounded up to 0.01%):
95
Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:
3
---------------------------------------
Elo interval for 95.00 % confidence:
Elo rating difference: -12.55 Elo
Lower rating difference: -38.40 Elo
Upper rating difference: 13.16 Elo
Lower bound uncertainty: -25.85 Elo
Upper bound uncertainty: 25.71 Elo
Average error: +/- 25.78 Elo
K = (average error)*[sqrt(n)] = 489.07
Elo interval: ] -38.40, 13.16[
---------------------------------------
Number of games of the match: 360
Score: 48.19 %
Elo rating difference: -12.55 Elo
Draw ratio: 48.61 %
*********************************************************
Standard deviation: 3.6979 % of the points of the match.
*********************************************************
Error bars were calculated with two-sided tests; values are rounded up to 0.01 Elo, or 0.01 in the case of K.
-------------------------------------------------------------------
Calculation of likelihood of superiority (LOS) in a one-sided test:
-------------------------------------------------------------------
LOS (taking into account draws) is always calculated, if possible.
LOS (not taking into account draws) is only calculated if wins + losses < 16001.
LOS (average value) is calculated only when LOS (not taking into account draws) is calculated.
______________________________________________
LOS: 16.93 % (taking into account draws).
LOS: 17.03 % (not taking into account draws).
LOS: 16.98 % (average value).
______________________________________________
These values of LOS are rounded up to 0.01%
End of the calculations. Approximated elapsed time: 57 ms.
Thanks for using LOS_and_Elo_uncertainties_calculator. Press Enter to exit.
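For reference, the figures above follow from the usual normal approximation; a minimal Python sketch (my own reconstruction, not the calculator's actual code):

```python
import math
from statistics import NormalDist

def elo(score):
    """Logistic Elo difference for a score fraction in (0, 1)."""
    return -400 * math.log10(1 / score - 1)

w, l, d = 86, 99, 175           # Rainbow's results after 360 games
n = w + l + d
score = (w + 0.5 * d) / n       # 173.5/360 ~ 48.19 %

# Per-game variance of the 1 / 0.5 / 0 outcome, then the 95 % half-width
var = (w + 0.25 * d) / n - score ** 2
sigma = math.sqrt(var / n)
half = NormalDist().inv_cdf(0.975) * sigma   # ~3.70 % of the score

print(round(elo(score), 2))          # -12.55 Elo
print(round(elo(score - half), 2))   # about -38.4 Elo (lower bound)
print(round(elo(score + half), 2))   # about 13.16 Elo (upper bound)

# LOS taking draws into account: P(true score > 50 %)
los = NormalDist().cdf((score - 0.5) / sigma)
print(round(100 * los, 2))           # about 16.93 %
```

The error bars in Elo are slightly asymmetric because the logistic curve is applied to a symmetric interval in score.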
More or less -13 ± 26 Elo with 95% confidence, and a LOS of around 17%. Running Minimum_score_for_no_regression with a target LOS of 97.5%, Houdini should have scored 193.5 (+106 -79 =175) to be ahead with 97.5% LOS (which means being wrong 1 time in 40).
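That 193.5-point threshold can be reproduced under the same normal approximation, holding the 175 draws fixed and leaving them out of the variance; the function name below is mine, not the programme's:

```python
import math
from statistics import NormalDist

def min_wins_for_los(decisive, los_pct):
    """Smallest number of wins, out of `decisive` decisive games, whose
    LOS (ignoring draws) reaches los_pct, via Phi((w - l) / sqrt(w + l))."""
    z = NormalDist().inv_cdf(los_pct / 100)
    return math.ceil((decisive + z * math.sqrt(decisive)) / 2)

wins = min_wins_for_los(86 + 99, 97.5)   # 185 decisive games so far
losses = 185 - wins
score = wins + 175 / 2                   # add back the fixed draws
print(wins, losses, score)               # 106 79 193.5
```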
------------------------
Going off-topic: I have written a twin programme of Minimum_score_for_no_regression: Minimum_number_of_games, whose function is clear enough from its name. It solves the inverse problem of its twin programme.
Around 2/3 of the code is a direct copy of Minimum_score_for_no_regression. It works this way: one inputs the desired Elo gain in a match between two engines and the wanted likelihood of superiority, and the programme calculates in a very short time (circa 18 ms on my PC) the minimum number of games needed to ensure that Elo gain with that LOS value. I got the idea from
this web, posted by Fermín Serrano in
this post. The minimum number of games that my programme calculates is always even, to give both engines the same number of games with white and black.
I use the LOS of a one-sided test; the equivalent confidence level of a two-sided test is, as I understand it (in percentage): (confidence) = 2·LOS - 100; LOS = 50 + (confidence)/2. These formulæ are included in the Readme file. Taking example 3 of the Mizar chess engine web: LOS = 50 + 95/2 = 97.5%; (upper limit) - (lower limit) = 4%: the scores are 52% - 48%, which means ~ 13.9 Elo difference. The minimum number of games should be roughly 2401 with that curious method. I get 2400 games with Minimum_number_of_games:
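Under the same assumptions (draws dropped, normal approximation), the inverse calculation can be sketched like this; min_games is a hypothetical name, not the programme's actual code:

```python
import math
from statistics import NormalDist

def min_games(elo_gain, los_pct):
    """Minimum (even) number of games so that a true Elo gain of
    `elo_gain` reaches the given one-sided LOS, draws ignored."""
    s = 1 / (1 + 10 ** (-elo_gain / 400))     # score for the Elo gain
    z = NormalDist().inv_cdf(los_pct / 100)   # one-sided quantile
    var = s * (1 - s)                         # Bernoulli win/loss variance
    n = math.ceil((z * math.sqrt(var) / (s - 0.5)) ** 2)
    return n + n % 2                          # round up to an even number

print(min_games(13.9, 97.5))    # 2400
```

With the current match statistics, min_games(12.55, 97.5) gives 2944, the figure quoted further on.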
Code:
Minimum_number_of_games, ® 2012.
Calculation of the minimum number of games in a match between two engines to ensure an Elo gain with a given LOS value:
Write down the wanted Elo gain between 0.1 and 40 Elo (it will be rounded up to 0.01 Elo):
13.9
Write down the likelihood of superiority (in percentage) between 90% and 99.9%(LOS will be rounded up to 0.01%):
97.5
Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:
3
_______________________________________________________________________________
Score for a wanted gain of 13.90 Elo: 51.9993 %
Standard deviation for 97.50 % of LOS: 1.9993 %
A LOS value of 97.50 % is equivalent to 95.00 % confidence in a two-sided test.
Minimum number of needed games: 2400 games.
_______________________________________________________________________________
End of the calculations. Approximated elapsed time: 18 ms.
Thanks for using Minimum_number_of_games. Press Enter to exit.
With the current statistics of this Houdini vs. Rainbow match, the minimum number of games should be 2944 (12.55 Elo difference and 97.5% LOS of a one-sided test = 95% confidence in a two-sided test).
I got rid of the draw ratio because I think I get more reliable results without it (this decision also means less code, which is easier for me!). So the results seem to be more or less right. I will upload my three programmes (LOS_and_Elo_uncertainties_calculator, Minimum_number_of_games and Minimum_score_for_no_regression) when this 500-game match finishes... I hope they do not have nasty bugs!
Thanks for this match! Houdini 3 is coming...
Regards from Spain.
Ajedrecista.