Hello George:
geots wrote:
Houdini 2.0c x64 vs Rainbow UNLimited
Things would be a lot simpler if the last 40 games since the previous update had not taken a different turn. Instead of a 26 game and +20 elo lead, Rainbow makes a late run and it ends a bit differently.
Which puts me in a bad position. I couldn't get out of the bed all day and night with my back acting up- and post it when I wanted to. And I couldn't reach anyone for advice- so I'm sorta stuck. I gotta decide if I think the chance that Rainbow could make up this difference and possibly win the match is a credible enough thought to carry it to 1000 games or not.
Perhaps I will get some advice from Jesus.
Intel i5 w/4TCs
Fritz 13 gui
1CPU/64bit
128MB hash
Bases=NONE
Ponder_Learning=OFF
Perfect 12.32 book w/12-move limit
5'+5"
Match=500 games
[thru 500 games]
Code:
Houdini 2.0c x64 +15 +140/-119/=241 52.10% 260.5/500
Rainbow UNLimited -15 +119/-140/=241 47.90% 239.5/500
Well, I can sleep a couple hours and: decide to stop it here or run another 500 and go to 1000 or maybe have a massive heart attack while asleep and not have to worry with the crap any longer.
Cornered- from inception to resurrection. (I could blame it on my mother and father who abandoned me when I was 3 days old. You think?)
Ho-hum,
george
My advice may not be worth much because my knowledge of statistics is limited. Running my programmes:
Code:
LOS_and_Elo_uncertainties_calculator, ® 2012.
----------------------------------------------------------------
Calculation of Elo uncertainties in a match between two engines:
----------------------------------------------------------------
(The input and output data is referred to the first engine).
Please write down non-negative integers.
Write down the number of wins (up to 1825361100):
119
Write down the number of loses (up to 1825361100):
140
Write down the number of draws (up to 2147483646):
241
Write down the confidence level (in percentage) between 65% and 99.9% (it will
be rounded up to 0.01%):
95
Write down the clock rate of the CPU (in GHz), only for timing the elapsed time
of the calculations:
3
---------------------------------------
Elo interval for 95.00 % confidence:
Elo rating difference: -14.60 Elo
Lower rating difference: -36.61 Elo
Upper rating difference: 7.29 Elo
Lower bound uncertainty: -22.01 Elo
Upper bound uncertainty: 21.89 Elo
Average error: +/- 21.95 Elo
K = (average error)*[sqrt(n)] = 490.79
Elo interval: ] -36.61, 7.29[
---------------------------------------
Number of games of the match: 500
Score: 47.90 %
Elo rating difference: -14.60 Elo
Draw ratio: 48.20 %
*********************************************************
Standard deviation: 3.1489 % of the points of the match.
*********************************************************
Error bars were calculated with two-sided tests; values are rounded up to 0.01
Elo, or 0.01 in the case of K.
-------------------------------------------------------------------
Calculation of likelihood of superiority (LOS) in a one-sided test:
-------------------------------------------------------------------
LOS (taking into account draws) is always calculated, if possible.
LOS (not taking into account draws) is only calculated if wins + loses < 16001.
LOS (average value) is calculated only when LOS (not taking into account draws)
is calculated.
______________________________________________
LOS: 9.56 % (taking into account draws).
LOS: 9.63 % (not taking into account draws).
LOS: 9.60 % (average value).
______________________________________________
These values of LOS are rounded up to 0.01%
End of the calculations. Approximated elapsed time: 58 ms.
Thanks for using LOS_and_Elo_uncertainties_calculator. Press Enter to exit.
From Rainbow's POV, after 500 games there is a difference of more or less -15 ± 22 Elo with 95% confidence. Rainbow's LOS is around 9.6% (so about 90.4% for Houdini), which is not very significant in Houdini's favour IMHO, although Houdini has the upper hand.
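For anyone who wants to check these figures without the Fortran programmes, here is a minimal Python sketch. It assumes a plain normal approximation of the mean score and the logistic Elo formula; the function names `elo_from_score` and `match_summary` are made up for this example, and the code is not taken from the Fortran programmes.

```python
from math import log10, sqrt
from statistics import NormalDist


def elo_from_score(p):
    """Logistic Elo difference corresponding to an expected score p (0 < p < 1)."""
    return 400.0 * log10(p / (1.0 - p))


def match_summary(wins, losses, draws, confidence=0.95):
    """Normal-approximation Elo interval and LOS, seen from the first engine."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0), then std. dev. of the mean score.
    variance = (wins + 0.25 * draws) / n - score ** 2
    sigma = sqrt(variance / n)
    # Two-sided critical value, about 1.96 for 95 % confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return {
        "elo": elo_from_score(score),
        "elo_low": elo_from_score(score - z * sigma),
        "elo_high": elo_from_score(score + z * sigma),
        # One-sided probability that the true score is above 50 % (draws included).
        "los": NormalDist().cdf((score - 0.5) / sigma),
    }


# Rainbow's results after 500 games: +119 -140 =241.
result = match_summary(119, 140, 241)
print(result)
```

With Rainbow's +119 -140 =241 this should land very close to the -14.60 Elo, the ]-36.61, 7.29[ interval and the 9.56 % LOS (taking draws into account) printed above.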
-----------------------
Code:
Minimum_score_for_no_regression, ® 2012.
Calculation of the minimum score for no regression (i.e. negative Elo gain) in
a match between two engines:
Write down the number of games of the match (it must be a positive integer, up
to 1073741823):
500
Write down the draw ratio (in percentage):
48.2
Write down the likelihood of superiority (in percentage) between 75% and 99.9%
(LOS will be rounded up to 0.01%):
97.5
Write down the clock rate of the CPU (in GHz), only for timing the elapsed time
of the calculations:
3
_______________________________________________________________________________
Theoretical minimum score for no regression: 53.1422 %
Theoretical standard deviation in this case: 3.1422 %
Minimum number of won points for the engine in this match: 266.0 points.
Minimum Elo advantage, which is also the negative part of the error bar:
22.2663 Elo (for a LOS value of 97.50 %).
A LOS value of 97.50 % is equivalent to 95.00 % confidence in a two-sided test.
_______________________________________________________________________________
End of the calculations. Approximated elapsed time: 17 ms.
Thanks for using Minimum_score_for_no_regression. Press Enter to exit.
Houdini would have needed at least 266 points (53.2% of 500 games) to reach a LOS > 97.5% = 39/40; with 260.5 points it is still a bit short of that score.
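A rough Python sketch of where such a threshold can come from, under a normal approximation. It assumes that the per-game variance at score s and draw ratio d is s(1 - s) - d/4, and then solves s - 0.5 = z·sigma(s) for s in closed form. The function name `min_score_for_los` is hypothetical and the derivation is only illustrative, not necessarily what the Fortran programme does; it covers only the minimum score and points, not the Elo figure.

```python
from math import ceil, sqrt
from statistics import NormalDist


def min_score_for_los(games, draw_ratio, los):
    """Smallest score fraction s with (s - 0.5) = z * sigma(s), normal approximation.

    Assumes the per-game variance is s*(1 - s) - draw_ratio/4, which leads to
    the closed form below (an illustrative derivation, not the Fortran source).
    """
    z = NormalDist().inv_cdf(los)  # one-sided critical value for the given LOS
    return 0.5 + z * sqrt((1.0 - draw_ratio) / (4.0 * (games + z * z)))


games, draw_ratio, los = 500, 0.482, 0.975
score = min_score_for_los(games, draw_ratio, los)
# Match points come in half-point steps, so round up to the next half point.
points = ceil(2 * score * games) / 2
print(f"{100 * score:.4f} % -> {points} points")
```

For 500 games, a 48.2 % draw ratio and a LOS of 97.5 %, this should give roughly 53.14 % and 266 points, in line with the output above.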
-----------------------
Regarding the minimum number of games needed to confirm a 14.6 Elo difference with a given LOS:
Code:
LOS > 90%: n > 930 games.
LOS > 95%: n > 1532 games.
LOS > 97.5%: n > 2176 games.
LOS > 99%: n > 3064 games.
LOS > 99.5%: n > 3756 games.
LOS > 99.9%: n > 5406 games.
I hope that these numbers are correct. Maybe 1000 games are not enough and it would take about 2000. Thousands of games would need insane amounts of time, so stopping at 500 would be fine... Houdini could do a slaughter in the next 500 games! Or Rainbow could hold... who knows? I am sorry, but I do not know which is the best choice.
-----------------------
Uri Blass wrote:
I think that it is not very interesting testing rainbow against a weak version of houdini
1.5 seems to be better than 2c when we do not talk about blitz and I did not buy 2c exactly for the reason that I saw no convincing evidence that it is stronger than the free version 1.5 .
From the CEGT 40/20 rating list
1 Houdini 1.5a x64 1CPU 3013 14 14 1698 68.6% 2877
2 Houdini 2.0c x64 1CPU 3002 15 15 1293 63.1% 2909
From the CCRL 40/40 rating list
Houdini 1.5a 64-bit 3156 +17 −17 64.1% −93.3 42.4% 1169
Houdini 2.0c 64-bit 3144 +16 −16 67.3% −121.8 36.2% 1334
If you use Houdini 2, then 2s is significantly stronger than 2c if other people are to be believed (I do not have Houdini 2 myself); maybe not at the very fast time control that Robert Houdart tests at, but at longer time controls.
The results between 2s and 2c at blitz 5+3 after 1001 games are
+318,=473,-210 for 2s so I guess it is better to stop to test 2c and start to test 2s.
http://rybkaforum.net/cgi-bin/rybkaforu ... ?tid=25226
@Uri: I get around +37.6 ± 15.7 Elo for 95% confidence and LOS values of almost 100%. If that is true, then the s settings would be clearly stronger than the default ones. Bearing in mind that Robert tests with thousands of games, it looks strange to me that he has not arrived at similar settings; I also read a post of his saying that he tested various settings (s, z and T4) and that they were not significant improvements. Here is the post:
Re: Houdini 2.0 : Settings (Z, T3, Baracuda, Baracuda T3)
Who knows?
-----------------------
Going off-topic: I promised to upload my three programmes when this 500-game match finished. Here is the download link:
Three_Fortran_programmes.rar (0.64 MB)
I hope that they are useful and easy to use.
Regards from Spain.
Ajedrecista.