Houdini & Rainbow Ltd.- beta 2: TOE TO TOE!

geots
Posts: 4790
Joined: Sat Mar 11, 2006 12:42 am

Post by geots »

Houdini 2.0c x64 vs Rainbow Limited- beta 2


Houdini has slightly increased his lead, so Limited- beta 2 needs to make a run at him soon. "Holding his own" and "playing even with him" won't get the job done now. If "Limited" wants any chance at all, he can't afford to fall any further behind.


Intel i5 w/4TCs
Fritz 13 gui
1CPU/64bit
128MB hash
Bases=NONE
Ponder_Learning=OFF
Perfect 12.32 book w/12-move limit

5'+5"
Match=500 games



[after 245 games]

Houdini 2.0c x64           +23    +74/-58/=113   53.27%   130.5/245
Rainbow Limited- beta 2    -23    +58/-74/=113   46.73%   114.5/245


Close enough to call this the halfway mark. Hopefully Limited- beta 2 can begin to make this a close match again. It's not out of reach yet.




Back soon-

george
Ajedrecista
Posts: 2124
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: Houdini & Rainbow Ltd. - beta 2: TOE TO TOE!

Post by Ajedrecista »

Hello!
Beta 2 seems to be holding up a little better than beta 1, although they are playing at different time controls. Here are my results regarding error bars and LOS:

LOS_and_Elo_uncertainties_calculator, ® 2012.

----------------------------------------------------------------
Calculation of Elo uncertainties in a match between two engines:
----------------------------------------------------------------

(The input and output data is referred to the first engine).

Please write down non-negative integers.

Write down the number of wins:

74

Write down the number of loses:

58

Write down the number of draws:

113

Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:

3

(Only 1, 2 and 3-sigma confidence error bars are calculated, if possible).

***************************************
1-sigma confidence ~ 68.27% confidence.
2-sigma confidence ~ 95.45% confidence.
3-sigma confidence ~ 99.73% confidence.
***************************************

---------------------------------------

Elo interval for 1-sigma confidence:

Elo rating difference:     22.72 Elo

Lower rating difference:    6.46 Elo
Upper rating difference:   39.08 Elo

Lower bound uncertainty:  -16.26 Elo
Upper bound uncertainty:   16.36 Elo
Average error:        +/-  16.31 Elo

K = (average error)*[sqrt(n)] =  255.29

Elo interval: ]   6.46,   39.08[
---------------------------------------

Elo interval for 2-sigma confidence:

Elo rating difference:     22.72 Elo

Lower rating difference:   -9.77 Elo
Upper rating difference:   55.62 Elo

Lower bound uncertainty:  -32.49 Elo
Upper bound uncertainty:   32.89 Elo
Average error:        +/-  32.69 Elo

K = (average error)*[sqrt(n)] =  511.72

Elo interval: ]  -9.77,   55.62[
---------------------------------------

Elo interval for 3-sigma confidence:

Elo rating difference:     22.72 Elo

Lower rating difference:  -26.04 Elo
Upper rating difference:   72.40 Elo

Lower bound uncertainty:  -48.77 Elo
Upper bound uncertainty:   49.68 Elo
Average error:        +/-  49.22 Elo

K = (average error)*[sqrt(n)] =  770.48

Elo interval: ] -26.04,   72.40[
---------------------------------------

Number of games of the match:                245
Score: 53.27 %
Elo rating difference:   22.72 Elo
Draw ratio: 46.12 %

**********************************************
1 sigma:  2.3354 % of the points of the match.
2 sigma:  4.6708 % of the points of the match.
3 sigma:  7.0063 % of the points of the match.
**********************************************

 Error bars were calculated with two-sided tests; values are rounded up to 0.01 Elo, or 0.01 in the case of K.

-------------------------------------------------------------------
Calculation of likelihood of superiority (LOS) in a one-sided test:
-------------------------------------------------------------------

LOS:  91.90 %

This value of LOS is rounded up to 0.01%

End of the calculations. Approximated elapsed time:  57 ms.

Thanks for using LOS_and_Elo_uncertainties_calculator. Press Enter to exit.

After 245 games, Houdini leads by about +23 ± 33 Elo (at ~95.45% confidence, i.e. roughly 21 out of 22 times) with a LOS of about 91.9%, which is not very significant IMHO. Anyway, I think that Houdini will win this match, which is no surprise at all.
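
For anyone who wants to check these numbers by hand, here is a minimal Python sketch. It is not the source of LOS_and_Elo_uncertainties_calculator (which is not shown in this thread); it assumes the usual logistic Elo formula, a per-game variance taken from the win/draw/loss counts, and a normal approximation for LOS. The function names are made up for illustration. Under those assumptions it appears to land on the same figures as the output above (22.72 Elo, 1 sigma ~ 2.3354 % of the points, LOS ~ 91.9 %).

```python
import math

def elo_from_score(p):
    """Logistic Elo difference implied by a score fraction p (0 < p < 1)."""
    return 400.0 * math.log10(p / (1.0 - p))

def match_stats(wins, losses, draws):
    """Return (games, score fraction, standard error of the score, LOS)."""
    n = wins + losses + draws
    mu = (wins + 0.5 * draws) / n
    # Variance of a single game result (1, 0.5 or 0): E[x^2] - mu^2.
    var = (wins + 0.25 * draws) / n - mu * mu
    sigma = math.sqrt(var / n)  # standard error of the mean score
    # Assumed LOS model: one-sided normal test of mu against 50 %.
    los = 0.5 * (1.0 + math.erf((mu - 0.5) / (sigma * math.sqrt(2.0))))
    return n, mu, sigma, los

def elo_interval(mu, sigma, k):
    """Elo bounds at k sigma; only valid while mu +/- k*sigma stays inside (0, 1)."""
    return elo_from_score(mu - k * sigma), elo_from_score(mu + k * sigma)

if __name__ == "__main__":
    n, mu, sigma, los = match_stats(74, 58, 113)
    print(f"games={n}  score={100 * mu:.2f} %  Elo={elo_from_score(mu):.2f}")
    for k in (1, 2, 3):
        low, high = elo_interval(mu, sigma, k)
        print(f"{k}-sigma Elo interval: ]{low:7.2f}, {high:7.2f}[")
    print(f"LOS={100 * los:.2f} %")
```

Note that the interval is not symmetric in Elo because the score bounds are converted to Elo separately, which is why the lower and upper uncertainties in the output differ slightly.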

With the model I use, the score Houdini needs to reach a LOS of 95% is:

Minimum_score_for_no_regression, ® 2012.

 Calculation of the minimum score for no regression (i.e. negative Elo gain) in a match between two engines:

 Write down the number of games of the match (it must be a positive integer, up to 1073741823):

245

Write down the draw ratio (in percentage):

46.1224489795

Write down the confidence level (in percentage) between 75% and 99.9%:

95

Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:

3

Theoretical minimum score for no regression: 53.8356 %
Theoretical standard deviation in this case:  3.8356 %

Minimum number of won points for the engine in this match:       132.0 points.

Minimum Elo advantage, which is also the negative part of the error bar:
 26.9982 Elo

End of the calculations. Approximated elapsed time:  19 ms.

Thanks for using Minimum_score_for_no_regression. Press Enter to exit.

The score should be 132/245, which is very close to the actual 130.5/245. Running Minimum_score_for_no_regression again with 500 games and a draw ratio of 46%, Houdini will reach a LOS of 95% (in a one-sided test) with a score of 263.5/500 = 52.7% (this implies an advantage of ~ 19 Elo, with error bars of around ± 22 or ± 23 Elo at 95% confidence in a two-sided test). So it looks reasonable that Houdini is stronger than this beta 2 IMHO. Thanks for running this match!
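
The threshold can also be sketched in a few lines of Python. Again, this is a guess at the model rather than the actual code of Minimum_score_for_no_regression: it assumes the score must sit z standard errors above 50 %, with the standard error depending on the draw ratio, and that the result is rounded up to the next half point before converting to Elo. The function name is hypothetical. With those assumptions it seems to match the figures quoted above (53.8356 %, 132.0 points, 26.9982 Elo, and 263.5/500 for the full match).

```python
import math
from statistics import NormalDist

def min_score_for_los(n_games, draw_ratio, confidence):
    """Return (score fraction, points rounded up to a half point, Elo at those points)."""
    z = NormalDist().inv_cdf(confidence)  # one-sided quantile, ~1.645 for 95 %
    # Solve mu - 0.5 = z * sqrt((mu * (1 - mu) - draw_ratio / 4) / n) for mu;
    # the closed form below follows from substituting d = mu - 0.5.
    delta = z * math.sqrt((1.0 - draw_ratio) / (4.0 * (n_games + z * z)))
    score = 0.5 + delta
    # Match scores only come in half points, so round up to the next one.
    points = math.ceil(2.0 * score * n_games) / 2.0
    elo = 400.0 * math.log10(points / (n_games - points))
    return score, points, elo

if __name__ == "__main__":
    for games, draws in ((245, 113 / 245), (500, 0.46)):
        score, points, elo = min_score_for_los(games, draws, 0.95)
        print(f"{games} games: {100 * score:.4f} %  ->  {points}/{games}  ({elo:.2f} Elo)")
```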

Regards from Spain.

Ajedrecista.