Houdini vs Rainbow Limited- AN EPIC STRUGGLE!

geots
Posts: 4790
Joined: Sat Mar 11, 2006 12:42 am

Houdini vs Rainbow Limited- AN EPIC STRUGGLE!

Post by geots »

Houdini 2.0c x64 vs Rainbow Limited- beta 2


There is no way I would stop this match on purpose, and since it is being run in the Fritz 13 GUI, I have no access to the database of games at this time. This control of 5'+5" obviously goes quite a bit faster, and a while after posting last night's update, with Houdini holding a 2-game lead, I checked the match again. Houdini had gone on a mini-run of sorts and upped the lead from 2 games to 10 games! And almost 24 hours later, it still remains at 10 games. As in the other match, Houdini takes a lead and it seems to hold there. I would love to see the PGNs of the games where Houdini went from a 2-game lead to a 10-game lead, but that is not possible at this time. At least not that I know of.


Intel i5 w/4TCs

Fritz 13 gui
1CPU/64bit
128MB hash
Bases=NONE
Ponder_Learning=OFF
Perfect 12.32 book w/12-move limit

5'+5"
Match=500 games


[thru game 175]

Engine                     Elo    +W/-L/=D      Score    Points
Houdini 2.0c x64           +20    +51/-41/=83   52.86%   92.5/175
Rainbow Limited- beta 2    -20    +41/-51/=83   47.14%   82.5/175


When I run matches, I don't have favorites. All I want to see is an exciting match, which means the closer the better. So naturally I am pulling for Rainbow to even this thing up. On the one hand, I think he is quite capable of it. But OTOH, Houdini is just so damn strong that it is really difficult to deal with him in OTB play. But we have 325 games remaining, and a lot of things can happen in that many games.




Again- we shall see,

george
Ajedrecista
Posts: 1971
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: Houdini vs. Rainbow Limited - AN EPIC STRUGGLE!

Post by Ajedrecista »

Hi George!
A very interesting match, just like the other one that you are running in parallel. Congratulations.

I have been holding off on posting my LOS and error-bar results until you had a decent number of games. Of course I could wait until the end of the match, but I cannot wait any longer!

LOS_and_Elo_uncertainties_calculator, ® 2012.

----------------------------------------------------------------
Calculation of Elo uncertainties in a match between two engines:
----------------------------------------------------------------

(The input and output data is referred to the first engine).

Please write down non-negative integers.

Write down the number of wins:

51

Write down the number of loses:

41

Write down the number of draws:

83

Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:

3

(Only 1, 2 and 3-sigma confidence error bars are calculated, if possible).

***************************************
1-sigma confidence ~ 68.27% confidence.
2-sigma confidence ~ 95.45% confidence.
3-sigma confidence ~ 99.73% confidence.
***************************************

---------------------------------------

Elo interval for 1-sigma confidence:

Elo rating difference:     19.88 Elo

Lower rating difference:    0.87 Elo
Upper rating difference:   39.00 Elo

Lower bound uncertainty:  -19.01 Elo
Upper bound uncertainty:   19.12 Elo
Average error:        +/-  19.07 Elo

K = (average error)*[sqrt(n)] =  252.21

Elo interval: ]   0.87,   39.00[
---------------------------------------

Elo interval for 2-sigma confidence:

Elo rating difference:     19.88 Elo

Lower rating difference:  -18.13 Elo
Upper rating difference:   58.36 Elo

Lower bound uncertainty:  -38.01 Elo
Upper bound uncertainty:   38.49 Elo
Average error:        +/-  38.25 Elo

K = (average error)*[sqrt(n)] =  505.96

Elo interval: ] -18.13,   58.36[
---------------------------------------

Elo interval for 3-sigma confidence:

Elo rating difference:     19.88 Elo

Lower rating difference:  -37.24 Elo
Upper rating difference:   78.09 Elo

Lower bound uncertainty:  -57.11 Elo
Upper bound uncertainty:   58.22 Elo
Average error:        +/-  57.67 Elo

K = (average error)*[sqrt(n)] =  762.85

Elo interval: ] -37.24,   78.09[
---------------------------------------

Number of games of the match:                175
Score: 52.86 %
Elo rating difference:   19.88 Elo
Draw ratio: 47.43 %

**********************************************
1 sigma:  2.7320 % of the points of the match.
2 sigma:  5.4639 % of the points of the match.
3 sigma:  8.1959 % of the points of the match.
**********************************************

 Error bars were calculated with two-sided tests; values are rounded up to 0.01 Elo, or 0.01 in the case of K.

-------------------------------------------------------------------
Calculation of likelihood of superiority (LOS) in a one-sided test:
-------------------------------------------------------------------

LOS:  85.22 %

This value of LOS is rounded up to 0.01%

End of the calculations. Approximated elapsed time:  46 ms.

Thanks for using LOS_and_Elo_uncertainties_calculator. Press Enter to exit.
I slightly improved this programme (mainly cosmetic changes for extreme cases) with some 'go to' commands to skip part of the code when necessary, especially for very low numbers of games, where LOS was not calculated... although with so few games, my result must not be taken seriously. With these 175 games, my LOS value is more or less correct (I guess an absolute error of 0.2% at most). So, from Rainbow's point of view, around -20 ± 38 Elo with 2-sigma confidence: Rainbow is in the fight! Also, a LOS value of ~ 85% is not conclusive.
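
For readers who want to check figures like these, here is a minimal Python sketch. It is not the source of LOS_and_Elo_uncertainties_calculator, which is not shown in this thread; it assumes a plain normal approximation with the per-game variance taken from the win/draw/loss counts, and the names `elo_from_score` and `match_stats` are made up for illustration. Under those assumptions it lands very close to the numbers printed above for +51/-41/=83.

```python
import math


def elo_from_score(score: float) -> float:
    """Logistic Elo difference for an expected score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_stats(wins: int, losses: int, draws: int, sigmas: float = 2.0) -> dict:
    """Elo estimate, error bar and LOS for the first engine.

    Assumption: normal approximation; the match must contain a mix of
    results so that the standard error is not zero.
    """
    n = wins + losses + draws
    mean = (wins + 0.5 * draws) / n                 # score per game
    var = (wins + 0.25 * draws) / n - mean ** 2     # per-game variance
    se = math.sqrt(var / n)                         # standard error of the score
    # One-sided likelihood of superiority: P(true score > 50%).
    los = 0.5 * (1.0 + math.erf((mean - 0.5) / (se * math.sqrt(2.0))))
    return {
        "score": mean,
        "sigma": se,
        "elo": elo_from_score(mean),
        "elo_low": elo_from_score(mean - sigmas * se),
        "elo_high": elo_from_score(mean + sigmas * se),
        "los": los,
    }


if __name__ == "__main__":
    stats = match_stats(51, 41, 83, sigmas=2.0)
    print(f"Score {100 * stats['score']:.2f} %, Elo {stats['elo']:.2f}")
    print(f"2-sigma interval ]{stats['elo_low']:.2f}, {stats['elo_high']:.2f}[")
    print(f"LOS {100 * stats['los']:.2f} %")
```

The Elo bounds are obtained by shifting the score by the chosen number of standard errors and converting each end to Elo, which is why the interval is slightly asymmetric.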

Regarding my other programme, I have made huge improvements over the past days, mainly by optimizing the internal loops with the initial conditions... for the same internal accuracy (or even a little more, though the difference is never noticeable in the output), it now runs around 25 times faster! Here is an example with this match:

Minimum_score_for_no_regression, ® 2012.

 Calculation of the minimum score for no regression (i.e. negative Elo gain) in a match between two engines:

 Write down the number of games of the match (it must be a positive integer, up to 1073741823):

175

Write down the draw ratio (in percentage):

47.4285714285714286

Write down the confidence level (in percentage) between 75% and 99.9%:

95

Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:

3

Theoretical minimum score for no regression: 54.4732 %
Theoretical standard deviation in this case:  4.4732 %

Minimum number of won points for the engine in this match:        95.5 points.

Minimum Elo advantage, which is also the negative part of the error bar:
 31.8545 Elo

End of the calculations. Approximated elapsed time:  19 ms.

Thanks for using Minimum_score_for_no_regression. Press Enter to exit.
Typical times are around 20 ms on my computer, whereas they were around 515 ms in the past! The development of these two programmes should be finished now. Houdini would need 95.5 points out of 175 to ensure a LOS value of 95% or slightly more... it is a little way off, but only a little! There is still a fight.
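
The threshold above can also be approximated in closed form. The Python sketch below is an assumption about the underlying model, not the code of Minimum_score_for_no_regression: it takes the per-game variance as score*(1 - score) - draw_ratio/4, uses a one-sided normal quantile, and solves score - 0.5 = z*sqrt(variance/n) for the score. The function name `min_score_for_no_regression` and its return layout are illustrative only. With 175 games, an 83/175 draw ratio and 95% confidence it gives values very close to those printed above.

```python
import math
from statistics import NormalDist


def min_score_for_no_regression(games: int, draw_ratio: float, confidence: float):
    """Smallest score whose one-sided lower bound still reaches 50 %.

    Assumption: normal approximation with per-game variance
    s*(1 - s) - draw_ratio/4, solved exactly for s.
    Returns (score, points rounded up to the next half point, Elo of those points).
    """
    z = NormalDist().inv_cdf(confidence)    # one-sided normal quantile
    k = z * z / games
    # With d = s - 0.5:  d^2 * (1 + k) = k * (1 - draw_ratio) / 4
    d = math.sqrt(k * (1.0 - draw_ratio) / (4.0 * (1.0 + k)))
    score = 0.5 + d
    points = math.ceil(2.0 * score * games) / 2.0   # next half point
    elo = -400.0 * math.log10(games / points - 1.0)
    return score, points, elo


if __name__ == "__main__":
    score, points, elo = min_score_for_no_regression(175, 83 / 175, 0.95)
    print(f"Minimum score: {100 * score:.4f} %")
    print(f"Minimum points: {points} out of 175")
    print(f"Elo at that number of points: {elo:.4f}")
```

In this sketch the Elo figure is computed from the rounded-up number of points rather than from the theoretical score, which is what reproduces the 31.85 Elo shown in the output.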

Regarding Rainbow, is it a new branch of a 40x engine? Please keep up the good work.

Regards from Spain.

Ajedrecista.
geots
Posts: 4790
Joined: Sat Mar 11, 2006 12:42 am

Re: Houdini vs. Rainbow Limited - AN EPIC STRUGGLE!

Post by geots »



Your figures are way over my head, but that doesn't matter. What I love is looking at your results. I don't have to know anything about the methods you use to enjoy them. I look forward to them every time I run a match like this. I have been hoping you would respond soon. As to your question about Rainbow possibly being a branch of this or that: when I say I think you know who the author is, I'm fairly sure I am answering your question.
:wink: :wink:


Thanks Jesus,

george
carldaman
Posts: 2283
Joined: Sat Jun 02, 2012 2:13 am

Re: Houdini vs. Rainbow Limited - AN EPIC STRUGGLE!

Post by carldaman »

Hi George,

Will you make the games available for download after the match is over?

Thanks,
Carl
geots
Posts: 4790
Joined: Sat Mar 11, 2006 12:42 am

Re: Houdini vs. Rainbow Limited - AN EPIC STRUGGLE!

Post by geots »

Hi Carl. This is strictly up to the programmer, not me. I would say it is not likely, but there is precedent. If I am wrong, I feel quite sure Ingo will correct me, but in his IPON testing I don't think he EVER makes the PGNs available to the public. That is his choice, and I don't have a problem with it. As for me, I always make them available unless the author requests that I don't.


Best,

george
Robert Flesher
Posts: 1280
Joined: Tue Aug 18, 2009 3:06 am

Re: Houdini vs. Rainbow Limited - AN EPIC STRUGGLE!

Post by Robert Flesher »

Hello George, this is a very interesting match. In your opinion, why would the author of a chess engine not want the PGN of the games shown? This seems absurd to me and raises the question: what does the author have to hide?