Houdini & Rainbow- FINAL UPDATE? 500/500

geots · Post by **geots** » Mon Jul 16, 2012 11:47 am

Houdini 2.0c x64 vs Rainbow UNLimited

Things would be a lot simpler if the last 40 games since the previous update had not taken a different turn. Instead of a 26 game and +20 elo lead, Rainbow makes a late run and it ends a bit differently.

Which puts me in a bad position. I couldn't get out of the bed all day and night with my back acting up- and post it when I wanted to. And I couldn't reach anyone for advice- so I'm sorta stuck. I gotta decide if I think the chance that Rainbow could make up this difference and possibly win the match is a credible enough thought to carry it to 1000 games or not.

Perhaps I will get some advice from Jesus.

Intel i5 w/4TCs
Fritz 13 gui
1CPU/64bit
128MB hash
Bases=NONE
Ponder_Learning=OFF
Perfect 12.32 book w/12-move limit
5'+5"
Match=500 games

[thru 500 games]

Houdini 2.0c x64     +15    +140/-119/=241   52.10%   260.5/500
Rainbow UNLimited    -15    +119/-140/=241   47.90%   239.5/500

Well, I can sleep a couple hours and: decide to stop it here or run another 500 and go to 1000 or maybe have a massive heart attack while asleep and not have to worry with the crap any longer.

Cornered- from inception to resurrection. (I could blame it on my mother and father who abandoned me when I was 3 days old. You think?)

Ho-hum,

george

Uri Blass · Post by **Uri Blass** » Mon Jul 16, 2012 12:07 pm

I think that it is not very interesting to test Rainbow against a weak version of Houdini.

1.5 seems to be better than 2c when we are not talking about blitz, and I did not buy 2c precisely because I saw no convincing evidence that it is stronger than the free version 1.5.

From the CEGT 40/20 rating list

Rank  Engine                     Elo    +    -   Games   Score   Av.Opp.
   1  Houdini 1.5a x64 1CPU     3013   14   14    1698   68.6%      2877
   2  Houdini 2.0c x64 1CPU     3002   15   15    1293   63.1%      2909

From the CCRL 40/40 rating list

Engine                 Elo     +     −   Score   Av.Opp.   Draws   Games
Houdini 1.5a 64-bit   3156   +17   −17   64.1%     −93.3   42.4%    1169
Houdini 2.0c 64-bit   3144   +16   −16   67.3%    −121.8   36.2%    1334

If you do use Houdini 2, then Houdini 2s is significantly stronger than 2c, if other people are to be believed (I do not have Houdini 2 myself). Maybe not at the very fast time controls that Robert Houdart tests at, but at longer time controls.

The results between 2s and 2c at blitz 5+3 after 1001 games are

+318, =473, -210 for 2s, so I guess it is better to stop testing 2c and start testing 2s.

http://rybkaforum.net/cgi-bin/rybkaforu ... ?tid=25226

Ajedrecista · Post by **Ajedrecista** » Mon Jul 16, 2012 12:46 pm

Hello George:

My advice is meaningless because my knowledge of statistics is limited. Running my programmes:

LOS_and_Elo_uncertainties_calculator, ® 2012.

----------------------------------------------------------------
Calculation of Elo uncertainties in a match between two engines:
----------------------------------------------------------------

(The input and output data is referred to the first engine).

Please write down non-negative integers.

Write down the number of wins (up to 1825361100):

119

Write down the number of loses (up to 1825361100):

140

Write down the number of draws (up to 2147483646):

241

 Write down the confidence level (in percentage) between 65% and 99.9% (it will
be rounded up to 0.01%):

95

Write down the clock rate of the CPU (in GHz), only for timing the elapsed time
of the calculations:

3

---------------------------------------
Elo interval for 95.00 % confidence:

Elo rating difference:    -14.60 Elo

Lower rating difference:  -36.61 Elo
Upper rating difference:    7.29 Elo

Lower bound uncertainty:  -22.01 Elo
Upper bound uncertainty:   21.89 Elo
Average error:        +/-  21.95 Elo

K = (average error)*[sqrt(n)] =  490.79

Elo interval: ] -36.61,    7.29[
---------------------------------------

Number of games of the match:       500
Score: 47.90 %
Elo rating difference:  -14.60 Elo
Draw ratio: 48.20 %

*********************************************************
Standard deviation:  3.1489 % of the points of the match.
*********************************************************

 Error bars were calculated with two-sided tests; values are rounded up to 0.01
Elo, or 0.01 in the case of K.

-------------------------------------------------------------------
Calculation of likelihood of superiority (LOS) in a one-sided test:
-------------------------------------------------------------------

LOS (taking into account draws) is always calculated, if possible.

LOS (not taking into account draws) is only calculated if wins + loses < 16001.

LOS (average value) is calculated only when LOS (not taking into account draws)
is calculated.
______________________________________________

LOS:   9.56 % (taking into account draws).
LOS:   9.63 % (not taking into account draws).
LOS:   9.60 % (average value).
______________________________________________

These values of LOS are rounded up to 0.01%

End of the calculations. Approximated elapsed time:   58 ms.

Thanks for using LOS_and_Elo_uncertainties_calculator. Press Enter to exit.

From Rainbow's POV, after 500 games there is a difference of more or less -15 ± 22 Elo with 95% confidence. LOS is around 9.6%, which IMHO is not very significant in Houdini's favour, although Houdini has the upper hand.
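
For anyone who wants to check these figures without the Fortran programmes, here is a minimal Python sketch. It assumes a plain normal approximation on the per-game score (win = 1, draw = 0.5, loss = 0) and the usual logistic Elo formula. The names `elo_from_score` and `match_stats` are made up for illustration, and this is not necessarily how LOS_and_Elo_uncertainties_calculator works internally.

```python
import math
from statistics import NormalDist


def elo_from_score(score):
    """Elo difference implied by an expected score (0 < score < 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_stats(wins, losses, draws, confidence=0.95):
    """Elo difference, two-sided Elo interval and one-sided LOS for the first engine.

    Assumption: the mean score is treated as normally distributed.
    """
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result around the mean score.
    var = (wins * (1.0 - score) ** 2
           + losses * score ** 2
           + draws * (0.5 - score) ** 2) / n
    sigma = math.sqrt(var / n)  # standard error of the mean score
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # about 1.96 for 95%
    return {
        "score": score,
        "elo": elo_from_score(score),
        "elo_low": elo_from_score(score - z * sigma),
        "elo_high": elo_from_score(score + z * sigma),
        # LOS: probability that the true score is above 50%.
        "los": NormalDist().cdf((score - 0.5) / sigma),
    }


# Rainbow's result in this match: +119 -140 =241
rainbow = match_stats(119, 140, 241)
print({key: round(value, 4) for key, value in rainbow.items()})
```

For +119 -140 =241 this gives roughly -14.6 Elo, an interval of about -36.6 to +7.3 Elo and a LOS near 9.6%, in line with the output above.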

-----------------------

Minimum_score_for_no_regression, ® 2012.

 Calculation of the minimum score for no regression (i.e. negative Elo gain) in
a match between two engines:

 Write down the number of games of the match (it must be a positive integer, up
to 1073741823):

500

Write down the draw ratio (in percentage):

48.2

 Write down the likelihood of superiority (in percentage) between 75% and 99.9%
(LOS will be rounded up to 0.01%):

97.5

Write down the clock rate of the CPU (in GHz), only for timing the elapsed time
of the calculations:

3
_______________________________________________________________________________

Theoretical minimum score for no regression: 53.1422 %
Theoretical standard deviation in this case:  3.1422 %

Minimum number of won points for the engine in this match:       266.0 points.

Minimum Elo advantage, which is also the negative part of the error bar:
 22.2663 Elo (for a LOS value of 97.50 %).

A LOS value of 97.50 % is equivalent to 95.00 % confidence in a two-sided test.
_______________________________________________________________________________

End of the calculations. Approximated elapsed time:  17 ms.

Thanks for using Minimum_score_for_no_regression. Press Enter to exit.

Houdini would have needed to score at least 266 points (53.2% of 500 games) to get a LOS > 97.5% = 39/40; its 260.5 points fall a bit short of that.
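
The 53.14% threshold can be reproduced with a short calculation. The sketch below assumes the same normal approximation: with draw ratio d and score s, the per-game variance is s - d/4 - s^2, and the minimum score solves s = 0.5 + z*sqrt(variance/n). The closed form and the name `min_score_for_los` are my own reconstruction, not taken from Minimum_score_for_no_regression.

```python
import math
from statistics import NormalDist


def min_score_for_los(games, draw_ratio, los):
    """Smallest score fraction whose one-sided LOS reaches `los`.

    Assumption: normal approximation with per-game variance
    s - draw_ratio/4 - s**2, solved in closed form for s.
    """
    z = NormalDist().inv_cdf(los)
    k = z * z / games
    return 0.5 + 0.5 * math.sqrt(k * (1.0 - draw_ratio) / (1.0 + k))


games = 500
s_min = min_score_for_los(games, draw_ratio=0.482, los=0.975)
# Points come in halves, so round up to the next half point.
min_points = math.ceil(2 * s_min * games) / 2
# Elo advantage corresponding to that rounded-up number of points.
p = min_points / games
elo_at_min_points = 400.0 * math.log10(p / (1.0 - p))

print(f"minimum score: {100 * s_min:.4f} %")
print(f"minimum points: {min_points}")
print(f"Elo at that score: {elo_at_min_points:.4f}")
```

Under these assumptions the result is about 53.14%, 266.0 points and 22.27 Elo, matching the programme's output.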

-----------------------

Regarding the minimum number of games needed to confirm a 14.6 Elo difference with a given LOS:

LOS > 90%:   n >  930 games.
LOS > 95%:   n > 1532 games.
LOS > 97.5%: n > 2176 games.
LOS > 99%:   n > 3064 games.
LOS > 99.5%: n > 3756 games.
LOS > 99.9%: n > 5406 games.

I hope that these numbers are correct. Maybe 1000 games are not enough, but 2000 would be. Thousands of games would need insane amounts of time, so 500 would be fine... Houdini could do a slaughter in the next 500 games! Or Rainbow could hold... who knows? I am sorry, but I do not know which is the best choice.
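
As a rough cross-check of the table of game counts, here is a sketch that assumes a normal approximation with the no-draw (binomial) variance s(1 - s). It lands within a couple of games of each posted figure, but whether the original programme uses exactly this formula is an assumption on my part. The function name `games_needed` is hypothetical.

```python
from statistics import NormalDist


def games_needed(elo_diff, los, draw_ratio=0.0):
    """Approximate number of games for a true `elo_diff` to reach the given LOS.

    Assumption: normal approximation; draw_ratio=0 gives the binomial
    variance s*(1 - s), a non-zero draw ratio reduces the variance.
    """
    s = 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))  # expected score
    var = s - draw_ratio / 4.0 - s * s             # per-game variance
    z = NormalDist().inv_cdf(los)
    return z * z * var / (s - 0.5) ** 2


for los in (0.90, 0.95, 0.975, 0.99, 0.995, 0.999):
    print(f"LOS > {100 * los:g}%: n > {games_needed(14.6, los):.0f} games")
```

Passing the match's 48.2% draw ratio as `draw_ratio` shrinks the variance and roughly halves these estimates, so the table is probably best read as a cautious upper estimate.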

-----------------------

@Uri: I get around +37.6 ± 15.7 Elo for 95% confidence and LOS values of almost 100%. If it is true, then the s settings would be clearly stronger than the default ones. Bearing in mind that Robert tests with thousands of games, it looks strange to me that he has not reached similar settings; I also read from him that he tested various settings (s, z and T4) and that they are not significant improvements. Here is the post:

Re: Houdini 2.0 : Settings (Z, T3, Baracuda, Baracuda T3)

Who knows?

-----------------------

Going off-topic: I promised to upload my three programmes when this 500-game match finished. Here is the download link:

Three_Fortran_programmes.rar (0.64 MB)

I hope that they are useful and easy to use.

Regards from Spain.

Ajedrecista.

gerold · Post by **gerold** » Mon Jul 16, 2012 3:02 pm

Is Rainbow a clone of Houdini? If so, you could run 2000 games and still come up with the same results.

geots · Post by **geots** » Mon Jul 16, 2012 7:26 pm

Uri Blass wrote:I think that it is not very interesting testing rainbow against a weak version of houdini

Strange. Based on the fact that CCRL 40/40 at 1 CPU says that Komodo 3 64-bit is stronger than Critter 1.4 64-bit, I would have to do my own testing there. And that is version 3! A typo or a testing error? So don't assume that any rating list is foolproof. (I did not say it is written in stone that CCRL is wrong about K3 and Critter 1.4. Just that those results seem so screwy to me that, if I cared, I would have to run my own tests.)

Secondly, I cannot help where you got your info- but if it shows that Houdini 2.0c is weaker at blitz than 1.5a- your info is all bad. I really think your first mistake is judging an engine and quoting results when you have never owned it or run one game with it. That begins your problems- and they just magnify from there. Garbage in-garbage out.

gts

geots · Post by **geots** » Mon Jul 16, 2012 7:57 pm

Jesus, I am going to be brutally honest with you- because that is the only way I can operate. This thread of yours means to me that I should go out and buy you a trophy of some kind for the most in-depth, and most likely true, analysis of any match by any statistics guy in the history of computer chess. As you said- it may not all be perfect- but I cannot answer to that. What I can attest to is that you have not gone the proverbial "extra mile" to help me here, but rather, in comparison, an extra 5 miles. And all for the reason of trying to help me. And for that I will be eternally grateful. Plus, with no qualms at all, I have complete faith and trust in your analysis.

And I do thank you for the link/links you promised. I think anyone who delves in and deals with these areas would be a fool not to check it out.

I want to say in closing that you can get hung up in a huge mess if you try to follow a lot of Uri's lines of thought. But he is a good guy and means no harm. And usually his ideas have a lot of merit.

Again- thank you so much.

george

PS: I am going to let these 500 be the end of this match. Bigger fish to fry, and you will really be interested in this!!! TRUST ME!

geots · Post by **geots** » Mon Jul 16, 2012 8:07 pm

Since you are the only other person breathing who knows the true authors of Engine 40x and Rainbow- I would tell you now my next plans. It's wild! And I could get some set-up advice from you which I need- but I have no email address to reach you, and I am just sick of PMs.

Best,

george

geots · Post by **geots** » Mon Jul 16, 2012 8:13 pm

gerold wrote:Is Rainbow a clone of Houdini. If so you could run 2000 games and still come up with the same results.

No Gerold, it would not- I don't think- be possible for it or Engine 40x to be further from Houdini than they both are. I can firmly 100% with surety tell you they are both in the top 5 of "most original engines". That is a promise from me to you.

Best,

george

ernest · Post by **ernest** » Tue Jul 17, 2012 1:27 am

Uri Blass wrote:testing rainbow against a weak version of houdini

If you use houdini2 than houdini2s is significantly stronger than 2c if to believe other people

Uri, I have seen you in better inspired posts...

Uri Blass · Post by **Uri Blass** » Tue Jul 17, 2012 6:44 am

geots wrote:
Strange. Based on the fact that CCRL 40/40 at 1 CPU says that Komodo 3 64-bit is stronger than Critter 1.4 64-bit, I would have to do my own testing there. And that is version 3! A typo or a testing error? So don't assume that any rating list is foolproof. (I did not say it is written in stone that CCRL is wrong about K3 and Critter 1.4. Just that those results seem so screwy to me that, if I cared, I would have to run my own tests.)

Secondly, I cannot help where you got your info- but if it shows that Houdini 2.0c is weaker at blitz than 1.5a- your info is all bad. I really think your first mistake is judging an engine and quoting results when you have never owned it or run one game with it. That begins your problems- and they just magnify from there. Garbage in-garbage out.

gts

If we talk about blitz 40/4, then Houdini 2c is probably slightly stronger than Houdini 1.5a, based on the CEGT list

Rank  Engine                   Elo    +    -   Games   Score   Av.Opp.   Draws
   9  Houdini 2.0 x64 1CPU    3028   11   11    2800   73.8%      2848   31.8%
  13  Houdini 1.5 x64 1CPU    3019    9    9    4950   72.3%      2853   30.3%

40/4 means 6 seconds per move.

At 40/20 (30 seconds per move) it seems to be the opposite.

Your time control of 5+5 is on average between 6 seconds per move and 30 seconds per move, so I can say that the difference between 2c and 1.5a is probably small.
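
The per-move arithmetic can be sketched as follows. The game lengths used for the 5'+5" control (40 and 60 moves) are assumptions for illustration only, since the real average depends on how long the games run. Both function names are hypothetical.

```python
def seconds_per_move_classical(moves, minutes):
    """Average time per move for an 'N moves in M minutes' control such as 40/4."""
    return minutes * 60.0 / moves


def seconds_per_move_increment(base_minutes, increment_seconds, game_moves):
    """Average time per move for base time plus increment over an assumed game length."""
    return base_minutes * 60.0 / game_moves + increment_seconds


print(seconds_per_move_classical(40, 4))     # CEGT blitz 40/4
print(seconds_per_move_classical(40, 20))    # CEGT 40/20
print(seconds_per_move_increment(5, 5, 40))  # 5'+5" over an assumed 40-move game
print(seconds_per_move_increment(5, 5, 60))  # 5'+5" over an assumed 60-move game
```

With either assumed game length, 5'+5" works out to somewhere between the 6 seconds of 40/4 and the 30 seconds of 40/20.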

Note also that 2s is probably stronger than both 2c and 1.5a.
