UPDATE NO. 3- Tues.- for Houdini & Rainbow!

geots · Post by **geots** » Tue Jul 10, 2012 3:31 pm

Houdini 2.0c x64 vs Rainbow UNLimited

This match could get interesting fast if Rainbow gets a few wins in a row. Tho he still remains down- even by another game or 2- he has knocked the hell out of Houdini's elo lead. Eaten 13 points off the lead since the last update.

I know- but is the glass half-full or half-empty?!

Intel i5 w/4TCs
Fritz 13 gui
1CPU/64bit
128MB hash
Bases=NONE
Ponder_Learning=OFF
Perfect 12.32 book w/12-move limit
5'+5"
Match=500 games

[thru game 195]

Houdini 2.0c x64     +27    +59/-44/=92   53.80%   105.0/195
Rainbow UNLimited    -27    +44/-59/=92   46.20%    90.0/195

Rainbow needs a mini-run of sorts. He has plenty of time- if he has the smarts.

And tomorrow-

george

Ajedrecista · Post by **Ajedrecista** » Tue Jul 10, 2012 5:44 pm

Hi George!

Houdini is a hard nut to crack, it is not a surprise.

I have made many improvements to my programme LOS_and_Elo_uncertainties_calculator. I have rewritten a big part of its code so that it accepts any confidence level between 65% and 99.9% (the limits were chosen arbitrarily; the input is later rounded up to 0.01%). This gives the programme greater versatility than before, when it could only do its calculations for 1, 2 and 3-sigma confidence (~ 68.27%, ~ 95.45% and ~ 99.73%).

It now also calculates, in some cases, the LOS value in the way proposed by Rémi Coulom in the last equation of this post, which does not take draws into account. This calculation is very slow (compared with the normal-distribution approach, which does take draws into account) when the number of wins and losses is very high, so I do not calculate it if wins + losses > 16000 (for example, with wins + losses = 20000 I had problems, maybe overflows, and I did not want to deal with them). The two values differ less and less as the number of wins and losses increases (I suppose this is due to the central limit theorem), so one value is enough in that case. If wins + losses < 16001, the two LOS values can be compared.

It is very possible that some bugs remain (introduced in this rewrite), although I have fixed many of them. Here are the results of my programme for this match (up to 195 games), from Rainbow's POV:

LOS_and_Elo_uncertainties_calculator, ® 2012.

----------------------------------------------------------------
Calculation of Elo uncertainties in a match between two engines:
----------------------------------------------------------------

(The input and output data is referred to the first engine).

Please write down non-negative integers.

Write down the number of wins (up to 1825361100):

44

Write down the number of loses (up to 1825361100):

59

Write down the number of draws (up to 2147483646):

92

 Write down the confidence level (in percentage) between 65% and 99.9% (it will be rounded up to 0.01%):

95

Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:

3

---------------------------------------
Elo interval for 95.00 % confidence:

Elo rating difference:    -26.78 Elo

Lower rating difference:  -62.64 Elo
Upper rating difference:    8.52 Elo

Lower bound uncertainty:  -35.86 Elo
Upper bound uncertainty:   35.30 Elo
Average error:        +/-  35.58 Elo

K = (average error)*[sqrt(n)] =  496.82

Elo interval: ] -62.64,    8.52[
---------------------------------------

Number of games of the match:       195
Score: 46.15 %
Elo rating difference:    -26.78 Elo
Draw ratio: 47.18 %

*********************************************************
Standard deviation:  5.0717 % of the points of the match.
*********************************************************

 Error bars were calculated with two-sided tests; values are rounded up to 0.01 Elo, or 0.01 in the case of K.

-------------------------------------------------------------------
Calculation of likelihood of superiority (LOS) in a one-sided test:
-------------------------------------------------------------------

LOS (taking into account draws) is always calculated, if possible.

LOS (not taking into account draws) is only calculated if wins + loses < 16001.

LOS (average value) is calculated only when LOS (not taking into account draws) is calculated.
______________________________________________

LOS:   6.86 % (taking into account draws).
LOS:   7.05 % (not taking into account draws).
LOS:   6.95 % (average value).
______________________________________________

These values of LOS are rounded up to 0.01%

End of the calculations. Approximated elapsed time:   54 ms.

Thanks for using LOS_and_Elo_uncertainties_calculator. Press Enter to exit.
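
For anyone who wants to check these figures by hand, here is a minimal Python sketch. It is not the programme's actual code, which is not shown in this thread. It assumes the usual logistic Elo formula, a normal approximation for the mean score (each game counted as 1, 0.5 or 0), and, for the LOS without draws, a uniform prior on the win probability among decisive games, which reduces to a binomial sum. Whether that sum is exactly the equation from Rémi Coulom referred to above is an assumption. The names `match_stats` and `score_to_elo` are made up for this illustration.

```python
from math import comb, log10, sqrt
from statistics import NormalDist


def score_to_elo(score):
    """Logistic Elo difference for a score fraction strictly between 0 and 1."""
    return 400.0 * log10(score / (1.0 - score))


def match_stats(wins, losses, draws, confidence=0.95):
    """Elo interval and two LOS estimates, seen from the first engine."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n

    # Per-game variance of the result, then standard deviation of the mean.
    variance = (wins + 0.25 * draws) / n - score ** 2
    sigma = sqrt(variance / n)

    # Two-sided interval on the score, converted to Elo at both ends.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * sigma

    # LOS with draws: normal approximation, one-sided.
    los_with_draws = NormalDist().cdf((score - 0.5) / sigma)

    # LOS without draws (assumed form): P(Binomial(w + l + 1, 1/2) <= w),
    # computed with exact integers so large counts do not overflow.
    m = wins + losses + 1
    los_without_draws = sum(comb(m, k) for k in range(wins + 1)) / 2 ** m

    return {
        "elo": score_to_elo(score),
        "elo_lower": score_to_elo(score - half_width),
        "elo_upper": score_to_elo(score + half_width),
        "half_width": half_width,
        "los_with_draws": los_with_draws,
        "los_without_draws": los_without_draws,
    }


stats = match_stats(44, 59, 92)  # Rainbow's +44/-59/=92
for name, value in stats.items():
    print(f"{name:>18}: {value:.4f}")
```

With Rainbow's +44/-59/=92 this should land close to the output above. Note that the "Standard deviation: 5.0717 %" line appears to match the half-width of the 95% interval on the score (z times sigma), not sigma itself.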

Regarding my other programme, I have only made small cosmetic changes (such as using a LOS value rounded up to 0.01%, printing this rounded value, and renaming 'confidence' to 'LOS' in the code). In this match, Houdini should score 106 points out of 195 (with the model I use) to reach a minimum LOS of 95%:

Minimum_score_for_no_regression, ® 2012.

 Calculation of the minimum score for no regression (i.e. negative Elo gain) in a match between two engines:

 Write down the number of games of the match (it must be a positive integer, up to 1073741823):

195

Write down the draw ratio (in percentage):

47.1794871794871795

 Write down the likelihood of superiority (in percentage) between 75% and 99.9% (LOS will be rounded up to 0.01%):

95

Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:

3

Theoretical minimum score for no regression: 54.2510 %
Theoretical standard deviation in this case:  4.2510 %

Minimum number of won points for the engine in this match:       106.0 points.

Minimum Elo advantage, which is also the negative part of the error bar:
 30.3663 Elo (for a LOS value of 95.00 %).

End of the calculations. Approximated elapsed time:  20 ms.

Thanks for using Minimum_score_for_no_regression. Press Enter to exit.
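
The numbers above can be reproduced with a short Python sketch. This is a reconstruction, not the programme's own code. It assumes the minimum score s solves s - 0.5 = z * sqrt((s(1 - s) - D/4) / n), where D is the draw ratio, n the number of games and z the one-sided normal quantile for the chosen LOS. That equation has the closed form used below. It also assumes the points are rounded up to the next half point and that the Elo value is taken from the rounded points. The function names are hypothetical.

```python
from math import ceil, log10, sqrt
from statistics import NormalDist


def minimum_score(games, draw_ratio, los=0.95):
    """Smallest score fraction whose one-sided LOS reaches `los` (assumed model)."""
    z = NormalDist().inv_cdf(los)
    # Closed-form solution of (s - 0.5)^2 = z^2 * (s*(1 - s) - D/4) / n.
    margin = z * sqrt((1.0 - draw_ratio) / (4.0 * (games + z * z)))
    return 0.5 + margin


def minimum_points(games, score):
    """Round the required points up to the next half point."""
    return ceil(2.0 * games * score) / 2.0


def score_to_elo(score):
    """Logistic Elo difference for a score fraction strictly between 0 and 1."""
    return 400.0 * log10(score / (1.0 - score))


games = 195
draw_ratio = 92 / 195  # 47.1794...% as entered above
score = minimum_score(games, draw_ratio, los=0.95)
points = minimum_points(games, score)
elo = score_to_elo(points / games)

print(f"minimum score : {100 * score:.4f} %")
print(f"minimum points: {points:.1f} of {games}")
print(f"Elo at minimum: {elo:.4f}")
```

Under these assumptions the margin above 50% is the same quantity the programme prints as "theoretical standard deviation", i.e. z times the standard deviation of the mean score at the minimum.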

If you do not mind, and if I do not forget, I will upload my two programmes in the topic where you post the end of this match, so that everybody who wants to can use them at their own risk. These programmes would be more user-friendly if they had GUIs, but of course that is an impossible task for me.

Please keep up your good work and sorry for this long post.

Regards from Spain.

Ajedrecista.

geots · Post by **geots** » Tue Jul 10, 2012 6:27 pm

It is fine with me, Jesus. You can upload whenever you choose to. And your threads are NEVER too long. I don't know about the others- but I feel sure they all enjoy learning from you as much as I do.

One mistake on my part, cheating Rainbow out of a point. I said it had knocked 13 elo from Houdini's lead since the last update, but 41-27 is obviously 14, not 13.

And I am heading to get some sleep- and I just checked the match first. Rainbow has moved from 15 games behind to 13 games behind, and the elo diff. has dropped from the 27 elo in this update down to 22 elo right now.

After 8 hours or so when I wake up- I may post another update- tho generally I'm not big on posting 2 different ones on the same day. Just depends on the number of games added, etc. But if Rainbow can just hold Houdini's lead at 13 games right now- the extra games will bring the elo difference down more still. Problem so far for Rainbow has been that when he makes up 2 or 3 games on the lead, he can't draw a few and then add a few more before Houdini does. He needs to decrease Houdini's lead and hold it there- not let him turn around and get it right back.

We shall see-

george
