2- 500 Game Matches Cut Short- RESULTS & REASONS!

geots
Posts: 4790
Joined: Sat Mar 11, 2006 12:42 am

Post by geots »

Houdini 2.0c x64 vs Rainbow 1.0 beta

I thought that giving Rainbow another generic book and a GUI other than Fritz might make a difference. But at the 240-game mark it was obvious to me that Houdini was not going to be caught in this match. Core time is valuable to me, so I called the coroner:


Intel i5 w/4TCs
Shredder 11 gui
1CPU/64bit
128MB hash
Bases=NONE
Ponder_Learning=OFF
HS-Book 2.0 bkt. w/12-move limit
10'+10"

Match=240 game stoppage*

Engine              Elo    W/L/D          Score    Points
Houdini 2.0c x64    +26    +71/-53/=116   53.75%   129.0/240
Rainbow 1.0 beta    -26    +53/-71/=116   46.25%   111.0/240

So goes this match- now to the next one:




Houdini 2.0c x64 vs Rainbow Limited- beta 2


Now this was the match where I supposedly put it all on the line. And to be honest, I was mistaken. I thought "Limited- beta 2" had a better-than-even chance of taking Houdini 2.0c. I was wrong, so another call to the coroner:


Intel i5 w/4TCs
Fritz 13 gui
1CPU/64bit
128MB hash
Bases=NONE
Ponder_Learning=OFF
Perfect 12.32 book w/12-move limit
5'+5"

Match=330 game stoppage*

Engine                     Elo    W/L/D           Score    Points
Houdini 2.0c x64           +28    +101/-74/=155   54.09%   178.5/330
Rainbow Limited- beta 2    -28    +74/-101/=155   45.91%   151.5/330

And if you will notice, after 240 games and 330 games, the Elo differences in the two matches are pretty much the same (+26 and +28 in Houdini's favor).
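
For anyone who wants to check the Elo column by hand, here is a minimal Python sketch. It assumes the usual logistic Elo model; the post does not say which formula the GUI or rating tool used, and the function name `elo_diff` is made up for illustration.

```python
import math

def elo_diff(score: float) -> float:
    """Elo difference implied by a score fraction in (0, 1), logistic model (assumed)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

# Houdini's points and game counts, taken from the two result tables.
matches = [
    ("vs Rainbow 1.0 beta", 129.0, 240),
    ("vs Rainbow Limited- beta 2", 178.5, 330),
]

for name, points, games in matches:
    print(f"Houdini {name}: {elo_diff(points / games):+.1f} Elo")
```

Under that assumption both scores land within a couple of Elo of each other, which is consistent with the rounded +26 and +28 shown in the tables.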

So this brings us up to date, and here is where hopefully there may be a surprise. Notice I said "may". Do not misunderstand me: my life does not hinge on beating Houdini. Rather, I want to provide you with the most exciting match possible. And what could be more exciting than an opponent taking Houdini right down to the wire in a 300- or 400-game match? I would consider an opponent that could "just keep it very interesting" a success.

Which brings us to the next subject. Stay tuned- don't leave us yet!!



george
Ajedrecista
Posts: 2178
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: 2- 500 Game Matches Cut Short - RESULTS & REASONS!

Post by Ajedrecista »

Hi George!
Thank you very much for your efforts. Good try by Rainbow! But Houdini is still the king. I ran LOS_and_Elo_uncertainties_calculator and obtained these results:

For Rainbow 1.0 beta, after 240 games:

LOS_and_Elo_uncertainties_calculator, ® 2012.

----------------------------------------------------------------
Calculation of Elo uncertainties in a match between two engines:
----------------------------------------------------------------

(The input and output data is referred to the first engine).

Please write down non-negative integers.

Write down the number of wins:

53

Write down the number of loses:

71

Write down the number of draws:

116

Write down the clock rate of the CPU (in GHz), only for timing the elapsed time
of the calculations:

3

(Only 1, 2 and 3-sigma confidence error bars are calculated, if possible).

***************************************
1-sigma confidence ~ 68.27% confidence.
2-sigma confidence ~ 95.45% confidence.
3-sigma confidence ~ 99.73% confidence.
***************************************

---------------------------------------

Elo interval for 1-sigma confidence:

Elo rating difference:    -26.11 Elo

Lower rating difference:  -42.30 Elo
Upper rating difference:  -10.03 Elo

Lower bound uncertainty:  -16.19 Elo
Upper bound uncertainty:   16.08 Elo
Average error:        +/-  16.13 Elo

K = (average error)*[sqrt(n)] =  249.96

Elo interval: ] -42.30,  -10.03[
---------------------------------------

Elo interval for 2-sigma confidence:

Elo rating difference:    -26.11 Elo

Lower rating difference:  -58.67 Elo
Upper rating difference:    6.01 Elo

Lower bound uncertainty:  -32.57 Elo
Upper bound uncertainty:   32.11 Elo
Average error:        +/-  32.34 Elo

K = (average error)*[sqrt(n)] =  501.02

Elo interval: ] -58.67,    6.01[
---------------------------------------

Elo interval for 3-sigma confidence:

Elo rating difference:    -26.11 Elo

Lower rating difference:  -75.31 Elo
Upper rating difference:   22.07 Elo

Lower bound uncertainty:  -49.21 Elo
Upper bound uncertainty:   48.18 Elo
Average error:        +/-  48.69 Elo

K = (average error)*[sqrt(n)] =  754.31

Elo interval: ] -75.31,   22.07[
---------------------------------------

Number of games of the match:                240
Score: 46.25 %
Elo rating difference:  -26.11 Elo
Draw ratio: 48.33 %

**********************************************
1 sigma:  2.3072 % of the points of the match.
2 sigma:  4.6145 % of the points of the match.
3 sigma:  6.9217 % of the points of the match.
**********************************************

 Error bars were calculated with two-sided tests; values are rounded up to 0.01
Elo, or 0.01 in the case of K.

-------------------------------------------------------------------
Calculation of likelihood of superiority (LOS) in a one-sided test:
-------------------------------------------------------------------

LOS:   5.20 %

This value of LOS is rounded up to 0.01%

End of the calculations. Approximated elapsed time:  49 ms.

Thanks for using LOS_and_Elo_uncertainties_calculator. Press Enter to exit.
More or less -26 ± 32 Elo with ~95.45% confidence; the LOS value calculated by my programme is ~5.2%. Using the model that does not count draws for LOS, proposed by Rémi Coulom in this post (the last equation), I get a LOS value of ~5.35% using Derive 6 (I used Rémi's method to check the validity of my calculation).
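
As a rough cross-check of a draw-free LOS, here is a minimal Python sketch. It uses a normal approximation that is widely quoted for LOS and in which draws do not appear. That this is close to the equation in Rémi's post is an assumption, and it is not the method used inside LOS_and_Elo_uncertainties_calculator. The function name `los_ignoring_draws` is hypothetical.

```python
import math

def los_ignoring_draws(wins: int, losses: int) -> float:
    """Likelihood of superiority from wins and losses only (normal approximation).

    Draws are deliberately left out: in this model they carry no
    information about which engine is stronger.
    """
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))

# Rainbow 1.0 beta against Houdini: 53 wins, 71 losses (116 draws ignored).
print(f"LOS: {100.0 * los_ignoring_draws(53, 71):.2f} %")
```

For these counts the approximation lands in the same region as the two values quoted above, a little over 5%.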

------------------------

For Rainbow Limited - beta 2, after 330 games:

LOS_and_Elo_uncertainties_calculator, ® 2012.

----------------------------------------------------------------
Calculation of Elo uncertainties in a match between two engines:
----------------------------------------------------------------

(The input and output data is referred to the first engine).

Please write down non-negative integers.

Write down the number of wins:

74

Write down the number of loses:

101

Write down the number of draws:

155

Write down the clock rate of the CPU (in GHz), only for timing the elapsed time
of the calculations:

3

(Only 1, 2 and 3-sigma confidence error bars are calculated, if possible).

***************************************
1-sigma confidence ~ 68.27% confidence.
2-sigma confidence ~ 95.45% confidence.
3-sigma confidence ~ 99.73% confidence.
***************************************

---------------------------------------

Elo interval for 1-sigma confidence:

Elo rating difference:    -28.49 Elo

Lower rating difference:  -42.48 Elo
Upper rating difference:  -14.60 Elo

Lower bound uncertainty:  -13.99 Elo
Upper bound uncertainty:   13.89 Elo
Average error:        +/-  13.94 Elo

K = (average error)*[sqrt(n)] =  253.24

Elo interval: ] -42.48,  -14.60[
---------------------------------------

Elo interval for 2-sigma confidence:

Elo rating difference:    -28.49 Elo

Lower rating difference:  -56.60 Elo
Upper rating difference:   -0.75 Elo

Lower bound uncertainty:  -28.11 Elo
Upper bound uncertainty:   27.74 Elo
Average error:        +/-  27.93 Elo

K = (average error)*[sqrt(n)] =  507.31

Elo interval: ] -56.60,   -0.75[
---------------------------------------

Elo interval for 3-sigma confidence:

Elo rating difference:    -28.49 Elo

Lower rating difference:  -70.91 Elo
Upper rating difference:   13.10 Elo

Lower bound uncertainty:  -42.42 Elo
Upper bound uncertainty:   41.59 Elo
Average error:        +/-  42.01 Elo

K = (average error)*[sqrt(n)] =  763.08

Elo interval: ] -70.91,   13.10[
---------------------------------------

Number of games of the match:                330
Score: 45.91 %
Elo rating difference:  -28.49 Elo
Draw ratio: 46.97 %

**********************************************
1 sigma:  1.9917 % of the points of the match.
2 sigma:  3.9833 % of the points of the match.
3 sigma:  5.9750 % of the points of the match.
**********************************************

 Error bars were calculated with two-sided tests; values are rounded up to 0.01
Elo, or 0.01 in the case of K.

-------------------------------------------------------------------
Calculation of likelihood of superiority (LOS) in a one-sided test:
-------------------------------------------------------------------

LOS:   2.00 %

This value of LOS is rounded up to 0.01%

End of the calculations. Approximated elapsed time:  51 ms.

Thanks for using LOS_and_Elo_uncertainties_calculator. Press Enter to exit.
More or less -28 ± 28 Elo with ~95.45% confidence; the LOS value calculated by my programme is ~2%. Using Rémi's method again, I get a LOS value of ~2.08%. IMHO both models give similar results. :)
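
The error bars for both matches can be approximated from the win/loss/draw counts alone. The following is a minimal Python sketch, not the calculator's source: it assumes each game is an independent result worth 1, 0.5 or 0 points, takes the standard error of the mean score, and maps score ± n·sigma to Elo with the logistic model. The names `elo_diff` and `elo_interval` are made up for illustration.

```python
import math

def elo_diff(score: float) -> float:
    """Elo difference implied by a score fraction in (0, 1), logistic model (assumed)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_interval(wins: int, losses: int, draws: int, n_sigma: float = 2.0):
    """Return (lower, centre, upper) Elo for the first engine.

    Assumes score +/- n_sigma * sigma stays strictly inside (0, 1).
    """
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance: E[x^2] - E[x]^2 with x in {1, 0.5, 0}.
    variance = (wins + 0.25 * draws) / n - score ** 2
    sigma = math.sqrt(variance / n)  # standard error of the mean score
    return (
        elo_diff(score - n_sigma * sigma),
        elo_diff(score),
        elo_diff(score + n_sigma * sigma),
    )

for label, w, l, d in [("Rainbow 1.0 beta", 53, 71, 116),
                       ("Rainbow Limited- beta 2", 74, 101, 155)]:
    low, mid, high = elo_interval(w, l, d, n_sigma=2.0)
    print(f"{label}: {mid:.2f} Elo, 2-sigma interval ]{low:.2f}, {high:.2f}[")
```

Because the score-to-Elo mapping is nonlinear, the lower and upper uncertainties come out slightly asymmetric, as they do in the program output above.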

------------------------

I wish you very good luck with your new project. I hope that more people can help you! I also wish good luck to the programmers. I will stay tuned to your posts.

Regards from Spain.

Ajedrecista.