TUESDAY UPDATE - 40x(2) vs. Houdini 2.0c!


geots
Posts: 4790
Joined: Sat Mar 11, 2006 12:42 am

Post by geots

Houdini 2.0c x64 vs. Engine 40x(2) - UPDATE 6


This update takes us through game 606, meaning 83 games have been added since yesterday's update. Houdini increased his lead by 5 games, to +50 games.

Since I could not access the databases, and there would have been 3 crosstables anyway, I went ahead and computed the Elo difference here myself.



Houdini 2.0c x64    +29   +188/-138/=280   54.13%   328.0/606  
Engine 40x(2)       -29   +138/-188/=280   45.87%   278.0/606
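For anyone who wants to check the +29/-29 figure: it follows from the score alone under the usual logistic Elo formula, D = -400*log10(1/s - 1). The post does not say which formula was used, so the Python sketch below simply assumes that standard one; the function names are illustrative.

```python
import math


def score_fraction(wins: int, losses: int, draws: int) -> float:
    """Fraction of the available points: win = 1, draw = 0.5, loss = 0."""
    return (wins + 0.5 * draws) / (wins + losses + draws)


def elo_from_score(score: float) -> float:
    """Invert the logistic expectation E = 1 / (1 + 10**(-D / 400)) for D."""
    return -400.0 * math.log10(1.0 / score - 1.0)


wins, losses, draws = 188, 138, 280      # Houdini's results after 606 games
s = score_fraction(wins, losses, draws)  # 328.0 / 606
diff = elo_from_score(s)
print(f"lead: {wins - losses:+d} games, score {100 * s:.2f}%, Elo {diff:+.2f}")
```

Rounded to a whole number, the resulting difference is the +29 shown in the table.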


Interestingly, before I started this match (606 games and counting ago), I told a couple of people what I thought the Elo difference between these 2 engines was, with an error margin of plus or minus 5 Elo. With a change of a couple of Elo in a particular direction over the next 394 games, I could easily be dead-on the mark. But that is certainly nothing to brag about; it isn't like this is brain surgery.



Until tomorrow-

george
Ajedrecista
Posts: 2164
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: TUESDAY UPDATE - 40x(2) vs. Houdini 2.0c!

Post by Ajedrecista

Hi George:
I guess that you thought Houdini was ahead of 40x(2) by (+25 ± 5) Elo, that is, between +20 Elo and +30 Elo. If I am right, those Elo advantages correspond to final scores between 529 - 471 and 543 - 457 (in Houdini's favour, of course). In other words, Houdini should score between 201/394 (~ 51.02%) and 215/394 (~ 54.57%) in the remaining games to be between +20 Elo and +30 Elo ahead after 1000 games... who knows?
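The 529 - 471 and 543 - 457 figures can be checked by running the logistic Elo formula forwards. A minimal sketch, assuming the standard logistic curve and rounding the final score to whole points (Ajedrecista's own rounding may differ); `projection` is a hypothetical helper name:

```python
def expected_score(elo_diff: float) -> float:
    """Logistic Elo model: expected score fraction for a given rating edge."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def projection(target_elo: float, total_games: int = 1000,
               points_so_far: float = 328.0):
    """Final score giving `target_elo`, and the points still needed to reach it."""
    final_points = round(total_games * expected_score(target_elo))
    return final_points, final_points - points_so_far


remaining = 1000 - 606  # 394 games left after this update
for target in (20, 30):
    final, needed = projection(target)
    print(f"+{target} Elo: {final} - {1000 - final}, "
          f"needs {needed:.0f}/{remaining} ({100 * needed / remaining:.2f}%)")
```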

Here are my error bars for this match update:


Elo_uncertainties_calculator, ® 2012.

Calculation of Elo uncertainties in a match between two engines:
----------------------------------------------------------------

(The input and output data is referred to the first engine).

Please write down non-negative integers.

Write down the number of wins:

188

Write down the number of loses:

138

Write down the number of draws:

280

***************************************
1-sigma confidence ~ 68.27% confidence.
2-sigma confidence ~ 95.45% confidence.
3-sigma confidence ~ 99.73% confidence.
***************************************

---------------------------------------

Elo interval for 1-sigma confidence:

Elo rating difference:     28.73 Elo

Lower rating difference:   18.40 Elo
Upper rating difference:   39.12 Elo

Lower bound uncertainty:  -10.33 Elo
Upper bound uncertainty:   10.39 Elo
Average error:        +/-  10.36 Elo

K = (average error)*[sqrt(n)] =  255.02

Elo interval: ]  18.40,   39.12[
---------------------------------------

Elo interval for 2-sigma confidence:

Elo rating difference:     28.73 Elo

Lower rating difference:    8.10 Elo
Upper rating difference:   49.57 Elo

Lower bound uncertainty:  -20.64 Elo
Upper bound uncertainty:   20.84 Elo
Average error:        +/-  20.74 Elo

K = (average error)*[sqrt(n)] =  510.51

Elo interval: ]   8.10,   49.57[
---------------------------------------

Elo interval for 3-sigma confidence:

Elo rating difference:     28.73 Elo

Lower rating difference:   -2.19 Elo
Upper rating difference:   60.12 Elo

Lower bound uncertainty:  -30.92 Elo
Upper bound uncertainty:   31.39 Elo
Average error:        +/-  31.15 Elo

K = (average error)*[sqrt(n)] =  766.93

Elo interval: ]  -2.19,   60.12[
---------------------------------------

Number of games of the match:                606
Score: 54.13 %
Elo rating difference:   28.73 Elo
Draw ratio: 46.20 %

**********************************************
1 sigma:  1.4803 % of the points of the match.
2 sigma:  2.9605 % of the points of the match.
3 sigma:  4.4408 % of the points of the match.
**********************************************

End of the calculations.

Thanks for using Elo_uncertainties_calculator. Press Enter to exit.
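The calculator's source is not shown in the thread, but intervals like these can be produced with the common normal approximation on the per-game score. This minimal Python sketch is written under that assumption: it estimates the standard error of the mean score from the win/loss/draw counts, then maps mean ± k*sigma through the logistic Elo formula. With 188/138/280 it appears to match the 1-, 2- and 3-sigma bounds printed above to two decimals.

```python
import math


def elo_from_score(score: float) -> float:
    """Logistic Elo difference for a score fraction in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_interval(wins: int, losses: int, draws: int, k: float = 1.0):
    """Elo estimate plus a k-sigma interval (normal approximation on the score)."""
    n = wins + losses + draws
    mean = (wins + 0.5 * draws) / n
    # Per-game variance E[x^2] - mean^2, with x = 1, 0.5 or 0 points per game.
    variance = (wins + 0.25 * draws) / n - mean ** 2
    sigma = math.sqrt(variance / n)  # standard error of the mean score
    lower = elo_from_score(mean - k * sigma)
    upper = elo_from_score(mean + k * sigma)
    return elo_from_score(mean), lower, upper, sigma


for k in (1, 2, 3):
    elo, lo, hi, sigma = elo_interval(188, 138, 280, k)
    print(f"{k}-sigma: {elo:.2f} Elo, interval ]{lo:.2f}, {hi:.2f}[, "
          f"k*sigma = {100 * k * sigma:.4f} % of the points")
```

The interval is asymmetric in Elo (e.g. -10.33/+10.39 at 1 sigma) simply because the logistic curve is not linear in the score.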
I have just refined this programme in the last part of the code (2 sigma ~ 2.9605%, etc.) because I realized that:


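2d-4*nint(1d6*sigma,KIND=3)  ! Rounds 1-sigma first, then doubles the rounding error.
3d-4*nint(1d6*sigma,KIND=3)  ! Rounds 1-sigma first, then triples the rounding error.
is not the same as: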


1d-4*nint(2d6*sigma,KIND=3)  ! Scales first, then rounds only once.
1d-4*nint(3d6*sigma,KIND=3)  ! Scales first, then rounds only once.
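The correct version is the second code box (which is now in my programme's code). The first one rounds 1-sigma to four decimals before multiplying by 2 or 3, so the rounding error gets doubled or tripled (yet another bad rounding!). That explains some small oddities I had noticed, but I think everything is finally OK.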
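A small Python sketch makes the last-digit difference visible. It assumes a hypothetical `nint` helper that mimics Fortran's NINT (round half away from zero) and recomputes sigma from the 188/138/280 result with the usual normal approximation; it is an illustration, not the author's code.

```python
import math


def nint(x: float) -> int:
    """Round half away from zero, like Fortran's NINT intrinsic."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def old_rounding(k: int, sigma: float) -> float:
    return k * 1e-4 * nint(1e6 * sigma)  # round 1-sigma first, then scale


def new_rounding(k: int, sigma: float) -> float:
    return 1e-4 * nint(k * 1e6 * sigma)  # scale first, round once


# Standard error of the mean score (as a fraction of the points) for +188 -138 =280.
n, wins, draws = 606, 188, 280
mean = (wins + 0.5 * draws) / n
sigma = math.sqrt(((wins + 0.25 * draws) / n - mean ** 2) / n)

for k in (2, 3):
    print(f"{k} sigma: old {old_rounding(k, sigma):.4f} %, "
          f"new {new_rounding(k, sigma):.4f} %")
```

Here 1e6*sigma is about 14802.69, which rounds up to 14803; doubling or tripling that already-rounded value carries the extra 0.3 into the last digit (2.9606 and 4.4409 instead of 2.9605 and 4.4408).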

Regarding the minimum score needed to rule out a negative Elo difference at a given confidence level, this is what I get for this update (using my imperfect model):


90%   confidence: 318   points for Houdini.
95%   confidence: 321   points for Houdini.
98%   confidence: 324   points for Houdini.
99%   confidence: 326.5 points for Houdini.
99.5% confidence: 328.5 points for Houdini.
So, with 328 points, Houdini is better with more than 99% confidence but less than 99.5% confidence after these 606 games! I guess that Houdini is too much Houdini...
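A simple normal model shows where thresholds like these come from. It is not Ajedrecista's "imperfect model" (whose details are not given here). It is a minimal sketch assuming equal strength as the null hypothesis, the observed draw ratio (280/606), and the two-sided "1.96 sigma for 95%" convention used elsewhere in this thread. Its thresholds land within half a point of the table above, and it puts Houdini's actual 328 points at roughly 99.4% confidence.

```python
from statistics import NormalDist


def sd_points(games: int, draws: int) -> float:
    """Std. dev. of total points for equal engines: per-game var = (1 - draw ratio) / 4."""
    return (games * (1 - draws / games) / 4) ** 0.5


def min_points(confidence: float, games: int, draws: int) -> float:
    """Points needed so that a zero Elo difference falls outside the two-sided interval."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return games / 2 + z * sd_points(games, draws)


def confidence_for(points: float, games: int, draws: int) -> float:
    """Two-sided confidence level at which `points` just reaches the threshold."""
    z = (points - games / 2) / sd_points(games, draws)
    return 2 * NormalDist().cdf(z) - 1


for c in (0.90, 0.95, 0.98, 0.99, 0.995):
    print(f"{100 * c:5.1f}% confidence: {min_points(c, 606, 280):.2f} points")
print(f"328 points -> {100 * confidence_for(328, 606, 280):.2f}% confidence")
```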

In another update I posted the following info:


Write down the confidence level (in percentage) between 75% and 99.9%: 

95 

Calculating... 

Theoretical minimum score for no regression: 53.5564 % 
Theoretical standard deviation in this case:  1.8145 %
This standard deviation is just one standard deviation (roundings included). Since 95% confidence corresponds to roughly 1.96 sigma, you can see that (53.5564 - 50)/1.8145 ~ 1.96... but I should print 3.5564% directly instead of 1.8145% to avoid confusion, just as I do in Elo_uncertainties_calculator, where I found my latest rounding error (my bad). So I am going for a very short fix: only two characters (add k*)!


1d-4*nint(1d6*sigma(5),KIND=3)  ! The one that can bring confusion.

1d-4*nint(1d6*k*sigma(5),KIND=3)  ! The best choice.
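The fix amounts to printing the margin k*sigma rather than sigma itself. In Python terms, using the numbers from that earlier update (95% confidence, sigma = 1.8145 %); the variable names are illustrative and not taken from the author's source:

```python
from statistics import NormalDist

confidence = 0.95
k = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for two-sided 95 %
sigma_pct = 1.8145                              # one standard deviation, in % of the points
margin_pct = k * sigma_pct                      # what the corrected line prints (k*sigma)
print(f"k = {k:.4f}, margin = {margin_pct:.4f} %, minimum score = {50 + margin_pct:.4f} %")
```

The printed margin is the 3.5564 % that separates the 53.5564 % minimum score from 50 %.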
Thank you very much for this match. I'll stay tuned for the next update!

Regards from Spain.

Ajedrecista.
geots
Posts: 4790
Joined: Sat Mar 11, 2006 12:42 am

Post by geots




Thank you for your work. I won't pretend I understand your methods, but knowing your ability, I am quite sure they are accurate. I really do appreciate your interest. All I actually said before the match was that, from what I had seen of 40x, I felt its upgrade 40x(2) would be 32 Elo weaker than Houdini, with an error bar of plus or minus 5 Elo. From my point of view, the biggest question was exactly how much of an improvement 40x(2) would be over 40x. Even if I turn out to be right, my observations could easily have been just a lucky guess.

I am not sure whether there will be a Wednesday update here. Windows wanted to restart to finish installing its updates, and even though those security updates can be very important at times, I put it off about as long as I could. I finally clicked on "hold off for 4 more hours" and went to bed. Naturally, when I got back no games were playing. I haven't yet checked how many have been played since the 606-game mark, but quite likely not enough for an update. I shall see. And again, thank you.


Best,

george