SAT. UPDATE- 40x(2) v Houdini 2.0c

geots
Posts: 4790
Joined: Sat Mar 11, 2006 12:42 am

SAT. UPDATE- 40x(2) v Houdini 2.0c

Post by geots »

Houdini 2.0c x64 v Engine 40x(2) - Winding Down!


Another 162 games are added to this update, taking us through game 880. That leaves 120 games to go, and they are running as we speak. In the last update, I believe Houdini's lead had dropped by 10 games. In this update, the lead is back up by 10 games, to a 49-game lead. Actually, that is not a hell of a lot of games, considering 880 have been played. After a slow start, 40x(2) has held its own through most of the match.


Intel i5 w/4TCs
Fritz 11 gui
1CPU/64bit
128MB hash
Bases=NONE
Ponder_Learning=OFF
Perfect 12.32 book w/12-move limit

10'+10"
Match=1000 games


Houdini 2.0c x64    +21    +253/-204/=423   53.00%   464.5/880 
Engine 40x(2)       -21    +204/-253/=423   47.00%   415.5/880 


If not tomorrow, by Monday for sure I should be posting the conclusion of this match. Stay tuned.




george
Ajedrecista
Posts: 2164
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: SATURDAY UPDATE - 40x(2) vs. Houdini 2.0c!

Post by Ajedrecista »

Hello:
88% of the match is now completed and error bars are narrowing slowly. This is what I get:

Elo_uncertainties_calculator, ® 2012.

Calculation of Elo uncertainties in a match between two engines:
----------------------------------------------------------------

(The input and output data is referred to the first engine).

Please write down non-negative integers.

Write down the number of wins:

253

Write down the number of loses:

204

Write down the number of draws:

423

***************************************
1-sigma confidence ~ 68.27% confidence.
2-sigma confidence ~ 95.45% confidence.
3-sigma confidence ~ 99.73% confidence.
***************************************

---------------------------------------

Elo interval for 1-sigma confidence:

Elo rating difference:     19.37 Elo

Lower rating difference:   10.93 Elo
Upper rating difference:   27.82 Elo

Lower bound uncertainty:   -8.43 Elo
Upper bound uncertainty:    8.45 Elo
Average error:        +/-   8.44 Elo

K = (average error)*[sqrt(n)] =  250.45

Elo interval: ]  10.93,   27.82[
---------------------------------------

Elo interval for 2-sigma confidence:

Elo rating difference:     19.37 Elo

Lower rating difference:    2.52 Elo
Upper rating difference:   36.31 Elo

Lower bound uncertainty:  -16.85 Elo
Upper bound uncertainty:   16.94 Elo
Average error:        +/-  16.90 Elo

K = (average error)*[sqrt(n)] =  501.20

Elo interval: ]   2.52,   36.31[
---------------------------------------

Elo interval for 3-sigma confidence:

Elo rating difference:     19.37 Elo

Lower rating difference:   -5.90 Elo
Upper rating difference:   44.84 Elo

Lower bound uncertainty:  -25.27 Elo
Upper bound uncertainty:   25.47 Elo
Average error:        +/-  25.37 Elo

K = (average error)*[sqrt(n)] =  752.56

Elo interval: ]  -5.90,   44.84[
---------------------------------------

Number of games of the match:                880
Score: 52.78 %
Elo rating difference:   19.37 Elo
Draw ratio: 48.07 %

**********************************************
1 sigma:  1.2110 % of the points of the match.
2 sigma:  2.4220 % of the points of the match.
3 sigma:  3.6330 % of the points of the match.
**********************************************

End of the calculations. Approximated elapsed time:  31 ms.

Thanks for using Elo_uncertainties_calculator. Press Enter to exit.

The Elo difference is ~19 rather than 21 for the 880 games; you may have been thinking of the Elo difference over the last 162 games, where 400·log10(86/76) ~ 21.47. For 2-sigma confidence (~95.45%), the error bars are now about ±17 Elo. As you can see, I have learnt to measure the elapsed time of my programme's calculations and, so far, it is always 31 ms or 47 ms on my PC... so fast! However, I am almost sure that this time is rounded to a multiple of 15 or 16 ms (I mean that the subroutine I use, CLOCK@, cannot return 25 ms, for instance, because it advances in steps of 15 ms or 16 ms)... so the elapsed time is approximate.
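
For readers who want to check the figures in the output above, here is a minimal Python sketch. It is not the source of Elo_uncertainties_calculator, whose internals are not shown in this thread; it assumes the usual logistic Elo formula and a per-game standard deviation computed from the win/loss/draw counts. The function names are made up for illustration.

```python
import math

def elo_from_score(score):
    """Elo difference implied by a score fraction (logistic model, base-10 log)."""
    return 400.0 * math.log10(score / (1.0 - score))

def match_stats(wins, losses, draws):
    """Return (games, mean score, standard error of the mean score)."""
    n = wins + losses + draws
    mu = (wins + 0.5 * draws) / n
    # Assumed model: each game scores 1, 0.5 or 0; take the variance around mu.
    var = (wins * (1.0 - mu) ** 2
           + losses * mu ** 2
           + draws * (0.5 - mu) ** 2) / n
    sigma = math.sqrt(var / n)
    return n, mu, sigma

def elo_interval(mu, sigma, k):
    """Elo bounds for mu -/+ k standard errors."""
    return elo_from_score(mu - k * sigma), elo_from_score(mu + k * sigma)

n, mu, sigma = match_stats(253, 204, 423)
print(f"games={n}  score={100 * mu:.2f} %  sigma={100 * sigma:.4f} %")
print(f"Elo difference: {elo_from_score(mu):.2f}")
for k in (1, 2, 3):
    low, high = elo_interval(mu, sigma, k)
    print(f"{k}-sigma Elo interval: ]{low:7.2f}, {high:7.2f}[")

# Last 162 games: 86 points against 76, i.e. 400*log10(86/76).
print(f"Elo over the last 162 games: {elo_from_score(86 / 162):.2f}")
```

Under these assumptions the sketch gives the same score (52.78 %), Elo difference (19.37) and intervals as the output above, to within rounding. As a side note, feeding the rounded 53.00% from the results table into the same formula gives about 20.9 Elo, which may be where the +21 comes from, though nothing in the thread confirms how the GUI computes it.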

As for the other programme, it is clearly slower because of the internal calculations needed to obtain the correct parameter for the confidence interval... anyway, it takes less than 0.6 seconds on my PC. Here are the minimum scores Houdini needs (with 880 games played) to remain ahead of Engine 40x(2), including error bars:

95%   confidence: 461 points (approximated elapsed time: 515 ms).
97.5% confidence: 464 points (approximated elapsed time: 532 ms).
98%   confidence: 465 points (approximated elapsed time: 547 ms).

So, Houdini is better (with the results of this match up to 880 games) at between 97.5% and 98% confidence, using my model, which may be similar to EloSTAT. I notice that the draw ratio has risen above 48% for the first time in the match, if I am not mistaken. I will stay tuned for the end of this 1000-game match. Thank you very much, George.

Regards from Spain.

Ajedrecista.
geots
Posts: 4790
Joined: Sat Mar 11, 2006 12:42 am

Re: SATURDAY UPDATE - 40x(2) vs. Houdini 2.0c!

Post by geots »

Thanks Jesus. Can you believe Huggins?! A derivative of Houdini! I wonder if he wants ketchup or mustard to put on his hat. The end should come tonight or tomorrow. I am fast approaching the point where I will have to close out 1 of the 3 GUIs running the match as it gets closer to the end. Fewer than 50 games left.


Best,

george
Ajedrecista
Posts: 2164
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: SATURDAY UPDATE - 40x(2) vs. Houdini 2.0c!

Post by Ajedrecista »

Hello:
Well, it could be the case, although it is not.

I must say that I was doing things slightly wrong with the confidence intervals I computed (no surprise that I went wrong). The results I gave in all your updates were right in the minimum number of points but wrong in the confidence level, because I was computing two-sided tests where one-sided tests were the correct choice (I am very careless with those things). To explain a little more: where I wrote 95% confidence, it is really 97.5% confidence; where I wrote 98% confidence, it is really 99% confidence, and so on. In the general case, where I wrote C% confidence, it is really (50 + C/2)% confidence. So, the version of Minimum_score_for_no_regression that I uploaded is not correct... it is almost correct, and people who downloaded it (I counted at least six downloads, which is a total success for me) can correct the results with the trick of C and (50 + C/2). Sorry for the inconvenience.

Before today, I looked here and immediately noticed the issue. Today, a little thought revealed the reason for the failure. Now my results match those found in CPW perfectly.

I also solved the timing issue: CLOCK@ seems to have a resolution of 1/64 of a second, that is, 15.625 ms. What I do now is the following: count the number of CPU clocks between the start and the end (using the intrinsic routine CPU_CLOCK@() of Fortran 95), divide by the clock rate of the CPU (which must now be input; in my case: 3 GHz), then round to milliseconds. It is an ugly method, but it seems to work fine.
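
The two timing approaches described above can be illustrated with a small Python sketch. It does not call CLOCK@ or CPU_CLOCK@; both functions below are hypothetical stand-ins that only reproduce the arithmetic, assuming a timer that advances in whole ticks of 1/64 s and a cycle counter running at a known, constant clock rate.

```python
TICK_S = 1.0 / 64.0  # assumed timer resolution from the post: 15.625 ms

def tick_reading_ms(ticks, tick_s=TICK_S):
    """Reading of a coarse timer after a whole number of ticks, in whole ms."""
    return round(ticks * tick_s * 1000.0)

def cycles_to_ms(start_cycles, end_cycles, clock_hz):
    """Elapsed time from a cycle counter: cycles / clock rate, in whole ms."""
    return round((end_cycles - start_cycles) / clock_hz * 1000.0)

# A 1/64 s timer can only report multiples of 15.625 ms.
print([tick_reading_ms(t) for t in (1, 2, 3)])

# A cycle counter at an assumed 3 GHz can resolve intermediate values.
print(cycles_to_ms(0, 75_000_000, 3e9))
```

Two and three ticks of 15.625 ms round to 31 ms and 47 ms, which fits the elapsed times reported earlier in the thread, while 75 million cycles at 3 GHz come out as 25 ms, a value the coarse timer could never show. The cycle-count method only holds if the clock rate entered really is the rate at which the counter runs.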

So, running Minimum_scores_for_no_regression again:

98% confidence: 462 points (approximated elapsed time: 514 ms).
99% confidence: 465 points (approximated elapsed time: 525 ms).

So, Houdini is better than 40x(2) with a confidence between 98% and 99% (after those 880 games), which should now be consistent with LOS tables.
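
The thread does not show how Minimum_scores_for_no_regression works internally, so the following Python sketch is only one possible reading: a one-sided normal approximation, with the draw ratio held at its observed value, the variance evaluated at a 50% score, and the threshold rounded up to the next half point. All function names are invented for illustration. It also includes the C to (50 + C/2) relabelling described above.

```python
import math
from statistics import NormalDist

def one_sided_from_two_sided(c_percent):
    """Relabel a two-sided confidence level C as the one-sided level 50 + C/2."""
    return 50.0 + c_percent / 2.0

def sd_points(games, draws):
    """Assumed standard deviation of the total score when both engines are equal."""
    draw_ratio = draws / games
    return math.sqrt(games * (1.0 - draw_ratio) / 4.0)

def min_score(games, draws, confidence_percent):
    """Smallest score, in half-point steps, above 50% at a one-sided confidence."""
    z = NormalDist().inv_cdf(confidence_percent / 100.0)
    threshold = games / 2.0 + z * sd_points(games, draws)
    return math.ceil(2.0 * threshold) / 2.0

def confidence_of_score(points, games, draws):
    """One-sided confidence (as a fraction) that the leader is really stronger."""
    z = (points - games / 2.0) / sd_points(games, draws)
    return NormalDist().cdf(z)

for c in (98.0, 99.0):
    print(f"{c:g}% confidence: {min_score(880, 423, c):g} points")

# The labels used in the earlier post, converted to one-sided levels.
for old in (95.0, 97.5, 98.0):
    new = one_sided_from_two_sided(old)
    print(f"written as {old:g}% -> really {new:g}%: "
          f"{min_score(880, 423, new):g} points")

print(f"464.5/880: {100 * confidence_of_score(464.5, 880, 423):.1f}% confidence")
```

For this match the sketch gives 462 and 465 points at 98% and 99%, and 461, 464 and 465 points for the relabelled levels of the earlier post, in agreement with the figures quoted in both posts. That agreement does not prove the programme uses this exact model.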

Regards from Spain.

Ajedrecista.
geots
Posts: 4790
Joined: Sat Mar 11, 2006 12:42 am

Re: SATURDAY UPDATE - 40x(2) vs. Houdini 2.0c!

Post by geots »

And thank you again for your interest, time and effort. You are greatly appreciated. Stay close by as we move on to bigger and better things. Plus, we still have this one to close out.

george