WCEC- Battle for No.1- Wed.- UPDATE 6!

geots
Posts: 4790
Joined: Sat Mar 11, 2006 12:42 am

Post by geots »

Houdini 2.0c x64 vs Rainbow 1.0 beta


At the 200-game mark, Houdini has increased its lead by 3 games. But interestingly enough, the Elo difference remains the same. So either Houdini has increased its advantage or beta 1 is still holding steady, whichever way of looking at it floats your boat.


Intel i5 w/4TCs
Shredder 11 gui
1CPU/64bit
128MB hash
Bases=NONE
Ponder_Learning=OFF
HS-Book 2.0 bkt. w/12-move limit

10'+10"
Match=500 games


[after 200 games]

Houdini 2.0c x64    +33    +59/-40/=101   54.75%   109.5/200
Rainbow 1.0 beta    -33    +40/-59/=101   45.25%    90.5/200 


Now I am not quite sure the next update will take us to the halfway mark. Maybe, maybe not. So there is way more than enough time left for beta 1 to make a couple of nice runs. OTOH, making up 19 games against Houdini can be problematic.
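For anyone who wants to check the +33 in the table: it follows from the score percentage through the usual logistic Elo formula. A minimal sketch, assuming that formula is what the GUI uses (the function name `elo_difference` is just for illustration):

```python
import math

def elo_difference(wins, losses, draws):
    """Elo difference implied by a match score (logistic Elo model)."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games  # fraction of points won
    # Invert the expected-score formula: score = 1 / (1 + 10^(-elo/400))
    return -400.0 * math.log10(1.0 / score - 1.0)

# Houdini's result after 200 games: +59/-40/=101
print(f"{elo_difference(59, 40, 101):.2f} Elo")
```

Because the Elo figure depends only on the score percentage, a lead can grow in games while the Elo difference stays about the same as the total number of games grows.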

Hopefully Jesus will stop by and give us some insight into possibilities.




In search of another match update-

george
Ajedrecista
Posts: 2170
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: WCEC - Battle for No.1 - Wed. - UPDATE 6!

Post by Ajedrecista »

Hello!
geots wrote:
Houdini 2.0c x64 vs Rainbow 1.0 beta

[after 200 games]

Houdini 2.0c x64    +33    +59/-40/=101   54.75%   109.5/200
Rainbow 1.0 beta    -33    +40/-59/=101   45.25%    90.5/200

Hopefully Jesus will stop by and give us some insight into possibilities.

Rainbow is getting into a bit of trouble... another way of saying the same thing is that Houdini is too much Houdini! And the new Houdini 3 is getting closer, and it will not be worse than Houdini 2.0c... Here are my results (remember, take them with a lot of care!):

LOS_and_Elo_uncertainties_calculator, ® 2012.

----------------------------------------------------------------
Calculation of Elo uncertainties in a match between two engines:
----------------------------------------------------------------

(The input and output data is referred to the first engine).

Please write down non-negative integers.

Write down the number of wins:

59

Write down the number of loses:

40

Write down the number of draws:

101

Write down the clock rate of the CPU (in GHz), only for timing the elapsed time
of the calculations:

3

(Only 1, 2 and 3-sigma confidence error bars are calculated, if possible).

***************************************
1-sigma confidence ~ 68.27% confidence.
2-sigma confidence ~ 95.45% confidence.
3-sigma confidence ~ 99.73% confidence.
***************************************

---------------------------------------

Elo interval for 1-sigma confidence:

Elo rating difference:     33.11 Elo

Lower rating difference:   15.89 Elo
Upper rating difference:   50.49 Elo

Lower bound uncertainty:  -17.22 Elo
Upper bound uncertainty:   17.38 Elo
Average error:        +/-  17.30 Elo

K = (average error)*[sqrt(n)] =  244.62

Elo interval: ]  15.89,   50.49[
---------------------------------------

Elo interval for 2-sigma confidence:

Elo rating difference:     33.11 Elo

Lower rating difference:   -1.25 Elo
Upper rating difference:   68.12 Elo

Lower bound uncertainty:  -34.35 Elo
Upper bound uncertainty:   35.01 Elo
Average error:        +/-  34.68 Elo

K = (average error)*[sqrt(n)] =  490.49

Elo interval: ]  -1.25,   68.12[
---------------------------------------

Elo interval for 3-sigma confidence:

Elo rating difference:     33.11 Elo

Lower rating difference:  -18.39 Elo
Upper rating difference:   86.11 Elo

Lower bound uncertainty:  -51.50 Elo
Upper bound uncertainty:   53.00 Elo
Average error:        +/-  52.25 Elo

K = (average error)*[sqrt(n)] =  738.90

Elo interval: ] -18.39,   86.11[
---------------------------------------

Number of games of the match:                200
Score: 54.75 %
Elo rating difference:   33.11 Elo
Draw ratio: 50.50 %

**********************************************
1 sigma:  2.4647 % of the points of the match.
2 sigma:  4.9294 % of the points of the match.
3 sigma:  7.3941 % of the points of the match.
**********************************************

 Error bars were calculated with two-sided tests; values are rounded up to 0.01
Elo, or 0.01 in the case of K.

-------------------------------------------------------------------
Calculation of likelihood of superiority (LOS) in a one-sided test:
-------------------------------------------------------------------

LOS:  97.30 %

This value of LOS is rounded up to 0.01%

End of the calculations. Approximated elapsed time:  56 ms.

Thanks for using LOS_and_Elo_uncertainties_calculator. Press Enter to exit.
A LOS value of ~97.3% means that Houdini would turn out not to be better than Rainbow in only about 1 in 37 cases (1/(1 - 0.973) ≈ 37)! So Houdini is very likely to be better than Rainbow, although the match is only 40% complete (200 of 500 games). A very difficult task for Rainbow...
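The LOS figure can be approximated with a few lines of Python. This is only a sketch under a normal approximation, where the standard deviation of the score is computed from the win/draw/loss counts; it is an assumption, not necessarily the exact method of the calculator above, and the function names are made up for illustration:

```python
import math

def score_and_sigma(wins, losses, draws):
    """Mean score and the standard deviation of that mean (normal approximation)."""
    n = wins + losses + draws
    s = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score
    var = (wins * (1.0 - s) ** 2 + losses * s ** 2 + draws * (0.5 - s) ** 2) / n
    return s, math.sqrt(var / n)

def los(wins, losses, draws):
    """One-sided likelihood of superiority: P(true score > 50%)."""
    s, sigma = score_and_sigma(wins, losses, draws)
    z = (s - 0.5) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

s, sigma = score_and_sigma(59, 40, 101)
p = los(59, 40, 101)
print(f"1 sigma: {100 * sigma:.4f} % of the points")
print(f"LOS: {100 * p:.2f} %, i.e. about 1 in {1 / (1 - p):.0f} against")
```

With +59/-40/=101 this lands very close to the 1-sigma and LOS values in the output above, which suggests (but does not prove) that the calculator works along similar lines.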

Regarding error bars, Houdini has the lead with ~ +33 ± 35 Elo (with 2-sigma confidence). The interval still includes zero, so it is a little too soon to make predictions, but Houdini clearly has the edge in my inexperienced POV. I take this occasion to wish good luck to all the programmers, testers, opening book makers...
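The error bars can be sketched the same way: shift the score by k standard deviations in each direction and convert both ends to Elo. Again, this is a normal-approximation sketch with hypothetical function names, not necessarily the calculator's exact method:

```python
import math

def to_elo(score):
    """Elo difference implied by a score fraction (logistic Elo model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_interval(wins, losses, draws, k=2.0):
    """Elo interval at k-sigma confidence (two-sided, normal approximation)."""
    n = wins + losses + draws
    s = (wins + 0.5 * draws) / n
    var = (wins * (1.0 - s) ** 2 + losses * s ** 2 + draws * (0.5 - s) ** 2) / n
    sigma = math.sqrt(var / n)  # standard deviation of the mean score
    low, high = s - k * sigma, s + k * sigma
    if low <= 0.0 or high >= 1.0:
        raise ValueError("interval leaves (0, 1); Elo bounds are undefined")
    return to_elo(low), to_elo(high)

lower, upper = elo_interval(59, 40, 101, k=2.0)
print(f"2-sigma Elo interval: ]{lower:.2f}, {upper:.2f}[")
```

The interval is not symmetric around +33 because the score-to-Elo conversion is nonlinear, which is why the lower and upper uncertainties in the output differ slightly.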

Regards from Spain.

Ajedrecista.