I tried a little match at game/4 minutes between the newly released Crafty 23.5 and Arasan 15.0, on an i5-2500K with both engines having 2 cores and 256M hash. Both programs used their own books.
Results were:
+63 -61 =76
for Arasan over 200 games (50.5%). I know the error bars are large here, so I am not concluding much from this. But Arasan scored similarly against Crafty 23.4 in my hyper-bullet testing:
+1891 -2018 =1091 5000 total 48.73%
--Jon
Crafty-Arasan match
Moderator: Ras
-
jdart
- Posts: 4420
- Joined: Fri Mar 10, 2006 5:23 am
- Location: http://www.arasanchess.org
-
Ajedrecista
- Posts: 2170
- Joined: Wed Jul 13, 2011 9:04 pm
- Location: Madrid, Spain.
Re: Crafty-Arasan match.
Hello Jon:
You are right about the first test (± 38 Elo with 95% confidence and a LOS value of only 57%).
Taking a look at the second match:
LOS is telltale here: only 2.11%... but you are certainly close to a true legend such as Crafty! Please keep up your good work and you will get even closer to it.
Regards from Spain.
Ajedrecista.
jdart wrote:
I tried a little match at game/4 minutes between the newly released Crafty 23.5 and Arasan 15.0, on an i5-2500K with both engines having 2 cores and 256M hash. Both programs used their own books.
Results were:
+63 -61 =76
for Arasan over 200 games (50.5%). I know the error bars are large here, so I am not concluding much from this. But Arasan scored similarly against Crafty 23.4 in my hyper-bullet testing:
+1891 -2018 =1091 5000 total 48.73%
--Jon
Running my Fortran 95 programme LOS_and_Elo_uncertainties_calculator:
LOS_and_Elo_uncertainties_calculator, ® 2012.
----------------------------------------------------------------
Calculation of Elo uncertainties in a match between two engines:
----------------------------------------------------------------
(The input and output data is referred to the first engine).
Please write down non-negative integers.
Maximum number of games supported: 2147483647.
Write down the number of wins (up to 1825361100):
63
Write down the number of loses (up to 1825361100):
61
Write down the number of draws (up to 2147483523):
76
Write down the confidence level (in percentage) between 65% and 99.9% (it will be rounded up to 0.01%):
95
Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:
3
---------------------------------------
Elo interval for 95.00 % confidence:
Elo rating difference: 3.47 Elo
Lower rating difference: -34.55 Elo
Upper rating difference: 41.58 Elo
Lower bound uncertainty: -38.02 Elo
Upper bound uncertainty: 38.11 Elo
Average error: +/- 38.07 Elo
K = (average error)*[sqrt(n)] = 538.34
Elo interval: ] -34.55, 41.58[
---------------------------------------
Number of games of the match: 200
Score: 50.50 %
Elo rating difference: 3.47 Elo
Draw ratio: 38.00 %
*********************************************************
Standard deviation: 5.4559 % of the points of the match.
*********************************************************
Error bars were calculated with two-sided tests; values are rounded up to 0.01 Elo, or 0.01 in the case of K.
-------------------------------------------------------------------
Calculation of likelihood of superiority (LOS) in a one-sided test:
-------------------------------------------------------------------
LOS (taking into account draws) is always calculated, if possible.
LOS (not taking into account draws) is only calculated if wins + loses < 16001.
LOS (average value) is calculated only when LOS (not taking into account draws) is calculated.
______________________________________________
LOS: 57.13 % (taking into account draws).
LOS: 57.10 % (not taking into account draws).
LOS: 57.11 % (average value).
______________________________________________
These values of LOS are rounded up to 0.01%
End of the calculations. Approximated elapsed time: 73 ms.
Thanks for using LOS_and_Elo_uncertainties_calculator. Press Enter to exit.
Taking a look at the second match:
LOS_and_Elo_uncertainties_calculator, ® 2012.
----------------------------------------------------------------
Calculation of Elo uncertainties in a match between two engines:
----------------------------------------------------------------
(The input and output data is referred to the first engine).
Please write down non-negative integers.
Maximum number of games supported: 2147483647.
Write down the number of wins (up to 1825361100):
1891
Write down the number of loses (up to 1825361100):
2018
Write down the number of draws (up to 2147479738):
1091
Write down the confidence level (in percentage) between 65% and 99.9% (it will be rounded up to 0.01%):
95
Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:
3
---------------------------------------
Elo interval for 95.00 % confidence:
Elo rating difference: -8.83 Elo
Lower rating difference: -17.35 Elo
Upper rating difference: -0.31 Elo
Lower bound uncertainty: -8.52 Elo
Upper bound uncertainty: 8.51 Elo
Average error: +/- 8.52 Elo
K = (average error)*[sqrt(n)] = 602.36
Elo interval: ] -17.35, -0.31[
---------------------------------------
Number of games of the match: 5000
Score: 48.73 %
Elo rating difference: -8.83 Elo
Draw ratio: 21.82 %
*********************************************************
Standard deviation: 1.2249 % of the points of the match.
*********************************************************
Error bars were calculated with two-sided tests; values are rounded up to 0.01 Elo, or 0.01 in the case of K.
-------------------------------------------------------------------
Calculation of likelihood of superiority (LOS) in a one-sided test:
-------------------------------------------------------------------
LOS (taking into account draws) is always calculated, if possible.
LOS (not taking into account draws) is only calculated if wins + loses < 16001.
LOS (average value) is calculated only when LOS (not taking into account draws) is calculated.
______________________________________________
LOS: 2.11 % (taking into account draws).
LOS: 2.11 % (not taking into account draws).
LOS: 2.11 % (average value).
______________________________________________
These values of LOS are rounded up to 0.01%
End of the calculations. Approximated elapsed time: 93 ms.
Thanks for using LOS_and_Elo_uncertainties_calculator. Press Enter to exit.
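For anyone who wants to check figures like these without the programme, here is a minimal Python sketch. It is not the Fortran code and may not use the same method: it assumes the usual logistic Elo formula, a normal approximation with the empirical win/draw/loss variance, and a two-sided interval on the score that is then converted to Elo. The names `score_to_elo` and `match_stats` are made up for this illustration. Under those assumptions it lands very close to the Elo difference, the 95% bounds and the draw-aware LOS printed above for both matches.

```python
import math
from statistics import NormalDist


def score_to_elo(score: float) -> float:
    """Convert a score fraction (0 < score < 1) to an Elo difference (logistic model)."""
    return 400.0 * math.log10(score / (1.0 - score))


def match_stats(wins: int, losses: int, draws: int, confidence: float = 0.95) -> dict:
    """Elo difference, confidence bounds and LOS from the first engine's point of view.

    Assumption: normal approximation, so it is only sensible for a reasonable
    number of games and a score that is not 0% or 100%.
    """
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n

    # Empirical per-game variance of the result (1 for a win, 0.5 draw, 0 loss).
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / n
    std_err = math.sqrt(variance / n)  # standard error of the mean score

    # Two-sided interval on the score, then mapped to Elo.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    low, high = score - z * std_err, score + z * std_err

    # One-sided likelihood of superiority: P(true score > 50%).
    los = NormalDist().cdf((score - 0.5) / std_err)

    return {
        "games": n,
        "score": score,
        "elo": score_to_elo(score),
        "elo_low": score_to_elo(low),
        "elo_high": score_to_elo(high),
        "los": los,
    }


if __name__ == "__main__":
    for w, l, d in [(63, 61, 76), (1891, 2018, 1091)]:
        s = match_stats(w, l, d)
        print(f"+{w} -{l} ={d}: {s['elo']:.2f} Elo "
              f"[{s['elo_low']:.2f}, {s['elo_high']:.2f}], LOS {100 * s['los']:.2f} %")
```

The interval is asymmetric in Elo because it is symmetric in score and the score-to-Elo mapping is not linear, which is why the lower and upper uncertainties above differ slightly.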
-
Jouni
- Posts: 3778
- Joined: Wed Mar 08, 2006 8:15 pm
- Full name: Jouni Uski
Re: Crafty-Arasan match
I find it hard to believe your result. Crafty is simply at least 2 classes stronger. I ran a short 100-game match on a dual Pentium.
1: Crafty-23.5-64bit 69.5/100
2: Arasanx-64 30.5/100
So +147 Elo - no need for any statistics, Crafty is CLEARLY stronger.
Jouni