rvida wrote:Why not implement chess960 support then? It would surely help to prove or disprove your hypothesis. Btw. looking at the CCRL 40/4 FRC list, I might start spreading a hypothesis too ... Also note the 100 Elo gap between #2 and #3 (and between #4 and #5). It would be nice if more strong engines supported FRC.
A late reaction, but I've just run a test match showing that Critter 1.6a indeed appears to be slightly stronger than Houdini 2.0 in FRC - playing without opening book from the initial 960 positions with reversed colors.
After 1920 games at 2'+2", single thread, the match result was 1010-910 for Critter (41% draws), scoring 52.6% or a performance of +18 ± 9 Elo. Congrats, Richard!
I'm now running a similar match against a pre-beta Houdini 3 DEV; the results are quite different.
Robert
Why do you care about a result of 9 Elo points for an entity with a deviation of >22 Elo points? Another aspect is that everyone tunes against Houdini; in other words, such results are crap. Now they are all desperately waiting for H3, or even better R5, but Vas would be showered with little clothespins if he delivered.
-Popper and Lakatos are good but I'm stuck on Leibowitz
May I ask you for the confidence interval of that error bar? For 1920 games and around 41% of draws, I get ± 9 Elo (using my own programme) with a confidence interval of around 86%, which is fairly low IMHO. I get ~ ± 12 Elo for a more common confidence interval of 95%. If I take +616 -516 =788 (which are very close to 1010 - 910 with 41% of draws), this is what I get for 95% confidence:
LOS_and_Elo_uncertainties_calculator, ® 2012.
----------------------------------------------------------------
Calculation of Elo uncertainties in a match between two engines:
----------------------------------------------------------------
(The input and output data is referred to the first engine).
Please write down non-negative integers.
Maximum number of games supported: 2147483647.
Write down the number of wins (up to 1825361100):
616
Write down the number of loses (up to 1825361100):
516
Write down the number of draws (up to 2147482515):
788
Write down the confidence level (in percentage) between 65% and 99.9% (it will be rounded up to 0.01%):
95
Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:
3
---------------------------------------
Elo interval for 95.00 % confidence:
Elo rating difference: 18.11 Elo
Lower rating difference: 6.19 Elo
Upper rating difference: 30.08 Elo
Lower bound uncertainty: -11.92 Elo
Upper bound uncertainty: 11.96 Elo
Average error: +/- 11.94 Elo
K = (average error)*[sqrt(n)] = 523.29
Elo interval: ] 6.19, 30.08[
---------------------------------------
Number of games of the match: 1920
Score: 52.60 %
Elo rating difference: 18.11 Elo
Draw ratio: 41.04 %
*********************************************************
Standard deviation: 1.7133 % of the points of the match.
*********************************************************
Error bars were calculated with two-sided tests; values are rounded up to 0.01 Elo, or 0.01 in the case of K.
-------------------------------------------------------------------
Calculation of likelihood of superiority (LOS) in a one-sided test:
-------------------------------------------------------------------
LOS (taking into account draws) is always calculated, if possible.
LOS (not taking into account draws) is only calculated if wins + loses < 16001.
LOS (average value) is calculated only when LOS (not taking into account draws) is calculated.
______________________________________________
LOS: 99.86 % (taking into account draws).
LOS: 99.85 % (not taking into account draws).
LOS: 99.85 % (average value).
______________________________________________
These values of LOS are rounded up to 0.01%
End of the calculations. Approximated elapsed time: 99 ms.
Thanks for using LOS_and_Elo_uncertainties_calculator. Press Enter to exit.
Sorry for the long output of my programme; I get around +18 ± 12 Elo for Critter with your provided data. Given a reasonable confidence interval, 1920 games (with more or less tied scores) and 41% of draws, my error bars should not differ much from BayesElo results, which is why I am confused. The LOS value is also telltale: that value should differ very little if you calculate it with a better programme such as BayesElo. By the way, did you calculate that error bar (± 9 Elo) with BayesElo or by yourself? Thanks in advance.
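For anyone who wants to check these numbers independently, the calculation can be sketched in a few lines of Python. This is a minimal sketch, not the programme quoted above: it assumes the usual trinomial model (each game is a win, draw or loss worth 1, 0.5 or 0 points), the logistic Elo formula, and a normal approximation for the error bars and the draws-included LOS.

```python
import math
from statistics import NormalDist

def elo_from_score(p):
    """Elo difference implied by an expected score p (logistic model)."""
    return 400.0 * math.log10(p / (1.0 - p))

def match_stats(wins, losses, draws, confidence=0.95):
    n = wins + losses + draws
    p = (wins + 0.5 * draws) / n                # score fraction of the match
    # Per-game variance of the outcome (points 1, 0.5, 0 with observed frequencies).
    var = (wins + 0.25 * draws) / n - p * p
    se = math.sqrt(var / n)                     # standard error of the mean score
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    lo = elo_from_score(p - z * se)             # lower bound of the Elo interval
    hi = elo_from_score(p + z * se)             # upper bound of the Elo interval
    los = NormalDist().cdf((p - 0.5) / se)      # LOS, taking draws into account
    return elo_from_score(p), lo, hi, los

elo, lo, hi, los = match_stats(616, 516, 788)
print(f"Elo: {elo:+.2f}  95% interval: ]{lo:.2f}, {hi:.2f}[  LOS: {los:.4f}")
```

With +616 -516 =788 this reproduces the figures in the output above: about +18.11 Elo, an interval of roughly ]6.19, 30.08[, and a draws-included LOS of about 99.86%.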
People (including me) will thank you if you post the results of Critter vs. Houdini 3 DEV once this match is finished. Good luck with Houdini and your telescope!
Howdy Rolfy....Long time no see
_No one can hit as hard as life. But it ain’t about how hard you can hit. It’s about how hard you can get hit and keep moving forward. How much you can take and keep moving forward…
Ajedrecista wrote:Sorry for the long output of my programme; I get around +18 ± 12 Elo for Critter with your provided data (given a reasonable confidence interval and 1920 games (with more less tied scores) and 41% of draws, my error bars should not differ a lot in comparison with BayesElo results, this is why I am confused).
My bad, the correct 95% confidence interval is indeed +18 ± 12 Elo.
Thanks for the correction!
Ajedrecista wrote:People (including me) will thank you if you post the results of Critter vs. Houdini 3 DEV once this match is finished. Good luck with Houdini and your telescope!
Regards from Spain.
Ajedrecista.
The match just finished; the match result was 1134-786 for Houdini 3 DEV (44% draws), scoring 59.1% or a performance of +64 ± 12 Elo.
Robert
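As a quick sanity check on the score-to-Elo conversion (assuming only the standard logistic Elo formula, nothing engine-specific), 1134 points out of 1920 games does correspond to roughly +64 Elo:

```python
import math

def elo_diff(score):
    """Elo difference implied by an expected score (logistic model)."""
    return 400.0 * math.log10(score / (1.0 - score))

print(round(elo_diff(1134 / 1920)))  # 1134 points out of 1920 games -> 64
```

The same formula gives about +18 Elo for Critter's earlier 1010/1920 score, consistent with the first match report.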
Thank you Robert,
Perhaps I missed something; may I ask you the time control? Is it 2' + 2''?