Hello Richard!
rvida wrote:
Hi,
I have just answered some PMs about the status of Critter development. I thought I might post in public as well.
A new version is due to be released next month. I am not sure about the exact Elo improvement, though its performance should be pretty close to H1.5 (Houdini 1.5).
To give an idea, here is a quick copy & paste of the results of the current test run:
1.3.53c
Critter dev 64-bit - Critter 1.2 64-bit SSE4 116.0 - 106.0 52.25%
Critter dev 64-bit - Stockfish 2.1.1 JA 64bit 132.0 - 90.0 59.46%
Critter dev 64-bit - Komodo64 3 SSE 128.0 - 94.0 57.66%
Critter dev 64-bit - Rybka 4 x64 121.5 - 99.5 54.98%
Critter dev 64-bit - IvanHoe 9.50b x64 131.0 - 91.0 59.01%
Critter dev 64-bit - Houdini 1.5 x64 107.5 - 113.5 48.64%
736.0 - 594.0 55.34%
1330 out of 2100 games played
Level: 40 Moves in 1 min
40/1, 1 core, ponder off, without tablebases, Shredder classic default book
Non-PGO build without SSE4.
Richard
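The total line of the quoted run can be rechecked from the six per-match scores. The short Python sketch below (the names RESULTS, totals and max_draws are mine, purely illustrative) sums them and also derives the largest number of draws those scores allow, assuming Critter dev lost no game in the matches it won and won no game against Houdini.

```python
# Per-match scores copied from the quoted test run:
# (opponent, Critter dev points, opponent points).
RESULTS = [
    ("Critter 1.2", 116.0, 106.0),
    ("Stockfish 2.1.1", 132.0, 90.0),
    ("Komodo64 3", 128.0, 94.0),
    ("Rybka 4", 121.5, 99.5),
    ("IvanHoe 9.50b", 131.0, 91.0),
    ("Houdini 1.5", 107.5, 113.5),
]


def totals(results):
    """Return (own points, opponent points, games, score fraction)."""
    own = sum(r[1] for r in results)
    opp = sum(r[2] for r in results)
    games = round(own + opp)  # each game hands out exactly one point in total
    return own, opp, games, own / games


def max_draws(results):
    """Largest total number of draws consistent with the per-match scores.

    Per match: own - opp = W - L and games = W + L + D, hence
    D = games - (own - opp) - 2 * L.  D is maximal when L is minimal,
    i.e. L = max(0, opp - own): no losses in the matches Critter won,
    no wins in the match it lost.
    """
    total = 0.0
    for _, own, opp in results:
        games = own + opp
        losses = max(0.0, opp - own)
        total += games - (own - opp) - 2 * losses
    return round(total)


own, opp, games, score = totals(RESULTS)
print(f"{own} - {opp} in {games} games = {100 * score:.2f}%")
draws = max_draws(RESULTS)
print(f"max draws = {draws} ({100 * draws / games:.1f}%)")
```

With these scores the sketch prints 736.0 - 594.0 in 1330 games (55.34%) and a maximum of 1176 draws (~88.4%): Houdini needs at least 6 wins to lead its match by 6 points, so at most 2·(594 - 6) draws are possible.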
First of all, congratulations on your achievements. 1330 games is a nice amount, and I tried to find (with only pencil, paper and Derive 6) the uncertainty of this test with ~95.45% confidence (2-sigma confidence). Of course, these uncertainties depend on the number of draws (the draw ratio). Looking at your results, it is easy to see that the minimum number of draws is 0 (0%; I guess this is not the case) and the maximum is 1176 (~88.4%; 2·(594 - 6), if Critter 1.3.53c went unbeaten in every match it won and did not win a single game against Houdini, which then needs at least 6 wins for its 113.5 - 107.5 score... again, I guess this is not the case). Here is what I found, in steps of 5% (I know that 5% of 1330 = 66.5, an impossible number of draws, but I think a step of 5% is easy on the eyes), except for the last datum (hopefully without typos and/or errors in my clumsy calculations):
Draw ratio = D; Elo differences are rounded (not exact numbers).
Rating difference = rd = 400·log10(736/594) ~ +37.2
Uncertainty (error given in Elo points) = |e|
Confidence ~ 95.45% ===> rd ± |e| (relative to the average rating of the six opponents).
D = 0% ---> +37.2 ± 19.3 ~ ]+17.9, +56.5[
D = 5% ---> +37.2 ± 18.8 ~ ]+18.4, +56[
D = 10% ---> +37.2 ± 18.3 ~ ]+18.9, +55.5[
D = 15% ---> +37.2 ± 17.8 ~ ]+19.4, +55[
D = 20% ---> +37.2 ± 17.2 ~ ]+20, +54.4[
D = 25% ---> +37.2 ± 16.7 ~ ]+20.5, +53.9[
D = 30% ---> +37.2 ± 16.1 ~ ]+21.1, +53.3[
D = 35% ---> +37.2 ± 15.5 ~ ]+21.7, +52.7[
D = 40% ---> +37.2 ± 14.9 ~ ]+22.3, +52.1[
D = 45% ---> +37.2 ± 14.2 ~ ]+23, +51.4[
D = 50% ---> +37.2 ± 13.5 ~ ]+23.7, +50.7[
D = 55% ---> +37.2 ± 12.8 ~ ]+24.4, +50[
D = 60% ---> +37.2 ± 12.1 ~ ]+25.1, +49.3[
D = 65% ---> +37.2 ± 11.3 ~ ]+25.9, +48.5[
D = 70% ---> +37.2 ± 10.4 ~ ]+26.8, +47.6[
D = 75% ---> +37.2 ± 9.4 ~ ]+27.8, +46.6[
D = 80% ---> +37.2 ± 8.4 ~ ]+28.8, +45.6[
D = 85% ---> +37.2 ± 7.2 ~ ]+30, +44.4[
D ~ 88.4% ---> +37.2 ± 6.2 ~ ]+31, +43.4[
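The post does not say which formula was used in Derive 6. The figures in the table appear consistent with the usual normal approximation below; this is my reconstruction, not necessarily the exact calculation. With N = 1330 games, score fraction μ and draw ratio d, each game scores x = 1, ½ or 0, and |e| is taken as the distance from rd to the Elo value of the upper end of the 2-sigma score interval:

```latex
\begin{aligned}
\mu &= \tfrac{736}{1330} \approx 0.5534, \qquad w = \mu - \tfrac{d}{2} \quad \text{(win ratio)}\\
\operatorname{Var}(x) &= E[x^2] - \mu^2 = \left(w + \tfrac{d}{4}\right) - \mu^2 = \mu(1-\mu) - \tfrac{d}{4}\\
\sigma_\mu &= \sqrt{\frac{\mu(1-\mu) - d/4}{N}}\\
rd &= 400\log_{10}\frac{\mu}{1-\mu} \approx +37.2\\
|e| &\approx 400\log_{10}\frac{\mu + 2\sigma_\mu}{1 - \mu - 2\sigma_\mu} - rd\\
d = 0.5:&\quad \sigma_\mu \approx 0.009583,\quad 400\log_{10}\frac{0.57255}{0.42745} \approx 50.77,\quad |e| \approx 50.77 - 37.24 \approx 13.5
\end{aligned}
```

The worked case d = 0.5 gives the ±13.5 of the D = 50% row.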
So those uncertainties should lie between ±6 and ±19 (more or less); ignoring the cases with too many or too few draws, the range narrows to between ±11 (2/3 draws) and ±15.7 (1/3 draws), which seems pretty reasonable from my inexpert point of view. So, rounding again, that is around [+25, +50] Elo ahead of the average rating of this bunch of pretty good engines... hats off to you.
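The ±11 (2/3 draws) and ±15.7 (1/3 draws) figures can be rechecked numerically. The Python sketch below is a hedged reconstruction, not the original Derive 6 work: it assumes the normal approximation with per-game variance μ(1 - μ) - d/4 and takes the upper end of the 2-sigma score interval converted to Elo; the names elo and error_margin and the parameter k are illustrative.

```python
import math

N = 1330           # games played
SCORE = 736 / N    # Critter dev's score fraction (55.34%)


def elo(p):
    """Elo difference corresponding to score fraction p (logistic model)."""
    return 400 * math.log10(p / (1 - p))


def error_margin(draw_ratio, n=N, mu=SCORE, k=2.0):
    """Approximate k-sigma error in Elo for a given draw ratio (assumed method).

    The per-game variance mu * (1 - mu) - draw_ratio / 4 gives the standard
    error of the score fraction; the upper end of the k-sigma interval is
    converted to Elo and its distance from the point estimate is returned.
    """
    sigma = math.sqrt((mu * (1 - mu) - draw_ratio / 4) / n)
    return elo(mu + k * sigma) - elo(mu)


rd = elo(SCORE)
for d in (0.0, 1 / 3, 0.5, 2 / 3, 1176 / N):
    e = error_margin(d)
    print(f"D = {100 * d:5.1f}% -> {rd:+.1f} ± {e:.1f}  ]{rd - e:+.1f}, {rd + e:+.1f}[")
```

Using the lower end instead (rd minus the Elo of μ - 2σ) gives slightly smaller margins, e.g. about ±19.1 instead of ±19.3 at D = 0%, because the Elo scale is not symmetric around a 55% score.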
Just out of curiosity: the previous development version was Critter 1.1.36 (if I remember correctly), and Critter 1.2 was eventually released; now, with the development version at 1.3.53c... will the next public release be Critter 1.4? Thanks in advance, and congratulations once again.
Regards from Spain.
Ajedrecista.