Critter 1.4 32-bit vs Ivanhoe B47f0.2 32-bit
As promised, though with a change in the Ivanhoe version used. I had stated that I would use Ivanhoe 46e by PeterPan, but at the last minute I decided to change horses. The version I used has played and beaten the very latest versions of Komodo, Stockfish and Rybka. Ivanhoe 46e, also by PeterPan, has played only Rybka 4.1 of those three, but won by a larger margin. Anyway, I thought this would be more than enough of a test for Critter, and, whatever my reasoning, I thought it would be the tougher opponent. At any rate, away we go:
1CPU/32bit
128MB hash
Bases=NONE
Ponder_Learning=OFF
Perfect 12.32 book w/12-move limit
40/3 Repeating
Match=50 games
Engine                  Elo   +W/-L/=D      Score    Points
Critter 1.4 32-bit      +49   +18/-11/=21   57.00%   28.5/50
Ivanhoe B47f0.2 [PP]    -49   +11/-18/=21   43.00%   21.5/50
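As a cross-check, the +49/-49 column follows from the +W/-L/=D column under the standard logistic Elo model (an assumption: the table doesn't say which rating tool produced it). A minimal Python sketch; `elo_from_wld` is a hypothetical helper name:

```python
import math


def elo_from_wld(wins: int, losses: int, draws: int) -> float:
    """Rating difference implied by a match score under the logistic Elo model.

    Assumption: 400 * log10(score / (1 - score)), with a draw worth half a point.
    """
    games = wins + losses + draws
    score = (wins + draws / 2) / games  # e.g. 28.5/50 = 0.57 for Critter
    return 400 * math.log10(score / (1 - score))


# Critter's line: +18/-11/=21
print(round(elo_from_wld(18, 11, 21), 2))  # 48.96
```

Swapping wins and losses gives -48.96 for Ivanhoe, so the two rows are mirror images.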
Make no mistake: this new Critter version is a 32-bit monster. There are a lot of people here who do not understand the implications of this match's result. Is this the strongest Ivanhoe version? Maybe, but probably not. But no Ivanhoe version at the moment is anywhere near 49 Elo better than this one, and 49 Elo is Critter's winning margin here.
You are being deluged by results from this new Critter version, the new Komodo version and the Houdini of your choice. Just remember to keep it all in perspective. There are good things you can take from each match that has been run by different people. And remember that early results, including mine, can have that + or - error factor.
The only test results I have seen that I am positive could have been done much better come from a few testers who include Ivanhoe versions in their testing. Inexplicably, they think there is an "official Ivanhoe release site", which is actually only the "official" site for the compiles by KLO. So they end up disregarding, and missing, the strongest of a number of Ivanhoe versions. Of that I have definitive proof. But I do not get paid to offer explanations and ideas in that area, and it is not really my concern; my plate is already full.
Critter 1.4 is certainly going to be stronger than version 1.2 when all is said and done. Strong enough to pass Houdini 2.0? The jury is out.
Enjoy, and bye-bye,
george
Approximated values of uncertainties of this match.
Hello George:
Thank you very much for your tests. Your work is really appreciated.
george wrote: There are good things you can take from each match that has been run by different people. And remember that early results, including mine, can have that + or - error factor.
You can be sure that there are +/- factors (which I call uncertainties). I have calculated them (in a clumsy way) for this match.
I think these formulæ (I took the standard deviation formula from the Immortal223 Forum a long time ago; it is now the ImmortalChess Forum) are closer to EloStat than to BayesElo (I am not sure), although the general consensus is that BayesElo is more accurate than EloStat. The uncertainties I have calculated for you are only indicative, never exact values, but I think 'my' values will not differ much from the true ones. Please have a nice day.
All values refer to Critter 1.4 w32:
n = number of games
w = number of wins
l = number of losses
d = number of draws
D = draw ratio
mu = relative score
rd = rating difference
sd = standard deviation
rd(+) = upper rating difference
rd(-) = lower rating difference
e(+) = uncertainty between rd and rd(+)
e(-) = uncertainty between rd and rd(-)
<e> = average uncertainty
n = w + l + d
D = d/n
mu = (w + d/2)/n
1 - mu = (d/2 + l)/n
rd = 400·log[mu/(1 - mu)]   (log = base-10 logarithm)
sd = sqrt{(1/n)·[mu·(1 - mu) - D/4]}
rd(+) = 400·log[(mu + 2·sd)/(1 - mu - 2·sd)]
rd(-) = 400·log[(mu - 2·sd)/(1 - mu + 2·sd)]
e(+) = [rd(+)] - rd > 0
e(-) = [rd(-)] - rd < 0
<e> = ±[|e(+)| + |e(-)|]/2 = ±{[e(+)] - [e(-)]}/2
K = |<e>|·sqrt(n)
K is a 'sanity check': in my limited experience, its usual values for 2-sigma confidence (most of the time, but not always) are between 500 and 600.
Rating difference interval (with 2-sigma confidence ~ 95.45% confidence): ]rd(-), rd(+)[
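The -D/4 term inside sd can look puzzling. It follows from the variance of a single game's score x, which is 1, 1/2 or 0. A short derivation (standard algebra, not from the original post), writing W = w/n and L = l/n so that W + D + L = 1:

```latex
\begin{aligned}
\mu &= W + \tfrac{D}{2} \\
\mathbb{E}[x^2] &= W \cdot 1^2 + D \cdot \left(\tfrac{1}{2}\right)^2 + L \cdot 0^2 = W + \tfrac{D}{4} \\
\operatorname{Var}(x) &= \mathbb{E}[x^2] - \mu^2
  = \left(\mu - \tfrac{D}{2}\right) + \tfrac{D}{4} - \mu^2
  = \mu(1 - \mu) - \tfrac{D}{4} \\
\mathrm{sd} &= \sqrt{\tfrac{1}{n}\left[\mu(1 - \mu) - \tfrac{D}{4}\right]}
\end{aligned}
```

So draws shrink the per-game variance below mu·(1 - mu), and dividing by n gives the standard error of the mean score used in rd(+) and rd(-).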
(Calculations have been done with a Casio calculator, so may contain errors).
==================================================================================================
n = 50:
28.5 - 21.5 (+18 -11 = 21)
rd ~ +48.96
2n·sd ~ 5.2934 points
rd(+) ~ +127.66 ; e(+) ~ +78.7
rd(-) ~ -24.97 ; e(-) ~ -73.93
<e> ~ ± 76.32 ; K = |<e>|·sqrt(n) ~ 539.66
[Rating difference interval (with 2-sigma confidence ~ 95.45% confidence)] ~ ]-24.97, +127.66[
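For readers who want to re-run these numbers without a calculator, here is a minimal Python sketch of the formulas above, applied to this match. It is a direct transcription of the post's definitions (log is base 10, sd as defined there), not BayesElo's method; the function and key names are hypothetical.

```python
import math


def elo(score: float) -> float:
    """Logistic Elo difference for a relative score in (0, 1); log is base 10."""
    return 400 * math.log10(score / (1 - score))


def match_uncertainty(w: int, l: int, d: int) -> dict:
    """2-sigma Elo interval using the post's formulas (EloStat-like, not BayesElo)."""
    n = w + l + d
    D = d / n                                     # draw ratio
    mu = (w + d / 2) / n                          # relative score
    sd = math.sqrt((mu * (1 - mu) - D / 4) / n)   # standard deviation of mu
    rd = elo(mu)
    rd_plus = elo(mu + 2 * sd)    # 400·log[(mu + 2·sd)/(1 - mu - 2·sd)]
    rd_minus = elo(mu - 2 * sd)   # 400·log[(mu - 2·sd)/(1 - mu + 2·sd)]
    e_plus, e_minus = rd_plus - rd, rd_minus - rd
    avg_e = (e_plus - e_minus) / 2                # <e>, reported as ±
    return {
        "rd": rd,
        "two_n_sd": 2 * n * sd,                   # 2n·sd, in points
        "rd_plus": rd_plus,
        "rd_minus": rd_minus,
        "e_plus": e_plus,
        "e_minus": e_minus,
        "avg_e": avg_e,
        "K": avg_e * math.sqrt(n),
    }


r = match_uncertainty(18, 11, 21)
for key, value in r.items():
    print(f"{key:>9}: {value:+.2f}")
```

On this match it gives rd(+) of about +127.7 and rd(-) of about -25.0, so the 2-sigma interval comes out as roughly ]-25.0, +127.7[, with K close to 540.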
Regards from Spain.
Ajedrecista.
Re: Approximated values of uncertainties of this match.
Thank you very much; I appreciate your interest. Though I do not understand your calculations, I am quite sure they are fine and not clumsy. I really wish I understood your work here well enough to at least discuss it with you, but I am afraid my mind does not take me to those levels. I really wish it did. If just one person like you takes an interest in, or gets enjoyment from, my work, then it was worth it.
Best to You and Happy New Year,
george