We use a different confidence than the default. I think I set it to 98%.

Ajedrecista wrote:
Hello Don:
Don wrote:
Code:
Hardware: i7-2630QM CPU @ 2.00GHz (notebook)
OS: Linux
TC: 40 / 3 minutes repeating
Hash: 64 meg
Ponder: OFF
Book: private 35,553 line 10 ply

Rank Name        Elo     +     -  games  score   oppo.  draws
   1 Komodo4  3020.4  20.1  20.1   1025  53.3%  3000.0  46.3%
   2 IvanHoe  3000.0  20.1  20.1   1025  46.7%  3020.4  46.3%

   TIME      RATIO   log(r)    NODES   log(r)  ave DEPTH   GAMES  PLAYER
--------- ---------- -------- -------- -------- --------- ------- -------
   4.7201      0.992   -0.008    3.074   -0.679   17.1057    1025  Komodo4
   4.7564      1.000    0.000    6.059    0.000   17.9695    1025  IvanHoe
Thanks for the match! It is interesting. I thought that in a direct match of N games between two engines, the rating difference should be 400·log10(score_Komodo/score_IvanHoe). If this is not the case, please try to explain in simple words where I am wrong. According to the data you posted, I get +309 -241 =475 (please correct me if I am wrong). With these figures I get ~ +23.1 Elo for Komodo, not +20.4 Elo. Are you using BayesElo or another programme? I also get different error bars: at what confidence level are yours? I post here what I get with my own programme:
For 2-sigma confidence (~95.45% confidence), I get ~ ±15.9 Elo; as you see, I also calculate a quantity that I call K, where K = |(average uncertainty)·sqrt(number of games)|. In this case, I get K ~ 509.5, which is very reasonable for my patzer views and little experience. This K decreases when the draw ratio grows, and typical values for an even match (like the one you posted) are K ~ 580, K ~ 600, ... for draw ratios of around 30%. If I calculate K with your data, K ~ 20.1·sqrt(1025) ~ 643.5, which is very high for 2-sigma confidence (it corresponds to draw ratios of around 15% in an even match, not over 45%). Everything changes if you are calculating those intervals with another confidence level: around 99%, by my quick pencil-and-paper estimate (always using my own method, which of course will not be the best).

Code:
Elo_uncertainties_calculator, © 2012.

Calculation of Elo uncertainties in a match between two engines:
----------------------------------------------------------------

(The input and output data is referred to the first engine).

Please write down non-negative integers.

Write down the number of wins:  309
Write down the number of loses: 241
Write down the number of draws: 475

***************************************
1-sigma confidence ~ 68.27% confidence.
2-sigma confidence ~ 95.45% confidence.
3-sigma confidence ~ 99.73% confidence.
***************************************

-----------------------------------------------------------------------
Confidence interval for 1-sigma:

Elo rating difference:    23.083289669143689 Elo
Lower rating difference:  15.142280756656772 Elo
Upper rating difference:  31.048458042726799 Elo

Lower bound uncertainty:  -7.9410089124869165 Elo
Upper bound uncertainty:   7.9651683735831103 Elo
Average error:          +- 7.9530886430350134 Elo

K = (average error)*[sqrt(n)] = 254.62307326334710

Elo interval: ] 15.142280756656772 , 31.048458042726799 [
-----------------------------------------------------------------------
Confidence interval for 2-sigma:

Elo rating difference:    23.083289669143689 Elo
Lower rating difference:   7.2170537956172484 Elo
Upper rating difference:  39.046316237476915 Elo

Lower bound uncertainty: -15.866235873526440 Elo
Upper bound uncertainty:  15.963026568333227 Elo
Average error:         +- 15.914631220929833 Elo

K = (average error)*[sqrt(n)] = 509.51680450270673

Elo interval: ] 7.2170537956172484 , 39.046316237476915 [
-----------------------------------------------------------------------
Confidence interval for 3-sigma:

Elo rating difference:    23.083289669143689 Elo
Lower rating difference:  -0.70066962596180779 Elo
Upper rating difference:  47.085603654880218 Elo

Lower bound uncertainty: -23.783959295105496 Elo
Upper bound uncertainty:  24.002313985736530 Elo
Average error:         +- 23.893136640421013 Elo

K = (average error)*[sqrt(n)] = 764.95361165287328

Elo interval: ] -0.70066962596180779 , 47.085603654880218 [
-----------------------------------------------------------------------

Number of games of the match: 1025
Score: 53.317073170731707 %
Elo rating difference: 23.083289669143689 Elo
Draw ratio: 46.341463414634146 %

*****************************************************************
1-sigma: 1.1393024995912142 % of the points of the match.
2-sigma: 2.2786049991824283 % of the points of the match.
3-sigma: 3.4179074987736425 % of the points of the match.
*****************************************************************

End of the calculations. Thanks for using Elo_uncertainties_calculator.

Press Enter to exit.
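For anyone who wants to check those figures, here is a minimal Python sketch of the calculation that the printed numbers imply: the score fraction and its standard error under the usual 1/0.5/0 scoring model, with the logistic conversion 400·log10(s/(1-s)) applied to the interval endpoints. This is only a reconstruction from the output above, not the actual source of Elo_uncertainties_calculator.

Code:
from math import log10, sqrt

def elo_from_score(s):
    # Logistic Elo difference for a score fraction 0 < s < 1.
    return 400.0 * log10(s / (1.0 - s))

def elo_uncertainties(wins, losses, draws):
    n = wins + losses + draws
    s = (wins + 0.5 * draws) / n              # score fraction of the first engine
    # Per-game variance of the 1 / 0.5 / 0 outcome model.
    var = (wins + 0.25 * draws) / n - s * s
    se = sqrt(var / n)                        # standard error of the mean score
    diff = elo_from_score(s)
    for k in (1, 2, 3):                       # 1-, 2- and 3-sigma intervals
        lo = elo_from_score(s - k * se)
        hi = elo_from_score(s + k * se)
        avg_err = 0.5 * (hi - lo)             # average of the two uncertainties
        print(f"{k}-sigma: {diff:+.2f} Elo in ]{lo:.2f}, {hi:.2f}[, "
              f"average error +-{avg_err:.2f}, K = {avg_err * sqrt(n):.1f}")

elo_uncertainties(309, 241, 475)

Run on +309 -241 =475 this reproduces the intervals above; the "1.1393 % of the points of the match" line at the bottom of the output is exactly this standard error expressed as a percentage. With a 1-sigma error of about 7.95 Elo, the ±20.1 Elo bars in the table correspond to roughly 2.5 sigma, i.e. a confidence level in the region of the 98-99% mentioned above.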
I have more questions regarding the data you provide: what do you mean by TIME? The average time (in minutes) spent by each engine in each game? I understand RATIO, where you normalize TIME with respect to IvanHoe 46e; I also understand ln(RATIO), although I do not understand the need for taking logarithms (it is surely better, but I am puzzled). Regarding NODES, I have the same doubt as with TIME: could it be the average number of millions of nodes searched at each move of a game? Again, I understand ln(NODES), with the same doubt as for ln(RATIO). I suppose that average depth refers to the average depth that each engine reached at each move.
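As a sanity check on that reading of the table, the sketch below reproduces the RATIO and log(r) columns by normalising against IvanHoe; the printed values only match when natural logarithms are used. The meaning of TIME and NODES (average minutes per game, average meganodes per move) is the assumption being asked about, not something the table confirms.

Code:
from math import log

# Values taken from the table above; the interpretation of TIME and NODES
# (minutes per game, meganodes per move) is assumed, not confirmed.
time_komodo, time_ivanhoe = 4.7201, 4.7564
nodes_komodo, nodes_ivanhoe = 3.074, 6.059

print(f"time ratio  {time_komodo / time_ivanhoe:.3f}, "
      f"log(r) {log(time_komodo / time_ivanhoe):+.3f}")    # 0.992, -0.008
print(f"nodes ratio {nodes_komodo / nodes_ivanhoe:.3f}, "
      f"log(r) {log(nodes_komodo / nodes_ivanhoe):+.3f}")   # 0.507, -0.679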
Finally, I have the most interesting question for the majority of readers of the forum: how is the progress on Komodo MP going? I hope it is going well, and I wish you good luck. I hope that the new Komodo will be released before the end of June of this year. Thanks for your attention and your patience with so many questions.
Regards from Spain.
Ajedrecista.
Don