Komodo_Ivanhoe- 2 Nice Final Matches!

Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Some questions about your test, Don.

Post by Don »

Ajedrecista wrote:

Hello Don:

Don wrote:

   Hardware:   i7-2630QM CPU @ 2.00GHz  (notebook) 
         OS:   Linux 
         TC:   40 / 3 minutes repeating 
       Hash:   64 meg 
     Ponder:   OFF 
       Book:   private 35,553 line 10 ply 

 Rank Name       Elo      +      -    games   score   oppo.   draws 
   1 Komodo4  3020.4   20.1   20.1    1025   53.3%  3000.0   46.3% 
   2 IvanHoe  3000.0   20.1   20.1    1025   46.7%  3020.4   46.3% 


      TIME       RATIO    log(r)     NODES    log(r)  ave DEPTH    GAMES   PLAYER 
 ---------  ----------  --------  --------  --------  ---------  -------   ------- 
    4.7201       0.992    -0.008     3.074    -0.679    17.1057     1025   Komodo4 
    4.7564       1.000     0.000     6.059     0.000    17.9695     1025   IvanHoe


Thanks for the match! It is interesting. I thought that in a direct match of N games between two engines, the rating difference should be 400·log10(score_Komodo/score_IvanHoe). If this is not the case, please explain in simple words where I am wrong. From the data you posted, I get +309 -241 =475 (please correct me if I am wrong). With this data I get about +23.1 Elo for Komodo, not +20.4 Elo. Are you using BayesElo or another program? I also get different error bars: what confidence level do yours correspond to? Here is what I get with my own program:

 Elo_uncertainties_calculator, © 2012.

 Calculation of Elo uncertainties in a match between two engines:
 ----------------------------------------------------------------

 (The input and output data is referred to the first engine).

 Please write down non-negative integers.

 Write down the number of wins:

309

 Write down the number of loses:

241

 Write down the number of draws:

475

 ***************************************
 1-sigma confidence ~ 68.27% confidence.
 2-sigma confidence ~ 95.45% confidence.
 3-sigma confidence ~ 99.73% confidence.
 ***************************************

 -----------------------------------------------------------------------

 Confidence interval for            1-sigma:

 Elo rating difference:      23.083289669143689    Elo

 Lower rating difference:      15.142280756656772    Elo
 Upper rating difference:      31.048458042726799    Elo

 Lower bound uncertainty:     -7.9410089124869165    Elo
 Upper bound uncertainty:      7.9651683735831103    Elo
 Average error: +-     7.9530886430350134    Elo

 K = (average error)*[sqrt(n)] =      254.62307326334710

 Elo interval: ]     15.142280756656772    ,     31.048458042726799    [
 -----------------------------------------------------------------------

 Confidence interval for            2-sigma:

 Elo rating difference:      23.083289669143689    Elo

 Lower rating difference:      7.2170537956172484    Elo
 Upper rating difference:      39.046316237476915    Elo

 Lower bound uncertainty:     -15.866235873526440    Elo
 Upper bound uncertainty:      15.963026568333227    Elo
 Average error: +-     15.914631220929833    Elo

 K = (average error)*[sqrt(n)] =      509.51680450270673

 Elo interval: ]     7.2170537956172484    ,     39.046316237476915    [
 -----------------------------------------------------------------------

 Confidence interval for            3-sigma:

 Elo rating difference:      23.083289669143689    Elo

 Lower rating difference:    -0.70066962596180779    Elo
 Upper rating difference:      47.085603654880218    Elo

 Lower bound uncertainty:     -23.783959295105496    Elo
 Upper bound uncertainty:      24.002313985736530    Elo
 Average error: +-     23.893136640421013    Elo

 K = (average error)*[sqrt(n)] =      764.95361165287328

 Elo interval: ]   -0.70066962596180779    ,     47.085603654880218    [
 -----------------------------------------------------------------------

 Number of games of the match:         1025
 Score:      53.317073170731707    %
 Elo rating difference:      23.083289669143689    Elo
 Draw ratio:      46.341463414634146    %

 *****************************************************************
 1-sigma:      1.1393024995912142    % of the points of the match.
 2-sigma:      2.2786049991824283    % of the points of the match.
 3-sigma:      3.4179074987736425    % of the points of the match.
 *****************************************************************

 End of the calculations.

 Thanks for using Elo_uncertainties_calculator. Press Enter to exit.

For 2-sigma confidence (~95.45%), I get about ±15.9 Elo. As you can see, I also calculate a quantity that I call K, where K = |(average uncertainty)·sqrt(number of games)|. In this case I get K ~ 509.5, which seems very reasonable from my patzer's view and limited experience. K decreases as the draw ratio grows, and typical values for an even match (like the one you posted) are K ~ 580, K ~ 600, ... for draw ratios of around 30%. If I calculate K with your data, K ~ 20.1 · sqrt(1025) ~ 643.5, which is very high for 2-sigma confidence (it corresponds to draw ratios of around 15% in an even match, not over 45%). Everything changes if you are calculating those intervals with a different confidence level: around 99%, according to a quick pencil-and-paper calculation of mine (always using my method, which of course may not be the best).
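
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the calculation described above. It is not the Elo_uncertainties_calculator source; it assumes that each game is an independent result worth 1, 0.5 or 0 points, that the standard error of the score comes from the variance of those three outcomes, and that the Elo bounds are obtained by converting score ± z·sigma with the same 400·log10 formula. The function names (`elo_from_score`, `match_interval`, `two_sided_confidence`) are made up for illustration.

```python
import math

def elo_from_score(score):
    """Elo difference implied by a score fraction (0 < score < 1)."""
    return 400.0 * math.log10(score / (1.0 - score))

def match_interval(wins, losses, draws, sigmas):
    """Return (elo, lower, upper, k) for a two-engine match.

    Assumption: games are independent, each worth 1, 0.5 or 0 points.
    """
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance E[x^2] - E[x]^2 for x in {1, 0.5, 0}.
    variance = (wins + 0.25 * draws) / n - score ** 2
    std_err = math.sqrt(variance / n)  # standard error of the score
    elo = elo_from_score(score)
    lower = elo_from_score(score - sigmas * std_err)
    upper = elo_from_score(score + sigmas * std_err)
    avg_err = (upper - lower) / 2.0
    k = avg_err * math.sqrt(n)  # the K quantity defined in the post
    return elo, lower, upper, k

def two_sided_confidence(sigmas):
    """Two-sided normal confidence for a +/- `sigmas` interval."""
    return math.erf(sigmas / math.sqrt(2.0))

elo, lower, upper, k = match_interval(309, 241, 475, sigmas=2.0)
print(f"{elo:+.1f} Elo, interval ]{lower:.1f}, {upper:.1f}[, K = {k:.1f}")

# Rough number of sigmas implied by a +/-20.1 Elo error bar,
# relative to the 1-sigma average error of this method.
_, lo1, hi1, _ = match_interval(309, 241, 475, sigmas=1.0)
implied_sigmas = 20.1 / ((hi1 - lo1) / 2.0)
print(f"implied sigmas: {implied_sigmas:.2f}, "
      f"confidence: {100 * two_sided_confidence(implied_sigmas):.1f}%")
```

With +309 -241 =475 this should reproduce the 2-sigma figures in the listing above (about +23.1 Elo, ±15.9, K ~ 509.5). The last two lines only estimate what confidence level a ±20.1 error bar would mean under this particular method; a rating tool such as BayesElo may compute its intervals differently.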

I have more questions about the data you provide: what does TIME refer to? The average time (in minutes) spent by each engine in each game? I understand RATIO, where you normalize TIME with respect to IvanHoe 46e; I also understand ln(RATIO), although I do not understand the need to take logarithms (it is surely better, but I am puzzled). Regarding NODES, I have the same doubt as with TIME: could it be the average number of millions of nodes searched per move? Again, I understand ln(NODES), with the same doubt as for ln(RATIO). I suppose that average depth refers to the average depth each engine reached on each move.
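
The two log(r) columns can be checked with a short Python sketch. It assumes log(r) is the natural logarithm of each engine's value divided by IvanHoe's value; that assumption reproduces the numbers in the table, whereas base-10 logarithms do not. The helper name `log_ratio` is made up for illustration.

```python
import math

# Values copied from the TIME and NODES columns of the table.
time_komodo, time_ivanhoe = 4.7201, 4.7564
nodes_komodo, nodes_ivanhoe = 3.074, 6.059

def log_ratio(value, reference):
    """Natural log of value/reference; 0.0 means equal to the reference."""
    return math.log(value / reference)

print(round(log_ratio(time_komodo, time_ivanhoe), 3))    # -0.008
print(round(log_ratio(nodes_komodo, nodes_ivanhoe), 3))  # -0.679
```

One possible reason for taking logarithms is symmetry: a factor of 2 and a factor of 1/2 give +0.693 and -0.693, so -0.679 reads directly as "Komodo searches roughly half as many nodes as IvanHoe".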

Finally, I have the most interesting question for the majority of forum readers: how is the progress on Komodo MP going? Well, I hope, and I wish you good luck. I hope that the new Komodo will be released before the end of June this year. Thanks for your attention and your patience with my many questions.

Regards from Spain.

Ajedrecista.

We use a different confidence level than the default. I think I set it to 98%.

Don
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
gleperlier
Posts: 1033
Joined: Sat Feb 04, 2012 10:03 pm

Re: Komodo_Ivanhoe- 2 Nice Final Matches!

Post by gleperlier »

Can someone give a link to download IvanHoe B46e x64?

Thanks!