noctiferus wrote:
Hi, Don.
So far I have not taken any position in this discussion.
As a former professor of Probability, Statistics and Data Mining at my university, I thought a bit about your first test, and I have some doubts about your conclusions. First, they rely on point estimates and do not take into account the confidence intervals of your predictions, which are fundamental to drawing correct statistical conclusions. (I'm waiting for the licence for my preferred statistical software, expected next week, so that I can run my own analyses and share them with you.) Second, the linear or cubic model proposed here for interpolation and extrapolation basically excludes the possibility of reaching an asymptotic value, which is a relevant alternative that an exponential model, for example, could capture. It seems a sort of "petitio principii".
I welcome your input. Probability and statistics are not my forte, and if there is anything wrong with my methodology I'm willing to hear about it.
Now I see you changed the test methodology. Would you please tell me whether I have understood your planned experiment correctly, and correct me if not?
At generic level i:
Round robin among engines:
Houdini at time level i
Houdini at time level i-1
Houdini at time level i-2
Komodo at time level i
Komodo at time level i-1
Komodo at time level i-2
Every match among engines: 2000 games.
Did I understand correctly?
Thanks for your attention
Enrico
I am going to play Houdini vs Komodo at levels 00 through 08, where each level is a doubling in time. Level 00 is 6 seconds + 0.1 second increment, Fischer time control.
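To make the doubling concrete, here is a minimal sketch that lists the time control at each level. It assumes that both the base time and the increment double from one level to the next; the post only says each level is "a doubling in time", so the treatment of the increment is an assumption, and the function name `time_control` is made up for illustration.

```python
BASE_SECONDS = 6.0     # level 00 base time, as stated in the post
BASE_INCREMENT = 0.1   # level 00 Fischer increment, as stated in the post


def time_control(level):
    """Return (base_seconds, increment_seconds) for a level.

    ASSUMPTION: base time and increment both double per level.
    """
    factor = 2 ** level
    return BASE_SECONDS * factor, BASE_INCREMENT * factor


# Print one line per level, 00 through 08.
for level in range(9):
    base, inc = time_control(level)
    print(f"level {level:02d}: {base:g}s + {inc:g}s increment")
```

Under that assumption the gap between the extremes is a factor of 2^8 = 256 in time.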
Since playing 08 vs 01 is terribly inefficient, I am confining all matches to within 2 levels. What that means is that 05 will play 04 and 03 but not 02. It will also play 06 and 07 but not 08. Komodo will play BOTH Houdini and other Komodos at all relevant levels, i.e. k4 will play these programs: k2, k3, k5, k6, h2, h3, h4, h5 and h6.
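The pairing rule above can be sketched as a small schedule generator. This is only an illustration: the labels `k4`, `h2` and so on follow the post, but the function names and the `include_houdini_selfplay` flag are hypothetical. The post states explicitly that Komodo plays both Houdini and other Komodos; whether Houdini also plays other Houdinis is assumed here (it matches the round robin described in the quoted question), and the flag lets you switch it off.

```python
from itertools import combinations

LEVELS = range(9)   # levels 00 through 08
MAX_GAP = 2         # no program plays more than 2 levels up or down


def pairings(include_houdini_selfplay=True):
    """List every allowed pairing as a tuple of labels such as ('k4', 'h2')."""
    engines = [(name, lvl) for name in ("k", "h") for lvl in LEVELS]
    schedule = []
    for (n1, l1), (n2, l2) in combinations(engines, 2):
        if abs(l1 - l2) > MAX_GAP:
            continue  # too far apart in level
        if n1 == n2 == "h" and not include_houdini_selfplay:
            continue  # ASSUMPTION switch: drop Houdini vs Houdini
        schedule.append((f"{n1}{l1}", f"{n2}{l2}"))
    return schedule


def opponents(engine, schedule):
    """Return the sorted opponents of one engine label."""
    return sorted(b if a == engine else a
                  for a, b in schedule if engine in (a, b))


print(opponents("k4", pairings()))
print(len(pairings()), "pairings in total")
```

For `k4` this reproduces the nine opponents listed above; engines at the top and bottom two levels get fewer opponents because some of their neighbours do not exist.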
I have not fixed the number of games in this test. I really intend to let this run as long as possible (several weeks or even months) and keep the levels balanced. If it's statistically desirable to define the exact test in advance, I will do so on your recommendation, but I would like the total games played for each version to be at least 2000, which means about 200-250 per pairing. The top 2 levels and bottom 2 levels will not have the same number of games, due to the rule that no program plays more than 2 levels up or down. We could say 300 games per pairing, 2 from each of 150 different starting positions.
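Since the question of confidence intervals came up, here is a rough sketch of how the uncertainty of a single pairing could be estimated from its win/draw/loss counts. It uses the standard logistic Elo formula and a normal approximation to the mean score; it is not the method used by any particular rating tool, and it treats games as independent, which is only approximately true when each opening is played twice. The function names are made up for illustration.

```python
import math

EPS = 1e-6  # keeps the score away from 0 and 1, where the Elo formula diverges


def elo_from_score(score):
    """Convert an expected score in (0, 1) to an Elo difference."""
    score = min(max(score, EPS), 1.0 - EPS)
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_interval(wins, draws, losses, z=1.96):
    """Return (low, estimate, high) Elo difference for one pairing.

    z=1.96 gives an approximate 95% interval under the normal approximation.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of a result taking the values 1, 0.5 or 0.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    half_width = z * math.sqrt(var / n)
    return (elo_from_score(score - half_width),
            elo_from_score(score),
            elo_from_score(score + half_width))


# Hypothetical counts for a 300-game pairing, not real results.
print(elo_interval(100, 100, 100))
```

With those hypothetical counts the interval comes out at a little over 30 Elo either side of zero, which gives a feel for what 300 games per pairing can and cannot resolve.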
I want to understand this fully. I know that programs that scale differently tend to move away from each other with depth; if they scale the same they tend to approach each other. But when both are happening to an extent, it's hard to separate the two things, except perhaps by comparing self-play games for either program.
The plan is that when I get significant data I will plot 2 lines using gnuplot, where the Y-axis is the level (00 - 08), and then I can Y-adjust the lines for one of the programs to force them to cross over each other. If the scalability is very similar they won't cross over but will appear to have nearly the same slope. This can of course be treated statistically too, without the plot, but as they say, "a picture is worth a thousand words."
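The same idea can be sketched numerically, without the plot: fit a straight line to each program's rating as a function of level, compare the slopes, and compute where the lines would cross or how far one must be shifted to meet the other at a chosen level. This is a sketch only. It puts rating on the vertical axis (the axes can be swapped), it uses a straight line purely for simplicity even though, as noted in the quoted question, a linear model cannot represent an asymptote, and the numbers in the demo are synthetic placeholders, not measurements.

```python
import math


def fit_line(levels, ratings):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(levels)
    mean_x = sum(levels) / n
    mean_y = sum(ratings) / n
    sxx = sum((x - mean_x) ** 2 for x in levels)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(levels, ratings))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def crossover_level(line_a, line_b):
    """Level at which two fitted lines meet, or None if they are parallel."""
    (slope_a, icpt_a), (slope_b, icpt_b) = line_a, line_b
    if math.isclose(slope_a, slope_b):
        return None
    return (icpt_b - icpt_a) / (slope_a - slope_b)


def shift_to_cross_at(line_a, line_b, level):
    """Vertical shift to add to line_b so that it meets line_a at `level`."""
    (slope_a, icpt_a), (slope_b, icpt_b) = line_a, line_b
    return (slope_a * level + icpt_a) - (slope_b * level + icpt_b)


# SYNTHETIC placeholder ratings for levels 0..4, for illustration only.
levels = [0, 1, 2, 3, 4]
line_a = fit_line(levels, [0, 100, 200, 300, 400])
line_b = fit_line(levels, [40, 120, 200, 280, 360])
print(line_a, line_b, crossover_level(line_a, line_b))
```

Nearly equal slopes correspond to the case where the two programs scale alike; the crossover is then far outside the tested levels or undefined.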
Don
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.