I don't think there is any doubt Fruit has been a large influence by
Laskos wrote:
Adam Hair wrote:
Me too. In the trial version of Systat, I can change the linkage
methods also. I hope to have some time this weekend to do some
reading, and then experiment with these.
I found a more reliable hierarchical cluster analysis algorithm in SPSS: Average Linkage between Groups. Before the first merge,
let N_i = 1 for i = 1 to N (engines).
When clusters p and q are merged into a new cluster t, update s_tr (S = similarity matrix) for every remaining cluster r by
s_tr = s_pr + s_qr
Update N_t by
N_t = N_p + N_q
and then choose the most similar pair based on the value
s_ij / (N_i * N_j)
The measure is Pearson correlation (bivariate correlations algorithm):
r_ij = C_ij / sqrt(C_ii * C_jj).
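The update rules above can be sketched in a few lines of Python. This is only a rough illustration of the idea, not the SPSS implementation: the function names, the engine labels E1 to E4, and the similarity values are all made up for the example.

```python
from math import sqrt

def pearson(x, y):
    """r = C_xy / sqrt(C_xx * C_yy), with C the centered cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    cxx = sum((a - mx) ** 2 for a in x)
    cyy = sum((b - my) ** 2 for b in y)
    return cxy / sqrt(cxx * cyy)

def average_linkage(sim, names):
    """Repeatedly merge the pair with the largest s_ij / (N_i * N_j)."""
    n = len(names)
    # s[i][j] holds the SUM of pairwise similarities between clusters i and j
    s = {i: {j: sim[i][j] for j in range(n) if j != i} for i in range(n)}
    size = {i: 1 for i in range(n)}          # N_i = 1 before the first merge
    label = {i: names[i] for i in range(n)}
    merges = []
    next_id = n
    while len(size) > 1:
        # most similar pair by average between-group similarity
        p, q = max(((i, j) for i in s for j in s[i] if i < j),
                   key=lambda ij: s[ij[0]][ij[1]] / (size[ij[0]] * size[ij[1]]))
        avg = s[p][q] / (size[p] * size[q])
        t = next_id
        next_id += 1
        # s_tr = s_pr + s_qr for every remaining cluster r
        s[t] = {r: s[p][r] + s[q][r] for r in s if r not in (p, q)}
        for r in s[t]:
            s[r][t] = s[t][r]
            del s[r][p], s[r][q]
        del s[p], s[q]
        # N_t = N_p + N_q
        size[t] = size.pop(p) + size.pop(q)
        label[t] = (label.pop(p), label.pop(q))
        merges.append((label[t], avg))
    return merges

# Hypothetical similarity matrix for four made-up engines
names = ["E1", "E2", "E3", "E4"]
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
for merged, avg in average_linkage(sim, names):
    print(merged, round(avg, 3))
```

Each printed line is one merge together with the average similarity at which it happened, which is what the horizontal position of a join in the dendrogram represents.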
The distance along the horizontal axis to the common ancestor is the degree of relatedness. This algorithm takes care of the whole tree, from the leaves to the upper branches. The upper branches were more fragile in the Complete Linkage between Groups with the Euclidean measure.
Apparently the Fruitish 2.1 branch is denoted by Fruit 2.1, and apparently the Rybkish 3 branch is denoted by Rybka 3. A and B are general groups; A might have some Fruitish ideas, B does not. If that assumption is correct, then Fruit 2.1 had a tremendous influence upon the recent (strong) engines.
Kai
anybody.
Average linkage between groups is more robust than complete linkage.
I have made two graphs, using Systat and your data, with Average
linkage and the Pearson measure. I changed the diagonal from 100% to 75%. That was the
cause of the difference in scale between your graphs and mine.
As you can see, the graph is basically identical to yours. I am just doing
this so that we both know that Systat and SPSS will produce the same
results.
For the second graph, I removed Houdini, Strelka, Ivanhoe, Rybka 4,
and Naum 4.2.
I did this to point out, as you also did, that some care has to be given
to which engines are included. The clusters can change with the inclusion
and exclusion of engines. My belief is, in order to avoid bias as much as
possible, several versions from each engine family should be included.
And as many engine families as possible should be included. Then I
think the clustering analysis can give us a true picture.