- Correlations: This analysis identifies pairs of metrics that
capture equivalent facets of the datasets. We compute the Pearson
correlation coefficient for each pair. The score lies in the range [-1,1].
- Perfect correlations: -1 (negative), and 1 (positive)
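
As an illustration of the correlation score, the following is a minimal sketch of the Pearson coefficient in plain Python. The function name `pearson` and the sample values are assumptions made for this example and are not part of the tool described here.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and the two root sums of squares.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)

# Hypothetical metric values measured on the same four datasets.
metric_a = [1.0, 2.0, 3.0, 4.0]
metric_b = [2.0, 4.0, 6.0, 8.0]   # rises with metric_a
metric_c = [8.0, 6.0, 4.0, 2.0]   # falls as metric_a rises

positive = pearson(metric_a, metric_b)
negative = pearson(metric_a, metric_c)
```

Here `positive` is close to 1 and `negative` is close to -1, the two perfect-correlation cases listed above.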
- Stability: This analysis estimates whether the
clustering is meaningfully affected by small variations in the data
[1]. The stability index is the mean of the Jaccard coefficient [2]
values over 1000 bootstrap replicates. The values lie in the range
[0,1] and are interpreted as follows:
- Unstable: [0, 0.60[
- Doubtful: [0.60, 0.75]
- Stable: ]0.75, 0.85]
- Highly Stable: ]0.85, 1]
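
The sketch below shows how a stability index could be mapped onto the intervals above. It assumes the per-replicate Jaccard values have already been obtained by re-clustering bootstrap samples; the names `jaccard`, `stability_index`, and `stability_label`, and the sample values, are hypothetical and only serve the example.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two collections of item ids."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty sets are treated as identical (an assumption of this sketch).
    return len(a & b) / len(union) if union else 1.0

def stability_index(jaccard_values):
    """Mean of the Jaccard values collected over the bootstrap replicates."""
    if not jaccard_values:
        raise ValueError("at least one replicate is required")
    return sum(jaccard_values) / len(jaccard_values)

def stability_label(index):
    """Map a stability index in [0, 1] to the interpretation listed above."""
    if not 0.0 <= index <= 1.0:
        raise ValueError("stability index must lie in [0, 1]")
    if index < 0.60:
        return "Unstable"
    if index <= 0.75:
        return "Doubtful"
    if index <= 0.85:
        return "Stable"
    return "Highly Stable"

# One original cluster compared with its best match in three replicates
# (three instead of 1000, to keep the example short).
original = {1, 2, 3, 4}
replicates = [{1, 2, 3, 4}, {1, 2, 3}, {2, 3, 4, 5}]
values = [jaccard(original, r) for r in replicates]
index = stability_index(values)
label = stability_label(index)
```

The boundary handling follows the bracket notation of the list: 0.60 and 0.75 both fall in the "Doubtful" interval, and 0.85 still counts as "Stable".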
- Goodness of classifications: The goodness of the
classifications is assessed by validating the generated clusters. For
this purpose, we use the Silhouette width as the validity index. Kaufman
and Rousseeuw [3] suggested interpreting the global
Silhouette width score as a measure of the effectiveness of the clustering
structure. The values lie in the range [-1,1] and are interpreted as
follows:
- There is no substantial clustering structure: [-1, 0.25].
- The clustering structure is weak and could be artificial: ]0.25,
0.50].
- There is a reasonable clustering structure: ]0.50, 0.70].
- A strong clustering structure has been found: ]0.70, 1].
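
To make the Silhouette interpretation concrete, here is a minimal sketch that computes the global Silhouette width of a small labelled point set and maps it to the categories above. It assumes Euclidean distance and the common convention that a point alone in its cluster has width 0; the function names and the sample points are hypothetical.

```python
import math

def silhouette_width(points, labels):
    """Mean silhouette width over all points, using Euclidean distance."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")
    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            widths.append(0.0)  # assumed convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return sum(widths) / len(widths)

def silhouette_label(score):
    """Map a global Silhouette width in [-1, 1] to the interpretation above."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("Silhouette width must lie in [-1, 1]")
    if score <= 0.25:
        return "There is no substantial clustering structure"
    if score <= 0.50:
        return "The clustering structure is weak and could be artificial"
    if score <= 0.70:
        return "There is a reasonable clustering structure"
    return "A strong clustering structure has been found"

# Two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]
score = silhouette_width(points, labels)
verdict = silhouette_label(score)
```

Because the two groups are far apart relative to their internal spread, `score` lands in the ]0.70, 1] interval.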
[1] Cheng, R. and Milligan, G. W. (1996). Measuring the influence of
individual data points in a cluster analysis. Journal of Classification,
13, 315–335.
[2] Jaccard, P. (1901). Distribution de la flore alpine dans le Bassin des
Dranses et dans quelques régions voisines. Bulletin de la Société Vaudoise
des Sciences Naturelles, 37, 241–272.
[3] Kaufman, L. and Rousseeuw, P. (1990). Finding Groups in Data: An
Introduction to Cluster Analysis. Wiley.