@Reference(authors="J. C. Gower", title="A comparison of some methods of cluster analysis", booktitle="Biometrics (1967)", url="https://doi.org/10.2307/2528417", bibkey="doi:10.2307/2528417") @Alias(value={"centroid","upgmc","de.lmu.ifi.dbs.elki.algorithm.clustering.hierarchical.CentroidLinkageMethod"}) public class CentroidLinkage extends java.lang.Object implements Linkage
This is closely related to GroupAverageLinkage
(UPGMA), but the
resulting distance corresponds to the distance of the cluster centroids when
used with squared Euclidean distance.
For Lance-Williams, we can then obtain the following recursive definition: \[d_{\text{UPGMC}}(A\cup B,C)=\tfrac{|A|}{|A|+|B|} d(A,C) + \tfrac{|B|}{|A|+|B|} d(B,C) - \tfrac{|A|\cdot|B|}{(|A|+|B|)^2} d(A,B)\]
With squared Euclidean distance, we then get the cluster distance: \[d_{\text{UPGMC}}(A,B)=||\tfrac{1}{|A|}\sum\nolimits_{a\in A} a, \tfrac{1}{|B|}\sum\nolimits_{b\in B} b||^2\] but for other distances, this will not generally be true.
Because the ELKI implementations use Lance-Williams, this linkage should only be used with (squared) Euclidean distance.
While titled "unweighted", this method does take cluster sizes into account when merging clusters with Lance-Williams.
While the idea of this method — at least for squared Euclidean —
is compelling (distance of cluster centers), it is not as well behaved as one
may think. It can yield so called "inversions", where a later merge has a
smaller distance than an early merge, because a cluster center can
be closer to a neighboring cluster than any of the individual points. Because
of this, the GroupAverageLinkage
(UPGMA) is usually preferable.
Reference:
J. C. Gower
A comparison of some methods of cluster analysis
Biometrics (1967): 623-637.
Modifier and Type | Class and Description |
---|---|
static class |
CentroidLinkage.Parameterizer
Class parameterizer.
|
Modifier and Type | Field and Description |
---|---|
static CentroidLinkage |
STATIC
Static instance of class.
|
Constructor and Description |
---|
CentroidLinkage()
Deprecated.
use the static instance
STATIC instead. |
Modifier and Type | Method and Description |
---|---|
double |
combine(int sizex,
double dx,
int sizey,
double dy,
int sizej,
double dxy)
Compute combined linkage for two clusters.
|
public static final CentroidLinkage STATIC
@Deprecated public CentroidLinkage()
STATIC
instead.public double combine(int sizex, double dx, int sizey, double dy, int sizej, double dxy)
Linkage
combine
in interface Linkage
sizex
- Size of first cluster x before mergingdx
- Distance of cluster x to j before mergingsizey
- Size of second cluster y before mergingdy
- Distance of cluster y to j before mergingsizej
- Size of candidate cluster jdxy
- Distance between clusters x and y before mergingCopyright © 2019 ELKI Development Team. License information.