@Reference(authors="R. R. Sokal, C. D. Michener", title="A statistical method for evaluating systematic relationship", booktitle="University of Kansas science bulletin 28", url="https://archive.org/details/cbarchive_33927_astatisticalmethodforevaluatin1902", bibkey="journals/kansas/SokalM1902") @Alias(value={"upgma","average","average-link","average-linkage","UPGMA","de.lmu.ifi.dbs.elki.algorithm.clustering.hierarchical.GroupAverageLinkageMethod"}) @Priority(value=201) public class GroupAverageLinkage extends java.lang.Object implements Linkage
This is a good default linkage to use with hierarchical clustering, as it neither exhibits the single-link chaining effect, nor has the strong tendency of complete linkage to split large clusters. It is also easy to understand, and it can be used with arbitrary distances and similarity functions.
The distances of two clusters is defined as the between-group average distance of two points $a$ and $b$, one from each cluster. It should be noted that this is not the average distance within the resulting cluster, because it does not take within-cluster distances into account.
The distance of two clusters in this method is: \[d_{\text{UPGMA}}(A,B)=\tfrac{1}{|A|\cdot|B|} \sum\nolimits_{a\in A}\sum\nolimits_{b\in B} d(a,b)\]
For Lance-Williams, we can then obtain the following recursive definition: \[d_{\text{UPGMA}}(A\cup B,C)=\tfrac{|A|}{|A|+|B|} d(A,C) + \tfrac{|B|}{|A|+|B|} d(B,C)\]
While the method is also called "Unweighted Pair Group Method with Arithmetic
mean", it uses weights in the Lance-Williams formulation that account for the
cluster size. It is unweighted in the sense that every point keeps the same
weight, whereas in WeightedAverageLinkage
(WPGMA), the weight of
points effectively depends on the depth in the cluster tree.
Reference:
R. R. Sokal, C. D. Michener
A statistical method for evaluating systematic relationship
University of Kansas science bulletin, 28, 1409-1438.
Modifier and Type | Class and Description |
---|---|
static class |
GroupAverageLinkage.Parameterizer
Class parameterizer.
|
Modifier and Type | Field and Description |
---|---|
static GroupAverageLinkage |
STATIC
Static instance of class.
|
Constructor and Description |
---|
GroupAverageLinkage()
Deprecated.
use the static instance
STATIC instead. |
Modifier and Type | Method and Description |
---|---|
double |
combine(int sizex,
double dx,
int sizey,
double dy,
int sizej,
double dxy)
Compute combined linkage for two clusters.
|
public static final GroupAverageLinkage STATIC
@Deprecated public GroupAverageLinkage()
STATIC
instead.public double combine(int sizex, double dx, int sizey, double dy, int sizej, double dxy)
Linkage
combine
in interface Linkage
sizex
- Size of first cluster x before mergingdx
- Distance of cluster x to j before mergingsizey
- Size of second cluster y before mergingdy
- Distance of cluster y to j before mergingsizej
- Size of candidate cluster jdxy
- Distance between clusters x and y before mergingCopyright © 2019 ELKI Development Team. License information.