O
- Object type@Reference(authors="M. R. Anderberg", title="Hierarchical Clustering Methods", booktitle="Cluster Analysis for Applications", bibkey="books/academic/Anderberg73/Ch6") @Priority(value=200) public class AnderbergHierarchicalClustering<O> extends AbstractDistanceBasedAlgorithm<O,PointerHierarchyRepresentationResult> implements HierarchicalClusteringAlgorithm
Instead of scanning the matrix (with cost O(n²)) to find the minimum, the nearest neighbor of each object is remembered. On the downside, we need to check these values at every merge, and it may now cost O(n²) to perform a merge, so there is no worst-case advantage to this approach. The average case however improves from O(n³) to O(n²), which yields a considerable improvement in running time.
This optimization is attributed to M. R. Anderberg.
Reference:
M. R. Anderberg
Hierarchical Clustering Methods
Cluster Analysis for Applications
ISBN: 0120576503
Modifier and Type | Class and Description |
---|---|
static class |
AnderbergHierarchicalClustering.Parameterizer<O>
Parameterization class
|
Modifier and Type | Field and Description |
---|---|
(package private) Linkage |
linkage
Current linkage method in use.
|
private static Logging |
LOG
Class logger
|
ALGORITHM_ID
DISTANCE_FUNCTION_ID
Constructor and Description |
---|
AnderbergHierarchicalClustering(DistanceFunction<? super O> distanceFunction,
Linkage linkage)
Constructor.
|
Modifier and Type | Method and Description |
---|---|
protected void |
findBest(int size,
double[] scratch,
double[] bestd,
int[] besti,
int j) |
protected int |
findMerge(int size,
MatrixParadigm mat,
double[] bestd,
int[] besti,
PointerHierarchyRepresentationBuilder builder)
Perform the next merge step.
|
TypeInformation[] |
getInputTypeRestriction()
Get the input type restriction used for negotiating the data query.
|
protected Logging |
getLogger()
Get the (STATIC) logger for this class.
|
private static void |
initializeNNCache(double[] scratch,
double[] bestd,
int[] besti)
Initialize the NN cache.
|
protected void |
merge(int size,
MatrixParadigm mat,
double[] bestd,
int[] besti,
PointerHierarchyRepresentationBuilder builder,
double mindist,
int x,
int y)
Execute the cluster merge.
|
PointerHierarchyRepresentationResult |
run(Database db,
Relation<O> relation)
Run the algorithm
|
private void |
updateCache(int size,
double[] scratch,
double[] bestd,
int[] besti,
int x,
int y,
int j,
double d)
Update the cache.
|
protected void |
updateMatrix(int size,
double[] scratch,
DBIDArrayIter ij,
double[] bestd,
int[] besti,
PointerHierarchyRepresentationBuilder builder,
double mindist,
int x,
int y,
int sizex,
int sizey)
Update the scratch distance matrix.
|
getDistanceFunction
run
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
run
private static final Logging LOG
Linkage linkage
public AnderbergHierarchicalClustering(DistanceFunction<? super O> distanceFunction, Linkage linkage)
distanceFunction
- Distance function to uselinkage
- Linkage methodpublic PointerHierarchyRepresentationResult run(Database db, Relation<O> relation)
db
- Databaserelation
- Relationprivate static void initializeNNCache(double[] scratch, double[] bestd, int[] besti)
scratch
- Scratch spacebestd
- Best distancebesti
- Best indexprotected int findMerge(int size, MatrixParadigm mat, double[] bestd, int[] besti, PointerHierarchyRepresentationBuilder builder)
size
- Data set sizemat
- Matrix paradigmbestd
- Best distancebesti
- Index of best distancebuilder
- Hierarchy builderprotected void merge(int size, MatrixParadigm mat, double[] bestd, int[] besti, PointerHierarchyRepresentationBuilder builder, double mindist, int x, int y)
size
- Data set sizemat
- Matrix paradigmbestd
- Best distancebesti
- Index of best distancebuilder
- Hierarchy buildermindist
- Distance that was used for mergingx
- First matrix positiony
- Second matrix positionprotected void updateMatrix(int size, double[] scratch, DBIDArrayIter ij, double[] bestd, int[] besti, PointerHierarchyRepresentationBuilder builder, double mindist, int x, int y, int sizex, int sizey)
size
- Data set sizescratch
- Scratch matrix.ij
- Iterator to reusebestd
- Best distancebesti
- Index of best distancebuilder
- Hierarchy buildermindist
- Distance that was used for mergingx
- First matrix positiony
- Second matrix positionsizex
- Old size of first clustersizey
- Old size of second clusterprivate void updateCache(int size, double[] scratch, double[] bestd, int[] besti, int x, int y, int j, double d)
size
- Working set sizescratch
- Scratch matrixbestd
- Best distancebesti
- Best indexx
- First clustery
- Second cluster, y < x
j
- Updated value d(y, j)d
- New distanceprotected void findBest(int size, double[] scratch, double[] bestd, int[] besti, int j)
public TypeInformation[] getInputTypeRestriction()
AbstractAlgorithm
getInputTypeRestriction
in interface Algorithm
getInputTypeRestriction
in class AbstractAlgorithm<PointerHierarchyRepresentationResult>
protected Logging getLogger()
AbstractAlgorithm
getLogger
in class AbstractAlgorithm<PointerHierarchyRepresentationResult>
Copyright © 2019 ELKI Development Team. License information.