@Reference(authors="Erich Schubert, Michael Gertz", title="Semantic Word Clouds with Background Corpus Normalization and t-distributed Stochastic Neighbor Embedding", booktitle="ArXiV preprint, 1708.03569", url="http://arxiv.org/abs/1708.03569", bibkey="DBLP:journals/corr/abs-1708-03569")
@Priority(value=206)
public class ClustersWithNoiseExtraction
extends java.lang.Object
implements ClusteringAlgorithm<Clustering<Model>>
Extracts clusters from a hierarchical clustering by applying the highest cut that still retains k clusters, each with at least a minimum size, plus noise (single points that would only be merged at a later step). If no such cut can be found, it returns a result with a relaxed k.
You need to specify: A) the minimum size of a cluster (a value of 1 makes little sense, because every point then counts as a cluster and the cut simply executes all but the last k - 1 merges) and B) the desired number of clusters, each with at least minSize elements.
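The cut described above can be sketched on a toy merge sequence. This is a minimal, self-contained illustration and not ELKI's implementation: the class `NoiseCutSketch`, the method `cut`, and the merge encoding (pairs of point indices, listed in order of increasing merge height) are all hypothetical. Two simplifications are assumptions of this sketch: every cluster smaller than `minSize` at the chosen cut is labeled as noise (-1), and "relaxed k" is taken to mean the highest cut with the largest achievable number of sufficiently large clusters below k.

```java
import java.util.Arrays;

/** Hypothetical illustration of a "k clusters plus noise" cut; not part of ELKI. */
public class NoiseCutSketch {
  /** Find the representative of x (union-find with path halving). */
  static int find(int[] parent, int x) {
    while (parent[x] != x) {
      parent[x] = parent[parent[x]];
      x = parent[x];
    }
    return x;
  }

  /** Merge the clusters of a and b; return the change in the number of clusters with size >= minSize. */
  static int union(int[] parent, int[] size, int a, int b, int minSize) {
    int ra = find(parent, a), rb = find(parent, b);
    if (ra == rb) {
      return 0;
    }
    int before = (size[ra] >= minSize ? 1 : 0) + (size[rb] >= minSize ? 1 : 0);
    parent[rb] = ra;
    size[ra] += size[rb];
    return (size[ra] >= minSize ? 1 : 0) - before;
  }

  /** Reset the union-find structure to singleton clusters. */
  static void reset(int[] parent, int[] size) {
    for (int i = 0; i < parent.length; i++) {
      parent[i] = i;
      size[i] = 1;
    }
  }

  /** Returns one label per point: a cluster id starting at 0, or -1 for noise. */
  static int[] cut(int n, int[][] merges, int k, int minSize) {
    int[] parent = new int[n], size = new int[n];
    reset(parent, size);
    int count = minSize <= 1 ? n : 0; // clusters of sufficient size before any merge
    int bestT = 0, bestCount = -1;
    for (int t = 0;; t++) {
      // Prefer counts closer to k (never above it); on ties keep the later, i.e. higher, cut.
      if (count <= k && count >= bestCount) {
        bestT = t;
        bestCount = count;
      }
      if (t == merges.length) {
        break;
      }
      count += union(parent, size, merges[t][0], merges[t][1], minSize);
    }
    // Replay only the merges below the chosen cut.
    reset(parent, size);
    for (int t = 0; t < bestT; t++) {
      union(parent, size, merges[t][0], merges[t][1], minSize);
    }
    int[] rootLabel = new int[n];
    Arrays.fill(rootLabel, -1);
    int[] labels = new int[n];
    int next = 0;
    for (int i = 0; i < n; i++) {
      int r = find(parent, i);
      if (size[r] < minSize) {
        labels[i] = -1; // too small at this cut: treated as noise (assumption of this sketch)
        continue;
      }
      if (rootLabel[r] < 0) {
        rootLabel[r] = next++;
      }
      labels[i] = rootLabel[r];
    }
    return labels;
  }

  public static void main(String[] args) {
    // Toy hierarchy on 8 points; points 6 and 7 join late.
    int[][] merges = { { 0, 1 }, { 3, 4 }, { 0, 2 }, { 3, 5 }, { 6, 0 }, { 0, 3 }, { 7, 0 } };
    System.out.println(Arrays.toString(cut(8, merges, 2, 3)));
  }
}
```

In this toy input, the cut for k = 2 and minSize = 3 falls just before the merge that joins the two large groups, so point 6 is absorbed into the first cluster while point 7, which would only merge afterwards, stays noise. Asking for k = 3 cannot be satisfied, and the sketch falls back to the same two-cluster cut.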
Reference:
Erich Schubert, Michael Gertz
Semantic Word Clouds with Background Corpus Normalization and t-distributed Stochastic Neighbor Embedding
arXiv preprint, 1708.03569
TODO: Also provide representatives and last merge height for clusters.
Modifier and Type | Class and Description |
---|---|
protected class | ClustersWithNoiseExtraction.Instance: Instance for a single data set. |
static class | ClustersWithNoiseExtraction.Parameterizer: Parameterization class. |
Modifier and Type | Field and Description |
---|---|
private HierarchicalClusteringAlgorithm | algorithm: Clustering algorithm to run to obtain the hierarchy. |
private static Logging | LOG: Class logger. |
private int | minClSize: Minimum cluster size. |
private int | numCl: Minimum number of clusters. |
Constructor and Description |
---|
ClustersWithNoiseExtraction(HierarchicalClusteringAlgorithm algorithm, int numCl, int minClSize): Constructor. |
Modifier and Type | Method and Description |
---|---|
TypeInformation[] | getInputTypeRestriction(): Get the input type restriction used for negotiating the data query. |
Clustering<Model> | run(Database database): Runs the algorithm. |
Clustering<Model> | run(PointerHierarchyRepresentationResult pointerresult): Process an existing result. |
private static final Logging LOG
private int numCl
private int minClSize
private HierarchicalClusteringAlgorithm algorithm
public ClustersWithNoiseExtraction(HierarchicalClusteringAlgorithm algorithm, int numCl, int minClSize)
Parameters:
algorithm - Algorithm to run
numCl - Number of clusters
minClSize - Minimum cluster size
public Clustering<Model> run(Database database)
Specified by: run in interface Algorithm
Specified by: run in interface ClusteringAlgorithm<Clustering<Model>>
Parameters:
database - the database to run the algorithm on
public Clustering<Model> run(PointerHierarchyRepresentationResult pointerresult)
Parameters:
pointerresult - Existing result in pointer representation.
public TypeInformation[] getInputTypeRestriction()
Specified by: getInputTypeRestriction in interface Algorithm
Copyright © 2019 ELKI Development Team.