@Reference(authors="Robert Haralick, Rave Harpaz", title="Linear manifold clustering in high dimensional spaces by stochastic search", booktitle="Pattern Recognition volume 40, Issue 10", url="http://dx.doi.org/10.1016/j.patcog.2007.01.020") public class LMCLUS extends AbstractAlgorithm<Clustering<Model>>
Robert Haralick, Rave Harpaz
Linear manifold clustering in high dimensional spaces by stochastic search
In: Pattern Recognition volume 40, Issue 10
Modifier and Type | Class and Description |
---|---|
static class |
LMCLUS.Parameterizer
Parameterization class
|
private static class |
LMCLUS.Separation
Class to represent a linear manifold separation
|
Modifier and Type | Field and Description |
---|---|
private static int |
BINS
Histogram resolution
|
private static Logging |
LOG
The logger for this class.
|
private int |
maxLMDim
Maximum cluster dimensionality
|
private int |
minsize
Minimum cluster size
|
private static double |
NOT_FROM_ONE_CLUSTER_PROBABILITY
Epsilon
|
private RandomFactory |
rnd
Random factory
|
private int |
samplingLevel
Number of sampling rounds to find a good split
|
private double |
sensitivityThreshold
The current threshold value calculated by the findSeperation Method.
|
Constructor and Description |
---|
LMCLUS(int maxdim,
int minsize,
int samplingLevel,
double sensitivityThreshold,
RandomFactory rnd)
Constructor.
|
Modifier and Type | Method and Description |
---|---|
private double |
deviation(Vector delta,
Matrix beta)
Deviation from a manifold described by beta.
|
private double[] |
findAndEvaluateThreshold(DoubleDynamicHistogram histogram)
Evaluate the histogram to find a suitable threshold
|
private LMCLUS.Separation |
findSeparation(Relation<NumberVector> relation,
DBIDs currentids,
int dimension,
Random r)
This method samples a number of linear manifolds an tries to determine
which the one with the best cluster is.
|
private Matrix |
generateOrthonormalBasis(List<Vector> vectors)
This Method generates an orthonormal basis from a set of Vectors.
|
TypeInformation[] |
getInputTypeRestriction()
Get the input type restriction used for negotiating the data query.
|
protected Logging |
getLogger()
Get the (STATIC) logger for this class.
|
Clustering<Model> |
run(Database database,
Relation<NumberVector> relation)
The main LMCLUS (Linear manifold clustering algorithm) is processed in this
method.
|
makeParameterDistanceFunction, run
private static final Logging LOG
private static final double NOT_FROM_ONE_CLUSTER_PROBABILITY
private static final int BINS
private final double sensitivityThreshold
private final int maxLMDim
private final int minsize
private final int samplingLevel
private final RandomFactory rnd
public LMCLUS(int maxdim, int minsize, int samplingLevel, double sensitivityThreshold, RandomFactory rnd)
maxdim
- Maximum dimensionalityminsize
- Minimum cluster sizesamplingLevel
- Sampling levelsensitivityThreshold
- Thresholdrnd
- Random factorypublic Clustering<Model> run(Database database, Relation<NumberVector> relation)
The algorithm samples random linear manifolds and tries to find clusters in it.
It calculates a distance histogram searches for a threshold and partitions the
points in two groups the ones in the cluster and everything else.
Then the best fitting linear manifold is searched and registered as a cluster.
The process is started over until all points are clustered.
The last cluster should contain all the outliers. (or the whole data if no clusters have been found.)
For details see LMCLUS
.
database
- The database to operate onrelation
- Relationprivate double deviation(Vector delta, Matrix beta)
delta
- Delta from origin vectorbeta
- Manifoldprivate LMCLUS.Separation findSeparation(Relation<NumberVector> relation, DBIDs currentids, int dimension, Random r)
A number of sample points according to the dimension of the linear manifold are taken. The basis (B) and the origin(o) of the manifold are calculated. A distance histogram using the distance function ||x-o|| -||B^t*(x-o)|| is generated. The best threshold is searched using the elevate threshold function. The overall goodness of the threshold is determined. The process is redone until a specific number of samples is taken.
relation
- The vector relationcurrentids
- Current DBIDsdimension
- the dimension of the linear manifold to sample.r
- Random generatorprivate Matrix generateOrthonormalBasis(List<Vector> vectors)
u_1 = v_1 u_k = v_k -proj_u1(v_k)...proj_u(k-1)(v_k); Where proj_u(v) =/ *u
vectors
- The set of vectors to generate the orthonormal basis fromRuntimeException
- if the given vectors are not linear independent.private double[] findAndEvaluateThreshold(DoubleDynamicHistogram histogram)
histogram
- Histogram to evaluateprotected Logging getLogger()
AbstractAlgorithm
getLogger
in class AbstractAlgorithm<Clustering<Model>>
public TypeInformation[] getInputTypeRestriction()
AbstractAlgorithm
getInputTypeRestriction
in interface Algorithm
getInputTypeRestriction
in class AbstractAlgorithm<Clustering<Model>>
Copyright © 2015 ELKI Development Team, Lehr- und Forschungseinheit für Datenbanksysteme, Ludwig-Maximilians-Universität München. License information.