Type Parameters:
V - Vector type
@Title(value="CASH: Robust clustering in arbitrarily oriented subspaces")
@Description(value="Subspace clustering algorithm based on the Hough transform.")
@Reference(authors="E. Achtert, C. Böhm, J. David, P. Kröger, A. Zimek", title="Robust clustering in arbitrarily oriented subspaces", booktitle="Proc. 8th SIAM Int. Conf. on Data Mining (SDM'08), Atlanta, GA, 2008", url="http://www.siam.org/proceedings/datamining/2008/dm08_69_AchtertBoehmDavidKroegerZimek.pdf")
public class CASH<V extends NumberVector>
extends AbstractAlgorithm<Clustering<Model>>
implements ClusteringAlgorithm<Clustering<Model>>
E. Achtert, C. Böhm, J. David, P. Kröger, A. Zimek:
Robust clustering in arbitrarily oriented subspaces.
In Proc. 8th SIAM Int. Conf. on Data Mining (SDM'08), Atlanta, GA, 2008
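For orientation, the Hough-transform parameterization that CASH builds on can be summarized as follows. This is our reading of the cited paper, with notation chosen here: a d-dimensional point p is mapped to a function of d-1 angles whose value is the distance from the origin of the hyperplane through p with unit normal n(α), where n(α) is given in hyperspherical coordinates.

```latex
\begin{aligned}
n_i(\alpha) &= \Bigl(\prod_{j=1}^{i-1} \sin\alpha_j\Bigr)\cos\alpha_i, \qquad i = 1,\dots,d, \quad \alpha_d := 0,\\
f_p(\alpha_1,\dots,\alpha_{d-1}) &= \langle p, n(\alpha)\rangle = \sum_{i=1}^{d} p_i \Bigl(\prod_{j=1}^{i-1} \sin\alpha_j\Bigr)\cos\alpha_i .
\end{aligned}
```

Points lying on a common hyperplane yield functions that intersect at that hyperplane's angles, so dense regions of intersecting functions in parameter space indicate candidate correlation clusters.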
Modifier and Type | Class and Description |
---|---|
static class | CASH.Parameterizer: Parameterization class. |
Modifier and Type | Field and Description |
---|---|
protected boolean | adjust: Apply adjustment heuristic for interval choosing. |
private Relation<ParameterizationFunction> | fulldatabase: The entire relation. |
protected double | jitter: Maximum jitter for distance values. |
private static Logging | LOG: The logger for this class. |
protected int | maxLevel: Maximum level for splitting the hypercube. |
protected int | minDim: Minimum dimensionality of the subspaces to be found. |
protected int | minPts: Threshold for minimum number of points in a cluster. |
private int | noiseDim: Holds the dimensionality for noise. |
private ModifiableDBIDs | processedIDs: Holds a set of processed ids. |
Constructor and Description |
---|
CASH(int minPts, int maxLevel, int minDim, double jitter, boolean adjust): Constructor. |
Modifier and Type | Method and Description |
---|---|
private MaterializedRelation<ParameterizationFunction> | buildDB(int dim, Matrix basis, DBIDs ids, Relation<ParameterizationFunction> relation): Builds a (dim-1)-dimensional database in which the objects are projected into the specified subspace. |
private Database | buildDerivatorDB(Relation<ParameterizationFunction> relation, CASHInterval interval): Builds a database for the derivator consisting of the ids in the specified interval. |
private Database | buildDerivatorDB(Relation<ParameterizationFunction> relation, DBIDs ids): Builds a database for the derivator consisting of the specified ids. |
private Matrix | determineBasis(double[] alpha): Determines a basis defining the subspace described by the specified alpha values. |
private double[] | determineMinMaxDistance(Relation<ParameterizationFunction> relation, int dimensionality): Determines the minimum and maximum function values of all parameterization functions stored in the specified database. |
private CASHInterval | determineNextIntervalAtMaxLevel(ObjectHeap<IntegerPriorityObject<CASHInterval>> heap): Determines the next "best" interval at maximum level, i.e. the interval containing the most unprocessed objects. |
private static int | dimensionality(Relation<ParameterizationFunction> relation): Get the dimensionality of a vector field. |
private CASHInterval | doDetermineNextIntervalAtMaxLevel(ObjectHeap<IntegerPriorityObject<CASHInterval>> heap): Recursive helper method to determine the next "best" interval at maximum level, i.e. the interval containing the most unprocessed objects. |
private Clustering<Model> | doRun(Relation<ParameterizationFunction> relation, FiniteProgress progress): Runs the CASH algorithm on the specified database; this method is called recursively until only noise is left. |
TypeInformation[] | getInputTypeRestriction(): Get the input type restriction used for negotiating the data query. |
protected Logging | getLogger(): Get the (STATIC) logger for this class. |
private void | initHeap(ObjectHeap<IntegerPriorityObject<CASHInterval>> heap, Relation<ParameterizationFunction> relation, int dim, DBIDs ids): Initializes the heap with the root intervals. |
private Relation<ParameterizationFunction> | preprocess(Database db, Relation<V> vrel): Preprocess the dataset, precomputing the parameterization functions. |
private ParameterizationFunction | project(Matrix basis, ParameterizationFunction f): Projects the specified parameterization function into the subspace described by the given basis. |
Clustering<Model> | run(Database database, Relation<V> vrel): Run CASH on the relation. |
private Matrix | runDerivator(Relation<ParameterizationFunction> relation, int dim, CASHInterval interval, ModifiableDBIDs ids): Runs the derivator on the specified interval and assigns to this model all points whose distance to the model is less than the standard deviation of the derivator model. |
private LinearEquationSystem | runDerivator(Relation<ParameterizationFunction> relation, int dimensionality, DBIDs ids): Runs the derivator on the objects with the specified ids and returns the resulting linear equation system. |
private double | sinusProduct(int start, int end, double[] alpha): Computes the product of the sine values of the specified angles from the start index to the end index. |
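The sinusProduct helper listed above, and the way such products enter a parameterization function value, can be sketched as follows. This is a minimal standalone sketch, not ELKI's code: the class name CashParameterizationSketch and the methods normalVector and parameterizationValue are hypothetical, the end index of sinusProduct is assumed to be exclusive, and the angle convention follows the hyperspherical parameterization of the cited paper as we understand it.

```java
// Hypothetical standalone sketch; not ELKI's implementation of CASH.
public final class CashParameterizationSketch {

    // Product of sin(alpha[j]) for start <= j < end.
    // Assumption: end is exclusive; an empty range yields 1.0.
    static double sinusProduct(int start, int end, double[] alpha) {
        double result = 1.0;
        for (int j = start; j < end; j++) {
            result *= Math.sin(alpha[j]);
        }
        return result;
    }

    // Unit normal vector n in R^d described by d-1 angles (hyperspherical coordinates):
    // n[i] = (prod_{j<i} sin alpha[j]) * cos alpha[i]; the last angle is implicitly 0,
    // so the last component is just the full sine product.
    static double[] normalVector(double[] alpha) {
        int d = alpha.length + 1;
        double[] n = new double[d];
        for (int i = 0; i < d - 1; i++) {
            n[i] = sinusProduct(0, i, alpha) * Math.cos(alpha[i]);
        }
        n[d - 1] = sinusProduct(0, d - 1, alpha); // cos(0) = 1
        return n;
    }

    // Parameterization function f_p(alpha) = <p, n(alpha)>: the distance from the origin
    // of the hyperplane through p whose unit normal is n(alpha).
    static double parameterizationValue(double[] p, double[] alpha) {
        if (p.length != alpha.length + 1) {
            throw new IllegalArgumentException("need d-1 angles for a d-dimensional point");
        }
        double[] n = normalVector(alpha);
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            sum += p[i] * n[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // For d = 2 this is the classic line Hough transform x*cos(t) + y*sin(t).
        double[] p = {3.0, 4.0};
        double t = 0.7;
        System.out.println(parameterizationValue(p, new double[] {t}));
    }
}
```

For d = 2 the function reduces to f_p(θ) = p_1 cos θ + p_2 sin θ, which makes a convenient sanity check; in higher dimensions the sine products are what make n(α) a unit vector for every choice of angles.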
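Methods inherited from class AbstractAlgorithm: makeParameterDistanceFunction, run
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from the implemented interfaces: run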
private static final Logging LOG
protected int minPts
protected int maxLevel
protected int minDim
protected double jitter
protected boolean adjust
private int noiseDim
private ModifiableDBIDs processedIDs
private Relation<ParameterizationFunction> fulldatabase
public CASH(int minPts, int maxLevel, int minDim, double jitter, boolean adjust)
Parameters:
minPts - MinPts parameter
maxLevel - Maximum level
minDim - Minimum dimensionality
jitter - Jitter
adjust - Adjust
public Clustering<Model> run(Database database, Relation<V> vrel)
Parameters:
database - Database
vrel - Relation
private Relation<ParameterizationFunction> preprocess(Database db, Relation<V> vrel)
Parameters:
db - Database
vrel - Vector relation
private Clustering<Model> doRun(Relation<ParameterizationFunction> relation, FiniteProgress progress)
Parameters:
relation - the Relation to run the CASH algorithm on
progress - the progress object for verbose messages
private static int dimensionality(Relation<ParameterizationFunction> relation)
Parameters:
relation - Relation
private void initHeap(ObjectHeap<IntegerPriorityObject<CASHInterval>> heap, Relation<ParameterizationFunction> relation, int dim, DBIDs ids)
Parameters:
heap - the heap to be initialized
relation - the database storing the parameterization functions
dim - the dimensionality of the database
ids - the ids of the database
private MaterializedRelation<ParameterizationFunction> buildDB(int dim, Matrix basis, DBIDs ids, Relation<ParameterizationFunction> relation)
Parameters:
dim - the dimensionality of the database
basis - the basis defining the subspace
ids - the ids for the new database
relation - the database storing the parameterization functions
private ParameterizationFunction project(Matrix basis, ParameterizationFunction f)
Parameters:
basis - the basis defining the subspace
f - the parameterization function to be projected
private Matrix determineBasis(double[] alpha)
Parameters:
alpha - the alpha values
private double sinusProduct(int start, int end, double[] alpha)
Parameters:
start - the index to start
end - the index to end
alpha - the array of angles
private CASHInterval determineNextIntervalAtMaxLevel(ObjectHeap<IntegerPriorityObject<CASHInterval>> heap)
Parameters:
heap - the heap storing the intervals
private CASHInterval doDetermineNextIntervalAtMaxLevel(ObjectHeap<IntegerPriorityObject<CASHInterval>> heap)
Parameters:
heap - the heap storing the intervals
private double[] determineMinMaxDistance(Relation<ParameterizationFunction> relation, int dimensionality)
Parameters:
relation - the database containing the parameterization functions
dimensionality - the dimensionality of the database
private Matrix runDerivator(Relation<ParameterizationFunction> relation, int dim, CASHInterval interval, ModifiableDBIDs ids)
Parameters:
relation - the database containing the parameterization functions
interval - the interval to build the model from
dim - the dimensionality of the database
ids - an empty set to which the assigned ids are added
private Database buildDerivatorDB(Relation<ParameterizationFunction> relation, CASHInterval interval)
Parameters:
relation - the database storing the parameterization functions
interval - the interval to build the database from
private LinearEquationSystem runDerivator(Relation<ParameterizationFunction> relation, int dimensionality, DBIDs ids)
Parameters:
relation - the database containing the parameterization functions
ids - the ids to build the model
dimensionality - the dimensionality of the subspace
private Database buildDerivatorDB(Relation<ParameterizationFunction> relation, DBIDs ids)
Parameters:
relation - the database storing the parameterization functions
ids - the ids to build the database from
public TypeInformation[] getInputTypeRestriction()
Description copied from class: AbstractAlgorithm
Specified by: getInputTypeRestriction in interface Algorithm
Specified by: getInputTypeRestriction in class AbstractAlgorithm<Clustering<Model>>
protected Logging getLogger()
Description copied from class: AbstractAlgorithm
Specified by: getLogger in class AbstractAlgorithm<Clustering<Model>>
Copyright © 2015 ELKI Development Team, Lehr- und Forschungseinheit für Datenbanksysteme, Ludwig-Maximilians-Universität München. License information.